A probabilistic model with respect to a random variable X is defined by a probability density function $$$p(x;\theta)$$$ (or a probability mass function $$$P(x;\theta)$$$)

$$$x$$$ from $$$p(x;\theta)$$$: the real numbers which X can take
$$$\theta$$$: a representative symbol for the parameters of the probability density function (the probabilistic model)

================================================================================

For example, $$$\theta$$$ stands for $$$\mu$$$ and $$$\sigma^2$$$ in the case of the Gaussian normal probability density function

$$$p(x; \theta) \\ =p(x; \mu, \sigma^2) \\ =\dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left({-\frac{(x-\mu)^2}{2\sigma^2}}\right)$$$

================================================================================

From the perspective of a function, $$$\theta$$$ is a fixed value and $$$x$$$ is the variable. In other words, the random variable model is already fixed, and it outputs the relative probability of a given real-number input.

================================================================================

But from the perspective of an inference problem, you know $$$x$$$ (a realized sample) and you want to know the parameter $$$\theta$$$. So you should read the probability density function as a function of $$$\theta$$$ in the situation where $$$x$$$ is given.

================================================================================

MLE (Maximum Likelihood Estimation)
It's the method which finds the $$$\theta$$$ that makes the likelihood of the given sample maximal.

================================================================================

- Suppose you have a random variable which follows a normal distribution.
- Suppose the variance $$$\sigma^2$$$ of that random variable is 1.
- Suppose you don't know the mean $$$\mu$$$ of that random variable and you would like to find it.
- Suppose you have one sample $$$x_1=1$$$.
- Which $$$\mu$$$ gives the maximal likelihood? (see the sketch below)
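To make the example concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available) that evaluates the likelihood of the single sample $$$x_1=1$$$ under the three candidate means compared below; the values it prints are the ones quoted next.

```python
# Minimal sketch (assumes NumPy and SciPy are installed): evaluate the
# likelihood of the single sample x1 = 1 under N(mu, sigma^2 = 1)
# for the three candidate means discussed in the text.
import numpy as np
from scipy.stats import norm

x1 = 1.0
for mu in (-1.0, 0.0, 1.0):
    # scale is the standard deviation: sqrt(sigma^2) = 1
    likelihood = norm.pdf(x1, loc=mu, scale=1.0)
    print(f"L(mu = {mu:+.0f}; x1 = 1) = {likelihood:.2f}")

# Prints:
# L(mu = -1; x1 = 1) = 0.05
# L(mu = +0; x1 = 1) = 0.24
# L(mu = +1; x1 = 1) = 0.40
```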
================================================================================

- The likelihood (probability density) of observing $$$x=1$$$ under $$$N(x;\mu=-1,\sigma^2=1)$$$ is 0.05 (red downward-triangle marker)
- The likelihood (probability density) of observing $$$x=1$$$ under $$$N(x;\mu=0,\sigma^2=1)$$$ is 0.24 (blue triangle marker)
- The likelihood (probability density) of observing $$$x=1$$$ under $$$N(x;\mu=1,\sigma^2=1)$$$ is 0.40 (green square marker)

In conclusion, the value inferred by MLE is 1

$$$\hat{\mu}_{\text{Maximum Likelihood}}=1$$$

================================================================================

For inference, you generally have multiple samples $$$\{x_1,\cdots,x_N\}$$$ of a random variable. So the likelihood should also be found from the joint probability density $$$p_{X_1,\cdots,X_N}(x_1,\cdots,x_N;\theta)$$$

$$$\{x_1,\cdots,x_N\}$$$ are independent values which come from the same random variable, so you can use multiplication:

$$$\text{Likelihood}(\theta;x_1,\cdots,x_N) \\ =L(\theta;\{x_i\}_{i=1}^N) \\ =p(x_1;\theta)\times p(x_2;\theta)\times \cdots \times p(x_N;\theta) \\ =\prod\limits_{i=1}^N p(x_i;\theta)$$$

================================================================================

For example, if the sample data is $$$x_1=1, x_2=0, x_3=-3$$$, drawn from a Gaussian normal distribution, the likelihood function is the following (a numerical check appears at the end of this note):

$$$\text{Likelihood}(\theta;x_1,x_2,x_3) \\ = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left( -\frac{(1-\mu)^2}{2\sigma^2} \right) \times \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left( -\frac{(0-\mu)^2}{2\sigma^2} \right) \times \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left( -\frac{(-3-\mu)^2}{2\sigma^2} \right) \\ = \dfrac{1}{(2\pi\sigma^2)^{\frac{3}{2}}} \exp\left({-\frac{(1-\mu)^2 + (0-\mu)^2 + (-3-\mu)^2}{2\sigma^2}}\right) \\ = \dfrac{1}{(2\pi\sigma^2)^{\frac{3}{2}}} \exp\left({-\frac{3\mu^2+4\mu+10}{2\sigma^2}}\right)$$$

================================================================================

Implementing the MLE algorithm is a numerical optimization problem where you should find the $$$\theta$$$ which makes the likelihood maximal

$$$\hat{\theta}_{\text{Maximum Likelihood}} = \arg\max_{\theta} L(\theta;\{x_i\})$$$

================================================================================

Instead of using the likelihood function L directly, you generally use the log-likelihood function $$$LL=\log{L}$$$

$$$\hat{\theta}_{\text{Maximum Likelihood}} = \arg\max_{\theta} \log{L(\theta;\{x_i\})}$$$

The reasons for using LL are:
- The location of the maximum doesn't change, because log is a monotonically increasing function
- Calculation becomes simpler, because $$$\log{AB}=\log{A}+\log{B}$$$ turns the product into a sum
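The product form and the simplified closed form derived above should agree for any $$$\mu$$$. Here is a minimal sketch checking this numerically (NumPy and SciPy assumed; the test value $$$\mu=0.5$$$ is an arbitrary choice, not part of the original example):

```python
# Minimal sketch (assumes NumPy and SciPy): check that the product of
# individual Gaussian densities equals the closed form derived above
# for the samples x = (1, 0, -3) with sigma^2 = 1.
import numpy as np
from scipy.stats import norm

x = np.array([1.0, 0.0, -3.0])
mu, sigma2 = 0.5, 1.0  # mu = 0.5 is an arbitrary test point

product_form = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))
closed_form = (2 * np.pi * sigma2) ** (-3 / 2) * np.exp(-(3 * mu**2 + 4 * mu + 10) / (2 * sigma2))

print(product_form, closed_form)  # both ~ 1.08e-04, so the two forms agree
```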
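Finally, a minimal sketch of the numerical optimization itself (NumPy and SciPy assumed): maximize the log-likelihood by minimizing the negative log-likelihood, where $$$\log{AB}=\log{A}+\log{B}$$$ turns the product of densities into a sum.

```python
# Minimal sketch (assumes NumPy and SciPy): find the maximum-likelihood mu
# for the samples x = (1, 0, -3) by minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

x = np.array([1.0, 0.0, -3.0])

def neg_log_likelihood(mu):
    # The log turns the product of densities into a sum of log-densities
    return -np.sum(norm.logpdf(x, loc=mu, scale=1.0))

result = minimize_scalar(neg_log_likelihood)
print(result.x)  # ~ -0.6667
```

For a Gaussian with known variance, the answer is also available in closed form: the MLE of $$$\mu$$$ is the sample mean $$$(1+0-3)/3=-2/3$$$, which matches the numerical result. The numerical approach is what you fall back on when no closed form exists.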