- Keywords:
Normal distribution
Central limit theorem
Why the Gaussian distribution is widely used
Relationship between the covariance pattern and the data's distribution
================================================================================
Because the Gaussian probability distribution is used so widely,
it is also called the normal distribution
================================================================================
Why the Gaussian distribution is widely used:
- When you use a 1D feature vector like $$$x=[170,160,180]$$$ representing height values,
only 2 parameters are enough to define the normal distribution: mean $$$\mu$$$ and std $$$\sigma$$$
- Because of the central limit theorem: averages of many independent samples are approximately Gaussian
- In practice, there are many cases where you need to use the Gaussian distribution
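* Code (sketch)
A minimal sketch of the first point above: the height vector is summarized by just $$$\mu$$$ and $$$\sigma$$$, and those two numbers fully define the density. The helper name `normal_pdf` and the choice to divide by $$$N$$$ (rather than $$$N-1$$$) for the std are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1D normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Height values from the note above
heights = [170.0, 160.0, 180.0]

# The two parameters that define the normal distribution
mu = sum(heights) / len(heights)
# Assumption: population-style std (divide by N, not N - 1)
sigma = math.sqrt(sum((h - mu) ** 2 for h in heights) / len(heights))

# The density is highest at the mean and lower one std away from it
density_at_mean = normal_pdf(mu, mu, sigma)
density_one_std_away = normal_pdf(mu + sigma, mu, sigma)
print(mu, sigma, density_at_mean, density_one_std_away)
```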
================================================================================
Central limit theorem
- Suppose the population has mean $$$\mu$$$ and variance $$$\sigma^2$$$.
- Since it's difficult to analyze the entire population, you draw a sample of size $$$N$$$ from the population.
- As the sample size $$$N$$$ becomes bigger and bigger,
the distribution of the sample mean approaches a Gaussian distribution with mean $$$\mu$$$ and variance $$$\frac{\sigma^2}{N}$$$,
so the sample mean estimates the population mean more and more precisely
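* Code
from statistics import mean, pvariance, variance
# population_data, large_sample_from_pop: sequences of measurements
mean_of_pop = mean(population_data)
variance_of_pop = pvariance(population_data)  # divides by the population size
mean_of_sample = mean(large_sample_from_pop)
variance_of_sample = variance(large_sample_from_pop)  # divides by N - 1
If the sample size N is large enough,
mean_of_pop $$$\approx$$$ mean_of_sample
variance_of_pop $$$\approx$$$ variance_of_sample
and, over repeated samples, the variance of mean_of_sample $$$\approx$$$ variance_of_pop / N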
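* Code (simulation sketch)
The central limit theorem can be checked numerically. This is a sketch under assumptions: the population is synthetic and deliberately non-Gaussian (exponential), and the names `population`, `sample_means`, the sample size 50, and the number of repetitions are all chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a synthetic, clearly non-Gaussian population
population = rng.exponential(scale=2.0, size=100_000)
mu = population.mean()
sigma2 = population.var()

N = 50                 # size of each sample
num_samples = 20_000   # how many samples are drawn

# Draw many samples of size N and compute the mean of each one
samples = rng.choice(population, size=(num_samples, N), replace=True)
sample_means = samples.mean(axis=1)

mean_of_means = sample_means.mean()
var_of_means = sample_means.var()

print("population mean      :", mu)
print("mean of sample means :", mean_of_means)
print("sigma^2 / N          :", sigma2 / N)
print("var of sample means  :", var_of_means)
```

The last two printed values should be close to each other, which is the $$$\frac{\sigma^2}{N}$$$ statement of the theorem.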
================================================================================
* Example
* Suppose you want to estimate the average height of an entire population.
* You would draw a sample (people) from it.
* Suppose you draw 1 person and calculate the average height of that sample.
Repeating this many times gives the distribution of the sample average for N=1
* Suppose you draw 4 people and calculate the average height of those people.
Repeating this many times gives the distribution of the sample average for N=4
* Suppose you draw 7 people and calculate the average height of those people.
Repeating this many times gives the distribution of the sample average for N=7
* Suppose you draw 10 people and calculate the average height of those people.
Repeating this many times gives the distribution of the sample average for N=10
* As you increase the sample size, the distribution of the sample average becomes closer and closer to a Gaussian distribution.
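* Code (simulation sketch)
The height example above can be imitated with synthetic data. Assumptions: the "heights" are made up (150 plus a gamma-distributed offset, chosen to be skewed so the effect is visible), and skewness is used as a rough measure of how far a distribution is from Gaussian (a Gaussian has skewness 0). The names `skewness` and `results` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(values):
    """Sample skewness: third central moment divided by std cubed."""
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

repetitions = 200_000  # how many times each experiment is repeated
results = {}
for n in (1, 4, 7, 10):
    # Assumption: synthetic skewed heights, not real measurements
    heights = 150.0 + rng.gamma(shape=2.0, scale=10.0, size=(repetitions, n))
    averages = heights.mean(axis=1)  # average height of each group of n people
    results[n] = (averages.std(), skewness(averages))
    print(n, results[n])
```

Both the spread and the skewness of the averages shrink as the sample size grows from 1 to 10.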
================================================================================
* Multivariate Gaussian probability density function ($$$n$$$: number of random variables)
$$$f_X(x)= \dfrac{1}{(2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}} \exp \left[ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right]$$$
* The shape of the Gaussian probability distribution is determined
by the covariance matrix $$$\Sigma$$$ of the random variables
* $$$\mu$$$ only determines the location of the peak; it does not affect the shape.
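* Code (sketch)
A minimal sketch of the multivariate Gaussian density, written directly from the standard formula with normalizing constant $$$(2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}$$$. The function name `multivariate_gaussian_pdf` and the example $$$\mu$$$ and $$$\Sigma$$$ values are assumptions for illustration.

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, cov):
    """Density of an n-dimensional Gaussian with mean mu and covariance cov."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mu.shape[0]
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu), solving a linear system
    # instead of forming the inverse explicitly
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    norm_const = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * mahalanobis_sq) / norm_const)

# Illustrative values (assumptions)
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])

# Same covariance, two different means: the peak moves, its height does not
peak = multivariate_gaussian_pdf([0.0, 0.0], [0.0, 0.0], cov)
shifted_peak = multivariate_gaussian_pdf([3.0, -1.0], [3.0, -1.0], cov)
off_peak = multivariate_gaussian_pdf([1.0, 1.0], [0.0, 0.0], cov)
print(peak, shifted_peak, off_peak)
```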
================================================================================
* If the covariance matrix of the random variables has the following form
$$$\Sigma = \begin{bmatrix} \sigma_1^2&c_{12}\\c_{12}&\sigma_2^2 \end{bmatrix}$$$
* $$$c_{12} \neq 0$$$: the two variables are correlated
the data expressed via those random variables is spread in an ellipse tilted relative to the coordinate axes
================================================================================
* If the covariance matrix of the random variables has the following form
$$$\Sigma = \begin{bmatrix} \sigma_1^2&0\\0&\sigma_2^2 \end{bmatrix}$$$
* $$$0$$$: the two variables are uncorrelated
* $$$\sigma_1^2 > \sigma_2^2$$$
the data is spread in an axis-aligned ellipse, wider along the first variable
================================================================================
* If the covariance matrix of the random variables has the following form
$$$\Sigma = \begin{bmatrix} \sigma^2&0\\0&\sigma^2 \end{bmatrix}$$$
* $$$0$$$: the two variables are uncorrelated
* $$$\sigma_1^2 = \sigma_2^2 = \sigma^2$$$
the data is spread in a circle, with the same spread in every direction
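* Code (simulation sketch)
The three covariance patterns above can be compared by sampling. Assumptions: the numeric entries of each $$$\Sigma$$$ are made up for illustration, and the labels `full`, `diagonal`, `isotropic` are just names used in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)

# Illustrative covariance matrices (assumed values)
covariances = {
    "full": np.array([[4.0, 1.5], [1.5, 1.0]]),       # c_12 != 0
    "diagonal": np.array([[4.0, 0.0], [0.0, 1.0]]),   # sigma_1^2 > sigma_2^2
    "isotropic": np.array([[1.0, 0.0], [0.0, 1.0]]),  # same variance on both axes
}

summary = {}
for name, cov in covariances.items():
    data = rng.multivariate_normal(mu, cov, size=100_000)
    emp_cov = np.cov(data, rowvar=False)  # empirical 2x2 covariance matrix
    corr = emp_cov[0, 1] / np.sqrt(emp_cov[0, 0] * emp_cov[1, 1])
    summary[name] = (emp_cov, corr)
    print(name, "correlation:", corr)
    print(emp_cov)
```

Only the `full` case shows a clearly nonzero correlation; the `diagonal` case has unequal variances along the two axes, and the `isotropic` case has roughly equal ones.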
================================================================================