================================================================================
Covariance and the correlation coefficient are numbers
that quantify the correlation between multivariate random variables
================================================================================
Sample covariance
$$$s_{xy}=\frac{1}{N} \sum\limits_{i=1}^{N} (x_i-\bar{x})(y_i-\bar{y})$$$
$$$x_i, y_i$$$: i-th samples of x and y
$$$\bar{x}, \bar{y}$$$: sample means of x and y
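
The formula above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `sample_covariance` and the toy data are assumptions made for this example, and the 1/N normalization is taken directly from the formula.

```python
def sample_covariance(xs, ys):
    """Sample covariance s_xy with the 1/N normalization used above."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y
    # Average of the products of the deviations from the means
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical toy data: y grows together with x, so s_xy is positive
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
s_xy = sample_covariance(xs, ys)
print(s_xy)
```

Note that some libraries normalize by N-1 instead of N by default, so their results can differ slightly from this sketch.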
================================================================================
Like the sample variance, the sample covariance is built from the deviations of the samples from their mean values
================================================================================
The sample covariance reflects both the "size" and the "direction"
of how the samples are distributed around the mean value.
The size of the spread can already be read from the "variance", without using the "covariance".
So, what you still need is a way to see only the "direction" of the distribution.
For that, you can use the "sample correlation coefficient"
================================================================================
$$$r_{xy}=\frac{s_{xy}}{\sqrt{s_x^2\cdot s_y^2}}$$$
$$$s_{xy}$$$: sample covariance
$$$\sqrt{s_x^2}$$$: sample standard deviation of x
$$$\sqrt{s_y^2}$$$: sample standard deviation of y
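
As a rough sketch of how the coefficient above could be computed, the snippet below divides the sample covariance by the product of the two sample standard deviations. The function name `sample_correlation` and the data are hypothetical, and the 1/N normalization is assumed for both the covariance and the variances (it cancels in the ratio).

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r_xy = s_xy / sqrt(s_x^2 * s_y^2)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / n  # sample variance of x
    s_y2 = sum((y - y_bar) ** 2 for y in ys) / n  # sample variance of y
    if s_x2 == 0 or s_y2 == 0:
        raise ValueError("correlation is undefined when a variance is zero")
    return s_xy / math.sqrt(s_x2 * s_y2)


# Hypothetical toy data with a positive but imperfect linear trend
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r_xy = sample_correlation(xs, ys)
print(r_xy)
```

Dividing by the standard deviations removes the "size" information, so the result only describes how consistently x and y move together.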
================================================================================
The coefficient defined above is the Pearson correlation coefficient.
There are other, differently defined correlation coefficients,
such as Spearman's rank correlation coefficient.
================================================================================
The covariance of two random variables X and Y is defined using the expectation E
$$$Cov[X,Y]=E[(X-E[X])(Y-E[Y])]$$$
================================================================================
The correlation coefficient of two random variables is defined as follows
$$$\rho[X,Y]=\dfrac{Cov[X,Y]}{\sqrt{Var[X]\cdot Var[Y]}}$$$
$$$-1 \le \rho\le 1$$$
================================================================================
$$$\rho=1$$$: perfect positive linear correlation
$$$\rho=0$$$: no linear correlation
$$$\rho=-1$$$: perfect negative linear correlation
================================================================================
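The steepness of the slope in a scatter plot has no effect on the correlation coefficient; only the sign of the slope determines the sign of the coefficient.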
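
A small check of this statement, under the assumption that "changing the slope" means multiplying y by a constant: the helper `corr` and the data below are hypothetical and exist only for this illustration.

```python
import math


def corr(xs, ys):
    """Sample correlation coefficient (normalization cancels in the ratio)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_base = corr(xs, ys)
# Stretch y by 10: the point cloud gets steeper, the coefficient stays the same
r_steep = corr(xs, [10.0 * y for y in ys])
# Multiply y by a negative factor: only the sign of the coefficient flips
r_flipped = corr(xs, [-0.5 * y for y in ys])

print(r_base, r_steep, r_flipped)
```

The scale factor appears once in the numerator and once (in absolute value) in the denominator, which is why only its sign survives.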
================================================================================
Sample covariance of multivariate random variables
- M-dimensional random variable $$$x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_M \end{bmatrix}$$$
$$$x_1, x_2, ...$$$: scalar-valued random variables
- Suppose you draw N samples, creating an $$$N\times M$$$ feature matrix X
$$$X =
\begin{bmatrix}
x_{1, 1} & x_{1, 2} & \cdots & x_{1, M} \\
x_{2, 1} & x_{2, 2} & \cdots & x_{2, M} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N, 1} & x_{N, 2} & \cdots & x_{N, M} \\
\end{bmatrix}$$$
================================================================================
$$$c_1 =
\begin{bmatrix}
x_{1, 1}\\
x_{2, 1}\\
\vdots \\
x_{N, 1}
\end{bmatrix}$$$
$$$c_1$$$: first column of the feature matrix X
$$$x_{1, 1},x_{2, 1},\cdots,x_{N, 1}$$$: sample data from random variable $$$x_1$$$
================================================================================
$$$\text{E}[x_j] = \bar{x}_j = \dfrac{1}{N} \sum_{i=1}^N x_{i,j}
= \dfrac{1}{N} \mathbf{1}_N^T c_j = \dfrac{1}{N} c_j^T \mathbf{1}_N$$$
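$$$E[x_j]$$$: expectation value of each random variable, estimated here by the sample mean $$$\bar{x}_j$$$
$$$c_j$$$: j-th column of the feature matrix X
$$$\mathbf{1}_N$$$: N-dimensional vector of ones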
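
The identity above (mean of a column equals the inner product with the ones vector, divided by N) can be sketched as below. The small matrix `X` and the helpers `column` and `dot` are hypothetical names chosen for this example; plain lists are used instead of a matrix library.

```python
# Hypothetical 3 x 2 feature matrix: N = 3 samples, M = 2 variables
X = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
N = len(X)
M = len(X[0])
ones = [1.0] * N  # the N-dimensional vector of ones, 1_N


def column(matrix, j):
    """Return column c_j of the matrix as a list."""
    return [row[j] for row in matrix]


def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Sample mean of each variable via (1/N) * 1_N^T c_j
means = [dot(ones, column(X, j)) / N for j in range(M)]
# The same values computed directly as (1/N) * sum_i x_{i,j}
means_direct = [sum(column(X, j)) / N for j in range(M)]

print(means, means_direct)
```

Writing the mean as an inner product is mainly useful because it extends to matrix form, where all column means can be obtained from a single matrix-vector product.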