This is a personal study note.
Copyright and original reference:
https://www.youtube.com/watch?v=SYdqCHfMycM&list=PLsri7w6p16vu3mMWzijxOmhrlvN23W04_&index=3

================================================================================
How to "see the correlational relationship" between "variables"

- Use a scatter plot

================================================================================
- Covariance:
  - Is the "scattering of 2 random variables" in the "positive direction" or the "negative direction"?
  - When "random variable X" changes, how does "random variable Y" change?

- There can be a "variance" of "random variable A"
- There can be a "variance" of "random variable B"
- There can be a "common variance" shared by "A and B"

================================================================================
How to calculate covariance

- $$$Cov(X,Y) = \dfrac{\sum\limits_{i=1}^{N} (X_i-\bar{X}) (Y_i-\bar{Y}) }{N}\\$$$

- You can see the deviation of X from its mean $$$(X_i-\bar{X})$$$ and the deviation of Y from its mean $$$(Y_i-\bar{Y})\\$$$

- $$$Covariance = \dfrac{\text{sum[(each_X_data-mean_of_X)*(each_Y_data-mean_of_Y)]}}{\text{num_pairs}}\\$$$

- $$$Covariance = \dfrac{\text{sum[deviation_of_X_from_mean*deviation_of_Y_from_mean]}}{\text{num_pairs}}$$$

================================================================================
- Correlation coefficient

================================================================================
* Example

$$$\bar{x}_{\text{ad_price}} = \dfrac{13+8+\cdots+21+25}{15} = 16.467$$$

$$$\bar{x}_{\text{profit}} = \dfrac{94+70+\cdots+105+121}{15} = 98.933$$$

================================================================================
* Deviation values

================================================================================
$$$Cov = \dfrac{17.103+244.976+\cdots+27.502+188.298}{15} = \dfrac{703.471}{15} = 46.898$$$

================================================================================
What does 46.898 mean?
It's a "positive value", so there is a "positive correlational relationship".
But you can't see the strength of the correlation from the covariance alone, because its size depends on the units of X and Y.

================================================================================
To overcome this limitation of covariance, you can use the "correlation coefficient"

correlation_coefficient=normalize(covariance)

================================================================================
$$$Cov(X,Y) = \dfrac{\sum\limits_{i=1}^{N} (X_i-\bar{X}) (Y_i-\bar{Y}) }{N}\\$$$

$$$Corr(X,Y) = \dfrac{Cov(X,Y)}{ \sqrt{ \dfrac{\sum (X_i-\bar{X})^2}{N} \cdot \dfrac{\sum (Y_i-\bar{Y})^2}{N} } } $$$

$$$Corr(X,Y) = \dfrac{Cov(X,Y)}{ \sigma_X \cdot \sigma_Y } $$$

$$$Corr(X,Y) = \dfrac{\text{Cov of X and Y}}{ \text{(std_of_X)} \times \text{(std_of_Y)} } $$$

================================================================================
================================================================================
Standard deviation of ad_price and profit

================================================================================
Calculate the correlation coefficient

-1 <= correlation coefficient <= 1
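
================================================================================
A minimal Python sketch of the two formulas in this note (covariance divided by N, then covariance divided by the product of the two standard deviations). The function names `covariance` and `correlation` and the toy numbers are assumptions made for illustration; they are not the ad_price/profit table from the lecture, whose full data is not reproduced here.

```python
import math


def covariance(xs, ys):
    """Population covariance: sum of products of deviations from the mean, divided by N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance divided by (std of X) * (std of Y), using population std (divide by N)."""
    std_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    std_y = math.sqrt(covariance(ys, ys))  # Cov(Y, Y) is the variance of Y
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical toy data, chosen only so the arithmetic is easy to check by hand.
ad_price = [1.0, 2.0, 3.0, 4.0, 5.0]
profit = [2.0, 4.0, 5.0, 4.0, 5.0]

cov = covariance(ad_price, profit)
corr = correlation(ad_price, profit)
print(f"covariance  = {cov:.3f}")   # covariance  = 1.200
print(f"correlation = {corr:.3f}")  # correlation = 0.775

# Changing the unit of profit (x100) changes the covariance but not the correlation.
scaled = [p * 100 for p in profit]
print(f"covariance (scaled)  = {covariance(ad_price, scaled):.3f}")
print(f"correlation (scaled) = {correlation(ad_price, scaled):.3f}")
```

The last two prints show why the covariance alone cannot tell you the strength of the relationship: rescaling one variable multiplies the covariance by the same factor, while the correlation coefficient stays the same and always falls between -1 and 1.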