These are personal study notes
Copyright and original reference:
https://www.youtube.com/watch?v=QcINiIQFN2s&list=PLsri7w6p16vscJ4rkstBZQJqNtZf8Tkxq&index=2&t=0s
================================================================================
Suppose there are 2 variables
Regression examines the "effect of one variable (cause)" on the "other variable (result)"
================================================================================
Can you see a causal relationship between advertisement_price and profit?
- By using "correlation analysis", you can see the "correlation" (positive or negative) between 2 variables
- By using "regression analysis", you can predict future data (like the profit in 2016) based on past data (like the profits in 2001-2015)
- Or, by using "regression analysis", you can explain the relationship between the "independent variable" and the "dependent variable"
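To make the "correlation analysis" step concrete, here is a minimal sketch that computes the Pearson correlation coefficient in plain Python. The data values and the function name pearson_r are made up for illustration and are not from the original lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, assumed for illustration only
advertisement_price = [1, 2, 3, 4, 5]
profit = [2, 4, 5, 4, 5]

r = pearson_r(advertisement_price, profit)
# r > 0: the variables tend to rise together; r < 0: one falls as the other rises
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear association, but it says nothing about which variable is the cause.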
================================================================================
The concept of "dependent variable" and "independent variable"
- Independent variable
  - It occurs on its own
  - It affects other variables "by itself", so it's called independent
- Dependent variable
  - Its value changes in response to the independent variables
================================================================================
Simple regression analysis
- An analysis method that examines the "effect of independent variable X" on "dependent variable Y" by using a regression equation
- Note that there is only 1 "independent variable"
- For example, X: advertisement_price, Y: profit
- If there are 2 or more "independent variables", it's multiple regression analysis
- For example, X1: advertisement_price, X2: the size of company, X3: the location of company, Y: profit
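As a sketch, the two cases can be written side by side. The coefficient names $$$\beta_2$$$ and $$$\beta_3$$$ are assumed here by analogy with the $$$\beta_0$$$ and $$$\beta_1$$$ used elsewhere in these notes, and the residual term is left out for now:
Simple regression: $$$Y=\beta_0 + \beta_1 X_1$$$
Multiple regression: $$$Y=\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$$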
================================================================================
Different regression equation graph pattern in natural science/engineering and social science
================================================================================
How to calculate the slope: $$$\beta_1 = \dfrac{\Delta Y} {\Delta X}$$$ (the change in Y per unit change in X)
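The slope can be estimated from data in several ways. The sketch below assumes ordinary least squares, which is a common choice but is not stated in these notes. The data values and the names fit_simple_regression and predict are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (beta_0, beta_1) for Y = beta_0 + beta_1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_1 = sxy / sxx                 # slope
    beta_0 = mean_y - beta_1 * mean_x  # intercept
    return beta_0, beta_1

# Hypothetical data, assumed for illustration only
advertisement_price = [1, 2, 3, 4, 5]
profit = [2, 4, 5, 4, 5]

beta_0, beta_1 = fit_simple_regression(advertisement_price, profit)

def predict(x):
    """Value of the fitted line at x."""
    return beta_0 + beta_1 * x

# Between any two points on the fitted line, (change in Y) / (change in X) equals beta_1
slope_check = (predict(5) - predict(1)) / (5 - 1)

print(f"beta_0 = {beta_0:.2f}, beta_1 = {beta_1:.2f}")
print(f"predicted profit at X = 6: {predict(6):.2f}")
```

The last line shows the prediction use case: once the slope and intercept are known, the line can be evaluated at an X value that was not in the past data.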
================================================================================
Residual $$$\epsilon$$$
$$$Y=\beta_0 + \beta_1 X_1$$$ alone can't fully explain the pattern of the data
So, you need to add a residual term $$$\epsilon$$$ to the regression equation
$$$Y=\beta_0 + \beta_1 X_1 + \epsilon$$$
Residual $$$\epsilon$$$ is the gap between the "predicted line" and the "pattern of real data"; the line is fitted so that these gaps are as small as possible overall, typically by minimizing the sum of squared residuals
The value of residual $$$\epsilon$$$ varies randomly from one observation to the next
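As a rough illustration of residuals, the sketch below fits a line by ordinary least squares (an assumption, since these notes do not name a fitting method) and measures how far each observation lies from it. The data values and the names fit_simple_regression and sum_squared_residuals are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (beta_0, beta_1) for Y = beta_0 + beta_1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_1 = sxy / sxx
    beta_0 = mean_y - beta_1 * mean_x
    return beta_0, beta_1

def sum_squared_residuals(xs, ys, beta_0, beta_1):
    """Sum of squared gaps between observed Y and the line beta_0 + beta_1 * X."""
    return sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, assumed for illustration only
advertisement_price = [1, 2, 3, 4, 5]
profit = [2, 4, 5, 4, 5]

beta_0, beta_1 = fit_simple_regression(advertisement_price, profit)

# Residual for each observation: observed Y minus the value on the fitted line
residuals = [y - (beta_0 + beta_1 * x)
             for x, y in zip(advertisement_price, profit)]

sse_fitted = sum_squared_residuals(advertisement_price, profit, beta_0, beta_1)
# A nearby but different line, chosen arbitrarily for comparison
sse_other = sum_squared_residuals(advertisement_price, profit, 2.0, 0.7)

print("residuals:", [round(e, 2) for e in residuals])
print("sum of squared residuals (fitted line):", round(sse_fitted, 2))
print("sum of squared residuals (other line): ", round(sse_other, 2))
```

With a least-squares fit that includes an intercept, the residuals sum to zero up to floating-point rounding, and any other line gives a larger sum of squared residuals on the same data.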