https://datascienceschool.net/view-notebook/e6ef730b7a3b4be7be4ff028d39d67f7/
================================================================================
* $$$\hat{y}$$$ is expressed as a linear combination
* $$$\hat{y} = w_1x_1 + w_2x_2 + \cdots + w_Mx_M$$$
* If all columns are linearly independent, $$$\hat{y}$$$ lies in the vector space whose basis vectors are the columns $$$c_1,\cdots,c_M$$$ of $$$X$$$
================================================================================
$$$\hat{y}=Xw$$$
* $$$X$$$: feature matrix whose columns are the feature vectors $$$c_1,\cdots,c_M$$$
* $$$w$$$: trainable parameters
* $$$\hat{y}$$$: prediction
$$$\hat{y} = [c_1, \cdots, c_M] \begin{bmatrix} w_1\\\vdots\\w_M \end{bmatrix} = w_1c_1 + \cdots + w_Mc_M$$$
================================================================================
* Residual vector $$$e$$$
* $$$e = y-\hat{y}$$$
* $$$\hat{y}$$$ is the vector in the column space that minimizes $$$\|e\|$$$, i.e., the point nearest to $$$y$$$
* Residual vector $$$e$$$ $$$\perp$$$ the vector space
* Code scope: linear vector space
import numpy as np
# Assumption: X has the basis vectors c_1, ..., c_M as its columns; y is the label vector
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares weights
y_hat = X @ w                               # projection of y onto the column space
e = y - y_hat                               # residual vector
np.allclose(X.T @ e, 0)                     # True: e is perpendicular to every basis vector
================================================================================
$$$b=Ta$$$
* A matrix $$$T$$$ acts as a linear transformation, mapping a vector $$$a$$$ to another vector $$$b$$$
* Code scope: linear vector space
b = T @ a   # transformed vector
================================================================================
$$$e = M y$$$
$$$\hat{y} = H y$$$
* Both the residual and the prediction are linear transformations of the label vector $$$y$$$
* Code scope: linear vector space
e = M @ y        # residual vector
y_hat = H @ y    # prediction vector
================================================================================
$$$e = y - \hat{y} \\ = y - Xw \\ = y - X(X^TX)^{-1}X^Ty \\ = (I - X(X^TX)^{-1}X^T)y \\ = My$$$
$$$M = I - X(X^TX)^{-1}X^T$$$
M: residual matrix
================================================================================
$$$\hat{y} = y - e \\ = y - My \\ = (I - M)y \\ = X(X^TX)^{-1}X^T y \\ = Hy$$$
$$$H = X(X^TX)^{-1}X^T$$$
H: projection matrix, hat matrix, influence matrix
================================================================================
Characteristics of the residual matrix and the projection matrix (verified numerically in the sketch at the end)
(1) Symmetric matrices
$$$M^T=M$$$
$$$H^T=H$$$
np.allclose(M.T, M)   # True
np.allclose(H.T, H)   # True
(2) Idempotent matrices
$$$M^2=M, M^3=M, \cdots$$$
$$$H^2=H, H^3=H, \cdots$$$
np.allclose(M @ M, M)   # True
np.allclose(H @ H, H)   # True
(3) M and H are orthogonal to each other
$$$MH=HM=0$$$
np.allclose(M @ H, 0) and np.allclose(H @ M, 0)   # True
(4) M is orthogonal to X
$$$MX=0$$$
np.allclose(M @ X, 0)   # True
(5) Multiplying X by H doesn't change X
$$$HX=X$$$
np.allclose(H @ X, X)   # True
================================================================================
Remember (by the Pythagorean theorem, since $$$\hat{y} \perp e$$$)
$$$y^Ty = \hat{y}^T \hat{y}+ e^T e$$$
np.isclose(y @ y, y_hat @ y_hat + e @ e)   # True: ||y||^2 = ||y_hat||^2 + ||e||^2
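================================================================================
To tie the sections above together, here is a minimal self-contained NumPy sketch. The random data, the shapes, and the variable names (X, y, w, H, M, y_hat, e) are illustrative assumptions, not taken from the original notebook. It builds H and M from a small full-column-rank X and checks $$$\hat{y} = Hy$$$ and $$$e = My$$$ against the least-squares solution.
import numpy as np

# Illustrative data (assumption): 10 samples, 3 linearly independent feature columns
np.random.seed(0)
X = np.random.randn(10, 3)   # columns play the role of c_1, c_2, c_3
y = np.random.randn(10)      # label vector

w = np.linalg.solve(X.T @ X, X.T @ y)   # w = (X^T X)^{-1} X^T y
H = X @ np.linalg.inv(X.T @ X) @ X.T    # projection (hat) matrix
M = np.eye(len(y)) - H                  # residual matrix

y_hat = X @ w   # prediction = projection of y onto the column space
e = y - y_hat   # residual vector

print(np.allclose(y_hat, H @ y))   # True: y_hat = Hy
print(np.allclose(e, M @ y))       # True: e = My
print(np.allclose(X.T @ e, 0))     # True: e is perpendicular to the column space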
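Continuing the same sketch, X @ w is exactly the weighted sum of the columns, matching the linear-combination view $$$\hat{y} = w_1c_1 + \cdots + w_Mc_M$$$:
combo = sum(w_j * X[:, j] for j, w_j in enumerate(w))   # w_1*c_1 + w_2*c_2 + w_3*c_3
print(np.allclose(X @ w, combo))                        # True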
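The five characteristics of M and H listed above can then be checked on the same matrices:
print(np.allclose(M.T, M), np.allclose(H.T, H))       # (1) both symmetric
print(np.allclose(M @ M, M), np.allclose(H @ H, H))   # (2) both idempotent
print(np.allclose(M @ H, 0), np.allclose(H @ M, 0))   # (3) MH = HM = 0
print(np.allclose(M @ X, 0))                          # (4) MX = 0
print(np.allclose(H @ X, X))                          # (5) HX = X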
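Finally, because y_hat and e are orthogonal, the squared norms decompose as stated in the last section:
print(np.isclose(y @ y, y_hat @ y_hat + e @ e))   # True: ||y||^2 = ||y_hat||^2 + ||e||^2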