13-01 LDA (Linear Discriminant Analysis)
===========================================================
PCA and LDA are both algorithms for dimensionality reduction.

@ PCA:
reduces dimensionality while keeping as much of the characteristics (variance) of the high-dimensional data as possible
is often used when class labels are not taken into account
more precisely, it reduces dimensionality on the data itself (each class or data set separately), without considering class separation

@ LDA:
reduces dimensionality with the goal of optimal classification in the low (reduced) dimensional space
keeps the discriminant information of the data
is applied to the entire labeled data set
===========================================================
D-dimensional sample data set $$$X = \{x^{(1)}, ..., x^{(N)}\}$$$
The number of samples belonging to class $$$\omega_{1}$$$ is $$$N_{1}$$$
The number of samples belonging to class $$$\omega_{2}$$$ is $$$N_{2}$$$
That is, the data X is a set of samples mixed from 2 classes.
=========================================================
When you want to obtain a scalar y by projecting x onto a specific line (axis), you can use the following inner product formula, which gives the projected value:
$$$y = w^{T}x$$$
Here $$$w$$$ is the vector pointing in the projection direction you want to use (in general, $$$W$$$ can be a matrix whose column vectors are such projection directions).

@ There are numerous lines (axes) onto which you could project the vector data.
Out of all possible lines (axes), the purpose of LDA is to find the line (axis) that makes classification of the projected scalar values easiest.
=========================================================
Suppose you have 2-dimensional input data.
Red dots are $$$\omega_{1}$$$ class data
Blue dots are $$$\omega_{2}$$$ class data
You can project the 2-dimensional data onto one line (axis, vector).
The following picture shows PCA.
[2018-06-07 09-18-03.png]
=========================================================
The distribution of the projected data differs depending on which 1-dimensional line (axis, vector) you choose.
Some lines make classification of the projected data easier.
LDA is the methodology for finding the direction of such a line (one that makes classification of the projected data easier).
[2018-06-07 09-22-53.png]
=========================================================
LDA reduces the dimensionality of high-dimensional feature vectors by maximizing the ratio of the "variance between classes" to the "variance within each class".

Small within-class variance (== within-class scatter): the data in each class is densely packed
Large between-class variance (== between-class scatter): the class centroids are far apart, so the classes are clearly separated

small within-class variance, large between-class variance
good example of class separation
[2018-06-07 09-32-23.png]

large within-class variance, small between-class variance
bad example of class separation
[2018-06-07 09-32-37.png]
=========================================================
Mathematically, LDA maximizes the ratio
$$$\frac{\text{between-class scatter}}{\text{within-class scatter}}$$$

large $$$\text{value} = \frac{\text{large}}{\text{small}}$$$
good example of class separation
high density within each class, large distance between class centroids

small $$$\text{value} = \frac{\text{small}}{\text{large}}$$$
bad example of class separation
low density within each class, small distance between class centroids

Finding a projection direction that makes this ratio large is the goal of LDA.
In terms of the projection direction w, this ratio is the Fisher criterion
$$$J(w) = \frac{w^{T}S_{B}w}{w^{T}S_{W}w}$$$
where $$$S_{B}$$$ is the between-class scatter matrix and $$$S_{W}$$$ is the within-class scatter matrix; a code sketch of this computation for the two-class case is given at the end of this section.
=========================================================
[2018-06-07 09-38-39.png]
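=========================================================
As a concrete illustration of the quantities above, here is a minimal NumPy sketch of two-class Fisher LDA. The synthetic data, the random seed, and the variable names (X1, X2, S_W, S_B, w) are assumptions made for this example only; the notes above do not prescribe them. It computes the class means, the within-class and between-class scatter, the standard two-class solution $$$w \propto S_{W}^{-1}(m_{1} - m_{2})$$$, and the Fisher criterion $$$J(w)$$$.

import numpy as np

# Hypothetical 2-D, two-class data (omega_1 and omega_2), just for illustration.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # class omega_1 (N1 samples)
X2 = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(60, 2))   # class omega_2 (N2 samples)

# Class means (centroids)
m1 = X1.mean(axis=0)
m2 = X2.mean(axis=0)

# Within-class scatter S_W: sum of each class's scatter around its own mean
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Between-class scatter S_B: outer product of the difference of class means
diff = (m1 - m2).reshape(-1, 1)
S_B = diff @ diff.T

# Standard two-class Fisher solution: w is proportional to S_W^{-1} (m1 - m2)
w = np.linalg.solve(S_W, m1 - m2)
w = w / np.linalg.norm(w)          # normalize the projection direction

# Project every sample onto the line (axis) w:  y = w^T x
y1 = X1 @ w
y2 = X2 @ w

# Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w) for the chosen direction
J = (w @ S_B @ w) / (w @ S_W @ w)
print("projection direction w:", w)
print("Fisher criterion J(w):", J)

In practice, a library implementation such as sklearn.discriminant_analysis.LinearDiscriminantAnalysis can be used instead of this hand-rolled sketch; the code above is only meant to show how the scatter matrices and the ratio in the notes are formed.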