================================================================================
https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/06/pcasvdlsa/
/mnt/1T-5e7/mycodehtml/NLP/Latent_semantic_analysis/Ratsgo/main.html
================================================================================
Dimensionality reduction
- SVD
- PCA
================================================================================
SVD is used for LSA
================================================================================
PCA
- Preserves the variance of the data
- Finds new axes (bases) which are perpendicular to each other;
  perpendicular axes are uncorrelated
- Maps vectors in a high-dimensional space into a low-dimensional space
  Example: vectors in 3D space into vectors in 2D space
- Principal components: PC1, PC2
================================================================================
How to do PCA
- Your goal is to find new axes (new bases) which maximize the "variance" of the data
- Find the covariance matrix of the data matrix A
  - Suppose each variable has 0 mean
  - The covariance matrix is then proportional to $$$A^TA$$$
- Find the new axes of PCA
  - Perform "eigendecomposition" on the covariance matrix: $$$A^TA = V \Lambda V^T$$$
  - $$$\Lambda$$$: diagonal matrix whose diagonal elements are the eigenvalues of the covariance matrix;
    all other entries are 0
  - $$$V$$$: matrix whose column vectors are the eigenvectors of the covariance matrix $$$A^TA$$$
================================================================================
Diagonal elements of $$$\Lambda$$$: the variance of the data along each new axis (principal component)
- Remember that your goal is to preserve the "variance" of the original data
- So, select some of the largest eigenvalues
- Take the eigenvector corresponding to each selected eigenvalue
- Project (linearly transform) the "original data" onto those "eigenvectors"
- PCA is complete
================================================================================
Example
You have a matrix of 100-dimensional vectors
- Perform PCA
- Select the 2 largest eigenvalues
- Project (linearly transform) the data using the corresponding eigenvectors
- You get a matrix of 2-dimensional vectors
================================================================================
Lesson on SVD
Matrix A: (m,n)
A is decomposed into $$$U \Sigma V^T$$$
================================================================================
Column vectors in U and V are called "singular vectors"
Singular vectors are mutually orthogonal (the columns of U are orthonormal, and so are the columns of V)
================================================================================
Singular values (the diagonal elements of $$$\Sigma$$$) are $$$\ge 0$$$
Singular values in $$$\Sigma$$$ are arranged in descending order
kth diagonal element of $$$\Sigma$$$ $$$= \sqrt{\text{kth eigenvalue of } A^TA}$$$
================================================================================
Compare SVD with PCA
Form $$$A^TA$$$ from the SVD:
$$$A^TA = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T$$$
Squaring a diagonal matrix squares its diagonal elements,
and each singular value is $$$\sqrt{\text{eigenvalue of } A^TA}$$$,
so the diagonal elements of $$$\Sigma^2$$$ are the eigenvalues of $$$A^TA$$$
Therefore, $$$\Sigma^2 = \Lambda$$$, where $$$\Lambda$$$ is composed of the eigenvalues of $$$A^TA$$$
In other words, the SVD of A yields the same axes (V) and the same variances ($$$\Sigma^2$$$) as PCA on the covariance matrix
================================================================================
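To make the relation above concrete, here is a minimal Python sketch (assuming numpy is available; the toy data matrix A and all variable names are illustrative, not from the source). It runs PCA via eigendecomposition of the covariance matrix, runs SVD on the same centered data, and checks that the squared singular values equal the eigenvalues of $$$A^TA$$$:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))           # toy data matrix: 100 samples, 5 features
A = A - A.mean(axis=0)                  # center each variable to 0 mean

# PCA: eigendecomposition of A^T A (proportional to the covariance matrix)
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]       # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD of the same data matrix
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^2 = Lambda: squared singular values equal the eigenvalues of A^T A
print(np.allclose(S**2, eigvals))       # True

# Dimensionality reduction: project onto the top-2 principal components
A_2d = A @ eigvecs[:, :2]               # shape (100, 2)
================================================================================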
LSA
- There are 3 documents
  doc1: I go to school
  doc2: Mary go to school
  doc3: I like Mary
- You can create a Term-Document Matrix

            doc1  doc2  doc3
  I          1     0     1
  go         1     1     0
  to         1     1     0
  school     1     1     0
  Mary       0     1     1
  like       0     0     1

- LSA:
  - You perform SVD on the input data: a Term-Document Matrix or a window-based co-occurrence matrix
  - You reduce the dimensionality of the input data
  - You increase computational efficiency
  - You extract latent semantics from the input data
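A minimal LSA sketch in Python (assuming numpy is available; the variable names X, term_vectors, doc_vectors, and k are illustrative). It applies truncated SVD to the Term-Document Matrix above, keeping the 2 largest singular values, so every term and every document gets a 2-dimensional latent semantic vector:

import numpy as np

terms = ["I", "go", "to", "school", "Mary", "like"]
#                 doc1 doc2 doc3
X = np.array([[1, 0, 1],    # I
              [1, 1, 0],    # go
              [1, 1, 0],    # to
              [1, 1, 0],    # school
              [0, 1, 1],    # Mary
              [0, 0, 1]])   # like

# SVD of the Term-Document Matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                   # number of latent dimensions to keep
term_vectors = U[:, :k] * S[:k]         # one 2D latent vector per term
doc_vectors = Vt[:k, :].T * S[:k]       # one 2D latent vector per document

for term, vec in zip(terms, term_vectors):
    print(term, np.round(vec, 3))

Terms (or documents) whose latent vectors point in similar directions are treated as semantically related, even if they never co-occur in the same document.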