https://www.youtube.com/watch?v=GVPTGq53H5I
================================================================================
- Let's use bag of words
- Meaning: pizza and hamburger are both US food, so their similarity should not be 0
- cosine_similarity(pizza,hamburger)=0
- cosine_similarity is the dot product of the pizza vector ([1,0,0,0,0]) and the hamburger vector ([0,1,0,0,0])
- cosine_similarity(pizza,ramen)=0
- cosine_similarity(pizza,hamburger)=0 doesn't make sense
================================================================================
Let's perform the same task by using TF-IDF
But you will get the same incorrect similarity result
================================================================================
Why does this incorrect result occur?
It's because bag of words and TF-IDF are word-based vectors
It means you can't find any topic from the words alone, so you get 0 similarity
================================================================================
LSA (latent semantic analysis) can find similarity based on "topic"
================================================================================
Word-document matrix
y axis: individual words
x axis: 6 sentences (documents)
Let's call this 2D array A
================================================================================
Perform singular value decomposition
$$$A \approx U \times \Sigma \times V^T$$$
================================================================================
$$$U$$$ can be considered as the "word matrix" for topics
================================================================================
$$$V^T$$$ can be considered as the "document (or sentence) matrix" for topics
================================================================================
$$$\Sigma$$$ can be considered as the "strength matrix" for topics
================================================================================
What you are interested in is the "document (or sentence) matrix" for topics
So, multiply the "strength" matrix by the "document (or sentence) matrix" for topics
================================================================================
Due to the characteristics of $$$\Sigma$$$, its diagonal elements (the singular values) are in descending order of importance
================================================================================
For simplicity, keep only the 2 most important topics: t1 and t2
$$$(2,2) \cdot (2,6) = (2,6)$$$
================================================================================
Actual result (the $$$(2,6)$$$ table of document vectors in topic space)
================================================================================
Let's plot this table in 2D space
================================================================================
Result (2D plot of the 6 documents; two circled groups)
================================================================================
Let's calculate cosine similarities
cosine_similarity(d1,d2)=1
cosine_similarity(d1,d3)=1
cosine_similarity(d2,d3)=1
The max of cosine_similarity is 1, so d1, d2, d3 have maximum similarity
================================================================================
cosine_similarity(d4,d5)=1
cosine_similarity(d4,d6)=1
cosine_similarity(d5,d6)=1
The max of cosine_similarity is 1, so d4, d5, d6 have maximum similarity
================================================================================
Conclusion
- You can interpret the latent semantics of the 2 axes (one topic per axis)
================================================================================
d2 has more strength (a larger vector magnitude)
But cosine similarity ignores strength, so you get 1 within both circled groups
================================================================================
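A minimal Python sketch of the same pipeline, for reference. The six sentences, vocabulary, and variable names below are assumptions (the video's actual corpus is not reproduced here); only the steps follow the notes above: build the word-document matrix A, confirm that raw bag-of-words cosine similarity is 0 for documents sharing no words, perform SVD, keep the 2 strongest topics, and compare documents in topic space.

import numpy as np

# Hypothetical corpus: d1-d3 are about US food, d4-d6 about Japanese food
docs = [
    "pizza",            # d1
    "pizza hamburger",  # d2
    "hamburger",        # d3
    "ramen",            # d4
    "ramen sushi",      # d5
    "sushi",            # d6
]

# Word-document matrix A: words on the y axis, documents on the x axis
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Bag-of-words similarity: d1 ("pizza") and d3 ("hamburger") share no word,
# so their cosine similarity is 0 even though both are US food
print("BoW  cos(d1,d3) =", cosine(A[:, 0], A[:, 2]))      # 0.0

# LSA: singular value decomposition A = U Sigma V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the 2 strongest topics: (2,2) Sigma times (2,6) V^T gives (2,6),
# i.e. every document becomes a 2-dimensional topic vector
k = 2
doc_topics = np.diag(s[:k]) @ Vt[:k, :]                    # shape (2, 6)

# Cosine similarity in topic space
print("LSA  cos(d1,d3) =", cosine(doc_topics[:, 0], doc_topics[:, 2]))
print("LSA  cos(d4,d6) =", cosine(doc_topics[:, 3], doc_topics[:, 5]))
print("LSA  cos(d1,d4) =", cosine(doc_topics[:, 0], doc_topics[:, 3]))

With this toy corpus, d1-d3 collapse onto one topic axis and d4-d6 onto the other, so the within-group similarities should come out as 1 and the cross-group similarity as (approximately) 0, mirroring the result above; cosine similarity ignores the difference in magnitude between d2 and the other documents.
================================================================================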