https://www.youtube.com/watch?v=F4sIkIlGG78

================================================================================
- Review the corpus data based on several options

- Perform topic modeling by using LDA

- Visualize topic modeling
  Left: 1 to 5 topics
  Right: keywords in each topic
  Based on the keywords in topic 2, you can guess that topic 2 is related to skin trouble

- Perform word embedding
  * The meaning of a word (or concept) is determined by its surrounding words
  * Moon, Trump, and Jinping are recognized as presidents because "president" appears among their surrounding words
  * Python is recognized as a programming language because programming-related words appear around "Python"

- Perform sentiment analysis
  * Scaled F-score: which keywords are related to "positive"? Which keywords are related to "negative"?
  * Use scattertext
  * Keywords that have "positive" sentiment
  * Keywords that have "negative" sentiment

- Perform interactive visualization by using pyLDAvis

- Get consumer insights

================================================================================
Word embedding

- Create corpus by using 2.6M reviews
- Corpus = {docs}
- Use Word2Vec
- Get word embedding
- Test
- Draw chart by using PCA
- Use TensorBoard's Projector
  * keyword_you_input={related_keywords}

================================================================================
Future works
================================================================================
* Feature engineering
  Words (corpus) are too domain specific, so you should replace the old "corpus analyzer" with a new, updated one

================================================================================
Example
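
A minimal sketch of the idea noted under "Perform word embedding": a word is characterized by the words that surround it. This is not Word2Vec itself and not the pipeline from the video; it is a plain co-occurrence count with cosine similarity, using only the standard library. The toy corpus, the window size, and the helper names (context_vectors, cosine, most_similar) are assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus standing in for the tokenized reviews.
docs = [
    "president moon met president trump".split(),
    "president jinping met president trump".split(),
    "python is a programming language".split(),
    "java is a programming language".split(),
]


def context_vectors(docs, window=2):
    """For every word, count the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(docs)


def most_similar(word, topn=3):
    """Rank the other words by how similar their surrounding words are."""
    scores = [
        (other, cosine(vectors[word], vectors[other]))
        for other in list(vectors)
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Words that share surrounding words end up close to each other.
    print(most_similar("moon"))
    print(most_similar("python"))
```

In this toy corpus "moon" and "jinping" share the same surrounding words ("president", "met"), so they score as near-identical, while "moon" and "python" share none. Word2Vec learns dense vectors instead of raw counts, but it relies on the same surrounding-word signal.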