https://www.youtube.com/watch?v=P6Dm-c1pb90
================================================================================
Topic modeling (document modeling, topic mining):
Find the topics of a document,
and extract the set of keywords which are related to each topic
================================================================================
The ML algorithm should find the topics in an unsupervised manner
================================================================================
Each topic should be related to keywords in the document
================================================================================
Pipeline of topic modeling
- Collect data by web crawling
  - Output: document data
- Preprocess data
  - Remove HTML markup
  - Remove useless characters
  - Keep the useful data
- Data analysis
  - Find "topics" in the documents
  - Find keywords (or annotations) which are related to each "found topic"
  - Algorithms: LDA (Latent Dirichlet Allocation, a Bayesian model), ATM, DTM, Nubbi, S-EGTM
  - Output: set of topics and set of keywords
- Predict topic
  - Analyze "keywords (or annotations)"
  - Predict "future" based on analyzed keywords
  - Bayesian graph (given the keywords, the probability of a specific prediction occurring)
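Sketch of the "preprocess" and "data analysis" steps above, standard library only.
Assumptions (not from the video): the names TextExtractor, preprocess, lda_gibbs and
top_keywords, the tiny stopword list, the toy pages, and the values of alpha, beta
and iterations are all made up for illustration. LDA is fitted here with a plain
collapsed Gibbs sampler, which is one common way to do it, not necessarily the one
the video has in mind.

```python
import random
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def preprocess(html, stopwords=frozenset({"the", "a", "of", "and", "is", "in", "to", "on", "with"})):
    """Strip markup, lowercase, keep alphabetic tokens, drop stopwords."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return [w for w in re.findall(r"[a-z]+", text) if w not in stopwords]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the vocabulary and count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word v in topic k
    topic_total = [0] * num_topics                      # all tokens in topic k
    assignments = []
    for d, doc in enumerate(docs):                      # random initial topics
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(topic t) is proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_keywords(vocab, topic_word, n=3):
    """For each topic, the n words with the highest counts."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Toy "crawled" pages (made up).
pages = [
    "<p>The striker scored a <b>goal</b> in the match</p>",
    "<p>The team won the match with a late goal</p>",
    "<div>The bank raised the interest rate</div>",
    "<div>Interest on the bank loan is rising</div>",
]
docs = [preprocess(p) for p in pages]
vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
keywords = top_keywords(vocab, topic_word)
print(keywords)  # one keyword list per topic
```

Notes: handle_data also receives the contents of script and style tags, so real pages
need more filtering. The topics come out unlabeled (topic 0, topic 1) and on a corpus
this small the split depends on the seed.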
================================================================================
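Sketch of the "predict" step: given keywords, the probability of each outcome.
Assumptions (not from the video): the simplest possible Bayesian graph is used, a
naive Bayes model where the outcome is the only parent of every keyword. The names
train and predict, the outcome labels "up" and "down", and the history data are made
up for illustration. Keywords that are not given are treated as unobserved.

```python
import math
from collections import Counter, defaultdict


def train(history):
    """history: list of (keywords, outcome) pairs. Returns the counts Bayes' rule needs."""
    outcome_counts = Counter()
    keyword_counts = defaultdict(Counter)
    vocab = set()
    for keywords, outcome in history:
        outcome_counts[outcome] += 1
        for kw in set(keywords):
            keyword_counts[outcome][kw] += 1
            vocab.add(kw)
    return outcome_counts, keyword_counts, vocab


def predict(keywords, model):
    """Return P(outcome | keywords) for every outcome seen in training."""
    outcome_counts, keyword_counts, vocab = model
    total = sum(outcome_counts.values())
    log_scores = {}
    for outcome, n in outcome_counts.items():
        score = math.log(n / total)  # prior P(outcome)
        for kw in set(keywords) & vocab:
            # Laplace-smoothed P(keyword present | outcome)
            score += math.log((keyword_counts[outcome][kw] + 1) / (n + 2))
        log_scores[outcome] = score
    # Normalize so the posteriors sum to 1.
    m = max(log_scores.values())
    unnorm = {o: math.exp(s - m) for o, s in log_scores.items()}
    z = sum(unnorm.values())
    return {o: u / z for o, u in unnorm.items()}


# Made-up history: keywords found in past documents and what happened afterwards.
history = [
    (["launch", "growth"], "up"),
    (["growth", "profit"], "up"),
    (["lawsuit", "recall"], "down"),
    (["recall", "loss"], "down"),
]
model = train(history)
posterior = predict(["growth"], model)
print(posterior)
```

With this toy history the priors are 0.5 each, P(growth | up) = 3/4 and
P(growth | down) = 1/4 after smoothing, so P(up | growth) comes out at about 0.75.
================================================================================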