These are notes I wrote while watching the video lecture series at
https://www.youtube.com/watch?v=AFIO92N9xm4&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz
================================================================================
================================================================================
$$$w$$$: observation, i.e. the observed words
$$$z$$$: topic assignment, which assigns each "word" to a cluster (topic)
word --$$$z$$$--> topic
$$$z$$$ is modeled by a Multinomial distribution
$$$N$$$: number of words in one document (for example, 100 words in one document)
$$$M$$$: number of documents in the corpus (for example, 10 documents)
$$$\theta$$$: per-document topic proportions; document --$$$\theta$$$--> topic
$$$\theta$$$ is modeled by a Multinomial distribution
$$$\alpha$$$: prior for $$$\theta$$$
$$$\alpha$$$ is the parameter of a Dirichlet distribution (the prior over $$$\theta$$$)
$$$K$$$: number of topics, which the user configures
$$$\phi$$$: probability of each "word" appearing in each topic (per-topic word distribution)
$$$\beta$$$: prior for $$$\phi$$$, the parameter of a Dirichlet distribution
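Putting the notation together, below is a minimal sketch of the LDA generative process in Python/NumPy. All concrete values ($$$K=3$$$, $$$M=10$$$, $$$N=100$$$, vocabulary size $$$V=50$$$, and the $$$\alpha$$$, $$$\beta$$$ values) are made-up for illustration, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K topics, M documents, N words per document, V vocabulary words
K, M, N, V = 3, 10, 100, 50
alpha = np.full(K, 0.1)    # Dirichlet prior parameter for theta (document -> topic)
beta = np.full(V, 0.01)    # Dirichlet prior parameter for phi (topic -> word)

# phi[k]: probability of each word appearing in topic k
phi = rng.dirichlet(beta, size=K)          # shape (K, V)

docs = []
for m in range(M):
    theta = rng.dirichlet(alpha)           # per-document topic proportions (Multinomial parameter)
    words = []
    for n in range(N):
        z = rng.choice(K, p=theta)         # topic assignment z for this word
        w = rng.choice(V, p=phi[z])        # observed word w drawn from topic z
        words.append(w)
    docs.append(words)
```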
================================================================================
================================================================================
$$$\alpha$$$: parameter of the Dirichlet distribution over topics, shared across the entire corpus
The Multinomial distribution $$$\theta$$$ is generated from the Dirichlet prior with parameter $$$\alpha$$$
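As a quick illustrative sketch (the $$$\alpha$$$ values and $$$K=3$$$ below are made-up, not from the lecture), sampling $$$\theta$$$ from a Dirichlet with different $$$\alpha$$$ shows how the prior controls how concentrated each document's topic mixture is:

```python
import numpy as np

rng = np.random.default_rng(1)

# Smaller alpha -> sparser theta (document dominated by a few topics);
# larger alpha -> theta closer to uniform across the K topics.
for a in (0.1, 1.0, 10.0):
    theta = rng.dirichlet(np.full(3, a))   # one document's topic proportions
    print(a, np.round(theta, 3))
```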
================================================================================
================================================================================