% Lecture note from %
Let's first review the Gaussian Mixture Model (GMM)
K-means clustering is a special case of the Gaussian Mixture Model
In the plate notation (bottom right), the red-circled x is the data
$$$\mu$$$ represents the center of each cluster (centroid), and $$$\Sigma$$$ represents its covariance
%
The center dots are the centroids
The elliptical contour lines represent the probability distribution of each cluster
3 clusters $$$\rightarrow$$$ 3 centroids
There are K of each of $$$\mu$$$ and $$$\Sigma$$$
z is a random variable which assigns each data point (pixel?) to the appropriate cluster
$$$\pi$$$ sets the proportions in which the data are divided among the clusters
For example, with 100 data points, $$$\pi$$$ could describe a split where the first cluster has 20, the second cluster has 70, and the third cluster has 10
$$$\pi$$$ is the parameter of a multinomial distribution
Note that $$$\pi$$$, $$$\mu$$$, $$$\Sigma$$$ are not random variables but parameters
So $$$\pi$$$, $$$\mu$$$, $$$\Sigma$$$ are drawn outside the plate (the rectangular box); the random variables are inside the box
(Note that if a prior is placed on $$$\pi$$$, this becomes the Bayesian version of GMM)
The parameter $$$\pi$$$ is a vector of size K
For example, you can write $$$\pi = [0.2, 0.7, 0.1]$$$
The random variable z follows a multinomial distribution with parameter $$$\pi$$$
% === %
There is a K that comes from $$$\pi$$$
And there is a K that comes from $$$\mu$$$ and $$$\Sigma$$$
Which K comes first?
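To make the two appearances of K concrete, here is a minimal sketch of the GMM generative process in NumPy. The function name `sample_gmm`, the 2-D means, and the covariance values are illustrative assumptions, not taken from the lecture; only the weights 0.2, 0.7, 0.1 follow the example in these notes. The sketch derives K from the length of $$$\pi$$$ and refuses to run unless there are exactly K pairs of $$$\mu$$$ and $$$\Sigma$$$.

```python
import numpy as np

def sample_gmm(pi, mus, sigmas, n, seed=0):
    """Draw n points from a Gaussian mixture (illustrative sketch).

    pi     : length-K vector of mixing weights (parameter, not random)
    mus    : K mean vectors, one per cluster (hypothetical values below)
    sigmas : K covariance matrices, one per cluster
    """
    pi = np.asarray(pi, dtype=float)
    K = pi.shape[0]  # K is read off the size of pi ...
    if len(mus) != K or len(sigmas) != K:
        # ... and the component parameters must match that K.
        raise ValueError("need exactly one (mu, Sigma) pair per entry of pi")
    if np.any(pi < 0) or not np.isclose(pi.sum(), 1.0):
        raise ValueError("pi must be non-negative and sum to 1")

    rng = np.random.default_rng(seed)
    # z_n ~ Mult(pi): pick a cluster index for every data point.
    z = rng.choice(K, size=n, p=pi)
    # x_n | z_n = k ~ N(mu_k, Sigma_k): draw the point from its cluster.
    x = np.stack([rng.multivariate_normal(mus[k], sigmas[k]) for k in z])
    return z, x

# Assumed example values: 3 clusters in 2-D.
pi = [0.2, 0.7, 0.1]
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0]), np.array([-5.0, 5.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2), 0.5 * np.eye(2)]

z, x = sample_gmm(pi, mus, sigmas, n=100)
counts = np.bincount(z, minlength=len(pi))
print("points per cluster:", counts)
```

With 100 points the per-cluster counts should land roughly near 20, 70, and 10, but they vary from draw to draw because z is random while $$$\pi$$$ is fixed.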
When you select $$$\pi$$$, it controls the mixing coefficient $$$P(z_k)$$$ (line 5 of the picture above)
$$$P(x|z)$$$ is the likelihood with respect to the mixture component (line 9 of the picture above)
Both the mixing coefficient $$$P(z_k)$$$ and the mixture component $$$P(x|z)$$$ are related to K
$$$P(z_k)$$$ means there must be as many parameters (like 0.1, 0.7, 0.2) as there are clusters K (3 clusters)
$$$P(x|z)$$$ means there must be $$$\mu$$$ and $$$\Sigma$$$ parameters for each cluster
So the size of the vector $$$\pi$$$, that is, the number of values the random variable z can take, changes the number of centroids represented by $$$\mu$$$ and $$$\Sigma$$$
The distribution of the selection variable z is tied to the number of clusters K
% === %
In the last section, you learned that the size of the selection variable z is important when setting the number of clusters K
In summary:
The parameter $$$\pi$$$ governs the selection random variable z
The distribution of z is a multinomial distribution
You need to make the number of choices of that multinomial distribution "free"
The parameter $$$\pi$$$ is the parameter of that multinomial distribution
So you need to be able to vary the shape of that multinomial distribution
How do you control a multinomial distribution by varying its number of choices? That is the required task
How do you generate a multinomial distribution?
It is almost impossible to change the parameter $$$\pi$$$ by hand to create the multinomial distribution
So you need an automatic mechanism that creates and controls the multinomial distribution
That mechanism creates the parameter, varies the size of the generated parameter, and finally yields the multinomial distribution
What is the method that creates the parameters of a multinomial distribution?
For that, you can use the Dirichlet distribution
% ===
So, let's review the Dirichlet distribution
% === %
This is the LDA plate notation
LDA uses the prior $$$\alpha$$$ to create $$$\theta$$$
Using $$$\theta$$$, it performs the assignment with respect to z
Generative process: you pass the prior $$$\alpha$$$ into a Dirichlet distribution and get $$$\theta$$$
$$$\theta_i \sim Dir(\alpha)$$$
$$$\theta_i$$$ has the proper form to serve as the parameter of a multinomial distribution
$$$z_{i,l} \sim Mult(\theta_i)$$$
How is this possible?
From the definition of the Dirichlet distribution, you can see that it is a natural tool for creating the parameter of a multinomial distribution, because its samples satisfy
1. $$$x_1, ..., x_{K-1} > 0$$$
1. $$$x_1+...+x_{K-1} < 1$$$
1. $$$x_K=1-x_1-...-x_{K-1}$$$
Together, these three conditions mean that $$$x_1, ..., x_K$$$ are all positive and sum to 1, so every sample satisfies the probability axioms
What is $$$\alpha$$$ in $$$P(x_1,...,x_K|\alpha_1,...,\alpha_K)$$$?
$$$\alpha$$$ shapes the space from which samples $$$x_1,...,x_K$$$ satisfying the probability axioms are drawn
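The two sampling steps above, $$$\theta \sim Dir(\alpha)$$$ followed by $$$z \sim Mult(\theta)$$$, can be sketched with NumPy as follows. The helper name `draw_cluster_sizes`, the choice $$$\alpha = (1, ..., 1)$$$, and the 100 data points are assumptions made for illustration. The point of the sketch is that changing the length of $$$\alpha$$$ changes the number of choices K of the resulting multinomial distribution without setting any weights by hand.

```python
import numpy as np

def draw_cluster_sizes(alpha, n, rng):
    """Sample theta ~ Dir(alpha), then split n points with Mult(n, theta).

    alpha : length-K vector of positive Dirichlet parameters
    n     : number of data points to distribute over the K choices
    """
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha <= 0):
        raise ValueError("all entries of alpha must be positive")
    # theta is a valid probability vector: entries >= 0, summing to 1.
    theta = rng.dirichlet(alpha)
    # counts[k] is how many of the n points were assigned to choice k.
    counts = rng.multinomial(n, theta)
    return theta, counts

rng = np.random.default_rng(0)

# The length of alpha sets K, so K can be varied freely.
for K in (2, 3, 5):
    theta, counts = draw_cluster_sizes(np.ones(K), n=100, rng=rng)
    print(f"K={K} theta={np.round(theta, 3)} counts={counts}")
```

Each printed `theta` has K entries that sum to 1, and each `counts` vector sums to 100, which mirrors the earlier example of 100 data points being split across clusters.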