This is a personal study note.
Copyright and original reference: www.youtube.com/watch?v=GJW6trVTDWM&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz
================================================================================
Gaussian Mixture Model
- In 2D, there are multiple points
- GMM finds clusters of those points
- GMM can be used in 2D, 3D, 4D, ...
================================================================================
Space and time are different things
- Time is irreversible
- How can you extend an algorithm by adding a "time" component?
================================================================================
- How can an HMM be expressed?
  - An HMM can also be expressed as a graphical model
- What are the major research questions around HMMs?
  - To inspect those research questions, you should know how to calculate probabilities and how to infer probabilities in an HMM
- Link the HMM to the previous lectures
  - The EM algorithm (which is used for GMM) is also used for HMM
================================================================================
================================================================================
* GMM does not consider "time information"
* It captures only "spatial information"
* Example figure: points forming 3 clusters
================================================================================
* What if the points move as time flows?
* Then how should the Gaussian Mixture Model change?
================================================================================
* "Points moving as time flows" appear in various fields
* "Yesterday's value" and "today's value" are related
* In other words, "yesterday's value" affects "today's value"
* And there can be a latent driving force (a random variable) which moves those values
================================================================================
* The latent driving force can change, resulting in a different value pattern
================================================================================
Question: it would be good to have a "time-series-based mixture model"
================================================================================
* Gaussian Mixture Model in plate notation
* $$$N$$$: the number of data points, each expressed as x
* $$$z$$$: the latent factor which classifies the N data points into clusters
* $$$\pi, \mu, \sigma$$$: parameters; $$$\pi$$$ governs z, and $$$\mu, \sigma$$$ govern x
================================================================================
Unfolded view
* Parameter $$$\pi$$$ (which is modeled by a multinomial distribution) affects the latent factor z
* If parameter $$$\pi$$$ is a fixed value, $$$z_1,z_2,\cdots,z_N$$$ are independent
* So that case is not good for expressing "time"
* So the above graphical model should be changed
================================================================================
* Parameter $$$\pi$$$ affects $$$z_1$$$ (the latent factor at time 1)
* $$$x_1$$$ is observed
* $$$z_1$$$ affects $$$z_2$$$ (the next latent factor, which is how "time information" enters)
* This graphical model can express "the change of latent factors" over the time flow
================================================================================
The "Hidden Markov Model" is what you get when you give GMM this "temporal causality"
Hidden Markov Model = dynamic clustering
================================================================================
================================================================================
HMM as a graphical model
- Observations: $$$x_1, x_2, x_3$$$
- An observation can be discrete or continuous
- Observations in GMM are continuous
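To make the GMM-versus-HMM contrast concrete, here is a minimal sampling sketch, assuming k = 2 latent clusters with Gaussian (continuous) emissions; all parameter values below are hypothetical, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for k = 2 latent clusters (illustration only).
pi = np.array([0.6, 0.4])            # initial / mixture weights
A = np.array([[0.9, 0.1],            # transition matrix: A[i, j] = P(z_t = j | z_{t-1} = i)
              [0.2, 0.8]])
mu = np.array([0.0, 5.0])            # Gaussian emission means per cluster
sigma = np.array([1.0, 1.0])         # Gaussian emission std-devs per cluster

N = 10

# GMM: every z_n is drawn independently from Mult(pi); no temporal causality.
z_gmm = rng.choice(2, size=N, p=pi)
x_gmm = rng.normal(mu[z_gmm], sigma[z_gmm])

# HMM: z_1 ~ Mult(pi), then z_t depends on z_{t-1} through A; x_t is emitted from z_t.
z_hmm = np.empty(N, dtype=int)
z_hmm[0] = rng.choice(2, p=pi)
for t in range(1, N):
    z_hmm[t] = rng.choice(2, p=A[z_hmm[t - 1]])
x_hmm = rng.normal(mu[z_hmm], sigma[z_hmm])

print("GMM z:", z_gmm)   # independent cluster labels
print("HMM z:", z_hmm)   # labels tend to stay in the same state for a while
```

In the GMM sample the labels jump around independently; in the HMM sample they tend to persist, which is exactly the "temporal causality" added above.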
================================================================================
The relation from $$$z_1$$$ to $$$x_1$$$ above is modeled by $$$P(x_1|z_1)$$$
- If the probability function P uses a Gaussian normal distribution, a continuous random variable is modeled
- If the probability function P uses a binomial or multinomial distribution, a discrete random variable is modeled
- The probability distribution is your choice
================================================================================
$$$x_1,x_2,\cdots,x_T$$$: observations (data points) which have temporal causality
$$$x_1$$$ comes before $$$x_2$$$
================================================================================
Each x can hold multiple observations
In other words, x can be a vector
================================================================================
Latent (invisible) factor z
Latent state z
Suppose there are k dynamical time groups
================================================================================
Suppose $$$x_1$$$ belongs to time group 1
- If there is no trend change (no latent factor change), $$$x_2$$$ also belongs to time group 1
- If there is a trend change (a latent factor change), $$$x_2$$$ belongs to time group 2
================================================================================
K elements: the K components (the possible latent states)
================================================================================
When the latent factor of an HMM is continuous rather than discrete, you get the Kalman filter method
================================================================================
================================================================================
* Initial state probabilities
  - $$$P(z_1) \sim Mult(\{\pi_1,\pi_2,\cdots,\pi_k\})$$$
  - The first latent factor $$$z_1$$$ is sampled from a multinomial distribution
  - That multinomial distribution has the parameter $$$\pi$$$
  - That means, to train an HMM, you should infer that $$$\pi$$$
================================================================================
* Transition probabilities
  - The probability of moving from $$$z_{t-1}$$$ to $$$z_t$$$ (for example, from $$$z_1$$$ to $$$z_2$$$)
  - $$$P(z_t|z_{t-1}^i=1) \sim Mult(\{\alpha_{i,1},\alpha_{i,2},\cdots,\alpha_{i,k}\})$$$
  - Given that z was in cluster i at the previous time, this gives the probability of each cluster for z at the current time
  - $$$z_{t-1}^i=1$$$: z's time cluster at the previous time step
  - $$$z_t$$$: z's time cluster at the current time step
  - $$$\alpha$$$ is the parameter of the multinomial distribution
  - In another form:
    - $$$P(z_t^j=1|z_{t-1}^i=1) = \alpha_{i,j}$$$
    - $$$z_{t-1}^i=1$$$: z in the i-th cluster is given
    - $$$z_t^j=1$$$: z is in the j-th cluster
================================================================================
- The edge from $$$z_t$$$ to $$$x_t$$$ is modeled by "emission probabilities"
- $$$P(x_t|z_t^i=1) \sim Mult(b_{i,1},\cdots,b_{i,m}) \sim f(x_t|\theta_i)$$$
  * $$$z_t^i=1$$$: the latent factor z (in the i-th cluster) is given
  * $$$x_t$$$: the visible observation x
  * Since we suppose the discrete case here, x is modeled with a multinomial function
  * b is the parameter of that multinomial function
  * m is the number of possible discrete observations
  * How $$$b_{i,j}$$$ is modeled:
    - $$$P(x_t^j=1|z_t^i=1) = b_{i,j}$$$
    - $$$z_t^i=1$$$: z (in the i-th cluster) is given
    - $$$b_{i,j}$$$ is the probability of the j-th observation
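These three ingredients are all an HMM needs: the joint probability of a concrete latent path and observation sequence factorizes as $$$P(x_{1:T},z_{1:T}) = P(z_1)\prod_{t=2}^T P(z_t|z_{t-1}) \prod_{t=1}^T P(x_t|z_t)$$$. Below is a minimal sketch of that computation, with hypothetical values for $$$\pi$$$, $$$\alpha$$$, and b (k = 2 states, m = 3 symbols; none of the numbers are from the lecture):

```python
import numpy as np

# Hypothetical HMM parameters (illustration only).
pi = np.array([0.5, 0.5])              # initial state probabilities  P(z_1)
A = np.array([[0.7, 0.3],              # transition probabilities     A[i, j] = alpha_{i,j}
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],         # emission probabilities       B[i, j] = b_{i,j}
              [0.7, 0.2, 0.1]])

def joint_probability(z, x):
    """P(x, z) = P(z_1) * prod_t P(z_t | z_{t-1}) * prod_t P(x_t | z_t)."""
    p = pi[z[0]] * B[z[0], x[0]]
    for t in range(1, len(z)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p

# Probability of one concrete latent path and observation sequence.
print(joint_probability(z=[0, 0, 1], x=[2, 1, 0]))
```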
================================================================================
- Suppose A, B, C, D are latent states
- State A ---> State B ---> State C ....
- The state changes based on probability (the transition probabilities)
- Each state can emit 1 or 2 as its observation, again based on probability (the emission probabilities)
================================================================================
Example
- State: the professor is angry
- You cannot actually know the professor's mind (the state; the state is hidden)
- But via, for example, the professor's facial expression, you can see the observation emitted from the professor's mind state
- The professor's (mind) state (the latent factor z) can change (i.e., make a transition) as time goes by
- The professor's facial expression always looks like it is changing, but the latent factor (the professor's inner mind) can stay in the same state
- In conclusion, the main characteristic of the HMM is that, as time goes by, the "latent factors z" and the "observations emitted from the latent factors z" are kept separate
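================================================================================
The goals at the top of this note mention calculating and inferring probabilities in an HMM. As an illustration of what "inferring the hidden state" means for the professor example, here is a minimal sketch of forward filtering, a standard HMM inference routine; the states "calm"/"angry", the observations "smiling"/"frowning", and every number are assumptions made up for this sketch:

```python
import numpy as np

# Hypothetical numbers for the professor example (illustration only):
# hidden states: 0 = calm, 1 = angry; observations: 0 = smiling, 1 = frowning.
pi = np.array([0.8, 0.2])              # initial mood probabilities
A = np.array([[0.9, 0.1],              # moods tend to persist across time
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],              # a calm professor usually smiles
              [0.2, 0.8]])             # an angry professor usually frowns

def filter_states(x):
    """Forward filtering: P(z_t | x_1..x_t) for each t, renormalized at every step."""
    belief = pi * B[:, x[0]]
    belief /= belief.sum()
    beliefs = [belief]
    for obs in x[1:]:
        belief = (belief @ A) * B[:, obs]   # predict with A, then weight by emission
        belief /= belief.sum()
        beliefs.append(belief)
    return np.array(beliefs)

# The professor frowns three times in a row; the belief in "angry" grows,
# even though the mood itself is never observed directly.
print(filter_states([1, 1, 1]))
```

Learning $$$\pi$$$, $$$\alpha$$$, and b from observation sequences alone is what the EM algorithm mentioned at the top of this note does for HMMs (the Baum-Welch algorithm).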