This is personal study note Copyright and original reference: https://www.youtube.com/watch?v=O1U2NWaSYn4&list=PLbhbGI_ppZISMV4tAWHlytBqNq1-lb8bz ================================================================================ There is no application with only forward probability and backward probability ================================================================================ For decoding problem - Viterbi decoding algorithm ================================================================================ ================================================================================ Forward probability $$$\times$$$ backward probability ================================================================================ Why did you create forward and backward probabilities? For representing It is joint probability between prob of latent factor z in k-th cluster at time t and prob of X ================================================================================ Forward and backward probabilities can be written in detail (recursive form, repeating problem) $$$\alpha_t^k$$$ reduces $$$\alpha_{t-1}^i$$$ And you use parameter a and b (emission and transition probabilities) $$$\beta_{t+1}^i$$$: start with $$$t+1$$$, so it's also reduction in size from $$$\beta_{t}^k$$$ And you also use a and b ================================================================================ This becomes joint prob of "latent variable wrt specific time t" and X ================================================================================ Joint prob means you can use conditional prob ================================================================================ Problem: you observe all Xs But observations are used for one z But you want to assign most probable assignment into all hidden labels Zs ================================================================================ So, this is lack for above ultimate goal This is called "decoding question" ================================================================================ Viterbi decoding algorithm $$$k^*$$$: most probable assignment of specific cluster k $$$P(z^k=1|X)$$$: all observations Xs are given, prob of $$$z^k$$$ ocurring If you configure normalize term well, you can make following ================================================================================ Now, you need to model entire sequence of Z How to do it? Use dynamic programming Fill Z values from left to right by using memorization tables Perform retrace and get most probable likelihood Z sequence ================================================================================ See $$$x_1,\cdots,x_{t-1}$$$: all observations up to time $$$t-1$$$ $$$z_1,\cdots,z_{t-1}$$$: all latent variables up to time $$$t-1$$$ $$$x_t$$$: observation at time t $$$z_t^k=1$$$: your goal to do, let's assign z into specific cluster k ================================================================================ Change z values to maximize P You get maximal prob P, and it's $$$V_t^k$$$ ================================================================================ Do process ================================================================================ See this $$$V_t^k$$$ becomes "repeating problem form" You can use dynamic programming for "repeating problem form" ================================================================================ Car assembly line in the factory left 2,4: starting point right 3,2: ending point Each number means took-time in minutes When time is predicted too much, line can be transfered from A to B or B to A ================================================================================ Which routes would be the best? 2+7=9 4+8=12 ================================================================================ Find optimial route ================================================================================ Let's above process in terms of probability You want to know $$$V_t^k$$$ $$$V_t^k$$$ is assignment by performing joint of time and states Let's save 2 information of $$$V_t^k$$$ 1. Trace information 2. probabilities information ================================================================================ Initialize Viterbi algorithm Peform iteration to growth V You can calculate $$$V_t^k$$$ from $$$V_{t-1}^i$$$ You already know parameters a and b ================================================================================ Which assignment is the best? Information on it is saved in trace ================================================================================ You access T (entire sequence) from t ================================================================================ Issue of Viterbi decoding algorith * Frequent underflow problem In real world, time step is large But following a,b,V are all probabilities [0,1] Small values are multipled continuously * To resolve it, calculate it in log domain And multiplication is converted into summation ================================================================================ Summary When $$$\pi,a,b,X$$$ are given inference whole latent factor sequence Z? using Viterbi decoding algorithm