https://datascienceschool.net/view-notebook/86c3c9c806c14534bbbf58e5c31dbf09/
================================================================================
* When you know the values of some random variables in a probabilistic model, you can find the values of the other random variables
* This process is called "inference"
================================================================================
* Suppose you know the conditional probability distribution function $$$p(X_{\text{unknown}}|\{X\}_{\text{known}})$$$
* And suppose you know the values of some of the random variables $$$\{X\}_{\text{known}}$$$
* Then you can find the probability distribution $$$p(X_{\text{unknown}})$$$ of the unknown random variables $$$X_{\text{unknown}}$$$
* So the goal of inference is to find $$$p(X_{\text{unknown}}|\{X\}_{\text{known}})$$$
================================================================================
* Suppose there are 3 random variables A, B, C describing one student
  - A: health status
  - B: study time
  - C: test score
* Each random variable can take a value in $$$\{0,1,2\}$$$, representing low, medium, high
================================================================================
* Questions
  1. What is the probability distribution of C?
     - Find the probability distribution $$$P(C)$$$
  2-1. Which score has the largest probability?
  2-2. Given good health (A=2), which score has the largest probability?
     - Find the conditional probability distribution $$$P(C|A=2)$$$
  3. Given low study time (B=0) and a high score (C=2), what is the health status A likely to be?
     - Find the conditional probability distribution $$$P(A|B=0,C=2)$$$
================================================================================
* Probabilistic graphical models such as Bayesian networks and Markov networks use "variable elimination" and "belief propagation" to perform inference
================================================================================
* Joint probability of the above example, where the variables form the chain $$$A \rightarrow B \rightarrow C$$$:
$$$P(A,B,C)=P(A) \times P(B|A) \times P(C|B)$$$
================================================================================
* When you know the probability distribution of A, you can find the probability distribution of B
$$$P(B=0) \\ = \sum\limits_{A} [P(B=0|A) \times P(A)] \\ = P(B=0|A=0)P(A=0) + P(B=0|A=1)P(A=1) + P(B=0|A=2)P(A=2)$$$
* The same calculation works for the cases $$$B=1$$$ and $$$B=2$$$
================================================================================
* Let's calculate the probability distribution of C
$$$P(C) = \sum\limits_{A,B} P(C|B)P(B|A)P(A)$$$
* $$$\sum\limits_{A,B}$$$: sum over all possible combinations of values that A and B can take
* $$$\sum\limits_{A,B}= \sum\limits_{A} \sum\limits_{B}$$$
* You can calculate P(C=0) as follows
$$$P(C=0) \\ = P(C=0|B=0)P(B=0|A=0)P(A=0) + \\ \;\;P(C=0|B=0)P(B=0|A=1)P(A=1) + \\ \;\;P(C=0|B=0)P(B=0|A=2)P(A=2) + \\ \;\;P(C=0|B=1)P(B=1|A=0)P(A=0) + \\ \;\;P(C=0|B=1)P(B=1|A=1)P(A=1) + \\ \;\;P(C=0|B=1)P(B=1|A=2)P(A=2) + \\ \;\;P(C=0|B=2)P(B=2|A=0)P(A=0) + \\ \;\;P(C=0|B=2)P(B=2|A=1)P(A=1) + \\ \;\;P(C=0|B=2)P(B=2|A=2)P(A=2)$$$
* Let's factorize the above equation
$$$P(C=0) \\ = P(C=0|B=0) \big( P(B=0|A=0)P(A=0) + P(B=0|A=1)P(A=1) + P(B=0|A=2)P(A=2) \big) + \\ \;\;P(C=0|B=1) \big( P(B=1|A=0)P(A=0) + P(B=1|A=1)P(A=1) + P(B=1|A=2)P(A=2) \big) + \\ \;\;P(C=0|B=2) \big( P(B=2|A=0)P(A=0) + P(B=2|A=1)P(A=1) + P(B=2|A=2)P(A=2) \big) \\ = P(C=0|B=0) P(B=0) + \\ \;\;P(C=0|B=1) P(B=1) + \\ \;\;P(C=0|B=2) P(B=2)$$$
* Conclusion: once the probability distribution of B has already been calculated, the probability distribution of A disappears from the computation (see the numeric sketch below)
$$$P(C) = \sum\limits_{B} [P(C|B)\times P(B)]$$$
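* A minimal numeric sketch of this two-step elimination in Python/NumPy; the probability tables P(A), P(B|A), P(C|B) below are hypothetical values made up for illustration (the notebook's actual numbers are not reproduced here):

```python
import numpy as np

# Hypothetical probability tables for the chain A -> B -> C
# (these numbers are made up for illustration only).
P_A = np.array([0.2, 0.5, 0.3])              # P(A),  A in {0, 1, 2}
P_B_given_A = np.array([[0.6, 0.3, 0.1],     # row a gives P(B | A=a)
                        [0.3, 0.4, 0.3],
                        [0.1, 0.3, 0.6]])
P_C_given_B = np.array([[0.7, 0.2, 0.1],     # row b gives P(C | B=b)
                        [0.2, 0.5, 0.3],
                        [0.1, 0.3, 0.6]])

# Step 1: eliminate A.  P(B) = sum_A P(B|A) P(A)
P_B = P_A @ P_B_given_A

# Step 2: eliminate B.  P(C) = sum_B P(C|B) P(B)
P_C = P_B @ P_C_given_B

# Brute-force check against the full double sum:
# P(C) = sum_{A,B} P(C|B) P(B|A) P(A)
P_C_full = np.einsum('a,ab,bc->c', P_A, P_B_given_A, P_C_given_B)
assert np.allclose(P_C, P_C_full)

print("P(B) =", P_B)
print("P(C) =", P_C)
```

* The einsum check confirms that eliminating A first and then B gives exactly the same P(C) as the full double sum over A and B, which is the point of the factorization above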
================================================================================
Variable elimination
* This is the method you just saw above
* You calculate probability distributions, working from the "random variables you already know" toward the "values of the other random variables" along the entire network model
* It is called "variable elimination" because you remove (sum out) variables step by step
================================================================================
Belief propagation (message passing)
* Belief propagation can be used for more general networks, such as a linear-chain Markov network
================================================================================
Joint probability of a linear-chain Markov network in which N random variables $$$X_1,\cdots,X_N$$$ are connected in a chain ($$$Z$$$ is the normalization constant, also called the partition function):
$$$p(X_1, \ldots, X_N) = \dfrac{1}{Z}\psi(X_1, X_2) \psi(X_2, X_3) \cdots \psi(X_{N-1}, X_{N})$$$
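* A minimal sketch of forward/backward message passing (belief propagation) on such a linear chain, in Python/NumPy; the pairwise potential tables $$$\psi$$$ below are randomly generated, made-up values for illustration (any nonnegative tables work):

```python
import numpy as np

# Hypothetical pairwise potentials psi(X_i, X_{i+1}) for a chain of
# N = 4 variables, each taking values in {0, 1, 2}.
rng = np.random.default_rng(0)
N, K = 4, 3
psi = [rng.uniform(0.5, 2.0, size=(K, K)) for _ in range(N - 1)]

# Forward messages: m_fwd[i][x] sums the left part of the chain
# up to X_i.  The leftmost node receives a trivial message of ones.
m_fwd = [np.ones(K) for _ in range(N)]
for i in range(1, N):
    m_fwd[i] = m_fwd[i - 1] @ psi[i - 1]     # sum out X_{i-1}

# Backward messages: m_bwd[i][x] sums the right part of the chain.
m_bwd = [np.ones(K) for _ in range(N)]
for i in range(N - 2, -1, -1):
    m_bwd[i] = psi[i] @ m_bwd[i + 1]         # sum out X_{i+1}

# The marginal p(X_i) is proportional to the product of the two
# incoming messages; normalizing gives the exact marginal.
for i in range(N):
    belief = m_fwd[i] * m_bwd[i]
    print(f"p(X_{i + 1}) =", belief / belief.sum())
```

* Because the chain joint probability splits into a left factor and a right factor at each node, the product of the two incoming messages is proportional to $$$p(X_i)$$$; the unnormalized beliefs sum to the same value $$$Z$$$ at every node, which is a quick sanity check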