* Bayes_theorem
================================================================================
Bayes' theorem is an equation derived from conditional probability.
You can use Bayes' theorem for pattern recognition.
================================================================================
* $$$B_1, B_2, \cdots, B_N$$$: events which make up the sample space S
* $$$B_1, B_2, \cdots, B_N$$$: mutually exclusive (disjoint), so they partition S
* Event A can occur together with any of the events $$$B_k$$$
* Question: given that event A occurred, what is the probability that A occurred in event $$$B_j$$$?
================================================================================
* S: sample space
* S is composed of the multiple events $$$B_1, \cdots, B_N$$$
* A: an event in S
* When A occurred, did A occur from $$$B_1$$$, or $$$B_2$$$, or ...?
================================================================================
* Bayes' theorem expresses the above situation:
* $$$P[B_j|A] \;\;\;\; (1) \\
= \dfrac{P[A\cap B_j]}{P[A]} \;\;\;\; (2) \\
= \dfrac{P[A|B_j]P[B_j]}{\sum\limits_{k=1}^N P[A|B_k]P[B_k]} \;\;\;\; (3)$$$
* $$$P[B_j|A]$$$: conditional probability;
given that event A occurred, the probability that "$$$A$$$ occurred from $$$B_j$$$"
* (1) -> (2): by the definition of conditional probability
================================================================================
* Law of total probability for event A:
$$$P[A] \\
= P[A|B_1]P[B_1] + \cdots + P[A|B_N]P[B_N] \\
= \sum\limits_{k=1}^N P[A|B_k]P[B_k]$$$
================================================================================
(2) -> (3): in the denominator, you replace $$$P[A]$$$ with $$$\sum\limits_{k=1}^N P[A|B_k]P[B_k]$$$ by the law of total probability
================================================================================
(2) -> (3): in the numerator, according to the following conditional probability identity,
$$$P[A\cap B_j] \\
= P[A|B_j]P[B_j] \\
= P[B_j|A]P[A]$$$
you can replace $$$P[A\cap B_j]$$$ with $$$P[A|B_j]P[B_j]$$$
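* A minimal numerical sketch of (1)-(3); every probability value below is made up for illustration:
```python
# Minimal sketch of Bayes' theorem for a 2-event partition B_1, B_2 of S.
# All probability values are made-up illustration numbers.
priors = [0.6, 0.4]        # P[B_1], P[B_2]
likelihoods = [0.5, 0.25]  # P[A|B_1], P[A|B_2]

# Law of total probability: P[A] = sum_k P[A|B_k] * P[B_k]
p_a = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem: P[B_j|A] = P[A|B_j] * P[B_j] / P[A]
posteriors = [l * p / p_a for l, p in zip(likelihoods, priors)]

print(p_a)         # 0.4
print(posteriors)  # [0.75, 0.25]; posteriors over the partition sum to 1
```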
================================================================================
* Let's see Bayes' theorem in terms of pattern recognition.
* $$$P[\omega_j|x] \;\;\;\; (1) \\
= \dfrac{P[x|\omega_j]P[\omega_j]}{\sum\limits_{k=1}^N P[x|\omega_k]P[\omega_k]} \;\;\;\; (2) \\
= \dfrac{P[x|\omega_j]P[\omega_j]}{P[x]} \;\;\;\; (3) $$$
* $$$\omega_j$$$: jth class, like $$$\omega_1=\text{male}$$$, $$$\omega_2=\text{female}$$$
* $$$x$$$: feature vector, like weight data $$$x_1=[70,60]$$$, $$$x_2=[40,50]$$$
* $$$P[\omega_j]$$$: prior probability of "class $$$\omega_j$$$" occurring
* $$$P[\omega_j|x]$$$: posterior probability of $$$\omega_j$$$ wrt feature vector x;
given that feature vector x occurred, the probability that x occurred from class $$$\omega_j$$$
* $$$P[x|\omega_j]$$$: likelihood;
given class $$$\omega_j$$$, the probability of x occurring from $$$\omega_j$$$
================================================================================
* The goal you ultimately want to solve for: $$$P[\omega_j|x]$$$;
when feature vector x is given, which class $$$\omega_j$$$ did that feature vector x come from?
* You will get a probability value for each class, e.g.
$$$P[\omega_{\text{male}}|x]=0.7$$$
$$$P[\omega_{\text{female}}|x]=0.3$$$
* You can decide the class is $$$\omega_{\text{male}}$$$, as formalized below
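* In symbols, this "pick the class with the largest posterior" rule is the maximum a posteriori (MAP) decision rule:
$$$\hat{\omega} = \underset{j}{\operatorname{arg\,max}}\; P[\omega_j|x]$$$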
================================================================================
* To know $$$P[\omega_j|x]$$$ by using Bayes' theorem, you should know the following:
1. $$$P[x]$$$: probability value of feature vector x occurring
2. $$$P[\omega_j]$$$: probability value of each class occurring
3. $$$P[x|\omega_j]$$$: probability value of feature vector x occurring
from a given class $$$\omega_j$$$
(Note that 1 follows from 2 and 3 by the law of total probability, as in the denominator of (2).)
================================================================================
1. $$$P[\omega_j]$$$: prior probability of $$$\omega_j$$$;
probability value of class $$$\omega_j$$$ occurring
2. $$$P[\omega_j|x]$$$: posterior probability of $$$\omega_j$$$ wrt feature vector x;
after you've got feature vector x,
probability value of "x came from class $$$\omega_j$$$"
3. $$$P[x|\omega_j]$$$: likelihood;
probability value of x occurring when $$$\omega_j$$$ is given.
For example, the probability of height feature vector [170,180] occurring from class male
and the probability of height feature vector [160,162] occurring from class female are different;
probability values of a feature vector occurring differ per class.
4. $$$P[x]$$$: probability value of the given pattern x occurring;
it has no influence on the classification
and is used as a normalization constant term.
================================================================================
* Pattern recognition using Bayes' theorem
* With the normalization term
$$$\text{posterior probability}=\dfrac{\text{likelihood}\times \text{prior probability}}{\text{normalization term}}$$$
* Without the normalization term (proportionality is enough for picking the most probable class)
$$$\text{posterior probability}\propto\text{likelihood} \times \text{prior probability}$$$
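* A minimal classification sketch putting the pieces together; the 1-D height feature, the Gaussian class-conditional densities, and every number are assumptions for illustration, not from the text:
```python
import math

# Hypothetical 1-D example: classify a height x into male/female.
# Gaussian class-conditional densities and all numbers are assumptions.
classes = {
    "male":   {"prior": 0.5, "mean": 175.0, "std": 7.0},
    "female": {"prior": 0.5, "mean": 162.0, "std": 6.0},
}

def gaussian_pdf(x, mean, std):
    # Likelihood P[x|class] under an assumed normal distribution
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x):
    # Numerator of Bayes' theorem: likelihood * prior, per class
    scores = {c: gaussian_pdf(x, p["mean"], p["std"]) * p["prior"]
              for c, p in classes.items()}
    # P[x], the normalization term; it does not change which class wins
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}

post = posteriors(170.0)
print(post)                     # posterior probability per class
print(max(post, key=post.get))  # decide the class with the largest posterior
```
Dividing by evidence rescales every class score by the same constant, so the argmax decision would be identical without it; that is why the normalization term can be dropped for classification.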
================================================================================
Example
* Law of total probability
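A worked sketch with made-up numbers (the same numbers as the Python sketch above):
suppose $$$P[B_1]=0.6$$$, $$$P[B_2]=0.4$$$, $$$P[A|B_1]=0.5$$$, $$$P[A|B_2]=0.25$$$. Then
$$$P[A] \\
= P[A|B_1]P[B_1] + P[A|B_2]P[B_2] \\
= 0.5\times 0.6 + 0.25\times 0.4 \\
= 0.3 + 0.1 = 0.4$$$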
================================================================================
Example
* Bayes' theorem
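Continuing with the same made-up numbers:
$$$P[B_1|A] \\
= \dfrac{P[A|B_1]P[B_1]}{P[A]} \\
= \dfrac{0.5\times 0.6}{0.4} = 0.75$$$
$$$P[B_2|A] = \dfrac{0.25\times 0.4}{0.4} = 0.25$$$
The two posteriors sum to 1, as expected.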
================================================================================