https://www.youtube.com/watch?v=CdH7U3IjRI8
================================================================================
Give the milk to the baby
The baby drinks that milk all the time
So, that milk (input) has "low uncertainty"
The amount of information (from the milk) that the baby gets is "small"
================================================================================
Give the spicy food to the baby
The baby has no experience with spicy food
So, that spicy food (input) has "high uncertainty"
The amount of information (from the spicy food) that the baby gets is "large"
================================================================================
More uncertainty ---> a larger amount of information you can get
================================================================================
Let's represent "uncertainty" and "amount of information you can get"
as numerical values by using "Shannon entropy"
Shannon entropy
$$$H(X) $$$
$$$= - \sum\limits_{i=1}^{n} (p_i \log_2{p_i})$$$
$$$= \sum\limits_{i=1}^{n} (p_i \times (-\log_2{p_i}))$$$
$$$= \sum\limits_{i=1}^{n} (p_i \log_2{p_i^{-1}})$$$
$$$= \sum\limits_{i=1}^{n} (p_i \log_2{\dfrac{1}{p_i}})$$$
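A minimal sketch in Python of how the formula above is computed (the probability values are made-up examples, not from the video):

import math

def shannon_entropy(probs):
    # H(X) = sum over i of p_i * log2(1 / p_i)
    # terms with p_i == 0 contribute 0 by convention
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# familiar input (like the milk): one outcome is far more likely, low uncertainty
print(shannon_entropy([0.99, 0.01]))  # ~0.08 bits
# both outcomes equally likely: highest uncertainty for a binary input
print(shannon_entropy([0.5, 0.5]))    # 1.0 bits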
================================================================================
Expectation value:
the mean value of a random variable
$$$E[X] = \sum\limits_{i=1}^{k} x_i p_i$$$
Entropy formula:
$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i)$$$
- The 2 formulas have almost the same form
- Entropy is the expected amount of information you can get from the input, as written out below
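Making the parallel explicit: entropy is the expectation of the quantity $$$\log_2{\dfrac{1}{p_i}}$$$, which the next sections interpret as "amount of information":
$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i) = E\left[\log_2{\dfrac{1}{p(X)}}\right]$$$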
================================================================================
Uncertainty
= Amount_of_information
= information_of_milk * probability_of_milk
+ information_of_no_milk * probability_of_no_milk
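As a numerical sketch (the probabilities are made up for illustration), suppose the baby gets milk with probability 0.9 and no milk with probability 0.1:
$$$H = 0.9 \log_2{\dfrac{1}{0.9}} + 0.1 \log_2{\dfrac{1}{0.1}} \approx 0.9 \times 0.152 + 0.1 \times 3.322 \approx 0.47 \text{ bits}$$$
Low uncertainty about the input gives a small amount of information (much less than the 1 bit of a 50/50 split).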
================================================================================
Why does $$$\log_2{\dfrac{1}{p_i}}$$$ represent "amount of information"?
$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i)$$$
Replace: $$$\log_2{\dfrac{1}{p_i}} = I(x_i)$$$
$$$H(X) = \sum\limits_{i=1}^{n} (I(x_i) * p_i)$$$
High uncertainty: large amount of information
$$$p(x_1) > p(x_2)$$$ $$$\Rightarrow$$$ $$$I(x_1) < I(x_2)$$$
Meaning:
- the probability of x1 occurring is higher than that of x2
- the uncertainty (amount of information) of x1 is smaller than that of x2
In conclusion,
$$$I(x) = \log_2{\dfrac{1}{p(x)}}$$$ grows as $$$\dfrac{1}{p(x)}$$$ grows
the probability of x occurring is low ---> the amount of information I(x) becomes large
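A quick numerical check of this inverse relationship in Python (the probabilities are arbitrary illustrative values):

import math

def information_content(p):
    # I(x) = log2(1 / p(x)): the rarer the event, the more information it carries
    return math.log2(1.0 / p)

for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:<5}  I = {information_content(p):.2f} bits")
# p = 0.9  -> I = 0.15 bits (expected input, little information)
# p = 0.01 -> I = 6.64 bits (surprising input, a lot of information)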
================================================================================
2 independent pieces of information can be summed
3 kg: first piece of information
5 kg: second piece of information
(3+5) kg in total
That is, $$$I(x_1, x_2) = I(x_1) + I(x_2)$$$
For 2 independent events, the joint probability is a product: $$$p(x_1, x_2) = p(x_1) p(x_2)$$$
A log turns that product into a sum (see the derivation below)
That is why the log is used in entropy
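Written out, using the definition $$$I(x) = \log_2{\dfrac{1}{p(x)}}$$$ from above:
$$$I(x_1, x_2) = \log_2{\dfrac{1}{p(x_1) p(x_2)}} = \log_2{\dfrac{1}{p(x_1)}} + \log_2{\dfrac{1}{p(x_2)}} = I(x_1) + I(x_2)$$$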
================================================================================
Shannon Entropy
$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i) $$$
================================================================================
Why is the log base 2?
Because then you can think of "information" in "bit units"
If there are 4 equally likely outcomes, the entropy is $$$\log_2{4} = 2$$$ bits
$$$4 = 2^2$$$
2 bits can hold (encode) all 4 outcomes
If there are $$$2^n$$$ equally likely outcomes, the entropy is n bits, so you need n bits
$$$\log_2{2^n} = n$$$
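A small Python sketch, assuming uniform distributions (every outcome equally likely), showing that $$$2^n$$$ outcomes give exactly n bits of entropy:

import math

def shannon_entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

for n in [1, 2, 3, 4]:
    outcomes = 2 ** n
    uniform = [1.0 / outcomes] * outcomes  # every outcome equally likely
    print(f"{outcomes} outcomes -> H = {shannon_entropy(uniform):.1f} bits")
# 4 outcomes -> H = 2.0 bits: 2 bits are enough to label all 4 outcomes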