https://www.youtube.com/watch?v=CdH7U3IjRI8
================================================================================
Give the milk to the baby
The baby drinks that milk all the time
So, that milk (input) has "low uncertainty"
The amount of information (from the milk) that the baby gets is "small"
================================================================================
Give the spicy food to the baby
The baby has no experience with spicy food
So, that spicy food (input) has "high uncertainty"
The amount of information (from the spicy food) that the baby gets is "large"
================================================================================
More uncertainty ---> more information you can get
================================================================================
Let's represent "uncertainty" and "amount of information you can get" as a numerical value by using "Shannon entropy"

Shannon entropy
$$$H(X)$$$
$$$= - \sum\limits_{i=1}^{n} (p_i \log_2{p_i})$$$
$$$= \sum\limits_{i=1}^{n} (p_i (-\log_2{p_i}))$$$
$$$= \sum\limits_{i=1}^{n} (p_i \log_2{p_i^{-1}})$$$
$$$= \sum\limits_{i=1}^{n} (p_i \log_2{\dfrac{1}{p_i}})$$$
================================================================================
Expectation value: mean value of a random variable
$$$E[X] = \sum\limits_{i=1}^{k} x_i p_i$$$

Entropy formula:
$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i)$$$

- the 2 formulas have almost the same form
- entropy is the expectation of $$$\log_2{\dfrac{1}{p_i}}$$$, the amount of information you can get from the input (see the first Python sketch at the end of these notes)
================================================================================
Uncertainty
= amount of information
= information_of_milk * probability_of_milk + information_of_no_milk * probability_of_no_milk
================================================================================
Why does $$$\log_2{\dfrac{1}{p_i}}$$$ represent the "amount of information"?

$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i)$$$

Replace: $$$\log_2{\dfrac{1}{p_i}} = I(x_i)$$$

$$$H(X) = \sum\limits_{i=1}^{n} (I(x_i) * p_i)$$$

High uncertainty: large amount of information
$$$p(x_1) > p(x_2)$$$ $$$\Rightarrow$$$ $$$I(x_1) < I(x_2)$$$

Meaning:
- the probability of x1 occurring is higher than that of x2
- the uncertainty (amount of information) of x1 is smaller than that of x2

In conclusion, $$$I(x)$$$ should behave like $$$\dfrac{1}{p(x)}$$$:
the probability of x occurring is low ---> the amount of information I(x) becomes large
================================================================================
2 independent pieces of information can be summed
3kg: first piece of information
5kg: second piece of information
(3+5)kg: the two together

That is, $$$I(x_1, x_2) = I(x_1) + I(x_2)$$$

For independent events, the probabilities multiply: $$$p(x_1, x_2) = p(x_1) p(x_2)$$$
When 2 variables are multiplied, you can use log to separate them into a sum:
$$$I(x_1, x_2) = \log_2{\dfrac{1}{p(x_1) p(x_2)}} = \log_2{\dfrac{1}{p(x_1)}} + \log_2{\dfrac{1}{p(x_2)}} = I(x_1) + I(x_2)$$$
That is why you use log in entropy (see the second Python sketch at the end of these notes)
================================================================================
Shannon entropy
$$$H(X) = \sum\limits_{i=1}^{n} (\log_2{\dfrac{1}{p_i}} * p_i)$$$
================================================================================
Why is the log base 2?
Because you can think of "information" in "bit" units
If there are 4 equally likely outcomes, the entropy is $$$\log_2{4} = 2$$$, and 2 bits are enough to hold all of the above information ($$$4 = 2^2$$$)
If there are $$$2^n$$$ equally likely outcomes, the entropy is n, and you need n bits
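================================================================================
First Python sketch: a minimal numerical check of the entropy formula. The function name shannon_entropy and the probabilities for the milk / spicy-food example are my own, made up for illustration. It computes $$$H(X) = \sum (p_i \log_2{\dfrac{1}{p_i}})$$$ and shows that the near-certain "milk" case has much lower entropy than the 50/50 "spicy food" case.

import math

def shannon_entropy(probs):
    # H(X) = sum over outcomes of p * log2(1/p); outcomes with p = 0 contribute nothing
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical probabilities: the baby almost certainly drinks the milk,
# but its reaction to spicy food is a 50/50 guess.
milk  = [0.99, 0.01]   # [drinks it, refuses it]
spicy = [0.5, 0.5]     # [likes it, hates it]

print(shannon_entropy(milk))   # ~0.081 bits -> low uncertainty, little information gained
print(shannon_entropy(spicy))  # 1.0 bit     -> high uncertainty, more information gained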
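================================================================================
Second Python sketch: why the log makes independent information add up. For two independent events the joint probability multiplies, and log2 turns that product into a sum. The probabilities p1 and p2 are made up for illustration.

import math

def info(p):
    # I(x) = log2(1/p): rare events carry more information
    return math.log2(1.0 / p)

p1, p2 = 0.25, 0.5            # hypothetical probabilities of two independent events
joint = p1 * p2               # independence: p(x1, x2) = p(x1) * p(x2)

print(info(p1), info(p2))     # 2.0 bits and 1.0 bit
print(info(joint))            # 3.0 bits
print(info(p1) + info(p2))    # 3.0 bits -> I(x1, x2) = I(x1) + I(x2)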
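================================================================================
Third Python sketch: the "bit" interpretation of the base-2 log. With $$$2^n$$$ equally likely outcomes the entropy comes out to exactly n, which matches the n bits needed to label every outcome.

import math

def shannon_entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

for n in range(1, 5):
    outcomes = 2 ** n
    uniform = [1.0 / outcomes] * outcomes      # 2^n equally likely outcomes
    print(outcomes, shannon_entropy(uniform))  # prints 2 1.0, 4 2.0, 8 3.0, 16 4.0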