% https://www.youtube.com/watch?v=O75lE0HLu-A&index=24&t=0s&list=PLbhbGI_ppZIRPeAjprW9u9A46IJlGFdLn
% 002_Dirhichlet_Process_003.py
% ===
%
To understand the Dirichlet Process,
you first learned the Gaussian Mixture Model, a general clustering model.
You learned that the prior parameter is important,
and that you can perform parameter sampling using the Dirichlet distribution.
But a varying number of clusters is still not possible.
% ===
Let's define Dirichlet Process
$$$G|\alpha,H \sim DP(\alpha,H)$$$
G is being created
Last time, you created G with only $$$\alpha$$$,
but this time you will create G with both $$$\alpha$$$ and H.
H: base distribution, the component newly introduced in the DP
$$$\alpha$$$: concentration parameter, also called strength parameter (strength of the prior);
it plays the role of the Dirichlet distribution's parameter
% ===
$$$G|\alpha,H$$$ follows $$$DP(\alpha,H)$$$.
Let's write the above sentence in math form.
G is generally not a single number but vector-like,
because G will ultimately be used to create the parameters of a multinomial distribution.
Sampled G: $$$G(A_1),...,G(A_r)$$$,
with $$$\alpha$$$ and $$$H$$$ given as conditions.
When you sample from a DP, you need a distribution function,
and you use the Dirichlet distribution as that function:
$$$(G(A_1),...,G(A_r))|\alpha,H \sim Dir(\alpha H(A_1),...,\alpha H(A_r))$$$
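The finite-dimensional definition above can be sketched directly in NumPy. This is a minimal sketch, assuming an illustrative 3-region partition with base-measure values $$$H(A_k)$$$ that I made up for the example; any probabilities summing to 1 would do.

```python
import numpy as np

# Sketch: sample the finite-dimensional marginal (G(A_1),...,G(A_r)) of
# G ~ DP(alpha, H). The values H(A_k) below are illustrative (assumed),
# standing in for the base measure of each region of the partition.
rng = np.random.default_rng(0)

alpha = 5.0                          # concentration (strength) parameter
H_A = np.array([0.2, 0.5, 0.3])      # H(A_1), H(A_2), H(A_3): assumed base measures

# (G(A_1),...,G(A_r)) | alpha, H ~ Dir(alpha*H(A_1),...,alpha*H(A_r))
G_A = rng.dirichlet(alpha * H_A)

print(G_A)          # one random probability vector over the partition
print(G_A.sum())    # sums to 1
```

Each draw of `G_A` is one realization of the random measure G evaluated on the partition.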
% ===
%
$$$A_i \cap A_j = \emptyset$$$ for $$$i \neq j$$$
$$$A_1 \cup ... \cup A_r = \Omega$$$
That is, $$$A_1,...,A_r$$$ is a partition of $$$\Omega$$$,
and you insert the base distribution's probability of each region, $$$H(A_k)$$$,
into the Dirichlet distribution's parameters.
Then, since both the concentration $$$\alpha$$$ and
the base distribution H
can vary, the DP can vary as well.
% ===
When you sample, what's the expectation?
Since you sample from a Dirichlet distribution, the expectation will be the center (the green circle), and that center is the base distribution:
$$$E[G(A)]=H(A)$$$
Variance:
$$$V[G(A)] = \dfrac{H(A)(1-H(A))}{\alpha+1}$$$
H is the distribution used as the base to build the Dirichlet distribution's parameters,
which is what enables the Dirichlet Process.
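Both moment formulas can be checked by Monte Carlo. A minimal sketch, reusing the same illustrative partition values as above (assumed, not from the lecture):

```python
import numpy as np

# Sketch: verify E[G(A)] = H(A) and V[G(A)] = H(A)(1-H(A))/(alpha+1)
# empirically, for an assumed 3-region partition.
rng = np.random.default_rng(0)

alpha = 5.0
H_A = np.array([0.2, 0.5, 0.3])      # assumed H(A_k) values

# many draws of (G(A_1),...,G(A_r)) ~ Dir(alpha*H(A_1),...,alpha*H(A_r))
samples = rng.dirichlet(alpha * H_A, size=200_000)

emp_mean = samples.mean(axis=0)               # should approach H(A_k)
emp_var = samples.var(axis=0)                 # should approach H(A_k)(1-H(A_k))/(alpha+1)
theory_var = H_A * (1 - H_A) / (alpha + 1)

print(emp_mean)     # close to [0.2, 0.5, 0.3]
print(emp_var)      # close to theory_var
```

Note how a larger $$$\alpha$$$ shrinks the variance: samples of G concentrate more tightly around the base distribution H.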
% ===
The Dirichlet Process is managed through its prior.
Setting the number of clusters is done by manipulating the prior.
You apply the likelihood to the prior and update the posterior:
$$$Posterior \propto Likelihood \times Prior$$$
The likelihood is about how many points are created in which region $$$(A_1, ..., A_r)$$$;
the prior is about how the space is divided into regions.
% ===
Let's expand more
$$$(G(A_1),...,G(A_r))|\theta_1,...,\theta_n, \alpha,H
\sim Dir(\alpha H(A_1)+n_1,...,\alpha H(A_r)+n_r)$$$
The conditions $$$\alpha$$$ and H still exist.
$$$\theta_1, ..., \theta_n$$$: n observed draws,
each draw falling into one of the regions.
$$$n_k = |\{ \theta_i \mid \theta_i \in A_k, 1 \le i \le n \}|$$$
$$$n_k$$$: the number of draws that landed in region $$$A_k$$$
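The posterior update above is just a count update on the Dirichlet parameters. A minimal sketch, with assumed region counts $$$n_k$$$ and the same illustrative $$$H(A_k)$$$ values as before:

```python
import numpy as np

# Sketch: posterior of the DP marginal after observing n draws theta_1..theta_n.
# n_k counts how many theta_i fell into region A_k (values assumed for
# illustration); the parameters update to alpha*H(A_k) + n_k.
rng = np.random.default_rng(0)

alpha = 5.0
H_A = np.array([0.2, 0.5, 0.3])    # assumed H(A_1), H(A_2), H(A_3)
n_k = np.array([12, 3, 0])         # assumed counts of theta_i per region

# (G(A_1),...,G(A_r)) | theta, alpha, H ~ Dir(alpha*H(A_1)+n_1,...,alpha*H(A_r)+n_r)
posterior_G = rng.dirichlet(alpha * H_A + n_k)

# posterior mean of each region: (alpha*H(A_k) + n_k) / (alpha + n)
posterior_mean = (alpha * H_A + n_k) / (alpha + n_k.sum())

print(posterior_G)      # regions with more observations get more mass on average
print(posterior_mean)   # [0.65, 0.275, 0.075] for these assumed numbers
```

With these counts, region $$$A_1$$$ dominates the posterior even though its prior base measure was only 0.2: the likelihood (the observed counts) has pulled the posterior away from the prior.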
% ===
%
Definition of Dirichlet Process
$$$(G(A_1),...,G(A_r))|\alpha,H \sim DP(\alpha,H)$$$
$$$\sim Dir(\alpha H(A_1),...,\alpha H(A_r))$$$
Posterior form of DP
$$$G|\theta_1,...,\theta_n,\alpha,H \sim
DP(\alpha+n,\dfrac{\alpha}{\alpha+n}H+\dfrac{n}{\alpha+n}
\dfrac{\sum_{i=1}^n \delta_{\theta_i}}{n})$$$
$$$\theta_1,...,\theta_n$$$: information from data
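The posterior base distribution is a mixture: with weight $$$\frac{\alpha}{\alpha+n}$$$ you draw fresh from H, and with weight $$$\frac{n}{\alpha+n}$$$ you reuse one of the observed $$$\theta_i$$$. A minimal sketch of drawing from that mixture, assuming H is a standard normal and a small made-up set of observed $$$\theta_i$$$:

```python
import numpy as np

# Sketch: sample from the posterior base distribution
#   (alpha/(alpha+n)) * H + (n/(alpha+n)) * (1/n) * sum_i delta_{theta_i}.
# H is assumed to be N(0, 1); the theta values are assumed observed data.
rng = np.random.default_rng(0)

alpha = 5.0
theta = np.array([1.4, 1.5, -0.3, 1.6])    # assumed observed theta_1..theta_n
n = len(theta)

def sample_posterior_base():
    # with probability alpha/(alpha+n), draw fresh from H;
    # otherwise reuse one of the observed theta_i uniformly
    if rng.random() < alpha / (alpha + n):
        return rng.normal(0.0, 1.0)
    return rng.choice(theta)

draws = np.array([sample_posterior_base() for _ in range(10_000)])
reused = np.isin(draws, theta).mean()
print(reused)    # fraction of draws reusing an existing theta, ~ n/(alpha+n) = 4/9
```

This "reuse with probability proportional to n" behavior is exactly what the Polya Urn and Chinese Restaurant Process schemes in the next section exploit.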
% ===
The definition of the DP is done;
then how do we realize (how do we draw samples from) the definition of the DP?
Such a realization method is called a "generation scheme" or "construction".
For the DP, there are 3 major generation schemes (but 2 of them are very similar):
Stick Breaking Scheme
Polya Urn Scheme
Chinese Restaurant Process Scheme
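As a preview of the first scheme, here is a minimal stick-breaking sketch: break off a Beta(1, $$$\alpha$$$) fraction of the remaining stick for each atom, and place each atom at a location drawn from H (assumed here to be a standard normal, and truncated at K sticks since the exact DP has infinitely many):

```python
import numpy as np

# Sketch of the Stick Breaking construction of G ~ DP(alpha, H):
#   beta_k ~ Beta(1, alpha),  pi_k = beta_k * prod_{j<k} (1 - beta_j),
#   theta_k ~ H,              G = sum_k pi_k * delta_{theta_k}.
# H is assumed to be N(0, 1); K is a truncation level for illustration.
rng = np.random.default_rng(0)

alpha = 5.0
K = 1000                                   # truncation (exact DP: K -> infinity)

betas = rng.beta(1.0, alpha, size=K)       # fraction broken off at each step
remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
pi = betas * remaining                     # stick lengths = atom weights
atoms = rng.normal(0.0, 1.0, size=K)       # atom locations drawn from H

print(pi.sum())    # close to 1 for large K; leftover stick mass is negligible
```

Smaller $$$\alpha$$$ breaks off big pieces early (a few dominant clusters); larger $$$\alpha$$$ spreads the mass over many small atoms, which is exactly the "strength of prior" intuition from the definition.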