================================================================================
In previous lectures, you learned "how to estimate probability density function of population
using parameters of samples from population"
================================================================================
Now, you will not suppose "probability density function" as Gaussian distribution shape
You will suppose arbitrary shape.
Then, you can't use parameters to estimate PDF
================================================================================
Likelihood: density of each class, $$$p(x|\omega_i)$$$
Previous chapters:
1. You supposed you know likelihood which is PDF of specific class (you used MLE)
2. You supposed you know parametric shape of likelihood (parametric estimation)
================================================================================
* You won't suppose the shape of PDF
* You will estimate PDF via sample
================================================================================
* Non parametric PDF estimation
- Histogram
- Kernel density estimation
- k-NNR
================================================================================
Sample data
Population data's PDF which is composed of multiples classes
PDF: $$$P(x_1,x_2|\omega_i)$$$
================================================================================
* Histogram
* It decribes density of data
* You separate data by fixed intervals
* You count frequency of data in each interval
* High frequency -> Tall vertical bar
================================================================================
* To get probability value, you can use following
$$$P_{H}(x) = \frac{1}{N} \frac{\text{height of bin}}{\text{width of bin}}$$$
* Advantage:
- Easy to write
* Disadvantage:
- Final shape from density estimation depends on "starting position of the bins"
- Density can look "not continous"
But it's not actuallly "discontinuity", it just look "discontinuity" due to interval of bin
- Curse of dimention: more dimension creates more number of bins
Then, most of bins become empty, resulting "discontinuity-like looking"
- So, Histogram is not useful for practical analysis
but it's useful for rapid visualization
================================================================================
* Example
12 data:
2.1, 2.4, 2.3, 2.4, 2.47, 2.7, 2.6, 2.65, 3.3, 3.39, 3.8, 3.87
* Find Histogram
- width of big: 0.5
- Interval: 1.0 - 1.5, 2.0 - 2.5, ...
* Code
Y=[2.1, 2.4, 2.3, 2.4, 2.47, 2.7, 2.6, 2.65, 3.3, 3.39, 3.8, 3.87]
X=[0.25:0.5:5]
N=HIST(Y,X)
Y=[2.1, 2.4, 2.3, 2.4, 2.47, 2.7, 2.6, 2.65, 3.3, 3.39, 3.8, 3.87]
X=[0.5:0.5:5]
N=HIST(Y,X)
================================================================================