These are notes I took while following this lecture:
http://www.kocw.net/home/search/kemView.do?kemId=1189957

- How_to_predict_likelihood_pdf_Parameter_estimation_Non_parameter_density_function_Maximum_likelihood_estimation

================================================================================
* So far, you've learned how to classify a feature vector "when you know the likelihood (as a PDF)"

================================================================================
* Then, how can you estimate the likelihood PDF itself?

* For example, so far you assumed that you already know the probability distributions of the heights of males and females.

* But you actually don't know those distributions.
It means you have to estimate the PDF from experiments and samples.

================================================================================
* Techniques to estimate the likelihood PDF:
  1. Parameter Estimation
    * You assume that the PDF has a specific shape (for example, a Gaussian shape)
    * The mean and variance are the parameters which define a Gaussian shape
    * Parameter Estimation finds the mean and variance of the Gaussian distribution by using Maximum Likelihood Estimation
    * Code (pseudocode): mean, variance = maximum_likelihood_estimation(sample_data)

  2. Non-parametric Density Estimation
    * You don't assume that the PDF has any specific shape
    * Non-parametric Density Estimation estimates the likelihood PDF directly from the data
    * Kernel Density Estimation, K Nearest Neighbor Estimation, etc
    * Code (pseudocode): likelihood_PDF = non_parametric_density_estimation(sample_data)

================================================================================
* Maximum Likelihood Estimation (MLE)
  * It finds the likelihood PDF by selecting the mean and variance which best explain the samples

================================================================================
* $$$X$$$: feature vector, like heights

* There should be a population which generates the feature vectors

* What you want to do is to find the parameters which define the shape of that population.

* $$$\theta_1$$$: parameters which define candidate population 1
* $$$\theta_2$$$: parameters which define candidate population 2

* $$$p(X|\theta_1)$$$: probability of $$$X$$$ occurring from population 1, which is defined by $$$\theta_1$$$
* $$$p(X|\theta_2)$$$: probability of $$$X$$$ occurring from population 2, which is defined by $$$\theta_2$$$

================================================================================
* Besides these, other candidate populations and their corresponding $$$\theta$$$ can exist

================================================================================
* Then, you get the probability of $$$X$$$ occurring when each $$$\theta$$$ (each population) is given

* $$$p(X|\theta)$$$: probability of $$$X$$$ occurring from the population which is defined by $$$\theta$$$

* $$$\hat{\theta} = \arg\max_{\theta} [p(X|\theta)]$$$

* $$$\hat{\theta}$$$: the parameters $$$\theta$$$ which best define the population, given the observed $$$X$$$

================================================================================
* How to find the best $$$\theta$$$ above
  * Maximize the likelihood function
  $$$\hat{\theta} = \arg\max_{\theta} p(X|\theta)$$$
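
================================================================================
* A minimal sketch of the idea above, not the lecture's own code. It assumes a 1-D Gaussian population and independent samples, so that $$$p(X|\theta)$$$ is the product of the per-sample densities. The height values, `theta_1`, `theta_2`, and the helper `gaussian_log_likelihood` are made up for illustration. The log of the likelihood is compared instead of the likelihood itself, because the log is monotonic and doesn't change which $$$\theta$$$ is the maximizer. Under these assumptions, the maximizing mean and variance are the sample mean and the sample variance (divided by N).

```python
import numpy as np


def gaussian_log_likelihood(samples, mean, variance):
    """log p(X | theta) for independent samples under a 1-D Gaussian, theta = (mean, variance)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    return (-0.5 * n * np.log(2.0 * np.pi * variance)
            - np.sum((samples - mean) ** 2) / (2.0 * variance))


def maximum_likelihood_estimation(samples):
    """Closed-form Gaussian MLE: sample mean and sample variance (divided by N, not N-1)."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    variance = np.mean((samples - mean) ** 2)
    return mean, variance


# Hypothetical heights in cm (illustrative values only)
sample_data = np.array([172.0, 168.5, 180.2, 175.1, 169.8, 177.4])

# Two hand-picked candidate populations, each given as (mean, variance)
theta_1 = (160.0, 25.0)
theta_2 = (175.0, 25.0)

# The estimate that maximizes the likelihood under the Gaussian assumption
theta_hat = maximum_likelihood_estimation(sample_data)

for name, theta in [("theta_1", theta_1), ("theta_2", theta_2), ("theta_hat", theta_hat)]:
    log_lik = gaussian_log_likelihood(sample_data, *theta)
    print(f"{name}: mean={theta[0]:.2f}, variance={theta[1]:.2f}, log-likelihood={log_lik:.3f}")
```

* Among the three printed candidates, `theta_hat` should have the largest log-likelihood, which is what $$$\hat{\theta} = \arg\max_{\theta} p(X|\theta)$$$ asks for.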