13-02 LDA (Linear Discriminant Analysis)
@
Projection by a linear transform:
$$$y=W^{T}x$$$
W is (D, 1) (D by 1)
The column vectors of matrix W (the rows of $$$W^{T}$$$) are the directions onto which the data is projected
x is (D, 1)
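A minimal numpy sketch of this projection (the direction `w` and the point `x` below are made-up examples):

```python
import numpy as np

w = np.array([[1.0],        # W is (D, 1): one projection direction,
              [0.0]])       # here the x1 axis
x = np.array([[3.0],
              [4.0]])       # x is (D, 1): one data point

y = w.T @ x                 # y = W^T x : the 1-D projection of x
print(y)                    # [[3.]]
```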
@
2018-06-07 10-50-44.png
One of the parameters each class has is its mean
Red ellipse : distribution of the $$$\omega_{1}$$$ class data
Blue ellipse : distribution of the $$$\omega_{2}$$$ class data
The mean point of each class's distribution is its centroid ($$$\mu_{1}, \mu_{2}$$$)
Mean of class i, $$$\mu_{i}$$$ ($$$N_{i}$$$ : number of samples in class i) :
$$$\mu_{i} = \frac{1}{N_{i}} \sum\limits_{x\in \omega_{i}} x$$$
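As a quick sketch with made-up class data (one row per sample), the class mean is just the per-class sample average:

```python
import numpy as np

# made-up samples of class omega_1, one row per point
X1 = np.array([[1.0, 2.0],
               [2.0, 3.0],
               [3.0, 4.0]])

mu_1 = X1.mean(axis=0)      # mu_i = (1 / N_i) * sum over x in omega_i
print(mu_1)                 # [2. 3.]
```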
=========================================================
How do you find the mean after projection?
Mean of class i, $$$\mu_{i}$$$ :
$$$\mu_{i} = \frac{1}{N_{i}} \sum\limits_{x\in\omega_{i}}x$$$
$$$\widetilde{\mu}_{i}$$$ : mean of class i after projection
"After projection" means you have applied the linear transform $$$y=W^{T}x$$$
After that, you write the mean in terms of y instead of x
$$$\widetilde{\mu}_{i} = \frac{1}{N_{i}}\sum\limits_{y\in \omega_{i}}y$$$
But you can rewrite it in terms of x again
$$$\widetilde{\mu}_{i} = \frac{1}{N_{i}}\sum\limits_{x\in \omega_{i}}W^{T}x$$$
You can pull $$$W^{T}$$$ out in front of the sum
$$$\widetilde{\mu}_{i} = W^{T}\mu_{i}$$$
Meaning :
[projecting $$$\mu_{i}$$$ gives $$$\widetilde{\mu}_{i}$$$] == [project the $$$\omega_{i}$$$ class data, then take the mean of the projected data to get $$$\widetilde{\mu}_{i}$$$]
2018-06-07 11-02-06.png
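A quick numerical check of this identity, reusing made-up class data and an arbitrary direction:

```python
import numpy as np

X1 = np.array([[1.0, 2.0],
               [2.0, 3.0],
               [3.0, 4.0]])            # class omega_1, one row per point
w = np.array([1.0, 2.0])               # some projection direction

mu_1 = X1.mean(axis=0)                 # class mean before projection
proj_of_mean = mu_1 @ w                # W^T mu_1
mean_of_proj = (X1 @ w).mean()         # project every point, then average

print(proj_of_mean, mean_of_proj)      # both are 8.0
```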
=========================================================
Case where you project the data onto the $$$x_{2}$$$ axis
2018-06-07 11-03-06.png
=========================================================
Distribution after projection onto each axis, $$$x_{1}$$$ and $$$x_{2}$$$
2018-06-07 11-04-57.png
Case where you project the data onto $$$x_{1}$$$ :
Wide distance between the projected means $$$\widetilde{\mu}_{1}$$$ and $$$\widetilde{\mu}_{2}$$$
Red and blue are projected over wide ranges, and they overlap in some parts (circle on $$$x_{1}$$$)
This is the good case for distance between the red class and the blue class
Case where you project the data onto $$$x_{2}$$$ :
You get a small overlapped region,
which means you get a clearer separation of the 2 classes
This is the good case for class separation
But you should achieve both of them (wide distance between means and good separation after projection)
=========================================================
You can choose the "distance between the mean points (centroids) of the projected class data"
as the "objective function J(W)" to maximize :
$$$J(W) = |\widetilde{\mu}_{1} - \widetilde{\mu}_{2}|$$$
$$$J(W) = |W^{T}\mu_{1}-W^{T}\mu_{2}|$$$
$$$J(W) = |W^{T}(\mu_{1}-\mu_{2})|$$$
J(W) : objective function with respect to the transform matrix W
2018-06-07 11-14-47.png
This objective function J(W) finds the widest distance between the 2 projected means ($$$\widetilde{\mu}_{1}$$$ and $$$\widetilde{\mu}_{2}$$$)
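A sketch of this criterion for two candidate directions (the $$$x_{1}$$$ and $$$x_{2}$$$ axes), with made-up class means:

```python
import numpy as np

mu_1 = np.array([1.0, 4.0])            # made-up class means
mu_2 = np.array([5.0, 5.0])

def J(w):
    return np.abs(w @ (mu_1 - mu_2))   # J(W) = |W^T (mu_1 - mu_2)|

e1 = np.array([1.0, 0.0])              # project onto the x1 axis
e2 = np.array([0.0, 1.0])              # project onto the x2 axis
print(J(e1), J(e2))                    # 4.0 vs 1.0: x1 spreads the means wider
```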
But this objective function is insufficient in terms of class separation,
which is complemented by Fisher's LDA method below
=========================================================
This method was proposed by Fisher
The objective function J(W) to be maximized is :
$$$J(W) = \frac{|\widetilde{\mu}_{1}-\widetilde{\mu}_{2}|^{2}}{\widetilde{S}_{1}^{2}+\widetilde{S}_{2}^{2}}$$$
$$$\widetilde{\mu}_{1}$$$ : mean of the class 1 data after projection
$$$\widetilde{\mu}_{2}$$$ : mean of the class 2 data after projection
$$$|\widetilde{\mu}_{1}-\widetilde{\mu}_{2}|^{2}$$$ : squared difference of the 2 means after projection
$$$\widetilde{S}_{1}^{2}$$$ : scatter of the class 1 data after projection
$$$\widetilde{S}_{2}^{2}$$$ : scatter of the class 2 data after projection
$$$\widetilde{S}_{1}^{2}+\widetilde{S}_{2}^{2}$$$ : a measure of the within-class scatter (variance within the classes)
Fisher's LDA finds the linear function $$$W^{T}x$$$ which maximizes the objective function J(W) above
To maximize J(W),
$$$|\widetilde{\mu}_{1}-\widetilde{\mu}_{2}|^{2}$$$ should be as large as possible
$$$\widetilde{S}_{1}^{2}+\widetilde{S}_{2}^{2}$$$ should be as small as possible
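A sketch that evaluates Fisher's J(W) for two candidate directions on made-up 2-class data (the helper name `fisher_J` is mine):

```python
import numpy as np

def fisher_J(w, X1, X2):
    """Fisher's criterion for a 1-D projection direction w."""
    y1, y2 = X1 @ w, X2 @ w                   # project both classes
    m1, m2 = y1.mean(), y2.mean()             # projected means
    s1 = ((y1 - m1) ** 2).sum()               # projected scatter, class 1
    s2 = ((y2 - m2) ** 2).sum()               # projected scatter, class 2
    return (m1 - m2) ** 2 / (s1 + s2)

# made-up 2-class data
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X2 = rng.normal([3.0, 1.0], 0.5, size=(50, 2))

# the direction with the larger J(W) is the better one under this criterion
print(fisher_J(np.array([1.0, 0.0]), X1, X2))  # along x1
print(fisher_J(np.array([0.0, 1.0]), X1, X2))  # along x2
```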
=========================================================
Scatter (variance) of class i after projection
Since this is a variance-like quantity, which is conventionally written with a square,
the square is also used in the notation here
$$$\widetilde{S}_{i}^{2} = \sum\limits_{y\in\omega_{i}} (y-\widetilde{\mu}_{i})^{2}$$$
$$$\widetilde{S}_{i}^{2} = \sum\limits_{y\in\omega_{i}} (y-\widetilde{\mu}_{i})(y-\widetilde{\mu}_{i})^{T}$$$
=========================================================
Fisher's method finds the transform matrix W which projects data from the same class into a compact region,
and which, at the same time, makes the distance between the projected mean points ($$$\tilde{\mu}_{1}, \tilde{\mu}_{2}$$$) large
2018-06-07 13-08-09.png
Left axis :
short distance between the mean points of the projected classes
good separation
Bottom axis ($$$x_{1}$$$) :
long distance between the mean points of the projected classes
bad separation (overlapped)
=========================================================
How do you find the transform matrix W which maximizes the objective function J(W)?
That method has already been worked out by mathematicians
The objective function J(W) you have seen is this :
$$$J(W) = \frac{|\widetilde{\mu}_{1}-\widetilde{\mu}_{2}|^{2}}{\widetilde{S}_{1}^{2}+\widetilde{S}_{2}^{2}}$$$
To find the optimal transform matrix W, you need to express the objective function J(W) in terms of W
Definition :
In a multi-dimensional feature space, the scatter matrix S has the same form as the covariance matrix, only without the scale factor $$$\frac{1}{N_{i}-1}$$$
Scatter matrix of class i : $$$S_{i}^{2} = \sum\limits_{x\in\omega_{i}} (x-\mu_{i}) (x-\mu_{i})^{T}$$$
2018-06-07 13-19-48.png
Note
Covariance matrix $$$\Sigma_{i} = \frac{1}{N_{i}-1} \sum\limits_{x\in\omega_{i}} (x-\mu_{i}) (x-\mu_{i})^{T}$$$
Within-class scatter matrix $$$S_{within-class}$$$ :
$$$S_{within-class} = S_{1}^{2} + S_{2}^{2}$$$
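A sketch computing the class scatter matrices and the within-class scatter on made-up data, and checking that the scatter is just the covariance without the $$$\frac{1}{N_{i}-1}$$$ factor:

```python
import numpy as np

def scatter(X):
    """S_i = sum over the class of (x - mu_i)(x - mu_i)^T."""
    d = X - X.mean(axis=0)
    return d.T @ d

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # made-up class 1
X2 = rng.normal([3.0, 1.0], 0.5, size=(50, 2))   # made-up class 2

S_W = scatter(X1) + scatter(X2)                  # within-class scatter

# scatter equals (N_i - 1) times the sample covariance matrix
print(np.allclose(scatter(X1), (len(X1) - 1) * np.cov(X1.T)))  # True
```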
=========================================================
After projection, the within-class scatter matrix $$$\widetilde{S}_{within-class}$$$ satisfies the following relation
$$$\widetilde{S}_{1}^{2} + \widetilde{S}_{2}^{2} = W^{T}S_{within-class}W = \widetilde{S}_{within-class}$$$
$$$S_{within-class}$$$ : within-class scatter matrix (unscaled covariance) before projection
W : transform matrix
Derivation :
$$$\widetilde{S}_{i}^{2} = \sum\limits_{y\in\omega_{i}} (y-\widetilde{\mu}_{i}) (y-\widetilde{\mu}_{i})^{T}$$$
$$$\widetilde{S}_{i}^{2} = \sum\limits_{x\in\omega_{i}} (W^{T}x-W^{T}\mu_{i}) (W^{T}x-W^{T}\mu_{i})^{T}$$$
$$$\widetilde{S}_{i}^{2} = \sum\limits_{x\in\omega_{i}} W^{T} (x-\mu_{i}) (x-\mu_{i})^{T} W$$$
$$$\widetilde{S}_{i}^{2} = W^{T}S_{i}^{2}W$$$
Summing this over the two classes gives the relation above
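A numerical check of this identity, plus the standard two-class result this derivation leads to: the maximizing direction is proportional to $$$S_{within-class}^{-1}(\mu_{1}-\mu_{2})$$$ (this closed form is the known answer the note refers to; the sketch uses made-up data):

```python
import numpy as np

def scatter(X):
    d = X - X.mean(axis=0)
    return d.T @ d

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X2 = rng.normal([3.0, 1.0], 0.5, size=(50, 2))
w = np.array([[1.0], [2.0]])               # arbitrary direction, (D, 1)

# check: scatter of the projected class equals W^T S_i W
y1 = X1 @ w
lhs = ((y1 - y1.mean()) ** 2).sum()
rhs = (w.T @ scatter(X1) @ w).item()
print(np.isclose(lhs, rhs))                # True

# standard two-class maximizer: w* proportional to S_W^{-1} (mu_1 - mu_2)
S_W = scatter(X1) + scatter(X2)
w_star = np.linalg.solve(S_W, X1.mean(axis=0) - X2.mean(axis=0))
print(w_star)                              # the Fisher LDA direction
```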