007-lec-001. learning rate, preprocess data, overfitting, regularization
@
The learning rate is the step size in the gradient descent algorithm
A learning rate that is too large can cause overshooting
A learning rate that is too small can make training take too long or leave the optimizer stuck in a local minimum
A reasonable starting value for the learning rate is 0.01
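A minimal sketch in plain Python (the 1-D quadratic loss and all names here are assumptions for illustration, not from the lecture) of how the learning rate scales each gradient descent step:
def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy loss L(w) = (w - 3)^2

learning_rate = 0.01  # common starting value
w = 0.0  # initial weight
for step in range(1000):
    w = w - learning_rate * grad(w)  # each update moves by learning_rate * gradient
print(w)  # approaches 3.0; a rate like 1.5 would overshoot and diverge on this loss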
@
Even with a good learning rate, overshooting can occur
when the loss surface forms an elongated ellipse (wide in one direction, narrow in the other)
In this case, we need to normalize the data
There are 2 common preprocessing steps (zero-centering the data and normalizing it)
Standardization (one normalization technique):
$$$x_{j}^{'} = \frac{x_{j}-\mu_{j}}{\sigma_{j}}$$$
$$$x_{j}^{'}$$$: standardized value of feature $$$j$$$
$$$x_{j}$$$: original value of feature $$$j$$$
$$$\mu_{j}$$$: mean of feature $$$j$$$
$$$\sigma_{j}$$$: standard deviation of feature $$$j$$$
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
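A self-contained sketch of the same standardization applied to every feature column at once with NumPy (the example matrix X is an assumption for illustration):
import numpy as np

X = np.array([[2000.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0]])  # rows = samples, columns = features
mu = X.mean(axis=0)  # per-feature mean
sigma = X.std(axis=0)  # per-feature standard deviation
X_std = (X - mu) / sigma  # x' = (x - mu) / sigma, broadcast over all columns
print(X_std.mean(axis=0), X_std.std(axis=0))  # roughly 0 mean and unit std per feature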
@
Solutions for overfitting:
1. More training data
1. Reduce the number of features
1. Regularization
$$$i$$$: index over the training set
$$$\lambda$$$: regularization strength
$$$L_{i}$$$: label of training example $$$i$$$
LossFunction = $$$\frac{1}{N} \sum\limits_{i} D(S(Wx_{i} + b), L_{i}) + \lambda \sum W^{2}$$$
Regularization term in tf:
l2regularization = 0.001 * tf.reduce_sum(tf.square(W))
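A sketch of how this term would typically be combined with the softmax cross-entropy data loss in TensorFlow (the toy shapes, X, Y, W, b, and the 0.001 strength are assumptions for illustration, not values from the lecture):
import tensorflow as tf

X = tf.random.normal([4, 3])  # assumed toy batch: 4 samples, 3 features
Y = tf.one_hot([0, 1, 0, 1], depth=2)  # assumed one-hot labels for 2 classes
W = tf.Variable(tf.random.normal([3, 2]))
b = tf.Variable(tf.zeros([2]))

logits = tf.matmul(X, W) + b  # S(Wx + b) before softmax
data_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
l2regularization = 0.001 * tf.reduce_sum(tf.square(W))  # lambda * sum(W^2), lambda = 0.001
loss = data_loss + l2regularization  # data term + regularization term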