010-lec-002. initializing weights, RBM, xavier initialization, he initialization
# @
# If some layer has all-zero weights,
# then all layers in front of it (closer to the input) receive zero gradients during backpropagation
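# A minimal numpy sketch (my own toy example, not from the lecture) of that effect:
# the second layer's weights are all 0, so the gradient reaching the first layer is 0
import numpy as np

x = np.random.randn(1, 3)        # input
W1 = np.random.randn(3, 4)       # first-layer weights (random)
W2 = np.zeros((4, 2))            # second-layer weights, all zero

h = np.tanh(x @ W1)              # hidden activation
y = h @ W2                       # output
# take loss L = sum(y) and backpropagate by hand:
dy = np.ones_like(y)             # dL/dy
dh = dy @ W2.T                   # dL/dh -> all zeros because W2 == 0
dW1 = x.T @ (dh * (1 - h**2))    # dL/dW1 -> all zeros, so the first layer never updates
print(dW1)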
# @
# So you need to set the initial weight values wisely
# 1. Don't initialize all weights to 0, or the model can't be trained
# 1. Consider RBM (restricted Boltzmann machine) pretraining
# RBM (restricted Boltzmann machine)
# Suppose you have a 'forward' step and a 'backward' step
# forward:
$$$x_{1}w_{1}=value$$$
# backward:
$$$value\times w_{1}^{T}=\hat{x}_{1}$$$
# Then, you compare '$$$x_{1}$$$' with the reconstruction '$$$\hat{x}_{1}$$$'
# You update the weights $$$w_{1}$$$ so that the reconstruction '$$$\hat{x}_{1}$$$' has the smallest possible difference from '$$$x_{1}$$$'
# This forward/backward structure works like an 'encoder' and a 'decoder'
$$$x_{1}$$$ -> value : encoder
$$$x_{1}$$$ <- value : decoder
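# A simplified sketch of the encode/decode/compare loop (plain reconstruction with shared
# weights and gradient descent; a real RBM is trained with contrastive divergence, and the
# sizes, learning rate and step count below are made-up examples):
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random((1, 6))                        # input
w1 = rng.standard_normal((6, 3)) * 0.1         # shared forward/backward weights

for step in range(200):
    value = x1 @ w1                            # forward (encode)
    recon = value @ w1.T                       # backward (decode) with the same weights
    diff = recon - x1                          # compare the reconstruction with x1
    grad = x1.T @ (diff @ w1) + diff.T @ value # gradient of 0.5*||diff||^2 w.r.t. w1
    w1 -= 0.01 * grad                          # update so the reconstruction moves toward x1

print(np.abs(diff).mean())                     # reconstruction error after the updates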
# @
# Even if you have a deep and wide NN,
# you can apply the RBM idea layer by layer: first to the first 2 layers,
# encoding and decoding until their weights are good, then to the next pair of layers, and so on
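# A hedged sketch of this layer-by-layer pretraining; pretrain_reconstruction, the layer
# sizes and the fake data below are illustrative assumptions, not from the lecture:
import numpy as np

def pretrain_reconstruction(data, n_hidden, steps=500, lr=0.001):
    # train one weight matrix with the encode/decode/compare loop from above
    rng = np.random.default_rng(0)
    W = rng.standard_normal((data.shape[1], n_hidden)) * 0.1
    for _ in range(steps):
        h = data @ W                                          # encode
        diff = h @ W.T - data                                 # decode and compare
        W -= lr * (data.T @ (diff @ W) + diff.T @ h) / len(data)
    return W

data = np.random.default_rng(1).random((100, 20))   # fake unlabeled inputs
weights = []
for n_hidden in (10, 5):                  # handle one pair of layers at a time
    W = pretrain_reconstruction(data, n_hidden)
    weights.append(W)                     # use W as the initial value for this layer
    data = data @ W                       # its encoded output feeds the next pair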
# @
# But you don't need to use an RBM for weight initialization
# There are simpler methods you can use:
# 1. xavier initialization
# 1. he initialization
# You still give random values to the weights,
# but you scale them by the number of inputs (fan_in) and the number of outputs (fan_out)
# xavier initialization
import numpy as np
fan_in, fan_out = 784, 256  # example layer shape: 784 inputs, 256 outputs
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
# he initialization (better suited to ReLU activations)
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
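# A quick illustrative check (my own toy experiment, not from the lecture): push a batch
# through 10 tanh layers with plain randn weights vs. xavier-scaled weights
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 256))
h_plain, h_xavier = x, x
for _ in range(10):
    W_plain = rng.standard_normal((256, 256))                   # unscaled weights
    W_xavier = rng.standard_normal((256, 256)) / np.sqrt(256)   # divided by sqrt(fan_in)
    h_plain = np.tanh(h_plain @ W_plain)
    h_xavier = np.tanh(h_xavier @ W_xavier)
# fraction of saturated units: most of them with plain weights (gradients ~ 0),
# almost none with the xavier scaling
print((np.abs(h_plain) > 0.99).mean(), (np.abs(h_xavier) > 0.99).mean())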
# @
# activation functions and initialization (n/c: did not converge)
# initialization method | maxout | ReLU | VLReLU | tanh | sigmoid
# LSUV                  |   93   |  92  |   92   |  89  |  n/c
# OrthNorm              |   93   |  91  |   92   |  89  |  n/c
# OrthNorm-MSRA scaled  |    -   |  91  |   93   |   -  |  n/c
# xavier                |   91   |  90  |   92   |  89  |  n/c
# MSRA (he init)        |  n/c   |  90  |   92   |  89  |  n/c
# LSUV: layer-sequential unit-variance initialization
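# A rough numpy sketch of the LSUV idea for reference (orthonormal pre-initialization, as in
# OrthNorm, then rescale each layer so its output on a calibration batch has unit variance);
# the layer sizes, batch, tolerance and ReLU choice here are my own assumptions:
import numpy as np

def orthonormal(fan_in, fan_out, rng):
    # orthonormal columns via QR decomposition of a Gaussian matrix (assumes fan_in >= fan_out)
    q, _ = np.linalg.qr(rng.standard_normal((fan_in, fan_out)))
    return q

rng = np.random.default_rng(0)
h = rng.standard_normal((256, 100))          # calibration minibatch
weights = []
for fan_in, fan_out in ((100, 80), (80, 60), (60, 40)):
    W = orthonormal(fan_in, fan_out, rng)    # step 1: orthonormal initialization
    for _ in range(10):                      # step 2: rescale until the output variance is ~1
        out = h @ W
        if abs(out.var() - 1.0) < 0.01:
            break
        W /= np.sqrt(out.var())
    out = h @ W                              # output of the calibrated layer
    weights.append(W)
    h = np.maximum(out, 0.0)                 # ReLU output feeds the next layer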