010-lec-002. initialize weight, RBM, xavier initialization, he initialization
# @
# If 'some layer' has 0 weights,
# all layers located in front of 'some layer' get 0 gradient
# @
# So, you need to set the initial weight values wisely
# 1. Don't give 0 to all weights, or the model can't be trained
# 1. Consider an RBM (restricted Boltzmann machine)
# RBM (restricted Boltzmann machine)
# Suppose you have a 'forward' pass and a 'backward' pass
# forward: $$$x_{1}w_{1}=value$$$
# backward: $$$value\times w_{1}=\hat{x}_{1}$$$
# Then, you compare '$$$x_{1}$$$' and the reconstruction '$$$\hat{x}_{1}$$$'
# You update the weights of the 'forward' model so that '$$$\hat{x}_{1}$$$' has the smallest difference from '$$$x_{1}$$$'
# This forward/backward pair is also known as an 'encoder' and a 'decoder'
$$$x_{1}$$$ -> value : encoder
$$$x_{1}$$$ <- value : decoder
# @
# Even if you have a deep and wide NN,
# you can first apply an RBM to the first 2 layers,
# encoding and decoding to learn good initial weights for them,
# and then repeat this pair by pair through the network
# (see the reconstruction sketch at the end of this note)
# @
# But you don't need to use an RBM for weight initialization
# There are simpler ways you can use
# 1. xavier initialization
# 1. he initialization
# You can give random values to the weights,
# but consider the number of inputs and the number of outputs
# xavier initialization
W=np.random.randn(fan_in,fan_out)/np.sqrt(fan_in)
# he initialization
W=np.random.randn(fan_in,fan_out)/np.sqrt(fan_in/2)
# (see the numpy sketch at the end of this note)
# @
# activation functions and initialization (accuracy, %)
# initialization method | maxout | ReLU | VLReLU | tanh | sigmoid
# LSUV                  |   93   |  92  |   92   |  89  |  n/c
# OrthNorm              |   93   |  91  |   92   |  89  |  n/c
# OrthNorm-MSRA scaled  |    -   |  91  |   93   |   -  |  n/c
# xavier                |   91   |  90  |   92   |  89  |  n/c
# MSRA                  |   n/c  |  90  |   92   |  89  |  n/c
# LSUV: layer-sequential unit-variance initialization
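# @
# Below is a minimal numpy sketch of the encode/decode reconstruction idea
# described above, written as a tied-weight linear autoencoder for illustration;
# it is not a full RBM with contrastive divergence, and the sizes and names
# (n_in, n_hidden, lr) are made-up assumptions
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 8, 4
W = 0.1 * rng.standard_normal((n_in, n_hidden))   # shared ('tied') weight
x = rng.standard_normal((32, n_in))               # a batch of inputs
lr = 0.01

for step in range(500):
    h = x @ W                 # forward  (encoder): x -> value
    x_hat = h @ W.T           # backward (decoder): value -> reconstruction
    diff = x_hat - x          # compare the input with its reconstruction
    # gradient of 0.5 * ||x_hat - x||^2 with respect to the tied weight W
    grad_W = diff.T @ h + x.T @ (diff @ W)
    W -= lr * grad_W / len(x)

# after training, W can serve as the initial weight of this layer
print("reconstruction error:", np.mean(((x @ W) @ W.T - x) ** 2))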
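# @
# Below is a hedged numpy sketch of the two initialization formulas above;
# it pushes a batch through a stack of hypothetical fully connected layers
# and prints the standard deviation of the activations, so you can check that
# the scale stays reasonable with depth; the depth, width, and batch size
# are made-up assumptions
import numpy as np

rng = np.random.default_rng(0)

def xavier(fan_in, fan_out):
    # W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)

def he(fan_in, fan_out):
    # W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in / 2)

def forward(x, init, act, depth=10, width=512):
    # stack of 'depth' fully connected layers, each followed by 'act'
    for _ in range(depth):
        x = act(x @ init(x.shape[1], width))
    return x

x = rng.standard_normal((256, 512))
tanh_out = forward(x, xavier, np.tanh)                    # xavier pairs with tanh
relu_out = forward(x, he, lambda z: np.maximum(z, 0.0))   # he pairs with ReLU
print("std after 10 tanh layers (xavier):", tanh_out.std())
print("std after 10 ReLU layers (he):    ", relu_out.std())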