https://www.youtube.com/watch?v=WsQLdu2JMgI
================================================================================
- Encoder-decoder architecture (also called seq2seq architecture)
================================================================================
- The encoder part creates the "context vector"
================================================================================
- The decoder part creates the "output"
================================================================================
The context vector is a fixed-size vector
So, if the sentence becomes too long, the short context vector can't hold all of its information
================================================================================
How to overcome the above problem?
- Keep all states from all encoder RNN cells
- At each step, the decoder dynamically creates a "context vector" using all of those states
================================================================================
Attention mechanism in seq2seq
- Use an FC layer and tanh to get a score for the state of each encoder RNN cell
- Apply softmax to the scores to get probability values, the "attention weights"
- Example: I: 90%, love: 0%, you: 10% are focused
- Create context_vector_1 (cv1) as the attention-weighted sum of the encoder states
- "cv1" and the "<start> token" are passed into the first RNN cell in the decoder
- After the first decoder RNN cell runs, it produces the output (Nan) and dh1 (decoder_hidden_1)
================================================================================
Second step
- Note that this part
- Note that these 3 things (encoder RNN cell outputs) are always used
- 90% attention weight on "you"
================================================================================
- Which hidden state should be attended to?
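One attention step (FC + tanh scores, softmax weights, weighted-sum context vector) can be sketched in NumPy as below. This is a minimal sketch, not the video's code: the function name additive_attention, the parameter names W_enc, W_dec and v, the sizes, and the random values are all assumptions made for illustration, and the additive scoring form is one common way to realise "FC, tanh".

```python
import numpy as np


def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def additive_attention(encoder_states, decoder_hidden, W_enc, W_dec, v):
    """One attention step (hypothetical helper, additive scoring).

    encoder_states: (T, H) states of all T encoder RNN cells
    decoder_hidden: (H,)   current decoder state used as the query
    W_enc, W_dec:   (H, A) weights of the FC layers (assumed shapes)
    v:              (A,)   vector that turns each tanh output into one score
    """
    # FC + tanh: one A-dimensional vector per encoder state.
    projected = np.tanh(encoder_states @ W_enc + decoder_hidden @ W_dec)
    scores = projected @ v              # (T,) one score per encoder state
    weights = softmax(scores)           # (T,) attention weights, sum to 1
    context = weights @ encoder_states  # (H,) weighted sum of encoder states
    return weights, context


# Toy setup: 3 encoder states ("I", "love", "you"), hidden size 4.
rng = np.random.default_rng(0)
T, H, A = 3, 4, 5
encoder_states = rng.normal(size=(T, H))
decoder_hidden = rng.normal(size=H)
W_enc = rng.normal(size=(H, A))
W_dec = rng.normal(size=(H, A))
v = rng.normal(size=A)

weights, context = additive_attention(encoder_states, decoder_hidden, W_enc, W_dec, v)
print("attention weights:", weights)
print("context vector:", context)
```

The same encoder_states are reused at every decoder step; only decoder_hidden changes (for example dh1 in the second step), so the weights and the context vector are recomputed each time.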
================================================================================
Teacher forcing
- A decoder cell can create a "wrong prediction"
- A "wrong prediction" fed into the next cell causes the "next wrong prediction"
================================================================================
Use the "teacher forcing" algorithm
- During training, pass the "true label" rather than the "prediction" to the next decoder cell
================================================================================
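The difference between feeding back predictions and feeding back true labels can be sketched with a toy decoder. Everything here is hypothetical: toy_step is a lookup table standing in for a decoder RNN cell, and the tokens "a", "b", "c", "x", "y", "z" are placeholders, not data from the video.

```python
def decode(step_fn, targets, start_token, teacher_forcing):
    """Run a decoder loop and return the prediction made at each step.

    step_fn:         maps the previous input token to a predicted token
    targets:         the true label sequence
    teacher_forcing: if True, the next input is the true label;
                     if False, the next input is the model's own prediction
    """
    predictions = []
    prev = start_token
    for target in targets:
        pred = step_fn(prev)
        predictions.append(pred)
        prev = target if teacher_forcing else pred
    return predictions


# Toy "decoder cell": it is wrong on the very first step ("<start>" -> "x"),
# but correct whenever it is given a correct previous token.
transitions = {"<start>": "x", "a": "b", "b": "c", "x": "y", "y": "z"}


def toy_step(prev_token):
    return transitions[prev_token]


targets = ["a", "b", "c"]
free_running = decode(toy_step, targets, "<start>", teacher_forcing=False)
forced = decode(toy_step, targets, "<start>", teacher_forcing=True)

print("without teacher forcing:", free_running)
print("with teacher forcing:   ", forced)
```

Without teacher forcing the first mistake is fed back in and every later step is also wrong; with teacher forcing only the first step is wrong, because the later cells receive the true labels as input.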