
https://www.youtube.com/watch?v=WsQLdu2JMgI

================================================================================

- Encoder-decoder architecture (also called seq2seq architecture)

================================================================================

- The encoder part creates the "context vector"

================================================================================

- The decoder part creates the "output"

================================================================================
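As a rough illustration of the encoder part above, the sketch below runs a toy RNN over an input and keeps only the last hidden state as the context vector. The names rnn_step and encode, the scalar inputs, the random weights, and hidden_size=4 are all assumptions made for illustration, not something taken from the video.

```python
import math
import random


def rnn_step(x, h, w_x, w_h):
    """One simplified RNN cell: h_new[i] = tanh(w_x[i] * x + sum_j w_h[i][j] * h[j])."""
    return [
        math.tanh(w_x[i] * x + sum(w_h[i][j] * h[j] for j in range(len(h))))
        for i in range(len(h))
    ]


def encode(inputs, hidden_size=4, seed=0):
    """Run the toy encoder over all inputs and return only the last hidden state."""
    rng = random.Random(seed)
    # Hypothetical random weights; a real encoder would learn these.
    w_x = [rng.uniform(-1, 1) for _ in range(hidden_size)]
    w_h = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h)
    return h  # the fixed-size "context vector"


short_context = encode([0.1, 0.5, 0.9])              # 3 input steps
long_context = encode([0.1 * i for i in range(50)])  # 50 input steps
print(len(short_context), len(long_context))
```

Both calls return a vector with hidden_size entries, whether the input has 3 steps or 50.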

The context vector has a fixed size, so when the input sentence becomes too long, that one short vector cannot hold all of its information.

================================================================================

How can this problem be overcome?

- Keep the hidden states from all encoder RNN cells, not only the last one

- The decoder dynamically creates a "context vector" at each step using all of those states

================================================================================

Attention mechanism in seq2seq

- Use a fully connected (FC) layer and tanh to score the state of each encoder RNN cell

- Apply softmax to the scores to get probability values, which serve as the "attention weights"

- Example focus: I: 90%, love: 0%, you: 10%

- Create context_vector_1 (cv1) as the attention-weighted sum of the encoder states

- "cv1" and the "<start>" token are passed into the first RNN cell in the decoder

- The first decoder RNN cell produces an output (Nan) and dh1 (decoder_hidden_1)
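The first attention step described above can be sketched roughly as follows: FC plus tanh for the scores, softmax for the attention weights, and a weighted sum for cv1. This is a minimal sketch, assuming the common additive form in which the decoder state is also fed into the FC layer. The function names, the toy encoder states, and the weights w_enc, w_dec and v are hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_score(enc_state, dec_state, w_enc, w_dec, v):
    """FC layers on both states, then tanh, then a projection to one scalar."""
    pre = [a + b for a, b in zip(matvec(w_enc, enc_state), matvec(w_dec, dec_state))]
    return sum(vi * math.tanh(p) for vi, p in zip(v, pre))


def attend(encoder_states, dec_state, w_enc, w_dec, v):
    """Return (attention weights, context vector) for one decoder step."""
    scores = [additive_score(h, dec_state, w_enc, w_dec, v) for h in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(size)
    ]
    return weights, context


# Hypothetical toy values: one encoder state per word of "I love you".
encoder_states = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
w_enc = [[2.0, 0.0], [0.0, 0.5]]
w_dec = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
initial_decoder_state = [0.0, 0.0]  # assumed all-zero before the first step

weights, cv1 = attend(encoder_states, initial_decoder_state, w_enc, w_dec, v)
print(weights, cv1)
```

With these toy values the largest weight falls on the first state, but the exact percentages depend entirely on the made-up weights.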

================================================================================

Second step

- Note that this part

- Note that these 3 things (encoder RNN cell outputs) are always used

- 90% attention weight on "you"
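The point that the same encoder outputs are reused while the focus shifts can be illustrated with a small sketch. It assumes a plain dot-product score for brevity, which is a swapped-in simplification rather than the FC plus tanh scoring, and the two decoder hidden states are hand-picked hypothetical values.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(decoder_hidden, encoder_outputs):
    """Dot-product score between the decoder state and every encoder output."""
    scores = [
        sum(d * e for d, e in zip(decoder_hidden, output))
        for output in encoder_outputs
    ]
    return softmax(scores)


# The same 3 encoder outputs ("I", "love", "you") are used at every step.
encoder_outputs = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

# Hypothetical decoder hidden states for the first and second steps.
step1_weights = attention_weights([3.0, 0.0], encoder_outputs)
step2_weights = attention_weights([0.0, 3.0], encoder_outputs)

print(step1_weights)  # most weight on "I"
print(step2_weights)  # most weight on "you"
```

Only the decoder hidden state changes between the two calls. The encoder outputs stay fixed, yet the attention weights move from "I" to "you".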

================================================================================

- Which hidden state should the attention focus on?

================================================================================

Teacher forcing

- A decoder cell may create a "wrong prediction"

- That "wrong prediction" is fed into the next cell and causes the "next wrong prediction"

================================================================================

Use the "teacher forcing" algorithm

- Pass the "true label" rather than the "prediction" as the next decoder input
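The difference between feeding back the prediction and feeding the true label can be sketched with a toy decoder. The lookup table below is a hypothetical, deliberately imperfect stand-in for a real decoder cell, and decode and toy_step are illustrative names, not part of any library.

```python
def decode(step_fn, target, teacher_forcing):
    """Run a decoder step function for len(target) steps.

    With teacher forcing, the next input is the true previous token.
    Without it, the next input is the decoder's own previous prediction.
    """
    previous = "<start>"
    predictions = []
    for true_token in target:
        predicted = step_fn(previous)
        predictions.append(predicted)
        previous = true_token if teacher_forcing else predicted
    return predictions


# Hypothetical decoder: correct everywhere except on "<start>",
# where it wrongly predicts "X" instead of "A".
table = {"<start>": "X", "A": "B", "B": "C", "X": "Y", "Y": "Z"}


def toy_step(previous_token):
    return table.get(previous_token, "<unk>")


target = ["A", "B", "C"]
free_running = decode(toy_step, target, teacher_forcing=False)
teacher_forced = decode(toy_step, target, teacher_forcing=True)

print(free_running)    # ['X', 'Y', 'Z']: the first mistake spreads to every later step
print(teacher_forced)  # ['X', 'B', 'C']: later steps recover because they see true labels
```

Teacher forcing applies during training, where the true labels are available. At inference time the decoder has to consume its own predictions.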

================================================================================