https://www.youtube.com/watch?v=WsQLdu2JMgI
================================================================================
- Encoder-decoder architecture (or seq2seq architecture)
================================================================================
- Encoder part creates "context vector"
================================================================================
- Decoder part creates "output"
================================================================================
The context vector is a fixed-size vector
So, if the input sentence becomes too long,
the short context vector can't hold all of its information
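
A minimal sketch of the bottleneck, assuming a plain tanh RNN encoder whose last hidden state is used as the context vector. The sizes, the random weights, and the function name `encode` are made up for illustration; the point is only that the context vector has the same size for a 3-token and a 50-token input.

```python
import numpy as np

def encode(inputs, W_xh, W_hh):
    """Run a plain tanh RNN over `inputs` and return every hidden state."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        states.append(h)
    return np.stack(states)  # shape: (sequence_length, hidden_dim)

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 3  # arbitrary sizes chosen for the example
W_xh = rng.normal(size=(hidden_dim, embed_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))

short_sentence = rng.normal(size=(3, embed_dim))   # 3 token embeddings
long_sentence = rng.normal(size=(50, embed_dim))   # 50 token embeddings

# Without attention, only the last hidden state is handed to the decoder.
context_short = encode(short_sentence, W_xh, W_hh)[-1]
context_long = encode(long_sentence, W_xh, W_hh)[-1]
print(context_short.shape, context_long.shape)  # (3,) (3,)
```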
================================================================================
How to overcome the above problem?
- Keep the states from all encoder RNN cells
- At each step, the decoder dynamically creates a "context vector"
using all of those states
================================================================================
Attention mechanism in seq2seq
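- Use FC, tanh to get a score for the state of each encoder RNN cell
- Apply softmax to the scores and use the probability values as "attention weights"
- e.g. I: 90%, love: 0%, you: 10% (how much each word is focused on)
- Create context_vector_1 (cv1) as the attention-weighted sum of the states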
- "cv1" and the start token are passed into the first RNN cell in decoder
- After running the first decoder RNN cell,
output (Nan) is created
and dh1 (decoder_hidden_1) is created
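
The first attention step can be sketched roughly as below. This assumes the additive form score = v . tanh(W_enc h_enc + W_dec h_dec), which is one common way to read "FC, tanh"; the video may use a different layout. All weights are random, so the printed attention weights will not match the 90% / 0% / 10% example. Using the last encoder state as the initial decoder hidden state is also an assumption.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def attention_context(encoder_states, decoder_hidden, W_enc, W_dec, v):
    """Score every encoder state, normalize, and build the context vector."""
    # FC + tanh: one score per encoder state, shape (T,)
    scores = np.tanh(encoder_states @ W_enc.T + decoder_hidden @ W_dec.T) @ v
    weights = softmax(scores)           # attention weights, sum to 1
    context = weights @ encoder_states  # weighted sum of states, shape (H,)
    return weights, context

rng = np.random.default_rng(0)
T, H, A = 3, 4, 5  # 3 source tokens, hidden size 4, score-layer size 5 (arbitrary)
encoder_states = rng.normal(size=(T, H))  # states for "I", "love", "you"
W_enc = rng.normal(size=(A, H))           # hypothetical FC weights
W_dec = rng.normal(size=(A, H))
v = rng.normal(size=A)

decoder_hidden_0 = encoder_states[-1]     # assumed initial decoder state
weights, cv1 = attention_context(encoder_states, decoder_hidden_0, W_enc, W_dec, v)
for word, w in zip(["I", "love", "you"], weights):
    print(f"{word}: {w:.0%}")
```

cv1 would then be passed, together with the start token's embedding, into the first decoder RNN cell.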
================================================================================
Second step
- Note that this part
- Note that these 3 things (encoder RNN cell outputs) are always used
- 90% attention weight on "you"
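
A rough sketch of how the same encoder states are reused at every decoder step while the attention weights are recomputed from the newest decoder hidden state. The score layer, the decoder cell, and all weights are hypothetical, and the decoder cell here ignores the input token embedding to keep the example short.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(1)
T, H = 3, 4  # 3 source tokens, hidden size 4 (arbitrary)
encoder_states = rng.normal(size=(T, H))  # computed once, reused at every step

W_score = rng.normal(size=(H, 2 * H))  # FC over [encoder state; decoder hidden]
v = rng.normal(size=H)
W_cell = rng.normal(size=(H, 2 * H))   # simplified decoder cell (no token input)

decoder_hidden = encoder_states[-1]    # assumed initial decoder state
all_weights = []
for step in range(2):
    # Pair every encoder state with the current decoder hidden state: (T, 2H)
    pairs = np.concatenate(
        [encoder_states, np.tile(decoder_hidden, (T, 1))], axis=1
    )
    scores = np.tanh(pairs @ W_score.T) @ v  # one score per encoder state
    weights = softmax(scores)
    context = weights @ encoder_states       # new context vector for this step
    # The next hidden state depends on this step's context vector
    decoder_hidden = np.tanh(W_cell @ np.concatenate([context, decoder_hidden]))
    all_weights.append(weights)
    print(f"step {step + 1} attention weights:", np.round(weights, 2))
```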
================================================================================
- Which hidden state should receive attention?
================================================================================
Teacher forcing
- A decoder cell may create a "wrong prediction"
- Because that prediction is fed in as the next input, a "wrong prediction" causes the "next wrong prediction"
================================================================================
Use the "teacher forcing" algorithm
- During training, pass the "true label" rather than the "prediction" to the next decoder cell
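
A toy sketch of the difference, assuming a decoder that can be reduced to a function from the previous token to the next predicted token. The lookup-table "model", the token names, and the `"<start>"` / `"<end>"` markers are all hypothetical; the table is deliberately wrong on the first step so the cascade is visible.

```python
def decode(step_fn, start_token, targets, teacher_forcing):
    """Run one prediction per target token and return the predictions."""
    predictions = []
    prev = start_token
    for target in targets:
        pred = step_fn(prev)
        predictions.append(pred)
        # Teacher forcing feeds the true label; otherwise feed the prediction.
        prev = target if teacher_forcing else pred
    return predictions

# Toy "model": the correct continuation of "<start>" would be "A", not "B".
toy_model = {"<start>": "B", "A": "B", "B": "C", "C": "<end>"}
targets = ["A", "B", "C"]

free_running = decode(toy_model.get, "<start>", targets, teacher_forcing=False)
forced = decode(toy_model.get, "<start>", targets, teacher_forcing=True)

print(free_running)  # ['B', 'C', '<end>']  every step is wrong
print(forced)        # ['B', 'B', 'C']      only the first step is wrong
```

Without teacher forcing the first mistake shifts every later input, so all three predictions miss their targets; with it, the later steps still see the correct previous token.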
================================================================================