https://datascienceschool.net/view-notebook/6927b0906f884a67b0da9310d3a581ee/
================================================================================
* Bag of Words
* Word dictionary: "I", "am", "a", "boy", "girl"
* Word: index
"I":0
"am":1
"a":2
"boy":3
"girl":4
* Sentence to vector
"I am a girl" = [1 1 1 0 1]
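
A minimal sketch of the sentence-to-vector step above. The helper name `bag_of_words` is hypothetical, and two things are assumptions not stated in the notes: the vector holds word counts, and words outside the dictionary are ignored.

```python
# Word dictionary from the notes: word -> index
vocab = {"I": 0, "am": 1, "a": 2, "boy": 3, "girl": 4}

def bag_of_words(sentence, vocab):
    """Return a count vector of length len(vocab) for a whitespace-split sentence."""
    vec = [0] * len(vocab)
    for word in sentence.split():
        if word in vocab:  # assumption: out-of-dictionary words are skipped
            vec[vocab[word]] += 1
    return vec

bow = bag_of_words("I am a girl", vocab)
print(bow)  # [1, 1, 1, 0, 1]
```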
================================================================================
* Word embedding: represent each word in the text as a vector of float values
* The following is a 2D embedding example
* Word: 2D vector
"I":[0.3 0.2]
"am":[0.1 0.8]
"a":[0.5 0.6]
"boy":[0.2 0.9]
"girl":[0.4 0.7]
================================================================================
* Operation on word vectors to express one entire text
- Concatenation
- Averaging
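
A small sketch of the two operations, using the 2D example vectors from the previous section. The function names `concatenate` and `average` are illustrative only.

```python
# 2D example embeddings from the notes
embedding = {
    "I": [0.3, 0.2],
    "am": [0.1, 0.8],
    "a": [0.5, 0.6],
    "boy": [0.2, 0.9],
    "girl": [0.4, 0.7],
}

def concatenate(words, embedding):
    """Join the word vectors end to end: length = len(words) * dim."""
    out = []
    for w in words:
        out.extend(embedding[w])
    return out

def average(words, embedding):
    """Element-wise mean of the word vectors: length = dim, whatever the text length."""
    vectors = [embedding[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

words = "I am a girl".split()
concat_vec = concatenate(words, embedding)  # 8 values for 4 words
avg_vec = average(words, embedding)         # 2 values, roughly [0.325, 0.575]
```

Concatenation keeps word order but its length grows with the text; averaging gives a fixed-size vector but discards word order.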
================================================================================
* Create word embedding
* Suppose the dictionary contains V words
* Each word is expressed as a V-dimensional vector
* Encode the input and output words in the BOW style
* The word vector x is one-hot encoded: 1 at the word's index, 0 elsewhere
* $$$h=\sigma(Wx)$$$
- $$$h$$$: hidden vector
- $$$W$$$: trainable parameter
- $$$x$$$: input word vector
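
A rough sketch of the formula above in plain Python. Assumptions not in the notes: $$$\sigma$$$ is taken to be the logistic sigmoid, the hidden size D is 2 (to match the 2D example), and W is filled with random values standing in for trained parameters.

```python
import math
import random

V = 5  # dictionary size (the 5-word dictionary from the notes)
D = 2  # hidden/embedding size (assumption: 2, as in the 2D example)

def one_hot(index, size):
    """V-dim vector with 1.0 at the word's index and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden(W, x):
    """h = sigma(W x), with W stored as D rows of V floats."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]

random.seed(0)
# Random placeholder for the trainable parameter W (shape D x V)
W = [[random.uniform(-1, 1) for _ in range(V)] for _ in range(D)]

x = one_hot(4, V)  # "girl" has index 4
h = hidden(W, x)   # D-dim hidden vector
```

Because x is one-hot, Wx simply picks out column 4 of W, so each column of W acts as the vector for one word.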
================================================================================
* Embedding word vectors have the following characteristics
================================================================================
NLP problem
* Single-word context: one word is given, predict the next word
* Multi-word context: several words are given, predict the next word
* Example:
- Target words: the quick brown fox jumped over the lazy dog
- When 3 words (the, quick, brown) are given, you need to predict "fox"
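
A sketch of how the example sentence could be turned into (context, target) training pairs. The helper `context_target_pairs` is hypothetical, and it assumes the context is always the 3 words immediately before the target.

```python
def context_target_pairs(tokens, context_size):
    """Slide over the tokens; each target is paired with the words just before it."""
    pairs = []
    for i in range(context_size, len(tokens)):
        pairs.append((tokens[i - context_size:i], tokens[i]))
    return pairs

tokens = "the quick brown fox jumped over the lazy dog".split()
pairs = context_target_pairs(tokens, context_size=3)
print(pairs[0])  # (['the', 'quick', 'brown'], 'fox')
```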
================================================================================
* The same embedding matrix is shared by all 3 input words
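
A sketch of what sharing means here: every input word is looked up in one and the same table. Assumptions not in the notes: the table `E` stores one row per word (the transpose of W), its values are random placeholders, the 3 vectors are combined by averaging, and the nonlinearity is left out for brevity.

```python
import random

tokens = "the quick brown fox jumped over the lazy dog".split()
# Dictionary built from the example sentence, in order of first appearance
vocab = {w: i for i, w in enumerate(dict.fromkeys(tokens))}
V, D = len(vocab), 2  # D = 2 is an arbitrary choice for illustration

random.seed(0)
# ONE embedding table (V rows of D floats) used for every input position
E = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(V)]

def embed_context(context, vocab, E):
    """Look up each context word in the same table E, then average the rows."""
    rows = [E[vocab[w]] for w in context]
    return [sum(column) / len(rows) for column in zip(*rows)]

h = embed_context(["the", "quick", "brown"], vocab, E)  # D-dim context vector
```

With a shared table, a word gets the same vector no matter which of the 3 input positions it appears in, and the parameter count does not grow with the number of context words.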
================================================================================
* Skip-gram embedding
* One word is given, and you predict the words around it
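
A sketch of skip-gram training pairs for the same example sentence. The helper `skipgram_pairs` and the window size of 1 are assumptions chosen for illustration.

```python
def skipgram_pairs(tokens, window):
    """For each center word, emit (center, neighbor) for every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word does not predict itself
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the quick brown fox jumped over the lazy dog".split()
sg_pairs = skipgram_pairs(tokens, window=1)
print(sg_pairs[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```

This reverses the multi-word context setup: there the context words are the input and one word is the output, while here one word is the input and its neighbors are the outputs.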
================================================================================
* Word2vec