================================================================================
https://datascienceschool.net/view-notebook/766fe73c5c46424ca65329a9557d0918/
================================================================================
* Ensemble
* Ensemble model = a combination of multiple prediction models
* Lower variance in accuracy than a single model
* Less overfitting
* Combining individually weak models can produce a better model
================================================================================
Ensemble
* Aggregation: the set of models to combine is fixed in advance
- Majority voting
- Bagging
- Random forest
* Boosting: models are added one at a time, so the ensemble grows gradually
- AdaBoost
- Gradient Boosting
- XGBoost
================================================================================
* Majority voting
- Hard voting: simple majority vote over the class labels predicted by each model
- Soft voting: weighted voting; sums the conditional class probabilities predicted by each model and picks the class with the largest total
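The difference between hard and soft voting can be sketched with plain Python. This is a minimal illustration, not a library API: the function names `hard_vote` and `soft_vote` and the probability values are made up for the example.

```python
from collections import Counter

def hard_vote(probas):
    """Each model votes for its most probable class; the most common vote wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas, weights=None):
    """Sum the (optionally weighted) class probabilities and take the largest."""
    weights = weights or [1.0] * len(probas)
    n_classes = len(probas[0])
    totals = [sum(w * p[c] for w, p in zip(weights, probas))
              for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)

# Hypothetical P(class | x) from three models for one sample (classes 0 and 1).
probas = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
hard = hard_vote(probas)  # two of the three models prefer class 1
soft = soft_vote(probas)  # the one confident model tips the sum towards class 0
```

With these numbers the two rules disagree, because soft voting takes into account how confident each model is, while hard voting only counts labels.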
================================================================================
* Bagging
- Creates multiple models that give different outputs, all from one and the same base model
- Every model uses the same algorithm and the same parameters
- Each model is trained on a randomly selected subset of the training data
- The individual predictions are then combined by majority voting
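The steps above can be sketched as follows. The base model here is a 1-nearest-neighbour classifier on 1-D points, chosen only because it fits in a few lines; the helper names and the toy data are assumptions for illustration.

```python
import random
from collections import Counter

def fit_1nn(sample):
    """Base model: 1-nearest-neighbour on 1-D points; returns a predict function."""
    def predict(x):
        nearest = min(sample, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict

def fit_bagging(data, n_models, rng):
    """Train the same base model n_models times on random resamples of data."""
    models = []
    for _ in range(n_models):
        resample = [rng.choice(data) for _ in data]  # drawn with replacement
        models.append(fit_1nn(resample))
    return models

def bagged_predict(models, x):
    """Combine the individual predictions by majority voting."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): label 0 for x < 5, label 1 otherwise.
data = [(float(x), int(x >= 5)) for x in range(10)]
models = fit_bagging(data, n_models=25, rng=random.Random(0))
low, high = bagged_predict(models, 0.5), bagged_predict(models, 8.5)
```

Every model shares the same algorithm and settings; the only source of diversity is the random resample each one is trained on.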
================================================================================
* Bagging
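- Pasting: training samples are drawn without duplicates (without replacement)
- Bagging: training samples may contain duplicates (drawn with replacement)
- Random subspaces: selects a random subset of the features rather than of the data
- Random patches: random subset of the training data + random subset of the features
* Evaluation uses OOB (out-of-bag) data, i.e. the data not selected for training
- OOB data $$$\cap$$$ Train data = $$$\emptyset$$$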
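A minimal sketch of how the four sampling schemes differ, and of how the OOB set falls out of the row sampling. The function `draw_indices` and its parameters are hypothetical names for this example, not part of any library.

```python
import random

def draw_indices(n_rows, n_features, rng, max_samples, max_features, replace):
    """Pick the rows and feature columns that one ensemble member is trained on."""
    if replace:
        # Duplicates allowed (bagging).
        rows = [rng.randrange(n_rows) for _ in range(max_samples)]
    else:
        # No duplicates (pasting).
        rows = rng.sample(range(n_rows), max_samples)
    cols = sorted(rng.sample(range(n_features), max_features))
    # Out-of-bag rows: everything this member never saw during training.
    oob = sorted(set(range(n_rows)) - set(rows))
    return rows, cols, oob

rng = random.Random(0)
n_rows, n_features = 20, 6
pasting = draw_indices(n_rows, n_features, rng, 10, 6, replace=False)
bagging = draw_indices(n_rows, n_features, rng, 20, 6, replace=True)
subspaces = draw_indices(n_rows, n_features, rng, 20, 3, replace=False)
patches = draw_indices(n_rows, n_features, rng, 10, 3, replace=False)
```

Because the OOB rows are defined as the complement of the drawn rows, they never overlap the training rows, so they can serve as a validation set for that ensemble member.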
================================================================================
Random forest
Random forest = decision tree 1 + decision tree 2 + $$$\cdots$$$
1. At each node split, randomly reduce the dimension of the feature vector
2. Choose the split feature from that random subset of features
3. The decision trees then become less correlated with each other, which results in a more stable model
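One way to see why lower correlation gives a more stable model, under the simplifying assumption (not stated in these notes) that the predictions $$$T_1(x), \dots, T_B(x)$$$ of the $$$B$$$ trees each have variance $$$\sigma^2$$$ and the same pairwise correlation $$$\rho$$$:
$$$\mathrm{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$$$
Adding more trees shrinks only the second term, so $$$\rho\sigma^2$$$ is the floor; random feature selection lowers $$$\rho$$$ and with it that floor.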
================================================================================
Extremely randomized trees
* At each node, the split feature is selected at random from the feature vector
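A rough sketch contrasting the two ways of choosing a node split. Both start from a random subset of features; the random-forest style rule then searches for the best threshold, while the extra-trees style rule draws one random threshold per candidate feature and keeps the best of those. The random-threshold step follows the usual description of extremely randomized trees and is an assumption here, as are all function names and the toy data.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_score(X, y, feature, threshold):
    """Weighted Gini impurity after splitting on X[i][feature] <= threshold."""
    left = [label for row, label in zip(X, y) if row[feature] <= threshold]
    right = [label for row, label in zip(X, y) if row[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def forest_split(X, y, max_features, rng):
    """Random-forest style: best observed threshold among a random feature subset."""
    features = rng.sample(range(len(X[0])), max_features)
    candidates = [(f, row[f]) for f in features for row in X]
    return min(candidates, key=lambda c: split_score(X, y, *c))

def extra_trees_split(X, y, max_features, rng):
    """Extra-trees style: one random threshold per candidate feature, keep the best."""
    features = rng.sample(range(len(X[0])), max_features)
    candidates = []
    for f in features:
        values = [row[f] for row in X]
        candidates.append((f, rng.uniform(min(values), max(values))))
    return min(candidates, key=lambda c: split_score(X, y, *c))

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [6.0, 4.0], [7.0, 8.0], [8.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)
rf_feature, rf_threshold = forest_split(X, y, max_features=2, rng=rng)
et_feature, et_threshold = extra_trees_split(X, y, max_features=2, rng=rng)
```

The extra randomness makes each tree weaker on its own but cheaper to build and less correlated with the others, which is the same trade-off random forests make, pushed one step further.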
================================================================================