================================================================================
https://datascienceschool.net/view-notebook/766fe73c5c46424ca65329a9557d0918/
================================================================================
* Ensemble
* Ensemble model = a combination of multiple prediction models
* Less variance in accuracy
* Less overfitting
* Combining several weak models yields a better model
================================================================================
Ensemble
* Aggregation: a fixed set of models is combined
  - Majority voting
  - Bagging
  - Random forest
* Boosting: models are added sequentially, each new model correcting the errors of the previous ones (see the boosting sketch at the end)
  - AdaBoost
  - Gradient Boost
  - XGBoost
================================================================================
* Majority voting (see the voting sketch at the end)
  - Hard voting: naive voting on the predicted class labels
  - Soft voting: weighted voting, where the criterion is the conditional class probability of each model
================================================================================
* Bagging
  - Creates multiple models that produce different outputs from the same base model
  - Bagging uses the same model type and the same hyperparameters
  - Bagging randomly samples the training dataset for each model
  - The individual predictions are then combined by majority voting
================================================================================
* Bagging variants (see the bagging sketch at the end)
  - Pasting: training data is sampled without replacement (no duplicates)
  - Bagging: training data is sampled with replacement (duplicates allowed)
  - Random subspaces: a subset of the features is selected
  - Random patches: random training samples + a random subset of the features
* Evaluation uses OOB (out-of-bag) data
  - $$$\text{OOB data} \cap \text{Train data} = \emptyset$$$
================================================================================
Random forest (see the random-forest sketch at the end)
Random forest = $$$\text{decision tree}_1 + \text{decision tree}_2 + \cdots$$$
1. Reduces the dimension of the feature vector
2. Randomly selects a subset of the features at each split
3. This lowers the correlation between the individual decision trees, resulting in a more stable model
================================================================================
Extremely randomized tree (see the extra-trees sketch at the end)
* Randomly selects candidate features from the feature vector and draws the split thresholds at random instead of searching for the best split
================================================================================
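Boosting sketch. A minimal, illustrative example of sequential boosting using scikit-learn's AdaBoostClassifier and GradientBoostingClassifier; the toy dataset and hyperparameters are arbitrary choices, not from the original notes, and XGBoost is omitted since it is a separate library.

```python
# Boosting: weak models are added sequentially, each focusing on the mistakes
# of the ensemble built so far (illustrative sketch, not the notes' own code).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost reweights training samples after each round.
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
# Gradient boosting fits each new tree to the residual errors of the ensemble.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("AdaBoost accuracy:", ada.score(X_test, y_test))
print("Gradient boosting accuracy:", gb.score(X_test, y_test))
```
================================================================================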
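Voting sketch. A minimal, illustrative example of hard vs. soft voting with scikit-learn's VotingClassifier; the three base models, the weights, and the toy dataset are arbitrary choices made for this sketch.

```python
# Hard vs. soft majority voting (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

# Hard voting: each model casts one vote for a class label.
hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)
# Soft voting: the conditional class probabilities are averaged, optionally weighted.
soft = VotingClassifier(estimators, voting="soft", weights=[2, 1, 1]).fit(X_train, y_train)

print("hard voting accuracy:", hard.score(X_test, y_test))
print("soft voting accuracy:", soft.score(X_test, y_test))
```
================================================================================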
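Bagging sketch. The four sampling variants map onto BaggingClassifier's switches (bootstrap, max_samples, max_features, bootstrap_features); the base tree, sampling fractions, and dataset here are arbitrary choices for illustration.

```python
# Bagging variants via BaggingClassifier's sampling switches (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
tree = DecisionTreeClassifier()

# Bagging: rows sampled WITH replacement; the out-of-bag rows give a free
# validation estimate, since OOB data and a tree's train data are disjoint.
bagging = BaggingClassifier(tree, n_estimators=100, bootstrap=True,
                            oob_score=True, random_state=0).fit(X, y)
print("OOB score:", bagging.oob_score_)

# Pasting: rows sampled WITHOUT replacement.
pasting = BaggingClassifier(tree, n_estimators=100, max_samples=0.5,
                            bootstrap=False, random_state=0).fit(X, y)

# Random subspaces: every tree sees all rows but only a random half of the features.
subspaces = BaggingClassifier(tree, n_estimators=100, max_features=0.5,
                              bootstrap=False, random_state=0).fit(X, y)

# Random patches: random rows AND random features.
patches = BaggingClassifier(tree, n_estimators=100, max_samples=0.5,
                            max_features=0.5, bootstrap=True,
                            bootstrap_features=True, random_state=0).fit(X, y)
```
================================================================================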
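Random-forest sketch. A minimal example of the feature-subsampling idea with scikit-learn's RandomForestClassifier; the number of trees and the dataset are arbitrary choices for this sketch.

```python
# Random forest: bagged decision trees where each split considers only a random
# subset of the features (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features="sqrt" restricts every split to a random feature subset, which
# lowers the correlation between trees; oob_score evaluates on out-of-bag rows.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X, y)
print("OOB score:", rf.oob_score_)
```
================================================================================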
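Extra-trees sketch. A minimal example using scikit-learn's ExtraTreesClassifier; dataset and hyperparameters are again arbitrary choices for illustration.

```python
# Extremely randomized trees: like a random forest, but each split's threshold
# is drawn at random rather than optimized (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The extra randomness further decorrelates the trees, trading a little bias
# for lower variance.
et = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("test accuracy:", et.score(X_test, y_test))
```
================================================================================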