https://datascienceschool.net/view-notebook/293ece8b0d124fbaa4d4d52bb8f1cb42/
================================================================================
* scikit-learn package
* benchmark datasets
* dataset preprocessing
* supervised learning algorithms
* unsupervised learning algorithms
* model evaluation and selection
================================================================================
Datasets provided by scikit-learn
* the sklearn.datasets subpackage (written sk.datasets below) provides the datasets
* sk.datasets.load() family of commands: small datasets bundled with the package
* sk.datasets.fetch() family of commands: large datasets downloaded on first use
* sk.datasets.make() family of commands: synthetic datasets generated from probability distributions
================================================================================
* sk.datasets.load()
* load_boston: Boston house prices data for regression (removed in scikit-learn 1.2)
* load_diabetes: Diabetes data for regression
* load_linnerud: Linnerud data for regression
* load_iris: Iris data for classification
* load_digits: Digit data for classification
* load_wine: Wine data for classification
* load_breast_cancer: Breast cancer data for classification
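
A minimal sketch of the load commands in use, assuming scikit-learn is installed. The variable names are illustrative only; return_X_y=True asks the loader for a plain (data, target) pair instead of a dataset object.

```python
from sklearn.datasets import load_diabetes, load_wine

# Regression example: features and a continuous target as numpy arrays.
X_reg, y_reg = load_diabetes(return_X_y=True)
print(X_reg.shape, y_reg.shape)

# Classification example: the target holds integer class labels.
X_clf, y_clf = load_wine(return_X_y=True)
print(X_clf.shape, sorted(set(y_clf.tolist())))
```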
================================================================================
* sk.datasets.fetch()
* fetch_california_housing: California house prices for regression
* fetch_covtype : Forest cover type data for classification
* fetch_20newsgroups : News text data
* fetch_olivetti_faces : Facial image data
* fetch_lfw_people : Famous people's facial image data (Labeled Faces in the Wild)
* fetch_lfw_pairs : Pairs of famous people's facial images, for face verification
* fetch_rcv1 : Reuters news corpus data
* fetch_kddcup99 : KDD Cup 99 TCP dump data
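
The fetch commands download their data on first use and cache it locally (by default under ~/scikit_learn_data). A hedged sketch, assuming scikit-learn is installed: the helper name fetch_if_cached is hypothetical, and download_if_missing=False is passed so that this example never touches the network.

```python
from sklearn.datasets import fetch_california_housing


def fetch_if_cached():
    """Hypothetical helper: return the dataset only if it is already cached."""
    try:
        return fetch_california_housing(download_if_missing=False)
    except OSError:
        # Raised when the data is not in the local cache.
        return None


housing = fetch_if_cached()
if housing is None:
    print("Not cached yet; call fetch_california_housing() to download it.")
else:
    print(housing.data.shape, housing.target.shape)
```

Calling fetch_california_housing() with its default arguments performs the download when the cache is empty.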
================================================================================
* sk.datasets.make()
* make_regression: create synthetic data for regression
* make_classification: create synthetic data for classification
* make_blobs: create synthetic data for clustering
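
A short sketch of the three make commands listed above, assuming scikit-learn is installed. The sample counts, feature counts, and random_state values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Regression: X is (n_samples, n_features), y is a continuous target.
X_r, y_r = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

# Classification: y holds integer class labels (two classes by default).
X_c, y_c = make_classification(n_samples=100, n_features=5, random_state=0)

# Clustering: points drawn from isotropic Gaussian blobs around 3 centers.
X_b, y_b = make_blobs(n_samples=90, centers=3, n_features=2, random_state=0)

print(X_r.shape, X_c.shape, X_b.shape)
```

Unlike the load and fetch commands, each make command returns a plain (X, y) tuple of numpy arrays.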
================================================================================
Properties of the dataset object
* Bunch: dataset object returned by the load and fetch commands
* data: numpy array, feature data, independent variables
* target: numpy array, label data, dependent variables
* feature_names: names of the features
* target_names: names of the target classes
* DESCR: description of the data
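
The properties above can be inspected as follows. This is a minimal sketch using load_iris as the example dataset (an arbitrary choice) and assuming scikit-learn is installed.

```python
from sklearn.datasets import load_iris

iris = load_iris()  # returns a Bunch

print(type(iris).__name__)      # class name of the dataset object
print(iris.data.shape)          # feature matrix: (n_samples, n_features)
print(iris.target.shape)        # one label per sample
print(iris.feature_names)       # list of feature names
print(list(iris.target_names))  # class names
print(iris.DESCR[:200])         # start of the text description

# A Bunch also supports dictionary-style access to the same objects.
same_array = iris["data"] is iris.data
```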