007. Install PhantomJS and Selenium
# @
# We will learn how to scrape web data by driving a web browser
# @
# So far, we scraped by imitating the browser's request pattern and sending the imitated request ourselves
# Then we took the returned HTML and parsed it to extract the data we wanted
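# As a reminder of the approach so far, here is a minimal sketch of building an imitated request
# It only builds the request and does not send it
# The URL, the header values, and the name build_imitated_request are assumptions for illustration

```python
from urllib.request import Request


def build_imitated_request(url):
    """Build a GET request that carries browser-like headers."""
    headers = {
        # Hypothetical header values; copy the real ones from your browser's network tab.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Accept-Language": "en-US,en;q=0.8",
    }
    return Request(url, headers=headers)


req = build_imitated_request("https://example.com/")
print(req.get_method(), req.full_url)
print(req.header_items())
```

# Passing req to urllib.request.urlopen would send it and return the HTML to parse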
# @
# But in some cases (the Facebook site, for example) the data is not in the returned HTML, even when we imitate the request
# @
# On a Facebook post, press Ctrl+Shift+I to inspect the web page
# The inspector shows many elements for the post content
# However, those elements do not appear in the raw HTML source that the server returns (view it with Ctrl+U)
# The reason is that the HTML Facebook sends does not contain the content itself
# The HTML first renders only a skeleton ("skin") page, and then JavaScript fetches the content and inserts it into the page
# Therefore, we can scrape that content only after the JavaScript has run
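# The problem can be reproduced on a small scale with the sketch below
# RAW_HTML is a made-up page (not Facebook's real markup) whose container is filled in by a script
# A parser that does not run JavaScript, such as the standard library's html.parser, only sees the empty container

```python
from html.parser import HTMLParser

# Hypothetical page: the HTML ships only an empty container (the "skin"),
# and the script would fill it in later inside a real browser.
RAW_HTML = """
<html><body>
  <div id="feed"></div>
  <script>
    document.getElementById("feed").textContent = "Hello from JavaScript";
  </script>
</body></html>
"""


class FeedTextParser(HTMLParser):
    """Collects the text found inside <div id="feed">."""

    def __init__(self):
        super().__init__()
        self.inside_feed = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "feed":
            self.inside_feed = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside_feed = False

    def handle_data(self, data):
        # Script text also arrives here, but it is outside the feed container.
        if self.inside_feed:
            self.chunks.append(data)


parser = FeedTextParser()
parser.feed(RAW_HTML)
feed_text = "".join(parser.chunks).strip()
print(repr(feed_text))  # prints '' because the script never ran
```

# The text "Hello from JavaScript" exists only as script source, so it never shows up as content of the container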
# @
# Selenium is a tool for controlling a web browser from a program
# It is available as a Python module (the selenium package)
# that can drive PhantomJS (a headless web browser, i.e. one without a screen), Firefox, and Chrome
# In other words, PhantomJS loads the web page and runs its JavaScript, and then
# we read the rendered page back through Selenium
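# A minimal sketch of that flow, assuming Selenium 3.x (webdriver.PhantomJS was removed in Selenium 4)
# The names fetch_rendered_html and make_phantomjs_driver are assumptions for illustration
# The driver is passed in as a factory so that the same function also works with Firefox or Chrome drivers

```python
def make_phantomjs_driver():
    """Create a PhantomJS driver (Selenium 3.x only, phantomjs must be on PATH)."""
    # Imported lazily so that the rest of this sketch loads without selenium installed.
    from selenium import webdriver
    return webdriver.PhantomJS()


def fetch_rendered_html(url, make_driver=make_phantomjs_driver):
    """Open url in a browser and return the HTML after the page has loaded."""
    driver = make_driver()
    try:
        driver.get(url)            # the browser loads the page and runs its JavaScript
        return driver.page_source  # HTML of the page as the browser currently sees it
    finally:
        driver.quit()              # always shut the browser process down


# Example call (needs selenium 3.x and phantomjs installed):
# html = fetch_rendered_html("https://example.com/")
# print(html[:200])
```

# Content that loads slowly may still be missing right after driver.get returns, so a real scraper usually waits for the element it needs before reading page_source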
# @
# docker pull ubuntu:16.04
# docker run -it ubuntu:16.04
# apt-get update
# Install python3 and python3-pip
# apt-get install -y python3 python3-pip
# pip3 install selenium
# pip3 install beautifulsoup4
# apt-get install -y wget libfontconfig
# Create the /home/root/src folder and move into it ($_ expands to the last argument of the previous command)
# mkdir -p /home/root/src && cd $_
# Download PhantomJS
# wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
# Extract the archive
# tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2
# Move into the bin folder of the extracted directory
# cd phantomjs-2.1.1-linux-x86_64/bin/
# Copy the phantomjs binary into /usr/local/bin/
# cp phantomjs /usr/local/bin/
# Install the Nanum fonts (Korean fonts, so that Korean text renders correctly)
# apt-get install -y fonts-nanum*
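# On the host (outside the container), list containers to find the ID of the one you just set up
# docker ps -a
# Save that container as a new image (docker commit takes a container ID)
# docker commit <container_id> ubuntu-phantomjs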
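# To check the install steps above, a small script like this can be run inside the container before committing it
# It is only a sketch: the function name phantomjs_version is an assumption, and it relies on the phantomjs --version flag
# It uses stdout=PIPE and universal_newlines=True so that it also runs on the older Python 3 that ships with ubuntu:16.04

```python
import shutil
import subprocess


def phantomjs_version(binary="phantomjs"):
    """Return the version string printed by the binary, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return result.stdout.strip()


print(phantomjs_version())
```

# A result of None means the binary was not copied into a directory on PATH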