

Showing posts from August, 2015

Scrapy: A Python framework for web crawling

Scrapy, in the words of its creators: "Scrapy is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival."

[Screenshot from the Scrapy site showing how concise the working code can be]

At the time of writing, Scrapy works only with Python 2.7. The objective of this post is to get you started with Scrapy and give you enough information to carry on by yourself. In this post, I will set up a Scrapy project and retrieve some data from my blog site.

Prerequisites

Python 2.7
The pip and setuptools Python packages
lxml. Most Linux distributions ship prepackaged versions of lxml. Otherwise, refer to http://lxml.de/installation.html
OpenSSL. This comes preinstalled on all operating systems except Windows, where the Python installer ships it bundled.

Setup

    pip install scrapy

Create a Project

    scrapy startproject blog

This comman