I recently needed to build a crawler, and Python was the first thing that came to mind, since there is a huge amount of crawler-related material for it. While looking around I found Scrapy, an open-source Python framework built precisely for crawling, so I settled on it. The first step is installing Scrapy, in this case on Windows.
What is Scrapy
Scrapy is a fast, efficient web scraping framework for Python. It is mainly used to crawl the web, extract information, and turn it into structured data, and is commonly applied to data mining, monitoring, and automated testing.
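To make the "extract information and format it" part concrete, here is a minimal sketch of a Scrapy Item, the container Scrapy uses for structured results. The class and field names are made up for illustration; the import matches the 0.16-era API used later in this article.

# -*- coding: utf-8 -*-
# A Scrapy Item declares the fields a spider is expected to fill in.
# ArticleItem and its field names are illustrative only.
from scrapy.item import Item, Field

class ArticleItem(Item):
    title = Field()      # page title
    url = Field()        # page URL
    published = Field()  # publication date, if the page exposes one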
Required software
As the steps below show, this install uses Python 2.7.3, zope.interface, w3lib, pyOpenSSL (built against Win32 OpenSSL), and Scrapy 0.16.5.
Installation steps
Verify that the installation worked
C:\Users\admin>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
C:\Users\admin>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import zope.interface
>>>
# Enter the w3lib plugin directory and run the install command
D:\python-plugin\w3lib-1.3>python setup.py install
Verify:
D:\python-plugin\w3lib-1.3>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import w3lib
>>>
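Checking each package one at a time in the interpreter gets tedious, so here is a small optional script that tries them all at once. The package list is an assumption based on what Scrapy 0.16 normally needs (zope.interface, Twisted, w3lib, lxml, pyOpenSSL); adjust it to whatever you actually install.

# -*- coding: utf-8 -*-
# check_deps.py -- report which Scrapy 0.16 dependencies import cleanly.
# The list of package names is an assumption, not taken from this article.
deps = ["zope.interface", "twisted", "w3lib", "lxml", "OpenSSL"]

for name in deps:
    try:
        __import__(name)
        print "%-16s OK" % name
    except ImportError as err:
        print "%-16s MISSING (%s)" % (name, err)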
The next dependency, pyOpenSSL, failed to build at first with:
error: Unable to find vcvarsall.bat
This happens because pyOpenSSL is compiled with VC++, so if Visual Studio is already installed, its path has to be handed to the build by setting the right environment variable:
If Visual Studio 2010 is installed, run:
SET VS90COMNTOOLS=%VS100COMNTOOLS%
If Visual Studio 2012 (Visual Studio version 11) is installed, run:
SET VS90COMNTOOLS=%VS110COMNTOOLS%
If Visual Studio 2013 (Visual Studio version 12) is installed, run:
SET VS90COMNTOOLS=%VS120COMNTOOLS%
For more background see http://blog.csdn.net/secretx/article/details/17472107; a small script that automates the same check is sketched below.
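If you would rather not remember which SET line applies, the check can be scripted. The sketch below is a hypothetical helper (not part of any of these packages): it picks the newest VS*COMNTOOLS variable it finds, maps it onto VS90COMNTOOLS, and runs setup.py in that environment.

# -*- coding: utf-8 -*-
# set_vs_tools.py -- hypothetical helper that maps an installed Visual Studio
# Common Tools path onto VS90COMNTOOLS so distutils can find vcvarsall.bat.
import os
import subprocess
import sys

def pick_vs_tools():
    # Prefer newer Visual Studio versions first: 2013, 2012, 2010.
    for var in ("VS120COMNTOOLS", "VS110COMNTOOLS", "VS100COMNTOOLS"):
        path = os.environ.get(var)
        if path:
            os.environ["VS90COMNTOOLS"] = path
            print "Using %s for VS90COMNTOOLS" % var
            return True
    print "No VS*COMNTOOLS variable found; is Visual Studio installed?"
    return False

if __name__ == "__main__":
    if pick_vs_tools():
        # Run the build in the same (modified) environment,
        # e.g. from inside the pyOpenSSL source directory.
        sys.exit(subprocess.call(["python", "setup.py", "install"]))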
The build can still fail because the OpenSSL library itself has to be installed on Windows; a Win32 build can be downloaded from http://slproweb.com/products/Win32OpenSSL.html. After installing it, tell the compiler where its static libraries and headers live:
> set LIB=C:\OpenSSL-Win32\lib\VC\static;%LIB%
> set INCLUDE=C:\OpenSSL-Win32\include;%INCLUDE%
With these variables set, the pyOpenSSL build compiles successfully.
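As a sanity check, mirroring the earlier verification steps, the freshly built package should now import; note that the top-level package pyOpenSSL provides is named OpenSSL.

# Quick import check for the pyOpenSSL build.
import OpenSSL
print "pyOpenSSL imported from:", OpenSSL.__file__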
# Enter the Scrapy directory and run the install
D:\python-plugin\Scrapy-0.16.5>python setup.py install
Verify:
D:\python-plugin\Scrapy-0.16.5>scrapy
Scrapy 0.16.5 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
fetch Fetch a URL using the Scrapy downloader
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
D:\python-plugin\Scrapy-0.16.5>
Installation complete.
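With the install verified, a quick way to exercise it end to end is scrapy runspider with a self-contained spider. The sketch below uses the 0.16-era API (BaseSpider, HtmlXPathSelector); the spider name and URL are placeholders, not anything from this article.

# -*- coding: utf-8 -*-
# demo_spider.py -- minimal spider for Scrapy 0.16.x.
# Run with:  scrapy runspider demo_spider.py
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DemoSpider(BaseSpider):
    name = "demo"
    start_urls = ["http://example.com/"]  # placeholder URL

    def parse(self, response):
        # Extract the page title with an XPath expression and log it.
        hxs = HtmlXPathSelector(response)
        title = hxs.select("//title/text()").extract()
        self.log("page title: %s" % title)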