Scraping Data with Python: Using the Scrapy Framework

程序员文章站 2022-06-15 20:26:07

...

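First, note that Scrapy is not part of Python and has to be installed.
Install Scrapy with: pip3 install scrapy
Then run scrapy -h to see how each command is used (running scrapy with no arguments prints the same overview, as shown below). The manual tells you which commands require a Scrapy project and which do not.
For example, startproject, which creates a new Scrapy project, works without an existing project, while crawl, which runs a spider, only works inside a project.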

bogon:~ zhangxiaojing$ scrapy
Scrapy 1.5.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

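Global commands (work anywhere; these are the ones listed above with no active project):

startproject
genspider
settings
runspider
shell
fetch
view
version
bench


Project-only commands (must be run inside a project):

crawl
check
list
edit
parse

(Older releases also had a deploy command; in Scrapy 1.x deployment moved to the separate scrapyd-client package as scrapyd-deploy.)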

2. Project Directory Structure

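tutorial/
    scrapy.cfg              project configuration file; marks the project root
    tutorial/               the project's Python module, where the spider code lives
        __init__.py
        items.py            item definitions: each scrapy.Field() is a field the spider scrapes
        pipelines.py        item pipelines: downloading images, saving to a database, etc.
        settings.py         project settings; pipelines you write must be enabled here
        spiders/            the spider files; it can hold as many spiders as you need
            __init__.py
            ...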
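As a sketch of the "saving to a database" role of pipelines.py, the class below stores items in SQLite using only Python's standard sqlite3 module. The class name SQLitePipeline, the default file name and the articles table with its title and url columns are hypothetical. The method names open_spider, process_item and close_spider are the hooks Scrapy calls on an item pipeline; a plain class like this does not need to import Scrapy.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline that writes each scraped item to SQLite."""

    def __init__(self, db_path="pachong.db"):
        self.db_path = db_path  # hypothetical default database file
        self.conn = None

    def open_spider(self, spider):
        # Called once when the crawl starts: open the DB and create the table
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS articles (title TEXT, url TEXT UNIQUE)"
        )

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item objects
        data = dict(item)
        # UNIQUE(url) plus OR IGNORE silently skips duplicate URLs
        self.conn.execute(
            "INSERT OR IGNORE INTO articles (title, url) VALUES (?, ?)",
            (data.get("title"), data.get("url")),
        )
        return item  # hand the item on to the next enabled pipeline

    def close_spider(self, spider):
        # Called once when the crawl ends: persist the rows and close the file
        self.conn.commit()
        self.conn.close()
```

Like any pipeline, it only runs once it is enabled in settings.py, for example ITEM_PIPELINES = {"tutorial.pipelines.SQLitePipeline": 300}. The integers, customarily in the 0 to 1000 range, set the order in which pipelines run, lower values first. To read the database path from settings.py instead of hard-coding it, a from_crawler classmethod can be added.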

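Steps for using Scrapy:
scrapy startproject pachong
cd pachong
tree .
cd pachong/spiders
vi pachong.py ⇒ the spider file
cd ../
vi items.py ⇒ the item fields the spider needs to scrape
vi pipelines.py ⇒ pipeline methods: image uploads, saving to the database, etc.
vi settings.py ⇒ configuration: database connection parameters, which pipelines to enable, image storage path, etc.
scrapy crawl pachong ⇒ run the spider (the argument is the spider's name attribute, not the file name)
scrapy crawl --logfile=log.txt pachong ⇒ run the spider and write its log to log.txt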
