scrapy-items
Saving the data
```python
# -*- coding: utf-8 -*-
import scrapy

from mySpider.items import MyspiderItem


class BooksSpider(scrapy.Spider):
    name = 'books'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Each book on the page sits inside an <article class="product_pod">.
        for sel in response.css('article.product_pod'):
            book = MyspiderItem()
            book['name'] = sel.xpath('./h3/a/@title').extract_first()
            book['price'] = sel.css('p.price_color::text').extract_first()
            yield book
```
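The spider imports MyspiderItem from the project's items.py, which is not shown here. A minimal sketch consistent with the two fields used above would be:

```python
# items.py -- a sketch; the original file is not shown in this article
import scrapy


class MyspiderItem(scrapy.Item):
    name = scrapy.Field()   # book title
    price = scrapy.Field()  # price string as scraped, e.g. '£53.74'
```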
Processing the data
```python
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class MyspiderPipeline(object):
    # GBP -> CNY exchange rate
    exchange_rate = 8.5309

    def process_item(self, item, spider):
        # Take the item's price field (e.g. '£53.74'), strip the leading
        # pound sign, convert to float, and multiply by the exchange rate.
        price = float(item['price'][1:]) * self.exchange_rate
        # Round to 2 decimal places and write it back to the price field.
        item['price'] = '¥%.2f' % price
        return item
```
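As the comment above notes, the pipeline only runs once it is enabled through the ITEM_PIPELINES setting. A minimal sketch, assuming the default project layout with the package named mySpider:

```python
# settings.py
ITEM_PIPELINES = {
    # class path: priority (0-1000); lower values run earlier
    'mySpider.pipelines.MyspiderPipeline': 300,
}
```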
Besides process_item, which must be implemented, there are three other commonly used methods you can implement as needed:
● open_spider(self, spider)
Called back when the spider is opened (before any data is processed). Typically used to perform initialization before processing starts, such as opening a database connection.
● close_spider(self, spider)
Called back when the spider is closed (after all data is processed). Typically used to perform cleanup once all data has been handled, such as closing the database connection.
● from_crawler(cls, crawler)
A class method called back when the Item Pipeline object is created. It is usually used to read configuration from crawler.settings and construct the Item Pipeline object according to that configuration.
The sketch below shows a typical use of all three methods.
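A minimal sketch of such a pipeline, storing each book in a SQLite database. The SQLITE_DB_PATH setting name is an assumption made for illustration, not part of Scrapy:

```python
# pipelines.py -- illustrative sketch
import sqlite3


class SQLiteWriterPipeline:
    def __init__(self, db_path):
        self.db_path = db_path

    @classmethod
    def from_crawler(cls, crawler):
        # Called when the pipeline object is created: read configuration
        # from crawler.settings. SQLITE_DB_PATH is a hypothetical custom
        # setting you would define yourself in settings.py.
        return cls(db_path=crawler.settings.get('SQLITE_DB_PATH', 'books.db'))

    def open_spider(self, spider):
        # Called before any item is processed: open the database connection.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS books (name TEXT, price TEXT)')

    def close_spider(self, spider):
        # Called after all items are processed: commit and close.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.conn.execute(
            'INSERT INTO books VALUES (?, ?)',
            (item['name'], item['price']),
        )
        return item
```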