First steps with Scrapy: scraping Taopiaopiao (3), saving images
Scrape the image URLs and save the images locally.
1. Using ImagesPipeline
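(1) In settings.py, add the entry 'scrapy.pipelines.images.ImagesPipeline': 1 to ITEM_PIPELINES. ImagesPipeline uses the Pillow library to process images, so Pillow must be installed.
(2) Add two fields to the Item: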
img_urls = scrapy.Field()
images = scrapy.Field()
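(3) In settings.py, set the storage directory IMAGES_STORE, the Item field that holds the image URLs (IMAGES_URLS_FIELD), and the Item field that receives the download results (IMAGES_RESULT_FIELD). The default field names are image_urls and images, so IMAGES_URLS_FIELD has to be set here because this Item uses img_urls: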
IMAGES_STORE = 'F:\\py_pic'
IMAGES_URLS_FIELD = 'img_urls'
IMAGES_RESULT_FIELD = 'images'
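To make the role of the two field settings concrete, here is a simplified, offline model of the contract: the pipeline reads a list of URLs from IMAGES_URLS_FIELD and, once the downloads finish, writes a list of dicts with url, path and checksum keys into IMAGES_RESULT_FIELD (failed downloads are left out; newer Scrapy versions also add a status key). The functions fake_download and simulate_images_pipeline are hypothetical stand-ins, not Scrapy APIs, and no network access happens.

```python
import hashlib

# Field names copied from the settings above; they must match fields on the Item.
IMAGES_URLS_FIELD = "img_urls"
IMAGES_RESULT_FIELD = "images"


def fake_download(url):
    # Hypothetical offline stand-in for an HTTP download: returns fake bytes.
    return ("fake image bytes for " + url).encode("utf-8")


def simulate_images_pipeline(item):
    """Hypothetical, simplified model of how ImagesPipeline uses the two fields.

    Reads the URL list from IMAGES_URLS_FIELD and stores one result dict per
    downloaded image in IMAGES_RESULT_FIELD.
    """
    results = []
    for url in item.get(IMAGES_URLS_FIELD, []):
        body = fake_download(url)
        results.append({
            "url": url,
            # Default file name: SHA1 hex digest of the URL, under full/
            "path": "full/" + hashlib.sha1(url.encode("utf-8")).hexdigest() + ".jpg",
            # MD5 of the stored bytes (here, the fake bytes)
            "checksum": hashlib.md5(body).hexdigest(),
        })
    item[IMAGES_RESULT_FIELD] = results
    return item


item = {"name": "example", "img_urls": ["https://example.com/poster.jpg"]}
print(simulate_images_pipeline(item)["images"])
```

A plain dict stands in for TaopiaopiaoItem here; a real Item behaves like a dict for these reads and writes, but only accepts declared fields.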
IMAGES_THUMBS in settings.py generates thumbnails at the given sizes, and IMAGES_EXPIRES sets how many days a downloaded image stays valid before it is downloaded again:
IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}
IMAGES_EXPIRES = 30  # expire after 30 days
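The effect of both settings can be sketched in a few lines. A thumbnail is produced with Pillow's thumbnail(), which shrinks the image to fit inside the (width, height) box while keeping its aspect ratio and never enlarges it; an already stored image is re-downloaded only when its age in days exceeds IMAGES_EXPIRES. The helpers thumbnail_size and is_expired and the 540x800 poster size are hypothetical, and thumbnail_size only approximates Pillow's rounding.

```python
import time

# Settings copied from the example above
IMAGES_THUMBS = {"small": (50, 50), "big": (270, 270)}
IMAGES_EXPIRES = 30  # days


def thumbnail_size(original, box):
    """Approximate the size produced by Pillow's Image.thumbnail().

    Fits the image inside `box`, keeps the aspect ratio, never upscales.
    Pillow's own rounding may differ by a pixel.
    """
    w, h = original
    bw, bh = box
    scale = min(bw / w, bh / h, 1.0)
    return max(1, round(w * scale)), max(1, round(h * scale))


def is_expired(last_modified, now, expires_days=IMAGES_EXPIRES):
    """True if a stored file is older than the expiry window (in days)."""
    age_days = (now - last_modified) / 86400
    return age_days > expires_days


poster = (540, 800)  # hypothetical poster size in pixels
for name, box in IMAGES_THUMBS.items():
    print(name, thumbnail_size(poster, box))

now = time.time()
print(is_expired(now - 10 * 86400, now))  # stored 10 days ago
print(is_expired(now - 40 * 86400, now))  # stored 40 days ago
```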
2. Results
Command: scrapy crawl taopiaopiao
Saved images:
3. Code:
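With the default naming scheme, files under IMAGES_STORE are named after the SHA1 hash of the image URL: the full-size image goes to full/<sha1>.jpg and each thumbnail to thumbs/<name>/<sha1>.jpg, with one subfolder per key of IMAGES_THUMBS. The helper default_image_paths below is a hypothetical sketch that reproduces this layout so you can predict where a given poster lands.

```python
import hashlib

# Thumbnail names from the IMAGES_THUMBS setting
IMAGES_THUMBS = {"small": (50, 50), "big": (270, 270)}


def default_image_paths(url, thumbs=IMAGES_THUMBS):
    """Relative paths (under IMAGES_STORE) the default naming gives `url`.

    Full-size image at full/<sha1>.jpg, each thumbnail at
    thumbs/<name>/<sha1>.jpg, where <sha1> hashes the image URL.
    """
    guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    paths = {"full": f"full/{guid}.jpg"}
    for name in thumbs:
        paths[name] = f"thumbs/{name}/{guid}.jpg"
    return paths


for kind, path in default_image_paths("https://example.com/poster.jpg").items():
    print(kind, path)
```

Because the name depends only on the URL, the same poster scraped twice maps to the same file, which is what lets IMAGES_EXPIRES skip fresh downloads.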
items.py
import scrapy


class TaopiaopiaoItem(scrapy.Item):
    url = scrapy.Field()
    name = scrapy.Field()
    actor = scrapy.Field()
    country = scrapy.Field()
    img_urls = scrapy.Field()  # image URLs
    images = scrapy.Field()    # download results
taopiaopiao.py
# coding:utf-8
import scrapy
from taopiaopiao.items import TaopiaopiaoItem


class taoPiaoPiaoSpider(scrapy.Spider):
    # Spider name, used by "scrapy crawl taopiaopiao"
    name = "taopiaopiao"
    start_urls = [
        "https://www.taopiaopiao.com/showList.htm?n_s=new"
    ]

    def parse(self, response):
        # Parse the movie list page
        movies = response.xpath("//div[@class='movie-card-wrap']")
        for movie in movies:
            # Create a new item per movie so later iterations cannot overwrite
            # an item whose images are still being downloaded
            item = TaopiaopiaoItem()
            item["url"] = movie.xpath("a/@href").get()
            item["name"] = movie.xpath(
                "a/div[@class='movie-card-name']/span[@class='bt-l']/text()").get()
            # ImagesPipeline expects a list of absolute URLs
            item["img_urls"] = [
                response.urljoin(src)
                for src in movie.xpath("a/div[@class='movie-card-poster']/img/@src").getall()
            ]
            item["actor"] = movie.xpath(
                "a/div[@class='movie-card-info']/div[@class='movie-card-list']/span[2]/text()").get()
            item["country"] = movie.xpath(
                "a/div[@class='movie-card-info']/div[@class='movie-card-list']/span[4]/text()").get()
            yield item
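Creating a fresh item for each movie card matters because ImagesPipeline downloads asynchronously: if one item object is reused across loop iterations, the next card can overwrite it before its images are processed. The demo below simulates that delay by collecting every yielded item first and only then reading them; plain dicts stand in for TaopiaopiaoItem, and parse_shared and parse_fresh are hypothetical names.

```python
def parse_shared(cards):
    """Buggy pattern: one item object is created once and reused for every card."""
    item = {}
    for card in cards:
        item["name"] = card
        yield item


def parse_fresh(cards):
    """Fixed pattern: a new item is created for each card."""
    for card in cards:
        item = {}
        item["name"] = card
        yield item


cards = ["Movie A", "Movie B", "Movie C"]  # hypothetical parsed titles
print([i["name"] for i in list(parse_shared(cards))])  # every entry shows the last title
print([i["name"] for i in list(parse_fresh(cards))])   # each entry keeps its own title
```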
settings.py
# -*- coding: utf-8 -*-
BOT_NAME = 'taopiaopiao'
SPIDER_MODULES = ['taopiaopiao.spiders']
NEWSPIDER_MODULE = 'taopiaopiao.spiders'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
    'taopiaopiao.pipelines.TaopiaopiaoPipeline': 300,
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'F:\\py_pic'
IMAGES_URLS_FIELD = 'img_urls'
IMAGES_RESULT_FIELD = 'images'
IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}
IMAGES_EXPIRES = 30  # days
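The numbers in ITEM_PIPELINES set the order in which items pass through the pipelines: lower values run first. With the values above, ImagesPipeline (1) handles each item before TaopiaopiaoPipeline (300), so the custom pipeline should already see the images field filled in. The helper pipeline_order is a hypothetical sketch of that ordering rule, not a Scrapy function.

```python
ITEM_PIPELINES = {
    "taopiaopiao.pipelines.TaopiaopiaoPipeline": 300,
    "scrapy.pipelines.images.ImagesPipeline": 1,
}


def pipeline_order(pipelines):
    """Return pipeline paths sorted the way Scrapy runs them: lowest value first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


for step, path in enumerate(pipeline_order(ITEM_PIPELINES), start=1):
    print(step, path)
```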