Scrapy crawler example: scraping a novel

程序员文章站 (Programmer Article Station), 2022-05-06 18:47:45


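Using the novel at http://www.biquge.com/2_2970/ as an example, this post uses Scrapy to scrape every chapter of the novel and save each one to its own file.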

# coding=utf-8
# Written for Python 3: text is handled as str and encoded only when written to disk.
import os

import scrapy

curpath = os.getcwd()  # chapters are saved under the directory the crawl is started from


class novelSpider(scrapy.Spider):
    name = 'xiaoshuo'
    start_urls = ['http://www.biquge.com/2_2970/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # let scrapy.Spider finish its own setup
        self.noveldir = ''

    def parse(self, response):
        title = response.css('div#info h1::text').extract_first()  # novel title
        self.noveldir = os.path.join(curpath, title)
        self.log(self.noveldir)
        if not os.path.exists(self.noveldir):
            os.makedirs(self.noveldir)  # create the novel directory
        self.log('Start downloading %s' % title)
        for href in response.css('dd a::attr(href)'):  # chapter links
            yield response.follow(href, self.parse_page)

    def parse_page(self, response):
        # chapter title, also used as the file name
        filename = response.css('div.bookname h1::text').extract_first().strip()
        self.log('Start downloading %s' % filename)
        with open(os.path.join(self.noveldir, filename), 'w', encoding='utf-8') as f:
            for item in response.css('div#content::text').extract():  # write the chapter text to the file
                f.write(item + '\n')
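
The spider writes every text node found under div#content to the file exactly as it was extracted. As a minimal sketch, and assuming the page pads its paragraphs with non-breaking spaces or leaves whitespace-only fragments between line breaks (this was not checked against the site), a small helper could tidy the fragments before they are written. The name clean_lines is hypothetical and not part of Scrapy.

```python
def clean_lines(fragments):
    """Strip surrounding whitespace from each fragment and drop empty ones.

    Hypothetical helper: `fragments` is the list of strings returned by
    response.css('div#content::text').extract().
    """
    cleaned = []
    for fragment in fragments:
        # str.strip() with no arguments also removes non-breaking spaces (U+00A0)
        text = fragment.strip()
        if text:
            cleaned.append(text)
    return cleaned
```

In parse_page, the loop would then iterate over clean_lines(response.css('div#content::text').extract()) instead of the raw list.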

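After saving the spider, run scrapy crawl xiaoshuo. All chapters of the novel are then written to a directory named after the novel, under the directory the command was run from.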
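
Because each chapter title is used directly as a file name, a title containing a character such as / or ? can make the open() call fail on some file systems. The following is a minimal sketch of one way to guard against that, not something the original spider does. The helper name safe_filename and the fallback value are assumptions made for illustration.

```python
import re

# Characters that Windows rejects in file names, plus "/" (the path separator
# on Unix-like systems) and line-break or tab characters.
_UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\r\n\t]')


def safe_filename(title, fallback='untitled'):
    """Return a copy of `title` that is safer to use as a file name.

    Hypothetical helper: unsafe characters become underscores, and
    leading or trailing spaces and dots are removed.
    """
    cleaned = _UNSAFE_CHARS.sub('_', title).strip(' .')
    return cleaned or fallback
```

In parse_page, the file name could then be computed as safe_filename(filename) before it is joined with the novel directory.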
