Python Web Scraping with Scrapy

程序员文章站 2022-05-06 18:47:15

...

Scraping the data

import scrapy


class JulyeduSpider(scrapy.Spider):
    name = "julyedu"
    start_urls = [
        'https://www.julyedu.com/category/index',
    ]

    def parse(self, response):
        # Each course card on the category page is a <div class="course_info_box">
        for julyedu_class in response.xpath('//div[@class="course_info_box"]'):
            # Run each XPath once; .get() returns the first match or None
            title = julyedu_class.xpath('a/h4/text()').get()
            desc = julyedu_class.xpath('a/p[@class="course-info-tip"][1]/text()').get()
            course_time = julyedu_class.xpath('a/p[@class="course-info-tip"][2]/text()').get()
            img_src = julyedu_class.xpath('a/img[1]/@src').get()
            # Make the image URL absolute; skip urljoin when src is missing,
            # otherwise it would silently return the page URL itself
            img_url = response.urljoin(img_src) if img_src else None

            # Print the fields for a quick visual check, one course per block
            print(title)
            print(desc)
            print(course_time)
            print(img_url)
            print("\n")

            # Yielding a dict hands it to Scrapy as a scraped item
            yield {
                'title': title,
                'desc': desc,
                'time': course_time,
                'img_url': img_url,
            }
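In the spider, `response.urljoin()` turns the `src` attribute of each course image into an absolute URL. Scrapy documents `Response.urljoin` as a thin wrapper around the standard library's `urllib.parse.urljoin(response.url, url)`; for HTML responses a `<base>` tag, if present, takes precedence over `response.url`. The sketch below reproduces the no-`<base>` case with the standard library only, so it can be tried without Scrapy or network access. The helper name `absolute_img_url` and the sample image paths are hypothetical, not taken from julyedu.com.

```python
from urllib.parse import urljoin

# The page the spider starts from (its start_urls entry).
PAGE_URL = "https://www.julyedu.com/category/index"


def absolute_img_url(src, page_url=PAGE_URL):
    """Hypothetical helper: resolve an <img src> value against the page URL,
    mirroring response.urljoin() when the page has no <base> tag.

    Returns None for a missing or empty src, because urljoin(page_url, "")
    (or None) would just return page_url, which is not an image URL.
    """
    if not src:
        return None
    return urljoin(page_url, src)


if __name__ == "__main__":
    # Made-up src values covering the usual cases
    samples = [
        "/static/img/course.png",         # root-relative path
        "img/course.png",                 # relative to the page's directory
        "//cdn.example.com/course.png",   # protocol-relative URL
        "https://cdn.example.com/c.png",  # already absolute
        None,                             # attribute missing
    ]
    for src in samples:
        print(repr(src), "->", absolute_img_url(src))
```

A relative path such as `img/course.png` resolves against the directory of the page (`/category/`), not the site root, which is why scraped `src` values should always go through `urljoin` rather than being concatenated with the domain by hand.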
