Multithreaded crawler for fast, efficient image scraping

程序员文章站 (Programmer Article Station), 2022-10-08 20:16:03


6.23 Self-summary


Building on the earlier scraping code, we wrap the logic in functions and add multithreading.

The earlier code

Module to import: from concurrent import futures

ex = futures.ThreadPoolExecutor(max_workers=22)  # set the number of worker threads

ex.submit(function, arguments_the_function_needs)
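As a minimal sketch of how `submit` behaves, the example below runs a stand-in task on a small pool. The `square` function and the worker count of 4 are assumptions chosen for illustration and are not part of the original script; a real crawler would pass a download function and a URL instead.

```python
from concurrent import futures


def square(n):
    """Stand-in task; a real crawler would download a URL here."""
    return n * n


# One pool is created and reused for every task.
with futures.ThreadPoolExecutor(max_workers=4) as ex:
    # submit(fn, *args) schedules fn(*args) and returns a Future immediately.
    pending = [ex.submit(square, n) for n in range(5)]
    # result() blocks until that particular task has finished.
    results = [f.result() for f in pending]

print(results)  # [0, 1, 4, 9, 16]
```

Collecting the Future objects keeps the results in submission order, even though the tasks themselves may finish in any order.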

import os
import requests
from lxml.html import etree
from concurrent import futures  # thread pool for multithreading

url = 'http://www.doutula.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
def img_url_lis(url):
    response = requests.get(url,headers = headers)
    response.encoding = 'utf8'
    response_html = etree.HTML(response.text)
    img_url_lis = response_html.xpath('.//img/@data-original')
    return img_url_lis


# Create the image folder
img_file_path = os.path.join(os.path.dirname(__file__),'img')
if not os.path.exists(img_file_path):  # create the folder if it does not exist yet
    os.mkdir(img_file_path)
print(img_file_path)

def dump_one_img(url):
    name = str(url).split('/')[-1]
    response = requests.get(url, headers=headers)
    img_path = os.path.join(img_file_path, name)
    with open(img_path, 'wb') as fw:
        fw.write(response.content)


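def dump_imgs(urls: list):
    # Create one pool per page and reuse it for every URL; leaving the
    # with-block waits until all downloads on this page have finished.
    with futures.ThreadPoolExecutor(max_workers=22) as ex:
        for url in urls:
            ex.submit(dump_one_img, url)  # function, then its argument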


def run():
    count = 1
    while True:
        if count == 10:
            count += 1
            continue
        lis = img_url_lis(f'http://www.doutula.com/article/list/?page={count}')
        if len(lis) == 0:
            print(count)
            break
        dump_imgs(lis)
        print(f'Page {count} done')
        count +=1

if __name__ == '__main__':
    run()
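One thing to watch for: the script calls `submit` without ever looking at the returned Future, so an exception raised inside a worker (a timeout, a bad URL) is silently dropped. The sketch below shows one possible way to surface such failures with `futures.as_completed`. The `risky_task` function is a hypothetical stand-in for a download that sometimes fails, not part of the original code.

```python
from concurrent import futures


def risky_task(n):
    """Hypothetical stand-in for a download that sometimes fails."""
    if n % 3 == 0:
        raise ValueError(f"task {n} failed")
    return n


succeeded, failed = [], []
with futures.ThreadPoolExecutor(max_workers=4) as ex:
    # Map each Future back to its input so failures can be reported.
    future_to_arg = {ex.submit(risky_task, n): n for n in range(1, 7)}
    for future in futures.as_completed(future_to_arg):
        arg = future_to_arg[future]
        try:
            # result() re-raises any exception raised inside the worker.
            succeeded.append(future.result())
        except ValueError:
            failed.append(arg)

print(sorted(succeeded), sorted(failed))  # [1, 2, 4, 5] [3, 6]
```

In a crawler, the `except` branch would be the place to log the failing URL or queue it for a retry.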

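With a thread pool, many items can be downloaded at the same time, so scraping a large number of images is much faster.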
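To get a rough feel for the speedup without hitting a real site, the sketch below replaces the network request with `time.sleep`. The `fake_download` function, the 0.1 second delay, and the file names are all assumptions made for illustration; actual gains depend on the network and the server.

```python
import time
from concurrent import futures


def fake_download(url):
    """Simulate a network-bound download with a fixed delay."""
    time.sleep(0.1)
    return url


urls = [f"img_{i}.jpg" for i in range(10)]

# Sequential: each download waits for the previous one to finish.
start = time.perf_counter()
sequential = [fake_download(u) for u in urls]
sequential_time = time.perf_counter() - start

# Threaded: the waits overlap, so total time is close to a single delay.
start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=10) as ex:
    threaded = list(ex.map(fake_download, urls))  # map keeps input order
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Threads help here because the work is I/O-bound: while one thread waits on the network, the others can proceed. CPU-bound work would not see the same benefit under CPython's global interpreter lock.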
