Python thread pool use case: web crawler

程序员文章站 2022-03-05 16:54:24

...

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

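def task(url):
    """Download one listing page and print each item's title and link."""
    print(url)
    r1 = requests.get(
        url=url,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'
        },
        timeout=10,  # keep a stalled connection from tying up a worker thread
    )

    # Inspect the downloaded text
    soup = BeautifulSoup(r1.text, 'html.parser')
    print(soup.text)
    content_list = soup.find('div', attrs={'id': 'content-list'})
    if content_list is None:
        # Container missing (page layout changed or the request was blocked)
        print('content-list not found:', url)
        return
    for item in content_list.find_all('div', attrs={'class': 'item'}):
        link = item.find('a')
        if link is None:
            continue  # skip items that have no link
        title = link.text.strip()
        target_url = link.get('href')
        print(title, target_url)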


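def run():
    # Five worker threads; leaving the with block waits for all submitted tasks
    with ThreadPoolExecutor(max_workers=5) as pool:
        for i in range(1, 50):  # pages 1 through 49
            pool.submit(task, 'https://dig.chouti.com/all/hot/recent/%s' % i)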


if __name__ == '__main__':
    run()
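
The script above submits each page and never looks at the returned Future, so an exception raised inside a worker thread goes unnoticed. One possible way to surface such failures is to keep the futures and read them with as_completed. The sketch below is only an illustration: fetch_page and run_all are hypothetical names, and fetch_page is a stand-in that does no network access and fails on purpose for one page so the error path is visible.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_page(page):
    """Hypothetical stand-in for task(): no network access, fails on page 3."""
    if page == 3:
        raise ValueError('simulated failure on page %s' % page)
    return 'page %s ok' % page


def run_all(pages, max_workers=5):
    """Submit one job per page; collect results and errors keyed by page."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each Future back to the page it was submitted for.
        future_to_page = {pool.submit(fetch_page, p): p for p in pages}
        for future in as_completed(future_to_page):
            page = future_to_page[future]
            try:
                # result() re-raises any exception from the worker thread.
                results[page] = future.result()
            except Exception as exc:
                errors[page] = exc
    return results, errors


if __name__ == '__main__':
    results, errors = run_all(range(1, 6))
    print('succeeded:', sorted(results))
    print('failed:', sorted(errors))
```

In a real crawler, fetch_page would be replaced by a function like task that returns the parsed titles instead of printing them, so the main thread can decide whether to retry the failed pages.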
