Python: Automatically Scraping and Saving Images (Example Code)

程序员文章站 2022-03-09 22:51:09


I. Preparation

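This example uses Python to scrape images from Bing Images (cn.bing.com) and save them locally, using "情绪图片" (emotion images) as the search query. Searching for it gives the results shown below.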

[Figure: image search results for "情绪图片"]

Press F12 to open the browser's developer tools and inspect the page source:

[Figure: the page source opened with F12]

Here we can see that the information we need for each image, its URL, sits in the src attribute of the img tag.
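To turn what F12 shows into code, here is a minimal sketch that collects the URL of every <img> tag on a page. It deliberately uses the standard library's html.parser instead of BeautifulSoup (which the article uses below), and the data-src fallback for lazily loaded thumbnails is an assumption about the page, not something the article shows.

```python
from html.parser import HTMLParser
from typing import List


class ImgSrcCollector(HTMLParser):
    """Collect the image URL of every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # attrs is a list of (name, value) pairs; html.parser already turns &amp; into &
        attributes = dict(attrs)
        # Assumption: lazily loaded thumbnails may keep the URL in data-src instead of src
        url = attributes.get("src") or attributes.get("data-src")
        if url:
            self.srcs.append(url)


def img_sources(html: str) -> List[str]:
    """Return the image URLs found in an HTML string, in document order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.srcs
```

For example, img_sources('<img src="https://example.com/a.jpg">') returns ['https://example.com/a.jpg'].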

II. Implementation

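The scraper uses the following modules (re, time and os come from the standard library; requests and bs4, i.e. BeautifulSoup, are third-party packages):

import re
import time
import requests
from bs4 import BeautifulSoup
import os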

The task breaks down into three parts:

1. Fetch the page content

2. Parse the page

3. Save the images to the target folder
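The three parts above can be sketched as three small functions. This is a minimal sketch, not the article's code: the names get_page, parse_image_urls and save_images are hypothetical, the fetch functions are passed in so the pipeline can run without network access (in real use they would wrap requests.get), and the regex assumes the URL sits in a plain src attribute.

```python
import re
from html import unescape
from pathlib import Path
from typing import Callable, List

# First whitespace-preceded src="http..." inside a single <img ...> tag
IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc="(https?://[^"]+)"')


def get_page(url: str, fetch_text: Callable[[str], str]) -> str:
    """Part 1: fetch the page HTML; fetch_text would wrap requests.get in real use."""
    return fetch_text(url)


def parse_image_urls(html: str) -> List[str]:
    """Part 2: extract absolute image URLs and turn &amp; back into &."""
    return [unescape(u) for u in IMG_SRC.findall(html)]


def save_images(urls: List[str], fetch_bytes: Callable[[str], bytes], folder: Path) -> int:
    """Part 3: download each URL and save it as path1.jpg, path2.jpg, ... in folder."""
    folder.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls, start=1):
        (folder / f"path{i}.jpg").write_bytes(fetch_bytes(url))
    return len(urls)
```

With requests, part 1 might look like get_page(baseurl, lambda u: requests.get(u, headers=head, timeout=10).text). Keeping the network calls at the edges makes parse_image_urls easy to test on saved HTML.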

First, part 1: fetching the page content

import requests

baseurl = 'https://cn.bing.com/images/search?q=%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87&qpvt=%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87&form=igre&first=1&cw=418&ch=652&tsc=imagebasichover'
head = {  # send the User-Agent of a normal browser (Edge 92 here)
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.67"}
response = requests.get(baseurl, headers=head, timeout=10)  # fetch the page, giving up after 10 s
html = response.text  # the page content as text

Pretty easy, right?
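The long q=%e6%83%85... value in baseurl is simply 情绪图片 percent-encoded as UTF-8 (upper- and lowercase hex digits are equivalent). The sketch below, with a hypothetical helper build_search_url, builds a comparable URL for any query using only the standard library. It keeps just q and first; the other parameters (qpvt, form, cw, ch, tsc) were copied from the browser, and whether Bing needs them is not clear from the article.

```python
from urllib.parse import unquote, urlencode


def build_search_url(query: str, first: int = 1) -> str:
    """Hypothetical helper: Bing image search URL with only the q and first parameters."""
    return "https://cn.bing.com/images/search?" + urlencode({"q": query, "first": first})


# The q value copied into baseurl decodes back to the search term
print(unquote("%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87"))  # 情绪图片
print(build_search_url("情绪图片"))
# https://cn.bing.com/images/search?q=%E6%83%85%E7%BB%AA%E5%9B%BE%E7%89%87&first=1
```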

Part 2, parsing the page, is where most of the work is.

Here is the code:

import re
from bs4 import BeautifulSoup

img = re.compile(r'img.*src="(.*?)"')  # regex that captures the value of a ...src="..." attribute
soup = BeautifulSoup(html, "html.parser")  # parse the HTML fetched in part 1
data = []  # list of image URLs
for item in soup.find_all('img', src=""):  # <img> tags whose src attribute is missing or empty
    item = str(item)  # serialize the tag back to an HTML string
    picture = re.findall(img, item)  # combine re with BeautifulSoup: keep only the URL
    for b in picture:
        data.append(b)

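This part relies on BeautifulSoup and regular expressions, so some basic familiarity with both helps. Two details are worth knowing: passing src="" to find_all matches only <img> tags whose src attribute is missing or empty, and str(item) re-serializes the tag, so any & in a URL comes back as &amp;.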
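A made-up tag shows how the article's regex behaves on serialized HTML. This sketch runs the article's pattern and a stricter, hypothetical alternative (SRC_ATTR) on the same string: the greedy .* makes the article's pattern capture the last ...src="..." attribute (here data-src), and the captured URL still contains &amp;. The tag and URLs are invented for illustration.

```python
import re
from html import unescape

# The article's pattern: the greedy .* jumps to the LAST 'src="' in the string
ARTICLE_PATTERN = re.compile(r'img.*src="(.*?)"')
# Hypothetical stricter pattern: first whitespace-preceded src attribute inside one <img> tag
SRC_ATTR = re.compile(r'<img\b[^>]*?\ssrc="([^"]*)"')

tag = ('<img class="mimg" src="https://example.com/a.jpg?w=115&amp;h=183" '
       'data-src="https://example.com/b.jpg?w=115&amp;h=183"/>')

greedy = ARTICLE_PATTERN.findall(tag)                   # picks data-src, still HTML-escaped
strict = [unescape(u) for u in SRC_ATTR.findall(tag)]   # picks src and turns &amp; back into &

print(greedy)  # ['https://example.com/b.jpg?w=115&amp;h=183']
print(strict)  # ['https://example.com/a.jpg?w=115&h=183']
```

Reading attributes directly (for example item.get("src") in BeautifulSoup) avoids both issues, because attribute values come back already unescaped.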

Next, part 3: saving the images

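import os
import time
import requests

i = 0  # counter used to number the saved files
for m in getdata(  # getdata() is defined in the complete code below
        baseurl='https://cn.bing.com/images/search?q=%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87&qpvt=%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87&form=igre&first=1&cw=418&ch=652&tsc=imagebasichover'):
    resp = requests.get(m, timeout=10)  # download the image
    byte = resp.content  # raw image bytes
    print(os.getcwd())  # current directory, where files are saved (the full code switches to d://情绪图片测试)
    i = i + 1  # next file number
    with open("path{}.jpg".format(i), "wb") as f:  # write the file in binary mode
        f.write(byte)
    time.sleep(0.5)  # pause 0.5 s between downloads
    print("Image {} downloaded successfully!".format(i))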

The comments explain what each line does. If anything is unclear, feel free to message me or leave a comment~
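The saving loop in part 3 always writes .jpg files. One possible refinement is to pick the extension from the response's Content-Type header. The helper save_image and its small EXT_BY_TYPE table below are hypothetical, and unknown or missing types fall back to .jpg as in the original.

```python
from pathlib import Path

# Hypothetical mapping from MIME type to file extension; extend as needed
EXT_BY_TYPE = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def save_image(content: bytes, index: int, folder: Path, content_type: str = "") -> Path:
    """Write one downloaded image as folder/path<index><ext> and return its path."""
    mime = content_type.split(";")[0].strip().lower()  # drop parameters such as "; charset=..."
    ext = EXT_BY_TYPE.get(mime, ".jpg")  # default to .jpg, like the original script
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"path{index}{ext}"
    path.write_bytes(content)
    return path
```

Inside the download loop this would be called as save_image(resp.content, i, Path("d://情绪图片测试"), resp.headers.get("Content-Type", "")).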

Here is the complete code:

import os
import re
import time
from html import unescape

import requests
from bs4 import BeautifulSoup


def getdata(baseurl):
    img = re.compile(r'img.*src="(.*?)"')  # regex that captures the value of a ...src="..." attribute
    datalist = []  # image URLs found on the page
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.67"}
    response = requests.get(baseurl, headers=head, timeout=10)  # fetch the page
    html = response.text  # the page content as text
    soup = BeautifulSoup(html, "html.parser")  # parse the HTML
    for item in soup.find_all('img', src=""):  # <img> tags whose src attribute is missing or empty
        item = str(item)  # serialize the tag back to an HTML string
        for b in re.findall(img, item):  # combine re with BeautifulSoup: keep only the URL
            b = unescape(b)  # str(item) escapes & as &amp;, so undo it
            if b.startswith("http"):  # skip relative paths and data: URIs
                datalist.append(b)
    return datalist  # list of image URLs


def main():
    baseurl = 'https://cn.bing.com/images/search?q=%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87&qpvt=%e6%83%85%e7%bb%aa%e5%9b%be%e7%89%87&form=igre&first=1&cw=418&ch=652&tsc=imagebasichover'
    os.makedirs("d://情绪图片测试", exist_ok=True)  # create the target folder if it does not exist
    os.chdir("d://情绪图片测试")  # images are saved into this folder
    print(os.getcwd())  # show where the files will be written
    i = 0  # file name counter
    for m in getdata(baseurl):
        resp = requests.get(m, timeout=10)  # download one image
        byte = resp.content  # raw image bytes
        i = i + 1
        with open("path{}.jpg".format(i), "wb") as f:  # write in binary mode
            f.write(byte)
        print("Image {} downloaded successfully!".format(i))
        time.sleep(0.5)  # wait 0.5 s between downloads


if __name__ == '__main__':
    main()

Here is a screenshot of the final run:

[Figure: screenshot of the program run]

III. Summary

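That's all for this example of automatically scraping and saving images with Python: requests fetches a Bing image search page, BeautifulSoup and a regular expression pull out the image URLs, and each image is written to a local folder.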
