Scraping Weather Data with Python

程序员文章站 2022-07-14 17:00:25

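Note: my winter-break assignment was to build a weather forecast app with a UI. Here is the final result first.
[Image: screenshot of the final result]
The project draws on Python web scraping, the xlwt and xlrd libraries, and PyQt5.
Here is how I put it together: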

Fetch the city weather data before building the UI

I. Scraping the weather data (online)

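Step 1: Find a suitable URL.
Step 2: Use Python's urllib.request module to fetch the weather data for the chosen city.
Step 3: Print the weather data.
With the plan in place, here is the code: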

import gzip
import json
import urllib.parse
import urllib.request


def get_weather_data():
    city_name = input('Enter the city name to look up: ')
    # Percent-encode the city name and append it as the query value
    url1 = 'http://wthrcdn.etouch.cn/weather_mini?city=' + urllib.parse.quote(city_name)
    weather_data = urllib.request.urlopen(url1).read()  # fetch the raw response bytes
    weather_data = gzip.decompress(weather_data).decode('utf-8')  # decompress the gzip body, then decode it as UTF-8
    weather_dict = json.loads(weather_data)
    return weather_dict


def show_weather(weather_data):
    weather_dict = weather_data
    # The spelling 'invilad-citykey' is kept from the original code; it has to match the string the service returns
    if weather_dict.get('desc') == 'invilad-citykey':
        print('The city name is wrong, or the weather service does not cover your city')
    elif weather_dict.get('desc') == 'OK':
        forecast = weather_dict.get('data').get('forecast')
        print('City:', weather_dict.get('data').get('city'))
        print('Temperature:', weather_dict.get('data').get('wendu') + '℃ ')
        print('Cold risk:', weather_dict.get('data').get('ganmao'))
        # forecast[0] is today's entry
        print('Wind direction:', forecast[0].get('fengxiang'))
        print('High:', forecast[0].get('high'))
        print('Low:', forecast[0].get('low'))
        print('Weather:', forecast[0].get('type'))
        print('Date:', forecast[0].get('date'))


show_weather(get_weather_data())

Result (the service returns its values in Chinese):
Enter the city name to look up: 深圳
City: 深圳
Temperature: 22℃
Cold risk: 各项气象条件适宜，无明显降温过程，发生感冒机率较低。
Wind direction: 无持续风向
High: 高温 23℃
Low: 低温 19℃
Weather: 多云
Date: 3日星期二
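The query URL above is built by percent-encoding the Chinese city name. As a small offline illustration of that step, here is a sketch using urllib.parse.urlencode instead of string concatenation. The helper name build_weather_url and the constant BASE_URL are my own, not part of the original code.

```python
from urllib.parse import urlencode

BASE_URL = 'http://wthrcdn.etouch.cn/weather_mini'  # endpoint used in the article


def build_weather_url(city_name):
    """Return the query URL for a city (hypothetical helper)."""
    # urlencode percent-encodes the value as UTF-8 and produces "city=<encoded>"
    return BASE_URL + '?' + urlencode({'city': city_name})


print(build_weather_url('深圳'))
```

Note that urlencode encodes a space as "+", whereas urllib.parse.quote would produce "%20". For Chinese city names without spaces the two give the same result.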
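This code uses urllib.request, gzip and json, which all ship with Python's standard library, so nothing extra needs to be installed. If you are not sure how to install libraries, see my other blog post: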
https://editor.csdn.net/md/?articleId=104077502
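The decoding step (gzip, then UTF-8, then JSON) can be tried without a network connection. The sketch below assumes the response may or may not be gzip-compressed and checks the gzip magic bytes before decompressing. The function name decode_weather_payload and the sample payload are made up for illustration; only the field names come from the code above.

```python
import gzip
import json


def decode_weather_payload(raw):
    """Turn raw response bytes into a dict, decompressing only if needed."""
    # A gzip stream starts with the two magic bytes 0x1f 0x8b
    if raw[:2] == b'\x1f\x8b':
        raw = gzip.decompress(raw)
    return json.loads(raw.decode('utf-8'))


# Hypothetical sample, shaped like the fields the article reads
sample = {'desc': 'OK', 'data': {'city': '深圳', 'wendu': '22'}}
compressed = gzip.compress(json.dumps(sample).encode('utf-8'))

print(decode_weather_payload(compressed)['data']['city'])
```

Checking the magic bytes means the same function still works if the service ever sends an uncompressed body.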

II. Scraping the weather icon (online)

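Step 1: Find a suitable URL.
Step 2: Use Python's requests library to fetch today's weather icon for the chosen city.
Step 3: Save the icon locally.
The approach is much the same as for the text data, but this time with the requests library.
Here is the code first: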

# coding=gbk
import os
import time as t

import requests
from bs4 import BeautifulSoup
from xpinyin import Pinyin  # converts Chinese characters to pinyin


# Fetch a page and return its HTML text, or None on failure
def getUrl(url):
    try:
        headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}  # User-Agent sent to the weather site
        read = requests.get(url, headers=headers)  # request the page
        read.raise_for_status()  # raise an exception on a 4xx/5xx status
        read.encoding = read.apparent_encoding  # guess the encoding from the content
        return read.text  # the page content as a string
    except requests.RequestException:
        print("Connection failed!")
        return None


# Find the icon's URL, then download and save it
def getPic(html, pinyin):
    today_time = t.localtime(t.time())
    year = today_time.tm_year
    month = today_time.tm_mon
    day = today_time.tm_mday
    soup = BeautifulSoup(html, "html.parser")
    # Today's entry is an <a> whose href is ./year-month-day.html; the icon sits in its <div class="mt">
    href = './' + str(year) + '-' + str(month) + '-' + str(day) + '.html'
    all_img = soup.find('a', href=href).find('div', class_='mt').find_all('img')
    for img in all_img[:1]:  # only the first icon is needed
        src = img['src']  # the src attribute of the img tag
        img_url = src
        print(img_url)
        root = "F:/Pic/" + str(month) + '-' + str(day) + "/"  # folder to save into
        path = root + pinyin + ".png"  # file name for the saved icon
        print(path)
        try:
            os.makedirs(root, exist_ok=True)  # create the folder (and its parents) if missing
            if not os.path.exists(path):  # download only if the file is not there yet
                read = requests.get(img_url)
                with open(path, "wb") as f:
                    f.write(read.content)
                print("File saved successfully!")
            else:
                print("File already exists!")
        except (requests.RequestException, OSError):
            print("Failed to download the file!")


# Entry point
if __name__ == "__main__":
    city_name = input("Enter the city name: ")
    p = Pinyin()
    city_pinyin = p.get_pinyin(city_name, "")  # convert the city name to pinyin, used as the URL suffix
    html = getUrl("https://tianqi.911cha.com/" + city_pinyin)
    if html is not None:
        getPic(html, city_pinyin)

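On the city's weather page, inspect today's weather icon in the browser.
The img tag's parent element is a div with class="mt", so the code finds that parent first and then the img inside it.
That div's own parent is an a tag whose href is './year-month-day.html'. Python's time library supplies today's year, month and day, so the lookup lands exactly on today's icon.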

all_img = soup.find('a', href='./' + str(year) + '-' + str(month) + '-' + str(day) + '.html').find('div', class_='mt').find_all('img')  # img is the image tag
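To make the date part of that lookup easier to check on its own, here is a minimal sketch that builds the href string. It assumes, as the code above implies, that the site does not zero-pad the month and day. The helper name today_href is hypothetical.

```python
import datetime


def today_href(day=None):
    """Build the './YYYY-M-D.html' href for a date (defaults to today)."""
    day = day or datetime.date.today()
    # Month and day are deliberately not zero-padded (assumption taken from the article's code)
    return './{}-{}-{}.html'.format(day.year, day.month, day.day)


print(today_href(datetime.date(2020, 3, 3)))
```

Passing the date in as a parameter makes the function testable with a fixed date instead of depending on the system clock.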

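When a URL is requested with requests and no extra headers, the User-Agent the server sees identifies the requests library rather than a browser, and the request carries none of the browser, operating system or hardware details of a normal visit. Requests like that usually come from automated access such as crawlers.
To block such access, some sites check the User-Agent in the request (it describes the hardware platform, system software, application software and user preferences). If the User-Agent looks abnormal or is missing, the request is rejected.
So it is worth adding a browser User-Agent to the request headers: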

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}  # User-Agent sent to the weather site
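To see what attaching the header amounts to without sending anything, here is a sketch that uses the standard library's urllib.request.Request in place of requests, so it runs with no third-party packages. The helper name build_request and the example page URL are assumptions for illustration.

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
)


def build_request(url):
    """Create (but do not send) a request that carries a browser User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


req = build_request('https://tianqi.911cha.com/shenzhen')  # example URL only
# urllib stores header names capitalized, hence 'User-agent'
print(req.get_header('User-agent'))
```

With requests, the same dictionary is simply passed as the headers argument of requests.get, as in the code above.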

The xpinyin library converts the city name to pinyin:

city_name = input("Enter the city name: ")
p = Pinyin()
city_pinyin = p.get_pinyin(city_name, "")  # convert the city name to pinyin, used as the URL suffix
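Since the pinyin string ends up as the last part of the page URL, it can help to check it before making the request. The sketch below is only an assumption about what a valid suffix looks like (plain ASCII letters). The helper name city_page_url is hypothetical.

```python
from urllib.parse import urljoin

SITE_ROOT = 'https://tianqi.911cha.com/'  # site used in the article


def city_page_url(city_pinyin):
    """Join the site root and a pinyin slug, rejecting anything that is not plain letters."""
    slug = city_pinyin.strip().lower()
    if not (slug.isascii() and slug.isalpha()):
        raise ValueError('expected a plain pinyin slug, got %r' % city_pinyin)
    return urljoin(SITE_ROOT, slug)


print(city_page_url('shenzhen'))
```

If the conversion leaves Chinese characters or an empty string behind, this raises a ValueError up front instead of fetching a page that does not exist.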

Result:
Enter the city name: 深圳
https://ii.911cha.com/tianqi/mico/9.png
F:/Pic/3-3/9.png
File saved successfully!
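The save step (create the folder, write the file only if it is not there yet) can be tried without downloading anything. This is a minimal sketch under that assumption, writing placeholder bytes into a temporary directory. The function name save_icon and its return value are my own choices.

```python
import os
import tempfile


def save_icon(content, root, name):
    """Write content to root/name.png unless it already exists.

    Returns (path, saved), where saved is False if the file was already there.
    """
    os.makedirs(root, exist_ok=True)  # also creates missing parent folders
    path = os.path.join(root, name + '.png')
    if os.path.exists(path):
        return path, False
    with open(path, 'wb') as f:
        f.write(content)
    return path, True


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'Pic', '3-3')
    first = save_icon(b'placeholder bytes', root, 'shenzhen')
    second = save_icon(b'other bytes', root, 'shenzhen')
    print(first[1], second[1])
```

In the real script, content would be the bytes from requests.get(img_url).content and root would be the dated folder under F:/Pic/.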

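With a network connection, both the weather icon and the text data are now scraped successfully.
For getting the weather icon and text data when offline, see this blog post: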
https://blog.csdn.net/Henry41132220011/article/details/104632615
