
Scraping 2020 COVID-19 Epidemic Data with Python and Visualizing It in Tableau

程序员文章站 2024-03-20 21:49:04

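The novel coronavirus is currently spreading across China, and the whole country has mobilized in a coordinated effort to contain the outbreak. Up-to-date figures on the epidemic are available from many sources, and there are plenty of examples online of programs that scrape this data and visualize it. Today I will try my hand at it too.
Goals:
1. Scrape the COVID-19 epidemic data published by Tencent News;
2. Visualize and analyze it with Tableau.
Without further ado, here are the code and the resulting charts.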

import requests
import json
import time
import csv
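# Tencent News endpoint; the trailing timestamp acts as a cache-buster
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&callback=&_=%d' % int(time.time())
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36', 'Referer': 'https://news.qq.com/zt2020/page/feiyan.htm'}
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()  # stop early on an HTTP error
# The 'data' field is itself a JSON string, so it has to be decoded a second time
j = json.loads(r.json()['data'])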

# Fetch the main daily figures for China
def getdata_chinadaily():
    d = {}  # dict keyed by date; the data set is small, so no database is used
    # Store China's cumulative daily figures in the dict
    for i in j['chinaDayList']:
        # print(i)  # uncomment to inspect the raw record
        d[i['date']] = {  # one sub-dict per date
            'acc_confirm': i['confirm'],
            'acc_dead': i['dead'],
            'acc_heal': i['heal'],
            'now_confirm': i['nowConfirm'],
            'dead_rate': i['deadRate'],
        }

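    # Add China's daily new confirmed cases to the dict
    for i in j['dailyNewAddHistory']:
        if i['date'] in d:  # ignore dates that chinaDayList does not cover
            d[i['date']]['dailyadd_confirm'] = i['country']
    # Write the CSV file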
    with open('d:/cov_china_report2020.csv','w',newline='') as f:
        writer=csv.writer(f)
        column=['date','acc_confirm','acc_dead','acc_heal','now_confirm','dead_rate','dailyadd_confirm']
        writer.writerow(column)
        for i in d:
            # The first few days have no daily-new figure, so fall back to 'NA'
            row = ['2020-' + i.replace('.', '-'), d[i]['acc_confirm'], d[i]['acc_dead'], d[i]['acc_heal'],
                   d[i]['now_confirm'], d[i]['dead_rate'], d[i].get('dailyadd_confirm', 'NA')]
            writer.writerow(row)

# Fetch case counts by country worldwide
def getdata_worlddistribution():
    with open('d:/coronavirus2020worlddistribution.csv','w',newline='') as f:
        writer=csv.writer(f)
        column=['country','confirm','suspect','dead','deadrate']
        writer.writerow(column)
        for i in j['areaTree']:
            # print(i['name'],i['total']['confirm'],i['total']['suspect'], i['total']['dead'],i['total']['deadRate'])
            row=[i['name'],i['total']['confirm'],i['total']['suspect'], i['total']['dead'],i['total']['deadRate']]
            writer.writerow(row)

# Fetch case counts for each Chinese province
def getdata_chinadistrubution():
    with open('d:/coronavirus2020chinadistribution.csv','w',newline='') as f:
        writer=csv.writer(f)
        column=['province','confirm','suspect','dead','deadrate']
        writer.writerow(column)
        for i in j['areaTree'][0]['children']:
            # print(i['name'],i['total']['confirm'],i['total']['suspect'], i['total']['dead'],i['total']['deadRate'])
            row=[i['name'],i['total']['confirm'],i['total']['suspect'], i['total']['dead'],i['total']['deadRate']]
            writer.writerow(row)

if __name__=='__main__':
    getdata_chinadaily()
    getdata_chinadistrubution()
    getdata_worlddistribution()
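
The 'NA' fallback for days without a daily-new figure can also be expressed with csv.DictWriter, whose restval parameter fills in any column that a row lacks. What follows is only a sketch under stated assumptions: the helper name rows_to_csv and the sample records are made up for illustration (the numbers are not real figures), and dates are assumed to arrive as 'MM.DD' strings, as the conversion in the script implies.

```python
import csv
import io

COLUMNS = ['date', 'acc_confirm', 'acc_dead', 'acc_heal',
           'now_confirm', 'dead_rate', 'dailyadd_confirm']


def rows_to_csv(records, out):
    """Write a {date: {column: value}} dict to the file-like object `out`.

    Hypothetical helper: any column missing from a record is written as 'NA'.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval='NA')
    writer.writeheader()
    for date, values in records.items():
        # 'MM.DD' -> '2020-MM-DD', the same conversion the script uses
        row = {'date': '2020-' + date.replace('.', '-')}
        row.update(values)
        writer.writerow(row)


# Made-up sample records, not real figures
sample = {
    '01.13': {'acc_confirm': 1, 'acc_dead': 0, 'acc_heal': 0,
              'now_confirm': 1, 'dead_rate': '0.0'},
    '01.20': {'acc_confirm': 5, 'acc_dead': 1, 'acc_heal': 0,
              'now_confirm': 4, 'dead_rate': '20.0', 'dailyadd_confirm': 2},
}

buffer = io.StringIO()
rows_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, pass a handle opened with open(path, 'w', newline='') as the second argument.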

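Key technical points:
1. Use the browser's developer tools to find the URL that the page actually requests its data from.
2. The response is in JSON format. The JSON here nests dictionaries and lists several levels deep, so it has to be handled with care.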
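
Point 2 can be tried offline. The sketch below uses a tiny hand-made payload that mimics the shape the script relies on: an outer JSON object whose data field is itself a JSON string, and an areaTree list whose first entry holds the provinces under children. All values are invented, and the helper name province_totals is hypothetical.

```python
import json

# Hand-made payload mimicking the shape the script relies on; values are invented
inner = {
    'areaTree': [
        {'name': 'China',
         'total': {'confirm': 10, 'dead': 1},
         'children': [
             {'name': 'Hubei', 'total': {'confirm': 7, 'dead': 1}},
             {'name': 'Guangdong', 'total': {'confirm': 3, 'dead': 0}},
         ]},
    ],
}
raw_response = json.dumps({'data': json.dumps(inner)})


def province_totals(response_text):
    """Return {province name: confirmed count} for the first areaTree entry."""
    outer = json.loads(response_text)   # first decode: the outer envelope
    data = json.loads(outer['data'])    # second decode: 'data' is a JSON string
    tree = data.get('areaTree', [])
    # Guard each level so a missing key yields an empty result, not a crash
    provinces = tree[0].get('children', []) if tree else []
    result = {}
    for province in provinces:
        total = province.get('total', {})
        result[province['name']] = total.get('confirm', 0)
    return result


totals = province_totals(raw_response)
print(totals)
```

Using .get with a default at each level keeps the traversal from failing when a level is absent, at the cost of silently returning fewer rows, so it is worth printing a sample record first to confirm the structure.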

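For the visualization I chose Tableau. Python packages such as matplotlib can do the same job and offer more room for customization, but in terms of efficiency the commercial Tableau is remarkably smart and fast, and it lives up to its reputation as a leader in the BI field. The dashboard below was built in Tableau with nothing more than simple drag-and-drop operations.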

[Figure: dashboard built in Tableau from the scraped data]