Scraping Baidu Qianxi migration data and exporting it to Excel
程序员文章站
2022-03-14 21:21:28
Baidu Qianxi Crawler

1. Background

Someone on the school confession wall offered to pay for scraping Baidu Qianxi (百度迁徙) migration data, so I took the job.
Depending on the options passed in, the script scrapes each day's data and generates three Excel files.

2. Code excerpts
```python
import json
import time

import requests
import xlwings as xw

# Assumed globals, defined elsewhere in the full script:
#   code          -- dict mapping city names to Baidu region IDs, e.g. {'广州市': '440100'}
#   date_constant -- date string (YYYYMMDD) used as the output folder name


def JsonTextConvert(text):
    """Strip the JSONP wrapper 'jsonp_xxx(...)' and return the bare JSON string."""
    text = text.encode('utf-8').decode('unicode_escape')
    head, sep, tail = text.partition('(')
    tail = tail.replace(')', '')
    return tail


def UrlFormate(rankMethod, dt, name, migrationType, date):
    """Build the huiyan.baidu.com request URL for one city and date."""
    # Turn 'YYYYMMDD' into the Unix timestamp of that day's midnight,
    # which is embedded in the jsonp callback name.
    list_date = list(date)
    list_date.insert(4, '-')
    list_date.insert(7, '-')
    formatDate = ''.join(list_date) + ' 00:00:00'
    timeArray = time.strptime(formatDate, '%Y-%m-%d %H:%M:%S')
    timeUnix = time.mktime(timeArray)
    ID = code[name]
    if migrationType in ('in', 'out') or rankMethod == 'historycurve':
        url = ('http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}'
               '&type=move_{3}&date={4}&callback=jsonp_{5}000_0000000'
               ).format(rankMethod, dt, ID, migrationType, date, int(timeUnix))
    elif rankMethod == 'internalflowhistory':
        url = ('http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}'
               '&date={3}&callback=jsonp_{4}000_0000000'
               ).format(rankMethod, dt, ID, date, int(timeUnix))
    return url


def GetData(cityName, moveType, date, rankMethod):
    """Fetch one endpoint ('cityrank', 'historycurve' or 'internalflowhistory');
    return the 'list' payload, or 501 when the date has no data."""
    response = requests.get(UrlFormate(rankMethod, 'city', cityName, moveType, date),
                            timeout=10)
    rawData = json.loads(JsonTextConvert(response.text))
    if rawData['errno'] == 501:
        return 501
    return rawData['data']['list']


def write_Excel(data, data_time, move_type):
    """Dump a 2-D list into d:/qianxi/<date>/<date>_<type>.xlsx via xlwings."""
    name = 'd:/qianxi/' + date_constant + '/' + data_time + '_' + move_type + '.xlsx'
    app = xw.App(visible=True, add_book=False)
    wb = app.books.add()
    sht = wb.sheets['sheet1']
    sht.range('A1').options(expand='table').value = data
    print(sht.range('A1').value)
    wb.save(name)
    wb.close()   # close the workbook
    app.quit()   # quit Excel
```
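`write_Excel` drives a live Excel instance through xlwings, so it only runs where Excel itself is installed. If that is a problem, the same 2-D list could be written with openpyxl instead; this sketch is not part of the original script, and the helper name `write_excel_openpyxl` is made up for illustration:

```python
from openpyxl import Workbook

def write_excel_openpyxl(data, path):
    """Write a 2-D list into an .xlsx file without needing Excel installed."""
    wb = Workbook()
    ws = wb.active
    for row in data:
        ws.append(row)   # one list per worksheet row
    wb.save(path)

write_excel_openpyxl([['city', 'in'], ['广州市', 3.2]], 'demo.xlsx')
```

openpyxl writes the file directly, so there is no Excel window to close and no `app.quit()` bookkeeping.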
```python
def function_cityrank(rankMethod, moveType, date_time):
    """Build a city-by-city matrix of migration values and write it to Excel."""
    result = []
    header = ['']
    for i in code:
        header.append(i)
    result.append(header)
    for a in code:
        list_data = {}
        row = [a]
        tags = GetData(a, moveType, date_time, rankMethod)
        if tags == 501:
            return 501
        for tag in tags:
            list_data[tag['city_name']] = tag['value']
        for i in code:
            row.append(list_data.get(i, 0))   # 0 when a city is absent that day
        result.append(row)
    print(result)
    write_Excel(result, date_time, moveType)


def function_historycurve(date_time):
    """Collect the move-in, move-out and internal-flow indices for every city."""
    result = [['city_name', 'move_in', 'move_out', 'internal']]
    # Sample URL:
    # http://huiyan.baidu.com/migration/internalflowhistory.jsonp?dt=city&id=440100&date=20201114&callback=jsonp_1605340876623_8581344
    for a in code:
        row = [a]
        tags_in = GetData(a, 'in', date_time, 'historycurve')
        tags_out = GetData(a, 'out', date_time, 'historycurve')
        # internalflowhistory ignores the move type, so any placeholder works
        tips = GetData(a, 'internal', date_time, 'internalflowhistory')
        row.append(tags_in[date_time] if date_time in tags_in else 0)
        row.append(tags_out[date_time] if date_time in tags_out else 0)
        row.append(tips[date_time] if date_time in tips else 0)
        result.append(row)
    print(result)
    write_Excel(result, date_time, '规模')


def create_document(date_time):
    """Generate the three Excel files for one date; return 501 when no data exists."""
    tag = function_cityrank('cityrank', 'in', date_time)
    if tag == 501:
        print('No data found for this date')
        return 501
    print('in_excel generated')
    function_cityrank('cityrank', 'out', date_time)
    print('out_excel generated')
    function_historycurve(date_time)
    print('guimo_excel generated')
```
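The JSONP handling at the top of the listing can be exercised on its own. The sketch below uses a made-up response in the shape huiyan.baidu.com returns (the callback name and payload are illustrative):

```python
import json

def jsonp_to_json(text):
    # Mirrors JsonTextConvert: drop the 'jsonp_...' prefix up to the
    # first '(', then strip the closing parentheses.
    head, sep, tail = text.partition('(')
    return tail.replace(')', '')

sample = 'jsonp_1605340876623_8581344({"errno": 0, "data": {"list": []}})'
payload = json.loads(jsonp_to_json(sample))
print(payload['errno'])  # 0
```

Note that `replace(')', '')` would also strip parentheses appearing inside string values; the migration payload happens not to contain any, so the shortcut holds here.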
3. Results

Under D:\, a qianxi folder is created, and inside it a per-date folder containing the three requested Excel files:
xxxx_in.xlsx
xxxx_out.xlsx
规模.xlsx
4. Download link for the executable (.exe)
Original article: https://blog.csdn.net/qq_43183340/article/details/110220157