Scraping Baidu Migration Data and Exporting It to Excel

程序员文章站 (Programmer Article Site), 2022-03-14 21:21:28


Baidu Migration Crawler

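1. Background

Someone on the school's confession wall (表白墙) posted a paid request to scrape Baidu Migration (百度迁徙) data, so I took the job.
The program scrapes the data for a given day and, as requested, writes it to three Excel files.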

2. Partial Code


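import json
import time

import requests
import xlwings as xw

# Note: `code` (a dict keyed by city name whose values are used as the `id`
# URL parameter) and `date_constant` (presumably the name of the output date
# folder) are module-level globals that are not shown in this excerpt.

def JsonTextConvert(text):
    # Strip the JSONP wrapper "callback( ... )" and return the JSON text inside.
    # json.loads decodes \uXXXX escapes itself, so no unicode_escape pass is needed,
    # and only the outermost parentheses are removed so values containing ")" survive.
    head, _, tail = text.partition('(')
    body, sep, _ = tail.rpartition(')')
    return body if sep else tail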
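The JSONP unwrapping step can be checked in isolation, without touching the network. The snippet below is only a sketch: `strip_jsonp` is a hypothetical helper name, and the sample response is made up to resemble the shape the code above expects (`errno`, `data`, `list`), not copied from the real service.

```python
import json


def strip_jsonp(text):
    """Return the JSON payload inside a JSONP wrapper such as cb({...})."""
    start = text.find('(')
    end = text.rfind(')')
    if start == -1 or end == -1 or end < start:
        raise ValueError('not a JSONP response')
    return text[start + 1:end]


# Made-up sample: the city name is \u-escaped and contains parentheses on purpose.
sample = ('jsonp_1605340876000_0000000({"errno": 0, "data": {"list": '
          '[{"city_name": "\\u5e7f\\u5dde(test)", "value": 1.5}]}})')
payload = json.loads(strip_jsonp(sample))
first = payload['data']['list'][0]
print(first['city_name'], first['value'])
```

Slicing between the first "(" and the last ")" keeps any parentheses inside the payload, and `json.loads` takes care of the `\uXXXX` escapes.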

def UrlFormate(rankMethod, dt, name, migrationType, date):
    # `date` is a 'YYYYMMDD' string; convert local midnight of that day to a Unix timestamp.
    timeUnix = int(time.mktime(time.strptime(date, "%Y%m%d")))
    ID = code[name]
    if migrationType in ('in', 'out') or rankMethod == 'historycurve':
        # cityrank / historycurve requests carry the direction: move_in or move_out.
        url = 'http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}&type=move_{3}&date={4}&callback=jsonp_{5}000_0000000'.format(rankMethod, dt, ID, migrationType, date, timeUnix)
    elif rankMethod == 'internalflowhistory':
        # Intra-city flow has no direction, so there is no `type` parameter.
        url = 'http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}&date={3}&callback=jsonp_{4}000_0000000'.format(rankMethod, dt, ID, date, timeUnix)
    else:
        # Previously this case fell through and `url` was never assigned.
        raise ValueError('unsupported rankMethod/migrationType: {0}/{1}'.format(rankMethod, migrationType))
    return url
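As an alternative to formatting the query string by hand, the same URL shape can be assembled with the standard library. This is a sketch under assumptions: `build_url` and `BASE` are hypothetical names, the city ID 440100 is simply the one that appears in the example URL in this article's code comments, and nothing here confirms which parameters the service actually requires.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE = 'http://huiyan.baidu.com/migration/{0}.jsonp'


def build_url(rank_method, dt, city_id, date, move_type=None):
    """Build a request URL in the same shape as UrlFormate."""
    # Local midnight of the requested day, as whole seconds since the epoch.
    stamp = int(datetime.strptime(date, '%Y%m%d').timestamp())
    params = {'dt': dt, 'id': city_id}
    if move_type is not None:
        # Only directional queries carry a type parameter.
        params['type'] = 'move_' + move_type
    params['date'] = date
    params['callback'] = 'jsonp_{0}000_0000000'.format(stamp)
    return BASE.format(rank_method) + '?' + urlencode(params)


url = build_url('cityrank', 'city', '440100', '20201114', 'in')
print(url)
```

Passing `move_type=None` drops the `type` parameter, which matches the `internalflowhistory` branch above.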

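def GetData(cityName, moveType, date, rankMethod):
    # rankMethod is 'cityrank', 'historycurve' or 'internalflowhistory'.
    response = requests.get(UrlFormate(rankMethod, 'city', cityName, moveType, date), timeout=10)
    rawData = json.loads(JsonTextConvert(response.text))
    # errno 501 is treated as "no data for this date" and handed back to the caller.
    if rawData['errno'] == 501:
        return 501
    # Return the payload directly instead of binding it to `list`,
    # which would shadow the built-in.
    return rawData['data']['list']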

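def write_Excel(data, data_time, move_type):
    # Output path: d:/qianxi/<date_constant>/<date>_<type>.xlsx (the folder must already exist).
    name = 'd:/qianxi/' + date_constant + '/' + data_time + "_" + move_type + ".xlsx"
    app = xw.App(visible=True, add_book=False)
    try:
        wb = app.books.add()
        # Use the first sheet by index so the code does not depend on its name.
        sht = wb.sheets[0]
        # Assigning a nested list to the top-left cell writes the whole table.
        sht.range('A1').value = data

        print(sht.range('A1').value)

        wb.save(name)

        # Close the workbook
        wb.close()
    finally:
        # Quit Excel even if writing or saving fails
        app.quit()
    return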

def function_cityrank(rankMethod, move_type, date_time):
    # Header row: an empty corner cell followed by every city name.
    result = [[''] + list(code)]

    for a in code:
        # rankMethod is expected to be 'cityrank' here.
        tags = GetData(a, move_type, date_time, rankMethod)
        if tags == 501:
            return 501
        # Map each ranked city to its value for city `a`.
        list_data = {tag['city_name']: tag['value'] for tag in tags}
        # One row per city; cities missing from the ranking get 0.
        result.append([a] + [list_data.get(i, 0) for i in code])
    print(result)
    write_Excel(result, date_time, move_type)
    return

def function_historycurve(date_time):
    # Header row of the scale (规模) workbook.
    result = [['city_name', 'move_in', 'move_out', 'internal']]
    # Example request:
    # http://huiyan.baidu.com/migration/internalflowhistory.jsonp?dt=city&id=440100&date=20201114&callback=jsonp_1605340876623_8581344
    # GetData arguments: cityName, moveType, date, rankMethod
    for a in code:
        list_name = [a]
        tags_in = GetData(a, 'in', date_time, 'historycurve')
        tags_out = GetData(a, 'out', date_time, 'historycurve')
        # internalflowhistory takes no move type, so pass None
        # (the original passed the built-in `type` by accident).
        tips = GetData(a, None, date_time, 'internalflowhistory')
        # Each series is looked up by date string. Check each series against
        # itself (the original tested tags_in three times) and fall back to 0
        # when the date is missing or the request returned 501.
        for series in (tags_in, tags_out, tips):
            if isinstance(series, dict) and date_time in series:
                list_name.append(series[date_time])
            else:
                list_name.append(0)
        result.append(list_name)
    print(result)
    write_Excel(result, date_time, "规模")
    return

def create_document(date_time):
    # Generate the three workbooks for one day: inflow, outflow and scale.
    tag = function_cityrank('cityrank', 'in', date_time)
    if tag == 501:
        print('No data found for this date')
        return 501
    print('in_excel generated')
    function_cityrank('cityrank', 'out', date_time)
    print('out_excel generated')
    function_historycurve(date_time)
    print('guimo_excel generated')
    return
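The table-building logic in function_cityrank can be exercised without the network or Excel. The sketch below uses a hypothetical helper, `build_matrix`, and invented city names and values; it only illustrates how per-city ranking lists become a square table with zeros for missing pairs.

```python
def build_matrix(cities, ranks_by_city):
    """Return a header row plus one row per city, filling gaps with 0."""
    rows = [[''] + list(cities)]
    for origin in cities:
        # Map each ranked city to its value for this row's city.
        values = {item['city_name']: item['value']
                  for item in ranks_by_city.get(origin, [])}
        rows.append([origin] + [values.get(dest, 0) for dest in cities])
    return rows


# Invented numbers for illustration only.
cities = ['A', 'B', 'C']
ranks = {
    'A': [{'city_name': 'B', 'value': 12.5}],
    'B': [{'city_name': 'A', 'value': 7.0}, {'city_name': 'C', 'value': 3.2}],
}
matrix = build_matrix(cities, ranks)
for row in matrix:
    print(row)
```

Keeping this step free of I/O makes it easy to test before handing the nested list to write_Excel.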

3. Results

The program creates a qianxi folder on the D: drive, then a date folder inside it that holds the three requested Excel files.

xxxx_in.xlsx

规模.xlsx
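The excerpt does not show how the date folder gets created, so the following is only one possible way to do it, using `pathlib`. `output_paths` is a hypothetical helper, and a temporary directory stands in for the D: drive so the snippet runs on any system.

```python
import tempfile
from pathlib import Path


def output_paths(root, date):
    """Create <root>/qianxi/<date>/ and return the three workbook paths."""
    folder = Path(root) / 'qianxi' / date
    # parents=True also creates the qianxi folder; exist_ok=True allows reruns.
    folder.mkdir(parents=True, exist_ok=True)
    return [folder / '{0}_{1}.xlsx'.format(date, kind)
            for kind in ('in', 'out', '规模')]


# A temporary directory stands in for the D: drive used in the article.
with tempfile.TemporaryDirectory() as root:
    paths = output_paths(root, '20201114')
    names = [p.name for p in paths]
    folder_exists = paths[0].parent.is_dir()
print(names)
```

Creating the folder before saving matters because Excel cannot save a workbook into a directory that does not exist.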

4. Executable (.exe) Download Link

Original article: https://blog.csdn.net/qq_43183340/article/details/110220157
