
Scraping Baidu Migration (百度迁徙) Data and Exporting It to Excel

程序员文章站 2022-03-14 21:21:28

1. Background

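Someone on my school's confession wall offered to pay for a scraper for Baidu Migration data, so I took the job.
For each requested day, the script scrapes that day's data and generates three Excel files.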

2. Partial code


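import json
import time

import requests
import xlwings as xw

# This excerpt relies on two globals defined elsewhere in the script:
# code (a dict mapping city names to Baidu region IDs) and
# date_constant (the name of the output subfolder).

def JsonTextConvert(text):
    # Strip the JSONP wrapper, e.g. jsonp_123(...), and return the JSON inside.
    # json.loads decodes \uXXXX escapes on its own, so no unicode_escape pass is needed.
    head, sep, tail = text.partition('(')
    # Cut at the last ')' only, so parentheses inside the data survive.
    payload, sep, _ = tail.rpartition(')')
    return payload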

def UrlFormate(rankMethod, dt, name, migrationType, date):
    # date is a 'YYYYMMDD' string; its Unix timestamp at local midnight
    # is used to build the JSONP callback name.
    timeArray = time.strptime(date, "%Y%m%d")
    timeUnix = int(time.mktime(timeArray))
    # code is defined elsewhere in the script (city name -> region ID)
    ID = code[name]
    if rankMethod == 'internalflowhistory':
        # Intra-city flow has no move type
        url = 'http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}&date={3}&callback=jsonp_{4}000_0000000'.format(rankMethod, dt, ID, date, timeUnix)
    else:
        # cityrank and historycurve take type=move_in or type=move_out
        url = 'http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}&type=move_{3}&date={4}&callback=jsonp_{5}000_0000000'.format(rankMethod, dt, ID, migrationType, date, timeUnix)
    return url

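def GetData(cityName, moveType, date, rankMethod):
    # rankMethod is 'cityrank', 'historycurve' or 'internalflowhistory'
    response = requests.get(UrlFormate(rankMethod, 'city', cityName, moveType, date), timeout=10)
    rawData = json.loads(JsonTextConvert(response.text))
    # The callers treat errno 501 as "no data for this date"
    if rawData['errno'] == 501:
        return 501
    # Return the payload directly instead of binding it to the name "list",
    # which would shadow the built-in
    return rawData['data']['list']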

def write_Excel(data, data_time, move_type):
    # date_constant is a global set elsewhere; the target folder is expected to exist already
    name = 'd:/qianxi/' + date_constant + '/' + data_time + "_" + move_type + ".xlsx"
    app = xw.App(visible=True, add_book=False)
    try:
        wb = app.books.add()
        sht = wb.sheets[0]
        # Assigning a nested list to A1 fills the whole table
        sht.range('A1').value = data

        wb.save(name)

        # Close the workbook
        wb.close()
    finally:
        # Quit Excel even if writing or saving fails
        app.quit()
    return

def function_cityrank(rankMethod, move_type, date_time):
    # Header row: an empty corner cell followed by every city name
    result = [[''] + list(code)]

    for a in code:
        # GetData returns 501 when there is no data for this date
        tags = GetData(a, move_type, date_time, rankMethod)
        if tags == 501:
            return 501
        # Map each city name in the ranking to its value
        list_data = {}
        for tag in tags:
            list_data[tag['city_name']] = tag['value']
        # One row per city, with 0 for cities missing from the ranking
        list_name = [a]
        for i in code:
            list_name.append(list_data.get(i, 0))
        result.append(list_name)
    print(result)
    write_Excel(result, date_time, move_type)
    return

def function_historycurve(date_time):
    result = [['city_name', 'move_in', 'move_out', 'internal']]
    # Example request:
    # http://huiyan.baidu.com/migration/internalflowhistory.jsonp?dt=city&id=440100&date=20201114&callback=jsonp_1605340876623_8581344
    # GetData arguments: cityName, moveType, date, rankMethod
    for a in code:
        tags_in = GetData(a, 'in', date_time, 'historycurve')
        tags_out = GetData(a, 'out', date_time, 'historycurve')
        # internalflowhistory takes no move type
        tips = GetData(a, None, date_time, 'internalflowhistory')
        list_name = [a]
        # Check each series against itself (the original tested tags_in three times)
        # and fall back to 0 when the date is missing or GetData returned 501
        for series in (tags_in, tags_out, tips):
            if isinstance(series, dict) and date_time in series:
                list_name.append(series[date_time])
            else:
                list_name.append(0)
        result.append(list_name)
    print(result)
    # "规模" means "scale"; it becomes part of the output file name
    write_Excel(result, date_time, "规模")
    return

def create_document(date_time):
    # Stop early if the first request reports no data for this date
    tag = function_cityrank('cityrank', 'in', date_time)
    if tag == 501:
        print('No data found for this date')
        return 501
    print('in_excel generated')
    function_cityrank('cityrank', 'out', date_time)
    print('out_excel generated')
    function_historycurve(date_time)
    print('guimo_excel generated')
    return

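The endpoints above return JSONP, that is, JSON wrapped in a callback call such as jsonp_1605340876623_8581344({...}). As a minimal sketch of the unwrapping step, the snippet below strips the wrapper from a sample response and parses it. The helper name unwrap_jsonp and the sample payload are illustrative assumptions, not output captured from Baidu.

```python
import json


def unwrap_jsonp(text):
    """Return the JSON payload inside a JSONP response like 'cb({...})'."""
    start = text.index('(') + 1   # the first '(' opens the callback call
    end = text.rindex(')')        # the last ')' closes it
    return text[start:end]


# Hypothetical sample response. It only mirrors the fields the code above
# reads (errno, data, list, city_name, value); the values are made up.
sample = ('jsonp_1605283200000_0000000({"errno": 0, "data": {"list": '
          '[{"city_name": "\\u5e7f\\u5dde\\u5e02", "value": 1.5}]}})')

raw = json.loads(unwrap_jsonp(sample))
rows = raw['data']['list']
print(rows[0]['city_name'], rows[0]['value'])
```

Slicing between the first '(' and the last ')' leaves any parentheses inside the data untouched, and json.loads decodes the \uXXXX escapes without a separate decoding pass.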

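The callback name in the request URLs is the Unix timestamp of the requested date in milliseconds, followed by a fixed suffix. The sketch below rebuilds such a URL with the standard library only. build_url is a hypothetical helper, the region ID 440100 is taken from the example URL in the code comment, and UTC is used here (the original code uses local time) only so that the result is reproducible on any machine.

```python
import calendar
import time
from urllib.parse import urlencode

BASE = 'http://huiyan.baidu.com/migration/'


def build_url(rank_method, region_id, date, move_type=None):
    """Build a migration URL for a 'YYYYMMDD' date (illustrative helper)."""
    # Midnight UTC of the given date, in seconds since the epoch
    seconds = calendar.timegm(time.strptime(date, '%Y%m%d'))
    params = {'dt': 'city', 'id': region_id}
    if move_type is not None:
        # Only the in/out endpoints carry a move type
        params['type'] = 'move_' + move_type
    params['date'] = date
    # Seconds followed by "000" gives a millisecond-style timestamp
    params['callback'] = 'jsonp_{0}000_0000000'.format(seconds)
    return BASE + rank_method + '.jsonp?' + urlencode(params)


print(build_url('cityrank', 440100, '20201114', 'in'))
print(build_url('internalflowhistory', 440100, '20201114'))
```

Parsing the date directly with the "%Y%m%d" format avoids inserting dashes into the string by hand before converting it.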
3. Results

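The script creates a qianxi folder on the D: drive, then a folder named after the date inside it, and writes the three required Excel files there.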
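A minimal sketch of that folder layout, using pathlib to create the date folder and name the three workbooks. The helper name output_paths and the temporary root directory are assumptions for illustration; the script itself writes under d:/qianxi/, and this sketch assumes the subfolder is named after the scraped date.

```python
import tempfile
from pathlib import Path


def output_paths(root, date):
    """Create root/qianxi/<date>/ and return the three workbook paths."""
    folder = Path(root) / 'qianxi' / date
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    # Suffixes match the names passed to write_Excel: in, out and 规模 (scale)
    return [folder / '{0}_{1}.xlsx'.format(date, kind)
            for kind in ('in', 'out', '规模')]


with tempfile.TemporaryDirectory() as root:  # stand-in for the D: drive
    paths = output_paths(root, '20201114')
    names = [p.name for p in paths]
    folder_exists = paths[0].parent.is_dir()
    print(names)
```

Creating the folder before saving matters because the save step does not create missing directories on its own.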

xxxx_in.xlsx
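The in and out workbooks hold a city-by-city matrix: the header row lists every city, and each following row starts with a city name followed by one value per column city, with 0 where the ranking has no entry. A small sketch of that pivot follows; build_matrix, the city names and the values are illustrative, not real migration data.

```python
def build_matrix(cities, rankings):
    """Pivot per-city rankings into rows for a city-by-city sheet.

    rankings maps each row city to a list of {'city_name', 'value'} dicts.
    """
    matrix = [[''] + list(cities)]  # header row; the corner cell stays blank
    for row_city in cities:
        values = {tag['city_name']: tag['value']
                  for tag in rankings.get(row_city, [])}
        # Cities absent from the ranking get 0
        matrix.append([row_city] + [values.get(col, 0) for col in cities])
    return matrix


cities = ['A', 'B', 'C']
rankings = {
    'A': [{'city_name': 'B', 'value': 12.5}, {'city_name': 'C', 'value': 3.0}],
    'B': [{'city_name': 'A', 'value': 8.0}],
}
matrix = build_matrix(cities, rankings)
for row in matrix:
    print(row)
```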

xxxx_规模.xlsx ("规模" means "scale": the move-in, move-out and intra-city values for each city)

4. Download link for the executable (.exe)

Original article: https://blog.csdn.net/qq_43183340/article/details/110220157

Tags: web scraping, notes