
Day 03 Web Scraping: Douban

Posted on 程序员文章站 (Programmer Article Site), 2022-05-03 08:21:53
...
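import requests
import pandas as pd

# Browser-like User-Agent: Douban tends to reject the default python-requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"}
tv_list = []
# 15 pages of 20 popular (热门, URL-encoded in the query) TV shows: page_start = 0, 20, ..., 280
for i in range(0, 300, 20):
    url = "https://movie.douban.com/j/search_subjects?type=tv&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start={}".format(i)

    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    data = response.json()  # the endpoint returns JSON; "subjects" holds the list of shows
    for tv in data["subjects"]:
        # keep only the three fields we need
        tv_list.append({
            "title": tv["title"],
            "url": tv["url"],
            "rate": tv["rate"],
        })
# index must be the boolean False: the string "False" is truthy, so the index column was still written.
# utf-8-sig adds a BOM so Excel displays the Chinese titles correctly.
pd.DataFrame(tv_list).to_csv("tv.csv", index=False, encoding="utf-8-sig")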
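As a minimal, offline-testable sketch of the same pagination and export logic (not the author's code), the helpers below build each page URL with `urllib.parse.urlencode` instead of a hand-encoded string, keep the same three fields, and write CSV with the standard-library `csv` module in place of pandas. The `fetch` hook, the early stop on an empty `subjects` list, and the delay between requests are assumptions added for illustration; the original simply requests a fixed 15 pages.

```python
import csv
import io
import json
import time
from urllib.parse import urlencode

# Endpoint and parameters taken from the scraper above; the tag 热门 ("popular")
# is passed as plain text and urlencode() percent-encodes it as UTF-8.
BASE = "https://movie.douban.com/j/search_subjects"


def page_url(page_start, page_limit=20, tag="热门"):
    """Build the list-API URL for one page of results."""
    query = urlencode({
        "type": "tv",
        "tag": tag,
        "sort": "recommend",
        "page_limit": page_limit,
        "page_start": page_start,
    })
    return BASE + "?" + query


def parse_subjects(payload):
    """Keep only title/url/rate from one JSON response body."""
    data = json.loads(payload)
    return [{"title": s["title"], "url": s["url"], "rate": s["rate"]}
            for s in data["subjects"]]


def collect(fetch, max_items=300, page_limit=20, delay=1.0):
    """Page through the API until max_items is reached or a page comes back empty.

    `fetch` is a hypothetical hook: any callable that maps a URL to the
    response body text.
    """
    rows = []
    for start in range(0, max_items, page_limit):
        page = parse_subjects(fetch(page_url(start, page_limit)))
        if not page:  # assumed end-of-results signal: stop requesting
            break
        rows.extend(page)
        time.sleep(delay)  # pause between requests to go easy on the server
    return rows


def to_csv_text(rows):
    """Serialize rows as CSV with a header row and no index column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "url", "rate"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

To run it against the live site, pass something like `lambda u: requests.get(u, headers=headers, timeout=10).text` as `fetch` and write `to_csv_text(rows)` to `tv.csv` opened with `encoding="utf-8-sig"`. Injecting `fetch` keeps the paging and parsing logic testable without network access.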
