Python Functional Programming: Scraping Douban Data
2022-09-21 09:02:56
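Import the modules and print the genre codes
import requests
import pandas as pd
import json
import time

# Show the genre codes accepted by the Douban chart endpoint
print('''
1-Documentary; 2-Biography; 3-Crime; 4-History; 5-Action;
6-Erotic; 7-Musical; 8-Children; 10-Mystery; 11-Drama;
12-Disaster; 13-Romance; 14-Music; 15-Adventure; 16-Fantasy;
17-Sci-Fi; 18-Sports; 19-Thriller; 20-Horror; 22-War;
23-Short; 24-Comedy; 25-Animation; 26-LGBT; 27-Western;
28-Family; 29-Wuxia; 30-Costume Drama; 31-Film Noir
''')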
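Enter the genre code and the number of movies to download
leixing = input("Enter the code of the genre you want to download: ")
num = input("Enter how many top-ranked movies you want to download: ")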
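Because `input` returns raw strings, it may be worth checking both values before any request is sent. The following is a minimal sketch and not part of the original script: `VALID_CODES` and `parse_inputs` are hypothetical names, and the set simply mirrors the codes printed above (9 and 21 do not appear in that list).

```python
# Hypothetical helper, not from the original article.
# The valid codes mirror the printed list: 1 to 31, without 9 and 21.
VALID_CODES = set(range(1, 32)) - {9, 21}


def parse_inputs(leixing, num):
    """Convert the two raw input strings to integers.

    Raises ValueError when either value cannot be used.
    """
    code = int(leixing.strip())   # int() raises ValueError for non-numeric text
    count = int(num.strip())
    if code not in VALID_CODES:
        raise ValueError(f"unknown genre code: {code}")
    if count < 1:
        raise ValueError("the number of movies must be at least 1")
    return code, count
```

With something like this in place, the script could call `leixing, num = parse_inputs(leixing, num)` before passing the values on to `download`.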
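Fetch the information for each movie
def download(leixing, num):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
    }
    total = int(num)
    page_size = 20
    dt = []
    # The endpoint is paged: start is the offset, limit is the number of records per request
    for start in range(0, total, page_size):
        limit = min(page_size, total - start)
        url = f"https://movie.douban.com/j/chart/top_list?type={leixing}&interval_id=100%3A90&action=&start={start}&limit={limit}"
        response = requests.get(url, headers=headers)
        dt.extend(json.loads(response.text))  # each response is a JSON list of movie records
        time.sleep(2)  # pause between requests to go easy on the server
    # Pull one column per field out of the list of records
    title = [m['title'] for m in dt]
    rank = [m['rank'] for m in dt]
    score = [m['score'] for m in dt]
    types = [m['types'] for m in dt]
    regions = [m['regions'] for m in dt]
    release_date = [m['release_date'] for m in dt]
    actors = [m['actors'] for m in dt]
    cover_url = [m['cover_url'] for m in dt]
    data = pd.DataFrame({'Title': title, 'Rank': rank, 'Score': score, 'Regions': regions, 'Release date': release_date, 'Genres': types, 'Actors': actors, 'Cover URL': cover_url})
    data.index = data.index + 1  # number the rows from 1 instead of 0
    data.to_excel('e:/douban_movie_ranking.xlsx')  # write the file once, after all pages are collected

download(leixing, num)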
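The eight list comprehensions in `download` all follow the same pattern, so they could be folded into one helper. The sketch below assumes each record is a dict carrying the keys the script reads. `FIELDS` and `to_columns` are illustrative names, and the sample record holds made-up placeholder values, not real Douban data.

```python
# The JSON keys read by the script, in the order it reads them.
FIELDS = ['title', 'rank', 'score', 'types', 'regions',
          'release_date', 'actors', 'cover_url']


def to_columns(records, fields=FIELDS):
    """Turn a list of record dicts into {key: [values]}, one list per field.

    Uses dict.get, so a record that lacks a key contributes None
    instead of raising KeyError.
    """
    return {key: [rec.get(key) for rec in records] for key in fields}


# Placeholder record for illustration only (not real data)
sample = [{
    'title': 'Example Movie', 'rank': 1, 'score': '9.0',
    'types': ['Drama'], 'regions': ['Example Region'],
    'release_date': '2000-01-01', 'actors': ['Example Actor'],
    'cover_url': 'https://example.com/cover.jpg',
}]
columns = to_columns(sample)
```

The resulting dict can be handed to `pd.DataFrame(columns)` directly. Keeping this step free of network calls also means it can be tried out on a saved response without contacting the site.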
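Full code
import requests
import pandas as pd
import json
import time

# Show the genre codes accepted by the Douban chart endpoint
print('''
1-Documentary; 2-Biography; 3-Crime; 4-History; 5-Action;
6-Erotic; 7-Musical; 8-Children; 10-Mystery; 11-Drama;
12-Disaster; 13-Romance; 14-Music; 15-Adventure; 16-Fantasy;
17-Sci-Fi; 18-Sports; 19-Thriller; 20-Horror; 22-War;
23-Short; 24-Comedy; 25-Animation; 26-LGBT; 27-Western;
28-Family; 29-Wuxia; 30-Costume Drama; 31-Film Noir
''')

leixing = input("Enter the code of the genre you want to download: ")
num = input("Enter how many top-ranked movies you want to download: ")

def download(leixing, num):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
    }
    total = int(num)
    page_size = 20
    dt = []
    # The endpoint is paged: start is the offset, limit is the number of records per request
    for start in range(0, total, page_size):
        limit = min(page_size, total - start)
        url = f"https://movie.douban.com/j/chart/top_list?type={leixing}&interval_id=100%3A90&action=&start={start}&limit={limit}"
        response = requests.get(url, headers=headers)
        dt.extend(json.loads(response.text))  # each response is a JSON list of movie records
        time.sleep(2)  # pause between requests to go easy on the server
    # Pull one column per field out of the list of records
    title = [m['title'] for m in dt]
    rank = [m['rank'] for m in dt]
    score = [m['score'] for m in dt]
    types = [m['types'] for m in dt]
    regions = [m['regions'] for m in dt]
    release_date = [m['release_date'] for m in dt]
    actors = [m['actors'] for m in dt]
    cover_url = [m['cover_url'] for m in dt]
    data = pd.DataFrame({'Title': title, 'Rank': rank, 'Score': score, 'Regions': regions, 'Release date': release_date, 'Genres': types, 'Actors': actors, 'Cover URL': cover_url})
    data.index = data.index + 1  # number the rows from 1 instead of 0
    data.to_excel('e:/douban_movie_ranking.xlsx')  # write the file once, after all pages are collected

download(leixing, num)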
Approach
#1. The page being scraped is a dynamic page.
#2. The real URL carries the count (pagination) information:
https://movie.douban.com/j/chart/top_list?type={leixing}&interval_id=100%3A90&action=&start=0&limit={i}
#3. The data returned is JSON.
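To make point 2 concrete, here is a small sketch of how the query string could be assembled for a given page. It assumes `start` acts as the result offset and `limit` as the number of records to return, which is what the parameter names suggest. `BASE` and `build_url` are hypothetical names introduced only for this example.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL shown above
BASE = "https://movie.douban.com/j/chart/top_list"


def build_url(leixing, start, limit):
    """Build the chart URL for one page of results (hypothetical helper)."""
    params = {
        'type': leixing,
        'interval_id': '100:90',  # urlencode escapes ':' as %3A
        'action': '',
        'start': start,           # assumed: offset of the first record
        'limit': limit,           # assumed: number of records to return
    }
    return f"{BASE}?{urlencode(params)}"


first_page = build_url(11, 0, 20)
second_page = build_url(11, 20, 20)
```

Building the query from a dict keeps the escaping in one place, so the `%3A` in `interval_id` does not have to be typed by hand.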
Original article: https://blog.csdn.net/weixin_43422435/article/details/108249430