Scraping Douban Movies with Python

程序员文章站 2022-05-02 17:05:15

Using lxml

import requests
from lxml import etree
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36',
    'Referer':'https://movie.douban.com/'
}
url = 'https://movie.douban.com/cinema/nowplaying/beijing/'
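response = requests.get(url,headers=headers,timeout=10)
# Stop early on an HTTP error instead of parsing an error page
response.raise_for_status()
text = response.text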

html = etree.HTML(text)
# Get the movies that are currently showing (the first <ul class="lists"> on the page)
ul = html.xpath("//ul[@class='lists']")[0]
lis = ul.xpath("./li")
movies = []

for li in lis:
    title = li.xpath("@data-title")[0]
    score = li.xpath("@data-score")[0]
    duration = li.xpath("@data-duration")[0]
    region = li.xpath("@data-region")[0]
    director = li.xpath("@data-director")[0]
    actors = li.xpath("@data-actors")[0]
    # Movie poster image
    thumbnail = li.xpath(".//img/@src")[0]

    movie = {
        'title':title,
        'score':score,
        'duration':duration,
        'region':region,
        'director':director,
        'actors':actors,
        'thumbnail':thumbnail,
    }
    movies.append(movie)

print(movies)
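
The script above indexes every XPath result with [0], so a single entry without a poster or without one of the data-* attributes would raise an IndexError. The following is a minimal sketch of the same extraction wrapped in a function with a fallback value, run against inline markup so it can be tried without a network request. The sample HTML, the FIELDS tuple, and the helper names first_or_default and parse_movies are assumptions made for illustration; they are not part of the original script, and the real Douban page may differ.

```python
from lxml import etree

# Hypothetical sample markup, modeled only on the attributes the script reads.
SAMPLE_HTML = """
<div id="nowplaying">
  <ul class="lists">
    <li data-title="Movie A" data-score="8.1" data-duration="120分钟"
        data-region="中国大陆" data-director="Director A"
        data-actors="Actor 1 / Actor 2">
      <img src="https://example.com/a.jpg"/>
    </li>
    <li data-title="Movie B" data-score="0">
      <span>no poster</span>
    </li>
  </ul>
</div>
"""

# The data-* attributes read by the original loop.
FIELDS = ("title", "score", "duration", "region", "director", "actors")


def first_or_default(node, path, default=""):
    """Return the first XPath result for `path`, or `default` if there is none."""
    results = node.xpath(path)
    return results[0] if results else default


def parse_movies(text):
    """Parse the entries of the first <ul class="lists"> into a list of dicts."""
    html = etree.HTML(text)
    movies = []
    # (…)[1] selects the first matching <ul>, like [0] on the Python side,
    # but yields an empty result instead of an IndexError when none exists.
    for li in html.xpath("(//ul[@class='lists'])[1]/li"):
        movie = {field: first_or_default(li, f"@data-{field}") for field in FIELDS}
        movie["thumbnail"] = first_or_default(li, ".//img/@src")
        movies.append(movie)
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie["title"], movie["score"], movie["thumbnail"])
```

To use it with the live page, pass response.text to parse_movies instead of SAMPLE_HTML. Missing fields then come back as empty strings rather than stopping the whole run.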
