Python 3 Web Scraping: Scraping Douban Movie Information

程序员文章站 2022-05-02 16:49:55

This script scrapes movie information from the Douban website and saves it to an Excel file.

Code:

import re

import requests
import xlwt

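# Create a workbook
book = xlwt.Workbook()
# Add a worksheet named 'movie'
# (pass cell_overwrite_ok=True to add_sheet if a cell must be written more than once)
sheet = book.add_sheet('movie')

# Column headings: rank, title, director, country, year, rating
headings = ['排名', '电影名称', '导演', '国家', '年份', '评分']
for k, j in enumerate(headings):
    sheet.write(0, k, j)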


url = 'https://movie.douban.com/top250'
# Request headers; the header name must be 'User-Agent', not 'user_agent'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

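def first_match(pattern, text):
    # re.findall returns a list, which xlwt cannot write to a cell,
    # so return the first captured group as a string ('' if nothing matches)
    m = re.search(pattern, text)
    return m.group(1).strip() if m else ''


try:
    r = requests.get(url, timeout=30, headers=headers)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    text = r.text
    # Each movie entry runs from its poster block to its one-line quote
    movie_info = re.findall(r'div class="pic">([\d\D]*?)<p class="quote">', text)

    # Row 0 holds the headings, so data rows start at 1
    for count, i in enumerate(movie_info, start=1):
        rank = first_match(r'<em class="">(\d+)</em>', i)
        name = first_match(r'span class="title">(\w+)</span>', i)
        director = first_match(r'导演:([\d\D]*?)&nbsp;', i)
        year = first_match(r'(\d{4})&nbsp;/&nbsp;', i)
        country = first_match(r'\d{4}&nbsp;/&nbsp;([\d\D]*?)&nbsp;/&nbsp;', i)
        score = first_match(r'<span class="rating_num" property="v:average">([\d.]+)', i)

        # Column order follows the headings: rank, title, director, country, year, rating
        sheet.write(count, 0, rank)
        sheet.write(count, 1, name)
        sheet.write(count, 2, director)
        sheet.write(count, 3, country)
        sheet.write(count, 4, year)
        sheet.write(count, 5, score)

    book.save('电影信息.xls')

except requests.RequestException as e:
    print('Request failed:', e)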
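
The extraction step can be tried offline before sending any request. The sketch below applies regular expressions in the style of the script to a single hand-written entry. `SAMPLE_ITEM` is a hypothetical fragment shaped to fit those patterns, not a copy of the live page, and `first_match` and `parse_item` are helper names introduced here for illustration. The real Douban markup may differ or change over time.

```python
import re

# Hypothetical fragment shaped to fit the patterns used in the script;
# the markup of the live page is an assumption and may differ.
SAMPLE_ITEM = '''
<div class="pic">
    <em class="">1</em>
</div>
<div class="info">
    <span class="title">肖申克的救赎</span>
    <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
    <p class="">
        导演: 弗兰克·德拉邦特 Frank Darabont&nbsp;&nbsp;&nbsp;主演: 蒂姆·罗宾斯 Tim Robbins /...<br>
        1994&nbsp;/&nbsp;美国&nbsp;/&nbsp;犯罪 剧情
    </p>
    <span class="rating_num" property="v:average">9.7</span>
</div>
'''


def first_match(pattern, text):
    """Return the first captured group, stripped, or '' when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1).strip() if m else ''


def parse_item(item_html):
    """Extract one movie's fields from its HTML fragment as plain strings."""
    return {
        'rank': first_match(r'<em class="">(\d+)</em>', item_html),
        'name': first_match(r'span class="title">(\w+)</span>', item_html),
        'director': first_match(r'导演:([\d\D]*?)&nbsp;', item_html),
        'year': first_match(r'(\d{4})&nbsp;/&nbsp;', item_html),
        'country': first_match(r'\d{4}&nbsp;/&nbsp;([\d\D]*?)&nbsp;/&nbsp;', item_html),
        'score': first_match(r'property="v:average">([\d.]+)', item_html),
    }


if __name__ == '__main__':
    for field, value in parse_item(SAMPLE_ITEM).items():
        print(field, value)
```

Returning plain strings rather than the lists produced by `re.findall` matters because each value is later written into a single spreadsheet cell. A missing field becomes an empty string rather than raising an error.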
