Scraping the Douban Top 250 with BeautifulSoup

程序员文章站 2022-07-02 22:35:52


Scraping the Douban Top 250

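1. Environment: VS2019
2. Install the third-party libraries BeautifulSoup (the beautifulsoup4 package) and requests with pip. re and codecs ship with the Python standard library and do not need to be installed.
What it does: collects the title, rating, number of ratings, link, and one-line review of each Top 250 film and writes them to an .html file. The result looks like this: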

[Figure: Scraping the Douban Top 250 with BeautifulSoup]
First, the scraping function:

import re

import requests
from bs4 import BeautifulSoup

# `headers` (the request headers) and `outf` (the open output file) are
# globals that the main program must define before pachong is called.

def pachong(url):
    cc = requests.get(url, headers=headers)
    cc = cc.text
    cc = BeautifulSoup(cc, "html.parser")
    print('Douban Top 250\n')
    for tag in cc.find_all(attrs={"class": "item"}):
        shu = tag.find('em').get_text()  # rank
        print(shu)
        outf.write("<tr><th>" + shu)

        name = tag.find_all(attrs={"class": "title"})  # titles; the first is the Chinese title
        zname = name[0].get_text()
        print('[Title]', zname)
        outf.write("</th><th>" + zname)

        urlm = tag.find(attrs={"class": "hd"}).a  # link to the film page
        urls = urlm.attrs['href']
        print('[Link]', urls)
        outf.write("</th><th>" + urls)

        ping = tag.find(attrs={"class": "star"}).get_text()  # rating and number of ratings
        ping = ping.replace('\n', ' ')
        ping = ping.lstrip()
        mode = re.compile(r'\d+\.?\d*')
        mm = mode.findall(ping)
        for k, n in enumerate(mm):
            if k == 0:
                print("[Rating]" + n)
                outf.write("</th><th>" + n)
            elif k == 1:
                print("[Number of ratings]" + n)
                outf.write("</th><th>" + n)

        yu = tag.find(attrs={"class": "inq"})  # one-line review; not every film has one
        outf.write("</th><th>")
        if yu:
            content = yu.get_text()
            print('[Review]', content)
            outf.write(content)
        # Close the row even when there is no review, so the table stays well-formed.
        outf.write("</th></tr>\n")
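
The rating step is the least obvious part of the function, so here is a minimal sketch of it in isolation. The helper name parse_star and the sample text are made up for illustration; only the regular expression and the cleanup calls come from the function above.

```python
import re

# Same pattern as in pachong: an integer or a decimal number.
NUMBER = re.compile(r'\d+\.?\d*')

def parse_star(text):
    """Return (rating, number_of_ratings) as strings; a missing part is None."""
    nums = NUMBER.findall(text.replace('\n', ' ').lstrip())
    rating = nums[0] if len(nums) > 0 else None
    count = nums[1] if len(nums) > 1 else None
    return rating, count

# Hypothetical text shaped like the contents of the "star" block.
sample = "\n 9.7 \n \n 2345678人评价 \n"
print(parse_star(sample))
```

Returning None for a missing value makes it explicit when a page yields fewer than two numbers, a case the loop in pachong silently skips.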

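Notes: 1. Requests to Douban must include request headers.
2. The page is parsed with BeautifulSoup.
3. The output file is .html, which is why tags such as <tr> and <th></th> are written to it; they are HTML markup.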
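
Notes 1 and 3 can be sketched as follows. This is an illustration under assumptions, not the article's own code: the User-Agent value and the names HEADERS, make_row, and render_table are hypothetical, and the escaping of cell text is an addition the original function does not perform.

```python
import html

# Assumed browser-like request headers; the article does not show its own values.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def make_row(cells):
    """Build one <tr>, wrapping each escaped value in <th> as the article's output does."""
    return "<tr>" + "".join("<th>" + html.escape(str(c)) + "</th>" for c in cells) + "</tr>\n"

def render_table(rows):
    """Wrap the rows in a <table> element so the file opens as a table in a browser."""
    return '<table border="1">\n' + "".join(make_row(r) for r in rows) + "</table>\n"

print(render_table([["1", "Example Film", "9.7", "2345678", "A made-up review"]]))
```

The headers would be passed as requests.get(url, headers=HEADERS), and the rendered string could be written with codecs.open(path, "w", "utf-8"), which is presumably why the article lists codecs among its libraries.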

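A single page lists only some of the films, so the scraper has to page through the results; the main function therefore calls pachong several times.
The complete code is in the next blog post.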
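
As a rough idea of what that paging loop might look like, here is a minimal sketch. It rests on assumptions the article does not state: that the list lives at https://movie.douban.com/top250, that each page holds 25 films, and that the page is selected with a start query parameter. The names BASE_URL, page_urls, and crawl_all are hypothetical.

```python
# Assumed list URL and page size; neither is given in the article.
BASE_URL = "https://movie.douban.com/top250"
PER_PAGE = 25

def page_urls(total=250, per_page=PER_PAGE):
    """Return one URL per result page, using an assumed `start` offset parameter."""
    return [f"{BASE_URL}?start={start}" for start in range(0, total, per_page)]

def crawl_all(scrape_page):
    """Call scrape_page (for example pachong) once for every page URL."""
    for url in page_urls():
        scrape_page(url)

# Dry run: print the URLs instead of fetching them.
crawl_all(print)
```

Passing the scraping function in as an argument keeps the loop testable without network access; in the real program crawl_all(pachong) would run after headers and outf have been set up.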

Article URL: https://blog.csdn.net/m0_46968194/article/details/110789905
