
Python video scraping example

程序员文章站 2023-04-05 13:53:22
Example: scraping Photoshop video tutorials. Site: http://www.mxiaobei.com/?id=424 BeautifulSoup: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/ Requests: http://cn.python reque ......

Example: scraping Photoshop video tutorials. Site: http://www.mxiaobei.com/?id=424

import requests
import re
from bs4 import BeautifulSoup
import time

dicts = {}     # video title -> direct .mp4 URL
list1 = set()  # page ids where no title or video URL was found

print('start')

ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36'

urls = 'http://www.mxiaobei.com/?id='

# Pass 1: visit each tutorial page and collect its title and video URL.
for index in range(451, 565):
    r = requests.get(urls + str(index), headers={'User-Agent': ua})
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'lxml')
    title = soup.find(name='h2')
    mp4url = soup.find('div', id='cuplayer')
    if title is None or mp4url is None:
        list1.add(index)
        continue
    # Search the player div's text for the first http...mp4 match.
    mpurl = re.search('http.*?mp4', mp4url.text)
    if mpurl is None:
        list1.add(index)
        continue
    dicts[title.text] = mpurl.group()
    #print(index)
    #time.sleep(1)
    #print(title.text + ' : ' + dicts[title.text])
print(dicts)
print(list1)

# Pass 2: stream each video to disk in 1 MiB chunks.
for temp in dicts.items():
    #time.sleep(1)
    r = requests.get(temp[1], stream=True)
    with open(temp[0] + '.mp4', "wb") as mp4:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            if chunk:
                mp4.write(chunk)
    print(temp[0] + ' download complete')
print('end!')
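
The script uses each page title directly as a file name, so a title containing a path separator or another reserved character would make open() fail. Below is a minimal sketch of one way to guard against that. The helper names safe_filename and extract_mp4_url are hypothetical and not part of the original script; the set of characters treated as unsafe is also an assumption.

```python
import re

# Assumed set of unsafe characters: those Windows rejects in file names,
# the POSIX path separator, and control whitespace.
_UNSAFE = re.compile(r'[\\/:*?"<>|\r\n\t]+')


def safe_filename(title, fallback='untitled'):
    """Hypothetical helper: turn a page title into a usable file name."""
    # Replace each run of unsafe characters with one underscore, then trim
    # leading/trailing spaces, dots and underscores.
    cleaned = _UNSAFE.sub('_', title).strip(' ._')
    return cleaned or fallback


def extract_mp4_url(text):
    """Hypothetical helper: first http...mp4 substring in text, or None."""
    match = re.search(r'http.*?mp4', text)
    return match.group() if match else None
```

In the download loop this could be used as open(safe_filename(temp[0]) + '.mp4', "wb"), leaving the rest of the script unchanged.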