Python video scraping example
程序员文章站
2023-04-05 13:53:22
Example: scraping the Photoshop video tutorials at http://www.mxiaobei.com/?id=424
    import re
    import time

    import requests
    from bs4 import BeautifulSoup

    dicts = {}     # page title -> MP4 URL
    list1 = set()  # page ids where no title, player, or MP4 URL was found

    print('start')
    ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36'
    urls = 'http://www.mxiaobei.com/?id='

    # Pass 1: visit each page and collect its title and video URL.
    for index in range(451, 565):
        r = requests.get(urls + str(index), headers={'User-Agent': ua})
        r.encoding = 'utf-8'
        soup = BeautifulSoup(r.text, 'lxml')
        title = soup.find(name='h2')
        mp4url = soup.find('div', id='cuplayer')
        if title is None or mp4url is None:
            list1.add(index)
            continue
        mpurl = re.search('http.*?mp4', mp4url.text)
        if mpurl is None:  # player found, but no MP4 link inside it
            list1.add(index)
            continue
        dicts[title.text] = mpurl.group()
        # print(index)
        # time.sleep(1)
        # print(title.text + ' : ' + dicts[title.text])

    print(dicts)
    print(list1)

    # Pass 2: stream each video to disk in 1 MiB chunks.
    for temp in dicts.items():
        # time.sleep(1)
        r = requests.get(temp[1], stream=True)
        with open(temp[0] + '.mp4', "wb") as mp4:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    mp4.write(chunk)
        print(temp[0] + ' download finished')

    print('end!')
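The script above pulls the MP4 link out of the player div with a regular expression and uses the page title directly as the file name. The following is a minimal sketch of those two steps as standalone functions that can be tried without any network access. The names `extract_mp4_url` and `safe_filename`, the slightly stricter pattern, the sample player text, and the set of characters treated as unsafe are all assumptions made for illustration; they are not part of the original script.

```python
import re

# Assumed variation on the script's 'http.*?mp4' pattern: require a scheme
# and a literal ".mp4", and stop at whitespace.
MP4_PATTERN = re.compile(r"https?://\S+?\.mp4")

# Assumed set of unsafe characters: those Windows rejects in file names,
# plus whitespace. Runs of them collapse into a single underscore.
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\s]+')


def extract_mp4_url(player_text):
    """Return the first MP4 URL found in the text, or None if there is none."""
    match = MP4_PATTERN.search(player_text)
    return match.group() if match else None


def safe_filename(title, suffix=".mp4"):
    """Turn a page title into a file name by replacing unsafe characters."""
    cleaned = UNSAFE_CHARS.sub("_", title.strip()).strip("_")
    return (cleaned or "untitled") + suffix


if __name__ == "__main__":
    # Hypothetical player text; the real page markup may differ.
    sample = 'var player = {file: "http://example.com/v/lesson01.mp4", auto: 1};'
    print(extract_mp4_url(sample))
    print(safe_filename("Lesson 1: Intro/Basics"))
```

Returning `None` instead of raising lets the caller record the page id as skipped, which is how the script already handles pages without a player.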
- BeautifulSoup: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/
- Requests: http://cn.python-requests.org/zh_CN/latest/
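The download loop in the script writes chunks straight into the final `.mp4` file, so an interrupted transfer leaves a truncated file that looks complete. One possible variation, sketched below, writes to a temporary file first and renames it only after every chunk has arrived. The helper name `write_chunks_atomically` and the `.part` suffix are hypothetical; the function accepts any iterable of byte chunks, so it can be tried without a network connection.

```python
import os
import tempfile


def write_chunks_atomically(chunks, dest_path):
    """Write byte chunks to a temporary file, then rename it to dest_path.

    Returns the number of bytes written. If anything fails midway, the
    temporary file is removed and dest_path is never created.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temporary file next to the destination so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                if chunk:  # skip empty chunks, as the script does
                    tmp.write(chunk)
                    written += len(chunk)
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written


if __name__ == "__main__":
    # Stand-in for the chunks a streamed response would yield.
    size = write_chunks_atomically([b"abc", b"", b"de"], "demo.mp4")
    print(size)
    os.remove("demo.mp4")
```

With Requests, the same helper could be fed `r.iter_content(chunk_size=1024 * 1024)` after calling `r.raise_for_status()`, so that an HTTP error page is not saved as a video.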