Python Crawler: Scraping Audio Files #For learning purposes only
程序员文章站
2022-05-04 14:08:04
import os
from urllib import parse, request

import requests
from lxml import etree

# Page that lists the audio tracks
url = 'https://www.ximalaya.com/lishi/4164479/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
response = requests.get(url, headers=headers)
# print(response)
html = response.text
html_ele = etree.HTML(html)
# Relative links to the individual tracks
mp_list = html_ele.xpath('//ul[@class="dOi2"]/li/div[2]/a/@href')
# print(mp_list)
# Iterate over the relative (tail) links of the 春秋 (Spring and Autumn) tracks
for mp in mp_list:
    # print(mp)
    data = parse.urljoin(url, mp)
    # print(data)
    # The last path segment of the link is used as the track ID
    data_url_str = data.split('/')[-1]
    # Address that returns the audio information
    data_url = 'https://www.ximalaya.com/revision/play/tracks?trackIds=' + str(data_url_str)
    # print(data_url)
    response = requests.get(data_url, headers=headers)
    # print(response.text)
    # print(type(response.text))
    # Parse the response body directly as JSON
    data_str = response.json()
    # print(type(data_str))
    # Get the m4a address and the track name
    m4a_url = data_str['data']['tracksForAudioPlay'][0]['src']
    m4a_name = data_str['data']['tracksForAudioPlay'][0]['trackName']
    # print(m4a_url)
    # print(m4a_name)
    # Create the Down folder if it does not exist yet
    if not os.path.exists('Down'):
        os.mkdir('Down')
    filename = 'Down/' + m4a_name + '.m4a'
    # print(filename)
    # Download (the message is printed first, since urlretrieve blocks until the file is saved)
    print('Downloading ' + m4a_name + ' from ' + m4a_url + ' ...')
    request.urlretrieve(m4a_url, filename)
    print('---' * 50)
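The script above assumes that every link ends in a track ID, that every JSON response contains a usable `src`, and that every track name is a valid file name. The following is a minimal sketch of three small helpers that guard against those cases. The helper names (`track_id_from_href`, `safe_filename`, `extract_track`) are hypothetical and not part of the original script, and the JSON layout is simply the one the script already relies on, not a documented API.

```python
import os
import re


def track_id_from_href(href):
    """Return the last path segment of a link, ignoring a trailing slash."""
    return href.rstrip('/').split('/')[-1]


def safe_filename(name, ext='.m4a', directory='Down'):
    """Build a path under `directory`, replacing characters that are
    not allowed in file names on common file systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', name).strip() or 'untitled'
    return os.path.join(directory, cleaned + ext)


def extract_track(payload):
    """Return (src, trackName) from the assumed JSON layout,
    or None when the track list or the src field is missing."""
    tracks = (payload.get('data') or {}).get('tracksForAudioPlay') or []
    if not tracks or not tracks[0].get('src'):
        return None
    return tracks[0]['src'], tracks[0].get('trackName', 'untitled')
```

Inside the loop, one could call `extract_track(response.json())`, skip the track with `continue` when it returns `None`, and pass the name through `safe_filename` before calling `request.urlretrieve`.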