Python Internship Day 7: Code
程序员文章站
2022-06-23 09:38:37
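Downloading songs from NetEase Cloud Music
Over the past two days we learned both XPath and BeautifulSoup, so now everyone should pick whichever suits them best for parsing HTML.
Personally, I prefer parsing with BeautifulSoup.
Today our instructor walked us through scraping a fairly challenging site: NetEase Cloud Music (网易云音乐).
First search for a song and you will see a long list of tracks. You can click the artist under a track to see that artist's albums.
OK, now let's implement downloading NetEase Cloud Music member songs.
1. Download a track and save it as an .mp3 file
The real link is not easy to find on this site. Press F12 and open the Network tab to analyze the requests. Notice that the URL in the address bar contains a # sign. The # is a separator: everything after it is the URL fragment, which the browser does not send to the server. In the Network tab, first look at the Doc request for the part before the #, then at the Doc request for the part after it.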
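The role of the # can be checked offline with the standard library. The following is a minimal sketch, not the instructor's code: the helper name `song_urls` is made up for illustration, and the outer download URL pattern is simply the one quoted in the notes at the top of today's script.

```python
from urllib.parse import urlsplit, parse_qs


def song_urls(browser_url):
    """Derive the song id, the real page URL and the outer media URL
    from the URL shown in the browser's address bar."""
    parts = urlsplit(browser_url)
    # Everything after '#' is the fragment, e.g. '/song?id=468490571'.
    inner = urlsplit(parts.fragment)
    song_id = parse_qs(inner.query)['id'][0]
    # The page content is requested from the path that follows the '#'.
    page_url = f'{parts.scheme}://{parts.netloc}{inner.path}?{inner.query}'
    # Download URL pattern taken from the notes in the script.
    media_url = f'http://music.163.com/song/media/outer/url?id={song_id}.mp3'
    return song_id, page_url, media_url


song_id, page_url, media_url = song_urls('https://music.163.com/#/song?id=468490571')
print(song_id)
print(page_url)
print(media_url)
```

This is why the script requests https://music.163.com/playlist?id=... directly rather than the address-bar form with /#/ in it.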
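Each request should go out with different request headers (here, a randomly chosen User-Agent) so that the script is disguised as an ordinary browser.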
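The script below rebuilds nearly the same headers dictionary in three functions. One way to factor that out is sketched here; `build_headers` and the shortened `USER_AGENTS` pool are illustrative names, not part of the original code, and the header values are copied from the script.

```python
import random

# A shortened, illustrative pool; the full list appears in the script.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',
]


def build_headers(user_agents=USER_AGENTS, host=None):
    """Return request headers with a randomly chosen User-Agent."""
    headers = {
        'referer': 'http://music.163.com/',
        'User-Agent': random.choice(user_agents),
    }
    # Only set Host when the caller knows which server will answer.
    if host:
        headers['Host'] = host
    return headers


print(build_headers(host='music.163.com'))
```

Each call picks a new User-Agent, so consecutive requests do not all look identical.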
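There are also some NetEase Cloud Music API endpoints that we can use.
Analyzing a web page like this really is a skill of its own. You have to think it through yourself; I can only give a few pointers here.
Here is today's code: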
'''
1. Web page: find the path of a single song
   Clicking a single song in the song list: href=/song?id=468490571
   https://music.163.com/#/song?id=468490571
   Download:
   http://music.163.com/song/media/outer/url?id=468490571.mp3
   Lyrics download:
   Try downloading the content of one song
   https://music.163.com/#/song?id=468490571
2.
'''
import random
import re
import time

import requests
from bs4 import BeautifulSoup

user_agents = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
]


def get_resource(url, params=None, flag='html'):
    """Fetch url; return text for flag='html', bytes for flag='media', else None."""
    headers = {
        'referer': 'http://music.163.com/',
        'Host': 'music.163.com',
        'User-Agent': random.choice(user_agents),
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
    }
    # Send the request with requests
    response = requests.get(url=url, params=params, headers=headers)
    print(response.status_code)
    # Check the status code of the response
    if response.status_code == 200:
        # Choose the return type according to flag
        if flag == 'html':
            return response.text
        elif flag == 'media':
            return response.content
    else:
        print('Failed to fetch the resource!')
    # Reached on a non-200 status or an unknown flag
    return None


# Get the song list with bs4
def get_music_list(html):
    soup = BeautifulSoup(html, 'lxml')
    a_list = soup.find('ul', class_='f-hide').find_all('a')
    # Song list: each entry is [song_name, song_id]
    song_list = []
    for a_tag in a_list:
        # href looks like /song?id=468490571; keep the part after the last '='
        song_id = a_tag.get('href').rsplit('=', 1)[-1]
        song_name = a_tag.text
        song = [song_name, song_id]
        # Append to the list
        song_list.append(song)
    return song_list


# Wrapper: download one song
def download_music(url, songname):
    headers = {
        'referer': 'http://music.163.com/',
        # 'Host': 'm10.music.126.net',
        'User-Agent': random.choice(user_agents),
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
    }
    # Send the request with requests
    response = requests.get(url=url, headers=headers)
    if response.status_code == 200:
        resource = response.content
        if resource:
            # Save locally (the music/ directory must already exist)
            with open('music/{}.mp3'.format(songname), 'wb') as fw:
                fw.write(resource)
            print('Downloaded: {}.mp3'.format(songname))
        else:
            print('Failed to download {}'.format(songname))
    else:
        print('Resource request failed!')


# Get the lyrics
def get_lyric(lyric_url, songname):
    # Get the response object
    headers = {
        'referer': 'http://music.163.com/',
        # 'Host': 'm10.music.126.net',
        'User-Agent': random.choice(user_agents),
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
    }
    response = requests.get(lyric_url, headers=headers)
    # Check the status code
    if response.status_code == 200:
        # json_data is a dict
        json_data = response.json()
        lrc = json_data.get('lrc')  # lrc = {'key': 'value'}
        if lrc and lrc.get('lyric'):
            lyric = lrc.get('lyric')
            # Clean up: strip the bracketed tags with a regular expression.
            # Non-greedy, so text between two tags on one line is kept.
            final_lyric = re.sub(r'\[.*?\]', '', lyric)
            # Save locally (the musicword/ directory must already exist)
            with open('musicword/{}.txt'.format(songname), 'w', encoding="utf-8") as fw:
                fw.write(final_lyric)
            print('Saved lyrics for {}'.format(songname))
        else:
            print('Failed to parse the lyrics!')
    else:
        print('Lyrics request failed!')


if __name__ == '__main__':
    # The file/, music/ and musicword/ directories must exist before running.
    url = 'https://music.163.com/playlist?id=3134064854'
    # Fetch the playlist page
    html = get_resource(url=url)
    if html is None:
        raise SystemExit('Could not fetch the playlist page')
    # Save a local copy
    with open('file/music.html', 'w', encoding="utf-8") as fw:
        fw.write(html)
    # Start parsing
    song_list = get_music_list(html)
    print(song_list)
    # http://music.163.com/song/media/outer/url?id=536622304.mp3
    for song in song_list:  # ['name', 'id']
        url = 'http://music.163.com/song/media/outer/url?id={}.mp3'.format(song[1])
        print(url, song[0])
        # Download the song
        download_music(url, song[0])
        # Download the lyrics
        lyric_url = 'http://music.163.com/api/song/lyric?id={}&lv=1&kv=1&tv=-1'.format(song[1])
        get_lyric(lyric_url, song[0])
        # Pause 1 to 5 seconds between songs
        time.sleep(random.randint(1, 5))
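
The script uses the song title directly as a file name, so a title containing a character such as / would make open() fail. A possible guard is sketched below; `safe_filename` is a hypothetical helper that is not in the original script, and the set of replaced characters is an assumption based on what Windows and Unix file systems commonly reject.

```python
import re


def safe_filename(name, replacement='_'):
    """Replace characters that are commonly invalid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name).strip()
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or 'untitled'


print(safe_filename('AC/DC: Back in Black?'))
```

It could be applied to song[0] before calling download_music and get_lyric.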
Cracking NetEase Cloud Music directly is very hard, so we simply go through the API instead.
Original article: https://blog.csdn.net/chengqige/article/details/107323524
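To see what the lyric clean-up does without touching the network, here is a small offline sketch. The payload shape (`lrc` containing `lyric`) follows what get_lyric reads. The sample text, the `extract_lyric` helper and the narrower timestamp pattern are illustrative assumptions, not the real API output or the instructor's code.

```python
import json
import re

# Matches LRC-style time tags such as [00:05.20]; other bracketed text is kept.
TIMESTAMP = re.compile(r'\[\d{1,2}:\d{2}(?:[.:]\d{1,3})?\]')


def extract_lyric(payload):
    """Return plain lyric text from a lyric-API-shaped dict, or '' if absent."""
    lrc = payload.get('lrc') or {}
    raw = lrc.get('lyric') or ''
    text = TIMESTAMP.sub('', raw)
    # Drop blank lines left behind after removing the tags.
    return '\n'.join(line.strip() for line in text.splitlines() if line.strip())


# Hypothetical sample payload, made up for this example.
sample = json.loads('{"lrc": {"lyric": "[00:01.00]Hello\\n[00:05.20][00:30.10]World\\n"}}')
print(extract_lyric(sample))
```

Unlike the pattern in the script, which removes every bracketed tag, this variant only removes tags shaped like timestamps and returns an empty string instead of raising when the lyric is missing.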