
Downloading a Sina Weibo Album


Open Sina Weibo, log in, and open yz's album.

Open Chrome's developer tools and, in the Sources panel, create a + New snippet:

// prompt() returns a string, so parse it into a number first
var timeout = parseInt(prompt("Set timeout (seconds):"), 10);
var count = 0;
var current = location.href;
if (timeout > 0)
    setTimeout('reload()', 1000 * timeout);
else
    location.replace(current);
function reload() {
    setTimeout('reload()', 1000 * timeout);
    count++;
    console.log('Auto-scrolling every ' + timeout + ' s, scroll count: ' + count);
    // Jump to the bottom so the album lazy-loads the next batch of thumbnails
    window.scrollTo(0, document.body.scrollHeight);
}

Right-click the snippet and choose Run. When the page has finished loading all the thumbnails, go to the Elements panel, right-click the body element, and choose Copy element.
Save the result as yz.txt.
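
Before running the full downloader, it is worth confirming that the saved file parses and actually contains the album markup. A minimal sanity check, using the same lxml calls and the same //ul/@group_id XPath as the script below:

from lxml import etree

# Parse the saved body and count the photo groups it contains
html = etree.parse('yz.txt', etree.HTMLParser(encoding='utf-8'))
groups = html.xpath('//ul/@group_id')
print('found', len(groups), 'photo groups')  # should be > 0 if the copy worked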

Then run the following script:

import os

import requests
from lxml import etree

# Parse the saved HTML body with lxml's HTML parser
html = etree.parse('yz.txt', etree.HTMLParser(encoding='utf-8'))
print(type(html))  # <class 'lxml.etree._ElementTree'>

# Collect the group_id attribute of every ul node; each group is one batch of photos
ust = html.xpath('//ul/@group_id')
print(type(ust))  # <class 'list'>

for iul in ust:
    # Use the group_id as the directory name for this batch
    path = str(iul)
    print(path)
    if not os.path.exists(path):
        os.makedirs(path)
    # Select the src of every img inside the ul with this group_id
    output = '//ul[@group_id="' + path + '"]//img/@src'
    print(output)
    lst = html.xpath(output)
    for ili in lst:
        link = str(ili)
        # Thumbnail URLs are protocol-relative; prepend the scheme if missing
        if not link.startswith('https:'):
            link = 'https:' + link
        # Swap the thumbnail path segment for the full-size original
        link = link.replace("/thumb300/", "/large/")
        print(link)
        response = requests.get(link, verify=False)
        # Name the file after the last path segment of the URL
        index = link.rfind('/')
        file_name = path + '/' + link[index + 1:]
        with open(file_name, "wb") as f:
            f.write(response.content)
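
Note that verify=False disables TLS certificate verification, and requests will emit an InsecureRequestWarning for every download. If the warning noise is unwanted, it can be silenced with urllib3's documented disable_warnings helper before the download loop:

import urllib3

# Suppress the InsecureRequestWarning emitted when verify=False is used
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)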

For now this only saves the images; saving videos can be added later.
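
For the record, here is a minimal sketch of what that could look like, reusing the same parse-and-download pattern. The //video/@src XPath is an assumption I have not verified against Weibo's actual album markup, and the videos/ directory name is just illustrative:

import os

import requests
from lxml import etree

html = etree.parse('yz.txt', etree.HTMLParser(encoding='utf-8'))
os.makedirs('videos', exist_ok=True)

# Hypothetical XPath: assumes the saved body exposes video URLs as //video/@src
for src in html.xpath('//video/@src'):
    link = str(src)
    if not link.startswith('https:'):
        link = 'https:' + link
    response = requests.get(link, verify=False)
    # Name the file after the last path segment of the URL
    file_name = 'videos/' + link.rsplit('/', 1)[-1]
    with open(file_name, 'wb') as f:
        f.write(response.content)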

I also tried this one and it works:
Python爬虫——批量爬取微博图片(不使用cookie) (a Python crawler that batch-downloads Weibo images without cookies)
