App crawling: using mitmdump (to be revised)
程序员文章站
2022-06-02 21:49:19
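1. Preparation
- Install mitmproxy and mitmdump (older releases of the interactive mitmproxy console do not run on Windows; mitmdump does)
- Set the phone's proxy to the computer running mitmdump, port 8080
- Install and trust the mitmproxy CA certificate on the phone
- Install and start MongoDB
2. Write the script, script.py
3. Run mitmdump
mitmdump -s script.py
4. Operate the app on the phone to get the corresponding output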
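Before running the full script, it can help to confirm that the phone's traffic actually reaches mitmdump. The following is a minimal sketch, not part of the original script: the file name check.py and the helper describe are hypothetical, and it assumes only that mitmdump calls a module-level response(flow) function for each completed response.

```python
# check.py (hypothetical file name); run with: mitmdump -s check.py


def describe(flow):
    """Build a one-line summary of a captured request/response pair."""
    request = flow.request
    return f"{request.method} {flow.response.status_code} {request.url}"


def response(flow):
    """Hook called by mitmdump once for every completed HTTP response."""
    print(describe(flow))
```

If no lines appear while using the app, the proxy settings or the CA certificate on the phone are the first things to re-check.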
# script.py: mitmdump script that stores JD comments and product info in MongoDB
import json
import re
from urllib.parse import unquote

import pymongo

client = pymongo.MongoClient('localhost', 27017)
db = client['jd']
comments_collection = db['comments']
products_collection = db['products']

# The product ID appears in the request body as "sku":"<digits>"
SKU_PATTERN = re.compile(r'sku".*?"(\d+)"')


def response(flow):
    # Extract comment data
    if 'api.m.jd.com/client.action' in flow.request.url:
        # The request parameters contain the product ID
        body = unquote(flow.request.text or '')
        match = SKU_PATTERN.search(body)
        product_id = match.group(1) if match else None
        # Parse the response body
        data = json.loads(flow.response.text)
        comments = data.get('commentInfoList') or []
        for comment in comments:
            info = comment.get('commentInfo')
            if info and info.get('commentData'):
                text = info.get('commentData')
                date = info.get('commentDate')
                nickname = info.get('userNickName')
                pictures = info.get('pictureInfoList')
                print(product_id, nickname, text, date)
                # insert() was removed in PyMongo 4; use insert_one()
                comments_collection.insert_one({
                    'id': product_id,
                    'text': text,
                    'date': date,
                    'nickname': nickname,
                    'pictures': pictures
                })
    # Extract product data
    if 'cdnware.m.jd.com' in flow.request.url:
        data = json.loads(flow.response.text)
        info = (data.get('wareInfo') or {}).get('basicInfo')
        if info:
            product_id = info.get('wareId')
            name = info.get('name')
            images = info.get('wareImage')
            print(product_id, name, images)
            products_collection.insert_one({
                'id': product_id,
                'name': name,
                'images': images
            })
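The product-ID step in the script (URL-decode the request body, then search it with a regular expression) can be tried on its own, without a phone or a proxy. This is a small sketch under assumptions: the helper name extract_sku and the sample request body are made up for illustration, and real JD request bodies may look different.

```python
import re
from urllib.parse import unquote

# Same pattern as in the script: the digits that follow "sku" in the body
SKU_PATTERN = re.compile(r'sku".*?"(\d+)"')


def extract_sku(raw_body):
    """Return the product ID from a URL-encoded request body, or None."""
    body = unquote(raw_body or "")
    match = SKU_PATTERN.search(body)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical body; it decodes to: body={"sku":"100012043978"}
    sample = "body=%7B%22sku%22%3A%22100012043978%22%7D"
    print(extract_sku(sample))  # prints 100012043978
```

Keeping the extraction in a pure function like this makes it easy to adjust the pattern if the app changes its request format.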
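The code may not run as-is. One option is to wrap the database storage operations in a separate interface and call it from the main script body.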
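One possible shape for such a storage interface is sketched below. It is an illustration, not the author's implementation: the class name Store and its save method are hypothetical, and the only assumption about the collection is that it exposes insert_one, as PyMongo collections do.

```python
class Store:
    """Thin wrapper around a collection-like object that has insert_one()."""

    def __init__(self, collection):
        self.collection = collection

    def save(self, record):
        """Insert one record; return False instead of raising on failure."""
        try:
            # Copy the dict so the caller's record is not modified
            # (PyMongo adds an _id field to the document it inserts).
            self.collection.insert_one(dict(record))
            return True
        except Exception as exc:
            # Keep the proxy script running even if the database is down.
            print("store failed:", exc)
            return False


# Possible wiring inside script.py (requires pymongo and a running MongoDB):
#   import pymongo
#   db = pymongo.MongoClient('localhost', 27017)['jd']
#   comments_store = Store(db['comments'])
#   comments_store.save({'id': '123', 'text': 'example'})
```

With this in place, response(flow) only calls save, so a database error is reported without interrupting the capture.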