APP爬虫之mitmdump的使用（待改）

程序员文章站 2022-06-02 21:49:19

...

1. 准备工作

安装mitmproxy和mitmdump（windows下不可以使用mitmproxy）
手机设置代理，端口8080
配置好mitmproxy的CA证书
mongoDB安装运行

2. 编写脚本 script.py

3. 运行mitmdump

Mitmdump -s script.py

4. 操作手机便可得到对应输出

import json
import pymongo
from urllib.parse import unquote
import re

client = pymongo.MongoClient('localhost', 27017)
db = client['jd']
comments_collection = db['comments']
products_collection = db['products']


def response(flow):
    global comments_collection, products_collection
    # 提取评论数据
    url = 'api.m.jd.com/client.action'
    if url in flow.request.url:
        pattern = re.compile('sku\".*?\"(\d+)\"')
        # Request请求参数中包含商品ID
        body = unquote(flow.request.text)
        # 提取商品ID
        id = re.search(pattern, body).group(1) if re.search(pattern, body) else None
        # 提取Response Body
        text = flow.response.text
        data = json.loads(text)
        comments = data.get('commentInfoList') or []
        # 提取评论数据
        for comment in comments:
            if comment.get('commentInfo') and comment.get('commentInfo').get('commentData'):
                info = comment.get('commentInfo')
                text = info.get('commentData')
                date = info.get('commentDate')
                nickname = info.get('userNickName')
                pictures = info.get('pictureInfoList')
                print(id, nickname, text, date)
                comments_collection.insert({
                    'id': id,
                    'text': text,
                    'date': date,
                    'nickname': nickname,
                    'pictures': pictures
                })

    url = 'cdnware.m.jd.com'
    if url in flow.request.url:
        text = flow.response.text
        data = json.loads(text)
        if data.get('wareInfo') and data.get('wareInfo').get('basicInfo'):
            info = data.get('wareInfo').get('basicInfo')
            id = info.get('wareId')
            name = info.get('name')
            images = info.get('wareImage')
            print(id, name, images)
            products_collection.insert({
                'id': id,
                'name': name,
                'images': images
            })

代码可能不能正常运行，可将数据库的存储操作封装为一个接口，然后在主程序体内调用

上一篇： Linux下安装Keepalived及原理分析

下一篇： python_ssh h3c 路由器