App Crawling: Using mitmdump (to be revised)

程序员文章站 2022-06-02 21:49:19

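1. Preparation

  • Install mitmproxy and mitmdump (the mitmproxy console itself cannot be used on Windows, so use mitmdump there)
  • Set the proxy on the phone to the computer running mitmdump, port 8080
  • Install and trust the mitmproxy CA certificate on the phone
  • Install MongoDB and make sure it is running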
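
Before moving on, it can help to confirm that the services from the list above are actually listening. The following is a minimal sketch, not part of the original script: `port_is_open` is a hypothetical helper, 27017 is the MongoDB port the script connects to, and 8080 is the proxy port from the list (it will only be reachable once mitmdump has been started in step 3).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == '__main__':
    # Assumed setup: MongoDB and mitmdump both run on this machine
    for name, port in [('MongoDB', 27017), ('mitmdump', 8080)]:
        state = 'reachable' if port_is_open('127.0.0.1', port) else 'not reachable'
        print(f'{name} on port {port}: {state}')
```

A port that reports "not reachable" usually means the service has not been started yet or is bound to a different port.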

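2. Write the script, script.py

3. Run mitmdump

    mitmdump -s script.py

4. Use the app on the phone; the corresponding output appears in the mitmdump console

The contents of script.py: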

import json
import pymongo
from urllib.parse import unquote
import re

client = pymongo.MongoClient('localhost', 27017)
db = client['jd']
comments_collection = db['comments']
products_collection = db['products']


def response(flow):
    # mitmdump calls this hook once for every completed HTTP response
    # Extract comment data
    url = 'api.m.jd.com/client.action'
    if url in flow.request.url:
        # Raw string, so that \d reaches the regex engine unchanged
        pattern = re.compile(r'sku".*?"(\d+)"')
        # The request parameters contain the product ID
        body = unquote(flow.request.text or '')
        # Extract the product ID (None if the pattern does not match)
        match = pattern.search(body)
        product_id = match.group(1) if match else None
        # Parse the response body
        data = json.loads(flow.response.text)
        comments = data.get('commentInfoList') or []
        # Store every comment that carries text
        for comment in comments:
            info = comment.get('commentInfo') or {}
            if info.get('commentData'):
                text = info.get('commentData')
                date = info.get('commentDate')
                nickname = info.get('userNickName')
                pictures = info.get('pictureInfoList')
                print(product_id, nickname, text, date)
                # Collection.insert() was removed in PyMongo 4; use insert_one()
                comments_collection.insert_one({
                    'id': product_id,
                    'text': text,
                    'date': date,
                    'nickname': nickname,
                    'pictures': pictures
                })

    # Extract product data
    url = 'cdnware.m.jd.com'
    if url in flow.request.url:
        data = json.loads(flow.response.text)
        info = (data.get('wareInfo') or {}).get('basicInfo')
        if info:
            product_id = info.get('wareId')
            name = info.get('name')
            images = info.get('wareImage')
            print(product_id, name, images)
            # Collection.insert() was removed in PyMongo 4; use insert_one()
            products_collection.insert_one({
                'id': product_id,
                'name': name,
                'images': images
            })
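
The parsing steps in the script can be tried without a phone, mitmdump, or MongoDB by moving them into plain functions. The sketch below assumes the same field names the script reads; `extract_sku`, `parse_comments`, and the sample request body and response are hypothetical and only illustrate the expected shape of the data.

```python
import re
from urllib.parse import unquote

# Same pattern as in the script, written as a raw string
SKU_PATTERN = re.compile(r'sku".*?"(\d+)"')


def extract_sku(raw_body):
    """Return the product ID found in a URL-encoded request body, or None."""
    match = SKU_PATTERN.search(unquote(raw_body or ''))
    return match.group(1) if match else None


def parse_comments(data, product_id=None):
    """Turn a decoded comment response into a list of flat records."""
    records = []
    for comment in data.get('commentInfoList') or []:
        info = comment.get('commentInfo') or {}
        if not info.get('commentData'):
            continue  # skip entries without comment text
        records.append({
            'id': product_id,
            'text': info.get('commentData'),
            'date': info.get('commentDate'),
            'nickname': info.get('userNickName'),
            'pictures': info.get('pictureInfoList'),
        })
    return records


# Hypothetical sample data, shaped after the fields the script reads
sample_body = 'body=%7B%22sku%22%3A%22100012345%22%7D'
sample_response = {
    'commentInfoList': [
        {'commentInfo': {'commentData': 'Works well', 'commentDate': '2022-06-01',
                         'userNickName': 'j***d', 'pictureInfoList': []}},
        {'commentInfo': {}},  # no comment text, so this entry is skipped
    ]
}

sku = extract_sku(sample_body)
print(sku)
print(parse_comments(sample_response, sku))
```

Keeping the parsing separate from the mitmdump hook means a change in the response format can be checked against a saved sample before running a live capture.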

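The script may not run as-is. If it does not, try wrapping the database storage operations in a separate interface and calling that interface from the main body of the program.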
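
One possible shape for such a storage interface is sketched here. It is an illustration under assumptions, not the author's implementation: `Storage` and `InMemoryCollection` are hypothetical names, and the only thing the wrapper expects from a collection is an `insert_one(document)` method, which PyMongo collections provide.

```python
class Storage:
    """Hypothetical wrapper: the mitmdump hook calls these methods instead of PyMongo."""

    def __init__(self, comments, products):
        # Any objects with an insert_one(document) method work here,
        # for example db['comments'] and db['products'] from pymongo.
        self.comments = comments
        self.products = products

    def save_comment(self, product_id, text, date, nickname, pictures):
        self.comments.insert_one({
            'id': product_id,
            'text': text,
            'date': date,
            'nickname': nickname,
            'pictures': pictures,
        })

    def save_product(self, product_id, name, images):
        self.products.insert_one({
            'id': product_id,
            'name': name,
            'images': images,
        })


class InMemoryCollection:
    """Stand-in for a PyMongo collection, so the wrapper can be tried without MongoDB."""

    def __init__(self):
        self.docs = []

    def insert_one(self, document):
        self.docs.append(dict(document))


# Made-up values, only to show the calls
storage = Storage(InMemoryCollection(), InMemoryCollection())
storage.save_product('100012345', 'Sample product', ['front.jpg'])
storage.save_comment('100012345', 'Works well', '2022-06-01', 'j***d', [])
print(storage.products.docs)
print(storage.comments.docs)
```

In script.py the wrapper would be created once at module level, for instance as `Storage(db['comments'], db['products'])`, and `response()` would call `save_comment` and `save_product` instead of touching the collections directly.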