
Simulating a login to the smth forum with requests

程序员文章站 2024-03-16 14:22:16

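Before logging in, press F12 to open Chrome's developer tools, then log in and find the login JSON request, as shown in the screenshot. I had never tried a JSON login before, so here is an attempt.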


The libraries used are requests, lxml, and time.

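First, build the Request Headers. Note the two places marked by arrows in the screenshot (Connection and X-Requested-With): the login kept failing until I added both to the request headers, Connection in particular. The two red-boxed values in the Cookie are timestamps, so I generate them with int(time.time()). Form Data holds the account ID and password to POST.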
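The timestamp substitution described above can be sketched in isolation. This is a minimal illustration, not the author's code: the helper name `build_timestamp_cookie` and its optional `now` parameter are hypothetical, added only so the result can be checked with a fixed value. The two cookie names come from the script in this post.

```python
import time


def build_timestamp_cookie(now=None):
    """Return the two Hm_* cookie fields filled with a Unix timestamp.

    `now` is a hypothetical parameter for testing; when omitted,
    the current time is used, as in the original script.
    """
    ts = int(time.time()) if now is None else int(now)
    return (
        "Hm_lvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; "
        "Hm_lpvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d" % (ts, ts)
    )


if __name__ == "__main__":
    print(build_timestamp_cookie())
```

Reading the clock once and reusing the value keeps both fields identical even if the call happens to straddle a second boundary.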

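Log in with requests.post and save the cookies from the login response, then use those cookies to request a specific board. lxml's xpath picks out the topic titles on the board, and each link is joined into an absolute URL. The code is as follows: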

# coding:utf-8
__author__ = 'Administrator'

import requests
from lxml import etree
import time

def build_headers():
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36"
    # The two Hm_* values are Unix timestamps, so read the clock once and fill both in
    now = int(time.time())
    # Header value only: the "Cookie:" name belongs in the dict key, not in the string
    cookie = "td_cookie=2405695880; td_cookie=2404190040; nforum-left=00000; Hm_lvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; Hm_lpvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; main[XWJOKE]=hoho; main[UTMPUSERID]=guest; main[UTMPKEY]=31535293; main[UTMPNUM]=12707" % (now, now)
    # Connection and X-Requested-With must be included, otherwise the server returns 404
    headers = {"User-Agent": user_agent, "Referer": "http://www.newsmth.net/nForum/index",
               "Host": "www.newsmth.net", "Origin": "http://www.newsmth.net",
               "Accept": "application/json, text/javascript, */*; q=0.01", "Cookie": cookie,
               "Connection": "keep-alive", "X-Requested-With": "XMLHttpRequest"}
    return headers


if __name__ == '__main__':
    # Placeholders: replace with your own account ID and password
    yourid = "your_id"
    yourpassword = "your_password"
    # Store the login form fields in a dict
    login_data = {"id": yourid, "passwd": yourpassword, "mode": "0", "CookieDate": "0"}
    login_url = "http://www.newsmth.net/nForum/user/ajax_login.json"
    # requests.post(url, headers=..., data=...)
    response = requests.post(login_url, headers=build_headers(), data=login_data)
    # Save the cookies returned by the login
    cookie = response.cookies
    # Headers for the board request; these differ from the login headers
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36"
    header = {"User-Agent": user_agent, "Referer": "http://www.newsmth.net/nForum/",
              "Host": "www.newsmth.net", "Connection": "keep-alive", "X-Requested-With": "XMLHttpRequest"}
    # Board URL
    url = "http://www.newsmth.net/nForum/board/CouponsLife"
    p = "ajax:"
    # requests.get(url, headers=..., cookies=..., params=...)
    res = requests.get(url, headers=header, cookies=cookie, params=p)
    # Build the element tree
    tree = etree.HTML(res.text)
    # Use XPath to find the topic links (renamed from `list` to avoid shadowing the builtin)
    topics = tree.xpath('//table/tbody/tr[not(@class)]/td[@class="title_9"]/a')
    for topic in topics:
        title = topic.text
        # Join the relative href into an absolute URL
        topic_url = "http://www.newsmth.net/nForum/#!" + topic.attrib.get('href').split('/nForum/')[1]
        print(title, topic_url)
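
As a variant of the script above, the cookie hand-off between the login and the board request can be left to requests.Session, which stores cookies from each response and sends them with later requests. This is a sketch under assumptions, not the author's method: the function names `build_login_request` and `fetch_board` and the 10-second timeout are hypothetical, and it has not been verified against the live site. The URLs, form fields, header values, and the "ajax:" query string are copied from the script.

```python
import requests

LOGIN_URL = "http://www.newsmth.net/nForum/user/ajax_login.json"
BOARD_URL = "http://www.newsmth.net/nForum/board/CouponsLife"

# Header values taken from the original script.
COMMON_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36",
    "Referer": "http://www.newsmth.net/nForum/index",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Connection": "keep-alive",
    "X-Requested-With": "XMLHttpRequest",
}


def build_login_request(session, user_id, password):
    """Prepare (but do not send) the login POST so it can be inspected."""
    form = {"id": user_id, "passwd": password, "mode": "0", "CookieDate": "0"}
    request = requests.Request("POST", LOGIN_URL, data=form)
    return session.prepare_request(request)


def fetch_board(user_id, password):
    """Log in, then fetch the board page through the same session."""
    with requests.Session() as session:
        session.headers.update(COMMON_HEADERS)
        # timeout=10 is an assumed value, not from the original script
        login = session.send(build_login_request(session, user_id, password), timeout=10)
        login.raise_for_status()
        # Cookies set by the login response now live in session.cookies
        # and are sent automatically with the next request.
        board = session.get(BOARD_URL, params="ajax:", timeout=10)
        board.raise_for_status()
        return board.text
```

Calling `fetch_board("your_id", "your_password")` would return the board HTML, which can then be passed to etree.HTML exactly as in the script. Preparing the login request separately makes it possible to check the encoded form body without touching the network.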