Simulating login to the SMTH forum with requests
程序员文章站
2024-03-16 14:22:16
Before logging in, press F12 to open Chrome's developer tools, then log in and locate the login JSON request, as shown in the figure. I had never tried a JSON login before, so let's give it a try.
The libraries used are requests, lxml, and time.
First, build the Request Headers. Note the two places marked by the arrows: earlier attempts failed no matter what I did, simply because these two fields were missing from the request headers, especially Connection. The two red boxes inside the Cookie are timestamps, which I generate with int(time.time()). Form Data holds the account and password to POST.
First log in with requests.post and save the cookies from the login response, then use those cookies to visit a specific board. Use lxml's xpath to pick out the topic titles on the board and build the absolute link for each one. The code is as follows:
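The timestamp substitution can be sketched on its own (the cookie keys here are shortened placeholders; the real ones are the long Hm_lvt_/Hm_lpvt_ keys in the full code):

```python
import time

# Minimal sketch of filling the two timestamp slots in the cookie string.
# "Hm_lvt_demo"/"Hm_lpvt_demo" are shortened stand-ins for illustration only.
cookie_template = "Hm_lvt_demo=%d; Hm_lpvt_demo=%d"
now = int(time.time())            # current time in whole seconds since the epoch
cookie = cookie_template % (now, now)
```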
# coding:utf-8
__author__ = 'Administrator'
import time

import requests
from lxml import etree


def build_headers():
    user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/59.0.3071.109 Safari/537.36")
    # The two Hm_* values are timestamps, filled in with int(time.time()).
    # Note: the value copied from DevTools starts with "Cookie:"; that is the
    # header name and must NOT be part of the value itself.
    cookie = ("td_cookie=2405695880; td_cookie=2404190040; nforum-left=00000; "
              "Hm_lvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; "
              "Hm_lpvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; "
              "main[XWJOKE]=hoho; main[UTMPUSERID]=guest; "
              "main[UTMPKEY]=31535293; main[UTMPNUM]=12707"
              % (int(time.time()), int(time.time())))
    # Connection and X-Requested-With must be present, otherwise the server returns 404
    headers = {
        "User-Agent": user_agent,
        "Referer": "http://www.newsmth.net/nForum/index",
        "Host": "www.newsmth.net",
        "Origin": "http://www.newsmth.net",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Cookie": cookie,
        "Connection": "keep-alive",
        "X-Requested-With": "XMLHttpRequest",
    }
    return headers
if __name__ == '__main__':
    # Put the login information into a dict; replace the two placeholder
    # strings with your own account and password
    login_data = {"id": "yourid", "passwd": "yourpassword",
                  "mode": "0", "CookieDate": "0"}
    login_url = "http://www.newsmth.net/nForum/user/ajax_login.json"
    # Log in with requests.post(url, headers=..., data=...)
    response = requests.post(login_url, headers=build_headers(), data=login_data)
    # Save the cookies from the successful login
    cookie = response.cookies
    # Headers for visiting the board; these differ from the login headers
    user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/59.0.3071.109 Safari/537.36")
    header = {
        "User-Agent": user_agent,
        "Referer": "http://www.newsmth.net/nForum/",
        "Host": "www.newsmth.net",
        "Connection": "keep-alive",
        "X-Requested-With": "XMLHttpRequest",
    }
    # Board URL
    url = "http://www.newsmth.net/nForum/board/CouponsLife"
    p = "ajax:"
    # Fetch the board with requests.get(url, headers=..., cookies=..., params=...)
    res = requests.get(url, headers=header, cookies=cookie, params=p)
    # Build the element tree
    tree = etree.HTML(res.text)
    # Use xpath to find the topic links
    topics = tree.xpath('//table/tbody/tr[not(@class)]/td[@class="title_9"]/a')
    for topic in topics:
        title = topic.text
        url = ("http://www.newsmth.net/nForum/#!"
               + topic.attrib.get('href').split('/nForum/')[1])
        print(title, url)
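The XPath extraction and link rebuilding can be checked offline against a hand-written fragment (a sketch; the table below is a made-up stand-in for the real board page, not actual SMTH markup):

```python
from lxml import etree

# A made-up fragment mimicking the board's table layout, for offline testing.
html = """
<table>
  <tbody>
    <tr><td class="title_9"><a href="/nForum/article/CouponsLife/12345">Sample topic</a></td></tr>
    <tr class="ad"><td class="title_9"><a href="/nForum/article/CouponsLife/99999">Skipped row</a></td></tr>
  </tbody>
</table>
"""
tree = etree.HTML(html)
# Same XPath as in the script: rows without a class attribute, title_9 cells
topics = tree.xpath('//table/tbody/tr[not(@class)]/td[@class="title_9"]/a')
# Rebuild the absolute "#!" links the same way the script does
links = ["http://www.newsmth.net/nForum/#!" + a.attrib["href"].split('/nForum/')[1]
         for a in topics]
```

The `tr[not(@class)]` predicate drops the second row, so only "Sample topic" survives. As a design note, `requests.Session()` would carry the login cookies across requests automatically, instead of passing `cookies=cookie` by hand.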