Simulating login to the SMTH forum with requests
程序员文章站
2024-03-16 14:22:16
Before logging in, press F12 to open Chrome's developer tools, then log in and locate the login JSON request, as shown in the figure. I had never tried a JSON login before, so let's give it a try.
The libraries used are requests, lxml, and time.
First, build the Request Headers. Note the two places marked by the arrows: earlier attempts failed no matter what I did, simply because these two fields were missing from the request headers, especially Connection. The two red boxes inside the Cookie are timestamps, which I generate with int(time.time()). Form Data holds the account and password to POST.
First log in with requests.post and save the cookies from the login response, then use those cookies to visit a specific board. Use lxml's xpath to pick out the topic titles on the board and build the absolute link for each one. The code is as follows:
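The timestamp substitution can be sketched on its own (the cookie keys here are shortened placeholders; the real ones are the long Hm_lvt_/Hm_lpvt_ keys in the full code):

```python
import time

# Minimal sketch of filling the two timestamp slots in the cookie string.
# "Hm_lvt_demo"/"Hm_lpvt_demo" are shortened stand-ins for illustration only.
cookie_template = "Hm_lvt_demo=%d; Hm_lpvt_demo=%d"
now = int(time.time())            # current time in whole seconds since the epoch
cookie = cookie_template % (now, now)
```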
# coding:utf-8
__author__ = 'Administrator'
import time

import requests
from lxml import etree


def build_headers():
    user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/59.0.3071.109 Safari/537.36")
    # The two Hm_* values are timestamps, filled in with int(time.time()).
    # Note: the value copied from DevTools starts with "Cookie:"; that is the
    # header name and must NOT be part of the value itself.
    cookie = ("td_cookie=2405695880; td_cookie=2404190040; nforum-left=00000; "
              "Hm_lvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; "
              "Hm_lpvt_9c7f4d9b7c00cb5aba2c637c64a41567=%d; "
              "main[XWJOKE]=hoho; main[UTMPUSERID]=guest; "
              "main[UTMPKEY]=31535293; main[UTMPNUM]=12707"
              % (int(time.time()), int(time.time())))
    # Connection and X-Requested-With must be present, otherwise the server returns 404
    headers = {
        "User-Agent": user_agent,
        "Referer": "http://www.newsmth.net/nForum/index",
        "Host": "www.newsmth.net",
        "Origin": "http://www.newsmth.net",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Cookie": cookie,
        "Connection": "keep-alive",
        "X-Requested-With": "XMLHttpRequest",
    }
    return headers
if __name__ == '__main__':
    # Put the login information into a dict; replace the two placeholder
    # strings with your own account and password
    login_data = {"id": "yourid", "passwd": "yourpassword",
                  "mode": "0", "CookieDate": "0"}
    login_url = "http://www.newsmth.net/nForum/user/ajax_login.json"
    # Log in with requests.post(url, headers=..., data=...)
    response = requests.post(login_url, headers=build_headers(), data=login_data)
    # Save the cookies from the successful login
    cookie = response.cookies
    # Headers for visiting the board; these differ from the login headers
    user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/59.0.3071.109 Safari/537.36")
    header = {
        "User-Agent": user_agent,
        "Referer": "http://www.newsmth.net/nForum/",
        "Host": "www.newsmth.net",
        "Connection": "keep-alive",
        "X-Requested-With": "XMLHttpRequest",
    }
    # Board URL
    url = "http://www.newsmth.net/nForum/board/CouponsLife"
    p = "ajax:"
    # Fetch the board with requests.get(url, headers=..., cookies=..., params=...)
    res = requests.get(url, headers=header, cookies=cookie, params=p)
    # Build the element tree
    tree = etree.HTML(res.text)
    # Use xpath to find the topic links
    topics = tree.xpath('//table/tbody/tr[not(@class)]/td[@class="title_9"]/a')
    for topic in topics:
        title = topic.text
        url = ("http://www.newsmth.net/nForum/#!"
               + topic.attrib.get('href').split('/nForum/')[1])
        print(title, url)
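The XPath extraction and link rebuilding can be checked offline against a hand-written fragment (a sketch; the table below is a made-up stand-in for the real board page, not actual SMTH markup):

```python
from lxml import etree

# A made-up fragment mimicking the board's table layout, for offline testing.
html = """
<table>
  <tbody>
    <tr><td class="title_9"><a href="/nForum/article/CouponsLife/12345">Sample topic</a></td></tr>
    <tr class="ad"><td class="title_9"><a href="/nForum/article/CouponsLife/99999">Skipped row</a></td></tr>
  </tbody>
</table>
"""
tree = etree.HTML(html)
# Same XPath as in the script: rows without a class attribute, title_9 cells
topics = tree.xpath('//table/tbody/tr[not(@class)]/td[@class="title_9"]/a')
# Rebuild the absolute "#!" links the same way the script does
links = ["http://www.newsmth.net/nForum/#!" + a.attrib["href"].split('/nForum/')[1]
         for a in topics]
```

The `tr[not(@class)]` predicate drops the second row, so only "Sample topic" survives. As a design note, `requests.Session()` would carry the login cookies across requests automatically, instead of passing `cookies=cookie` by hand.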