Simulating a Facebook login with Python requests (collecting post data with Python)
程序员文章站
2022-06-29 12:44:33
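Requirements
For work I need to collect post data from FB. At present only posts in FB groups can be collected without logging in; posts by individual users can only be collected after logging in to FB. Analyzing the login flow shows that the body of the POST request contains an encrypted piece of information. The flow looks like this:
- The request URL is: https://www.facebook.com/
- After the username and password are entered, the page redirects to https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110
- The URL above turns out to be the target of a POST request, which carries the login parameters.
In that request only the email parameter is visible and no password parameter appears. However, analyzing the HTML returned by the URL in the first step turns up the relevant fragment.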
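The kind of form inspection described above can be tried offline first. The sketch below is only an illustration: the markup in `SAMPLE_HTML` is made up and is not Facebook's real page, and the field names `token_a` and `token_b` as well as the helper `parse_login_form` are hypothetical. It uses the standard library's `html.parser` to pull out the form's action and its hidden inputs, which is one way to see which extra parameters a login form submits.

```python
from html.parser import HTMLParser

# Made-up markup for illustration; the real page differs and changes over time.
SAMPLE_HTML = """
<form id="login_form" method="post" action="/login/?a=1&amp;b=2">
  <input type="hidden" name="token_a" value="abc123">
  <input type="hidden" name="token_b" value="xyz789">
  <input type="text" name="email">
  <input type="password" name="pass">
</form>
"""


class LoginFormParser(HTMLParser):
    """Collect the action and hidden inputs of the form whose id is "login_form"."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "login_form":
            self.in_form = True
            # HTMLParser already decodes entities such as &amp; in attribute values.
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("type") == "hidden":
            if attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def parse_login_form(page_html):
    """Return (action, hidden_fields); (None, {}) when no login form is found."""
    parser = LoginFormParser()
    parser.feed(page_html)
    parser.close()
    return parser.action, parser.hidden


action, hidden_fields = parse_login_form(SAMPLE_HTML)
print(action)
print(hidden_fields)
```

Running the same function on `response.text` from a real request would show whatever hidden fields the live page carries at that moment; those may differ from anything shown here.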
Implementation
# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = 'https://www.facebook.com'
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) " \
             "Chrome/76.0.3809.87 Safari/537.36"
cookie = 'locale=en_US;'
headers = {
    'user-agent': user_agent,
    'accept-language': 'en-US,en;q=0.5',
    'cookie': cookie
}

# A session keeps the cookies set by the first GET so they are sent with the login POST.
session = requests.session()
session.headers.update(headers)
response = session.get(url=base_url)
html = response.text

# Look for the login form's action URL. Flags belong in re.compile(); the second
# positional argument of pattern.findall() is a start position, not a flag.
pattern = re.compile(r'<form.*?action="(.*?)"', re.S)
action = pattern.findall(html)
action_url = [url for url in action if 'login/device-based' in url]
if action_url:
    # The attribute value is HTML-escaped in the page source.
    action_url = action_url[0].replace('&amp;', '&')
else:
    # Fall back to locating the form with BeautifulSoup.
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form', attrs={'method': 'post', 'id': 'login_form', 'action': True})
    if form:
        action_url = form['action'].replace('&amp;', '&')
if not action_url:
    # Neither method found the form: use the URL observed in the browser.
    print('Get Login Url Error')
    action_url = 'https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110'
login_url = urljoin(base_url, action_url)

data = {
    'email': 'your email or phone number',
    'pass': 'your password'
}
r = session.post(login_url, data=data)
cookies = requests.utils.dict_from_cookiejar(session.cookies)
print(cookies)
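The script ends by printing the session cookies, but it does not say how to tell whether the login worked. One possible check is sketched below. It rests on an assumption that the original post does not state: that a successful login leaves a non-empty `c_user` cookie in the session. The helper names `looks_logged_in` and `cookie_header` are hypothetical, and the cookie values are made up.

```python
def looks_logged_in(cookies):
    """Heuristic check: a non-empty `c_user` cookie is taken as a logged-in session."""
    # Assumption: the site sets `c_user` after a successful login. Verify this
    # against the cookies you actually receive before relying on it.
    return bool(cookies.get("c_user"))


def cookie_header(cookies):
    """Serialize a cookie dict into a value usable in a `Cookie` request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Made-up cookie values, for illustration only.
sample = {"locale": "en_US", "c_user": "100000000000001"}
print(looks_logged_in(sample))
print(cookie_header(sample))
```

In the script, the `cookies` dict returned by `requests.utils.dict_from_cookiejar(session.cookies)` could be passed to `looks_logged_in` directly. The header string is only needed when the cookies are reused outside the session, for example in a separate crawler.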
You can also log in through m.facebook.com; the approach is exactly the same as above:
# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = 'https://m.facebook.com'
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) " \
             "Chrome/76.0.3809.87 Safari/537.36"
cookie = 'locale=en_US;'
headers = {
    'user-agent': user_agent,
    'accept-language': 'en-US,en;q=0.5',
    'cookie': cookie
}

# A session keeps the cookies set by the first GET so they are sent with the login POST.
session = requests.session()
session.headers.update(headers)
response = session.get(url=base_url)
html = response.text

# Look for the login form's action URL. Flags belong in re.compile(); the second
# positional argument of pattern.findall() is a start position, not a flag.
pattern = re.compile(r'<form.*?action="(.*?)"', re.S)
action = pattern.findall(html)
action_url = [url for url in action if 'login/device-based' in url]
if action_url:
    # The attribute value is HTML-escaped in the page source.
    action_url = action_url[0].replace('&amp;', '&')
else:
    # Fall back to locating the form with BeautifulSoup.
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form', attrs={'method': 'post', 'id': 'login_form', 'action': True})
    if form:
        action_url = form['action'].replace('&amp;', '&')
if not action_url:
    # Neither method found the form: use the relative URL observed on the mobile site.
    print('Get Login Url Error')
    action_url = '/login/device-based/regular/login/?refsrc=https%3A%2F%2Fm.facebook.com%2F&lwv=100&refid=8'
login_url = urljoin(base_url, action_url)

data = {
    'email': 'your email or phone number',
    'pass': 'your password'
}
r = session.post(login_url, data=data)
cookies = requests.utils.dict_from_cookiejar(session.cookies)
print(cookies)
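The two scripts differ only in the base URL and the fallback path, so the step that finds the login URL could be factored into one function. The following is a minimal sketch of that idea, not the author's code: `resolve_login_url` and `FORM_ACTION_RE` are hypothetical names, the sample markup is made up, and `html.unescape` is swapped in for the manual entity replacement. It runs without network access.

```python
import re
from html import unescape
from urllib.parse import urljoin

# re.S is passed at compile time so that `.` also matches newlines inside the tag.
FORM_ACTION_RE = re.compile(r'<form.*?action="(.*?)"', re.S)


def resolve_login_url(page_html, base_url, fallback_path):
    """Return the absolute login URL found in page_html, or the fallback."""
    # Attribute values are HTML-escaped in the source, so decode them first.
    candidates = [unescape(a) for a in FORM_ACTION_RE.findall(page_html)]
    matches = [a for a in candidates if "login/device-based" in a]
    action = matches[0] if matches else fallback_path
    # urljoin handles both relative paths and already-absolute URLs.
    return urljoin(base_url, action)


# Made-up markup for illustration; the live page will differ.
sample = (
    '<form id="login_form"\n'
    '      method="post" '
    'action="/login/device-based/regular/login/?login_attempt=1&amp;lwv=110">'
)
print(resolve_login_url(sample, "https://www.facebook.com", "/login/"))
print(resolve_login_url("<html></html>", "https://m.facebook.com", "/login/"))
```

With a helper like this, the desktop and mobile variants reduce to two calls with different `base_url` and `fallback_path` arguments.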
Original article: https://blog.csdn.net/minghao2164/article/details/107077621