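Pitfalls of sending POST requests with Scrapy in Python
Sending POST requests with requests
First, let's see how convenient it is to send a POST request with requests.
The simple API of requests means that every HTTP request type is obvious. For example, this is how you send an HTTP POST request: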
    >>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
The data argument accepts a dict, and it also accepts a sequence of tuples, which is useful when one key has several values:
    >>> payload = (('key1', 'value1'), ('key1', 'value2'))
    >>> r = requests.post('http://httpbin.org/post', data=payload)
    >>> print(r.text)
    {
      ...
      "form": {
        "key1": [
          "value1",
          "value2"
        ]
      },
      ...
    }
Passing JSON looks like this:
    >>> import json
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, data=json.dumps(payload))
New in version 2.4.2, the json parameter does the encoding for you:
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, json=payload)
In other words, you do not need to transform the parameters yourself. Just decide whether to use data= or json=, and requests takes care of the rest.
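To make the difference between data= and json= concrete, here is a minimal standard-library sketch of roughly what ends up in the request body in each case. The helper names form_body and json_body are made up for this illustration, and this is not how requests is implemented internally; it only approximates the resulting body text.

```python
import json
from urllib.parse import urlencode


def form_body(data):
    """Form-encode a dict or a sequence of (key, value) pairs, similar to data=."""
    items = data.items() if isinstance(data, dict) else data
    return urlencode(list(items))


def json_body(payload):
    """Serialize a payload to a JSON string, similar to json=."""
    return json.dumps(payload)


print(form_body({'key': 'value'}))                          # key=value
print(form_body((('key1', 'value1'), ('key1', 'value2'))))  # key1=value1&key1=value2
print(json_body({'some': 'data'}))                          # {"some": "data"}
```

The two bodies are different wire formats, so a server that expects one will usually not understand the other.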
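Sending POST requests with Scrapy
The source code shows that Scrapy sends GET requests by default. When we need to send a request that carries parameters in its body, or to log in, we need a POST request. Take the following as an example: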
    import json

    import scrapy


    class Lagou(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            # formdata plays the role of data= in requests, but it takes
            # key/value pairs whose keys and values must all be strings.
            yield scrapy.FormRequest(
                url='https://www.******.com/jobs/positionajax.json?city=%e5%b9%bf%e5%b7%9e&needaddtionalresult=false',
                formdata={
                    'first': 'true',  # the bool True is not accepted here; requests accepts it
                    'pn': '1',  # the int 1 is not accepted here; requests accepts it
                    'kd': 'python',
                },
                callback=self.parse,
            )

        def parse(self, response):
            datas = json.loads(response.body.decode())['content']['positionresult']['result']
            for data in datas:
                print(data['companyfullname'] + str(data['positionid']))
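Since formdata values have to be strings, it can help to normalize a parameter dict before handing it to FormRequest. The following is a small sketch of that idea. The helper name to_formdata is hypothetical, and rendering booleans as lowercase 'true'/'false' is an assumption based on the 'first': 'true' value in the example above; other endpoints may expect something else.

```python
def to_formdata(params):
    """Return a copy of params in which every value is a string."""
    formdata = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # str(True) would give 'True'; the example endpoint uses lowercase.
            formdata[key] = 'true' if value else 'false'
        else:
            formdata[key] = str(value)
    return formdata


print(to_formdata({'first': True, 'pn': 1, 'kd': 'python'}))
# {'first': 'true', 'pn': '1', 'kd': 'python'}
```

In a spider, the result could then be passed as formdata=to_formdata(params).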
The approach recommended in the official docs: Using FormRequest to send data via HTTP POST
    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]
This uses FormRequest and passes the parameters through formdata, which, as you can see, is also a dict.
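But here comes a really nasty pitfall. I spent a whole afternoon on it today: however I sent the request this way, something went wrong, and the data returned was never what I wanted.
    return scrapy.FormRequest(url, formdata=payload)
After searching online for a long time, I finally found an approach that works: sending the request with scrapy.Request returns the data correctly. The likely reason is that the endpoint expected a JSON body, whereas FormRequest always form-encodes formdata.
Reference: Send Post Request in Scrapy
    my_data = {'field1': 'value1', 'field2': 'value2'}
    request = scrapy.Request(
        url,
        method='POST',
        body=json.dumps(my_data),
        headers={'Content-Type': 'application/json'},
    )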
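If several requests in a spider need a JSON body, the three arguments above can be bundled in one place. This is only a sketch: json_post_kwargs is a hypothetical helper name, and it builds a plain dict using nothing but the standard library, so it can be checked without Scrapy installed.

```python
import json


def json_post_kwargs(data):
    """Build the keyword arguments for a JSON POST request."""
    return {
        'method': 'POST',
        'body': json.dumps(data),
        'headers': {'Content-Type': 'application/json'},
    }


kwargs = json_post_kwargs({'field1': 'value1', 'field2': 'value2'})
print(kwargs['body'])
# In a spider: yield scrapy.Request(url, callback=self.parse, **kwargs)
```

Newer Scrapy releases also provide a JsonRequest class in scrapy.http that serves the same purpose, if the installed version includes it.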
Differences between FormRequest and Request
The documentation shows almost no difference between the two:
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
That is, FormRequest adds one new parameter, formdata, which accepts a dict or an iterable of tuples containing form data and converts it into the request body. FormRequest also inherits from Request:
    class FormRequest(Request):

        def __init__(self, *args, **kwargs):
            formdata = kwargs.pop('formdata', None)
            if formdata and kwargs.get('method') is None:
                kwargs['method'] = 'POST'

            super(FormRequest, self).__init__(*args, **kwargs)

            if formdata:
                items = formdata.items() if isinstance(formdata, dict) else formdata
                querystr = _urlencode(items, self.encoding)
                if self.method == 'POST':
                    self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                    self._set_body(querystr)
                else:
                    self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

    ###

    def _urlencode(seq, enc):
        values = [(to_bytes(k, enc), to_bytes(v, enc))
                  for k, vs in seq
                  for v in (vs if is_listlike(vs) else [vs])]
        return urlencode(values, doseq=1)
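In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v', and when formdata is supplied without an explicit method, the method defaults to POST. Now look at Request: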
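The encoding step can be reproduced with the standard library to see both the resulting body and why non-string values fail. This is a sketch, not Scrapy's code: encode_formdata is a hypothetical name, the to_bytes helper here is a simplified stand-in assumed to behave like the one FormRequest relies on (it accepts only str or bytes), and list or tuple values stand in for Scrapy's is_listlike check.

```python
from urllib.parse import urlencode


def to_bytes(text, encoding='utf-8'):
    """Accept only str or bytes and return bytes (simplified stand-in)."""
    if isinstance(text, bytes):
        return text
    if not isinstance(text, str):
        raise TypeError(f'expected str or bytes, got {type(text).__name__}')
    return text.encode(encoding)


def encode_formdata(formdata, encoding='utf-8'):
    """Form-encode a dict or an iterable of (key, value) pairs."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    values = [
        (to_bytes(k, encoding), to_bytes(v, encoding))
        for k, vs in items
        for v in (vs if isinstance(vs, (list, tuple)) else [vs])
    ]
    return urlencode(values, doseq=True)


print(encode_formdata({'key': 'value', 'k': 'v'}))  # key=value&k=v

try:
    encode_formdata({'pn': 1})  # an int value is rejected
except TypeError as exc:
    print('TypeError:', exc)
```

This is why the earlier example writes 'pn': '1' and 'first': 'true' as strings.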
    class Request(object_ref):

        def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                     cookies=None, meta=None, encoding='utf-8', priority=0,
                     dont_filter=False, errback=None, flags=None):

            self._encoding = encoding  # this one has to be set first
            self.method = str(method).upper()
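The default method is GET, but that does not matter: passing method='POST' still sends a POST request. This reminds me of request in the requests library, the base function on which its other request helpers are built.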
    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
        :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How many seconds to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'http://httpbin.org/get')
          <Response [200]>
        """

        # By using the 'with' statement we are sure the session is closed, thus we
        # avoid leaving sockets open which can trigger a ResourceWarning in some
        # cases, and look like a memory leak in others.
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)
That is all for this article. I hope it helps with your learning.