Web Scraping (Part 4): The requests Module

程序员文章站 2022-11-09 08:56:49


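1. The requests module

1.1 Introduction to requests

requests is a powerful, easy-to-use HTTP library. Compared with the urllib module used earlier in this series, its API is more convenient. Under the hood it is essentially a wrapper around urllib3.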

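It can be installed with pip install requests, but the download often runs into network problems, so I looked for a mirror inside China to speed things up.

I found the Douban mirror:

pip install <package-name> -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

Replace <package-name> with the package you want (here, requests) and the download should be much faster.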

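1.2 Sending requests

There are many HTTP request methods, but we only cover the two most common ones: GET and POST.

1.2.1 GET requests

The get method sends a GET request to the target URL and returns a Response object, which is covered in detail in section 1.3.

get accepts the following parameters:

url: required; the URL to request

params: dict; the query parameters, typically used with GET requests

Example: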

import requests
url = 'http://www.httpbin.org/get'
params = {
    'key1':'value1',
    'key2':'value2'
}
response = requests.get(url=url,params=params)
print(response.text)

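To see what params does without sending anything over the network, the request can be prepared locally and inspected. The following is a minimal sketch, assuming the requests.Request(...).prepare() API; the variable name prepared is illustrative and not from the original article.

```python
import requests

# Build the same request as in the example, but only prepare it; nothing is sent.
url = 'http://www.httpbin.org/get'
params = {
    'key1': 'value1',
    'key2': 'value2',
}
prepared = requests.Request('GET', url, params=params).prepare()

# params is URL-encoded and appended to the URL as a query string.
print(prepared.url)
```

Inspecting the prepared URL this way is handy when a request returns unexpected results and you want to confirm exactly what was asked for.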

headers: dict; the request headers

Example:

import requests
url = 'http://www.httpbin.org/headers'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
response = requests.get(url=url,headers=headers)
print(response.text)

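Sites often treat the library's default User-Agent differently from a browser's, which is why the header is worth overriding. The sketch below compares the default with a custom value; it assumes requests.utils.default_headers() and Session.prepare_request(), and the shortened User-Agent string is only an illustrative value. Nothing is sent over the network.

```python
import requests

# Headers requests uses when none are supplied.
default_headers = requests.utils.default_headers()
print(default_headers['User-Agent'])  # identifies the client as python-requests

# A custom header overrides the session default with the same name.
custom = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)'}
session = requests.Session()
request = requests.Request('GET', 'http://www.httpbin.org/headers', headers=custom)
prepared = session.prepare_request(request)

# Header names are case-insensitive, so either spelling finds the value.
print(prepared.headers['user-agent'])
```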

proxies: dict; the proxies to use, keyed by URL scheme

Example:

import requests
url = 'http://www.httpbin.org/ip'
proxies = {
    # Sample proxy addresses; replace them with proxies you can actually reach.
    'http': 'http://113.116.127.164:8123',
    'https': 'http://113.116.127.164:80'
}
response = requests.get(url=url,proxies=proxies)
print(response.text)

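Because the proxies dict is keyed by URL scheme, each scheme should appear only once; a repeated key in a dict literal silently keeps the last value. The sketch below illustrates both points. It assumes the helper requests.utils.select_proxy, which picks the proxy entry for a URL, and reuses the sample proxy addresses as placeholders. No request is sent.

```python
import requests

# A dict literal keeps only the last value for a repeated key,
# so the first proxy here is silently discarded.
duplicated = {
    'http': 'http://113.116.127.164:8123',
    'http': 'http://113.116.127.164:80',
}
print(duplicated)

# One entry per URL scheme (placeholder addresses).
proxies = {
    'http': 'http://113.116.127.164:8123',
    'https': 'http://113.116.127.164:80',
}

# The entry used depends on the scheme of the URL being requested.
print(requests.utils.select_proxy('http://www.httpbin.org/ip', proxies))
print(requests.utils.select_proxy('https://www.httpbin.org/ip', proxies))
```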

cookies: dict; the cookies to send

Example:

import requests
url = 'http://www.httpbin.org/cookies'
cookies = {
    'name1':'value1',
    'name2':'value2'
}
response = requests.get(url=url,cookies=cookies)
print(response.text)

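The cookies dict ends up as a single Cookie request header. A minimal offline sketch, assuming requests.Request(...).prepare(); the name cookie_header is illustrative:

```python
import requests

cookies = {
    'name1': 'value1',
    'name2': 'value2',
}
prepared = requests.Request(
    'GET', 'http://www.httpbin.org/cookies', cookies=cookies
).prepare()

# All cookies are joined into one header as "name=value" pairs separated by "; ".
cookie_header = prepared.headers['Cookie']
print(cookie_header)
```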

auth: tuple; the username and password for HTTP authentication

Example:

import requests
url = 'http://www.httpbin.org/basic-auth/user/password'
auth = ('user','password')
response = requests.get(url=url,auth=auth)
print(response.text)

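A (username, password) tuple is sent as HTTP Basic authentication, that is, an Authorization header containing the Base64 of "username:password". The sketch below prepares the request locally, without sending it, and decodes the header again to show this; variable names are illustrative.

```python
import base64
import requests

auth = ('user', 'password')
prepared = requests.Request(
    'GET', 'http://www.httpbin.org/basic-auth/user/password', auth=auth
).prepare()

authorization = prepared.headers['Authorization']
print(authorization)

# Base64 is an encoding, not encryption: the credentials are trivially recoverable.
scheme, encoded = authorization.split(' ', 1)
decoded = base64.b64decode(encoded).decode('latin1')
print(scheme, decoded)
```

Since the credentials are only encoded, Basic authentication should be used over HTTPS whenever the target supports it.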

verify: bool; whether to verify the site's TLS certificate. The default is True, which means the certificate is verified; set it to False to skip verification.

import requests
response = requests.get(url='https://www.httpbin.org/', verify=False)


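In this case a warning is usually shown (urllib3 emits an InsecureRequestWarning), because skipping certificate verification is discouraged.

If you would rather not see the warning, you can silence it with:

import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)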
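disable_warnings switches the warning off for the whole process. If only one call should be quiet, the standard warnings module can scope the suppression instead. This is a sketch of that alternative, not something the original article does; insecure_warnings_suppressed and fetch_unverified are hypothetical helper names.

```python
import warnings
from contextlib import contextmanager

import requests
from urllib3.exceptions import InsecureRequestWarning


@contextmanager
def insecure_warnings_suppressed():
    """Ignore InsecureRequestWarning only inside the with-block."""
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', InsecureRequestWarning)
        yield


def fetch_unverified(url, timeout=5):
    """GET a URL without certificate verification and without the warning."""
    with insecure_warnings_suppressed():
        return requests.get(url, verify=False, timeout=timeout)
```

Outside the with-block the warning is reported again, so other unverified requests in the program do not go unnoticed.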

timeout: the timeout in seconds; if no response arrives within that time, an exception (requests.exceptions.Timeout) is raised
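The article gives no example for timeout, so here is a minimal sketch. It assumes the (connect, read) tuple form of timeout and catches requests.exceptions.Timeout, the base class of both ConnectTimeout and ReadTimeout. The function name fetch_or_none and the default values are illustrative choices.

```python
import requests


def fetch_or_none(url, connect_timeout=3.0, read_timeout=10.0):
    """Return the Response, or None if connecting or reading takes too long."""
    try:
        # The tuple sets separate limits for connecting and for waiting on data.
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.Timeout:
        return None


# Example call (not executed here, since it needs network access):
# response = fetch_or_none('http://www.httpbin.org/delay/5', read_timeout=2)
```

Without a timeout, requests waits indefinitely, so a crawler is usually better off always passing one.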

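1.2.2 POST requests

The difference between POST and GET is that POST data travels in the request body, so it does not appear in the address bar and is not subject to URL length limits (servers may still cap the body size).

post therefore accepts nearly all of the parameters that get does. The main difference is that form data is passed through data rather than params.

data: dict; the form fields, typically used with POST requests

Example: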

import requests
url = 'http://www.httpbin.org/post'
data = {
    'key1':'value1',
    'key2':'value2'
}
response = requests.post(url=url,data=data)
print(response.text)

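The difference between params and data is easiest to see side by side. The sketch below prepares one GET and one POST with the same fields, without sending either, and shows where the fields end up; it assumes requests.Request(...).prepare(), and the variable names are illustrative.

```python
import requests

fields = {
    'key1': 'value1',
    'key2': 'value2',
}

get_request = requests.Request('GET', 'http://www.httpbin.org/get', params=fields).prepare()
post_request = requests.Request('POST', 'http://www.httpbin.org/post', data=fields).prepare()

# GET: the fields are in the URL and there is no body.
print(get_request.url)
print(get_request.body)

# POST: the URL is untouched and the fields are form-encoded into the body.
print(post_request.url)
print(post_request.body)
print(post_request.headers['Content-Type'])
```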

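1.3 Responses

1.3.1 Response attributes

A get or post call returns a Response object. Its most commonly used attributes and methods are:

response.url: the URL of the request

response.status_code: the HTTP status code of the response

response.encoding: the encoding used to decode the response body

response.cookies: the cookies set by the response

response.headers: the response headers

response.content: the response body as bytes

response.text: the response body as str, that is, response.content decoded with response.encoding (or with a guessed encoding when response.encoding is None)

response.json(): the response body parsed as JSON (a dict for a JSON object), roughly equivalent to json.loads(response.text)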

import requests
response = requests.get('http://www.httpbin.org/get')
print(type(response))
# <class 'requests.models.Response'>
print(response.url) # the URL of the request
# http://www.httpbin.org/get
print(response.status_code) # the HTTP status code
# 200
print(response.encoding) # the encoding of the response
# None
print(response.cookies) # the cookies set by the response
# <RequestsCookieJar[]>
print(response.headers) # the response headers
# {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Mon, 16 Dec 2019 03:16:22 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '189', 'Connection': 'keep-alive'}
print(type(response.content)) # the response body as bytes
# <class 'bytes'>
print(type(response.text)) # the response body as str
# <class 'str'>
print(type(response.json())) # the response body parsed into a dict
# <class 'dict'>
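How content, text and json() relate can be checked offline. The sketch below builds a Response by hand instead of fetching one; assigning the private _content attribute is a testing shortcut assumed here for illustration, not something to do in real code, and the JSON body is a made-up sample.

```python
import json
import requests

# Hand-built Response so the sketch runs without network access.
response = requests.Response()
response.status_code = 200
response.encoding = 'utf-8'
response._content = '{"args": {"key1": "value1"}}'.encode('utf-8')  # private attribute

# text is content decoded with response.encoding.
assert response.text == response.content.decode(response.encoding)

# json() parses the body, much like json.loads applied to text.
assert response.json() == json.loads(response.text)
print(response.json()['args'])
```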

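1.3.2 Encoding issues

# Encoding issues
import requests
response = requests.get('http://www.autohome.com/news/')
# The autohome page is encoded as GB2312, but requests falls back to ISO-8859-1
# for text responses whose headers declare no charset, so Chinese text is garbled.
# Uncomment the next line to decode the page as GBK instead:
# response.encoding = 'gbk'
print(response.text)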
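The effect of response.encoding can be reproduced without fetching the page. The sketch below decodes the same GBK bytes twice, once with the ISO-8859-1 fallback and once with the correct encoding. The sample text and the hand-built Response (including the private _content attribute) are assumptions made for illustration. requests also exposes response.apparent_encoding, a guess based on the body, which can be assigned instead of a hard-coded name.

```python
import requests

# Bytes as a GBK-encoded page would deliver them (sample text, not fetched).
raw = '汽车之家'.encode('gbk')

# Hand-built Response so the sketch runs offline.
response = requests.Response()
response.status_code = 200
response._content = raw  # private attribute, set only for this demonstration

response.encoding = 'ISO-8859-1'  # the fallback: every byte becomes a Latin-1 character
garbled = response.text

response.encoding = 'gbk'  # the fix: set the real encoding before reading text
fixed = response.text

print(repr(garbled))
print(fixed)
```

text is recomputed on every access, so changing response.encoding after the request has completed is enough; the page does not need to be downloaded again.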
