Python web scraping: the Requests library

程序员文章站 2022-07-14 11:18:44

...

Official Chinese reference manual for Requests


Installing Requests

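Enter the following command in a console:
Windows: pip install requests
Linux: sudo pip install requests (or pip install --user requests, which installs for the current user only and needs no sudo)
I use VS, so I ran the command in VS's console.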


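If you also get an "access denied" error, give yourself administrator permissions on the folder that was denied. Then run the installation again, more than once if needed. After I got the permissions, my install still failed once and only succeeded on the next attempt. After a successful install, restart the IDE before using the library.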
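One quick way to confirm the installation is to import the library and print its version. Do this from the same interpreter the IDE uses, for example after restarting it. If the import fails, Requests is not installed for that interpreter. Running python -m pip install requests is one way to make sure pip installs into the interpreter you actually run.

```python
# Minimal installation check: an ImportError here means Requests is not
# installed for the interpreter that runs this script.
import requests

version = requests.__version__
print("Requests version:", version)
```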


A general framework for fetching a page's source

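import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        print(r.status_code)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # decode with the encoding guessed from the content
        return r.text
    except requests.RequestException:  # HTTP errors, timeouts, connection failures
        return "An exception occurred"

print(getHTMLText("http://www.baidu.com"))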
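
You can try the framework without depending on an external site. The sketch below starts a throwaway local server with Python's standard http.server module. The handler class, the paths "/" and "/missing", and the page text are all made up for illustration. It then calls a copy of the function, renamed get_html_text and without the debug print. The first call hits a path that returns 200; the second hits a path that returns 404.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: "/" returns a small page, any other path a 404."""

    def do_GET(self):
        if self.path == "/":
            status, body = 200, b"<html><title>demo</title></html>"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def get_html_text(url):
    """Same logic as getHTMLText, minus the debug print of the status code."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # a 4xx/5xx status raises requests.HTTPError
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:  # HTTP errors, timeouts, connection failures
        return "An exception occurred"


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

ok_page = get_html_text(base + "/")
missing_page = get_html_text(base + "/missing")
print(ok_page)       # the page source
print(missing_page)  # the fallback string
server.shutdown()
server.server_close()
```

Catching requests.RequestException rather than using a bare except keeps the fallback limited to network and HTTP problems. Bugs in your own code still surface as normal tracebacks.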

The Response object

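Attribute	Description
r.status_code	Status code of the HTTP response: 200 means success; 404 (like other 4xx/5xx codes) means the request failed
r.text	The HTTP response body as a string, i.e. the page source
r.encoding	The encoding taken from the HTTP response headers (the charset in Content-Type); for a text/* response without a charset, Requests falls back to ISO-8859-1. r.text is decoded with this encoding, so to change how r.text is decoded, set this attribute
r.apparent_encoding	The encoding inferred by analyzing the response content (a fallback encoding)
r.content	The HTTP response body as bytes; read this attribute when downloading binary data such as images
r.request	The request object that was sent to the server; r.request.headers shows the request headers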
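The sketch below reads each attribute from the table on a real Response. It fetches a page from a throwaway local server rather than a public site. The handler class and the page text are hypothetical. The server declares charset=utf-8, so r.encoding comes straight from the header.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "<p>百度一下，你就知道</p>"  # sample page text, made up for this demo


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that always returns PAGE as UTF-8 HTML."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

r = requests.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
print(r.status_code)                    # HTTP status code
print(r.encoding)                       # charset taken from the Content-Type header
print(r.text)                           # body decoded with r.encoding
print(r.content)                        # raw bytes of the body
print(r.request.headers["User-Agent"])  # a header Requests sent on our behalf
server.shutdown()
server.server_close()
```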

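Method	Description
r.raise_for_status()	Raises requests.HTTPError if the status code indicates an error (4xx or 5xx); other codes, such as 200, 204, or 3xx, do not raise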
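raise_for_status() only checks whether the status code falls in the 4xx or 5xx range. It does not simply test for "not 200". The sketch below checks this against a hypothetical local server that answers GET /<code> with that status code. It passes allow_redirects=False so that the 302 itself is inspected instead of the page it points to.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: GET /<code> answers with that status code."""

    def do_GET(self):
        code = int(self.path.lstrip("/"))
        self.send_response(code)
        if code == 302:
            self.send_header("Location", "/200")
        if code != 204:  # a 204 response must not include a Content-Length header
            self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"


def raises_for(code):
    """Return True if raise_for_status() raises for this status code."""
    r = requests.get(f"{base}/{code}", timeout=5, allow_redirects=False)
    try:
        r.raise_for_status()
        return False
    except requests.HTTPError:
        return True


results = {code: raises_for(code) for code in (200, 204, 302, 404, 500)}
print(results)  # {200: False, 204: False, 302: False, 404: True, 500: True}
server.shutdown()
server.server_close()
```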

The 7 main methods of the Requests library

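Method	Description	Returns
requests.request()	Constructs and sends a request; each of the methods below is a wrapper around it	A Response object
requests.get()	The main method for fetching an HTML page; corresponds to HTTP GET	A Response object
requests.head()	Fetches a page's header information; corresponds to HTTP HEAD	A Response object
requests.post()	Submits a POST request to a page; corresponds to HTTP POST	A Response object
requests.put()	Submits a PUT request that replaces the current resource; corresponds to HTTP PUT	A Response object
requests.patch()	Submits a request that partially modifies the resource; corresponds to HTTP PATCH	A Response object
requests.delete()	Requests deletion of the current resource; corresponds to HTTP DELETE	A Response object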
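To see that each wrapper sends the matching HTTP method and returns a Response, the sketch below calls them all against a hypothetical local server. The server simply replies with the method name it received. The form data is made up. The last call makes the same PUT through requests.request() directly.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class MethodEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that replies with the HTTP method it received."""

    def _reply(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume any request body
        body = self.command.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":  # a HEAD response carries headers only
            self.wfile.write(body)

    do_GET = do_HEAD = do_POST = do_PUT = do_PATCH = do_DELETE = _reply

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), MethodEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/resource"

form = {"key": "value"}  # sample form data for the methods that send a body
responses = {
    "GET": requests.get(url, timeout=5),
    "HEAD": requests.head(url, timeout=5),
    "POST": requests.post(url, data=form, timeout=5),
    "PUT": requests.put(url, data=form, timeout=5),
    "PATCH": requests.patch(url, data=form, timeout=5),
    "DELETE": requests.delete(url, timeout=5),
}
# The same PUT through the general function the wrappers are built on
via_request = requests.request("PUT", url, data=form, timeout=5)

for method, r in responses.items():
    print(method, type(r).__name__, r.status_code, repr(r.text))
print("request('PUT', ...):", via_request.text)
server.shutdown()
server.server_close()
```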

Fetching a page's source with requests.get()

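Code

import requests
r = requests.get("http://www.baidu.com")
print("Response status code: {0}".format(r.status_code))
print("Header encoding: {0}".format(r.encoding))
print("Apparent encoding: {0}".format(r.apparent_encoding))
r.encoding = r.apparent_encoding  # Baidu's header encoding (ISO-8859-1) garbles the Chinese text, so switch to the apparent encoding
print(r.text)

Output

Response status code: 200
Header encoding: ISO-8859-1
Apparent encoding: utf-8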
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
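
The output above shows a common trap. The server sends UTF-8 bytes, but its Content-Type header names no charset. Requests therefore falls back to ISO-8859-1, and r.text comes out garbled until r.encoding is replaced. The sketch below reproduces this with a hypothetical local server whose page text is made up. Because r.text is decoded from r.content on each access, reassigning r.encoding changes the result immediately.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical page: UTF-8 bytes, but the Content-Type header names no charset
PAGE = (
    '<html><head><meta charset="utf-8"><title>百度一下，你就知道</title></head>'
    "<body><p>新闻 地图 视频 贴吧 登录 更多产品 关于百度 使用百度前必读 意见反馈</p></body></html>"
)


class NoCharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # no charset parameter
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), NoCharsetHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
r = requests.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)

header_encoding = r.encoding      # ISO-8859-1: fallback for text/* without a charset
garbled = r.text                  # UTF-8 bytes decoded as ISO-8859-1 (mojibake)
r.encoding = r.apparent_encoding  # use the encoding guessed from the body instead
fixed = r.text                    # decoded again with the new encoding
print(header_encoding, r.apparent_encoding)
print(fixed)
server.shutdown()
server.server_close()
```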

requests.request() in detail

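Of the seven main methods in Requests, the other six are all wrappers around request(). This section therefore focuses on request(); once it is clear, the others follow.
Signature: requests.request(method, url, **kwargs)
method: the HTTP method, e.g. GET/POST/HEAD/PUT/DELETE/PATCH/OPTIONS
url: the URL to request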

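Optional **kwargs parameter	Description
params	Dict or bytes, appended to the URL as query-string parameters
data	Dict, bytes, or file object, sent as the body of the request
json	Data sent in JSON format (for example, pass a dict)
headers	Dict of HTTP headers
cookies	Dict or CookieJar, the cookies sent with the request
auth	Tuple, for HTTP authentication (e.g. a (username, password) pair for Basic auth)
files	Dict, for uploading files (each value is a file object; multiple files can be sent)
timeout	Timeout, in seconds
proxies	Dict mapping protocols (http, https, etc.) to proxy servers, so each protocol can use a different proxy
allow_redirects	True/False, whether to follow redirects; default True
stream	True/False; with the default False the response body is downloaded immediately, with True it is downloaded only when accessed
cert	Path to a local SSL client certificate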
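A few of these parameters can be checked by sending requests.request() calls to a hypothetical local server that reports back, as JSON, what it received. The sketch covers params, headers, json, data, and timeout. The "/search" path, the query values, and the User-Agent string are all made up. The two calls contrast json= with data=: the first sends a JSON body, the second a URL-encoded form.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that reports back, as JSON, what it received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received = {
            "path": self.path,  # includes the query string built from params
            "user_agent": self.headers.get("User-Agent"),
            "content_type": self.headers.get("Content-Type"),
            "body": self.rfile.read(length).decode("utf-8"),
        }
        body = json.dumps(received).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/search"

# json=: the dict is serialized and sent with Content-Type: application/json
as_json = requests.request(
    "POST",
    url,
    params={"wd": "python", "pn": 10},         # becomes ?wd=python&pn=10
    headers={"User-Agent": "my-crawler/0.1"},  # made-up User-Agent value
    json={"keyword": "requests"},
    timeout=5,                                 # seconds
).json()

# data= with a dict: sent as a URL-encoded form instead
as_form = requests.request("POST", url, data={"keyword": "requests"}, timeout=5).json()

print(as_json)
print(as_form)
server.shutdown()
server.server_close()
```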

The material above comes from Professor Song Tian's MOOC course, Python Web Crawler and Information Extraction.
Course materials (including code examples for the optional **kwargs parameters)
Password: dvrr
