A Detailed Guide to Using Python's urllib Library

程序员文章站 2022-06-25 17:26:33

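urllib is Python's built-in HTTP request library. This article covers three of its modules: the request module urllib.request, the exception-handling module urllib.error, and the URL-parsing module urllib.parse.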

1. Request module: urllib.request

Python 2:

import urllib2
response = urllib2.urlopen('http://httpbin.org/robots.txt')

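Python 3:

import urllib.request
res = urllib.request.urlopen('http://httpbin.org/robots.txt')

The function signature is:

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

The url argument of urlopen() can be either a string or a Request object. (The cafile, capath and cadefault parameters were removed in Python 3.13; pass an ssl.SSLContext as context instead.)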

# url can be a string
import urllib.request

resp = urllib.request.urlopen('http://www.baidu.com')
print(resp.read().decode('utf-8'))  # read() returns the response body as bytes, so decode it to get a str

# url can also be a Request object
import urllib.request

request = urllib.request.Request('http://httpbin.org')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
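A Request object can be built and inspected before anything is sent, which helps when checking what urlopen() would transmit. The following is a minimal offline sketch; the Accept header is an illustrative assumption, not something the article uses.

```python
import urllib.request

# Build a request without sending it; nothing touches the network here.
req = urllib.request.Request(
    'http://httpbin.org/get',
    headers={'Accept': 'application/json'},  # illustrative header, not from the article
)

print(req.full_url)      # http://httpbin.org/get
print(req.host)          # httpbin.org
print(req.get_method())  # GET (no data was supplied)
```

Passing req to urllib.request.urlopen() is what actually performs the request.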

The data parameter: sending a POST request

# coding:utf8
import urllib.request, urllib.parse

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
resp = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(resp.read())
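The two steps hidden in the data argument are URL-encoding the form fields and converting the result to bytes. The sketch below shows both without touching the network; the extra 'note' field is an assumption added to show how special characters are escaped.

```python
import urllib.parse
import urllib.request

form = {'word': 'hello', 'note': 'a b&c'}  # 'note' is an illustrative extra field
encoded = urllib.parse.urlencode(form)     # str, with space and & escaped
data = encoded.encode('utf-8')             # urlopen() requires bytes, not str

# Supplying data switches the default method from GET to POST.
req = urllib.request.Request('http://httpbin.org/post', data=data)

print(encoded)           # word=hello&note=a+b%26c
print(req.get_method())  # POST
```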

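The timeout parameter of urlopen() sets the request timeout in seconds:

# coding:utf8
# Set the request timeout (0.1 seconds here; an exception is raised if it is exceeded)
import urllib.request

resp = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
print(resp.read().decode('utf-8'))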

Response type:

# coding:utf8
# Type of the response object
import urllib.request

resp = urllib.request.urlopen('http://httpbin.org/get')
print(type(resp))  # <class 'http.client.HTTPResponse'>

Response status code and headers:

# coding:utf8
# Response status code and headers
import urllib.request

resp = urllib.request.urlopen('http://www.baidu.com')
print(resp.status)
print(resp.getheaders())  # a list of (name, value) tuples
print(resp.getheader('Server'))  # header names are case-insensitive

200
[('bdpagetype', '1'), ('bdqid', '0xa6d873bb003836ce'), ('cache-control', 'private'), ('content-type', 'text/html'), ('cxy_all', 'baidu+b8704ff7c06fb8466a83df26d7f0ad23'), ('date', 'sun, 21 apr 2019 15:18:24 gmt'), ('expires', 'sun, 21 apr 2019 15:18:03 gmt'), ('p3p', 'cp=" oti dsp cor iva our ind com "'), ('server', 'bws/1.1'), ('set-cookie', 'baiduid=8c61c3a67c1281b5952199e456eec61e:fg=1; expires=thu, 31-dec-37 23:55:55 gmt; max-age=2147483647; path=/; domain=.baidu.com'), ('set-cookie', 'bidupsid=8c61c3a67c1281b5952199e456eec61e; expires=thu, 31-dec-37 23:55:55 gmt; max-age=2147483647; path=/; domain=.baidu.com'), ('set-cookie', 'pstm=1555859904; expires=thu, 31-dec-37 23:55:55 gmt; max-age=2147483647; path=/; domain=.baidu.com'), ('set-cookie', 'delper=0; path=/; domain=.baidu.com'), ('set-cookie', 'bdsvrtm=0; path=/'), ('set-cookie', 'bd_home=0; path=/'), ('set-cookie', 'h_ps_pssid=1452_28777_21078_28775_28722_28557_28838_28584_28604; path=/; domain=.baidu.com'), ('vary', 'accept-encoding'), ('x-ua-compatible', 'ie=edge,chrome=1'), ('connection', 'close'), ('transfer-encoding', 'chunked')]
bws/1.1
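To try status and getheader() without depending on an external site, one option is a throwaway local server. This is only a sketch: HelloHandler and fetch_local are hypothetical names, and the empty ProxyHandler is an assumption made so that proxy environment variables cannot interfere with the loopback request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_local():
    """Start the server, fetch one page, and return (status, content type, body)."""
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = 'http://127.0.0.1:%d/' % server.server_address[1]
        with opener.open(url, timeout=5) as resp:  # closes the response on exit
            return resp.status, resp.getheader('content-type'), resp.read().decode('utf-8')
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(fetch_local())
```

Using the response as a context manager closes the connection even if reading the body fails.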

Using a proxy: urllib.request.ProxyHandler():

# coding:utf8
import urllib.request

# The proxy address, realm, host, username and password below are placeholders
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
resp = opener.open('http://www.example.com/login.html')
print(resp.read())
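Two related patterns are worth knowing: disabling proxies entirely, and installing an opener globally. The sketch below only constructs the objects and sends nothing; the proxy address is a placeholder assumption, not a working proxy.

```python
import urllib.request

# An empty mapping tells ProxyHandler to use no proxy at all, overriding
# any http_proxy / https_proxy environment variables.
no_proxy_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# A mapping from scheme to proxy URL routes that scheme through the proxy.
proxy_handler = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:3128/'})
proxy_opener = urllib.request.build_opener(proxy_handler)

print(proxy_handler.proxies)  # {'http': 'http://127.0.0.1:3128/'}

# install_opener() would make plain urlopen() calls use this opener process-wide:
# urllib.request.install_opener(proxy_opener)
```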

2. Exception-handling module: urllib.error

Exception-handling example 1: catching URLError

# coding:utf8
from urllib import error, request

try:
    resp = request.urlopen('http://www.blueflags.cn')
except error.URLError as e:
    print(e.reason)

Exception-handling example 2: catching HTTPError before URLError

# coding:utf8
from urllib import error, request

try:
    resp = request.urlopen('http://www.baidu.com')
except error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print(e.reason, e.code, e.headers, sep='\n')
except error.URLError as e:
    print(e.reason)
else:
    print('Request succeeded')
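The order of the except clauses matters because HTTPError inherits from URLError. The following offline sketch simulates both failures by raising the exceptions directly; describe, raise_404 and raise_unreachable are hypothetical helpers introduced only for illustration.

```python
from urllib import error

# HTTPError is a subclass of URLError, so it must be caught first;
# otherwise the URLError branch would swallow it.
print(issubclass(error.HTTPError, error.URLError))  # True


def describe(action):
    """Run action() and summarise the outcome."""
    try:
        action()
    except error.HTTPError as e:   # the server answered with an error status
        return 'HTTP error %d: %s' % (e.code, e.reason)
    except error.URLError as e:    # no usable answer (DNS failure, refused connection, ...)
        return 'URL error: %s' % e.reason
    else:
        return 'ok'


def raise_404():
    # Simulated failure so the sketch runs offline.
    raise error.HTTPError('http://example.com/missing', 404, 'Not Found', None, None)


def raise_unreachable():
    raise error.URLError('connection refused')


print(describe(raise_404))          # HTTP error 404: Not Found
print(describe(raise_unreachable))  # URL error: connection refused
print(describe(lambda: None))       # ok
```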

Exception-handling example 3: detecting a timeout

# coding:utf8
import socket, urllib.request, urllib.error

try:
    resp = urllib.request.urlopen('http://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    print(type(e.reason))
    # A connection timeout is reported as a socket.timeout stored in e.reason
    if isinstance(e.reason, socket.timeout):
        print('time out')
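A timeout does not always arrive wrapped in URLError: when it happens while the response is being read rather than while connecting, urlopen() can raise a bare socket.timeout. A hedged sketch of a check that covers both cases follows; is_timeout is a hypothetical helper, and the exceptions are constructed by hand so the example runs offline.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc represents a timeout from urlopen()."""
    # A timeout while reading the response can surface as a bare socket.timeout.
    if isinstance(exc, socket.timeout):
        return True
    # A timeout while connecting is usually wrapped in URLError.reason.
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.timeout)


# Simulated exceptions so the sketch runs offline.
print(is_timeout(urllib.error.URLError(socket.timeout('timed out'))))  # True
print(is_timeout(socket.timeout('timed out')))                          # True
print(is_timeout(urllib.error.URLError('name resolution failed')))      # False
```

In real code this would typically sit behind `except (urllib.error.URLError, socket.timeout) as e:`.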

3. URL-parsing module: urllib.parse

parse.urlencode: encoding form data for a POST request with custom headers

# coding:utf8
from urllib import request, parse

url = 'http://httpbin.org/post'
headers = {
    'Host': 'httpbin.org',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
}
form = {'name': 'thanlon'}  # renamed from dict to avoid shadowing the built-in
data = bytes(parse.urlencode(form), encoding='utf8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "thanlon"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "12",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36"
  },
  "json": null,
  "origin": "117.136.78.194, 117.136.78.194",
  "url": "https://httpbin.org/post"
}

Adding a request header with the add_header() method:

# coding:utf8
from urllib import request, parse

url = 'http://httpbin.org/post'
form = {'name': 'thanlon'}  # renamed from dict to avoid shadowing the built-in
data = bytes(parse.urlencode(form), encoding='utf8')
req = request.Request(url=url, data=data, method='POST')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
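One detail that can surprise when reading headers back from a Request: the object stores header names in capitalized form (only the first letter upper case), so lookups have to use that spelling. A small offline sketch, with an illustrative User-Agent value that is an assumption:

```python
from urllib import request

req = request.Request('http://httpbin.org/get')
req.add_header('User-Agent', 'example-client/1.0')  # illustrative value

# Request stores header names via str.capitalize().
print(req.header_items())            # [('User-agent', 'example-client/1.0')]
print(req.has_header('User-agent'))  # True
print(req.has_header('User-Agent'))  # False
print(req.get_header('User-agent'))  # example-client/1.0
```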

parse.urlparse: splitting a URL into six components

# coding:utf8
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=1#comment')
print(type(result))
print(result)

<class 'urllib.parse.ParseResult'>
ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=1', fragment='comment')

# coding:utf8
# scheme= supplies a default for URLs that carry no scheme of their own.
# Without a leading //, the host name is parsed as part of the path.
from urllib.parse import urlparse

result = urlparse('www.baidu.com/index.html;user?id=1#comment', scheme='https')
print(type(result))
print(result)

<class 'urllib.parse.ParseResult'>
ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='user', query='id=1', fragment='comment')

# coding:utf8
# If the URL already has a scheme, the scheme= argument is ignored
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=1#comment', scheme='https')
print(result)

ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=1', fragment='comment')

# coding:utf8
# With allow_fragments=False the fragment is not split off; it stays in the preceding component
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=1#comment', allow_fragments=False)
print(result)

ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=1#comment', fragment='')
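Beyond the six tuple fields, a ParseResult also exposes derived attributes such as hostname and port, and the query string can be decoded with parse_qs(). The URL in this sketch is an illustrative assumption chosen to exercise every component.

```python
from urllib.parse import parse_qs, urlparse

result = urlparse('https://user:pw@www.example.com:8443/path/index.html?id=1&tag=a&tag=b#top')

print(result.scheme)    # https
print(result.netloc)    # user:pw@www.example.com:8443
print(result.hostname)  # www.example.com
print(result.port)      # 8443
print(result.username)  # user
print(result[2])        # /path/index.html  (index 2 is the path)

# parse_qs() groups repeated keys into lists.
print(parse_qs(result.query))  # {'id': ['1'], 'tag': ['a', 'b']}
```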

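parse.urlunparse: building a URL from its six components

# coding:utf8
from urllib.parse import urlunparse

# Order: scheme, netloc, path, params, query, fragment
data = ['http', 'www.baidu.com', 'index.html', 'user', 'name=thanlon', 'comment']
print(urlunparse(data))  # http://www.baidu.com/index.html;user?name=thanlon#comment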
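Since urlunparse() is the inverse of urlparse(), a common use is to parse a URL, change one or two components, and reassemble it. A minimal sketch, reusing the URL from the urlparse examples:

```python
from urllib.parse import urlparse, urlunparse

original = 'http://www.baidu.com/index.html;user?id=1#comment'
parts = urlparse(original)

# urlunparse() accepts any 6-item iterable, including a ParseResult.
assert urlunparse(parts) == original
assert parts.geturl() == original

# ParseResult is a namedtuple, so _replace() returns a modified copy.
secure = parts._replace(scheme='https', query='id=2')
print(urlunparse(secure))  # https://www.baidu.com/index.html;user?id=2#comment
```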

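parse.urljoin: resolving a URL against a base URL

# coding:utf8
from urllib.parse import urljoin

print(urljoin('http://www.bai.com', 'index.html'))  # http://www.bai.com/index.html
# If the second argument is an absolute URL, it wins and the base is ignored
print(urljoin('http://www.baicu.com', 'https://www.thanlon.cn/index.html'))  # https://www.thanlon.cn/index.html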
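urljoin() resolves relative references the way a browser resolves links on a page. The sketch below covers the common cases; the example.com URLs are illustrative assumptions.

```python
from urllib.parse import urljoin

base = 'http://example.com/a/b/c.html'

# A bare file name replaces the last path segment of the base.
print(urljoin(base, 'd.html'))     # http://example.com/a/b/d.html
# '..' climbs one directory before resolving.
print(urljoin(base, '../d.html'))  # http://example.com/a/d.html
# A leading '/' resolves from the site root.
print(urljoin(base, '/d.html'))    # http://example.com/d.html
```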

urlencode converts a dict into the query string of a GET request:

# coding:utf8
from urllib.parse import urlencode

params = {
    'name': 'thanlon',
    'age': 22
}
baseurl = 'http://www.thanlon.cn?'
url = baseurl + urlencode(params)
print(url)  # http://www.thanlon.cn?name=thanlon&age=22
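urlencode() also takes care of escaping, which matters as soon as values contain spaces or non-ASCII text. The sketch below shows that, along with the related helpers quote(), unquote() and parse_qs(); the parameter names and values are illustrative assumptions.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Spaces become '+', non-ASCII text is UTF-8 encoded and percent-escaped.
print(urlencode({'q': 'hello world', 'lang': '中文'}))
# q=hello+world&lang=%E4%B8%AD%E6%96%87

# doseq=True expands a list value into repeated keys.
print(urlencode({'tag': ['a', 'b']}, doseq=True))
# tag=a&tag=b

# quote() escapes a single path component; '/' is left alone by default.
print(quote('/docs/a b.html'))        # /docs/a%20b.html
print(unquote('%E4%B8%AD%E6%96%87'))  # 中文

# parse_qs() reverses urlencode().
print(parse_qs('q=hello+world&tag=a&tag=b'))
# {'q': ['hello world'], 'tag': ['a', 'b']}
```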

4. Cookies

Getting cookies (to keep login session information):

# coding:utf8
# Collect the cookies the server sets (keeps login session information)
import urllib.request, http.cookiejar

cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
res = opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + '=' + item.value)

Saving cookies with MozillaCookieJar(filename):

# coding:utf8
# Save the cookies to cookie.txt in the Mozilla cookies.txt format
import http.cookiejar, urllib.request

filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
res = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

Saving cookies with LWPCookieJar(filename):

# coding:utf8
# Save the cookies to cookie.txt in the libwww-perl Set-Cookie3 format
import http.cookiejar, urllib.request

filename = 'cookie.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
res = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

Loading saved cookies and sending them with a request, to fetch content that requires the logged-in session:

# coding:utf8
import http.cookiejar, urllib.request

# The jar class must match the format the file was saved in (LWPCookieJar here)
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
resp = opener.open('http://www.baidu.com')
print(resp.read().decode('utf-8'))
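The ignore_discard=True flag in the save() and load() calls above is what keeps session cookies, which have no expiry date and would otherwise be skipped. The sketch below demonstrates this offline with a hand-built cookie; make_session_cookie and round_trip are hypothetical helpers, and all cookie values are illustrative assumptions.

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie():
    # Hand-built cookie so the sketch runs offline; all values are illustrative.
    return http.cookiejar.Cookie(
        version=0, name='session', value='abc123',
        port=None, port_specified=False,
        domain='example.com', domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,  # no expiry: a session cookie
        comment=None, comment_url=None, rest={},
    )


def round_trip(ignore_discard):
    """Save one session cookie, reload the file, and return what survived."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'cookie.txt')
        jar = http.cookiejar.LWPCookieJar(path)
        jar.set_cookie(make_session_cookie())
        jar.save(ignore_discard=ignore_discard, ignore_expires=True)

        loaded = http.cookiejar.LWPCookieJar()
        loaded.load(path, ignore_discard=True, ignore_expires=True)
        return [(c.name, c.value) for c in loaded]


print(round_trip(ignore_discard=True))   # [('session', 'abc123')]
print(round_trip(ignore_discard=False))  # []
```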
