Chapter 12: The Internet - urllib.request: Network Resource Access -- HTTP GET

程序员文章站 2022-07-14 08:45:05


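12.2 urllib.request: Network Resource Access
The urllib.request module provides an API for using Internet resources identified by URLs. It is designed to be extended by individual applications to support new protocols or to add variations of existing protocols (such as handling HTTP basic authentication).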
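
As a rough illustration of that kind of extension, the sketch below builds an opener that can answer HTTP basic authentication challenges. The helper name build_basic_auth_opener, the URL, and the credentials are placeholders chosen for this example and do not come from the original text; no request is sent.

```python
from urllib import request


def build_basic_auth_opener(top_level_url, username, password):
    """Return an opener that can answer HTTP basic auth challenges.

    Hypothetical helper for illustration only.
    """
    password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means "use these credentials for any realm
    # under this URL".
    password_mgr.add_password(None, top_level_url, username, password)
    auth_handler = request.HTTPBasicAuthHandler(password_mgr)
    return request.build_opener(auth_handler)


# Placeholder URL and credentials (assumptions, not from the article).
opener = build_basic_auth_opener('http://localhost:8080/', 'user', 'secret')

# List the handlers installed in the opener to confirm the extension.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

Calling opener.open(url) instead of urlopen(url) would route the request through the extra handler. The default handlers for HTTP, redirects, and errors stay in place.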

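12.2.1 HTTP GET
Note: The test server for these examples is in http_server_GET.py, adapted from the examples for the http.server module. Start the server in one terminal window, then run these examples in another.

An HTTP GET operation is the simplest use of urllib.request. Pass the URL to urlopen() to get a "file-like" handle to the remote data.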

# http_server_GET.py
from http.server import BaseHTTPRequestHandler
from urllib import parse

class GetHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        parsed_path = parse.urlparse(self.path)
        message_parts = [
            'CLIENT VALUES:',
            'client_address={} ({})'.format(
                self.client_address,
                self.address_string()),
            'command={}'.format(self.command),
            'path={}'.format(self.path),
            'real path={}'.format(parsed_path.path),
            'query={}'.format(parsed_path.query),
            'request_version={}'.format(self.request_version),
            '',
            'SERVER VALUES:',
            'server_version={}'.format(self.server_version),
            'sys_version={}'.format(self.sys_version),
            'protocol_version={}'.format(self.protocol_version),
            '',
            'HEADERS RECEIVED:',
            ]
        for name,value in sorted(self.headers.items()):
            message_parts.append(
                '{}={}'.format(name,value.rstrip())
                )
        message_parts.append('')
        message = '\r\n'.join(message_parts)
        self.send_response(200)
        self.send_header('Content-Type',
                         'text/plain;charset=utf-8')
        self.end_headers()
        self.wfile.write(message.encode('utf-8'))

if __name__ == '__main__':
    from http.server import HTTPServer

    server = HTTPServer(('localhost',8080),GetHandler)
    print('Starting server,use <Ctrl-C> to stop')
    server.serve_forever()

from urllib import request

response = request.urlopen('http://localhost:8080/')
print('RESPONSE:',response)
print('URL     :',response.geturl())

headers = response.info()
print('DATE    :',headers['date'])
print('HEADERS :')
print('---------')
print(headers)

data = response.read().decode('utf-8')
print('LENGTH  :',len(data))
print('DATA    :')
print('---------')
print(data)

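The example server takes the incoming values and formats a plain-text response to send back to the client. The return value from urlopen() gives access to the headers from the HTTP server through the info() method, and to the data for the remote resource through methods such as read() and readlines().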
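
For readers who want to try this without opening a second terminal, here is a minimal self-contained sketch. It assumes a small stand-in handler (EchoPathHandler, a hypothetical name, not the GetHandler above) served from a background thread on a port chosen by the operating system, and it reads the response with info() and readlines().

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for GetHandler: echoes the request path."""

    def do_GET(self):
        body = 'path={}\r\n'.format(self.path).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep access-log lines out of the example output


# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:{}/demo?x=1'.format(server.server_port)

# Ignore proxy environment variables so the request goes straight
# to the local server.
request.install_opener(request.build_opener(request.ProxyHandler({})))

with request.urlopen(url) as response:
    status = response.status
    content_type = response.info()['Content-Type']
    lines = response.readlines()  # a list of bytes objects

server.shutdown()
server.server_close()

print(status)
print(content_type)
print(lines)
```

The with statement closes the response when the block exits. The header lookup through info() is case-insensitive, so 'content-type' would work as well.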

The file-like object returned by urlopen() is iterable.

from urllib import request

response = request.urlopen('http://localhost:8080/')
for line in response:
    print(line.decode('utf-8').rstrip())

This example strips the trailing newlines and carriage returns before printing the output.
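
The same iteration pattern can take the character set from the response headers instead of hard-coding it. The sketch below is one possible variation, not the article's own code: it assumes a hypothetical two-line handler (TwoLineHandler) running in a background thread, and falls back to UTF-8 when the server declares no charset.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class TwoLineHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns a fixed two-line body."""

    def do_GET(self):
        body = 'first line\r\nsecond line\r\n'.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep access-log lines out of the example output


# Port 0 asks the operating system for any free port.
server = HTTPServer(('127.0.0.1', 0), TwoLineHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:{}/'.format(server.server_port)

# Ignore proxy environment variables so the request goes straight
# to the local server.
request.install_opener(request.build_opener(request.ProxyHandler({})))

decoded_lines = []
with request.urlopen(url) as response:
    # Use the charset named in Content-Type, falling back to UTF-8.
    charset = response.headers.get_content_charset() or 'utf-8'
    for raw_line in response:
        # rstrip() drops the trailing carriage return and newline.
        decoded_lines.append(raw_line.decode(charset).rstrip())

server.shutdown()
server.server_close()

print(decoded_lines)
```

Iterating line by line avoids holding the whole body in memory as a single string, which can matter for large responses.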
