Python Notes: Downloading a Web Page with urllib.request

程序员文章站 2022-05-03 21:34:07

Python code

import os
import urllib.parse
import urllib.request

defaultsite = 'blog.csdn.net'
defaultpage = '/weixin_50648794?spm=1010.2135.3001.5343'
defaultlines = 20

def getfile():
    # Prompt for host and page; fall back to the defaults on empty input
    site = input('Enter host => ').strip()
    if not site:
        site = defaultsite
    page = input('Enter page => ').strip()
    if not page:
        page = defaultpage

    # Build the URL; if no file name is entered, derive one from the URL path
    url = 'https://%s%s' % (site, page)
    print('URL : ', url)
    loadfile = input('Enter filename(default directory "./") => ').strip()
    if loadfile:
        loadfile = './' + loadfile
    else:
        (scheme,server,path,parms,query,frag) = urllib.parse.urlparse(url)
        print('Scheme :',scheme,'\n',
              'Server :',server,'\n',
              'Path :',path,'\n',
              'Parms :',parms,'\n',
              'Query :',query,'\n',
              'Frag :',frag)
        # Use the last path segment, percent-encoded, as the file name
        loadfile = os.path.split(path)[-1]
        loadfile = urllib.parse.quote(loadfile)
    print('Site :',site,'\n',
          'Page :',page,'\n',
          'File :',loadfile)

    # Save the page to a local file
    urllib.request.urlretrieve(url,loadfile)
    # Read the saved file back as bytes, closing it when done
    with open(loadfile,'rb') as f:
        lines = f.readlines()

    # Print the first defaultlines lines of the page
    for line in lines[:defaultlines]:
        print(line.decode('utf-8'))

if __name__ == '__main__':
    getfile()

Output

Enter host => 
Enter page => 
URL :  https://blog.csdn.net/weixin_50648794?spm=1010.2135.3001.5343
Enter filename(default directory "./") => 
Scheme : https 
 Server : blog.csdn.net 
 Path : /weixin_50648794 
 Parms :  
 Query : spm=1010.2135.3001.5343 
 Frag : 
Site : blog.csdn.net 
 Page : /weixin_50648794?spm=1010.2135.3001.5343 
 File : weixin_50648794


<!DOCTYPE html>

<html lang="zh-CN">

<head>

    <meta charset="utf-8">

    <link rel="canonical" href="https://blog.csdn.net/weixin_50648794"/>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

    <meta name="renderer" content="webkit"/>

    <meta name="force-rendering" content="webkit"/>

    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/>

    <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no">

    <meta name="report" content='{"pid": "blog", "spm":"1001.2014"}'>

    <meta name="referrer" content="always">

    <meta http-equiv="Cache-Control" content="no-siteapp" /><link rel="alternate" media="handheld" href="#" />

    <meta name="shenma-site-verification" content="5a59773ab8077d4a62bf469ab966a63b_1497598848">

    <meta name="applicable-device" content="pc">

    <link  href="https://g.csdnimg.cn/static/logo/favicon32.ico"  rel="shortcut icon" type="image/x-icon" />

    <title>VVgetsomeair的博客_CSDN博客-python笔记,Oracle笔记,Linux笔记领域博主</title>

    <meta name="description" content="VVgetsomeair擅长python笔记,Oracle笔记,Linux笔记,等方面的知识">

    <script src='//g.csdnimg.cn/tingyun/1.8.3/blog.js' type='text/javascript'></script>
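
The script above saves the page with urllib.request.urlretrieve and then reads the file back to print its first lines. The Python documentation lists urlretrieve among the legacy interfaces that might be deprecated in the future, so a variant built on urllib.request.urlopen may be worth having when no local copy is needed. The following is only a minimal sketch: the helper name head_lines, the User-Agent value, the 10-second timeout, and the errors='replace' decoding policy are all assumptions, not part of the original script.

```python
import itertools
import urllib.request

def head_lines(url, max_lines=20, encoding='utf-8'):
    """Return up to max_lines decoded lines from url (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; some sites reject urllib's default one.
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    # The response is a context manager and iterates over raw byte lines,
    # so islice stops reading once max_lines lines have been collected.
    with urllib.request.urlopen(request, timeout=10) as response:
        raw_lines = list(itertools.islice(response, max_lines))
    # errors='replace' avoids a UnicodeDecodeError on bytes that are not valid text.
    return [raw.decode(encoding, errors='replace').rstrip('\r\n') for raw in raw_lines]

# Possible usage (needs network access):
#     for line in head_lines('https://blog.csdn.net/weixin_50648794'):
#         print(line)
```

Because the trailing newline is stripped from each line, printing the result should not produce the blank line after every line seen in the output above.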

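
When no file name is entered, the script uses the last segment of the URL path as the file name. For a URL whose path is empty or ends in "/", that segment is an empty string, and there is nothing to save the page under. One possible way to guard against this is sketched below; the helper name filename_from_url and the fallback name index.html are assumptions made for illustration.

```python
import posixpath
import urllib.parse

def filename_from_url(url, default='index.html'):
    """Derive a local file name from the path of url (hypothetical helper)."""
    parts = urllib.parse.urlparse(url)
    # parts.path excludes the query string and fragment, so neither ends up in the name.
    name = posixpath.basename(parts.path)
    # A path such as '' or '/blog/' has no last segment; use the assumed default.
    if not name:
        return default
    # Decode, then re-encode with safe='' so an encoded '/' (%2F) stays encoded
    # and an already encoded name is not encoded a second time.
    return urllib.parse.quote(urllib.parse.unquote(name), safe='')
```

With the default page from the script, this returns weixin_50648794, the same name shown in the File line of the output.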