
Python Notes: Downloading a Web Page with urllib.request

程序员文章站 2022-05-03 21:34:07

Python code

Download code

import os
import urllib.parse
import urllib.request

defaultsite = 'blog.csdn.net'
defaultpage = '/weixin_50648794?spm=1010.2135.3001.5343'
defaultlines = 20

def getfile():
    # Prompt for host and page; fall back to the defaults on empty input
    site = input('Enter host => ').strip()
    if not site:
        site = defaultsite
    page = input('Enter page => ').strip()
    if not page:
        page = defaultpage

    # Build the URL, then derive a file name from its path if none is given
    url = 'https://%s%s' % (site, page)
    print('URL : ', url)
    loadfile = input('Enter filename(default directory "./") => ').strip()
    if loadfile:
        loadfile = './' + loadfile
    else:
        # No file name entered: use the last component of the URL path
        (scheme,server,path,parms,query,frag) = urllib.parse.urlparse(url)
        print('Scheme :',scheme,'\n',
              'Server :',server,'\n',
              'Path :',path,'\n',
              'Parms :',parms,'\n',
              'Query :',query,'\n',
              'Frag :',frag)
        loadfile = os.path.split(path)[-1]
        loadfile = urllib.parse.quote(loadfile)
    print('Site :',site,'\n',
          'Page :',page,'\n',
          'File :',loadfile)

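    # Save the page to a local file, then read it back as raw bytes
    urllib.request.urlretrieve(url,loadfile)
    with open(loadfile,'rb') as f:
        lines = f.readlines()

    # Print the first `defaultlines` lines, decoded as UTF-8
    for line in lines[:defaultlines]:
        print(line.decode('utf-8'))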

if __name__ == '__main__':
    getfile()
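
The file-name step in getfile() can be tried on its own, without any network access. The following is a minimal sketch, not part of the original script: the helper name `filename_from_url` and the `fallback` value `index.html` are assumptions introduced here to cover URLs whose path ends in `/`, where `os.path.split` returns an empty tail.

```python
import os
import urllib.parse


def filename_from_url(url, fallback='index.html'):
    """Derive a local file name from the last component of the URL path.

    `filename_from_url` and `fallback` are hypothetical names for this sketch;
    they are not part of urllib.
    """
    parts = urllib.parse.urlparse(url)
    # os.path.split() returns (head, tail); tail is '' for paths like '/' or ''
    name = os.path.split(parts.path)[-1]
    # Percent-encode characters such as spaces, as the script above does
    name = urllib.parse.quote(name)
    return name or fallback


if __name__ == '__main__':
    print(filename_from_url(
        'https://blog.csdn.net/weixin_50648794?spm=1010.2135.3001.5343'))
    print(filename_from_url('https://blog.csdn.net/'))
```

The query string never reaches the file name because `urlparse` keeps it in a separate field from the path.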

Output

Enter host => 
Enter page => 
URL :  https://blog.csdn.net/weixin_50648794?spm=1010.2135.3001.5343
Enter filename(default directory "./") => 
Scheme : https 
 Server : blog.csdn.net 
 Path : /weixin_50648794 
 Parms :  
 Query : spm=1010.2135.3001.5343 
 Frag : 
Site : blog.csdn.net 
 Page : /weixin_50648794?spm=1010.2135.3001.5343 
 File : weixin_50648794


<!DOCTYPE html>

<html lang="zh-CN">

<head>

    <meta charset="utf-8">

    <link rel="canonical" href="https://blog.csdn.net/weixin_50648794"/>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

    <meta name="renderer" content="webkit"/>

    <meta name="force-rendering" content="webkit"/>

    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/>

    <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no">

    <meta name="report" content='{"pid": "blog", "spm":"1001.2014"}'>

    <meta name="referrer" content="always">

    <meta http-equiv="Cache-Control" content="no-siteapp" /><link rel="alternate" media="handheld" href="#" />

    <meta name="shenma-site-verification" content="5a59773ab8077d4a62bf469ab966a63b_1497598848">

    <meta name="applicable-device" content="pc">

    <link  href="https://g.csdnimg.cn/static/logo/favicon32.ico"  rel="shortcut icon" type="image/x-icon" />

    <title>VVgetsomeair的博客_CSDN博客-python笔记,Oracle笔记,Linux笔记领域博主</title>

    <meta name="description" content="VVgetsomeair擅长python笔记,Oracle笔记,Linux笔记,等方面的知识">

    <script src='//g.csdnimg.cn/tingyun/1.8.3/blog.js' type='text/javascript'></script>
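
The script decodes every line as UTF-8, which works here because the page declares `charset=utf-8`. For pages in other encodings, one possible variation is to read the charset from the response headers and to replace undecodable bytes rather than raise. This is only a sketch under those assumptions: `decode_lines` and `fetch_head` are hypothetical helper names, and `fetch_head` needs network access, so only the offline helper is exercised at the bottom.

```python
import urllib.request


def decode_lines(raw_lines, charset='utf-8', limit=20):
    """Decode up to `limit` byte lines, replacing undecodable bytes."""
    return [line.decode(charset, errors='replace').rstrip('\r\n')
            for line in raw_lines[:limit]]


def fetch_head(url, limit=20):
    """Fetch `url` and return its first `limit` lines as text (needs network)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        # get_content_charset() returns None when the header names no charset
        charset = resp.headers.get_content_charset() or 'utf-8'
        return decode_lines(resp.readlines(), charset, limit)


if __name__ == '__main__':
    # Offline demonstration on bytes resembling the start of the page above
    sample = [b'<!DOCTYPE html>\n', b'<html lang="zh-CN">\n']
    for text in decode_lines(sample):
        print(text)
```

Stripping the trailing newline before printing also avoids the blank line between rows seen in the output above.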