Python Web Crawlers: A Walkthrough of HTMLParser

Programmer Article Station 2022-03-29 20:42:36

HTMLParser

You need to manually download markupbase.py and place it in libs. The parser calls a handler method each time it reads a tag.

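try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2; this module imports markupbase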

# Each handle_* method below is a callback the parser invokes for one kind of markup
class MyParser(HTMLParser):
    def handle_decl(self, decl):
        HTMLParser.handle_decl(self, decl)
        print('decl %s' % decl)

    def handle_starttag(self, tag, attrs):
        HTMLParser.handle_starttag(self, tag, attrs)
        print('<' + tag + '>')

    def handle_endtag(self, tag):
        HTMLParser.handle_endtag(self, tag)
        print('</' + tag + '>')

    def handle_data(self, data):
        HTMLParser.handle_data(self, data)
        print('data %s' % data)

    # Called for self-closing tags such as <br/>
    def handle_startendtag(self, tag, attrs):
        HTMLParser.handle_startendtag(self, tag, attrs)

    def handle_comment(self, data):
        HTMLParser.handle_comment(self, data)
        print('comment %s' % data)

    def close(self):
        HTMLParser.close(self)
        print('Close')

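demo = MyParser()
# Read the whole file and feed it to the parser; the handlers fire as the markup is parsed
with open('test.html') as f:
    demo.feed(f.read())
demo.close()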
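
The parser above only prints what it sees. In a crawler you usually want to collect something instead, so here is a minimal sketch that gathers the href values of <a> tags. It assumes Python 3 (html.parser); the names LinkCollector, collect_links and sample are hypothetical and do not come from the original post.

```python
# Python 3 sketch: collect href values from <a> tags.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def collect_links(html):
    """Feed an HTML string to a fresh parser and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup
sample = '<p><a href="/a.html">A</a> and <a class="x" href="/b.html">B</a></p>'
print(collect_links(sample))  # ['/a.html', '/b.html']
```

Only handle_starttag is overridden here, because the link target lives in the start tag's attributes; the other handlers keep their default do-nothing behaviour.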
