Handling <a> tags with Python's htmllib.HTMLParser

Source: 程序员文章站, 2022-06-10 23:14:02


#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: htmllib and formatter were removed in Python 3.
import htmllib
import urllib
import formatter


class GetLinks(htmllib.HTMLParser):
    """Collect a mapping of anchor text -> href for every <a> tag."""

    def __init__(self):
        self.links = {}
        self.link = None
        # NullFormatter discards rendered output; only the parser callbacks matter.
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f)

    def anchor_bgn(self, href, name, type):
        # Called on <a ...>: start buffering the anchor text and remember the href.
        self.save_bgn()
        self.link = href

    def anchor_end(self):
        # Called on </a>: fetch the buffered text and record the pair.
        text = self.save_end().strip()
        if self.link and text:
            # A later link with the same text overwrites the earlier one.
            self.links[text] = self.link


fp = urllib.urlopen("http://www.baidu.com")
data = fp.read()
fp.close()

linkdemo = GetLinks()
linkdemo.feed(data)
linkdemo.close()

# Keys are the anchor text, values are the href.
for text, href in linkdemo.links.items():
    print text, "=>", href
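
The listing above runs only on Python 2, since htmllib no longer exists in Python 3. As a rough sketch of the same idea on Python 3, the standard-library html.parser module can be used instead. The names LinkCollector and extract_links are made up for this illustration and are not part of the original post; the HTML string in the usage note is also invented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect {anchor text: href} pairs, like the GetLinks class above.

    LinkCollector is a hypothetical name used only for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None   # href of the <a> currently open, if any
        self._chunks = []   # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Only buffer text while inside an <a> that has an href
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            text = "".join(self._chunks).strip()
            if self._href and text:
                # As in the original, a repeated text overwrites the earlier link
                self.links[text] = self._href
            self._href = None


def extract_links(html):
    """Hypothetical helper: parse an HTML string and return the link mapping."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage might look like `extract_links('<a href="/a"> First </a>')`, which yields a dict keyed by the stripped anchor text. To run it against a live page, the HTML would first have to be fetched and decoded to str, for example with urllib.request.urlopen, which is left out here to keep the sketch offline.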
