Extracting the Content of Specific HTML Tags with PyQuery

程序员文章站 (Programmer Article Station), 2022-04-06 18:50:01


Installation

sudo pip install pyquery

Example

# Python 2 (uses urllib2 and the print statement)
from pyquery import PyQuery
import urllib2

# Fetch the page and decode the raw bytes into a unicode object
page = urllib2.urlopen("http://www.lzu.edu.cn")
text = unicode(page.read(), "utf-8")
doc = PyQuery(text)

# Visit every <li> inside an element with class "r"
for event in doc('.r li'):
    event = PyQuery(event)
    #loc = event.find('.h').text()
    time = event.text().encode('utf-8')
    #name = event.find('title').text()
    #print 'name: %s' % name
    print 'Name : %s' % time
    #print 'location : %s' % loc
    print '----------------------'

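Note that the text in event is unicode. In Python 2, text should be handled in memory as unicode objects (stored internally at a fixed width of 2 or 4 bytes per code unit, depending on how the interpreter was built) and encoded to variable-width UTF-8 when it is written out or printed.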
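The difference between in-memory text and its UTF-8 byte form can be seen in a short sketch. It is written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` plays the role of Python 2's `str`; the sample string is arbitrary.

```python
# Text as a sequence of Unicode code points (Python 3 str).
text = "名字 name"

# Encoding produces variable-width UTF-8 bytes for storage or transport:
# each of the two CJK characters takes 3 bytes, each ASCII character 1.
data = text.encode("utf-8")

print(len(text))  # 7 code points
print(len(data))  # 11 bytes

# Decoding the bytes restores the original text.
restored = data.decode("utf-8")
print(restored == text)  # True
```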

Of course, other modules can be used as well, for example HTMLParser from the standard library:

#!/usr/bin/env python
# -*- coding: utf8 -*-
# Python 2 (uses HTMLParser, htmlentitydefs and urllib2)
from HTMLParser import HTMLParser
from htmlentitydefs import name2codepoint
import urllib2

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # Remembers which field the next text node belongs to
        self._flag = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'h3' and ('class', 'event-title') in attrs:
            self._flag = 'event-title'
        if tag == 'time':
            self._flag = 'time'
        if tag == 'span' and ('class', 'event-location') in attrs:
            self._flag = 'event-location'

    def handle_data(self, data):
        if self._flag == 'event-title':
            print 'Event name: %s' % data
            self._flag = ''
        #if self._flag == 'time':
        #   print 'Event time: %s' % data
        if self._flag == 'event-location':
            print 'Event location: %s' % data
            print '-------------------'
            self._flag = ''

page = urllib2.urlopen('https://www.python.org/events/python-events/').read()
parser = MyHTMLParser()
parser.feed(page)
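For readers on Python 3, the flag-based approach above might look like the sketch below. It uses `html.parser.HTMLParser` from the standard library and collects results in a list instead of printing inside the callbacks. The class name `EventParser` and the `SAMPLE` markup are hypothetical; the markup only imitates the `event-title` and `event-location` classes that the parser above looks for, so no network access is needed.

```python
from html.parser import HTMLParser


class EventParser(HTMLParser):
    """Collect event titles and locations using a simple state flag."""

    def __init__(self):
        super().__init__()
        self._flag = ""   # which field the next non-empty text node fills
        self.events = []  # list of {"title": ..., "location": ...}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "h3" and ("class", "event-title") in attrs:
            self._flag = "event-title"
        elif tag == "span" and ("class", "event-location") in attrs:
            self._flag = "event-location"

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._flag:
            return  # skip whitespace and text outside the fields we want
        if self._flag == "event-title":
            self.events.append({"title": text, "location": None})
        elif self._flag == "event-location" and self.events:
            self.events[-1]["location"] = text
        self._flag = ""


# Hypothetical sample markup, used in place of a live page.
SAMPLE = """
<ul>
  <li><h3 class="event-title"><a href="#">PyCon Example</a></h3>
      <span class="event-location">Example City</span></li>
  <li><h3 class="event-title"><a href="#">Python Meetup</a></h3>
      <span class="event-location">Online</span></li>
</ul>
"""

parser = EventParser()
parser.feed(SAMPLE)
parser.close()

for event in parser.events:
    print("Event name: %s" % event["title"])
    print("Event location: %s" % event["location"])
    print("-------------------")
```

Keeping the results in `parser.events` separates parsing from output, which makes the parser easier to reuse and to check against known markup.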

