A Detailed Guide to Parsing HTML with Python's BeautifulSoup

程序员文章站 2022-05-22 07:54:05

...

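1) Searching for tags:

find(tagname) # find the first tag named tagname, e.g. find('head')
find(list) # find the first tag whose name is in the list, e.g. find(['head', 'body'])
find(dict) # find the first tag whose name is a key of the dict, e.g. find({'head': True, 'body': True}) (a form documented for BeautifulSoup 3; with bs4 a list is the safer choice)
find(re.compile('')) # find the first tag whose name matches the regex, e.g. find(re.compile('^p')) matches names starting with p
find(lambda) # find the first tag for which the function returns True; the function receives the Tag object, e.g. find(lambda tag: len(tag.name) == 1) matches one-character tag names
find(True) # match any tag; find returns only the first match, so use find_all(True) to get every tag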
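The filters listed above can be sketched as follows. This is a minimal example, assuming bs4 is installed and using the built-in "html.parser"; the sample markup and the variable names (by_name, by_list and so on) are made up for illustration. It uses find_all instead of find for most cases so that every match is visible.

```python
import re
from bs4 import BeautifulSoup

# Small, well-formed sample document (hypothetical, for illustration only).
html = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara">This is paragraph <b>two</b>.</p></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find("head")                               # exact tag name, first match
by_list = soup.find_all(["title", "b"])                   # name is in the list
by_regex = soup.find_all(re.compile("^b"))                # name matches the regex
by_func = soup.find_all(lambda tag: len(tag.name) == 1)   # predicate on the Tag object
all_tags = soup.find_all(True)                            # every tag in the document

print(by_name.name)
print([t.name for t in by_list])
print([t.name for t in by_regex])
print([t.name for t in by_func])
print(len(all_tags))
```

Note that find returns a single Tag (or None when nothing matches), while find_all always returns a list.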

2) Searching text (the text argument, named string in bs4 4.4.0 and later):
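A text search matches the strings inside tags rather than the tags themselves. The sketch below assumes bs4 4.4.0 or later (where the argument is called string) and the built-in "html.parser"; the sample markup and variable names are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Small, well-formed sample document (hypothetical, for illustration only).
html = (
    '<html><head><title>Page title</title></head>'
    '<body><p>This is paragraph <b>one</b>.</p>'
    '<p>This is paragraph <b>two</b>.</p></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# Strings that match a regular expression.
with_regex = soup.find_all(string=re.compile("paragraph"))

# Every string in the document.
all_strings = soup.find_all(string=True)

# Strings accepted by a predicate; the function receives each string.
short_strings = soup.find_all(string=lambda s: len(s) < 12)

print(with_regex)
print(all_strings)
print(short_strings)
```

The results are string objects, not tags; the enclosing tag of each one is available through its parent attribute.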

3) recursive, limit:

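from bs4 import BeautifulSoup
import re

# Sample document (note that the <p> tags are never closed).
doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly so bs4 does not have to pick one itself.
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify() + "\n")
# 1) Search by tag name.
print(soup.find_all('b'))

# 2) Search text: by regex, every string, or by a predicate.
#    string= is called text= in bs4 releases before 4.4.0.
print(soup.find_all(string=re.compile("paragraph")))
print(soup.find_all(string=True))
print(soup.find_all(string=lambda x: len(x) < 12))

# Tags whose name starts with "b".
a = soup.find_all(re.compile('^b'))
print([tag.name for tag in a])

# 3) recursive: all descendants of <html> versus its direct children only.
print([tag.name for tag in soup.html.find_all()])
print([tag.name for tag in soup.html.find_all(recursive=False)])

# 3) limit: stop searching after the first match.
print(soup.find_all('p', limit=1))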
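To isolate what recursive and limit do, here is a minimal sketch on a well-formed document. It assumes bs4 with the built-in "html.parser"; the markup and variable names are illustrative and not taken from the article.

```python
from bs4 import BeautifulSoup

# Small, well-formed sample document (hypothetical, for illustration only).
html = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara">This is paragraph <b>two</b>.</p></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# Default (recursive=True): every descendant tag of <html>.
descendants = [tag.name for tag in soup.html.find_all()]

# recursive=False: only the direct children of <html>.
children = [tag.name for tag in soup.html.find_all(recursive=False)]

# limit=1: the search stops after the first match but still returns a list.
first_p = soup.find_all("p", limit=1)

print(descendants)
print(children)
print(first_p[0]["id"])
```

Capping the search with limit can save work on large documents, since the tree walk ends as soon as enough matches are found.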
