A Detailed Guide to find and find_all in BeautifulSoup

程序员文章站 2022-04-10 13:50:10

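Using find and find_all in BeautifulSoup, a handy tool for web scraping

Without further ado, let's start with a sample piece of HTML.

Before using BeautifulSoup, you first need to construct a BeautifulSoup instance.

Note that the parser module you name when constructing the instance must be installed beforehand; the lxml parser used here was already installed. The BeautifulSoup documentation lists the parsers that can be used.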
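The two steps above can be sketched together as follows. The markup is an assumption: it is rebuilt from the elements that appear in the output later in this article, including the misspelled `cvlass` attribute and the unclosed last `<li>`. The fallback to the standard-library `html.parser` is also an addition, there only so the sketch still runs when lxml is not installed.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""

# The parser named in the second argument has to be installed separately.
# Fall back to the standard-library parser if lxml is missing.
try:
    import lxml  # noqa: F401
    parser = "lxml"
except ImportError:
    parser = "html.parser"

soup = BeautifulSoup(html, parser)
li_count = len(soup.find_all("li"))
print("parser:", parser)
print("number of <li> tags:", li_count)
```

Either parser repairs the unclosed `<li>`, so the rest of the article can treat the list as five complete items.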

Next comes an introduction to find and find_all.

1. find
Returns only the first matching object.
Syntax:

find(name, attrs, recursive, text, **kwargs)

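Parameters:

Parameter	Purpose
name	tag name to search for
text	text content to search for (called string since Beautiful Soup 4.4.0)
attrs	attribute filters, passed as a dictionary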
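As a rough illustration of the three parameters, the sketch below calls find once with each of them. The sample markup is an assumption rebuilt from the output shown in this article, `html.parser` is used only to avoid an extra install, and the text search is spelled `string=`, the newer name for `text`.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# name: match on the tag name; returns the first <a> tag.
by_name = soup.find(name="a")

# string (older name: text): match on text; returns the text node itself,
# not the tag that contains it.
by_text = soup.find(string="third item")

# attrs: match on attributes given as a dictionary.
by_attrs = soup.find(attrs={"href": "link4.html"})

print("by name:", by_name)
print("by text:", repr(by_text), "inside", by_text.parent.name)
print("by attrs:", by_attrs)
```

Searching by text returns a text node, so `.parent` is the way back to the enclosing tag.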

Example:
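A minimal sketch of such a call, assuming the sample HTML is the five-item list whose elements appear in this article's output. The variable names and the use of `html.parser` are illustrative choices, not necessarily the author's.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns only the first <li> in document order.
li = soup.find("li")

print("find_li:", li)
print("li.text (the tag's text content):", li.text)
print("li.attrs (the tag's attributes):", li.attrs)
print("li.string (the tag's content as a string):", li.string)
```

Note that `class` comes back as a list because HTML allows several class names on one tag, while `id` is a plain string.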

Output:

find_li: <li class="item-0" id="flask"><a href="link1.html">first item</a></li>
li.text (the tag's text content): first item
li.attrs (the tag's attributes): {'id': 'flask', 'class': ['item-0']}
li.string (the tag's content as a string): first item

find can also match using the attribute=value form, passed as a keyword argument.
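For instance, the keyword form might look like the sketch below. The markup is assumed from this article's output, and the particular attributes searched for (`id`, `href`) are chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Any keyword that is not a built-in parameter is treated as an attribute filter.
flask_li = soup.find(id="flask")

# A tag name and an attribute filter can be combined in one call.
second_link = soup.find("a", href="link2.html")

print("id=flask:", flask_li)
print("href=link2.html:", second_link)
```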

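Note that because class is a reserved keyword in Python, matching on a tag's class attribute needs special handling. There are two ways to do it:

Pass the attribute as a dictionary through the attrs parameter
Use BeautifulSoup's special keyword argument class_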
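Both approaches can be sketched side by side. The markup is assumed from this article's output, and the class value `item-1` is taken from the results printed for this step.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Way 1: a dictionary passed through attrs, so the key is just a string.
find_class = soup.find(attrs={"class": "item-1"})

# Way 2: the class_ keyword, with a trailing underscore to avoid the reserved word.
by_class_ = soup.find(class_="item-1")

print("findclass:", find_class)
print("beautifulsoup_class_:", by_class_)
```

Both calls land on the same element, the first `<li>` whose class is `item-1`.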

Output:

findclass: <li class="item-1"><a href="link2.html">second item</a></li>

beautifulsoup_class_: <li class="item-1"><a href="link2.html">second item</a></li>

2. find_all

Returns all matching results, unlike find, which returns only the first result found.

Syntax:

find_all(name, attrs, recursive, text, limit, **kwargs)

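Parameter	Purpose
name	tag name to search for
text	text content to search for (called string since Beautiful Soup 4.4.0)
attrs	attribute filters, passed as a dictionary

It is called the same way as find.

Here is the code: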
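A sketch of a loop that would print results of this shape, assuming the sample markup rebuilt from this article's output. The print labels and variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet holding every match, in document order.
items = soup.find_all("li")

for li in items:
    print("---")
    print("matched li:", li)
    print("li content:", li.text)
    print("li attributes:", li.attrs)
```

The misspelled `cvlass` attribute is not a real class, so it is returned as a plain string rather than a list.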

Output:

---
matched li: <li class="item-0" id="flask"><a href="link1.html">first item</a></li>
li content: first item
li attributes: {'id': 'flask', 'class': ['item-0']}
---
matched li: <li class="item-1"><a href="link2.html">second item</a></li>
li content: second item
li attributes: {'class': ['item-1']}
---
matched li: <li cvlass="item-inactie"><a href="link3.html">third item</a></li>
li content: third item
li attributes: {'cvlass': 'item-inactie'}
---
matched li: <li class="item-1"><a href="link4.html">fourth item</a></li>
li content: fourth item
li attributes: {'class': ['item-1']}
---
matched li: <li class="item-0"><a href="link5.html">fifth item</a>
</li>
li content: fifth item

Here is a more flexible way to query with find_all:
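One query that selects the two `item-1` entries combines a tag name with an attribute dictionary. Whether this is the exact call the author had in mind is an assumption. The function filter after it is a further illustration of how far find_all can be pushed, and the helper name `is_item_1` is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag name and attribute dictionary in a single call.
matches = soup.find_all("li", {"class": "item-1"})
for li in matches:
    print("most flexible lookup:", li)


def is_item_1(tag):
    """Hypothetical filter: keep <li> tags whose class list contains item-1."""
    return tag.name == "li" and "item-1" in tag.get("class", [])


# find_all also accepts a function that is called with each tag in turn.
same_matches = soup.find_all(is_item_1)
```

A function filter is worth reaching for when the condition cannot be expressed as a plain attribute match.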

Output:

most flexible lookup: <li class="item-1"><a href="link2.html">second item</a></li>
most flexible lookup: <li class="item-1"><a href="link4.html">fourth item</a></li>

Full code:
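A single script that walks through every step in this article could look like the sketch below. It is an approximation under the same assumptions as before: the markup is rebuilt from the printed output, the print labels are translated, and the parser fallback is an addition so the script runs without lxml.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed, rebuilt from the output shown in this article).
html = """
<div>
<ul>
<li class="item-0" id="flask"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li cvlass="item-inactie"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
"""

# Prefer lxml, as in the article; fall back to the standard-library parser.
try:
    import lxml  # noqa: F401
    parser = "lxml"
except ImportError:
    parser = "html.parser"
soup = BeautifulSoup(html, parser)

# 1. find: first match only.
first_li = soup.find("li")
print("find_li:", first_li)
print("li.text:", first_li.text)
print("li.attrs:", first_li.attrs)
print("li.string:", first_li.string)

# Matching on class, in the two ways described above.
print("findclass:", soup.find(attrs={"class": "item-1"}))
print("beautifulsoup_class_:", soup.find(class_="item-1"))

# 2. find_all: every match.
all_li = soup.find_all("li")
for li in all_li:
    print("---")
    print("matched li:", li)
    print("li content:", li.text)
    print("li attributes:", li.attrs)

# Tag name combined with an attribute dictionary.
item_1 = soup.find_all("li", {"class": "item-1"})
for li in item_1:
    print("most flexible lookup:", li)
```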
