python实现pyquery网络爬虫

程序员文章站 2022-05-08 18:29:14

...

PyQuery是强大而又灵活的网页解析库，正则写起来太麻烦，BeautifulSoup语法太难记，如果你熟悉jQuery的语法那么，PyQuery就是你绝佳的选择。

代码实现

'''
部分源码
<tr class="mBg2"onmouseover="jQuery(this).addClass('bgon');
"οnmοuseοut="jQuery(this).removeClass('bgon');">
	<td class="wd1">1</td>
	<td class="wd2 bOS">
		<a target="_blank" href="http://xf.house.163.com/gz/0SJN.html">南沙心意华庭</a></td>
	<td class="wd3"><a href="#" onclick="gotrend('南沙心意华庭')"></a></td>
	<td class="wd4">南沙</td>
	<td class="wc5 bgOnS">4</td>
	<td class="wc6">425</td>
	<td class="wc7">--</td>
	<td class="wc8">--</td>
	<td class="wc9">658</td>
	<td class="wc10">54202</td>
	<td class="wc11">18</td>
	<td class="wc12">1944</td>
'''
from pyquery import PyQuery as pq
while True:
    url ='http://data.house.163.com/'
    for j in range(3):  # 自定义重新运行次数，最多重复运行3次
        try:
            doc = pq(url=url, encoding='utf-8')  # 获取html，若运行时长超出，则重新运行
            break
        except:
            continue
    if doc('.pager_a.next-page') == []:   # 判断下一页是否存在
        break
    else:
        name = [i.text() for i in doc('.wd2.bOS a').items()]   # 寻找class节点
        place = [i.text() for i in doc('.wd4').items()]
        number1 = [i.text() for i in doc('.wc5.bgOnS').items()][1:]
        area1 = [i.text() for i in doc('.wc6').items()][1:]
        number2 = [i.text() for i in doc('.wc9').items()][1:]
        area2 = [i.text() for i in doc('.wc10').items()][1:]
        number3 = [i.text() for i in doc('.wc11').items()][1:]
        area3 = [i.text() for i in doc('.wc12').items()][1:]
        quhua = [i.text() for i in doc('.wd14').items()]

pyquery和BeautifulSoup的使用方法差不多，pyquery也可以结合requests使用

python实现pyquery网络爬虫

代码实现

python实现爬虫统计学校BBS男女比例之数据处理（三）

python实现博客文章爬虫示例

基于Python实现的百度贴吧网络爬虫实例

Python爬虫爬验证码实现功能详解

【Python3爬虫】自动查询天气并实现语音播报

Python网络爬虫项目：内容提取器的定义

python实现简单爬虫功能的示例

《精通Python网络爬虫》第18章　博客类爬虫项目代码

Python实现网络端口转发和重定向的方法

java实现一个简单的网络爬虫代码示例

python实现pyquery网络爬虫

代码实现

python实现爬虫统计学校BBS男女比例之数据处理（三）

python实现博客文章爬虫示例

基于Python实现的百度贴吧网络爬虫实例

Python爬虫爬验证码实现功能详解

【Python3爬虫】自动查询天气并实现语音播报

Python网络爬虫项目：内容提取器的定义

python实现简单爬虫功能的示例

《精通Python网络爬虫》第18章 博客类爬虫项目代码

Python实现网络端口转发和重定向的方法

java实现一个简单的网络爬虫代码示例

《精通Python网络爬虫》第18章　博客类爬虫项目代码