Parsing data with XPath (scraping the names of cities across China)

程序员文章站 2022-05-07 23:09:28

Target site: https://www.aqistudy.cn/historydata/

# Date written: 2020/12/27 22:00
# IDE: PyCharm
# Author: Friday
# URL: https://www.aqistudy.cn/historydata/
import requests
from lxml import etree

if __name__ == "__main__":
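    headers = {
        # Referer kept from the original script; note that it points at a different site
        'Referer': 'http://pic.netbian.com/4kmeinv/index_2.html',
        # HTTP header names use hyphens: the key must be 'User-Agent', not 'user_agent'
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
    }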
    url = 'https://www.aqistudy.cn/historydata/'
    response = requests.get(url=url, headers=headers)
    page_text = response.text
    tree = etree.HTML(page_text)
    # Method 1: parse the "hot cities" and "all cities" lists separately
    # # Hot cities
    # host_city_list = tree.xpath('//div[@class="bottom"]/ul/li')
    # host_name_list = []
    # for li in  host_city_list:
    #     host_name = li.xpath('./a/text()')[0]
    #     host_name_list.append(host_name)
    # # print(host_name_list)
    #
    # # 1. All cities: loop over each <ul>, then over the <li> items inside it
    # # all_city_list = []
    # # all_city_ul_list = tree.xpath('//div[@class="bottom"]/ul')
    # # for ul in all_city_ul_list:
    # #     get_li_list = ul.xpath('./div/li')
    # #     for li in get_li_list:
    # #         name = li.xpath('./a/text()')[0]
    # #         host_name_list.append(name)
    # # 2. All cities: select the <li> items directly with one expression
    # # all_city_li = tree.xpath('//div[@class="bottom"]/ul/div[2]/li')
    # # for li in all_city_li:
    # #     name = li.xpath('./a/text()')[0]
    # #     host_name_list.append(name)
    # print(host_name_list)
    # print(len(host_name_list))

    # Method 2: select the <a> tags of both lists with a single XPath union (|)
    a_list = tree.xpath('//div[@class="bottom"]/ul/li/a | //div[@class="bottom"]/ul/div[2]/li/a')
    all_city_names = []
    for a in a_list:
        city_name = a.xpath('./text()')[0]
        all_city_names.append(city_name)
    print(all_city_names)
    print(len(all_city_names))

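Summary: Looking at the page's HTML structure, the obvious approach is to run two XPath queries, one for the li tags under "hot cities" and one for the li tags under "all cities". On reflection, this can be simplified. Every city name we want sits inside an a tag, so a single XPath expression using the union operator (|) can select the a tags of both lists at once, and the names can then be extracted in one loop.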
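The comparison above can be tried offline without requesting the site. The sketch below is only an illustration: `SAMPLE_HTML` is a hypothetical, heavily simplified stand-in for the real page (whose markup may differ), and the function names `city_names_two_pass` and `city_names_union` are made up for this example. It runs the two-query approach and the union approach against the same tree so the results can be compared.

```python
from lxml import etree

# Hypothetical, simplified markup (an assumption, not the real page source):
# the first block plays the role of "hot cities", the second of "all cities".
SAMPLE_HTML = """
<html><body>
<div class="bottom">
  <ul>
    <li><a href="#">Beijing</a></li>
    <li><a href="#">Shanghai</a></li>
  </ul>
</div>
<div class="bottom">
  <ul>
    <div><b>A</b></div>
    <div>
      <li><a href="#">Aba</a></li>
      <li><a href="#">Ankang</a></li>
    </div>
  </ul>
</div>
</body></html>
"""

HOT_XPATH = '//div[@class="bottom"]/ul/li/a'
ALL_XPATH = '//div[@class="bottom"]/ul/div[2]/li/a'


def city_names_two_pass(tree):
    """Method 1: two separate XPath queries, results concatenated."""
    hot = [a.text for a in tree.xpath(HOT_XPATH)]
    everything = [a.text for a in tree.xpath(ALL_XPATH)]
    return hot + everything


def city_names_union(tree):
    """Method 2: one query that joins both paths with the | operator."""
    return [a.text for a in tree.xpath(HOT_XPATH + ' | ' + ALL_XPATH)]


tree = etree.HTML(SAMPLE_HTML)
print(city_names_two_pass(tree))
print(city_names_union(tree))
```

On this sample both functions return the same four names. The union version needs only one query and one loop, which is the simplification the summary describes.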
