Python crawler notes: scraping a dynamically rendered page, the top 10 hot stocks of the past hour on Sina Stocks

程序员文章站 (Programmer Article Site), 2022-03-03 09:08:05

...

Goal: scrape the ten hottest stocks of the last hour from the Sina Stocks home page.

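Data analysis:
1. Viewing the page source shows that the data is not in it.
2. Inspecting the requests in the browser's Network panel shows that the content is almost entirely rendered dynamically by JavaScript.
So I decided to use Selenium, which drives a real browser and therefore sees the rendered page, to scrape the data.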
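Step 1 can also be checked from a script. The rough sketch below assumes the hot-stock list is rendered as `<li class="xh_hotstock_item">` elements, which is the class name the Selenium script relies on. It fetches the raw HTML without running any JavaScript and looks for that marker. `fetch_static_html` and `marker_in_static_html` are hypothetical helper names. A plain substring test is only a heuristic, because the class name could also appear inside a script or stylesheet.

```python
from urllib.request import Request, urlopen


def fetch_static_html(url: str = "https://finance.sina.com.cn/stock/") -> str:
    """Download the page as served, without executing any JavaScript."""
    # A browser-like User-Agent; assumption: some sites treat bare clients differently
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def marker_in_static_html(html: str, marker: str = 'class="xh_hotstock_item"') -> bool:
    """Heuristic: True if the hot-stock list markup is already in the raw HTML."""
    return marker in html


# Usage (needs network access):
#   marker_in_static_html(fetch_static_html())
# A False result is consistent with the list being injected later by JavaScript.
```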

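Environment setup:
1. pip install selenium
2. Download the ChromeDriver build that matches your installed Chrome version and put it in Python's Scripts directory (or any other directory on PATH): http://chromedriver.storage.googleapis.com/index.html (this index only lists drivers up to Chrome 114; newer drivers are published through Chrome for Testing, and Selenium 4.6 and later can download a matching driver automatically via Selenium Manager).
To check that Chrome and Python are set up correctly, run the following. If a blank Chrome window opens, everything is configured properly.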
from selenium import webdriver
browser = webdriver.Chrome()
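Before opening a browser, a quick standard-library check can show whether the two prerequisites above are in place. This is a minimal sketch, and `check_environment` is a hypothetical helper. It only confirms two things: that the `selenium` package is importable, and that an executable named `chromedriver` is on PATH. With Selenium 4.6 or later, a missing driver on PATH is not fatal, because Selenium Manager can fetch one.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether selenium is importable and chromedriver is on PATH."""
    return {
        # find_spec returns None when the package cannot be imported
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # shutil.which returns the full path, or None if not found on PATH
        "chromedriver_path": shutil.which("chromedriver"),
    }


print(check_environment())
```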

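from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = None  # defined up front so the finally block works even if Chrome fails to start
try:
    browser = webdriver.Chrome()             # declare the browser object
    browser.get('https://finance.sina.com.cn/stock/')
    # The list is rendered by JavaScript, so wait (up to 10 s) until the items exist
    stocks = WebDriverWait(browser, 10).until(
        EC.presence_of_all_elements_located((By.XPATH, '//li[@class="xh_hotstock_item"]')))
    for s in stocks:
        # find_element_by_* was removed in Selenium 4.3; use find_element(By..., ...)
        name = s.find_element(By.CLASS_NAME, "list02_name")
        print("Stock name:", name.text)
        print("Stock code:", s.get_attribute("data-code"))
        print("Stock price:", s.find_element(By.CLASS_NAME, "list02_diff").text)
        print("Change:", s.find_element(By.CLASS_NAME, "list02_chg").text)
        print("Stock page:", name.get_attribute("href"))     # URL of the stock's own page

except Exception as e:
    print(e)
finally:
    if browser is not None:
        browser.quit()   # quit() closes every window and stops the driver process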
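Printing inside the loop mixes scraping with output. One alternative is to separate the two steps. After Selenium has rendered the page, `browser.page_source` returns the current DOM serialized as HTML, and that HTML can be parsed offline with the standard library's `html.parser`. The sketch below assumes the rendered markup keeps the class names used above. `HotStockParser`, `parse_hot_stocks` and the output keys are hypothetical, and the field mapping simply mirrors the script: `list02_diff` is treated as the price, as in the original. Unlike the exact `@class="..."` XPath, the parser matches class tokens, so an element with extra classes still counts.

```python
from html.parser import HTMLParser

# Hypothetical mapping from the class names used in the script to output keys
FIELDS = {"list02_name": "name", "list02_diff": "price", "list02_chg": "change"}


class HotStockParser(HTMLParser):
    """Collect one dict per <li class="xh_hotstock_item"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_item = False
        self._field = None  # (output key, tag name) of the element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "xh_hotstock_item" in classes:
            self._in_item = True
            self.rows.append({"code": attrs.get("data-code"), "name": "",
                              "price": "", "change": "", "href": None})
        elif self._in_item:
            for cls, key in FIELDS.items():
                if cls in classes:
                    self._field = (key, tag)
                    if key == "name":
                        self.rows[-1]["href"] = attrs.get("href")

    def handle_endtag(self, tag):
        if self._field and self._field[1] == tag:
            self._field = None      # finished reading this field
        elif tag == "li":
            self._in_item = False   # left the current hot-stock item

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field[0]] += data.strip()


def parse_hot_stocks(html: str) -> list:
    """Return a list of dicts, one per hot-stock item found in html."""
    parser = HotStockParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside the `try` block, after the wait succeeds, `rows = parse_hot_stocks(browser.page_source)` would give a list you can save to CSV or a database. You can also rerun it against a saved copy of the HTML without reopening Chrome.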

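Summary: Selenium is very powerful, and because it works on the fully rendered page, the scraper needs very little code. XPath is just as powerful for locating elements and is well worth mastering.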
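One XPath detail is worth knowing for this script. The predicate `[@class="xh_hotstock_item"]` compares the whole `class` attribute as a string. If the site ever adds a second class to an item, the item silently stops matching. The sketch below demonstrates this with the standard library's `xml.etree.ElementTree`, which supports a subset of XPath including attribute predicates. The snippet is a hypothetical, well-formed fragment that only reuses the class name from the script. In full XPath 1.0, as used by Selenium, the token-safe form is `//li[contains(concat(' ', normalize-space(@class), ' '), ' xh_hotstock_item ')]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the second item carries an extra class
SNIPPET = """
<ul>
  <li class="xh_hotstock_item" data-code="sh600000">A</li>
  <li class="xh_hotstock_item active" data-code="sz000001">B</li>
</ul>
"""

root = ET.fromstring(SNIPPET.strip())

# Same predicate as //li[@class="xh_hotstock_item"]: exact string comparison
exact = [li.get("data-code")
         for li in root.findall(".//li[@class='xh_hotstock_item']")]

# Token-based match, equivalent to the contains(concat(...)) XPath idiom
token = [li.get("data-code") for li in root.iter("li")
         if "xh_hotstock_item" in (li.get("class") or "").split()]

print(exact)  # ['sh600000']
print(token)  # ['sh600000', 'sz000001']
```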