Python crawler example: a targeted stock-data crawler

程序员文章站 (Programmer Article Station), 2022-09-21 10:17:55


Preface

I give up. Lately, two days of gains can't even make up for a single day of losses. Sigh. I hope this helps me out!

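"Targeted stock-data crawler" example: introduction

Functional description

Goal: obtain the names and trading information of all stocks listed on the Shanghai Stock Exchange and the Shenzhen Stock Exchange
Output: saved to a file
Technical route: requests-bs4-re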

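Choosing candidate data sites

Sina Stock: http://finance.sina.com.cn/stock/
Baidu Stock: https://gupiao.baidu.com/stock/

Selection principle: the stock information exists statically in the HTML page rather than being generated by JavaScript, and the site's Robots protocol (robots.txt) does not restrict crawling it
Selection method: the browser's F12 developer tools, viewing the page source, and so on
Selection mindset: don't get hung up on one site; try several information sources
Please watch the video to follow the site selection process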
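The "no Robots protocol restriction" criterion can also be checked in code. The following is a minimal sketch using the standard library's urllib.robotparser. The helper name is_allowed, the sample robots.txt text, and the example.com URLs are illustrative assumptions, not taken from any of the sites listed above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/stock/sz002439.html"))
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data.html"))
```

For a real site, the robots.txt text would first have to be downloaded (for instance with RobotFileParser.set_url followed by read); whether the page content is static still has to be checked by viewing the page source.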

Settling on the data sites

Getting the stock list:
East Money (东方财富网): http://quote.eastmoney.com/stocklist.html
Getting individual stock information:
Baidu Stock: https://gupiao.baidu.com/stock/
A single stock: https://gupiao.baidu.com/stock/sz002439.html

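Program structure design

Step 1: get the stock list from East Money
Step 2: for each stock in the list, get its information from Baidu Stock
Step 3: store the results in a file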

"Targeted stock-data crawler" example: writing the code

main()

import requests
from bs4 import BeautifulSoup
import traceback
import re

# The three helpers start out as stubs and are filled in one by one below

def getHTMLText(url):
    return ""

def getStockList(lst, stockURL):
    return ""

def getStockInfo(lst, stockURL, fpath):
    return ""

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = 'D:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)                  # step 1
    getStockInfo(slist, stock_info_url, output_file)     # steps 2 and 3

main()

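def getHTMLText(url):
    # Fetch a page and return its text, or "" if the request fails
    try:
        r = requests.get(url, timeout=30)  # timeout so a stalled connection cannot hang the crawl
        r.raise_for_status()               # raise on 4xx/5xx responses
        r.encoding = r.apparent_encoding   # guess the encoding from the page content
        return r.text
    except requests.RequestException:
        return ""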

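East Money: http://quote.eastmoney.com/stocklist.html

def getStockList(lst, stockURL):
    # Collect stock codes such as sh600000 or sz002439 from every link on the list page
    html = getHTMLText(stockURL)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            # The link has no href, or the href contains no stock code
            continue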
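To see what the regular expression in getStockList matches, it can be tried on a few links without touching the network. This is a small sketch: the function name extract_codes and the sample hrefs are made up for illustration, and the duplicate filtering is an addition that the original getStockList does not perform.

```python
import re

# "s", then "h" or "z", then exactly six digits, e.g. sh600000 or sz002439
STOCK_CODE = re.compile(r"s[hz]\d{6}")


def extract_codes(hrefs):
    """Collect stock codes from hrefs, skipping links without one.

    Codes are returned in first-seen order with duplicates removed.
    """
    codes = []
    seen = set()
    for href in hrefs:
        match = STOCK_CODE.search(href)
        if match is None:
            continue  # not a stock link
        code = match.group(0)
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


# Hypothetical hrefs, for illustration only
sample_hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz002439.html",
    "http://quote.eastmoney.com/sh600000.html",  # duplicate link
    "http://quote.eastmoney.com/center/",        # no stock code
]

if __name__ == "__main__":
    print(extract_codes(sample_hrefs))  # ['sh600000', 'sz002439']
```

Filtering duplicates up front may be worthwhile if the list page links to the same stock more than once, since each code costs one request in the next step.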

Baidu Stock: https://gupiao.baidu.com/stock/

def getStockInfo(lst, stockURL, fpath):
    # Fetch each stock's page, pair up its <dt>/<dd> fields, and append one dict per line to fpath
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

            # '股票名称' means "stock name"; the remaining keys come from the page itself
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})

            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val

            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            # For example, the page has no stock-bets block; report it and move on
            traceback.print_exc()
            continue
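Because getStockInfo writes str(infoDict) followed by a newline for each stock, every line of the output file is a Python dict literal. One possible way to load the file back, sketched here with the standard library's ast.literal_eval, is shown below. The function name load_stock_records and the sample records are hypothetical; the values are made up and are not real market data.

```python
import ast
import os
import tempfile


def load_stock_records(fpath):
    """Read a file with one dict literal per line and return a list of dicts."""
    records = []
    with open(fpath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # literal_eval only accepts Python literals, so it is safer than eval
                records.append(ast.literal_eval(line))
    return records


if __name__ == "__main__":
    # Made-up records in the same shape as the crawler's output
    samples = [
        {"股票名称": "ExampleA", "今开": "10.00"},
        {"股票名称": "ExampleB", "今开": "20.50"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "BaiduStockInfo.txt")
        with open(path, "a", encoding="utf-8") as f:
            for record in samples:
                f.write(str(record) + "\n")
        print(load_stock_records(path) == samples)
```

If the data is meant for other tools, writing each record with json.dumps instead of str would be an alternative design, at the cost of changing the output format used in this article.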

The full code is in the appendix at the end of this article.

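"Targeted stock-data crawler" example: optimization

How can the user experience be improved?
The optimized version in the appendix (CrawBaiduStocksB.py) makes two changes: getHTMLText takes the page encoding as a parameter instead of detecting it through r.apparent_encoding, and getStockInfo prints a progress percentage that is refreshed in place on a single line.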
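An in-place progress display of this kind can be sketched as follows. The text starts with a carriage return ("\r") so that each update overwrites the previous one instead of printing a new line. The names format_progress and run_with_progress are hypothetical helpers for illustration and do not appear in the original code.

```python
import sys


def format_progress(done, total):
    """Build a progress string that begins with "\\r" to overwrite the current line."""
    percent = done * 100 / total if total else 100.0
    return "\rCurrent progress: {:.2f}%".format(percent)


def run_with_progress(items, handle):
    """Call handle(item) for every item and refresh the progress line after each one."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        try:
            handle(item)
        except Exception:
            pass  # a failed item still counts toward the progress
        sys.stdout.write(format_progress(done, total))
        sys.stdout.flush()  # show the update immediately, without waiting for a newline
    sys.stdout.write("\n")


if __name__ == "__main__":
    # The handler does nothing here; in the crawler it would fetch and save one stock
    run_with_progress(["sh600000", "sz002439", "sz000001"], lambda code: None)
```

Counting failed items as well keeps the percentage moving toward 100% even when some pages cannot be fetched or parsed.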

Unit summary


Appendix (code for this section)

#CrawBaiduStocksA.py
import requests
from bs4 import BeautifulSoup
import traceback
import re

def getHTMLText(url):
    # Fetch a page and return its text, or "" if the request fails
    try:
        r = requests.get(url, timeout=30)  # timeout so a stalled connection cannot hang the crawl
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""

def getStockList(lst, stockURL):
    # Collect stock codes such as sh600000 or sz002439 from every link on the list page
    html = getHTMLText(stockURL)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            continue

def getStockInfo(lst, stockURL, fpath):
    # Fetch each stock's page and append its fields to fpath, one dict per line
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

            # '股票名称' means "stock name"
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})

            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val

            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            traceback.print_exc()
            continue

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = 'D:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

main()

#CrawBaiduStocksB.py
import requests
from bs4 import BeautifulSoup
import traceback
import re

def getHTMLText(url, code="utf-8"):
    # Fetch a page using a known encoding instead of detecting it from the content
    try:
        r = requests.get(url, timeout=30)  # timeout so a stalled connection cannot hang the crawl
        r.raise_for_status()
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ""

def getStockList(lst, stockURL):
    # The list page is fetched as GB2312
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            continue

def getStockInfo(lst, stockURL, fpath):
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

            # '股票名称' means "stock name"
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})

            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val

            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            # Errors are skipped silently so the progress line stays clean
            continue
        finally:
            # Runs on success, on failure, and on the empty-page skip, so progress reaches 100%
            count = count + 1
            print("\rCurrent progress: {:.2f}%".format(count * 100 / len(lst)), end="")

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = 'D:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

main()

Article URL: https://blog.csdn.net/haojie_duan/article/details/108228168
