A demo of scraping online stock information with a Python crawler

程序员文章站 (Programmer Article Station), 2022-11-21 10:54:52


The example is shown below:

import requests
from bs4 import BeautifulSoup
import traceback
import re
 
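def getHTMLText(url):
    """Fetch a page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the response body rather than the headers.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""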
 
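def getStockList(lst, stockURL):
    """Append every stock code linked from the list page to lst."""
    html = getHTMLText(stockURL)
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.find_all('a'):
        try:
            href = a.attrs['href']
            # A stock code is "sh" or "sz" followed by six digits.
            lst.append(re.findall(r"s[hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            # No href attribute, or the href contains no stock code.
            continue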
 
def getStockInfo(lst, stockURL, fpath):
    """Fetch each stock's page and append its fields to fpath, one dict per line."""
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        if html == "":
            continue
        try:
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})

            # The key '股票名称' means "stock name".
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})

            # Each <dt> holds a field label and the matching <dd> holds its value.
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for key, val in zip(keyList, valueList):
                infoDict[key.text] = val.text

            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            # Pages without the expected layout raise here; log and move on.
            traceback.print_exc()
            continue
 
def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'https://gupiao.baidu.com/stock/'
    output_file = 'D:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


if __name__ == '__main__':
    main()
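
The stock-code extraction in getStockList relies on a single regular expression. As a minimal sketch of what it matches, the snippet below runs an equivalent pattern over a few made-up href values. The sample URLs and the helper name extract_code are illustrative assumptions, not taken from the real list page.

```python
import re

# Equivalent to the pattern used in getStockList:
# "sh" or "sz" followed by exactly six digits.
STOCK_CODE = re.compile(r"s[hz]\d{6}")


def extract_code(href):
    """Return the first stock code found in href, or None if there is none.

    Hypothetical helper, for illustration only.
    """
    match = STOCK_CODE.search(href)
    return match.group(0) if match else None


# Made-up href values; the links on the real page may look different.
sample_hrefs = [
    "http://quote.example.com/sh600000.html",
    "http://quote.example.com/sz000001.html",
    "http://quote.example.com/about.html",
]

codes = [extract_code(h) for h in sample_hrefs]
print(codes)
```

Links that carry no code yield None here, which corresponds to the links the crawler skips.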

An optimized version that sets the page encoding explicitly and displays a progress indicator:

import requests
from bs4 import BeautifulSoup
import re


def getHTMLText(url, code="utf-8"):
    """Fetch a page and decode it as `code`; return "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Setting the encoding directly skips r.apparent_encoding, which has
        # to analyse the response body to guess it.
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ""


def getStockList(lst, stockURL):
    """Append every stock code linked from the list page to lst."""
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.find_all('a'):
        try:
            href = a.attrs['href']
            lst.append(re.findall(r"s[hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            # No href attribute, or the href contains no stock code.
            continue


def getStockInfo(lst, stockURL, fpath):
    """Fetch each stock's page, append its fields to fpath, and print progress."""
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            # The key '股票名称' means "stock name".
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})
            # Each <dt> holds a field label and the matching <dd> holds its value.
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for key, val in zip(keyList, valueList):
                infoDict[key.text] = val.text
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            # Skip pages without the expected layout; printing a traceback
            # here would break up the single-line progress display.
            pass
        finally:
            # Runs for every stock, including skipped ones, so the
            # percentage reaches 100 at the end of the list.
            count = count + 1
            print("\rProgress: {:.2f}%".format(count * 100 / len(lst)), end="")


def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'https://gupiao.baidu.com/stock/'
    output_file = 'BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


if __name__ == '__main__':
    main()
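
The progress display starts each message with a carriage return and suppresses the trailing newline, so every update overwrites the previous one on the same terminal line. Here is a minimal sketch of that technique on its own, with the network work replaced by a placeholder. The names format_progress and items are hypothetical, introduced only for illustration.

```python
def format_progress(done, total):
    """Build one progress message.

    The leading carriage return moves the cursor back to the start of the
    line, so the next write overwrites the old text.
    """
    return "\rProgress: {:.2f}%".format(done * 100 / total)


# Placeholder work items standing in for the list of stock codes.
items = ["a", "b", "c", "d"]

for count, item in enumerate(items, start=1):
    # ... process `item` here ...
    # end="" keeps the cursor on the same line; flush=True shows it at once.
    print(format_progress(count, len(items)), end="", flush=True)

print()  # finish with a newline once the loop is done
```

Counting every item, whether or not it was processed successfully, is what lets the percentage reach 100 at the end of the run.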


That is all for this demo of scraping online stock information with a Python crawler. I hope it serves as a useful reference, and thank you for your support.
