Python Study Notes: Importing Data

程序员文章站 2024-01-30 17:57:10


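Overview

There are many ways to import a file into Python.

Assume we know where the file lives (a local path or a URL).

For a flat or non-flat file:
- Save the file locally first, then read it with pandas (pd)
- pd.read_csv() can read a URL directly

Through an HTTP request:
- The urllib.request package
- The requests package
- BeautifulSoup from the bs4 package, to parse HTML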

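Save the file locally first, then read it with pd

The urllib.request module provides a function, urlretrieve(), which downloads the file at a URL and saves it locally.
Example: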

from urllib.request import urlretrieve

url = 'https://s3.amazonaws.com/assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'
# Save the file locally under the name 'winequality-red.csv'
urlretrieve(url, 'winequality-red.csv')
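The section title promises two steps (save, then read), so here is a minimal sketch that puts both together. The helper name `download_and_read` and its parameters are made up for illustration, and the default `sep=";"` assumes a semicolon-delimited file like the wine dataset.

```python
from urllib.request import urlretrieve

import pandas as pd


def download_and_read(url, local_path, sep=";"):
    """Save the file at `url` to `local_path`, then load it with pandas.

    Hypothetical helper, not part of pandas or urllib.
    """
    # urlretrieve returns a (filename, headers) tuple; only the filename is needed
    saved_path, _headers = urlretrieve(url, local_path)
    # Read the local copy, not the remote URL
    return pd.read_csv(saved_path, sep=sep)
```

Calling `download_and_read(url, 'winequality-red.csv')` would leave a copy on disk, which is handy when the same file is read repeatedly.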

pd.read_csv() can read a URL directly

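import pandas as pd
import matplotlib.pyplot as plt

url = 'https://s3.amazonaws.com/assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'

# The file is semicolon-delimited, so pass sep=';'
df = pd.read_csv(url, sep=';')

# Plot a histogram of the first column
# (.ix has been removed from pandas; use .iloc for positional indexing)
df.iloc[:, 0:1].hist()
plt.xlabel('fixed acidity (g(tartaric acid)/dm$^3$)')
plt.ylabel('count')
plt.show()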
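To see why `sep=';'` matters, the sketch below parses the same semicolon-delimited text twice, once with the default comma separator and once with `sep=";"`. The sample rows are made-up placeholder values, not taken from the real dataset, and `io.StringIO` stands in for the remote file so nothing is downloaded.

```python
import io

import pandas as pd

# Made-up sample in the same semicolon-delimited layout (illustrative values only)
sample = "fixed acidity;pH\n7.4;3.51\n7.8;3.20\n"

# Default separator is ',', so each whole line lands in a single column
wrong = pd.read_csv(io.StringIO(sample))
# With sep=';' the two columns are split correctly
right = pd.read_csv(io.StringIO(sample), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 2)
```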


Importing an Excel file from a URL

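# Read every sheet of the Excel file into a dictionary (sheet_name=None);
# url must point to an Excel file here
xls = pd.read_excel(url, sheet_name=None)
# Get the name of each sheet
print(xls.keys())
# Access a sheet's data by its name
print(xls['1700'].head())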
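Because `sheet_name=None` yields a dictionary mapping sheet names to DataFrames, ordinary dictionary iteration works on the result. The sketch below uses a hand-built dictionary as a stand-in for the `read_excel` output; the helper name `sheet_shapes`, the sheet name '1900', and the dummy rows are all assumptions for illustration.

```python
import pandas as pd


def sheet_shapes(sheets):
    """Map each sheet name to the (rows, columns) shape of its DataFrame."""
    return {name: df.shape for name, df in sheets.items()}


# Stand-in for the dict returned by pd.read_excel(url, sheet_name=None)
sheets = {
    "1700": pd.DataFrame({"country": ["A", "B"], "value": [1, 2]}),
    "1900": pd.DataFrame({"country": ["A"], "value": [3]}),
}

print(sheet_shapes(sheets))  # {'1700': (2, 2), '1900': (1, 2)}
```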

Reading a file through an HTTP request

urllib.request

The request module of urllib provides the urlopen() function and the Request class for making HTTP requests.
Example:

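from urllib.request import urlopen, Request

# Build the request object
request = Request(url)
# Send the request and store the response
response = urlopen(request)
# Read the HTML at that URL (returned as bytes)
html = response.read()

print(type(response))
# Close the response
response.close()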

The type of response is <class 'http.client.HTTPResponse'>.
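One possible variation on the example above: the response can be used in a `with` block so it is closed automatically, and the bytes from `read()` can be decoded into a string. The helper name `fetch_text` and the UTF-8 fallback are assumptions for this sketch.

```python
from urllib.request import Request, urlopen


def fetch_text(url, default_encoding="utf-8"):
    """Fetch `url` and return the body as a str (hypothetical helper)."""
    request = Request(url)
    # The with-block closes the response even if reading fails
    with urlopen(request) as response:
        raw = response.read()  # bytes
        # Use the charset declared in the headers, if any; otherwise fall back
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset)
```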

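The requests package

The requests package combines sending the request and receiving the response into a single function, get().
The text attribute of the response returns the HTML as a string.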

import requests
# Send the request and get the response
r = requests.get(url)

# Extract the response body as text
html = r.text

# Parse the body as JSON (only valid when the response actually contains JSON)
json_data = r.json()
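A slightly more defensive sketch of the same call, assuming the URL returns JSON: a timeout keeps the call from hanging indefinitely, and `raise_for_status()` turns 4xx/5xx responses into exceptions before the body is parsed. The helper name `fetch_json` and the 10-second default are illustrative choices.

```python
import requests


def fetch_json(url, timeout=10):
    """GET `url` and return the decoded JSON body (hypothetical helper)."""
    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes
    r.raise_for_status()
    return r.json()
```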

Parsing HTML with BeautifulSoup from the bs4 package

HTML held as one plain string is hard to read and hard to analyze. Fortunately, there is a package for that ><

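import requests
from bs4 import BeautifulSoup

r = requests.get(url)
# Get the HTML
html_doc = r.text
# Create a BeautifulSoup object (naming the parser explicitly avoids a warning)
soup = BeautifulSoup(html_doc, 'html.parser')

# Get the title and the text
title = soup.title
text = soup.get_text()
# Find the hyperlinks in the HTML (the <a> tags)
a_tags = soup.find_all('a')

# Print every hyperlink in the HTML
for link in a_tags:
    print(link.get('href'))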
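To try the same calls without a network request, the sketch below parses a small inline HTML string. The page content and the example.com link are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page
html_doc = """<html><head><title>Demo page</title></head>
<body><p>Hello</p>
<a href="https://example.com/a">A</a>
<a href="/b">B</a>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

# .string gives the text inside <title>, without the surrounding tag
title = soup.title.string
# Collect the href attribute of every <a> tag
links = [a.get("href") for a in soup.find_all("a")]

print(title)  # Demo page
print(links)  # ['https://example.com/a', '/b']
```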
