pandas 1: pandas basics
2023-11-03 15:56:40
1. Basic elements of a DataFrame
pandas is a standalone third-party library for working with text and two-dimensional tabular data.
Official site: https://pandas.pydata.org/
import pandas
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
Get the first n rows
print(df.head())  # shows 5 rows by default
print(df.head(6))
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
1 Afghanistan Asia 1957 30.332 9240934 820.853030
2 Afghanistan Asia 1962 31.997 10267083 853.100710
3 Afghanistan Asia 1967 34.020 11537966 836.197138
4 Afghanistan Asia 1972 36.088 13079460 739.981106
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
1 Afghanistan Asia 1957 30.332 9240934 820.853030
2 Afghanistan Asia 1962 31.997 10267083 853.100710
3 Afghanistan Asia 1967 34.020 11537966 836.197138
4 Afghanistan Asia 1972 36.088 13079460 739.981106
5 Afghanistan Asia 1977 38.438 14880372 786.113360
# head() also returns a DataFrame
Get the dimensions of the table
print(df.shape)
print(df.shape[0])
print(df.shape[1])
(1704, 6)
1704
6
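As a small self-contained sketch of the calls above, the following builds a tiny frame by hand so that ./data/gapminder.tsv is not needed. The four rows are copied from the output shown in this article; the variable name sample is only an illustration.

```python
import pandas as pd

# Hand-typed sample: rows 0, 1, 1702 and 1703 from the output in this article.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "year": [1952, 1957, 2002, 2007],
    "lifeExp": [28.801, 30.332, 39.989, 43.487],
    "pop": [8425333, 9240934, 11926563, 12311143],
    "gdpPercap": [779.445314, 820.853030, 672.038623, 469.709298],
})

first_two = sample.head(2)     # head(n) returns a new DataFrame with the first n rows
last_one = sample.tail(1)      # tail(n) does the same from the end
n_rows, n_cols = sample.shape  # shape is a plain tuple: (rows, columns)

print(type(first_two).__name__)
print(n_rows, n_cols)
```

Because shape is an ordinary tuple, it can be unpacked directly instead of indexing shape[0] and shape[1].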
Get the column names
print(df.columns)
for col in df.columns:
    print(col)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
country
continent
year
lifeExp
pop
gdpPercap
Data type of each column
print(df.dtypes)
for dtype in df.dtypes:  # renamed from "type" to avoid shadowing the built-in
    print(dtype)
country object
continent object
year int64
lifeExp float64
pop int64
gdpPercap float64
dtype: object
object
object
int64
float64
int64
float64
for item in zip(df.columns, df.dtypes):
    print(item)
('country', dtype('O'))
('continent', dtype('O'))
('year', dtype('int64'))
('lifeExp', dtype('float64'))
('pop', dtype('int64'))
('gdpPercap', dtype('float64'))
columnTypes = dict(zip(df.columns,df.dtypes))
print(columnTypes.get('country'))
print(columnTypes.get('pop'))
object
int64
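The dict(zip(...)) pattern above can probably be shortened, since df.dtypes is itself a Series indexed by column name. A minimal sketch on a hand-typed sample (the names sample, column_types and numeric_columns are assumptions for illustration):

```python
import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "year": [1952, 2007],
    "lifeExp": [28.801, 43.487],
})

# dtypes is a Series (index = column names), so it converts to a dict directly.
column_types = sample.dtypes.to_dict()

# select_dtypes filters columns by type; "number" covers ints and floats.
numeric_columns = sample.select_dtypes(include="number").columns.tolist()

print(column_types["year"])
print(numeric_columns)
```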
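Describing the DataFrame structure
print(df.info)  # no parentheses: this prints the bound method's repr (which embeds the DataFrame); call df.info() for the per-column summary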
<bound method DataFrame.info of country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
1 Afghanistan Asia 1957 30.332 9240934 820.853030
2 Afghanistan Asia 1962 31.997 10267083 853.100710
3 Afghanistan Asia 1967 34.020 11537966 836.197138
4 Afghanistan Asia 1972 36.088 13079460 739.981106
... ... ... ... ... ... ...
1699 Zimbabwe Africa 1987 62.351 9216418 706.157306
1700 Zimbabwe Africa 1992 60.377 10704340 693.420786
1701 Zimbabwe Africa 1997 46.809 11404948 792.449960
1702 Zimbabwe Africa 2002 39.989 11926563 672.038623
1703 Zimbabwe Africa 2007 43.487 12311143 469.709298
[1704 rows x 6 columns]>
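The output above is the repr of the method object rather than a structural summary. As a hedged sketch of the intended call, info() can be invoked with parentheses; here its text is captured in a buffer so it can be inspected. The sample frame is hand-typed from rows shown in this article.

```python
import io

import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "year": [1952, 2007],
    "lifeExp": [28.801, 43.487],
})

# info() prints to stdout by default; buf redirects the text to any writable object.
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()

print(summary)               # column names, non-null counts and dtypes
print(callable(sample.info)) # info is a method, so it has to be called
```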
Correspondence between pandas and Python data types
'''
pandas dtype    Python type
object          str (string)
int64           int
float64         float
datetime64      datetime
'''
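To make the mapping concrete, the sketch below checks the dtype of each column in a hand-typed sample and derives a datetime column. The year_start column is hypothetical and not part of the gapminder data. Depending on the pandas version, text columns may be reported as object or as a dedicated string dtype, so the code only checks the Python type of a single value.

```python
import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "year": [1952, 2007],
    "lifeExp": [28.801, 43.487],
})

# Hypothetical extra column: 1 January of each year, parsed into a datetime dtype.
sample["year_start"] = pd.to_datetime(sample["year"].astype(str), format="%Y")

first_country = sample.loc[0, "country"]  # a single text value is a Python str

print(sample.dtypes)
print(type(first_country))
```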
2. Getting data from a DataFrame
import pandas
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
2.1 Getting columns
country_df = df['country']
print(type(country_df))
print(country_df)
<class 'pandas.core.series.Series'>
0 Afghanistan
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
...
1699 Zimbabwe
1700 Zimbabwe
1701 Zimbabwe
1702 Zimbabwe
1703 Zimbabwe
Name: country, Length: 1704, dtype: object
print(country_df.head())
print(country_df.tail(6))
0 Afghanistan
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
Name: country, dtype: object
1698 Zimbabwe
1699 Zimbabwe
1700 Zimbabwe
1701 Zimbabwe
1702 Zimbabwe
1703 Zimbabwe
Name: country, dtype: object
Getting multiple columns
subset = df[['country','continent']]
print(subset.head(6))
country continent
0 Afghanistan Asia
1 Afghanistan Asia
2 Afghanistan Asia
3 Afghanistan Asia
4 Afghanistan Asia
5 Afghanistan Asia
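The difference between single and double brackets is easy to miss, so here is a minimal sketch on a hand-typed sample (the variable names are illustrative only): a string key returns a Series, while a list of names returns a DataFrame even when the list has one entry.

```python
import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "year": [1952, 1957, 2002, 2007],
})

one_column = sample["country"]                   # string key -> Series
one_column_frame = sample[["country"]]           # list key   -> DataFrame with one column
two_columns = sample[["country", "continent"]]   # list key   -> DataFrame with two columns

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(two_columns.shape)
```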
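2.2 Getting rows
loc selects rows by index label; iloc selects rows by integer position.
print(df.loc[0])
print(type(df.loc[15]))
# print(type(df.loc[-1]))  # KeyError: loc looks up index labels, and -1 is not a label in this index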
country Afghanistan
continent Asia
year 1952
lifeExp 28.801
pop 8425333
gdpPercap 779.445
Name: 0, dtype: object
<class 'pandas.core.series.Series'>
number_of_rows = df.shape[0]
print(df.loc[number_of_rows-1])  # label of the last row in the default 0..n-1 index
country Zimbabwe
continent Africa
year 2007
lifeExp 43.487
pop 12311143
gdpPercap 469.709
Name: 1703, dtype: object
print(type(df.loc[0]))
print(type(df.head()))
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
print(df.loc[[1,2]])
print(type(df.loc[[1,2]]))
country continent year lifeExp pop gdpPercap
1 Afghanistan Asia 1957 30.332 9240934 820.85303
2 Afghanistan Asia 1962 31.997 10267083 853.10071
<class 'pandas.core.frame.DataFrame'>
iloc accepts negative indices
print(df.iloc[-1])
country Zimbabwe
continent Africa
year 2007
lifeExp 43.487
pop 12311143
gdpPercap 469.709
Name: 1703, dtype: object
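With the default 0..n-1 index, loc and iloc look almost interchangeable. The sketch below uses a hand-typed sample with made-up index labels (10, 20, 30, 40 are assumptions chosen only to separate labels from positions) to show where they differ.

```python
import pandas as pd

# Values copied from this article's output; the index labels are hypothetical.
sample = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
        "year": [1952, 1957, 2002, 2007],
    },
    index=[10, 20, 30, 40],
)

by_label = sample.loc[10]     # row whose index label is 10
by_position = sample.iloc[0]  # row at position 0 (the same row here)
last_row = sample.iloc[-1]    # negative positions count from the end

# loc treats -1 as a label; no row carries that label, so a KeyError is raised.
try:
    sample.loc[-1]
    negative_label_found = True
except KeyError:
    negative_label_found = False

print(by_label.equals(by_position))
print(last_row["year"])
print(negative_label_found)
```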
2.3 Getting cell data
subset = df.loc[:,['year','pop']]
print(type(subset))
print(subset.head())
<class 'pandas.core.frame.DataFrame'>
year pop
0 1952 8425333
1 1957 9240934
2 1962 10267083
3 1967 11537966
4 1972 13079460
subset = df.iloc[:,[2,4,-1]]
print(type(subset))
print(subset.head())
<class 'pandas.core.frame.DataFrame'>
year pop gdpPercap
0 1952 8425333 779.445314
1 1957 9240934 820.853030
2 1962 10267083 853.100710
3 1967 11537966 836.197138
4 1972 13079460 739.981106
2.4 Slicing
subset = df.iloc[:,3:6]
print(subset.head())
lifeExp pop gdpPercap
0 28.801 8425333 779.445314
1 30.332 9240934 820.853030
2 31.997 10267083 853.100710
3 34.020 11537966 836.197138
4 36.088 13079460 739.981106
subset = df.iloc[0:3,:]
print(subset.head())
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
1 Afghanistan Asia 1957 30.332 9240934 820.853030
2 Afghanistan Asia 1962 31.997 10267083 853.100710
subset = df.loc[1,'lifeExp']
print(subset)
30.331999999999997
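One detail worth checking when slicing: loc slices include the end label, while iloc slices exclude the stop position, like ordinary Python slices. A minimal sketch on a hand-typed sample (variable names are illustrative):

```python
import pandas as pd

# Hand-typed sample: rows copied from the output in this article.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "year": [1952, 1957, 2002, 2007],
    "lifeExp": [28.801, 30.332, 39.989, 43.487],
    "pop": [8425333, 9240934, 11926563, 12311143],
    "gdpPercap": [779.445314, 820.853030, 672.038623, 469.709298],
})

# Label slice: rows 0, 1, 2 and columns year, lifeExp, pop (both ends included).
label_slice = sample.loc[0:2, "year":"pop"]

# Position slice: rows 0, 1 and columns 2, 3, 4 (stop excluded).
position_slice = sample.iloc[0:2, 2:5]

# at is a scalar accessor for a single cell, addressed by labels.
cell = sample.at[1, "lifeExp"]

print(label_slice.shape)
print(position_slice.shape)
print(cell)
```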
2.5 Grouped statistics
import pandas
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
print(df.groupby('year')['lifeExp'].mean())
year
1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
Name: lifeExp, dtype: float64
print(df.groupby('year'))
print(df.groupby('year')['lifeExp'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7ff46dbd5890>
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7ff46dbd58d0>
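The groupby objects printed above are lazy: nothing is computed until an aggregation such as mean() runs. To illustrate what that aggregation does, the sketch below loops over the groups by hand and compares the result with the built-in call. It groups a hand-typed sample by continent rather than year, simply because the sample has two rows per continent.

```python
import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "year": [1952, 1957, 2002, 2007],
    "lifeExp": [28.801, 30.332, 39.989, 43.487],
})

# Iterating a groupby yields (key, sub-DataFrame) pairs.
manual_means = {}
for continent, group in sample.groupby("continent"):
    manual_means[continent] = group["lifeExp"].mean()

# The one-liner from the article computes the same values in a single call.
built_in = sample.groupby("continent")["lifeExp"].mean()

print(manual_means)
print(built_in)
```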
2.6 Grouping by multiple columns
multi_group_var = df.groupby(['year','continent'])[['lifeExp','gdpPercap']].mean()
print(type(multi_group_var))
print(multi_group_var)
<class 'pandas.core.frame.DataFrame'>
lifeExp gdpPercap
year continent
1952 Africa 39.135500 1252.572466
Americas 53.279840 4079.062552
Asia 46.314394 5195.484004
Europe 64.408500 5661.057435
Oceania 69.255000 10298.085650
1957 Africa 41.266346 1385.236062
Americas 55.960280 4616.043733
Asia 49.318544 5787.732940
Europe 66.703067 6963.012816
Oceania 70.295000 11598.522455
1962 Africa 43.319442 1598.078825
Americas 58.398760 4901.541870
Asia 51.563223 5729.369625
Europe 68.539233 8365.486814
Oceania 71.085000 12696.452430
1967 Africa 45.334538 2050.363801
Americas 60.410920 5668.253496
Asia 54.663640 5971.173374
Europe 69.737600 10143.823757
Oceania 71.310000 14495.021790
1972 Africa 47.450942 2339.615674
Americas 62.394920 6491.334139
Asia 57.319269 8187.468699
Europe 70.775033 12479.575246
Oceania 71.910000 16417.333380
1977 Africa 49.580423 2585.938508
Americas 64.391560 7352.007126
Asia 59.610556 7791.314020
Europe 71.937767 14283.979110
Oceania 72.855000 17283.957605
1982 Africa 51.592865 2481.592960
Americas 66.228840 7506.737088
Asia 62.617939 7434.135157
Europe 72.806400 15617.896551
Oceania 74.290000 18554.709840
1987 Africa 53.344788 2282.668991
Americas 68.090720 7793.400261
Asia 64.851182 7608.226508
Europe 73.642167 17214.310727
Oceania 75.320000 20448.040160
1992 Africa 53.629577 2281.810333
Americas 69.568360 8044.934406
Asia 66.537212 8639.690248
Europe 74.440100 17061.568084
Oceania 76.945000 20894.045885
1997 Africa 53.598269 2378.759555
Americas 71.150480 8889.300863
Asia 68.020515 9834.093295
Europe 75.505167 19076.781802
Oceania 78.190000 24024.175170
2002 Africa 53.325231 2599.385159
Americas 72.422040 9287.677107
Asia 69.233879 10174.090397
Europe 76.700600 21711.732422
Oceania 79.740000 26938.778040
2007 Africa 54.806038 3089.032605
Americas 73.608120 11003.031625
Asia 70.728485 12473.026870
Europe 77.648600 25054.481636
Oceania 80.719500 29810.188275
print(multi_group_var.reset_index())
year continent lifeExp gdpPercap
0 1952 Africa 39.135500 1252.572466
1 1952 Americas 53.279840 4079.062552
2 1952 Asia 46.314394 5195.484004
3 1952 Europe 64.408500 5661.057435
4 1952 Oceania 69.255000 10298.085650
5 1957 Africa 41.266346 1385.236062
6 1957 Americas 55.960280 4616.043733
7 1957 Asia 49.318544 5787.732940
8 1957 Europe 66.703067 6963.012816
9 1957 Oceania 70.295000 11598.522455
10 1962 Africa 43.319442 1598.078825
11 1962 Americas 58.398760 4901.541870
12 1962 Asia 51.563223 5729.369625
13 1962 Europe 68.539233 8365.486814
14 1962 Oceania 71.085000 12696.452430
15 1967 Africa 45.334538 2050.363801
16 1967 Americas 60.410920 5668.253496
17 1967 Asia 54.663640 5971.173374
18 1967 Europe 69.737600 10143.823757
19 1967 Oceania 71.310000 14495.021790
20 1972 Africa 47.450942 2339.615674
21 1972 Americas 62.394920 6491.334139
22 1972 Asia 57.319269 8187.468699
23 1972 Europe 70.775033 12479.575246
24 1972 Oceania 71.910000 16417.333380
25 1977 Africa 49.580423 2585.938508
26 1977 Americas 64.391560 7352.007126
27 1977 Asia 59.610556 7791.314020
28 1977 Europe 71.937767 14283.979110
29 1977 Oceania 72.855000 17283.957605
30 1982 Africa 51.592865 2481.592960
31 1982 Americas 66.228840 7506.737088
32 1982 Asia 62.617939 7434.135157
33 1982 Europe 72.806400 15617.896551
34 1982 Oceania 74.290000 18554.709840
35 1987 Africa 53.344788 2282.668991
36 1987 Americas 68.090720 7793.400261
37 1987 Asia 64.851182 7608.226508
38 1987 Europe 73.642167 17214.310727
39 1987 Oceania 75.320000 20448.040160
40 1992 Africa 53.629577 2281.810333
41 1992 Americas 69.568360 8044.934406
42 1992 Asia 66.537212 8639.690248
43 1992 Europe 74.440100 17061.568084
44 1992 Oceania 76.945000 20894.045885
45 1997 Africa 53.598269 2378.759555
46 1997 Americas 71.150480 8889.300863
47 1997 Asia 68.020515 9834.093295
48 1997 Europe 75.505167 19076.781802
49 1997 Oceania 78.190000 24024.175170
50 2002 Africa 53.325231 2599.385159
51 2002 Americas 72.422040 9287.677107
52 2002 Asia 69.233879 10174.090397
53 2002 Europe 76.700600 21711.732422
54 2002 Oceania 79.740000 26938.778040
55 2007 Africa 54.806038 3089.032605
56 2007 Americas 73.608120 11003.031625
57 2007 Asia 70.728485 12473.026870
58 2007 Europe 77.648600 25054.481636
59 2007 Oceania 80.719500 29810.188275
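Grouping by several keys produces a hierarchical (multi-level) index, which reset_index() flattens back into ordinary columns. As a hedged sketch, passing as_index=False to groupby should give the flat layout directly. The sample is hand-typed and grouped by continent and country (not year and continent as above) so that each group holds two rows.

```python
import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "lifeExp": [28.801, 30.332, 39.989, 43.487],
    "gdpPercap": [779.445314, 820.853030, 672.038623, 469.709298],
})

keys = ["continent", "country"]
values = ["lifeExp", "gdpPercap"]

hierarchical = sample.groupby(keys)[values].mean()             # two-level index
flat = hierarchical.reset_index()                               # keys become columns
flat_direct = sample.groupby(keys, as_index=False)[values].mean()

# A single cell of the hierarchical result is addressed with a tuple of keys.
asia_life_exp = hierarchical.loc[("Asia", "Afghanistan"), "lifeExp"]

print(hierarchical.index.nlevels)
print(list(flat_direct.columns))
print(asia_life_exp)
```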
2.7 Counting unique values
print(df.groupby('continent')['country'].nunique())
continent
Africa 52
Americas 25
Asia 33
Europe 30
Oceania 2
Name: country, dtype: int64
print(df.groupby('continent')['year'].nunique())
continent
Africa 12
Americas 12
Asia 12
Europe 12
Oceania 12
Name: year, dtype: int64
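nunique() counts distinct values, which is not the same as counting rows. The following sketch contrasts it with value_counts() on a hand-typed sample, where one country appears twice per continent (variable names are illustrative):

```python
import pandas as pd

# Hand-typed sample with values copied from this article's output.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "year": [1952, 1957, 2002, 2007],
})

# Distinct countries per continent: repeated rows for one country count once.
countries_per_continent = sample.groupby("continent")["country"].nunique()

# Rows per continent: every row counts.
rows_per_continent = sample["continent"].value_counts()

print(countries_per_continent)
print(rows_per_continent)
```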
2.8 Visualizing the statistics
import pandas
import matplotlib.pyplot as plt
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
global_yearly_life_expectancy = df.groupby('year')['lifeExp'].mean()
print(global_yearly_life_expectancy)
year
1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
Name: lifeExp, dtype: float64
global_yearly_life_expectancy.plot()
plt.show()
multi_group_var = df.groupby('year')['gdpPercap'].mean()
multi_group_var.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7ff46de9fed0>
# NumPy, pandas, Matplotlib
fig,(ax1,ax2) = plt.subplots(1,2,figsize=(8,4))
ax1.plot(global_yearly_life_expectancy)
ax2.plot(multi_group_var)
plt.show()
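The plots above have no titles or axis labels. The sketch below shows one possible way to add them, using the first three yearly means printed earlier. The Agg backend and the in-memory PNG buffer are assumptions made so the example runs without a display; in a notebook, plt.show() would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# First three yearly means, copied from the groupby output in this article.
life_exp_by_year = pd.Series(
    [49.057620, 51.507401, 53.609249],
    index=[1952, 1957, 1962],
    name="lifeExp",
)

fig, ax = plt.subplots(figsize=(4, 4))
ax.plot(life_exp_by_year.index, life_exp_by_year.values)
ax.set_title("Mean life expectancy by year")
ax.set_xlabel("year")
ax.set_ylabel("lifeExp")
fig.tight_layout()

# Render to an in-memory buffer instead of a file on disk.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)
```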
Original article: https://blog.csdn.net/luteresa/article/details/107363812