
pandas 1: pandas basics

程序员文章站 2022-06-11 18:02:17

1. Basic elements of a DataFrame

pandas is a standalone third-party library for working with text data and two-dimensional tables.

Official site: https://pandas.pydata.org/

import pandas

df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
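
To try the calls in this article without the data file, the same read_csv call can be pointed at an in-memory string. This is only a minimal sketch: `tsv_text` and `sample_df` are hypothetical names, and the three data rows are copied from the head() output shown in this article.

```python
import io
import pandas

# Hypothetical stand-in for ./data/gapminder.tsv: header plus the first three rows
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.853030\n"
    "Afghanistan\tAsia\t1962\t31.997\t10267083\t853.100710\n"
)

# sep='\t' tells read_csv that the fields are separated by tabs
sample_df = pandas.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample_df)
```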

Get the first n rows

print(df.head())  # shows 5 rows by default
print(df.head(6))
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106
5  Afghanistan      Asia  1977   38.438  14880372  786.113360
# head() also returns a DataFrame

Get the table's dimensions

print(df.shape)
print(df.shape[0])
print(df.shape[1])
(1704, 6)
1704
6
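
Because shape is an ordinary tuple, it can be unpacked in one step instead of being indexed twice. A small sketch on a made-up frame (`small_df` is a hypothetical name, not the gapminder data):

```python
import pandas

# Hypothetical 3-row, 2-column table
small_df = pandas.DataFrame({
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
})

n_rows, n_cols = small_df.shape   # shape is a plain tuple: (rows, columns)
print(n_rows, n_cols)
print(len(small_df))              # len() gives the row count, same as shape[0]
```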

Get the column names

print(df.columns)
for col in df.columns:
    print(col)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
country
continent
year
lifeExp
pop
gdpPercap

Data type of each column

print(df.dtypes)
for dtype in df.dtypes:  # avoid naming the loop variable "type", which would shadow the built-in used later
    print(dtype)
country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object
object
object
int64
float64
int64
float64
for item in zip(df.columns,df.dtypes):
    print(item)
('country', dtype('O'))
('continent', dtype('O'))
('year', dtype('int64'))
('lifeExp', dtype('float64'))
('pop', dtype('int64'))
('gdpPercap', dtype('float64'))
columnTypes = dict(zip(df.columns,df.dtypes))
print(columnTypes.get('country'))
print(columnTypes.get('pop'))
object
int64
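
df.dtypes is itself a Series indexed by column name, so the lookup can also be done without zipping the two sequences by hand. A hedged sketch on a made-up two-column frame (`small_df` and `dtype_by_column` are hypothetical names):

```python
import pandas

small_df = pandas.DataFrame({
    "pop": [8425333, 9240934],
    "lifeExp": [28.801, 30.332],
})

# Look up one column's dtype directly by its name
print(small_df.dtypes["pop"])

# to_dict() builds the same mapping as dict(zip(columns, dtypes))
dtype_by_column = small_df.dtypes.to_dict()
print(dtype_by_column["lifeExp"])
```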

Describing the DataFrame structure

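# Note: df.info without parentheses prints the bound method itself (output below);
# call df.info() to get the summary of columns, non-null counts, and dtypes.
print(df.info)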
<bound method DataFrame.info of           country continent  year  lifeExp       pop   gdpPercap
0     Afghanistan      Asia  1952   28.801   8425333  779.445314
1     Afghanistan      Asia  1957   30.332   9240934  820.853030
2     Afghanistan      Asia  1962   31.997  10267083  853.100710
3     Afghanistan      Asia  1967   34.020  11537966  836.197138
4     Afghanistan      Asia  1972   36.088  13079460  739.981106
...           ...       ...   ...      ...       ...         ...
1699     Zimbabwe    Africa  1987   62.351   9216418  706.157306
1700     Zimbabwe    Africa  1992   60.377  10704340  693.420786
1701     Zimbabwe    Africa  1997   46.809  11404948  792.449960
1702     Zimbabwe    Africa  2002   39.989  11926563  672.038623
1703     Zimbabwe    Africa  2007   43.487  12311143  469.709298

[1704 rows x 6 columns]>

Correspondence between pandas and Python data types

pandas dtype    Python type
object          string
int64           int
float64         float
datetime64      datetime
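
The gapminder columns shown in this article cover object, int64, and float64 but not datetime64. As a hedged illustration, an integer year column can be converted with pandas.to_datetime; the frame and the `year_dt` column name below are made up for this sketch.

```python
import pandas

small_df = pandas.DataFrame({"year": [1952, 1957, 1962]})

# Convert the integer years to strings, then parse them as dates
# (each value becomes January 1 of that year)
small_df["year_dt"] = pandas.to_datetime(small_df["year"].astype(str), format="%Y")

print(small_df.dtypes)                       # year stays an integer column; year_dt is a datetime64 column
print(small_df["year_dt"].dt.year.tolist())  # the .dt accessor exposes date parts such as the year
```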

2. Getting data from a DataFrame

import pandas
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')

2.1 Selecting columns

country_df = df['country']
print(type(country_df))
print(country_df)
<class 'pandas.core.series.Series'>
0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: country, Length: 1704, dtype: object
print(country_df.head())
print(country_df.tail(6))
0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
Name: country, dtype: object
1698    Zimbabwe
1699    Zimbabwe
1700    Zimbabwe
1701    Zimbabwe
1702    Zimbabwe
1703    Zimbabwe
Name: country, dtype: object

Select multiple columns

subset = df[['country','continent']]
print(subset.head(6))
       country continent
0  Afghanistan      Asia
1  Afghanistan      Asia
2  Afghanistan      Asia
3  Afghanistan      Asia
4  Afghanistan      Asia
5  Afghanistan      Asia
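
The return type depends on what goes inside the brackets: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one element. A short sketch on made-up data (`small_df` is a hypothetical name):

```python
import pandas

small_df = pandas.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "continent": ["Asia", "Africa"],
    "year": [1952, 2007],
})

one_column = small_df["country"]       # a string key returns a Series
one_column_df = small_df[["country"]]  # a one-element list returns a DataFrame

print(type(one_column).__name__, one_column.shape)
print(type(one_column_df).__name__, one_column_df.shape)
```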

2.2 Selecting rows

Rows are selected with loc (by index label) and iloc (by integer position).

print(df.loc[0])
print(type(df.loc[15]))
# print(type(df.loc[-1]))  # KeyError: loc looks up index labels, and no row is labelled -1
country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object
<class 'pandas.core.series.Series'>
number_of_rows = df.shape[0]
print(df.loc[number_of_rows-1])
country      Zimbabwe
continent      Africa
year             2007
lifeExp        43.487
pop          12311143
gdpPercap     469.709
Name: 1703, dtype: object
print(type(df.loc[0]))
print(type(df.head()))
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
print(df.loc[[1,2]])
print(type(df.loc[[1,2]]))
       country continent  year  lifeExp       pop  gdpPercap
1  Afghanistan      Asia  1957   30.332   9240934  820.85303
2  Afghanistan      Asia  1962   31.997  10267083  853.10071
<class 'pandas.core.frame.DataFrame'>

iloc accepts negative positions

print(df.iloc[-1])
country      Zimbabwe
continent      Africa
year             2007
lifeExp        43.487
pop          12311143
gdpPercap     469.709
Name: 1703, dtype: object
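
With the default index 0..n-1, labels and positions coincide, which hides the difference between loc and iloc. The sketch below uses a made-up frame whose index labels are 10, 20, 30 so the two can be told apart; `small_df` and its rows are hypothetical.

```python
import pandas

# Hypothetical frame whose index labels (10, 20, 30) differ from the positions (0, 1, 2)
small_df = pandas.DataFrame(
    {"country": ["Afghanistan", "Albania", "Zimbabwe"]},
    index=[10, 20, 30],
)

print(small_df.loc[10, "country"])   # label 10 selects the first row
print(small_df.iloc[0, 0])           # position 0 selects the same row
print(small_df.iloc[-1, 0])          # position -1 counts from the end, like a Python list

try:
    small_df.loc[-1]                 # no row carries the label -1
except KeyError as exc:
    print("KeyError:", exc)
```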

2.3 Selecting cell data

subset = df.loc[:,['year','pop']]
print(type(subset))
print(subset.head())
<class 'pandas.core.frame.DataFrame'>
   year       pop
0  1952   8425333
1  1957   9240934
2  1962  10267083
3  1967  11537966
4  1972  13079460
subset = df.iloc[:,[2,4,-1]]
print(type(subset))
print(subset.head())
<class 'pandas.core.frame.DataFrame'>
   year       pop   gdpPercap
0  1952   8425333  779.445314
1  1957   9240934  820.853030
2  1962  10267083  853.100710
3  1967  11537966  836.197138
4  1972  13079460  739.981106

2.4 Slicing

subset = df.iloc[:,3:6]
print(subset.head())
   lifeExp       pop   gdpPercap
0   28.801   8425333  779.445314
1   30.332   9240934  820.853030
2   31.997  10267083  853.100710
3   34.020  11537966  836.197138
4   36.088  13079460  739.981106
subset = df.iloc[0:3,:]
print(subset.head())
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
subset = df.loc[1,'lifeExp']
print(subset)
30.331999999999997
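
One detail worth checking when slicing: iloc follows Python's convention and excludes the stop position, while loc slices by label and includes the stop label. A minimal sketch on made-up data (`small_df` is a hypothetical name):

```python
import pandas

small_df = pandas.DataFrame({
    "year": [1952, 1957, 1962, 1967],
    "lifeExp": [28.801, 30.332, 31.997, 34.020],
})

by_position = small_df.iloc[0:2]   # positions 0 and 1; the stop position is excluded
by_label = small_df.loc[0:2]       # labels 0, 1 and 2; the stop label is included
print(len(by_position), len(by_label))

# Column label slices follow the same rule: the end column is included
print(list(small_df.loc[:, "year":"lifeExp"].columns))
```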

2.5 Grouped statistics

import pandas
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
print(df.groupby('year')['lifeExp'].mean())
year
1952    49.057620
1957    51.507401
1962    53.609249
1967    55.678290
1972    57.647386
1977    59.570157
1982    61.533197
1987    63.212613
1992    64.160338
1997    65.014676
2002    65.694923
2007    67.007423
Name: lifeExp, dtype: float64
print(df.groupby('year'))
print(df.groupby('year')['lifeExp'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7ff46dbd5890>
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7ff46dbd58d0>
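
The two groupby objects printed above hold no results until an aggregation such as mean() is called. What groupby('year')['lifeExp'].mean() computes can be sketched by hand: filter the rows for each year, then average them. The data below is a made-up four-row sample, not the gapminder file, and `manual_means` is a hypothetical name.

```python
import pandas

small_df = pandas.DataFrame({
    "year": [1952, 1952, 1957, 1957],
    "lifeExp": [30.0, 50.0, 40.0, 60.0],
})

# Manual version: one boolean filter and one mean per distinct year
manual_means = {}
for year in small_df["year"].unique():
    rows_for_year = small_df[small_df["year"] == year]
    manual_means[int(year)] = float(rows_for_year["lifeExp"].mean())

# groupby performs the same split, apply, and combine steps in one call
grouped_means = small_df.groupby("year")["lifeExp"].mean()

print(manual_means)
print(grouped_means.to_dict())
```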

2.6 Grouping by multiple columns

multi_group_var = df.groupby(['year','continent'])[['lifeExp','gdpPercap']].mean()
print(type(multi_group_var))
print(multi_group_var)
<class 'pandas.core.frame.DataFrame'>
                  lifeExp     gdpPercap
year continent                         
1952 Africa     39.135500   1252.572466
     Americas   53.279840   4079.062552
     Asia       46.314394   5195.484004
     Europe     64.408500   5661.057435
     Oceania    69.255000  10298.085650
1957 Africa     41.266346   1385.236062
     Americas   55.960280   4616.043733
     Asia       49.318544   5787.732940
     Europe     66.703067   6963.012816
     Oceania    70.295000  11598.522455
1962 Africa     43.319442   1598.078825
     Americas   58.398760   4901.541870
     Asia       51.563223   5729.369625
     Europe     68.539233   8365.486814
     Oceania    71.085000  12696.452430
1967 Africa     45.334538   2050.363801
     Americas   60.410920   5668.253496
     Asia       54.663640   5971.173374
     Europe     69.737600  10143.823757
     Oceania    71.310000  14495.021790
1972 Africa     47.450942   2339.615674
     Americas   62.394920   6491.334139
     Asia       57.319269   8187.468699
     Europe     70.775033  12479.575246
     Oceania    71.910000  16417.333380
1977 Africa     49.580423   2585.938508
     Americas   64.391560   7352.007126
     Asia       59.610556   7791.314020
     Europe     71.937767  14283.979110
     Oceania    72.855000  17283.957605
1982 Africa     51.592865   2481.592960
     Americas   66.228840   7506.737088
     Asia       62.617939   7434.135157
     Europe     72.806400  15617.896551
     Oceania    74.290000  18554.709840
1987 Africa     53.344788   2282.668991
     Americas   68.090720   7793.400261
     Asia       64.851182   7608.226508
     Europe     73.642167  17214.310727
     Oceania    75.320000  20448.040160
1992 Africa     53.629577   2281.810333
     Americas   69.568360   8044.934406
     Asia       66.537212   8639.690248
     Europe     74.440100  17061.568084
     Oceania    76.945000  20894.045885
1997 Africa     53.598269   2378.759555
     Americas   71.150480   8889.300863
     Asia       68.020515   9834.093295
     Europe     75.505167  19076.781802
     Oceania    78.190000  24024.175170
2002 Africa     53.325231   2599.385159
     Americas   72.422040   9287.677107
     Asia       69.233879  10174.090397
     Europe     76.700600  21711.732422
     Oceania    79.740000  26938.778040
2007 Africa     54.806038   3089.032605
     Americas   73.608120  11003.031625
     Asia       70.728485  12473.026870
     Europe     77.648600  25054.481636
     Oceania    80.719500  29810.188275
print(multi_group_var.reset_index())
    year continent    lifeExp     gdpPercap
0   1952    Africa  39.135500   1252.572466
1   1952  Americas  53.279840   4079.062552
2   1952      Asia  46.314394   5195.484004
3   1952    Europe  64.408500   5661.057435
4   1952   Oceania  69.255000  10298.085650
5   1957    Africa  41.266346   1385.236062
6   1957  Americas  55.960280   4616.043733
7   1957      Asia  49.318544   5787.732940
8   1957    Europe  66.703067   6963.012816
9   1957   Oceania  70.295000  11598.522455
10  1962    Africa  43.319442   1598.078825
11  1962  Americas  58.398760   4901.541870
12  1962      Asia  51.563223   5729.369625
13  1962    Europe  68.539233   8365.486814
14  1962   Oceania  71.085000  12696.452430
15  1967    Africa  45.334538   2050.363801
16  1967  Americas  60.410920   5668.253496
17  1967      Asia  54.663640   5971.173374
18  1967    Europe  69.737600  10143.823757
19  1967   Oceania  71.310000  14495.021790
20  1972    Africa  47.450942   2339.615674
21  1972  Americas  62.394920   6491.334139
22  1972      Asia  57.319269   8187.468699
23  1972    Europe  70.775033  12479.575246
24  1972   Oceania  71.910000  16417.333380
25  1977    Africa  49.580423   2585.938508
26  1977  Americas  64.391560   7352.007126
27  1977      Asia  59.610556   7791.314020
28  1977    Europe  71.937767  14283.979110
29  1977   Oceania  72.855000  17283.957605
30  1982    Africa  51.592865   2481.592960
31  1982  Americas  66.228840   7506.737088
32  1982      Asia  62.617939   7434.135157
33  1982    Europe  72.806400  15617.896551
34  1982   Oceania  74.290000  18554.709840
35  1987    Africa  53.344788   2282.668991
36  1987  Americas  68.090720   7793.400261
37  1987      Asia  64.851182   7608.226508
38  1987    Europe  73.642167  17214.310727
39  1987   Oceania  75.320000  20448.040160
40  1992    Africa  53.629577   2281.810333
41  1992  Americas  69.568360   8044.934406
42  1992      Asia  66.537212   8639.690248
43  1992    Europe  74.440100  17061.568084
44  1992   Oceania  76.945000  20894.045885
45  1997    Africa  53.598269   2378.759555
46  1997  Americas  71.150480   8889.300863
47  1997      Asia  68.020515   9834.093295
48  1997    Europe  75.505167  19076.781802
49  1997   Oceania  78.190000  24024.175170
50  2002    Africa  53.325231   2599.385159
51  2002  Americas  72.422040   9287.677107
52  2002      Asia  69.233879  10174.090397
53  2002    Europe  76.700600  21711.732422
54  2002   Oceania  79.740000  26938.778040
55  2007    Africa  54.806038   3089.032605
56  2007  Americas  73.608120  11003.031625
57  2007      Asia  70.728485  12473.026870
58  2007    Europe  77.648600  25054.481636
59  2007   Oceania  80.719500  29810.188275
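
The two printouts differ in how a single group is addressed: with the two-level index a (year, continent) tuple selects it, while after reset_index() the keys are ordinary columns that can be filtered. A hedged sketch on made-up data (`by_year_continent` and `flat` are hypothetical names):

```python
import pandas

small_df = pandas.DataFrame({
    "year": [1952, 1952, 1952, 1957],
    "continent": ["Asia", "Asia", "Europe", "Asia"],
    "lifeExp": [30.0, 40.0, 60.0, 50.0],
})

by_year_continent = small_df.groupby(["year", "continent"])[["lifeExp"]].mean()

# With the two-level index, one group is addressed by a (year, continent) tuple
print(by_year_continent.loc[(1952, "Asia"), "lifeExp"])

# After reset_index(), year and continent are ordinary columns again
flat = by_year_continent.reset_index()
print(flat[flat["continent"] == "Asia"])
```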

2.7 Counting unique values

print(df.groupby('continent')['country'].nunique())
continent
Africa      52
Americas    25
Asia        33
Europe      30
Oceania      2
Name: country, dtype: int64
print(df.groupby('continent')['year'].nunique())
continent
Africa      12
Americas    12
Asia        12
Europe      12
Oceania     12
Name: year, dtype: int64
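
nunique() is used here rather than count() because each country appears on several rows (one per year), so counting rows would overstate the number of countries. The contrast can be seen on a small made-up sample (`small_df` and `grouped` are hypothetical names):

```python
import pandas

small_df = pandas.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Oceania"],
    "country": ["Afghanistan", "Afghanistan", "China", "Australia"],
})

grouped = small_df.groupby("continent")["country"]

print(grouped.nunique().to_dict())   # distinct countries per continent
print(grouped.count().to_dict())     # non-null rows per continent
```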

2.8 Visualizing the statistics

import pandas
import matplotlib.pyplot as plt
df = pandas.read_csv('./data/gapminder.tsv',sep='\t')
print(df.columns)
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
global_yearly_life_expectancy = df.groupby('year')['lifeExp'].mean()
print(global_yearly_life_expectancy)
year
1952    49.057620
1957    51.507401
1962    53.609249
1967    55.678290
1972    57.647386
1977    59.570157
1982    61.533197
1987    63.212613
1992    64.160338
1997    65.014676
2002    65.694923
2007    67.007423
Name: lifeExp, dtype: float64
global_yearly_life_expectancy.plot()
plt.show()

multi_group_var = df.groupby('year')['gdpPercap'].mean()
multi_group_var.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7ff46de9fed0>

# NumPy, pandas, Matplotlib

fig,(ax1,ax2) = plt.subplots(1,2,figsize=(8,4))
ax1.plot(global_yearly_life_expectancy)
ax2.plot(multi_group_var)
plt.show()
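
The side-by-side plot is easier to read with titles and axis labels. The sketch below shows one way to add them; the yearly values are made up, the Agg backend and the output file name are assumptions chosen so the example runs without a display, and none of this is taken from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas

# Made-up yearly values, indexed by year like the groupby results
life_exp = pandas.Series([49.0, 51.5, 53.6], index=[1952, 1957, 1962], name="lifeExp")
gdp = pandas.Series([3700.0, 4300.0, 4700.0], index=[1952, 1957, 1962], name="gdpPercap")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

ax1.plot(life_exp.index, life_exp.values)
ax1.set_title("Mean life expectancy")
ax1.set_xlabel("year")

ax2.plot(gdp.index, gdp.values)
ax2.set_title("Mean GDP per capita")
ax2.set_xlabel("year")

fig.tight_layout()
fig.savefig("yearly_means.png")  # hypothetical output file name
```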

Original article: https://blog.csdn.net/luteresa/article/details/107363812