Pandas Basics 2.1 | Python Study Notes
程序员文章站
2022-05-26 21:40:32
...
import numpy as np
import pandas as pd
df = pd.read_csv('./data/table.csv',index_col='ID')
df
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1101 | 0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
1102 | 1 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
1103 | 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
1104 | 3 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
1105 | 4 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
1201 | 5 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
1202 | 6 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
1203 | 7 | S_1 | C_2 | M | street_6 | 160 | 53 | 58.8 | A+ |
1204 | 8 | S_1 | C_2 | F | street_5 | 162 | 63 | 33.8 | B |
1205 | 9 | S_1 | C_2 | F | street_6 | 167 | 63 | 68.4 | B- |
1301 | 10 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
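I. Single-Level Indexing
- The three most common indexers: iloc (position-based), loc (label-based), and the [] operator
loc (note: every slice used in loc includes its right endpoint)
Single-row indexing: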
df.loc[1103]
Unnamed: 0 2
School S_1
Class C_1
Gender M
Address street_2
Height 186
Weight 82
Math 87.2
Physics B+
Name: 1103, dtype: object
Multi-row indexing:
df.loc[[1103,1104]]
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1103 | 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
1104 | 3 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
df.loc[2402:].head(5)  # every row from 2402 onward
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
2402 | 31 | S_2 | C_4 | M | street_7 | 166 | 82 | 48.7 | B |
2403 | 32 | S_2 | C_4 | F | street_6 | 158 | 60 | 59.7 | B+ |
2404 | 33 | S_2 | C_4 | F | street_2 | 160 | 84 | 67.7 | B |
2405 | 34 | S_2 | C_4 | F | street_6 | 193 | 54 | 47.6 | B |
df.loc[2402:2304:-1].head(5)  # walk backward from 2402; loc keeps the endpoint 2304
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
2402 | 31 | S_2 | C_4 | M | street_7 | 166 | 82 | 48.7 | B |
2401 | 30 | S_2 | C_4 | F | street_2 | 192 | 62 | 45.3 | A |
2305 | 29 | S_2 | C_3 | M | street_4 | 187 | 73 | 48.9 | B |
2304 | 28 | S_2 | C_3 | F | street_6 | 164 | 81 | 95.5 | A- |
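- Note: every slice used in loc includes its right endpoint.
A pandas user rarely cares which label comes right after the last one they want. If slices were closed on the left and open on the right, you would first have to look up the name of the next row or column, which is inconvenient.
Single-column indexing: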
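To make the closed-endpoint rule concrete, here is a minimal sketch. The five-row frame is a hypothetical stand-in built from the first rows of the table above (it does not read the original CSV), and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical five-row subset of the table shown above.
df = pd.DataFrame(
    {"Height": [173, 192, 186, 167, 159], "Weight": [63, 73, 82, 81, 64]},
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

forward = df.loc[1102:1104]      # label slice: both 1102 and 1104 are kept
backward = df.loc[1104:1102:-1]  # reversed label slice: still closed at both ends
positional = df.iloc[1:3]        # position slice: stops before position 3

print(forward.index.tolist())
print(backward.index.tolist())
print(positional.index.tolist())
```

The label slice never needs to know that 1105 follows 1104, which is the convenience the note describes.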
df.loc[:,'Height'].head()
ID
1101 173
1102 192
1103 186
1104 167
1105 159
Name: Height, dtype: int64
Multi-column indexing:
df.loc[1201:2405,['Math','Physics']].head(5)
ID | Math | Physics |
---|---|---|
1201 | 97.0 | A- |
1202 | 63.5 | B- |
1203 | 58.8 | A+ |
1204 | 33.8 | B |
1205 | 68.4 | B- |
df.loc[:,'Gender':'Weight'].head()
ID | Gender | Address | Height | Weight |
---|---|---|---|---|
1101 | M | street_1 | 173 | 63 |
1102 | F | street_2 | 192 | 73 |
1103 | M | street_2 | 186 | 82 |
1104 | F | street_2 | 167 | 81 |
1105 | F | street_4 | 159 | 64 |
Combined row and column indexing:
df.loc[1101:2405:4,'Address':'Math'].head()
ID | Address | Height | Weight | Math |
---|---|---|---|---|
1101 | street_1 | 173 | 63 | 34.0 |
1105 | street_4 | 159 | 64 | 84.8 |
1204 | street_5 | 162 | 63 | 33.8 |
1303 | street_7 | 188 | 82 | 49.7 |
2102 | street_6 | 161 | 61 | 50.6 |
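The combined form can be tried on a small frame as well. This is a sketch on a hypothetical five-row subset of the table (not the original file): the row slice takes every second label between the two endpoints, and the column slice is closed on both sides.

```python
import pandas as pd

# Hypothetical five-row subset of the table shown above.
df = pd.DataFrame(
    {
        "Gender": ["M", "F", "M", "F", "F"],
        "Address": ["street_1", "street_2", "street_2", "street_2", "street_4"],
        "Height": [173, 192, 186, 167, 159],
        "Weight": [63, 73, 82, 81, 64],
        "Math": [34.0, 32.5, 87.2, 80.4, 84.8],
    },
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

# Rows: 1101 to 1105 inclusive, step 2. Columns: 'Address' to 'Weight' inclusive.
sub = df.loc[1101:1105:2, "Address":"Weight"]

print(sub)
```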
Function-based indexing:
- lambda: an anonymous function
g = lambda x: x+1
def g(x): return x+1
The two are equivalent; lambda is simply a shorter way to write a function definition.
df.loc[lambda x: x['Height'] > 170].head()
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1101 | 0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
1102 | 1 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
1103 | 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
1201 | 5 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
1202 | 6 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
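loc accepts a function. The function receives the whole table as input, and its output must be a scalar, a slice, a valid list (every element appears in the index), or a valid index.
def f(x):
    return [1101, 1202]
df.loc[f].head()
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|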
1101 | 0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
1202 | 6 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
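The kinds of return value listed above can each be exercised in turn. The sketch below uses a hypothetical five-row frame and illustrative variable names; it assumes nothing beyond loc calling the function with the frame itself.

```python
import pandas as pd

# Hypothetical five-row subset of the table shown above.
df = pd.DataFrame(
    {"Height": [173, 192, 186, 167, 159], "Math": [34.0, 32.5, 87.2, 80.4, 84.8]},
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

tall = df.loc[lambda x: x["Height"] > 170]    # function returns a boolean Series
picked = df.loc[lambda x: [1101, 1104]]       # function returns a list of labels
ranged = df.loc[lambda x: slice(1102, 1103)]  # function returns a label slice
first = df.loc[lambda x: x.index[0]]          # function returns a scalar label

print(tall.index.tolist())
print(picked.index.tolist())
print(ranged.index.tolist())
print(first["Height"])
```

A function is mostly useful in method chains, where the intermediate table has no name to refer to.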
Boolean indexing:
df_1 = df['Gender'].isin(['M'])
df_1.head()
ID
1101 True
1102 False
1103 True
1104 False
1105 False
Name: Gender, dtype: bool
df.loc[df_1].head()
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1101 | 0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
1103 | 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
1201 | 5 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
1203 | 7 | S_1 | C_2 | M | street_6 | 160 | 53 | 58.8 | A+ |
1301 | 10 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
df_2 = [i[-1] in ('4', '7') for i in df['Address'].values]
# df_2 is a plain Python list of booleans
df_2
[False,
False,
False,
False,
True,
...]
df.loc[df_2].head()
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1105 | 4 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
1202 | 6 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
1301 | 10 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
1303 | 12 | S_1 | C_3 | M | street_7 | 188 | 82 | 49.7 | B |
2101 | 15 | S_2 | C_1 | M | street_7 | 174 | 84 | 83.3 | C |
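Among lists, only boolean lists and lists whose elements all belong to the index can be passed to loc.
iloc (slices exclude the right endpoint)
Single-row indexing: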
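A short sketch of both accepted list forms, plus what happens when a label is not in the index. The frame is a hypothetical five-row subset, and the KeyError for a missing label assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical five-row subset of the table shown above.
df = pd.DataFrame(
    {"Address": ["street_1", "street_2", "street_2", "street_2", "street_4"]},
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

mask = [a.endswith(("4", "7")) for a in df["Address"]]  # boolean list, one flag per row
by_mask = df.loc[mask]
by_labels = df.loc[[1101, 1103]]                        # every label is in the index

try:
    df.loc[[1101, 9999]]  # 9999 is not an index label
except KeyError:
    missing_raises = True
else:
    missing_raises = False

print(by_mask.index.tolist(), by_labels.index.tolist(), missing_raises)
```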
df.iloc[-1]
Unnamed: 0 34
School S_2
Class C_4
Gender F
Address street_6
Height 193
Weight 54
Math 47.6
Physics B
Name: 2405, dtype: object
Multi-row indexing:
df.iloc[0:10:2]
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1101 | 0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
1103 | 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
1105 | 4 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
1202 | 6 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
1204 | 8 | S_1 | C_2 | F | street_5 | 162 | 63 | 33.8 | B |
Single-column indexing:
df.iloc[:,-1].head()
ID
1101 A+
1102 B+
1103 B+
1104 B-
1105 B+
Name: Physics, dtype: object
Multi-column indexing:
df.iloc[:,-1::-2].head()
ID | Physics | Weight | Address | Class | Unnamed: 0 |
---|---|---|---|---|---|
1101 | A+ | 63 | street_1 | C_1 | 0 |
1102 | B+ | 73 | street_2 | C_1 | 1 |
1103 | B+ | 82 | street_2 | C_1 | 2 |
1104 | B- | 81 | street_2 | C_1 | 3 |
1105 | B+ | 64 | street_4 | C_1 | 4 |
Mixed row and column indexing:
df.iloc[3::4,-1::-3].head()
ID | Physics | Height | Class |
---|---|---|---|
1104 | B- | 167 | C_1 |
1203 | A+ | 160 | C_2 |
1302 | A- | 175 | C_3 |
2101 | C | 174 | C_1 |
2105 | A | 170 | C_1 |
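The stepped and reversed position slices follow ordinary Python slice rules. Here is a minimal sketch on a hypothetical five-row, five-column frame, so the positions are easy to count by hand.

```python
import pandas as pd

# Hypothetical five-row subset of the table shown above.
df = pd.DataFrame(
    {
        "Gender": ["M", "F", "M", "F", "F"],
        "Address": ["street_1", "street_2", "street_2", "street_2", "street_4"],
        "Height": [173, 192, 186, 167, 159],
        "Weight": [63, 73, 82, 81, 64],
        "Math": [34.0, 32.5, 87.2, 80.4, 84.8],
    },
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

rows = df.iloc[0:4:2]          # positions 0 and 2; position 4 is excluded
mixed = df.iloc[1::2, -1::-2]  # rows at positions 1, 3; columns at positions 4, 2, 0

print(rows.index.tolist())
print(mixed)
```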
Function-based indexing:
df.iloc[lambda x:[-3],-1::-2].head()
ID | Physics | Weight | Address | Class | Unnamed: 0 |
---|---|---|---|---|---|
2403 | B+ | 60 | street_6 | C_4 | 32 |
iloc accepts only integers, integer slices, integer lists, or boolean lists. It does not accept a boolean Series; to use one, extract its values first.
df_3 = (df['Address']=='street_2').values
df_3
array([False, True, True, True, False, False, False, False, False,
False, False, False, False, True, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, True, False, False, True, False])
df.iloc[df_3].head()
ID | Unnamed: 0 | School | Class | Gender | Address | Height | Weight | Math | Physics |
---|---|---|---|---|---|---|---|---|---|
1102 | 1 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
1103 | 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
1104 | 3 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
1304 | 13 | S_1 | C_3 | M | street_2 | 195 | 70 | 85.2 | A |
2401 | 30 | S_2 | C_4 | F | street_2 | 192 | 62 | 45.3 | A |
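The restriction on boolean Series can be checked directly. This sketch uses a hypothetical five-row frame; the exact exception type raised by iloc is treated as an assumption (NotImplementedError or ValueError, depending on the index), so both are caught.

```python
import pandas as pd

# Hypothetical five-row subset of the table shown above.
df = pd.DataFrame(
    {"Address": ["street_1", "street_2", "street_2", "street_2", "street_4"]},
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

mask_series = df["Address"] == "street_2"  # boolean Series carrying the ID index

try:
    df.iloc[mask_series]  # iloc rejects a mask that carries its own index
except (NotImplementedError, ValueError):
    series_rejected = True
else:
    series_rejected = False

by_array = df.iloc[mask_series.values]  # plain NumPy boolean array: accepted
by_loc = df.loc[mask_series]            # loc takes the Series as-is

print(series_rejected, by_array.index.tolist())
```

When the mask is already a Series, loc is usually the simpler choice, since it aligns the mask by label.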
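The [] operator
[] on a Series
Single-element indexing:
# df['*'] is a Series, so passing it as data brings its own index along; if another index is also passed, pandas aligns on the index given explicitly and every label that does not match becomes NaN
# pass df['*'].tolist() or df['*'].values instead, so that only the Math values are handed over, without their original index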
s = pd.Series(df['Math'].values,index = df['Address'])
s['street_2']
street_2 32.5
street_2 87.2
street_2 80.4
street_2 85.2
street_2 45.3
street_2 67.7
dtype: float64
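The alignment behavior described in the comments can be seen on a three-row example. The names `math` and `address` are hypothetical stand-ins for `df['Math']` and `df['Address']`.

```python
import pandas as pd

# Hypothetical stand-ins for df['Math'] and df['Address'].
ids = pd.Index([1101, 1102, 1103], name="ID")
math = pd.Series([34.0, 32.5, 87.2], index=ids)
address = pd.Series(["street_1", "street_2", "street_2"], index=ids)

# Passing the Series itself: pandas looks up the address labels in math's
# ID index, finds none of them, and fills every slot with NaN.
aligned = pd.Series(math, index=address)

# Passing only the values: they are placed positionally under the new index.
by_values = pd.Series(math.values, index=address)

print(aligned)
print(by_values["street_2"])
```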
m = pd.Series(df['Math'],index=df.index)
m[2105]
34.2
m[0:4]
ID
1101 34.0
1102 32.5
1103 87.2
1104 80.4
Name: Math, dtype: float64
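Function-based indexing:
# lambda x: x.index[16::-6] slices the index by absolute position and returns the labels found there
# lambda x: 16::-6 would slice the elements directly (as written this is not valid Python; slice(16, None, -6) is the valid spelling)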
m[lambda x: x.index[16::-6]]
ID
2102 50.6
1301 31.5
1105 84.8
Name: Math, dtype: float64
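A smaller version of the same idea, as a sketch on a hypothetical five-element Series: the function picks labels by slicing the index, and the result matches a plain position slice taken with iloc.

```python
import pandas as pd

# Hypothetical stand-in for the Math column, indexed by ID.
m = pd.Series(
    [34.0, 32.5, 87.2, 80.4, 84.8],
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
    name="Math",
)

by_func = m[lambda x: x.index[4::-2]]  # labels at positions 4, 2, 0
by_pos = m.iloc[4::-2]                 # the same rows, selected by position

print(by_func)
```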
Boolean indexing:
m>80
ID
1101 False
1102 False
1103 True
1104 True
1105 True
…
Name: Math, dtype: bool
m[m>80]
ID
1103 87.2
1104 80.4
1105 84.8
1201 97.0
1302 87.7
1304 85.2
2101 83.3
2205 85.4
2304 95.5
Name: Math, dtype: float64
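Boolean masks can also be combined before indexing. This is a minimal sketch on a hypothetical five-element Series; the `&` operator and the parentheses around each comparison are standard pandas usage, not something taken from the original notes.

```python
import pandas as pd

# Hypothetical stand-in for the Math column, indexed by ID.
m = pd.Series(
    [34.0, 32.5, 87.2, 80.4, 84.8],
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
    name="Math",
)

high = m[m > 80]               # one condition
band = m[(m > 80) & (m < 85)]  # two conditions; each needs its own parentheses

print(high.index.tolist())
print(band.index.tolist())
```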
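Note: on a Series with a float index, slicing with [] compares label values rather than positions (as the output below shows for the pandas version used here), so avoid the [] operator when the row index is float. Using .loc or .iloc states the intent explicitly.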
s_int = pd.Series([1,2,3,4],index = [1,3,5,6])
s_float = pd.Series([1,2,3,4],index=[1.,3.,5.,6.])
s_int
1 1
3 2
5 3
6 4
dtype: int64
s_float[2:]  # 2 is treated as a label value
3.0 2
5.0 3
6.0 4
dtype: int64
s_int[2:]  # 2 is treated as a position
5 3
6 4
dtype: int64
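Because the meaning of a bare [] slice depends on the index type, one way to sidestep the ambiguity is to spell out the intent with loc (labels) or iloc (positions). The sketch below reuses the two Series from above; the result names are illustrative.

```python
import pandas as pd

s_int = pd.Series([1, 2, 3, 4], index=[1, 3, 5, 6])
s_float = pd.Series([1, 2, 3, 4], index=[1.0, 3.0, 5.0, 6.0])

# loc always compares label values: keep every label >= 2.
by_label_int = s_int.loc[2:]
by_label_float = s_float.loc[2:]

# iloc always counts positions: skip the first two elements.
by_pos_int = s_int.iloc[2:]
by_pos_float = s_float.iloc[2:]

print(by_label_int.index.tolist(), by_pos_int.index.tolist())
print(by_label_float.index.tolist(), by_pos_float.index.tolist())
```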