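Python Data Analysis: pandas Basics
Hi~ I'm still learning data analysis myself, and these posts record the points I've summarized along the way. There are bound to be gaps, so if you spot a problem, message me and I'll fix it; we can learn and improve together. I hope everyone keeps their enthusiasm for learning, sticks with it, and keeps outdoing themselves!
Blog: qxi's blog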
You may want to review the earlier posts first:
pandas Basics (1)
pandas Basics (2)
pandas Basics (3)
pandas Basics (4)
pandas Basics (5)
pandas Basics (6)
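#This post continues from the previous one on combining DataFrames, this time with the merge() function#
- The merge() function
① The on='key' parameter of merge() specifies which column the two DataFrames are merged on; key here is a column label. We start with a merge on a single key.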
import pandas as pd
import numpy as np
left=pd.DataFrame({'key':['K0','K1','K2','K3'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key':['K0','K1','K2','K3'],'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']})
print(left)
print(right)
res=pd.merge(left,right,on='key') # merge on the 'key' column
print(res)
Output:
  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
3  K3  A3  B3
  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
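As a small side sketch (not part of the original example), the same merge can also be written in method form, `left.merge(right, on='key')`. The names `res_func` and `res_method` are made up here for illustration; the data is the same as above.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Function form, as used in the example above
res_func = pd.merge(left, right, on='key')

# Method form: the DataFrame calling merge() acts as the left table
res_method = left.merge(right, on='key')

# Both forms should give the same result
print(res_func.equals(res_method))
print(list(res_method.columns))
```

Which form to use is mostly a matter of taste; the method form chains more easily with other DataFrame operations.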
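② The how='' parameter of merge()
With how='inner', only the rows whose (key1, key2) combination appears in both DataFrames are merged, much like a set intersection. This is the default when how is not given. In the example, the shared combinations are key1=K0, key2=K0 and key1=K1, key2=K0.
With how='outer', every (key1, key2) combination from both DataFrames is kept, much like a set union; see the example.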
import pandas as pd
import numpy as np
df1=pd.DataFrame({'key1':['K0','K0','K1','K2'],'key2':['K0','K1','K0','K1'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
df2=pd.DataFrame({'key1':['K0','K1','K1','K2'],'key2':['K0','K0','K0','K0'],'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']})
print(df1)
print(df2)
res1=pd.merge(df1,df2,on=['key1','key2'],how='inner') # inner is the default
print(res1)
res2=pd.merge(df1,df2,on=['key1','key2'],how='outer') # outer: union of the keys
print(res2)
Output:
  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3
  key1 key2   A   B   C   D
0   K0   K0  A0  B0  C0  D0
1   K1   K0  A2  B2  C1  D1
2   K1   K0  A2  B2  C2  D2   # inner: only the matching key combinations are merged
  key1 key2    A    B    C    D
0   K0   K0   A0   B0   C0   D0
1   K0   K1   A1   B1  NaN  NaN
2   K1   K0   A2   B2   C1   D1
3   K1   K0   A2   B2   C2   D2
4   K2   K1   A3   B3  NaN  NaN
5   K2   K0  NaN  NaN   C3   D3   # outer: every row is merged; missing values are filled with NaN
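To make the "intersection vs. union" picture concrete, here is a rough sketch that compares the key combinations in each result against plain Python set operations. The helper `key_pairs` is a hypothetical name introduced only for this check; the data is the same as above.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
keys = ['key1', 'key2']


def key_pairs(df):
    """Return the set of (key1, key2) combinations found in df (illustrative helper)."""
    return set(df[keys].itertuples(index=False, name=None))


inner = pd.merge(df1, df2, on=keys, how='inner')
outer = pd.merge(df1, df2, on=keys, how='outer')

# inner keeps the intersection of the key combinations, outer keeps the union
print(key_pairs(inner) == key_pairs(df1) & key_pairs(df2))
print(key_pairs(outer) == key_pairs(df1) | key_pairs(df2))

# Row counts can exceed the number of distinct keys because (K1, K0)
# appears twice in df2, so the matching df1 row is repeated
print(len(inner), len(outer))
```

The set analogy applies to the key combinations, not to the row count: duplicated keys on either side multiply the matching rows.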
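③ how='left' merges based on the left table df1: every row of df1 is kept, and df2 contributes only the rows whose keys match. For example, row 3 of df2 is not shown here, because key1=K2, key2=K0 does not occur in df1.
how='right' does the same thing based on the right table df2.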
print(df1)
print(df2)
res1=pd.merge(df1,df2,on=['key1','key2'],how='left')
print(res1)
res2=pd.merge(df1,df2,on=['key1','key2'],how='right')
print(res2)
Output:
  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3   # df1
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3   # df2
  key1 key2   A   B    C    D
0   K0   K0  A0  B0   C0   D0
1   K0   K1  A1  B1  NaN  NaN
2   K1   K0  A2  B2   C1   D1
3   K1   K0  A2  B2   C2   D2
4   K2   K1  A3  B3  NaN  NaN   # merged based on df1
  key1 key2    A    B   C   D
0   K0   K0   A0   B0  C0  D0
1   K1   K0   A2   B2  C1  D1
2   K1   K0   A2   B2  C2  D2
3   K2   K0  NaN  NaN  C3  D3   # merged based on df2
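One way to think about how='right' is as a how='left' merge with the two tables swapped. The sketch below checks that on the same data; `normalize` is a hypothetical helper that only puts columns and rows in a common order so the two results can be compared.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
keys = ['key1', 'key2']

# Right join: keep every row of df2
right_join = pd.merge(df1, df2, on=keys, how='right')

# Same idea with the tables swapped: df2 is now the left table
swapped_left = pd.merge(df2, df1, on=keys, how='left')


def normalize(df):
    """Put columns and rows in a fixed order for comparison (illustrative helper)."""
    cols = ['key1', 'key2', 'A', 'B', 'C', 'D']
    return df[cols].sort_values(['key1', 'key2', 'C']).reset_index(drop=True)


# Apart from column order, the two results contain the same rows
print(normalize(right_join).equals(normalize(swapped_left)))
```

The only visible difference is the column order: columns of whichever table is passed first come first.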
④ indicator=True in merge() adds a _merge column that shows how each row was merged: left_only, right_only, or both.
import pandas as pd
import numpy as np
df1=pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df2=pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
print(df1)
print(df2)
res1=pd.merge(df1,df2,on='col1',how='outer',indicator=True) # shows which df each row is missing from
print(res1)
Output:
   col1 col_left
0     0        a
1     1        b
   col1  col_right
0     1          2
1     2          2
2     2          2
   col1 col_left  col_right      _merge
0     0        a        NaN   left_only   # present in left, missing from right
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only
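A common use of the _merge column is to filter rows by where they came from. The sketch below, using the same data, picks out the rows that exist only in the left table, and also shows that indicator accepts a string to use as the column name. The names `merged`, `left_only`, `named` and the column name 'source' are chosen here just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})

merged = pd.merge(df1, df2, on='col1', how='outer', indicator=True)

# Keep only rows whose key was found in df1 but not in df2
left_only = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(left_only)

# Count how many rows fall into each category
print(merged['_merge'].value_counts())

# Passing a string instead of True names the indicator column
named = pd.merge(df1, df2, on='col1', how='outer', indicator='source')
print(list(named.columns))
```

Filtering on left_only like this is one way to find records in one table that have no match in the other.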
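⑤ left_index and right_index merge on the row index instead of a column. When both are True, the two DataFrames are merged on their row indexes; adding how='outer' gives the union of the index labels, and how='inner' gives the intersection.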
import pandas as pd
import numpy as np
left=pd.DataFrame({'A':['A0','A1','A2'],'B':['B0','B1','B2']},index=['K0','K1','K2'])
right=pd.DataFrame({'C':['C0','C2','C3'],'D':['D0','D2','D3']},index=['K0','K2','K3'])
print(left)
print(right)
res1=pd.merge(left,right,left_index=True,right_index=True,how='outer')
print(res1)
res2=pd.merge(left,right,left_index=True,right_index=True,how='inner')
print(res2)
Output:
     A   B
K0  A0  B0
K1  A1  B1
K2  A2  B2
     C   D
K0  C0  D0
K2  C2  D2
K3  C3  D3
      A    B    C    D
K0   A0   B0   C0   D0
K1   A1   B1  NaN  NaN
K2   A2   B2   C2   D2
K3  NaN  NaN   C3   D3   # outer: every index label is shown (union)
     A   B   C   D
K0  A0  B0  C0  D0
K2  A2  B2  C2  D2   # inner: only the index labels both have, K0 and K2
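For index-on-index merges like this one, pandas also offers DataFrame.join(), which joins on the index by default. The sketch below compares it with the merge() call above on the same data; the names `merged` and `joined` are just for illustration, and both results are sorted by index before comparing so that row order does not matter.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], 'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

# Index-on-index merge, as in the example above
merged = pd.merge(left, right, left_index=True, right_index=True, how='outer')

# join() uses the row index of both tables by default
joined = left.join(right, how='outer')

# Sort by index first so the comparison ignores row order
print(merged.sort_index().equals(joined.sort_index()))
```

Note that join() defaults to how='left', while merge() defaults to how='inner', so it is safer to pass how explicitly when switching between them.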
⑥ The suffixes parameter of merge() names the columns that have the same label in both DataFrames; see the example.
import pandas as pd
import numpy as np
boys=pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})
girls=pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})
print(boys)
print(girls)
res=pd.merge(boys,girls,on='k',how="inner")
print(res)
res=pd.merge(boys,girls,on='k',suffixes=['_boy','_girls'],how="inner")
print(res)
Output:
    k  age
0  K0    1
1  K1    2
2  K2    3
    k  age
0  K0    4
1  K0    5
2  K3    6
    k  age_x  age_y
0  K0      1      4
1  K0      1      5   # named age_x and age_y automatically
    k  age_boy  age_girls
0  K0        1          4
1  K0        1          5   # custom names
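Suffixes are applied only to non-key columns whose names clash; other columns keep their names. The sketch below illustrates this by adding a 'height' column to girls. That column and its values are made up for this illustration and are not in the original example.

```python
import pandas as pd

boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'age': [1, 2, 3]})
# 'height' is a hypothetical extra column that exists only in girls
girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'],
                      'age': [4, 5, 6],
                      'height': [100, 105, 110]})

res = pd.merge(boys, girls, on='k', suffixes=('_boy', '_girl'), how='inner')

# 'age' clashes, so it gets a suffix on each side;
# 'height' has no clash and 'k' is the merge key, so both stay as they are
print(list(res.columns))
print(res)
```

Here suffixes is passed as a tuple, which matches the documented default of ('_x', '_y').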
That wraps up merging DataFrames; this covers most of it. If it helped you, remember to like, bookmark, and follow~
Article URL: https://blog.csdn.net/hswqxi/article/details/107430832