Python data analysis: pandas module basics

程序员文章站 2022-11-05 20:13:34
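Hi~ I'm learning data analysis, and these notes summarize what I picked up along the way, so there are bound to be gaps. If you spot a mistake, message me and I'll fix it so we can learn together. I hope everyone keeps their enthusiasm for learning, sticks with it, and keeps improving!
Blog: qxi's blog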

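You may want to review the earlier posts first:
pandas basics (1)
pandas basics (2)
pandas basics (3)
pandas basics (4)
pandas basics (5)
pandas basics (6)
This post continues the previous one on merging DataFrames, this time using the merge() function.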

  1. The merge() function

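① on='key' in merge() names the column whose values are used to match the rows of the two DataFrames (key is a column label, not the row index). First, a merge on a single key column: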

import pandas as pd
import numpy as np
left=pd.DataFrame({'key':['K0','K1','K2','K3'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key':['K0','K1','K2','K3'],'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']})
print(left)
print(right)
res=pd.merge(left,right,on='key') # merge on the 'key' column
print(res)

Output:

  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
3  K3  A3  B3
  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
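When on is omitted, merge() falls back to the columns that both DataFrames share. For the left and right frames above, key is the only common column, so the sketch below (rebuilding the same data) should produce the same result. Relying on the default is convenient, but naming the key explicitly is usually clearer.

```python
import pandas as pd

# Same data as in the example above
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

explicit = pd.merge(left, right, on='key')
# With on omitted, pandas joins on the intersection of the column names: here only 'key'
implicit = pd.merge(left, right)

print(implicit.equals(explicit))  # True
```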

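② how='...' in merge()
how='inner' keeps only the rows whose (key1, key2) combination appears in both DataFrames, similar to taking an intersection; inner is the default when how is not given. In the example the shared combinations are key1=K0, key2=K0 and key1=K1, key2=K0.
how='outer' keeps every (key1, key2) combination from both DataFrames, similar to taking a union; see the example.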

import pandas as pd
import numpy as np
df1=pd.DataFrame({'key1':['K0','K0','K1','K2'],'key2':['K0','K1','K0','K1'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
df2=pd.DataFrame({'key1':['K0','K1','K1','K2'],'key2':['K0','K0','K0','K0'],'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']})
print(df1)
print(df2)
res1=pd.merge(df1,df2,on=['key1','key2'],how='inner') # inner is the default
print(res1)
res2=pd.merge(df1,df2,on=['key1','key2'],how='outer') # outer: union of the key combinations
print(res2)

Output:

  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3
  key1 key2   A   B   C   D
0   K0   K0  A0  B0  C0  D0
1   K1   K0  A2  B2  C1  D1
2   K1   K0  A2  B2  C2  D2  # only key combinations present in both
  key1 key2    A    B    C    D
0   K0   K0   A0   B0   C0   D0
1   K0   K1   A1   B1  NaN  NaN
2   K1   K0   A2   B2   C1   D1
3   K1   K0   A2   B2   C2   D2
4   K2   K1   A3   B3  NaN  NaN
5   K2   K0  NaN  NaN   C3   D3  # all key combinations kept; missing values filled with NaN
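The "intersection/union" description refers to the combinations of join keys, not to whole rows. The sketch below rebuilds df1 and df2 from the example and checks this with plain Python sets; key_pairs is a hypothetical helper introduced only for this illustration. It also shows why the inner result has 3 rows although only two key pairs match: (K1, K0) occurs twice in df2, so the single matching df1 row is repeated once per match. merge()'s validate argument can catch such duplicates when they are not expected.

```python
import pandas as pd

# Same data as in the example above
df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], 'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']})

def key_pairs(df):
    """Hypothetical helper: set of (key1, key2) combinations appearing in df."""
    return set(zip(df['key1'], df['key2']))

inner = pd.merge(df1, df2, on=['key1', 'key2'], how='inner')
outer = pd.merge(df1, df2, on=['key1', 'key2'], how='outer')

# Key pairs of the result: intersection for inner, union for outer
print(key_pairs(inner) == (key_pairs(df1) & key_pairs(df2)))  # True
print(key_pairs(outer) == (key_pairs(df1) | key_pairs(df2)))  # True

# 2 matching key pairs, but 3 rows, because ('K1', 'K0') appears twice in df2
print(len(key_pairs(inner)), len(inner))  # 2 3

# validate='one_to_one' requires unique keys on both sides, so it raises here
try:
    pd.merge(df1, df2, on=['key1', 'key2'], validate='one_to_one')
except pd.errors.MergeError as exc:
    print('MergeError:', exc)
```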

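③ how='left' merges based on the left DataFrame, df1: every row of df1 appears, and df2 contributes only the rows whose keys match. For example, row 3 of df2 (key1=K2, key2=K0) does not appear, because that key combination does not exist in df1. how='right' is the mirror image and merges based on the right DataFrame, df2.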

# df1 and df2 are the DataFrames defined in the previous example
print(df1)
print(df2)
res1=pd.merge(df1,df2,on=['key1','key2'],how='left')
print(res1)
res2=pd.merge(df1,df2,on=['key1','key2'],how='right')
print(res2)

Output:

  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3  #df1
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3  #df2
  key1 key2   A   B    C    D
0   K0   K0  A0  B0   C0   D0
1   K0   K1  A1  B1  NaN  NaN
2   K1   K0  A2  B2   C1   D1
3   K1   K0  A2  B2   C2   D2
4   K2   K1  A3  B3  NaN  NaN  # left join: based on df1
  key1 key2    A    B   C   D
0   K0   K0   A0   B0  C0  D0
1   K1   K0   A2   B2  C1  D1
2   K1   K0   A2   B2  C2  D2
3   K2   K0  NaN  NaN  C3  D3  # right join: based on df2
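A right merge is the mirror image of a left merge: swapping the two DataFrames and using how='left' should give the same rows, just with the columns in a different order. The sketch below checks this for df1 and df2 from the example using pandas.testing.assert_frame_equal; it illustrates the relationship on this data rather than proving it for every input.

```python
import pandas as pd

# Same data as in the example above
df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], 'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']})

right_join = pd.merge(df1, df2, on=['key1', 'key2'], how='right')
# Same join with the frames swapped; reorder the columns to match right_join
swapped = pd.merge(df2, df1, on=['key1', 'key2'], how='left')[right_join.columns]

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(right_join, swapped)

# A left join keeps every row of df1; unmatched rows get NaN in df2's columns
left_join = pd.merge(df1, df2, on=['key1', 'key2'], how='left')
print(left_join['C'].isna().sum())  # 2
```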

④ indicator=True in merge() adds a _merge column showing where each row of the result comes from (left_only, right_only or both).

import pandas as pd
import numpy as np
df1=pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df2=pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
print(df1)
print(df2)
res1=pd.merge(df1,df2,on='col1',how='outer',indicator=True) # _merge shows which DataFrame each row comes from
print(res1)

Output:

   col1 col_left
0     0        a
1     1        b
   col1  col_right
0     1          2
1     2          2
2     2          2
   col1 col_left  col_right      _merge
0     0        a        NaN   left_only # present in left, missing from right
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only
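A common use of the indicator column is an "anti-join": keeping only the rows of one DataFrame that have no match in the other. A minimal sketch with df1 and df2 from this example; indicator can also be given a string, which becomes the name of the indicator column (the name 'source' below is chosen only for illustration).

```python
import pandas as pd

# Same data as in the example above
df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})

res = pd.merge(df1, df2, on='col1', how='outer', indicator=True)
# Anti-join: rows of df1 whose col1 value does not occur in df2
left_only = res.loc[res['_merge'] == 'left_only', ['col1', 'col_left']]
print(left_only)

# A string instead of True names the indicator column
res2 = pd.merge(df1, df2, on='col1', how='left', indicator='source')
print(res2[['col1', 'source']])
```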

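⑤ left_index and right_index merge on the row index instead of a column. When both are True, the two DataFrames are joined on their row indexes; with how='outer' the result contains the union of the index labels, with how='inner' their intersection.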

import pandas as pd
import numpy as np
left=pd.DataFrame({'A':['A0','A1','A2'],'B':['B0','B1','B2']},index=['K0','K1','K2'])
right=pd.DataFrame({'C':['C0','C2','C3'],'D':['D0','D2','D3']},index=['K0','K2','K3'])
print(left)
print(right)
res1=pd.merge(left,right,left_index=True,right_index=True,how='outer') 
print(res1)
res2=pd.merge(left,right,left_index=True,right_index=True,how='inner') 
print(res2)

Output:

   A   B
K0  A0  B0
K1  A1  B1
K2  A2  B2
     C   D
K0  C0  D0
K2  C2  D2
K3  C3  D3
      A    B    C    D
K0   A0   B0   C0   D0
K1   A1   B1  NaN  NaN
K2   A2   B2   C2   D2
K3  NaN  NaN   C3   D3  # all index labels kept: union
     A   B   C   D
K0  A0  B0  C0  D0
K2  A2  B2  C2  D2  # only labels present in both (K0, K2): intersection
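Merging on both indexes is also what DataFrame.join does (join matches on the index by default, with how='left' unless told otherwise). The sketch below compares the two on left and right from this example; it is a check on this data, not a statement that the two methods are interchangeable in every option.

```python
import pandas as pd

# Same data as in the example above
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], 'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

for how in ['outer', 'inner']:
    via_merge = pd.merge(left, right, left_index=True, right_index=True, how=how)
    via_join = left.join(right, how=how)
    # Raises AssertionError if the two results differ
    pd.testing.assert_frame_equal(via_merge, via_join)
    print(how, list(via_join.index))
```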

⑥ suffixes in merge() sets the names of columns that appear in both DataFrames (other than the join key); see the example:

import pandas as pd
import numpy as np
boys=pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})
girls=pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})
print(boys)
print(girls)
res=pd.merge(boys,girls,on='k',how="inner")
print(res)
res=pd.merge(boys,girls,on='k',suffixes=['_boy','_girls'],how="inner")
print(res)

Output:

    k  age
0  K0    1
1  K1    2
2  K2    3
    k  age
0  K0    4
1  K0    5
2  K3    6
    k  age_x  age_y
0  K0      1      4
1  K0      1      5      # named age_x, age_y automatically
    k  age_boy  age_girls
0  K0        1          4
1  K0        1          5    # custom names
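Suffixes are only applied to overlapping columns that are not join keys: here age gets them, while the key k keeps its name. If one side should keep its original column name, an empty string can be passed as that side's suffix. A sketch on boys and girls from the example:

```python
import pandas as pd

# Same data as in the example above
boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'age': [1, 2, 3]})
girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'age': [4, 5, 6]})

# '' leaves the left 'age' unchanged; only the right 'age' gets a suffix
res = pd.merge(boys, girls, on='k', suffixes=('', '_girls'))
print(list(res.columns))  # ['k', 'age', 'age_girls']
```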

That wraps up merging DataFrames. If this helped you, remember to like, bookmark and follow~

Original article: https://blog.csdn.net/hswqxi/article/details/107430832
