Reshaping and Pivoting in pandas

程序员文章站 2024-02-28 19:12:10

Reshaping and pivoting are the basic operations for rearranging tabular data.

For a DataFrame, the two main methods are:

(1) stack: "pivots" the columns of the data into rows
(2) unstack: "pivots" the rows of the data into columns

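Example 1 (both the row index and the column index hold strings):

import numpy as np
import pandas as pd
from pandas import DataFrame, Series

data = DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['O','C'],name='state'),columns=pd.Index(['one','two','three'],name='number'))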
data
Out[3]: 
number  one  two  three
state                  
O         0    1      2
C         3    4      5

result=data.stack()     #calling stack on this data pivots the columns into rows and yields a Series
result
Out[5]: 
state  number
O      one       0
       two       1
       three     2
C      one       3
       two       4
       three     5
dtype: int32

result.unstack()       #a Series with a hierarchical index can be rearranged back into a DataFrame with unstack
Out[6]: 
number  one  two  three
state                  
O         0    1      2
C         3    4      5

result.unstack(0)      #by default the innermost level is unstacked (the same holds for stack); pass a level number or name to unstack a different level
Out[7]: 
state   O  C
number      
one     0  3
two     1  4
three   2  5

result.unstack('state')
Out[8]: 
state   O  C
number      
one     0  3
two     1  4
three   2  5
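
The level selection shown above can be checked programmatically. The following is a minimal sketch that rebuilds the same `data` frame; the variable names `by_number` and `by_state` are introduced here for illustration only and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Rebuild the small frame from Example 1.
data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(["O", "C"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

# stack moves the column labels into an inner index level.
result = data.stack()
print(list(result.index.names))

# With no argument, unstack acts on the innermost level ("number").
by_number = result.unstack()

# A level can be selected by position or by name; both give the same frame.
by_state = result.unstack(0)
same = by_state.equals(result.unstack("state"))
print(same)
print(by_state)
```

Selecting the level by name is usually easier to read than selecting it by position, since the name does not change if levels are reordered.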

(3) If some level values are not present in every subgroup, unstack may introduce missing data

s1 = Series([0,1,2,3],index=['a','b','c','d'])
s2 = Series([4,5,6],index=['c','d','e'])
data2 = pd.concat([s1,s2],keys=['one','two'])
data2.unstack()
Out[9]: 
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

data2.unstack().stack()  #in the pandas version used here, stack drops missing data by default, so the operation is reversible
Out[10]: 
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64

data2.unstack().stack(dropna=False)  #dropna=False keeps the missing values
Out[11]: 
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
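
Whether `stack` drops missing values by default depends on the pandas version: starting with pandas 2.1 a reworked implementation is available through `future_stack=True`, and the `dropna` argument is deprecated along with the old one. The sketch below therefore avoids `dropna` and calls `Series.dropna()` explicitly, which should behave the same under either implementation (older versions may only emit a deprecation warning). The names `wide`, `n_missing`, and `restored` are illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=["a", "b", "c", "d"])
s2 = pd.Series([4, 5, 6], index=["c", "d", "e"])
data2 = pd.concat([s1, s2], keys=["one", "two"])

# unstack has to fill the label combinations that do not exist with NaN,
# which also upcasts the integer values to float.
wide = data2.unstack()
n_missing = int(wide.isna().sum().sum())
print(n_missing)

# Dropping the NaN entries explicitly does not rely on stack's default.
restored = wide.stack().dropna()
print(restored)
```

Note that the round trip restores the labels and values but not the original integer dtype, because the intermediate frame held NaN.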

(4) When a DataFrame is unstacked, the level being unstacked becomes the lowest level of the column index in the result:

df = DataFrame({'left':result,'right':result+5},columns=pd.Index(['left','right'],name='side'))
df   
Out[13]: 
side          left  right
state number             
O     one        0      5
      two        1      6
      three      2      7
C     one        3      8
      two        4      9
      three      5     10

df.unstack('state')
Out[14]: 
side   left    right    
state     O  C     O   C
number                  
one       0  3     5   8
two       1  4     6   9
three     2  5     7  10

df.unstack('state').stack('side')
Out[15]: 
state          C  O
number side        
one    left    3  0
       right   8  5
two    left    4  1
       right   9  6
three  left    5  2
       right  10  7
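
The placement of the pivoted level can be confirmed by inspecting the axis names rather than reading the printed tables. This is a minimal sketch that rebuilds `df` from scratch; `wide` and `long_form` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(["O", "C"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)
result = data.stack()
df = pd.DataFrame(
    {"left": result, "right": result + 5},
    columns=pd.Index(["left", "right"], name="side"),
)

# Unstacking "state" moves it from the row index to the lowest column level.
wide = df.unstack("state")
print(list(wide.columns.names))

# Stacking "side" moves it from the columns to the innermost row level.
long_form = wide.stack("side")
print(list(long_form.index.names))
```

The order of the remaining column labels in `long_form` may differ between pandas versions, so looking values up by label is safer than relying on position.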