Reordering and adding columns in pandas

程序员文章站 (Programmer Article Station), 2022-05-05 16:04:05


When working with Excel data, reordering columns and adding new columns are common tasks. Below we do both with pandas.

1. Reordering columns

>>> import pandas as pd
>>> df = pd.read_excel(r'd:/myexcel/1.xlsx')
>>> df
        a   b   c   d
0     bob  12  78  87
1  millor  15  92  21
>>> df.columns
Index(['a', 'b', 'c', 'd'], dtype='object')
# The simplest and most common approach: pass the column names
# in the order you want and let pandas select them from df
>>> df[['a', 'd', 'c', 'b']]
        a   d   c   b
0     bob  87  78  12
1  millor  21  92  15
# Repeating a column name also works
>>> df[['a', 'a', 'a', 'a']]
        a       a       a       a
0     bob     bob     bob     bob
1  millor  millor  millor  millor
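
As a complement to the list-style selection above, here is a minimal sketch using `DataFrame.reindex(columns=...)`, which also reorders columns but treats unknown names differently. The DataFrame is built inline as a hypothetical stand-in for the Excel file, and the column name `x` is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the contents of the Excel file in the article.
df = pd.DataFrame({
    "a": ["bob", "millor"],
    "b": [12, 15],
    "c": [78, 92],
    "d": [87, 21],
})

# Same column order as df[['a', 'd', 'c', 'b']].
reordered = df.reindex(columns=["a", "d", "c", "b"])

# With reindex, a name that does not exist ('x') becomes an all-NaN column ...
with_missing = df.reindex(columns=["a", "x"])

# ... whereas list-style selection raises KeyError for the same name.
try:
    df[["a", "x"]]
    selection_failed = False
except KeyError:
    selection_failed = True
```

Which behaviour is preferable depends on whether a misspelled column name should fail loudly or be filled in silently.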

2. Adding one or more columns

(1) Direct assignment

# The list must have one value per row
>>> df['e'] = [1, 2]
>>> df
        a   b   c   d  e
0     bob  12  78  87  1
1  millor  15  92  21  2
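
Direct assignment always appends the new column at the end. If the position matters, `DataFrame.insert` is one option. The sketch below assumes the same four-column data as above, built inline for illustration; the column name `g` is invented.

```python
import pandas as pd

# Hypothetical stand-in for the data read from Excel in the article.
df = pd.DataFrame({
    "a": ["bob", "millor"],
    "b": [12, 15],
    "c": [78, 92],
    "d": [87, 21],
})

# Put the new column 'e' at position 1, right after 'a'.
# insert modifies df in place and returns None.
df.insert(1, "e", [1, 2])

# Assigning a scalar broadcasts it to every row; the column is appended at the end.
df["g"] = 0
```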

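(2) Calling the assign method. It is well suited to deriving new columns from existing ones, either through basic arithmetic or by calling a function.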

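# Starting again from the original four columns
>>> df
        a   b   c   d
0     bob  12  78  87
1  millor  15  92  21
# e is the new column name; its values are column b minus column c.
# assign returns a new DataFrame and leaves df itself unchanged.
>>> df.assign(e=df['b'] - df['c'])
        a   b   c   d   e
0     bob  12  78  87 -66
1  millor  15  92  21 -77
# Adding two columns at once also works
>>> df.assign(e=df['b'] - df['c'], f=df['b'] * df['c'])
        a   b   c   d   e     f
0     bob  12  78  87 -66   936
1  millor  15  92  21 -77  1380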
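
The text mentions that assign can also build columns by calling a function. A minimal sketch of that form follows, assuming the same data built inline; the lambdas and the derived column `f` (defined here as twice `e`) are illustrative choices, not taken from the original example.

```python
import pandas as pd

# Hypothetical stand-in for the data read from Excel in the article.
df = pd.DataFrame({
    "a": ["bob", "millor"],
    "b": [12, 15],
    "c": [78, 92],
    "d": [87, 21],
})

result = df.assign(
    # A callable receives the DataFrame being built and returns the new column.
    e=lambda x: x["b"] - x["c"],
    # A later keyword can use a column created earlier in the same call
    # (this relies on pandas 0.23 or later running on Python 3.6 or later).
    f=lambda x: x["e"] * 2,
)
```

Because assign returns a new DataFrame, `df` still has only its four original columns afterwards; reassign the result (`df = df.assign(...)`) to keep the new columns.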

That covers reordering columns and adding new columns in pandas.

Addendum: renaming DataFrame columns and reordering columns in pandas

Renaming columns:

Call the API directly:

df.rename()

Here is its definition in the pandas source:

    def rename(self, *args, **kwargs):
        """
        Alter axes labels.

        Function / dict values must be unique (1-to-1). Labels not contained in
        a dict / Series will be left as-is. Extra labels listed don't throw an
        error.

        See the :ref:`user guide <basics.rename>` for more.

        Parameters
        ----------
        mapper, index, columns : dict-like or function, optional
            dict-like or functions transformations to apply to
            that axis' values. Use either ``mapper`` and ``axis`` to
            specify the axis to target with ``mapper``, or ``index`` and
            ``columns``.
        axis : int or str, optional
            Axis to target with ``mapper``. Can be either the axis name
            ('index', 'columns') or number (0, 1). The default is 'index'.
        copy : boolean, default True
            Also copy underlying data
        inplace : boolean, default False
            Whether to return a new DataFrame. If True then value of copy is
            ignored.
        level : int or level name, default None
            In case of a MultiIndex, only rename labels in the specified
            level.

        Returns
        -------
        renamed : DataFrame

        See Also
        --------
        pandas.DataFrame.rename_axis

        Examples
        --------
        ``DataFrame.rename`` supports two calling conventions

        * ``(index=index_mapper, columns=columns_mapper, ...)``
        * ``(mapper, axis={'index', 'columns'}, ...)``

        We *highly* recommend using keyword arguments to clarify your
        intent.

        >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
        >>> df.rename(index=str, columns={"A": "a", "B": "c"})
           a  c
        0  1  4
        1  2  5
        2  3  6

        >>> df.rename(index=str, columns={"A": "a", "C": "c"})
           a  B
        0  1  4
        1  2  5
        2  3  6

        Using axis-style parameters

        >>> df.rename(str.lower, axis='columns')
           a  b
        0  1  4
        1  2  5
        2  3  6

        >>> df.rename({1: 2, 2: 4}, axis='index')
           A  B
        0  1  4
        2  2  5
        4  3  6
        """
        axes = validate_axis_style_args(self, args, kwargs, 'mapper', 'rename')
        kwargs.update(axes)
        # Pop these, since the values are in `kwargs` under different names
        kwargs.pop('axis', None)
        kwargs.pop('mapper', None)
        return super(DataFrame, self).rename(**kwargs)

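Note:

A single * in the signature (*args) collects any extra positional arguments into a tuple. At a call site, * does the reverse: it unpacks a list or tuple into individual positional arguments.

A double * in the signature (**kwargs) collects keyword arguments into a dict. At a call site, ** must be given a mapping such as a dict, which it unpacks into keyword arguments.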
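
To make the two roles of * and ** concrete, here is a small sketch in plain Python. The function `show_args` and the variables `values` and `options` are hypothetical names used only for illustration.

```python
def show_args(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    return args, kwargs


# In the signature: extra positionals land in a tuple, keywords in a dict.
args, kwargs = show_args(1, 2, columns={"A": "a"})

# At the call site: * unpacks a list or tuple, ** unpacks a dict.
values = [1, 2]
options = {"columns": {"A": "a"}}
unpacked = show_args(*values, **options)
```

Both calls hand the function the same arguments, which is why `rename(columns={...})` and `rename(**{"columns": {...}})` are equivalent.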

Example:

>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

# Rename column A to a, B to b, and C to c
>>> a.rename(columns={'A': 'a', 'B': 'b', 'C': 'c'}, inplace=True)
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
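
For comparison, here is a sketch of rename without `inplace=True`, which returns a new DataFrame and leaves the original untouched. It assumes the same three-column frame with upper-case names; the label `Z` is invented to show that unknown labels are ignored.

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Without inplace=True, rename returns a new DataFrame; `a` keeps its names.
renamed = a.rename(columns={"A": "a", "B": "b", "C": "c"})

# A function can be passed instead of a dict; it is applied to every label.
lowered = a.rename(columns=str.lower)

# Labels missing from the frame ('Z') are skipped without raising an error.
partial = a.rename(columns={"A": "a", "Z": "z"})
```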

Reordering columns:

For example:

>>> import pandas
>>> dict_a = {'user_id': ['webbang', 'webbang', 'webbang'], 'book_id': ['3713327', '4074636', '26873486'], 'rating': ['4', '4', '4'],
...           'mark_date': ['2017-03-07', '2017-03-07', '2017-03-07']}

>>> df = pandas.DataFrame(dict_a)  # build a DataFrame from the dict
# On older setups (pandas before 0.23, or Python before 3.6) the columns come out
# sorted alphabetically, not in the dict's order 'user_id', 'book_id', 'rating', 'mark_date'.
# Newer versions keep the dict's insertion order.
>>> df
    book_id   mark_date rating  user_id
0   3713327  2017-03-07      4  webbang
1   4074636  2017-03-07      4  webbang
2  26873486  2017-03-07      4  webbang

Select the columns in the order you want:

# Reorder the columns to 'user_id', 'book_id', 'rating', 'mark_date'
>>> df = df[['user_id', 'book_id', 'rating', 'mark_date']]
>>> df
   user_id   book_id rating   mark_date
0  webbang   3713327      4  2017-03-07
1  webbang   4074636      4  2017-03-07
2  webbang  26873486      4  2017-03-07

That is all it takes.
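
Listing every column by hand gets tedious on wide tables. The following is a minimal sketch of a helper that moves only the named columns to the front and keeps the rest in their existing order. `move_to_front` is a hypothetical function written for this note, not part of pandas.

```python
import pandas as pd


def move_to_front(df, first):
    """Return a new DataFrame with the columns in `first` leading,
    followed by the remaining columns in their existing order."""
    rest = [col for col in df.columns if col not in first]
    return df[list(first) + rest]


# Same data as the article, with the columns deliberately in alphabetical order.
df = pd.DataFrame({
    "book_id": ["3713327", "4074636", "26873486"],
    "mark_date": ["2017-03-07", "2017-03-07", "2017-03-07"],
    "rating": ["4", "4", "4"],
    "user_id": ["webbang", "webbang", "webbang"],
})

result = move_to_front(df, ["user_id", "rating"])
```

Here `result` has the column order user_id, rating, book_id, mark_date, while `df` is left as it was.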

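The above is based on personal experience. I hope it serves as a useful reference, and thank you for your support. If anything is wrong or incomplete, corrections are welcome.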

