
How to find duplicated values in a pandas column

程序员文章站 2022-04-19 13:32:58

pandas

The code is as follows:

import pandas as pd
import numpy as np

salaries = pd.DataFrame({
    'name': ['boss', 'lilei', 'lilei', 'han', 'boss', 'boss', 'han', 'boss'],
    'year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'bonus': [2, 2, 2, 2, 3, 4, 5, 6]
})
print(salaries)

# keep='first': the first occurrence of each value is not flagged; later repeats are.
print(salaries['bonus'].duplicated(keep='first'))
print(salaries[salaries['bonus'].duplicated(keep='first')].index)
print(salaries[salaries['bonus'].duplicated(keep='first')])

# keep='last': the last occurrence of each value is not flagged; earlier repeats are.
print(salaries['bonus'].duplicated(keep='last'))
print(salaries[salaries['bonus'].duplicated(keep='last')].index)
print(salaries[salaries['bonus'].duplicated(keep='last')])

The output is as follows:

   bonus  salary  year   name
0      2       1  2016   boss
1      2       2  2016  lilei
2      2       3  2016  lilei
3      2       4  2016    han
4      3       5  2017   boss
5      4       6  2017   boss
6      5       7  2017    han
7      6       8  2017   boss
0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
Name: bonus, dtype: bool
Int64Index([1, 2, 3], dtype='int64')
   bonus  salary  year   name
1      2       2  2016  lilei
2      2       3  2016  lilei
3      2       4  2016    han
0     True
1     True
2     True
3    False
4    False
5    False
6    False
7    False
Name: bonus, dtype: bool
Int64Index([0, 1, 2], dtype='int64')
   bonus  salary  year   name
0      2       1  2016   boss
1      2       2  2016  lilei
2      2       3  2016  lilei
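
The calls above flag duplicated rows but do not count them. As a minimal sketch, assuming the same salaries frame, value_counts can count how often each bonus value occurs, and duplicated(keep=False) can flag every row whose value is repeated, including the first occurrence. The names counts, repeated and all_dupes are illustrative only and are not part of the original code.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['boss', 'lilei', 'lilei', 'han', 'boss', 'boss', 'han', 'boss'],
    'year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'bonus': [2, 2, 2, 2, 3, 4, 5, 6]
})

# Count how many times each bonus value occurs.
counts = salaries['bonus'].value_counts()

# Keep only the values that occur more than once.
repeated = counts[counts > 1]
print(repeated.to_dict())          # {2: 4}

# keep=False flags every occurrence of a repeated value, including the first.
all_dupes = salaries[salaries['bonus'].duplicated(keep=False)]
print(all_dupes.index.tolist())    # [0, 1, 2, 3]
```

Here the bonus value 2 appears four times, so all of rows 0 to 3 are selected, whereas keep='first' and keep='last' each leave one of them out.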

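Without pandas

In NumPy, the equivalent operations are mainly as follows.

Suppose we have the array

a = np.array([1, 2, 1, 3, 3, 3, 0])

and want to find the duplicated values [1 3].

Then we can use one of the following.

Method 1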

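# Mark the first occurrence of each value, then select everything else.
m = np.zeros_like(a, dtype=bool)
m[np.unique(a, return_index=True)[1]] = True
a[~m]  # array([1, 3, 3]): every repeat after the first occurrence

Method 2

# Same idea as Method 1, written as a single expression.
# Note: np.in1d is deprecated in newer NumPy releases in favor of np.isin.
a[~np.in1d(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]  # array([1, 3, 3])

Method 3

# Caution: a is not unique, so assume_unique=True breaks the documented
# contract of setxor1d; the result may differ between NumPy versions.
np.setxor1d(a, np.unique(a), assume_unique=True)

Method 4

# Count occurrences through the inverse index and keep values seen more than once.
u, i = np.unique(a, return_inverse=True)
u[np.bincount(i) > 1]  # array([1, 3])

Method 5

# Sort, then keep each element that equals its successor.
s = np.sort(a, axis=None)
s[:-1][s[1:] == s[:-1]]  # array([1, 3, 3])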
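
To count the duplicates as well as find them, np.unique can return the number of occurrences directly. The following is a minimal sketch on the same array a; the names values, counts, dupes and dupe_counts are illustrative only and do not come from the referenced answer.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# return_counts=True gives the number of occurrences of each unique value.
values, counts = np.unique(a, return_counts=True)

# Values that occur more than once, each listed a single time.
dupes = values[counts > 1]
print(dupes)          # [1 3]

# Map each duplicated value to the number of times it occurs.
dupe_counts = dict(zip(dupes.tolist(), counts[counts > 1].tolist()))
print(dupe_counts)    # {1: 2, 3: 3}
```

This yields each duplicated value once, like Method 4, and avoids the separate np.bincount step.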

Reference: https://*.com/questions/11528078/determining-duplicate-values-in-an-array
