Removing Duplicate Elements from a Sequence While Preserving Order

程序员文章站 2022-07-03 20:32:19

The goal is to eliminate duplicate values from a sequence while keeping the remaining elements in their original order.

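The title may make you reach for set(), but a set is unordered: converting the sequence with set() discards the original order of the elements, so it does not meet the requirement.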

If all the values in the sequence are hashable, the problem is easy to solve with a set and a generator.

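The Python documentation defines hashable as follows (the wording comes from the Python 2 glossary, which is why it still mentions __cmp__()):

An object is hashable if it has a hash value which never changes
during its lifetime (it needs a __hash__() method), and can be
compared to other objects (it needs an __eq__() or __cmp__() method).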
Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set
member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no
mutable containers (such as lists or dictionaries) are. Objects which
are instances of user-defined classes are hashable by default; they
all compare unequal, and their hash value is their id().
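
To make the definition concrete, here is a minimal sketch that probes whether a value is hashable by calling the built-in hash() and catching TypeError. The helper name is_hashable is hypothetical and only used for illustration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2)))     # tuple of ints: immutable, hashable
print(is_hashable("text"))     # str: immutable, hashable
print(is_hashable([1, 2]))     # list: mutable container, not hashable
print(is_hashable({'x': 1}))   # dict: mutable container, not hashable
print(is_hashable((1, [2])))   # tuple holding a list: not hashable either
```

The last call shows that a tuple is only hashable when everything inside it is hashable too.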

The implementation looks like this:

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item  # yield turns this function into a generator
            seen.add(item)  # remember the element so later copies are skipped

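Any function that contains yield becomes a generator function. In the right scenarios a generator can save a great deal of memory, because it produces items one at a time instead of building the whole result first. For comparison, here is a version without yield. Note that it returns the set itself, so the result is not guaranteed to keep the original order: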

def example(items):
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
    return seen

A test call and its output:

>>> a = [1, 3, 4, 5, 4, 8, 9, 1]
>>> print(dedupe(a), '->', list(dedupe(a)))
>>> print(example(a))

OUTPUT
<generator object dedupe at 0x10d734db0> -> [1, 3, 4, 5, 8, 9]
{1, 3, 4, 5, 8, 9}

For more detailed usage of yield, refer to this link.
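
As a rough illustration of why the generator form matters, the sketch below feeds dedupe an endless stream and takes only the first three unique values. The names endless and first_three are hypothetical; count and islice come from the standard itertools module.

```python
from itertools import count, islice


def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


# count() never ends; n % 5 makes the stream cycle through 0, 1, 2, 3, 4.
endless = (n % 5 for n in count())

# islice stops pulling from the generator after three unique values,
# so the infinite input is never fully consumed.
first_three = list(islice(dedupe(endless), 3))
print(first_three)  # [0, 1, 2]
```

A version that builds and returns a complete collection could not handle this input at all. Asking for more than five values here would loop forever, though, because the stream contains only five distinct values.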

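If the elements you want to de-duplicate are not hashable, modify the code as follows. The key argument is a function that converts each item into a hashable value, and that value is what gets checked for duplicates.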

def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)  # key is a callable, so call it
        if val not in seen:
            yield item
            seen.add(val)

A test call and its output:

>>> a = [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
>>> list(dedupe(a, key=lambda d: (d['x'],d['y'])))
>>> list(dedupe(a, key=lambda d: d['x']))

OUTPUT
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
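
The key argument does not have to be a lambda; any callable that returns a hashable value should work. The sketch below restates dedupe with key called as a function and tries two built-in callables. The sample data rows and words are made up for illustration.

```python
def dedupe(items, key=None):
    seen = set()
    for item in items:
        # key maps each item to a hashable stand-in used for comparison
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


# Lists are unhashable, but tuple(list) is hashable.
rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]
unique_rows = list(dedupe(rows, key=tuple))
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]

# key can also relax what counts as a duplicate, e.g. ignoring case.
words = ["Apple", "apple", "Banana", "APPLE", "banana"]
unique_words = list(dedupe(words, key=str.lower))
print(unique_words)  # ['Apple', 'Banana']
```

In both cases the original items are yielded, not the converted keys, and the first occurrence is the one that survives.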

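Writing dedupe as a generator function makes it more general: it is not limited to lists and works with any iterable. For example, if you want to read a file and drop its duplicate lines, you can simply do this: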

with open(somefile, 'r') as f:
    for line in dedupe(f): ...
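
A runnable sketch of the file case, assuming a small temporary file created just for the demonstration. The file contents and the variable names path and unique_lines are hypothetical.

```python
import os
import tempfile


def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


# Create a throwaway file containing some repeated lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha\nbeta\nalpha\ngamma\nbeta\n")
    path = tmp.name

try:
    # A file object iterates line by line, so dedupe sees one line at a time.
    with open(path, 'r') as f:
        unique_lines = [line.rstrip('\n') for line in dedupe(f)]
finally:
    os.remove(path)

print(unique_lines)  # ['alpha', 'beta', 'gamma']
```

Lines are compared including their trailing newline, so a final line with no newline would not match an earlier identical line that has one. Stripping the lines before passing them to dedupe avoids that.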

Original article: https://blog.csdn.net/WSH_ONLY/article/details/110250631
