How the LRU Algorithm Works

程序员文章站 2022-04-03 10:01:24


LRU stands for Least Recently Used. It is commonly used as a page replacement algorithm in support of paged virtual memory management.

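Modern operating systems provide an abstraction of main memory called virtual memory, which lets them manage main memory more effectively. Virtual memory treats main memory as a cache for an address space stored on disk: only the active regions are kept in main memory, and data is transferred back and forth between main memory and disk as needed. Virtual memory is organized as an array of N contiguous bytes stored on disk, and each byte has a unique virtual address that serves as its index into the array. Virtual memory is divided into fixed-size blocks called virtual pages (VP), which are the unit of transfer between main memory and disk. Similarly, physical memory is divided into physical pages (PP).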

Virtual memory uses a page table to record whether each virtual page is cached in physical memory:

[Figure: page table recording which virtual pages are cached in physical memory]

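As shown in the figure above, when the CPU accesses virtual page VP3 and finds that it is not cached in physical memory, a page fault occurs. VP3 must now be copied from disk into physical memory. Because the amount of physical memory is fixed, a victim page must first be chosen in physical memory and written back to disk (if it has been modified); this is called swapping or paging. In the figure, the victim page is VP4. Which page should be evicted so that as few page transfers as possible are needed? Ideally, each eviction removes the page, among all pages in memory, whose next use lies farthest in the future; this postpones the next replacement as long as possible. This is called the optimal page replacement algorithm, but it requires knowing future page accesses, so it cannot be implemented exactly in practice.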
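To make the ideal algorithm concrete, here is a minimal Python sketch that counts page faults under the optimal (Belady) policy. The names opt_page_faults and next_use are hypothetical, not from the article. Note that the function needs the entire reference string up front, which is exactly why an operating system cannot run it online.

```python
def next_use(refs, page, start):
    """Index of the next reference to `page` at or after `start`, or infinity."""
    try:
        return refs.index(page, start)
    except ValueError:
        return float("inf")


def opt_page_faults(refs, frames):
    """Count page faults under optimal replacement with `frames` physical frames."""
    memory = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in memory:
            continue  # hit: nothing to do
        faults += 1
        if len(memory) >= frames:
            # Evict the resident page whose next use is farthest in the future;
            # pages that are never referenced again count as infinitely far away.
            victim = max(memory, key=lambda p: next_use(refs, p, i + 1))
            memory.remove(victim)
        memory.add(page)
    return faults


print(opt_page_faults([7, 0, 1, 2, 0, 3, 0, 4], 3))
```

For the reference string 7 0 1 2 0 3 0 4 with 3 frames this reports 6 faults. Because the policy is optimal, LRU can never produce fewer faults on the same input.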

To narrow the gap with the ideal algorithm, many clever algorithms have been devised, and LRU is one of them.

LRU principle

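The design principle of LRU is: if a piece of data has not been accessed recently, it is also unlikely to be accessed in the near future. Therefore, when the limited space is full, the data that has gone unaccessed for the longest time should be evicted.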

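Following the example in "LRU principle and Redis implementation" (see References), suppose the system allocates 3 physical frames to a process and the process's page reference string is 7 0 1 2 0 3 0 4. All 3 frames are initially empty. LRU then works as follows: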

[Figure: LRU replacement trace for the reference string 7 0 1 2 0 3 0 4 with 3 frames]
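The trace for this reference string can be reproduced with a short simulation. This is a sketch that uses Python's OrderedDict (which the article discusses later) to keep resident pages ordered from least to most recently used; lru_trace is a hypothetical helper name.

```python
from collections import OrderedDict


def lru_trace(refs, frames):
    """Simulate LRU page replacement and record memory contents after each access."""
    memory = OrderedDict()  # resident pages, least recently used first
    faults = 0
    steps = []
    for page in refs:
        if page in memory:
            memory.move_to_end(page)  # hit: page becomes most recently used
            hit = True
        else:
            faults += 1
            hit = False
            if len(memory) == frames:
                memory.popitem(last=False)  # evict the least recently used page
            memory[page] = None
        steps.append((page, hit, list(memory)))
    return faults, steps


faults, steps = lru_trace([7, 0, 1, 2, 0, 3, 0, 4], 3)
for page, hit, resident in steps:
    print(page, "hit" if hit else "fault", resident)
print("page faults:", faults)
```

It reports 6 page faults: pages 7, 1 and 2 are evicted in turn, the two accesses to page 0 after it is loaded are hits, and at the end the frames hold 3, 0 and 4 (from least to most recently used).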

LRU implementation based on a hash table and a doubly linked list

To implement LRU yourself, you can combine a hash table with a doubly linked list:

[Figure: hash table whose entries point to nodes of a doubly linked list]

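The design: a hash table maps each key to its node in the linked list, and the node stores the value. The doubly linked list records the order of the nodes, with the most recently accessed node at the head.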

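An LRU cache has two basic operations:

get(key): look up the node for key. If the key exists, move the node to the head of the list and return its value.
set(key, value): set the value for key. If the key does not exist, create a new node and put it at the head of the list; if the list then exceeds its capacity, remove the last node at the tail. If the key already exists, update the node's value and move the node to the head of the list.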

LRU cache mechanism

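LeetCode has a problem about the LRU cache mechanism:

Using the data structures you know, design and implement an LRU (Least Recently Used) cache. It should support the following operations: get and put.

get(key) - If the key exists in the cache, return its value (which is always positive); otherwise return -1.
put(key, value) - If the key does not exist, write its value. When the cache reaches its capacity, it should evict the least recently used item before inserting the new item, to make room for it.

Follow-up:

Could you perform both operations in O(1) time complexity?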

Example:
LRUCache cache = new LRUCache( 2 /* capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns  1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.put(4, 4);    // evicts key 1
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns  3
cache.get(4);       // returns  4

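We can implement the doubly linked list ourselves, or use an existing data structure. Python's OrderedDict is an ordered hash table that remembers the order in which keys were inserted, so it effectively combines a hash table with a doubly linked list. OrderedDict places the newest entry at the end: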

In [35]: from collections import OrderedDict

In [36]: lru = OrderedDict()

In [37]: lru[1] = 1

In [38]: lru[2] = 2

In [39]: lru
Out[39]: OrderedDict([(1, 1), (2, 2)])

In [40]: lru.popitem()
Out[40]: (2, 2)

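OrderedDict has two important methods:

popitem(last=True): removes and returns a (key, value) pair. Pairs are returned in LIFO order if last is true, or in FIFO order if it is false.
move_to_end(key, last=True): moves an existing key to either end of the ordered dictionary. If last is true (the default), the item is moved to the end; if last is false, it is moved to the beginning.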
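A quick demonstration of both methods, using only the standard collections.OrderedDict API. Note that move_to_end raises KeyError for a key that is not present, which is why an LRU cache built on it should check membership first.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys("abc")  # keys in insertion order: a, b, c (values None)

od.move_to_end("a")               # a becomes the newest entry
print(list(od))                   # ['b', 'c', 'a']

od.move_to_end("c", last=False)   # c moves to the front
print(list(od))                   # ['c', 'b', 'a']

print(od.popitem())               # last=True, LIFO: ('a', None)
print(od.popitem(last=False))     # FIFO: ('c', None)
print(list(od))                   # ['b']
```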

When evicting, popitem(last=False) removes the pair at the front, which is the least recently used one. When reading or writing a key, move_to_end(key, last=True) moves its pair to the end.

Implementation:

from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.lru = OrderedDict()
        self.capacity = capacity

    def get(self, key: int) -> int:
        self._update(key)
        return self.lru.get(key, -1)

    def put(self, key: int, value: int) -> None:
        self._update(key)
        self.lru[key] = value
        if len(self.lru) > self.capacity:
            # Evict the least recently used pair at the front
            self.lru.popitem(last=False)

    def _update(self, key: int):
        # Mark an existing key as most recently used (move it to the end)
        if key in self.lru:
            self.lru.move_to_end(key)

OrderedDict source code analysis

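OrderedDict is in fact also implemented with a hash table and a doubly linked list. The excerpt below is from the pure-Python version in CPython's collections module (CPython uses a C implementation when one is available):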

class OrderedDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.

    # The internal self.__map dict maps keys to links in a doubly linked list.
    # The circular doubly linked list starts and ends with a sentinel element.
    # The sentinel element never gets deleted (this simplifies the algorithm).
    # The sentinel is in self.__hardroot with a weakref proxy in self.__root.
    # The prev links are weakref proxies (to prevent circular references).
    # Individual links are kept alive by the hard reference in self.__map.
    # Those hard references disappear when a key is deleted from an OrderedDict.

    def __init__(*args, **kwds):
        '''Initialize an ordered dictionary.  The signature is the same as
        regular dictionaries.  Keyword argument order is preserved.
        '''
        if not args:
            raise TypeError("descriptor '__init__' of 'OrderedDict' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        try:
            self.__root
        except AttributeError:
            self.__hardroot = _Link()
            self.__root = root = _proxy(self.__hardroot)
            root.prev = root.next = root
            self.__map = {}
        self.__update(*args, **kwds)

    def __setitem__(self, key, value,
                    dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
        'od.__setitem__(i, y) <==> od[i]=y'
        # Setting a new item creates a new link at the end of the linked list,
        # and the inherited dictionary is updated with the new key/value pair.
        if key not in self:
            self.__map[key] = link = Link()
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link
            root.prev = proxy(link)
        dict_setitem(self, key, value)

From the source we can see that OrderedDict uses self.__map = {} as its hash table, which maps each key to that key's node in the linked list, created by self.__map[key] = link = Link() (the Link parameter defaults to the _Link class):

class _Link(object):
    __slots__ = 'prev', 'next', 'key', '__weakref__'

Each _Link node stores prev, a pointer to the previous node; next, a pointer to the next node; and the key.

Moreover, the list is a circular doubly linked list: OrderedDict uses a sentinel element, root, as both the head and the tail of the list:

    self.__hardroot = _Link()
    self.__root = root = _proxy(self.__hardroot)
    root.prev = root.next = root

From __setitem__ we can see that, as new items are added to the OrderedDict, the list forms the following ring:

         next             next             next
   root <----> new node1 <----> new node2 <----> root
         prev             prev             prev

root.next is the first node of the list, and root.prev is the last node.
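The ring can be reproduced in a few lines of Python. This is a simplified sketch: Link, make_root, append and keys_forward are hypothetical names, and unlike OrderedDict it uses plain references instead of weakref proxies, relying on Python's garbage collector to reclaim the resulting reference cycles.

```python
class Link:
    # Hypothetical stand-in for OrderedDict's _Link: prev/next pointers and a key
    __slots__ = ("prev", "next", "key")


def make_root():
    root = Link()
    root.key = None
    root.prev = root.next = root  # an empty ring: the sentinel points to itself
    return root


def append(root, key):
    # The same pointer updates as OrderedDict.__setitem__ for a new key,
    # but with plain references instead of weakref proxies.
    link = Link()
    last = root.prev
    link.prev, link.next, link.key = last, root, key
    last.next = link
    root.prev = link
    return link


def keys_forward(root):
    # Follow next pointers from the first node until we are back at the sentinel
    keys, cur = [], root.next
    while cur is not root:
        keys.append(cur.key)
        cur = cur.next
    return keys


root = make_root()
for key in ("node1", "node2"):
    append(root, key)
print(keys_forward(root))            # ['node1', 'node2']
print(root.next.key, root.prev.key)  # node1 node2
```

Because the sentinel is never removed, appending to an empty list and to a non-empty list follow exactly the same steps, with no special case for a missing head or tail.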

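Because OrderedDict inherits from dict, the key/value pairs are stored in the OrderedDict itself (as a dict); the list nodes hold only the key, not the value.

If we implement it ourselves, it does not need to be this complicated: we can store the value in the node, and the list only needs to support inserting a node at the head, moving an existing node to the head, and removing the tail node: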

from weakref import proxy as _proxy


class Node:
    # Unlike OrderedDict's _Link, the node stores the value as well as the key
    __slots__ = ('prev', 'next', 'key', 'value', '__weakref__')


class LRUCache:

    def __init__(self, capacity: int):
        # Sentinel node: root.next is the head (most recently used),
        # root.prev is the tail (least recently used)
        self.__hardroot = Node()
        self.__root = root = _proxy(self.__hardroot)
        root.prev = root.next = root
        self.__map = {}
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key in self.__map:
            self.move_to_head(key)
            return self.__map[key].value
        else:
            return -1

    def put(self, key: int, value: int) -> None:
        if key in self.__map:
            node = self.__map[key]
            node.value = value
            self.move_to_head(key)
        else:
            node = Node()
            node.key = key
            node.value = value
            self.__map[key] = node
            self.add_head(node)
            if len(self.__map) > self.capacity:
                self.rm_tail()

    def move_to_head(self, key: int) -> None:
        if key in self.__map:
            node = self.__map[key]
            # Unlink the node from its current position...
            node.prev.next = node.next
            node.next.prev = node.prev
            # ...and re-insert it right after the sentinel
            self.add_head(node)

    def add_head(self, node: Node) -> None:
        head = self.__root.next
        self.__root.next = node
        node.prev = self.__root
        node.next = head
        head.prev = node

    def rm_tail(self) -> None:
        # Evict the least recently used node (the one just before the sentinel)
        tail = self.__root.prev
        del self.__map[tail.key]
        tail.prev.next = self.__root
        self.__root.prev = tail.prev

node-lru-cache

In real applications, an LRU cache needs many additional features.

There is a well-made JavaScript package for this, lru-cache:

var LRU = require("lru-cache")
  , options = { max: 500
              , length: function (n, key) { return n * 2 + key.length }
              , dispose: function (key, n) { n.close() }
              , maxAge: 1000 * 60 * 60 }
  , cache = new LRU(options)
  , otherCache = new LRU(50) // sets just the max size

cache.set("key", "value")
cache.get("key") // "value"

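Rather than deciding when to evict based on the number of cached keys, this package can use the actual size of the stored entries. In options, max sets the upper limit of the cache; length computes the size of each entry (when it is supplied, max bounds the total of these sizes); dispose is called when an entry is removed; and maxAge sets how long an entry stays valid, in milliseconds. These option names come from older releases of lru-cache, and later major versions renamed some of them (for example, maxAge became ttl). Worth a look if you are interested.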
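To illustrate the idea in Python, here is a minimal sketch of a cache bounded by total size rather than entry count, with optional expiry. SizedLRU and its parameters are hypothetical and only loosely mirror the max, length and maxAge options described above; this is not the lru-cache API, and the package's actual eviction and expiry rules may differ.

```python
import time
from collections import OrderedDict


class SizedLRU:
    """Hypothetical size-bounded LRU cache with an optional per-entry max age."""

    def __init__(self, max_size, length=lambda value, key: 1,
                 max_age=None, clock=time.monotonic):
        self.max_size = max_size   # limit on the total size, not the key count
        self.length = length       # size of one entry, called as length(value, key)
        self.max_age = max_age     # seconds; None means entries never expire
        self.clock = clock         # injectable so expiry can be tested
        self.data = OrderedDict()  # key -> (value, size, stored_at), LRU first
        self.total = 0

    def set(self, key, value):
        if key in self.data:
            self._remove(key)
        size = self.length(value, key)
        self.data[key] = (value, size, self.clock())
        self.total += size
        # Evict least recently used entries until the total size fits again
        while self.total > self.max_size and self.data:
            self._remove(next(iter(self.data)))

    def get(self, key, default=None):
        item = self.data.get(key)
        if item is None:
            return default
        value, _, stored_at = item
        if self.max_age is not None and self.clock() - stored_at > self.max_age:
            self._remove(key)  # expired: dropped lazily when it is read
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return value

    def _remove(self, key):
        _, size, _ = self.data.pop(key)
        self.total -= size


cache = SizedLRU(10, length=lambda value, key: len(value))
cache.set("a", "xxxx")
cache.set("b", "yyyy")
cache.get("a")          # "a" is now the most recently used entry
cache.set("c", "zzzz")  # total would be 12 > 10, so "b" is evicted
print(cache.get("b"), cache.total)  # None 8
```

In this sketch, expired entries are only removed when they are read, and an entry larger than max_size on its own is evicted immediately after being set.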

References

LRU principle and Redis implementation: a Toutiao interview question
