
Python functools.lru_cache: High-Speed Caching and How It Works

程序员文章站 2022-06-24 19:50:23

How It Works

The implementation of lru_cache relies on Python closures and the LRU algorithm. The cache is thread-safe, and its lifetime runs from the decorated function's first call after the process starts until the process exits.

  1. Python closures provide the storage for the cached function results.
  2. The LRU (least recently used) algorithm decides which cached result to evict when the cache is full.
  3. If maxsize=None, the LRU machinery is disabled and the cache can grow without bound.
  4. If typed is set to True, arguments of different types are cached separately; for example, f(3) and f(3.0) are treated as distinct calls with distinct results.
  5. Because a dict stores the results, the function's positional and keyword arguments must be hashable.
  6. Different argument patterns are treated as distinct calls with separate cache entries; for example, f(a=1, b=2) and f(b=2, a=1) differ in keyword order and may get two separate cache entries.
  7. func.cache_info() reports the cache statistics.
  8. func.cache_clear() clears the cache.
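Items 4 to 6 above can be checked directly with the public API. A quick sketch, using small throwaway functions:

```python
from functools import lru_cache

@lru_cache(maxsize=128, typed=True)
def identity(x):
    return x

identity(3)
identity(3.0)  # typed=True: int 3 and float 3.0 get separate entries
assert identity.cache_info().misses == 2

@lru_cache(maxsize=128)
def add(a=0, b=0):
    return a + b

add(a=1, b=2)
add(b=2, a=1)  # different keyword order -> may get a separate entry
assert add.cache_info().currsize == 2

try:
    add([1], [2])  # unhashable arguments cannot be cached
except TypeError as e:
    print(e)       # unhashable type: 'list'

add.cache_clear()  # reset the cache and the counters
assert add.cache_info() == (0, 0, 128, 0)
```

The CacheInfo result is a namedtuple, so it compares equal to a plain tuple, which makes assertions like the last one convenient.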

The walkthrough below is commented fairly thoroughly; read it carefully if you want to dig deeper.

Testing the Mechanism

Behavior

The test function below pauses for one second only the first time it is called with a given pair of arguments; repeated calls return the cached result immediately. (The bare @lru_cache form, without parentheses, requires Python 3.8+.)

import time
from functools import lru_cache


@lru_cache
def sleep(x, y):
    time.sleep(1)
    print('I woke up')
    return x * 10, y * 10

sleep(1, 2)
sleep(1, 2)

Inspecting the Cache via the Closure

Watch how the closure cells change. Comparing the snapshots taken before and after the first call, the changes are confined to the top five variables in the table below (their meanings are explained in the source walkthrough further down). This confirms that the decorator really does keep its cache in closure variables. (Note: this introspection assumes the pure-Python wrapper; on CPython the C-accelerated wrapper from _functools normally replaces it and does not expose __closure__.)

# Snapshot the closure cells before the first call
print('| cell | contents (before first call) |')
print('|--|--|')
for c in sleep.__closure__:
    print(f'| {c} | {c.cell_contents} |')

sleep(1, 2)

# Snapshot again after the first call
print('| cell | contents (after first call) |')
print('|--|--|')
for c in sleep.__closure__:
    print(f'| {c} | {c.cell_contents} |')
| variable | cell (before first call) | value (before) | cell (after first call) | value (after) |
|--|--|--|--|--|
| cache | <cell at 0x000001570D814B80: dict object at 0x000001570D554440> | {} | <cell at 0x000001570D814B80: dict object at 0x000001570D554440> | {[1, 2]: [[[…], […], None, None], [[…], […], None, None], [1, 2], (10, 20)]} |
| full | <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> | False | <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> | False |
| misses | <cell at 0x000001570D814C40: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D814C40: int object at 0x00007FF8233116A0> | 1 |
| hits | <cell at 0x000001570D814D00: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D814D00: int object at 0x00007FF8233116A0> | 1 |
| root | <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> | [[…], […], None, None] | <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> | [[[…], […], [1, 2], (10, 20)], [[…], […], [1, 2], (10, 20)], None, None] |
| user_function | <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> | <function sleep at 0x000001570D5A8940> | <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> | <function sleep at 0x000001570D5A8940> |
| PREV | <cell at 0x000001570D644250: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D644250: int object at 0x00007FF823311680> | 0 |
| NEXT | <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> | 1 | <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> | 1 |
| KEY | <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> | 2 | <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> | 2 |
| RESULT | <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> | 3 | <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> | 3 |
| maxsize | <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> | 128 | <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> | 128 |
| typed | <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> | False | <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> | False |
| cache_len | <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> | <method-wrapper 'len' of dict object at 0x000001570D554440> | <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> | <method-wrapper 'len' of dict object at 0x000001570D554440> |
| cache_get | <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> | <built-in method get of dict object at 0x000001570D554440> | <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> | <built-in method get of dict object at 0x000001570D554440> |
| make_key | <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> | <function _make_key at 0x000001570D81C310> | <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> | <function _make_key at 0x000001570D81C310> |
| lock | <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> | <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> | <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> | <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> |
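The same name-to-cell pairing can be reproduced for any closure: `__code__.co_freevars` lists the cell names in the same order as `__closure__`. A minimal hand-rolled memoizer (a sketch, not the stdlib implementation; on CPython the real lru_cache is usually a C-accelerated wrapper with no `__closure__` at all) shows the idea:

```python
def memoize(fn):
    """A minimal closure-based cache, mirroring how the pure-Python
    _lru_cache_wrapper keeps its state (cache dict, counters) in cells."""
    cache = {}
    hits = misses = 0

    def wrapper(*args):
        nonlocal hits, misses
        if args in cache:
            hits += 1
        else:
            misses += 1
            cache[args] = fn(*args)
        return cache[args]

    return wrapper


@memoize
def double(x):
    return 2 * x

double(4)
double(4)

# co_freevars lines up one-to-one with __closure__, so the internal
# state can be read out by name, just like the table above
state = dict(zip(double.__code__.co_freevars,
                 (c.cell_contents for c in double.__closure__)))
print(state['hits'], state['misses'], state['cache'])  # 1 1 {(4,): 8}
```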

The LRU Algorithm

The bounded cache maintains its eviction order with a circular doubly linked list. The steps below walk through how that list evolves.

(The original article illustrated each step with a diagram: initialization; adding the first cache entry; adding the second cache entry; re-running the first call; and, once the cache is full, replacing the oldest entry with the new one. The code below reproduces the same steps.)

Source of the Diagrams

Python Tutor is an interactive site that animates variable references.

To follow the diagrams, run the code below step by step (for example in Python Tutor) and watch the variables change.

# Initialization
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3  # indices into each link
cache = {}
root = []
root[:] = [root, root, None, None]  # empty ring: root points to itself

# Add the first cache entry
key = 1
result = 10
last = root[PREV]                 # current most-recent link (here: root itself)
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link

# Add the second cache entry
key1 = 2
result1 = 20
last1 = root[PREV]
link1 = [last1, root, key1, result1]   # note: last1, not last
last1[NEXT] = root[PREV] = cache[key1] = link1

# Re-run the first call: move its link to the most-recent position
link_prev, link_next, _key, _result = link
link_prev[NEXT] = link_next       # unlink from its current position
link_next[PREV] = link_prev
last = root[PREV]
last[NEXT] = root[PREV] = link    # splice it in just before root
link[PREV] = last
link[NEXT] = root

# Cache is full: the oldest entry is replaced by the new one
key2 = 3
result2 = 30
oldroot = root                    # reuse the old root as the new link
oldroot[KEY] = key2
oldroot[RESULT] = result2
root = oldroot[NEXT]              # the oldest link becomes the new root
oldkey = root[KEY]
oldresult = root[RESULT]          # the evicted result, discarded
root[KEY] = root[RESULT] = None
del cache[oldkey]
cache[key2] = oldroot
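To verify the ordering these steps produce, a small helper (hypothetical, not part of functools) can walk the ring from oldest to newest. The snippet below replays the first four steps and confirms that touching key 1 moved it behind key 2:

```python
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3

def ring_keys(root):
    """Walk NEXT pointers from root, collecting keys oldest -> newest.
    Hypothetical helper for illustration; not part of functools."""
    keys, node = [], root[NEXT]
    while node is not root:
        keys.append(node[KEY])
        node = node[NEXT]
    return keys

# Replay: insert keys 1 and 2, then touch key 1 (move it to most-recent)
root = []
root[:] = [root, root, None, None]
cache = {}
for key, result in ((1, 10), (2, 20)):
    last = root[PREV]
    link = [last, root, key, result]
    last[NEXT] = root[PREV] = cache[key] = link

link = cache[1]
link_prev, link_next = link[PREV], link[NEXT]
link_prev[NEXT] = link_next          # unlink key 1
link_next[PREV] = link_prev
last = root[PREV]
last[NEXT] = root[PREV] = link       # splice it in just before root
link[PREV], link[NEXT] = last, root

print(ring_keys(root))  # [2, 1]: key 2 is now the eviction candidate
```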

Source Walkthrough

Dependencies

from collections import namedtuple
from _thread import RLock

WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__', '__doc__',
                       '__annotations__')
WRAPPER_UPDATES = ('__dict__',)

_initial_missing = object()

# A lightweight way to declare a record-like class
_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

The lru_cache function

  1. maxsize: this parameter is overloaded. When the decorator is applied directly as lru_cache(user_function) (the bare @lru_cache form), the function itself arrives as maxsize and the default of 128 is used; otherwise maxsize is an integer (or None) bounding how many results can be cached.
  2. typed: whether the argument types become part of the cache key.
def lru_cache(maxsize=128, typed=False):
    """
    :param maxsize: 这个参数有两个含义:user_function,直接以lru_cache(user_function)形式使用,maxsize默认为128。另一种含义就是maxsize=maxsize,用来指定最大能缓存几个函数结果。
    :param typed:
    :return:
    """
    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        # The user_function was passed in directly via the maxsize argument
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError(
            'Expected first argument to be an integer, a callable, or None')

    # Reached when maxsize is an int or None: build the real decorator
    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)

    return decorating_function
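The three accepted call forms map directly onto the branches above and can be checked via cache_info() (the bare form requires Python 3.8+):

```python
from functools import lru_cache

@lru_cache                  # callable passed as maxsize -> default of 128
def f(x):
    return x

@lru_cache(maxsize=32)      # explicit bound on the number of cached results
def g(x):
    return x

@lru_cache(maxsize=None)    # unbounded cache; LRU bookkeeping is skipped
def h(x):
    return x

print(f.cache_info().maxsize, g.cache_info().maxsize, h.cache_info().maxsize)
# 128 32 None
```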

The _lru_cache_wrapper function

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # All calls to one decorated function share a single cache
    sentinel = object()  # unique default returned by cache_get on a miss
    make_key = _make_key  # builds the cache key from the call arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3  # names for the link fields

    cache = {}  # the closure variable that stores the cached results
    hits = misses = 0  # hits: cache hits; misses: cache misses
    full = False  # whether the cache holds maxsize entries
    cache_get = cache.get  # bound method, saves an attribute lookup per call
    cache_len = cache.__len__  # bound method, replaces len(cache)
    lock = RLock()  # linked-list updates aren't thread-safe; this lock makes the wrapper thread-safe
    root = []  # root of the circular doubly linked list
    root[:] = [root, root, None, None]  # initialize the empty ring

    if maxsize == 0:
        def wrapper(*args, **kwds):
            # No caching at all -- just update the statistics
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result

    elif maxsize is None:
        def wrapper(*args, **kwds):
            # The cache grows without bound; no LRU ordering is kept
            nonlocal hits, misses
            # Build the cache key from the call arguments
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result

    else:
        def wrapper(*args, **kwds):
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)
                if link is not None:
                    # Cache hit: move this link to the most-recently-used position
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
                    # Another thread computed and cached this key while
                    # the lock was released; keep the existing link.
                    pass
                elif full:
                    # Cache is full: overwrite the oldest entry with the new one
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    root[KEY] = root[RESULT] = None
                    del cache[oldkey]
                    cache[key] = oldroot
                else:
                    # Add the new result to the cache
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    full = (cache_len() >= maxsize)
            return result

    def cache_info():
        """缓存内容"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """清除缓存"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper
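The bounded branch above can be exercised end-to-end through the public API. A quick sketch with maxsize=2:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(x):
    return x * x

square(1); square(2)   # two misses; the cache is now full
square(1)              # hit: key 1 moves to most-recent, key 2 is now oldest
square(3)              # miss: evicts key 2
square(1)              # still cached -> hit
square(2)              # was evicted -> miss (and key 3 is evicted in turn)
info = square.cache_info()
print(info)  # CacheInfo(hits=2, misses=4, maxsize=2, currsize=2)
```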

_make_key: generating the cache key

class _HashedSeq(list):
    """ 
     此类保证每个元素调用hash()的次数不超过一次。 
     这很重要,因为lru_cache()将在缓存未命中多次哈希键。
    """
	# 这个声明的作用是阻止在实例化类时为实例分配dict,节省存储空间
    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue


def _make_key(args, kwds, typed, kwd_mark=(object(),), fasttypes={int, str}, tuple=tuple, type=type, len=len):
	"""
    键值是由原函数参数和参数类型组成的,参数类型可以不适用
    :param typed: 
    """

    key = args
    if kwds:
        key += kwd_mark
        for item in kwds.items():
            key += item
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for v in kwds.values())
    elif len(key) == 1 and type(key[0]) in fasttypes:
        return key[0]
    return _HashedSeq(key)
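The hash-once guarantee can be observed with a throwaway type that counts its own __hash__ calls (_HashedSeq is repeated here so the sketch runs standalone):

```python
class CountingHash:
    """Hashable wrapper that counts how often it gets hashed."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        CountingHash.calls += 1
        return hash(self.v)

    def __eq__(self, other):
        return self.v == other.v


class _HashedSeq(list):
    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)  # the elements are hashed here, once

    def __hash__(self):
        return self.hashvalue


key = _HashedSeq((CountingHash(1), CountingHash(2)))
d = {key: 'cached'}
for _ in range(100):
    _ = d[key]  # every probe reuses the cached hashvalue

print(CountingHash.calls)  # 2: each element was hashed exactly once
```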

func.cache_clear(): clearing the cache

Resets the closure variables to their initial values.

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...
    def cache_clear():
        """Clear the cache and reset statistics"""
        # To rebind closure variables from inside the nested function,
        # they must first be declared nonlocal (analogous to global)
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

func.cache_info(): reading cache statistics

The structure of the return value is the _CacheInfo namedtuple declared in the Dependencies section above.

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...
    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())
# Example output:
# CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)

Original article: https://blog.csdn.net/wei_bo_cai/article/details/107860395
