Python functools.lru_cache: High-Speed Caching and How It Works
程序员文章站
2022-06-24 19:50:23
How it works
The implementation of `lru_cache` relies on Python closures plus the LRU algorithm. The cache is thread-safe; its lifetime begins at the first call of the decorated function after the process starts and lasts until the process exits.
- Python closures hold the cached function results.
- The LRU (least recently used) algorithm decides how cached results are refreshed and evicted.
- If `maxsize=None`, the LRU feature is disabled and the cache can grow without bound.
- If `typed` is set to `True`, arguments of different types are cached separately; for example, `f(3)` and `f(3.0)` are treated as distinct calls with distinct results.
- Because results are cached in a dict, the function's positional and keyword arguments must be hashable.
- Different argument patterns count as distinct calls with separate cache entries; for example, `f(a=1, b=2)` and `f(b=2, a=1)` differ only in keyword order, yet may produce two separate cache entries.
- `func.cache_info()` reports cache statistics.
- `func.cache_clear()` clears the cache.
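The bullet points above can be checked directly; a small sketch (the function `f` is just an illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None, typed=True)
def f(a, b=0):
    return a + b

f(3)          # miss: cached under the int argument
f(3.0)        # miss: typed=True gives float 3.0 its own entry
f(a=1, b=2)   # miss
f(b=2, a=1)   # miss: different keyword order -> different key
print(f.cache_info())  # CacheInfo(hits=0, misses=4, maxsize=None, currsize=4)
```

All four calls are misses: `typed=True` separates `3` from `3.0`, and the keyword order changes the generated key.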
The comments in the walkthrough below are fairly detailed; read them carefully if you want to dig deeper.
Testing the behavior
Functionality
The test function below pauses for 1 s only on its first run; every later run returns immediately.
import time
from functools import lru_cache

@lru_cache          # the bare form needs Python 3.8+; use @lru_cache() before that
def sleep(x, y):
    time.sleep(1)
    print('I woke up')
    return x * 10, y * 10

sleep(1, 2)   # first call: sleeps for 1 s
sleep(1, 2)   # second call: returns instantly from the cache
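The claim is easy to verify by timing both calls (a sketch; `slow_add` is a stand-in name):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_add(x, y):
    time.sleep(1)               # simulate an expensive computation
    return x * 10, y * 10

t0 = time.perf_counter()
slow_add(1, 2)                  # first call: executes the body, ~1 s
first = time.perf_counter() - t0

t0 = time.perf_counter()
slow_add(1, 2)                  # second call: served from the cache
second = time.perf_counter() - t0

print(f'first={first:.3f}s second={second:.6f}s')
print(slow_add.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```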
Testing the cache internals
Watch how the closure cells change before and after the call: only the first five cells in the table below change (their meanings are explained in the source walkthrough further down). This shows that the decorator really does store its cache in a closure.
print('| cell | contents (before first call) |')
print('|--|--|')
for cell in sleep.__closure__:           # dump the closure cells before the call
    print(f'| {cell} | {cell.cell_contents} |')

sleep(1, 2)

print('| cell | contents (after first call) |')
print('|--|--|')
for cell in sleep.__closure__:           # dump the same cells after the call
    print(f'| {cell} | {cell.cell_contents} |')

# The two dumps were merged by hand into the comparison table below
| Name | Before first call (cell) | Contents | After first call (cell) | Contents |
|---|---|---|---|---|
| cache | <cell at 0x000001570D814B80: dict object at 0x000001570D554440> | {} | <cell at 0x000001570D814B80: dict object at 0x000001570D554440> | {[1, 2]: [[[…], […], None, None], [[…], […], None, None], [1, 2], (10, 20)]} |
| full | <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> | False | <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> | False |
| misses | <cell at 0x000001570D814C40: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D814C40: int object at 0x00007FF823311680> | 1 |
| hits | <cell at 0x000001570D814D00: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D814D00: int object at 0x00007FF8233116A0> | 1 |
| root | <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> | [[…], […], None, None] | <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> | [[[…], […], [1, 2], (10, 20)], [[…], […], [1, 2], (10, 20)], None, None] |
| user_function | <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> | <function sleep at 0x000001570D5A8940> | <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> | <function sleep at 0x000001570D5A8940> |
| PREV | <cell at 0x000001570D644250: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D644250: int object at 0x00007FF823311680> | 0 |
| NEXT | <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> | 1 | <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> | 1 |
| KEY | <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> | 2 | <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> | 2 |
| RESULT | <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> | 3 | <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> | 3 |
| maxsize | <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> | 128 | <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> | 128 |
| typed | <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> | False | <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> | False |
| cache_len | <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> | <method-wrapper 'len' of dict object at 0x000001570D554440> | <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> | <method-wrapper 'len' of dict object at 0x000001570D554440> |
| cache_get | <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> | <built-in method get of dict object at 0x000001570D554440> | <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> | <built-in method get of dict object at 0x000001570D554440> |
| make_key | <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> | <function _make_key at 0x000001570D81C310> | <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> | <function _make_key at 0x000001570D81C310> |
| lock | <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> | <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> | <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> | <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> |
The LRU algorithm
The ordering is maintained with a circular doubly linked list. The original diagrams (drawn with Python Tutor, an interactive site that animates variable references) illustrated these steps:
- Initialization
- Adding the first cache entry
- Adding the second cache entry
- Re-running the first call
- Cache full: the oldest entry is replaced by the new one
Run the code below step by step and watch the variables change:
# Initialization
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3
cache = {}
root = []
root[:] = [root, root, None, None]   # empty ring: root points at itself

# Add the first cache entry
key = 1
result = 10
last = root[PREV]
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link

# Add the second cache entry
key1 = 2
result1 = 20
last1 = root[PREV]
link1 = [last1, root, key1, result1]
last1[NEXT] = root[PREV] = cache[key1] = link1

# Re-run the first call: move its link to the most-recent position
link_prev, link_next, _key, _result = link
link_prev[NEXT] = link_next
link_next[PREV] = link_prev
last = root[PREV]
last[NEXT] = root[PREV] = link
link[PREV] = last
link[NEXT] = root

# The cache is full: replace the oldest entry with the new one
key2 = 3
result2 = 30
oldroot = root
oldroot[KEY] = key2          # the old root cell becomes the new entry
oldroot[RESULT] = result2
root = oldroot[NEXT]         # the oldest link becomes the new root
oldkey = root[KEY]
oldresult = root[RESULT]
root[KEY] = root[RESULT] = None
del cache[oldkey]
cache[key2] = oldroot
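The same add / add / touch / evict sequence can also be expressed with `collections.OrderedDict`. This is not how `functools` implements it, but it makes the LRU policy easy to check (`SimpleLRU` is an illustrative name):

```python
from collections import OrderedDict

class SimpleLRU:
    """Minimal LRU cache mirroring the ring-list walkthrough above,
    built on OrderedDict for clarity (illustrative, not functools code)."""
    def __init__(self, maxsize=2):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)     # re-use -> most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.maxsize:
            self.data.popitem(last=False)  # evict the oldest entry
        self.data[key] = value

lru = SimpleLRU(maxsize=2)
lru.put(1, 10)         # add the first entry
lru.put(2, 20)         # add the second entry
lru.get(1)             # touch key 1 -> key 2 becomes the oldest
lru.put(3, 30)         # cache full: evicts key 2
print(list(lru.data))  # [1, 3]
```

Just like the linked-list version, touching key 1 saves it from eviction, and key 2 is the one replaced.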
Source walkthrough
Dependencies
from collections import namedtuple
from _thread import RLock

WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__', '__doc__',
                       '__annotations__')
WRAPPER_UPDATES = ('__dict__',)
_initial_missing = object()

# namedtuple is a lightweight way to declare a simple record class
_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
The lru_cache function
- maxsize: this parameter plays two roles. When the decorator is applied bare, as lru_cache(user_function), the callable arrives through maxsize and the actual maxsize defaults to 128. Otherwise maxsize is an integer giving the maximum number of results to cache.
- typed: whether argument types are folded into the cache key.
def lru_cache(maxsize=128, typed=False):
    """
    :param maxsize: either the maximum number of cached results (an int),
        or, in the bare @lru_cache form, the decorated function itself,
        in which case maxsize falls back to 128.
    :param typed: if True, arguments of different types are cached separately.
    :return: a decorator (or, in the bare form, the wrapped function).
    """
    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        # The user_function was passed in directly via the maxsize argument
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError(
            'Expected first argument to be an integer, a callable, or None')
    # maxsize is an int, or None (unbounded cache)
    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    return decorating_function
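This branching supports both decorator spellings; a quick check (`fib` and `square` are illustrative functions):

```python
from functools import lru_cache

# Form 1: called with arguments -> returns a decorator
@lru_cache(maxsize=32, typed=False)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Form 2 (Python 3.8+): applied bare -> the function arrives through
# the maxsize parameter, and maxsize falls back to 128
@lru_cache
def square(x):
    return x * x

print(fib(10))                      # 55
print(fib.cache_info().maxsize)     # 32
print(square.cache_info().maxsize)  # 128
```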
The _lru_cache_wrapper function
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # All invocations of one decorated function share a single cache
    sentinel = object()          # unique default returned on a cache miss
    make_key = _make_key         # builds the cache key from the call arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields
    cache = {}                   # the closure-held cache storage
    hits = misses = 0            # hits: cache hits, misses: cache misses
    full = False                 # whether the cache has reached maxsize entries
    cache_get = cache.get        # bound method, avoids a per-call attribute lookup
    cache_len = cache.__len__    # bound method, replaces len(cache)
    lock = RLock()               # linked-list updates are not thread-safe,
                                 # so this lock keeps the wrapper thread-safe
    root = []                    # root of the circular doubly linked list
    root[:] = [root, root, None, None]   # initialize the ring to point at itself

    if maxsize == 0:
        def wrapper(*args, **kwds):
            # No caching -- just a statistics update
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result
    elif maxsize is None:
        def wrapper(*args, **kwds):
            # Simple caching with no size limit and no ordering
            nonlocal hits, misses
            # Build the cache key from the call arguments
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result
    else:
        def wrapper(*args, **kwds):
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)
                if link is not None:
                    # Cache hit: move the link to the front of the ring
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
                    # Another call filled this key while the lock was
                    # released; keep the existing link unchanged
                    pass
                elif full:
                    # Cache is full: overwrite the oldest entry with the new one
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    root[KEY] = root[RESULT] = None
                    del cache[oldkey]
                    cache[key] = oldroot
                else:
                    # Add the new entry to the cache and the ring
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    full = (cache_len() >= maxsize)
            return result

    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """Clear the cache and reset the statistics"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper
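The eviction branch can be observed from the outside with a tiny maxsize (`double` is an illustrative function):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def double(x):
    return x * 2

double(1)    # miss -> cached
double(2)    # miss -> cache is now full
double(1)    # hit  -> key 1 becomes most recently used
double(3)    # miss -> evicts key 2 (least recently used)
double(1)    # hit  -> key 1 survived the eviction
double(2)    # miss -> key 2 was evicted, so it is recomputed
print(double.cache_info())  # CacheInfo(hits=2, misses=4, maxsize=2, currsize=2)
```

Note that `currsize` never exceeds `maxsize`: once `full` is set, each new key reuses the old root cell instead of growing the ring.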
_make_key: generating the cache key
class _HashedSeq(list):
    """
    Guarantees that hash() is called no more than once per element.
    This matters because lru_cache() hashes the key several times
    on a cache miss.
    """
    # __slots__ prevents instances from getting a __dict__, saving memory
    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue

def _make_key(args, kwds, typed,
              kwd_mark=(object(),),
              fasttypes={int, str},
              tuple=tuple, type=type, len=len):
    """
    The key is built from the function's arguments, optionally
    extended with the argument types when typed is true.
    """
    key = args
    if kwds:
        key += kwd_mark                 # sentinel separating args from kwargs
        for item in kwds.items():
            key += item
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for v in kwds.values())
    elif len(key) == 1 and type(key[0]) in fasttypes:
        # A single int or str argument: use the bare value as the key
        return key[0]
    return _HashedSeq(key)
func.cache_clear(): clearing the cache
Resets the closure variables to their initial values.
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...
    def cache_clear():
        """Clear the cache and reset the statistics"""
        # To rebind closure variables from inside the nested function,
        # they must first be declared nonlocal (analogous to global,
        # but for the enclosing function's scope)
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False
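cache_clear in action (`triple` is an illustrative function):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def triple(x):
    return x * 3

triple(1)
triple(1)
print(triple.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)

triple.cache_clear()        # resets hits, misses, full, and empties the cache
print(triple.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
```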
func.cache_info(): reading the cache statistics
The structure of the return value is the _CacheInfo namedtuple declared in the Dependencies section above.
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...
    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

# Example output:
# CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)
Original post: https://blog.csdn.net/wei_bo_cai/article/details/107860395