Python functools.lru_cache: High-Speed Caching and How It Works
程序员文章站
2022-06-24 19:50:23
How it works
The implementation of `lru_cache` relies on Python closures plus the LRU algorithm. The cache is thread-safe; its lifetime begins at the first call of the decorated function after the process starts and lasts until the process exits.
- Python closures hold the cached function results.
- The LRU (least recently used) algorithm decides how cached results are refreshed and evicted.
- If `maxsize=None`, the LRU feature is disabled and the cache can grow without bound.
- If `typed` is set to `True`, arguments of different types are cached separately; for example, `f(3)` and `f(3.0)` are treated as distinct calls with distinct results.
- Because results are cached in a dict, the function's positional and keyword arguments must be hashable.
- Different argument patterns count as distinct calls with separate cache entries; for example, `f(a=1, b=2)` and `f(b=2, a=1)` differ only in keyword order, yet may produce two separate cache entries.
- `func.cache_info()` reports cache statistics.
- `func.cache_clear()` clears the cache.
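The bullet points above can be checked directly; a small sketch (the function `f` is just an illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None, typed=True)
def f(a, b=0):
    return a + b

f(3)          # miss: cached under the int argument
f(3.0)        # miss: typed=True gives float 3.0 its own entry
f(a=1, b=2)   # miss
f(b=2, a=1)   # miss: different keyword order -> different key
print(f.cache_info())  # CacheInfo(hits=0, misses=4, maxsize=None, currsize=4)
```

All four calls are misses: `typed=True` separates `3` from `3.0`, and the keyword order changes the generated key.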
The comments in the walkthrough below are fairly detailed; read them carefully if you want to dig deeper.
Testing the behavior
Functionality
The test function below pauses for 1 s only on its first run; every later run returns immediately.
import time
from functools import lru_cache

@lru_cache          # the bare form needs Python 3.8+; use @lru_cache() before that
def sleep(x, y):
    time.sleep(1)
    print('I woke up')
    return x * 10, y * 10

sleep(1, 2)   # first call: sleeps for 1 s
sleep(1, 2)   # second call: returns instantly from the cache
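The claim is easy to verify by timing both calls (a sketch; `slow_add` is a stand-in name):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_add(x, y):
    time.sleep(1)               # simulate an expensive computation
    return x * 10, y * 10

t0 = time.perf_counter()
slow_add(1, 2)                  # first call: executes the body, ~1 s
first = time.perf_counter() - t0

t0 = time.perf_counter()
slow_add(1, 2)                  # second call: served from the cache
second = time.perf_counter() - t0

print(f'first={first:.3f}s second={second:.6f}s')
print(slow_add.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```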
Testing the cache internals
Watch how the closure cells change before and after the call: only the first five cells in the table below change (their meanings are explained in the source walkthrough further down). This shows that the decorator really does store its cache in a closure.
print('| cell | contents (before first call) |')
print('|--|--|')
for cell in sleep.__closure__:           # dump the closure cells before the call
    print(f'| {cell} | {cell.cell_contents} |')

sleep(1, 2)

print('| cell | contents (after first call) |')
print('|--|--|')
for cell in sleep.__closure__:           # dump the same cells after the call
    print(f'| {cell} | {cell.cell_contents} |')

# The two dumps were merged by hand into the comparison table below
| Name | Before first call (cell) | Contents | After first call (cell) | Contents |
|---|---|---|---|---|
| cache | <cell at 0x000001570D814B80: dict object at 0x000001570D554440> | {} | <cell at 0x000001570D814B80: dict object at 0x000001570D554440> | {[1, 2]: [[[…], […], None, None], [[…], […], None, None], [1, 2], (10, 20)]} |
| full | <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> | False | <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> | False |
| misses | <cell at 0x000001570D814C40: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D814C40: int object at 0x00007FF823311680> | 1 |
| hits | <cell at 0x000001570D814D00: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D814D00: int object at 0x00007FF8233116A0> | 1 |
| root | <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> | [[…], […], None, None] | <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> | [[[…], […], [1, 2], (10, 20)], [[…], […], [1, 2], (10, 20)], None, None] |
| user_function | <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> | <function sleep at 0x000001570D5A8940> | <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> | <function sleep at 0x000001570D5A8940> |
| PREV | <cell at 0x000001570D644250: int object at 0x00007FF823311680> | 0 | <cell at 0x000001570D644250: int object at 0x00007FF823311680> | 0 |
| NEXT | <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> | 1 | <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> | 1 |
| KEY | <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> | 2 | <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> | 2 |
| RESULT | <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> | 3 | <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> | 3 |
| maxsize | <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> | 128 | <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> | 128 |
| typed | <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> | False | <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> | False |
| cache_len | <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> | <method-wrapper 'len' of dict object at 0x000001570D554440> | <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> | <method-wrapper 'len' of dict object at 0x000001570D554440> |
| cache_get | <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> | <built-in method get of dict object at 0x000001570D554440> | <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> | <built-in method get of dict object at 0x000001570D554440> |
| make_key | <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> | <function _make_key at 0x000001570D81C310> | <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> | <function _make_key at 0x000001570D81C310> |
| lock | <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> | <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> | <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> | <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> |
The LRU algorithm
The ordering is maintained with a circular doubly linked list. The original diagrams (drawn with Python Tutor, an interactive site that animates variable references) illustrated these steps:
- Initialization
- Adding the first cache entry
- Adding the second cache entry
- Re-running the first call
- Cache full: the oldest entry is replaced by the new one
Run the code below step by step and watch the variables change:
# Initialization
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3
cache = {}
root = []
root[:] = [root, root, None, None]   # empty ring: root points at itself

# Add the first cache entry
key = 1
result = 10
last = root[PREV]
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link

# Add the second cache entry
key1 = 2
result1 = 20
last1 = root[PREV]
link1 = [last1, root, key1, result1]
last1[NEXT] = root[PREV] = cache[key1] = link1

# Re-run the first call: move its link to the most-recent position
link_prev, link_next, _key, _result = link
link_prev[NEXT] = link_next
link_next[PREV] = link_prev
last = root[PREV]
last[NEXT] = root[PREV] = link
link[PREV] = last
link[NEXT] = root

# The cache is full: replace the oldest entry with the new one
key2 = 3
result2 = 30
oldroot = root
oldroot[KEY] = key2          # the old root cell becomes the new entry
oldroot[RESULT] = result2
root = oldroot[NEXT]         # the oldest link becomes the new root
oldkey = root[KEY]
oldresult = root[RESULT]
root[KEY] = root[RESULT] = None
del cache[oldkey]
cache[key2] = oldroot
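The same add / add / touch / evict sequence can also be expressed with `collections.OrderedDict`. This is not how `functools` implements it, but it makes the LRU policy easy to check (`SimpleLRU` is an illustrative name):

```python
from collections import OrderedDict

class SimpleLRU:
    """Minimal LRU cache mirroring the ring-list walkthrough above,
    built on OrderedDict for clarity (illustrative, not functools code)."""
    def __init__(self, maxsize=2):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)     # re-use -> most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.maxsize:
            self.data.popitem(last=False)  # evict the oldest entry
        self.data[key] = value

lru = SimpleLRU(maxsize=2)
lru.put(1, 10)         # add the first entry
lru.put(2, 20)         # add the second entry
lru.get(1)             # touch key 1 -> key 2 becomes the oldest
lru.put(3, 30)         # cache full: evicts key 2
print(list(lru.data))  # [1, 3]
```

Just like the linked-list version, touching key 1 saves it from eviction, and key 2 is the one replaced.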
Source walkthrough
Dependencies
from collections import namedtuple
from _thread import RLock

WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__', '__doc__',
                       '__annotations__')
WRAPPER_UPDATES = ('__dict__',)
_initial_missing = object()

# namedtuple is a lightweight way to declare a simple record class
_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
The lru_cache function
- maxsize: this parameter plays two roles. When the decorator is applied bare, as lru_cache(user_function), the callable arrives through maxsize and the actual maxsize defaults to 128. Otherwise maxsize is an integer giving the maximum number of results to cache.
- typed: whether argument types are folded into the cache key.
def lru_cache(maxsize=128, typed=False):
    """
    :param maxsize: either the maximum number of cached results (an int),
        or, in the bare @lru_cache form, the decorated function itself,
        in which case maxsize falls back to 128.
    :param typed: if True, arguments of different types are cached separately.
    :return: a decorator (or, in the bare form, the wrapped function).
    """
    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        # The user_function was passed in directly via the maxsize argument
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError(
            'Expected first argument to be an integer, a callable, or None')
    # maxsize is an int, or None (unbounded cache)
    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    return decorating_function
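This branching supports both decorator spellings; a quick check (`fib` and `square` are illustrative functions):

```python
from functools import lru_cache

# Form 1: called with arguments -> returns a decorator
@lru_cache(maxsize=32, typed=False)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Form 2 (Python 3.8+): applied bare -> the function arrives through
# the maxsize parameter, and maxsize falls back to 128
@lru_cache
def square(x):
    return x * x

print(fib(10))                      # 55
print(fib.cache_info().maxsize)     # 32
print(square.cache_info().maxsize)  # 128
```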
The _lru_cache_wrapper function
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # All invocations of one decorated function share a single cache
    sentinel = object()          # unique default returned on a cache miss
    make_key = _make_key         # builds the cache key from the call arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields
    cache = {}                   # the closure-held cache storage
    hits = misses = 0            # hits: cache hits, misses: cache misses
    full = False                 # whether the cache has reached maxsize entries
    cache_get = cache.get        # bound method, avoids a per-call attribute lookup
    cache_len = cache.__len__    # bound method, replaces len(cache)
    lock = RLock()               # linked-list updates are not thread-safe,
                                 # so this lock keeps the wrapper thread-safe
    root = []                    # root of the circular doubly linked list
    root[:] = [root, root, None, None]   # initialize the ring to point at itself

    if maxsize == 0:
        def wrapper(*args, **kwds):
            # No caching -- just a statistics update
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result
    elif maxsize is None:
        def wrapper(*args, **kwds):
            # Simple caching with no size limit and no ordering
            nonlocal hits, misses
            # Build the cache key from the call arguments
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result
    else:
        def wrapper(*args, **kwds):
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)
                if link is not None:
                    # Cache hit: move the link to the front of the ring
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
                    # Another call filled this key while the lock was
                    # released; keep the existing link unchanged
                    pass
                elif full:
                    # Cache is full: overwrite the oldest entry with the new one
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    root[KEY] = root[RESULT] = None
                    del cache[oldkey]
                    cache[key] = oldroot
                else:
                    # Add the new entry to the cache and the ring
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    full = (cache_len() >= maxsize)
            return result

    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """Clear the cache and reset the statistics"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper
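The eviction branch can be observed from the outside with a tiny maxsize (`double` is an illustrative function):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def double(x):
    return x * 2

double(1)    # miss -> cached
double(2)    # miss -> cache is now full
double(1)    # hit  -> key 1 becomes most recently used
double(3)    # miss -> evicts key 2 (least recently used)
double(1)    # hit  -> key 1 survived the eviction
double(2)    # miss -> key 2 was evicted, so it is recomputed
print(double.cache_info())  # CacheInfo(hits=2, misses=4, maxsize=2, currsize=2)
```

Note that `currsize` never exceeds `maxsize`: once `full` is set, each new key reuses the old root cell instead of growing the ring.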
_make_key: generating the cache key
class _HashedSeq(list):
    """
    Guarantees that hash() is called no more than once per element.
    This matters because lru_cache() hashes the key several times
    on a cache miss.
    """
    # __slots__ prevents instances from getting a __dict__, saving memory
    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue

def _make_key(args, kwds, typed,
              kwd_mark=(object(),),
              fasttypes={int, str},
              tuple=tuple, type=type, len=len):
    """
    The key is built from the function's arguments, optionally
    extended with the argument types when typed is true.
    """
    key = args
    if kwds:
        key += kwd_mark                 # sentinel separating args from kwargs
        for item in kwds.items():
            key += item
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for v in kwds.values())
    elif len(key) == 1 and type(key[0]) in fasttypes:
        # A single int or str argument: use the bare value as the key
        return key[0]
    return _HashedSeq(key)
func.cache_clear(): clearing the cache
Resets the closure variables to their initial values.
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...
    def cache_clear():
        """Clear the cache and reset the statistics"""
        # To rebind closure variables from inside the nested function,
        # they must first be declared nonlocal (analogous to global,
        # but for the enclosing function's scope)
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False
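cache_clear in action (`triple` is an illustrative function):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def triple(x):
    return x * 3

triple(1)
triple(1)
print(triple.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)

triple.cache_clear()        # resets hits, misses, full, and empties the cache
print(triple.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
```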
func.cache_info(): reading the cache statistics
The structure of the return value is the _CacheInfo namedtuple declared in the Dependencies section above.
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    ...
    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

# Example output:
# CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)
Original post: https://blog.csdn.net/wei_bo_cai/article/details/107860395