[Original] (15) Linux Memory Management: RMAP
Background
- Read the fucking source code! --by Lu Xun
- A picture is worth a thousand words. --by Gorky
Notes:
- Kernel version: 4.14
- ARM64 processor, Cortex-A53, dual core
- Tools: Source Insight 3.5, Visio
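1. Overview
RMAP (reverse mapping) is a mechanism for going from a physical address back to the virtual addresses that map it.
- Mapping: page tables map virtual addresses to physical addresses. Each pte (page table entry) records one such mapping, and the mapcount field of struct page (_mapcount in the source) records how many ptes map that physical page.
- Reverse mapping: when a physical page is about to be reclaimed or migrated, the kernel has to find every virtual address mapped to it and tear those mappings down. Without a reverse mapping mechanism it would have to walk the page tables of every process, which is clearly very inefficient. Reverse mapping leads from the page to the vmas (virtual memory areas) that map it, so only the user page tables used by those vmas need to be touched, and the problem is solved quickly.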
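To make the cost of the "no reverse mapping" case concrete, here is a minimal user-space sketch. It is a toy model, not kernel code: the flat `pte` array, the sizes, and every `toy_*` name are assumptions made up for illustration. Unmapping one frame means inspecting every pte of every process.

```c
#include <assert.h>

#define NR_PROCS  3    /* hypothetical number of processes */
#define NR_VPAGES 8    /* hypothetical virtual pages per process */
#define NO_PFN    (-1) /* marks an unmapped virtual page */

/* Toy "page table": pte[p][v] is the frame number mapped at virtual
 * page v of process p, or NO_PFN. */
static int pte[NR_PROCS][NR_VPAGES];

/* Start with nothing mapped. */
static void toy_init(void)
{
    for (int p = 0; p < NR_PROCS; p++)
        for (int v = 0; v < NR_VPAGES; v++)
            pte[p][v] = NO_PFN;
}

/* Forward mapping: install a single pte. */
static void toy_map(int proc, int vpage, int pfn)
{
    assert(proc >= 0 && proc < NR_PROCS);
    assert(vpage >= 0 && vpage < NR_VPAGES);
    pte[proc][vpage] = pfn;
}

/* Unmap a frame WITHOUT reverse mapping: scan every pte of every
 * process. Returns the number of ptes cleared; *visited reports how
 * many ptes had to be inspected to find them. */
static int toy_unmap_by_scan(int pfn, int *visited)
{
    int cleared = 0;

    *visited = 0;
    for (int p = 0; p < NR_PROCS; p++) {
        for (int v = 0; v < NR_VPAGES; v++) {
            (*visited)++;
            if (pte[p][v] == pfn) {
                pte[p][v] = NO_PFN;
                cleared++;
            }
        }
    }
    return cleared;
}
```

In this model the scan always inspects NR_PROCS * NR_VPAGES entries, no matter how few of them map the frame. Reverse mapping exists to avoid that cost by starting from the page and visiting only the vmas that can map it.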
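Typical use cases for reverse mapping:
- when kswapd reclaims pages, every pte that maps the anonymous page has to be torn down;
- during page migration, every pte that maps the anonymous page has to be torn down;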
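2. Data Structures
Reverse mapping relies on three key structures:
- struct vm_area_struct, vma for short; the vma was covered in an earlier article and describes one region of a process address space. The fields relevant to reverse mapping are: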
struct vm_area_struct {
	...
	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages. A MAP_SHARED vma
	 * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain;	/* Serialized by mmap_sem &
						 * page_table_lock */
	struct anon_vma *anon_vma;		/* Serialized by page_table_lock */
	...
};
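- struct anon_vma, av for short; the av structure manages anonymous vmas. When an anonymous page needs to be unmapped, the kernel first finds its av and then searches through the av to reach the vmas. The structure is: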
/*
 * The anon_vma heads a list of private "related" vmas, to scan if
 * an anonymous page pointing to this anon_vma needs to be unmapped:
 * the vmas on the list will be related by forking, or by splitting.
 *
 * Since vmas come and go as they are split and merged (particularly
 * in mprotect), the mapping field of an anonymous page cannot point
 * directly to a vma: instead it points to an anon_vma, on whose list
 * the related vmas can be easily linked or unlinked.
 *
 * After unlinking the last vma on the list, we must garbage collect
 * the anon_vma object itself: we're guaranteed no page can be
 * pointing to this anon_vma once its vma list is empty.
 */
struct anon_vma {
	struct anon_vma *root;		/* Root of this anon_vma tree */
	struct rw_semaphore rwsem;	/* W: modification, R: walking the list */
	/*
	 * The refcount is taken on an anon_vma when there is no
	 * guarantee that the vma of page tables will exist for
	 * the duration of the operation. A caller that takes
	 * the reference is responsible for clearing up the
	 * anon_vma if they are the last user on release
	 */
	atomic_t refcount;

	/*
	 * Count of child anon_vmas and VMAs which points to this anon_vma.
	 *
	 * This counter is used for making decision about reusing anon_vma
	 * instead of forking new one. See comments in function anon_vma_clone.
	 */
	unsigned degree;

	struct anon_vma *parent;	/* Parent of this anon_vma */

	/*
	 * NOTE: the LSB of the rb_root.rb_node is set by
	 * mm_take_all_locks() _after_ taking the above lock. So the
	 * rb_root must only be read/written after taking the above lock
	 * to be sure to see a valid next pointer. The LSB bit itself
	 * is serialized by a system wide lock only visible to
	 * mm_take_all_locks() (mm_all_locks_mutex).
	 */

	/* Interval tree of private "related" vmas */
	struct rb_root_cached rb_root;
};
- struct anon_vma_chain, avc for short; the avc is the bridge that connects a vma to an av.
/*
 * The copy-on-write semantics of fork mean that an anon_vma
 * can become associated with multiple processes. Furthermore,
 * each child process will have its own anon_vma, where new
 * pages for that process are instantiated.
 *
 * This structure allows us to find the anon_vmas associated
 * with a VMA, or the VMAs associated with an anon_vma.
 * The "same_vma" list contains the anon_vma_chains linking
 * all the anon_vmas associated with this VMA.
 * The "rb" field indexes on an interval tree the anon_vma_chains
 * which link all the VMAs associated with this anon_vma.
 */
struct anon_vma_chain {
	struct vm_area_struct *vma;
	struct anon_vma *anon_vma;
	struct list_head same_vma;	/* locked by mmap_sem & page_table_lock */
	struct rb_node rb;		/* locked by anon_vma->rwsem */
	unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
	unsigned long cached_vma_start, cached_vma_last;
#endif
};
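A diagram makes the relationships clear:
- through its same_vma list node, an anon_vma_chain is added to the vma->anon_vma_chain list;
- through its rb red-black tree node, an anon_vma_chain is added to the red-black tree (used as an interval tree) rooted at anon_vma->rb_root;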
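The two linkages above can be sketched in user space. This is a simplified model under stated assumptions: plain singly linked lists stand in for both the kernel's list_head and its interval tree, and all `toy_*` names are hypothetical. The function `toy_chain_link` is only loosely modelled on what the kernel does when it attaches an avc to a vma and an av.

```c
#include <assert.h>
#include <stddef.h>

struct toy_avc;

struct toy_vma {
    struct toy_avc *chain;  /* stands in for vma->anon_vma_chain */
};

struct toy_av {
    struct toy_avc *tree;   /* stands in for anon_vma->rb_root */
};

/* One avc sits on two lists at once: its vma's and its av's. */
struct toy_avc {
    struct toy_vma *vma;
    struct toy_av *av;
    struct toy_avc *next_same_vma;  /* stands in for the same_vma node */
    struct toy_avc *next_same_av;   /* stands in for the rb node */
};

/* Attach avc to both the vma side and the av side. */
static void toy_chain_link(struct toy_vma *vma, struct toy_avc *avc,
                           struct toy_av *av)
{
    assert(vma != NULL && avc != NULL && av != NULL);
    avc->vma = vma;
    avc->av = av;
    avc->next_same_vma = vma->chain;
    vma->chain = avc;
    avc->next_same_av = av->tree;
    av->tree = avc;
}

/* av -> vmas direction: how many vmas hang off this av? */
static int toy_count_vmas(const struct toy_av *av)
{
    int n = 0;

    for (const struct toy_avc *c = av->tree; c; c = c->next_same_av)
        n++;
    return n;
}

/* vma -> avs direction: how many avs is this vma attached to? */
static int toy_count_avs(const struct toy_vma *vma)
{
    int n = 0;

    for (const struct toy_avc *c = vma->chain; c; c = c->next_same_vma)
        n++;
    return n;
}
```

Because each avc is a separate object that sits on both lists, one vma can belong to several avs and one av can reach several vmas. A direct pointer in either structure could not express that many-to-many relationship.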
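2. Flow Analysis
First, the big picture:
- a vma in the address space uses the page tables to map virtual addresses to physical addresses;
- each page frame corresponds to a struct page, whose mapping field points to the anon_vma, so the rmap mechanism can find the vmas associated with the page;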
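2.1 anon_vma_prepare
The earlier article on page faults mentioned the anon_vma_prepare function. Its job is to prepare a struct anon_vma for a vma in the process address space.
The call path and function flow are shown in the figure below:
The relationships among vma, av, and avc were already described in the diagram above.
Once the av associated with a vma has been created, one more key step is needed before the rmap path is truly open: the page has to be associated with the av. Only then can the kernel go from the page to the av, from the av to the vma, and finally carry out the corresponding pte unmap.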
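The page-to-av association can be illustrated with a small sketch. In the kernel, an anonymous page stores the anon_vma pointer in page->mapping with the lowest bit set (PAGE_MAPPING_ANON) so it can be told apart from a file mapping. The code below is a user-space approximation of that tagging idea only; the `toy_*` types and functions are hypothetical, and locking and the rest of struct page are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the idea of PAGE_MAPPING_ANON: bit 0 tags "anonymous". */
#define TOY_MAPPING_ANON ((uintptr_t)0x1)

struct toy_anon_vma {
    int id;  /* placeholder payload; also forces at least 2-byte alignment */
};

struct toy_page {
    uintptr_t mapping;    /* tagged pointer, stands in for page->mapping */
    unsigned long index;  /* page offset, stands in for page->index */
};

/* Associate a page with an av: store the pointer with bit 0 set. */
static void toy_set_anon_rmap(struct toy_page *page,
                              struct toy_anon_vma *av, unsigned long pgoff)
{
    /* Alignment guarantees bit 0 of a valid pointer is free for the tag. */
    assert(((uintptr_t)av & TOY_MAPPING_ANON) == 0);
    page->mapping = (uintptr_t)av | TOY_MAPPING_ANON;
    page->index = pgoff;
}

/* Is this page anonymous, i.e. is the tag bit set? */
static int toy_page_is_anon(const struct toy_page *page)
{
    return (page->mapping & TOY_MAPPING_ANON) != 0;
}

/* page -> av: strip the tag bit to recover the pointer. */
static struct toy_anon_vma *toy_page_anon_vma(const struct toy_page *page)
{
    if (!toy_page_is_anon(page))
        return NULL;
    return (struct toy_anon_vma *)(page->mapping & ~TOY_MAPPING_ANON);
}
```

Once `toy_set_anon_rmap` has run, the page alone is enough to recover the av. That is the first hop of the page, av, vma chain described above.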
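2.2 Creating the anon_vma in a child process
A parent process creates a child through fork(), and the child copies the parent's entire address space and page tables. The child copies the contents of the parent's vma structures, while the child's own anon_vma structures are created by the anon_vma_fork() function.
The effect of anon_vma_fork() is shown below:
Taking two actual fork() calls as an example, the figure below shows how the three processes are linked after COW has occurred: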
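The linkage after forking can be approximated with the sketch below. It is a simplified model and makes several assumptions: the two forks form a chain (parent, child, grandchild), linked lists replace the interval tree, and the kernel's av reuse logic (the degree counter), locking, and cleanup are omitted. All `toy_*` names are hypothetical. The idea it follows is that the child vma is attached to every av its parent vma is attached to, and then gets a new av of its own.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_av;
struct toy_vma;

struct toy_avc {
    struct toy_vma *vma;
    struct toy_av *av;
    struct toy_avc *next_same_vma;
    struct toy_avc *next_same_av;
};

struct toy_av {
    struct toy_av *root;    /* root of this av tree */
    struct toy_av *parent;  /* av of the parent process's vma */
    struct toy_avc *tree;   /* avcs of all vmas attached to this av */
};

struct toy_vma {
    struct toy_av *av;      /* av that this vma's new pages point to */
    struct toy_avc *chain;  /* avcs of all avs this vma is attached to */
};

/* Allocate an avc and hook it into both the vma and the av. */
static int toy_link(struct toy_vma *vma, struct toy_av *av)
{
    struct toy_avc *avc = calloc(1, sizeof(*avc));

    if (!avc)
        return -1;
    avc->vma = vma;
    avc->av = av;
    avc->next_same_vma = vma->chain;
    vma->chain = avc;
    avc->next_same_av = av->tree;
    av->tree = avc;
    return 0;
}

/* First anonymous fault: give the vma its own av, root of a new tree. */
static int toy_prepare(struct toy_vma *vma)
{
    struct toy_av *av = calloc(1, sizeof(*av));

    if (!av)
        return -1;
    av->root = av;
    if (toy_link(vma, av)) {
        free(av);
        return -1;
    }
    vma->av = av;
    return 0;
}

/* Fork: attach child to all of the parent's avs, then add its own av. */
static int toy_fork(struct toy_vma *child, struct toy_vma *parent)
{
    struct toy_av *av;

    assert(child != parent && parent->av != NULL);
    for (struct toy_avc *c = parent->chain; c; c = c->next_same_vma)
        if (toy_link(child, c->av))
            return -1;
    av = calloc(1, sizeof(*av));
    if (!av)
        return -1;
    av->root = parent->av->root;
    av->parent = parent->av;
    if (toy_link(child, av)) {
        free(av);
        return -1;
    }
    child->av = av;
    return 0;
}

/* How many vmas can be reached from this av? */
static int toy_vmas_on(const struct toy_av *av)
{
    int n = 0;

    for (const struct toy_avc *c = av->tree; c; c = c->next_same_av)
        n++;
    return n;
}
```

In this model, a page that was created before any fork points to the root av, so a walk from it reaches the vmas of all three processes. A page created in the grandchild after COW points to the grandchild's own av and reaches only that one vma. The sketch never frees its allocations, which is acceptable only in a short demonstration.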
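2.3 TTU (try to unmap) and rmap walk
If a page is mapped at multiple virtual addresses, the rmap walk mechanism can traverse all the vmas involved and ultimately invoke a callback to remove each mapping.
The related structure is struct rmap_walk_control: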
/*
 * rmap_walk_control: To control rmap traversing for specific needs
 *
 * arg: passed to rmap_one() and invalid_vma()
 * rmap_one: executed on each vma where page is mapped
 * done: for checking traversing termination condition
 * anon_lock: for getting anon_lock by optimized way rather than default
 * invalid_vma: for skipping uninterested vma
 */
struct rmap_walk_control {
	void *arg;
	/*
	 * Return false if page table scanning in rmap_walk should be stopped.
	 * Otherwise, return true.
	 */
	bool (*rmap_one)(struct page *page, struct vm_area_struct *vma,
					unsigned long addr, void *arg);
	int (*done)(struct page *page);
	struct anon_vma *(*anon_lock)(struct page *page);
	bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};
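The entry point for unmapping is try_to_unmap; the flow is shown below:
The basic pattern revolves around struct rmap_walk_control: its callbacks are filled in so that they can be invoked at the right moments.
The finer details of try_to_unmap_one are not explored further here; a grasp of the overall framework is enough.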
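The callback pattern can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: a linked list of vmas replaces the interval tree lookup, "unmapping" only decrements a counter, the anon_lock callback is left out, and every `toy_*` name is hypothetical. The order of the checks in the loop (invalid_vma, then rmap_one, then done) follows the structure's comments above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12  /* assumes 4 KiB pages */

struct toy_vma {
    unsigned long vm_start;  /* first virtual address of the region */
    unsigned long vm_pgoff;  /* page offset that vm_start corresponds to */
    bool skip;               /* hypothetical flag read by invalid_vma */
    struct toy_vma *next;    /* stands in for the av's interval tree */
};

struct toy_page {
    unsigned long index;     /* page offset of this page */
    int mapcount;            /* number of ptes still mapping the page */
    struct toy_vma *vmas;    /* stands in for page->mapping -> av -> vmas */
};

struct toy_walk_control {
    void *arg;
    bool (*rmap_one)(struct toy_page *page, struct toy_vma *vma,
                     unsigned long addr, void *arg);
    int (*done)(struct toy_page *page);
    bool (*invalid_vma)(struct toy_vma *vma, void *arg);
};

/* Visit each candidate vma and let the callbacks decide what happens. */
static void toy_rmap_walk(struct toy_page *page, struct toy_walk_control *rwc)
{
    for (struct toy_vma *vma = page->vmas; vma; vma = vma->next) {
        /* Virtual address at which this vma would map the page. */
        unsigned long addr = vma->vm_start +
            ((page->index - vma->vm_pgoff) << TOY_PAGE_SHIFT);

        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
            continue;
        if (!rwc->rmap_one(page, vma, addr, rwc->arg))
            break;
        if (rwc->done && rwc->done(page))
            break;
    }
}

/* rmap_one callback: record the address and drop one mapping. */
static bool toy_unmap_one(struct toy_page *page, struct toy_vma *vma,
                          unsigned long addr, void *arg)
{
    unsigned long *last_addr = arg;

    (void)vma;
    *last_addr = addr;
    page->mapcount--;
    return true;  /* keep walking */
}

/* done callback: stop once nothing maps the page any more. */
static int toy_page_not_mapped(struct toy_page *page)
{
    return page->mapcount == 0;
}

/* invalid_vma callback: skip vmas flagged as uninteresting. */
static bool toy_skip_vma(struct toy_vma *vma, void *arg)
{
    (void)arg;
    return vma->skip;
}

/* Entry point: fill in the control structure, then walk. */
static bool toy_try_to_unmap(struct toy_page *page, unsigned long *last_addr)
{
    struct toy_walk_control rwc = {
        .arg = last_addr,
        .rmap_one = toy_unmap_one,
        .done = toy_page_not_mapped,
        .invalid_vma = toy_skip_vma,
    };

    toy_rmap_walk(page, &rwc);
    return page->mapcount == 0;
}
```

The walker itself knows nothing about unmapping. A different operation on the same page would reuse `toy_rmap_walk` and supply a different set of callbacks.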