[Original] (15) Linux Memory Management: RMAP

程序员文章站 2022-06-18 21:53:43
Background

  • Read the fucking source code! --by Lu Xun
  • A picture is worth a thousand words. --by Gorky

Notes:

  1. Kernel version: 4.14
  2. ARM64 processor, Cortex-A53, dual-core
  3. Tools used: Source Insight 3.5, Visio

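1. Overview

RMAP (reverse mapping) is a mechanism for mapping a physical address back to the virtual addresses that map it.

  • Mapping
    Page tables map virtual addresses to physical addresses. Each PTE (page table entry) records one such mapping, and the _mapcount field in struct page tracks how many PTEs map that physical page.

  • Reverse mapping
    When a physical page is about to be reclaimed or migrated, the kernel has to find every virtual address mapped to it and tear those mappings down. Without a reverse-mapping mechanism it would have to walk the page tables of every process, which is clearly inefficient. Reverse mapping locates the virtual address areas (vmas) that map the page and removes the mapping only from the user page tables those vmas use, which solves the problem quickly.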
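
To make the cost difference concrete, here is a minimal user-space sketch, not kernel code. Every name in it (toy_pt, toy_rmap, toy_map and so on) is hypothetical, and the "page tables" are flat arrays chosen only for illustration. It contrasts unmapping a frame by scanning every page table entry with unmapping it through a per-frame reverse list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPROC    3    /* hypothetical number of processes */
#define TOY_NVPAGES  8    /* hypothetical virtual pages per process */
#define TOY_NFRAMES  4    /* hypothetical physical frames */
#define TOY_UNMAPPED (-1)

/* Forward mapping: toy_pt[pid][vpn] holds a frame number or TOY_UNMAPPED. */
static int toy_pt[TOY_NPROC][TOY_NVPAGES];

/* Reverse mapping: per frame, the (pid, vpn) pairs that currently map it. */
struct toy_rmap_entry { int pid; int vpn; };
static struct toy_rmap_entry toy_rmap[TOY_NFRAMES][TOY_NPROC * TOY_NVPAGES];
static int toy_mapcount[TOY_NFRAMES];

static void toy_init(void)
{
    for (int p = 0; p < TOY_NPROC; p++)
        for (int v = 0; v < TOY_NVPAGES; v++)
            toy_pt[p][v] = TOY_UNMAPPED;
    for (int f = 0; f < TOY_NFRAMES; f++)
        toy_mapcount[f] = 0;
}

/* Install a mapping and record it in the reverse map at the same time. */
static void toy_map(int pid, int vpn, int frame)
{
    assert(toy_pt[pid][vpn] == TOY_UNMAPPED);
    toy_pt[pid][vpn] = frame;
    toy_rmap[frame][toy_mapcount[frame]++] =
        (struct toy_rmap_entry){ pid, vpn };
}

/* Without rmap: scan every entry of every table. Returns entries visited. */
static int toy_unmap_by_scan(int frame)
{
    int visited = 0;
    for (int p = 0; p < TOY_NPROC; p++)
        for (int v = 0; v < TOY_NVPAGES; v++, visited++)
            if (toy_pt[p][v] == frame)
                toy_pt[p][v] = TOY_UNMAPPED;
    toy_mapcount[frame] = 0;
    return visited;
}

/* With rmap: visit only the recorded mappers. Returns entries visited. */
static int toy_unmap_by_rmap(int frame)
{
    int visited = toy_mapcount[frame];
    for (int i = 0; i < visited; i++) {
        struct toy_rmap_entry e = toy_rmap[frame][i];
        toy_pt[e.pid][e.vpn] = TOY_UNMAPPED;
    }
    toy_mapcount[frame] = 0;
    return visited;
}
```

In this toy model the scan always touches TOY_NPROC * TOY_NVPAGES entries, while the reverse lookup touches only as many entries as there are mappers. The real kernel does not keep a per-page list like this; it reaches the mappers through the anon_vma structures described in the next section.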

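Typical use cases for reverse mapping:

  1. When kswapd reclaims pages, every PTE that maps the anonymous page must be unmapped;
  2. During page migration, every PTE that maps the anonymous page must be unmapped.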

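2. Data Structures

Reverse mapping relies on three key structures:

  1. struct vm_area_struct, vma for short;
    The vma was introduced in an earlier article; it describes one region of a process's address space. The fields related to reverse mapping are: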
struct vm_area_struct {
...
    /*
     * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
     * list, after a COW of one of the file pages.  A MAP_SHARED vma
     * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
     * or brk vma (with NULL file) can only be in an anon_vma list.
     */
    struct list_head anon_vma_chain; /* Serialized by mmap_sem &
                      * page_table_lock */
    struct anon_vma *anon_vma;  /* Serialized by page_table_lock */
...
};
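  2. struct anon_vma, av for short;
    The av manages anonymous vmas. When an anonymous page needs to be unmapped, the kernel first finds the av and then searches through it. The structure is: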
/*
 * The anon_vma heads a list of private "related" vmas, to scan if
 * an anonymous page pointing to this anon_vma needs to be unmapped:
 * the vmas on the list will be related by forking, or by splitting.
 *
 * Since vmas come and go as they are split and merged (particularly
 * in mprotect), the mapping field of an anonymous page cannot point
 * directly to a vma: instead it points to an anon_vma, on whose list
 * the related vmas can be easily linked or unlinked.
 *
 * After unlinking the last vma on the list, we must garbage collect
 * the anon_vma object itself: we're guaranteed no page can be
 * pointing to this anon_vma once its vma list is empty.
 */
struct anon_vma {
    struct anon_vma *root;      /* Root of this anon_vma tree */
    struct rw_semaphore rwsem;  /* W: modification, R: walking the list */
    /*
     * The refcount is taken on an anon_vma when there is no
     * guarantee that the vma of page tables will exist for
     * the duration of the operation. A caller that takes
     * the reference is responsible for clearing up the
     * anon_vma if they are the last user on release
     */
    atomic_t refcount;

    /*
     * Count of child anon_vmas and VMAs which points to this anon_vma.
     *
     * This counter is used for making decision about reusing anon_vma
     * instead of forking new one. See comments in function anon_vma_clone.
     */
    unsigned degree;

    struct anon_vma *parent;    /* Parent of this anon_vma */

    /*
     * NOTE: the LSB of the rb_root.rb_node is set by
     * mm_take_all_locks() _after_ taking the above lock. So the
     * rb_root must only be read/written after taking the above lock
     * to be sure to see a valid next pointer. The LSB bit itself
     * is serialized by a system wide lock only visible to
     * mm_take_all_locks() (mm_all_locks_mutex).
     */

    /* Interval tree of private "related" vmas */
    struct rb_root_cached rb_root;
};
  3. struct anon_vma_chain, avc for short;
    The avc is the bridge that connects a vma and an av.
/*
 * The copy-on-write semantics of fork mean that an anon_vma
 * can become associated with multiple processes. Furthermore,
 * each child process will have its own anon_vma, where new
 * pages for that process are instantiated.
 *
 * This structure allows us to find the anon_vmas associated
 * with a VMA, or the VMAs associated with an anon_vma.
 * The "same_vma" list contains the anon_vma_chains linking
 * all the anon_vmas associated with this VMA.
 * The "rb" field indexes on an interval tree the anon_vma_chains
 * which link all the VMAs associated with this anon_vma.
 */
struct anon_vma_chain {
    struct vm_area_struct *vma;
    struct anon_vma *anon_vma;
    struct list_head same_vma;   /* locked by mmap_sem & page_table_lock */
    struct rb_node rb;          /* locked by anon_vma->rwsem */
    unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
    unsigned long cached_vma_start, cached_vma_last;
#endif
};

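A diagram makes the relationships clear:
[Figure: how vma, avc, and av are linked together]

  • Through its same_vma list node, the anon_vma_chain is added to the vma->anon_vma_chain list;
  • Through its rb node, the anon_vma_chain is added to the red-black tree (used as an interval tree) rooted at anon_vma->rb_root.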
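
The double linkage can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: the toy_* types and functions are hypothetical, and plain singly linked lists stand in for both the kernel's list_head and its interval tree. The linking step plays the role that the static helper anon_vma_chain_link() has in mm/rmap.c.

```c
#include <assert.h>
#include <stddef.h>

struct toy_av;
struct toy_vma;

/* Stand-in for anon_vma_chain: one node sits on two lists at once. */
struct toy_avc {
    struct toy_vma *vma;
    struct toy_av  *av;
    struct toy_avc *next_same_vma; /* models the same_vma list node */
    struct toy_avc *next_in_av;    /* models the rb node; a plain list here */
};

struct toy_vma {
    unsigned long start, end;
    struct toy_av  *av;            /* models vma->anon_vma */
    struct toy_avc *chain;         /* models vma->anon_vma_chain */
};

struct toy_av {
    struct toy_avc *vmas;          /* models anon_vma->rb_root */
};

/* Link one avc between a vma and an av. */
static void toy_avc_link(struct toy_vma *vma, struct toy_avc *avc,
                         struct toy_av *av)
{
    assert(vma && avc && av);
    avc->vma = vma;
    avc->av = av;
    /* vma side: push onto the vma's chain of avcs. */
    avc->next_same_vma = vma->chain;
    vma->chain = avc;
    /* av side: push onto the av's collection of avcs. */
    avc->next_in_av = av->vmas;
    av->vmas = avc;
}

/* Walk from an av to its vmas, which is the reverse direction. */
static int toy_av_count_vmas(const struct toy_av *av)
{
    int n = 0;
    for (const struct toy_avc *c = av->vmas; c; c = c->next_in_av)
        n++;
    return n;
}
```

Because each avc carries both a vma pointer and an av pointer, the same node answers two questions: which avs a vma is attached to (follow next_same_vma) and which vmas an av covers (follow next_in_av).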

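2. Flow Analysis

First, the big picture:
[Figure: high-level view of how pages, anon_vma, and vmas relate]

  • A vma in the address space is mapped from virtual addresses to physical addresses through the page tables;
  • Each page frame corresponds to a struct page, whose mapping field points to an anon_vma, so the RMAP mechanism can find the vmas associated with the page.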

2.1 anon_vma_prepare

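The earlier article on page faults mentioned anon_vma_prepare. Its job is to prepare a struct anon_vma for a vma in the process address space.

The call path and the function flow are shown below:

[Figure: call path and flow of anon_vma_prepare]

The relationships among vma, av, and avc are already described in the diagram above.

Once the av associated with the vma has been created, one more key step is needed before the RMAP path is truly complete: associating the page with the av. Only then can the kernel go from the page to the av, then to the vma, and finally carry out the corresponding PTE unmap.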
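
As a rough illustration of how a page can point at its av, the sketch below stores the av pointer in a mapping word with the lowest bit set, which is how the kernel distinguishes anonymous pages via PAGE_MAPPING_ANON. It is a user-space model: the toy_* names are hypothetical, and the real kernel also records the page index and takes the necessary locks, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define TOY_MAPPING_ANON ((uintptr_t)0x1)  /* mirrors PAGE_MAPPING_ANON */

struct toy_anon_vma { int id; };           /* hypothetical stand-in for av */
struct toy_page { uintptr_t mapping; };    /* models page->mapping */

/* Store the av pointer with the low bit set to mark the page anonymous. */
static void toy_page_set_anon(struct toy_page *page, struct toy_anon_vma *av)
{
    /* The tag bit is only free if the pointer is at least 2-byte aligned. */
    assert(((uintptr_t)av & TOY_MAPPING_ANON) == 0);
    page->mapping = (uintptr_t)av | TOY_MAPPING_ANON;
}

static int toy_page_is_anon(const struct toy_page *page)
{
    return (page->mapping & TOY_MAPPING_ANON) != 0;
}

/* Recover the av pointer, or NULL if the page is not anonymous. */
static struct toy_anon_vma *toy_page_anon_vma(const struct toy_page *page)
{
    if (!toy_page_is_anon(page))
        return NULL;
    return (struct toy_anon_vma *)(page->mapping & ~TOY_MAPPING_ANON);
}
```

Reusing the low bit works because the structure is aligned, so the bit would otherwise always be zero. The same word can therefore hold either a file mapping pointer or a tagged av pointer.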

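2.2 Creating the anon_vma for a child process

A parent process creates a child through fork(), and the child copies the parent's entire address space and page tables. The child copies the contents of the parent's vma structures, while the corresponding anon_vma structures for the child are created by anon_vma_fork().

The effect of anon_vma_fork() is illustrated below:

[Figure: result of anon_vma_fork()]

Taking two actual fork() calls as an example, the diagram below shows how the three processes are linked after COW has occurred:

[Figure: links among the three processes after two fork() calls and COW]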
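
The fork step can be approximated with the following user-space sketch. It rests on simplifying assumptions: the toy_* names are hypothetical, a singly linked list replaces the kernel's lists and interval tree, error paths leak memory for brevity, and the kernel's optimization of reusing an existing anon_vma (the degree counter) is left out. What it keeps is the basic shape: the child vma is first linked to every av the parent vma is linked to, then receives an av of its own.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_av  { struct toy_av *root, *parent; };
struct toy_vma;
struct toy_avc {
    struct toy_vma *vma;
    struct toy_av  *av;
    struct toy_avc *next_same_vma;
};
struct toy_vma {
    struct toy_av  *av;     /* av that receives this vma's new pages */
    struct toy_avc *chain;  /* every av this vma is linked to */
};

/* Allocate an avc tying vma to av; returns 0 on success, -1 on failure. */
static int toy_link(struct toy_vma *vma, struct toy_av *av)
{
    struct toy_avc *avc = malloc(sizeof(*avc));
    if (!avc)
        return -1;
    avc->vma = vma;
    avc->av = av;
    avc->next_same_vma = vma->chain;
    vma->chain = avc;
    return 0;
}

/* Sketch of the fork step: inherit the parent's avs, then add a private one. */
static int toy_anon_vma_fork(struct toy_vma *child, const struct toy_vma *parent)
{
    assert(child && parent);
    child->av = NULL;
    child->chain = NULL;
    /* 1. Link the child vma to every av the parent vma is linked to. */
    for (const struct toy_avc *c = parent->chain; c; c = c->next_same_vma)
        if (toy_link(child, c->av))
            return -1;
    /* 2. Give the child its own av for pages it instantiates after COW. */
    struct toy_av *av = malloc(sizeof(*av));
    if (!av)
        return -1;
    av->parent = parent->av;
    av->root = parent->av ? parent->av->root : av;
    child->av = av;
    return toy_link(child, av);
}

/* Number of avs a vma is linked to. */
static int toy_chain_len(const struct toy_vma *vma)
{
    int n = 0;
    for (const struct toy_avc *c = vma->chain; c; c = c->next_same_vma)
        n++;
    return n;
}
```

In this model the chain grows by one at each generation: a child forked from the original process is linked to two avs, and a grandchild to three. That is why a page still shared with an ancestor can be reached from the ancestor's av, while pages created after COW are reached only through the descendant's own av.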

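2.3 TTU (try to unmap) and rmap walk

If a page is mapped at multiple virtual addresses, the rmap walk mechanism can iterate over all the vmas involved and invoke a callback on each to remove the mapping.

The related structure is struct rmap_walk_control: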

/*
 * rmap_walk_control: To control rmap traversing for specific needs
 *
 * arg: passed to rmap_one() and invalid_vma()
 * rmap_one: executed on each vma where page is mapped
 * done: for checking traversing termination condition
 * anon_lock: for getting anon_lock by optimized way rather than default
 * invalid_vma: for skipping uninterested vma
 */
struct rmap_walk_control {
    void *arg;
    /*
     * Return false if page table scanning in rmap_walk should be stopped.
     * Otherwise, return true.
     */
    bool (*rmap_one)(struct page *page, struct vm_area_struct *vma,
                    unsigned long addr, void *arg);
    int (*done)(struct page *page);
    struct anon_vma *(*anon_lock)(struct page *page);
    bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};

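The entry point for unmapping is try_to_unmap; its flow is shown below:

[Figure: flow of try_to_unmap]

The basic pattern revolves around struct rmap_walk_control: the caller fills in the callbacks so that the walk can invoke them at the right moments.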
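
The callback pattern can be sketched in user space like this. It is a simplified analogue under stated assumptions: all toy_* names are hypothetical, the candidate vmas form a plain linked list instead of an interval tree, the anon_lock hook and the address argument are omitted, and the example callbacks are only loosely modelled on what try_to_unmap installs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page { int mapcount; };
struct toy_vma  { bool maps_page; bool locked; struct toy_vma *next; };

/* Trimmed-down analogue of struct rmap_walk_control. */
struct toy_rwc {
    void *arg;
    bool (*rmap_one)(struct toy_page *page, struct toy_vma *vma, void *arg);
    int  (*done)(struct toy_page *page);
    bool (*invalid_vma)(struct toy_vma *vma, void *arg);
};

/* Visit each candidate vma; stop early when rmap_one fails or done fires. */
static void toy_rmap_walk(struct toy_page *page, struct toy_vma *vmas,
                          const struct toy_rwc *rwc)
{
    assert(rwc && rwc->rmap_one);
    for (struct toy_vma *vma = vmas; vma; vma = vma->next) {
        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
            continue;                 /* caller is not interested in this vma */
        if (!rwc->rmap_one(page, vma, rwc->arg))
            break;                    /* callback asked to stop */
        if (rwc->done && rwc->done(page))
            break;                    /* nothing left to do */
    }
}

/* Example rmap_one: drop the mapping and count visits through arg. */
static bool toy_unmap_one(struct toy_page *page, struct toy_vma *vma, void *arg)
{
    int *visited = arg;
    (*visited)++;
    if (vma->maps_page) {
        vma->maps_page = false;
        page->mapcount--;
    }
    return true;                      /* keep walking */
}

/* Example done: finish once no mapping remains. */
static int toy_page_not_mapped(struct toy_page *page)
{
    return page->mapcount == 0;
}

/* Example invalid_vma (hypothetical policy): skip vmas flagged as locked. */
static bool toy_skip_locked(struct toy_vma *vma, void *arg)
{
    (void)arg;
    return vma->locked;
}
```

Keeping the traversal generic and the per-vma work in callbacks lets the same walk serve different callers: one fills in an unmap callback, another could fill in a callback that only inspects the mapping.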

The finer details of try_to_unmap_one are not explored further here; a solid grasp of the overall framework is enough.
