Virtual Memory
In order to manage memory more efficiently and with fewer errors, modern systems provide an abstraction of main memory known as virtual memory (VM). Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform, and private address space.
With one clean mechanism, virtual memory provides three important capabilities.
(1) It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory, and transferring data back and forth between disk and memory as needed.
(2) It simplifies memory management by providing each process with a uniform address space.
(3) It protects the address space of each process from corruption by other processes.
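A blunt word up front: I would still suggest reading the memory management chapter of MOS (Modern Operating Systems) first.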
9.1 Physical and Virtual Addressing
The task of converting a virtual address to a physical one is known as address translation.
9.2 Address Spaces
An address space is an ordered set of nonnegative integer addresses:
{0, 1, 2, ...}
If the integers in the address space are consecutive, then we say that it is a linear address space.
In a system with virtual memory, the CPU generates virtual addresses from an address space of N = 2^n addresses called the virtual address space:
{0, 1, 2, ..., N − 1}
The size of an address space is characterized by the number of bits that are needed to represent the largest address. For example, a virtual address space with N = 2^n addresses is called an n-bit address space.
Modern systems typically support either 32-bit or 64-bit virtual address spaces.
A system also has a physical address space that corresponds to the M bytes of physical memory in the system:
{0, 1, 2, ..., M − 1}
M is not required to be a power of two, but to simplify the discussion we will assume that M = 2^m.
9.3 VM as a Tool for Caching
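Conceptually, a virtual memory is organized as an array of N contiguous byte-sized cells stored on disk. Each byte has a unique virtual address that serves as an index into the array. The VM system partitions the virtual memory into fixed-size blocks called virtual pages, and physical memory into physical pages of the same size. At any point in time, each virtual page is in one of three states: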
Unallocated:
Pages that have not yet been allocated (or created) by the VM system. Unallocated blocks do not have any data associated with them, and thus do not occupy any space on disk.
Cached:
Allocated pages that are currently cached in physical memory.
Uncached:
Allocated pages that are not cached in physical memory.
9.3.1 DRAM Cache Organization
Because of the large miss penalty and the expense of accessing the first byte, virtual pages tend to be large, typically 4 KB to 2 MB.
Finally, because of the large access time of disk, DRAM caches always use write-back instead of write-through.
9.3.2 Page Tables
As with any cache, the VM system must have some way to determine if a virtual page is cached somewhere in DRAM. If so, the system must determine which physical page it is cached in. If there is a miss, the system must determine where the virtual page is stored on disk, select a victim page in physical memory, and copy the virtual page from disk to DRAM, replacing the victim page.
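These capabilities rely on a page table, a data structure stored in physical memory that maps virtual pages to physical pages.
Each page table entry (PTE) consists of a valid bit and an address field. If the valid bit is set, the address field gives the start of the physical page in DRAM where the virtual page is cached. If the valid bit is not set, then a null address indicates that the virtual page has not yet been allocated. Otherwise, the address points to the start of the virtual page on disk.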
9.3.3 Page Hits
Consider what happens when the CPU reads a word of virtual memory contained in VP 2, which is cached in DRAM (Figure 9.5).
Since the valid bit is set, the address translation hardware knows that VP 2 is cached in memory. So it uses the physical memory address in the PTE (which points to the start of the cached page in PP 1) to construct the physical address of the word.
9.3.4 Page Faults
In virtual memory parlance, a DRAM cache miss is known as a page fault. Figure 9.6 shows the state of our example page table before the fault. The CPU has referenced a word in VP 3, which is not cached in DRAM. The address translation hardware reads PTE 3 from memory, infers from the valid bit that VP 3 is not cached, and triggers a page fault exception.
The page fault exception invokes a page fault exception handler in the kernel, which selects a victim page, in this case VP 4 stored in PP 3. If VP 4 has been modified, then the kernel copies it back to disk. In either case, the kernel modifies the page table entry for VP 4 to reflect the fact that VP 4 is no longer cached in main memory.
Next, the kernel copies VP 3 from disk to PP 3 in memory, updates PTE 3, and then returns. When the handler returns, it restarts the faulting instruction, which resends the faulting virtual address to the address translation hardware.
But now, VP 3 is cached in main memory, and the page hit is handled normally by the address translation hardware. Figure 9.7 shows the state of our example page table after the page fault.
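The hit and fault paths above can be mimicked with a toy model. This is only a sketch under my own assumptions: the table sizes, the names (pte_t, vm_access, handle_fault) and the round-robin victim choice are all made up for illustration, and "copying to and from disk" is reduced to counters.

```c
#include <stdbool.h>

#define NUM_VP 8   /* number of virtual pages (assumed for illustration) */
#define NUM_PP 4   /* number of physical pages (assumed for illustration) */

typedef struct {
    bool valid;    /* set if the virtual page is cached in DRAM */
    bool dirty;    /* set if the cached copy has been modified */
    int  ppn;      /* physical page number, meaningful only when valid */
} pte_t;

static pte_t page_table[NUM_VP];
static int   owner[NUM_PP];    /* which VP occupies each PP, -1 if free */
static int   next_victim;      /* round-robin replacement pointer (assumed policy) */
static int   fault_count;
static int   writeback_count;

void vm_init(void)
{
    for (int i = 0; i < NUM_VP; i++)
        page_table[i] = (pte_t){ false, false, -1 };
    for (int i = 0; i < NUM_PP; i++)
        owner[i] = -1;
    next_victim = fault_count = writeback_count = 0;
}

/* Page fault handler: pick a victim, evict it, then page in vpn. */
static void handle_fault(int vpn)
{
    int ppn = next_victim;
    next_victim = (next_victim + 1) % NUM_PP;

    int old = owner[ppn];
    if (old >= 0) {
        if (page_table[old].dirty)
            writeback_count++;                 /* victim is copied back to disk */
        page_table[old] = (pte_t){ false, false, -1 };
    }
    owner[ppn] = vpn;                          /* new page is copied in from disk */
    page_table[vpn] = (pte_t){ true, false, ppn };
    fault_count++;
}

/* Access one virtual page; returns the physical page that holds it. */
int vm_access(int vpn, bool is_write)
{
    if (!page_table[vpn].valid)
        handle_fault(vpn);     /* miss: fault, then the access is "restarted" */
    if (is_write)
        page_table[vpn].dirty = true;
    return page_table[vpn].ppn;
}

int vm_faults(void)     { return fault_count; }
int vm_writebacks(void) { return writeback_count; }
```

Calling vm_init() and then touching more than NUM_PP distinct pages forces evictions; only victims that were written to bump the write-back counter, which matches the "if VP 4 has been modified" step in the text.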
In virtual memory parlance, blocks are known as pages. The activity of transferring a page between disk and memory is known as swapping or paging. Pages are swapped in (paged in) from disk to DRAM, and swapped out (paged out) from DRAM to disk. The strategy of waiting until the last moment to swap in a page, when a miss occurs, is known as demand paging.
9.4 VM as a Tool for Memory Management
In fact, operating systems provide a separate page table, and thus a separate virtual address space, for each process.
Figure 9.9 shows the basic idea. In the example, the page table for process i maps VP 1 to PP 2 and VP 2 to PP 7. Similarly, the page table for process j maps VP 1 to PP 7 and VP 2 to PP 10. Notice that multiple virtual pages can be mapped to the same shared physical page.
9.5 VM as a Tool for Memory Protection
A user process should not be allowed to modify its read-only text section. Nor should it be allowed to read or modify any of the code and data structures in the kernel. It should not be allowed to read or write the private memory of other processes, and it should not be allowed to modify any virtual pages that are shared with other processes, unless all parties explicitly allow it (via calls to explicit interprocess communication system calls).
As we have seen, providing separate virtual address spaces makes it easy to isolate the private memories of different processes. But the address translation mechanism can be extended in a natural way to provide even finer access control. Since the address translation hardware reads a PTE each time the CPU generates an address, it is straightforward to control access to the contents of a virtual page by adding some additional permission bits to the PTE. Figure 9.10 shows the general idea.
In this example, we have added three permission bits to each PTE. The SUP bit indicates whether processes must be running in kernel (supervisor) mode to access the page. The READ and WRITE bits control read and write access to the page.
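A permission check of this kind might look roughly like the sketch below. The bit positions and the names (PTE_SUP, PTE_READ, PTE_WRITE, access_allowed) are hypothetical, picked only for illustration; real hardware defines its own PTE layout, and I am assuming READ and WRITE bits alongside SUP.

```c
#include <stdbool.h>

/* Hypothetical PTE flag bits; the positions are chosen for illustration only. */
enum {
    PTE_VALID = 1u << 0,
    PTE_SUP   = 1u << 1,   /* page may only be accessed in kernel mode */
    PTE_READ  = 1u << 2,   /* reads allowed */
    PTE_WRITE = 1u << 3    /* writes allowed */
};

typedef enum { ACCESS_READ, ACCESS_WRITE } access_t;

/* Returns true if the access may proceed, false if it would be rejected. */
bool access_allowed(unsigned pte, access_t kind, bool kernel_mode)
{
    if (!(pte & PTE_VALID))
        return false;                  /* not cached: a page fault, not modeled here */
    if ((pte & PTE_SUP) && !kernel_mode)
        return false;                  /* user code touching a kernel-only page */
    if (kind == ACCESS_READ)
        return (pte & PTE_READ) != 0;
    return (pte & PTE_WRITE) != 0;
}
```

When a check like this fails on real hardware, the CPU raises a protection fault; on Linux the offending process typically sees it as a segmentation fault.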
9.6 Address Translation
A control register in the CPU, the page table base register (PTBR), points to the current page table. The n-bit virtual address has two components: a p-bit virtual page offset (VPO) and an (n − p)-bit virtual page number (VPN). The MMU uses the VPN to select the appropriate PTE. For example, VPN 0 selects PTE 0, VPN 1 selects PTE 1, and so on. The corresponding physical address is the concatenation of the physical page number (PPN) from the page table entry and the VPO from the virtual address. Notice that since the physical and virtual pages are both P bytes, the physical page offset (PPO) is identical to the VPO.
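The VPN/VPO split and the PPN concatenation are just shifts and masks. A minimal sketch, assuming p = 12 (4 KB pages); the helper names vpn_of, vpo_of and make_pa are my own:

```c
#include <stdint.h>

#define P_BITS 12u   /* p: log2 of the page size, assuming 4 KB pages */

/* Virtual page number: the high-order (n - p) bits of the address. */
static inline uint64_t vpn_of(uint64_t va)
{
    return va >> P_BITS;
}

/* Virtual page offset: the low-order p bits of the address. */
static inline uint64_t vpo_of(uint64_t va)
{
    return va & ((1ull << P_BITS) - 1);
}

/* Physical address: the PPN concatenated with the unchanged page offset. */
static inline uint64_t make_pa(uint64_t ppn, uint64_t va)
{
    return (ppn << P_BITS) | vpo_of(va);
}
```

For the virtual address 0x12345 this gives VPN 0x12 and VPO 0x345; if the PTE for VPN 0x12 holds PPN 0x7, the physical address is 0x7345.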
Figure 9.13(a) shows the steps that the CPU hardware performs when there is a page hit.
Step 1: The processor generates a virtual address and sends it to the MMU.
Step 2: The MMU generates the PTE address and requests it from the cache/main memory.
Step 3: The cache/main memory returns the PTE to the MMU.
Step 4: The MMU constructs the physical address and sends it to cache/main memory.
Step 5: The cache/main memory returns the requested data word to the processor.
Unlike a page hit, which is handled entirely by hardware, handling a page fault requires cooperation between hardware and the operating system kernel (Figure 9.13(b)).
Steps 1 to 3: The same as Steps 1 to 3 in Figure 9.13(a).
Step 4: The valid bit in the PTE is zero, so the MMU triggers an exception, which transfers control in the CPU to a page fault exception handler in the operating system kernel.
Step 5: The fault handler identifies a victim page in physical memory, and if that page has been modified, pages it out to disk.
Step 6: The fault handler pages in the new page and updates the PTE in memory.
Step 7: The fault handler returns to the original process, causing the faulting instruction to be restarted. The CPU resends the offending virtual address to the MMU. Because the virtual page is now cached in physical memory, there is a hit, and after the MMU performs the page hit steps in Figure 9.13(a), the main memory returns the requested word to the processor.
9.6.2 Speeding up Address Translation with a TLB
However, many systems try to eliminate even this cost by including a small cache of PTEs in the MMU called a translation lookaside buffer (TLB).
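That quote is a bit out of context. The point is to get rid of the long latency of VM address translation by adding a dedicated piece of hardware, the TLB, that speeds up page table lookups.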
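To make the "small cache of PTEs" idea concrete, here is a rough sketch of a TLB lookup. I am assuming a direct-mapped TLB with 16 entries to keep the code short (real TLBs are usually more associative), and the names tlb_lookup and tlb_fill are hypothetical. The low VPN bits pick the entry and the remaining high bits serve as the tag.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SETS 16u   /* number of entries; assumed, must be a power of two */

typedef struct {
    bool     valid;
    uint64_t tag;      /* high-order bits of the VPN */
    uint64_t ppn;      /* cached translation */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS];

/* Look up vpn. On a hit, store the PPN in *ppn and return true. */
bool tlb_lookup(uint64_t vpn, uint64_t *ppn)
{
    tlb_entry_t *e = &tlb[vpn % TLB_SETS];     /* index: low-order VPN bits */
    if (e->valid && e->tag == vpn / TLB_SETS) {
        *ppn = e->ppn;
        return true;
    }
    return false;                              /* miss: the page table must be consulted */
}

/* Install a translation after it has been fetched from the page table. */
void tlb_fill(uint64_t vpn, uint64_t ppn)
{
    tlb[vpn % TLB_SETS] = (tlb_entry_t){ true, vpn / TLB_SETS, ppn };
}
```

On a miss the MMU would fetch the PTE from the page table and call something like tlb_fill, possibly overwriting an older entry that maps to the same index.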
9.6.3 Multi-Level Page Tables
The common approach for compacting the page table is to use a hierarchy of page tables instead. The idea is easiest to understand with a concrete example. Consider a 32-bit virtual address space partitioned into 4 KB pages, with a two-level page table hierarchy.
Each PTE in the level 1 table is responsible for mapping a 4 MB chunk of the virtual address space, where each chunk consists of 1024 contiguous pages.
Each PTE in a level 2 page table is responsible for mapping a 4 KB page of virtual memory, just as before when we looked at single-level page tables.
This scheme reduces memory requirements in two ways.
First, if a PTE in the level 1 table is null, then the corresponding level 2 page table does not even have to exist. This represents a significant potential savings, since most of the 4 GB virtual address space for a typical program is unallocated.
Second, only the level 1 table needs to be in main memory at all times. The level 2 page tables can be created and paged in and out by the VM system as they are needed, which reduces pressure on main memory. Only the most heavily used level 2 page tables need to be cached in main memory.
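A two-level walk for the 32-bit, 4 KB page example can be sketched as follows. The 10/10/12 bit split follows from 1024 entries per table and 4 KB pages; the types and function names (page_dir_t, map_page, translate) are my own, and level 2 tables are heap-allocated on demand to show the "does not have to exist" saving.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES 1024u   /* entries per table: 10 index bits per level */

typedef struct { bool valid; uint32_t ppn; } pte2_t;

/* Level 1 table: one pointer per 4 MB chunk, NULL if the chunk is unallocated. */
typedef struct { pte2_t *l2[ENTRIES]; } page_dir_t;

/* Map the page containing va to ppn; returns 0 on success, -1 if out of memory. */
int map_page(page_dir_t *dir, uint32_t va, uint32_t ppn)
{
    uint32_t i1 = va >> 22;                   /* top 10 bits pick the level 1 PTE */
    uint32_t i2 = (va >> 12) & 0x3ff;         /* next 10 bits pick the level 2 PTE */
    if (dir->l2[i1] == NULL) {
        dir->l2[i1] = calloc(ENTRIES, sizeof(pte2_t));  /* created only when needed */
        if (dir->l2[i1] == NULL)
            return -1;
    }
    dir->l2[i1][i2] = (pte2_t){ true, ppn };
    return 0;
}

/* Translate va; on success store the physical address in *pa and return true. */
bool translate(const page_dir_t *dir, uint32_t va, uint32_t *pa)
{
    const pte2_t *table = dir->l2[va >> 22];
    if (table == NULL)
        return false;                         /* whole 4 MB chunk is unallocated */
    pte2_t e = table[(va >> 12) & 0x3ff];
    if (!e.valid)
        return false;
    *pa = (e.ppn << 12) | (va & 0xfff);       /* PPN concatenated with the offset */
    return true;
}

/* Count how many level 2 tables actually exist. */
size_t l2_tables_in_use(const page_dir_t *dir)
{
    size_t n = 0;
    for (size_t i = 0; i < ENTRIES; i++)
        if (dir->l2[i] != NULL)
            n++;
    return n;
}
```

Mapping a single page allocates exactly one level 2 table, so a sparse address space costs far less than the 1024 level 2 tables a fully populated hierarchy would need.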
For a concrete worked example of translating a virtual address into a physical address, see the book. It is well worth reading.
9.7 Case Study: The Intel Core i7/Linux Memory System
9.7.1 Core i7 Address Translation
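The Core i7 uses a four-level page table hierarchy. Each process has its own private page table hierarchy.
The CPU first generates a VA and looks it up in the TLB. On a TLB miss, the MMU walks the page table in memory to find the entry for that VA. Once the PA stored for that VA has been found, the hardware checks whether the PA is present in the cache. If it is, the data is read using the CT | CI | CO fields of the PA; if not, the request goes to the next lower level of the memory hierarchy, for example main memory.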
9.7.2 Linux Virtual Memory System
Linux maintains a separate virtual address space for each process of the form shown in Figure 9.26.
Linux Virtual Memory Areas
Linux organizes the virtual memory as a collection of areas (also called segments). An area is a contiguous chunk of existing (allocated) virtual memory whose pages are related in some way. For example, the code segment, data segment, heap, shared library segment, and user stack are all distinct areas.
Each area is described by an area struct (vm_area_struct) with the following fields:
vm_start: Points to the beginning of the area
vm_end: Points to the end of the area
vm_prot: Describes the read/write permissions for all of the pages contained in the area
vm_flags: Describes (among other things) whether the pages in the area are shared with other processes or private to this process
vm_next: Points to the next area struct in the list
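To show how such a list gets used, here is a simplified stand-in for the area struct together with a lookup that answers "does this address fall in some area?". This is not the real kernel definition: the struct name, the field types, and the choice to treat vm_end as exclusive are my assumptions.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's area struct (not the real definition). */
struct area {
    unsigned long vm_start;   /* first address of the area */
    unsigned long vm_end;     /* first address past the area (assumed exclusive) */
    int           vm_prot;    /* read/write permissions for the pages */
    int           vm_flags;   /* shared or private, among other things */
    struct area  *vm_next;    /* next area in the list */
};

/* Return the area containing addr, or NULL if addr lies in no area. */
struct area *find_area(struct area *head, unsigned long addr)
{
    for (struct area *a = head; a != NULL; a = a->vm_next)
        if (addr >= a->vm_start && addr < a->vm_end)
            return a;
    return NULL;
}
```

A NULL result corresponds to a reference to non-existent virtual memory, which is one of the things a page fault handler has to rule out before it pages anything in.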
9.8 Memory Mapping
9.8.4 User-level Memory Mapping with the mmap Function
Unix processes can use the mmap function to create new areas of virtual memory and to map objects into these areas.
/***************************************************************
code writer: EOF
code date : 2014.07.27
e-mail:aaa@qq.com
code purpose:
practice for function mmap
void *mmap(void *start,size_t length,int prot,int flags,int fd,off_t offset)
****************************************************************/
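#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>   /* struct stat and fstat */
#include <fcntl.h>

/* Map the file behind fd into memory and copy size bytes to stdout. */
void mmapcopy(int fd, size_t size)
{
    char *bufp;

    if (size == 0)          /* mmap rejects zero-length mappings */
        return;

    bufp = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (bufp == MAP_FAILED) {
        perror("mmap");
        return;
    }
    if (write(STDOUT_FILENO, bufp, size) < 0)
        perror("write");
    munmap(bufp, size);     /* release the mapping when done */
}

int main(int argc, char *argv[])
{
    struct stat st;
    int fd;

    if (argc != 2) {
        printf("usage: %s <filename>\n", argv[0]);
        return 0;
    }
    fd = open(argv[1], O_RDONLY, 0);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (fstat(fd, &st) < 0) {   /* the file size determines the mapping length */
        perror("fstat");
        close(fd);
        return 1;
    }
    mmapcopy(fd, (size_t)st.st_size);
    close(fd);
    return 0;
}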
9.9 Dynamic Memory Allocation
9.9.1 The malloc and free Functions
Figure 9.34 shows how an implementation of malloc and free might manage a (very) small heap of 16 words for a C program. Each box represents a 4-byte word. The heavy-lined rectangles correspond to allocated blocks (shaded) and free blocks (unshaded). Initially, the heap consists of a single 16-word double-word aligned free block.
Figure 9.34(a): The program asks for a four-word block. malloc responds by carving out a four-word block from the front of the free block and returning a pointer to the first word of the block.
Figure 9.34(b): The program requests a five-word block. malloc responds by allocating a six-word block from the front of the free block. In this example, malloc pads the block with an extra word in order to keep the free block aligned on a double-word boundary.
Figure 9.34(c): The program requests a six-word block and malloc responds by carving out a six-word block from the free block.
Figure 9.34(d): The program frees the six-word block that was allocated in Figure 9.34(b). Notice that after the call to free returns, the pointer p2 still points to the freed block. It is the responsibility of the application not to use p2 again until it is reinitialized by a new call to malloc.
Figure 9.34(e): The program requests a two-word block. In this case, malloc allocates a portion of the block that was freed in the previous step and returns a pointer to this new block.
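The padding in step (b), where a five-word request becomes a six-word block, is just rounding up to the alignment. A tiny sketch, assuming sizes are counted in words and ignoring any header overhead a real allocator would add; padded_words is a hypothetical helper:

```c
#include <stddef.h>

#define ALIGN_WORDS 2u   /* double-word alignment, as in the figure */

/* Round a request, counted in words, up to a multiple of two words. */
size_t padded_words(size_t words)
{
    return (words + ALIGN_WORDS - 1) / ALIGN_WORDS * ALIGN_WORDS;
}
```

With this rule the requests of four, five and six words from the figure occupy four, six and six words respectively.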
I wrote up the block sizes that malloc actually hands out in a separate blog post.
link:
http://blog.csdn.net/cinmyheart/article/details/38174421
9.9.10 Coalescing Free Blocks
When the allocator frees an allocated block, there might be other free blocks that are adjacent to the newly freed block. Such adjacent free blocks can cause a phenomenon known as false fragmentation, where there is a lot of available free memory chopped up into small, unusable free blocks. For example, Figure 9.38 shows the result of freeing the block that was allocated in Figure 9.37. The result is two adjacent free blocks with payloads of three words each.
As a result, a subsequent request for a payload of four words would fail, even though the aggregate size of the two free blocks is large enough to satisfy the request.
To combat false fragmentation, any practical allocator must merge adjacent free blocks in a process known as coalescing.
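The merging step can be sketched on a simplified model of the heap. Here the heap is an array of block descriptors in address order instead of headers stored inside the heap itself; that representation, and the names block_t and coalesce, are my own simplification and not how a real allocator lays things out.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t words;   /* block size in words */
    bool   free;    /* true if the block is free */
} block_t;

/*
 * Merge every run of adjacent free blocks into a single block, in place.
 * blocks[] must be in address order; returns the new number of blocks.
 */
size_t coalesce(block_t *blocks, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && blocks[out - 1].free && blocks[i].free)
            blocks[out - 1].words += blocks[i].words;   /* absorb into the previous free block */
        else
            blocks[out++] = blocks[i];                  /* keep the block as it is */
    }
    return out;
}
```

Two adjacent free blocks of three words each come out as one six-word free block, which is enough to satisfy the four-word request that failed before coalescing.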
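That's it for now.
A simple user-space malloc implementation, a garbage collector implementation, and the common bugs related to malloc will be posted separately in other blog entries.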