CSCI 432
Operating Systems
Home | Calendar | Assignments | CS@Williams
Project 2 - Virtual Memory Manager
This project will help you understand address spaces and virtual memory
management.
In this project, you will implement an external pager, which is a process that
handles virtual memory requests for application processes. The external pager
("pager" for short) is analogous to the virtual-memory portion of a normal
operating system. The pager will handle address space creation, read and write
faults, address space destruction, and simple argument passing between spaces.
Your pager will manage a fixed range of the address space (called the arena) of
each application that uses it. Your pager will be single threaded, handling
each request to completion before processing the next request. Valid pages in
the arena will be stored in (simulated) physical memory or in (simulated) disk.
Your pager will manage these two resources on behalf of all the applications
using the pager.
In addition to handling page faults, your pager will handle two system calls:
vm_extend and vm_syslog. An application uses vm_extend to ask the pager to
make another virtual page of its arena valid. An application uses vm_syslog to
ask the pager to print a message to its console.
This handout is organized as follows. Section 1 describes how the
infrastructure for this project performs the same tasks as the MMU and
exception mechanism on normal computer systems. Section 2 describes the MMU
used in this project. Section 3 describes the system calls that applications
can use to communicate explicitly with the pager. Section 4 is the main
section; it describes the functionality that you will implement in the
external pager. Section 5 describes how your pager will maintain the emulated
page tables and access physical memory and disk. Section 6 gives some hints
for doing the project, and Sections 8-11 describe the test suite and project
logistics/grading.
Download the starter files from here:
proj2.tar.gz.
1. Infrastructure and System structure
In a normal computer system, the CPU's MMU (memory management unit) and exception mechanism perform several important tasks. The MMU is invoked on every virtual memory access. These important tasks include:
On normal computer systems, system call instructions also invoke the exception
mechanism. When a system call instruction is executed, the exception mechanism
transfers control to the registered kernel handler for this exception.
We provide software infrastructure to emulate the MMU and exception
functionality of normal hardware. To use this infrastructure, each application
that uses the external pager must include "vm_app.h" and link with
"libvm_app.a", and your external pager must include "vm_pager.h" and link with
"libvm_pager.a". You do not need to understand the mechanisms used to emulate
this functionality (in case you're curious, the infrastructure uses mmap,
mprotect, SEGV handlers, named pipes, and remote procedure calls).
Linking with these libraries enables application processes to communicate with
the pager process in the same manner as applications on real hardware
communicate with real operating systems. Applications issue load and store
instructions (compiled from normal C++ variable accesses), and these are
translated or faulted by the infrastructure exactly as in the above
description of the MMU. The infrastructure transfers control on faults and
system calls to the pager, which receives control via function calls.
The following diagram shows how your pager will interact with applications that
use the pager. An application makes a request to the system via the function
calls vm_extend, vm_syslog and vm_yield, or by trying to load or store an
address that is non-resident or protected.
initialize VM system --> vm_init create process --> vm_create end process --> vm_destroy switches to new process --> vm_switch vm_yield --> may switch to new process vm_extend --> system call handler --> vm_extend vm_syslog --> system call handler --> vm_syslog faulting load --> exception handler --> vm_fault faulting store --> exception handler --> vm_fault +-----------+ +----------------+ +----------------+ |APPLICATION| | INFRASTRUCTURE | | EXTERNAL PAGER | +-----------+ +----------------+ +----------------+
Note that there are two versions of vm_extend and vm_syslog: one for applications and one for the pager. The application-side vm_extend/vm_syslog is implemented in libvm_app.a and is called by the application process. The pager-side vm_extend/vm_syslog is implemented by you in your pager. Think of the vm_extend/vm_syslog in libvm_app.a as a system call wrapper, and think of the vm_extend/vm_syslog in your pager as the code that is invoked by the system call. When the application calls its vm_extend/vm_syslog (the one in libvm_app.a), the infrastructure takes care of invoking the system call vm_extend/vm_syslog in your pager. See the header files vm_app.h and vm_pager.h for the actual function declarations.
2. Simulated MMU
The MMU being simulated in this project is a single-level, fixed-size page table. A virtual address is composed of a virtual page number and a page offset:
bit 63-13 bit 12-0 +----------------------+-------------------+ | virtual page number | page offset | +----------------------+-------------------+
The page table used by this MMU is an array of page table entries (PTEs), one PTE per virtual page in the arena. The MMU locates the page table through the page table base register (PTBR). The PTBR is a variable that is declared and defined by the infrastructure (but will be controlled by your pager). The following portion of vm_pager.h describes the arena, PTE, page table, and PTBR.
/* * *********************** * * Definition of arena * * *********************** */ /* pagesize for the machine */ #define VM_PAGESIZE 8192 /* virtual address at which application's arena starts */ #define VM_ARENA_BASEADDR ((void *) 0x60000000) /* virtual page number at which application's arena starts */ #define VM_ARENA_BASEPAGE ((unsigned int) VM_ARENA_BASEADDR / VM_PAGESIZE) /* size (in bytes) of arena */ #define VM_ARENA_SIZE 0x20000000 /* * ************************************** * * Definition of page table structure * * ************************************** */ /* * Format of page table entry. * * read_enable=0 ==> loads to this virtual page will fault * write_enable=0 ==> stores to this virtual page will fault * ppage refers to the physical page for this virtual page (unused if * both read_enable and write_enable are 0) */ typedef struct { unsigned long ppage : 51; /* bit 0-50 */ unsigned int read_enable : 1; /* bit 51 */ unsigned int write_enable : 1; /* bit 52 */ } page_table_entry_t; /* * Format of page table. Entries start at virtual page VM_ARENA_BASEPAGE, * i.e. ptes[0] is the page table entry for virtual page VM_ARENA_BASEPAGE. */ typedef struct { page_table_entry_t ptes[VM_ARENA_SIZE/VM_PAGESIZE]; } page_table_t; /* * MMU's page table base register. This variable is defined by the * infrastructure, but it is controlled completely by the student's pager code. */ extern page_table_t *page_table_base_register;
3. Interface used by applications of the external pager
Applications use three system calls to communicate explicitly with the simulated operating system: vm_extend, vm_syslog, and vm_yield. The prototypes for these system calls are given in the file "vm_app.h":
/* * vm_app.h * * Public routines for clients of the external pager */ #ifndef _VM_APP_H_ #define _VM_APP_H_ /* * vm_extend() -- ask for the lowest invalid virtual page in the process's * arena to be declared valid. Returns the lowest-numbered byte of the new * valid virtual page. E.g., if the valid part of the arena before calling * vm_extend is 0x60000000-0x60003fff, the return value will be 0x60004000, * and the resulting valid part of the arena will be 0x60000000-0x60005fff. * vm_extend will return NULL if the disk is out of swap space. */ extern void *vm_extend(void); /* * vm_syslog() -- ask external pager to log a message (message data must * be in address space controlled by external pager). Logs message of length * len. Returns 0 on success, -1 on failure, */ extern int vm_syslog(void *message, unsigned int len); /* * vm_yield() -- ask operating system to yield the CPU to another process. * The infrastructure's scheduler is non-preemptive, so a process runs until * it calls vm_yield() or exits. */ extern void vm_yield(void); #define VM_PAGESIZE 8192 #endif /* _VM_APP_H_ */
The arena of a process is the range of addresses from (VM_ARENA_BASEADDR) to
(VM_ARENA_BASEADDR + VM_ARENA_SIZE). The arena is initialized to have no valid
virtual pages. An application calls vm_extend to ask for the lowest invalid
page in its arena to be declared valid. vm_extend returns the lowest-numbered
byte of the newly allocated memory. E.g., if the arena before calling
vm_extend is 0x60000000-0x60003fff, the return value of the next vm_extend call
will be 0x60004000, and the resulting valid part of the arena will be
0x60000000-0x60005fff. Each byte of a newly extended virtual page is defined
to be initialized with the value 0. Applications can load or store to any
address on a valid arena page. Depending on the protections and residencies
set by the pager, some of these loads and stores will result in calls to the
pager's vm_fault routine; however these faults are serviced without the
application's knowledge. An application calls vm_syslog to ask the pager to
print a message (all message data should be in the valid part of the arena).
vm_syslog returns 0 on success and -1 on failure.
FYI, the vm_extend interface is similar to the sbrk call provided by Linux (and FreeBSD).
The interface you are used to using to manage dynamic memory (new/malloc and
delete/free) are user-level libraries built on top of sbrk.
The following is a sample application program that uses the external pager.
#include <iostream> #include "vm_app.h" using namespace std; int main() { char *p; p = (char *) vm_extend(); p[0] = 'h'; p[1] = 'e'; p[2] = 'l'; p[3] = 'l'; p[4] = 'o'; vm_syslog(p, 5); }
4. Pager Specification
This section describes the functions you will implement in your external pager:
vm_init, vm_create, vm_fault, vm_destroy, vm_extend, and vm_syslog. Note that
you will not implement main(); instead, main() is included in libvm_pager.a.
The infrastructure will invoke your pager functions as described below.
The following portion of vm_pager.h describes your pager functions:
/* * vm_init * * Called when the pager starts. It should set up any internal data structures * needed by the pager, e.g. physical page bookkeeping, process table, disk * usage table, etc. * * vm_init is passed both the number of physical memory pages and the number * of disk blocks in the raw disk. */ extern void vm_init(unsigned int memory_pages, unsigned int disk_blocks); /* * vm_create * * Called when a new process, with process identifier "pid", is added to the * system. It should create whatever new elements are required for each of * your data structures. The new process will only run when it's switched * to via vm_switch(). */ extern void vm_create(pid_t pid); /* * vm_switch * * Called when the kernel is switching to a new process, with process * identifier "pid". This allows the pager to do any bookkeeping needed to * register the new process. */ extern void vm_switch(pid_t pid); /* * vm_fault * * Called when current process has a fault at virtual address addr. write_flag * is true if the access that caused the fault is a write. * Should return 0 on success, -1 on failure. */ extern int vm_fault(void *addr, bool write_flag); /* * vm_destroy * * Called when current process exits. It should deallocate all resources * held by the current process (page table, physical pages, disk blocks, etc.) */ extern void vm_destroy(); /* * vm_extend * * A request by current process to declare as valid the lowest invalid virtual * page in the arena. It should return the lowest-numbered byte of the new * valid virtual page. E.g., if the valid part of the arena before calling * vm_extend is 0x60000000-0x60003fff, the return value will be 0x60004000, * and the resulting valid part of the arena will be 0x60000000-0x60005fff. * vm_extend should return NULL on error, e.g., if the disk is out of swap * space. */ extern void * vm_extend(); /* * vm_syslog * * A request by current process to log a message that is stored in the process' * arena at address "message" and is of length "len". * * Should return 0 on success, -1 on failure. */ extern int vm_syslog(void *message, unsigned int len);
4.1. vm_init
The infrastructure calls vm_init when the pager starts. Its parameters are the number of physical pages provided in physical memory and the number of disk blocks available on the disk. vm_init should set up whatever data structures you need to begin accepting vm_create and subsequent requests from processes.
4.2. vm_create
The infrastructure calls vm_create when a new application process starts. You should initialize whatever data structures you need to handle this process and its subsequent calls to the library. Its initial page table should be empty, since there are no valid virtual pages in its arena until vm_extend is called. Note that the new process will not be running until after it switched to via vm_switch.
4.3. vm_switch
The infrastructure calls vm_switch when the OS scheduler runs a new process. This allows your pager to do whatever bookkeeping it needs to register the fact that a new process is running.
4.4. vm_extend
vm_extend is called when a process wants to make another virtual page in its
arena valid. vm_extend should return the lowest-numbered byte of the new valid
virtual page. E.g., if the arena before calling vm_extend is
0x60000000-0x60003fff, the return value of vm_extend() will be 0x60004000, and
the resulting valid part of the arena will be 0x60000000-0x60005fff.
vm_extend should ensure that there are enough available disk blocks to hold
all valid virtual pages (this is called "eager" swap allocation). If there
are no free disk blocks, vm_extend should return NULL. The benefit of eager
swap allocation is that applications know at the time of vm_extend that there
is no more swap space, rather than when a page needs to be evicted to disk.
Remember that an application should see each byte of a newly extended virtual
page as initialized with the value 0. However, the actual data initialization
needed to provide this abstraction should be deferred as long as possible.
4.5. vm_fault
The vm_fault routine is called in response to a read or write fault by the
application. Your pager determines which accesses in the arena will generate
faults by setting the read_enable and write_enable fields in the page table.
Your pager determines which physical page is associated with a virtual page by
setting the ppage field in the page table. A physical page should be
associated with at most one virtual page in one process at any given time (no
sharing).
vm_fault should return 0 after successfully handling a fault. vm_fault should
return -1 if the address is to an invalid page.
4.5.1. Non-resident pages
If a fault occurs on a virtual page that is not resident, you must find a
physical page to associate with the virtual page. If there are no free
physical pages, you must create a free physical page by evicting a virtual
page that is currently resident.
Use the "FIFO with second-chance" (clock) algorithm to select a victim. The
clock queue is an ordered list of all valid, resident virtual pages in the
system. When a virtual page is assigned to a physical page, that physical
page should be placed on the tail of the clock queue (and marked as
referenced). To select a victim, remove and examine the physical page at the
head of the queue. If it has been accessed in any way since it was last
placed on the queue, it should be added to the tail of the queue (and its
protection updated appropriately), and victim selection should proceed to the
next page in the queue.
If the page at the head has not been accessed since it was last enqueued, then
its virtual page should be evicted. Dirty and clean pages are treated the same
when selecting a victim page to evict.
Hint: many pages, even those that are not zero-filled, do not need to be paged
out.
Hint: The order of pages in the clock queue may differ from the order of
their physical page numbers.
4.5.2. Resident pages
Your pager controls the page protections for resident pages. Its goal in controlling protections is to maintain any state it needs to defer work and implement the clock replacement algorithm (e.g. dirty and reference bits). An access to a resident page will generate a page fault if the page's protection does not allow the access. On these faults, vm_fault should update state as needed, change the protections on the virtual page, and continue.
4.6. vm_syslog
vm_syslog is called with a pointer to an array of bytes in the current
process's virtual address space and the length of that array. Your pager
should first check that the entire message is in valid pages of the arena.
Return -1 (and don't print anything) if any part of the message is not on a
valid arena page, or if length is zero.
After checking for invalid addresses, your pager should next copy the entire
array into a C++ string in the pager's address space, then print the C++ string
to cout. You should use the following snippet of code for your print statement
(this assumes the C++ string variable is named "s"):
cout << "syslog \t\t\t" << s << endl;
Most of the work in vm_syslog will be copying the array into the pager's C++ string. vm_syslog must handle virtual to physical address translation while copying the message from one address space to another. (Hint: you should treat vm_syslog's accesses to the application's data exactly as if they came from the application program for the purposes of protection, residence, and reference bits. Why?). vm_syslog should copy the application's data starting at the lowest virtual address and proceeding toward the highest virtual address.
4.7. vm_destroy
vm_destroy is called by the infrastructure when the corresponding application exits. This routine must deallocate all resources held by that process. This includes page tables, physical pages, and disk blocks. Physical pages that are released should be put back on the free list.
4.8. Deferring and avoiding work
There are many points in this project where you have some freedom over when
zero-fills, faults, and disk I/O happen. You must defer such work as far into
the future as possible.
Similarly, there are points in this project where careful state maintenance can
help you avoid doing work. Whenever possible, avoid work. For example, if a
page that is being evicted does not need to be written to disk, don't do so.
(However, the victim selection algorithm in Section 5.5.1 must be used as
specified; e.g. don't change the victim selection to avoid writing a page to
disk).
Note that you will need to maintain reference and dirty bits to defer work and
to implement the clock algorithm. Since the MMU for this project does not
maintain dirty or reference bits, your pager will maintain these bits by
generating page faults on appropriate accesses.
If you could possibly defer or avoid some action at the possible expense of
making another necessary, keep in mind that incurring a fault (about 5
microseconds on current hardware) is cheaper than zero-filling a page (30
microseconds), which is in turn cheaper than a disk I/O (10 milliseconds).
For instance, if you have a choice between taking an extra page fault and
causing an extra disk I/O, you should prefer to take the extra fault.
5. Interface used by external pager to access the simulated hardware
This section describes how your external pager will access the simulated
hardware, i.e. physical memory, disk, and MMU.
The following portion of vm_pager.h describes the variables and utility
functions for accessing this hardware.
/* * ********************************************* * * Public interface for the disk abstraction * * ********************************************* * * Disk blocks are numbered from 0 to (disk_blocks-1), where disk_blocks * is the parameter passed in vm_init(). */ /* * disk_read * * read block "block" from the disk into physical memory page "ppage". */ extern void disk_read(unsigned int block, unsigned int ppage); /* * disk_write * * write the contents of physical memory page "ppage" onto disk block "block". */ extern void disk_write(unsigned int block, unsigned int ppage); /* * ******************************************************** * * Public interface for the physical memory abstraction * * ******************************************************** * * Physical memory pages are numbered from 0 to (memory_pages-1), where * memory_pages is the parameter passed in vm_init(). * * Your pager accesses the data in physical memory through the variable * pm_physmem, e.g. ((char *)pm_physmem)[5] is byte 5 in physical memory. */ extern void * pm_physmem;
Physical memory is structured as a contiguous collection of N pages, numbered
from 0 to N-1. It is settable through the -m option when you run the external
pager (e.g. by running "pager -m 4"). The minimum number of physical pages is
2, the maximum is 128, and the default is 4. Your pager can access the data
in physical memory via the array pm_physmem.
The disk is modeled as a single device that is disk_blocks "blocks" long, where
each disk block is the same size as a physical memory page. Your pager will
use two functions: disk_read and disk_write. Arguments to each function are a
disk block number and a physical page number. The disk_write function is used
to write data from a physical page out to disk, and the disk_read function is
used to read data from disk into a physical page.
Your pager controls the operation of the MMU by modifying the contents of
the page table and the variable page_table_base_register.
6. Hints
The first thing you should do is to write down a finite state machine for the
life of a virtual page, from creation via vm_extend to destruction via
vm_destroy. Ask yourself what events can happen to a page at each stage of its
lifetime, and what state you will need to keep to represent each state. As you
design the state machine, try to identify all of the places in the state
machine where work can be deferred or avoided. A large portion of the credit
in this project hinges on having this state machine correct.
Use assertion statements copiously in your process library to check for
unexpected conditions generated by bugs in your program. These error checks
are essential in debugging complex programs, because they help flag error
conditions early.
Read-faults should typically make the virtual page read-only (read_enable=1,
write_enable=0), but NOT always.
Virtual pages will never be write-only (read_enable=0, write_enable=1).
7. Test cases
An integral (and graded) part of writing your pager will be to write a suite of
test cases to validate any pager. This is common practice in the real
world--software companies maintain a suite of test cases for their programs and
use this suite to check the program's correctness after a change. Writing a
comprehensive suite of test cases will deepen your understanding of virtual
memory, and it will help you a lot as you debug your pager. To construct a
good test suite, trace through different transition paths that a page can take
through a pager's state machine, then write a short test case that causes a
page to take each path.
Each test case for the pager will be a short C++ application program that uses
a pager via the interface described in Section 3 (e.g. the example program in
Section 3). Each test case should be run without any arguments and should not
use any input files.
Your test suite may contain up to 20 test cases. Each test case may cause a
correct pager to generate at most 256 KB of output and must take less than 60
seconds to run. These limits are much larger than needed for full credit.
You will submit your suite of test cases together with your pager, and we will
grade your test suite according to how thoroughly it exercises a pager. See
Section 9 for how your test suite will be graded.
Each test case will specify the number of physical memory pages to use when
running the pager (the -m option) for the test case. This parameter will be
communicated via the name of the test case file. Each test case should be of
the following format:
anyName.memoryPages.cc
where memoryPages identifies the number of physical memory pages to use with
the pager, and the parts are separated by periods. Remember that the minimum
number of physical memory pages is 2 and the maximum is 128.
You should test your pager with both single and multiple applications running.
However, your submitted test suite need only be a single process; none of the
buggy pagers used to evaluate your test suite require multi-process
applications to be exposed. If you do want to try your hand at multi-process
test cases, I suggest you run a parent process that then forks child processes
to do the actual pager requests. The parent process can communicate with
child processes by creating pipes before forking. We'll assume the entire set
of application processes is done when the initial process exits, so it should
exit after all the other application processes. Writing a correct
multi-process test case is challenging (which is why we don't require
multi-process test cases in the submitted test suite), but you'll learn some
interesting things.
8. Project logistics
Write your pager in C++ on Linux. The public functions in vm_pager.h are
declared "extern", but all other functions and global variables in your pager
should be declared "static" to prevent naming conflicts with other libraries.
Use g++ (/usr/bin/g++) to compile your programs. You may use any functions
included in the standard C++ library, including (and especially) the STL. You
should not use any libraries other than the standard C++ library. To compile a
pager "pager.cc", use the command "g++ -std=c++11 pager.cc -o pager libvm_pager.a -lcrypto" (of course, you
can add things like -g for debugging and -o to name the executable).
You compile an application just like any other C++ program, except that you
must link with libvm_app.a. E.g. "g++ -std=c++11 app.cc -o app libvm_app.a"
Your pager must be in a single file and must be named "pager.cc".
Here's how to run your pager and an application. First start the pager. The
infrastructure will print a message saying "Pager started with # physical
memory pages", where "#" refers to the number of physical memory pages. After
the pager starts, you can run one or more application processes which will
interact with the pager via the infrastructure. The same user must run the
pager and the applications that use the pager, and all processes must run on
the same computer. I strongly suggest starting with the simple sample application discussed in class.
We will place copies of vm_app.h, vm_pager.h, libvm_app.a, and libvm_pager.a in
proj2.tar.gz. You should make copies of these
files and store them in your own directory so you can do "#include vm_app.h"
in your applications and "#include vm_pager.h" in your pager (without needing
the -I flag to g++).
9. Grading, auto-grading, and formatting
To help you validate your programs, your submissions will be graded
automatically, and the result will be mailed back to you. You may then
continue to work on the project and re-submit.
For this project, you can use up to 7 bonus submissions to the autograder.
The student suite of test cases will be graded according to how thoroughly they
test a pager. We will judge thoroughness of the test suite by how well it
exposes potential bugs in a pager. The auto-grader will first run a test case
with a correct pager and generate the correct output FROM THE PAGER (on stdout,
i.e. the stream used by cout) for this test case. The auto-grader will then
run the test case with a set of buggy pagers. A test case exposes a buggy
pager by causing the buggy pager to generate output (on stdout) that differs
from the correct output. The test suite is graded based on how many of the
buggy pagers were exposed by at least one test case. This is known as
"mutation testing" in the research literature on automated testing.
Because your programs will be auto-graded, you must be careful to follow the
exact rules in the project description:
In addition to the auto-grader's evaluation of your programs' correctness, I will evaluate your programs on issues such as the clarity and completeness of your documentation, coding style, the efficiency, brevity, and understandability of your code, etc. Your pager documentation should give an overall picture of your solution, with enough detail that I can easily read and understand your code. You should present a list of all of the places in your solution that you deferred work; give both the event that you deferred, and the time at which you had to do the work.
10. Turning in the project
Use the submit432 program to submit your files. The full name is /usr/cs-local/432/bin/submit432, but you should update your path so you don't have to keep typing in the full name. submit432 submits the set of files associated with a project part, and is called as follows:
submit432 <project-part> <file1> <file2> <file3> ... <filen>
Here are the files you should submit for each project part:
The official time of submission for your project will be the time of your last submission. If you send in anything after the due date, your project will be considered late.
11. Project Writeup
For your project writeup, I would like you to write a short (2-3 pages) paper that summarizes your project. In particular, you should include i) an introductory section that highlights the purpose of the project, ii) an architectural overview/design section that describes the structure of your code (if a figure makes your discussion more clear, use one!), iii) an evaluation section that discusses your test cases and how you verified the correct behavior of your pager, and iv) a conclusion that draws conclusions and reflects on the assignment in general. The purpose of the writeup is to help you gain experience with technical writing. Submit the writeup on GLOW.