The project will be done in two phases, with each phase having some well defined steps. Phase I is to implement sequential consistent SVM. For Phase II, you have a choice of extending Phase I to either a multi-threaded version of Phase I (using the non-preemptive thread library you implemented in Homework 2) or to a single-writer Lazy Release Consistency SVM. Phase II descriptions will be posted shortly.
Keep checking often for updates in the description and details (these will be announced on the mailing list also).
If you want to do your own project, please contact the professor to discuss your idea as soon as possible (before Apr 3rd).
This project requires you to implement protocols for a Shared Virtual Memory
(SVM) system across a network of SUN workstations, using UNIX sockets for UDP-based inter-node communication. The project will exercise your knowledge
on threads, synchronization, virtual memory and socket-based communication.
Shared memory is a simple programming model for parallel computing because communication is implicitly hidden behind memory accesses. If processes X
and Y want to communicate it is enough for X to write at a shared location A which is subsequently read by process Y. Obviously X and Y must synchronize
in order to achieve meaningful communication: first X must write, then Y should read.
Providing support for shared memory programming model is trivial on
uniprocessors. In this case, we do not have parallelism but concurrency. On a multiprocessor, if shared memory is supported in hardware, it again comes
for free from the software support point of view.
You have already come across an example of shared memory programming model,
namely, programming with threads. Threads share the same address space and this is why synchronization is necessary.
When we move to a network of computers like the local area networks in the department, the only communication support available is message passing using
UNIX sockets. In this case communication becomes explicit in the program and some programmers consider this difficult compared to the shared memory
programming model. This is where shared virtual memory comes in. It provides a global shared memory space for processes running at different nodes.
Your task is to provide a shared memory programming model on top of
message-based communication. The shared memory you provide will be an illusion. It doesn't physically exist, but this must be transparent to the
programmer. The programming interface must be identical to the one used to program threads, and the only difference the programmer can notice will be
in terms of performance.
This idea of emulating shared memory in software using message passing communication support was introduced in 1986 by Kai Li. The key support you have to use is the virtual memory mechanism. The illusion of shared memory is created essentially by controlling memory accesses by the use of page protection.
Sequential consistency: Here is a definition by Leslie Lamport:
"...the results of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program."
Sequential consistency requires that a shared memory multiprocessor appear to be a multiprogrammed uniprocessor system to any program running on it. This requires that:
The goal of this phase is to extend the above definitions and requirements from a Shared Memory Multiprocessor to a SVM based cluster.
We shall first implement synchronization primitives - locks and barriers - in a distributed environment.
Locks
A lock is represented as an unsigned variable which is assigned an unique id. Locks can be acquired and released to create mutual exclusion. Each lock
has a home. A simple way is to assign (lock%nprocs) node as the home of the lock. Here nprocs is the number of nodes. The home keeps track of the last
owner of the lock. The lock can be passed from one node to another. At the owner the lock can be in the acquired (BUSY) mode or FREE mode. If it is
free, this means that the owner has released the lock but no one else has asked for it yet. The owner may try to acquire the lock many times, in which
case the home need not be contacted.
When a node wants to acquire a lock it has to send a message to the home of the lock. The home keeps the information about the last owner and forwards
the request to the last owner. As soon as the request is sent, the home updates its last owner field. Note that the last owner might not have got
the lock yet, but will eventually get it.
When a node which is not the home, receives a lock request one of the
following three things are possible
Notice that this scheme implements a distributed queue in which each node waiting for the lock keeps a pointer to the next one. Also, be careful that all three cases may generate race conditions with the release operation, thus requiring a mutex.
Barriers
A barrier is a global synchronization point. When a node arrives at a
barrier, it waits for all the other nodes to arrive at the barrier before
proceeding. A very simple implementation of barriers is the following: When a node arrives at the barrier, it sends (nprocs-1) messages, one to each of
the other nodes. It then waits for (nprocs-1) messages (one from each of the other processors), before proceeding.
There is a slight complication however. A node may arrive at a barrier, send and receive (nprocs-1) messages and move on. It may reach another barrier
and send another (nprocs-1) messages. As a result, a node may receive two barrier messages (but not more than two) from the same remote node. To make
sure that a node receives (nprocs-1) messages from different nodes, you have to count barriers and include the barrier counter in the barrier message. In
fact, since no more than two messages can arrive from the same node, you only have to distinguish between even and odd barriers (using a flip-flop
variable).
Page Management
We are going to have a home-based scheme in which each page of the global memory address space has a home. The homes could be
assigned in a simple manner such as having node number (pagenum%nprocs) as the home of page number
pagenum. The home maintains various information about the page, such as whether the page is read-shared among many processes, or writeable by only
one process, who currently holds read/write permission for the page,
and queues for the requests for the page.
At each node, the entire global virtual address space is replicated. The
granularity of memory allocation (using G_MALLOC, which is a function you
will implement) is a page. A page is either read/writeable (RDWR - and no one else can read/write their copy of the page),
or readable (RDONLY - other nodes could possibly have read permission for their copy
of the page) or neither readable nor writeable (PROTECTED). Initially the home of a page
is the only node with write permission for the page. If the page is readable or
writeable, then the content of the page has to be up-to-date(valid). Otherwise, the content of the page may be
invalid, and read/write operations are not allowed on invalid pages.
Access to pages is controlled by issuing the appropriate mprotect call
whenever the state (RDWR/RDONLY/PROTECTED) changes. Now an invalid access will be caught and the necessary communication can be carried out in
the fault handler (if necessary) before changing the permission for the page.
For example, if a page is read-only and there is an attempt to write to the
page, this should generate a segmentation violation. In the fault handler, the home of the page is determined, and a request is sent to the home to
get write permission for the page. We assume that the home does what is necessary (invalidating the page on all other nodes which have readable
copies) and then sends a message back to the first node that it can now make the page
writeable. When the node gets this message, it can issue an mprotect changing the permission for the page to read/write and then continue.
At the home of a page, there should be a thread waiting for requests, which it satisfies. The home of a page keeps a queue of requests for that page.
It satisfies the request based on the sequential consistency state diagram.
The requests for pages may be of two types, namely read requests and write requests.
Note that there are other possibilities, such as if the home is the owner
itself, in which case, some of the communication is unnecessary.
There are a lot of race conditions here, which you have to watch out for.
Technicalities
Here are hints for the general framework you have to implement:
An instance of the application using the shared memory should be launched simultaneously
(using rsh) on each node of the cluster. Messages are sent as UDP packets. There could be
a dedicated thread which waits for asynchronous requests and replies and services them. These could include requests for locks, read or
write requests for pages at the home, requests to give up or invalidate a page, etc.
You could have three sockets per node. One for asynchronous messages as above, one for barrier messages, and one for synchronous requests/replies performed
by the application like locks, and page requests.
Look at the man pages for mutex, socket, mprotect, memcpy for information on these functions.
You will soon be provided with some macro definitions for the functions that you have to implement (like G_MALLOC), also sample applications.