Use the SUN Cereal machines in Hill 248 to run all the programs. Note that the descriptions here are minimal - you are expected to provide more details. :) Since i wrote up most of these in a day, they may not be the best solutions, but they are illustrative. Please point out bugs/typos (from cut&paste) etc. if any.
(A) Cache measurements
(1) Write a C program to measure the size of L1 and L2 caches - using capacity misses.
To determine the cache size, the approach here is to try to access blocks of data of increasing size until the size becomes greater than the L1/L2
cache size. We notice a jump in the access times when we cross these boundaries(L1 and L2 cache sizes).
Accesses must be done in terms of cache lines. The cache line size also can be determined by accessing strides of various lengths. When one
crosses the size of the cache line, a difference in the access time can be noticed. I'm assuming a cache line size of 32 bytes for the L1
cache and 64 bytes for L2.
The following program determines the L1 & L2 cache sizes using capacity
misses. Run the following program multiple times, each time with a different guess for the number
of cache lines. Figure out the cache sizes by noticing a jump in the access times. For example, if you
increase the guess by a factor of two each time, you'll see a jump(by more than a factor of two) when you cross the cache sizes.
The program assumes that the cache is split (i.e. there are separate data and
instruction caches). Click here to see the output and results.
#include <sys/time.h>
#define PAGE_SIZE 8192
#define TOTAL_MEM_SIZE (4096 * 1024)
#define L1_CACHE_LINE_SIZE 32
main(int argc, char *argv[])
{
float total_time;
register int *a, *b, i, num_cache_lines_guess, num_iter, k, l;
struct timeval t1, t2;
if(argc < 2) {
printf("Usage: %s guess_num_cache_lines\n", argv[0]);
exit(1);
}
num_cache_lines_guess = atoi(argv[1]);
b = a = (void *) sbrk(TOTAL_MEM_SIZE);
/* Touch all pages */
for(k = 0; k < TOTAL_MEM_SIZE/PAGE_SIZE; k++, a += 2048)
*a = 1;
num_iter = TOTAL_MEM_SIZE / (num_cache_lines_guess * L1_CACHE_LINE_SIZE);
gettimeofday(&t1,0);
for(a = b,k = 0; k < num_iter; k++, a = b) {
/* Each iteration in the loop accesses one cache line */
for(l = 0; l < num_cache_lines_guess; l++,a += 8)
i = *a;
}
gettimeofday(&t2,0);
total_time = ((t2.tv_sec-t1.tv_sec)*1000000 + (t2.tv_usec-t1.tv_usec));
printf("Access time for block of size %d : %f microseconds\n",
num_cache_lines_guess * L1_CACHE_LINE_SIZE, total_time/num_iter);
}
(2) Write a C program to measure the size of L1 cache - using conflict misses. What are the assumptions under which your program finds the correct size?
Here, the approach is to generate accesses so that the same instruction generates accesses to addresses that map to the same location in cache. Addresses that are "cache size" apart will have this property. Two important assumptions for this program to work are that the cache is direct mapped and virtually addressed. The output and result follows the program.
#include <unistd.h>
#define NUM_PAGES 1024
#define MAX_GUESS 128*1024
main() {
volatile char *a;
register int pagesize, i, incr;
hrtime_t start, end;
long long int calc;
pagesize = sysconf(_SC_PAGESIZE);
printf("PAGESIZE = %d\n", pagesize);
a = (char *) memalign(pagesize, pagesize*NUM_PAGES);
for (i=0; i < NUM_PAGES; i++) a[pagesize*i] = 0;
for(incr=1; incr < MAX_GUESS; incr*=2) {
start = gethrtime();
for(i=pagesize*NUM_PAGES/4; i<pagesize*(NUM_PAGES/4 + 1); i++)
a[i] += a[i+incr];
end = gethrtime();
calc = (end - start) / (pagesize);
printf("Avg time (%d accesses, incr = %d) = %lld nsec\n",
pagesize, incr, calc);
usleep(1000);
}
}
Results for trix.rutgers.edu
L1 cache size = 16K
Actual data
Avg time (8192 accesses, incr = 1) = 89 nsec
Avg time (8192 accesses, incr = 2) = 86 nsec
Avg time (8192 accesses, incr = 4) = 86 nsec
Avg time (8192 accesses, incr = 8) = 86 nsec
Avg time (8192 accesses, incr = 16) = 86 nsec
Avg time (8192 accesses, incr = 32) = 86 nsec
Avg time (8192 accesses, incr = 64) = 86 nsec
Avg time (8192 accesses, incr = 128) = 86 nsec
Avg time (8192 accesses, incr = 256) = 86 nsec
Avg time (8192 accesses, incr = 512) = 86 nsec
Avg time (8192 accesses, incr = 1024) = 86 nsec
Avg time (8192 accesses, incr = 2048) = 87 nsec
Avg time (8192 accesses, incr = 4096) = 87 nsec
Avg time (8192 accesses, incr = 8192) = 88 nsec
==> L1 size = 16384 bytes
Avg time (8192 accesses, incr = 16384) = 151 nsec
Avg time (8192 accesses, incr = 32768) = 151 nsec
Avg time (8192 accesses, incr = 65536) = 151 nsec
(3)
Determine
the L1 and L2 cache miss penalties (by modifying one of the above programs).
The approach here is to try to access the caches in terms of the respective cache lines, and make sure that each access is a miss. (These
programs actually measure the miss latency. Miss penalty will be
latency - hittime. Measuring hittime is trivial.)
#include <sys/time.h>
#define PAGE_SIZE 8192
#define L1_CACHE_LINE_SIZE 32
#define L1_CACHE_SIZE 16384
main(int argc, char *argv[]) {
float total_time;
int num_accesses;
register int *a, *b, i, num_iter, k, l;
struct timeval t1, t2;
num_iter = 1024;
b = a = (void *) sbrk(2 * L1_CACHE_SIZE);
/* Touch all pages */
for(k = 0; k < 2 * L1_CACHE_SIZE/PAGE_SIZE; k++, a += 2048)
i = *a;
gettimeofday(&t1,0);
for(a = b,k = 0; k < num_iter; k++, a = b) {
/* Alternate accesses to blocks of data,
each of size = L1_CACHE_SIZE. These blocks
mutually knock each other out of the L1 cache,
resulting in L1 cache misses for every access */
/* Each iteration in the loop accesses one cache line */
for(l = 0; l < L1_CACHE_SIZE/L1_CACHE_LINE_SIZE; l++,a += 8)
i = *a;
for(l = 0; l < L1_CACHE_SIZE/L1_CACHE_LINE_SIZE; l++,a += 8)
i = *a;
}
gettimeofday(&t2,0);
total_time = ((t2.tv_sec-t1.tv_sec)*1000000 + (t2.tv_usec-t1.tv_usec));
num_accesses = num_iter * (L1_CACHE_SIZE/L1_CACHE_LINE_SIZE) * 2.0;
printf("L1 cache miss latency: %f microseconds\n", total_time/num_accesses);
}
L1 cache miss latency = ~50 nanoseconds
Actual data (from trix)
> repeat 10 ./miss1 L1 cache miss latency: 0.050288 microseconds L1 cache miss latency: 0.050339 microseconds L1 cache miss latency: 0.050270 microseconds L1 cache miss latency: 0.049543 microseconds L1 cache miss latency: 0.050138 microseconds L1 cache miss latency: 0.049687 microseconds L1 cache miss latency: 0.050120 microseconds L1 cache miss latency: 0.049520 microseconds L1 cache miss latency: 0.050299 microseconds L1 cache miss latency: 0.061378 microseconds
The above program can be modified to measure L2 miss latency by changing the L1_CACHE_SIZE and L1_CACHE_LINE_SIZE to the L2 cache and line sizes respectively (512K and 64 in this case).
L2 cache miss latency = ~291 nanoseconds
> repeat 10 ./miss2 L2 cache miss latency: 0.291649 microseconds L2 cache miss latency: 0.291595 microseconds L2 cache miss latency: 0.291650 microseconds L2 cache miss latency: 0.291899 microseconds L2 cache miss latency: 0.291164 microseconds L2 cache miss latency: 0.295575 microseconds L2 cache miss latency: 0.292015 microseconds L2 cache miss latency: 0.293448 microseconds L2 cache miss latency: 0.291755 microseconds L2 cache miss latency: 0.291537 microseconds
(B) Implementing a shell
Write a simple Unix shell program that
handles I/O redirections (|, <, >, >>)
allows background processing (&)
handle 'cd' (change directory) correctly
(It should also handle white space between components in a single command line correctly. i.e. "cmd | more", "cmd| more", " cmd |more", "cmd|more " should all work.)
Here is the shell program.
(C) Page Fault Measurements
Write C programs to measure the time taken to service a page fault (segmentation violation). This should be measured in two ways:
(1)
Protect a page using
mprotect, and then try to access it. In the handler for segmentation violation, unprotect it.
#include <sys/time.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>
#include <signal.h>
#define ITER 9999
char *a;
int pagesize;
int counter = ITER;
struct timeval begin, end;
void handler() {
if(counter--)
return;
gettimeofday(&end, NULL);
if(mprotect(a, pagesize, PROT_WRITE) <0) {
perror("mprotect:"); exit();
}
}
main() {
struct sigaction sa;
long diff;
pagesize = sysconf(_SC_PAGESIZE);
a = (char *)valloc(pagesize);
sa.sa_handler = handler;
sigaction(SIGSEGV, &sa, NULL);
if(mprotect(a, pagesize, PROT_NONE) <0) {
perror("mprotect:"); exit();
}
gettimeofday(&begin, NULL);
*a = 1;
diff = (end.tv_sec - begin.tv_sec)*1000000 + (end.tv_usec - begin.tv_usec);
printf("fault time = %f microsecs\n", (float)diff/(ITER+1));
}
Results for trix.rutgers.edu
Protection fault handling time in kernel = ~96 microseconds
Actual data
> repeat 10 ./flt1
fault time = 99.181396 microsecs
fault time = 99.248001 microsecs
fault time = 96.362000 microsecs
fault time = 97.499802 microsecs
fault time = 100.148201 microsecs
fault time = 96.206497 microsecs
fault time = 96.246597 microsecs
fault time = 96.569298 microsecs
fault time = 97.553101 microsecs
fault time = 96.353699 microsecs
(2)
Use a null pointer and try to access it. In the signal handler,
skip the faulting instruction by incrementing the program counter, which you can find on the stack. The following code
fragment gets the program counter from the stack.
signal_handler(int t) {
int *p = &t + 42; /* program counter available at *(&t + 42) */
}
#include <sys/time.h>
#include <unistd.h>
#include <signal.h>
#define ITER 9999
int counter = ITER;
struct timeval begin, end;
void handler(int t) {
int *p = &t + 42;
if(counter--)
return;
gettimeofday(&end, NULL);
(*p)+=4;
}
main() {
char *a;
struct sigaction sa;
long diff;
a = NULL;
sa.sa_handler = handler;
sigaction(SIGSEGV, &sa, NULL);
gettimeofday(&begin, NULL);
*a = 1;
diff = (end.tv_sec - begin.tv_sec)*1000000 + (end.tv_usec - begin.tv_usec);
printf("fault time = %f microsecs\n", (float)diff/(ITER+1));
}
Results from trix.rutgers.edu
Segmentation fault handling time in kernel = ~85 microseconds
Actual data
> repeat 10 ./flt2
fault time = 85.142899 microsecs
fault time = 85.683899 microsecs
fault time = 85.086304 microsecs
fault time = 86.544403 microsecs
fault time = 87.665703 microsecs
fault time = 87.082603 microsecs
fault time = 85.144501 microsecs
fault time = 85.115402 microsecs
fault time = 87.924004 microsecs
fault time = 86.466599 microsecs