
Goals

The goals of our work on DSM are:  


Description

Our work continues the research on Home-based Lazy Release Consistent (HLRC) DSM protocols started at Princeton, focusing on scalability, fault tolerance, adaptive protocols, and non-scientific applications. We implement the shared memory abstraction as a software layer on top of a fast communication library. With this layer, a cluster of commodity PCs or workstations can provide the same programming interface as a hardware cache-coherent machine. The critical question is what level of performance such a system can deliver. Relaxed consistency models, such as release consistency (RC), are a well-established way to reduce communication traffic in software DSM systems and thus improve performance. Recently, such relaxed consistency models have gained wider acceptance among programmers of parallel applications.
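As a rough illustration of how a home-based lazy release consistent protocol keeps copies coherent, the toy Python model below follows the general scheme described in the HLRC literature: each page has a home copy, a writer keeps a twin of the page and sends only its diff to the home at release time, and other nodes invalidate stale copies lazily at their next acquire. It is a single-process sketch, not our implementation. The names `Cluster`, `Node`, `acquire`, and `release` are hypothetical, and locks, messaging, and page protection are all omitted.

```python
# Toy model of home-based lazy release consistency (HLRC).
# All class and method names are illustrative assumptions, not a real API.

PAGE_SIZE = 4  # words per page, kept tiny for readability


class Cluster:
    def __init__(self, num_pages):
        # Home copy of every page; a real system spreads homes across nodes.
        self.home = {p: [0] * PAGE_SIZE for p in range(num_pages)}
        # Write notices published at release time: (writer_id, page) pairs.
        self.notices = []


class Node:
    def __init__(self, node_id, cluster):
        self.id = node_id
        self.cluster = cluster
        self.cache = {}   # page -> local copy
        self.twins = {}   # page -> pristine copy taken before the first write
        self.seen = 0     # number of write notices already processed

    def _fault(self, page):
        # Page fault: fetch the whole page from its home.
        self.cache[page] = list(self.cluster.home[page])

    def read(self, page, offset):
        if page not in self.cache:
            self._fault(page)
        return self.cache[page][offset]

    def write(self, page, offset, value):
        if page not in self.cache:
            self._fault(page)
        if page not in self.twins:
            self.twins[page] = list(self.cache[page])  # twin for later diffing
        self.cache[page][offset] = value

    def release(self):
        # Diff each dirty page against its twin and apply the diff at the home.
        for page, twin in self.twins.items():
            for i, (old, new) in enumerate(zip(twin, self.cache[page])):
                if old != new:
                    self.cluster.home[page][i] = new
            self.cluster.notices.append((self.id, page))
        self.twins.clear()

    def acquire(self):
        # Lazily invalidate pages that other nodes released since the last
        # acquire. The toy assumes a node releases before it acquires again.
        for writer, page in self.cluster.notices[self.seen:]:
            if writer != self.id:
                self.cache.pop(page, None)
        self.seen = len(self.cluster.notices)


# Two nodes write different words of the same page (false sharing).
cluster = Cluster(num_pages=1)
a, b = Node(0, cluster), Node(1, cluster)
a.acquire(); a.write(0, 0, 7)
b.acquire(); b.write(0, 1, 9)
a.release(); b.release()

stale = b.read(0, 0)   # b has not acquired yet, so it still sees its old copy
b.acquire()
fresh = b.read(0, 0)   # the invalidated page is refetched from the home
```

Because only diffs reach the home, the two concurrent writers do not overwrite each other's words, and node `b` pays for a page fetch only after an acquire tells it that its copy is stale.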

To provide good scalability over a larger class of applications, we are working towards exploiting the benefits of multithreading in our HLRC protocol. We are particularly interested in using multithreading for dynamic reconfiguration of the cluster in order to optimize the performance of parallel applications. We have also explored techniques for optimizing the HLRC protocol by adapting its behavior to the sharing patterns exhibited by parallel applications. These techniques are reminiscent of the Adaptive DSM System and include home migration, adaptation between single-writer and multiple-writer protocols, and adaptation between invalidate and update protocols. The optimized protocol is currently called the Home-based Adaptive Protocol (HAP). This work is done in collaboration with the Parallel Computing Lab of COPPE Systems Engineering/UFRJ, Brazil.
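To make the idea of adapting to sharing patterns more concrete, here is a minimal sketch of one possible per-page policy covering home migration and the choice between single-writer and multiple-writer modes. It is not the HAP algorithm: the `PageStats` class, the `adapt` function, and the 0.75 migration threshold are assumptions made up for this example.

```python
from collections import Counter


class PageStats:
    """Write counts for one page over one observation interval (hypothetical)."""

    def __init__(self, home):
        self.home = home          # node currently acting as the page's home
        self.writes = Counter()   # node_id -> number of writes observed

    def record_write(self, node_id):
        self.writes[node_id] += 1


def adapt(stats, migrate_threshold=0.75):
    """Return (new_home, mode) for the next interval.

    The threshold is an illustrative assumption: the home moves to the
    dominant writer only if that node issued at least this fraction of
    all writes to the page.
    """
    total = sum(stats.writes.values())
    if total == 0:
        return stats.home, "single-writer"
    top_node, top_count = stats.writes.most_common(1)[0]
    mode = "single-writer" if len(stats.writes) == 1 else "multiple-writer"
    new_home = top_node if top_count / total >= migrate_threshold else stats.home
    return new_home, mode


# Page homed at node 0, but written only by node 2.
stats = PageStats(home=0)
for _ in range(4):
    stats.record_write(2)
decision_one_writer = adapt(stats)

# A second writer appears; node 2 still dominates (4 of 5 writes).
stats.record_write(1)
decision_dominant = adapt(stats)

# The writes become more balanced (4 of 6), so the home stays put.
stats.record_write(1)
decision_balanced = adapt(stats)
```

Moving the home to the dominant writer lets that node update the page locally instead of shipping diffs, which is the intuition behind home migration. Deciding between invalidate and update could be driven by similar counters on the read side.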

Our fault tolerance research targets scalable distributed programming environments using the DSM abstraction. Examples of such environments are large LAN-based clusters and meta-clusters interconnected by a wide-area network. Fault tolerance support should not add too much overhead during the failure-free operation of the system, and the mechanisms it uses must work without global coordination, which may be either expensive or impractical in the targeted environments. We have designed a fault-tolerant DSM based on the HLRC protocol that addresses these issues.

Finally, we are also investigating new application domains for software DSM, such as parallel data mining and continuous media applications. We have already developed a parallel data mining engine that achieves comparable performance on a high-end multiprocessor and a cluster of PCs.


Status

 

We have developed a prototype of HLRC using the Virtual Interface Architecture (VIA) on a cluster of PCs running Linux. A source-code distribution of our HLRC protocol for Linux/VIA is available for download. The next release will include versions of HLRC over VIA on Windows and HLRC over MPI on Linux, along with the corresponding versions for SMP PCs.

Programs written for shared memory multiprocessors can run on clusters of PCs without modification. Download our software and try it!



Publications



People

Faculty

Liviu Iftode

 

Graduate Students

Murali Rangarajan

 


Software
We have made available the source code for our implementation of HLRC DSM for VIA-based PC clusters, along with some documentation, through the following links.
