March 24, 2000
IBM T. J. Watson Research Center
(Hawthorne Buildings)
ABSTRACTS
Invited Talk
The Next Generation Internet: Unsafe at any speed?
Kenneth P. Birman, Cornell University
There is a debate concerning the future directions that the Internet should take. Support has been building for the Next Generation
Internet (NGI) Initiative, which would greatly enhance the performance and scalability of the current Internet, while also providing improved
Internet access for K-12 schools and investing in new kinds of Internet uses, such as for providing government services over the network.
These include safety, revenue and life-critical systems which have traditionally
been developed using special-purpose computing technology. The NGI envisions a steady
transition of such systems onto a commercial off-the-shelf technology base. There really isn't any alternative; only standard
technologies yield cost-effective systems. But this raises the question: What needs to be
done to transform the current Internet into the NGI, if critical applications will run upon it? In particular, if the current Internet is unsafe for
such applications, why should we expect the NGI to be safer? The talk suggests that as things stand, the NGI will not be safe for critical uses,
but also shows how a simple "virtual overlay network" capability could be developed to plug the major deficiency.
Short Talks
Using Jini for Complex Systems Integration
(Ben Jai, Bell Labs)
I will describe the work that we have been doing using a Jini object model as the basis for Systems Integration, in particular where those
systems are large, distributed, complex, and have legacy components. In contrast to other distributed object systems, Jini has the unique
advantage that it is protocol-independent, meaning that any communication protocol can be encapsulated in the object's stub or
proxy that a client obtains from the Lookup Service. This is immensely important for large systems that often have to support many
co-existing protocols.
Contact Information: benjai@research.bell-labs.com
Evolving the Publish-Subscribe Paradigm
(Rob Strom, IBM T. J. Watson Research Center)
We discuss the work in the Gryphon project at IBM TJ Watson Research, to provide robust, scalable, internet-wide event distribution for
business and personal applications. Gryphon began as an effort to extend conventional group-based publish-subscribe to content-based
publish-subscribe. It is now being extended into a system allowing clients to subscribe to "interesting" state derived from published
events and from other state. The design and implementation of Gryphon involves a challenging synthesis of concepts drawn from both database
and distributed messaging technologies. We discuss recent results, and ongoing work.
Contact Information: strom@us.ibm.com
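The difference between group-based and content-based publish-subscribe can be illustrated with a minimal sketch. It is not Gryphon's design or API: the `Broker` class, the event fields, and the subscriber names are all hypothetical, and a real system would match events far more efficiently than by scanning every predicate.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


@dataclass
class Broker:
    """Toy in-process broker (hypothetical); subscribers register predicates, not topics."""
    subscriptions: List[Tuple[str, Predicate]] = field(default_factory=list)

    def subscribe(self, name: str, predicate: Predicate) -> None:
        self.subscriptions.append((name, predicate))

    def publish(self, event: Event) -> List[str]:
        # Deliver to every subscriber whose predicate accepts the event's content.
        return [name for name, pred in self.subscriptions if pred(event)]


broker = Broker()
broker.subscribe("alice", lambda e: e.get("symbol") == "IBM" and e.get("price", 0) > 100)
broker.subscribe("bob", lambda e: e.get("volume", 0) >= 10_000)

print(broker.publish({"symbol": "IBM", "price": 120, "volume": 500}))
```

In a group-based system the publisher would name a topic; here the recipients are determined entirely by the event's fields.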
Computing Communities: Transparent Distribution Middleware
(Vijay Karamcheti, New York University)
The heterogeneous and dynamic nature of distributed systems makes it a challenging task to design performance-portable applications. Even
when appropriate techniques have been developed (e.g., for fault-tolerant systems), they have seen restricted deployment because
of the application development barrier: such techniques have typically required applications to adhere to a different API,
ultimately limiting acceptance. The Computing Communities project is designing transparent virtualization and adaptation techniques which
aim to address these shortcomings. These techniques lower the application development barrier by being embodied in a middleware
layer that is inserted between the application and the base operating system (Windows NT in our case) using API interception techniques. We
discuss recent results and ongoing work.
Contact Information: vijayk@cs.nyu.edu
URL: http://www.cs.nyu.edu/cc
Robustness Testing of CORBA ORB Implementations
(Jiantao Pan, AT&T Labs - Research)
Today the Common Object Request Broker Architecture (CORBA) is widely recognized as a standard for heterogeneous distributed computing and
application level integration of COTS and legacy software components.
As the new standard for object management technology in the new century, CORBA and CORBA-enabled solutions have penetrated into many
critical application areas, such as aerospace/defense, banking/finance, and e-commerce. The robustness of these applications,
however, is becoming a serious concern. Moreover, as the infrastructure that manages communication and requests, the Object
Request Broker (ORB) implementation plays a critical role in the overall robustness of the whole system. Ungraceful handling of
exceptional inputs may cause crashes of the whole application and result in huge losses. The Ballista® software robustness testing
service provides a scalable, portable way of testing generic software
modules for robustness failures caused by exception handling problems. We have implemented a Ballista® client that can test the
robustness of ORB implementations. This approach enables us to evaluate the robustness of a specific ORB implementation, to compare
different ORB implementations provided by various vendors, and to enhance robustness of a specific ORB implementation.
Contact Information: jpan@research.att.com
Integrating Asynchronous Messaging and Distributed Object Transactions
(Stefan Tai and Isabelle Rouvellou, IBM T. J. Watson Research Center)
Messaging and distributed transactions are two important models for building enterprise-scale software systems. Distributed
object middleware provides services for transaction processing and for asynchronous messaging. However, the integration of distributed
transactions and asynchronous messaging in distributed object systems is mostly undefined and unsupported. In this talk, we argue for a
systematic and well-defined approach to integrating distributed transactions and messaging using middleware. We present first results
from our project of developing a messaging and transaction integration facility, and outline important research topics that need
to be addressed.
Contact Information: stai@us.ibm.com
Perfect Failure Detection in Timed Systems with Hardware Watchdogs
(Christof Fetzer, AT&T Labs - Research)
I address the problem of how a backup computer can decide when it has to take over the services of a primary computer. On one hand, the
backup must not take over as long as the primary is alive because in the considered replication scheme the backup is taking over some of
the network addresses of the primary. On the other hand, when the primary has crashed, the backup has to take over within a bounded
amount of time (unless it has crashed itself). To solve this problem, I define a *timed perfect* failure detector TP that is very similar
to a perfect failure detector. The main difference is that TP has to cope with computers recovering from crash failures and TP has to
detect crashes/recoveries within a maximum known time. I propose a protocol that implements TP in timed systems with three computers
(primary, backup, and a witness) if each of the three computers has a local hardware watchdog.
Contact Information: christof@research.att.com
Exemptions from Coordination and the Origins of Primary Partition Behavior
(Aleta Ricciardi, Bell Labs)
We describe a generalization of Uniform Distributed Coordination in which not just the faulty processes are exempt from participating. We
define one exemption and show it is the origin of primary-partition behavior, seen in systems that are said to exhibit virtual synchrony.
Our characterization of the exemption allows us to weaken this exemption still further to derive problems that are inherently
solvable in systems that devolve to multiple partitions.
Contact Information: aleta@research.bell-labs.com
Fast Deterministic Consensus in a Noisy Environment
(James Aspnes, Yale University)
It is well known that the consensus problem cannot be solved deterministically in an asynchronous environment, but that randomized
solutions are possible. We propose a new model we call noisy scheduling in which an adversarial schedule is perturbed randomly,
and show that in this model randomness in the environment can substitute for randomness in the algorithm. In particular, we show
that a simplified, deterministic version of Chandra's wait-free shared-memory consensus algorithm solves consensus in time at most
logarithmic in the number of active processes. The proof of
termination is based on showing that a race between independent delayed renewal processes produces a winner quickly. In addition, we
show that the protocol finishes in constant time using quantum and priority-based scheduling on a uniprocessor, suggesting that it is
robust against the choice of model over a wide range.
Contact Information: aspnes@cs.yale.edu
Fault Tolerant HLRC Distributed Shared Memory
(Florin Sultan, Thu Nguyen, and Liviu Iftode, Rutgers University)
We describe a design for integrating fault tolerance in a distributed shared memory (DSM) system based on a Home-based Lazy Release
Consistency (HLRC) protocol. Fault tolerant HLRC uses checkpointing and volatile memory logging for rollback recovery from single node
failures. HLRC has been known as a highly efficient DSM protocol, and therefore any design for fault tolerance support should not adversely
impact its performance. Our recovery support introduces low overhead during failure-free operation by using volatile memory for logging,
and by aggressively exploiting the base DSM protocol. The main goal of this work is to investigate the feasibility and efficiency of
building a fault-tolerant DSM system that can recover from failures using only independent checkpointing and fast log-based recovery.
Independent checkpointing is particularly interesting for computing on large local clusters and on clusters interconnected by a wide area
network, since taking coordinated checkpoints in such cases may be either too expensive or simply impractical. Our approach is motivated by the concept of
meta-clustering (clusters of local-area clusters connected by the Internet), pursued by projects like InterWeave (U.
of Rochester) and Legion (U. of Virginia). The challenge in building such a system lies in controlling the size of the logged and
checkpointed state, without forcing processes to synchronize. We have designed a scheme for lazy log trimming (LLT) and checkpoint garbage
collection (CGC) in a fault-tolerant HLRC DSM system that does not use global synchronization and therefore retains the advantages of
independent checkpointing. We have extended a Home-based LRC system to include our LLT and CGC algorithms along with a local checkpointing
policy, and obtained results indicating that log and checkpoint
management can be effectively performed under this scheme. For three update-intensive applications, we show that the amount of logged and
checkpointed state can be controlled without recourse to global synchronization operations. For high efficiency, our DSM system is
implemented on top of an efficient user-level virtual memory-mapped communication system. An important problem that we had to address was
how to provide the minimal support required for recovering the communication state after the crash of a node. All our checkpointing,
logging and log-management algorithms are light-weight, do not add extra messages to the base DSM protocol, and do not specifically
require global synchronization.
Contact Information: sultan@paul.rutgers.edu
Automatic Configuration and Run-time Adaptation of Distributed Applications
(Fangzhe Chang and Vijay Karamcheti, New York University)
Increased platform heterogeneity and varying resource availability in distributed systems motivate the design of resource-aware
applications, which ensure a desired performance level by continuously adapting their behavior to changing resource
characteristics. In this talk, we describe an application-independent adaptation framework that simplifies the design of resource-aware
applications. This framework eliminates the need for adaptation decisions to be explicitly programmed into the application by relying
on two novel components: (1) a tunability interface, which exposes
adaptation choices in the form of alternate application configurations while encapsulating core application functionality;
and (2) a virtual execution environment, which emulates application execution under diverse resource availability enabling off-line
collection of information about resulting behavior. Together, these components permit automatic run-time decisions on "when" to adapt by
continuously monitoring resource conditions and application progress, and "how" to adapt by dynamically choosing an
application configuration most appropriate for the prescribed user preference. We
have evaluated the framework using an interactive distributed image visualization application. The framework permits successful automatic
application adaptation in reaction to changes in CPU load and network bandwidth by choosing a different compression algorithm or
controlling the image transmission sequence so as to satisfy user
preferences of visualization quality and timeliness.
Contact Information: fangzhe@cs.nyu.edu
URL: http://www.cs.nyu.edu/cc
A Component Description Repository
(Francisco Curbera, Sanjiva Weerawarana and Matthew Duftler, IBM T. J. Watson Research Center)
One of the most powerful factors in the development of open industrial sectors is the standardization of industrial components.
This enables industry players to unbind their operations from individual suppliers, and to take advantage of an open market of
commoditized components. Take the case of the new economy now developing on the Internet. Here, the central components are
electronic services, and several standardization processes are on their way to produce common service interfaces in a wide variety of
sectors. From a developer's perspective, commoditized services can be viewed as standardized software components which are accessed
remotely. The ability to dynamically switch suppliers is provided here by late binding to remote components. In this talk we will
present a mechanism for dynamic binding to remote software components built around a networked repository of component interface
descriptions and a Java API for dynamic resolution of component references. Three key characteristics distinguish this system from
related industry initiatives: no assumptions are made about the protocol for component access (rather, this is part of the
component's description); component interfaces are described in an XML-based IDL; and matching of component interfaces is done by
structural equivalence.
Contact Information: {curbera, sanjiva, duftler}@us.ibm.com
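Matching by structural equivalence can be sketched as follows. This is only an illustration under stated assumptions: the `Interface` class, the service names, and the choice to compare operation names together with parameter and return types (while ignoring the interface's declared name) are hypothetical and are not taken from the repository described in the talk.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (parameter types, return type) for one operation; a hypothetical encoding.
Signature = Tuple[Tuple[str, ...], str]


@dataclass
class Interface:
    name: str                          # declared name, ignored when matching
    operations: Dict[str, Signature]   # operation name -> signature


def structurally_equivalent(a: Interface, b: Interface) -> bool:
    """Two interfaces match if their operations agree, whatever they are called."""
    return a.operations == b.operations


acme = Interface("AcmeQuoteService", {"getQuote": (("string",), "float")})
beta = Interface("BetaStockQuotes", {"getQuote": (("string",), "float")})
other = Interface("BetaStockQuotes", {"getQuote": (("string",), "string")})

print(structurally_equivalent(acme, beta), structurally_equivalent(beta, other))
```

Under this assumption a client bound to one supplier's interface could be rebound to another supplier whose description has the same shape, even though the two descriptions carry different names.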
Cluster-Based WWW Servers
(Enrique Vinicio Carrera and Ricardo Bianchini, Rutgers University)
In this talk I will describe and evaluate a new cluster-based network server, the Locality and Load balancing Server (L2S). In essence, L2S
uses cache (memory) locality and load balancing information to distribute requests across the cluster nodes. All cluster nodes can
both forward and service requests and, thus, the server can achieve high throughput and has no single point of failure or potential
bottleneck. L2S can be used to implement a variety of cluster-based network servers, such as ftp or WWW servers, but our evaluation of
L2S concentrates on its application as a WWW server. The evaluation is based on the detailed simulation of a 16-node cluster and on a
performance comparison of L2S against a traditional server and the
best-known locality-conscious server. Our simulation results demonstrate that L2S achieves throughput that is within 20% of the
full potential of locality-conscious distribution on 16 nodes, significantly outperforming and outscaling the other servers.
Preliminary results of a native implementation of L2S are also promising.
Contact Information: vinicio@cs.rutgers.edu
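The idea of combining cache locality with load information when distributing requests can be sketched as below. This is not the L2S algorithm: the `Node` class, the `overload` threshold, and the specific rule (prefer the least-loaded node that already caches the file, otherwise fall back to the least-loaded node overall) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    load: int = 0                               # number of open requests
    cached: Set[str] = field(default_factory=set)


def choose_node(nodes: List[Node], path: str, overload: int = 8) -> Node:
    """Pick a node for `path`, favoring locality unless the holders are overloaded."""
    holders = [n for n in nodes if path in n.cached and n.load < overload]
    pool = holders or nodes                     # no usable holder: consider every node
    target = min(pool, key=lambda n: n.load)
    target.cached.add(path)                     # the file is now in the target's memory
    target.load += 1
    return target


nodes = [Node("n0"), Node("n1", load=3, cached={"/a.html"})]
first = choose_node(nodes, "/a.html").name      # locality wins despite higher load
second = choose_node(nodes, "/b.html").name     # uncached file goes to the idle node
nodes[1].load = 8                               # pretend n1 is now overloaded
third = choose_node(nodes, "/a.html").name      # load balancing overrides locality
print(first, second, third)
```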
Prefetching Hyperlinks
(Dan Duchamp, AT&T Labs - Research)
We develop a new method for prefetching Web pages into the browser's cache. Clients send information about hyperlink reference patterns to
Web servers, which aggregate the reference information in near-real-time and then disperse the aggregated information to all
clients, piggybacked on GET responses. The information indicates how often hyperlink URLs embedded in pages have been previously accessed
relative to the embedding page. Based on knowledge about which hyperlinks are generally popular, clients initiate prefetching of the
hyperlinks and their embedded images according to any algorithm they prefer. Both client and server may cap the prefetching mechanism's
space overhead and waste of network resources due to speculation. The result of these differences is improved prefetching: lower client
latency (by 52.3%) and less wasted network bandwidth (by 24.0%).
Contact Information: duchamp@research.att.com
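The exchange described above can be sketched in a few lines. The sketch is an assumption-laden illustration, not the author's implementation: `ReferenceStats`, `pick_prefetch`, the 0.5 threshold, and the two-link cap are hypothetical, and the piggybacking on GET responses is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class ReferenceStats:
    """Server-side aggregate of how often each embedded link is followed."""

    def __init__(self) -> None:
        self.page_views: Counter = Counter()
        self.link_follows: Dict[str, Counter] = defaultdict(Counter)

    def record_view(self, page: str) -> None:
        self.page_views[page] += 1

    def record_follow(self, page: str, link: str) -> None:
        self.link_follows[page][link] += 1

    def usage(self, page: str) -> Dict[str, float]:
        # Access frequency of each link relative to views of the embedding page.
        views = self.page_views[page]
        if views == 0:
            return {}
        return {link: n / views for link, n in self.link_follows[page].items()}


def pick_prefetch(usage: Dict[str, float], threshold: float = 0.5, limit: int = 2) -> List[str]:
    """Client-side policy: most popular links first, capped to bound wasted bandwidth."""
    ranked = sorted(usage.items(), key=lambda kv: (-kv[1], kv[0]))
    return [link for link, ratio in ranked if ratio >= threshold][:limit]


stats = ReferenceStats()
for _ in range(4):
    stats.record_view("/index.html")
for _ in range(3):
    stats.record_follow("/index.html", "/news.html")
stats.record_follow("/index.html", "/about.html")

hints = stats.usage("/index.html")
print(pick_prefetch(hints))
```

Because the client applies its own policy to the aggregated ratios, a different threshold or cap can be substituted without changing the server side.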
Posters
DataSpace - querying and monitoring deeply networked collections in physical space
(Samir Goel and Tomasz Imielinski, Rutgers University)
The DataSpace is a three-dimensional physical space, extending from 100 kilometers above to 10 kilometers below the surface of the Earth, that is accessible
to the network. It is addressed geographically, as opposed to the current "logical" addressing scheme of the Internet. With the
enormous 128-bit address space of IPv6, one can individually address every cubic centimeter of the physical space in DataSpace
with approximately 90 bits of area code. This would include every street, building, room, basement or even drawer of a desk. The
DataSpace would thus serve as the host for the entire part of the
physical world that is connected to the network. The DataSpace is populated by a massive number of objects that produce and locally
store data about themselves. In the DataSpace, physical objects are no longer characterized just by shape, size, and color. They are also
characterized by processor type, amount of memory, and network connection. To support the DataSpace, we propose a version of the
multicast protocol called "spacecast". Here, the network plays the role of a database
machine, handling queries, sent through spacecast, that "illuminate" selected datacubes and gather multiple responses from
the objects that respond to the query. The objects in a DataSpace must be organized in such a way that they can be usefully accessed and
manipulated. This is achieved by grouping objects into classes called dataflocks. Dataflocks are often mobile classes of objects that move
through the physical world while still maintaining (some level of) connectivity to the network. Dataflocks are accessed through the
datacubes they are located in and through their own class properties. For new objects, becoming part of a datacube is a simple "plug and
play" mechanism. In this work we build on earlier work with Navas, in which we introduced GeoCast for geographic messaging and
implemented it within DARPA's GLOMO program.
Contact Information: {gsamir, imielins}@cs.Rutgers.edu
URL: http://paul.rutgers.edu/~gsamir/dataspace/
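The "approximately 90 bits" figure can be sanity-checked with a short calculation. The sketch assumes a spherical Earth with a mean radius of 6371 km, a value that does not appear in the abstract, and simply counts the cubic centimeters in the shell between 10 km below and 100 km above the surface.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean radius; not given in the abstract


def shell_volume_cm3(above_km: float = 100.0, below_km: float = 10.0) -> float:
    """Volume of the spherical shell spanned by the DataSpace, in cubic centimeters."""
    outer = EARTH_RADIUS_KM + above_km
    inner = EARTH_RADIUS_KM - below_km
    volume_km3 = 4.0 / 3.0 * math.pi * (outer ** 3 - inner ** 3)
    return volume_km3 * 1e15   # 1 km = 1e5 cm, so 1 km^3 = 1e15 cm^3


# Bits needed to give every cubic centimeter its own address.
bits = math.ceil(math.log2(shell_volume_cm3()))
print(bits)
```

Under these assumptions the shell holds roughly 5.7e25 cubic centimeters, which needs 86 bits, consistent with the abstract's estimate and well within a 128-bit IPv6 address.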
Transparent Network Connectivity in Dynamic Cluster Environments
(Xiaodong Fu, Hua Wang, and Vijay Karamcheti, New York University)
Improvements in microprocessor and networking performance have made networks of workstations a very attractive platform for high-end
parallel and distributed computing. However, the effective deployment of such environments requires addressing two problems not associated
with dedicated parallel machines: heterogeneous resource capabilities and dynamic availability. Achieving good performance requires that
application components be able to migrate between cluster resources and efficiently adapt to the underlying resource capabilities. An
important component of the required support is maintaining network connectivity, which directly affects the transparency of migration
to the application and its performance after migration. Unfortunately, existing approaches rely on either extensive operating
system modifications or new APIs to maintain network connectivity, both of which limit their wider applicability. This poster presents
the design, implementation, and performance of a transparent network connectivity layer for dynamic cluster environments. Our design uses
the techniques of API interception and virtualization to construct a transparent layer in user space; use of the layer requires no
modification either to the application or the underlying operating system and messaging layers. Our layer enables the migration of
application components without breaking network connections, and additionally permits adaptation to the characteristics of the
underlying networking substrate. Experiments with supporting a persistent socket interface in two environments (an Ethernet LAN on
top of TCP/IP, and a Myrinet LAN on top of Fast Messages) show that our approach incurs minimal overheads and can effectively select the
best substrate for implementing application communication requirements.
Contact Information: {xiaodong, wanghua, vijayk}@cs.nyu.edu
URL: http://www.cs.nyu.edu/cc
Software DSM on PC clusters using the Virtual Interface Architecture
(Murali Rangarajan, Zoltan Jarai, Ricardo Bianchini, Thu Nguyen, and Liviu Iftode, Rutgers University)
Our goal is to provide a shared address space over a cluster of PCs using a software DSM protocol and a low-latency high-speed
interconnect supporting the Virtual Interface Architecture (VIA). VIA standardizes the interface for user-level memory mapped communication
over high-performance System Area Networks (SANs). The focus of our project is on providing a high-performance DSM system that can be
used for scientific as well as non-scientific applications like data mining and continuous media applications. We have implemented a
Home-based Lazy Release Consistent (HLRC) DSM protocol on a cluster of eight PCs connected by a VIA-based Giganet network. Our
preliminary results indicate that software DSM on VIA achieves good performance that compares well with that achieved by similar
protocols on Myrinet-based networks.
Contact Information: muralir@paul.rutgers.edu
Modeling Cluster Failures
(Taliver Heath and Rich Martin, Rutgers University)
Our work investigates the distribution of failure rates for a variety of cluster components, including operating system failures, software
failures, and component failures. In particular, we focus on modeling the failure rate of the operating system. Our primary method is
empirical observation of systems running in a variety of environments, ranging from desktop PCs to clusters of Sun servers
housed in strictly controlled machine rooms.
Contact Information: taliver@paul.rutgers.edu
Split-OS
(Matias Cuenca, Kiran Nagaraja, Kiran Srinivasan, Florin Sultan, Ricardo Bianchini, Richard Martin, Thu D. Nguyen and Liviu
Iftode, Rutgers University)
The introduction of switch-based scalable I/O architectures for servers,
like InfiniBand, and the increasing interest in "intelligent" devices with RISC processors and memory on board are creating new opportunities
for OS designers. In particular, by redesigning the OS, we may be able to improve system reliability and increase performance through the
aggressive use of device intelligence. In Split-OS, we are redesigning the traditional OS by splitting (and
possibly replicating) OS functions and state between the host and the intelligent
devices. Our current approach is to simulate this expected server architecture using a cluster of PCs interconnected by
Virtual Interface Architecture (VIA). In this setup, each PC
emulates a particular device. We are working to define a communication protocol between devices using VIA channels in
anticipation of the InfiniBand specifications release.
Contact Information: mcuenca@paul.rutgers.edu
Co-operative File Systems for Clusters
(Matias Cuenca, Suresh Gopalakrishnan, Chandrashekar Krishnan, Kiran Nagaraja, Rahul Pupala, Kiran Srinivasan, Qian Zhang, Ricardo Bianchini, Liviu Iftode, and Thu Nguyen, Rutgers University)
A cooperative file system (CFS) is a user-level distributed file system
created ad hoc for a distributed application by merging the local file systems of the cluster nodes. The merging
is performed lazily and is temporary (application lifetime), volatile (in memory, not on disk),
local to the application, and dynamically reconfigurable. The system is called cooperative
because local file systems can "help" each other by migrating or replicating hot files among them for
better load balancing and fault tolerance. Central to the design is the concept of virtual directories, which provide
location-independent file naming and consequently support file migration. The main applications for CFS are distributed servers such as file
servers, web servers, video servers, and email servers. Today, servers are heavily loaded and their I/O activity can easily become imbalanced.
CFS supports dynamic balancing of the disk I/O load across the cluster using file migration and replication.
We are currently implementing a CFS prototype that uses the Virtual Interface
Architecture (VIA) to interconnect 8 PCs into a cluster. Our first application
is a distributed NFS server. We will be exploring various policies for migration and studying the impact of caching policies on performance.
Contact Information: gsuresh@cs.rutgers.edu
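How a virtual directory can keep file names independent of location, and so allow migration, is sketched below. The `VirtualDirectory` class, its methods, and the node and path names are hypothetical; the CFS prototype's actual data structures are not described in the abstract.

```python
from typing import Dict, Tuple

Location = Tuple[str, str]   # (cluster node, path in that node's local file system)


class VirtualDirectory:
    """In-memory map from location-independent names to current file locations."""

    def __init__(self) -> None:
        self._location: Dict[str, Location] = {}

    def add(self, vpath: str, node: str, local_path: str) -> None:
        self._location[vpath] = (node, local_path)

    def resolve(self, vpath: str) -> Location:
        return self._location[vpath]

    def migrate(self, vpath: str, new_node: str, new_local_path: str) -> None:
        # Only the mapping changes; the name the application uses stays the same.
        if vpath not in self._location:
            raise KeyError(vpath)
        self._location[vpath] = (new_node, new_local_path)


vdir = VirtualDirectory()
vdir.add("/videos/hot.mpg", "node3", "/export/v/hot.mpg")
before = vdir.resolve("/videos/hot.mpg")
vdir.migrate("/videos/hot.mpg", "node5", "/scratch/hot.mpg")
after = vdir.resolve("/videos/hot.mpg")
print(before, after)
```

Replication could be modeled in the same way by mapping each virtual path to a list of locations rather than a single one.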