As computer systems become more complex, tolerance to failures and recoverability without compromising performance have emerged as guiding principles for system design. The need for such features is heightened by the growing demand for performance-critical, highly available services. Self-healing is attracting increasing interest because systems that support it provide these features. From a system viewpoint, healing requires at least two functions: (i) monitoring, to detect exceptional events such as failures, intrusions, and policy violations, and (ii) action in response to these events, such as recovery, repair, and fault containment.
Backdoor is an architectural approach to building remote healing systems (RHS). An RHS allows a remote machine to monitor a target machine nonintrusively, detect failures, and then perform recovery and repair operations. An RHS must not interfere with the normal operation of the target system, and it must work even if the target system is "dead" (due to a hardware fault, OS crash, or freeze) or cannot be accessed by conventional means (due to overload, a DoS attack, etc.). Backdoor aims to enable remote access to the resources of a machine (memory, I/O devices) for remote healing operations without involving its processor(s).
Backdoor (BD) is a system architecture that allows healing operations to be performed on a remote operating system or application image without using remote CPU cycles.
We identify seven key principles of a BD-based design:
One of the key ingredients of the Backdoor architecture is the remote memory communication (RMC) technology provided by standards like VIA or InfiniBand, specifically its support for remote DMA (RDMA) read and write operations. Existing RMC hardware implementations have features that make them attractive for a BD design. In particular, they comply with almost all of the requirements expressed by the above BD design principles.
RMC opens research opportunities for nonintrusive remote monitoring of, and intervention on, a running system. In contrast with previous research, which has used RMC mostly for its performance benefits, we propose using it as a building block in RHS design.
Backdoor uses the remote access channels provided by RMC to perform remote monitoring. Monitoring accesses are performed without the intervention of the CPU or OS of the monitored node. They impose no extra burden on the target node, which may even be unaware that it is being monitored.
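The monitoring idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Backdoor implementation: `SimulatedTargetMemory`, `rdma_read`, `tick`, `HeartbeatMonitor`, and the heartbeat counter layout are all hypothetical names introduced here, and the "remote read" is modeled as a plain byte copy rather than a real VIA or InfiniBand operation. The point it shows is that the monitor decides liveness purely from what it reads, with no code running on the target in response to the read.

```python
import struct

# Assumption: the target keeps a 64-bit little-endian heartbeat counter
# at a known offset inside a memory region exposed for remote reads.
HEARTBEAT_OFFSET = 0
HEARTBEAT_FORMAT = "<Q"


class SimulatedTargetMemory:
    """Stand-in for a memory region on the target node (hypothetical)."""

    def __init__(self, size=64):
        self.region = bytearray(size)

    def rdma_read(self, offset, length):
        # Models a one-sided read: bytes are copied out and no target
        # code runs as a result of the access.
        return bytes(self.region[offset:offset + length])


def tick(memory):
    """What a healthy target would do on its own: bump the counter."""
    size = struct.calcsize(HEARTBEAT_FORMAT)
    value = struct.unpack_from(HEARTBEAT_FORMAT, memory.region, HEARTBEAT_OFFSET)[0]
    # Wrap at 64 bits so the counter never overflows the packed field.
    next_value = (value + 1) % (1 << (8 * size))
    struct.pack_into(HEARTBEAT_FORMAT, memory.region, HEARTBEAT_OFFSET, next_value)


class HeartbeatMonitor:
    """Runs on the monitoring node and only ever reads target memory."""

    def __init__(self, memory, max_stalled_polls=3):
        self.memory = memory
        self.max_stalled_polls = max_stalled_polls
        self.last_value = None
        self.stalled = 0

    def poll(self):
        """Return True once the counter has been unchanged for too many polls."""
        size = struct.calcsize(HEARTBEAT_FORMAT)
        raw = self.memory.rdma_read(HEARTBEAT_OFFSET, size)
        value = struct.unpack(HEARTBEAT_FORMAT, raw)[0]
        if value == self.last_value:
            self.stalled += 1
        else:
            self.stalled = 0
            self.last_value = value
        return self.stalled >= self.max_stalled_polls
```

In this sketch a frozen target is detected simply because `tick` stops being called; the monitor never asks the target to respond, which is what allows detection even when the target OS has crashed or hung.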
Monitoring over RMC:
Backdoor exploits RMC to implement fine-grained control actions on a target system. The action component is semantically rich: the action taken depends on the trigger event and may affect the target system at the OS or application level.
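One way to picture an event-dependent action component is a dispatch table that maps each trigger event to a remote write. The sketch below is a simulation under assumptions: the event names (`overload`, `corrupt_config`), the memory offsets, and the `SimulatedTargetMemory`, `rdma_write`, and `heal` helpers are hypothetical and do not come from the Backdoor design; the remote write is modeled as a byte copy rather than a real RDMA operation.

```python
class SimulatedTargetMemory:
    """Stand-in for a remotely writable region on the target (hypothetical)."""

    def __init__(self, size=16):
        self.region = bytearray(size)

    def rdma_read(self, offset, length):
        return bytes(self.region[offset:offset + length])

    def rdma_write(self, offset, data):
        # Models a one-sided write; refuse accesses outside the region.
        if offset < 0 or offset + len(data) > len(self.region):
            raise ValueError("write outside the exposed region")
        self.region[offset:offset + len(data)] = data


# Assumed layout: one throttle flag byte, then a 4-byte configuration word.
THROTTLE_FLAG_OFFSET = 0
CONFIG_OFFSET = 4


def contain_overload(memory, known_good):
    """Fault containment: raise a flag the target software is assumed to honor."""
    memory.rdma_write(THROTTLE_FLAG_OFFSET, b"\x01")
    return "containment"


def repair_config(memory, known_good):
    """Repair: overwrite damaged state with a known-good copy."""
    memory.rdma_write(CONFIG_OFFSET, known_good)
    return "repair"


# The action taken depends on the trigger event.
ACTIONS = {
    "overload": contain_overload,
    "corrupt_config": repair_config,
}


def heal(event, memory, known_good):
    """Run the action registered for the event; return None if there is none."""
    action = ACTIONS.get(event)
    if action is None:
        return None
    return action(memory, known_good)
```

For example, `heal("corrupt_config", memory, good_bytes)` restores the configuration word from the monitor's copy, while an unrecognized event leaves the target memory untouched.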
Through special remote access hooks provided by the BD, a remote system can: