Abstract
The major existing design techniques for real-time fault-tolerant (RTFT) computer systems remain unsatisfactorily integrated, although they are complementary in nature. Several challenging issues must be resolved to pave the way for cost-effective integration of these techniques. This paper begins by introducing a scheme for classifying fault tolerance schemes according to the benefits they bring. It then reviews important design techniques that are broadly applicable to the design of RTFT computer systems. The reviewed techniques include those for designing fault-tolerant processing nodes capable of real-time forward recovery from hardware and/or software faults, and those for real-time network surveillance and reconfiguration. Issues in integrating these sets of techniques are then discussed. Another set of design techniques that will play an increasingly important role in the future engineering of RTFT computer systems is that for object-oriented (OO) structuring. Finally, the paper discusses the issues to be resolved in integrating the important fault tolerance techniques with the OO structuring techniques, along with possible directions for doing so.
