Abstract
Fault detection and propagation in a computational grid require a comprehensive framework that takes into consideration the various grid environmental conditions, such as the asynchronous nature of communication and the uncertainty in the disseminated fault information. The paper presents a fault-tolerance framework that provides the necessary models to manage the local faulty behavior associated with the operation of hosted services. The framework includes a mechanism for quantifying the fault vulnerability of grid nodes and their hosted services. The resulting measures of fault vulnerability are globally disseminated to enable the synthesis of decentralized fault-tolerant decision-making strategies.
