Distributed program reliability analysis

The reliability of distributed processing systems can be expressed in terms of the reliability of the processing elements that run the programs, the reliability of the processing elements holding the required files, and the reliability of the communication links used in file transfers. The authors introduce two reliability measures, namely distributed program reliability and distributed system reliability, to accurately model the reliability of distributed systems. The first measure describes the probability of successful execution of a distributed program which runs on some processing elements and needs to communicate with other processing elements for remote files, while the second measure describes the probability that all the programs of a given set can run successfully. The notion of minimal file spanning trees is introduced to efficiently evaluate these reliability measures. Graph theory techniques are used to systematically generate file spanning trees that provide all the required connections. The technique is general and can be used in a dynamic environment for efficient reliability evaluation.
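As a rough illustration of the first measure, the sketch below computes distributed program reliability for a tiny network by exhaustively enumerating the up/down states of every node and link. This is not the authors' file-spanning-tree algorithm. It is a brute-force baseline that gives the same probability on small examples, assuming independent component failures. The function name `program_reliability`, its parameters, and the example topology are all hypothetical and chosen only for illustration.

```python
from itertools import product


def program_reliability(nodes, links, start, required_files, file_locations):
    """Brute-force distributed program reliability (illustrative only).

    nodes: dict node name -> probability the node is up
    links: dict (u, v) tuple -> probability the undirected link is up
    start: node that runs the program
    required_files: set of file names the program needs
    file_locations: dict node name -> set of files stored on that node
    Assumes components fail independently of each other.
    """
    components = list(nodes) + list(links)
    probs = {**nodes, **links}
    total = 0.0
    # Enumerate every combination of working/failed components.
    for state in product([True, False], repeat=len(components)):
        up = {c for c, ok in zip(components, state) if ok}
        p = 1.0
        for c, ok in zip(components, state):
            p *= probs[c] if ok else 1.0 - probs[c]
        if start not in up:
            continue  # the program's own node has failed
        # Find all working nodes reachable from the start node
        # through working links.
        reached = {start}
        frontier = [start]
        while frontier:
            u = frontier.pop()
            for (a, b) in links:
                if (a, b) not in up:
                    continue
                for x, y in ((a, b), (b, a)):
                    if x == u and y in up and y not in reached:
                        reached.add(y)
                        frontier.append(y)
        # The program succeeds if every required file is reachable.
        available = set()
        for n in reached:
            available |= file_locations.get(n, set())
        if required_files <= available:
            total += p
    return total


# Hypothetical example: a program on node A needs file F1 stored on node B.
nodes = {"A": 0.9, "B": 0.9}
links = {("A", "B"): 0.8}
r = program_reliability(nodes, links, "A", {"F1"}, {"B": {"F1"}})
print(round(r, 3))
```

In this two-node example the program succeeds only when both nodes and the single link are up, so the result should be about 0.9 × 0.9 × 0.8 = 0.648. The enumeration grows as 2^n in the number of components, which is the cost that a method based on minimal file spanning trees is meant to avoid.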
