A Survey of Software Fault Tolerance Techniques

Smith, Jonathan M.

This report examines the state of the field of software fault tolerance. Terminology, techniques for building reliable systems, and fault tolerance are discussed. While a scientific consensus on the measurement of software reliability has not been reached, software systems are sufficiently pervasive that "software" components of larger systems must be reliable, since dependence is placed on them. Fault-tolerant systems use redundant components to mitigate the effects of component failures, and thus create a system that is more reliable than a single component. This idea can be applied to software systems as well. Several techniques for designing fault-tolerant software systems are discussed and assessed qualitatively, where "software fault" refers to what is more commonly known as a bug. The assumptions, relative merits, available experimental results, and implementation experience are discussed for each technique. This leads us to some conclusions about the state of the field.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-325-88
Published Here
December 7, 2011