A Software Checking Framework Using Distributed Model Checking and Checkpoint/Resume of Virtualized PrOcess Domains

Keetha, Nageswar; Wu, Leon Li; Kaiser, Gail E.; Yang, Junfeng

Complexity and heterogeneity of the deployed software applications often result in a wide range of dynamic states at runtime. The corner cases of software failure during execution often slip through the traditional software checking. If the software checking infrastructure supports the transparent checkpoint and resume of the live application states, the checking system can preserve and replay the live states in which the software failures occur. We introduce a novel software checking framework that enables application states including program behaviors and execution contexts to be cloned and resumed on a computing cloud. It employs (1) EXPLODE's model checking engine for a lightweight and general purpose software checking (2) ZAP system for faster, low overhead and transparent checkpoint and resume mechanism through virtualized PODs (PrOcess Domains), which is a collection of host-independent processes, and (3) scalable and distributed checking infrastructure based on Distributed EXPLODE. Efficient and portable checkpoint/resume and replay mechanism employed in this framework enables scalable software checking in order to improve the reliability of software products. The evaluation we conducted showed its feasibility, efficiency and applicability.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-032-09
Published Here
July 15, 2010