Survivor: An Approach for Adding Dependability to Legacy Workflow Systems

Greze, Jean-Denis; Kaiser, Gail E.; Kc, Gaurav S.

Although they often provide critical services, most workflow systems are not dependable. There is a large body of literature on dependable and survivable distributed systems, but most of it is concerned with developing new architectures rather than adapting existing ones. That literature also focuses on hardening (security-based defense) as opposed to recovery. For deployed systems, it is often infeasible to completely replace existing infrastructure; a more pragmatic approach is to adapt existing distributed systems to offer better dependability. In this paper, we outline a general architecture that can easily be retrofitted to legacy workflow systems in order to improve dependability and fault tolerance. We do this by monitoring enactment and replicating partial workflow states as tools for detection, analysis, and recovery. We discuss some policies that can guide these mechanisms. Finally, we describe and evaluate our implementation, Survivor, which modifies an existing workflow system provided by the Naval Research Lab.

More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-026-02
Published Here
April 21, 2011