Improving System Reliability for Cyber-Physical Systems

Wu, Leon Li

System reliability is a fundamental requirement of a cyber-physical system, i.e., a system featuring a tight combination of, and coordination between, the system's computational and physical elements. Cyber-physical systems range from critical infrastructure, such as the power grid and transportation systems, to health and biomedical devices. An unreliable system often leads to disruption of service, financial cost, and even loss of human life. This thesis aims to improve system reliability for cyber-physical systems that meet the following criteria: processing large amounts of data; employing software as a system component; running online continuously; and having an operator in the loop because of the human judgment and accountability requirements of safety-critical systems. I limit the scope to this type of cyber-physical system because such systems are important and becoming more prevalent. To improve their reliability, I propose a system evaluation approach named automated online evaluation. It works in parallel with the cyber-physical system, continuously conducting automated evaluation at multiple stages along the system's workflow and providing operator-in-the-loop feedback on reliability improvement. In this approach, data from the cyber-physical system is evaluated. For example, abnormal input and output data can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator in the loop. The operator can then take actions and make changes to the system based on the alerts in order to achieve minimal system downtime and higher system reliability. To implement the proposed approach, I further propose a system architecture named ARIS (Autonomic Reliability Improvement System).
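As a rough illustration of how abnormal data might be flagged and turned into operator alerts, the following is a minimal Python sketch. It assumes a robust outlier test (the modified z-score based on the median absolute deviation), which is a stand-in chosen for illustration and not necessarily the analysis used in the thesis. The function names, the threshold of 3.5, and the sample readings are all hypothetical.

```python
from statistics import median


def flag_abnormal(values, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold.

    Hypothetical helper: the modified z-score is an assumed stand-in for
    the data quality analysis described in the text.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the readings equal the median; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def make_alerts(values, threshold=3.5):
    """Build one human-readable alert per flagged reading for the operator."""
    return [
        f"ALERT: reading #{i} = {values[i]} looks abnormal"
        for i in flag_abnormal(values, threshold)
    ]


if __name__ == "__main__":
    # Made-up sensor readings with one obvious spike at index 5.
    readings = [50.1, 49.9, 50.0, 50.2, 49.8, 75.0, 50.1]
    for alert in make_alerts(readings):
        print(alert)
```

In this sketch the alert is only a string; in a deployed system it would be routed to whatever channel the operator in the loop actually monitors.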
One technique used by the approach is data quality analysis using computational intelligence, which evaluates data quality in an automated and efficient way to ensure that the running system performs reliably as expected. The computational intelligence is enabled by machine learning, data mining, statistical and probabilistic analysis, and other intelligent techniques. In a cyber-physical system, the data collected from the system, e.g., software bug reports, system status logs, and error reports, are stored in databases. In my approach, these data are analyzed via data mining and other intelligent techniques so that useful information on system reliability, including erroneous data and abnormal system states, can be derived. This reliability-related information is directed to operators so that proper actions can be taken, sometimes proactively based on predictive results, to ensure the proper and reliable execution of the system. Another technique used by the approach is self-tuning, which automatically self-manages and self-configures the evaluation system so that it adapts to changes in the system and to feedback from the operator. Self-tuning keeps the evaluation system functioning properly, which leads to a more robust evaluation system and improved system reliability. As a feasibility study of the proposed approach, I first present the NOVA (Neutral Online Visualization-aided Autonomic) system, a data quality analysis system for improving the reliability of power grid cyber-physical systems. I then present a feasibility study on the effectiveness of some self-tuning techniques, including data classification, redundancy checking, and trend detection. Self-tuning leads to an adaptive evaluation system that works better under system changes and operator feedback, which will lead to improved system reliability.
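The idea of an evaluation system that adapts to operator feedback can be sketched as follows. This is a deliberately simple Python illustration and not the self-tuning techniques studied in the thesis (data classification, redundancy checking, and trend detection). It assumes a single alert threshold that is nudged up after a reported false alarm and down after a reported missed fault. The class name, the feedback labels, and all numeric defaults are hypothetical.

```python
class SelfTuningThreshold:
    """Alert threshold that adapts to operator feedback (illustrative only)."""

    def __init__(self, threshold=3.5, step=0.25, lower=1.0, upper=10.0):
        self.threshold = threshold  # current alerting threshold
        self.step = step            # adjustment applied per feedback item
        self.lower = lower          # never become more sensitive than this
        self.upper = upper          # never become less sensitive than this

    def feedback(self, kind):
        """Adjust the threshold from one piece of operator feedback.

        kind is an assumed label: "false_alarm" raises the threshold,
        "missed_fault" lowers it, and "confirmed" leaves it unchanged.
        """
        if kind == "false_alarm":
            self.threshold = min(self.upper, self.threshold + self.step)
        elif kind == "missed_fault":
            self.threshold = max(self.lower, self.threshold - self.step)
        elif kind != "confirmed":
            raise ValueError(f"unknown feedback kind: {kind!r}")
        return self.threshold


if __name__ == "__main__":
    tuner = SelfTuningThreshold(threshold=3.0, step=0.5, lower=1.0, upper=4.0)
    for kind in ["false_alarm", "false_alarm", "missed_fault", "confirmed"]:
        print(kind, "->", tuner.feedback(kind))
```

The bounds keep a run of one-sided feedback from disabling alerts entirely or flooding the operator, which is one plausible design choice for keeping the evaluation system itself robust.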
The contribution of this work is an automated online evaluation approach that is able to improve system reliability for cyber-physical systems in the domain of interest indicated above. It enables online reliability assurance for deployed systems that cannot be robustly tested prior to actual deployment.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-038-11
Published Here
May 3, 2012