Academic Commons


SPARSE: A Hybrid System to Detect Malcode-Bearing Documents

Li, Wei-Jen; Stolfo, Salvatore

Embedding malcode within documents provides a convenient means of penetrating systems which may be unreachable by network-level service attacks. Such attacks can be very targeted and difficult to detect compared to the typical network worm threat due to the multitude of document-exchange vectors. Detecting malcode embedded in a document is difficult owing to the complexity of modern document formats that provide ample opportunity to embed code in a myriad of ways. We focus on Microsoft Word documents as malcode carriers as a case study in this paper. We introduce a hybrid system that integrates static and dynamic techniques to detect the presence and location of malware embedded in documents. The system is designed to automatically update its detection models to improve accuracy over time. The overall hybrid detection system with a learning feedback loop is demonstrated to achieve a 99.27% detection rate and 3.16% false positive rate on a corpus of 6228 Word documents.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-006-08
Published Here
April 27, 2011