2014 Reports
Model Aggregation for Distributed Content Anomaly Detection
Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic deviates from this model. Content anomaly detection (CAD) is a variant of this approach that models the payloads of such traffic instead of higher-level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors. In the past, CAD was unsuited to cloud environments due to the relative overhead of content inspection and the dynamic routing of content paths to geographically diverse sites. We challenge this notion and introduce new methods for efficiently aggregating content models to enable scalable CAD in dynamically pathed environments such as the cloud. These methods eliminate the need to exchange raw content, drastically reduce network and CPU overhead, and offer varying levels of content privacy. We perform a comparative analysis of our methods using Random Forest, Logistic Regression, and Bloom Filter-based classifiers for operation in the cloud or other distributed settings such as wireless sensor networks. We find that content model aggregation offers statistically significant improvements over non-aggregate models with minimal overhead, and that distributed and non-distributed CAD have statistically indistinguishable performance. Thus, these methods enable the practical deployment of accurate CAD sensors in a distributed attack detection infrastructure.
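To make the idea of aggregating content models without exchanging raw content more concrete, the following is a minimal illustrative sketch and not the report's implementation. It assumes a Bloom filter over byte n-grams as the content model, an anomaly score defined as the fraction of unseen n-grams, and aggregation by bitwise OR of the filters. The class name `NgramBloomModel`, its parameters, and the scoring rule are all hypothetical choices made for this example.

```python
import hashlib


class NgramBloomModel:
    """Hypothetical content model: a Bloom filter over byte n-grams."""

    def __init__(self, num_bits=1 << 16, num_hashes=3, n=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.n = n
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, gram):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + gram).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def _grams(self, payload):
        # Sliding window of n bytes; empty if the payload is shorter than n.
        return (payload[i:i + self.n] for i in range(len(payload) - self.n + 1))

    def train(self, payload):
        """Record every n-gram of an attack-free payload."""
        for gram in self._grams(payload):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def contains(self, gram):
        return all((self.bits >> pos) & 1 for pos in self._positions(gram))

    def anomaly_score(self, payload):
        """Fraction of the payload's n-grams that the model has not seen."""
        grams = list(self._grams(payload))
        if not grams:
            return 0.0
        unseen = sum(1 for g in grams if not self.contains(g))
        return unseen / len(grams)

    def merge(self, other):
        """Aggregate two models by OR-ing their bit arrays.

        Only the filters are combined; no raw payloads are exchanged.
        """
        mine = (self.num_bits, self.num_hashes, self.n)
        theirs = (other.num_bits, other.num_hashes, other.n)
        if mine != theirs:
            raise ValueError("models must share parameters to be merged")
        merged = NgramBloomModel(self.num_bits, self.num_hashes, self.n)
        merged.bits = self.bits | other.bits
        return merged


# Two sites train locally on different traffic, then share only their filters.
site_a = NgramBloomModel()
site_a.train(b"GET /index.html HTTP/1.1")
site_b = NgramBloomModel()
site_b.train(b"POST /login HTTP/1.1")
aggregate = site_a.merge(site_b)
```

In this sketch, a payload seen only at one site scores as partly anomalous against the other site's local model but not against the aggregate, since OR-ing two Bloom filters with identical parameters yields the filter of the combined training sets.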
Files
- cucs-017-14.pdf (application/pdf, 430 KB)
More About This Work
- Academic Units: Computer Science
- Publisher: Department of Computer Science, Columbia University
- Series: Columbia University Computer Science Technical Reports, CUCS-017-14
- Published Here: June 17, 2014