Academic Commons


Anagram: A Content Anomaly Detector Resistant to Mimicry Attack

Wang, Ke; Parekh, Janak J.; Stolfo, Salvatore

In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and "suspicious" network packet payloads. By using higher-order n-grams, Anagram can detect significant anomalous byte sequences and generate robust signatures of validated malicious packet content. The Anagram content models are implemented using highly efficient Bloom filters, reducing space requirements and enabling privacy-preserving cross-site correlation. The sensor models the distinct content flow of a network or host using a semi-supervised training regimen. Previously known exploits, extracted from the signatures of an IDS, are likewise modeled in a Bloom filter and are used during training as well as detection time. We demonstrate that Anagram can identify anomalous traffic with high accuracy and low false positive rates. Anagram's high-order n-gram analysis technique is also resilient against simple mimicry attacks that blend exploits with "normal" appearing byte padding, such as the blended polymorphic attack recently demonstrated in [1]. We discuss randomized n-gram models, which further raises the bar and makes it more difficult for attackers to build precise packet structures to evade Anagram even if they know the distribution of the local site content flow. Finally, Anagram's speed and high detection rate makes it valuable not only as a standalone sensor, but also as a network anomaly flow classifier in an instrumented fault-tolerant host-based environment; this enables significant cost amortization and the possibility of a "symbiotic" feedback loop that can improve accuracy and reduce false positive rates over time.



Also Published In

More About This Work

Academic Units
Computer Science
Published Here
April 28, 2010


Recent advances in intrusion detection 9th international symposium, RAID 2006, Hamburg, Germany, September 20-22, 2006: proceedings, Lecture Notes in Computer Science, vol. 4219 (New York: Springer, 2006), pp. 226-248.