Synthetic Data Generation and Defense in Depth Measurement of Web Applications

Boggs, Nathaniel Gordon; Zhao, Hang; Du, Senyao; Stolfo, Salvatore

Measuring security controls across multiple layers of defense requires realistic datasets and repeatable experiments. However, datasets collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or, at best, been one-time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed so that data can be easily regenerated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged simultaneously on four layers: network, file accesses, system calls, and database. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare performance even across layers, compare synthetic data to real production data, and calculate the combined defense-in-depth performance of sets of controls.
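
As an illustration of what "combined defense-in-depth performance" of a set of controls might mean, the following is a minimal sketch that treats a request as flagged when any control in the set flags it. This union rule, the sensor names, and the toy request IDs are all assumptions made for illustration; the abstract does not state how the paper actually combines controls.

```python
from typing import Dict, Iterable, Set, Tuple


def combined_alerts(alerts: Dict[str, Set[int]], controls: Iterable[str]) -> Set[int]:
    """Union of request IDs flagged by any control in the chosen set."""
    flagged: Set[int] = set()
    for name in controls:
        flagged |= alerts[name]
    return flagged


def rates(flagged: Set[int], attacks: Set[int], normal: Set[int]) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for a set of flagged IDs."""
    detection = len(flagged & attacks) / len(attacks) if attacks else 0.0
    false_pos = len(flagged & normal) / len(normal) if normal else 0.0
    return detection, false_pos


# Hypothetical toy data: request IDs 0-9, of which 0-3 are attacks.
attacks = {0, 1, 2, 3}
normal = {4, 5, 6, 7, 8, 9}

# Hypothetical per-control alerts, one control per layer (names are made up).
alerts = {
    "network_sensor": {0, 1, 4},
    "syscall_sensor": {1, 2},
    "database_sensor": {3, 5},
}

# Combine two controls from different layers and score the pair.
flagged = combined_alerts(alerts, ["network_sensor", "syscall_sensor"])
detection_rate, false_positive_rate = rates(flagged, attacks, normal)
```

Under this assumed rule, adding a control can only raise the detection rate, but it can also raise the false positive rate, which is why sets of controls need to be scored on both.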




Also Published In

Research in Attacks, Intrusions and Defenses 17th International Symposium, RAID 2014, Gothenburg, Sweden, September 17-19, 2014, Proceedings

More About This Work

Academic Units
Computer Science
Published Here
July 15, 2015


Presented at the 17th International Symposium on Research in Attacks, Intrusions and Defenses; RAID 2014; 2014/09/17