Behavior-Based Network Traffic Synthesis

Song, Yingbo; Stolfo, Salvatore; Jebara, Tony

Modern network security research has demonstrated a clear necessity for open sharing of traffic datasets between organizations a need that has so far been superseded by the challenges of removing sensitive content from the data beforehand. Network Data Anonymization is an emerging field dedicated to solving this problem, with a main focus on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Network traffic synthesis (or simulation) is a closely related complimentary approach which, while more difficult to accurately execute, has the potential for far greater flexibility. This paper leverages the statistical-idiosyncrasies of network behavior to augment anonymization and traffic-synthesis techniques through machine-learning models specifically designed to capture host-level behavior. We present the design of a system that can automatically learn models for network host behavior across time, then use these models to replicate the original behavior, to interpolate across gaps in the original traffic, and demonstrate how to generate new diverse behaviors. Further, we measure the similarity of the synthesized data to the original, providing us with a quantifiable estimate of data fidelity.



Also Published In

IEEE International Conference on Technologies for Homeland Security: HST '11, Waltham, Massachusetts, November 15-17, 2011

More About This Work

Academic Units
Computer Science
Published Here
December 16, 2011