Academic Commons

Theses Doctoral

A Behavior-based Approach Towards Statistics-Preserving Network Trace Anonymization

Song, Yingbo

In modern network measurement research, there exists a clear and demonstrable need for open sharing of large-scale network traffic datasets between organizations. Beyond network measurement, many security-related fields, such as those focused on detecting new exploits or worm outbreaks, stand to benefit given the ability to easily correlate information between several different sources. Currently, the primary factor limiting such sharing is the risk of disclosing private information. While prior anonymization work has focused on traffic content, analysis based on statistical behavior patterns within network traffic has, so far, been under-explored. This thesis proposes a new behavior-based approach towards network trace source-anonymization, motivated by the concept of anonymity-by-crowds, and conditioned on the statistical similarity in host behavior. Novel time-series models for network traffic and kernel metrics for similarity are derived, and the problem is framed such that anonymity and statistics-preservation are congruent objectives in an unsupervised-learning problem. Source-anonymity is connected directly to the group size and homogeneity under this approach, and metrics for these properties are derived. Optimal segmentation of the population into anonymized groups is approximated with a graph-partitioning problem where maximization of this anonymity metric is an intrinsic property of the solution. Algorithms that guarantee a minimum anonymity-set size are presented, as well as novel techniques for behavior visualization and compression. Empirical evaluations on a range of network traffic datasets show significant advantages in both accuracy and runtime over similar solutions.




More About This Work

Academic Units
Computer Science
Thesis Advisors
Stolfo, Salvatore
Degree
Ph.D., Columbia University
Published Here
May 7, 2012


Supporting data available at