2012 Data (Information)
A Behavior-based Approach Towards Statistics-Preserving Network Trace Anonymization: Supporting Data
In modern network measurement research, there exists a clear and demonstrable need for open sharing of large-scale network traffic datasets between organizations. Beyond network measurement, many security-related fields, such as those focused on detecting new exploits or worm outbreaks, stand to benefit given the ability to easily correlate information between several different sources. Currently, the primary factor limiting such sharing is the risk of disclosing private information. While prior anonymization work has focused on traffic content, analysis based on statistical behavior patterns within network traffic has, so far, been under-explored. This thesis proposes a new behavior-based approach towards network trace source-anonymization, motivated by the concept of anonymity-by-crowds, and conditioned on the statistical similarity in host behavior. Novel time-series models for network traffic and kernel metrics for similarity are derived, and the problem is framed such that anonymity and statistics-preservation are congruent objectives in an unsupervised-learning problem. Source-anonymity is connected directly to the group size and homogeneity under this approach, and metrics for these properties are derived. Optimal segmentation of the population into anonymized groups is approximated with a graph-partitioning problem where maximization of this anonymity metric is an intrinsic property of the solution. Algorithms that guarantee a minimum anonymity-set size are presented, as well as novel techniques for behavior visualization and compression. Empirical evaluations on a range of network traffic datasets show significant advantages in both accuracy and runtime over similar solutions.
More About This Work
- Academic Units
- Computer Science
- Published Here
- May 7, 2012
Supporting data for http://hdl.handle.net/10022/AC:P:13159.