Academic Commons

Theses Doctoral

On Efficiency and Accuracy of Data Flow Tracking Systems

Jee, Kangkook

Data Flow Tracking (DFT) is a technique broadly used in a variety of security applications such as attack detection, privacy leak detection, and policy enforcement. Although effective, DFT inherits the high overhead common to in-line monitors which subsequently hinders their adoption in production systems. Typically, the runtime overhead of DFT systems range from 3× to 100× when applied to pure binaries, and 1.5× to 3× when inserted during compilation. Many performance optimization approaches have been introduced to mitigate this problem by relaxing propagation policies under certain conditions but these typically introduce the issue of inaccurate taint tracking that leads to over-tainting or under-tainting.
Despite acknowledgement of these performance / accuracy trade-offs, the DFT literature consistently fails to provide insights about their implications. A core reason, we believe, is the lack of established methodologies to understand accuracy.
In this dissertation, we attempt to address both efficiency and accuracy issues. To this end, we begin with libdft, a DFT framework for COTS binaries running atop commodity OSes and we then introduce two major optimization approaches based on statically and dynamically analyzing program binaries.
The first optimization approach extracts DFT tracking logics and abstracts them using TFA. We then apply classic compiler optimizations to eliminate redundant tracking logic and minimize interference with the target program. As a result, the optimization can achieve 2× speed-up over base-line performance measured for libdft. The second optimization approach decouples the tracking logic from execution to run them in parallel leveraging modern multi-core innovations. We apply his approach again applied to libdft where it can run four times as fast, while concurrently consuming fewer CPU cycles.
We then present a generic methodology and tool for measuring the accuracy of arbitrary DFT systems in the context of real applications. With a prototype implementation for the Android framework – TaintMark, we have discovered that TaintDroid’s various performance optimizations lead to serious accuracy issues, and that certain optimizations should be removed to vastly improve accuracy at little performance cost. The TaintMark approach is inspired by blackbox differential testing principles to test for inaccuracies in DFTs, but it also addresses numerous practical challenges that arise when applying those principles to real, complex applications. We introduce the TaintMark methodology by using it to understand taint tracking accuracy trade-offs in TaintDroid, a well-known DFT system for Android.
While the aforementioned works focus on the efficiency and accuracy issues of DFT systems that dynamically track data flow, we also explore another design choice that statically tracks information flow by analyzing and instrumenting the application source code. We apply this approach to the different problem of integer error detection in order to reduce the number of false alarmings.


  • thumnail for Jee_columbia_0054D_13090.pdf Jee_columbia_0054D_13090.pdf binary/octet-stream 2.57 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Keromytis, Angelos D.
Ph.D., Columbia University
Published Here
January 29, 2016