Code Relatives: Detecting Similar Software Behavior

Su, Fang-Hsiang; Harvey, Kenneth; Sethumadhavan, Simha; Kaiser, Gail E.; Jebara, Tony

Detecting "similar code" is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, they may miss code fragments that behave alike but do not share similar syntax. In this paper, we propose the term "code relatives" to refer to code with dynamically similar execution features. Code relatives can be used for tasks such as implementation-agnostic code search and classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link-analysis-based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed more than 290 million prospective subgraph matches. The results show that DyCLINK detects not only code relatives but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average, compared with the competitor's 61%.
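
The idea of comparing instruction graphs through link analysis can be illustrated with a small sketch. This is not DyCLINK's implementation: the abstract does not specify which link analysis or sequence comparison is used, so the sketch assumes PageRank as the link analysis and Python's `difflib.SequenceMatcher` as a stand-in similarity measure. The graph encoding, the opcode labels, and the names `pagerank`, `ranked_opcodes`, and `similarity` are all hypothetical.

```python
from difflib import SequenceMatcher


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank over a directed graph (assumed link analysis)."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A vertex with no outgoing edges spreads its rank over all vertices.
            targets = out[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank


def ranked_opcodes(graph):
    """Return the graph's opcode labels ordered by descending PageRank."""
    labels, edges = graph
    nodes = sorted(labels)
    rank = pagerank(nodes, edges)
    order = sorted(nodes, key=lambda v: (-rank[v], v))
    return [labels[v] for v in order]


def similarity(g1, g2):
    """Compare two graphs by the similarity of their ranked opcode sequences."""
    return SequenceMatcher(None, ranked_opcodes(g1), ranked_opcodes(g2)).ratio()


# Hypothetical instruction graphs: vertex id -> opcode, plus dependency edges.
int_add = ({0: "iload", 1: "iload", 2: "iadd", 3: "istore"},
           [(0, 2), (1, 2), (2, 3)])
float_mul = ({0: "fload", 1: "fmul"}, [(0, 1)])

same = similarity(int_add, int_add)
different = similarity(int_add, float_mul)
```

Flattening each graph into a rank-ordered sequence avoids an explicit subgraph isomorphism search, which is one plausible reason a link-analysis approach can scale to many candidate matches; whether DyCLINK does exactly this is not stated in the abstract.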

More About This Work

Academic Units: Computer Science
Publisher: Department of Computer Science, Columbia University
Series: Columbia University Computer Science Technical Reports, CUCS-014-15
Published Here: October 5, 2015