Identifying Functionally Similar Code in Complex Codebases

Su, Fang-Hsiang; Bell, Jonathan; Kaiser, Gail E.; Sethumadhavan, Simha

Identifying similar code in software systems can assist many software engineering tasks, including program understanding. While most approaches focus on identifying code that looks alike, some researchers instead propose detecting code that functions alike; such code is known as functional clones. However, previous work has highlighted the technical challenges of detecting functional clones in object-oriented languages such as Java. We propose In-Vivo Clone Detection, a novel, language-agnostic technique that detects functional clones in arbitrary programs by observing and mining inputs and outputs. We implemented this technique for programs that run on the JVM, creating HitoshiIO (freely available on GitHub), a tool to detect functional code clones. Our experimental results show that HitoshiIO is effective at detecting functional clones, finding 185 methods that are functionally similar across a corpus of 118 projects, even when only a few inputs are available. In a random sample of the detected clones, HitoshiIO achieves a true positive rate of 68+%, while the false positive rate is only 15%.
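The abstract describes detecting functional clones by observing and mining inputs and outputs. As a rough illustration of that idea only, the Java sketch below records each method's observed input/output pairs (its "I/O profile") and scores two methods by the fraction of shared inputs on which their outputs agree. The class name `IoProfileSketch`, the profile format, the `similarity` measure, and the 0.8 `THRESHOLD` are hypothetical simplifications chosen for this example. They are not HitoshiIO's actual data model, scoring function, or parameters. In a real in-vivo setting the inputs would be captured from running programs rather than supplied as a fixed list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

public class IoProfileSketch {
    // Hypothetical cutoff for reporting a pair as functional clones (not HitoshiIO's value).
    static final double THRESHOLD = 0.8;

    // Stand-in for inputs that, in an in-vivo setting, would be captured from real executions.
    static final List<List<Integer>> DEMO_INPUTS =
            List.of(List.of(1, 2, 3), List.of(5), List.of(), List.of(4, -1));

    // Build an I/O profile: each observed input mapped to the output the method produced.
    static Map<List<Integer>, Integer> profile(Function<List<Integer>, Integer> method,
                                               List<List<Integer>> observedInputs) {
        Map<List<Integer>, Integer> io = new HashMap<>();
        for (List<Integer> input : observedInputs) {
            io.put(input, method.apply(input));
        }
        return io;
    }

    // Hypothetical score: fraction of inputs seen by both methods on which the outputs are equal.
    static double similarity(Map<List<Integer>, Integer> a, Map<List<Integer>, Integer> b) {
        int shared = 0;
        int agree = 0;
        for (Map.Entry<List<Integer>, Integer> entry : a.entrySet()) {
            if (b.containsKey(entry.getKey())) {
                shared++;
                if (Objects.equals(entry.getValue(), b.get(entry.getKey()))) {
                    agree++;
                }
            }
        }
        return shared == 0 ? 0.0 : (double) agree / shared;
    }

    // Profile both methods on the same observed inputs and score the pair.
    static double compare(Function<List<Integer>, Integer> f,
                          Function<List<Integer>, Integer> g,
                          List<List<Integer>> inputs) {
        return similarity(profile(f, inputs), profile(g, inputs));
    }

    static boolean isClone(double score) {
        return score >= THRESHOLD;
    }

    // Two syntactically different implementations of the same behavior...
    static Integer sumLoop(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    static Integer sumStream(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).sum();
    }

    // ...and one method that behaves differently on most inputs.
    static Integer maxOrZero(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        double s1 = compare(IoProfileSketch::sumLoop, IoProfileSketch::sumStream, DEMO_INPUTS);
        double s2 = compare(IoProfileSketch::sumLoop, IoProfileSketch::maxOrZero, DEMO_INPUTS);
        System.out.println("sumLoop vs sumStream: " + s1 + " clone=" + isClone(s1));
        System.out.println("sumLoop vs maxOrZero: " + s2 + " clone=" + isClone(s2));
    }
}
```

Running `main` gives a score of 1.0 for the two summing methods (reported as a clone) and 0.5 for the sum/max pair, which agree only on `[5]` and the empty list (not reported). Comparing outputs only on inputs that both methods actually saw reflects the premise that behavior is judged from observed executions rather than from source text. How HitoshiIO itself handles disjoint or differently typed inputs and outputs is not covered by this sketch.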


Academic unit: Department of Computer Science, Columbia University
Series: Columbia University Computer Science Technical Reports, CUCS-003-16
Published: December 13, 2016