Theses Doctoral

Adaptive Sampling for Targeted Software Testing

Shah, Abhishek

Targeted software testing is a critical task in development of secure software. The core challenge of targeted software testing is to generate many inputs that reach specific code target locations in a given program. However, this task is challenging because it is NP-hard in theory and real-world programs contain very large input spaces and many lines of code, making this difficult in practice.

In this thesis, I introduce a new approach for targeted software testing based on adaptive sampling. The key insight is to reduce the original problem to a sequence of approximate counting problems, and I apply this approach to targeted software testing in two stages.

First, to find a single target-reaching input when no such input is given, I develop a new search algorithm MC2 that adaptively uses approximate-count feedback to narrow down which input region is more likely to contain a target-reaching input using probabilistic bisection.

Second, given a single target-reaching input, I develop a new set approximation algorithm ProgramSampler that adaptively learns an approximation of the set of target-reaching inputs based on approximate-count feedback, where the set approximation can be efficiently uniformly sampled for many target-reaching inputs.

Backed by theoretical guarantees, these techniques have been highly effective in practice: outperforming existing methods on average by 1-2 orders of magnitude.

Files

This item is currently under embargo. It will be available starting 2029-06-26.

More About This Work

Academic Units
Computer Science
Thesis Advisors
Jana, Suman
Degree
Ph.D., Columbia University
Published Here
July 10, 2024