2016 Theses Doctoral
Making Software More Reliable by Uncovering Hidden Dependencies
As software grows in size and complexity, it also becomes more interdependent. Multiple internal components often share state and data. Whether these dependencies are intentional or not, we have found that their mismanagement often poses several challenges to testing. This thesis seeks to make it easier to create reliable software by making testing more efficient and more effective through explicit knowledge of these hidden dependencies.
The first problem that this thesis addresses, reducing testing time, directly impacts the day-to-day work of every software developer. The frequency with which code can be built (compiled, tested, and package) directly impacts the productivity of developers: longer build times mean a longer wait before determining if a change to the application being build was successful. We have discovered that in the case of some languages, such as Java, the vast majority of build time is spent running tests. Therefore, it's incredibly important to focus on approaches to accelerating testing, while simultaneously making sure that we do not inadvertently cause tests to erratically fail (i.e. become flaky).
Typical techniques for accelerating tests (like running only a subset of them, or running them in parallel) often can't be applied soundly, since there may be hidden dependencies between tests. While we might think that each test should be independent (i.e. that a test's outcome isn't influenced by the execution of another test), we and others have found many examples in real software projects where tests truly have these dependencies: some tests require others to run first, or else their outcome will change. Previous work has shown that these dependencies are often complicated, unintentional, and hidden from developers. We have built several systems, VMVM and ElectricTest, that detect different sorts of dependencies between tests and use that information to soundly reduce testing time by several orders of magnitude.
In our first approach, Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight, virtualization-like container, preventing these dependencies from manifesting. Our realization of Unit Test Virtualization for Java, VMVM eliminates the need to run each test in its own process, reducing test suite execution time by an average of 62% in our evaluation (compared to execution time when running each test in its own process).
However, not all test suites isolate their tests: in some, dependencies are allowed to occur between tests. In these cases, common test acceleration techniques such as test selection or test parallelization are unsound in the absence of dependency information. When dependencies go unnoticed, tests can unexpectedly fail when executed out of order, causing unreliable builds. Our second approach, ElectricTest, soundly identifies data dependencies between test cases, allowing for sound test acceleration.
To enable more broad use of general dependency information for testing and other analyses, we created Phosphor, the first and only portable and performant dynamic taint tracking system for the JVM. Dynamic taint tracking is a form of data flow analysis that applies labels to variables, and tracks all other variables derived from those tagged variables, propagating those tags. Taint tracking has many applications to software engineering and software testing, and in addition to our own work, researchers across the world are using Phosphor to build their own systems. Towards making testing more effective, we also created Pebbles, which makes it easy for developers to specify data-related test oracles on mobile devices by thinking in terms of high level objects such as emails, notes or pictures.
- Bell_columbia_0054D_13413.pdf binary/octet-stream 1.39 MB Download File
More About This Work
- Academic Units
- Computer Science
- Thesis Advisors
- Kaiser, Gail E.
- Ph.D., Columbia University
- Published Here
- June 2, 2016