Parameterizing Random Test Data According to Equivalence Classes

Christian Murphy; Gail E. Kaiser; Marta Arias

Type: Technical reports
Department: Computer Science
Series: Columbia University Computer Science Technical Reports
Publisher: Department of Computer Science, Columbia University
Publisher Location: New York
We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. Random testing is one solution, but it may have limited effectiveness in cases in which a reliable test oracle does not exist, as is the case for the machine learning applications of interest. To address this problem, we have developed an approach to creating data sets called "parameterized random data generation". Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness while still retaining control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications.
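The general idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' framework: the equivalence classes, their (low, high) range parameters, and the function names below are all hypothetical.

```python
import random

# Hypothetical equivalence classes for a numeric attribute; each class
# is described by the parameters (here a value range) used to generate
# members of that class. These names and ranges are illustrative only.
EQUIVALENCE_CLASSES = {
    "negative": (-1000.0, -0.001),
    "zero": (0.0, 0.0),
    "positive": (0.001, 1000.0),
}

def generate_data_set(class_names, size, seed=None):
    """Randomly generate `size` values, drawing each one from a
    randomly chosen class among `class_names` and using that class's
    range as the generation parameters."""
    rng = random.Random(seed)
    ranges = [EQUIVALENCE_CLASSES[name] for name in class_names]
    data = []
    for _ in range(size):
        low, high = rng.choice(ranges)  # pick one of the allowed classes
        data.append(rng.uniform(low, high))
    return data

# Isolate a single equivalence class...
only_positive = generate_data_set(["positive"], 100, seed=1)
# ...or combine several, while still generating the values randomly.
mixed = generate_data_set(["negative", "positive"], 100, seed=2)
```

Because the classes are selected explicitly while the individual values are drawn at random, large data sets can be produced cheaply without giving up control over which parts of the input space are exercised.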
Subject: Computer science
Suggested Citation:
Christian Murphy, Gail E. Kaiser, Marta Arias, 2007, Parameterizing Random Test Data According to Equivalence Classes, Columbia University Academic Commons, http://hdl.handle.net/10022/AC:P:29515.
