Academic Commons

The case of the missing eights. An object lesson in data quality assurance

Stellman, Steven D.

Data analysis is an integral part of the training of epidemiologists, but the computer-based data management and quality control (QC) procedures that prepare raw data for analysis are often overlooked. Cancer Prevention Study 2 (CPS-2) is a cohort study of 1.2 million American men and women begun by the American Cancer Society in 1982. During data preparation for a study of diet and cancer, it was found that the distribution of the number of missing items out of 28 possible foods was monotonic, as expected, except that no individuals were missing exactly 8 or 18 items. These anomalous “holes” in the distribution were traced to a programming error within a section of QC code that confused a zero with the letter O. One lesson learned is that simple frequency tabulations to identify missing, out-of-range, or miscoded individual data items, along with more complex assessments of permissible combinations of multiple items, should be supplemented by content-sensitive tests.
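The kind of check described in the abstract can be illustrated with a short Python sketch. It is not the study's QC code: the function names, the use of None to mark a missing item, and the synthetic records are all assumptions made for illustration. It tabulates how many records are missing exactly k of the 28 food items and reports any count k that has zero frequency while both neighbouring counts are populated.

```python
from collections import Counter

N_ITEMS = 28  # number of food items, as stated in the abstract


def missing_count_distribution(records, n_items=N_ITEMS):
    """Return a list whose k-th entry is the number of records missing exactly k items.

    Assumption: a missing item is represented by None.
    """
    counts = Counter(sum(1 for value in rec if value is None) for rec in records)
    return [counts.get(k, 0) for k in range(n_items + 1)]


def find_holes(freq):
    """Return every k with zero frequency whose two neighbours are both non-zero."""
    return [
        k
        for k in range(1, len(freq) - 1)
        if freq[k] == 0 and freq[k - 1] > 0 and freq[k + 1] > 0
    ]


# Synthetic data (hypothetical, not CPS-2 data): frequencies fall steadily as
# the number of missing items rises, but no record is missing exactly 8 or 18.
records = []
for k in range(21):
    if k in (8, 18):
        continue
    record = [None] * k + [1] * (N_ITEMS - k)
    records.extend([record] * (30 - k))

holes = find_holes(missing_count_distribution(records))
print(holes)  # [8, 18]
```

A check like this only flags where the distribution is implausible. Tracing the cause, such as a zero typed as the letter O in the QC code, still requires inspecting the program that produced the counts.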

Files

  • Stellman 1989 AJE_Missing_Eights.pdf (application/pdf, 592 KB)

Also Published In

Title
American Journal of Epidemiology
DOI
https://doi.org/10.1093/oxfordjournals.aje.a115200

More About This Work

Academic Units
Epidemiology
Published Here
September 4, 2019