Curation of Scientific Data at Risk of Loss: Data Rescue and Dissemination

Robert R. Downs; Robert S. Chen

Book chapters
Center for International Earth Science Information Network
Book/Journal Title:
Curating research data. Volume one, Practical strategies for your digital repository
Book Author:
Johnston, Lisa
Publisher:
Association of College and Research Libraries, a division of the American Library Association
Data rescue offers an opportunity for digital repositories, including institutional repositories, data archives, and scientific data centers, to provide access to potentially valuable scientific data that is at risk of being lost. Rescue may be valuable not only to restore access to data of past scientific interest, such as environmental observations or social surveys, but also to recover historic information about the state of knowledge and science at the time the data was collected or assembled. Scientific data may need to be rescued at any stage along the data life cycle, and the extent of data curation that was completed prior to a data rescue effort may vary, depending on the circumstances that led to the need for data rescue. The level of effort required to complete a data rescue depends largely on the condition of the data being rescued, the availability and quality of data documentation and provenance information, and the accessibility of the data producers. In extreme cases, data organization and documentation are poor, and those knowledgeable about how the data was collected or developed are no longer available. In some cases, collections of data sets may need to be rescued from an existing archive that is no longer sustainable. In short, scientific data may be at risk of loss for a variety of reasons, and a data rescue effort can present new challenges for data curation and dissemination operations.
We report here on a recent effort by the NASA Socioeconomic Data and Applications Center (SEDAC) to rescue the Millennium Ecosystem Assessment (MA) collection of scientific data as a case study on the issues raised by a data rescue effort from an existing archive that had not fully curated the original data. The MA was an international survey of the world’s ecosystems conducted by the scientific community in 2001–2005 involving more than 1,300 experts from around the world.
As part of the MA, a diverse set of environmental and socioeconomic data was assembled and integrated in order to enable scientific analysis and assessment in support of policy and decision making. This data was held by the US Geological Survey (USGS) National Biological Information Infrastructure (NBII), which was terminated by the US government in early 2012. This case study describes what happened to the data after the MA was completed, why data rescue was subsequently needed, the process used to decide on the data rescue effort, and the subsequent issues and challenges addressed in rescuing the MA data. The core preservation need for the MA collection is described along with the tradeoffs involved in conducting the data rescue. Based on the case study, we summarize lessons learned from the data rescue effort, including lessons for projects that create or collect data, for repositories that acquire data from such projects, and for those engaged in rescuing data. Of course, whether there will be significant scientific or historical benefit resulting from this rescue effort remains to be seen.
Information science
Data libraries
Digital preservation
Suggested Citation:
Robert R. Downs, Robert S. Chen, Curation of Scientific Data at Risk of Loss: Data Rescue and Dissemination, Columbia University Academic Commons.
