Academic Commons

Presentations (Communicative Events)

Definitions of Dataset in the Scientific and Technical Literature

Renear, Allen H.; Sacchi, Simone; Wickett, Karen M.

The integration of heterogeneous data in varying formats and from diverse communities requires an improved understanding of the concept of a dataset, and of key related concepts, such as format, encoding, and version. Ultimately, a normative formal framework of such concepts will be needed to support the effective curation, integration, and use of shared multi-disciplinary scientific data. To prepare for the development of this framework we reviewed the definitions of dataset found in technical documentation and the scientific literature. Four basic features can be identified as common to most definitions: grouping, content, relatedness, and purpose. In this summary of our results we describe each of these features, indicating the directions a more formal analysis might take.

Also Published In

Title
Proceedings of the American Society for Information Science and Technology
DOI
https://doi.org/10.1002/meet.14504701240

More About This Work

Academic Units
Libraries and Information Services
Published Here
July 1, 2014
Academic Commons provides global access to research and scholarship produced at Columbia University, Barnard College, Teachers College, Union Theological Seminary and Jewish Theological Seminary. Academic Commons is managed by the Columbia University Libraries.