Academic Commons

Presentations (Communicative Events)

When citing data, what thing are you actually citing?

Sacchi, Simone

When citing data, what thing are you actually citing? In particular, when a data identifier (like a DOI) is included in the citation, what exactly does that identifier denote? The trivial answer, "that data," will not help us understand the practical implications of assigning identifiers to scientific products, nor will it help us evolve identifier systems that support data sharing, linked open data, and service-oriented architectures. Identifiers included in citations often function as citable locators (Duerr et al., 2011), supporting discoverability, retrieval, and access: "such an identifier should lead readers at any time in the future to the exact data used in the work that led to the publication" (Duerr et al., 2011). This works reasonably well in communities with shared practices and expectations. But it leaves open the question of precisely what sort of thing we are being led to, and of how we can interpret an identifier as referring to a thing rather than leading us to it. For instance, when the same data is made available in different encoding formats, what is the actual thing the identifiers denote? And what is the expected outcome of their resolution? Drawing on previous work conducted by the Data Conservancy Data Concepts group at the University of Illinois at Urbana-Champaign (Renear et al., 2010; Sacchi et al., 2011; Sacchi et al., 2012; Wickett et al., 2012), this poster is intended to engage the data curation community in a discussion of how we might begin to approach this problem by assigning identifiers at different levels of abstraction. To initiate this discussion we propose three major levels: (a) data content (the scientific content carried by a set of data), (b) data itself (the primary expression of that content), and (c) data product (encodings of that data) (Sacchi et al., 2011).
By asking the community to discuss how they use existing identifier schemes and what levels are important to them, we hope to elicit new implications for consistent data citation practice and data reuse.
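
As a rough illustration of the three proposed levels, the following Python sketch models data content, data, and data product as separate objects, each carrying its own identifier. It is only one possible reading of the proposal, not the authors' implementation: the class names, the field names, the `denote_same_data` helper, and the `example:` identifier strings are all hypothetical and are not drawn from any real identifier scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContent:
    """Level (a): the scientific content carried by a set of data."""
    identifier: str
    description: str


@dataclass(frozen=True)
class Data:
    """Level (b): the primary expression of that content."""
    identifier: str
    content: DataContent


@dataclass(frozen=True)
class DataProduct:
    """Level (c): one encoding of the data (for example, a CSV file)."""
    identifier: str
    data: Data
    media_type: str


def denote_same_data(a: DataProduct, b: DataProduct) -> bool:
    """True when two products are encodings of the same level (b) data."""
    return a.data.identifier == b.data.identifier


# Hypothetical identifiers, made up purely for illustration.
content = DataContent("example:content/1", "a hypothetical set of observations")
data = Data("example:data/1", content)
csv_product = DataProduct("example:product/1-csv", data, "text/csv")
json_product = DataProduct("example:product/1-json", data, "application/json")

# The two encodings have different product-level identifiers
# but share the same data-level and content-level identifiers.
print(csv_product.identifier != json_product.identifier)
print(denote_same_data(csv_product, json_product))
```

Under this reading, a citation that points at the product-level identifier would resolve to one specific encoding, while a citation that points at the data-level identifier would leave the choice of encoding open; which of these a community expects is exactly the question the poster raises.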



More About This Work

Academic Units
Libraries and Information Services
Published Here
February 27, 2013


Presented on February 27, 2013 at Research Data Symposium, Columbia University, New York, NY.