Uncovering the Loss of Contextual Information in Data Sharing: An Analysis of Missing Metadata in Research Data Repository

Title

Uncovering the Loss of Contextual Information in Data Sharing: An Analysis of Missing Metadata in Research Data Repository

Description

Presentation at the 2024 CALA Midwest Annual Conference

Creator

Jiang, Tianji

Publisher

Chinese American Librarians Association

Date

2024-05-24

Rights

https://creativecommons.org/licenses/by/4.0

Language

eng

Type

Presentation

Abstract

“Data sharing” generally refers to the act of releasing data in a form that can be used by other individuals, which requires disseminating not only the data file itself but also its contextual information, such as a description of the purpose of the study for which the data was originally collected and of the methods used for collecting, processing, and verifying the data. Metadata has been found to be an effective way to convey the contextual information of data, just as it has for books in libraries and records in archives. Libraries have a natural affinity with the goals of the open data movement, namely improving the availability, findability, re-usability, and curation of research data. To achieve these goals, librarians have been working on the documentation of data, such as designing and updating metadata schemas for research data, cataloging deposited data according to these schemas, and gathering metadata for data housed elsewhere. However, current practices of documenting research data suffer from a missing-value problem, which can undermine reproducibility, reduce re-usability, and even prevent the discovery of data. In this study, I analyze the problem of missing values using empirical evidence from two open catalogs that hold metadata for open research data: DataCite Commons and OpenAlex. By counting the missing values for each metadata item, I found that some metadata fields are missing in a surprisingly high proportion of records, including the DOI, title, affiliations of data creators, and funding information relevant to the data. In my presentation, I will present the details of my findings and my explanations for the missing values. I will also share my perspectives on the future roles of libraries in improving metadata schemas for research data and discuss additional ways librarians can support the sharing and re-use of research data.
The insights from this study will contribute to the development of research data metadata schemas in the future, thereby enhancing the infrastructure for research data sharing and promoting effective re-use of research data.
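
The per-field count of missing values described in the abstract can be illustrated with a minimal sketch. It is not the author's actual method or code: the record layout, the field names (`doi`, `title`, `affiliations`, `funding`), the sample records, and the rule for what counts as "missing" (absent key, `None`, blank string, or empty list) are all assumptions made for illustration, and no real DataCite Commons or OpenAlex data or API is used.

```python
from collections import Counter


def is_missing(value):
    """Assumed rule: None, blank strings, and empty containers count as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def missing_value_rates(records, fields):
    """Return, for each field, the proportion of records where it is missing."""
    counts = Counter()
    for record in records:
        for field in fields:
            # record.get() returns None when the key is absent entirely.
            if is_missing(record.get(field)):
                counts[field] += 1
    total = len(records)
    return {field: (counts[field] / total if total else 0.0) for field in fields}


# Hypothetical records; real catalog records would have a different structure.
SAMPLE_RECORDS = [
    {"doi": "10.1234/example.1", "title": "Survey data",
     "affiliations": ["Example University"], "funding": []},
    {"doi": "", "title": "Sensor readings",
     "affiliations": [], "funding": None},
    {"doi": "10.1234/example.3", "title": "  ",
     "affiliations": [], "funding": ["Example Funder"]},
    {"doi": "10.1234/example.4", "title": "Interview transcripts"},
]
FIELDS = ["doi", "title", "affiliations", "funding"]

rates = missing_value_rates(SAMPLE_RECORDS, FIELDS)
for field in FIELDS:
    print(f"{field}: {rates[field]:.0%} missing")
```

Reporting a proportion rather than a raw count makes fields comparable across catalogs of different sizes. The definition of "missing" is the main design choice, since a field that is present but blank conveys no more context than one that is absent.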

Citation

Jiang, Tianji, “Uncovering the Loss of Contextual Information in Data Sharing: An Analysis of Missing Metadata in Research Data Repository,” CALASYS - CALA Academic Resources & Repository System, accessed March 4, 2026, http://ir.cala-web.org/items/show/1471.