Unrestricted Versus Regulated Open Data Governance: A Bibliometric Comparison of SARS-CoV-2 Nucleotide Sequence Databases

Nathanael Sheehan, Federico Botta, Sabina Leonelli
DOI: https://doi.org/10.1101/2023.05.13.540634
2024-03-25
Abstract: Two distinct modes of data governance have emerged in accessing and reusing viral data pertaining to COVID-19: an unrestricted model, espoused by data repositories that are part of the International Nucleotide Sequence Database Collaboration, and a regulated model promoted by the Global Initiative on Sharing All Influenza Data. In this paper, we focus on publications mentioning either infrastructure in the period between January 2020 and January 2023, thus capturing a period of acute response to the COVID-19 pandemic. Through a variety of bibliometric and network science methods, we compare the extent to which either data infrastructure facilitated collaboration from different countries around the globe to understand how data reuse can enhance forms of diversity between institutions, countries, and funding groups. Our findings reveal disparities in representation and usage between the two data infrastructures. We conclude that both approaches offer useful lessons, with the unrestricted model providing insights into complex data linkage and the regulated model demonstrating the importance of global representation.
Genetics
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper aims to compare the effectiveness of two different data governance models in the acquisition and reuse of virus data related to COVID-19. Specifically, the paper focuses on the unrestricted data sharing model supported by the International Nucleotide Sequence Database Collaboration (INSDC) and the regulated data sharing model promoted by the Global Initiative on Sharing All Influenza Data (GISAID). The authors use a series of bibliometric and network science methods to compare the extent to which these two data infrastructures facilitate global collaboration. They also explore how data reuse enhances diversity among institutions, countries, and funding groups. The findings reveal differences in representation and usage between the two data infrastructures and conclude that both approaches offer valuable lessons: the unrestricted model provides insights into complex data linkages, while the regulated model demonstrates the importance of global representation.