Open Data: From Theory to Practice: Case Studies and Commentary from Libraries, Publishers, Funders and Industry


From Theory to Practice marks the first time in the nine-year history of The State of Open Data that a supplementary publication has expanded upon the main report’s years of survey results about open data, which involve tens of thousands of researchers globally.

Each case study and commentary is told from the perspective of a research stakeholder group:

  • Funding bodies: The NIH Generalist Repository Ecosystem Initiative: meeting community needs for FAIR data sharing and discovery
  • Scholarly Publishers: Operationalize data policies through collaborative approaches – the momentum is now
  • University Libraries: One size does not fit all: an investigation into how institutional libraries are tailoring support to their researchers’ needs
  • Industry: How Open Pharma supports responsible data sharing for pharma research publications.

https://tinyurl.com/ytcxprn7

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Evaluating an Instructional Intervention for Research Data Management Training"


At a large research university in Canada, a research data management (RDM) specialist and two liaison librarians partnered to evaluate the effectiveness of an active learning component of their newly developed RDM training program. . . . This study relies on a pre- and post-test quasi-experimental intervention during introductory RDM workshops offered 12 times between February 2022 and January 2023. . . . Comparing the overall average scores for each participant pre- and post-instruction intervention, we find that workshop participants, in general, improved in proficiency. The results of a Wilcoxon signed-rank test demonstrate that the difference between the pre- and post-test observations is statistically significant with a high effect size.
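
For readers unfamiliar with the test named above, the following is a minimal sketch of a Wilcoxon signed-rank computation on paired pre- and post-test scores. It is not the study's analysis: the scores are hypothetical, the function name `wilcoxon_signed_rank` is invented for illustration, the z value uses the plain normal approximation without a tie correction, and the effect size uses one common convention (r = |z| / sqrt(n), with n the number of non-zero pairs) that may differ from the one the authors used.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Return (w_plus, w_minus, z, r) for paired samples.

    Illustrative only: normal approximation, no tie correction in the
    variance, and r = |z| / sqrt(n) as an assumed effect-size convention.
    """
    # Paired differences; zero differences are dropped, as is conventional.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum the ranks of positive and negative differences separately.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation under the null hypothesis of no shift.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    r = abs(z) / math.sqrt(n)
    return w_plus, w_minus, z, r


# Hypothetical workshop scores (not data from the study).
pre_scores = [4, 5, 3, 6, 5, 4, 7, 5]
post_scores = [6, 7, 5, 6, 8, 7, 8, 9]
w_plus, w_minus, z, r = wilcoxon_signed_rank(pre_scores, post_scores)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}, r = {r:.2f}")
```

With samples this small, an exact p-value from a statistics package would be preferable to the normal approximation sketched here.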

https://tinyurl.com/2wvt5bhj

The Research Data Services Landscape at US and Canadian Higher Education Institutions


The following are our high-level findings:

  • While there are wide divergences in the number and variety of services offered both within and across Carnegie Classifications, R1 institutions offer approximately three times the number of services offered by R2s, and more than nine times the number offered by liberal arts colleges.
  • General research data services are the most common type offered regardless of institution type. Statistical services, geospatial services, and visualization services are also common at research universities, which typically offer a much wider range of specialized services than liberal arts colleges.
  • Libraries remain the largest provider of research data services at US and Canadian research universities, but IT and units associated with the research office play important collaborative roles, especially with specialized services.
  • Bioinformatics services are offered almost exclusively through the interdisciplinary units associated with the research office or core facilities associated with medical schools.
  • Consulting services are the most common mode of service provision, comprising almost three quarters of all data services.

https://doi.org/10.18665/sr.320420

"FAIRness of Research Data in the European Humanities Landscape"


This paper explores the landscape of research data in the humanities in the European context, delving into their diversity and the challenges of defining and sharing them. It investigates three aspects: the types of data in the humanities, their representation in repositories, and their alignment with the FAIR principles (Findable, Accessible, Interoperable, Reusable). By reviewing datasets in repositories, this research determines the dominant data types, their openness, licensing, and compliance with the FAIR principles. This research provides important insight into the heterogeneous nature of humanities data, their representation in the repository, and their alignment with FAIR principles, highlighting the need for improved accessibility and reusability to improve the overall quality and utility of humanities research data.

https://doi.org/10.3390/publications12010006

"A Decade of Progress: Insights of Open Data Practices in Biosciences at the University of Edinburgh"


Our analysis of research data sharing from 2014 to 2022 manually reviewed 193 journal articles against criteria for Openness and FAIRness, including the Completeness of data shared relative to data generated. The findings reveal an uptick in data completeness and reusability, with a clear influence of data type, and genomic data being shared more frequently than image data. Data availability statements (DAS) and preprint sharing show a strong correlation with higher Openness and FAIRness scores. Since 2016, when the FAIR guidelines were published, data Reusability increased along with the inclusion of Data Availability Statements. On the other hand, since the COVID-19 pandemic, we have found a substantial increase in preprint sharing and significant improvements in Completeness, Reusability, and Accessibility scores. This paper documents a local research institute’s journey towards Open Data, addressing the changes and advocating for best practices to nurture this progression.

https://doi.org/10.1101/2024.02.18.580901

"The Cost and Price of Public Access to Research Data: A Synthesis"


Beginning on or before 31 December 2025, all recipients of United States federal research funding will be required to make their federally funded scholarly outputs, including scientific data, freely available via public access venues with no delays or embargos. This paper focuses on research data as one of the key scholarly output types impacted by the requirements outlined in the Memorandum on Ensuring Free, Immediate and Equitable Access to Federally Funded Research issued by the US Office of Science and Technology Policy (OSTP), commonly called the "Nelson memo."

This paper sets out working definitions of four key terms: cost, price, reasonable, and allowable. Using these terms, we describe some of the pathways research data take to final publication, and summarize some of the extensive body of research on the costs of research data curation and sharing. We conclude that, for repositories leveraging sources of revenue other than deposit fees or other revenue streams that do not immediately scale up with increased deposits, sustainability is an important concern.

In the process, we look at cost modelling experimentation in the fields of research data management and digital preservation to consider what might be relevant from their approaches. Labour is the most significant cost for repositories and data curation, particularly in support of ingest and access, although the actual cost of data curation in repositories varies by discipline, characteristics of data, and level of curatorial services provided. If "reasonable" cost is not readily generalizable, greater clarity regarding allowable activities and more transparency in repositories' costs would aid researchers and funders in evaluating whether any deposit, membership, or other form of fees that are charged are appropriate for the services rendered. Where some or all of the effort associated with meeting public access requirements is performed by members of the research team, costs could be properly allocated to research and to publication components of grant budgets.

https://doi.org/10.5281/zenodo.10729575

"Stamp—Standardized Data Management Plan for Educational Research: A Blueprint to Improve Data Management across Disciplines"


To provide more tailored, discipline-specific guidance on data management, Science Europe suggested the concept of domain data protocols. Based on this concept, the project Domain Data Protocols for Educational Research developed a first domain data protocol for educational research, titled Standardized Data Management Plan for Educational Research (Stamp). Its multi-level approach includes minimal conditions on managing data according to the FAIR Data Principles and checklists with concrete activities to reach each minimal condition; also included are auxiliary materials to support researchers in educational research in planning, implementing, and realizing different data management activities. Although we developed the Stamp for educational research, its design and flexible structure enables transferring it to other (research) domains and communities. To investigate this flexibility, we organized two workshops, discussing to what extent the Stamp can be used beyond educational research, with representatives from other social science domains as well as from research domains beyond the social sciences. In sum, there was consensus among participants of both workshops on the usability of the Stamp outside educational research, at least if the same types of data are processed and analyzed with similar methods. For other types of data, the Stamp serves as a blueprint to develop further domain data protocols, in terms of standardized data management plans, according to the specific needs of the respective domain.

https://doi.org/10.5334/dsj-2024-007

"Thinking Outside the Black Box: Insights from a Digital Exhibition in the Humanities"


One of the main goals of Open Science is to make research more reproducible. There is no consensus, however, on what exactly "reproducibility" is, as opposed for example to "replicability", and how it applies to different research fields. After a short review of the literature on reproducibility/replicability with a focus on the humanities, we describe how the creation of the digital twin of the temporary exhibition "The Other Renaissance" has been documented throughout, with different methods, but with constant attention to research transparency, openness and accountability. A careful documentation of the study design, data collection and analysis techniques helps reflect and make all possible influencing factors explicit, and is a fundamental tool for reliability and rigour and for opening the "black box" of research.

https://arxiv.org/abs/2402.12000

"Identifying the Most Important Facilitators of Open Research Data Sharing and Reuse in Epidemiology: A Mixed-Methods Study"


To understand how open research data sharing and reuse can be further improved in the field of Epidemiology, this study explores the facilitating role that infrastructural and institutional arrangements play in this research discipline. It addresses two research questions: 1) What influence do infrastructural and institutional arrangements have on open research data sharing and reuse practices in the field of Epidemiology? And 2) how could infrastructural and institutional instruments used in Epidemiology potentially be useful to other research disciplines? First, based on a systematic literature review, a conceptual framework of infrastructural and institutional instruments for open research data facilitation is developed. Second, the conceptual framework is applied in interviews with Epidemiology researchers. The interviews show that two infrastructural and institutional instruments have a very high influence on open research data sharing and reuse practices in the field of Epidemiology, namely (a) access to a powerful search engine that meets open data search needs and (b) support by data stewards and data managers. Third, infrastructural and institutional instruments with a medium, high, or very high influence were discussed in a research workshop involving data stewards and research data officers from different research fields. This workshop suggests that none of the influential instruments identified in the interviews are specific to Epidemiology. Some of our findings thus seem to apply to multiple other disciplines. This study contributes to Science by identifying field-specific facilitators and challenges for open research data in Epidemiology, while at the same time revealing that none of the identified influential infrastructural and institutional instruments were specific to this field. Practically, this implies that open data infrastructure developers, policymakers, and research funding organizations may apply certain infrastructural and institutional arrangements to multiple research disciplines to facilitate and enhance open research data sharing and reuse.

https://doi.org/10.1371/journal.pone.0297969

"From Data Creator to Data Reuser: Distance Matters"


Sharing research data is complex, labor-intensive, expensive, and requires infrastructure investments by multiple stakeholders. Open science policies focus on data release rather than on data reuse, yet reuse is also difficult, expensive, and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when. Data creators cannot anticipate all possible reuses or reusers; our goal is to identify factors that may aid stakeholders in deciding how to invest in research data, how to identify potential reuses and reusers, and how to improve data exchange processes. Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. These dimensions are primarily social in character, with associated technical aspects that can decrease — or increase — distances between creators and reusers. We identify the order of expected influence on data reuse and ways in which the six dimensions are interdependent. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies.

https://arxiv.org/abs/2402.07926

"NIST Releases Version 2.0 of Research Data Framework (RDaF)"


The NIST RDaF is a resource for the entire research data community, including both organizations and individuals engaged in research data management in any discipline, community members from industry, universities, government departments and agencies, funders, and scholarly publishers. . . .

[It is:]

  • a map of the research data management ecosystem, organized by a canonical data lifecycle, into topics and subtopics;
  • a dynamic guide for research data stakeholders to understand best practices in research data management and dissemination;
  • a resource for understanding costs, benefits, and risks associated with research data management;
  • and a consensus document based on inputs and conversations amongst stakeholders in research data.

Providing perhaps the most comprehensive view of the research data ecosystem to date, the NIST RDaF comprises: 6 lifecycle stages, 50 topics, and 335 subtopics (programmatic and operational activities, concepts, and other important factors relevant to research data management with definitions), 14 overarching themes, 8 “generic” profiles (samples for common job functions or roles), and over 1,000 informational references (standards, guidelines, and policies, that assist stakeholders in addressing that subtopic).

http://tinyurl.com/fbep35xu

Open Scholarship in the Humanities


The book begins with the history of digital developments and their influence on the founding of international policies toward open scholarship. The concept of making research more freely available to the broader community, in practice, will require changes across every part of the system: government agencies, funders, university administrators, publishers, libraries, researchers and IT developers. To this end, the book sheds light on the urgent need for partnership and collaboration between diverse stakeholders to address multi-level barriers to both the policy and practical implementation of open scholarship. It also highlights the specific challenges confronted by the humanities which often makes their presentation in accessible open formats more costly and complex. Finally, the authors illustrate some promising international examples and ways forward for their implementation. The book ends by asking the reader to view their role as a researcher, university administrator, or member of government or philanthropic funding body, through new lenses. It highlights how, in our digital era, the frontiers through which knowledge is being advanced and shared can reshape the landscape for academic research to have the greatest impact for society.

http://tinyurl.com/2453s6du

"Realities of Academic Data Sharing (RADS) Initiative Releases Reports on Expenses of Making Data Publicly Accessible, Project Methodology"


This report presents data on the average yearly cost of DMS activities for institutional units, as well as direct DMS expenses incurred by researchers per funded research project. These expenses were then analyzed together, showing an average combined overall cost of $2,500,000 (with total institutional expenses ranging from approximately $800,000 to over $6,000,000).

http://tinyurl.com/5xsw32we

"Enhancing the FAIRness of Arctic Research Data Through Semantic Annotation"


The National Science Foundation’s Arctic Data Center is the primary data repository for NSF-funded research conducted in the Arctic. There are major challenges in discovering and interpreting resources in a repository containing data as heterogeneous and interdisciplinary as those in the Arctic Data Center. This paper reports on advances in cyberinfrastructure at the Arctic Data Center that help address these issues by leveraging semantic technologies that enhance the repository’s adherence to the FAIR data principles and improve the Findability, Accessibility, Interoperability, and Reusability of digital resources in the repository. We describe the Arctic Data Center’s improvements. We use semantic annotation to bind metadata about Arctic data sets with concepts in web-accessible ontologies. The Arctic Data Center’s implementation of a semantic annotation mechanism is accompanied by the development of an extended search interface that increases the findability of data by allowing users to search for specific, broader, and narrower meanings of measurement descriptions, as well as through their potential synonyms. Based on research carried out by the DataONE project, we evaluated the potential impact of this approach, regarding the accessibility, interoperability, and reusability of measurement data. Arctic research often benefits from having additional data, typically from multiple, heterogeneous sources, that complement and extend the bases – spatially, temporally, or thematically – for understanding Arctic phenomena. These relevant data resources must be ‘found’, and ‘harmonized’ prior to integration and analysis. The findings of a case study indicated that the semantic annotation of measurement data enhances the capabilities of researchers to accomplish these tasks.

https://doi.org/10.5334/dsj-2024-002

"Data Management in Distributed, Federated Research Infrastructures: The Case of EPOS"


Data management is a key activity when Open Data stewardship through services complying with the FAIR principles is required, as it happens in many National and European initiatives. Existing guidelines and tools facilitate the drafting of Data Management Plans by focusing on a set of common parameters or questions. In this paper we describe how data management is carried out in EPOS, the European Research Infrastructure for providing access to integrated data and services in the solid Earth domain. EPOS relies on a federated model and is committed to remain operational in the long term. In EPOS, five key dimensions were identified for the Federated Data Management, namely the management of: thematic data; e-infrastructure for data integration; community of data providers committed to data provision processes; sustainability; and policies. On the basis of the EPOS experience, which is to some extent applicable to other research infrastructures, we propose additional components that may extend the EU Horizon 2020 Data Management Guidelines template, thus comprehensively addressing the Federated Data Management in the context of distributed Research Infrastructures.

https://doi.org/10.5334/dsj-2024-005

"Do Disappearing Data Repositories Pose a Threat to Open Science and the Scholarly Record?"


Only a little more than half of the research data repositories in the sample have detailed strategies they use to mitigate data loss. It is important to note that none of the strategies analysed offers a permanent solution; instead, infrastructure maintenance requires continuous efforts. The burden of infrastructure maintenance and data preservation is currently placed on individual repositories alone; preservation systems comparable to those for scholarly texts, such as CLOCKSS, are not widespread and can be difficult to realise.

http://tinyurl.com/3snrhxpk

"DataCite Launches First Release of the Data Citation Corpus"


DataCite, in partnership with the Chan Zuckerberg Initiative (CZI), is delighted to announce the first release of the Data Citation Corpus. A major milestone in the Make Data Count initiative, the release makes eight million data citations openly available and usable for the first time via an interactive dashboard and public data file.

https://makedatacount.org/first-release-of-the-open-global-data-citation-corpus/

"Agile Research Data Management with Open Source: LinkAhead"


Research data management (RDM) in academic scientific environments is increasingly coming into focus as an important part of good scientific practice and as a topic with great potential for saving time and money. Nevertheless, there is a shortage of appropriate tools that fulfill the specific requirements of scientific research. We identified where the requirements in science deviate from other fields and proposed a list of requirements which RDM software should meet to become a viable option. We analyzed a number of currently available technologies and tool categories for matching these requirements and identified areas where no tools can satisfy researchers’ needs. Finally, we assessed the open-source RDMS (research data management system) LinkAhead for compatibility with the proposed features and found that it fulfills the requirements in the area of semantic, flexible data handling in which other tools show weaknesses.

https://doi.org/10.48694/inggrid.3866

"RDA Professionalising Data Stewardship — What Does a Career Track for Data Stewards Look Like?"


The report "What does a career track for data stewards look like?" provides an initial discussion of the results of the RDA IG Professionalising Data Stewardship career tracks survey completed in 2022. The survey asked respondents who self-identified as performing data stewardship roles about their job titles, educational background, match between educational background and area of professional activity, contract types, as well as future career perspectives. The report is of value to international and national projects and initiatives seeking to define and develop the professional role of data stewards, as well as to organizations employing data stewards that seek to define career progression pathways and improve job satisfaction and job security for data stewards. The report is also of value to the emerging communities of data stewards because it provides an evidence base for understanding which professionals already fulfill data stewardship roles and how these professionals perceive their career paths.

https://zenodo.org/records/10571388

"Towards a Shared Framework: A Classificatory Matrix for Teaching Data Standards"


Standards for research data can be a mystifying topic for both researchers and data professionals. A common source of confusion is that they are multipurpose: standards can (and should) be applied to both primary data and metadata, enabling a wide range of functions from the search features in a repository to the integration of disparate data sources. This paper reviews examples of classificatory approaches used by both librarians and researchers to describe data standards. This literature is synthesized into a classificatory matrix that can be used to map different types of standards. The matrix is constructed around two organizing principles: purpose (finding or using data) and type of information controlled (meaning or syntax). The objective of this classificatory exercise is to encourage further discussion about the misunderstandings between researchers and data support professionals and to spur further development of the educational resources needed to improve understanding and use of data standards.

https://doi.org/10.7191/jeslib.758

"Decades of Transformation: Evolution of the NASA Astrophysics Data System’s Infrastructure"


The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years, the ADS has gone from being an astronomy-focused bibliographic database to an open digital library system supporting research in space and (soon) earth sciences. This paper describes the evolution of the ADS system, its capabilities, and the technological infrastructure underpinning it.

https://arxiv.org/abs/2401.09685

"On the Readiness of Scientific Data for a Fair and Transparent Use in Machine Learning"


To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. Besides, data-sharing practices in many scientific domains have evolved in recent years for reproducibility purposes. In this sense, the adoption of these practices by academic institutions has encouraged researchers to publish their data and technical documentation in peer-reviewed publications such as data papers. In this study, we analyze how this scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers of different domains, assessing their completeness and coverage of the requested dimensions, and trends in recent years, putting special emphasis on the most and least documented dimensions. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data’s preparedness for its transparent and fairer use in ML technologies.

https://arxiv.org/abs/2401.10304

Scaling Up: How Data Curation Can Help Address Key Issues in Qualitative Data Reuse and Big Social Research


This book explores the connections between qualitative data reuse, big social research, and data curation. A review of existing literature identifies the key issues of context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership. Through interviews of qualitative researchers, big social researchers, and data curators, the author further examines each key issue and produces new insights about how domain differences affect each community of practice’s viewpoints, different strategies that researchers and curators use to ensure responsible practice, and different perspectives on data curation.

https://doi.org/10.1007/978-3-031-49222-8

"Towards a Quality Indicator for Research Data Publications and Research Software publications — A Vision from the Helmholtz Association"


Research data and software are widely accepted as an outcome of scientific work. However, in comparison to text-based publications, there is not yet an established process to assess and evaluate the quality of research data and research software publications. This paper presents an attempt to fill this gap. Initiated by the Working Group Open Science of the Helmholtz Association, the Task Group Helmholtz Quality Indicators for Data and Software Publications is currently developing a quality indicator for research data and research software publications to be used within the Association. This report summarizes the group's vision of what contributes to such an indicator. The proposed approach relies on generic well-established concepts for quality criteria, such as the FAIR Principles and the COBIT Maturity Model. It deliberately does not limit itself to technical implementation possibilities, to avoid using an existing metric for a new purpose. The intention of this paper is to share the current state for further discussion with all stakeholders, particularly with other groups also working on similar metrics but also with entities that use the metrics.

https://arxiv.org/abs/2401.08804

"Data Stewardship: Case Studies from North-American, Dutch, and Finnish Universities"


This work seeks to elaborate the picture of different data stewardship programs running in different institutional arrangements and research environments. Design/methodology/approach – Drawing from autoethnography and case study methods, this study described three distinct data stewardship programs from Purdue University (United States), Delft Technical University (Netherlands) and Aalto University (Finland). In addition, this work investigated the institutional arrangements and national research environments of the programs. The focus was on initiatives led by academic libraries or similar services. Findings – This work demonstrates that data stewardship may be understood differently within different national and institutional contexts. The data stewardship programs differed in terms of roles, organization and funding structures. Moreover, the mesh of policies and legislation, organizational structures, and national infrastructures differed.

https://arxiv.org/abs/2312.04092
