Study on the Readiness of Research Data and Literature Repositories to Facilitate Compliance With the Open Science Horizon Europe MGA Requirement

In this study we analysed 220 repositories and, via a structured methodology, identified 165 trusted repositories and tested their readiness to facilitate compliance with the HE MGA Open Science requirements.

We show that it is not straightforward to assess whether a given repository is suitable to facilitate compliance with the HE MGA requirements. This is mainly due to varying interpretations of definitions and requirements, the uneven public availability of information on repository specifications, and the high level of technical expertise needed to assess all requirements.

We highlight that repository registries, such as FAIRsharing, re3data or the CoreTrustSeal (CTS) website, are not sufficient on their own to assess the readiness of repositories to facilitate compliance with the HE MGA requirements, as the definition of what constitutes a trusted repository is subtle and varied and needs to be carefully interpreted and applied to repositories. This is also the case for related concepts such as community endorsement or for policy requirements in terms of preservation, curation and security of the repository contents.

https://doi.org/10.5281/zenodo.7728016

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

Paywall: "Characterizing Data Practices in Research Papers Across Four Disciplines"


In this paper, we focus on the five most common types of research data practices (RDP): collecting data, processing data, analyzing data, representing data, and publishing or citing data. First, we compared the distributions of the five types of RDP across disciplines and observed noticeable differences between disciplines. In addition, we examined the characteristics of each type of RDP under different disciplinary contexts by developing a discipline-specific RDP vocabulary using the tf-idf approach. Based on the common terms as well as the discipline-specific ones, we found that the five types of RDP can be distinctly conceptualized, while each type of RDP varies by discipline in terms of its action, object, and instrument.
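
The abstract does not spell out how the tf-idf vocabulary was built. Below is a minimal sketch, assuming each discipline's RDP sentences are pooled into one document and tokenized on whitespace; the helpers `tfidf_by_discipline` and `top_terms` and the toy corpus are hypothetical, not the authors' code or data. Terms that occur in every discipline get an inverse document frequency of log(1) = 0, so the highest-scoring terms approximate a discipline-specific vocabulary.

```python
import math
from collections import Counter


def tfidf_by_discipline(corpus: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score terms per discipline, treating each discipline as one document.

    Hypothetical helper: naive lowercase/whitespace tokenization, raw tf / length,
    and idf = log(N / df), where df counts disciplines containing the term.
    """
    docs = {d: [tok for s in sents for tok in s.lower().split()] for d, sents in corpus.items()}
    n_docs = len(docs)
    # Document frequency: number of disciplines in which each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for disc, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        # Terms shared by all disciplines get idf = log(1) = 0 and drop out.
        scores[disc] = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
    return scores


def top_terms(scores: dict[str, dict[str, float]], k: int = 3) -> dict[str, list[str]]:
    """Return the k highest-scoring terms per discipline (ties broken alphabetically)."""
    return {
        d: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for d, s in scores.items()
    }


# Toy, invented sentences describing "collecting" and "analyzing" practices.
corpus = {
    "biology": ["we collected tissue samples", "we analyzed samples with sequencing"],
    "sociology": ["we collected survey responses", "we analyzed responses with regression"],
    "physics": ["we collected detector events", "we analyzed events with simulation"],
}
scores = tfidf_by_discipline(corpus)
for discipline, terms in top_terms(scores).items():
    print(discipline, terms)
```

Shared verbs such as "collected" and "analyzed" score zero here. In the study's framing they would belong to the common vocabulary, while objects and instruments such as "sequencing" or "regression" surface as discipline-specific terms.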

https://doi.org/10.1007/978-3-031-28035-1_26

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

Paywall: "Trustworthy Digital Repository Certification: A Longitudinal Study"


To understand the impact of certification on repositories’ infrastructure, processes, and services, we analyzed a sample of publicly available trustworthy digital repository (TDR) audit reports (n = 175) from the Data Seal of Approval (DSA) and CoreTrustSeal (CTS) certification programs. This first longitudinal study of TDR certification over a ten-year period (from 2010 to 2020) found that many repositories either maintain a relatively high standard of trustworthiness in terms of their compliance with DSA or CTS guidelines or improve their trustworthiness by raising their compliance levels with these guidelines each time they are recertified.

https://doi.org/10.1007/978-3-031-28032-0_42

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

"Evolution of Research Data Management in Academic Libraries: A Review of the Literature"


The study is qualitative and based on an extensive literature review. The analysis of the reviewed literature reveals that RDM has emerged as a new addition to library research support services. The more recent literature clearly establishes the pivotal role of libraries and librarians in developing and managing RDM services. However, data sharing practices and the development of RDM services in libraries are more prevalent in developed countries, while these trends are still lacking among researchers and libraries in developing countries.

https://doi.org/10.1177/02666669231157405

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

Are the Humanities Ready for Data Sharing?


To get a sense of trends in data sharing within the humanities, we conducted semi-structured interviews with key personnel at several humanities projects with strong data components. The interviews focused on identifying where and how they planned to share their research data, how they imagined it might be used by others, and their perspective on barriers and opportunities to data sharing in the humanities. The research agendas, skills, and perspectives of the people we spoke with are not representative of most humanities-oriented research. However, the interviews provide important insight into the thinking of humanists who are already working across the cultural divide around data that separates the humanities from most other academic disciplines. We use them here as a springboard for consideration of what humanities data is, how to access and preserve it, and how it fits into the larger goals of creating an open research culture.

https://doi.org/10.18665/sr.318526

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

"Ten Lessons for Data Sharing with a Data Commons"


A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using the elastic scalability provided by cloud computing and to share data securely and compliantly, and, in this way, accelerate the pace of research. Over the past decade, a number of data commons have been developed and we discuss some of the lessons learned from this effort.

https://doi.org/10.1038/s41597-023-02029-x

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

Paywall: "Open Data and the 2023 NIH Data Management and Sharing Policy"


The National Institutes of Health (NIH) is the largest public funder of biomedical research in the world, and its new Data Management and Sharing (DMS) Policy is a large step toward shifting the culture of medical research toward a broader sharing of scientific data. . . . This article will serve as a primer on open data, data sharing, the NIH’s DMS Policy and its implications, and how librarians can support researchers in this landscape.

https://doi.org/10.1080/02763869.2023.2168103

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

79.3 Exabytes Capacity Sold in 2022: "Magnetic Tape Storage Is Seeing Cloud Go Back to the Future for Its Archival Data Needs"


Even then [in 1981], says Goodwin, people were saying tape was not long for this world. Those critics appear to have been silenced by recent sales figures, which show that year-on-year shipments of hard disk drives (HDDs) sank by 34% in 2022, while consignments of magnetic tape drives rose by 14%, to a total of 79.3 exabytes, roughly equivalent to the entirety of data created on the internet every 32 days.
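
The closing comparison can be unpacked by dividing the quoted figures: 79.3 exabytes spread over 32 days implies, under the article's framing, roughly 2.5 exabytes of data created on the internet per day. This is only a back-of-the-envelope restatement of the quote, not an independent estimate; the variable names are illustrative.

```python
# Both inputs are taken from the quoted article; nothing here is independently measured.
TAPE_CAPACITY_EB = 79.3  # tape capacity shipped in 2022, in exabytes
DAYS = 32                # "every 32 days"

# Implied rate of internet data creation per day, under the article's comparison.
daily_eb = TAPE_CAPACITY_EB / DAYS
# Decimal SI units: 1 EB = 1,000 PB.
print(f"{daily_eb:.2f} EB/day (~{daily_eb * 1000:.0f} PB/day)")
```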

bit.ly/3ky5Trv

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

"Research Data Management Needs Assessment for Social Sciences Graduate Students: A Mixed Methods Study"


The complexity and privacy issues inherent in social science research data make research data management (RDM) an essential skill for future researchers. Data management training has not fully addressed the needs of graduate students in the social sciences. To address this gap, this study used a mixed methods design to investigate the RDM awareness, preparation, confidence, and challenges of social science graduate students. A survey measuring RDM preparedness and training needs was completed by 98 graduate students in a school of education at a research university in the southern United States. Then, interviews exploring data awareness, knowledge of RDM, and challenges related to RDM were conducted with 10 randomly selected graduate students. All participants had low confidence in using RDM, but United States citizens had higher confidence than international graduate students. Most participants were not aware of on-campus RDM services and were not familiar with data repositories or data sharing. Training needs identified for social science graduate students included support with data documentation and organization when collaborating, using naming procedures to track versions, data analysis using open access software, and data preservation and security. These findings are significant in highlighting the topics to cover in RDM training for social science graduate students. Additionally, RDM confidence and preparation differ between populations, so being aware of the backgrounds of students taking the training will be essential for designing student-centered instruction.

https://doi.org/10.1371/journal.pone.0282152

| Research Data Curation and Management Works | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship |

"How and Why Do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles"


Data reuse is a common practice in the social sciences. While published data play an essential role in the production of social science research, they are not consistently cited, which makes it difficult to assess their full scholarly impact and give credit to the original data producers. Furthermore, it can be challenging to understand researchers’ motivations for referencing data. Like references to academic literature, data references perform various rhetorical functions, such as paying homage, signaling disagreement, or drawing comparisons. This paper studies how and why researchers reference social science data in their academic writing. We develop a typology to model relationships between the entities that anchor data references, along with their features (access, actions, locations, styles, types) and functions (critique, describe, illustrate, interact, legitimize). We illustrate the use of the typology by coding multidisciplinary research articles (n=30) referencing social science data archived at the Inter-university Consortium for Political and Social Research (ICPSR). We show how our typology captures researchers’ interactions with data and purposes for referencing data. Our typology provides a systematic way to document and analyze researchers’ narratives about data use, extending our ability to give credit to data that support research.

https://arxiv.org/abs/2302.08477

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Data Management Librarians’ Role in a Large Interdisciplinary Scientific Grant for PFAS Remediation: Considerations and Recommendations"


This article explores the conflicts, disparities, and inequalities experienced by two librarians when collaborating on a federal grant proposal. The authors discuss concerns related to time and salary expectations and the inequities that can occur during faculty and staff collaborations on research grants. The bureaucratic structure and the job classifications of staff at academic institutions in addition to the contract limitations of non-faculty status librarian positions can hinder successful collaborations. The authors also describe data management needs that may occur when working with interdisciplinary research teams and detail the type of work that is included in writing a data management grant. This article concludes with considerations and recommendations for other data librarians who may undertake similar projects with a focus on ways to create parity between faculty and staff collaborators.

https://doi.org/10.7191/jeslib.616

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"There’s No “I” in Research Data Management: Reshaping RDM Services Toward a Collaborative Multi-Stakeholder Model"


Objective: This article examines a reshaped service model for research data management (RDM) founded on centralized and cohesive collaboration between multiple stakeholders at a large research university in Canada. This initiative, along with a newly formed team dedicated to RDM service provision, is a joint effort by the institution’s Vice-Principal Research and Innovation (VPRI), Library, IT Services, and Research Ethics units.

Methods: This article presents a single case study methodology. The authors reflect on services such as "query the panel" sessions where researchers across all disciplines bring their questions to representatives from the Library, IT, Research Ethics, and VPRI. This case study also highlights the use of Jira’s service desk software as a user management system. The authors also present descriptive statistics representing engagement with this new unit and our services.

Results: Support for RDM requires expertise from multiple domains. With a collaborative approach as a guiding principle and a focus on establishing a small but agile team composed of a librarian along with stakeholders from IT and VPRI, it is possible to leverage resources and support for RDM from a broad range of units across an institution.

Conclusions: At many institutions, RDM services are siloed within the library or an adjacent campus unit. New digital technologies have profoundly transformed academic research across all disciplines, necessitating the evolution of corresponding research data-related services. The authors will conclude by outlining specific lessons learned in reshaping digital research infrastructure-related services at their institution.

https://doi.org/10.7191/jeslib.624

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Are Institutional Research Data Policies in the US Supporting the FAIR Principles? A Content Analysis"


Objective: The FAIR principles were created with the goals of enhancing the reusability of research data and giving guidance on how to make data Findable, Accessible, Interoperable and Reusable. In this article we explore the role of institutional research data policies in enabling and encouraging researchers at their institutions to generate FAIR data.

Methods: We identified the research data policies in place for “very high research activity” institutions (as defined by Carnegie classification) in the United States. We created a list of 31 criteria, based on previous work by Davidson et al. (2019) and Briney et al. (2015), and evaluated the 40 policies using a content analysis methodology.

Results: The guiding principles and the definitions for research data in the policies support the idea that institutional policies are a potential tool for the implementation of the FAIR principles. However, our analysis indicates that they are not generally used for that purpose. Only one policy mentions FAIR. Data sharing is mentioned in half of the policies, but 11 of these only note this concept in the context of funder requirements. Access and retention sections are mostly written without considering publicly available data. Twenty-nine policies do not mention data documentation.

Conclusions: We discuss ways in which these institutional policies represent a missed opportunity to implement the FAIR principles and suggest ways policies could be modified to encourage researchers to follow them. We also discuss future research opportunities to examine how policy implementation may affect what institutional support researchers receive.

https://doi.org/10.7191/jeslib.614

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Rethinking Data and Rebalancing Digital Power


This report highlights and contextualises four cross-cutting interventions with a strong potential to reshape the digital ecosystem:

  • Transforming infrastructure into open and interoperable ecosystems.
  • Reclaiming control of data from dominant companies.
  • Rebalancing the centres of power with new (non-commercial) institutions.
  • Ensuring public participation as an essential component of technology policymaking.

http://bit.ly/40WNbKA

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"The Role of Open Data in Digital Society: The Analysis of Scientific Trending Topics through a Bibliometric Approach"


The analysis of contemporary society, characterized by technological, economic, political, social, and cultural changes, has become more challenging due to the development of the internet and information and communication technologies, which provide a vast and increasingly valuable source of information, knowledge, and data. Within this context, so-called open data (that is, data made public, especially by public administrations, through an open governance model that is transparent and accessible to citizens) are assuming a significant role. This is a topic of growing importance that scientific research is addressing in an attempt to discern the multiplicity of social, educational, legal, technological, statistical, and methodological issues that underlie the creation and use of such data. This article aims to provide insights into scientific trends on the topic of open data through a bibliometric approach. Specifically, a total of 3,110 publications in the social sciences and humanities published from 2013 to 2022 were collected. These data were then analyzed using network and factorial analysis techniques to detect their conceptual structure and identify the topic trends and research perspectives that characterize open data studies.

bit.ly/40Xgahi

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Data Archive for the BRAIN Initiative (DABI)"


Data sharing is becoming ubiquitous and can be advantageous for most biomedical research. However, some data are inherently more amenable to sharing than others. For example, human intracranial neurophysiology recordings and associated multimodal data have unique features that warrant special considerations. The associated data are heterogeneous, difficult to compare, highly specific, and collected from small cohorts with treatment-resistant conditions, posing additional complications when attempting to perform generalizable analyses across projects. We present the Data Archive for the BRAIN Initiative (DABI) and describe features of the platform that are designed to overcome these and other challenges. DABI is a data repository and portal for BRAIN Initiative projects that collect human and animal intracranial recordings, and it allows users to search, visualize, and analyze multimodal data from these projects. The data providers maintain full control of data sharing privileges and can organize and manage their data with a user-friendly and intuitive interface. We discuss data privacy and security concerns, example analyses from two DABI datasets, and future goals for DABI.

https://doi.org/10.1038/s41597-023-01972-z

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Data Preservation in High Energy Physics: DPHEP Global Report 2022


This document summarizes the status of data preservation in high energy physics. The paradigms and the methodological advances are discussed from a perspective of more than ten years of experience with a structured effort at the international level. The status and the scientific return related to the preservation of data accumulated at large collider experiments are presented, together with an account of ongoing efforts to ensure long-term analysis capabilities for current and future experiments. Cross-cutting projects aimed at generic solutions, most of which are specifically inspired by open science and FAIR principles, are presented as well. An outlook and an action plan are also outlined.

https://arxiv.org/abs/2302.03583

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"What are Researchers’ Needs in Data Discovery? Analysis and Ranking of a Large-Scale Collection of Crowdsourced Use Cases"


Data discovery is important to facilitate data re-use. In order to help frame the development and improvement of data discovery tools, we collected a list of requirements and users’ wishes. This paper presents the analysis of these 101 use cases, collected between 2019 and 2020, to examine data discovery requirements. We categorized the information across 12 "topics" and eight types of users. While the availability of metadata was an expected topic of importance, users were also keen on receiving more information on data citation and a better overview of their field. We conducted and analysed a survey among data infrastructure specialists in a first attempt at ranking the requirements. Among these data professionals, the rankings varied considerably, except for the availability of metadata and data quality assessment.

http://doi.org/10.5334/dsj-2023-003

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Data Sharing and Reuse Practices: Disciplinary Differences and Improvements Needed"


This study investigates differences and commonalities in data production, sharing and reuse across the widest range of disciplines yet and identifies types of improvements needed to promote data sharing and reuse. . . . According to the 3,257 survey responses, data sharing and reuse are still increasing but not ubiquitous in any subject area and are more common among experienced researchers.

https://doi.org/10.1108/OIR-08-2021-0423

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Analysis of U.S. Federal Funding Agency Data Sharing Policies: 2020 Highlights and Key Observations"


Federal funding agencies in the United States (U.S.) continue to work towards implementing their plans to increase public access to funded research and comply with the 2013 Office of Science and Technology Policy memo Increasing Access to the Results of Federally Funded Scientific Research. In this article we report on an analysis of research data sharing policy documents from 17 U.S. federal funding agencies as of February 2021. Our analysis is guided by two questions: (1) What do the findings suggest about the current state of and trends in U.S. federal funding agency data sharing requirements? (2) In what ways are universities, institutions, associations, and researchers affected by and responding to these policies? Over the past five years, policy updates were common among these agencies and several themes have been thoroughly developed in that time; however, uncertainty remains around how funded researchers are expected to satisfy these policy requirements.

http://www.ijdc.net/article/view/791

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?"


Today’s researchers expect to be able to complete text and data mining (TDM) work on many types of textual data. But they are often blocked more by contractual limitations on what data they can use, and how they can use it, than they are by what data may be available to them. This article lays out the different types of TDM processes currently in use, the issues that may block researchers from being able to do the work they would like, and some possible solutions.

https://doi.org/10.31274/jlsc.15530

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Paywall: "LABDRIVE, a Petabyte Scalable, OAIS/ISO 16363 Conformant, for Scientific Research Organisations to Preserve Documents, Processed Data, and Software"


Before LABDRIVE, no system could adequately preserve such information, especially in such gigantic volume and variety. In this paper we describe the development of LABDRIVE and its ability to preserve and to scale up to tens or hundreds of petabytes in a way that is conformant to the OAIS Reference Model and capable of being ISO 16363 certified.

https://doi.org/10.1109/BigData55660.2022.10020648

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Data Management Plans: Implications for Automated Analyses"


Data management plans (DMPs) are an essential part of planning data-driven research projects and ensuring long-term access and use of research data and digital objects; however, as text-based documents, DMPs must be analyzed manually for conformance to funder requirements. This study presents a comparison of DMP evaluations for 21 funded projects using 1) an automated means of analysis to identify elements that align with best practices in support of open research initiatives and 2) a manually applied scorecard measuring these same elements. The automated analysis revealed that terms related to availability (90% of DMPs), metadata (86% of DMPs), and sharing (81% of DMPs) were reliably supplied. Manual analysis revealed that 86% (n = 18) of funded DMPs were adequate, with strong discussions of data management personnel (average score: 2 out of 2), data sharing (average score: 1.83 out of 2), and limitations to data sharing (average score: 1.65 out of 2). This study reveals that the automated approach to DMP assessment yields results that are less granular than, yet similar to, those of manual assessments of the DMPs, and that are produced more efficiently. Additional observations and recommendations are also presented to make data management planning exercises and automated analysis even more useful going forward.
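
The automated side of this comparison can be approximated with simple term matching. Below is a minimal sketch, assuming a DMP "supplies" a category if it contains any of a few hand-picked terms as whole words. The term lists, the `coverage` helper, and the sample DMP snippets are all hypothetical, since the abstract does not give the study's vocabulary or tooling. The output has the same shape as the percentages reported above: the share of DMPs that mention each category.

```python
import re

# Hypothetical term lists; the study's actual vocabulary is not given in the abstract.
CATEGORY_TERMS = {
    "availability": ["available", "availability", "accessible"],
    "metadata": ["metadata", "data dictionary", "codebook"],
    "sharing": ["share", "sharing", "shared"],
}


def mentions(text: str, terms: list[str]) -> bool:
    """True if any term occurs as a whole word or phrase, ignoring case."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE) for t in terms)


def coverage(dmps: list[str]) -> dict[str, float]:
    """Percentage of DMPs mentioning at least one term from each category."""
    return {
        cat: 100 * sum(mentions(d, terms) for d in dmps) / len(dmps)
        for cat, terms in CATEGORY_TERMS.items()
    }


# Invented one-sentence DMP excerpts, for illustration only.
SAMPLE_DMPS = [
    "Data will be made available in a public repository with metadata.",
    "We will share the data and a codebook upon request.",
    "Shared files will be accessible via the project website.",
    "Metadata will follow the DDI standard.",
]
print(coverage(SAMPLE_DMPS))
```

Keyword presence says nothing about how well a topic is discussed, which is consistent with the abstract's observation that the automated approach is less granular than a manually applied scorecard.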

http://doi.org/10.5334/dsj-2023-002

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |