Data Preservation in High Energy Physics: DPHEP Global Report 2022


This document summarizes the status of data preservation in high energy physics. The paradigms and the methodological advances are discussed from a perspective of more than ten years of experience with a structured effort at international level. The status and the scientific return related to the preservation of data accumulated at large collider experiments are presented, together with an account of ongoing efforts to ensure long-term analysis capabilities for ongoing and future experiments. Transverse projects aimed at generic solutions, most of which are specifically inspired by open science and FAIR principles, are presented as well. An outlook and an action plan are also presented.

https://arxiv.org/abs/2302.03583

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"What are Researchers’ Needs in Data Discovery? Analysis and Ranking of a Large-Scale Collection of Crowdsourced Use Cases"


Data discovery is important to facilitate data re-use. In order to help frame the development and improvement of data discovery tools, we collected a list of requirements and users’ wishes. This paper presents the analysis of these 101 use cases to examine data discovery requirements; these cases were collected between 2019 and 2020. We categorized the information across 12 "topics" and eight types of users. While the availability of metadata was an expected topic of importance, users were also keen on receiving more information on data citation and a better overview of their field. We conducted and analysed a survey among data infrastructure specialists in a first attempt at ranking the requirements. Among these data professionals, the rankings differed widely, except for the availability of metadata and data quality assessment.

http://doi.org/10.5334/dsj-2023-003


"Data Sharing and Reuse Practices: Disciplinary Differences and Improvements Needed"


This study investigates differences and commonalities in data production, sharing and reuse across the widest range of disciplines yet and identifies types of improvements needed to promote data sharing and reuse. . . . From the 3,257 survey responses, data sharing and reuse are still increasing but not ubiquitous in any subject area and are more common among experienced researchers.

https://doi.org/10.1108/OIR-08-2021-0423


"Analysis of U.S. Federal Funding Agency Data Sharing Policies: 2020 Highlights and Key Observations"


Federal funding agencies in the United States (U.S.) continue to work towards implementing their plans to increase public access to funded research and comply with the 2013 Office of Science and Technology memo Increasing Access to the Results of Federally Funded Scientific Research. In this article we report on an analysis of research data sharing policy documents from 17 U.S. federal funding agencies as of February 2021. Our analysis is guided by two questions: 1.) What do the findings suggest about the current state of and trends in U.S. federal funding agency data sharing requirements? 2.) In what ways are universities, institutions, associations, and researchers affected by and responding to these policies? Over the past five years, policy updates were common among these agencies and several themes have been thoroughly developed in that time; however, uncertainty remains around how funded researchers are expected to satisfy these policy requirements.

http://www.ijdc.net/article/view/791


"Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?"


Today’s researchers expect to be able to complete text and data mining (TDM) work on many types of textual data. But they are often blocked more by contractual limitations on what data they can use, and how they can use it, than they are by what data may be available to them. This article lays out the different types of TDM processes currently in use, the issues that may block researchers from being able to do the work they would like, and some possible solutions.

https://doi.org/10.31274/jlsc.15530


"Long-Term Preservation and Reusability of Open Access Scholar-Led Press Monographs"


This brief report outlines some initial findings and challenges identified by the Community-Led Open Publication Infrastructures for Monographs (COPIM) project when looking to archive and preserve open access books produced by small, scholar-led presses. This paper is based on the research conducted by Work Package 7 in COPIM, which has a focus on the preservation and archiving of open access monographs in all their complexity, along with any accompanying materials.

http://www.ijdc.net/article/view/826


"A Sustainable Infrastructure Concept for Improved Accessibility, Reusability, and Archival of Research Software"


Research software is an integral part of most research today and it is widely accepted that research software artifacts should be accessible and reproducible. However, the sustainable archival of research software artifacts is an ongoing effort. We identify research software artifacts as snapshots of the current state of research and an integral part of a sustainable cycle of software development, research, and publication. We develop requirements and recommendations to improve the archival, access, and reuse of research software artifacts based on installable, configurable, extensible research software, and sustainable public open-access infrastructure. The described goal is to enable the reuse and exploration of research software beyond published research results, in parallel with reproducibility efforts, and in line with the FAIR principles for data and software. Research software artifacts can be reused in varying scenarios. To this end, we design a multi-modal representation concept supporting multiple reuse scenarios. We identify types of research software artifacts that can be viewed as different modes of the same software-based research result, for example, installation-free configurable browser-based apps to containerized environments, descriptions in journal publications and software documentation, or source code with installation instructions. We discuss how the sustainability and reuse of research software are enhanced or enabled by a suitable archive infrastructure. Finally, using the example of a pilot project at the University of Stuttgart, Germany, a collaborative effort between research software developers and infrastructure providers, we outline practical challenges and experiences.

https://arxiv.org/abs/2301.12830


Paywall: "LABDRIVE, a Petabyte Scalable, OAIS/ISO 16363 Conformant, for Scientific Research Organisations to Preserve Documents, Processed Data, and Software"


Before LABDRIVE no system could adequately preserve such information, especially in such gigantic volume and variety. In this paper we describe the development of LABDRIVE and its ability to preserve and to scale up to tens or hundreds of Petabytes in a way which is conformant to the OAIS Reference Model and capable of being ISO 16363 certified.

https://doi.org/10.1109/BigData55660.2022.10020648


"Data Management Plans: Implications for Automated Analyses"


Data management plans (DMPs) are an essential part of planning data-driven research projects and ensuring long-term access and use of research data and digital objects; however, as text-based documents, DMPs must be analyzed manually for conformance to funder requirements. This study presents a comparison of DMP evaluations for 21 funded projects using 1) an automated means of analysis to identify elements that align with best practices in support of open research initiatives and 2) a manually-applied scorecard measuring these same elements. The automated analysis revealed that terms related to availability (90% of DMPs), metadata (86% of DMPs), and sharing (81% of DMPs) were reliably supplied. Manual analysis revealed 86% (n = 18) of funded DMPs were adequate, with strong discussions of data management personnel (average score: 2 out of 2), data sharing (average score: 1.83 out of 2), and limitations to data sharing (average score: 1.65 out of 2). This study reveals that the automated approach to DMP assessment yields less granular yet similar results to manual assessments of the DMPs, and that these results are more efficiently produced. Additional observations and recommendations are also presented to make data management planning exercises and automated analysis even more useful going forward.

http://doi.org/10.5334/dsj-2023-002
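
The abstract above does not describe how the automated analysis was implemented. As a rough illustration only, the sketch below shows one simple way to measure what share of DMP texts mention terms from a topic group. The term lists, the `term_coverage` function, and the sample texts are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical term groups; the study's actual vocabulary is not given in the abstract.
TERM_GROUPS = {
    "availability": ["available", "availability"],
    "metadata": ["metadata"],
    "sharing": ["share", "sharing", "shared"],
}


def term_coverage(dmp_texts, term_groups):
    """Return, per term group, the fraction of DMP texts mentioning any of its terms."""
    coverage = {}
    for group, terms in term_groups.items():
        # Whole-word, case-insensitive match on any term in the group.
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        hits = sum(1 for text in dmp_texts if pattern.search(text))
        coverage[group] = hits / len(dmp_texts) if dmp_texts else 0.0
    return coverage


# Invented sample texts, used only to exercise the function.
sample_dmps = [
    "Data will be made available in a repository with rich metadata.",
    "We will share data on request.",
    "Metadata follows DataCite; data are available after embargo.",
    "No plans yet.",
]
result = term_coverage(sample_dmps, TERM_GROUPS)
```

A keyword check like this only reports whether a topic is mentioned, which is consistent with the abstract's remark that the automated approach is less granular than a manual scorecard.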


"Canadian Policy: Data Management Requirement Takes Effect in March"


Canadian institutions are preparing for a research data management policy developed by three major federal granting agencies to go into effect this March. The policy of the Tri-Agency Council, comprising the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC), asserts that "research data collected through the use of public funds should be responsibly and securely managed and be, where ethical, legal and commercial obligations allow, available for reuse by others."

https://cutt.ly/N9vGKLh


"Community Consensus on Core Open Science Practices to Monitor in Biomedicine"


The state of open science needs to be monitored to track changes over time and identify areas to create interventions to drive improvements. In order to monitor open science practices, they first need to be well defined and operationalized. To reach consensus on what open science practices to monitor at biomedical research institutions, we conducted a modified 3-round Delphi study. Participants were research administrators, researchers, specialists in dedicated open science roles, and librarians. In rounds 1 and 2, participants completed an online survey evaluating a set of potential open science practices, and for round 3, we hosted two half-day virtual meetings to discuss and vote on items that had not reached consensus. Ultimately, participants reached consensus on 19 open science practices. This core set of open science practices will form the foundation for institutional dashboards and may also be of value for the development of policy, education, and interventions.

https://doi.org/10.1371/journal.pbio.3001949


"An Iterative and Interdisciplinary Categorisation Process towards FAIRer Digital Resources for Sensitive Life-Sciences Data"


For life science infrastructures, sensitive data generate an additional layer of complexity. Cross-domain categorisation and discovery of digital resources related to sensitive data presents major interoperability challenges. To support this FAIRification process, a toolbox demonstrator aiming at support for discovery of digital objects related to sensitive data (e.g., regulations, guidelines, best practice, tools) has been developed. The toolbox is based upon a categorisation system developed and harmonised across a cluster of 6 life science research infrastructures. Three different versions were built, tested by subsequent pilot studies, finally leading to a system with 7 main categories (sensitive data type, resource type, research field, data type, stage in data sharing life cycle, geographical scope, specific topics). 109 resources attached with the tags in pilot study 3 were used as the initial content for the toolbox demonstrator, a software tool allowing searching of digital objects linked to sensitive data with filtering based upon the categorisation system. Important next steps are a broad evaluation of the usability and user-friendliness of the toolbox, extension to more resources, broader adoption by different life-science communities, and a long-term vision for maintenance and sustainability.

https://doi.org/10.1038/s41598-022-25278-z


"Challenges of Qualitative Data Sharing in Social Sciences"


Open science offers hope for new accountability and transparency in social sciences. Nevertheless, it still fails to fully consider the complexities of qualitative research, as exemplified by a reflection on sensitive qualitative data sharing. As a result, the developing patterns of rewards and sanctions promoting open science raise concern that quantitative research, whose "replication crisis" brought the open science movement to life, will benefit from "good science" re-evaluations at the expense of other research epistemologies, despite the necessity to define accountability and transparency in social sciences more widely and not to conflate those with either reproducibility or data sharing.

bit.ly/3j6NTTV


iPres 2022 International Conference on Digital Preservation Conference Proceedings


The proceedings are the official record of all the peer reviewed submissions presented at iPres 2022, ensuring visibility and promotion of both academic research work and the projects and initiatives of institutions involved in digital preservation practices.

bit.ly/3hlkFAx


Wolters Kluwer: The Path to Open Medicine: Driving Global Health Equity through Medical Research


The paper is divided into three parts. Part 1 traces the historical events that led to the modern system of scientific research, funding, knowledge dissemination, and recognition, which largely confines health and medical knowledge production to those in HICs [high income countries]. By understanding our shared past and the rise of structural barriers to global health equity, we can better inform our shared path to dismantle them. Part 2 takes a clear-eyed look at where the scientific community is now. Are the ideals of Open Medicine playing out as envisioned? Are the benefits of Open Medicine shared amongst all of humanity, or with only a select few? Lastly, Part 3 offers ideas and recommendations for all stakeholders to chart a path to bring Open Medicine into alignment with its goals and aspirations.

https://cutt.ly/E15vETj


A Preservationist’s Guide to the DMCA Exemption for Software Preservation, 2nd Edition


In late 2021, the Library of Congress adopted several exemptions to the Digital Millennium Copyright Act (DMCA) provision prohibiting circumvention of technological measures that control access to copyrighted works. In other words, they created a set of exceptions to the general legal rule against cracking digital locks on things like DVDs, software, and video games. The exemptions are set out in regulations published by the Copyright Office. They went into effect on October 28, 2021, and last until October 28, 2024. This guide is intended to help preservationists determine whether their activities are protected by the new exemptions. It includes important updates to the first edition to reflect changes in the rule to allow offsite access to non-game software, along with a few other technical changes.

https://doi.org/10.5281/zenodo.7328908


"Access to Research Data and EU Copyright"


The article seeks to contribute to this aim by exploring the legal framework in which research data can be accessed and used in EU copyright law. First, it delineates the authors’ understanding of research data. It then examines the protection research data currently receives under EU and Member State law via copyright and related rights, as well as the ownership of these rights by different stakeholders in the scientific community. After clarifying relevant conflict-of-laws issues that surround research data, it maps ways to legally access and use them, including statutory exceptions, the open science movement and current developments in law and practice.

bit.ly/3VVx7pg


"New Report on Value and Utility of FAIR Implementation Profiles (FIPs) Available from the WorldFAIR project"


In the WorldFAIR project, CODATA (the Committee on Data of the International Science Council), with the RDA (Research Data Alliance) Association as a major partner, is working with a set of eleven disciplinary and cross-disciplinary case studies to advance implementation of the FAIR principles and, in particular, to improve interoperability and reusability of digital research objects, including data.

To that end, the WorldFAIR project created a range of FAIR Implementation Profiles (FIPs) between July and October 2022 to better understand current FAIR data-related practices. The report, "FAIR Implementation Profiles (FIPs) in WorldFAIR: What Have We Learnt?", is published this week and available at https://doi.org/10.5281/zenodo.7378109.

The report describes the WorldFAIR project, its objectives and its rich set of Case Studies; and it introduces FIPs as a methodology for listing the FAIR implementation decisions made by a given community of practice. Subsequently, the report gives an overview of the initial feedback and findings from the Case Studies, and considers a number of issues and points of discussion that emerged from this exercise. Finally, and most importantly, we describe how we think the experience of using FIPs will assist each Case Study in its work to implement FAIR, and will assist the project as a whole in the development of two key outputs: the Cross-Domain Interoperability Framework (CDIF), and domain-sensitive recommendations for FAIR assessment.

https://cutt.ly/x1NDUAd


"Big Data-Driven Investigation into the Maturity of Library Research Data Services (RDS)"


The creation of library research data services (RDS) requires assessment of their maturity, i.e., the primary objective of this study. Its authors have set out to probe the nationwide level of library RDS maturity, based on the RDS maturity model, as proposed by Cox et al. (2019), while making use of natural language processing (NLP) tools, typical for big data analysis. The secondary objective consisted in determining the actual suitability of the above-referenced tools for this particular type of assessment.

https://doi.org/10.1016/j.acalib.2022.102646


Federating Research Infrastructures in Europe for Fair Access to Data: Science Europe Briefing on EOSC

The European research and innovation ecosystem is going through a period of profound change. Researchers, organisations that fund or perform research, and policymakers are reshaping the research process and its outputs based on the opportunities offered by the digital transition. The findability, accessibility, interoperability, and reusability (FAIRness) of research publications, data, and software in the digital space will define research and innovation going forward. Closely related, the transition to an open research process and Open Access of its outputs is becoming the ‘new normal’. One of the most prominent initiatives in the digital and open transition of research is the European Open Science Cloud (EOSC). This federation of existing research data infrastructures in Europe aims to create a web of FAIR data and related services for research.

https://doi.org/10.5281/zenodo.7346887


"Adoption of Transparency and Openness Promotion (TOP) Guidelines across Journals"


Journal policies continuously evolve to enable knowledge sharing and support reproducible science. However, that change happens within a certain framework. Eight modular standards with three levels of increasing stringency make up the Transparency and Openness Promotion (TOP) guidelines, which can be used to evaluate to what extent and with which stringency journals promote open science. The guidelines define standards for data citation, transparency of data, material, code and design and analysis, replication, plan and study pre-registration, and two effective interventions: "Registered reports" and "Open science badges"; levels of adoption summed up across standards define a journal’s TOP Factor. In this paper, we analysed the status of adoption of TOP guidelines across two thousand journals reported in the TOP Factor metrics. We show that the majority of the journals’ policies align with at least one of the TOP’s standards, most likely "Data citation" (70%) followed by "Data transparency" (19%). Two-thirds of adoptions of TOP standards are of stringency Level 1 (less stringent), whereas only 9% are of stringency Level 3. Adoption of TOP standards differs across science disciplines, and multidisciplinary journals (N = 1505) and journals from social sciences (N = 1077) show the greatest number of adoptions. The measures that journals take to implement open science practices could be improved in three ways: (1) improvements could be discipline-specific, (2) journals that have not yet adopted TOP guidelines could do so, and (3) the stringency of adoptions could be increased.

https://doi.org/10.3390/publications10040046
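
The abstract above states that levels of adoption summed across standards define a journal's TOP Factor. The sketch below illustrates only that summation. The standard names are paraphrased from the abstract's list, treating level 0 as "not adopted" is an assumption, and the example journal and its levels are invented. The sketch does not attempt to model how the two interventions named in the abstract are scored.

```python
# Standard names paraphrased from the abstract; identifiers are illustrative only.
STANDARDS = [
    "data_citation",
    "data_transparency",
    "materials_transparency",
    "code_transparency",
    "design_analysis_transparency",
    "replication",
    "analysis_plan_preregistration",
    "study_preregistration",
]


def top_factor(levels):
    """Sum adoption levels across standards.

    Assumption: 0 means the standard is not adopted, and 1 to 3 are the
    three levels of increasing stringency mentioned in the abstract.
    """
    total = 0
    for name in STANDARDS:
        level = levels.get(name, 0)  # a missing standard counts as not adopted
        if not 0 <= level <= 3:
            raise ValueError(f"level for {name} must be between 0 and 3, got {level}")
        total += level
    return total


# Invented example journal, not drawn from the paper's data.
example_journal = {"data_citation": 1, "data_transparency": 2, "replication": 3}
score = top_factor(example_journal)
```

Under these assumptions, a journal that adopts many standards at Level 1 can score the same as one that adopts a few at Level 3, which is why the abstract reports adoption counts and stringency levels separately.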


"Open Science Infrastructure as a Key Component of Open Science"


The Open Science movement is a response to the accumulated problems in scholarly communication, like the "reproducibility crisis", "serials crisis", and "peer review crisis". The European Commission defines priorities of Open Science as Findable, Accessible, Interoperable and Reusable (FAIR) data, infrastructure and services in the European Open Science Cloud (EOSC), Next generation metrics, altmetrics and rewards, the future of scientific communication, research integrity and reproducibility, education and skills and citizen science. Open Science Infrastructure is also one of four key components of Open Science defined by UNESCO.

Mainly represented among Open Science Infrastructures are institutional and thematic repositories for publications, research data, software and code. Furthermore, the Open Science Infrastructure services range may include discovery, mining, publishing, the peer review process, archiving and preservation, social networking tools, training, high-performance computing, and tools for processing and analysis. Successful Open Science Infrastructure should be based on community values and responsive to needed changes. Preferably the Open Science Infrastructure should be distributed, enabling machine-actionable tools and services, supporting reusability and reproducibility, quality FAIR data, interoperability, sustainability, long-term preservation and funding.

https://doi.org/10.7557/5.6777


"Why Don’t We Share Data and Code? Perceived Barriers and Benefits to Public Archiving Practices"


Here, we define, categorize and discuss barriers to data and code sharing that are relevant to many research fields. We explore how real and perceived barriers might be overcome or reframed in the light of the benefits relative to costs. By elucidating these barriers and the contexts in which they arise, we can take steps to mitigate them and align our actions with the goals of open science, both as individual scientists and as a scientific community.

https://doi.org/10.1098/rspb.2022.1113


"Data Quality Assurance at Research Data Repositories"


This paper presents findings from a survey on the status quo of data quality assurance practices at research data repositories.

The personalised online survey was conducted among repositories indexed in re3data in 2021. It covered the scope of the repository, types of data quality assessment, quality criteria, responsibilities, details of the review process, and data quality information and yielded 332 complete responses.

The results demonstrate that most repositories perform data quality assurance measures, and overall, research data repositories significantly contribute to data quality. Quality assurance at research data repositories is multifaceted and nonlinear, and although there are some common patterns, individual approaches to ensuring data quality are diverse. The survey showed that data quality assurance sets high expectations for repositories and requires a lot of resources. Several challenges were discovered: for example, the adequate recognition of the contribution of data reviewers and repositories, the path dependence of data review on review processes for text publications, and the lack of data quality information. The study could not confirm that the certification status of a repository is a clear indicator of whether a repository conducts in-depth quality assurance.

http://doi.org/10.5334/dsj-2022-018
