The Horizon Europe project ISIDORe is dedicated to pandemic preparedness and responsiveness research. It brings together 17 research infrastructures (RIs) and networks to provide a broad range of services to infectious disease researchers. An efficient and structured treatment of data is central to ISIDORe’s aim to furnish seamless access to its multidisciplinary catalogue of services, and to ensure that users’ results are treated FAIRly. ISIDORe therefore requires a data management plan (DMP) covering both access management and research outputs, applicable over a broad range of disciplines, and compatible with the constraints and existing practices of its diverse partners.
Here, we describe how, to achieve that aim, we undertook an iterative, step-by-step process to build a community-approved living document, identifying good practices and processes on the basis of use cases presented as proofs of concept. International fora such as the RDA and EOSC, and primarily the BY-COVID project, furnished registries, tools and online data platforms, as well as standards and the support of data scientists. Together, these elements provide a path for building an umbrella DMP aligned as fully as possible with the FAIR principles, which could also serve as a framework for data management harmonisation in other large-scale, challenge-driven projects. Finally, we discuss how data management and reuse can be further improved through the use of knowledge models when writing DMPs, and how, in the future, an inter-RI network of data stewards could contribute to the establishment of a community of practice, to be integrated subsequently into planned trans-RI competence centres.
ACS and Elsevier, members of the Coalition for Responsible Sharing, have agreed to a legal settlement with ResearchGate that ensures copyright-compliant sharing of research articles published with ACS or Elsevier on the ResearchGate site. The lawsuits pending against ResearchGate in Germany and the United States are now resolved. The specific terms of the parties’ settlement are confidential.
Although the benefits of organizational adoption are significant, most OAD-related projects fail because of organizational barriers and resistance to adoption. This study first aims to identify these organizational barriers to adopting OAD, to raise awareness of the obstacles organizations must overcome. To this end, after conducting a systematic literature review (SLR) and convening an expert panel, a research model based on the Technology-Organization-Environment (TOE) framework is proposed. The SLR identified 97 barriers from ten primary studies. After critically examining these barriers, a research model classifying 22 crucial barriers to organizational OAD adoption under the TOE framework is proposed.
Through studies and work developed over the last few years, we propose a new approach to description, in which images can play a preponderant role in the description of data, assuming the role of metadata. We present several pieces of evidence, point out their challenges, and determine the opportunities this new perspective can offer research. Images have specific characteristics that can be leveraged to improve data description. Historical evidence establishes that images have always been used and produced in research, yet their representational ability has never been harnessed to describe data and give more context to the scientific process.
Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers’ motivations for citing — or not citing — data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers’ current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.
Access to scientific data can enable independent reuse and verification; however, most data are not available and become increasingly irrecoverable over time. This study aimed to retrieve and preserve important datasets from 160 of the most highly cited social science articles published between 2008-2013 and 2015-2018. We asked authors if they would share data in a public repository — the Data Ark — or provide reasons if data could not be shared. Of the 160 articles, data for 117 (73%, 95% CI [67%-80%]) were not available and data for 7 (4%, 95% CI [0%-12%]) were available with restrictions. Data for 36 (22%, 95% CI [16%-30%]) articles were available in unrestricted form: 29 of these datasets were already available and 7 datasets were made available in the Data Ark. Most authors did not respond to our data requests, and a minority shared reasons for not sharing, such as legal or ethical constraints. These findings highlight an unresolved need to preserve important scientific datasets and increase their accessibility to the scientific community.
The growing impact of preprint servers enables the rapid sharing of time-sensitive research. At the same time, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is often missing from preprint servers. To overcome this problem, the PreprintResolver was developed, which uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef / CrossCite) to identify preprint-publication pairs for the arXiv preprint server. . . . Experiments were performed on a sample of 1,000 arXiv preprints from the research field of computer science and without any publication information. . . . The results show that the PreprintResolver was able to resolve 603 out of 1,000 (60.3%) of these preprints. . . . In conclusion, the PreprintResolver is suitable for individual, manually reviewed requests, but less suitable for bulk requests. The PreprintResolver tool (this https URL, available from 2023-08-01) and source code (this https URL, accessed 2023-07-19) are available online.
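The matching step at the heart of such a resolver can be illustrated with a small sketch. This is not the PreprintResolver's actual implementation: the function names and the similarity threshold are our own illustrative choices, and the real tool queries the four literature databases listed above rather than comparing titles locally.

```python
import re
from difflib import SequenceMatcher

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for comparison."""
    title = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def titles_match(preprint_title: str, candidate_title: str,
                 threshold: float = 0.9) -> bool:
    """Treat two records as a preprint-publication pair if their
    normalized titles are nearly identical (illustrative heuristic)."""
    a = normalize_title(preprint_title)
    b = normalize_title(candidate_title)
    return SequenceMatcher(None, a, b).ratio() >= threshold

# Hypothetical example: an arXiv title vs. a database record of the
# published version, differing only in casing and punctuation.
print(titles_match("Attention Is All You Need",
                   "Attention is all you need!"))  # prints: True
```

In practice a resolver would combine such fuzzy title matching with stronger signals (author overlap, DOI links) returned by the database APIs, since near-identical titles alone can produce false pairs.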
Most research funders require Data Management Plans (DMPs). The review process can be time-consuming, since reviewers read text documents submitted by researchers and provide their feedback. Moreover, it requires specific expert knowledge in data stewardship, which is scarce. Machine-actionable Data Management Plans (maDMPs) and semantic technologies increase the potential for automatic assessment of the information contained in DMPs. However, the level of automation and the new possibilities are still not well explored or leveraged. This paper discusses methods for the automation of DMP assessment. It goes beyond generating human-readable reports, exploring how the information contained in maDMPs can be used to provide automated pre-assessment or to fetch further information, allowing reviewers to better judge the content. We map the identified methods to various reviewer goals.
Since September 2019, a task group within the European Open Science Cloud – EOSC Nordic Project, work-package 5 (T5.3.2), has focused its attention on machine-actionable Data Management Plans (maDMPs). A working paper delivered by the group (Hasan et al. 2021) concluded, in summary, that extracting useful information from traditional free-text DMPs is problematic. maDMPs, by contrast, are generally more FAIR-compliant: accessible to both humans and machines, more interoperable with other systems, and able to serve different stakeholders in processing, sharing, evaluation and reuse. Different DMP tools and templates have developed independently and allow, to varying degrees, for the creation of genuinely machine-actionable DMPs. Here we describe the first three tools or projects for creating maDMPs that were central parts of the original task group mission. We give a more detailed account of one of these, the Stockholm University – EOSC Nordic maDMP project using the DMP Online tool, as described by Philipson (2021). We also briefly touch upon some other current tools and projects for creating maDMPs that are compliant with the RDA DMP Common Standard (RDCS), aiming for integration with other research information systems or research data repositories. A possible conclusion from this overview is that the development of tools for maDMPs is progressing fast and seems to converge towards a common standard. Nonetheless, an immense amount of work remains to get there.
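To make "machine-actionable" concrete, here is a minimal sketch of a DMP expressed as structured data rather than free text. The field names approximate the RDA DMP Common Standard (RDCS); the published schema should be consulted for the authoritative structure and required fields, and the identifiers and names below are purely illustrative.

```python
import json

# A minimal maDMP sketch. Field names approximate the RDA DMP Common
# Standard (RDCS); all values are illustrative placeholders.
madmp = {
    "dmp": {
        "title": "Example project DMP",
        "created": "2023-09-01",
        "modified": "2023-09-01",
        "dmp_id": {"identifier": "https://doi.org/10.0000/example",
                   "type": "doi"},
        "contact": {"name": "Jane Doe", "mbox": "jane.doe@example.org"},
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {"identifier": "https://doi.org/10.0000/data",
                               "type": "doi"},
                "personal_data": "no",
                "sensitive_data": "no",
            }
        ],
    }
}

# Because the plan is structured rather than free text, reviewer checks
# can be scripted, e.g. "does every dataset declare personal_data?"
for ds in madmp["dmp"]["dataset"]:
    assert "personal_data" in ds

print(json.dumps(madmp, indent=2)[:40])
```

This scriptability is precisely what enables the automated pre-assessment discussed above: a validator can confirm required fields, dereference identifiers, or fetch repository metadata without a human reading prose.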
Seven years after the publication of the seminal paper that introduced the concept of making research outputs Findable, Accessible, Interoperable, and Reusable (FAIR), researchers still struggle to understand how to implement the principles. For many researchers, FAIR promises long-term benefits for near-term effort, requires skills not yet acquired, and is one more item in a long list of unfunded mandates and onerous requirements for scientists. Even those who are required to, or are convinced that they must, make time for FAIR research practices prefer just-in-time advice properly sized to their scientific artifacts and processes. Because most FAIR implementation guidance is general, it is difficult for a researcher to adapt the advice to their situation. Technological advances, especially in artificial intelligence (AI) and machine learning (ML), complicate FAIR adoption, as researchers and data stewards ponder how to make software, workflows, and models FAIR and reproducible. The FAIR+ Implementation Survey Tool (FAIRIST) mitigates the problem by integrating research requirements with research proposals in a systematic way. FAIRIST factors in new scholarly outputs, such as nanopublications and notebooks, and the various research artifacts related to AI research (data, models, workflows, and benchmarks). Researchers step through a self-serve survey process and receive a table ready for use in their data management plan (DMP) and/or work plan, while gaining awareness of the FAIR principles and Open Science concepts. FAIRIST uses part of the proposal process to do outreach and raise awareness of FAIR dimensions and considerations, while providing timely assistance for competitive proposals.
This study seeks to understand the relationship between research data management (RDM) services, framed in the data curation life cycle, and the production of open data. An electronic questionnaire was distributed to US researchers and RDM specialists, and the results were analyzed using chi-square tests for association. The data curation life cycle is associated with the production of open data and shareable research, but tasks like data management plans show stronger associations with the production of open data. The findings examine the intersection of these concepts and provide insight into RDM services that facilitate the production of open data and shareable research.
Over a 10-year period, Carol Tenopir of DataONE and her team conducted a global survey of scientists, managers and government workers involved in broad environmental science activities, asking about their willingness to share data and their opinion of the resources available to do so. . . .
The most surprising result was that a higher willingness to share data corresponded with a decrease in satisfaction with data sharing resources across nations (e.g., skills, tools, training) (Fig. 1). That is, researchers who did not want to share data were satisfied with the available resources, and those who did want to share data were dissatisfied. Researchers appear to discover that the tools are insufficient only when they begin the hard work of engaging in open science practices. This indicates that a cultural shift in the attitudes of researchers needs to precede the development of support and tools for data management.
Overall, R code was available in only 49 of the 1001 papers examined (4.9%) (Figure 1). When included, code was most often in the Supplemental Information (41%), followed by GitHub (20%), Figshare (6%), or other repositories (33%). Open-access publications were 70% more likely to include code than closed-access publications (7.21% vs. 4.22%, χ2 = 4.442, p < 0.05). Code sharing was estimated to increase at 0.5% per year, but this trend was not significant (p = 0.11). The years 2021 and 2022 showed a shift towards more frequent sharing, but the percentage of code sharing has remained consistently below 15% over the past decade (Figure 1).
We found that papers including code disproportionately impact the literature (Figure 2) and accumulate citations faster (i.e., a marginally significant year-by-code-inclusion interaction; p = 0.0863). Further, we found a significant interaction between open access and code inclusion (p = 0.0265), with publications meeting both Open Science criteria (i.e., open code and open access) having the highest overall predicted citation rates (Figure 2). For example, Open Science papers are expected to receive more than double the citations (96.25 vs. 36.89) in year 13 post-publication compared with fully closed papers (Figure 2).
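The association between access type and code inclusion reported above is the kind of relationship a 2x2 chi-square test captures. The sketch below computes the Pearson statistic from first principles; the counts are illustrative values chosen to be roughly consistent with the reported 7.21% vs. 4.22% sharing rates, not the study's raw data, so the resulting statistic differs from the reported 4.442.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a
    2x2 contingency table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Illustrative counts only (not the study's raw data):
#                 with code   without code
table = [[16, 210],   # open-access papers   (~7.1% share code)
         [33, 742]]   # closed-access papers (~4.3% share code)
print(round(chi_square_2x2(table), 2))  # prints: 2.99
```

With larger samples at the same proportions the statistic grows, which is why a difference of a few percentage points can reach significance in a thousand-paper corpus.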
This section highlights the significant changes to this document since the original plan was released in 2014. To wit:
- There shall be no publication embargo period for peer-reviewed publications
- Data that support peer-reviewed publications shall be made available in a public archive at the time of publication
- Software should be included as part of Open Access, subject to NASA software release requirements
- Software used to generate research findings/results should be made available in a public archive at the time of publication
- Other data products beyond peer-reviewed publications and software should be considered as part of Open Access
"We have learned that many publishers are requiring UC authors to sign misleading License to Publish agreements, which undermine the spirit and intent of [UC’s open access policies]," wrote Susan Cochran, Chair of the faculty Academic Senate.
By purporting to restrict an author’s abilities to reuse their own work, "these agreements essentially turn faculty authors into readers, as opposed to creators and owners of their own work," the Academic Senate chair concludes.
The team that leads negotiations with scholarly publishers on behalf of the university, including representatives from UC’s California Digital Library, the 10 campus libraries, and the Academic Senate, is now taking up the charge, making author rights the next frontier in advocating for the UC research community.
Today, 96% of all code bases incorporate open-source software. GitHub, the biggest platform for the open-source community, is used by more than 100 million developers worldwide. The Biden administration’s Securing Open Source Software Act of 2022 publicly recognized open-source software as critical economic and security infrastructure. Even AWS, Amazon’s money-making cloud arm, supports the development and maintenance of open-source software; it committed its portfolio of patents to an open use community in December of last year. Over the last two years, while public trust in private technology companies has plummeted, organizations including Google, Spotify, the Ford Foundation, Bloomberg, and NASA have established new funding for open-source projects and their counterparts in open science efforts—an extension of the same values applied to scientific research.
Transparency and peer control are cornerstones of good scientific practice and entail the replication and reproduction of findings. The feasibility of replications, however, hinges on the premise that original researchers make their data and research code publicly available. This applies in particular to large-N observational studies, where analysis code is complex and may involve several ambiguous analytical decisions. To investigate which specific factors influence researchers’ code sharing behavior upon request, we emailed code requests to 1,206 authors who published research articles based on data from the European Social Survey between 2015 and 2020. In this preregistered multifactorial field experiment, we randomly varied three aspects of our code request’s wording in a 2x4x2 factorial design: the overall framing of our request (enhancement of social science research, response to replication crisis), the appeal why researchers should share their code (FAIR principles, academic altruism, prospect of citation, no information), and the perceived effort associated with code sharing (no code cleaning required, no information). Overall, 37.5% of successfully contacted authors supplied their analysis code. Of our experimental treatments, only framing affected researchers’ code sharing behavior, though in the opposite direction from what we expected: scientists who received the negative wording alluding to the replication crisis were more likely to share their research code. Taken together, our results highlight that the availability of research code will hardly be enhanced by small-scale individual interventions but instead requires large-scale institutional norms.
As funder, journal, and disciplinary norms and mandates have foregrounded obligations of data sharing and opportunities for data reuse, the need to plan for and curate data sets that can reach researchers and end-users with disabilities has become even more urgent. We begin by exploring the disability studies literature, describing the need for advocacy and representation of disabled scholars as data creators, subjects, and users. We then survey the landscape of data repositories, curation guidelines, and research-data-related standards, finding little consideration of accessibility for people with disabilities. We suggest three sets of minimal good practices for moving toward truly accessible research data: 1) ensuring Web accessibility for data repositories; 2) ensuring accessibility of common text formats, including those used in documentation; and 3) enhancement of visual and audiovisual materials. We point to some signs of progress in regard to truly accessible data by highlighting exemplary practices by repositories, standards, and data professionals. Accessibility needs to become a mainstream component of curation practice included in every training, manual, and primer.
Data journals incorporate elements of traditional scholarly communications practices—reviewing for quality and rigor through editorial and peer review—and the data sharing / open data movement—prioritizing broad dissemination through repositories, sometimes with curation or technical checks. Their goals for dataset review and sharing are recorded in journal-based data policies and operationalized through workflows. In this qualitative, small-cohort, semi-structured interview study of eight different journals that review and publish research data, we explored (1) journal data policy requirements, (2) data review standards, and (3) implementation of standardized data evaluation workflows. Differences among the journals can be understood by considering editors’ approaches to balancing the interests of varied stakeholders. Assessing data quality for reusability is primarily conditional on fitness for use, which points to an important distinction between disciplinary and discipline-agnostic data journals.
Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed. Understanding where and how the sharing of data between researchers, institutions, and countries takes place is essential to developing open research practices.
This systematic review synthesised existing research papers exploring the metadata standards available to enable researchers to preserve, discover, and reuse research data in repositories. It provides a broad overview of aspects that must be considered when creating and assessing metadata standards to enhance research data preservation, discoverability, and reusability. Research papers on metadata standards, research data preservation, discovery and reuse, and repositories published between January 2003 and April 2023 were reviewed from five databases. The search retrieved 1597 papers, of which 13 were selected for review. These 13 articles explained the creation and application of metadata standards to enhance the preservation, discovery, and reuse of research data in repositories. Among them, eight presented the three main types of metadata (descriptive, structural, and administrative) as enabling the preservation of research data in data repositories. We found limited evidence on how these metadata standards can be used to enhance the discovery and reuse of research data in repositories. No reviewed papers indicated specific higher education institutions employing metadata standards for the research data created by their researchers. Repository designs and a lack of expertise and technological know-how were among the challenges identified in the reviewed papers. The review has the potential to influence professional practice and decision-making by stakeholders, including researchers, students, librarians, information and communication technologists, data managers, private and public organisations, intermediaries, research institutions, and non-profit organisations.
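The three metadata types named in the review can be made concrete with a small sketch of a hypothetical dataset record. The field names are illustrative only (the descriptive fields loosely follow Dublin Core); real repositories encode these in formal standards such as Dublin Core, DataCite, or DDI.

```python
# Illustrative record for a hypothetical dataset, grouping fields by the
# three metadata types discussed in the review. Names and values are
# examples, not a real standard's schema.
record = {
    "descriptive": {          # supports discovery and reuse
        "title": "River water quality measurements 2021",
        "creator": "Doe, Jane",
        "subject": ["hydrology", "water quality"],
        "description": "Monthly sensor readings from 12 stations.",
    },
    "structural": {           # how the parts of the dataset relate
        "files": ["readings.csv", "stations.csv"],
        "relations": {"readings.csv":
                      "references stations.csv by station_id"},
    },
    "administrative": {       # supports preservation and access management
        "license": "CC-BY-4.0",
        "checksum": {"readings.csv": "sha256:..."},  # placeholder digest
        "date_deposited": "2022-03-15",
    },
}

assert set(record) == {"descriptive", "structural", "administrative"}
```

Descriptive metadata is what a search interface indexes; structural metadata lets a reuser reassemble the dataset correctly; administrative metadata (licenses, checksums, deposit dates) is what preservation workflows and access decisions depend on.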
There is now an opportunity to expand US federal policies in similar ways and align their research software sharing aspects across agencies.
To do this, we recommend:
- As part of their updated policy plans submitted in response to the 2022 OSTP memo, US federal agencies should, at a minimum, articulate a pathway for developing guidance on research software sharing, and, at a maximum, incorporate research software sharing requirements as a necessary extension of any data sharing policy and a critical strategy to make data truly FAIR (as these principles have been adapted to apply to research software).
- As part of sharing requirements, federal agencies should specify that research software should be deposited in trusted, public repositories that maximize discovery, collaborative development, version control, long-term preservation, and other key elements of the National Science and Technology Council’s "Desirable Characteristics of Data Repositories for Federally Funded Research", as adapted to fit the unique considerations of research software.
- US federal agencies should encourage grantees to use non-proprietary software and file formats, whenever possible, to collect and store data. We realize that for some research areas and specialized techniques, viable non-proprietary software may not exist for data collection. However, in many cases, files can be exported and shared using non-proprietary formats or scripts can be provided to allow others to open files.
- Consistent with the US Administration’s approach to cybersecurity [14], federal agencies should provide clear guidance on measures grantees are expected to undertake to ensure the security and integrity of research software. This guidance should encompass the design, development, dissemination, and documentation of research software. Examples include the National Institute of Standards and Technology’s Secure Software Development Framework and the Linux Foundation’s Open Source Security Foundation.
- As part of the allowable costs that grantees can request to help them meet research sharing requirements, US federal agencies should include reasonable costs associated with developing and maintaining research software needed to maximize data accessibility and reusability for as long as it is practical. Federal agencies should ensure that such costs are additive to proposal budgets, rather than consuming funds that would otherwise go to the research itself.
- US federal agencies should encourage grantees to apply licenses to their research software that facilitate replication, reuse, and extensibility, while balancing individual and institutional intellectual property considerations. Agencies can point grantees to guidance on desirable criteria for distribution terms and approved licenses from the Open Source Initiative.
- In parallel with the actions listed above that can be immediately incorporated into new public access plans, US federal agencies should also explore long-term strategies to elevate research software to co-equal research outputs and further incentivize its maintenance and sharing to improve research reproducibility, replicability, and integrity.
Spurred by the National Institutes of Health mandating a data management and sharing plan as a requirement of grant funding, research data management has exploded in importance for librarians supporting researchers and research institutions. This editorial examines the role and direction of libraries in this process from several viewpoints. Key markers of success include collaboration, establishing new relationships, leveraging existing relationships, accessing multiple avenues of communication, and building niche expertise and cachet as a valued and trustworthy partner. [Article includes case studies.]
This case study describes the development of an open research skills framework at the University of York. The framework responds to a need for more comprehensive training, clarity and understanding around open research practices across disciplines at York, in line with the University’s commitment to the long-term development of an open research culture. The framework was developed by Library, Archives and Learning Services (LALS) in partnership with practitioners from different disciplines across the University’s research community. We summarize the background of open research activities at York since 2020, describe how the project was initiated and progressed during the summer of 2022, then provide an overview of the framework itself including areas for future development and consideration. We conclude with some early indicators of usage and reflections on the project, and we hope that this case study will prove useful for research support staff who may be considering developing a similar framework for their own institution.
The U.S. House Appropriations Subcommittee on Commerce, Justice, and Science (CJS) has released an appropriations bill containing language that would block implementation of the 2022 updated OSTP policy guidance (the Nelson Memo), which would ensure immediate, free access to taxpayer-funded research. If enacted, this would prevent American taxpayers from seeing the benefits of the more than $90 billion in scientific research that the U.S. government funds each year. . . .
Write to Congress
Look up contact details for your Representatives and Senators, then call the office and tell them to remove Section 552 of the House CJS bill.