Open Science – Page 6

"Repository Staff Perspectives on the Benefits of Trustworthy Digital Repository Certification"

This paper reports on the results from a qualitative study that asks whether and how staff members from TRAC certified repositories find value in the audit and certification process. While some interviewees found certification valuable, others argued that the costs outweighed the benefits or expressed ambivalence towards certification. Findings indicate that TRAC certification offered both internal and external benefits, such as improved documentation, accountability, transparency, communication, and standards, but there were concerns about high costs, implementation problems, and lack of objective evaluation criteria.

https://tinyurl.com/bddmuwjy

"FAIR EVA: Bringing Institutional Multidisciplinary Repositories into the FAIR Picture"

The FAIR Principles are a set of good practices to improve the reproducibility and quality of data in an Open Science context. Different sets of indicators have been proposed to evaluate the FAIRness of digital objects, including datasets that are usually stored in repositories or data portals. However, indicators like those proposed by the Research Data Alliance are provided from a high-level perspective that can be interpreted and they are not always realistic to particular environments like multidisciplinary repositories. This paper describes FAIR EVA, a new tool developed within the European Open Science Cloud context that is oriented to particular data management systems like open repositories, which can be customized to a specific case in a scalable and automatic environment. It aims to be adaptive enough to work for different environments, repository software and disciplines, taking into account the flexibility of the FAIR Principles. As an example, we present DIGITAL.CSIC repository as the first target of the tool, gathering the particular needs of a multidisciplinary institution as well as its institutional repository.

https://doi.org/10.1038/s41597-023-02652-8

"Understanding the Value of Curation: A Survey of Researcher Perspectives of Data Curation Services from Six US Institutions"

Data curation encompasses a range of actions undertaken to ensure that research data are fit for purpose and available for discovery and reuse, and can help to improve the likelihood that data is more FAIR (Findable, Accessible, Interoperable, and Reusable). The Data Curation Network (DCN) has taken a collaborative approach to data curation, sharing curation expertise across a network of partner institutions and data repositories, and enabling those member institutions to provide expert curation for a wide variety of data types and discipline-specific datasets. This study sought to assess the satisfaction of researchers who had received data curation services, and to learn more about what curation actions were most valued by researchers. By surveying researchers who had deposited data into one of six academic generalist data repositories between 2019–2021, this study set out to collect feedback on the value of curation from the researchers themselves. A total of 568 researchers were surveyed; 42% (238) responded. Respondents were positive in their evaluation of the importance and value of curation, indicating that the participants not only value curation services, but are largely satisfied with the services provided. An overwhelming majority (97%) of researchers agreed that data curation adds value to the data sharing process, 96% agreed it was worth the effort, and 90% felt more confident sharing their data due to the curation process. We share these results to provide insights into researchers’ perceptions and experience of data curation, and to contribute evidence of the positive impact of curation on repository depositors. From the perspective of researchers we surveyed, curation is worth the effort, increases their comfort with data sharing, and makes data more findable, accessible, interoperable, and reusable.

https://doi.org/10.1371/journal.pone.0293534

"Where Is All the Research Software? An Analysis of Software in UK Academic Repositories"

This research examines the prevalence of research software as independent records of output within UK academic institutional repositories (IRs). There has been a steep decline in numbers of research software submissions to the UK’s Research Excellence Framework from 2008 to 2021, but there has been no investigation into whether and how the official academic IRs have affected the low return rates. In what we believe to be the first such census of its kind, we queried the 182 online repositories of 157 UK universities. Our findings show that the prevalence of software within UK academic IRs is incredibly low. Fewer than 28% contain software as recognised academic output. Of greater concern, we found that over 63% of repositories do not currently record software as a type of research output and that several universities appeared to have removed software as a defined type from default settings of their repository. We also explored potential correlations, such as being a member of the Russell Group, but found no correlation between these metadata and prevalence of records of software. Finally, we discuss the implications of these findings with regards to the lack of recognition of software as a discrete research output in institutions, despite the opposite being mandated by funders, and we make recommendations for changes in policies and operating procedures.

https://doi.org/10.7717/peerj-cs.1546

"How Can Open Data Sharing Policies Be More Attentive to Qualitative Researchers?"

The expected and prescriptive ways of preparing data are a key part of the problem. These are governed largely by quantitative data management strategies. Qualitative data is the outcome of personal interactions between researchers and participants. Yet, data sharing guidance is seldom attentive to the co-constructed nature of qualitative material. "The identities of researchers and what they reflexively reveal of themselves, how they interact with participants, their techniques and approaches and the messiness of qualitative work are laid bare within the artefacts of qualitative data" (Weller 2023: 9). This can make researchers especially vulnerable to personal and professional scrutiny in a way that survey and other quantitative researchers are not.

https://tinyurl.com/2fpr82vr

"Open Science 2.0: Towards a Truly Collaborative Research Ecosystem"

This Essay reviews achievements in open science over the past few decades and outlines a vision for Open Science 2.0, a research environment where the entire scientific process from idea generation to data analysis is openly available, where researchers seamlessly interact to build on the work of others, and where the research infrastructure and cultural norms have evolved to foster efficient and widespread collaboration. We use this term not simply to suggest a large step forward but to invoke transformational change in the capacity and purpose of a system, as was observed with the Web 2.0.

Realizing this vision requires that we challenge traditional research norms and embrace a collaborative spirit to iteratively improve our research practices and infrastructures. In this sense, we end this Essay with recommendations for how funders, institutions, publishers, regulators, and other stakeholders can foster a research environment that cultivates openness, rigor, and collaboration. We argue for concerted and persistent efforts, supported by sustained public funding mechanisms, that treat open science as a milepost toward a more effective research ecosystem. But first things first: What do we mean by "open science"?

https://doi.org/10.1371/journal.pbio.3002362

"Researchers Express Growing Enthusiasm About Open Access, New Wiley Survey Reports"

Open access is quickly becoming the preferred publishing choice among researchers, according to new research from Wiley. 75% of respondents who have published research articles in the past three years have published open access, up from 44% just two years ago.

The survey of more than 600 scholars around the globe revealed the following insights:

Growing enthusiasm for open access. In addition to the increase in authors publishing open access, 75% of respondents agree that transformative agreements (TAs) are the right solution at this time to make research findings more openly available.

At least half of researchers engage in open research practices such as open data, open peer review and self-archiving. This demonstrates that researchers are embracing all the practices that will lead to a fully open research landscape, and are not limiting their activities to open access publishing.

Researchers who are publishing open access are motivated more by the benefits than by requirements. Respondents chose "visibility and impact" (65%) and "public benefit" (54%), followed by "transparency and reuse" (33%), when asked why they engage in open access publishing, significantly more often than journal requirements (25%) and institutional requirements (22%).

Lack of funding presents the most prominent roadblock for publishing open access. The top barrier, reported by 58% of respondents, is no or limited funds available to pay fees for open access publishing. 77% of respondents said they were likely to very likely to publish open access if their APCs were paid by their funder or institution. In addition, more than half of authors who publish open access are not clear on the license requirements from their funder (51%) or institution (55%).

https://tinyurl.com/bdetnz7y

Paywall: "The New Information Retrieval Problem: Data Availability"

In this paper, we discuss a method for exploring and locating datasets made available by scientists from federally funded projects in the US. The data pathways method was tested on federal awards. Here we describe the method and the results from analyzing fifty federal awards granted by the National Science Foundation to pursue data resources and their availability in publications, data repositories, or institutional repositories. The data pathways approach contributes to the development of a practical approach on availability that captures the current ways in which data are accessible from federally funded science projects, ranging from institutional repositories, journal data deposit, PI and project web pages, and science data platforms, among other found possibilities.

https://doi.org/10.1002/pra2.796

"Implementation of a Federated Information System by Means of Reuse of Research Data Archived in Research Data Repositories"

At universities, research data is increasingly stored in research data repositories according to a data management plan (DMP) and thus made available for further use. The challenge of reusing hundreds, thousands, or millions of data sets is to obtain an overview of the data in a short period of time and to search through all the data. The high variability of the formats used to store research data requires a new approach to data reusability that focuses on the visualisation and searchability of archived research data, which can also be combined with each other. In this article, we present a practical DMP that describes how information systems can be created on demand by reusing research data archived in research data repositories and how these systems can be merged into a federated information system. As a result, in our projects, information systems have been created in minutes or a couple of hours with few resources. The initial effort to create a federated system remains; however, this allows federated searches to be performed. Extending a federated system to include other information systems can then be accomplished by making a few configurations and manageable adjustments to the source code.

https://doi.org/10.5334/dsj-2023-039

"Connecting Fragmented Support on Campus: Growing Research Data Services Programs Through Collaboration"

Research data services are provided by multiple units across and beyond the library, which is why communication and collaboration are paramount to building support for researchers. By exploring how Research Data Services (RDS) programs can function in the fragmented landscape of research support on campuses, we outline the role of collaboration in building programs. In this paper, we discuss building an RDS program by emphasizing three strategies for collaboration: collaborating within the library, collaborating across campus, and collaborating externally with those without direct ties to your organization. The aim of this paper is to offer attainable examples and strategies for building collaborations across campuses for libraries that have small or nascent RDS programs—how to approach and cultivate partnerships, how to set realistic goals, and how to work holistically within the fragmented academy.

https://tinyurl.com/9hbz49df

Paywall: "DMPFrame: A Conceptual Metadata Framework for Data Management Plans"

We have examined 12 open-source DMP tools, in particular, to evaluate the metadata adopted by these tools. The current study spots and highlights the gaps in the DMP metadata management in DMP tools and suggests DMPFrame as a conceptual framework addressing such gaps to improve the existing tools’ DMP metadata management practices. Based on the examined DMP tool’s metadata elements analysis and mapping, DMPFrame manages DMP metadata under 6 categories, namely, contributors, project, funding, organization, DMP, and output. The current study also suggests a systematic workflow that DMP tools could incorporate for metadata creation for DMPs.

https://doi.org/10.1080/19386389.2023.2268474

"Disappearing Repositories — Taking an Infrastructure Perspective on the Long-Term Availability of Research Data"

Currently, there is limited research investigating the phenomenon of research data repositories being shut down, and the impact this has on the long-term availability of data. This paper takes an infrastructure perspective on the preservation of research data by using a registry to identify 191 research data repositories that have been closed and presenting information on the shutdown process. The results show that 6.2 % of research data repositories indexed in the registry were shut down. The risks resulting in repository shutdown are varied. The median age of a repository when shutting down is 12 years. Strategies to prevent data loss at the infrastructure level are pursued to varying extent. 44 % of the repositories in the sample migrated data to another repository, and 12 % maintain limited access to their data collection. However, both strategies are not permanent solutions. Finally, the general lack of information on repository shutdown events as well as the effect on the findability of data and the permanence of the scholarly record are discussed.

https://arxiv.org/abs/2310.06712

"The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices"

In recent years, funding agencies and journals increasingly advocate for open science practices (e.g. data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and computer science to analyze the adoption of data and method link-sharing practices over time and their impact on article reception. To identify links to data and methods, we train a neural text classification model to automatically classify URL types based on contextual mentions in papers. We find evidence that the practice of link-sharing to methods and data is spreading as more papers include such URLs over time. Reproducibility efforts may also be spreading because the same links are being increasingly reused across papers (especially in computer science); and these links are increasingly concentrated within fewer web domains (e.g. GitHub) over time. Lastly, articles that share data and method links receive increased recognition in terms of citation count, with a stronger effect when the shared links are active (rather than defunct). Together, these findings demonstrate the increased spread and perceived value of data and method sharing practices in open science.

https://arxiv.org/abs/2310.03193
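
To make the idea of links "concentrated within fewer web domains" concrete, here is a minimal Python sketch of one way such concentration could be measured: the share of links that point at each host. It is not the paper's method (the authors train a neural classifier on contextual mentions), and the function name `domain_shares` and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_shares(urls):
    """Map each host name to its fraction of the given URLs."""
    hosts = [urlparse(u).netloc.lower() for u in urls]
    counts = Counter(hosts)
    total = len(hosts)
    return {host: n / total for host, n in counts.items()}


# Hypothetical links of the kind a paper might cite (not taken from the study).
links = [
    "https://github.com/example/project-a",
    "https://github.com/example/project-b",
    "https://zenodo.org/record/0000000",
    "https://example.org/data.csv",
]

shares = domain_shares(links)
top_host = max(shares, key=shares.get)
print(top_host, shares[top_host])
```

Tracking the top host's share year by year would be one simple way to see whether links are consolidating onto a few platforms.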

"UKRN ORCC Primer on Open Access"

This is an introductory guide for those working and considering working in the area of open access. It was drafted by members of the Open Research Competencies Coalition. Open Access (OA) refers to research that is published as digital, online, free of charge for reading, and free to re-use or share.

https://doi.org/10.31219/osf.io/v3q75

Scholarly Communication Librarianship and Open Knowledge

The book consists of three parts. Part I offers definitions of scholarly communication and scholarly communication librarianship and provides an introduction to the social, economic, technological, and policy/legal pressures that underpin and shape scholarly communication work in libraries. These pressures, which have framed ACRL’s understanding of scholarly communication for the better part of the past two decades, have unsettled many foundational assumptions and practices in the field, removing core pillars of scholarly communication as it was practiced in the twentieth century. These pressures have also cleared fresh ground, and scholarly communication practitioners have begun to seed the space with values and practices designed to renew and often improve the field. Part II begins with an introduction to "open," the core response to the pressures described in part I. This part offers a general overview of the idea of openness in scholarly communication followed by chapters on different permutations and practices of open, each edited by a recognized expert of these areas with authors of their selection. Amy Buckland edited chapter 2.1, "Open Access." Brianna Marshall edited chapter 2.2, "Open Data." Lillian Hogendoorn edited chapter 2.3, "Open Education." Micah Vandegrift edited chapter 2.4, "Open Science and Infrastructure." Each of them brought on incredible expertise through contributors whom they identified, through both original contributions and repurposing existing openly licensed work, which is something we want to model where possible. Part III consists of twenty-four concise perspectives, intersections, and case studies from practicing librarians and closely related stakeholders, which we hope will stimulate discussion and reflection on theory and implications for practice. In every single case, we’re really excited by the editors and authors and the ideas they bring to the whole. 
Each contribution features light pedagogical apparatuses like suggested further reading, discussion or reflection prompts, and potential activities. It’s all available for free and openly licensed with a Creative Commons Attribution Non-Commercial (CC BY-NC) license, so anyone is encouraged to grab whatever parts are useful and to adapt and repurpose and improve them to meet specific course goals and student needs within the confines of the license.

https://bit.ly/SCLAOK

"ACME-FAIR: a Guide for Research Performing Organisations (RPO)"

The overall purpose of ACME-FAIR is to help those managing and delivering relevant professional services to self-assess how they are enabling researchers and their colleagues to do just that. Each part deals with one of the key issues that Research Performing Organisations (RPO) face in establishing the capabilities to put the FAIR principles into practice. . . . Each of the 7 guides has a thematic introduction, an overview of the relevant capabilities, and a rubric for assessing the levels of maturity and community engagement for each capability.

https://tinyurl.com/yckfdjtd

"An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions"

Jupyter Notebooks have become a powerful tool to foster use of these collections by digital humanities researchers. Based on previous approaches for quality assessment, which have been adapted for cultural heritage collections, this paper proposes a methodology for assessing the quality of projects based on Jupyter Notebooks published by relevant GLAM institutions. A list of projects based on Jupyter Notebooks using cultural heritage data has been evaluated. Common features and best practices have been identified. A detailed analysis, that can be useful for organizations interested in creating their own Jupyter Notebooks projects, has been provided. Open issues requiring further work and additional avenues for exploration are outlined.

https://doi.org/10.1002/asi.24835

"How Does Mandated Code-Sharing Change Peer Review?"

At the end of the year-long trial period, code sharing had risen from 53% in 2019 to 87% for 2021 articles submitted after the policy went into effect. Evidence in hand, the journal Editors-in-Chief decided to make code sharing a permanent feature of the journal. Today, the sharing rate is 96%.

https://tinyurl.com/5n9yh9yj

"Umbrella Data Management Plans to Integrate FAIR Data: Lessons From the ISIDORe and BY-COVID Consortia for Pandemic Preparedness"

The Horizon Europe project ISIDORe is dedicated to pandemic preparedness and responsiveness research. It brings together 17 research infrastructures (RIs) and networks to provide a broad range of services to infectious disease researchers. An efficient and structured treatment of data is central to ISIDORe’s aim to furnish seamless access to its multidisciplinary catalogue of services, and to ensure that users’ results are treated FAIRly. ISIDORe therefore requires a data management plan (DMP) covering both access management and research outputs, applicable over a broad range of disciplines, and compatible with the constraints and existing practices of its diverse partners.

Here, we describe how, to achieve that aim, we undertook an iterative, step-by-step, process to build a community-approved living document, identifying good practices and processes, on the basis of use cases, presented as proof of concepts. International fora such as the RDA and EOSC, and primarily the BY-COVID project, furnished registries, tools and online data platforms, as well as standards, and the support of data scientists. Together, these elements provide a path for building an umbrella, FAIR-compliant DMP, aligned as fully as possible with FAIR principles, which could also be applied as a framework for data management harmonisation in other large-scale, challenge-driven projects. Finally, we discuss how data management and reuse can be further improved through the use of knowledge models when writing DMPs and, how, in the future, an inter-RI network of data stewards could contribute to the establishment of a community of practice, to be integrated subsequently into planned trans-RI competence centres.

https://doi.org/10.5334/dsj-2023-035

"ACS, Elsevier, and ResearchGate Resolve Litigation, with Solution to Support Researchers"

ACS and Elsevier, members of the Coalition for Responsible Sharing, have agreed to a legal settlement with ResearchGate that ensures copyright-compliant sharing of research articles published with ACS or Elsevier on the ResearchGate site. The lawsuits pending against ResearchGate in Germany and the United States are now resolved. The specific terms of the parties’ settlement are confidential.

Background: "Munich Court Ruling Sides with Elsevier, ACS over ResearchGate."

https://tinyurl.com/mrr9xywj

"Understanding Barriers Affecting the Adoption and Usage of Open Access Data in the Context of Organizations"

Although the benefits of organizational adoption are significant, most OAD-related projects fail because of organizational barriers and resistance to adoption. This study first aims to find these organizational barriers to adopting OAD to raise awareness of the obstacles organizations must overcome. Towards this aim, after conducting a systematic literature review (SLR) and an expert panel, a research model based on the Technology – Organization – Environment (TOE) framework is proposed in this study. As a result of SLR, 97 barriers were identified from ten primary studies. After critically examining these barriers, a research model classifying 22 crucial barriers to organizational OAD adoption based on the TOE framework is proposed.

https://doi.org/10.1016/j.dim.2023.100049

Paywall: "Images as Metadata: A New Perspective for Describing Research Data"

Through studies and work developed over the last few years, we propose a new approach to description, where images can have a preponderant role in the description of data, assuming the role of metadata. We present several pieces of evidence, point out their challenges and determine the opportunities this new perspective can have in the research. Images have specific characteristics that can be leveraged in improving data description. Historical evidence establishes that images have always been used and produced in research, yet their representational ability has never been harnessed to describe data and give more context to the scientific process.

https://doi.org/10.1080/19386389.2023.2252722

"Tracing Data: A Survey Investigating Disciplinary Differences in Data Citation"

Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers’ motivations for citing — or not citing — data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers’ current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.

https://doi.org/10.1162/qss_a_00264

"Expanding the Data Ark: An Attempt to Make the Data from Highly Cited Social Science Papers Publicly Available"

Access to scientific data can enable independent reuse and verification; however, most data are not available and become increasingly irrecoverable over time. This study aimed to retrieve and preserve important datasets from 160 of the most highly-cited social science articles published between 2008-2013 and 2015-2018. We asked authors if they would share data in a public repository — the Data Ark — or provide reasons if data could not be shared. Of the 160 articles, data for 117 (73%, 95% CI [67% – 80%]) were not available and data for 7 (4%, 95% CI [0% – 12%]) were available with restrictions. Data for 36 (22%, 95% CI [16% – 30%]) articles were available in unrestricted form: 29 of these datasets were already available and 7 datasets were made available in the Data Ark. Most authors did not respond to our data requests and a minority shared reasons for not sharing, such as legal or ethical constraints. These findings highlight an unresolved need to preserve important scientific datasets and increase their accessibility to the scientific community.

https://doi.org/10.31222/osf.io/w9crz
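
As a quick arithmetic check on the Data Ark figures above, the following Python sketch recomputes each availability share from the reported counts. The category labels and the helper `share` are ours. The confidence intervals are not recomputed, because the excerpt does not say how they were derived.

```python
# Counts as reported in the abstract; the dictionary keys are our own labels.
counts = {
    "not available": 117,
    "available with restrictions": 7,
    "available unrestricted": 36,
}
total = sum(counts.values())  # should equal the 160 articles in the sample


def share(n: int, total: int) -> float:
    """Return n as a percentage of total."""
    return 100 * n / total


shares = {label: share(n, total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{total} = {pct:.1f}%")

# The unrestricted datasets split into those already public and those newly deposited.
already_available, newly_deposited = 29, 7
print(already_available + newly_deposited == counts["available unrestricted"])
```

The three categories sum to the full sample of 160. The unrestricted share works out to 22.5%, which the abstract reports as 22%.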

"PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints using Literature Databases"

The growing impact of preprint servers enables the rapid sharing of time-sensitive research. Likewise, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is often missing from preprint servers. To overcome this problem, the PreprintResolver was developed, which uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef / CrossCite) to identify preprint-publication pairs for the arXiv preprint server. . . . Experiments were performed on a sample of 1,000 arXiv-preprints from the research field of computer science and without any publication information. . . . The results show that the PreprintResolver was able to resolve 603 out of 1,000 (60.3 %) arXiv-preprints from the research field of computer science and without any publication information. . . . In conclusion, the PreprintResolver is suitable for individual, manually reviewed requests, but less suitable for bulk requests. The PreprintResolver tool (available from 2023-08-01) and source code (accessed 2023-07-19) are available online.

https://arxiv.org/abs/2309.01373
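
To illustrate what identifying a preprint-publication pair can involve, here is a minimal Python sketch that matches a preprint title against candidate records by normalized title similarity. It is an assumption for illustration, not the PreprintResolver's actual algorithm: the function names, the 0.9 threshold, and the sample records are all hypothetical, and a real resolver would fetch its candidates from the literature databases named above.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_published_version(preprint_title, candidates, threshold=0.9):
    """Return the best-matching candidate at or above threshold, else None."""
    best, best_score = None, threshold
    for record in candidates:
        score = title_similarity(preprint_title, record["title"])
        if score >= best_score:
            best, best_score = record, score
    return best


# Hypothetical candidate records, standing in for database query results.
candidates = [
    {"title": "Resolving Preprints: A Hypothetical Study.", "venue": "Example Journal"},
    {"title": "An Unrelated Paper on Graph Theory", "venue": "Other Venue"},
]

match = find_published_version("Resolving preprints - a hypothetical study", candidates)
print(match["venue"] if match else "no published version found")
```

A high threshold favors precision over recall, which fits the abstract's conclusion that results are best suited to individual, manually reviewed requests.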