“DeepGreen—A Data Hub for the Distribution of Scholarly Articles From Publishers to Open Access Repositories in Germany”


  • DeepGreen is an automated delivery service for open access articles. Originally conceived to take advantage of the so-called open access component—a secondary publication right in Alliance and National licences in Germany to promote green open access—it aims to streamline open access processes by automating the distribution of full-text articles and metadata from publishers to repositories.
  • The service, developed by a consortium and funded by the German Research Foundation (DFG) in its initial phase, has successfully established itself as a national service, facilitating open access content distribution and contributing to Germany’s open access infrastructure.
  • As of December 2024, DeepGreen distributes articles from 14 publishers to 84 institutional repositories and 6 subject-specific repositories.
  • This article describes the role of the DeepGreen service in Germany, its collaboration with publishers and the potential of automated processes for storing articles in open access repositories, which, as publicly owned institutional infrastructures, ensure sustainable access and provide secure, redundant storage.

https://doi.org/10.1002/leap.70000

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

“Open But Hidden: Open Access Gaps in the National Science Foundation Public Access Repository”


Introduction: In 2022, the U.S. government released new guidelines for making publicly funded research open and available. For the National Science Foundation (NSF), these policies reinforce requirements in place since 2016 for supported research to be submitted to the Public Access Repository (PAR).

Methods: To evaluate the public access compliance of research articles submitted to the NSF-PAR, this study searched for NSF-PAR records published between 2017 and 2021 from two research intensive institutions. Records were reviewed to determine whether the PAR held a deposited copy, as required by the 2016 policies, or provided a link out to publisher-held version(s).

Results: A total of 841 unique records were identified, all with publicly accessible versions. Yet only 42% had a deposited PDF version available in the repository as required by the NSF 2016 Public Access Policy. The remaining 58% directed instead to publisher-held versions. In total, only 55% of record links labeled “Full Text Available” directed users to a publicly accessible version with a single click.

Discussion: Records within PAR do not clearly direct users to the publicly accessible full text. In almost half of records, the most prominently displayed link directed users to a paywall version, even when a publicly available version existed. Records accessible only through the CHORUS (Clearing House for the Open Research of the United States) initiative were further obscured by requiring specialized navigation of publisher-owned sites.

Conclusion: Despite having a repository mandate since 2016, NSF compliance rates remain low. Additional support and/or oversight is needed to address the additional requirements introduced under the 2022 memo.

https://doi.org/10.31274/jlsc.17767

“Moving Open Repositories out of the Blind Spot of Initiatives to Correct the Scholarly Record”


Open repositories were created to enhance access and visibility of scholarly publications, driven by open science ideals emphasising transparency and accessibility. However, they lack mechanisms to update the status of corrected or retracted publications, posing a threat to the integrity of the scholarly record. To explore the scope of the problem, a manually verified corpus was examined: we extracted all the entries in the Crossref × Retraction Watch database for which the publication date of the corrected or retracted document ranged from 2013 to 2023. This corresponded to 24,430 entries with a DOI, which we used to query Unpaywall and identify their possible indexing in HAL, an open repository (second largest institutional repository worldwide). In most cases (91%), HAL does not mention corrections. While the study needs a broader scope, it highlights the necessity of improving the role of open repositories in correction processes with better curation practices. We discuss how harvesting operations and the interoperability of platforms can maintain the integrity of the entire scholarly record. Open repositories will then not only avoid damaging its reliability through ambiguous reporting but will also strengthen it.
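
The DOI lookup described in this abstract (asking Unpaywall where open copies of a publication live, then checking for a HAL copy) could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the helper names `unpaywall_url` and `has_hal_location`, the placeholder DOI and email address, and the rule that a location counts as HAL when `hal` appears as a host label are all assumptions made for the example. The `oa_locations` and `url` fields are taken from Unpaywall's published response format and should be checked against its current documentation.

```python
from urllib.parse import quote, urlparse

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall lookup URL for one DOI (the API expects an email)."""
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?email={quote(email)}"


def has_hal_location(record: dict) -> bool:
    """Heuristic (assumption): a location is HAL if 'hal' is a host label."""
    for location in record.get("oa_locations") or []:
        host = urlparse(location.get("url") or "").hostname or ""
        if "hal" in host.split("."):
            return True
    return False


# Hand-made sample shaped like an Unpaywall response; not real data.
sample = {
    "doi": "10.1234/example",
    "oa_locations": [
        {"url": "https://publisher.example.org/article/1"},
        {"url": "https://hal.science/hal-00000000"},
    ],
}

print(unpaywall_url(sample["doi"], "you@example.org"))
print(has_hal_location(sample))
```

Keeping the URL builder and the HAL check as pure functions means the matching rule can be tested on saved responses without calling the network.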

https://doi.org/10.1002/leap.1655

"Enabling Preprint Discovery, Evaluation, and Analysis with Europe PMC"


Preprints provide an indispensable tool for rapid and open communication of early research findings. Preprints can also be revised and improved based on scientific commentary uncoupled from journal-organised peer review. The uptake of preprints in the life sciences has increased significantly in recent years, especially during the COVID-19 pandemic, when immediate access to research findings became crucial to address the global health emergency. With ongoing expansion of new preprint servers, improving discoverability of preprints is a necessary step to facilitate wider sharing of the science reported in preprints. To address the challenges of preprint visibility and reuse, Europe PMC, an open database of life science literature, began indexing preprint abstracts and metadata from several platforms in July 2018. Since then, Europe PMC has continued to increase coverage through addition of new servers, and expanded its preprint initiative to include the full text of preprints related to COVID-19 in July 2020 and then the full text of preprints supported by the Europe PMC funder consortium in April 2022. The preprint collection can be searched via the website and programmatically, with abstracts and the open access full text of COVID-19 and Europe PMC funder preprint subsets available for bulk download in a standard machine-readable JATS XML format. This enables automated information extraction for large-scale analyses of the preprint corpus, accelerating scientific research of the preprint literature itself. This publication describes steps taken to build trust, improve discoverability, and support reuse of life science preprints in Europe PMC. Here we discuss the benefits of indexing preprints alongside peer-reviewed publications, and challenges associated with this process.
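
As a rough illustration of the programmatic search mentioned in this abstract, the sketch below builds a query URL that restricts results to preprint records. It is a hedged example, not an official client: the function name `preprint_search_url` and the search term are hypothetical, and the endpoint, the `SRC:PPR` source filter, and the `format` and `pageSize` parameters are assumptions about the public Europe PMC REST service that should be confirmed against its current documentation.

```python
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def preprint_search_url(terms: str, page_size: int = 25) -> str:
    """Build a search URL limited to preprint records (assumed filter: SRC:PPR)."""
    params = {
        "query": f"({terms}) AND SRC:PPR",
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


# Only constructs the URL; fetching it is left to the caller.
print(preprint_search_url("organoid"))
```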

https://doi.org/10.1371/journal.pone.0303005

"Knowledge Infrastructures are Growing Up: The Case for Institutional (Data) Repositories 10 Years After the Holdren Memo"


Institutional data repositories are uniquely positioned to support researchers in sharing scholarly outputs. As funding agencies develop and institute policies for research data access and sharing, institutional data repositories have emerged as a critical feature in ecosystems for data stewardship and sharing. We show that institutional data repositories can meet and exceed the requirements and recommendations of federal data policy, thereby maximizing the benefits of data sharing. We present results of a mixed-method study which explores the adoption and usage of institutional repositories to share data from 2017 to 2023. Data from two previous studies were combined with data collected in 2023 on the data sharing solutions of Association of Research Libraries member institutions in the United States and Canada. The analysis of the aggregated data indicates that data stewardship has increased in both institutional repositories and institutional data repositories with an increase in complementary infrastructure to support data sharing. We then conduct an “infrastructural inversion” (Bowker & Star, 1999) to ‘surface invisible work’ of making data repositories function well, and demonstrate that institutional data repositories have advantages for providing sustainable stewardship, curation, and sharing of research data. Finally, we show that institutional data repositories may produce additional benefits through established infrastructure, local interoperability, and control.

https://doi.org/10.5334/dsj-2024-046

"‘Does It Feel like a Scientific Paper?’: A Qualitative Analysis of Preprint Servers’ Moderation and Quality Assurance Processes"


In recent years, preprints—i.e., scholarly manuscripts that have not been peer reviewed or published in a journal—have emerged as a major source of research communication and a critical component of open science. However, concerns have been raised about preprints’ potential to facilitate the spread of flawed or misleading research due to the lack of quality control performed by preprint servers. Yet, there is limited knowledge of how servers currently vet incoming content and how this impacts the openness and diversity of scholarly content. In this paper, we examine preprint servers’ moderation processes, the intentions underpinning them, and their potential effects through a qualitative analysis of in-depth interviews with 14 key preprint server personnel. We find a wide range of moderation processes, which vary depending on specific server contexts and needs and are motivated by a desire to prevent the spread of misinformation and protect trust in preprints and servers. Participants repeatedly emphasized the difference between their moderation processes and peer review, but in practice often applied similar criteria for delineating scientific from unscientific content. Moreover, moderation processes often relied on trust cues, such as article formats or author affiliations, as proxies for research quality, potentially introducing similar biases as have been found in traditional journal peer review. We discuss implications for the diversity of preprint content and authors, as well as the future of preprint servers within an evolving scholarly communication ecosystem.

https://doi.org/10.31222/osf.io/mp6ky

"Interview: Deciphering the Law: Hachette v. Internet Archive Pt. 1 (2023) with Dave Hansen"


This is the first in a series of interviews with those closely tied to the Hachette v. Internet Archive lawsuit. In March 2023, the court ruled against the Internet Archive and its use of the Emergency Lending Library, causing a ripple throughout the library and education fields. Below, find the answers to some of the questions that the case elicited by JCEL contributors and copyright scholars Dave Hansen, Michelle Wu, and Kyle Courtney.

https://doi.org/10.17161/jcel.v7i2.21337

"Closing Gaps: A Model of Cumulative Curation and Preservation Levels for Trustworthy Digital Repositories"


Curation and preservation measures carried out by digital repository staff are an important building block in maintaining the accessibility and usability of digital resources over time. The measures adequate to achieve long-term usability for a given audience strongly depend on scenarios of (re)use, the (intended) users’ needs and skills, the organisational setting (e.g., mission, resources, policies), as well as the characteristics of the digital objects to be preserved. The assessment of curation and preservation measures also forms an important part of existing certification procedures for trustworthy digital repositories (TDRs) as offered, for example, by the CoreTrustSeal foundation, the nestor network, or ISO.

The digital curation community is presented with the challenge of finding community-, organisation-, and object-specific approaches to curation and preservation at the same time as defining the minimum level of curation and preservation measures expected from a TDR in sufficiently generic terms to ensure applicability to a wide array of repositories. Against this backdrop, this paper discusses the need for and benefits of community-agreed levels of curation and preservation to address this challenge, and considers the tiered model proposed by the CoreTrustSeal Board as an example.

The proposed model is then applied in an analysis of successful CoreTrustSeal applications from 2018–2022 in an effort to better understand the capacity of the curation and preservation levels to capture the respective practices of repositories and to identify potential gaps.

https://doi.org/10.2218/ijdc.v18i1.926

"Transparent Disclosure, Curation & Preservation of Dynamic Digital Resources"


This paper explores an enhanced curation lifecycle being developed at the UK Data Service (UKDS), with our Data Product Builder. Through a Graphical User Interface, we aim to provide the researcher with a tailored digital resource. We detail the threefold motivation behind this initiative: data dissemination scalability, researcher satisfaction and the reduction of nationwide duplication of research effort.

Subsequent sections detail the technical components and challenges involved. In addition to more standard data subsetting, filtering and linking components, this data dissemination platform offers dynamic disclosure assessments – identifying combinations of variables that present a potential disclosure risk. All components are underpinned by the Data Documentation Initiative’s new Cross-Domain Integration standard (DDI-CDI), designed to handle the many structures in which data may be organised.

Ever conscious of the scale of the task we are embarking on, we remain motivated by the need for such advances in data dissemination and optimistic about the feasibility of such a system to meet the needs of the researcher while balancing the data disclosivity concerns of the data depositor.

https://doi.org/10.2218/ijdc.v18i1.937

Paywall: "Constructing Risk in Trustworthy Digital Repositories"


This article investigates the construction of risk within trustworthy digital repository audits. It contends that risk is a social construct, and social factors influence how stakeholders in digital preservation processes comprehend and react to risk.

https://doi.org/10.1108/JD-08-2023-0157

Paywall: "Data Quality Assurance Practices in Research Data Repositories — A Systematic Literature Review"


This study conducted a systematic analysis of data quality assurance (DQA) practices in research data repositories (RDRs), guided by activity theory and data quality literature, resulting in conceptualizing a data quality assurance model (DQAM) for RDRs. DQAM outlines a DQA process comprising evaluation, intervention, and communication activities and categorizes 17 quality dimensions into intrinsic and product-level data quality. It also details specific improvement actions for data products and identifies the essential roles, skills, standards, and tools for DQA in RDRs. By comparing DQAM with existing DQA models, the study highlights its potential to improve these models by adding a specific DQA activity structure.

https://doi.org/10.1002/asi.24948

"Internet Archive Forced to Remove 500,000 Books after Publishers’ Court Win"


As a result of book publishers successfully suing the Internet Archive (IA) last year, the free online library that strives to keep growing online access to books recently shrank by about 500,000 titles. . . .

To restore access, IA is now appealing, hoping to reverse the prior court’s decision by convincing the US Court of Appeals in the Second Circuit that IA’s controlled digital lending of its physical books should be considered fair use under copyright law. An April court filing shows that IA intends to argue that the publishers have no evidence that the e-book market has been harmed by the open library’s lending, and copyright law is better served by allowing IA’s lending than by preventing it. . . .

Freeland [Chris Freeland, IA’s director of library service] told Ars it could take months or even more than a year before a decision is reached in the case.

While IA fights to end the injunction, its other library services continue growing, IA has said. IA "may still digitize books for preservation purposes" and "provide access to our digital collections" through interlibrary loan and other means. IA can also continue lending out-of-print and public domain books.

https://tinyurl.com/47aws7z7

"Analyzing Research Data Repositories (RDR) from BRICS Nations: A Comprehensive Study"


As of March 2, 2024, re3data.org indexes a total of 3,192 Research Data Repositories (RDRs) worldwide, with BRICS nations contributing 195. China leads among BRICS nations, followed by India, Russia, and Brazil. . . . "House, tailor-made" software is widely used for creating RDRs, followed by Dataverse and DSpace. . . . Most repositories are disciplinary, followed by institutional ones. Most repositories specify data upload types, with "restricted" being the most common, followed by closed types. Open access is predominant in data access, followed by restricted access and embargo periods, while a small number restrict access entirely.

https://doi.org/10.1108/LM-04-2024-0040

"Biomedical Data Repository Concepts and Management Principles"


The demand for open data and open science is on the rise, fueled by expectations from the scientific community, calls to increase transparency and reproducibility in research findings, and developments such as the Final Data Management and Sharing Policy from the U.S. National Institutes of Health and a memorandum on increasing public access to federally funded research, issued by the U.S. Office of Science and Technology Policy. This paper explores the pivotal role of data repositories in biomedical research and open science, emphasizing their importance in managing, preserving, and sharing research data. Our objective is to familiarize readers with the functions of data repositories, set expectations for their services, and provide an overview of methods to evaluate their capabilities. The paper serves to introduce fundamental concepts and community-based guiding principles and aims to equip researchers, repository operators, funders, and policymakers with the knowledge to select appropriate repositories for their data management and sharing needs and foster a foundation for the open sharing and preservation of research data.

https://doi.org/10.1038/s41597-024-03449-z

"Understanding the Value of Curation: A Survey of US Data Repository Curation Practices and Perceptions"


Data curators play an important role in assessing data quality and take actions that may ultimately lead to better, more valuable data products. This study explores the curation practices of data curators working within US-based data repositories. We performed a survey in January 2021 to benchmark the levels of curation performed by repositories and assess the perceived value and impact of curation on the data sharing process. Our analysis included 95 responses from 59 unique data repositories. Respondents primarily were professionals working within repositories and examined curation performed within a repository setting. A majority (72.6%) of respondents reported that "data-level" curation was performed by their repository and around half reported their repository took steps to ensure interoperability and reproducibility of their repository’s datasets. Curation actions most frequently reported include checking for duplicate files, reviewing documentation, reviewing metadata, minting persistent identifiers, and checking for corrupt/broken files. The most "value-add" curation action across generalist, institutional, and disciplinary repository respondents was related to reviewing and enhancing documentation. Respondents reported high perceived impact of curation by their repositories on specific data sharing outcomes including usability, findability, understandability, and accessibility of deposited datasets; respondents associated with disciplinary repositories tended to perceive higher impact on most outcomes. Most survey participants strongly agreed that data curation by the repository adds value to the data sharing process and that it outweighs the effort and cost. We found some differences between institutional and disciplinary repositories, both in the reported frequency of specific curation actions as well as the perceived impact of data curation.
Interestingly, we also found variation in the perceptions of those working within the same repository regarding the level and frequency of curation actions performed, which exemplifies the complexity of repository curation work. Our results suggest data curation may be better understood in terms of specific curation actions and outcomes than broadly defined curation levels and that more research is needed to understand the resource implications of performing these activities. We share these results to provide a more nuanced view of curation, and how curation impacts the broader data lifecycle and data sharing behaviors.

https://doi.org/10.1371/journal.pone.0301171

"The Puzzle of Large-Scale Digital Collections: Have We Reached an Inflection Point?"


Shared Collections allows institutions either to have JSTOR harvest their digital collections of documents, photos, and other special collections from a local Digital Asset Management System, or to create and share those same collections through JSTOR’s collection management tool. . . . While Shared Collections appears to represent a significant advance, the jury will be out for some time. The fundamental issues facing DPLA and Shared Collections are simply difficult, and the struggles with them have little or nothing to do with the skills or intentions of the capable people of both organizations. It is both a tough economic problem and an outcome of what we might call "rugged individualism in heritage collections": while shared descriptive efforts have been in place for books for more than a century, many standards for heritage collections have emerged since 2000. It’s a symptom of under-investment in cultural heritage in the United States.

https://rb.gy/597nkq

"A Census of Institutional Repositories at Regional Public Universities"


This study reports on the implementation of institutional repositories (IRs) at regional public universities (RPUs) in the United States and its territories. The author investigated repository platform choice, operation style, and content. More than half of RPUs have implemented an IR. The author discusses how these findings align with trends in previous research and explores the unique aspects of IRs at RPUs—particularly the prevalence of student works and special collections materials. For over two decades, institutional repositories (IRs) have been used at institutions of higher education to collect, preserve, and share the scholarly works of an institution. During that same time there have been an increasing number of studies looking at who has implemented an IR, the most popular IR platforms, and type and number of objects deposited in IRs. While some studies have looked at small or teaching-focused institutions, most of these studies have focused on IR implementations at large research-focused institutions.

https://tinyurl.com/yc2fs4r2

"Developing Open Access Resource Management Principles in a Consortial Environment: A University of California Model"


In the summer of 2021, the University of California (UC) migrated to a new integrated library system, called the Systemwide Integrated Library System project (SILS), which for the first time brought all ten UC campuses, two regional storage facilities, and the California Digital Library (CDL) together into one shared library system. With new potential for increased collaboration and cooperation, SILS leadership groups identified consortial open access (OA) resource management as a key opportunity in the new system, in alignment with UC’s priorities around discovery and access to library collections, as well as UC’s commitment to open access and transforming the scholarly communication landscape. This article discusses the formation of the UC Open Access Resource Management Task Force (OARMTF), a group charged to investigate what it would mean to consortially manage OA resources. Specifically, this article focuses on the OARMTF’s work setting out principles for OA resource management, which the authors hope may serve as a useful case study for other institutions or consortia interested in developing principles around OA resource management, as well as encourage more discussion and research into best practices for consortial management of OA resources.

https://doi.org/10.5860/lrts.68n1.8216

"Opening Up: A Global Context for Local Open Access Initiatives in Higher Education"


Open access policies and mandates can be a useful tool in persuading faculty at higher education institutions around the globe to produce and share open scholarship. But are such policies widely written, accepted, and adopted? Leveraging information found on the Registry of Open Access Repositories Mandatory Archiving Policies, this paper analyzes open access policies at higher education institutions worldwide. The data indicate that Europe holds the most policies, while fewer policies have been enacted in the Americas, Africa, Oceania, and Asia due to a myriad of barriers. Overall, better strategies to promote open access are needed, and such strategies may not necessarily take the form of an open access policy. My own investigation of global open access policies has informed my practices with respect to open access. In this paper, I demonstrate how librarians acting as policy entrepreneurs can assist with the promotion of open access at their institutions and then conclude with suggestions, solutions, and pathways beyond policy adoption to promote and advocate for open access.

https://tinyurl.com/2h3uz5n4

"Preprints, Journals and Openness: Disentangling Goals and Incentives"


I would argue that private funders such as the Gates Foundation or the Howard Hughes Medical Institute (HHMI) could provide material support through grants and policies for quality peer review, baking peer review into selection of grantees. Such an approach will require careful structures and mechanisms for reviewer selection, and measures of success, or we may run the risk of creating further inequities. Mind you, in many fields it is just hard to find good reviewers prepared to put in the effort required for a considered, thoughtful review. Societies, such as my own, could also consider material ways to support peer review more actively — a philosophical and practical approach to raising the profile of peer review at an early stage in the life of a researcher.

https://tinyurl.com/ymckyb9x

1 Million Images and Counting: "AI-Startup Launches Ever-Expanding Library of Free Stock Photos and Music"


StockCake is a new platform by AI startup Imaginary Machines. The site currently hosts more than a million pre-generated images. These images can be downloaded, used, and shared for free. There are no strings attached as all photos are in the public domain.

https://tinyurl.com/mvjd3683

2024 Fedora Technology Assessment Report


The Fedora Program Team, in collaboration with the Technology Working Group, designed a project to understand the specific Fedora-related priorities of using institutions, along with the capacity and available resources of both individuals and institutions to contribute to the Fedora community between 2024 and 2026. They collaborated with the Research and Innovation Division at Lyrasis to survey Fedora users. Responses were collected between November 2023 and January 31, 2024, and analyzed by Leigh A. Grinstead, Senior Digital Services Consultant from Lyrasis, an independent, nonprofit, research group.

https://tinyurl.com/2s4b4rec

"Support for OSF Preprint Infrastructure and Community Servers"


Numerous Ivy Plus Libraries Confederation (IPLC) partner institutions will provide three years of financial support for the Center for Open Science’s OSF Preprints, an open source platform and infrastructure that enables the facilitation and discovery of scholarship. COS notes that submission and consumption of preprints continues to grow with "~150,000 preprints hosted across all of the current and prior preprint communities, and 1.7 million views on preprint pages since September 2023."

https://tinyurl.com/yn9nntvu
