“Linking Data Citation to Repository Visibility: An Empirical Study”


In today’s data-driven research landscape, dataset visibility and accessibility play a crucial role in advancing scientific knowledge. At the same time, data citation is essential for maintaining academic integrity, acknowledging contributions, validating research outcomes, and fostering scientific reproducibility. As a critical link, it connects scholarly publications with the datasets that drive scientific progress. This study investigates whether repository visibility influences data citation rates. We hypothesize that repositories with higher visibility, as measured by search engine metrics, are associated with increased dataset citations. Using OpenAlex data and repository impact indicators (including the visibility index from Sistrix, the h-index of repositories, and citation metrics such as mean and median citations), we analyze datasets in Social Sciences and Economics to explore their relationship. Our findings suggest that datasets hosted on more visible web domains tend to receive more citations, with a positive correlation observed between web domain visibility and dataset citation counts, particularly for datasets with at least one citation. However, when analyzing domain-level citation metrics, such as the h-index, mean, and median citations, the correlations are inconsistent and weaker. While higher visibility domains tend to host datasets with greater citation impact, the distribution of citations across datasets varies significantly. These results suggest that while visibility plays a role in increasing citation counts, it is not the sole factor influencing dataset citation impact. Other elements, such as dataset quality, research trends, and disciplinary norms, also contribute significantly to citation patterns.

https://arxiv.org/abs/2506.09530

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

“Preservation and Digital Repositories: Connections, Possibilities, and Needs”


This chapter aims to explore certain aspects of the challenges of digital preservation and digital repositories, including their roles, significance, and associated costs. . . . Beginning with a necessary delineation of the relationship between digital preservation, digital repositories, and their digital assets, the chapter proceeds to conduct a brief analysis of the perceived needs for these components. These needs primarily encompass organizational aspects (policy, planning, actions), financial considerations (costs), and technological factors (standardization) crucial for supporting digital preservation and repositories.

https://tinyurl.com/5y5bfbdr


“Web-Scraping AI Bots Cause Disruption for Scientific Databases and Journals”


This year, the BMJ, a publisher of medical journals based in London, has seen bot traffic to its websites surpass that of real users. The aggressive behaviour of these bots overloaded the publisher’s servers and led to interruptions in services for legitimate customers, says Ian Mulvany, BMJ’s chief technology officer. . . .

The Confederation of Open Access Repositories (COAR) reported in April that more than 90% of 66 members it surveyed had experienced AI bots scraping content from their sites — of which roughly two-thirds had experienced service disruptions as a result.

https://tinyurl.com/wva9sx6p


“OpenAlex now available on DSpace”


In partnership with the University of Cambridge, 4Science has enhanced interoperability in DSpace for a richer repository and better data quality: OpenAlex now available on DSpace! . . .

OpenAlex is an innovative platform that offers free access to millions of scientific publications, researchers, institutions, and academic sources. The integration simplifies and speeds up the entry of publications and other entities. This ensures a repository that is always up-to-date and comprehensive. By using the existing live import features, users will be able to import items complete with metadata both from MyDSpace and through the suggestions dedicated to those accessing the platform and administrators.

The import from external sources, accessible through MyDSpace, allows for direct searching in the OpenAlex database and integrates the selected results into DSpace, automatically pre-filling the submission form with the imported metadata. The entities supported by the integration are Publications, Journals, People, and Organizational Units, providing advanced and more comprehensive content management.

https://librarytechnology.org/pr/31378


Paywall: “Organizational Structure and Representation in Digital Institutional Repository Collections”


This study finds that most digital repositories favor a flat organizational structure, largely due to technological constraints and user interface design choices. This approach often neglects the original hierarchical structure of archival collections, leading to user frustration and difficulties in information retrieval.

https://doi.org/10.1108/DLP-06-2024-0092


“Making the Connection: An Examination of Institutional Repositories and Scholarly Communication Crosslinking Practices”


Institutional repositories (IRs) remain a powerful tool for opening, sharing, and preserving scholarship. Scholarly communication (SC) services and resources are essential to promoting and supporting IRs. Linking SC services within an IR offers support to users at their point of need. This study investigates the prevalence of web linking between IR and SC services in 145 Association of Research Libraries and Carnegie R1 libraries. This quantitative analysis identifies gaps and offers practical recommendations for developing connections between SC and IR websites at academic libraries. . . .

[T]he authors expected a comparable number of SC pages at institutions that had IRs. However, over 30 percent of the study’s library websites did not feature a dedicated SC web page. Furthermore, it is noteworthy that between spring 2021 and spring 2022 there was a 10 percent decrease in the number of institutions that offer SC services information to their user community. . . .

It is reassuring that the number of IRs remained consistent. Another bright spot is the nearly 14 percent increase in links made from the IR to SC services between spring 2021 and spring 2022. . . .

The few IRs in the study that did crosslink back to SC pages (9.1% in spring 2021; 23.0% in spring 2022) often included the SC link directly on the repository’s homepage.

https://tinyurl.com/mrxdj59j


“Open Repositories Are Being Profoundly Impacted By AI Bots and Other Crawlers: Results of a COAR Survey”


The results of the survey found that over 90% of respondents are encountering AI bots, usually more than once a week, and often leading to service disruptions. Respondents also reported using a variety of measures to minimize or stop AI bots from accessing the repository, applying a mix of approaches such as rate-limiting, firewall rules, robots.txt rules and shared white-lists.

https://tinyurl.com/vh38m8a7
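
As a rough illustration of the robots.txt rules mentioned in the survey summary above, the following Python sketch checks how a rule set would apply to different crawlers using the standard library's `urllib.robotparser`. The bot name `ExampleAIBot`, the paths, and the `repo.example.org` host are hypothetical and are not taken from the COAR survey. Note that robots.txt is advisory: it only affects crawlers that choose to honor it, which is one reason respondents combine it with rate-limiting and firewall rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a repository. The bot name and paths are
# illustrative assumptions, not rules reported by survey respondents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /bitstream/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

record_page = "https://repo.example.org/handle/123/456"
full_text = "https://repo.example.org/bitstream/123/456/file.pdf"

# The named bot is disallowed everywhere.
ai_bot_may_fetch = parser.can_fetch("ExampleAIBot", record_page)
# Other crawlers may read record pages but not full-text files.
other_may_fetch_record = parser.can_fetch("SomeOtherBot", record_page)
other_may_fetch_file = parser.can_fetch("SomeOtherBot", full_text)
# Requested delay (in seconds) between requests for other crawlers.
other_delay = parser.crawl_delay("SomeOtherBot")

print(ai_bot_may_fetch, other_may_fetch_record, other_may_fetch_file, other_delay)
```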


“Making Your Repository (More) Accessible”


Introduction: As colleges and universities make increasing and overdue efforts under the auspices of access, equity, and inclusion to make their resources accessible to all users, these efforts must extend to the institution’s online presence, including its institutional repository. IR managers must first ask what “accessible” means for compliance with university policies as well as the Americans with Disabilities Act (ADA), immediately followed by plans for both remediating existing content and imposing best practices on new content, amid current workflows and budgetary restraints.

Literature Review: Literature on the topic of accessibility in IRs has mostly focused on the need to make collections accessible and the challenges for doing so. Advice on how to navigate the actual process is harder to come by.

Description of Service: The University of Mississippi established a goal that everything going into its IR would use OCR software to convert images of text into searchable text and create a process by which patrons could request remediation of older content from the IR, whether documents or recordings. A combination of shared tools (including Equidox and SensusAccess) and interdepartmental partnerships has made a significant difference in making these digital collections proactively accessible.

Next Steps: We continue to maintain partnerships with units around campus, made challenging by frequent turnover as in-demand specialists take positions at other institutions. Despite our efforts to provide searchable text as a minimum level of service, OCR correction provides tags but not necessarily headings or alt-text. Hopefully future versions of OCR editors will include such features.

https://doi.org/10.31274/jlsc.18308


“DeepGreen—A Data Hub for the Distribution of Scholarly Articles From Publishers to Open Access Repositories in Germany”


  • DeepGreen is an automated delivery service for open access articles. Originally conceived to take advantage of the so-called open access component—a secondary publication right in Alliance and National licences in Germany to promote green open access—it aims to streamline open access processes by automating the distribution of full-text articles and metadata from publishers to repositories.
  • The service, developed by a consortium and funded by the German Research Foundation (DFG) in its initial phase, has successfully established itself as a national service, facilitating open access content distribution and contributing to Germany’s open access infrastructure.
  • As of December 2024, DeepGreen distributes articles from 14 publishers to 84 institutional repositories and 6 subject-specific repositories.
  • This article describes the role of the DeepGreen service in Germany, its collaboration with publishers and the potential of automated processes for storing articles in open access repositories, which, as publicly owned institutional infrastructures, ensure sustainable access and provide secure, redundant storage.

https://doi.org/10.1002/leap.70000


“Moving Open Repositories out of the Blind Spot of Initiatives to Correct the Scholarly Record”


Open repositories were created to enhance access and visibility of scholarly publications, driven by open science ideals emphasising transparency and accessibility. However, they lack mechanisms to update the status of corrected or retracted publications, posing a threat to the integrity of the scholarly record. To explore the scope of the problem, a manually verified corpus was examined: we extracted all the entries in the Crossref × Retraction Watch database for which the publication date of the corrected or retracted document ranged from 2013 to 2023. This corresponded to 24,430 entries with a DOI, which we use to query Unpaywall and identify their possible indexing in HAL, an open repository (second largest institutional repository worldwide). In most cases (91%), HAL does not mention corrections. While the study needs broader scope, it highlights the necessity of improving the role of open repositories in correction processes with better curation practices. We discuss how harvesting operations and the interoperability of platforms can maintain the integrity of the entire scholarly record. Not only will the open repositories avoid damaging its reliability through ambiguous reporting, but on the contrary, they will also strengthen it.

https://doi.org/10.1002/leap.1655
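
The matching step described in the abstract above (taking a DOI, looking it up in Unpaywall, and checking whether a copy is indexed in HAL) might be sketched roughly as follows. This is not the authors' code. It assumes a record shaped like an Unpaywall response with an `oa_locations` list whose entries carry a `url`, and it uses a hostname heuristic (the `HAL_HOSTS` tuple) that is an assumption made for illustration. The sample record is made up, and no network request is performed.

```python
from urllib.parse import urlparse

# Heuristic (assumption): treat these hostnames, and their subdomains, as HAL.
HAL_HOSTS = ("hal.science", "hal.archives-ouvertes.fr")


def is_hal_url(url: str) -> bool:
    """Return True if the URL's hostname looks like a HAL hostname."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in HAL_HOSTS)


def hal_locations(record: dict) -> list[str]:
    """Return the URLs of a record's OA locations that appear to be on HAL."""
    locations = record.get("oa_locations") or []
    return [
        loc["url"]
        for loc in locations
        if loc.get("url") and is_hal_url(loc["url"])
    ]


# Made-up record in the general shape of an Unpaywall response.
sample = {
    "doi": "10.1234/example.doi",
    "oa_locations": [
        {"url": "https://publisher.example.com/article/1", "host_type": "publisher"},
        {"url": "https://hal.science/hal-00000000", "host_type": "repository"},
    ],
}

print(hal_locations(sample))
```

A real pipeline would then need to inspect each HAL record to see whether it mentions the correction or retraction, which this sketch does not attempt.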


"Knowledge Infrastructures are Growing Up: The Case for Institutional (Data) Repositories 10 Years After the Holdren Memo"


Institutional data repositories are uniquely positioned to support researchers in sharing scholarly outputs. As funding agencies develop and institute policies for research data access and sharing, institutional data repositories have emerged as a critical feature in ecosystems for data stewardship and sharing. We show that institutional data repositories can meet and exceed the requirements and recommendations of federal data policy, thereby maximizing the benefits of data sharing. We present results of a mixed-method study which explores the adoption and usage of institutional repositories to share data from 2017 to 2023. Data from two previous studies were combined with data collected in 2023 on the data sharing solutions of Association of Research Libraries member institutions in the United States and Canada. The analysis of the aggregated data indicates that data stewardship has increased in both institutional repositories and institutional data repositories with an increase in complementary infrastructure to support data sharing. We then conduct an “infrastructural inversion” (Bowker & Star, 1999) to ‘surface invisible work’ of making data repositories function well, and demonstrate that institutional data repositories have advantages for providing sustainable stewardship, curation, and sharing of research data. Finally, we show that institutional data repositories may produce additional benefits through established infrastructure, local interoperability, and control.

https://doi.org/10.5334/dsj-2024-046


Paywall: "Constructing Risk in Trustworthy Digital Repositories"


This article investigates the construction of risk within trustworthy digital repository audits. It contends that risk is a social construct, and social factors influence how stakeholders in digital preservation processes comprehend and react to risk.

https://doi.org/10.1108/JD-08-2023-0157


"Analyzing Research Data Repositories (RDR) from BRICS Nations: A Comprehensive Study"


As of March 2, 2024, re3data.org indexes a total of 3,192 Research Data Repositories (RDRs) worldwide, with BRICS nations contributing 195. China leads among BRICS nations, followed by India, Russia, and Brazil. . . . "House, tailor-made" software is widely used for creating RDRs, followed by Dataverse and DSpace. . . . Most repositories are disciplinary, followed by institutional ones. Most repositories specify data upload types, with "restricted" being the most common, followed by closed types. Open access is predominant in data access, followed by restricted access and embargo periods, while a small number restrict access entirely.

https://doi.org/10.1108/LM-04-2024-0040


"Developing Open Access Resource Management Principles in a Consortial Environment: A University of California Model"


In the summer of 2021, the University of California (UC) migrated to a new integrated library system, called the Systemwide Integrated Library System project (SILS), which for the first time brought all ten UC campuses, two regional storage facilities, and the California Digital Library (CDL) together into one shared library system. With new potential for increased collaboration and cooperation, SILS leadership groups identified consortial open access (OA) resource management as a key opportunity in the new system, in alignment with UC’s priorities around discovery and access to library collections, as well as UC’s commitment to open access and transforming the scholarly communication landscape. This article discusses the formation of the UC Open Access Resource Management Task Force (OARMTF), a group charged to investigate what it would mean to consortially manage OA resources. Specifically, this article focuses on the OARMTF’s work setting out principles for OA resource management, which the authors hope may serve as a useful case study for other institutions or consortia interested in developing principles around OA resource management, as well as encourage more discussion and research into best practices for consortial management of OA resources.

https://doi.org/10.5860/lrts.68n1.8216


"Opening Up: A Global Context for Local Open Access Initiatives in Higher Education"


Open access policies and mandates can be a useful tool in persuading faculty at higher education institutions around the globe to produce and share open scholarship. But are such policies widely written, accepted, and adopted? Leveraging information found on the Registry of Open Access Repositories Mandatory Archiving Policies, this paper analyzes open access policies at higher education institutions worldwide. The data indicate that Europe holds the most policies, while fewer policies have been enacted in the Americas, Africa, Oceania, and Asia due to a myriad of barriers. Overall, better strategies to promote open access are needed, and such strategies may not necessarily take the form of an open access policy. My own investigation of global open access policies has informed my practices with respect to open access. In this paper, I demonstrate how librarians acting as policy entrepreneurs can assist with the promotion of open access at their institutions and then conclude with suggestions, solutions, and pathways beyond policy adoption to promote and advocate for open access.

https://tinyurl.com/2h3uz5n4


"Preprints, Journals and Openness: Disentangling Goals and Incentives"


I would argue that private funders such as the Gates Foundation or the Howard Hughes Medical Institute (HHMI) could provide material support through grants and policies for quality peer review, baking peer review into selection of grantees. Such an approach will require careful structures and mechanisms for reviewer selection, and measures of success, or we may run the risk of creating further inequities. Mind you, in many fields it is just hard to find good reviewers prepared to put in the effort required for a considered, thoughtful review. Societies, such as my own, could also consider material ways to support peer review more actively — a philosophical and practical approach to raising the profile of peer review at an early stage in the life of a researcher.

https://tinyurl.com/ymckyb9x


"Current State and Future Directions for Open Repositories in Europe"


In January 2023, OpenAIRE, LIBER, SPARC Europe, and COAR launched a joint strategy aimed at strengthening the European repository network. As a first step, a survey of the European repository landscape was undertaken in February-March 2023. The survey found that, collectively, European repositories acquire, preserve and provide open access to tens or possibly hundreds of millions of valuable research outputs and represent critical, not-for-profit infrastructure in the European open science landscape. They are used for sharing articles that may be pay-walled in published journals, but also for providing access to a large variety of other types of research outputs including research data, theses/dissertations, conference papers, preprints, code, and so on.

However, in order to ensure the European repository network is fit for purpose and able to support the evolving needs of the research community, the survey also identified three areas in particular that could be strengthened: maintaining up-to-date, highly functioning software platforms; applying consistent and comprehensive good practices in terms of metadata, preservation, and usage statistics; and gaining appropriate visibility in the scholarly ecosystem.

Despite the challenges, the current climate offers exciting opportunities for repositories. Many funders are actively promoting the repository route for articles because of their role in supporting equitable access to content (i.e. no fees to access or deposit). The value proposition for open science is growing and repositories are increasingly recognised as the main mechanism for collecting and providing access to a wide range of other research outputs. Add to this, the nascent, but growing, interest in the publish-review-curate model in which repositories have a central function, and it seems they are well placed to expand their current role in the ecosystem.

https://doi.org/10.5281/zenodo.10255559


"Jupyter Notebooks and Institutional Repositories: A Landscape Analysis of Realities, Opportunities and Paths Forward"


Jupyter Notebooks are important outputs of modern scholarship, though the longevity of these resources within the broader scholarly record is still unclear. Communities and their creators have yet to holistically understand creation, access, sharing and preservation of computational notebooks, and such notebooks have yet to be designated a proper place among institutional repositories or other preservation environments as first class scholarly digital assets. Before this can happen, repository managers and curators need to have the appropriate tools, schemas and best practices to maximize the benefit of notebooks within their repository landscape and environments.

This paper explores the landscape of Jupyter notebooks today, and focuses on the opportunities and challenges related to bringing Jupyter Notebooks into institutional repositories. We explore the extent to which Jupyter Notebooks are currently accessioned into institutional repositories, and how metadata schemas like CodeMeta might facilitate their adoption. We also discuss characteristics of Jupyter Notebooks created by researchers at the National Center for Atmospheric Research, to provide additional insight into how to assess and accession Jupyter Notebooks and related resources into an institutional repository.

https://journal.code4lib.org/articles/17751


"Islandora for Archival Access and Discovery"


This article is a case study describing the implementation of Islandora 2 to create a public online portal for the discovery, access, and use of archives and special collections materials at the University of Nevada, Las Vegas. The authors will explain how the goal of providing users with a unified point of access across diverse data (including finding aids, digital objects, and agents) led to the selection of Islandora 2 and they will discuss the benefits and challenges of using this open source software. They will describe the various steps of implementation, including custom development, migration from CONTENTdm, integration with ArchivesSpace, and developing new skills and workflows to use Islandora most effectively. As hindsight always provides additional perspective, the case study will also offer reflection on lessons learned since the launch, insights on open-source repository sustainability, and priorities for future development.

https://journal.code4lib.org/articles/17929


"Open Access Movement in the Scholarly World: Pathways for Libraries in Developing Countries"


Open access is a scholarly publishing model that has emerged as an alternative to traditional subscription-based journal publishing. This study explores the adoption of the open access movement worldwide and the role that libraries can play in addressing those factors which are slowing its progress within developing countries. The study has drawn upon both qualitative data from a focused literature review and quantitative data from major open access platforms. The results indicate that while the open access movement is steadily gaining acceptance worldwide, the progress in developing countries within geographical areas such as Africa, Asia and Oceania is quite a bit slower. Two significant factors are the cost of publishing fees and the lack of institutional open access mandates and policies to encourage uptake. The study provides suggested strategies for academic libraries to help overcome current challenges.

https://doi.org/10.1177/01655515231202758


"US Repository Network Launches Pilot to Enhance Discoverability of Open Access Content in Repositories"


In November, the US Repository Network (USRN) will launch a pilot project aimed at improving the discoverability of articles in repositories. This pilot project involves the use of services from CORE, a not-for-profit aggregator based at Open University in the UK, to evaluate and improve local repository practices. Additional technical support will be provided by Antleaf Ltd.

As part of the project, CORE will aggregate the metadata and full text of articles from a subset of US repositories, allowing them to be findable through a centralized discovery service with prominent links back to the original full text of the repository. At the same time, the project will assess current practices related to metadata quality, the tracking of Open Access deposits, the use of PIDs, technical support for OAI-PMH, and the adoption of more recent protocols, such as FAIR Signposting. At the level of the centralized aggregation, CORE will enrich the existing US metadata with information from its larger international aggregation. A Dashboard service for participating institutions will be provided, enabling them to assess, validate and monitor their practices.

https://tinyurl.com/2utfpvj3
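
For readers unfamiliar with OAI-PMH, which the pilot described above will assess, the following minimal Python sketch shows how a harvester might build a `ListRecords` request URL. The verb and argument names (`metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH protocol. The base URL `https://repo.example.edu/oai/request` and the helper name `list_records_url` are hypothetical, and nothing here reflects how CORE actually harvests.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: when continuing a
        # partial list, it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
BASE = "https://repo.example.edu/oai/request"

first_page = list_records_url(BASE)
next_page = list_records_url(BASE, resumption_token="abc/123")

print(first_page)
print(next_page)
```

A harvester would typically request the first page, read the `resumptionToken` element from the XML response if one is present, and repeat until the repository returns no further token.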


"FAIR EVA: Bringing Institutional Multidisciplinary Repositories into the FAIR Picture"


The FAIR Principles are a set of good practices to improve the reproducibility and quality of data in an Open Science context. Different sets of indicators have been proposed to evaluate the FAIRness of digital objects, including datasets that are usually stored in repositories or data portals. However, indicators like those proposed by the Research Data Alliance are provided from a high-level perspective that can be interpreted and they are not always realistic to particular environments like multidisciplinary repositories. This paper describes FAIR EVA, a new tool developed within the European Open Science Cloud context that is oriented to particular data management systems like open repositories, which can be customized to a specific case in a scalable and automatic environment. It aims to be adaptive enough to work for different environments, repository software and disciplines, taking into account the flexibility of the FAIR Principles. As an example, we present DIGITAL.CSIC repository as the first target of the tool, gathering the particular needs of a multidisciplinary institution as well as its institutional repository.

https://doi.org/10.1038/s41597-023-02652-8


"Where Is All the Research Software? An Analysis of Software in UK Academic Repositories"


This research examines the prevalence of research software as independent records of output within UK academic institutional repositories (IRs). There has been a steep decline in numbers of research software submissions to the UK’s Research Excellence Framework from 2008 to 2021, but there has been no investigation into whether and how the official academic IRs have affected the low return rates. In what we believe to be the first such census of its kind, we queried the 182 online repositories of 157 UK universities. Our findings show that the prevalence of software within UK Academic IRs is incredibly low. Fewer than 28% contain software as recognised academic output. Of greater concern, we found that over 63% of repositories do not currently record software as a type of research output and that several Universities appeared to have removed software as a defined type from default settings of their repository. We also explored potential correlations, such as being a member of the Russell group, but found no correlation between these metadata and prevalence of records of software. Finally, we discuss the implications of these findings with regards to the lack of recognition of software as a discrete research output in institutions, despite the opposite being mandated by funders, and we make recommendations for changes in policies and operating procedures.

https://doi.org/10.7717/peerj-cs.1546
