"Strategies for Digital Library Migration"


A migration of the datastore and data model for Stanford Digital Repository’s digital object metadata was recently completed. This paper describes the motivations for this work and some of the strategies used to accomplish the migration. Strategies include: adopting a validatable data model, abstracting the datastore behind an API, separating concerns, testing metadata mappings against real digital objects, using reports to understand the data, templating unit tests, performing a rolling migration, and incorporating the migration into ongoing project work. These strategies may be useful to other repository or digital library application migrations.

https://journal.code4lib.org/articles/17290

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "We Need a Plan D"


Researchers, institutions and funders should collaborate to develop an overarching strategy for data preservation — a plan D. There will doubtless be calls for a ‘PubMed Central for data’. But what we really need is a federated system of repositories with functionality tailored to the information that they archive. This will require domain experts to agree standards for different types of data from different fields: what should be archived and when, which format, where, and for how long.

https://doi.org/10.1038/s41592-023-01817-y


"The Smithsonian Puts 4.5 Million High-Res Images Online and Into the Public Domain, Making Them Free to Use"


"Anyone can download, reuse, and remix these images at any time — for free under the Creative Commons Zero (CC0) license," write My Modern Met’s Jessica Stewart and Madeleine Muzdakis. "A dive into the 3D records shows everything from CAD models of the Apollo 11 command module to Horatio Greenough’s 1840 sculpture of George Washington."

http://bit.ly/3KBhZsV


"Know(ing) Infrastructure: The Wayback Machine as Object and Instrument of Digital Research"


From documenting human rights abuses to studying online advertising, web archives are increasingly positioned as critical resources for a broad range of scholarly Internet research agendas. In this article, we reflect on the motivations and methodological challenges of investigating the world’s largest web archive, the Internet Archive’s Wayback Machine (IAWM). Using a mixed methods approach, we report on a pilot project centred around documenting the inner workings of ‘Save Page Now’ (SPN) — an Internet Archive tool that allows users to initiate the creation and storage of ‘snapshots’ of web resources. By improving our understanding of SPN and its role in shaping the IAWM, this work examines how the public tool is being used to ‘save the Web’ and highlights the challenges of operationalising a study of the dynamic sociotechnical processes supporting this knowledge infrastructure. Inspired by existing Science and Technology Studies (STS) approaches, the paper charts our development of methodological interventions to support an interdisciplinary investigation of SPN, including: ethnographic methods, ‘experimental blackbox tactics’, data tracing, modelling and documentary research. We discuss the opportunities and limitations of our methodology when interfacing with issues associated with temporality, scale and visibility, as well as critically engage with our own positionality in the research process (in terms of expertise and access). We conclude with reflections on the implications of digital STS approaches for ‘knowing infrastructure’, where the use of these infrastructures is unavoidably intertwined with our ability to study the situated and material arrangements of their creation.

https://doi.org/10.1177/13548565231164759


"Interoperable Infrastructure for Software and Data Publishing"


Achieving scalable, high-quality, interoperable data and software publishing is possible. There are already builders, some represented by the authorship of this article, that are on the right path, building tools that effectively meet the needs of researchers in an open and pluggable way. One example is InvenioRDM, a flexible and turn-key next-generation research data management repository built by CERN and more than 25 multi-disciplinary partners world-wide; InvenioRDM leverages community standards and supports FAIR practices out of the box. Another example of agnostic, pluggable tooling, in this case for software submission, are the submission workflow tools currently developed in the HERMES project. These allow researchers to automate the publication of software artifacts together with rich metadata, to create software publications following the FAIR Principles for Research Software.

http://bit.ly/42Lc5Oe


Study on the Readiness of Research Data and Literature Repositories to Facilitate Compliance With the Open Science Horizon Europe MGA Requirement

In this study we analysed 220 repositories and, via a structured methodology, we identified 165 trusted repositories and tested their readiness to facilitate the compliance with the HE MGA Open Science requirements.

We show that it is not straightforward to assess whether a given repository is suitable to facilitate compliance with the HE MGA requirements. This is mainly due to varying interpretations of definitions and requirements, whether information on repository specifications is publicly available, and the high level of technical expertise needed to assess all requirements.

We highlight that repository registries, such as FAIRsharing, re3data or the CoreTrustSeal (CTS) website, are not sufficient on their own to assess the readiness of repositories to facilitate compliance with the HE MGA requirements, as the definition of what constitutes a trusted repository is subtle and varied and needs to be carefully interpreted and applied to repositories. This is also the case for related concepts such as community endorsement or for policy requirements in terms of preservation, curation and security of the repository contents.

https://doi.org/10.5281/zenodo.7728016


"At Hearing, Judge Appears Skeptical of Internet Archive’s Scanning and Lending Program"


Over the course of a 90-minute hearing on the parties’ cross motions for summary judgment, Koeltl appeared skeptical that there was sufficient basis in law to support the Internet Archive’s scanning and lending of print library books under a legally untested protocol known as controlled digital lending, and unconvinced that the case is fundamentally about the future of library lending, as Internet Archive attorneys have argued.

http://bit.ly/3FFjVyS


"Book Publishers with Surging Profits Struggle to Prove Internet Archive Hurt Sales"


Today, the Internet Archive (IA) defended its practice of digitizing books and lending those e-books for free to users of its Open Library. In 2020, four of the wealthiest book publishers sued IA, alleging this kind of digital lending was actually "willful digital piracy" causing them "massive harm." But IA’s lawyer, Joseph Gratz, argued that the Open Library’s digitization of physical books is fair use, and publishers have yet to show they’ve been harmed by IA’s digital lending.

bit.ly/3JTMDP2


Paywall: "Trustworthy Digital Repository Certification: A Longitudinal Study"


To understand the impact of certification on repositories’ infrastructure, processes, and services, we analyzed a sample of publicly available TDR audit reports (n = 175) from the Data Seal of Approval (DSA) and Core Trust Seal (CTS) certification programs. This first longitudinal study of TDR certification over a ten-year period (from 2010 to 2020) found that many repositories either maintain a relatively high standard of trustworthiness in terms of their compliance with guidelines in DSA or CTS standards or improve their trustworthiness by raising their compliance levels with these guidelines each time they get recertified.

https://doi.org/10.1007/978-3-031-28032-0_42


Better World Books and the Internet Archive: "Saving 4 Million Books From Landfill"


The service that BWB provides is an important one for libraries. BWB collects used books from libraries, booksellers, colleges, and universities in six countries, which are then either resold online, donated or recycled. To date, Better World Books has donated over 35 million books worldwide, has raised close to $34 million for libraries and literacy, and has saved more than 450 million books from landfills. Through the partnership with the Internet Archive, BWB has donated more than one million books each year for preservation and digitization, totaling 4 million books to date.

https://cutt.ly/X8CaoCv


"Lack of Sustainability Plans for Preprint Services Risks Their Potential to Improve Science"


Despite successfully building a revenue model that shares the burden between Cornell University, the Simons Foundation and several members and supporters, arXiv’s “funding is still outpaced by [their] growth” – the server hosts over 2 million preprints already and is growing by 10% each year. And while arXiv has been supporting more and more scholars to share and discover preprints, the team behind it has been through significant changes in leadership and is dealing with the urgent need to modernize their 30-year-old technology. As a former Executive Director of arXiv noted, “[arXiv’s success] may not last forever”. Similarly, the recent news that Chan Zuckerberg Initiative has renewed its financial support for the leading preprint servers in biology and medicine, bioRxiv and medRxiv is welcome relief, but this support is temporary, and the team must find a way to continue in the long run.

bit.ly/3y745Ji


2.6 Billion Total Downloads: arXiv Annual Report 2022


Our critical priorities during 2022 were to secure additional funding, hire technical and program directors, and ramp up our efforts to modernize arXiv’s software by moving it to the cloud, which will provide better stability, scalability and maintainability. I’m pleased to report that we were able to make significant progress on all of these fronts. arXiv brought in more funding than expected in the form of grants, memberships, and donations, and we hired Stephanie Orphan as program director and Charles Frankston as technical director. Both bring strong and complementary expertise to the team. Moving the technical operations of arXiv—a service with a 30 year history—off of Cornell’s on-premises servers is a major, complicated task. The move to the cloud is currently in progress and on track.

bit.ly/41exRsX

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"University of Oregon and Oregon State University Collaborate to Launch Oregon Digital"


The University of Oregon and Oregon State University are proud to announce the launch of Oregon Digital, a cultural heritage repository that brings together more than 500,000 digitized works from both universities, including unique digitized and born-digital collections. This collaborative effort includes historic and modern photographs, manuscripts, publications, and more.

https://library.uoregon.edu/node/7904


"Data Archive for the BRAIN Initiative (DABI)"


Data sharing is becoming ubiquitous and can be advantageous for most biomedical research. However, some data are inherently more amenable to sharing than others. For example, human intracranial neurophysiology recordings and associated multimodal data have unique features that warrant special considerations. The associated data are heterogeneous, difficult to compare, highly specific, and collected from small cohorts with treatment resistant conditions, posing additional complications when attempting to perform generalizable analyses across projects. We present the Data Archive for the BRAIN Initiative (DABI) and describe features of the platform that are designed to overcome these and other challenges. DABI is a data repository and portal for BRAIN Initiative projects that collect human and animal intracranial recordings, and it allows users to search, visualize, and analyze multimodal data from these projects. The data providers maintain full control of data sharing privileges and can organize and manage their data with a user-friendly and intuitive interface. We discuss data privacy and security concerns, example analyses from two DABI datasets, and future goals for DABI.

https://doi.org/10.1038/s41597-023-01972-z


Data Preservation in High Energy Physics: DPHEP Global Report 2022


This document summarizes the status of data preservation in high energy physics. The paradigms and the methodological advances are discussed from a perspective of more than ten years of experience with a structured effort at international level. The status and the scientific return related to the preservation of data accumulated at large collider experiments are presented, together with an account of ongoing efforts to ensure long-term analysis capabilities for ongoing and future experiments. Transverse projects aimed at generic solutions, most of which are specifically inspired by open science and FAIR principles, are presented as well. A prospective and an action plan are also indicated.

https://arxiv.org/abs/2302.03583


Forthcoming: Discoverability in Digital Repositories: Systems, Perspectives, and User Studies


It examines discoverability in digital repositories from both user and system perspectives by exploring how users access content (including their search patterns and habits, need for digital content, effects of outreach, or integration with Wikipedia and other web-based tools) and how systems support or prevent discoverability through the structure or quality of metadata, system interfaces, exposure to search engines or lack thereof, and integration with library discovery tools.

bit.ly/3XbbRvT


"Outside the Library: Early Career Researchers and Use of Alternative Information Sources in Pandemic Times"


Presents findings from a study into the attitudes and practices of pandemic-era early career researchers (ECRs) in regard to obtaining access to the formally published scholarly literature, which focused on alternative providers, notably ResearchGate and Sci-Hub. . . . Findings show that alternative providers, as represented by ResearchGate and Sci-Hub, have become established and appear to be gaining ground. However, there are considerable country- and discipline-associated differences.

https://doi.org/10.1002/leap.1522


"arXiv Announces New Policy on ChatGPT and Similar Tools"

In view of this, we

  1. continue to require authors to report in their work any significant use of sophisticated tools, such as instruments and software; we now include in particular text-to-text generative AI among those that should be reported consistent with subject standards for methodology.
  2. remind all colleagues that by signing their name as an author of a paper, they each individually take full responsibility for all its contents, irrespective of how the contents were generated. If generative AI language tools generate inappropriate language, plagiarized content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).
  3. generative AI language tools should not be listed as an author; instead authors should refer to (1).

bit.ly/3wKlx5J


"A Framework for Improving the Accessibility of Research Papers on arXiv.org"


The research content hosted by arXiv is not fully accessible to everyone due to disabilities and other barriers. This matters because a significant proportion of people have reading and visual disabilities, it is important to our community that arXiv is as open as possible, and if science is to advance, we need wide and diverse participation. In addition, we have mandates to become accessible, and accessible content benefits everyone. In this paper, we will describe the accessibility problems with research, review current mitigations (and explain why they aren’t sufficient), and share the results of our user research with scientists and accessibility experts. Finally, we will present arXiv’s proposed next step towards more open science: offering HTML alongside existing PDF and TeX formats. An accessible HTML version of this paper is also available at https://info.arxiv.org/about/accessibility_research_report.html

https://arxiv.org/abs/2212.07286


"Phase 1 of the NIH Preprint Pilot: Testing the Viability of Making Preprints Discoverable in PubMed Central and PubMed"


Introduction: The National Library of Medicine (NLM) launched a pilot in June 2020 to 1) explore the feasibility and utility of adding preprints to PubMed Central (PMC) and making them discoverable in PubMed and 2) to support accelerated discoverability of NIH-supported research without compromising user trust in NLM’s widely used literature services. Methods: The first phase of the Pilot focused on archiving preprints reporting NIH-supported SARS-CoV-2 virus and COVID-19 research. To launch Phase 1, NLM identified eligible preprint servers and developed processes for identifying NIH-supported preprints within scope in these servers. Processes were also developed for the ingest and conversion of preprints in PMC and to send corresponding records to PubMed. User interfaces were modified for display of preprint records. NLM collected data on the preprints ingested and discovery of preprint records in PMC and PubMed and engaged users through focus groups and a survey to obtain direct feedback on the Pilot and perceptions of preprints. Results: Between June 2020 and June 2022, NLM added more than 3,300 preprint records to PMC and PubMed, which were viewed 4 million times and 3 million times, respectively. Nearly a quarter of preprints in the Pilot were not associated with a peer-reviewed published journal article. User feedback revealed that the inclusion of preprints did not have a notable impact on trust in PMC or PubMed. Discussion: NIH-supported preprints can be identified and added to PMC and PubMed without disrupting existing operations processes. Additionally, inclusion of preprints in PMC and PubMed accelerates discovery of NIH research without reducing trust in NLM literature services. Phase 1 of the Pilot provided a useful testbed for studying NIH investigator preprint posting practices, as well as knowledge gaps among user groups, during the COVID-19 public health emergency, an unusual time with heightened interest in immediate access to research results.

https://doi.org/10.1101/2022.12.12.520156


"Ten Recommended Practices for Managing Preprints in Generalist and Institutional Repositories"


Currently, there are numerous gaps in geographic and domain coverage and some authors will choose to deposit their research outputs into another type of repository, such as an institutional or generalist repository. . . . To address these gaps, a COAR-ASAPbio Working Group on Preprint in Repositories identified ten recommended practices for managing preprints across three areas: linking, discovery, and editorial processes. While we acknowledge that many of these practices are not currently in use by institutional and generalist repositories, we hope that these recommendations will encourage repositories around the world that collect preprints to begin to apply them locally.

https://cutt.ly/R0gursT



"Data Quality Assurance at Research Data Repositories"


This paper presents findings from a survey on the status quo of data quality assurance practices at research data repositories.

The personalised online survey was conducted among repositories indexed in re3data in 2021. It covered the scope of the repository, types of data quality assessment, quality criteria, responsibilities, details of the review process, and data quality information and yielded 332 complete responses.

The results demonstrate that most repositories perform data quality assurance measures, and overall, research data repositories significantly contribute to data quality. Quality assurance at research data repositories is multifaceted and nonlinear, and although there are some common patterns, individual approaches to ensuring data quality are diverse. The survey showed that data quality assurance sets high expectations for repositories and requires a lot of resources. Several challenges were discovered: for example, the adequate recognition of the contribution of data reviewers and repositories, the path dependence of data review on review processes for text publications, and the lack of data quality information. The study could not confirm that the certification status of a repository is a clear indicator of whether a repository conducts in-depth quality assurance.

http://doi.org/10.5334/dsj-2022-018


Paywall: "A Comprehensive Review of Open Data Platforms, Prevalent Technologies, and Functionalities"


We will discuss seven major open data platforms, such as (1) CKAN (2) DKAN (3) Socrata (4) OpenDataSoft (5) GitHub (6) Google datasets (7) Kaggle. We will evaluate the technological commons, techniques, features, methods, and visualization offered by each tool. In addition, why are these platforms important to users such as providers, curators, and end-users? And what are the key options available on these platforms to publish open data?

https://doi.org/10.1145/3560107.3560142


Paywall: "Expanding Your Institutional Repository: Librarians Working with Faculty"


Since a successful institutional repository will contain a higher percentage of the contributors’ materials, we implemented a system to upload faculty publications more effectively to our academic library’s institutional repository. . . . The success of this method is indicated by the increase in articles that have been uploaded to our institutional repository; as a result of the implementation of this program, the number of publications in our university’s institutional repository by these authors has increased 174%.

https://doi.org/10.1016/j.acalib.2022.102628
