"Scaling Up Digital Preservation Workflows With Homegrown Tools and Automation"


At NC State University Libraries, the Special Collections Research Center leverages an integrated system of locally developed applications and open-source technologies to facilitate the long-term preservation of digitized and born-digital archival assets. These applications automate many previously manual tasks, such as creating access derivatives from preservation scans and ingesting assets into preservation storage. They have allowed us to scale up the number of digitized assets we create and publish online; born-digital assets we acquire from storage media, appraise, and package; and total assets in local and distributed preservation storage. The origin of these applications lies in scripted workflows put into use more than a decade ago, and the applications were built in close collaboration with developers in the Digital Library Initiatives department between 2011 and 2023. This paper presents a strategy for managing digital curation and preservation workflows that does not solely depend on standalone and third-party applications. It describes our iterative approach to deploying these tools, the functionalities of each application, and sustainability considerations of managing in-house applications and using Academic Preservation Trust for offsite preservation.

https://tinyurl.com/4mjpzth2

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"An Analysis of the Effects of Sharing Research Data, Code, and Preprints on Citations"


In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze approximately 122,000 publications. We calculate publication- and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% (±0.7) on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% (±0.8) on average. However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

https://doi.org/10.1371/journal.pone.0311493

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Software Management Plans – Current Concepts, Tools, and Application"


The present article is a review of the state of the art about software management plans (SMPs). It provides a selection of questionnaires, tools, and application cases for SMPs from a European (German) point of view, and discusses the possible connections of SMPs to other aspects of software sustainability, such as metadata, FAIR4RS principles, or machine-actionable SMPs. The aim of our publication is to provide basic knowledge for diving into the subject and a handout for infrastructure providers who are about to establish or develop an SMP service at their own institution.

https://doi.org/10.5334/dsj-2024-043

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Creating a Fully Open Environment for Research Code and Data"


Quantitative research in the social and natural sciences is increasingly dependent on new datasets and forms of code. Making these resources open and accessible is a key aspect of open research and underpins efforts to maintain research integrity. Erika Pastrana explains how Springer Nature developed Nature Computational Science to be fully compliant with open research and data principles.

https://tinyurl.com/7uwdxrrz

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Ten Simple Rules for Recognizing Data and Software Contributions in Hiring, Promotion, and Tenure"


The ways in which promotion and tenure committees operate vary significantly across universities and departments. While committees often have the capability to evaluate the rigor and quality of articles and monographs in their scientific field, assessment with respect to practices concerning research data and software is a recent development and one that can be harder to implement, as there are few guidelines to facilitate the process. More specifically, the guidelines given to tenure and promotion committees often reference data and software in general terms, with some notable exceptions, such as the guidelines in [5], and are almost systematically trumped by other factors such as the number and perceived impact of journal publications. The core issue is that many colleges establish a scholarship versus service dichotomy: Peer-reviewed articles or monographs published by university presses are considered scholarship, while community service, teaching, and other categories are given less weight in the evaluation process. This dichotomy unfairly disadvantages digital scholarship and community-based scholarship, including data and software contributions [6]. In addition, there is a lack of resources for faculties to facilitate the inclusion of responsible data and software metrics into evaluation processes or to assess faculty’s expertise and competencies to create, manage, and use data and software as research objects. As a result, the outcome of the assessment by the tenure and promotion committee is as dependent on the guidelines provided as on the committee members’ background and proficiency in the data and software domains.

The presented guidelines aim to help alleviate these issues and align the academic evaluation processes to the principles of open science. We focus here on hiring, tenure, and promotion processes, but the same principles apply to other areas of academic evaluation at institutions. While these guidelines are by no means sufficient for handling the complexity of a multidimensional process that involves balancing a large set of nuanced and diverse information, we hope that they will support an increasing adoption of processes that recognize data and software as key research contributions.

https://doi.org/10.1371/journal.pcbi.1012296

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Sharing Practices of Software Artefacts and Source Code for Reproducible Research"


While the source code of software and algorithms is an essential component of all fields of modern research involving data analysis and processing steps, it is rarely shared upon publication of results across disciplines. Simple guidelines for producing reproducible source code have been published. Still, code optimization that supports repurposing to different settings is often neglected, and even less thought is given to registering code in catalogues for public reuse. Though all research output should be reasonably curated in terms of reproducibility, it has been shown that researchers are frequently non-compliant with the availability statements in their publications. These statements do not even include the use of persistent unique identifiers that would allow referencing archives of code artefacts at specific versions and times for long-lasting links to research articles. In this work, we provide an analysis of current practices of authors in open scientific journals with regard to code availability indications and FAIR principles applied to code and algorithms. We present common repositories of choice among authors. Results further show disciplinary differences in code availability in scholarly publications over the past years. We advocate proper description, archiving, and referencing of source code and methods as part of the scientific knowledge, also appealing to editorial boards and reviewers for supervision.

https://doi.org/10.1007/s41060-024-00617-7

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Software Preservation after the Internet"


Software preservation must consider knowledge management as a key challenge. We suggest a conceptualization of software preservation approaches that are available at different stages of the software lifecycle and can help memory institutions assess the current state of software items in their collection, the capabilities of their infrastructure, and the completeness and applicability of the knowledge required to successfully steward the collection.

https://tinyurl.com/8y9svs7x

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Where Is All the Research Software? An Analysis of Software in UK Academic Repositories"


This research examines the prevalence of research software as independent records of output within UK academic institutional repositories (IRs). There has been a steep decline in numbers of research software submissions to the UK’s Research Excellence Framework from 2008 to 2021, but there has been no investigation into whether and how the official academic IRs have affected the low return rates. In what we believe to be the first such census of its kind, we queried the 182 online repositories of 157 UK universities. Our findings show that the prevalence of software within UK academic IRs is incredibly low. Fewer than 28% contain software as recognised academic output. Of greater concern, we found that over 63% of repositories do not currently record software as a type of research output and that several universities appeared to have removed software as a defined type from the default settings of their repository. We also explored potential correlations, such as being a member of the Russell Group, but found no correlation between these metadata and the prevalence of records of software. Finally, we discuss the implications of these findings with regard to the lack of recognition of software as a discrete research output in institutions, despite the opposite being mandated by funders, and we make recommendations for changes in policies and operating procedures.

https://doi.org/10.7717/peerj-cs.1546

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"How Does Mandated Code-Sharing Change Peer Review?"


At the end of the year-long trial period, code sharing had risen from 53% in 2019 to 87% for 2021 articles submitted after the policy went into effect. Evidence in hand, the journal Editors-in-Chief decided to make code sharing a permanent feature of the journal. Today, the sharing rate is 96%.

https://tinyurl.com/5n9yh9yj

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Towards Research Software-ready Libraries"


Software is increasingly acknowledged as valid research output. Academic libraries adapt to this change to become research software-ready. Software publication and citation are key areas in this endeavor. We present and discuss the current state of the practice of software publication and software citation, and discuss four areas of activity that libraries engage in: (1) technical infrastructure, (2) training and support, (3) software management and curation, (4) policies.

https://doi.org/10.1515/abitech-2023-0031

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Computational Reproducibility of Jupyter Notebooks from Biomedical Publications"


Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. We address computational reproducibility at two levels: First, using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks related to publications indexed in PubMed Central. We identified such notebooks by mining the articles’ full text, locating them on GitHub and re-running them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. Second, this study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over two years. Out of 27,271 notebooks from 2,660 GitHub repositories associated with 3,467 articles, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. We zoom in on common problems, highlight trends and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.

https://arxiv.org/abs/2308.07333

More about Jupyter notebooks.

The Jupyter Notebook is an interactive computing environment that enables users to author notebook documents that include code, interactive widgets, plots, narrative text, equations, images, and even video! The Jupyter name comes from three programming languages: Julia, Python, and R. It is a popular tool for literate programming. Donald Knuth first defined literate programming as a script, notebook, or computational document that contains an explanation of the program logic in a natural language (e.g., English or Mandarin), interspersed with snippets of macros and source code, which can be compiled and rerun. You can think of it as an executable paper!
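As background for the reproducibility study above, it helps to know that an .ipynb file is plain JSON whose cells interleave markdown narrative with executable code, which is exactly the literate-programming pattern Knuth described. A minimal sketch in Python (the nbformat 4 field names are the standard ones; the cell contents and the filename `example.ipynb` are invented for illustration):

```python
import json

# A minimal Jupyter notebook: a JSON document whose "cells" list
# interleaves narrative (markdown) with executable code.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "language": "python",
            "display_name": "Python 3",
        }
    },
    "cells": [
        {
            # Prose cell: rendered as formatted text in the notebook UI.
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Mean of a sample\n",
                "We compute the arithmetic mean of three observations.",
            ],
        },
        {
            # Code cell: executable; outputs are stored alongside the source.
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": [
                "values = [2, 4, 9]\n",
                "sum(values) / len(values)",
            ],
        },
    ],
}

# Serialize it; the resulting file can be opened directly in Jupyter.
with open("example.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Opening `example.ipynb` in Jupyter renders the markdown prose above the runnable code cell, and this plain-JSON structure is also what makes the automated mining and re-running described in the paper possible.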

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Code Sharing Increases Citations, but Remains Uncommon"


Overall, R code was only available in 49 of the 1,001 papers examined (4.9%) (Figure 1). When included, code was most often in the Supplemental Information (41%), followed by GitHub (20%), Figshare (6%), or other repositories (33%). Open-access publications were 70% more likely to include code than closed-access publications (7.21% vs. 4.22%, χ² = 4.442, p < 0.05). Code sharing was estimated to increase at 0.5% per year, but this trend was not significant (p = 0.11). The years 2021 and 2022 showed a shift towards more frequent sharing, but the percentage of code sharing has been consistently below 15% over the past decade (Figure 1).

We found that papers including code disproportionately impact the literature (Figure 2) and accumulate citations faster (i.e., a marginally significant year-by-code-inclusion interaction; p = 0.0863). Further, we found a significant interaction between Open Access and code inclusion (p = 0.0265), with publications meeting both Open Science criteria (i.e., open code and open access) having the highest overall predicted citation rates (Figure 2). For example, Open Science papers are expected to receive more than double the citations (96.25 vs. 36.89) in year 13 post-publication compared with fully closed papers (Figure 2).

https://doi.org/10.21203/rs.3.rs-3222221/v1

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Care to Share? Experimental Evidence on Code Sharing Behavior in the Social Sciences"


Transparency and peer control are cornerstones of good scientific practice and entail the replication and reproduction of findings. The feasibility of replications, however, hinges on the premise that original researchers make their data and research code publicly available. This applies in particular to large-N observational studies, where analysis code is complex and may involve several ambiguous analytical decisions. To investigate which specific factors influence researchers’ code sharing behavior upon request, we emailed code requests to 1,206 authors who published research articles based on data from the European Social Survey between 2015 and 2020. In this preregistered multifactorial field experiment, we randomly varied three aspects of our code request’s wording in a 2x4x2 factorial design: the overall framing of our request (enhancement of social science research, response to replication crisis), the appeal why researchers should share their code (FAIR principles, academic altruism, prospect of citation, no information), and the perceived effort associated with code sharing (no code cleaning required, no information). Overall, 37.5% of successfully contacted authors supplied their analysis code. Of our experimental treatments, only framing affected researchers’ code sharing behavior, though in the opposite direction from what we expected: Scientists who received the negative wording alluding to the replication crisis were more likely to share their research code. Taken together, our results highlight that the availability of research code will hardly be enhanced by small-scale individual interventions but instead requires large-scale institutional norms.

https://doi.org/10.1371/journal.pone.0289380

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Policy Recommendations to Ensure That Research Software Is Openly Accessible and Reusable"


There is now an opportunity to expand US federal policies in similar ways and align their research software sharing aspects across agencies.

To do this, we recommend:

  1. As part of their updated policy plans submitted in response to the 2022 OSTP memo, US federal agencies should, at a minimum, articulate a pathway for developing guidance on research software sharing, and, at a maximum, incorporate research software sharing requirements as a necessary extension of any data sharing policy and a critical strategy to make data truly FAIR (as these principles have been adapted to apply to research software [12]).
  2. As part of sharing requirements, federal agencies should specify that research software should be deposited in trusted, public repositories that maximize discovery, collaborative development, version control, long-term preservation, and other key elements of the National Science and Technology Council’s "Desirable Characteristics of Data Repositories for Federally Funded Research" [13], as adapted to fit the unique considerations of research software.
  3. US federal agencies should encourage grantees to use non-proprietary software and file formats, whenever possible, to collect and store data. We realize that for some research areas and specialized techniques, viable non-proprietary software may not exist for data collection. However, in many cases, files can be exported and shared using non-proprietary formats or scripts can be provided to allow others to open files.
  4. Consistent with the US Administration’s approach to cybersecurity [14], federal agencies should provide clear guidance on measures grantees are expected to undertake to ensure the security and integrity of research software. This guidance should encompass the design, development, dissemination, and documentation of research software. Examples include the National Institute of Standards and Technology’s secure software development framework and the Linux Foundation’s open source security foundation.
  5. As part of the allowable costs that grantees can request to help them meet research sharing requirements, US federal agencies should include reasonable costs associated with developing and maintaining research software needed to maximize data accessibility and reusability for as long as it is practical. Federal agencies should ensure that such costs are additive to proposal budgets, rather than consuming funds that would otherwise go to the research itself.
  6. US federal agencies should encourage grantees to apply licenses to their research software that facilitate replication, reuse, and extensibility, while balancing individual and institutional intellectual property considerations. Agencies can point grantees to guidance on desirable criteria for distribution terms and approved licenses from the Open Source Initiative.
  7. In parallel with the actions listed above that can be immediately incorporated into new public access plans, US federal agencies should also explore long-term strategies to elevate research software to co-equal research outputs and further incentivize its maintenance and sharing to improve research reproducibility, replicability, and integrity.

https://doi.org/10.1371/journal.pbio.3002204

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Replayed: Essential Writings on Software Preservation and Game Histories


Since the early 2000s, Henry Lowood has led or had a key role in numerous initiatives devoted to the preservation and documentation of virtual worlds, digital games, and interactive simulations, establishing himself as a major scholar in the field of game studies. . . . Replayed consolidates Lowood’s far-flung and significant publications on these subjects into a single volume.

https://www.press.jhu.edu/books/title/12805/replayed

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Advancing Software Citation Implementation (Software Citation Workshop 2022)"


Software is foundationally important to scientific and social progress; however, traditional acknowledgment of the use of others’ work has not adapted in step with the rapid development and use of software in research. This report outlines a series of collaborative discussions that brought together an international group of stakeholders and experts representing many communities, forms of labor, and expertise. Participants addressed specific challenges about software citation that have so far gone unresolved. The discussions took place in summer 2022 both online and in-person and involved a total of 51 participants. The activities described in this paper were intended to identify and prioritize specific software citation problems, develop (potential) interventions, and lay out a series of mutually supporting approaches to address them. The outcomes of this report will be useful for the GLAM (Galleries, Libraries, Archives, Museums) community, repository managers and curators, research software developers, and publishers.

https://arxiv.org/abs/2302.07500v1

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"A Sustainable Infrastructure Concept for Improved Accessibility, Reusability, and Archival of Research Software"


Research software is an integral part of most research today and it is widely accepted that research software artifacts should be accessible and reproducible. However, the sustainable archival of research software artifacts is an ongoing effort. We identify research software artifacts as snapshots of the current state of research and an integral part of a sustainable cycle of software development, research, and publication. We develop requirements and recommendations to improve the archival, access, and reuse of research software artifacts based on installable, configurable, extensible research software, and sustainable public open-access infrastructure. The described goal is to enable the reuse and exploration of research software beyond published research results, in parallel with reproducibility efforts, and in line with the FAIR principles for data and software. Research software artifacts can be reused in varying scenarios. To this end, we design a multi-modal representation concept supporting multiple reuse scenarios. We identify types of research software artifacts that can be viewed as different modes of the same software-based research result, for example, installation-free configurable browser-based apps to containerized environments, descriptions in journal publications and software documentation, or source code with installation instructions. We discuss how the sustainability and reuse of research software are enhanced or enabled by a suitable archive infrastructure. Finally, using the example of a pilot project at the University of Stuttgart, Germany (a collaborative effort between research software developers and infrastructure providers), we outline practical challenges and experiences.

https://arxiv.org/abs/2301.12830

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

A Preservationist’s Guide to the DMCA Exemption for Software Preservation, 2nd Edition


In late 2021, the Library of Congress adopted several exemptions to the Digital Millennium Copyright Act (DMCA) provision prohibiting circumvention of technological measures that control access to copyrighted works. In other words, they created a set of exceptions to the general legal rule against cracking digital locks on things like DVDs, software, and video games. The exemptions are set out in regulations published by the Copyright Office. They went into effect on October 28, 2021 and last until October 28, 2024. This guide is intended to help preservationists determine whether their activities are protected by the new exemptions. It includes important updates to the first edition to reflect changes in the rule to allow offsite access to non-game software, along with a few other technical changes.

https://doi.org/10.5281/zenodo.7328908

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"How Often Do Cancer Researchers Make Their Data and Code Available and What Factors Are Associated with Sharing?"


One in five studies declared data were publicly available (59/306, 19%, 95% CI: 15–24%). However, when data availability was investigated this percentage dropped to 16% (49/306, 95% CI: 12–20%), and then to less than 1% (1/306, 95% CI: 0–2%) when data were checked for compliance with key FAIR principles. While only 4% of articles that used inferential statistics reported code to be available (10/274, 95% CI: 2–6%), the odds of reporting code to be available were 5.6 times higher for researchers who shared data.

https://doi.org/10.1186/s12916-022-02644-2

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Supporting Software Preservation Services in Research and Memory Organizations


Supporting Software Preservation Services in Research and Memory Organizations identifies concepts, skill sets, barriers, and future directions related to software preservation work. Although definitions of "software" can vary across preservation contexts, the study found that there appears to be wide support for inter-organizational collaboration in software preservation. The report includes 13 recommendations for broadening representation in the field, defining the field, networking and community building, informal and formal learning, and implementing shared infrastructures and model practices.

https://cutt.ly/4NJHcoF

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Who Writes Scholarly Code?"


This paper presents original research about the behaviours, histories, demographics, and motivations of scholars who code, specifically how they interact with version control systems locally and on the Web. By understanding patrons through multiple lenses—daily productivity habits, motivations, and scholarly needs—librarians and archivists can tailor services for software management, curation, and long-term reuse, raising the possibility for long-term reproducibility of a multitude of scholarship.

http://www.ijdc.net/article/view/839

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Paywall: "A Perspective on Computational Research Support Programs in the Library: More than 20 Years of Data from Stanford University Libraries"


Presentation of data is a major component of academic research. However, programming languages, computational tools, and methods for exploring and analyzing data can be time consuming and frustrating to learn, and finding help with these stages of the broader research process can be daunting. In this work, we highlight the impacts that computational research support programs housed in library contexts can have in filling gaps in student, staff, and faculty research needs. The archival history of one such organization, Software and Services for Data Science (SSDS) in the Stanford University Cecil H. Green Library, is used to outline challenges faced by social sciences and humanities researchers from the 1980s to the present day.

https://doi.org/10.1177/09610006221124619

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Introducing the FAIR Principles for Research Software"


The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles). The contents and context of the FAIR4RS Principles are summarised here to provide the basis for discussion of their adoption. Examples of implementation by organisations are provided to share information on how to maximise the value of research outputs, and to encourage others to amplify the importance and impact of this work.

https://doi.org/10.1038/s41597-022-01710-x

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Nine Best Practices for Research Software Registries and Repositories"


Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing them takes effort and few guidelines are available to help prospective creators of these resources. To address this need, the FORCE11 Software Citation Implementation Working Group convened a Task Force to distill the experiences of the managers of existing resources in setting expectations for all stakeholders. In this article, we describe the resultant best practices which include defining the scope, policies, and rules that govern individual registries and repositories, along with the background, examples, and collaborative work that went into their development.

https://doi.org/10.7717/peerj-cs.1023

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |