Open Source Software – DigitalKoans

"CODE beyond FAIR"

FAIR principles are a set of guidelines aiming at simplifying the distribution of scientific data to enhance reuse and reproducibility. This article focuses on research software, which significantly differs from data through its living nature, and its relationship with free and open-source software. Based on the second French plan for Open Science, we provide a tiered roadmap to improve the state of research software, which is inclusive to all stakeholders in the research software ecosystem: scientific staff, but also institutions, funders, libraries and publishers.

https://inria.hal.science/hal-04930405

"Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs"

Paywalls, licenses and copyright rules often restrict the broad dissemination and reuse of scientific knowledge. We take the position that it is both legally and technically feasible to extract the scientific knowledge in scholarly texts. Current methods, like text embeddings, fail to reliably preserve factual content, and simple paraphrasing may not be legally sound. We urge the community to adopt a new idea: convert scholarly documents into Knowledge Units using LLMs. These units use structured data capturing entities, attributes and relationships without stylistic content. We provide evidence that Knowledge Units: (1) form a legally defensible framework for sharing knowledge from copyrighted research texts, based on legal analyses of German copyright law and U.S. Fair Use doctrine, and (2) preserve most (~95%) factual knowledge from original text, measured by MCQ performance on facts from the original copyrighted text across four research domains. Freeing scientific knowledge from copyright promises transformative benefits for scientific research and education by allowing language models to reuse important facts from copyrighted text. To support this, we share open-source tools for converting research documents into Knowledge Units. Overall, our work posits the feasibility of democratizing access to scientific knowledge while respecting copyright.
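
To make the idea of a Knowledge Unit more concrete, here is a minimal Python sketch of structured data that captures entities, attributes, and relationships without stylistic content. The class names, field names, and the example fact are illustrative assumptions; the paper's actual schema and tooling may differ.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Relationship:
    # Hypothetical triple-style relationship between two entities.
    subject: str
    predicate: str
    obj: str


@dataclass
class KnowledgeUnit:
    # Hypothetical fields; not the schema used by the paper's tools.
    entities: list[str] = field(default_factory=list)
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize to plain structured data with no prose wording.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative unit built by hand; in the paper, an LLM does the extraction.
unit = KnowledgeUnit(
    entities=["aspirin", "cyclooxygenase"],
    attributes={"aspirin": {"class": "NSAID"}},
    relationships=[Relationship("aspirin", "inhibits", "cyclooxygenase")],
)
print(unit.to_json())
```

The point of the sketch is only that the fact survives as data while the original sentence's wording does not.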

https://arxiv.org/abs/2502.19413

"Streamlining Electronic Theses and Dissertations Processing Procedures at the University of Alabama Libraries"

This article provides a brief history of the ETD processing procedures at The University of Alabama from 2010 to 2021, a detailed description of the revised workflow, and a discussion of the improvements made. The workflow utilizes Python scripts and MarcEdit mapping and task files. All scripts and files are available in a GitHub repository for anyone to use and modify at https://github.com/bpclark2/UA_ETD.

https://doi.org/10.1080/07317131.2025.2467571

"New ‘Fair Source’ Movement Aims to Bridge the Gap Between Open Source and Proprietary Licensing"

Key principles of the model include publicly available source code, allowing third-party use and modification with “minimal restrictions,” and a delayed open-source publication clause, where the software transitions to a true open-source license after a predefined period (two years under Sentry’s Functional Source License).

https://tinyurl.com/ypswkw3j

"The Living Library: A Process-Based Tool for Open Literature Review, Probing the Boundaries of Open Science"

In this paper, we present a new tool for open science research, the Living Library. The Living Library provides an online platform and methodological framework for open, continuous literature reviewing. As a research medium, it explores what openness means in light of the human dimension and interpretive nature of engaging with societal questions. As a tool, the Living Library allows researchers to collectively sort, dynamically interpret and openly discuss the evolving literature on a topic of interest. The interface is built around a timeline along which articles can be filtered, themes with which articles are coded, and an open researcher logbook that documents the development of the library. The first rendition of a Living Library can be found via this link: https://eduvision-living-library.web.app/, and the code to develop your own Living Library can be found via this link: https://github.com/Simon-Dirks/living-library.

https://doi.org/10.1007/s43545-024-00964-z

"On the Modification and Revocation of Open Source Licences"

Historically, open source commitments have been deemed irrevocable once materials are released under open source licenses. In this paper, the authors argue for the creation of a subset of rights that allows open source contributors to force users to (i) update to the most recent version of a model, (ii) accept new use case restrictions, or even (iii) cease using the software entirely. While this would be a departure from the traditional open source approach, the legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses. Recent legislative changes have also opened the door to liability of open source contributors in certain cases. The authors believe that contributors would welcome the ability to ensure that downstream users are implementing updates that address issues like bias, guardrail workarounds or adversarial attacks on their contributions. Finally, this paper addresses how this license category would interplay with RAIL licenses, and how it should be operationalized and adopted by key stakeholders such as OSS platforms and scanning tools.

https://arxiv.org/abs/2407.13064

"HERITRACE: Tracing Evolution and Bridging Data for Streamlined Curatorial Work in the GLAM Domain"

HERITRACE is a semantic data management system tailored for the GLAM sector. It is engineered to streamline data curation for non-technical users while also offering an efficient administrative interface for technical staff. The paper compares HERITRACE with other established platforms such as OmekaS, Semantic MediaWiki, Research Space, and CLEF, emphasizing its advantages in user friendliness, provenance management, change tracking, customization capabilities, and data integration. The system leverages SHACL for data modeling and employs the OpenCitations Data Model (OCDM) for provenance and change tracking, ensuring a harmonious blend of advanced technical features and user accessibility. Future developments include the integration of a robust authentication system and the expansion of data compatibility via the RDF Mapping Language (RML), enhancing HERITRACE’s utility in digital heritage management.

https://arxiv.org/abs/2402.00477

"Stanford University Adopts FOLIO Library Services Platform"

Stanford’s FOLIO upgrade marks the first time Stanford has migrated to a new library services platform in over 20 years. To move from two separate legacy software systems to FOLIO, Stanford’s Library Systems team migrated bibliographic and holdings data for over 12 million library items, along with data for orders, patrons, loans, and requests. Stanford’s Digital Library Systems and Services team developed several integrations between FOLIO and other systems, including Stanford’s Searchworks discovery layer and a custom internal tool for managing vendor-supplied bibliographic data.

Most of the over 100 FOLIO libraries worldwide rely on a vendor to host the FOLIO software. Stanford opted for a self-hosting model, setting up a local environment that runs FOLIO.

In another notable aspect of Stanford’s FOLIO migration, five major library units (Stanford Libraries, Graduate School of Business Library, Hoover Institution Library and Archives, Lane Medical Library, and Robert Crown Law Library) collaborated to harmonize workflows in FOLIO and to train 400 staff members in the technical and patron-facing service areas. . . .

Caia Software & Solutions developed a robust remote storage management integration to meet Stanford’s remote storage requirements that are not handled out-of-the-box in FOLIO. The integration automatically updates FOLIO inventory records as items are moved in and out of Stanford’s remote storage facility using the CaiaSoft storage management application.

https://tinyurl.com/yc5zvv83

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

Paywall: "DMPFrame: A Conceptual Metadata Framework for Data Management Plans"

We have examined 12 open-source DMP tools, in particular, to evaluate the metadata adopted by these tools. The current study spots and highlights the gaps in the DMP metadata management in DMP tools and suggests DMPFrame as a conceptual framework addressing such gaps to improve the existing tools’ DMP metadata management practices. Based on the examined DMP tool’s metadata elements analysis and mapping, DMPFrame manages DMP metadata under 6 categories, namely, contributors, project, funding, organization, DMP, and output. The current study also suggests a systematic workflow that DMP tools could incorporate for metadata creation for DMPs.

https://doi.org/10.1080/19386389.2023.2268474

"Creating a Scholarly API Cookbook: Supporting Library Users with Programmatic Access to Information"

Scholarly web-based application programming interfaces (APIs) allow users to interact with information and data programmatically. Interacting with information programmatically allows users to create advanced information query workflows and quickly access machine-readable data for downstream computations. With the growing availability of scholarly APIs from open and commercial library databases, supporting access to information via an API has become a key support area for research data services in libraries. This article describes our efforts with supporting API access through the development of an online Scholarly API Cookbook. The Cookbook contains code recipes (i.e., tutorials) for getting started with 10 different scholarly APIs, including for example, Scopus, World Bank, and PubMed. API tutorials are available in Python, Bash, Matlab, and Mathematica. A tutorial for interacting with library catalog data programmatically via Z39.50 is also included, as traditional library catalog metadata is rarely available via an API. In addition to describing the Scholarly API Cookbook content, we discuss our experiences building a student research data services programming team, challenges we encountered, and ideas to improve the Cookbook. The University of Alabama Libraries Scholarly API Cookbook is freely available and hosted on GitHub. All code within the API Cookbook is licensed with the permissive MIT license, and as a result, users are free to reuse and adapt the code in their teaching and research.
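
As a rough illustration of the kind of programmatic access the abstract describes, the Python sketch below builds a PubMed search request for the NCBI E-utilities ESearch endpoint and pulls record identifiers out of a JSON response. It is not taken from the Cookbook itself. The helper names are hypothetical, and the sample response is a hand-written stand-in with placeholder identifiers, so no network call is made.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint for PubMed searches.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 5) -> str:
    """Build a PubMed search URL that asks for a JSON response."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def extract_pmids(response_text: str) -> list[str]:
    """Return the list of PubMed IDs from an ESearch JSON response."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


url = build_esearch_url("open source software libraries")
print(url)
# To fetch for real: urllib.request.urlopen(url).read().decode("utf-8")

# Hand-written stand-in for a response; the IDs are placeholders.
sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
pmids = extract_pmids(sample)
print(pmids)
```

Keeping URL construction and response parsing in separate functions makes each step easy to test before any live request is sent.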

https://tinyurl.com/93essmxj

"Introducing Open Data Editor (beta): Towards a No-Code Data App for Everyone"

Intuitive Data Editing: Open Data Editor (beta) provides a user-friendly, spreadsheet-like interface that allows you to view, edit, and validate your data effortlessly.

Data Transformation: Easily transform your data from one format to another with a wide range of supported data formats, including CSV, Excel, JSON, and more.

Data Validation: Ensure data quality and consistency with built-in validation checks that generate a visual validation report, making it super easy for you to clean your data.

Schema Management: Define and manage data schemas to ensure data consistency and compliance with standards.

Data Publishing: Seamlessly publish your data to the web or data portals. It is easy to publish the processed data to CKAN, Github and Zenodo with a single button click, making it accessible to a wider audience and increasing its impact.

Generative AI: Optionally add a generative AI provider to unlock many features based on chat-based language models. The feature is currently limited to OpenAI, but more providers will be added soon.
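
The validation and schema features listed above can be pictured with a small Python sketch that checks CSV rows against a declared schema and collects a report of problems. This is not Open Data Editor's code or API. The schema format, the function name, and the sample data are assumptions made for illustration.

```python
import csv
import io

# Hypothetical schema: column name -> (type converter, required?)
SCHEMA = {
    "id": (int, True),
    "title": (str, True),
    "year": (int, False),
}


def validate_rows(csv_text: str, schema: dict) -> list[str]:
    """Return human-readable errors for cells that violate the schema."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, (convert, required) in schema.items():
            value = (row.get(column) or "").strip()
            if not value:
                if required:
                    errors.append(f"line {line_no}: missing required '{column}'")
                continue
            try:
                convert(value)
            except ValueError:
                errors.append(
                    f"line {line_no}: '{column}' value {value!r} is not {convert.__name__}"
                )
    return errors


# Illustrative data: the second data row breaks all three rules.
sample = "id,title,year\n1,Open Data,2023\nx,,n/a\n"
report = validate_rows(sample, SCHEMA)
for problem in report:
    print(problem)
```

A no-code tool would present such a report visually, but the underlying check is the same: compare each cell against what the schema expects.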

https://tinyurl.com/2xwcp87x

"Finding the Right Platform: A Crosswalk of Academy-Owned and Open-Source Digital Publishing Platforms"

A key responsibility for many library publishers is to collaborate with authors to determine the best mechanisms for sharing and publishing research. Librarians are often asked to assist with a wide range of research outputs and publication types, including eBooks, digital humanities (DH) projects, scholarly journals, archival and thematic collections, and community projects. These projects can exist on a variety of platforms, both for-profit and academy-owned. Additionally, over the past decade, more and more academy-owned platforms have been created to support library publishing programs. Library publishers who wish to emphasize open access and open-source publishing can feel overwhelmed by the proliferation of available academy-owned or -affiliated publishing platforms. For many of these platforms, documentation exists but can be difficult to locate and interpret. While experienced users can usually find and evaluate the available resources for a particular platform, this kind of documentation is often less useful to authors and librarians who are just starting a new publishing project and want to determine if a given platform will work for them. Because of the challenges involved in identifying and evaluating the various platforms, we created this comparative crosswalk to help library publishers (and potentially authors) determine which platforms are right for their services and authors’ needs.

https://hcommons.org/deposits/item/hc:59231/

Digital Scholarship Has Released the Artificial Intelligence and Libraries Bibliography

The Artificial Intelligence and Libraries Bibliography includes over 125 selected English-language articles and books that are useful in understanding how libraries are exploring and adopting modern artificial intelligence (AI) technologies. It covers works from January 2018 through August 2023. It includes a Google Translate link. The bibliography is available as a website and a website PDF with live links.

Libraries have been exploring AI technology for a long time. In particular, there was an active period of experimentation from the mid-1980s through the mid-1990s that primarily focused on the use of expert systems. Many projects used expert system shells, which simplified development; however, some projects also used AI languages, such as Prolog. This period produced a significant number of library-related AI papers.

Subsequently, library interest in AI diminished until around 2018, when research activity increased.

The public release of generative AI systems in late 2022, such as ChatGPT, sparked a strong upsurge of interest in them and a rush to utilize their capabilities. Since these systems are relatively easy to use, this development may result in a significant new wave of library-oriented AI activity.

https://digital-scholarship.org/ai/ai-libraries.htm

"Creating a Full Multitenant Back End User Experience in Omeka S with the Teams Module"

When Omeka S appeared as a beta release in 2016, it offered the opportunity for researchers or larger organizations to publish multiple Omeka sites from the same installation. Multisite functionality was and continues to be a major advance for what had become the premiere platform for scholarly digital exhibits produced by libraries, museums, researchers, and students. However, while geared to larger institutional contexts, Omeka S poses some user experience challenges on the back end for larger organizations with numerous users creating different sites. These challenges include a "cluttered" effect for many users seeing resources they do not need to access and data integrity challenges due to the possibility of users editing resources that other users need in their current state. The University of Illinois Library, drawing on two local use cases as well as two additional external use cases, developed the Teams module to address these challenges. This article describes the needs leading to the decision to create the module, the project requirement gathering process, and the implementation and ongoing development of Teams. The module and findings are likely to be of interest to other institutions adopting Omeka S but also, more generally, to libraries seeking to contribute successfully to larger open-source initiatives.

https://journal.code4lib.org/articles/17389

"A Very Small Pond: Discovery Systems That Can Be Used with FOLIO in Academic Libraries"

FOLIO, an open source library services platform, does not have a front end patron interface for searching and using library materials. Any library installing FOLIO will need at least one other software to perform those functions. This article evaluates which systems, in a limited marketplace, are available for academic libraries to use with FOLIO.

https://journal.code4lib.org/articles/17433

"Introducing the Open Resource Sharing Coalition (OpenRS)"

The Open Library Foundation (OLF) is introducing the Open Resource Sharing Coalition (OpenRS), a resource sharing initiative created in partnership with library consortia, open source developers, and vendors. OpenRS is a heterogeneous resource sharing system that is ILS and Discovery agnostic and accommodates the full spectrum of mediated and unmediated resource sharing.

OpenRS acts upon a "consortia first" mentality, striving to provide libraries with the tools needed for robust and extended functionality for resource sharing. The project will focus on developing and implementing software systems, protocols, and best practices that foster collaboration and support various library services, including seamless unmediated intra-consortial borrowing functionality and expanded sharing across multiple consortia. The software will provide a containerized code base configured for ease of deployment, maintenance, and upgrades. Libraries and consortia can choose to host the service locally or with a third party. . . .

While yet to be an official project, OLF is expected to approve the OpenRS charter by the end of August. An official web presence will be added to the OLF site soon. Core OpenRS functionality for direct consortial borrowing will be rolled out as part of the MOBIUS release in May 2024. Additional features and functionality will be determined based on coalition feedback and implemented over the coming months and years.

https://tinyurl.com/5n8b2yxx

"The Future of Open Source Is Still Very Much in Flux"

Today, 96% of all code bases incorporate open-source software. GitHub, the biggest platform for the open-source community, is used by more than 100 million developers worldwide. The Biden administration’s Securing Open Source Software Act of 2022 publicly recognized open-source software as critical economic and security infrastructure. Even AWS, Amazon’s money-making cloud arm, supports the development and maintenance of open-source software; it committed its portfolio of patents to an open use community in December of last year. Over the last two years, while public trust in private technology companies has plummeted, organizations including Google, Spotify, the Ford Foundation, Bloomberg, and NASA have established new funding for open-source projects and their counterparts in open science efforts—an extension of the same values applied to scientific research.

https://tinyurl.com/4ksns2ha

"What Those Responsible for Open Infrastructure in Scholarly Communication Can Do about Possibly Predatory Practices"

This chapter presents a three-phase analysis of the 521 journals that use the open source publishing platform Open Journal Systems (OJS) while appearing on Beall’s list of predatory publishers and journals and/or in Cabells Predatory Reports, both of which purport to identify journals that charge authors article processing fees (APC) to publish in the pretense of a peer-reviewed journal. . . . The first phase involved the researchers reaching out to publishers and editors on Beall’s list using OJS; the second phase involved determining the extent to which journals using OJS appeared on the two predatory lists; and the third reports on a new system, involving trade organizations such as ORCID and Crossref, for authenticating journal practices.

https://tinyurl.com/2xwb94ue

"2023 Library Systems Report: The Advance of Open Systems"

Interest in open systems has been growing within the library world for at least 15 years, and recent procurements reflect important breakthroughs. The selection of the open source library services platform (LSP) FOLIO by Library of Congress (LC), the MOBIUS consortium, the National Library of Australia, and others has solidified FOLIO’s position as a major competitor in the market. . . .

Most libraries still use proprietary software for their core systems. In the US, about 10% of academic libraries and 17% of public libraries use an open source integrated library system (ILS). But the barriers to these products—real and perceived—have largely collapsed. Functionality gaps have narrowed across major open source products like Koha, Evergreen, and now FOLIO, after long periods of development.

https://bit.ly/3nh8Tdl

With Open Source Software: "How to Build a Publishers’ Catalogue"

As a consortium of six open access presses, ScholarLed had a use case for a publishers’ catalogue that would present all their recent book publications in one catalogue. . . . Fortunately all the presses have included the metadata for their monograph publications in Thoth, the open metadata management and dissemination platform that has been produced as another COPIM [Community-led Open Publication Infrastructures for Monographs] output. This made it possible to easily conceive of a catalogue published as both a website and a PDF that pulls in and arranges the bibliographic metadata automatically and that can be updated on a regular basis without manual intervention. . . .

Our computational publishing model and workflow allowed us to put together this catalogue prototype very easily using only a few pieces of readily-available open source software: Quarto, Jupyter Notebook, and Git. This provided us an instant framework for web publication that didn’t require editing any HTML or CSS. . . .

Our ScholarLed publishers’ catalogue offers a working prototype of an automatically-updated book that retrieves the data for its content directly from an API. It does so with readily available open source software that can be installed with relative ease by anyone who wants to use this model to create a publication with computational elements.
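
The general pattern of assembling a catalogue automatically from bibliographic metadata can be sketched in a few lines of Python. This is not the ScholarLed workflow or the Thoth API. The records below are hypothetical stand-ins for data that would be fetched from an API, and the function name is an assumption. The output is Markdown, which a publishing tool could then render as a website or PDF.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records standing in for metadata fetched from an API.
books = [
    {"press": "Press B", "title": "Second Title", "year": 2022},
    {"press": "Press A", "title": "First Title", "year": 2023},
    {"press": "Press A", "title": "Earlier Title", "year": 2021},
]


def render_catalogue(records: list[dict]) -> str:
    """Group books by press, newest first, and render them as Markdown."""
    lines = ["# Publishers' Catalogue", ""]
    # Sort by press name, then by year descending within each press.
    ordered = sorted(records, key=lambda r: (r["press"], -r["year"]))
    for press, group in groupby(ordered, key=itemgetter("press")):
        lines.append(f"## {press}")
        lines.append("")
        for book in group:
            lines.append(f"- {book['title']} ({book['year']})")
        lines.append("")
    return "\n".join(lines)


catalogue = render_catalogue(books)
print(catalogue)
```

Because the catalogue is regenerated from the records each time, rerunning the script after the metadata changes updates the publication without manual editing.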

http://bit.ly/3m1exzz

Only 10% Fully Understand "Preprint": "Framing COVID-19 Preprint Research as Uncertain: A Mixed-Method Study of Public Reactions"

Unlike hedging, preprint disclosure had no impact on audience message evaluations, nor vaccine attitudes and intentions. In one sense, this is a positive finding in that transparency about preprint status is unlikely to produce negative public reactions. Yet a likely explanation for the null effects is that most participants lacked the knowledge to differentiate between preprints and peer-reviewed research and did not understand this disclosure as an indicator of preliminary science. The qualitative data supported this explanation. When asked how they interpret the term "preprint" when they see it in a scientific news article, participants’ responses indicated that most had a limited understanding of the concept, even among those who received the preprint disclosure message with a brief explanation of the term. In total, only 10% of participants provided definitions of preprint that aligned with those accepted by the scholarly community. Only 15% described the term as an indicator of uncertain or preliminary evidence.

https://doi.org/10.1080/10410236.2023.2164954

"Guest Post — Scholarly Publishing as a Global Endeavor: Leveraging Open Source Software for Bibliodiversity "

The headline numbers for OJS [Open Journal Systems] in 2021 indicate that 1.46m articles were published by 34,071 active journals on approximately 12,000 publisher and institutional installations. To give context to these numbers, Elsevier’s portfolio published ~600,000 articles in ~2,700 journals in the same year. These past few years have seen a significant acceleration in the proliferation of the number of journals and the number of articles published on open source software.

bit.ly/3XAvduC

"Beyond Web of Science and Scopus There Is Already an Open Bibliodiverse World of Research — We Ignore It at Our Peril"

Discussing their analysis of a new dataset of journals published via the Open Journal Systems publishing platform, Saurabh Khanna, Jon Ball, Juan Pablo Alperin and John Willinsky argue that, rather than being an aspiration, an open, regional and bibliodiverse publishing ecosystem is already in existence.

bit.ly/3XlXK6J

"FOSS Could Be an Unintended Victim of EU Crusade to Make Software More Secure"

But FOSS is in the most danger. The underlying assumption of the regulation is that cybersecurity exists in the digital market like fire resistance does in that for soft furnishings. Putting regulatory cost burdens on a part of the market with no revenue and no gatekeeping on its distribution channels cannot work; there are no prices to increase to absorb compliance costs and no tap to turn off to keep the stuff off the market.

bit.ly/40RBepA

"Designing Digital Discovery and Access Systems for Archival Description"

Archival description is often misunderstood by librarians, administrators, and technologists in ways that have seriously hindered the development of access and discovery systems. It is not widely understood that there is currently no off-the-shelf system that provides discovery and access to digital materials using archival methods. This article is an overview of the core differences between archival and bibliographic description, and discusses how to design access systems for born-digital and digitized materials using the affordances of archival metadata. It offers a custom indexer as a working example that adds the full text of digital content to an Arclight instance and argues that the extensibility of archival description makes it a perfect match for automated description. Finally, it argues that building archives-first discovery systems allows us to use our descriptive labor more thoughtfully, better enable digitization on demand, and overall make a larger volume of cultural heritage materials available online.

bit.ly/3DhKmcC
