“Decoding Virtual Chats: NLP Insights Into Academic Library Services.”


This research applies a machine learning (ML) tool to the complete set of transcripts from a research university’s chat reference service (2017–2022) to examine evolving trends and patron needs in the library reference service. The study has two key objectives: 1) demonstrating ML’s effectiveness in the academic library setting, and 2) assessing the impact of COVID-19 on chat reference needs. A text classification model, trained on 1.5% of the sample, achieves a 75% accuracy match with human annotations.

https://doi.org/10.1016/j.lisr.2025.101344

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

“Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs”


Paywalls, licenses and copyright rules often restrict the broad dissemination and reuse of scientific knowledge. We take the position that it is both legally and technically feasible to extract the scientific knowledge in scholarly texts. Current methods, like text embeddings, fail to reliably preserve factual content, and simple paraphrasing may not be legally sound. We urge the community to adopt a new idea: convert scholarly documents into Knowledge Units using LLMs. These units use structured data capturing entities, attributes and relationships without stylistic content. We provide evidence that Knowledge Units: (1) form a legally defensible framework for sharing knowledge from copyrighted research texts, based on legal analyses of German copyright law and U.S. Fair Use doctrine, and (2) preserve most (~95%) factual knowledge from original text, measured by MCQ performance on facts from the original copyrighted text across four research domains. Freeing scientific knowledge from copyright promises transformative benefits for scientific research and education by allowing language models to reuse important facts from copyrighted text. To support this, we share open-source tools for converting research documents into Knowledge Units. Overall, our work posits the feasibility of democratizing access to scientific knowledge while respecting copyright.

https://arxiv.org/abs/2502.19413
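
The abstract describes Knowledge Units as structured data capturing entities, attributes and relationships without stylistic content. A minimal sketch of what such a record might look like follows. The class name, field names, and example fact are assumptions made for illustration; they are not the schema or the open-source tools from the paper.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KnowledgeUnit:
    """Hypothetical container for facts stripped of their original wording."""
    entities: List[str]
    # entity -> {attribute name -> value}
    attributes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # (subject, relation, object) triples
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the unit; tuples become JSON arrays."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative fact, not taken from the paper.
unit = KnowledgeUnit(
    entities=["water", "hydrogen"],
    attributes={"water": {"boiling_point_at_1_atm": "100 degrees Celsius"}},
    relationships=[("water", "contains", "hydrogen")],
)
serialized = unit.to_json()
```

The point of the structure is that only the facts survive: nothing in `serialized` depends on how the source sentence was phrased.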


“Building Trustworthy AI Solutions: Integrating Artificial Intelligence Literacy into Records Management and Archival Systems”


This paper explores the essential role of Artificial Intelligence (AI) competencies and literacy in the fields of records management and archival practices, within the framework of the InterPARES Trust AI project. . . . The study employs two complementary approaches: (1) a detailed competency framework developed through literature reviews, interviews with archival professionals who have applied AI to the processing of records, and validation workshops with practitioners; and (2) a comprehensive AI literacy framework derived from multiple case studies and theoretical discussions. . . . Findings indicate that archival professionals can leverage AI in their work practices by acquiring basic AI literacy, practical AI skills, data-related skills, tool-testing and evaluation, adaptation of AI to their workflows, and by actively engaging in collaborative projects with information technology (IT) developers.

https://doi.org/10.48550/arXiv.2307.14852


Pew Research Center: “U.S. Workers Are More Worried Than Hopeful About Future AI Use in the Workplace”


About half of workers (52%) say they’re worried about the future impact of AI use in the workplace, and 32% think it will lead to fewer job opportunities for them in the long run, according to a new Pew Research Center survey.

And while 36% of workers also say they feel hopeful about how AI may be used in the workplace in the future, a similar share (33%) say they feel overwhelmed.

https://tinyurl.com/3kcnwbnu


Paywall: “Gemini, & Copilot: Using Generative AI as a Tool for Information Literacy Instruction”


In this paper, the author demonstrates their experiences using generative AI to both assist in developing class activity ideas and in facilitating appropriate student use of generative AI in an information literacy course. Attention is given to emphasizing improper uses of generative AI, specifically within the research process, and how the tools may instead be used in an ethical and useful manner to assist with brainstorming research topics. . . The author describes the activities in detail, including how generative AI was used to assist in forming ideas for an interactive lesson to demonstrate various applications of the technology.

https://doi.org/10.1080/02763877.2025.2465416


“Challenges of Responsible AI in Practice: Scoping Review and Recommended Actions”


Responsible AI (RAI) guidelines aim to ensure that AI systems respect democratic values. While a step in the right direction, they currently fail to impact practice. Our work discusses reasons for this lack of impact and clusters them into five areas: (1) the abstract nature of RAI guidelines, (2) the problem of selecting and reconciling values, (3) the difficulty of operationalising RAI success metrics, (4) the fragmentation of the AI pipeline, and (5) the lack of internal advocacy and accountability. Afterwards, we introduce a number of approaches to RAI from a range of disciplines, exploring their potential as solutions to the identified challenges. We anchor these solutions in practice through concrete examples, bridging the gap between the theoretical considerations of RAI and on-the-ground processes that currently shape how AI systems are built. Our work considers the socio-technical nature of RAI limitations and the resulting necessity of producing socio-technical solutions.

https://doi.org/10.1007/s00146-024-01880-9


“Data Stewardship Decoded: Mapping Its Diverse Manifestations and Emerging Relevance at a Time of AI”


Data stewardship has become a critical component of modern data governance, especially with the growing use of artificial intelligence (AI). Despite its increasing importance, the concept of data stewardship remains ambiguous and varies in its application. This paper explores four distinct manifestations of data stewardship to clarify its emerging position in the data governance landscape. These manifestations include a) data stewardship as a set of competencies and skills, b) a function or role within organizations, c) an intermediary organization facilitating collaborations, and d) a set of guiding principles. The paper subsequently outlines the core competencies required for effective data stewardship, explains the distinction between data stewards and Chief Data Officers (CDOs), and details the intermediary role of stewards in bridging gaps between data holders and external stakeholders. It also explores key principles aligned with the FAIR framework (Findable, Accessible, Interoperable, Reusable) and introduces the emerging principle of AI readiness to ensure data meets the ethical and technical requirements of AI systems. The paper emphasizes the importance of data stewardship in enhancing data collaboration, fostering public value, and managing data reuse responsibly, particularly in the era of AI. It concludes by identifying challenges and opportunities for advancing data stewardship, including the need for standardized definitions, capacity building efforts, and the creation of a professional association for data stewardship.

https://arxiv.org/abs/2502.10399


“Copyright’s Big Win in the First Decided US Artificial Intelligence Case”


Back in March of 2023, when there were only a handful of cases alleging copyright infringement for training purposes by AI companies, I predicted that we would soon have some guidance from the court in Thomson Reuters Enterprise Center GmbH and West Publishing Corp. v. Ross Intelligence, Inc. Predicting the timing of court decisions is a fool’s errand, and this fool was repeatedly wrong in his predictions on timing. Nonetheless, on February 11, the Ross case did in fact become the first US decision on the merits to directly address copying to train AI. Now we have a clear decision, and it is favorable for rightsholders.

https://tinyurl.com/4amunsmf


U.S. Copyright Office: Identifying the Economic Implications of Artificial Intelligence for Copyright Policy


The Copyright Office released Identifying the Economic Implications of Artificial Intelligence for Copyright Policy, produced by a group of economic scholars discussing the economic issues at the intersection of artificial intelligence and copyright policy.

The group engaged in several months of substantive discussions, consultation with technical experts, and research, culminating in a daylong roundtable event.

The group’s goal was identifying the most consequential economic characteristics of AI and copyright and what factors may inform policy decisions. The roundtable discussion aimed to provide a structured and rigorous framework for considering economic evidence so that the broader economic research community can effectively answer specific questions and identify optimal policy choices.

This publication serves as a platform for articulating the ideas expressed by participants as part of the roundtable. All principal contributors submitted written materials summarizing the group’s prior discussions on a particular topic, with editorial support provided by the Office of the Chief Economist. The many ideas and views discussed in this project do not necessarily represent the views of every roundtable participant or their respective institutions. The Copyright Office does not take a position on these ideas for the purposes of this project.

https://tinyurl.com/5n7yd36r


Generative AI: “Do We Trust Ourselves? Is the Human the Weak Link?”


Generative artificial intelligence tools are becoming ubiquitous in applications across personal, professional and educational contexts. Similar to the rise of social media technologies, this means they are becoming an embedded part of people’s lives, and individuals are using these tools for a variety of benign purposes. This article examines how existing information literacy understandings will not work for artificial intelligence literacy, and provides an example of artificial intelligence searching, demonstrating its shortcomings. Present approaches may fall short of the answer required to navigate these new information tools, and this begs the question of what comes next. The current scope of information literacy and technology necessitates a multidisciplinary approach to solving the question of ‘what to do with artificial intelligence’ and arguably most impactfully requires one to acknowledge that what has worked may no longer suffice.

https://doi.org/10.1177/03400352251315845


2025 EDUCAUSE AI Landscape Study: Into the Digital AI Divide


Key Findings

Strategy and Leadership

  • A larger proportion of respondents to this year’s survey agreed that “we view AI as a strategic priority” compared with last year’s respondents, at 57% and 49%, respectively.
  • “Training for faculty” (63%) and “training for staff” (56%) topped the list of the most commonly selected elements in institutions’ AI-related strategic planning efforts.
  • A mere 2% of respondents said that their institution is accommodating new AI-related costs through new sources of funding, and a plurality of executive leaders (34%) said that their institution has tended to underestimate AI-related costs.

Policies and Guidelines

  • The proportion of respondents reporting that their institution has AI-related AUPs increased from 23% last year to 39% this year, and only 13% of respondents reported that institution-wide policies have not been impacted by the emergence of AI.
  • Only 9% of respondents reported that their institution’s cybersecurity and privacy policies are adequate for addressing AI-related risks to the institution.

Use Cases

  • Teaching and learning is the functional area at the institution most focused on using AI, with particular focus on the areas of academic integrity (74%), coursework (65%), assessment practices (54%), and curriculum design (54%).
  • Two-thirds (68%) of respondents reported that students use AI “somewhat more” or “a lot more” than faculty, while only 2% reported that faculty use AI more than students, despite institutions’ strategically emphasizing faculty training over student training.

Workforce

  • A plurality of respondents reported that their institution is supporting needed AI skills by upskilling or reskilling existing faculty or staff (37%) rather than by hiring new staff (1%).
  • Asked about the AI-related skills needed among their faculty and staff, respondents highlighted “AI literacy” for both staff and faculty, as well as “boosting productivity” for staff and “best practices for teaching” for faculty.

The Digital AI Divide between Institutions

  • Respondents from smaller institutions are remarkably similar to respondents from larger institutions in their personal use of AI tools, their motivations for institutional use of AI, and their expectations and optimism about the future of AI.
  • Respondents from small and larger institutions differ notably, however, in the resources, capabilities, and practices they’re able to marshal for AI adoption.

https://tinyurl.com/yc8zpjtu


“Researchers Created an Open Rival to OpenAI’s o1 ‘Reasoning’ Model for Under $50”


S1 is based on a small, off-the-shelf AI model from Alibaba-owned Chinese AI lab Qwen, which is available to download for free. . . .

After training s1, which took less than 30 minutes using 16 Nvidia H100 GPUs, s1 achieved strong performance on certain AI benchmarks. . . . Niklas Muennighoff, a Stanford researcher who worked on the project, told TechCrunch he could rent the necessary compute today for about $20.

https://tinyurl.com/3mxwcv22


“‘Meta Torrented over 81 TB of Data through Anna’s Archive, Despite Few Seeders’”


Freshly unsealed court documents reveal that Meta downloaded significant amounts of data from shadow libraries through Anna’s Archive. The company’s use of BitTorrent was already known, but internal email communication reveals sources and terabytes of downloaded data, as well as a struggle with limited availability and slow download speeds due to a lack of seeders.

https://tinyurl.com/yxzjtnvs


OpenAI Video: “Introduction to Deep Research”

From “Introducing Deep Research”:

Deep research is built for people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research. It can be equally useful for discerning shoppers looking for hyper-personalized recommendations on purchases that typically require careful research, like cars, appliances, and furniture. Every output is fully documented, with clear citations and a summary of its thinking, making it easy to reference and verify the information. It is particularly effective at finding niche, non-intuitive information that would require browsing numerous websites. Deep research frees up valuable time by allowing you to offload and expedite complex, time-intensive web research with just one query.

https://tinyurl.com/4h2sy9rt


“DeepSeek Panic Triggers Tech Stock Sell-off as Chinese AI Tops App Store”


There are three elements of DeepSeek R1 that really shocked experts. First, the Chinese startup appears to have trained the model for only $6 million (reportedly about 3% of the cost of training o1) as a so-called “side project” while using less powerful Nvidia H800 AI-acceleration chips due to US export restrictions on cutting-edge GPUs. Secondly, it appeared just four months after OpenAI announced o1 in September 2024. Finally, and perhaps most importantly, DeepSeek released the model weights for free with an open MIT license, meaning anyone can download it, run it, and fine-tune (modify) it.

https://tinyurl.com/3e5bk3cw

For an in-depth analysis see: “China’s DeepSeek AI Model Shocks the World: Should You Sell Your Nvidia Stock?”


“Every AI Copyright Lawsuit in the US, Visualized”


Over the past two years, dozens of other copyright lawsuits against AI companies have been filed at a rapid clip. . . . This wide variety of rights holders are alleging that AI companies have used their work to train what are often highly lucrative and powerful AI models in a manner that is tantamount to theft. . . . Nearly every major generative AI company has been pulled into this legal fight, including OpenAI, Meta, Microsoft, Google, Anthropic, and Nvidia.

We’ve created visualizations to help you track and contextualize which companies and rights holders are involved, where the cases have been filed, what they’re alleging, and everything else you need to know.

https://tinyurl.com/sv4ja66n


“Open Science at the Generative AI Turn: An Exploratory Analysis of Challenges and Opportunities”


Technology influences Open Science (OS) practices, because conducting science in transparent, accessible, and participatory ways requires tools and platforms for collaboration and sharing results. Due to this relationship, the characteristics of the employed technologies directly impact OS objectives. Generative Artificial Intelligence (GenAI) is increasingly used by researchers for tasks such as text refining, code generation/editing, reviewing literature, and data curation/analysis. Nevertheless, concerns about openness, transparency, and bias suggest that GenAI may benefit from greater engagement with OS. GenAI promises substantial efficiency gains but is currently fraught with limitations that could negatively impact core OS values, such as fairness, transparency, and integrity, and may harm various social actors. In this paper, we explore the possible positive and negative impacts of GenAI on OS. We use the taxonomy within the UNESCO Recommendation on Open Science to systematically explore the intersection of GenAI and OS. We conclude that using GenAI could advance key OS objectives by broadening meaningful access to knowledge, enabling efficient use of infrastructure, improving engagement of societal actors, and enhancing dialogue among knowledge systems. However, due to GenAI’s limitations, it could also compromise the integrity, equity, reproducibility, and reliability of research. Hence, sufficient checks, validation, and critical assessments are essential when incorporating GenAI into research workflows.

https://doi.org/10.1162/qss_a_00337


"Text Analysis of Archival Finding Aids; Collection Scoping and Beyond"


In this study, we examine the suitability of text analysis as a method for analyzing collection scope strengths across a repository’s physical archival holdings. We apply a tool for text analysis called Leximancer to analyze a corpus of archival finding aids to explore topical coverage. Leximancer results were highly aligned with the baseline subject heading analysis that we performed, but the concepts, themes, and co-occurring topic pairs surfaced by Leximancer suggest areas of collection strength and potential focus for new acquisitions. We discuss the potential applications of text analysis for internal library use including collection development, as well as potential implications for wider description, discovery, and access. Text analysis can accurately surface topical strengths and directly lead to insights that can inform future acquisition decisions and archival collection development policies.

https://tinyurl.com/mr45f8e7


“News and Views: How Much Content Can AI Legally Exploit?”


Most OA licenses, even permissive ones like CC BY, require attribution. However, generative AI models inherently strip attribution from the data they process, making compliance nearly impossible. Specialist AIs might be trained to circumvent this, but the bulk of big-name gen AI tools don’t. Compliance with the most basic OA requirement of attribution is unworkable.

Additionally, while traditional licenses clearly delineate permissible use, OA licenses often depend on interpretations of “non-commercial” or “derivative” use that may vary by jurisdiction.

https://tinyurl.com/562k8kee


“OpenAI and Others Seek New Path to Smarter AI as Current Methods Hit Limitations”


To overcome these challenges [in training AIs with enormous amounts of increasingly scarce data] researchers are exploring “test-time compute,” a technique that enhances existing AI models during the so-called “inference” phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real-time, ultimately choosing the best path forward. . . .

“It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer,” said Noam Brown, a researcher at OpenAI who worked on o1, at TED AI conference in San Francisco last month.

https://tinyurl.com/5n9bwkv6
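
The idea in the excerpt above of generating and evaluating multiple possibilities at inference time can be sketched as best-of-N selection. This is a toy illustration under stated assumptions, not the method used by OpenAI or any other lab: `propose` and `score` are hypothetical stand-ins for a model's sampler and for whatever evaluator ranks its candidates.

```python
import random
from typing import Callable, List


def best_of_n(propose: Callable[[random.Random], float],
              score: Callable[[float], float],
              n: int,
              seed: int = 0) -> float:
    """Draw n candidate answers and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates: List[float] = [propose(rng) for _ in range(n)]
    return max(candidates, key=score)


# Hypothetical toy task: guess a number close to 42.
def propose(rng: random.Random) -> float:
    return rng.uniform(0.0, 100.0)


def score(x: float) -> float:
    # Higher is better; the best possible score is 0.
    return -abs(x - 42.0)


single = best_of_n(propose, score, n=1)   # answer immediately
many = best_of_n(propose, score, n=64)    # spend more compute at inference
```

Because both calls share a seed, the 64-candidate pool contains the single-shot guess, so `many` can never score worse than `single`. The extra candidates cost compute at use time rather than at training time, which is the trade-off the article describes.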


“Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft”


Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

https://tinyurl.com/ymen65js


New Horizons in Artificial Intelligence in Libraries


This publication provides an opportunity to explore developing new library AI paradigms, including present use case practical implementation and opportunities on the horizon as well as current large ethics questions and needs for transparency, scenario planning, considerations and implications of bias as library AI systems are developed and implemented presently and for our collective future.

https://tinyurl.com/4b5juutm


“Intelligent Summaries: Will Artificial Intelligence Mark the Finale for Biomedical Literature Reviews?”


Manuscripts that only flatly summarize knowledge in a field could become superfluous, as AI-powered systems will become better and better at generating more comprehensive and updated summaries automatically. Furthermore, the use of A.I. technologies in data analysis and synthesis will greatly reduce human tasks, enabling more efficient and timely production of preliminary findings. What kind of reviews will still find room in an academic journal? It is reasonable to believe that reviews that provide critical analysis, unique interpretations of existing literature, which connect different areas, shed novel light on available data, that are aware of their human partiality, will continue to be valuable in academic journals.

https://doi.org/10.1002/leap.1648


“How ChatGPT Search (Mis)represents Publisher Content”


In total, we pulled two hundred quotes from twenty publications and asked ChatGPT to identify the sources of each quote. We observed a spectrum of accuracy in the responses: some answers were entirely correct (i.e., accurately returned the publisher, date, and URL of the block quote we shared), many were entirely wrong, and some fell somewhere in between. . . .

In total, ChatGPT returned partially or entirely incorrect responses on a hundred and fifty-three occasions, though it only acknowledged an inability to accurately respond to a query seven times. . . .

Our tests found that no publisher—regardless of degree of affiliation with OpenAI—was spared inaccurate representations of its content in ChatGPT.

https://tinyurl.com/3z9dxttv
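
The counts quoted above are easier to compare as shares of the 200 test quotes. The short calculation below uses only the figures from the excerpt; treating every response not counted as incorrect as a single "remaining" bucket is an assumption, since the excerpt does not break that group down.

```python
# Figures quoted in the excerpt above.
total_quotes = 200
incorrect_responses = 153    # partially or entirely incorrect
acknowledged_inability = 7   # times ChatGPT said it could not answer accurately

incorrect_share = incorrect_responses / total_quotes
acknowledged_share = acknowledged_inability / total_quotes

# Assumption: everything not counted as incorrect is lumped together here.
remaining = total_quotes - incorrect_responses

print(f"Incorrect (partly or fully): {incorrect_share:.1%}")
print(f"Acknowledged inability:      {acknowledged_share:.1%}")
print(f"Not counted as incorrect:    {remaining}")
```

On these numbers, roughly three in four responses were at least partly wrong, while the tool flagged its own uncertainty in only a few percent of queries.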


“Publishers are Selling Papers to Train AIs — and Making Millions of Dollars”


[Roger] Schonfeld [VP of Ithaka S+R] and his colleagues launched the Generative AI Licensing Agreement Tracker in October. It includes information about licensing deals — confirmed and forthcoming — between technology companies and six major academic publishers, including Wiley, Sage and Taylor & Francis. Schonfeld says that the list documents only public agreements, and that there are probably several others that remain undisclosed. . . .

Some scholars have been apprehensive about deals being made without their knowledge on content they produced. To address this issue, a few publishers have taken steps to involve authors in the process.

https://tinyurl.com/56zwe54p
