“AI Chatbots Need More Books to Learn From. These Libraries Are Opening Their Stacks”


Harvard’s newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter’s handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians.

https://tinyurl.com/bdzxx8r7

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

“Copyright Claims Board is ‘Ineffective and Costly,’ Watchdog Groups Say”


Responding to an inquiry from the U.S. Copyright Office, a coalition of watchdog groups [including ALA and ARL] has flagged various problems with the Copyright Claims Board. . . .

According to the groups’ analysis, the CCB has spent approximately $5.4 million in its first years of operation, while only about $75,000 has been awarded to claiming copyright holders through its decisions.

With well over 1,200 complaints. . . most of these end up being dismissed and thus far the board has only reached final determinations in 35 cases, awarding little over $2,000 in damages on average.
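The averages implied by these figures can be checked with a few lines of arithmetic. The Python sketch below uses illustrative variable names, and its inputs are the approximate figures quoted above, so the results are rough estimates rather than official numbers.

```python
# Approximate figures quoted in the excerpt above.
total_spent = 5_400_000      # CCB spending in its first years, in dollars
total_awarded = 75_000       # damages awarded through decisions, in dollars
final_determinations = 35    # cases that reached a final determination

# Average award per final determination (about $2,143).
average_award = total_awarded / final_determinations

# Dollars spent by the board for every dollar awarded to claimants.
spent_per_dollar_awarded = total_spent / total_awarded

print(f"Average award: ${average_award:,.2f}")
print(f"Spent per dollar awarded: ${spent_per_dollar_awarded:,.2f}")
```

The average of roughly $2,143 is consistent with the "little over $2,000" cited in the excerpt.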

https://tinyurl.com/5bsa8p92

“ARL Supports Senator Wyden’s Call for FTC Action on Digital Ownership Rights”


The Association of Research Libraries (ARL) joined a letter by Public Knowledge supporting US Senator Ron Wyden’s February 25, 2025, request for Federal Trade Commission (FTC) intervention to protect consumer rights in digital marketplaces. . . .

Senator Wyden highlighted a critical issue: consumers who “purchase” digital materials like ebooks are actually only acquiring temporary access licenses, often with significant usage restrictions. Libraries must accept these restrictions when licensing essential databases and digital resources for education and scholarship. For instance, in some cases publishers have retroactively banned AI research applications through impromptu contract addendums—even after the library and publisher signed license agreements.

ARL joins the American Library Association (ALA), Software Preservation Network (SPN), University Information Policy Officers (UIPO), Public Knowledge (PK), and other library and civil society groups in signing the letter supporting Senator Wyden’s request.

https://tinyurl.com/ywbambsx

Paywall: “Trans-Substantive Instructors: Scholarly Communication Librarians Facilitating Communities of Practice”


While many studies examine graduate students’ understanding of the ethical implications of copyright, there is still a lack of robust literature exploring students’ awareness of ownership over the material they create and their own copyright ownership rights. To address this need, this research focuses on STEM graduate students’ understanding of copyright in various scenarios and prepares a foundation for continued investigation. In this study, researchers conducted semi-structured interviews with graduate students in STEM fields at the University of Illinois Urbana-Champaign, including those in professional programs such as Medical School or Veterinary Medicine.

https://tinyurl.com/3v4f88x2

“Navigating the Introduction of Rights Retention: Lessons From Leeds Beckett University”


We, as many other institutions did, expected backlash. This has included:

  • Refusal to accept our prior notification as a blanket declaration and an expectation that we will inform them every time we submit something which falls under rights retention.
  • A publisher stating that papers with rights retention language won’t be rejected. However, no author manuscripts may be placed under a Creative Commons license, according to the terms of their journal policies. Any authors who wish to do so can only publish under the immediate gold open access route. Authors are asked to agree to this when signing their standard subscription licensing terms.
  • Another publisher asks authors to agree, as part of their author contract, that their publishing terms take precedence over any other terms authors assert during the publishing process. Authors must also sign that they haven’t assigned rights to any other third party for the article or content that will conflict with rights granted in the publishing terms.
  • One publisher has gone even further: due to the widespread adoption of rights retention in the UK, they now require all papers authored by someone from a UK institution to be published open access.

https://tinyurl.com/yau3x6te

“A Master Class in Destroying Trust”


Clarivate’s dismissal of one-time purchases is alarming, but when you consider the company’s larger strategy, it makes sense. On the Q4 earnings call, Shem Tov refers to one-time purchases as “a drain.” He also says that Clarivate has “retained financial advisers to help us in evaluating strategic alternatives to unlock value. This may include divesting business units or an entire segment.” He goes on to say, “There is no guarantee that anything actionable will arise from this process,” but considering Clarivate will no longer sell books, Clarivate’s furthering its investment in data should make us wary.

https://tinyurl.com/y57tdkz3

“Litigating Fair Use”


Copyright law, and fair use specifically, starts from Congress’s statutory text, is informed by the Copyright Office’s guidance, is interpreted by the courts, and is analyzed by law professors. But litigators are not passive in this process; rather, they play an important role as well. In fact, the modern litigator often is in a uniquely good position to affect the development of fair use. These days, litigators practice all around the country, with admissions in many courts and pro hac vice appearances before others. This cross-country practice creates the opportunity—and in fact the necessity—to keep abreast of trends and splits across the various circuits, to figure out what best to argue in a given case, and thereby to hope to advance the law and their clients’ interests. Simply put, if the law of fair use is developed by case precedent, then the people whose arguments impact cases—that is, litigators—can help shape the law.

How do litigators do it? In this Article, I will discuss three of the primary tools in the litigator’s proverbial tool kit: law, facts, and persuasion. At the end of the day, these three things determine every fair use outcome. The art of litigating fair use is found in the gaps between precedent, when a litigator’s cutting-edge case is one about which reasonable minds may disagree. It is in those gaps where the litigator shines, because the law there is at its most malleable and the ability to persuade is most important. And with fair use, there are a lot of those gaps.

https://doi.org/10.52214/jla.v48i1.13531

“Music Labels Will Regret Coming for the Internet Archive, Sound Historian Says”


On Thursday, music labels sought to add nearly 500 more sound recordings to a lawsuit accusing the Internet Archive (IA) of mass copyright infringement through its Great 78 Project, which seeks to digitize all 3 million three-minute recordings published on 78 revolutions-per-minute (RPM) records from about 1898 to the 1950s.

https://tinyurl.com/y8yusa4s

“ARTificial: Why Copyright Is Not the Right Policy Tool to Deal with Generative AI”


For the sake of this discussion, let’s assume that GAI litigation is successful. How would concepts of attribution and distribution work under existing copyright rules of compensation? Should every author whose work is present in the dataset have an equivalent claim over every single output? How would such an outcome work in practice? Here, consider again the Stable Diffusion example. The model’s training dataset, LAION 5B, is composed of “5.85 billion CLIP-filtered image-text pairs.” Given the massive size of the training set, it is difficult to imagine how one could trace the attribution and weight of a single work into the final end result. To do so would be like proposing that a given output image is attributable to 5.85 billion copyright interests.

https://dx.doi.org/10.2139/ssrn.5090127

“Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs”


Paywalls, licenses and copyright rules often restrict the broad dissemination and reuse of scientific knowledge. We take the position that it is both legally and technically feasible to extract the scientific knowledge in scholarly texts. Current methods, like text embeddings, fail to reliably preserve factual content, and simple paraphrasing may not be legally sound. We urge the community to adopt a new idea: convert scholarly documents into Knowledge Units using LLMs. These units use structured data capturing entities, attributes and relationships without stylistic content. We provide evidence that Knowledge Units: (1) form a legally defensible framework for sharing knowledge from copyrighted research texts, based on legal analyses of German copyright law and U.S. Fair Use doctrine, and (2) preserve most (~95%) factual knowledge from original text, measured by MCQ performance on facts from the original copyrighted text across four research domains. Freeing scientific knowledge from copyright promises transformative benefits for scientific research and education by allowing language models to reuse important facts from copyrighted text. To support this, we share open-source tools for converting research documents into Knowledge Units. Overall, our work posits the feasibility of democratizing access to scientific knowledge while respecting copyright.
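As a rough illustration of what "structured data capturing entities, attributes and relationships" could look like, the Python sketch below assumes a hypothetical schema. The class names, field names, and sample values are invented for illustration and are not taken from the paper or its open-source tools.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass(frozen=True)
class Relationship:
    # A directed (subject, predicate, object) triple between named entities.
    subject: str
    predicate: str
    obj: str


@dataclass
class KnowledgeUnit:
    # Hypothetical schema: these field names are illustrative assumptions.
    entities: list[str] = field(default_factory=list)
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize to plain JSON; only facts are stored, no original wording.
        return json.dumps(asdict(self), sort_keys=True)


# Sample unit built from statements in the abstract above.
unit = KnowledgeUnit(
    entities=["Knowledge Unit", "scholarly text"],
    attributes={"Knowledge Unit": {"format": "structured data"}},
    relationships=[
        Relationship("Knowledge Unit", "extracted_from", "scholarly text")
    ],
)

print(unit.to_json())
```

Storing facts as plain fields and triples, rather than sentences, is one way to keep the stylistic expression of the source text out of the shared record.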

https://arxiv.org/abs/2502.19413

Clarivate: “Our Letter to the Library Community”


After receiving feedback and guidance from our customers and partners, we would like to further clarify our intentions moving forward:

  • We remain unequivocally committed to preserving perpetual access to previously purchased Ebook Central titles.
  • We are committed to increased investment in Rialto as an ebook marketplace, enabling title-by-title ebook purchasing from publishers and other vendors.
  • We will work with vendors, such as EBSCO, to integrate with their book and purchasing platforms, to maximize choice and workflow efficiency for customers.
  • We will expand benchmark and collection development tools in Rialto, providing you with insights to more efficiently make book selection, purchase and access decisions.

To further support the changes announced:

  • We will extend the ability for customers to make perpetual purchases for both print and ebooks on all platforms, including Ebook Central, OASIS, Rialto and GOBI through June 30, 2026.
  • We reaffirm our commitment to always facilitate title-by-title perpetual access purchasing through the Rialto marketplace of ebooks from publishers and aggregators.
  • We will work with you and your vendors of choice to create migration toolkits, to make transitioning your workflows and profiles as efficient and seamless as possible.
  • We will provide the data and analytics you need, as well as regular updates and close communication with your local team.

https://tinyurl.com/9hbuheru

“AI Is Reigniting Decades-Old Questions Over Digital Rights, but Fair Use Prevails”


A publisher recently provided UC Berkeley’s Library with an elusive explanation for their AI ban on a subset of their licensed materials, claiming that they would “require new and different AI terms [that] would be significantly higher in price,” and that “individual client requests [would] need to be evaluated [to] determine whether or not they will be permitted.” However, when prompted to provide said new terms and price, the publisher was unable, or perhaps unwilling, to provide any additional information, noting that there is “no set pricing model or terms to share.” . . .

Charging extra to secure AI rights is likely to be cost-prohibitive due to increased financial burdens on libraries and institutions of higher education; if publishers are successful, it could lead to less academic output as researchers may have to independently foot the bill for the right to conduct research using AI.

https://tinyurl.com/42nmfwm2

“Copyright’s Big Win in the First Decided US Artificial Intelligence Case”


Back in March of 2023, when there were only a handful of cases alleging copyright infringement for training purposes by AI companies, I predicted that we would soon have some guidance from the court in Thomson Reuters Enterprise Center GmbH and West Publishing Corp. v. Ross Intelligence, Inc. Predicting the timing of court decisions is a fool’s errand, and this fool was repeatedly wrong in his predictions on timing. Nonetheless, on February 11, the Ross case did in fact become the first US decision on the merits to directly address copying to train AI. Now we have a clear decision, and it is favorable for rightsholders.

https://tinyurl.com/4amunsmf

“Clarivate Unveils Transformative Subscription-Based Access Strategy for Academia”


The new strategy includes the introduction of two market-leading solutions that are now available.

ProQuest Ebooks offers. . . .

  • Over 700,000 Ebooks, across 10 core disciplines, plus additional essential interdisciplinary titles. . . .
  • The addition of Ebook Central Research Assistant, a powerful new AI tool designed to enhance student learning and streamline research.

ProQuest Digital Collections offers. . . .

  • Over 160 million primary source items complemented by over 2,500 full-text scholarly journals, more than 24,000 video titles, and 15 million audio tracks. . . .
  • [A]ccess to nine ProQuest One discipline solutions including Anthropology, Entertainment & Popular Culture, Global Studies & International Relations, History, Literature, Performing Arts, Visual Arts & Design. . . .

As part of this transformative strategy and following changes in demand from libraries, Clarivate will also phase out one-time perpetual purchases of digital collections, print and digital books for libraries. These transitions will take place throughout 2025, in close co-operation with customers.

https://tinyurl.com/3mtsr3kr

U.S. Copyright Office: Identifying the Economic Implications of Artificial Intelligence for Copyright Policy


The Copyright Office released Identifying the Economic Implications of Artificial Intelligence for Copyright Policy, produced by a group of economic scholars discussing the economic issues at the intersection of artificial intelligence and copyright policy.

The group engaged in several months of substantive discussions, consultation with technical experts, and research, culminating in a daylong roundtable event.

The group’s goal was identifying the most consequential economic characteristics of AI and copyright and what factors may inform policy decisions. The roundtable discussion aimed to provide a structured and rigorous framework for considering economic evidence so that the broader economic research community can effectively answer specific questions and identify optimal policy choices.

This publication serves as a platform for articulating the ideas expressed by participants as part of the roundtable. All principal contributors submitted written materials summarizing the group’s prior discussions on a particular topic, with editorial support provided by the Office of the Chief Economist. The many ideas and views discussed in this project do not necessarily represent the views of every roundtable participant or their respective institutions. The Copyright Office does not take a position on these ideas for the purposes of this project.

https://tinyurl.com/5n7yd36r

“Hurdles to Open Access Publishing Faced by Authors: A Scoping Literature Review from 2004 to 2023”


Over the past two decades, numerous widespread efforts and individual contributions to shift scientific publishing to open access (OA) faced a number of obstacles. Due to the complexity of knowledge production dimension and knowledge dissemination, the challenges encountered by researchers, publishers, and readers differ. While examples of such barriers exist across multiple parties, no attempt has been made to synthesize these for active researchers. Thus, this scoping review explores the barriers documented in the scientific literature that researchers encounter in their pursuit of publishing open access. After screening 1,280 relevant sources, 113 papers were included in the review. A total of 82 distinct barriers were identified and grouped into four subclusters: Practical Barriers, Lack of Competency, Sentiment, and Policy & Governance. The largest cluster in terms of barriers assigned was Sentiment, accounting for 51.2% (n=42) of all barriers identified, suggesting that perceived barriers are the strongest determinants of publishing OA, while the most frequently occurring barrier was “high article processing charges”, reported in 88 papers. Furthermore, burdens faced specifically due to the location of the researcher were identified. Understanding and acknowledging these barriers is essential for their effective elimination or mitigation.

https://doi.org/10.31219/osf.io/vzefj_v1

“‘Meta Torrented over 81 TB of Data through Anna’s Archive, Despite Few Seeders’”


Freshly unsealed court documents reveal that Meta downloaded significant amounts of data from shadow libraries through Anna’s Archive. The company’s use of BitTorrent was already known, but internal email communication reveals sources and terabytes of downloaded data, as well as a struggle with limited availability and slow download speeds due to a lack of seeders.

https://tinyurl.com/yxzjtnvs

OASPA: “Fully OA Journals Output Shrank in 2023, But Hybrid OA Made Up the Lost Ground”


The OASPA dataset shows that members collectively published almost 1.2m articles in 2023. But 2023 output grew by only 4% over 2022, which is one quarter of the previous year’s growth, and one tenth of the long-term average. . . .

Reported numbers of articles in fully OA journals [published by OASPA members] shrank for the first time in 2023. OA articles in hybrid journals continue to grow strongly, making up for the lost ground in fully OA and so total output grew overall. In 2023, the volume of articles in fully OA journals shrank by two thirds of a percent, compared with a growth of 14% the previous year. Hybrid OA articles grew by 22% in the same period, down slightly from 24% the previous year. Output grew by 4% overall, compared with 16% the previous year. . . .

In fully OA journals [published by OASPA members], the proportion of CC BY (just over 80% of output) and CC BY-NC-ND (around 10%) has been steady since 2018. CC BY fell back slightly in 2023, and that of CC BY-NC-ND grew slightly – but both by just 1 percentage point, so it’s too soon to tell if this represents a change to long-term trends. The proportion of CC BY-NC-ND licenses grew slightly: from 10% in 2021 and 2022 to 12% in 2023.

Licenses with some restrictions are significantly more prevalent in hybrid journals, although this trend is showing signs of reversing. Historically, more restrictive licenses were displacing the proportion of CC BY, which had fallen from around 75% of hybrid OA in 2014 to around 51% in 2019. However, in 2020 CC BY licenses recovered ground and now account for around 67% of Hybrid licenses (up from 62% the year before). CC BY appears to be displacing the other two Creative Commons licenses in hybrid OA. In 2023, the proportion of CC BY-NC-ND was down slightly to 23%, and CC BY-NC up slightly to 10%. CC BY now accounts for over two thirds of hybrid OA output, up from half in 2019.

https://tinyurl.com/55u5b8ue

"New Bill Aims to Block Foreign Pirate Sites in the U.S."


Pirate site blocking orders are a step closer to becoming reality in the United States after Rep. Zoe Lofgren introduced the Foreign Anti-Digital Piracy Act earlier today. Should it become law, FADPA would allow rightsholders to obtain site blocking orders targeted at verified pirate sites, presumably run by foreign operators.

https://tinyurl.com/zcyxms22

Bill

"Every AI Copyright Lawsuit in the US, Visualized"


Over the past two years, dozens of other copyright lawsuits against AI companies have been filed at a rapid clip. . . . This wide variety of rights holders are alleging that AI companies have used their work to train what are often highly lucrative and powerful AI models in a manner that is tantamount to theft. . . . Nearly every major generative AI company has been pulled into this legal fight, including OpenAI, Meta, Microsoft, Google, Anthropic, and Nvidia.

We’ve created visualizations to help you track and contextualize which companies and rights holders are involved, where the cases have been filed, what they’re alleging, and everything else you need to know.

https://tinyurl.com/sv4ja66n

"A Primer for Applying and Interpreting Licenses for Research Data and Code"


This primer gives data curators an overview of the licenses that are commonly applied to datasets and code, familiarizes them with common requirements in institutional data policies, and makes recommendations for working with researchers who need to apply a license to their research outputs or understand a license applied to data or code they would like to reuse. While copyright issues are highly case-dependent, the introduction to the data copyright landscape and the general principles provided here can help data curators empower researchers to understand the copyright context of their own data.

https://tinyurl.com/34738m4s

"Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft"


Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

https://tinyurl.com/ymen65js

"From Black Open Access to Open Access of Color: Accepting the Diversity of Approaches towards Free Science"


The aim of this article is to shed some light on ‘black open access’ model, that still remains poorly understood and largely neglected in the literature, despite being widely adopted in practice. I give an overview of the historical development of black OA and its most important projects: Sci-Hub and Library Genesis. Arguments are provided for why the term ‘black OA’ is misleading and the term ‘RGB OA’ (red, green and blue) would better describe a diverse landscape of open access projects that emerged after 2001. While practical approaches towards OA evolved dramatically in the past 20 years, theoretical discussion is still operating the same two-color scheme of ‘green’ and ‘gold’ open access from BOAI declaration of 2001: novel approaches are either not recognized as OA at all or are neglected as ‘black’. A new and more inclusive OA declaration might be needed to account for greater diversity of approaches.

https://tinyurl.com/nha7tsxd

"Publishers are Selling Papers to Train AIs — and Making Millions of Dollars"


[Roger] Schonfeld [VP of Ithaka S+R] and his colleagues launched the Generative AI Licensing Agreement Tracker in October. It includes information about licensing deals — confirmed and forthcoming — between technology companies and six major academic publishers, including Wiley, Sage and Taylor & Francis. Schonfeld says that the list documents only public agreements, and that there are probably several others that remain undisclosed. . . .

Some scholars have been apprehensive about deals being made without their knowledge on content they produced. To address this issue, a few publishers have taken steps to involve authors in the process.

https://tinyurl.com/56zwe54p

"Internet Archive Copyright Case Ends without Supreme Court Review"


After more than four years of litigation, a closely watched copyright case over the Internet Archive’s scanning and lending of library books is finally over after Internet Archive officials decided against exercising their last option, an appeal to the Supreme Court. The deadline to file an appeal was December 3.

https://tinyurl.com/6j4ukfmp
