Copyright – DigitalKoans

“A Master Class in Destroying Trust”

Clarivate’s dismissal of one-time purchases is alarming, but when you consider the company’s larger strategy, it makes sense. On the Q4 earnings call, Shem Tov refers to one-time purchases as “a drain.” He also says that Clarivate has “retained financial advisers to help us in evaluating strategic alternatives to unlock value. This may include divesting business units or an entire segment.” He goes on to say, “There is no guarantee that anything actionable will arise from this process,” but considering Clarivate will no longer sell books, Clarivate’s furthering its investment in data should make us wary.

https://tinyurl.com/y57tdkz3

“Litigating Fair Use”

Copyright law, and fair use specifically, starts from Congress’s statutory text, is informed by the Copyright Office’s guidance, is interpreted by the courts, and is analyzed by law professors. But litigators are not passive in this process; rather, they play an important role as well. In fact, the modern litigator often is in a uniquely good position to affect the development of fair use. These days, litigators practice all around the country, with admissions in many courts and pro hac vice appearances before others. This cross-country practice creates the opportunity—and in fact the necessity—to keep abreast of trends and splits across the various circuits, to figure out what best to argue in a given case, and thereby to hope to advance the law and their clients’ interests. Simply put, if the law of fair use is developed by case precedent, then the people whose arguments impact cases—that is, litigators—can help shape the law.

How do litigators do it? In this Article, I will discuss three of the primary tools in the litigator’s proverbial tool kit: law, facts, and persuasion. At the end of the day, these three things determine every fair use outcome. The art of litigating fair use is found in the gaps between precedent, when a litigator’s cutting-edge case is one about which reasonable minds may disagree. It is in those gaps where the litigator shines, because the law there is at its most malleable and the ability to persuade is most important. And with fair use, there are a lot of those gaps.

https://doi.org/10.52214/jla.v48i1.13531

“Music Labels Will Regret Coming for the Internet Archive, Sound Historian Says”

On Thursday, music labels sought to add nearly 500 more sound recordings to a lawsuit accusing the Internet Archive (IA) of mass copyright infringement through its Great 78 Project, which seeks to digitize all 3 million three-minute recordings published on 78 revolutions-per-minute (RPM) records from about 1898 to the 1950s.

https://tinyurl.com/y8yusa4s

“ARTificial: Why Copyright Is Not the Right Policy Tool to Deal with Generative AI”

For the sake of this discussion, let’s assume that GAI litigation is successful. How would concepts of attribution and distribution work under existing copyright rules of compensation? Should every author whose work is present in the dataset have an equivalent claim over every single output? How would such an outcome work in practice? Here, consider again the Stable Diffusion example. The model’s training dataset, LAION 5B, is composed of “5.85 billion CLIP-filtered image-text pairs.” Given the massive size of the training set, it is difficult to imagine how one could trace the attribution and weight of a single work into the final end result. To do so would be like proposing that a given output image is attributable to 5.85 billion copyright interests.

https://dx.doi.org/10.2139/ssrn.5090127

“Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs”

Paywalls, licenses and copyright rules often restrict the broad dissemination and reuse of scientific knowledge. We take the position that it is both legally and technically feasible to extract the scientific knowledge in scholarly texts. Current methods, like text embeddings, fail to reliably preserve factual content, and simple paraphrasing may not be legally sound. We urge the community to adopt a new idea: convert scholarly documents into Knowledge Units using LLMs. These units use structured data capturing entities, attributes and relationships without stylistic content. We provide evidence that Knowledge Units: (1) form a legally defensible framework for sharing knowledge from copyrighted research texts, based on legal analyses of German copyright law and U.S. Fair Use doctrine, and (2) preserve most (~95%) factual knowledge from original text, measured by MCQ performance on facts from the original copyrighted text across four research domains. Freeing scientific knowledge from copyright promises transformative benefits for scientific research and education by allowing language models to reuse important facts from copyrighted text. To support this, we share open-source tools for converting research documents into Knowledge Units. Overall, our work posits the feasibility of democratizing access to scientific knowledge while respecting copyright.

https://arxiv.org/abs/2502.19413

Clarivate: “Our Letter to the Library Community”

After receiving feedback and guidance from our customers and partners, we would like to further clarify our intentions moving forward:

We remain unequivocally committed to preserving perpetual access to previously purchased Ebook Central titles.

We are committed to increased investment in Rialto as an ebook marketplace, enabling title-by-title ebook purchasing from publishers and other vendors.

We will work with vendors, such as EBSCO, to integrate with their book and purchasing platforms, to maximize choice and workflow efficiency for customers.

We will expand benchmark and collection development tools in Rialto, providing you with insights to more efficiently make book selection, purchase and access decisions.

To further support the changes announced:

We will extend the ability for customers to make perpetual purchases for both print and ebooks on all platforms, including Ebook Central, OASIS, Rialto and GOBI through June 30, 2026.

We reaffirm our commitment to always facilitate title-by-title perpetual access purchasing through the Rialto marketplace of ebooks from publishers and aggregators.

We will work with you and your vendors of choice to create migration toolkits, to make transitioning your workflows and profiles as efficient and seamless as possible.

We will provide the data and analytics you need, as well as regular updates and close communication with your local team.

https://tinyurl.com/9hbuheru

“AI Is Reigniting Decades-Old Questions Over Digital Rights, but Fair Use Prevails”

A publisher recently provided UC Berkeley’s Library with an elusive explanation for their AI ban on a subset of their licensed materials, claiming that they would “require new and different AI terms [that] would be significantly higher in price,” and that “individual client requests [would] need to be evaluated [to] determine whether or not they will be permitted.” However, when prompted to provide said new terms and price, the publisher was unable, or perhaps unwilling, to provide any additional information, noting that there is “no set pricing model or terms to share.” . . .

Charging extra to secure AI rights is likely to be cost-prohibitive due to increased financial burdens on libraries and institutions of higher education; if publishers are successful, it could lead to less academic output as researchers may have to independently foot the bill for the right to conduct research using AI.

https://tinyurl.com/42nmfwm2

“Copyright’s Big Win in the First Decided US Artificial Intelligence Case”

Back in March of 2023, when there were only a handful of cases alleging copyright infringement for training purposes by AI companies, I predicted that we would soon have some guidance from the court in Thomson Reuters Enterprise Center GmbH and West Publishing Corp. v. Ross Intelligence, Inc. Predicting the timing of court decisions is a fool’s errand, and this fool was repeatedly wrong in his predictions on timing. Nonetheless, on February 11, the Ross case did in fact become the first US decision on the merits to directly address copying to train AI. Now we have a clear decision, and it is favorable for rightsholders.

https://tinyurl.com/4amunsmf

“Clarivate Unveils Transformative Subscription-Based Access Strategy for Academia”

The new strategy includes the introduction of two market-leading solutions that are now available.

ProQuest Ebooks offers. . . .

Over 700,000 Ebooks, across 10 core disciplines, plus additional essential interdisciplinary titles. . . .

The addition of Ebook Central Research Assistant, a powerful new AI tool designed to enhance student learning and streamline research.

ProQuest Digital Collections offers . . . .

Over 160 million primary source items complemented by over 2,500 full-text scholarly journals, more than 24,000 video titles, and 15 million audio tracks. . . .

[A]ccess to nine ProQuest One discipline solutions including Anthropology, Entertainment & Popular Culture, Global Studies & International Relations, History, Literature, Performing Arts, Visual Arts & Design. . . .

As part of this transformative strategy and following changes in demand from libraries, Clarivate will also phase out one-time perpetual purchases of digital collections, print and digital books for libraries. These transitions will take place throughout 2025, in close co-operation with customers.

https://tinyurl.com/3mtsr3kr

U.S. Copyright Office: Identifying the Economic Implications of Artificial Intelligence for Copyright Policy

The Copyright Office released Identifying the Economic Implications of Artificial Intelligence for Copyright Policy, produced by a group of economic scholars discussing the economic issues at the intersection of artificial intelligence and copyright policy.

The group engaged in several months of substantive discussions, consultation with technical experts, and research, culminating in a daylong roundtable event.

The group’s goal was identifying the most consequential economic characteristics of AI and copyright and what factors may inform policy decisions. The roundtable discussion aimed to provide a structured and rigorous framework for considering economic evidence so that the broader economic research community can effectively answer specific questions and identify optimal policy choices.

This publication serves as a platform for articulating the ideas expressed by participants as part of the roundtable. All principal contributors submitted written materials summarizing the group’s prior discussions on a particular topic, with editorial support provided by the Office of the Chief Economist. The many ideas and views discussed in this project do not necessarily represent the views of every roundtable participant or their respective institutions. The Copyright Office does not take a position on these ideas for the purposes of this project.

https://tinyurl.com/5n7yd36r

“Hurdles to Open Access Publishing Faced by Authors: A Scoping Literature Review from 2004 to 2023”

Over the past two decades, numerous widespread efforts and individual contributions to shift scientific publishing to open access (OA) faced a number of obstacles. Due to the complexity of knowledge production dimension and knowledge dissemination, the challenges encountered by researchers, publishers, and readers differ. While examples of such barriers exist across multiple parties, no attempt has been made to synthesize these for active researchers. Thus, this scoping review explores the barriers documented in the scientific literature that researchers encounter in their pursuit of publishing open access. After screening 1,280 relevant sources, 113 papers were included in the review. A total of 82 distinct barriers were identified and grouped into four subclusters: Practical Barriers, Lack of Competency, Sentiment, and Policy & Governance. The largest cluster in terms of barriers assigned was Sentiment, accounting for 51.2% (n=42) of all barriers identified, suggesting that perceived barriers are the strongest determinants of publishing OA, while the most frequently occurring barrier was “high article processing charges”, reported in 88 papers. Furthermore, burdens faced specifically due to the location of the researcher were identified. Understanding and acknowledging these barriers is essential for their effective elimination or mitigation.

https://doi.org/10.31219/osf.io/vzefj_v1

“‘Meta Torrented over 81 TB of Data through Anna’s Archive, Despite Few Seeders’”

Freshly unsealed court documents reveal that Meta downloaded significant amounts of data from shadow libraries through Anna’s Archive. The company’s use of BitTorrent was already known, but internal email communication reveals sources and terabytes of downloaded data, as well as a struggle with limited availability and slow download speeds due to a lack of seeders.

https://tinyurl.com/yxzjtnvs

OASPA: “Fully OA Journals Output Shrank in 2023, But Hybrid OA Made Up the Lost Ground”

The OASPA dataset shows that members collectively published almost 1.2m articles in 2023. But 2023 output grew by only 4% over 2022, which is one quarter of the previous year’s growth, and one tenth of the long-term average. . . .

Reported numbers of articles in fully OA journals [published by OASPA members] shrank for the first time in 2023. OA articles in hybrid journals continue to grow strongly, making up for the lost ground in fully OA and so total output grew overall. In 2023, the volume of articles in fully OA journals shrank by two thirds of a percent, compared with a growth of 14% the previous year. Hybrid OA articles grew by 22% in the same period, down slightly from 24% the previous year. Output grew by 4% overall, compared with 16% the previous year. . . .

In fully OA journals [published by OASPA members], the proportion of CC BY (just over 80% of output) and CC BY-NC-ND (around 10%) has been steady since 2018. CC BY fell back slightly in 2023, and that of CC BY-NC-ND grew slightly – but both by just 1 percentage point, so it’s too soon to tell if this represents a change to long-term trends. The proportion of CC BY-NC-ND licenses grew slightly: from 10% in 2021 and 2022 to 12% in 2023.

Licenses with some restrictions are significantly more prevalent in hybrid journals, although this trend is showing signs of reversing. Historically, more restrictive licenses were displacing the proportion of CC BY, which had fallen from around 75% of hybrid OA in 2014 to around 51% in 2019. However, in 2020 CC BY licenses recovered ground and now account for around 67% of Hybrid licenses (up from 62% the year before). CC BY appears to be displacing the other two Creative Commons licenses in hybrid OA. In 2023, the proportion of CC BY-NC-ND was down slightly to 23%, and CC BY-NC up slightly to 10%. CC BY now accounts for over two thirds of hybrid OA output, up from half in 2019.

https://tinyurl.com/55u5b8ue

"New Bill Aims to Block Foreign Pirate Sites in the U.S."

Pirate site blocking orders are a step closer to becoming reality in the United States after Rep. Zoe Lofgren introduced the Foreign Anti-Digital Piracy Act earlier today. Should it become law, FADPA would allow rightsholders to obtain site blocking orders targeted at verified pirate sites, presumably run by foreign operators.

https://tinyurl.com/zcyxms22

"Every AI Copyright Lawsuit in the US, Visualized"

Over the past two years, dozens of other copyright lawsuits against AI companies have been filed at a rapid clip. . . . This wide variety of rights holders are alleging that AI companies have used their work to train what are often highly lucrative and powerful AI models in a manner that is tantamount to theft. . . . Nearly every major generative AI company has been pulled into this legal fight, including OpenAI, Meta, Microsoft, Google, Anthropic, and Nvidia.

We’ve created visualizations to help you track and contextualize which companies and rights holders are involved, where the cases have been filed, what they’re alleging, and everything else you need to know.

https://tinyurl.com/sv4ja66n

"A Primer for Applying and Interpreting Licenses for Research Data and Code"

This primer gives data curators an overview of the licenses that are commonly applied to datasets and code, familiarizes them with common requirements in institutional data policies, and makes recommendations for working with researchers who need to apply a license to their research outputs or understand a license applied to data or code they would like to reuse. While copyright issues are highly case-dependent, the introduction to the data copyright landscape and the general principles provided here can help data curators empower researchers to understand the copyright context of their own data.

https://tinyurl.com/34738m4s

"Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft"

Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

https://tinyurl.com/ymen65js

"From Black Open Access to Open Access of Color: Accepting the Diversity of Approaches towards Free Science"

The aim of this article is to shed some light on the ‘black open access’ model, which remains poorly understood and largely neglected in the literature despite being widely adopted in practice. I give an overview of the historical development of black OA and its most important projects: Sci-Hub and Library Genesis. Arguments are provided for why the term ‘black OA’ is misleading and the term ‘RGB OA’ (red, green and blue) would better describe a diverse landscape of open access projects that emerged after 2001. While practical approaches towards OA evolved dramatically in the past 20 years, theoretical discussion is still operating with the same two-color scheme of ‘green’ and ‘gold’ open access from the BOAI declaration of 2001: novel approaches are either not recognized as OA at all or are neglected as ‘black’. A new and more inclusive OA declaration might be needed to account for greater diversity of approaches.

https://tinyurl.com/nha7tsxd

"Publishers are Selling Papers to Train AIs — and Making Millions of Dollars"

[Roger] Schonfeld [VP of Ithaka S+R] and his colleagues launched the Generative AI Licensing Agreement Tracker in October. It includes information about licensing deals — confirmed and forthcoming — between technology companies and six major academic publishers, including Wiley, Sage and Taylor & Francis. Schonfeld says that the list documents only public agreements, and that there are probably several others that remain undisclosed. . . .

Some scholars have been apprehensive about deals being made without their knowledge on content they produced. To address this issue, a few publishers have taken steps to involve authors in the process.

https://tinyurl.com/56zwe54p

"Internet Archive Copyright Case Ends without Supreme Court Review "

After more than four years of litigation, a closely watched copyright case over the Internet Archive’s scanning and lending of library books is finally over after Internet Archive officials decided against exercising their last option, an appeal to the Supreme Court. The deadline to file an appeal was December 3.

https://tinyurl.com/6j4ukfmp

"Video Game Preservationists Have Lost a Legal Fight to Study Games Remotely"

When video game scholars want to study games that are no longer on sale, they sometimes have to drive many hours to do it legally — and that won’t be changing anytime soon. The US Copyright Office has just denied a request from video game preservationists to let libraries, archives and museums temporarily lend individuals some virtual, remotely accessible copies of those works.

https://tinyurl.com/3sb37jn6

"‘Massive Copyright Violation’ Threatens One of the World’s Hottest AI Apps"

News Corp has officially filed a lawsuit against Perplexity AI over accusations that the startup has committed copyright infringement on a “massive scale.” . . .

Perplexity’s value proposition is instead to insert itself between search and content producers as a middleman, training its AI on copyrighted content that its chatbot will then regurgitate. . . to its own paying customers, without compensating or attributing the original content producers. . . .

https://tinyurl.com/y2h5fpeu

"Publishers Join with Worldwide Coalition to Condemn the Theft of Creative and Intellectual Authorship by Tech Companies for Generative AI Training"

Today, the Association of American Publishers (AAP) joined forces with more than 10,000 creators and coalition partners, including authors, musicians, actors, artists, and photographers, to condemn the theft of creative and intellectual authorship by big tech companies for use in their Generative AI models. In fact, these consumer-facing models and tools would not exist without the books, newspapers, songs, performances, and other invaluable human expressions that were—and continue to be—copied, ingested, and regenerated in blatant disregard of the law.

https://tinyurl.com/4e37e3ff

"New ‘Fair Source’ Movement Aims to Bridge the Gap Between Open Source and Proprietary Licensing"

Key principles of the model include publicly available source code, allowing third-party use and modification with “minimal restrictions,” and a delayed open-source publication clause, where the software transitions to a true open-source license after a predefined period (two years under Sentry’s Functional Source License).

https://tinyurl.com/ypswkw3j

"CDL Decision Round Two: The Good, the Bad, and the Ugly and Why There is Still Hope OR The Reports of CDL’s Death Have Been Greatly Exaggerated"

Let me be unequivocal: libraries do not need a license to loan books, whether physical or digital. Lending legally acquired books is not illegal. Libraries are entitled to share these works, with no obligation to enter into licensing agreements or contracts beforehand. Furthermore, libraries—and their patrons—are legally permitted to make various uses of these works, including interlibrary loan, reserves, preservation, and fair use, all without needing permission from rightsholders.

This is because of various exceptions in the law, including Section 108 for libraries and archives and Section 109, known as the first sale doctrine. We know that Section 109 preserves the balance between rightsholders and libraries. When a library purchases a book, it has the right to loan that work freely, without requiring additional permissions or payments to the copyright holder. A digitized version of a legally acquired book simply replaces the physical copy, not an unpurchased one in the marketplace. Any “market harm” is already factored into the initial sale, for which both the authors and publishers have been compensated.

https://tinyurl.com/3exh96bu