Copyright – DigitalKoans

"Academic Authors ‘Shocked’ After Taylor & Francis Sells Access to Their Research to Microsoft AI"

One of the biggest concerns raised by Clemens [Dr Ruth Alison Clemens] is over whether it is possible for Taylor & Francis’ authors to opt out of the AI partnership with Microsoft. Clemens told The Bookseller: "There is no clarity from Taylor & Francis about whether an opt-out policy is in place or on the cards. But as they did not inform their authors about the deal in the first place, any opt-out policy is now not functional."

Taylor & Francis was paid around $10 million for the license.

https://tinyurl.com/3yyarxnj

"Tell Congress: Don’t Let Anyone Own the Law"

A large portion of the regulations we all live by (such as fire safety codes, or the national electrical code) are initially written (by industry experts, government officials, and other volunteers) under the auspices of standards development organizations (SDOs). Federal, state, or municipal policymakers then review the codes and decide whether the standard is a good broad rule. The Pro Codes Act effectively endorses the claim that SDOs can "retain" copyright in codes, even after they are made law, as long as they make the codes available through a "publicly accessible" website, which means read-only, and subject to licensing limits.

https://tinyurl.com/bdrdfnr3

"On the Modification and Revocation of Open Source Licences"

Historically, open source commitments have been deemed irrevocable once materials are released under open source licenses. In this paper, the authors argue for the creation of a subset of rights that allows open source contributors to force users to (i) update to the most recent version of a model, (ii) accept new use case restrictions, or even (iii) cease using the software entirely. While this would be a departure from the traditional open source approach, the legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses. Recent legislative changes have also opened the door to liability of open source contributors in certain cases. The authors believe that contributors would welcome the ability to ensure that downstream users are implementing updates that address issues like bias, guardrail workarounds or adversarial attacks on their contributions. Finally, this paper addresses how this license category would interplay with RAIL licenses, and how it should be operationalized and adopted by key stakeholders such as OSS platforms and scanning tools.

https://arxiv.org/abs/2407.13064

AI Is Running Out of New Training Data: Consent in Crisis: The Rapid Decline of the AI Data Commons

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. . . . Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems.

https://tinyurl.com/4k56axzk

"STM Statement Regarding Unlicensed Use of STM’s Members’ Content in the Training, Development, and Operation of AI Models"

The unlicensed use of STM’s members’ content in the training, development, and operation of AI models is of great concern to STM and to our members. Because STM’s members do not share a single jurisdiction, the particular actions and practices of a given AI developer with respect to a given domestic copyright law are too varied to enumerate here. However, regardless of legal nuances among jurisdictions, STM considers the conclusion to be the same — the collection of our members’ content and its use in AI training without authorization, compensation or attribution, amounts to infringement. We support the statements about third parties’ use of content in generative AI training and development that have been made by our sister organizations the International Publishers Association and the UK Publishers Association.

https://tinyurl.com/5n6zh9sy

"Google’s Wrong Answer to the Threat of AI — Stop Indexing Content"

"Google is no longer trying to index the entire web," writes Schmalbach [Vincent Schmalbach, SEO expert]. "In fact, it’s become extremely selective, refusing to index most content. This isn’t about content creators failing to meet some arbitrary standard of quality. Rather, it’s a fundamental change in how Google approaches its role as a search engine." The default setting from now on will be not to index content unless it is genuinely unique, authoritative and has ‘brand recognition’.

https://tinyurl.com/32t98fhu

"RIAA Sues Suno & Udio AI Music Generators For ‘Trampling’ on Copyright"

Major recording labels of the RIAA have filed a pair of broadly similar copyright lawsuits against two key generative AI music services. The owners of Udio and Suno stand accused of copying the labels’ music on a massive scale and the labels suggest that they’re already on the back foot. In pre-litigation correspondence, both were ‘evasive’ on content sources before citing fair use, which the RIAA notes only arises as a defense in cases of unauthorized use of copyright works.

https://tinyurl.com/p9tnycte

"Internet Archive Forced to Remove 500,000 Books after Publishers’ Court Win"

As a result of book publishers successfully suing the Internet Archive (IA) last year, the free online library that strives to keep growing online access to books recently shrank by about 500,000 titles. . . .

To restore access, IA is now appealing, hoping to reverse the prior court’s decision by convincing the US Court of Appeals in the Second Circuit that IA’s controlled digital lending of its physical books should be considered fair use under copyright law. An April court filing shows that IA intends to argue that the publishers have no evidence that the e-book market has been harmed by the open library’s lending, and copyright law is better served by allowing IA’s lending than by preventing it. . . .

Freeland [Chris Freeland, IA’s director of library service] told Ars it could take months or even more than a year before a decision is reached in the case.

While IA fights to end the injunction, its other library services continue growing, IA has said. IA "may still digitize books for preservation purposes" and "provide access to our digital collections" through interlibrary loan and other means. IA can also continue lending out-of-print and public domain books.

https://tinyurl.com/47aws7z7

"Copyright, the Right to Research and Open Science: About Time to Connect the Dots"

In this contribution, we highlight the necessity to design a research-enabling copyright framework that provides researchers with access to the necessary knowledge, information and data, and to tackle the challenges of the future.

For that purpose, we examine copyright through the prism of the Open Science movement and in the light of a "right to research" and connect both to a larger, constitutional argument which suggests that enabling research through copyright law is a pressing constitutional imperative. Based on this theoretical framework, we suggest substantive and institutional modifications to copyright law, through legislative interventions and judicial interpretations that would remove significant barriers towards open science as envisaged by European and international institutions. The conflict between the proprietary interests of rightholders and the societal interests in unhindered, purpose-bound research should, in case of doubt, be decided in favour of research and open science as crucial enablers for innovation and progress. For authors, remuneration is most of the time not the primary motivation or incentive to produce research; they can often rely on other revenues (e.g. through institutional employment) and other interests prevail, such as the broadest possible dissemination of their works that will secure them reputation and career advancement. The incentive mechanisms therefore are entirely different in the research field compared to other creative sectors, an aspect that must be taken into account when designing a research-friendly copyright system.

https://ssrn.com/abstract=4857765

"Contracts in Publishing: A Toolkit for Authors and Publishers"

A toolkit for authors and publishers provides information on copyright-related aspects and contractual options in the publishing sector. With a balanced approach considering the interests of both authors and publishers, the publication offers guidance to building basic knowledge and skills for successful publishing, co-publishing and licensing deals, targeting an audience of authors, visual artists, translators and publishers, especially in developing countries.

https://tinyurl.com/bdea9cp8

"On-Demand Circulation of Software Licenses: Checking Out Software on Patrons’ Own Devices"

The Miami University Libraries (MUL) developed an open-source Software Checkout system to allow patrons to make use of software licenses owned by the library. The system takes advantage of user-based licensing under the Software as a Service (SaaS) license model and vendor-created APIs to easily and legally assign access to users. The service currently supports Adobe Creative Cloud, Final Cut Pro, and Logic Pro software. MUL has successfully used this software for three years. This article describes the expansion of offerings and the increasing use of the service over that time. Built on a model developed by Pixar for managing employee software licenses, the Software Checkout system is believed to be the first of its kind for circulating licenses to library patrons. Both this lending model and the open-source software developed by MUL are available to other libraries. This paper is intended to prompt libraries to take advantage of the legal and technical environment to expand software license sharing to other libraries.

https://tinyurl.com/yx4fyw98
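
The user-based lending idea in this abstract can be illustrated with a minimal sketch. This is not the MUL system or any vendor's API: the `LicensePool` class, its method names, and the product name are hypothetical, and the point where a real service would call a vendor API to assign or revoke a seat is only marked with a comment.

```python
from dataclasses import dataclass, field


@dataclass
class LicensePool:
    """Hypothetical pool of user-based license seats for one product."""

    product: str
    seats: int
    holders: set[str] = field(default_factory=set)

    def checkout(self, patron_id: str) -> bool:
        """Assign a seat to a patron; return False if none are free."""
        if patron_id in self.holders:
            return True  # patron already holds a seat
        if len(self.holders) >= self.seats:
            return False  # every seat is checked out
        # Assumption: a real service would call the vendor's API here
        # to grant this user access to the product.
        self.holders.add(patron_id)
        return True

    def checkin(self, patron_id: str) -> None:
        """Release a patron's seat so another patron can use it."""
        # Assumption: a real service would revoke access via the vendor API.
        self.holders.discard(patron_id)


pool = LicensePool(product="Example Suite", seats=1)
first = pool.checkout("patron-a")
second = pool.checkout("patron-b")
pool.checkin("patron-a")
third = pool.checkout("patron-b")
```

With a single seat, the second checkout is refused until the first patron's seat is returned, which is the same circulation logic a library applies to a physical item.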

UC and Authors Alliance: "Outcomes, Questions, and Answers: ‘The Right to Deposit (r2d) Uniform Guidance to Ensure Author Compliance and Public Access’"

The United States Office of Management and Budget uniform guidance for grants and agreements contains the following language in 2 CFR §200.315(b):

To the extent permitted by law, the recipient or subrecipient may copyright any work that is subject to copyright and was developed, or for which ownership was acquired, under a Federal award. The Federal agency reserves a royalty-free, nonexclusive, and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes and to authorize others to do so. This includes the right to require recipients and subrecipients to make such works available through agency-designated public access repositories.

This provision, the Federal purpose license, has existed in some form since at least 1976. Some federal agencies, including the Department of Energy (DOE), have already been relying on it in the implementation of their public access plans. The Federal purpose license applies upon creation of an article, overriding all subsequent terms and licenses. It provides a highly effective, non-disruptive, elegant and familiar solution for accomplishing the ends of the Nelson memo without having to rely on individual authors and institutions to protect this right or navigate differing institutional approaches. Leveraging the Federal purpose license could also provide consistency for articles and authors subject to policies from multiple granting agencies. . . .

If the Federal purpose license has already existed for a long time, and has new language clarifying that it can be used this way, does that solve the problem for authors?

It depends on the author’s funder. Agencies have rights in federally funded research publications, but they are not uniformly using them. Only some agencies are telling their grantees in agency guidance that the Federal purpose license covers sharing publications in agency-designated repositories. Other agencies aren’t relying on their own rights from the license, and are instead advising grantees to work with their publisher and secure the rights to post their publications independently. The Federal purpose license does not help authors if they don’t know about it.

https://tinyurl.com/bdfks8pu

"Supreme Court: There’s No ‘Time Limit’ on Copyright Infringement Claims"

Copyright holders can claim damages for copyright infringements that occurred years or even decades ago, the U.S. Supreme Court has clarified. In a majority decision, the Court rejected the lower court’s argument that there’s a three-year time limit for damages. Older claims are fair game, as long as the lawsuit is filed within three years of ‘discovering’ an infringement.

https://tinyurl.com/55mvn5er

Paywall: "Rethinking Copyright Exceptions in the Era of Generative AI: Balancing Innovation and Intellectual Property Protection"

In response to these identified [copyright and AI] challenges, this paper proposes a hybrid model for TDM exceptions, along with recommended specific mechanisms. The model divides exceptions into noncommercial and commercial uses, providing a nuanced solution to complex copyright issues in AI training. Recommendations incorporate mandatory exceptions for noncommercial uses, an opt-out clause for commercial uses, enhanced transparency measures, and a searchable portal for copyright owners. In conclusion, striking a delicate equilibrium between technological progress and the incentive for creative expression is of paramount importance. These suggested solutions aim to establish a harmonious foundation that nurtures innovation and creativity while honoring creators’ rights, facilitating AI development, promoting transparency, and ensuring fair compensation for creators.

https://doi.org/10.1111/jwip.12301

Digital Scholarship and DigitalKoans Are Now 19 Years Old

Digital Scholarship and DigitalKoans were established on 4/20/2005. Digital Scholarship provides information and commentary about artificial intelligence, digital copyright, digital curation, open access, research data management, scholarly communication, and other digital information issues. Digital Scholarship is an open access noncommercial publisher. All of its publications are currently under a Creative Commons Attribution License.

DigitalKoans has published over 16,200 posts. Since 2008, over 5,600 job ads have been posted, with slightly over 4,000 of them for digital library jobs.

Digital Scholarship has published the following books and book supplements: the Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (2005; published with the Association of Research Libraries), the Scholarly Electronic Publishing Bibliography: 2008 Annual Edition (2009), Digital Scholarship 2009 (2010), Transforming Scholarly Publishing through Open Access: A Bibliography (2010), the Scholarly Electronic Publishing Bibliography 2010 (2011), the Digital Curation and Preservation Bibliography 2010 (2011), the Institutional Repository and ETD Bibliography 2011 (2011), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works (2012), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement (2013), and the Research Data Curation and Management Bibliography (2021).

It has also published and updated the following bibliographies, webliographies, and weblogs: the Scholarly Electronic Publishing Bibliography (1996-2011), the Scholarly Electronic Publishing Weblog (2001-2013), the Electronic Theses and Dissertations Bibliography (2005-2021), the Google Books Bibliography (2005-2011), the Institutional Repository Bibliography (2009-2011), the Open Access Journals Bibliography (2010), the Digital Curation and Preservation Bibliography (2010-2011), the E-science and Academic Libraries Bibliography (2011), the Digital Curation Resource Guide (2012), the Research Data Curation Bibliography (2012-2019), the Altmetrics Bibliography (2013), the Transforming Peer Review Bibliography (2014), the Academic Library as Scholarly Publisher Bibliography (2018-2023), the Research Data Sharing and Reuse Bibliography (2021), the Research Data Publication and Citation Bibliography (2022), Digital Curation Certificate and Master’s Degree Programs (2023), the Academic Libraries and Research Data Management Bibliography (2023), and the Artificial Intelligence and Libraries Bibliography (2023).

"Author Granted Copyright over Book with AI-Generated Text—with a Twist"

The USCO’s notice granting Shupe copyright registration of her book does not recognize her as author of the whole text as is conventional for written works. Instead she is considered the author of the "selection, coordination, and arrangement of text generated by artificial intelligence." This means no one can copy the book without permission, but the actual sentences and paragraphs themselves are not copyrighted and could theoretically be rearranged and republished as a different book.

https://tinyurl.com/bd97jbw6

1 Million Images and Counting: "AI-Startup Launches Ever-Expanding Library of Free Stock Photos and Music"

StockCake is a new platform by AI startup Imaginary Machines. The site currently hosts more than a million pre-generated images. These images can be downloaded, used, and shared for free. There are no strings attached as all photos are in the public domain.

https://tinyurl.com/mvjd3683

StockCake

"Towards a Books Data Commons for AI Training"

This white paper describes ways of building a books data commons: a responsibly designed, broadly accessible data set of digitized books to be used in training AI models. This report, written in partnership with Creative Commons and Proteus Strategies, is based on a series of workshops that brought together practitioners building AI models, legal and policy scholars, and experts working with collections of digitized books.

In the paper, we first explain why books matter for AI training and how broader access could be beneficial. We then summarize two tracks that might be considered for developing such a resource, highlighting existing projects that help foreground the potential challenges. One track relies on public domain and permissively licensed books, while the other depends on exceptions to copyright to enable training on in-copyright books. The report also presents several key design choices and next steps that could advance further development of this approach.

https://tinyurl.com/2fu47552

"TDM & AI Rights Reserved? Fair Use & Evolving Publisher Copyright Statements"

Earlier this year, we noticed that some academic publishers have revised the copyright notices on their websites to state they reserve rights to text and data mining (TDM) and AI training (for example, see the website footers for Elsevier and Wiley). . . . SPARC asked Kyle K. Courtney, Director of Copyright and Information Policy for Harvard Library, to address key questions regarding these revised copyright statements and the continuing viability of fair use justifications for TDM.

https://tinyurl.com/4prkfbb3

Paywall: "Starting In-House Copyright Education Programs: Commonalities and Conclusions from Two Southeastern US Academic Libraries"

This case study introduces two copyright education programs and summarizes the state of copyright education within library and information science (LIS) and academic libraries. . . . The following themes within the two copyright education programs were identified through a case study: the complexity of copyright, the engagement (or lack thereof) across a college or university, the necessity of including copyright in information literacy instruction and the calls for professional development with copyright.

https://doi.org/10.1108/RSR-09-2023-0069

"[AAP] Publishers File Brief Opposing Internet Archive Appeal of Loss"

Controlled digital lending is a frontal assault on the foundational copyright principle that rightsholders exclusively control the terms of sale for every different format of their work — a principle that has spawned the broad diversity in formats of books, movies, television and music that consumers enjoy today.

"[T]here is no resemblance between IA’s conversion of millions of print books into ebooks and the historical practice of lending print books. Nor does IA’s distribution of ebooks without paying authors and their publishers a dime conform with the modern practices of libraries, which acquire licenses to lend ebooks to their local communities and enjoy the benefits of digital distribution lawfully."

The Internet Archive ("IA") operates a mass-digitization enterprise in which it copies millions of complete, in-copyright print books and distributes the resulting bootleg ebooks from its website to anyone in the world for free. Granting summary judgment, the District Court properly held that IA’s infringement is not saved by fair use as each of the four factors weighs against IA under longstanding case law.

https://tinyurl.com/5ah5vx3x

"Fair Use Rights to Conduct Text and Data Mining and Use Artificial Intelligence Tools Are Essential for UC Research and Teaching"

The UC Libraries invest more than $60 million each year licensing systemwide electronic content needed by scholars for these and other studies. (Indeed, the $60 million figure represents license agreements made at the UC systemwide and multi-campus levels. But each individual campus also licenses electronic resources, adding millions more in total expenditures.) Our libraries secure campus access to a broad range of digital resources including books, scientific journals, databases, multimedia resources, and other materials. In doing so, the UC Libraries must negotiate licensing terms that ensure scholars can make both lawful and comprehensive use of the materials the libraries have procured. Increasingly, however, publishers and vendors are presenting libraries with content license agreements that attempt to preclude, or charge additional and unsupportable fees for, fair uses like training AI tools in the course of conducting TDM. . . .

If the UC Libraries are unable to protect these fair uses, UC scholars will be at the mercy of publishers aggregating and controlling what may be done with the scholarly record. Further, UC scholars’ pursuit of knowledge will be disproportionately stymied relative to academic colleagues in other global regions, given that a large proportion of other countries preclude contractual override of research exceptions.

Indeed, in more than forty countries—including all those within the European Union (EU)—publishers are prohibited from using contracts to abrogate exceptions to copyright in non-profit scholarly and educational contexts. Article 3 of the EU’s Directive on Copyright in the Digital Single Market preserves the right for scholars within research organizations and cultural heritage institutions (like those researchers at UC) to conduct TDM for scientific research, and further proscribes publishers from invalidating this exception by license agreements (see Article 7). Moreover, under AI regulations recently adopted by the European Parliament, copyright owners may not opt out of having their works used in conjunction with artificial intelligence tools in TDM research—meaning copyrighted works must remain available for scientific research that is reliant on AI training, and publishers cannot override these AI training rights through contract. Publishers are thus obligated to—and do—preserve fair use-equivalent research exceptions for TDM and AI within the EU, and can do so in the United States, too. . . .

In all events, adaptable licensing language can address publishers’ concerns by reiterating that the licensed products may be used with AI tools only to the extent that doing so would not: i. create a competing or commercial product or service for use by third parties; ii. unreasonably disrupt the functionality of the subscribed products; or iii. reproduce or redistribute the subscribed products for third parties. In addition, license agreements can require commercially reasonable security measures (as also required in the EU) to extinguish the risk of content dissemination beyond permitted uses. In sum, these licensing terms can replicate the research rights that are unequivocally reserved for scholars elsewhere.

https://tinyurl.com/4fvpdz35

U.S. Copyright Office Update on Its Artificial Intelligence Initiatives

In March 2023, the Office announced a broad initiative to examine the copyright implications of the current forms of generative AI. Although we had previously examined the scope of copyright in works created using AI, the increasing sophistication and public adoption of generative AI tools raised new questions about the process of training and the legal status of the outputs. Our goal was to gather information from a full range of knowledgeable and interested parties in order to produce a report to assist Congress, the courts, and others in formulating policy in this area. In taking this initiative forward, we are monitoring related work being done in other agencies, including the U.S. Patent and Trademark Office (USPTO) and the Federal Trade Commission, and communicating with them on an ongoing basis.

This letter summarizes the Office’s work so far and describes our agenda for the rest of 2024, including the release of the report, updates to the Compendium of U.S. Copyright Office Practices, and the publication of a proposed economic research agenda.

http://tinyurl.com/4tpeyw3t

"The Text File That Runs the Internet"

But robots.txt is not a legal document — and 30 years after its creation, it still relies on the good will of all parties involved. Disallowing a bot on your robots.txt page. . . sends a message, but it’s not going to stand up in court. Any crawler that wants to ignore robots.txt can simply do so, with little fear of repercussions. . . . As the AI companies continue to multiply, and their crawlers grow more unscrupulous, anyone wanting to sit out or wait out the AI takeover has to take on an endless game of whac-a-mole. . . . If AI is in fact the future of search, as Google and others have predicted, blocking AI crawlers could be a short-term win but a long-term disaster.

http://tinyurl.com/5n8s72bz
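
The voluntary nature of robots.txt described in this excerpt can be seen in a short sketch using Python's standard-library `urllib.robotparser`. The rules, the `example.org` URL, and the `SomeSearchBot` name are illustrative assumptions; `GPTBot` is used only as an example of an AI crawler's user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: refuse one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.org/article.html"  # hypothetical URL

# A well-behaved crawler asks before fetching; nothing forces it to ask.
blocked = parser.can_fetch("GPTBot", page)
allowed = parser.can_fetch("SomeSearchBot", page)
print(f"GPTBot may fetch: {blocked}; SomeSearchBot may fetch: {allowed}")
```

The check runs entirely on the crawler's side: the file only states a preference, and a crawler that never calls `can_fetch` (or ignores its answer) is not technically prevented from requesting the page.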

"Court Dismisses Authors’ Copyright Infringement Claims Against OpenAI"

Several authors, including comedian Sarah Silverman, have suffered an early loss in their copyright battle against OpenAI. The authors accused OpenAI of using pirated copies of their books to train its models. A California federal court dismissed the vicarious copyright infringement and DMCA violation claims. However, the lawsuit isn’t over yet.

http://tinyurl.com/478vm6kw