"Fair Use Rights to Conduct Text and Data Mining and Use Artificial Intelligence Tools Are Essential for UC Research and Teaching"


The UC Libraries invest more than $60 million each year licensing systemwide electronic content needed by scholars for these and other studies. (Indeed, the $60 million figure represents license agreements made at the UC systemwide and multi-campus levels. But each individual campus also licenses electronic resources, adding millions more in total expenditures.) Our libraries secure campus access to a broad range of digital resources including books, scientific journals, databases, multimedia resources, and other materials. In doing so, the UC Libraries must negotiate licensing terms that ensure scholars can make both lawful and comprehensive use of the materials the libraries have procured. Increasingly, however, publishers and vendors are presenting libraries with content license agreements that attempt to preclude, or charge additional and unsupportable fees for, fair uses like training AI tools in the course of conducting TDM. . . .

If the UC Libraries are unable to protect these fair uses, UC scholars will be at the mercy of publishers aggregating and controlling what may be done with the scholarly record. Further, UC scholars’ pursuit of knowledge will be disproportionately stymied relative to academic colleagues in other global regions, given that a large proportion of other countries preclude contractual override of research exceptions.

Indeed, in more than forty countries—including all those within the European Union (EU)—publishers are prohibited from using contracts to abrogate exceptions to copyright in non-profit scholarly and educational contexts. Article 3 of the EU’s Directive on Copyright in the Digital Single Market preserves the right for scholars within research organizations and cultural heritage institutions (like those researchers at UC) to conduct TDM for scientific research, and further proscribes publishers from invalidating this exception by license agreements (see Article 7). Moreover, under AI regulations recently adopted by the European Parliament, copyright owners may not opt out of having their works used in conjunction with artificial intelligence tools in TDM research—meaning copyrighted works must remain available for scientific research that is reliant on AI training, and publishers cannot override these AI training rights through contract. Publishers are thus obligated to—and do—preserve fair use-equivalent research exceptions for TDM and AI within the EU, and can do so in the United States, too. . . .

In all events, adaptable licensing language can address publishers’ concerns by reiterating that the licensed products may be used with AI tools only to the extent that doing so would not: i. create a competing or commercial product or service for use by third parties; ii. unreasonably disrupt the functionality of the subscribed products; or iii. reproduce or redistribute the subscribed products for third parties. In addition, license agreements can require commercially reasonable security measures (as also required in the EU) to extinguish the risk of content dissemination beyond permitted uses. In sum, these licensing terms can replicate the research rights that are unequivocally reserved for scholars elsewhere.

https://tinyurl.com/4fvpdz35

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?"


Today’s researchers expect to be able to complete text and data mining (TDM) work on many types of textual data. But they are often blocked more by contractual limitations on what data they can use, and how they can use it, than they are by what data may be available to them. This article lays out the different types of TDM processes currently in use, the issues that may block researchers from being able to do the work they would like, and some possible solutions.

https://doi.org/10.31274/jlsc.15530


"CADRE: A Cloud-Based Data Service for Big Bibliographic Data"

https://dl.acm.org/doi/abs/10.1145/3459637.3481898

CADRE: Collaborative Archive & Data Research Environment


"ODDPub—a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications"

http://doi.org/10.5334/dsj-2020-042


"Releasing 1.8 Million Open Access Publications from Publisher Systems for Text and Data Mining"

Petr Knoth, Nancy Pontika and Lucas Anastasiou have published "Releasing 1.8 Million Open Access Publications from Publisher Systems for Text and Data Mining" in LSE Impact of Social Sciences.

Here's an excerpt:

Text and data mining offers an opportunity to improve the way we access and analyse the outputs of academic research. But the technical infrastructure of the current scholarly communication system is not yet ready to support TDM to its full potential, even for open access outputs. To address this problem, Petr Knoth, Nancy Pontika and Lucas Anastasiou have developed the CORE Publisher Connector, a toolkit service designed to assist text miners in accessing content through a single machine interface. The Connector aims to solve the heterogeneity among publisher APIs and assist text miners with data collection, provide a centralised point of access to all openly available scientific publications, and provide a high-performance, constantly updated access interface.
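
The idea of hiding heterogeneous publisher APIs behind one interface can be sketched roughly as follows. This is only an illustration of the general adapter pattern, not the CORE Publisher Connector's actual design or API: every class name, field name, and record below is hypothetical, and the "publishers" are in-memory dictionaries rather than real services.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class Document:
    """Common record shape handed to the text miner (hypothetical)."""
    identifier: str
    title: str
    full_text: str


class SourceAdapter(Protocol):
    """Anything that can turn an identifier into a Document."""
    def fetch(self, identifier: str) -> Document: ...


class DictStyleSource:
    """Hypothetical source whose records are dicts with 'doi', 'name', 'body'."""
    def __init__(self, records: Dict[str, dict]) -> None:
        self._records = records

    def fetch(self, identifier: str) -> Document:
        raw = self._records[identifier]
        return Document(raw["doi"], raw["name"], raw["body"])


class TupleStyleSource:
    """Hypothetical source whose records are (title, text) tuples."""
    def __init__(self, records: Dict[str, Tuple[str, str]]) -> None:
        self._records = records

    def fetch(self, identifier: str) -> Document:
        title, text = self._records[identifier]
        return Document(identifier, title, text)


class Connector:
    """Single entry point that routes each request to the right adapter."""
    def __init__(self, adapters: Dict[str, SourceAdapter]) -> None:
        self._adapters = dict(adapters)

    def get(self, source: str, identifier: str) -> Document:
        return self._adapters[source].fetch(identifier)


# Assumed sample data, invented for this sketch.
connector = Connector({
    "a": DictStyleSource(
        {"10.1/x": {"doi": "10.1/x", "name": "Paper X", "body": "Text of X."}}
    ),
    "b": TupleStyleSource({"10.2/y": ("Paper Y", "Text of Y.")}),
})
```

With this arrangement, a text miner calls `connector.get(source, identifier)` and always receives the same `Document` shape, whichever source format sits behind it.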


HathiTrust Research Center User Requirements Study White Paper

Eleanor Dickson et al. have self-archived "HathiTrust Research Center User Requirements Study White Paper."

Here's an excerpt:

This paper presents findings from an investigation into trends and practices in humanities and social sciences research that incorporates text data mining. As affiliates of the HathiTrust Research Center (HTRC), the purpose of our study was to illuminate researcher needs and expectations for text data, tools, and training for text mining in order to better understand our current and potential user community. Results of our study have and will continue to inform development of HTRC tools and services for computational text analysis.


"Text Data Mining from the Author’s Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Christine L. Borgman has self-archived "Text Data Mining from the Author's Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Here's an excerpt:

Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from concerns of the researcher wishing to mine data to include concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefits.


An Analytical Review of Text and Data Mining Practices and Approaches in Europe

OpenForum Europe has released An Analytical Review of Text and Data Mining Practices and Approaches in Europe: Policy Recommendations in View of the Upcoming Copyright Legislative Proposal.

Here's an excerpt:

Europe needs a regime which enables any researcher, citizen, company or other entity to engage in TDM activities, using material to which they have lawful access, wherever they feel there is a good idea. The exact commercial rewards can be managed at subsequent stages, depending on the implementation of the mining outcome. The protection could be considered at the point at which some clearly commercially beneficial project, product, service, business or company has emerged.


"NIH Manuscript Collection Optimized for Text-Mining and More"

NIH has released "NIH Manuscript Collection Optimized for Text-Mining and More."

Here's an excerpt:

You can download the entire PMC collection of NIH-supported author manuscripts as a package in either XML or plain text formats. The collection will encompass all NIH manuscripts posted to PMC since July 2008. While the public can access the articles' full text and accompanying figures, tables, and multimedia on the PMC Web site, the newly available article packages include full text only, in a form that facilitates text-mining.
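
As a rough illustration of why a text-only package is convenient for mining, the sketch below pulls paragraph text out of a small XML article and counts terms. The sample document and its element names (`article`, `body`, `p`, `italic`) are assumptions made for this example; the actual layout of the NIH packages should be checked against their documentation before relying on any of it.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; real manuscript files will be larger and may differ in structure.
SAMPLE_XML = """<article>
  <front><title>Sample manuscript</title></front>
  <body>
    <p>Text mining supports research.</p>
    <p>Mining full text at scale needs <italic>machine-readable</italic> text.</p>
  </body>
</article>"""


def body_text(xml_string: str) -> str:
    """Return the paragraph text of the <body> element as one string."""
    root = ET.fromstring(xml_string)
    body = root.find("body")
    if body is None:
        return ""
    paragraphs = []
    for p in body.iter("p"):
        # itertext() also yields text nested in inline markup such as <italic>.
        raw = "".join(p.itertext())
        paragraphs.append(" ".join(raw.split()))  # collapse whitespace
    return " ".join(paragraphs)


def term_counts(text: str) -> Counter:
    """Count lowercase word tokens, keeping hyphenated words together."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(tokens)


text = body_text(SAMPLE_XML)
counts = term_counts(text)
```

Here `counts["text"]` is 3 and `counts["mining"]` is 2 for the sample above; with plain-text packages the XML step could be skipped and `term_counts` applied to each file's contents directly.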


"The Social, Political and Legal Aspects of Text and Data Mining (TDM)"

Michelle Brook, Peter Murray-Rust, and Charles Oppenheim have published "The Social, Political and Legal Aspects of Text and Data Mining (TDM)" in D-Lib Magazine.

Here's an excerpt:

The ideas of textual or data mining (TDM) and subsequent analysis go back hundreds if not thousands of years. Originally carried out manually, textual and data analysis has long been a tool which has enabled new insights to be drawn from text corpora. However, for the potential benefits of TDM to be unlocked, a number of non-technological barriers need to be overcome. These include legal uncertainty resulting from complicated copyright, database rights and licensing, the fact that some publishers are not currently embracing the opportunities TDM offers the academic community, and a lack of awareness of TDM among many academics, alongside a skills gap.


"Response to Elsevier’s Text and Data Mining Policy: A LIBER Discussion Paper"

LIBER has released "Response to Elsevier's Text and Data Mining Policy: A LIBER Discussion Paper."

Here's an excerpt from the announcement:

LIBER believes that the right to read is the right to mine and that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive. Furthermore, as this discussion paper highlights, licensing has the potential to limit the innovative potential of digital research methods by:

  1. restricting the tools that researchers can use
  2. limiting the way in which research results can be made available
  3. impacting on the transparency and reproducibility of research results.


"Text & Data Mining—A Librarian Overview"

IFLA has released "Text & Data Mining—A Librarian Overview" by Ann Okerson.

Here's an excerpt:

Text and data mining offers exciting research opportunities over a broad range of fields. . . .

This paper reviews some of the possibilities for such work and outlines the challenges and the way ahead for librarians. One challenge lies in the terms by which data sets are licensed and made available to academic and other users; librarians need to be proactive in ensuring that these terms are favorable for the kind of use researchers will need and that the resources themselves are available in a format that allows innovative mining-based research. Another challenge is the need to support users who wish to engage in text and data mining with limited experience, especially when they approach data sets made available through library resources. Librarians should develop the expertise to support their users by making data resources available to them on favorable terms and supporting their mining efforts.
