"Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?"


Today’s researchers expect to be able to complete text and data mining (TDM) work on many types of textual data. But they are often blocked more by contractual limitations on what data they can use, and how they can use it, than they are by what data may be available to them. This article lays out the different types of TDM processes currently in use, the issues that may block researchers from being able to do the work they would like, and some possible solutions.

https://doi.org/10.31274/jlsc.15530

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"CADRE: A Cloud-Based Data Service for Big Bibliographic Data"

https://dl.acm.org/doi/abs/10.1145/3459637.3481898

CADRE: Collaborative Archive & Data Research Environment

"ODDPub—a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications"

http://doi.org/10.5334/dsj-2020-042

"Releasing 1.8 Million Open Access Publications from Publisher Systems for Text and Data Mining"

Petr Knoth, Nancy Pontika and Lucas Anastasiou have published "Releasing 1.8 Million Open Access Publications from Publisher Systems for Text and Data Mining" in LSE Impact of Social Sciences.

Here's an excerpt:

Text and data mining offers an opportunity to improve the way we access and analyse the outputs of academic research. But the technical infrastructure of the current scholarly communication system is not yet ready to support TDM to its full potential, even for open access outputs. To address this problem, Petr Knoth, Nancy Pontika and Lucas Anastasiou have developed the CORE Publisher Connector, a toolkit service designed to assist text miners in accessing content through a single machine interface. The Connector aims to solve the heterogeneity among publisher APIs and assist text miners with data collection, provide a centralised point of access to all openly available scientific publications, and provide a high-performance, constantly updated access interface.

HathiTrust Research Center User Requirements Study White Paper

Eleanor Dickson et al. have self-archived "HathiTrust Research Center User Requirements Study White Paper."

Here's an excerpt:

This paper presents findings from an investigation into trends and practices in humanities and social sciences research that incorporates text data mining. As affiliates of the HathiTrust Research Center (HTRC), the purpose of our study was to illuminate researcher needs and expectations for text data, tools, and training for text mining in order to better understand our current and potential user community. Results of our study have and will continue to inform development of HTRC tools and services for computational text analysis.

"Text Data Mining from the Author’s Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Christine L. Borgman has self-archived "Text Data Mining from the Author's Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Here's an excerpt:

Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from concerns of the researcher wishing to mine data to include concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefits.

An Analytical Review of Text and Data Mining Practices and Approaches in Europe

OpenForum Europe has released An Analytical Review of Text and Data Mining Practices and Approaches in Europe: Policy Recommendations in View of the Upcoming Copyright Legislative Proposal.

Here's an excerpt:

Europe needs a regime which enables any researcher, citizen, company or other entity to engage in TDM activities, using material to which they have lawful access, wherever they feel there is a good idea. The exact commercial rewards can be managed at subsequent stages, depending on the implementation of the mining outcome. The protection could be considered at the point at which some clearly commercially beneficial project, product, service, business or company has emerged.

"NIH Manuscript Collection Optimized for Text-Mining and More"

NIH has released "NIH Manuscript Collection Optimized for Text-Mining and More."

Here's an excerpt:

You can download the entire PMC collection of NIH-supported author manuscripts as a package in either XML or plain text formats. The collection will encompass all NIH manuscripts posted to PMC since July 2008. While the public can access the articles' full text and accompanying figures, tables, and multimedia on the PMC Web site, the newly available article packages include full text only, in a form that facilitates text-mining.

"The Social, Political and Legal Aspects of Text and Data Mining (TDM)"

Michelle Brook, Peter Murray-Rust, and Charles Oppenheim have published "The Social, Political and Legal Aspects of Text and Data Mining (TDM)" in D-Lib Magazine.

Here's an excerpt:

The ideas of textual or data mining (TDM) and subsequent analysis go back hundreds if not thousands of years. Originally carried out manually, textual and data analysis has long been a tool which has enabled new insights to be drawn from text corpora. However, for the potential benefits of TDM to be unlocked, a number of non-technological barriers need to be overcome. These include legal uncertainty resulting from complicated copyright, database rights and licensing, the fact that some publishers are not currently embracing the opportunities TDM offers the academic community, and a lack of awareness of TDM among many academics, alongside a skills gap.

"Response to Elsevier’s Text and Data Mining Policy: A LIBER Discussion Paper"

LIBER has released "Response to Elsevier's Text and Data Mining Policy: A LIBER Discussion Paper."

Here's an excerpt from the announcement:

LIBER believes that the right to read is the right to mine and that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive. Furthermore, as this discussion paper highlights, licensing has the potential to limit the innovative potential of digital research methods by:

  1. restricting the tools that researchers can use
  2. limiting the way in which research results can be made available
  3. impacting on the transparency and reproducibility of research results.

"Text & Data Mining—A Librarian Overview"

IFLA has released "Text & Data Mining—A Librarian Overview" by Ann Okerson.

Here's an excerpt:

Text and data mining offers exciting research opportunities over a broad range of fields. . . .

This paper reviews some of the possibilities for such work and outlines the challenges and the way ahead for librarians. One challenge lies in the terms by which data sets are licensed and made available to academic and other users; librarians need to be proactive in ensuring that these terms are favorable for the kind of use researchers will need and that the resources themselves are available in a format that allows innovative mining-based research. Another challenge is the need to support users who wish to engage in text and data mining with limited experience, especially when they approach data sets made available through library resources. Librarians should develop the expertise to support their users by making data resources available to them on favorable terms and supporting their mining efforts.

The Value and Benefits of Text Mining

JISC has released The Value and Benefits of Text Mining.

Here's an excerpt:

Vast amounts of new information and data are generated everyday through economic, academic and social activities. This sea of data, predicted to increase at a rate of 40% p.a., has significant potential economic and societal value. Techniques such as text and data mining and analytics are required to exploit this potential. . . .

To date there has been no systematic analysis of the value and benefits of text mining to UK further and higher education (UKFHE), nor of the additional value and benefits that might result from the exceptions to copyright proposed by Hargreaves. JISC thus commissioned this analysis of 'The Value and Benefits of Text Mining to UK Further and Higher Education'.

We have explored the costs, benefits, barriers and risks associated with text mining within UKFHE research using the approach to welfare economics laid out in the UK Treasury best practice guidelines for evaluation [2]. We gathered our evidence from consultations with key stakeholders and a set of case studies.
