"NIH Manuscript Collection Optimized for Text-Mining and More"

NIH has released "NIH Manuscript Collection Optimized for Text-Mining and More."

Here's an excerpt:

You can download the entire PMC collection of NIH-supported author manuscripts as a package in either XML or plain text formats. The collection will encompass all NIH manuscripts posted to PMC since July 2008. While the public can access the articles' full text and accompanying figures, tables, and multimedia on the PMC Web site, the newly available article packages include full text only, in a form that facilitates text-mining.
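As a rough illustration of what "a form that facilitates text-mining" makes possible, here's a minimal Python sketch that counts word frequencies across a directory of plain-text files. It assumes the plain-text package has already been downloaded and unpacked locally, with one article per `.txt` file. That layout and the `term_frequencies` helper are assumptions for illustration, not a description of the actual package structure.

```python
import re
from collections import Counter
from pathlib import Path

# Simple tokenizer: runs of ASCII letters, applied to lowercased text.
WORD_RE = re.compile(r"[a-z]+")


def term_frequencies(corpus_dir: str, suffix: str = ".txt") -> Counter:
    """Count word tokens across every file ending in `suffix` under corpus_dir.

    Assumes (hypothetically) one article per plain-text file; subdirectories
    are searched recursively and files with other suffixes are ignored.
    """
    counts = Counter()
    for path in sorted(Path(corpus_dir).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        counts.update(WORD_RE.findall(text))
    return counts
```

Calling `term_frequencies("path/to/unpacked/package").most_common(20)`, with a hypothetical path, would list the 20 most frequent terms. Real text-mining work would add stop-word removal and better tokenization on top of this.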

"The Social, Political and Legal Aspects of Text and Data Mining (TDM)"

Michelle Brook, Peter Murray-Rust, and Charles Oppenheim have published "The Social, Political and Legal Aspects of Text and Data Mining (TDM)" in D-Lib Magazine.

Here's an excerpt:

The ideas of textual or data mining (TDM) and subsequent analysis go back hundreds if not thousands of years. Originally carried out manually, textual and data analysis has long been a tool which has enabled new insights to be drawn from text corpora. However, for the potential benefits of TDM to be unlocked, a number of non-technological barriers need to be overcome. These include legal uncertainty resulting from complicated copyright, database rights and licensing, the fact that some publishers are not currently embracing the opportunities TDM offers the academic community, and a lack of awareness of TDM among many academics, alongside a skills gap.

"Response to Elsevier’s Text and Data Mining Policy: A LIBER Discussion Paper"

LIBER has released "Response to Elsevier's Text and Data Mining Policy: A LIBER Discussion Paper."

Here's an excerpt from the announcement:

LIBER believes that the right to read is the right to mine and that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive. Furthermore, as this discussion paper highlights, licensing has the potential to limit the innovative potential of digital research methods by:

  1. restricting the tools that researchers can use
  2. limiting the way in which research results can be made available
  3. impacting on the transparency and reproducibility of research results.

"Text & Data Mining—A Librarian Overview"

IFLA has released "Text & Data Mining—A Librarian Overview" by Ann Okerson.

Here's an excerpt:

Text and data mining offers exciting research opportunities over a broad range of fields. . . .

This paper reviews some of the possibilities for such work and outlines the challenges and the way ahead for librarians. One challenge lies in the terms by which data sets are licensed and made available to academic and other users; librarians need to be proactive in ensuring that these terms are favorable for the kind of use researchers will need and that the resources themselves are available in a format that allows innovative mining-based research. Another challenge is the need to support users who wish to engage in text and data mining with limited experience, especially when they approach data sets made available through library resources. Librarians should develop the expertise to support their users by making data resources available to them on favorable terms and supporting their mining efforts.

"The Value and Benefits of Text Mining"

JISC has released "The Value and Benefits of Text Mining."

Here's an excerpt:

Vast amounts of new information and data are generated everyday through economic, academic and social activities. This sea of data, predicted to increase at a rate of 40% p.a., has significant potential economic and societal value. Techniques such as text and data mining and analytics are required to exploit this potential. . . .

To date there has been no systematic analysis of the value and benefits of text mining to UK further and higher education (UKFHE), nor of the additional value and benefits that might result from the exceptions to copyright proposed by Hargreaves. JISC thus commissioned this analysis of 'The Value and Benefits of Text Mining to UK Further and Higher Education'.

We have explored the costs, benefits, barriers and risks associated with text mining within UKFHE research using the approach to welfare economics laid out in the UK Treasury best practice guidelines for evaluation [2]. We gathered our evidence from consultations with key stakeholders and a set of case studies.
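To give a sense of scale for the quoted growth figure, here's a quick worked calculation. It takes the 40% per annum rate at face value and assumes steady compound growth, which is a simplification. The symbols V_0 (today's volume), V(t) (volume after t years), and T_2 (doubling time) are introduced here for illustration only.

```latex
\begin{aligned}
V(t) &= V_0\,(1.4)^{t} \\
(1.4)^{T_2} = 2 \;&\Rightarrow\; T_2 = \frac{\ln 2}{\ln 1.4} \approx \frac{0.6931}{0.3365} \approx 2.06 \text{ years} \\
\frac{V(5)}{V_0} &= (1.4)^{5} \approx 5.38
\end{aligned}
```

In other words, at that rate the volume of data would roughly double every two years and grow more than fivefold in five.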

"Teaching with Google Books: Research, Copyright, and Data Mining"

Nathan Rinne has self-archived "Teaching with Google Books: Research, Copyright, and Data Mining" in E-LIS.

Here's an excerpt:

Google's Google Books site is a rich resource that is probably underutilized by most educators. It has all kinds of potential for a) getting students into the research process in a way that they will enjoy (for example, they can see how a famous quote has been used/quoted, find out which books cite the journal article they are interested in, or check to see if a specific book covers a topic that they want to explore, etc.); b) teaching them about the deeper civic purpose and the evolving state of copyright law; and, c) exploring, with the help of Google Book's Ngram viewer, the promise and ethics surrounding the issue of data-mining and "non-consumptive" research, or research that is accomplished by "mining" books for data, as opposed to reading them.
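To make the idea of "non-consumptive" research concrete, here's a minimal Python sketch of the kind of measure the Ngram Viewer plots: for each year, the share of all n-grams of a given length that match a phrase. It runs over a tiny in-memory list of (year, text) pairs. That data format and the `relative_frequency_by_year` helper are hypothetical, and this is not Google's implementation. Note that only counts come out; the texts themselves are never returned.

```python
import re
from collections import Counter

# Simple tokenizer: runs of ASCII letters, applied to lowercased text.
TOKEN_RE = re.compile(r"[a-z]+")


def ngrams(tokens, n):
    """Return successive n-token tuples from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def relative_frequency_by_year(books, phrase):
    """For each year, the share of all n-grams that equal `phrase`.

    `books` is an iterable of (year, text) pairs (a hypothetical format).
    n is the number of words in `phrase`. Only counts are kept, never text.
    """
    target = tuple(TOKEN_RE.findall(phrase.lower()))
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in books:
        grams = list(ngrams(TOKEN_RE.findall(text.lower()), n))
        totals[year] += len(grams)
        hits[year] += sum(1 for gram in grams if gram == target)
    # Skip years whose texts were all shorter than the phrase.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Toy sample data, invented for illustration.
sample = [(1900, "Fair use and fair dealing"), (1900, "fair use"), (1950, "Copyright law")]
print(relative_frequency_by_year(sample, "fair use"))  # → {1900: 0.4, 1950: 0.0}
```

Normalizing by the total number of n-grams per year, rather than reporting raw counts, keeps years with many more published books from dominating the comparison.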

Stanford University Preparing Proposal for Text Mining Center Providing Access to 30 Million Digitized Books Plus Highwire Journals

In "Possible Text Mining Opportunity at Stanford," Matthew Jockers describes a research proposal being developed at Stanford University for a text mining center that would provide access to 30 million digitized books plus Highwire Journals.

Here's an excerpt:

As I'm sure many of you already know, Stanford has been closely involved with Google's book scanning project, and we (Stanford) are currently preparing a proposal for the creation of a text mining / analysis Center on campus. The core assets of the proposed Center would include all of the Google data (approx. 30 million books) plus all of our Highwire data and all of our licensed content. We see a wide range of research opportunities for this collection, and we are envisioning a Center that would offer various levels of interaction with scholars. In particular we envision a "tiered" service model that would, on one hand, allow technically challenged researchers to work with Center staff in formulating research questions and, on the other, an opportunity for more technically advanced scholars to write their own algorithms and run them on the corpus. We are imagining the Center as both a resource and as a physical place, a place that will offer support to both internal and external scholars and graduate students.

Scholarship in the Age of Abundance: Enhancing Historical Research with Text-Mining and Analysis Tools Project

The Center for History and New Media's "Scholarship in the Age of Abundance: Enhancing Historical Research with Text-Mining and Analysis Tools" project has been awarded a two-year grant from the National Endowment for the Humanities.

Here's an excerpt from "Enhancing Historical Research with Text-Mining and Analysis Tools":

We will first conduct a survey of historians to examine closely their use of digital resources and prospect for particularly helpful uses of digital technology. We will then explore three main areas where text mining might help in the research process: locating documents of interest in the sea of texts online; extracting and synthesizing information from these texts; and analyzing large-scale patterns across these texts. A focus group of historians will be used to assess the efficacy of different methods of text mining and analysis in real-world research situations in order to offer recommendations, and even some tools, for the most promising approaches.
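For the first of the three areas, "locating documents of interest in the sea of texts online," a common baseline is TF-IDF ranking. The Python sketch below is a generic illustration of that standard technique, not a description of the project's own methods or tools. The `rank_documents` helper and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter

# Simple tokenizer: runs of ASCII letters, applied to lowercased text.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def rank_documents(docs, query):
    """Rank docs (dict of id -> text) against query by summed TF-IDF weight.

    tf  = term count / document length
    idf = ln(N / df), where df is the number of documents containing the term
    Returns (doc_id, score) pairs, highest score first.
    """
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for counts in tokenized.values():
        df.update(counts.keys())
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in tokenized.items():
        length = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in counts:
                score += (counts[term] / length) * math.log(n_docs / df[term])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy documents, invented for illustration.
docs = {
    "a": "The cotton trade in Liverpool",
    "b": "Whaling voyages from New Bedford",
    "c": "Cotton prices and the cotton gin",
}
for doc_id, score in rank_documents(docs, "cotton gin"):
    print(doc_id, round(score, 3))
```

With this unsmoothed idf, a term that appears in every document gets zero weight, which is usually what you want for common words but can surprise you on very small collections.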