In "Possible Text Mining Opportunity at Stanford," Matthew Jockers describes a research proposal being developed at Stanford University for a text mining center that would provide access to 30 million digitized books plus Highwire Journals.
Here's an excerpt:
As I'm sure many of you already know, Stanford has been closely involved with Google's book scanning project, and we (Stanford) are currently preparing a proposal for the creation of a text mining / analysis Center on campus. The core assets of the proposed Center would include all of the Google data (approx. 30 million books) plus all of our Highwire data and all of our licensed content. We see a wide range of research opportunities for this collection, and we are envisioning a Center that would offer various levels of interaction with scholars. In particular we envision a "tiered" service model that would, on one hand, allow technically challenged researchers to work with Center staff in formulating research questions and, on the other, allow more technically advanced scholars to write their own algorithms and run them on the corpus. We are imagining the Center as both a resource and a physical place, one that will offer support to both internal and external scholars and graduate students.