Objections to the Google Books Settlement and Responses in the Amended Settlement: A Report

The Public-Interest Book Search Initiative at the New York Law School has released Objections to the Google Books Settlement and Responses in the Amended Settlement: A Report.

Here's an excerpt:

This report collects information about the objections raised to the original proposed settlement in the Authors Guild v. Google litigation. We identified 76 distinct issues, which we grouped into 11 categories. This report briefly summarizes each issue, provides an illustrative quotation from a filing with the court, and indicates any related changes in the amended settlement. . . .

This report is descriptive, not evaluative. Inclusion of an issue means only that at least one party made the full argument in a filing to the court. It does not represent any judgment about whether the objection accurately characterizes the settlement or the underlying facts. Nor does it represent any judgment about the legal merits of the objection. Our classification and ordering of the objections are meant as an aid to the reader, not substantive commentary. Our choice of representative quotations is not meant as an endorsement of any particular filer’s arguments. Similarly, inclusion of changes from the amended settlement does not represent a judgment about whether the changes address the relevant objection.

Digging into Data Challenge Projects Funded

JISC has announced that eight projects have been awarded Digging into Data Challenge grants.

Here's an excerpt from the press release:

"Data mining and analysis are not just for scientists" is the message coming strongly out of an international JISC-funded competition, the "Digging into Data Challenge."

Entrants have been challenged to answer the question "what would you do with a million books? Or a million pages of newspapers? Or a million photographs of artworks?" That is, how can analysis done over immense quantities of digital data be employed in humanities and social science research?

Eight international research teams from the UK, US and Canada will be using a variety of data analysis tools to demonstrate that techniques currently used in the sciences can open new avenues for scholarship in the humanities and social sciences.

The winners of the competition are announced today by the four leading research agencies sponsoring the competition: JISC, the Social Sciences and Humanities Research Council (SSHRC) of Canada, the National Endowment for the Humanities (NEH) and the National Science Foundation (NSF), both of the United States.

Investment from the four agencies together amounts to over a million pounds, allowing new links to be forged across the different countries, as well as breaking down disciplinary boundaries.

Here are the funded projects:

  • Data Mining with Criminal Intent: George Mason University, University of Alberta, and University of Hertfordshire
  • Digging into Image Data to Answer Authorship Related Questions: Michigan State University, University of Illinois, Urbana-Champaign, and University of Sheffield
  • Digging into the Enlightenment: Mapping the Republic of Letters: University of Oklahoma, University of Oxford, and Stanford University
  • Harvesting Speech Datasets for Linguistic Research on the Web: McGill University and Cornell University
  • Mining a Year of Speech: University of Oxford and University of Pennsylvania
  • Railroads and the Making of Modern America—Tools for Spatio-Temporal Correlation, Analysis, and Visualization: University of Portsmouth and University of Nebraska-Lincoln
  • Structural Analysis of Large Amounts of Music Information: University of Illinois, Urbana-Champaign, University of Southampton, and McGill University
  • Towards Dynamic Variorum Editions: Mount Allison University, Imperial College, London, and Tufts University

A Guide for the Perplexed Part III: The Amended Settlement Agreement

The American Library Association, the Association of Research Libraries, and the Association of College and Research Libraries have released A Guide for the Perplexed Part III: The Amended Settlement Agreement.

Here's an excerpt from the press release:

The guide describes the major changes in the amended settlement agreement (ASA), submitted to the Court by Google, the Authors Guild, and the Association of American Publishers on November 13, 2009, with emphasis on those changes relevant to libraries.

While many of the amendments will have little direct impact on libraries, the ASA significantly reduces the scope of the settlement because it excludes most books published outside of the United States. In addition, the ASA provides the Book Rights Registry the authority to increase the number of free public access terminals in public libraries that had initially been set at one per library building, among other changes.

Looking ahead, the Court has accepted the parties’ recommended schedule and set January 28, 2010, as the deadline for class members to opt out of the ASA or to file objections, and February 4, 2010, as the deadline for the Department of Justice to file its comments. The Court will hold the fairness hearing on February 18, 2010.

Preliminary Approval Granted for Amended Google Book Search Settlement

US District Court Judge Denny Chin has granted preliminary approval of the amended Google Book Search Settlement.

Here's the order.

Read more about it at "Judge Gives Preliminary Approval to Google Deal, Sets Feb. 18 for Final Hearing" and "Judge Sets February Hearing for New Google Books Deal."

Google Book Search Settlement Amended

An amended version of the Google Book Search Settlement has been filed by the AAP, the Authors Guild, and Google with the U.S. District Court for the Southern District of New York.

The complete amended agreement is available from Google as a Zip file.

Exhibit 1 provides the primary text of the amended settlement agreement.

An overview of the amended settlement agreement is available, as is an FAQ.

Read more about it at "Google Books Settlement Sets Geographic, Business Limits"; "Is the Google Books Settlement Worth the Wait?"; and "Terms of Digital Book Deal with Google Revised."

Stanford University Preparing Proposal for Text Mining Center Providing Access to 30 Million Digitized Books Plus Highwire Journals

In "Possible Text Mining Opportunity at Stanford," Matthew Jockers describes a research proposal being developed at Stanford University for a text mining center that would provide access to 30 million digitized books plus Highwire Journals.

Here's an excerpt:

As I'm sure many of you already know, Stanford has been closely involved with Google's book scanning project, and we (Stanford) are currently preparing a proposal for the creation of a text mining / analysis Center on campus. The core assets of the proposed Center would include all of the Google data (approx. 30 million books) plus all of our Highwire data and all of our licensed content. We see a wide range of research opportunities for this collection, and we are envisioning a Center that would offer various levels of interaction with scholars. In particular we envision a "tiered" service model that would, on one hand, allow technically challenged researchers to work with Center staff in formulating research questions and, on the other, an opportunity for more technically advanced scholars to write their own algorithms and run them on the corpus. We are imagining the Center as both a resource and as a physical place, a place that will offer support to both internal and external scholars and graduate students.

HathiTrust Will Release Search Engine Indexing 1.5 Billion Pages from Digitized Books and Other Materials

Next month, the HathiTrust will release a full-text search engine indexing 1.5 billion pages from digitized books and other materials from 25 member research libraries.

An experimental version of the search engine is now available.

Read more about it at "HathiTrust Launching Full-Text Library of Books."

University of Michigan to Distribute Over 500,000 Digitized Books Using HP BookPrep POD Service

The University of Michigan Library will distribute over 500,000 rare and hard-to-find digitized books using the HP BookPrep POD service.

Here's an excerpt from the press release:

HP BookPrep — a cloud computing service that enables on-demand printing of books — brings new life to the traditional publishing model, making it possible to bring any book ever published back into print through an economical and sustainable service model.

As part of a growing movement to preserve and digitize historic content, major libraries are partnering with technology leaders to scan previously hard-to-find works using high-resolution photography. HP's process transforms these scans prior to printing by cleaning up some of the wear and tear that often is present in the originals.

HP BookPrep significantly drives down the cost of republishing books by eliminating the manual cleanup work that would otherwise be required. Based on imaging and printing technology from HP Labs, the company's central research arm, HP BookPrep automates the creation of high-quality, print-ready books from these raw book scans by sharpening text and images, improving alignment and coloration, and generating and adding covers.

People can now purchase high-quality print versions of public-domain, out-of-print books from the University of Michigan Library through HP BookPrep channels, including traditional and online retailers such as Amazon.com.

"People around the world still value reading books in print," said Andrew Bolwell, director, New Business Initiatives, HP. "HP BookPrep technology allows publishers to extend the life cycle of their books, removes the cost and waste burdens of maintaining inventory, and uses a full spectrum of technologies to deliver convenient access to consumers."

For publishers and content owners, HP BookPrep offers an opportunity to offer their full catalog of titles online, irrespective of demand. Because HP BookPrep is a web service that processes books as they are ordered, there is little upfront investment or risk as books are printed only after they are purchased, no matter the volume, eliminating the need for high carrying costs.

Consistently ranked as one of the top 10 academic research libraries in North America, the University of Michigan Library is a true repository for the human record. The print collection contains more than 7 million volumes, covering thousands of years of civilization. HP is collaborating with the university to eliminate barriers and increase access to content as part of an ongoing effort to make the concept of "out of print" a thing of the past.

"Our partnership with HP is a testament to the University of Michigan Library's commitment to increase public access to our library's collections and our continued innovative use of digitization," said Paul N. Courant, librarian and dean of libraries, University of Michigan. "We are excited that HP BookPrep can offer print distribution of the public domain works in our collection and help to provide broad access to works that have previously been hard to find outside the walls of our library."

The collaboration also builds upon HP's existing relationship with Applewood Books, a publisher of historical, Americana books. The company, which has been using HP BookPrep for the last year to republish hundreds of titles, also will distribute HP BookPrep's best-selling titles from the University of Michigan Library.

European Commission Adopts Communication on Copyright in the Knowledge Economy

The European Commission has adopted a Communication from the Commission: Copyright in the Knowledge Economy.

Here's an excerpt from the press release:

The European Commission today adopted a Communication on Copyright in the Knowledge Economy aiming to tackle the important cultural and legal challenges of mass-scale digitisation and dissemination of books, in particular of European library collections. The Communication was jointly drawn up by Commissioners Charlie McCreevy and Viviane Reding. Digital libraries such as Europeana ( http://www.europeana.eu ) will provide researchers and consumers across Europe with new ways to gain access to knowledge. For this, however, the EU will need to find a solution for orphan works, whose uncertain copyright status means they often cannot be digitised. Improving the distribution and availability of works for persons with disabilities, particularly the visually impaired, is another cornerstone of the Communication.

On adoption, Commissioners McCreevy and Reding stressed that the debate over the Google Books Settlement in the United States once again has shown that Europe could not afford to be left behind on the digital frontier.

"We must boost Europe as a centre of creativity and innovation. The vast heritage in Europe's libraries cannot be left to languish but must be made accessible to our citizens", Commissioner McCreevy, responsible for the Internal Market, stated.

Commissioner Reding, in charge of Information Society and Media, said: "Important digitisation efforts have already started all around the globe. Europe should seize this opportunity to take the lead, and to ensure that books digitisation takes place on the basis of European copyright law, and in full respect of Europe's cultural diversity. Europe, with its rich cultural heritage, has most to offer and most to win from books digitisation. If we act swiftly, pro-competitive European solutions on books digitisation may well be sooner operational than the solutions presently envisaged under the Google Books Settlement in the United States."

The Communication addresses the actions that the Commission intends to launch: digital preservation and dissemination of scholarly and cultural material and of orphan works, as well as access to knowledge for persons with disabilities. The challenges identified by the Commission today stem from last year’s public consultation on a Green Paper ( IP/08/1156 ), the Commission's High Level Group on Digital Libraries and the experiences gained with Europe's Digital Library Europeana ( IP/09/1257 ).

Google Books Settlement Status Conference Reports

Kenneth Crews and James Grimmelmann have posted blog reports about the Google Books Settlement status conference on October 7th. An amended agreement is expected to be filed by November 9th.

Here's an excerpt from Grimmelmann's post:

Judge Chin is trying to move this case, and his overall attitude seemed to be that he wants as clean a record as possible, and soon, so that he can act on it. That would incline me to think that he is hoping to be able to approve the settlement, or at the least to kick some of the legal issues upstairs to the Second Circuit for its guidance.

Read more about it at "Amended Google Deal Targeted for November 9."

No Contract Awarded for GPO Mass Digitization of All Federal Publications

The U.S. Government Printing Office has been unable to award a contract for the digitization of all Federal publications.

Here's an excerpt from the announcement:

In 2004, GPO proposed digitizing all retrospective Federal publications back to the earliest days of the Federal Government. Following the conduct of a pilot project in 2006 and its evaluation in 2007, we issued an RFP in 2008 for a cooperative relationship with a public or private sector participant or participants where the uncompressed, unaltered files created as a result of the conversion process would be delivered to GPO at no cost to the Government, for ingest into GPO's Federal Digital System (FDsys). Unfortunately, we were unable to make an award for this RFP in the allocated timeframe.

We are very disappointed in this setback, but are currently developing new digitization alternatives. In addition to our longstanding goal of serving as one of the repositories for electronic files through the submission of material to FDsys, our focus for digitization will be on coordinating projects among institutions, assisting in the establishment and implementation of preservation guidelines, maintaining a registry of digitization projects, and ensuring that there is appropriate bibliographic metadata for the titles in the collection.

Yale: "Digitization Project Derailed"

In "Digitization Project Derailed," Carol Hsin discusses the status of digitization efforts at the Yale University Library. (Thanks to ResourceShelf.)

Here's an excerpt:

Four months after Microsoft abruptly terminated its multi-million dollar book digitization deal with the University, Yale officials said they will have to wait for donations or grants to come in before they start another major book scanning project.

New York Public Library and Kirtas Technologies Make Half-Million Public Domain Books Available

The New York Public Library and Kirtas Technologies are making a half-million public domain books available for sale as digitized or printed copies.

Here's an excerpt from the press release:

Readers and researchers looking for hard-to-find books now have the opportunity to dip into the collections of one of the world's most comprehensive libraries to purchase digitized copies of public domain titles. Through their Digitize-on-Demand program, Kirtas Technologies has partnered with The New York Public Library to make 500,000 public domain works from the Library's collections available (to anyone in the world).

"New technology has allowed the Library to greatly expand access to its collections," said Paul LeClerc, President of The New York Public Library. "Now, for the first time, library users are able to order copies of specific items from our vast public domain collections that are useful to them. Additionally the program creates a digital legacy for future users of the same item and a revenue stream to support our operations. We are very pleased to participate in a program that is so beneficial to everyone involved."

Using existing information from NYPL's catalog records, Kirtas will make the library's public domain books available for sale through its retail site before they are ever digitized. Customers can search for a desired title on www.kirtasbooks.com and place an order for that book. When the order is placed, only then is it pulled from the shelf, digitized and made available as a high-quality reprint or digital file.

What makes this approach to digitization unique is that NYPL incurs no up-front printing, production or storage costs. It also provides the library with a self-funding, commercial model helping it to sustain its digitization programs in the future. Unlike other free or low-cost digitization programs, the library retains the rights and ownership to their own digitized content.

Mining a Million Scanned Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved OCR Grant Awarded

The NSF Division of Information & Intelligent Systems has awarded a grant to the Center for Intelligent Information Retrieval at UMass Amherst, the Perseus Digital Library Project at Tufts, and the Internet Archive for their "Mining a Million Scanned Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved OCR" proposal.

Here's an excerpt from the award abstract:

The Center for Intelligent Information Retrieval at UMass Amherst, the Perseus Digital Library Project at Tufts, and the Internet Archive are investigating large-scale information extraction and retrieval technologies for digitized book collections. To provide effective analysis and search for scholars and the general public, and to handle the diversity and scale of these collections, this project focuses on improvements in seven interlocking technologies: improved OCR accuracy through word spotting, creating probabilistic models using joint distributions of features, and building topic-specific language models across documents; structural metadata extraction, to mine headers, chapters, tables of contents, and indices; linguistic analysis and information extraction, to perform syntactic analysis and entity extraction on noisy OCR output; inferred document relational structure, to mine citations, quotations, translations, and paraphrases; latent topic modeling through time, to improve language modeling for OCR and retrieval, and to track the spread of ideas across periods and genres; query expansion for relevance models, to improve relevance in information retrieval by offline pre-processing of document comparisons; and interfaces for exploratory data analysis, to provide users of the document collection with efficient tools to update complex models of important entities, events, topics, and linguistic features. When applied across large corpora, these technologies reinforce each other: improved topic modeling enables more targeted language models for OCR; extracting structural metadata improves citation analysis; and entity extraction improves topic modeling and query expansion. The testbed for this project is the growing corpus of over one million open-access books from the Internet Archive.

The Google Books Settlement: Who Is Filing And What Are They Saying?

ACRL, ALA, and ARL have released The Google Books Settlement: Who Is Filing And What Are They Saying?

Here's an excerpt:

The Association of Research Libraries, the American Library Association, and the Association of College and Research Libraries have prepared this document to summarize in a few pages of charts some key information about the hundreds of filings that have been submitted to the federal district court presiding over the Google Books litigation. The Google Books Settlement is the proposed settlement of a class action lawsuit brought against Google, Inc. by groups and individuals representing authors and publishers who objected to Google’s large-scale scanning of in-copyright books to facilitate its Book Search service. The Settlement would bind not only the groups who sued Google, but also most owners of copyrights in printed books ("class-members"), unless they choose to opt out of the Settlement. Class-members who opt out retain their right to sue Google over its scanning activities, but will not be part of the collective licensing scheme created by the Settlement. Under the Settlement, participating class-members will get a one-time payment in compensation for past scanning as well as a share of Google’s future revenues from its scanning activities. A new, non-profit entity called the Book Rights Registry will represent rightsholders under the Settlement going forward.

Kenneth Crews on the U.S. Department of Justice Google Book Search Settlement Filing

In "Justice and Google Books: First Thoughts about the Government's Brief," Kenneth Crews, Director of the Copyright Advisory Office at Columbia University, discusses the U.S. Department of Justice Antitrust Division's filing on the Google Book Search Settlement.

Here's an excerpt:

The filing is remarkable for its lucid dissection of select issues. It is diplomatic, and it holds out repeated hope for the continued talks among the parties to the case. But clearly the DOJ does not like what it sees.

Google Book Settlement Fairness Hearing Postponed

U.S. District Judge Denny Chin has postponed the October 7th fairness hearing for the Google Book Search Settlement; however, a status conference will occur on that date.

Here's the ruling.

Read more about it at "Google Judge Calls 'Status Conference' for 7th October" and "Judge Agrees to Postpone Google Books Hearing."

"Copyright as Information Policy: Google Book Search from a Law and Economics Perspective"

Douglas Lichtman, Professor of Law at the UCLA School of Law, has self-archived "Copyright as Information Policy: Google Book Search from a Law and Economics Perspective" in SSRN.

Here's an excerpt:

The copyright system has long been understood to play a critical role when it comes to the development and distribution of creative work. Copyright serves a second fundamental purpose, however: it encourages the development and distribution of related technologies like hardware that might be used to duplicate creative work and software that can manipulate it. When it comes to issues of online infringement, then, copyright policy serves two goals, not one: protect the incentives copyright has long served to provide authors, and at the same time facilitate the continued emergence of innovative Internet services and equipment. In this Chapter, I use the Google Book Search litigation as a lens through which to study copyright law’s efforts to serve these two sometimes-competing masters. The Google case is an ideal lens for this purpose because both the technology implications and the authorship implications are apparent. With respect to the technology, Google tells us that the only way for it to build its Book Search engine is to have copyright law excuse the infringement that is today by design part of the project. With respect to authorship, copyright owners are resisting that result for fear that the infringement here could significantly erode both author control and author profitability over the long run. I myself am optimistic that copyright law can and will balance these valid concerns. The Chapter explains how, discussing not only the formal legal rules but also the economic intuitions behind them.

Pamela Samuelson: "DOJ Says No to Google Book Settlement"

In "DOJ Says No to Google Book Settlement," noted copyright expert Pamela Samuelson examines the U.S. Department of Justice's Google Book Search Settlement filing.

Here's an excerpt:

Among the most significant recommendations DOJ made for modifying the Proposed Settlement is one to ameliorate the risk of market foreclosure as to institutional subscriptions. DOJ suggests the parties should find a way to "provide some mechanism by which Google's competitors could gain comparable access to orphan works." That is, DOJ is recommending that Google, the Authors Guild and the publishers find a way to let firms such as Amazon.com and Microsoft get comparable licenses to out-of-print books, particularly to orphans. Google has previously denied that it was possible to include competitors in any license granted through the settlement. It will be interesting to see if the litigants want the settlement badly enough to conjure up a way to extend the license to firms other than Google.

University of California and Internet Archive Joint Mass Digitization Project Ends

The California Digital Library has announced that a joint mass digitization project by the University of California and the Internet Archive has ended.

Here's an excerpt from the announcement:

In 2005, the UC Libraries entered into a ground-breaking partnership with the Internet Archive to digitize public domain book collections from the University of California Libraries. With the generous support of external partners such as Microsoft, Yahoo, and the Alfred P. Sloan Foundation, our collaboration grew to encompass two major on-site scanning centers at NRLF and SRLF and scores of dedicated staff at the UC Regional Library Facilities and elsewhere throughout UC, producing an impressive corpus of close to 200,000 public domain books that are now available worldwide to students, scholars, and the general public. Today, five years and over 64 million pages later, we announce the conclusion of this phase of our Internet Archive collaboration and celebrate the work we have accomplished together.

UC's book digitization partnership with Internet Archive began in 2005 as a founding member of the Open Content Alliance. In February 2006, the first on-site digitization center comprising ten Scribe scanning machines was installed at NRLF; a second 10-station scanning center was opened at SRLF later that year. In August 2008, UC's on-site Internet Archive digitization center at NRLF was de-commissioned and relocated to an Internet Archive facility in San Francisco, leaving the SRLF scanning center as our only remaining on-site facility. One year later in August 2009, the UC-hosted Internet Archive scanning center housed at SRLF was closed and relocated to a new off-site facility in the Los Angeles area, marking the conclusion of a digitization project that has made available to the world an unparalleled digital corpus of public domain books drawn from the renowned collections of the University of California Libraries. . . .

While this phase of our work with Internet Archive is coming to an end, we look forward to continuing our collaboration for many years to come as opportunity and resources permit.

U.S. Department of Justice Files Objection to Google Book Search Settlement

The U.S. Department of Justice has filed an objection to the Google Book Search Settlement.

Here's an excerpt:

Nonetheless, the breadth of the Proposed Settlement—especially the forward-looking business arrangements it seeks to create—raises significant legal concerns. As a threshold matter, the central difficulty that the Proposed Settlement seeks to overcome—the inaccessibility of many works due to the lack of clarity about copyright ownership and copyright status—is a matter of public, not merely private, concern. A global disposition of the rights to millions of copyrighted works is typically the kind of policy change implemented through legislation, not through a private judicial settlement. If such a significant (and potentially beneficial) policy change is to be made through the mechanism of a class action settlement (as opposed to legislation), the United States respectfully submits that this Court should undertake a particularly searching analysis to ensure that the requirements of Federal Rule of Civil Procedure 23 ("Rule 23") are met and that the settlement is consistent with copyright law and antitrust law. As presently drafted, the Proposed Settlement does not meet the legal standards this Court must apply.

This Memorandum sets forth the concerns of the United States with respect to the current version of the Proposed Settlement; these concerns may be obviated by the parties' subsequent changes to the agreement. Commenters' objections to the Proposed Settlement fall into three basic categories: (1) claims that the Proposed Settlement fails to satisfy Rule 23; (2) claims that the Proposed Settlement would violate copyright law; and (3) claims that the Proposed Settlement would violate antitrust law. In the view of the United States, each category of objection is serious in isolation, and, taken together, raise cause for concern. . . .

This Court should reject the Proposed Settlement in its current form and encourage the parties to continue negotiations to modify it so as to comply with Rule 23 and the copyright and antitrust laws.

Read more about it at "Do Justice Department Objections Spell Doom for Google's Online Book Deal?," "DOJ: Court Should Reject Google Book Search Settlement," and "Government Urges Changes to Google Books Deal."