Historians’ Work Disrupted When Paper of Record Digital Archive Vanishes after Google Purchase

After Google purchased the Paper of Record digital archive, it took the site down, upsetting historians who relied on the collection of older newspapers. Although the site will be temporarily restored with Google's permission, the incident raises issues about the permanence and reliability of scholarly digital archives.

Read more about it at "Digital Archives That Disappear" and "'Paper of Record' Disappears, Leaving Historians in the Lurch."

Google Labs Releases Google News Timeline and Similar Images

Google Labs has released Google News Timeline and Similar Images.

Here's an excerpt from the press release:

Image Search is a tool you can use to find just about any kind of image, but it can sometimes be difficult to find the right image if you can't describe it in words. The new Similar Images feature was developed with just this in mind. Using it you can now find images that look like an existing result simply by clicking on a link. Using visual similarity, you don't have to refine the text of your search, instead, you can just click on the link of an image you like. For example, if you search for [jaguar], you can use the "Similar images" link to quickly narrow your search.

“The Google Book Search Settlement: Ends, Means, and the Future of Books”

James Grimmelmann of the New York Law School has self-archived "The Google Book Search Settlement: Ends, Means, and the Future of Books" in SSRN.

Here's an excerpt:

The settlement tackles the orphan works problem, but through the judicial process. Laundering orphan works legislation through a class action lawsuit is both a brilliant response to legislative inaction and a dangerous use of the judicial power. Many of the public interest safeguards that would have been present in the political arena are attenuated in a seemingly private lawsuit; the lack of such safeguards is evident in the terms of the resulting settlement. The solution is to reinsert these missing public interest protections into the settlement.

Pamela Samuelson: “Legally Speaking: The Dead Souls of the Google Booksearch Settlement”

Pamela Samuelson, Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley, has posted an eprint of "Legally Speaking: The Dead Souls of the Google Booksearch Settlement" on O'Reilly Radar.

Here's an excerpt:

This column argues that the proposed settlement of this lawsuit is a privately negotiated compulsory license primarily designed to monetize millions of orphan works. It will benefit Google and certain authors and publishers, but it is questionable whether the authors of most books in the corpus (the "dead souls" to which the title refers) would agree that the settling authors and publishers will truly represent their interests when setting terms for access to the Book Search corpus.

(Note: See the Wikipedia entry on Nikolai Gogol's Dead Souls.)

“The Google Book Search Settlement: A New Orphan-Works Monopoly?”

Randal C. Picker of the University of Chicago Law School has self-archived "The Google Book Search Settlement: A New Orphan-Works Monopoly?" in SSRN.

Here's an excerpt:

The settlement agreement is exceeding complex but I have focused on three issues that raise antitrust and competition policy concerns. First, the agreement calls for Google to act as agent for rights holders in setting the price of online access to consumers. Google is tasked with developing a pricing algorithm that will maximize revenues for each of those works. Direct competition among rights holders would push prices towards some measure of costs and would not be designed to maximize revenues. As I think that level of direct coordination of prices is unlikely to mimic what would result in competition, I have real doubts about whether the consumer access pricing provision would survive a challenge under Section 1 of the Sherman Act.

Second, and much more centrally to the settlement agreement, the opt out class action will make it possible for Google to include orphan works in its book search service. Orphan works are works as to which the rightsholder can't be identified or found. That means that a firm like Google can't contract with an orphan holder directly to include his or her work in the service and that would result in large numbers of missing works. The opt out mechanism—which shifts the default from copyright's usual out to the class action's in—brings these works into the settlement. . . .

Third, there is a risk that approval by the court of the settlement could cause antitrust immunities to attach to the arrangements created by the settlement agreement. As it is highly unlikely that the fairness hearing will undertake a meaningful antitrust analysis of those arrangements, if the district court approves the settlement, the court should include a clause—call this a no Noerr clause—in the order approving the settlement providing that no antitrust immunities attach from the court's approval.

National Academies Makes Over 9,000 Reports Freely Available on Google Book Search

The National Academies have made over 9,000 reports freely available on Google Book Search.

Here's an excerpt from the press release:

The National Academies today announced the completion of the first phase of a partnership with Google to digitize the library's collection of reports from 1863 to 1997, making them available—free, searchable, and in full text—through Google Book Search. The Academies plan to have their entire collection of nearly 11,000 reports digitized by 2011. . . .

Prior to this project, the Academies digitized more than 4,000 books and made them available online through the National Academies Press; most of those can also be found in Google Book Search. However, researchers who needed to gain access to hard copies of older reports, part of a legacy collection in the library, could not always find what they wanted. Many of these reports exist as single copies, and the library feared potential damage or loss of this important collection. These older reports have been digitized and are now accessible through Google. In addition, the "digitizing of these materials will add another dimension to the preservation of our reports," said Harriston. The Academies hope that wider availability of its reports will be of use to scientists in developing countries, who often rely on the Internet to gather information.

Consumer Watchdog Challenges Google Book Search Settlement

Consumer Watchdog has sent a letter to Attorney General Eric Holder that challenges the terms of the Google Book Search Copyright Class Action Settlement.

Here's an excerpt from the press release:

The proposed settlement announced last year creates the nonprofit Book Rights Registry to manage book digital rights issues. Here are the deal’s two most troubling aspects, Consumer Watchdog said:

—A "most favored nation" clause guarantees Google the same terms that any future competitor might be offered. Under the most favored nation clause the registry would be prevented from offering more advantageous terms to, for example, Yahoo! or Microsoft, even if it thought better terms would be necessary to enable either to enter into the digital books business and provide competition to Google. It is inappropriate for the resolution of a class action lawsuit to effectively create an "anti-compete" clause, which precludes smaller competitors from entering a market. Given the dominance of Google over the digital book market, it would no doubt take more advantageous terms to allow another smaller competitor to enter the market.

—The settlement provides a mechanism for Google to deal with "orphan works." Orphan works are works under copyright, but with the rights holders unknown or not found. The danger of using such works is that a rights holder will emerge after the book has been exploited and demand substantial infringement penalties. The proposed settlement protects Google from such potentially damaging exposure, but provides no protection for others. This effectively is a barrier for competitors to enter the digital book business.

The most favored nation provision should be eliminated to remove barriers of entry and the orphan works provision should be extended to cover all who digitize books, Consumer Watchdog said.

Sony’s eBook Store to Offer Over a Half-Million Public Domain Books from Google

Sony's eBook store will offer over a half-million public domain e-books from Google.

Here's an excerpt from the press release:

At Sony’s eBook store (ebookstore.sony.com), a button on the front page leads to the books from Google, which people can transfer to their PRS-505 or PRS-700 Reader at no cost. The process is seamless for Reader owners who have an account at the store. Those new to the store will need to set up an account and download Sony’s free eBook Library software. To start, people can access more than a half-million public domain books from Google, boosting the available titles from the eBook Store to more than 600,000. . . .

Books from Google will feature an extensive list of traditional favorites, including "The Awakening," "A Connecticut Yankee in King Arthur’s Court," and "Black Beauty," as well as a number of items that can be more difficult for people to access. For example, literature lovers can find and read The Letters of Jane Austen in addition to "Sense and Sensibility" and "Emma." Also included are a number of titles in French, German, Italian, Spanish and other languages. People can search the full text of the collection, or they can browse by subject, author, or featured titles.

Peter Brantley on Orphan Works and the Google Book Search Settlement

In "The Orphan Monopoly," Peter Brantley, Executive Director for the Digital Library Federation, examines issues related to orphan works and the Google Book Search Copyright Class Action Settlement.

Here's an excerpt:

There is a lot to ponder: This is arguably a massive re-writing of copyright for books without any legislative input; Marybeth Peters (MBP), the U.S. Registrar of Copyrights, observed that the settlement essentially proposes a private agreement for compulsory licensing between a large class of IP holders and world’s largest search engine. The potential scope and policy ramifications are significant. MBP mentioned that there might be treaty implications under international conventions. And despite that, one of the most shocking of her statements was that the Copyright Office has not received a single inquiry from any of the 535 elected representatives of the people of the United States. Not. One.

“Orphan Works Legislation and the Google Settlement”

In "Orphan Works Legislation and the Google Settlement," Paul Courant discusses the possibility of legislation that would extend the treatment of orphan works in the Google Book Search Copyright Class Action Settlement to anyone.

Here's an excerpt:

But there is an obvious solution, one that was endorsed at the Columbia meeting by counsel for the Authors Guild, the AAP, and Google: Congress could pass a law, giving access to the same sort of scheme that Google and the BRR have under the Google Settlement to anyone. And they could pass some other law that makes it possible for people to responsibly use orphaned works, while preserving interests for the missing "parents" should they materialize. Jack Bernard and Susan Kornfield have proposed just such an architecture to "foster" these orphans. Google has also made a proposal that would be a huge improvement.

“Google & Books: An Exchange”

In "Google & Books: An Exchange," Paul N. Courant, Ann Kjellberg, J. D. McClatchy, Edward Mendelson, Margo Viscusi, Tappan Wilder et al. have commented on Robert Darnton's "Google & the Future of Books," and Darnton has replied.

Here's an excerpt:

[Darnton] Monopolies tend to charge monopoly prices. I agree that the parallel between the pricing of digital and periodical materials isn't perfect, but it is instructive. If the readers of a library become so attached to Google's database that they cannot do without it, the library will find it extremely difficult to resist stiff increases in the price for subscribing to it. As happened when the publishers of periodicals forced up their prices, the library may feel compelled to cover the increased cost by buying fewer books. Exorbitant pricing for Google's service could produce the same effect as the skyrocketing of periodical prices: reduced acquisitions of monographs, a further decline in monograph publishing by university presses, and fewer opportunities for young scholars to publish their research and get ahead in their careers.

The Google Library Project: Is Digitization for Purposes of Online Indexing Fair Use Under Copyright Law?

The Congressional Research Service has released The Google Library Project: Is Digitization for Purposes of Online Indexing Fair Use Under Copyright Law? (Thanks to ResourceShelf.)

Here's an excerpt:

The Google Book Search Library Project, announced in December 2004, raised important questions about infringing reproduction and fair use under copyright law. Google planned to digitize, index, and display "snippets" of print books in the collections of five major libraries without the permission of the books' copyright holders, if any. Authors and publishers owning copyrights to these books sued Google in September and October 2005, seeking to enjoin and recover damages for Google's alleged infringement of their exclusive rights to reproduce and publicly display their works. Google and proponents of its Library Project disputed these allegations. They essentially contended that Google's proposed uses were not infringing because Google allowed rights holders to "opt out" of having their books digitized or indexed. They also argued that, even if Google's proposed uses were infringing, they constituted fair uses under copyright law.

The arguments of the parties and their supporters highlighted several questions of first impression. First, does an entity conducting an unauthorized digitization and indexing project avoid committing copyright infringement by offering rights holders the opportunity to "opt out," or request removal or exclusion of their content? Is requiring rights holders to take steps to stop allegedly infringing digitization and indexing like requiring rights holders to use meta-tags to keep search engines from indexing online content? Or do rights holders employ sufficient measures to keep their books from being digitized and indexed online by publishing in print? Second, can unauthorized digitization, indexing, and display of "snippets" of print works constitute a fair use? Assuming unauthorized indexing and display of "snippets" are fair uses, can digitization claim to be a fair use on the grounds that apparently prima facie infringing activities that facilitate legitimate uses are fair uses?

On October 28, 2008, Google, authors, and publishers announced a proposed settlement, which, if approved by the court, could leave these and related questions unanswered. However, although a court granted preliminary approval to the settlement on November 17, 2008, final approval is still pending. Until final approval is granted, any rights holder belonging to the proposed settlement class—which includes "all persons having copyright interests in books" in the United States—could object to the agreement. The court could also reject the agreement as unfair, unreasonable, or inadequate. Moreover, even assuming final court approval, future cases may raise similar questions about infringing reproduction and fair use.

CDL Releases Self-Guided Tutorial for the eXtensible Text Framework

The California Digital Library has released a self-guided tutorial for its eXtensible Text Framework (XTF).

Here's an excerpt from the press release:

XTF is an open source, highly customizable piece of software supporting the search, browse, and display of heterogeneous digital content and offering efficient and practical methods for creating customized end-user interfaces for distinct digital collections. The tutorial provides guidance for implementing and customizing XTF, from core functionality to overall look and feel. . . .

The tutorial comes with a complete XTF package that is ready to run when uncompressed; no other installation is required. It contains nine modules spanning the most powerful and popular features, including how to:

  • Add new content
  • Change metadata
  • Change logo and colors
  • Increase significance of titles in ranking hits
  • Customize and enable default status of advanced search
  • Change fields displayed in search results
  • Enable structural searching
  • Create a hierarchical facet
  • Change footnote behavior

ACRL, ALA, and ARL Will File Google Book Search Settlement Amicus Brief

The American Library Association, the Association of College and Research Libraries, and the Association of Research Libraries will file an amicus brief authored by Jonathan Band about the Google Book Search Settlement.

Read more about it at "Library Organizations to File Amicus Brief in Google Book Search Settlement."

E-Book Duopoly?: Chairman of the Board of Association of American Publishers on the Google Book Search Settlement

Richard Sarnoff, Chairman of the Board of the Association of American Publishers, discussed the Google Book Search Copyright Class Action Settlement at Princeton University's Center for Information Technology Policy last week.

Timothy B. Lee reports on his comments in "Publisher Speculates about Amazon/Google E-Book 'Duopoly'."

Walt Crawford on the Google Book Search Settlement

The latest issue of Cites & Insights: Crawford at Large is dedicated to an in-depth (30-page) look at the Google Book Search Copyright Class Action Settlement.

Here's an excerpt:

The agreement could be a lot worse. The outcome could also be a lot better. I'm sure Google would agree with both statements, as it finds itself in businesses where it has neither expertise nor much chance of advertising-level profits. At the same time, the copyright maximalists didn't quite win this round. We'll almost certainly get somewhat better access to several million OP books—and will have to hope (and work to see) that the price (monetary and otherwise) isn't too high.

UK's Intute Repository Search Project Releases Two Search Engines for Testing

Supported by JISC funding, the Intute Repository Search project is developing increasingly sophisticated search capabilities for document discovery in UK repositories, and it has released two search engines for testing (a conceptual search and a text-mining-based search).

Here's an excerpt from the press release:

Search services harvest the metadata and full-text output from institutional repositories, making the aggregated content searchable and browsable via a single interface. Intute Repository Search currently searches over 95 UK institutional repositories that are taken from the Directory of Open Access Repositories, OpenDOAR.

The development path of this project involves simple metadata search, full-text indexing of documents, text-mining of full-text documents, automatic subject classification, term-based document classification, query expansion, clustering of results and browsing/visualisation of the search results. User group requirements have been integrated into the project's development iterations to ensure that the project adequately reflects what researchers want from a service such as Intute Repository Search.

Two complementary advanced search and browse services have been developed for user testing. One is Autonomy IDOL (www.autonomy.com/content/Products/products-idol-server/index.en.html) and the other is using components developed by NaCTeM (www.nactem.ac.uk).

Autonomy IDOL relates to the conceptual feature of the service. This allows users to search for documents most closely matched to their query, read the overview and abstract of those documents and also have the opportunity to view documents relating to the query's search results. The result is a richer contextual search facility for users who want to view documents that are ranked according to their relation to the query.

NaCTeM has developed the text mining component. This allows users to take advantage of the TerMine service (www.nactem.ac.uk/software/termine/) among others, to automatically discover term associations within texts that are harvested from UK HE institutional repositories. By extracting information that would have otherwise been difficult or impossible to identify in a large number of documents, users can view documents that are linked with each other via salient concepts in a way that may lead to the answer of existing research questions or the creation of new ones. This then allows for a more meaningful and personalised search facility for users who are looking for specific patterns and connections between terms, within the collective resource of Intute Repository Search.
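The first stage in the development path described above, simple metadata search over records aggregated from several repositories, can be illustrated with a small sketch. This is not Intute's implementation: the records, repository names, and the `build_index` and `search` helpers are all hypothetical, and a real service would harvest records over a protocol rather than hard-code them.

```python
from collections import defaultdict

# Hypothetical records standing in for metadata harvested from
# several institutional repositories (illustrative data only).
RECORDS = [
    {"id": "repo-a:1", "repository": "Repository A",
     "title": "Text mining for repository search"},
    {"id": "repo-b:7", "repository": "Repository B",
     "title": "Open access repository metadata"},
    {"id": "repo-b:9", "repository": "Repository B",
     "title": "Clustering search results"},
]


def build_index(records):
    """Map each lowercased title word to the ids of records containing it."""
    index = defaultdict(set)
    for record in records:
        for term in record["title"].lower().split():
            index[term].add(record["id"])
    return index


def search(index, query):
    """Return sorted ids of records whose titles contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(RECORDS)
    print(search(index, "repository"))         # records from both repositories
    print(search(index, "repository search"))  # narrower: both terms required
```

Because every record goes into one index regardless of its source repository, a single query spans all of them, which is the "single interface" idea in the excerpt. The later stages (full-text indexing, text mining, clustering) would build on this but are beyond the sketch.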

ALA, ARL, and ACRL Meeting on Google Book Search Settlement

In "ALA, ARL, ACRL Host Meeting of Experts to Discuss Google Book Search Settlement," District Dispatch reports on the numerous questions raised about the Google Book Search Settlement in a recent meeting on that topic.

Here's an excerpt:

  • Access. What will the settlement mean for protecting the public’s ability to access and use digital resources from the nation’s libraries? Since the Book Rights Registry established as a condition of the settlement will represent the interests of the authors and publishers, who will represent the interests of libraries and the public? What are the financial implications of participation? Could the settlement create a monopoly that threatens the mission of libraries by raising the prices to an unreasonable level that limits public access?
  • Intellectual freedom. Are there academic freedom issues to consider? What are the implications of Google’s ability to remove works at its discretion? Will there be notification of their removal? What are the issues regarding possible access and use restrictions on the Research Corpus?
  • Equitable treatment. Since not all libraries are addressed in the settlement, what impact will it have on the diverse landscape of libraries? In light of tight economic times, will this negatively affect libraries with lean budgets? Will it expand the digital divide?
  • Terms of use. Under the terms of the agreement, will library users continue to enjoy the same rights to information under copyright and other laws? Will the settlement impact the legal discussions and interpretations of library exceptions that allow for library lending, limited copying and preservation?

“How to Improve the Google Book Search Settlement”

James Grimmelmann, Associate Professor at New York Law School, has made available "How to Improve the Google Book Search Settlement" in the Berkeley Electronic Press' Selected Works.

Here's the abstract:

The proposed settlement in the Google Book Search case should be approved with strings attached. The project will be immensely good for society, and the proposed deal is a fair one for Google, for authors, and for publishers. The public interest demands, however, that the settlement be modified first. It creates two new entities—the Books Rights Registry Leviathan and the Google Book Search Behemoth—with dangerously concentrated power over the publishing industry. Left unchecked, they could trample on consumers in any number of ways. We the public have a right to demand that those entities be subject to healthy, pro-competitive oversight, and so we should.