“Google & Books: An Exchange”

In "Google & Books: An Exchange," Paul N. Courant, Ann Kjellberg, J. D. McClatchy, Edward Mendelson, Margo Viscusi, Tappan Wilder et al. have commented on Robert Darnton's "Google & the Future of Books," and Darnton has replied.

Here's an excerpt:

[Darnton] Monopolies tend to charge monopoly prices. I agree that the parallel between the pricing of digital and periodical materials isn't perfect, but it is instructive. If the readers of a library become so attached to Google's database that they cannot do without it, the library will find it extremely difficult to resist stiff increases in the price for subscribing to it. As happened when the publishers of periodicals forced up their prices, the library may feel compelled to cover the increased cost by buying fewer books. Exorbitant pricing for Google's service could produce the same effect as the skyrocketing of periodical prices: reduced acquisitions of monographs, a further decline in monograph publishing by university presses, and fewer opportunities for young scholars to publish their research and get ahead in their careers.

The Google Library Project: Is Digitization for Purposes of Online Indexing Fair Use Under Copyright Law?

The Congressional Research Service has released The Google Library Project: Is Digitization for Purposes of Online Indexing Fair Use Under Copyright Law? (Thanks to ResourceShelf.)

Here's an excerpt:

The Google Book Search Library Project, announced in December 2004, raised important questions about infringing reproduction and fair use under copyright law. Google planned to digitize, index, and display "snippets" of print books in the collections of five major libraries without the permission of the books' copyright holders, if any. Authors and publishers owning copyrights to these books sued Google in September and October 2005, seeking to enjoin and recover damages for Google's alleged infringement of their exclusive rights to reproduce and publicly display their works. Google and proponents of its Library Project disputed these allegations. They essentially contended that Google's proposed uses were not infringing because Google allowed rights holders to "opt out" of having their books digitized or indexed. They also argued that, even if Google's proposed uses were infringing, they constituted fair uses under copyright law.

The arguments of the parties and their supporters highlighted several questions of first impression. First, does an entity conducting an unauthorized digitization and indexing project avoid committing copyright infringement by offering rights holders the opportunity to "opt out," or request removal or exclusion of their content? Is requiring rights holders to take steps to stop allegedly infringing digitization and indexing like requiring rights holders to use meta-tags to keep search engines from indexing online content? Or do rights holders employ sufficient measures to keep their books from being digitized and indexed online by publishing in print? Second, can unauthorized digitization, indexing, and display of "snippets" of print works constitute a fair use? Assuming unauthorized indexing and display of "snippets" are fair uses, can digitization claim to be a fair use on the grounds that apparently prima facie infringing activities that facilitate legitimate uses are fair uses?

On October 28, 2008, Google, authors, and publishers announced a proposed settlement, which, if approved by the court, could leave these and related questions unanswered. However, although a court granted preliminary approval to the settlement on November 17, 2008, final approval is still pending. Until final approval is granted, any rights holder belonging to the proposed settlement class—which includes "all persons having copyright interests in books" in the United States—could object to the agreement. The court could also reject the agreement as unfair, unreasonable, or inadequate. Moreover, even assuming final court approval, future cases may raise similar questions about infringing reproduction and fair use.
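The "meta-tags" analogy in the CRS excerpt refers to the web convention by which a page opts out of search-engine indexing by carrying a tag such as `<meta name="robots" content="noindex">`, which crawlers are expected to honor. Below is a minimal Python sketch of how a crawler might check for that opt-out. It is not any search engine's actual code. It also shows why the analogy puts the burden on the rights holder, who must add the tag before the crawler will stay away.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def may_index(html):
    """Return False if the page opts out of indexing via a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & parser.directives)
```

The default is permissive: a page with no tag may be indexed, which mirrors the opt-out model the CRS report questions.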

CDL Releases Self-Guided Tutorial for the eXtensible Text Framework

The California Digital Library has released a self-guided tutorial for its eXtensible Text Framework (XTF).

Here's an excerpt from the press release:

XTF is an open source, highly customizable piece of software supporting the search, browse, and display of heterogeneous digital content and offering efficient and practical methods for creating customized end-user interfaces for distinct digital collections. The tutorial provides guidance for implementing and customizing XTF, from core functionality to overall look and feel. . . .

The tutorial comes with a complete XTF package that is ready to run when uncompressed; no other installation is required. It contains nine modules spanning the most powerful and popular features, including how to:

  • Add new content
  • Change metadata
  • Change logo and colors
  • Increase significance of titles in ranking hits
  • Customize and enable default status of advanced search
  • Change fields displayed in search results
  • Enable structural searching
  • Create a hierarchical facet
  • Change footnote behavior
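Several of these modules, such as increasing the significance of titles in ranking hits, come down to field weighting. The Python sketch below illustrates field boosting in general. It is not XTF's configuration or API: each field's term matches are multiplied by a hypothetical per-field boost before being summed, so raising the title boost lifts documents whose titles match the query.

```python
from collections import Counter

# Hypothetical weights: a title match counts three times as much as a body match.
FIELD_BOOSTS = {"title": 3.0, "text": 1.0}


def score(doc, query_terms, boosts=FIELD_BOOSTS):
    """Sum of term-match counts per field, each multiplied by that field's boost."""
    total = 0.0
    for field, boost in boosts.items():
        counts = Counter(doc.get(field, "").lower().split())
        total += boost * sum(counts[term.lower()] for term in query_terms)
    return total


def rank(docs, query_terms, boosts=FIELD_BOOSTS):
    """Return docs ordered from highest to lowest boosted score."""
    return sorted(docs, key=lambda d: score(d, query_terms, boosts), reverse=True)
```

With equal boosts, a document that repeats the query word in its body can outrank one that has it only in its title. The title boost reverses that order.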

ACRL, ALA, and ARL Will File Google Book Search Settlement Amicus Brief

The American Library Association, the Association of College and Research Libraries, and the Association of Research Libraries will file an amicus brief authored by Jonathan Band about the Google Book Search Settlement.

Read more about it at "Library Organizations to File Amicus Brief in Google Book Search Settlement."

E-Book Duopoly?: Chairman of the Board of the Association of American Publishers on the Google Book Search Settlement

Richard Sarnoff, Chairman of the Board of the Association of American Publishers, discussed the Google Book Search Copyright Class Action Settlement at Princeton University's Center for Information Technology Policy last week.

Timothy B. Lee reports on his comments in "Publisher Speculates about Amazon/Google E-Book 'Duopoly'."

Walt Crawford on the Google Book Search Settlement

The latest issue of Cites & Insights: Crawford at Large is dedicated to an in-depth (30-page) look at the Google Book Search Copyright Class Action Settlement.

Here's an excerpt:

The agreement could be a lot worse. The outcome could also be a lot better. I'm sure Google would agree with both statements, as it finds itself in businesses where it has neither expertise nor much chance of advertising-level profits. At the same time, the copyright maximalists didn't quite win this round. We'll almost certainly get somewhat better access to several million OP books—and will have to hope (and work to see) that the price (monetary and otherwise) isn't too high.

UK's Intute Repository Search Project Releases Two Search Engines for Testing

Supported by JISC funding, the Intute Repository Search project is developing increasingly sophisticated search capabilities for document discovery in UK repositories, and it has released two search engines for testing (conceptual search and text-mining-based search).

Here's an excerpt from the press release:

Search services harvest the metadata and full-text out-put from institutional repositories, making the aggregated content searchable and browsable via a single interface. Intute Repository Search currently searches over 95 UK institutional repositories that are taken from the Directory of Open Access Repositories, OpenDOAR.

The development path of this project involves simple metadata search, full-text indexing of documents, text-mining of full-text documents, automatic subject classification, term-based document classification, query expansion, clustering of results and browsing/visualisation of the search results. User group requirements have been integrated into the project's development iterations to ensure that the project adequately reflects what researchers want from a service such as Intute Repository Search.

Two complementary advanced search and browse services have been developed for user testing. One is Autonomy IDOL (www.autonomy.com/content/Products/products-idol-server/index.en.html) and the other is using components developed by NaCTeM (www.nactem.ac.uk).

Autonomy IDOL relates to the conceptual feature of the service. This allows users to search for documents most closely matched to their query, read the overview and abstract of those documents and also have the opportunity to view documents relating to the query's search results. The result is a richer contextual search facility for users who want to view documents that are ranked according to their relation to the query.

NaCTeM has developed the text mining component. This allows users to take advantage of the TerMine service (www.nactem.ac.uk/software/termine/) among others, to automatically discover term associations within texts that are harvested from UK HE institutional repositories. By extracting information that would have otherwise been difficult or impossible to identify in a large number of documents, users can view documents that are linked with each other via salient concepts in a way that may lead to the answer of existing research questions or the creation of new ones. This then allows for a more meaningful and personalised search facility for users who are looking for specific patterns and connections between terms, within the collective resource of Intute Repository Search.
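The press release does not say how TerMine ranks terms. TerMine is generally described as based on the C-value method, which scores a multi-word candidate a as log2|a| · f(a), where |a| is its length in words and f(a) its frequency. When a also appears nested inside longer candidates, f(a) is first reduced by the mean frequency of those longer candidates. Below is a simplified Python sketch of that idea. It is not TerMine's code: raw n-grams stand in for linguistically filtered candidates, and `link_documents` is a hypothetical helper that links documents through the terms they share, echoing the "salient concepts" linking described above.

```python
import math
from collections import Counter
from itertools import combinations


def ngrams(words, n):
    """All contiguous n-word sequences in a token list."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def contains(text, term):
    """Whole-word check that `term` occurs in `text`."""
    return f" {term} " in f" {text.lower()} "


def c_value(texts, min_len=2, max_len=3):
    """Simplified C-value scores for multi-word candidate terms.

    Candidates are raw n-grams; a real term extractor would first
    filter them with part-of-speech patterns and a stop list.
    """
    freq = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(min_len, max_len + 1):
            freq.update(ngrams(words, n))

    scores = {}
    for term, f in freq.items():
        length = len(term.split())
        # Frequencies of longer candidates that contain this term.
        nested_in = [freq[b] for b in freq
                     if len(b.split()) > length and contains(b, term)]
        adjusted = f - sum(nested_in) / len(nested_in) if nested_in else f
        scores[term] = math.log2(length) * adjusted
    return scores


def top_terms(scores, k):
    """The k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def link_documents(docs, terms):
    """Map each pair of document ids to the extracted terms they share."""
    links = {}
    for a, b in combinations(sorted(docs), 2):
        shared = [t for t in terms if contains(docs[a], t) and contains(docs[b], t)]
        if shared:
            links[(a, b)] = shared
    return links
```

Nested discounting is what keeps a fragment like "access repository" from outscoring the full phrase "open access repository" that it almost always appears inside.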

ALA, ARL, and ACRL Meeting on Google Book Search Settlement

In "ALA, ARL, ACRL Host Meeting of Experts to Discuss Google Book Search Settlement," District Dispatch reports on the numerous questions raised about the Google Book Search Settlement in a recent meeting on that topic.

Here's an excerpt:

  • Access. What will the settlement mean for protecting the public’s ability to access and use digital resources from the nation’s libraries? Since the Book Rights Registry established as a condition of the settlement will represent the interests of the authors and publishers, who will represent the interests of libraries and the public? What are the financial implications of participation? Could the settlement create a monopoly that threatens the mission of libraries by raising the prices to an unreasonable level that limits public access?
  • Intellectual freedom. Are there academic freedom issues to consider? What are the implications of Google’s ability to remove works at its discretion? Will there be notification of their removal? What are the issues regarding possible access and use restrictions on the Research Corpus?
  • Equitable treatment. Since not all libraries are addressed in the settlement, what impact will it have on the diverse landscape of libraries? In light of tight economic times, will this negatively affect libraries with lean budgets? Will it expand the digital divide?
  • Terms of use. Under the terms of the agreement, will library users continue to enjoy the same rights to information under copyright and other laws? Will the settlement impact the legal discussions and interpretations of library exceptions that allow for library lending, limited copying and preservation?

“How to Improve the Google Book Search Settlement”

James Grimmelmann, Associate Professor at New York Law School, has made available "How to Improve the Google Book Search Settlement" in the Berkeley Electronic Press' Selected Works.

Here's the abstract:

The proposed settlement in the Google Book Search case should be approved with strings attached. The project will be immensely good for society, and the proposed deal is a fair one for Google, for authors, and for publishers. The public interest demands, however, that the settlement be modified first. It creates two new entities—the Books Rights Registry Leviathan and the Google Book Search Behemoth—with dangerously concentrated power over the publishing industry. Left unchecked, they could trample on consumers in any number of ways. We the public have a right to demand that those entities be subject to healthy, pro-competitive oversight, and so we should.

Clarifications about the Michigan/OCLC OAIster Deal

Dorothea Salo has posted "The Straight Story on OAIster and Its Move" on Caveat Lector in which the University of Michigan Library's Katrina Hagedorn answers questions about the future of OAIster.

Here's an excerpt:

Q. Once oaister.org ceases to exist, there will be no way to search the harvested records for free except through worldcat.org, is that right?

A. I think those details haven’t been hammered out yet. Worldcat.org is one choice, yes. There will be likely be other products and services, and it’s likely you’ll be able to limit to just oaister records (for what that’s worth).

University of Michigan and OCLC Form OAIster Partnership

The University of Michigan and OCLC will jointly support the OAIster search engine for open access documents.

Here's an excerpt from the press release:

Launched in 2002 with grant support from the Andrew W. Mellon Foundation, OAIster was developed to test the feasibility of building a portal to open archive collections using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAIster has since grown to become one of the largest aggregations of records pointing to open archive collections in the world with over 19 million records contributed by over 1,000 organizations worldwide.

Under the partnership, OAIster.org will continue to function as the public interface to OAIster collections, through funding provided by OCLC to the University of Michigan. Later in 2009, metadata harvesting operations will transfer from the University of Michigan to OCLC. . . .

Starting in late January 2009, while OAIster continues to be freely available at the www.oaister.org Web site, OCLC will host a version of OAIster on OCLC's FirstSearch platform and make it available through subscriptions to the FirstSearch Base Package at no additional charge.
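OAI-PMH, which the release credits as the basis of OAIster, is a simple HTTP protocol. A harvester sends requests such as `verb=ListRecords&metadataPrefix=oai_dc` to a repository's base URL and follows `resumptionToken` values until an empty token marks the end of the list. Below is a minimal Python sketch of that loop. It assumes the repository serves unqualified Dublin Core and ignores OAI-PMH error responses, deleted records, and the retry and backoff handling a production harvester would need. The base URL in the usage is a placeholder, and `fetch` is injectable so the parsing can be exercised offline.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return ([(identifier, title), ...], next_token_or_None) for one response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")
        records.append((ident, title))
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def _default_fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url, fetch=_default_fetch):
    """Yield (identifier, title) pairs across all pages of a ListRecords list."""
    token = None
    while True:
        records, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from records
        if not token:
            break
```

A call such as `list(harvest("http://repository.example.org/oai"))` would walk every page of that hypothetical repository.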

“Editorial: Google Deal or Rip-Off?”

In "Editorial: Google Deal or Rip-Off?," Francine Fialkoff, Library Journal Editor-in-Chief, takes a hard look at the Google-Association of American Publishers/Authors Guild copyright settlement.

Here's an excerpt:

Clearly, the public had little standing in the negotiations that led to the recent agreement in the class-action lawsuit against Google for scanning books from library shelves. . . . Well, the suit was never about the public interest but about corporate interests, and librarians did not have much power at the bargaining table, no matter how hard those consulted pushed. While there are many provisions in the document that specify what libraries can and can't do and portend greater access, ultimately, it is the restrictions that scream out at us from the miasma of details.

Other perspectives can be found in my recently updated Google Book Search Bibliography, Version 3.

CiteSeerX and SeerSuite: Harvester + Search Engine + AI

In "CiteSeerX and SeerSuite—Adding to the Semantic Web," Avi Rappoport provides an overview of the beta versions of CiteSeerX and its open source, Java-based counterpart, SeerSuite.

Here's an excerpt:

Building on that experience, CiteSeerX is a completely new system, re-architected for scaling and modularity, to handle increasing demands from both researchers and digital library programmatic interfaces. The system uses artificial intelligence, machine learning, support vector machines, and other techniques to recognize and extract metadata for the articles found. It now uses the Lucene search engine and supports standards such as the Open Archives Initiative (OAI), including metadata browsing, and Z39.50. CiteSeerX has a simple but powerful internal structure for documents and citations. If it cannot access a document cited, it creates a virtual document as a place holder, which can then be filled when the document is available.
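The placeholder mechanism described in the excerpt can be sketched as a small citation index. This Python version illustrates the technique under stated assumptions and is not CiteSeerX code. Citations are matched to documents by a naively normalized title (CiteSeerX's actual citation matching is presumably more involved), and `CitationIndex` and `add_document` are hypothetical names. The key point is that a placeholder accumulates incoming citations before its full text exists, so nothing is lost when the real document arrives.

```python
from dataclasses import dataclass, field


def normalize(title):
    """Naive citation key: lowercased title with collapsed whitespace."""
    return " ".join(title.lower().split())


@dataclass
class Document:
    key: str
    title: str
    virtual: bool = True  # True until the document itself has been acquired
    cited_by: set = field(default_factory=set)


class CitationIndex:
    def __init__(self):
        self.docs = {}

    def _get_or_create(self, title):
        key = normalize(title)
        if key not in self.docs:
            # Cited but not yet available: create a virtual placeholder.
            self.docs[key] = Document(key, title)
        return self.docs[key]

    def add_document(self, title, cited_titles):
        """Add a real document and record each citation it makes."""
        doc = self._get_or_create(title)
        doc.virtual = False  # fills the placeholder if one existed
        doc.title = title
        for cited in cited_titles:
            self._get_or_create(cited).cited_by.add(doc.key)
        return doc
```

Adding "Paper A" with a citation to "Paper B" creates a virtual "Paper B" that already records one incoming citation. When "Paper B" is later added for real, the same record is filled in rather than duplicated.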

Google Book Search Bibliography, Version 3

The Google Book Search Bibliography, Version 3 is now available.

This bibliography presents selected English-language articles and other works that are useful in understanding Google Book Search. It primarily focuses on the evolution of Google Book Search and the legal, library, and social issues associated with it. Where possible, links are provided to works that are freely available on the Internet, including e-prints in disciplinary archives and institutional repositories. Note that e-prints and published articles may not be identical.

A Guide for the Perplexed: Libraries & the Google Library Project Settlement

ARL and ALA have released A Guide for the Perplexed: Libraries & the Google Library Project Settlement.

Here's an excerpt from the press release:

The guide is designed to help the library community better understand the terms and conditions of the recent settlement agreement between Google, the Authors Guild, and the Association of American Publishers concerning Google’s scanning of copyrighted works. Band notes that the settlement is extremely complex and presents significant challenges and opportunities to libraries. The guide outlines and simplifies the settlement’s provisions, with special emphasis on the provisions that apply directly to libraries.

Reference Extract: The Librarian-Recommendation-Weighted Search Engine

OCLC, the School of Information Studies at Syracuse University, and the University of Washington Information School have received a $100,000 grant from the John D. and Catherine T. MacArthur Foundation to plan a librarian-recommendation-weighted search engine called Reference Extract.

Here's an excerpt from the press release:

"Sometimes, the simplest ideas are the most powerful," said Dr. Mike Eisenberg, Dean Emeritus and Professor at the Information School of the University of Washington and a lead on the project. "The best search engines are great for basic search, but sometimes the Web site results lack credibility in terms of trust, accuracy and reliability. So, who can help? Librarians. If a librarian recommends a Web site, you can be pretty sure that it's credible. RefEx will take hundreds of thousands of librarian recommendations and use them in a full-scale search engine."

Reference Extract is envisioned as a Web search experience similar to those provided by the world's most popular search engines. However, unlike other search engines, Reference Extract will be built for maximum credibility of search results by relying on the expertise of librarians. Users will enter a search term and receive results weighted toward sites most often used by librarians at institutions such as the Library of Congress, the University of Washington, the State Library of Maryland, and over 2,000 other libraries worldwide.

As part of the planning process, participants are reaching out to partners in libraries, technology organizations and research institutions. "The only way this will work is by making a project of an entire community," said Dr. R. David Lankes, Director of the Information Institute of Syracuse and Associate Professor at Syracuse University's School of Information Studies. "Web searchers get to tap into the incredible skill and knowledge of the library community, while librarians will be able to serve users on a whole new scale. This work follows on previous credibility work supported by the MacArthur Foundation, most notably the Credibility Commons (http://credibilitycommons.org/)." . . .

The Reference Extract project will hold a series of meetings and consultations over the coming months. The team is eager to build a business plan and technology architecture to benefit users and the library community alike. Those interested in providing input on the project and learning more can visit the project Web site at http://digref.org.
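The press release does not say how librarian recommendations will be combined with ordinary relevance ranking. One simple possibility is sketched here in Python as a purely hypothetical illustration. It multiplies a base relevance score by a boost that grows with the logarithm of how often librarians have recommended a site, so that heavily recommended sites rise without swamping relevance entirely. The `weight` parameter, the `rerank` name, and the data shapes are assumptions.

```python
import math


def rerank(results, librarian_counts, weight=0.5):
    """Reorder (url, base_score) pairs using librarian recommendation counts.

    librarian_counts maps a url to how many times librarians recommended it.
    log1p dampens the boost so that 1,000 recommendations are not 1,000x 1.
    """
    def combined(item):
        url, base = item
        return base * (1 + weight * math.log1p(librarian_counts.get(url, 0)))

    return sorted(results, key=combined, reverse=True)
```

Setting `weight=0.0` recovers the original relevance order, which makes the librarian signal easy to tune or switch off during testing.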

Georgia Harper on the Google-AAP/AG Copyright Settlement

In "The LJ Academic Newswire Newsmaker Interview: Georgia Harper," Harper, Scholarly Communications Advisor at the University Libraries of the University of Texas at Austin, discusses the Google-AAP/AG copyright settlement and the part that research libraries played in it. Also see her blog posting ("Google Book Search—and Buy").

Here's an excerpt:

Brewster Kahle has chastised public libraries for working with Google under a cloak of secrecy. Can libraries realistically refuse NDAs?

I think Kahle’s point, and others raise this point too, is more about the deleterious effects of secrecy on the negotiation process itself. Secrecy tends to be isolating. If you don’t consult with your colleagues at other institutions, your leverage may be diminished. Of course, a library could also hire a business and/or legal consultant to help, and bind the consultant to the NDA. Yes, Kahle has identified a very thorny problem, but it’s one we can ameliorate. I don’t think it’s workable simply not to do business with companies whose assets are ideas and information just because they feel compelled to protect them through secrecy. Either way, consultation does increase information, and information is power—in fact, the power of information is also the source of the [NDA] problem in the first place.

Google-AAP/AG Copyright Settlement: Vaidhyanathan Questions, Google Answers

On October 28th, Siva Vaidhyanathan posed some questions to Google about its copyright settlement with the Association of American Publishers and the Authors Guild ("My Initial Take on the Google-Publishers Settlement"). Now, Google has replied ("Some Initial Answers to My Initial Questions about Google Book Search and the Settlement").