A Guide for the Perplexed: Libraries & the Google Library Project Settlement

ARL and ALA have released A Guide for the Perplexed: Libraries & the Google Library Project Settlement, written by Jonathan Band.

Here's an excerpt from the press release:

The guide is designed to help the library community better understand the terms and conditions of the recent settlement agreement between Google, the Authors Guild, and the Association of American Publishers concerning Google’s scanning of copyrighted works. Band notes that the settlement is extremely complex and presents significant challenges and opportunities to libraries. The guide outlines and simplifies the settlement’s provisions, with special emphasis on the provisions that apply directly to libraries.

Reference Extract: The Librarian-Recommendation-Weighted Search Engine

OCLC, the School of Information Studies at Syracuse University, and the University of Washington Information School have received a $100,000 grant from the John D. and Catherine T. MacArthur Foundation to plan a librarian-recommendation-weighted search engine called Reference Extract.

Here's an excerpt from the press release:

"Sometimes, the simplest ideas are the most powerful," said Dr. Mike Eisenberg, Dean Emeritus and Professor at the Information School of the University of Washington and a lead on the project. "The best search engines are great for basic search, but sometimes the Web site results lack credibility in terms of trust, accuracy and reliability. So, who can help? Librarians. If a librarian recommends a Web site, you can be pretty sure that it's credible. RefEx will take hundreds of thousands of librarian recommendations and use them in a full-scale search engine."

Reference Extract is envisioned as a Web search experience similar to those provided by the world's most popular search engines. However, unlike other search engines, Reference Extract will be built for maximum credibility of search results by relying on the expertise of librarians. Users will enter a search term and receive results weighted toward sites most often used by librarians at institutions such as the Library of Congress, the University of Washington, the State Library of Maryland, and over 2,000 other libraries worldwide.

As part of the planning process, participants are reaching out to partners in libraries, technology organizations and research institutions. "The only way this will work is by making a project of an entire community," said Dr. R. David Lankes, Director of the Information Institute of Syracuse and Associate Professor at Syracuse University's School of Information Studies. "Web searchers get to tap into the incredible skill and knowledge of the library community, while librarians will be able to serve users on a whole new scale. This work follows on previous credibility work supported by the MacArthur Foundation, most notably the Credibility Commons (http://credibilitycommons.org/)." . . .

The Reference Extract project will hold a series of meetings and consultations over the coming months. The team is eager to build a business plan and technology architecture to benefit users and the library community alike. Those interested in providing input on the project and learning more can visit the project Web site at http://digref.org.
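The press release describes weighting results toward sites librarians recommend. As a purely illustrative sketch (not the actual Reference Extract design, which is still being planned), a recommendation-weighted ranker might boost each result's base relevance score by how often librarians have recommended its site:

```python
import math

def rank_results(results, recommendations):
    """results: list of (url, base_score); recommendations: url -> librarian count.

    Log damping keeps heavily recommended sites from completely
    drowning out query relevance.
    """
    def weighted(item):
        url, base = item
        return base * (1 + math.log1p(recommendations.get(url, 0)))
    return sorted(results, key=weighted, reverse=True)

# Hypothetical data: site "b" is slightly less relevant but widely recommended.
results = [("http://example.org/a", 0.9), ("http://example.org/b", 0.8)]
recs = {"http://example.org/b": 250}
ranked = rank_results(results, recs)
```

With these toy numbers, the heavily recommended site outranks the slightly more relevant one, which is the behavior the project describes.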

Georgia Harper on the Google-AAP/AG Copyright Settlement

In "The LJ Academic Newswire Newsmaker Interview: Georgia Harper," Harper, Scholarly Communications Advisor at the University Libraries of the University of Texas at Austin, discusses the Google-AAP/AG copyright settlement and the part that research libraries played in it. Also see her blog posting ("Google Book Search—and Buy").

Here's an excerpt:

Brewster Kahle has chastised public libraries for working with Google under a cloak of secrecy. Can libraries realistically refuse NDAs?

I think Kahle’s point, and others raise this point too, is more about the deleterious effects of secrecy on the negotiation process itself. Secrecy tends to be isolating. If you don’t consult with your colleagues at other institutions, your leverage may be diminished. Of course, a library could also hire a business and/or legal consultant to help, and bind the consultant to the NDA. Yes, Kahle has identified a very thorny problem, but it’s one we can ameliorate. I don’t think it’s workable simply not to do business with companies whose assets are ideas and information just because they feel compelled to protect them through secrecy. Either way, consultation does increase information, and information is power—in fact, the power of information is also the source of the [NDA] problem in the first place.

Google-AAP/AG Copyright Settlement: Vaidhyanathan Questions, Google Answers

On October 28th, Siva Vaidhyanathan posed some questions to Google about its copyright settlement with the Association of American Publishers and the Authors Guild ("My Initial Take on the Google-Publishers Settlement"). Now, Google has replied ("Some Initial Answers to My Initial Questions about Google Book Search and the Settlement").

"Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web"

Duncan Hull, Steve R. Pettifer, and Douglas B. Kell have published "Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web" in PLoS Computational Biology.

Here's the abstract:

Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as 'thought in cold storage,' and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.

Google Newspaper Digitization Project Announced

Google has announced a newspaper digitization project that will "make more old newspapers accessible and searchable online by partnering with newspaper publishers to digitize millions of pages of news archives."

Read more about it at "Bringing History Online, One Newspaper at a Time."

SRU Open Search: Open Source Customizable Interface for Displaying SRU-Formatted XML

The Institute for Research and Innovation in Social Services at the University of Strathclyde has released SRU Open Search, an open source customizable interface for displaying SRU-formatted XML.

Here are some features selected from a more comprehensive list:

  • Bookmarkable pages, so you can share a page of results via email
  • Share items via social bookmarking sites (Delicious, Digg, Google)
  • Featured audio highlighting—inline MP3 player via Flash
  • Featured content highlighting . . .
  • Visualisation of search terms via pie chart, tag cloud & tree map . . .
  • Portable version of search so users can add to their own site
  • Browser search plugin for Firefox & Internet Explorer (inc Auto Suggest)

Solr Search Engine Plug-In for Fedora Released

The DRAMA team has released a Solr plug-in for Fedora.

Here's a description of Solr from its home page:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.
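Since Solr exposes its search over plain HTTP, a client can query it with a simple GET. The sketch below only builds such a request URL; the host, port, and field names are assumptions, not details from the Fedora plug-in:

```python
from urllib.parse import urlencode

def solr_select_url(host, query, facet_field=None, rows=10):
    """Build a Solr select URL; wt=json requests the JSON response writer."""
    params = {"q": query, "wt": "json", "rows": rows}
    if facet_field:
        # Turn on faceted search over the given field
        params.update({"facet": "true", "facet.field": facet_field})
    return f"http://{host}/solr/select?{urlencode(params)}"

url = solr_select_url("localhost:8983", "title:fedora", facet_field="subject")
```

Fetching a URL like this returns a JSON document with the matching hits and, when faceting is enabled, counts per facet value.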

Coverage of the Demise of Microsoft's Mass Digitization Project

Microsoft's decision to end its Live Search Books program, which provided important funding for the Open Content Alliance, has been widely covered by newspapers, blogs, and other information sources.

Here's a selection of articles and posts: "Books Scanning to be Publicly Funded," "'It Ain’t Over Till It's Over': Impact of the Microsoft Shutdown," "Microsoft Abandons Live Search Books/Academic Scan Plan," "Microsoft Burns Book Search—Lacks 'High Consumer Intent,'" "Microsoft Shuts Down Two of Its Google 'Wannabe’s': Live Search Books and Live Search Academic," "Microsoft Will Shut Down Book Search Program," "Microsoft's Book-Search Project Has a Surprise Ending," "Post-Microsoft, Libraries Mull Digitization," "Publishers Surprised by Microsoft Move," "Why Killing Live Book Search Is Good for the Future of Books," and "Without Microsoft, British Library Keeps on Digitizing."

National Science Digital Library NCore Team Releases NSDL Search, MediaWiki Extensions, and WordPress MU Plug-Ins

The National Science Digital Library NCore team has released three applications: NSDL Search, MediaWiki extensions, and WordPress MU plug-ins.

Google Book Search Book Viewability API Released

Google has released the Google Book Search Book Viewability API.

Here's an excerpt from the API home page:

The Google Book Search Book Viewability API enables developers to:

  • Link to Books in Google Book Search using ISBNs, LCCNs, and OCLC numbers
  • Know whether Google Book Search has a specific title and what the viewability of that title is
  • Generate links to a thumbnail of the cover of a book
  • Generate links to an informational page about a book
  • Generate links to a preview of a book

Read more about it at "Book Info Where You Need It, When You Need It."
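As the API home page describes it, a single request passes one or more bibliographic keys (ISBN, LCCN, or OCLC number) and gets back viewability data. A minimal sketch of building such a request, with a hypothetical callback name and an example ISBN chosen purely for illustration:

```python
from urllib.parse import urlencode

def viewability_url(*bibkeys, callback="handleBooks"):
    """Build a Book Viewability API request for one or more identifiers."""
    params = {
        "bibkeys": ",".join(bibkeys),  # ISBN:, LCCN:, or OCLC: prefixed keys
        "jscmd": "viewapi",            # selects the viewability service
        "callback": callback,          # JavaScript function to receive the results
    }
    return "https://books.google.com/books?" + urlencode(params)

url = viewability_url("ISBN:0451526538", "OCLC:36792831")
```

The response is JSON wrapped in the named callback, keyed by the identifiers supplied, with links to the title's thumbnail, informational page, and preview where available.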

Digital Library Federation ILS and Discovery Systems Draft Report

The Digital Library Federation's ILS and Discovery Systems working group has issued a draft recommendation on integrating the integrated library system (ILS) with external discovery applications.

Here's an excerpt from the "Introduction":

This document is the (DRAFT) report of that group. It gives technical recommendations for integrating the ILS with external discovery applications. This report includes

  • A summary of a survey of the needs and discovery applications implemented and desired by libraries in DLF (and other similar libraries).
  • A high-level summary of specific abstract functions that discovery applications need to be able to invoke on ILS's and/or their data to support desired discovery applications, as well as outgoing services from ILS software to other applications.
  • Recommendations for concrete bindings for these functions (i.e. specific protocols, APIs, data standards, etc.) that can be used with future and/or existing ILS's. Producing a complete concrete binding and reference implementation is beyond the scope of this small, short-term group; but we hope to provide sufficient requirements and details that others can produce appropriate bindings and implementations.
  • Practical recommendations to encourage libraries, ILS developers, and discovery application developers to expeditiously integrate discovery systems with the ILS and other sources of bibliographic metadata.

Summa: A Federated Search System

Statsbiblioteket is developing Summa, a federated search system.

Birte Christensen-Dalsgaard, Director of Development, discusses Summa and other topics in a new podcast (CNI Podcast: An Interview with Birte Christensen-Dalsgaard, Director of Development at the State and University Library, Denmark).

Here's an excerpt from the podcast abstract:

Summa is an open source system implementing modular, service-based architecture. It is based on the fundamental idea "free the content from the proprietary library systems," where the discovery layer is separated from the business layer. In doing so, any Internet technology can be used without the limitations traditionally set by proprietary library systems, and there is the flexibility to integrate or to be integrated into other systems. A first version of a Fedora—Summa integration has been developed.

A white paper is available that examines the system in more detail.

Columbia University and Microsoft Book Digitization Project

The Columbia University Libraries have announced that they will work with Microsoft to digitize a "large number of books" that are in the public domain.

Here's an excerpt from the press release:

Columbia University and Microsoft Corp. are collaborating on an initiative to digitize a large number of books from Columbia University Libraries and make them available to Internet users. With the support of the Open Content Alliance (OCA), publicly available print materials in Columbia Libraries will be scanned, digitized, and indexed to make them readily accessible through Live Search Books. . . .

Columbia University Libraries is playing a key role in book selection and in setting quality standards for the digitized materials. Microsoft will digitize selected portions of the Libraries’ great collections of American history, literature, and humanities works, with the specific areas to be decided mutually by Microsoft and Columbia during the early phase of the project.

Microsoft will give the Library high-quality digital images of all the materials, allowing the Library to provide worldwide access through its own digital library and to share the content with non-commercial academic initiatives and non-profit organizations.

Read more about it at "Columbia University Joins Microsoft Scan Plan."

Wikia Search Debuts to Pundits’ Criticism

An alpha version of Wikia's open source Wikia Search has gone public, but the consensus seems to be that this user-tuned search engine has a long way to go to compete with the likes of Google.

Read more about it at "Jimmy Wales Argues That His Wikia Needs More Time," "Wiki Citizens Taking on a New Area: Searching," "Wikia Launching Human-Powered Search," "Wikia Search Alpha Preview Leaves Much to Be Desired," "Wikia Search Is A Complete Letdown," and "Wikia Search—Miles Behind the Competition."

Google Gives Wikipedia a Lump of Knol for Xmas

According to "Encouraging People to Contribute Knowledge," Google has launched Knol, a Wikipedia competitor, in test mode.

Here's an excerpt from the posting:

Earlier this week, we [Google] started inviting a selected group of people to try a new, free tool that we are calling "knol", which stands for a unit of knowledge. Our goal is to encourage people who know a particular subject to write an authoritative article about it. . . . .

A knol on a particular topic is meant to be the first thing someone who searches for this topic for the first time will want to read. The goal is for knols to cover all topics, from scientific concepts, to medical information, from geographical and historical, to entertainment, from product information, to how-to-fix-it instructions. Google will not serve as an editor in any way, and will not bless any content. . . . For many topics, there will likely be competing knols on the same subject. . . .

Knols will include strong community tools. People will be able to submit comments, questions, edits, additional content, and so on. Anyone will be able to rate a knol or write a review of it. Knols will also include references and links to additional information. At the discretion of the author, a knol may include ads.

Read more about it at "Google to Wikipedia: "Knol" Thine Enemy," "Google's Knol: No Wikipedia Killer," "Google's 'Knols' Aren't a Threat to Wikipedia," "Google's Know-It-All Project," and "Google's Units of Knowledge May Raise Conflict of Interest."

Columbia University Libraries and Bavarian State Library Become Google Book Search Library Partners

Both the Columbia University Libraries and Bavarian State Library have joined the Google Book Search Library Project.

Here are the announcements:

Update on the British Library/Microsoft Digitization Project

In his Information Today article "Progress Report: The British Library and Microsoft Digitization Partnership," Jim Ashling provides an update on the progress that the British Library and Microsoft have made in their project to digitize about 100,000 books for access in Live Search Books.

Here's an excerpt from the article:

Unlike previous BL digitization projects where material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material, collected by the BL during the 19th century. . . .

Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The scanners, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. In all, the data will amount to 30 to 40 terabytes. . . .

Obviously, then, an issue exists here for a collection of 19th-century literature when some authors may have lived beyond the late 1930s [British/EU law gives authors a copyright term of life plus 70 years]. An estimated 40 percent of the titles are also orphan works. Those two issues mean that item-by-item copyright checking would be an unmanageable task. Estimates for the total time required to check on the copyright issues involved vary from a couple of decades to a couple of hundred years. The BL’s approach is to use two databases of authors to identify those who were still living in 1936 and to remove their work from the collection before scanning. That, coupled with a wide publicity to encourage any rights holders to step forward, may solve the problem.

Yale Will Work with Microsoft to Digitize 100,000 Books

The Yale University Library and Microsoft will work together to digitize 100,000 English-language out-of-copyright books, which will be made available via Microsoft’s Live Search Books.

Here’s an excerpt from the press release:

The Library and Microsoft have selected Kirtas Technologies to carry out the process based on their proven excellence and state-of-the art equipment. The Library has successfully worked with Kirtas previously, and the company will establish a digitization center in the New Haven area. . . .

The project will maintain rigorous standards established by the Yale Library and Microsoft for the quality and usability of the digital content, and for the safe and careful handling of the physical books. Yale and Microsoft will work together to identify which of the approximately 13 million volumes held by Yale’s 22 libraries will be digitized. Books selected for digitization will remain available for use by students and researchers in their physical form. Digital copies of the books will also be preserved by the Yale Library for use in future academic initiatives and in collaborative scholarly ventures.