Clarifications about the Michigan/OCLC OAIster Deal

Dorothea Salo has posted "The Straight Story on OAIster and Its Move" on Caveat Lector in which the University of Michigan Library's Katrina Hagedorn answers questions about the future of OAIster.

Here's an excerpt:

Q. Once oaister.org ceases to exist, there will be no way to search the harvested records for free except through worldcat.org, is that right?

A. I think those details haven’t been hammered out yet. Worldcat.org is one choice, yes. There will likely be other products and services, and it’s likely you’ll be able to limit to just oaister records (for what that’s worth).

University of Michigan and OCLC Form OAIster Partnership

The University of Michigan and OCLC will jointly support the OAIster search engine for open access documents.

Here's an excerpt from the press release:

Launched in 2002 with grant support from the Andrew W. Mellon Foundation, OAIster was developed to test the feasibility of building a portal to open archive collections using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAIster has since grown to become one of the largest aggregations of records pointing to open archive collections in the world with over 19 million records contributed by over 1,000 organizations worldwide.

Under the partnership, OAIster.org will continue to function as the public interface to OAIster collections, through funding provided by OCLC to the University of Michigan. Later in 2009, metadata harvesting operations will transfer from the University of Michigan to OCLC. . . .

Starting in late January 2009, while OAIster continues to be freely available at the www.oaister.org Web site, OCLC will host a version of OAIster on OCLC's FirstSearch platform and make it available through subscriptions to the FirstSearch Base Package at no additional charge.
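For readers unfamiliar with the harvesting model behind OAIster, an OAI-PMH harvester simply issues HTTP requests with a verb such as ListRecords and parses the XML that comes back. Here's a rough sketch in Python; the repository URL is a placeholder and the sample response is invented for illustration, not taken from OAIster.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL, the kind of request
    a harvester like OAIster sends to contributing repositories."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # When continuing a partial result set, the protocol allows only
        # the verb and the resumptionToken.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)

def extract_titles(oai_xml):
    """Pull Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(oai_xml)
    return [t.text for t in root.iter(DC_NS + "title")]

# A minimal, made-up response fragment.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An Example Open Access Paper</dc:title>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai")
print(url)
print(extract_titles(SAMPLE))
```

A real harvester would loop, following resumptionToken values until the repository signals the end of the set.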

“Editorial: Google Deal or Rip-Off?”

In "Editorial: Google Deal or Rip-Off?," Francine Fialkoff, Library Journal Editor-in-Chief, takes a hard look at the Google-Association of American Publishers/Authors Guild copyright settlement.

Here's an excerpt:

Clearly, the public had little standing in the negotiations that led to the recent agreement in the class-action lawsuit against Google for scanning books from library shelves. . . . Well, the suit was never about the public interest but about corporate interests, and librarians did not have much power at the bargaining table, no matter how hard those consulted pushed. While there are many provisions in the document that specify what libraries can and can't do and portend greater access, ultimately, it is the restrictions that scream out at us from the miasma of details.

Other perspectives can be found in my recently updated Google Book Search Bibliography, Version 3.

CiteSeerX and SeerSuite: Harvester + Search Engine + AI

In "CiteSeerX and SeerSuite—Adding to the Semantic Web," Avi Rappoport provides an overview of the beta versions of CiteSeerX and its open source, Java-based counterpart, SeerSuite.

Here's an excerpt:

Building on that experience, CiteSeerX is a completely new system, re-architected for scaling and modularity, to handle increasing demands from both researchers and digital library programmatic interfaces. The system uses artificial intelligence, machine learning, support vector machines, and other techniques to recognize and extract metadata for the articles found. It now uses the Lucene search engine and supports standards such as the Open Archives Initiative (OAI), including metadata browsing, and Z39.50. CiteSeerX has a simple but powerful internal structure for documents and citations. If it cannot access a document cited, it creates a virtual document as a place holder, which can then be filled when the document is available.
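The virtual-document behavior described in the excerpt is worth pausing on: a cited-but-unavailable paper gets a placeholder node in the citation graph, which is later filled in if the full text turns up. A toy sketch of the idea (the class and field names here are invented for illustration, not CiteSeerX internals):

```python
class Document:
    """A paper in a citation graph. A document cited before its full
    text has been crawled starts life as a virtual placeholder."""
    def __init__(self, doc_id, title=None, virtual=True):
        self.doc_id = doc_id
        self.title = title
        self.virtual = virtual   # True until the full text is found
        self.cited_by = []       # inbound citation links

class CitationIndex:
    def __init__(self):
        self.docs = {}

    def cite(self, citing_id, cited_id, cited_title=None):
        """Record a citation, creating a virtual placeholder if the
        cited document has not been seen yet."""
        cited = self.docs.setdefault(
            cited_id, Document(cited_id, cited_title, virtual=True))
        cited.cited_by.append(citing_id)
        return cited

    def ingest(self, doc_id, title):
        """Fill in a placeholder once the actual document is crawled,
        keeping the citation links accumulated so far."""
        doc = self.docs.setdefault(doc_id, Document(doc_id))
        doc.title, doc.virtual = title, False
        return doc

index = CitationIndex()
index.cite("paperA", "paperB", "Some Cited Work")
print(index.docs["paperB"].virtual)   # placeholder only
index.ingest("paperB", "Some Cited Work")
print(index.docs["paperB"].virtual)   # now a real document
```

The payoff is that citation counts accumulate on the placeholder, so no links are lost while the system waits for the document itself.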

Google Book Search Bibliography, Version 3

The Google Book Search Bibliography, Version 3 is now available.

This bibliography presents selected English-language articles and other works that are useful in understanding Google Book Search. It primarily focuses on the evolution of Google Book Search and the legal, library, and social issues associated with it. Where possible, links are provided to works that are freely available on the Internet, including e-prints in disciplinary archives and institutional repositories. Note that e-prints and published articles may not be identical.

A Guide for the Perplexed: Libraries & the Google Library Project Settlement

ARL and ALA have released A Guide for the Perplexed: Libraries & the Google Library Project Settlement.

Here's an excerpt from the press release:

The guide is designed to help the library community better understand the terms and conditions of the recent settlement agreement between Google, the Authors Guild, and the Association of American Publishers concerning Google’s scanning of copyrighted works. Band notes that the settlement is extremely complex and presents significant challenges and opportunities to libraries. The guide outlines and simplifies the settlement’s provisions, with special emphasis on the provisions that apply directly to libraries.

Reference Extract: The Librarian-Recommendation-Weighted Search Engine

OCLC, the School of Information Studies at Syracuse University, and the University of Washington Information School have received a $100,000 grant from the John D. and Catherine T. MacArthur Foundation to plan a librarian-recommendation-weighted search engine called Reference Extract.

Here's an excerpt from the press release:

"Sometimes, the simplest ideas are the most powerful," said Dr. Mike Eisenberg, Dean Emeritus and Professor at the Information School of the University of Washington and a lead on the project. "The best search engines are great for basic search, but sometimes the Web site results lack credibility in terms of trust, accuracy and reliability. So, who can help? Librarians. If a librarian recommends a Web site, you can be pretty sure that it's credible. RefEx will take hundreds of thousands of librarian recommendations and use them in a full-scale search engine."

Reference Extract is envisioned as a Web search experience similar to those provided by the world's most popular search engines. However, unlike other search engines, Reference Extract will be built for maximum credibility of search results by relying on the expertise of librarians. Users will enter a search term and receive results weighted toward sites most often used by librarians at institutions such as the Library of Congress, the University of Washington, the State Library of Maryland, and over 2,000 other libraries worldwide.

As part of the planning process, participants are reaching out to partners in libraries, technology organizations and research institutions. "The only way this will work is by making a project of an entire community," said Dr. R. David Lankes, Director of the Information Institute of Syracuse and Associate Professor at Syracuse University's School of Information Studies. "Web searchers get to tap into the incredible skill and knowledge of the library community, while librarians will be able to serve users on a whole new scale. This work follows on previous credibility work supported by the MacArthur Foundation, most notably the Credibility Commons (http://credibilitycommons.org/)." . . .

The Reference Extract project will hold a series of meetings and consultations over the coming months. The team is eager to build a business plan and technology architecture to benefit users and the library community alike. Those interested in providing input on the project and learning more can visit the project Web site at http://digref.org.

Georgia Harper on the Google-AAP/AG Copyright Settlement

In "The LJ Academic Newswire Newsmaker Interview: Georgia Harper," Harper, Scholarly Communications Advisor at the University Libraries of the University of Texas at Austin, discusses the Google-AAP/AG copyright settlement and the part that research libraries played in it. Also see her blog posting ("Google Book Search—and Buy").

Here's an excerpt:

Brewster Kahle has chastised public libraries for working with Google under a cloak of secrecy. Can libraries realistically refuse NDAs?

I think Kahle’s point, and others raise this point too, is more about the deleterious effects of secrecy on the negotiation process itself. Secrecy tends to be isolating. If you don’t consult with your colleagues at other institutions, your leverage may be diminished. Of course, a library could also hire a business and/or legal consultant to help, and bind the consultant to the NDA. Yes, Kahle has identified a very thorny problem, but it’s one we can ameliorate. I don’t think it’s workable simply not to do business with companies whose assets are ideas and information just because they feel compelled to protect them through secrecy. Either way, consultation does increase information, and information is power—in fact, the power of information is also the source of the [NDA] problem in the first place.

Google-AAP/AG Copyright Settlement: Vaidhyanathan Questions, Google Answers

On October 28th, Siva Vaidhyanathan posed some questions to Google about its copyright settlement with the Association of American Publishers and the Authors Guild ("My Initial Take on the Google-Publishers Settlement"). Now, Google has replied ("Some Initial Answers to My Initial Questions about Google Book Search and the Settlement").

"Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web"

Duncan Hull, Steve R. Pettifer, and Douglas B. Kell have published "Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web" in PLoS Computational Biology.

Here's the abstract:

Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as 'thought in cold storage,' and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.

Google Newspaper Digitization Project Announced

Google has announced a newspaper digitization project that will "make more old newspapers accessible and searchable online by partnering with newspaper publishers to digitize millions of pages of news archives."

Read more about it at "Bringing History Online, One Newspaper at a Time."

SRU Open Search: Open Source Customizable Interface for Displaying SRU-Formatted XML

The Institute for Research and Innovation in Social Services at the University of Strathclyde has released SRU Open Search, an open source customizable interface for displaying SRU-formatted XML.

Here are some features selected from a more comprehensive list:

  • Bookmarkable pages, so you can share a page of results via email
  • Share items via social bookmarking sites (Delicious, Digg, Google)
  • Featured audio highlighting—inline mp3 player via flash
  • Featured content highlighting . . .
  • Visualisation of search terms via pie chart, tag cloud & tree map . . .
  • Portable version of search so users can add to their own site
  • Browser search plugin for Firefox & Internet Explorer (inc Auto Suggest)
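Under the hood, an SRU client like this one issues searchRetrieve requests over plain HTTP and gets XML back, which the interface then styles. A minimal sketch of the request side (the server URL is a placeholder and the sample response is invented):

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SRU_NS = "{http://www.loc.gov/zing/srw/}"

def sru_search_url(base_url, cql_query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)

def number_of_records(sru_xml):
    """Read the hit count from a searchRetrieve response."""
    root = ET.fromstring(sru_xml)
    return int(root.findtext(SRU_NS + "numberOfRecords"))

SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>42</numberOfRecords>
</searchRetrieveResponse>"""

url = sru_search_url("http://sru.example.org/search", 'dc.title = "social work"')
print(url)
print(number_of_records(SAMPLE))
```

Because the response is ordinary XML, features like the visualizations listed above can be layered on by transforming the same payload.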

Solr Search Engine Plug-In for Fedora Released

The DRAMA team has released a Solr plug-in for Fedora.

Here's a description of Solr from its home page:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.
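To make that description concrete, here is roughly what a faceted query against Solr's HTTP API looks like. This sketch only builds the request URL and parses a canned JSON response (the base URL, field name, and sample payload are made up), so it runs without a live Solr server.

```python
import json
from urllib.parse import urlencode

def solr_query_url(base_url, q, facet_field=None, rows=10):
    """Build a Solr select URL requesting JSON output, optionally
    faceting on one field."""
    params = [("q", q), ("rows", rows), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return base_url + "/select?" + urlencode(params)

def parse_response(raw_json, facet_field="format"):
    """Pull the hit count and facet counts out of a Solr JSON response.
    Solr returns facet counts as a flat [value, count, value, count, ...]
    list, which we fold into a dict."""
    data = json.loads(raw_json)
    hits = data["response"]["numFound"]
    flat = data.get("facet_counts", {}).get("facet_fields", {}).get(facet_field, [])
    facets = dict(zip(flat[::2], flat[1::2]))
    return hits, facets

SAMPLE = """{
  "response": {"numFound": 3, "docs": []},
  "facet_counts": {"facet_fields": {"format": ["image", 2, "text", 1]}}
}"""

url = solr_query_url("http://localhost:8983/solr", "title:fedora", facet_field="format")
print(url)
print(parse_response(SAMPLE))
```

The plug-in's job, in essence, is to keep an index like this synchronized with the objects in a Fedora repository so that faceted queries of this kind work over repository content.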

Coverage of the Demise of Microsoft's Mass Digitization Project

Microsoft's decision to end its Live Search Books program, which provided important funding for the Open Content Alliance, has been widely covered by newspapers, blogs, and other information sources.

Here's a selection of articles and posts: "Books Scanning to be Publicly Funded," "'It Ain’t Over Till It's Over': Impact of the Microsoft Shutdown," "Microsoft Abandons Live Search Books/Academic Scan Plan," "Microsoft Burns Book Search—Lacks 'High Consumer Intent,'" "Microsoft Shuts Down Two of Its Google 'Wannabe’s': Live Search Books and Live Search Academic," "Microsoft Will Shut Down Book Search Program," "Microsoft's Book-Search Project Has a Surprise Ending," "Post-Microsoft, Libraries Mull Digitization," "Publishers Surprised by Microsoft Move," "Why Killing Live Book Search Is Good for the Future of Books," and "Without Microsoft, British Library Keeps on Digitizing."

National Science Digital Library NCore Team Releases NSDL Search, MediaWiki Extensions, and WordPress MU Plug-Ins

The National Science Digital Library NCore team has released three applications: NSDL Search, a set of MediaWiki extensions, and WordPress MU plug-ins.

Google Book Search Book Viewability API Released

Google has released the Google Book Search Book Viewability API.

Here's an excerpt from the API home page:

The Google Book Search Book Viewability API enables developers to:

  • Link to Books in Google Book Search using ISBNs, LCCNs, and OCLC numbers
  • Know whether Google Book Search has a specific title and what the viewability of that title is
  • Generate links to a thumbnail of the cover of a book
  • Generate links to an informational page about a book
  • Generate links to a preview of a book

Read more about it at "Book Info Where You Need It, When You Need It."
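For developers curious how the lookup works: the API takes a comma-separated list of bibliographic keys and answers with JSONP. The sketch below builds a request URL and unwraps a sample JSONP reply; the response field names shown are illustrative rather than a guaranteed part of the API.

```python
import json
from urllib.parse import urlencode

def viewability_url(bib_keys, callback="handleResult"):
    """Build a Book Viewability API request. bib_keys are identifiers
    such as 'ISBN:0451526538', 'LCCN:...', or 'OCLC:...'."""
    params = {
        "bibkeys": ",".join(bib_keys),
        "jscmd": "viewapi",
        "callback": callback,  # the API replies with JSONP
    }
    return "http://books.google.com/books?" + urlencode(params)

def parse_jsonp(raw):
    """Strip the JSONP callback wrapper and return the JSON payload."""
    start = raw.index("(") + 1
    end = raw.rindex(")")
    return json.loads(raw[start:end])

# An invented response fragment, for illustration only.
SAMPLE = ('handleResult({"ISBN:0451526538": '
          '{"preview": "partial", '
          '"thumbnail_url": "http://example.org/thumb.jpg"}});')

url = viewability_url(["ISBN:0451526538"])
print(url)
print(parse_jsonp(SAMPLE)["ISBN:0451526538"]["preview"])
```

In a web page the callback function would receive the payload directly, letting a library OPAC decorate its records with preview links and cover thumbnails.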

Digital Library Federation ILS and Discovery Systems Draft Report

The Digital Library Federation's ILS and Discovery Systems working group has issued a draft recommendation on integrating the integrated library system (ILS) with external discovery applications.

Here's an excerpt from the "Introduction":

This document is the (DRAFT) report of that group. It gives technical recommendations for integrating the ILS with external discovery applications. This report includes

  • A summary of a survey of the needs and discovery applications implemented and desired by libraries in DLF (and other similar libraries).
  • A high-level summary of specific abstract functions that discovery applications need to be able to invoke on ILS's and/or their data to support desired discovery applications, as well as outgoing services from ILS software to other applications.
  • Recommendations for concrete bindings for these functions (i.e. specific protocols, APIs, data standards, etc.) that can be used with future and/or existing ILS's. Producing a complete concrete binding and reference implementation is beyond the scope of this small, short-term group; but we hope to provide sufficient requirements and details that others can produce appropriate bindings and implementations.
  • Practical recommendations to encourage libraries, ILS developers, and discovery application developers to expeditiously integrate discovery systems with the ILS and other sources of bibliographic metadata.
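The report's split between abstract functions and concrete bindings can be pictured as an interface that each ILS implements through whatever protocol suits it (OAI-PMH, SRU, a vendor API). The method names below are illustrative, not the report's actual vocabulary:

```python
from abc import ABC, abstractmethod

class ILSDiscoveryInterface(ABC):
    """An abstract discovery-facing interface over an ILS: the functions
    are defined abstractly here, and each ILS supplies a concrete
    binding (OAI-PMH, SRU, vendor API, etc.)."""

    @abstractmethod
    def harvest_records(self, since=None):
        """Yield bibliographic records, optionally only those changed
        since a given date."""

    @abstractmethod
    def get_availability(self, record_id):
        """Return real-time holdings and availability for one record."""

class DemoILS(ILSDiscoveryInterface):
    """A toy in-memory binding showing how one ILS might implement
    the interface."""
    def __init__(self, records):
        self.records = records

    def harvest_records(self, since=None):
        return list(self.records)

    def get_availability(self, record_id):
        return {"id": record_id, "status": "available"}

ils = DemoILS([{"id": "bib1", "title": "Example Record"}])
print(ils.harvest_records())
print(ils.get_availability("bib1"))
```

A discovery application written against the abstract interface could then swap one ILS for another without changing its own code, which is precisely the interoperability the recommendations aim at.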

Summa: A Federated Search System

Statsbiblioteket is developing Summa, a federated search system.

Birte Christensen-Dalsgaard, Director of Development, discusses Summa and other topics in a new podcast (CNI Podcast: An Interview with Birte Christensen-Dalsgaard, Director of Development at the State and University Library, Denmark).

Here's an excerpt from the podcast abstract:

Summa is an open source system implementing modular, service-based architecture. It is based on the fundamental idea "free the content from the proprietary library systems," where the discovery layer is separated from the business layer. In doing so, any Internet technology can be used without the limitations traditionally set by proprietary library systems, and there is the flexibility to integrate or to be integrated into other systems. A first version of a Fedora—Summa integration has been developed.

A white paper is available that examines the system in more detail.

Columbia University and Microsoft Book Digitization Project

The Columbia University Libraries have announced that they will work with Microsoft to digitize a "large number of books" that are in the public domain.

Here's an excerpt from the press release:

Columbia University and Microsoft Corp. are collaborating on an initiative to digitize a large number of books from Columbia University Libraries and make them available to Internet users. With the support of the Open Content Alliance (OCA), publicly available print materials in Columbia Libraries will be scanned, digitized, and indexed to make them readily accessible through Live Search Books. . . .

Columbia University Libraries is playing a key role in book selection and in setting quality standards for the digitized materials. Microsoft will digitize selected portions of the Libraries’ great collections of American history, literature, and humanities works, with the specific areas to be decided mutually by Microsoft and Columbia during the early phase of the project.

Microsoft will give the Library high-quality digital images of all the materials, allowing the Library to provide worldwide access through its own digital library and to share the content with non-commercial academic initiatives and non-profit organizations.

Read more about it at "Columbia University Joins Microsoft Scan Plan."