A Guide for the Perplexed: Libraries & the Google Library Project Settlement

ARL and ALA have released A Guide for the Perplexed: Libraries & the Google Library Project Settlement, written by Jonathan Band.

Here's an excerpt from the press release:

The guide is designed to help the library community better understand the terms and conditions of the recent settlement agreement between Google, the Authors Guild, and the Association of American Publishers concerning Google’s scanning of copyrighted works. Band notes that the settlement is extremely complex and presents significant challenges and opportunities to libraries. The guide outlines and simplifies the settlement’s provisions, with special emphasis on the provisions that apply directly to libraries.

Reference Extract: The Librarian-Recommendation-Weighted Search Engine

OCLC, the School of Information Studies at Syracuse University, and the University of Washington Information School have received a $100,000 grant from the John D. and Catherine T. MacArthur Foundation to plan a librarian-recommendation-weighted search engine called Reference Extract.

Here's an excerpt from the press release:

"Sometimes, the simplest ideas are the most powerful," said Dr. Mike Eisenberg, Dean Emeritus and Professor at the Information School of the University of Washington and a lead on the project. "The best search engines are great for basic search, but sometimes the Web site results lack credibility in terms of trust, accuracy and reliability. So, who can help? Librarians. If a librarian recommends a Web site, you can be pretty sure that it's credible. RefEx will take hundreds of thousands of librarian recommendations and use them in a full-scale search engine."

Reference Extract is envisioned as a Web search experience similar to those provided by the world's most popular search engines. However, unlike other search engines, Reference Extract will be built for maximum credibility of search results by relying on the expertise of librarians. Users will enter a search term and receive results weighted toward sites most often used by librarians at institutions such as the Library of Congress, the University of Washington, the State Library of Maryland, and over 2,000 other libraries worldwide.

As part of the planning process, participants are reaching out to partners in libraries, technology organizations and research institutions. "The only way this will work is by making a project of an entire community," said Dr. R. David Lankes, Director of the Information Institute of Syracuse and Associate Professor at Syracuse University's School of Information Studies. "Web searchers get to tap into the incredible skill and knowledge of the library community, while librarians will be able to serve users on a whole new scale. This work follows on previous credibility work supported by the MacArthur Foundation, most notably the Credibility Commons (http://credibilitycommons.org/)." . . .

The Reference Extract project will hold a series of meetings and consultations over the coming months. The team is eager to build a business plan and technology architecture to benefit users and the library community alike. Those interested in providing input on the project and learning more can visit the project Web site at http://digref.org.
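The press release describes weighting results toward sites librarians recommend. As a purely illustrative sketch (not the actual Reference Extract design, which is still being planned), a recommendation-weighted ranker might boost each result's base relevance score by how often librarians have recommended its site:

```python
import math

def rank_results(results, recommendations):
    """results: list of (url, base_score); recommendations: url -> librarian count.

    Log damping keeps heavily recommended sites from completely
    drowning out query relevance.
    """
    def weighted(item):
        url, base = item
        return base * (1 + math.log1p(recommendations.get(url, 0)))
    return sorted(results, key=weighted, reverse=True)

# Hypothetical data: site "b" is slightly less relevant but widely recommended.
results = [("http://example.org/a", 0.9), ("http://example.org/b", 0.8)]
recs = {"http://example.org/b": 250}
ranked = rank_results(results, recs)
```

With these toy numbers, the heavily recommended site outranks the slightly more relevant one, which is the behavior the project describes.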

Georgia Harper on the Google-AAP/AG Copyright Settlement

In "The LJ Academic Newswire Newsmaker Interview: Georgia Harper," Harper, Scholarly Communications Advisor at the University Libraries of the University of Texas at Austin, discusses the Google-AAP/AG copyright settlement and the part that research libraries played in it. Also see her blog posting ("Google Book Search—and Buy").

Here's an excerpt:

Brewster Kahle has chastised public libraries for working with Google under a cloak of secrecy. Can libraries realistically refuse NDAs?

I think Kahle’s point, and others raise this point too, is more about the deleterious effects of secrecy on the negotiation process itself. Secrecy tends to be isolating. If you don’t consult with your colleagues at other institutions, your leverage may be diminished. Of course, a library could also hire a business and/or legal consultant to help, and bind the consultant to the NDA. Yes, Kahle has identified a very thorny problem, but it’s one we can ameliorate. I don’t think it’s workable simply not to do business with companies whose assets are ideas and information just because they feel compelled to protect them through secrecy. Either way, consultation does increase information, and information is power—in fact, the power of information is also the source of the [NDA] problem in the first place.

Google-AAP/AG Copyright Settlement: Vaidhyanathan Questions, Google Answers

On October 28th, Siva Vaidhyanathan posed some questions to Google about its copyright settlement with the Association of American Publishers and the Authors Guild ("My Initial Take on the Google-Publishers Settlement"). Now, Google has replied ("Some Initial Answers to My Initial Questions about Google Book Search and the Settlement").

"Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web"

Duncan Hull, Steve R. Pettifer, and Douglas B. Kell have published "Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web" in PLoS Computational Biology.

Here's the abstract:

Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as 'thought in cold storage,' and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.

Google Newspaper Digitization Project Announced

Google has announced a newspaper digitization project that will "make more old newspapers accessible and searchable online by partnering with newspaper publishers to digitize millions of pages of news archives."

Read more about it at "Bringing History Online, One Newspaper at a Time."

SRU Open Search: Open Source Customizable Interface for Displaying SRU-Formatted XML

The Institute for Research and Innovation in Social Services at the University of Strathclyde has released SRU Open Search, an open source customizable interface for displaying SRU-formatted XML.

Here are some features selected from a more comprehensive list:

  • Bookmarkable pages, so you can share a page of results via email
  • Share items via social bookmarking sites (Delicious, Digg, Google)
  • Featured audio highlighting—inline MP3 player via Flash
  • Featured content highlighting . . .
  • Visualisation of search terms via pie chart, tag cloud & tree map . . .
  • Portable version of search so users can add to their own site
  • Browser search plugin for Firefox & Internet Explorer (inc Auto Suggest)

Solr Search Engine Plug-In for Fedora Released

The DRAMA team has released a Solr plug-in for Fedora.

Here's a description of Solr from its home page:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.
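Since Solr exposes its search over plain HTTP, a client can query it with a simple GET. The sketch below only builds such a request URL; the host, port, and field names are assumptions, not details from the Fedora plug-in:

```python
from urllib.parse import urlencode

def solr_select_url(host, query, facet_field=None, rows=10):
    """Build a Solr select URL; wt=json requests the JSON response writer."""
    params = {"q": query, "wt": "json", "rows": rows}
    if facet_field:
        # Turn on faceted search over the given field
        params.update({"facet": "true", "facet.field": facet_field})
    return f"http://{host}/solr/select?{urlencode(params)}"

url = solr_select_url("localhost:8983", "title:fedora", facet_field="subject")
```

Fetching a URL like this returns a JSON document with the matching hits and, when faceting is enabled, counts per facet value.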

Coverage of the Demise of Microsoft's Mass Digitization Project

Microsoft's decision to end its Live Search Books program, which provided important funding for the Open Content Alliance, has been widely covered by newspapers, blogs, and other information sources.

Here's a selection of articles and posts: "Books Scanning to be Publicly Funded," "'It Ain’t Over Till It's Over': Impact of the Microsoft Shutdown," "Microsoft Abandons Live Search Books/Academic Scan Plan," "Microsoft Burns Book Search—Lacks 'High Consumer Intent,'" "Microsoft Shuts Down Two of Its Google 'Wannabe’s': Live Search Books and Live Search Academic," "Microsoft Will Shut Down Book Search Program," "Microsoft's Book-Search Project Has a Surprise Ending," "Post-Microsoft, Libraries Mull Digitization," "Publishers Surprised by Microsoft Move," "Why Killing Live Book Search Is Good for the Future of Books," and "Without Microsoft, British Library Keeps on Digitizing."

National Science Digital Library NCore Team Releases NSDL Search, MediaWiki Extensions, and WordPress MU Plug-Ins

The National Science Digital Library NCore team has released three applications: NSDL Search, MediaWiki extensions, and WordPress MU plug-ins.

Google Book Search Book Viewability API Released

Google has released the Google Book Search Book Viewability API.

Here's an excerpt from the API home page:

The Google Book Search Book Viewability API enables developers to:

  • Link to Books in Google Book Search using ISBNs, LCCNs, and OCLC numbers
  • Know whether Google Book Search has a specific title and what the viewability of that title is
  • Generate links to a thumbnail of the cover of a book
  • Generate links to an informational page about a book
  • Generate links to a preview of a book

Read more about it at "Book Info Where You Need It, When You Need It."
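As the API home page describes it, a single request passes one or more bibliographic keys (ISBN, LCCN, or OCLC number) and gets back viewability data. A minimal sketch of building such a request, with a hypothetical callback name and an example ISBN chosen purely for illustration:

```python
from urllib.parse import urlencode

def viewability_url(*bibkeys, callback="handleBooks"):
    """Build a Book Viewability API request for one or more identifiers."""
    params = {
        "bibkeys": ",".join(bibkeys),  # ISBN:, LCCN:, or OCLC: prefixed keys
        "jscmd": "viewapi",            # selects the viewability service
        "callback": callback,          # JavaScript function to receive the results
    }
    return "https://books.google.com/books?" + urlencode(params)

url = viewability_url("ISBN:0451526538", "OCLC:36792831")
```

The response is JSON wrapped in the named callback, keyed by the identifiers supplied, with links to the title's thumbnail, informational page, and preview where available.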

Digital Library Federation ILS and Discovery Systems Draft Report

The Digital Library Federation's ILS and Discovery Systems working group has issued a draft recommendation on integrating the integrated library system (ILS) with external discovery applications.

Here's an excerpt from the "Introduction":

This document is the (DRAFT) report of that group. It gives technical recommendations for integrating the ILS with external discovery applications. This report includes

  • A summary of a survey of the needs and discovery applications implemented and desired by libraries in DLF (and other similar libraries).
  • A high-level summary of specific abstract functions that discovery applications need to be able to invoke on ILS's and/or their data to support desired discovery applications, as well as outgoing services from ILS software to other applications.
  • Recommendations for concrete bindings for these functions (i.e. specific protocols, APIs, data standards, etc.) that can be used with future and/or existing ILS's. Producing a complete concrete binding and reference implementation is beyond the scope of this small, short-term group; but we hope to provide sufficient requirements and details that others can produce appropriate bindings and implementations.
  • Practical recommendations to encourage libraries, ILS developers, and discovery application developers to expeditiously integrate discovery systems with the ILS and other sources of bibliographic metadata.

Summa: A Federated Search System

Statsbiblioteket is developing Summa, a federated search system.

Birte Christensen-Dalsgaard, Director of Development, discusses Summa and other topics in a new podcast (CNI Podcast: An Interview with Birte Christensen-Dalsgaard, Director of Development at the State and University Library, Denmark).

Here's an excerpt from the podcast abstract:

Summa is an open source system implementing modular, service-based architecture. It is based on the fundamental idea "free the content from the proprietary library systems," where the discovery layer is separated from the business layer. In doing so, any Internet technology can be used without the limitations traditionally set by proprietary library systems, and there is the flexibility to integrate or to be integrated into other systems. A first version of a Fedora—Summa integration has been developed.

A white paper is available that examines the system in more detail.

Columbia University and Microsoft Book Digitization Project

The Columbia University Libraries have announced that they will work with Microsoft to digitize a "large number of books" that are in the public domain.

Here's an excerpt from the press release:

Columbia University and Microsoft Corp. are collaborating on an initiative to digitize a large number of books from Columbia University Libraries and make them available to Internet users. With the support of the Open Content Alliance (OCA), publicly available print materials in Columbia Libraries will be scanned, digitized, and indexed to make them readily accessible through Live Search Books. . . .

Columbia University Libraries is playing a key role in book selection and in setting quality standards for the digitized materials. Microsoft will digitize selected portions of the Libraries’ great collections of American history, literature, and humanities works, with the specific areas to be decided mutually by Microsoft and Columbia during the early phase of the project.

Microsoft will give the Library high-quality digital images of all the materials, allowing the Library to provide worldwide access through its own digital library and to share the content with non-commercial academic initiatives and non-profit organizations.

Read more about it at "Columbia University Joins Microsoft Scan Plan."

Wikia Search Debuts to Pundits’ Criticism

An alpha version of Wikia's open source Wikia Search has gone public, but the consensus seems to be that this user-tuned search engine has a long way to go to compete with the likes of Google.

Read more about it at "Jimmy Wales Argues That His Wikia Needs More Time," "Wiki Citizens Taking on a New Area: Searching," "Wikia Launching Human-Powered Search," "Wikia Search Alpha Preview Leaves Much to Be Desired," "Wikia Search Is A Complete Letdown," and "Wikia Search—Miles Behind the Competition."

Google Gives Wikipedia a Lump of Knol for Xmas

According to "Encouraging People to Contribute Knowledge," Google has launched Knol, a Wikipedia competitor, in test mode.

Here's an excerpt from the posting:

Earlier this week, we [Google] started inviting a selected group of people to try a new, free tool that we are calling "knol", which stands for a unit of knowledge. Our goal is to encourage people who know a particular subject to write an authoritative article about it. . . . .

A knol on a particular topic is meant to be the first thing someone who searches for this topic for the first time will want to read. The goal is for knols to cover all topics, from scientific concepts, to medical information, from geographical and historical, to entertainment, from product information, to how-to-fix-it instructions. Google will not serve as an editor in any way, and will not bless any content. . . . For many topics, there will likely be competing knols on the same subject. . . .

Knols will include strong community tools. People will be able to submit comments, questions, edits, additional content, and so on. Anyone will be able to rate a knol or write a review of it. Knols will also include references and links to additional information. At the discretion of the author, a knol may include ads.

Read more about it at "Google to Wikipedia: "Knol" Thine Enemy," "Google's Knol: No Wikipedia Killer," "Google's 'Knols' Aren't a Threat to Wikipedia," "Google's Know-It-All Project," and "Google's Units of Knowledge May Raise Conflict of Interest."

Columbia University Libraries and Bavarian State Library Become Google Book Search Library Partners

Both the Columbia University Libraries and Bavarian State Library have joined the Google Book Search Library Project.

Here are the announcements:

Update on the British Library/Microsoft Digitization Project

In his Information Today article "Progress Report: The British Library and Microsoft Digitization Partnership," Jim Ashling provides an update on the progress that the British Library and Microsoft have made in their project to digitize about 100,000 books for access in Live Search Books.

Here's an excerpt from the article:

Unlike previous BL digitization projects where material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material, collected by the BL during the 19th century. . . .

Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The scanners, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. In all, the data will amount to 30 to 40 terabytes. . . .

Obviously, then, an issue exists here for a collection of 19th-century literature when some authors may have lived beyond the late 1930s [British/EU law gives authors a copyright term of life plus 70 years]. An estimated 40 percent of the titles are also orphan works. Those two issues mean that item-by-item copyright checking would be an unmanageable task. Estimates for the total time required to check on the copyright issues involved vary from a couple of decades to a couple of hundred years. The BL’s approach is to use two databases of authors to identify those who were still living in 1936 and to remove their work from the collection before scanning. That, coupled with a wide publicity to encourage any rights holders to step forward, may solve the problem.

Yale Will Work with Microsoft to Digitize 100,000 Books

The Yale University Library and Microsoft will work together to digitize 100,000 English-language out-of-copyright books, which will be made available via Microsoft’s Live Search Books.

Here’s an excerpt from the press release:

The Library and Microsoft have selected Kirtas Technologies to carry out the process based on their proven excellence and state-of-the art equipment. The Library has successfully worked with Kirtas previously, and the company will establish a digitization center in the New Haven area. . . .

The project will maintain rigorous standards established by the Yale Library and Microsoft for the quality and usability of the digital content, and for the safe and careful handling of the physical books. Yale and Microsoft will work together to identify which of the approximately 13 million volumes held by Yale’s 22 libraries will be digitized. Books selected for digitization will remain available for use by students and researchers in their physical form. Digital copies of the books will also be preserved by the Yale Library for use in future academic initiatives and in collaborative scholarly ventures.