Update on the British Public Library/Microsoft Digitization Project

Jim Ashling provides an update on the progress that the British Public Library and Microsoft have made in their project to digitize about 100,000 books for access in Live Book Search in his Information Today article "Progress Report: The British Library and Microsoft Digitization Partnership."

Here's an excerpt from the article:

Unlike previous BL digitization projects where material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material, collected by the BL during the 19th century. . . .

Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The scanners, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. In all, the data will amount to 30 to 40 terabytes. . . .

Obviously, then, an issue exists here for a collection of 19th-century literature when some authors may have lived beyond the late 1930s [British/EU law gives authors a copyright term of life plus 70 years]. An estimated 40 percent of the titles are also orphan works. Those two issues mean that item-by-item copyright checking would be an unmanageable task. Estimates for the total time required to check on the copyright issues involved vary from a couple of decades to a couple of hundred years. The BL’s approach is to use two databases of authors to identify those who were still living in 1936 and to remove their work from the collection before scanning. That, coupled with a wide publicity to encourage any rights holders to step forward, may solve the problem.

Boston Public Library/Open Content Alliance Contract Made Public

Boston Public Library has made public its digitization contract with the Open Content Alliance.

Some of the most interesting provisions include the intent of the Internet Archive to provide perpetual free and open access to the works, the digitization cost arrangements (BPL pays for transport and provides bibliographic metadata, the Internet Archive pays for digitization-related costs), the specification of file formats (e.g., JPEG 2000, color PDF, and various XML files), the provision of digital copies to BPL (copies are available immediately after digitization for BPL to download via FTP or HTTP within 3 months), and use of copies (any use by either party as long as provenance metadata and/or bookplate data is not removed).

Yale Will Work with Microsoft to Digitize 100,000 Books

The Yale University Library and Microsoft will work together to digitize 100,000 English-language out-of-copyright books, which will be made available via Microsoft’s Live Search Books.

Here’s an excerpt from the press release:

The Library and Microsoft have selected Kirtas Technologies to carry out the process based on their proven excellence and state-of-the art equipment. The Library has successfully worked with Kirtas previously, and the company will establish a digitization center in the New Haven area. . . .

The project will maintain rigorous standards established by the Yale Library and Microsoft for the quality and usability of the digital content, and for the safe and careful handling of the physical books. Yale and Microsoft will work together to identify which of the approximately 13 million volumes held by Yale’s 22 libraries will be digitized. Books selected for digitization will remain available for use by students and researchers in their physical form. Digital copies of the books will also be preserved by the Yale Library for use in future academic initiatives and in collaborative scholarly ventures.

Brewster Kahle on Libraries Going Open

Brewster Kahle's "Libraries Going Open" document provides some details on where the Internet Archive and the Open Content Alliance are going with projects involving mass digitization of microfilm, mass digitization of journals, ILL of scanned out-of-print books, scanning books on demand, and other areas.

German Publishers Just Say No to Google Book Search: Libreka Launched at Frankfurt Book Fair

German publishers who want to retain control of their content have a new alternative to Google Book Search: Libreka, a full-text search engine that initially has about 8,000 books from publishers who opted-in for inclusion. Searchers retrieve book titles and cover images, but no content.

Source: "German Publishers Offer Alternative to Google Books." Deutsche Welle, 11 October 2007.

DLF and if:book Ponder Mass Digitization Issues

The Digital Library Federation and if:book are seeking comments on a series of questions about mass digitization issues that they will raise in invited brainstorming sessions as part of a project they are calling "The Really Modern Library."

Here's a suggestion: use CommentPress or a wiki to further refine ideas as the project evolves.

Source: Vershbow, Ben. "The Really Modern Library." if:book, 8 October 2007.

Cornell Joins Google Books Library Project

The Cornell University Library has joined the Google Books Library Project.

Here's an excerpt from the press release:

Google will digitize up to 500,000 works from Cornell University Library and make them available online using Google Book Search. As a result, materials from the library’s exceptional collections will be easily accessible to students, scholars and people worldwide, supporting the library’s long-standing commitment to make its collections broadly available.

“Research libraries today are integral partners in the academic enterprise through their support of research, teaching and learning. They also serve a public good by enhancing access to the works of the world's best minds,” said Interim University Librarian Anne R. Kenney. “As a major research library, Cornell University Library is pleased to join its peer institutions in this partnership with Google. The outcome of this relationship is a significant reduction in the time and effort associated with providing scholarly full-text resources online.”

Materials from Mann Library, one of 20 member libraries that comprise Cornell University Library, will be digitized as part of the agreement. Mann’s collections include some of the following subject areas: biological sciences, natural resources, plant, animal and environmental sciences, applied economics, management and public policy, human development, textiles and apparel, nutrition and food science.. . .

Cornell is the 27th institution to join the Google Book Search Library Project, which digitizes books from major libraries and makes it possible for Internet users to search their collections online. Over the next six years, Cornell will provide Google with public domain and copyrighted holdings from its collections. If a work has no copyright restrictions, the full text will be available for online viewing. For books protected by copyright, users will just get the basic background (such as the book’s title and the author’s name), at most a few lines of text related to their search and information about where they can buy or borrow a book. Cornell University Library will work with Google to choose materials that complement the contributions of the project’s other partners. In addition to making the materials available through its online search service, Google will also provide Cornell with a digital copy of all the materials scanned, which will eventually be incorporated into the university’s own digital library.

CLIR Receives Mellon Grant to Study Mass Digitization

According to a O’Reilly Radar posting, the Council on Library and Information Resources has been awarded a grant from the Mellon Foundation to study mass digitization efforts.

Here’s an excerpt from the posting that describes the grant’s objectives:

  1. Assess selected large scale digitization programs by exploring their efficacy and utility for conducting scholarship, in multiple fields or disciplines (humanities, sciences, etc.).
  2. Write and issue a report with findings and recommendations for improving the design of mass digitization projects.
  3. Create a Collegium that can serve in the long-term as an advisory group to mass digitization efforts, helping to assure and obtain the highest possible data quality and utility.
  4. Convene a series of meetings amongst scholars, libraries, publishers, and digitizing organizations to discuss ways of achieving these quality and design improvements.

The University of Maine and Two Public Libraries Adopt Emory’s Digitization Plan

Library Journal Academic Newswire reports that the University of Maine, the Toronto Public Library, and the Cincinnati Public Library will follow Emory University’s lead and digitize public domain works utilizing Kirtas scanners with print-on-demand copies being made available via BookSurge. (Also see the press release: "BookSurge, an Amazon Group, and Kirtas Collaborate to Preserve and Distribute Historic Archival Books.")

Source: "University of Maine, plus Toronto and Cincinnati Public Libraries Join Emory in Scan Alternative." Library Journal Academic Newswire, 21 June 2007.

CIC’s Digitization Contract with Google

Library Journal Academic Newswire has published a must-read article ("Questions Emerge as Terms of the CIC/Google Deal Become Public") about the Committee on Institutional Cooperation’s Google Book Search Library Project contract.

The article includes quotes from Peter Brantley, Digital Library Federation Executive Director, from his "Monetizing Libraries" posting about the contract (another must-read piece).

Here’s an excerpt from Brantley’s posting:

In other words—pretty much, unless Google ceases business operations, or there is a legal ruling or agreement with publishers that expressly permits these institutions (excepting Michigan and Wisconsin which have contracts of precedence) to receive digitized copies of In-Copyright material, it will be held in escrow until such time as it becomes public domain.

That could be a long wait. . . .

In an article early this year in The New Yorker, "Google’s Moon Shot," Jeffrey Toobin discusses possible outcomes of the antagonism this project has generated between Google and publishers. Paramount among them, in his mind, is a settlement. . . .

A settlement between Google and publishers would create a barrier to entry in part because the current litigation would not be resolved through court decision; any new entrant would be faced with the unresolved legal issues and required to re-enter the settlement process on their own terms. That, beyond the costs of mass digitization itself, is likely to deter almost any other actor in the market.

Emory Will Use Kirtas Scanner to Digitize Rare Books

Emory University’s Woodruff Library will use a Kirtas robotic book scanner to digitize rare books and to create PDF files that will be made available on the Internet and sold as print-on-demand books on Amazon.

Here’s an excerpt from the press release:

"We believe that mass digitization and print-on-demand publishing is an important new model for digital scholarship that is going to revolutionize the management of academic materials," said Martin Halbert, director for digital programs and systems at Emory’s Woodruff Library. "Information will no longer be lost in the mists of time when books go out of print. This is a way of opening up the past to the future."

Emory’s Woodruff Library is one of the premier research libraries in the United States, with extensive holdings in the humanities, including many rare and special collections. To increase accessibility to these aging materials, and ensure their preservation, the university purchased a Kirtas robotic book scanner, which can digitize as many as 50 books per day, transforming the pages from each volume into an Adobe Portable Document Format (PDF). The PDF files will be uploaded to a Web site where scholars can access them. If a scholar wishes to order a bound, printed copy of a digitized book, they can go to Amazon.com and order the book on line.

Emory will receive compensation from the sale of digitized copies, although Halbert stressed that the print-on-demand feature is not intended to generate a profit, but simply help the library recoup some of its costs in making out-of-print materials available.

Google Library Project Adds Committee on Institutional Cooperation (CIC)

The Google Book Search Library Project has an important new participant—the Committee on Institutional Cooperation (CIC). The CIC members are the University of Chicago, the University of Illinois, Indiana University, the University of Iowa, the University of Michigan, Michigan State University, the University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, and the University of Wisconsin-Madison. As many as 10 million volumes will be digitized from the collections of these major research libraries.

Here’s an excerpt from the CIC press release:

This partnership between our 12 member universities and Google is unprecedented. What makes this work so exciting is that we will literally open the pages of millions of books that have been assembled on our library shelves over more than a century. In literally seconds, we’ll be able browse across the content of thousands of volumes, searching for words or phrases, and making links across those texts that would have taken weeks or months or years of dedicated and scrupulous analysis. It is an extraordinary effort, blending the efforts and aspirations of librarians, university administrators, and scholars from across 12 world-class research universities. And our corporate partner possesses unparalleled expertise in creating and opening the digital world to coherent and comprehensive searching.

The effort is not entirely without controversy—no great undertaking ever is. But our universities believe strongly in the power of information to change the world, and in preserving, protecting and extending access to information. We have carefully weighed and considered the intellectual property issues and believe that our effort is firmly within the guidelines of current copyright law, while providing some flexibility as those laws are tested in the new digital environment in the coming years.

Lawsuit Aside, McGraw-Hill Uses Google Book Search

According to an article in Network World, McGraw-Hill uses Google Book Search on its Web site in spite of the fact that it is suing Google over the product.

How can this be? McGraw-Hill participates in the Google Book Search Partner Program, which gives publishers control over access to their digitized books, but, at the same time, it objects to Google’s efforts to scan and make available copies of its books in libraries without its permission.

Source: Perez, Juan Carlos. "Google’s Book Search Available in Publisher Sites." Network World, 1 June 2007.

Princeton Joins Google Book Search Library Project

The Princeton University Library has announced that it has joined the Google Book Search Library Project.

From the press release:

A new partnership between the Princeton University Library and Google soon will make approximately 1 million books in Princeton’s collection available online in a searchable format.

In a move designed to open Princeton’s vast resources to a broad international audience, the library will work with Google over the next six years to digitize books that are in the public domain and no longer under copyright. . . .

"We will be working with Google in the next several months to choose the subject areas to be digitized and the timetable for the work," [Karin] Trainer said. "Library staff, faculty and students will be invited to suggest which parts of our distinctive collections should be digitized."

Princeton is the 12th institution to join the Google Books Library Project. Books available in the Google Book Search also include those from collections at Harvard, Oxford, Stanford, the University of California, the University of Michigan, the University of Texas-Austin, the University of Virginia, the University of Wisconsin-Madison, the New York Public Library, the University Complutense of Madrid and the National Library of Catalonia.

Google also announced the new partnership in its Inside Google Book Search blog.