Update on the British Library/Microsoft Digitization Project

In his Information Today article "Progress Report: The British Library and Microsoft Digitization Partnership," Jim Ashling provides an update on the progress that the British Library and Microsoft have made in their project to digitize about 100,000 books for access in Live Search Books.

Here's an excerpt from the article:

Unlike previous BL digitization projects where material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material, collected by the BL during the 19th century. . . .

Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The scanners, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. In all, the data will amount to 30 to 40 terabytes. . . .

Obviously, then, an issue exists here for a collection of 19th-century literature when some authors may have lived beyond the late 1930s [British/EU law gives authors a copyright term of life plus 70 years]. An estimated 40 percent of the titles are also orphan works. Those two issues mean that item-by-item copyright checking would be an unmanageable task. Estimates for the total time required to check on the copyright issues involved vary from a couple of decades to a couple of hundred years. The BL’s approach is to use two databases of authors to identify those who were still living in 1936 and to remove their work from the collection before scanning. That, coupled with a wide publicity to encourage any rights holders to step forward, may solve the problem.
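
The author-based filter described in the excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Author record, the is_cleared function, and the decision to hold back authors with no recorded death date are hypothetical and are not drawn from the BL's actual workflow or databases. Only the 1936 cutoff comes from the excerpt.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoff year taken from the excerpt: authors still living in 1936 are removed.
CUTOFF_YEAR = 1936


@dataclass
class Author:
    """Hypothetical author record; not the BL's real data model."""
    name: str
    death_year: Optional[int]  # None when no death date is on record


def is_cleared(author: Author, cutoff_year: int = CUTOFF_YEAR) -> bool:
    """Return True only if the author is known to have died before the cutoff year.

    Assumption: authors with no recorded death date are held back,
    since they cannot be shown to fall outside the life-plus-70 term.
    """
    if author.death_year is None:
        return False
    return author.death_year < cutoff_year


authors = [
    Author("Charles Dickens", 1870),
    Author("Thomas Hardy", 1928),
    Author("H. G. Wells", 1946),
    Author("Unidentified author", None),
]

cleared = [a.name for a in authors if is_cleared(a)]
print(cleared)  # ['Charles Dickens', 'Thomas Hardy']
```

Note that an author who died during 1936 is excluded by this check, which matches the excerpt's wording ("still living in 1936").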

Boston Public Library/Open Content Alliance Contract Made Public

Boston Public Library has made public its digitization contract with the Open Content Alliance.

Some of the most interesting provisions include the intent of the Internet Archive to provide perpetual free and open access to the works, the digitization cost arrangements (BPL pays for transport and provides bibliographic metadata, the Internet Archive pays for digitization-related costs), the specification of file formats (e.g., JPEG 2000, color PDF, and various XML files), the provision of digital copies to BPL (copies are available immediately after digitization for BPL to download via FTP or HTTP within 3 months), and use of copies (any use by either party as long as provenance metadata and/or bookplate data is not removed).

Yale Will Work with Microsoft to Digitize 100,000 Books

The Yale University Library and Microsoft will work together to digitize 100,000 English-language out-of-copyright books, which will be made available via Microsoft’s Live Search Books.

Here’s an excerpt from the press release:

The Library and Microsoft have selected Kirtas Technologies to carry out the process based on their proven excellence and state-of-the art equipment. The Library has successfully worked with Kirtas previously, and the company will establish a digitization center in the New Haven area. . . .

The project will maintain rigorous standards established by the Yale Library and Microsoft for the quality and usability of the digital content, and for the safe and careful handling of the physical books. Yale and Microsoft will work together to identify which of the approximately 13 million volumes held by Yale’s 22 libraries will be digitized. Books selected for digitization will remain available for use by students and researchers in their physical form. Digital copies of the books will also be preserved by the Yale Library for use in future academic initiatives and in collaborative scholarly ventures.

Brewster Kahle on Libraries Going Open

Brewster Kahle's "Libraries Going Open" document provides some details on where the Internet Archive and the Open Content Alliance are going with projects involving mass digitization of microfilm, mass digitization of journals, ILL of scanned out-of-print books, scanning books on demand, and other areas.

Public Domain Works Partners with the Open Library

Public Domain Works has announced that it will partner with the Open Library, sharing its data about works that are in the public domain. Public Domain Works supports the Public Domain Works DB, which is now in beta form.

Here's an excerpt from the announcement:

The plan looks to be to upload the Public Domain Works data to the Open Library, and to use read/write APIs to continue to develop different front-ends for different jurisdictions—each with its own algorithms to determine which works are in the public domain.

German Publishers Just Say No to Google Book Search: Libreka Launched at Frankfurt Book Fair

German publishers who want to retain control of their content have a new alternative to Google Book Search: Libreka, a full-text search engine that initially has about 8,000 books from publishers who opted in for inclusion. Searchers retrieve book titles and cover images, but no content.

Source: "German Publishers Offer Alternative to Google Books." Deutsche Welle, 11 October 2007.

DLF and if:book Ponder Mass Digitization Issues

The Digital Library Federation and if:book are seeking comments on a series of questions about mass digitization issues that they will raise in invited brainstorming sessions as part of a project they are calling "The Really Modern Library."

Here's a suggestion: use CommentPress or a wiki to further refine ideas as the project evolves.

Source: Vershbow, Ben. "The Really Modern Library." if:book, 8 October 2007.

Japanese Authors Write Novels on Cell Phones

Texting has been raised to a new level as young Japanese authors have taken to writing novels on their cell phones.

Here's an excerpt from "Ring! Ring! Ring! In Japan, Novelists Find a New Medium":

When Satomi Nakamura uses her cellphone, she has to be extra careful to take frequent breaks. That's because she isn't just chatting. The 22-year-old homemaker has recently finished writing a 200-page novel titled "To Love You Again" entirely on her tiny cellphone screen, using her right thumb to tap the keys and her pinkie to hold the phone steady. . . .

Most of these novels, with their simple language and skimpy scene-setting, are rather unpolished. . . . But they are hugely popular, and publishers are delighted with them. . . . Several cellphone novels have been turned into real books, selling millions of copies and topping the best-seller lists.

Source: Kane, Yukari Iwatani. "Ring! Ring! Ring! In Japan, Novelists Find a New Medium." The Wall Street Journal, 26 September 2007, A1, A18.

National E-Books Observatory Project Studies Free E-Book Use

The JISC National E-Books Observatory Project has begun an in-depth study of the use of free e-books.

Here's an excerpt from the weblog posting:

JISC has funded a collection of e-books that will be freely available to students in all UK universities.

The aim of the JISC national e-books observatory project is to gather much needed evidence:

  • Evidence for publishers about the impact of e-books on traditional print sales to students
  • Evidence for publishers about how to create exciting e-books that will engage the digital native
  • Evidence for publishers and libraries about the pricing models for the future
  • Evidence for libraries about how to promote the use of e-books

The e-books, chosen, include some of the most popular texts in Business and Management Studies, Medicine, Engineering and Media Studies.

JISC is funding CIBER to study just what happens when these books are freely available to students. How will they find them? Will they use them? Will the e-books impact on their learning? Will medical students behave differently to Media Studies students? Will the Business and Management students stop buying from the bookshops? Will Engineering students use the e-books more or less than the other groups?

Publishers are collaborating by providing these e-books via Ingram Digital Group’s MyiLibrary platform and the Books@Ovid platform. Funding by JISC enables these publishers to experiment in a managed environment and mitigates any risk of revenue loss.

Amazon and Google E-Book Developments

Amazon is expected to release a wireless e-book reader called Kindle this October. It's anticipated to be priced in the $400-$500 range.

Also in the fall, Google is expected to offer charged access to the complete contents of digital books, with pricing to be determined by publishers.

Source: Stone, Brad. "Are Books Passé? Envisioning the Next Chapter for Electronic Books." The New York Times, 6 September 2007, C1, C9.

Athabasca University Establishes AU Press, an Open Access Publisher

Athabasca University has established AU Press, which will publish open access books, journals, and other digital publications.

Here's an excerpt from the press release:

AU Press, Canada’s first 21st century university press, is dedicated to disseminating knowledge emanating from scholarly research to a broad audience through open access digital media and in a variety of formats (e.g., journals, monographs, author podcasts).

Our publications are of the highest quality and are assessed by peer review; however, we are dedicated to working with emerging writers and researchers to promote success in scholarly publishing.

Our geographical focus is Canada, the West, and the Circumpolar North, and we are mandated to publish innovative and experimental works that challenge the limits of established canons, subjects and formats. Series under development in several subject areas will promote and contribute to specific academic disciplines, and we aim to revitalize neglected forms such as diary, memoir and oral history.

At AU Press, we also publish scholarly websites with a particular focus on distance education and e-learning, labour studies, Métis and Aboriginal studies, gender studies and the environment.

Portico Studying E-Book Preservation

Portico is launching an e-book preservation study, which will last the rest of the year.

Here's an excerpt from the press release:

In response to several requests from publishers and libraries, Portico is conducting a study in order to assess how to extend its archival infrastructure and service to respond to the emerging need to preserve e-books. During the study we will analyze the structure and preservation needs of e-books and determine what adjustments to Portico's existing, operational and technological infrastructure and the economic model developed to support e-journal preservation might be required in order to respond to this new genre. Portico's e-journal archiving service was developed through a pilot project that drew heavily upon engagement with publisher and library pilot participants. We anticipate that a similar process will be essential in understanding how best to respond to the challenges of e-book preservation. . . .

The current participants in the E-Book Preservation study include:

Publishers

  • American Math Society
  • Elsevier
  • Morgan Claypool
  • Taylor and Francis

Libraries

  • Case Western Reserve University
  • Cornell University Library
  • McGill University
  • SOLINET
  • Texas University Libraries
  • University College of London
  • Yale University Library

Cornell Joins Google Books Library Project

The Cornell University Library has joined the Google Books Library Project.

Here's an excerpt from the press release:

Google will digitize up to 500,000 works from Cornell University Library and make them available online using Google Book Search. As a result, materials from the library’s exceptional collections will be easily accessible to students, scholars and people worldwide, supporting the library’s long-standing commitment to make its collections broadly available.

“Research libraries today are integral partners in the academic enterprise through their support of research, teaching and learning. They also serve a public good by enhancing access to the works of the world's best minds,” said Interim University Librarian Anne R. Kenney. “As a major research library, Cornell University Library is pleased to join its peer institutions in this partnership with Google. The outcome of this relationship is a significant reduction in the time and effort associated with providing scholarly full-text resources online.”

Materials from Mann Library, one of 20 member libraries that comprise Cornell University Library, will be digitized as part of the agreement. Mann’s collections include some of the following subject areas: biological sciences, natural resources, plant, animal and environmental sciences, applied economics, management and public policy, human development, textiles and apparel, nutrition and food science. . . .

Cornell is the 27th institution to join the Google Book Search Library Project, which digitizes books from major libraries and makes it possible for Internet users to search their collections online. Over the next six years, Cornell will provide Google with public domain and copyrighted holdings from its collections. If a work has no copyright restrictions, the full text will be available for online viewing. For books protected by copyright, users will just get the basic background (such as the book’s title and the author’s name), at most a few lines of text related to their search and information about where they can buy or borrow a book. Cornell University Library will work with Google to choose materials that complement the contributions of the project’s other partners. In addition to making the materials available through its online search service, Google will also provide Cornell with a digital copy of all the materials scanned, which will eventually be incorporated into the university’s own digital library.

Open Access to Books: The Case of the Open Access Bibliography Updated

Last July, I reported on use of the Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals, which is both a printed book and a freely available e-book. Both versions are under a Creative Commons Attribution-NonCommercial 2.0 License. You can get a detailed history at the prior posting; the major changes since then have been the conversion of the HTML version to XHTML and the addition of a Google Custom Search Engine.

So, what does cumulative use of the e-book OAB version look like slightly over one year down the road from the last posting? Here's a summary:

  • UH PDF: 29,255 (March through May 2005)
  • All Web files on both Digital Scholarship hosts: 192,849 (33,814 uses of the PDF file; June 2005 through July 2007)
  • dLIST PDF: 655 (March 2005 to present)
  • E-LIS PDF: 556 (November 2005 to present)
  • ARL PDF: Not Available

Combined, OAB Web files have been accessed 223,315 times since March 2005.
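
For anyone who wants to check the arithmetic, here is a minimal sketch that sums the four available counts. The dictionary labels are just shorthand for the list items above. The 33,814 PDF uses appear to be part of the 192,849 Digital Scholarship total, so they are not added again, and the ARL figure is left out because it is not available.

```python
# Counts copied from the summary list above; labels are shorthand.
usage_counts = {
    "UH PDF": 29_255,
    "Digital Scholarship hosts (all Web files)": 192_849,
    "dLIST PDF": 655,
    "E-LIS PDF": 556,
}

total = sum(usage_counts.values())
print(f"{total:,}")  # 223,315
```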

Turning the Pages on an E-Book—Realistic Electronic Books

In this June 26th Google Tech Talk video titled Turning the Pages on an E-Book—Realistic Electronic Books, Veronica Liesaputra, PhD candidate at the University of Waikato, discusses her research on realistic e-books.

Here’s an excerpt from the presentation’s abstract:

In this talk, I will describe and demo a lightweight realistic book implementation that allows a document to be automatically presented with quick and easy-to-use animated page turning, while still providing readers with many advantages of electronic documents, such as hyperlinks and multimedia. I will also review computer graphics models for page-turning, from complex physical models based on the finite element method through 3D geometric models to simple "flatland" models involving reflection and rotation—which is what the demo uses.

British Library Licenses Turning the Pages Toolkit

The British Library has announced that it is now licensing its Turning the Pages Toolkit to libraries and museums. You can see the software in action at their Turning the Pages Web site.

Here’s an excerpt from the press release:

From today, libraries around the World will be able to license the award-winning Turning the Pages software used by the British Library to bring some of the world’s most rare and valuable books online.

Since its launch in 2004, Turning the Pages has grown to become one of the most popular resources at the British Library, allowing the Library to bring iconic treasures such as the Lindisfarne Gospels, Leonardo da Vinci’s Notebooks and Mercator’s Atlas of Europe online for everyone to see. With the launch of Turning the Pages 2.0, and a completely re-built software platform developed by Armadillo Systems, May 2007 also sees launch of a new "toolkit" that allows other libraries and museums around the World to create their own Turning the Pages gallery. . . .

Michael Stocking, Managing Director of Armadillo Systems and developer of the Turning the Pages software said "As well as making it easy for our customers to create their own collections, we also wanted to enhance the Turning the Pages experience. We have migrated the software to a new platform that places the book in a 3-D environment so, as well as being able to examine the book as a piece of text, users can now also examine it as an object. They can now look at the book from different angles, zoom in and even look at two books, side-by-side."

CLIR Receives Mellon Grant to Study Mass Digitization

According to an O’Reilly Radar posting, the Council on Library and Information Resources has been awarded a grant from the Mellon Foundation to study mass digitization efforts.

Here’s an excerpt from the posting that describes the grant’s objectives:

  1. Assess selected large scale digitization programs by exploring their efficacy and utility for conducting scholarship, in multiple fields or disciplines (humanities, sciences, etc.).
  2. Write and issue a report with findings and recommendations for improving the design of mass digitization projects.
  3. Create a Collegium that can serve in the long-term as an advisory group to mass digitization efforts, helping to assure and obtain the highest possible data quality and utility.
  4. Convene a series of meetings amongst scholars, libraries, publishers, and digitizing organizations to discuss ways of achieving these quality and design improvements.

Introducing the Networked Print Book

if:book reports that Manolis Kelaidis made a big splash at the O’Reilly Tools of Change for Publishing conference with his networked paper book.

Here’s an excerpt from the posting:

Manolis Kelaidis, a designer at the Royal College of Art in London, has found a way to make printed pages digitally interactive. His "blueBook" prototype is a paper book with circuits embedded in each page and with text printed with conductive ink. When you touch a "linked" word on the page, your finger completes a circuit, sending a signal to a processor in the back cover, which communicates by Bluetooth with a nearby computer, bringing up information on the screen.

Here’s an excerpt from a jusTaText posting about the demo:

Yes, he had a printed and bound book which communicated with his laptop. He simply touched the page, and the laptop reacted. It brought up pictures of the Mona Lisa. It translated Chinese. It played a piece of music. Kelaidis suggested that a library of such books might cross-refer, i.e. touching a section in one book might change the colors of the spines of related books on your shelves. Imagine.

POD for Library Users: New York Public Library Tries Espresso Book Machine

The New York Public Library’s Science, Industry, and Business Library has installed an Espresso Book Machine for public use through August.

Here’s an excerpt from the press release:

The first Espresso Book Machine™ ("the EBM") was installed and demonstrated today at the New York Public Library’s Science, Industry, and Business Library (SIBL). The patented automatic book making machine will revolutionize publishing by printing and delivering physical books within minutes. The EBM is a product of On Demand Books, LLC ("ODB"—www.ondemandbooks.com). . .

The Espresso Book Machine will be available to the public at SIBL through August, and will operate Monday-Saturday from 1 p.m. to 5 p.m. . . .

Library users will have the opportunity to print free copies of such public domain classics as "The Adventures of Tom Sawyer" by Mark Twain, "Moby Dick" by Herman Melville, "A Christmas Carol" by Charles Dickens and "Songs of Innocence" by William Blake, as well as appropriately themed in-copyright titles as Chris Anderson’s "The Long Tail" and Jason Epstein’s own "Book Business." The public domain titles were provided by the Open Content Alliance ("OCA"), a non-profit organization with a database of over 200,000 titles. The OCA and ODB are working closely to offer this digital content free of charge to libraries across the country. Both organizations have received partial funding from the Alfred P. Sloan Foundation. . . .

The EBM’s proprietary software transmits a digital file to the book machine, which automatically prints, binds, and trims the reader’s selection within minutes as a single, library-quality, paperback book, indistinguishable from the factory-made title.

Unlike existing print on demand technology, EBM’s are fully integrated, automatic machines that require minimal human intervention. They do not require a factory setting and are small enough to fit in a retail store or small library room. While traditional factory based print on demand machines usually cost over $1,000,000 per unit, the EBM is priced to be affordable for retailers and libraries. . . .

Additional EBM’s will be installed this fall at the New Orleans Public Library, the University of Alberta (Canada) campus bookstore, the Northshire Bookstore in Manchester, Vermont, and at the Open Content Alliance in San Francisco. Beta versions of the EBM are already in operation at the World Bank Infoshop in Washington, DC and the Bibliotheca Alexandrina (The Library of Alexandria, Egypt). National book retailers and hotel chains are among the companies in talks with ODB about ordering EBM’s in quantity.

The University of Maine and Two Public Libraries Adopt Emory’s Digitization Plan

Library Journal Academic Newswire reports that the University of Maine, the Toronto Public Library, and the Cincinnati Public Library will follow Emory University’s lead and digitize public domain works utilizing Kirtas scanners with print-on-demand copies being made available via BookSurge. (Also see the press release: "BookSurge, an Amazon Group, and Kirtas Collaborate to Preserve and Distribute Historic Archival Books.")

Source: "University of Maine, plus Toronto and Cincinnati Public Libraries Join Emory in Scan Alternative." Library Journal Academic Newswire, 21 June 2007.

CIC’s Digitization Contract with Google

Library Journal Academic Newswire has published a must-read article ("Questions Emerge as Terms of the CIC/Google Deal Become Public") about the Committee on Institutional Cooperation’s Google Book Search Library Project contract.

The article includes quotes from Peter Brantley, Digital Library Federation Executive Director, from his "Monetizing Libraries" posting about the contract (another must-read piece).

Here’s an excerpt from Brantley’s posting:

In other words—pretty much, unless Google ceases business operations, or there is a legal ruling or agreement with publishers that expressly permits these institutions (excepting Michigan and Wisconsin which have contracts of precedence) to receive digitized copies of In-Copyright material, it will be held in escrow until such time as it becomes public domain.

That could be a long wait. . . .

In an article early this year in The New Yorker, "Google’s Moon Shot," Jeffrey Toobin discusses possible outcomes of the antagonism this project has generated between Google and publishers. Paramount among them, in his mind, is a settlement. . . .

A settlement between Google and publishers would create a barrier to entry in part because the current litigation would not be resolved through court decision; any new entrant would be faced with the unresolved legal issues and required to re-enter the settlement process on their own terms. That, beyond the costs of mass digitization itself, is likely to deter almost any other actor in the market.

Emory Will Use Kirtas Scanner to Digitize Rare Books

Emory University’s Woodruff Library will use a Kirtas robotic book scanner to digitize rare books and to create PDF files that will be made available on the Internet and sold as print-on-demand books on Amazon.

Here’s an excerpt from the press release:

"We believe that mass digitization and print-on-demand publishing is an important new model for digital scholarship that is going to revolutionize the management of academic materials," said Martin Halbert, director for digital programs and systems at Emory’s Woodruff Library. "Information will no longer be lost in the mists of time when books go out of print. This is a way of opening up the past to the future."

Emory’s Woodruff Library is one of the premier research libraries in the United States, with extensive holdings in the humanities, including many rare and special collections. To increase accessibility to these aging materials, and ensure their preservation, the university purchased a Kirtas robotic book scanner, which can digitize as many as 50 books per day, transforming the pages from each volume into an Adobe Portable Document Format (PDF). The PDF files will be uploaded to a Web site where scholars can access them. If a scholar wishes to order a bound, printed copy of a digitized book, they can go to Amazon.com and order the book on line.

Emory will receive compensation from the sale of digitized copies, although Halbert stressed that the print-on-demand feature is not intended to generate a profit, but simply help the library recoup some of its costs in making out-of-print materials available.

Google Library Project Adds Committee on Institutional Cooperation (CIC)

The Google Book Search Library Project has an important new participant—the Committee on Institutional Cooperation (CIC). The CIC members are the University of Chicago, the University of Illinois, Indiana University, the University of Iowa, the University of Michigan, Michigan State University, the University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, and the University of Wisconsin-Madison. As many as 10 million volumes will be digitized from the collections of these major research libraries.

Here’s an excerpt from the CIC press release:

This partnership between our 12 member universities and Google is unprecedented. What makes this work so exciting is that we will literally open the pages of millions of books that have been assembled on our library shelves over more than a century. In literally seconds, we’ll be able to browse across the content of thousands of volumes, searching for words or phrases, and making links across those texts that would have taken weeks or months or years of dedicated and scrupulous analysis. It is an extraordinary effort, blending the efforts and aspirations of librarians, university administrators, and scholars from across 12 world-class research universities. And our corporate partner possesses unparalleled expertise in creating and opening the digital world to coherent and comprehensive searching.

The effort is not entirely without controversy—no great undertaking ever is. But our universities believe strongly in the power of information to change the world, and in preserving, protecting and extending access to information. We have carefully weighed and considered the intellectual property issues and believe that our effort is firmly within the guidelines of current copyright law, while providing some flexibility as those laws are tested in the new digital environment in the coming years.

Lawsuit Aside, McGraw-Hill Uses Google Book Search

According to an article in Network World, McGraw-Hill uses Google Book Search on its Web site in spite of the fact that it is suing Google over the product.

How can this be? McGraw-Hill participates in the Google Book Search Partner Program, which gives publishers control over access to their digitized books, but, at the same time, it objects to Google’s efforts to scan and make available copies of its books in libraries without its permission.

Source: Perez, Juan Carlos. "Google’s Book Search Available in Publisher Sites." Network World, 1 June 2007.