DLF and if:book Ponder Mass Digitization Issues

The Digital Library Federation and if:book are seeking comments on a series of questions about mass digitization issues that they will raise in invited brainstorming sessions as part of a project they are calling "The Really Modern Library."

Here's a suggestion: use CommentPress or a wiki to further refine ideas as the project evolves.

Source: Vershbow, Ben. "The Really Modern Library." if:book, 8 October 2007.

Digital Assets Factory Version 2.0 Released Under GPL

Bibliotheca Alexandrina has released version 2.0 of its open-source Digital Assets Factory software.

Here’s an excerpt from the project home page:

DAF v2.0 provides all the necessary tools required to manage the whole process of a digitization workflow, including its various Phases, User management, file movement and archiving. It provides the flexibility to manage multiple simultaneous projects with a diversity of materials, covering books, journals, newspapers, manuscripts, unbound materials, audio, video, and slides.

Advancing Knowledge: The IMLS/NEH Digital Partnership Grants Awarded

The Institute of Museum and Library Services and the National Endowment for the Humanities have announced the award of three grants under their Advancing Knowledge: The IMLS/NEH Digital Partnership program.

Here's an excerpt from the press release:

  • $347,520 to Historical Society of Pennsylvania for its project: PhilaPlace: A Neighborhood History and Culture Project. The Historical Society of Pennsylvania in collaboration with the Philadelphia Department of Records and the University of Pennsylvania’s School of Design will develop PhilaPlace, an interactive Web resource chronicling the history, culture, and architecture of Philadelphia's neighborhoods. Complete with maps, historical records, photographs, and digital models of select neighborhoods, PhilaPlace will serve as a prototype website for communities wishing to digitize their cultural heritage.
  • $349,939 to Tufts University, Medford for its project: Scalable Named Entity Identification in Classical Studies. The Perseus Project and the Collections and Archives of Tufts University will construct a testing database of scholarly and cultural documents about the ancient world. In the second part of the project, Tufts will develop a digital reference tool allowing researchers and librarians to conduct context-based “smart searches” of un-indexed words from existing databases in the Tufts Digital Library. By developing this database, and allowing for much shorter and complete context-based searches, Tufts hopes to lead scholars and students to the next generation of digital tools.
  • $349,996 to University of California, Berkeley for its project: Context and Relationships: Ireland and Irish Studies. The University of California, Berkeley in collaboration with the Queen’s University, Belfast, will develop a digital database of Irish studies materials to test three open-source digital tools. The Context Finder, Context Builder, and Context Provider tools will be aimed at establishing scholarly context. Using a common word search feature in digital collections, these tools will allow users to access the ideas that are associated with the words, thereby creating context through maps, primary texts and secondary works.

National Archives Seeks Comments on Draft Digitization Plan

The National Archives and Records Administration is soliciting comments on its draft Plan for Digitizing Archival Materials for Public Access, 2007-2016.

Here's an excerpt from the press release:

The document is divided into several sections. The first section, INTRODUCTION AND BACKGROUND, provides information on NARA's mission, our archival holdings, and our past experience with digitization, to give you the context of the draft Plan for Digitizing Archival Materials for Public Access, 2007-2016. Section II, PLAN OVERVIEW, describes our planned goals, activities, and priorities for digitization. Sections III through V provide listings of current digitization activities being carried out by NARA and through partnerships to digitize and make available archival materials. Appendix A contains draft operating principles that we are using as we enter into partnerships and Appendix B references relevant NARA guidance that applies to handling of archival materials being digitized and the technical guidelines for image creation and description.

Google Scholar Digitization Program

According to the article "Changes at Google Scholar: A Conversation with Anurag Acharya," Google Scholar has begun a small-scale, targeted journal digitization effort.

Here's a quote from the article:

Representing another effort to reach currently inaccessible content, Google Scholar now has its own digitization program. “It’s a small program,” said Acharya. “We mainly look for journals that would otherwise never get digitized. Under our proposal, we will digitize and host journal articles with the provision that they must be openly reachable in collaboration with publishers, fully downloadable, and fully readable. Once you get out of the U.S. and Western European space into the rest of the world, the opportunities to get and digitize research are very limited. They are often grateful for the help. It gives us the opportunity to get that country’s material or make that scholarly society more visible.”

Source: Quint, Barbara. "Changes at Google Scholar: A Conversation with Anurag Acharya." NewsBreaks, 27 August 2007.

Cornell Joins Google Books Library Project

The Cornell University Library has joined the Google Books Library Project.

Here's an excerpt from the press release:

Google will digitize up to 500,000 works from Cornell University Library and make them available online using Google Book Search. As a result, materials from the library’s exceptional collections will be easily accessible to students, scholars and people worldwide, supporting the library’s long-standing commitment to make its collections broadly available.

“Research libraries today are integral partners in the academic enterprise through their support of research, teaching and learning. They also serve a public good by enhancing access to the works of the world's best minds,” said Interim University Librarian Anne R. Kenney. “As a major research library, Cornell University Library is pleased to join its peer institutions in this partnership with Google. The outcome of this relationship is a significant reduction in the time and effort associated with providing scholarly full-text resources online.”

Materials from Mann Library, one of 20 member libraries that comprise Cornell University Library, will be digitized as part of the agreement. Mann’s collections include some of the following subject areas: biological sciences, natural resources, plant, animal and environmental sciences, applied economics, management and public policy, human development, textiles and apparel, nutrition and food science. . . .

Cornell is the 27th institution to join the Google Book Search Library Project, which digitizes books from major libraries and makes it possible for Internet users to search their collections online. Over the next six years, Cornell will provide Google with public domain and copyrighted holdings from its collections. If a work has no copyright restrictions, the full text will be available for online viewing. For books protected by copyright, users will just get the basic background (such as the book’s title and the author’s name), at most a few lines of text related to their search and information about where they can buy or borrow a book. Cornell University Library will work with Google to choose materials that complement the contributions of the project’s other partners. In addition to making the materials available through its online search service, Google will also provide Cornell with a digital copy of all the materials scanned, which will eventually be incorporated into the university’s own digital library.

Australian Framework and Action Plan for Digital Heritage Collections

The Collections Council of Australia Ltd. has released Australian Framework and Action Plan for Digital Heritage Collections, Version 0.C3 for comment.

Here's an excerpt from the document:

This is the Collections Council of Australia's plan to prepare an Australian framework for digital heritage collections. It brings together information shared by people working in archives, galleries, libraries and museums at a Summit on Digital Collections held in 2006. It proposes an Action Plan to address issues shared by the Australian collections sector in relation to current and future management of digital heritage collections.

British Library Licenses Turning the Pages Toolkit

The British Library has announced that it is now licensing its Turning the Pages Toolkit to libraries and museums. You can see the software in action at their Turning the Pages Web site.

Here’s an excerpt from the press release:

From today, libraries around the World will be able to license the award-winning Turning the Pages software used by the British Library to bring some of the world’s most rare and valuable books online.

Since its launch in 2004, Turning the Pages has grown to become one of the most popular resources at the British Library, allowing the Library to bring iconic treasures such as the Lindisfarne Gospels, Leonardo da Vinci’s Notebooks and Mercator’s Atlas of Europe online for everyone to see. With the launch of Turning the Pages 2.0, and a completely re-built software platform developed by Armadillo Systems, May 2007 also sees launch of a new "toolkit" that allows other libraries and museums around the World to create their own Turning the Pages gallery. . . .

Michael Stocking, Managing Director of Armadillo Systems and developer of the Turning the Pages software said "As well as making it easy for our customers to create their own collections, we also wanted to enhance the Turning the Pages experience. We have migrated the software to a new platform that places the book in a 3-D environment so, as well as being able to examine the book as a piece of text, users can now also examine it as an object. They can now look at the book from different angles, zoom in and even look at two books, side-by-side."

CLIR Receives Mellon Grant to Study Mass Digitization

According to an O’Reilly Radar posting, the Council on Library and Information Resources has been awarded a grant from the Mellon Foundation to study mass digitization efforts.

Here’s an excerpt from the posting that describes the grant’s objectives:

  1. Assess selected large scale digitization programs by exploring their efficacy and utility for conducting scholarship, in multiple fields or disciplines (humanities, sciences, etc.).
  2. Write and issue a report with findings and recommendations for improving the design of mass digitization projects.
  3. Create a Collegium that can serve in the long-term as an advisory group to mass digitization efforts, helping to assure and obtain the highest possible data quality and utility.
  4. Convene a series of meetings amongst scholars, libraries, publishers, and digitizing organizations to discuss ways of achieving these quality and design improvements.

The University of Maine and Two Public Libraries Adopt Emory’s Digitization Plan

Library Journal Academic Newswire reports that the University of Maine, the Toronto Public Library, and the Cincinnati Public Library will follow Emory University’s lead and digitize public domain works using Kirtas scanners, with print-on-demand copies made available via BookSurge. (Also see the press release: "BookSurge, an Amazon Group, and Kirtas Collaborate to Preserve and Distribute Historic Archival Books.")

Source: "University of Maine, plus Toronto and Cincinnati Public Libraries Join Emory in Scan Alternative." Library Journal Academic Newswire, 21 June 2007.

CIC’s Digitization Contract with Google

Library Journal Academic Newswire has published a must-read article ("Questions Emerge as Terms of the CIC/Google Deal Become Public") about the Committee on Institutional Cooperation’s Google Book Search Library Project contract.

The article quotes Peter Brantley, Executive Director of the Digital Library Federation, drawing on his "Monetizing Libraries" posting about the contract (another must-read piece).

Here’s an excerpt from Brantley’s posting:

In other words—pretty much, unless Google ceases business operations, or there is a legal ruling or agreement with publishers that expressly permits these institutions (excepting Michigan and Wisconsin which have contracts of precedence) to receive digitized copies of In-Copyright material, it will be held in escrow until such time as it becomes public domain.

That could be a long wait. . . .

In an article early this year in The New Yorker, "Google’s Moon Shot," Jeffrey Toobin discusses possible outcomes of the antagonism this project has generated between Google and publishers. Paramount among them, in his mind, is a settlement. . . .

A settlement between Google and publishers would create a barrier to entry in part because the current litigation would not be resolved through court decision; any new entrant would be faced with the unresolved legal issues and required to re-enter the settlement process on their own terms. That, beyond the costs of mass digitization itself, is likely to deter almost any other actor in the market.

Google Library Project Adds Committee on Institutional Cooperation (CIC)

The Google Book Search Library Project has an important new participant—the Committee on Institutional Cooperation (CIC). The CIC members are the University of Chicago, the University of Illinois, Indiana University, the University of Iowa, the University of Michigan, Michigan State University, the University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, and the University of Wisconsin-Madison. As many as 10 million volumes will be digitized from the collections of these major research libraries.

Here’s an excerpt from the CIC press release:

This partnership between our 12 member universities and Google is unprecedented. What makes this work so exciting is that we will literally open the pages of millions of books that have been assembled on our library shelves over more than a century. In literally seconds, we’ll be able [to] browse across the content of thousands of volumes, searching for words or phrases, and making links across those texts that would have taken weeks or months or years of dedicated and scrupulous analysis. It is an extraordinary effort, blending the efforts and aspirations of librarians, university administrators, and scholars from across 12 world-class research universities. And our corporate partner possesses unparalleled expertise in creating and opening the digital world to coherent and comprehensive searching.

The effort is not entirely without controversy—no great undertaking ever is. But our universities believe strongly in the power of information to change the world, and in preserving, protecting and extending access to information. We have carefully weighed and considered the intellectual property issues and believe that our effort is firmly within the guidelines of current copyright law, while providing some flexibility as those laws are tested in the new digital environment in the coming years.

Stanford’s Copyright Renewal Database

Researching the copyright status of post-1922 works in the US can be difficult, and this has been a barrier to digitization efforts. The Stanford University Libraries and Academic Information Resources have released a new copyright research tool, the Copyright Renewal Database, that promises to make this process easier.

Here’s an excerpt from the press release:

An online database that enables people to search copyright-renewal records for books published in the United States between 1923 and 1963 has been launched by Stanford University Libraries and Academic Information Resources (SULAIR).

SULAIR developed the Copyright Renewal Database, dubbed the "Copyright Determinator," with a grant from the Hewlett Foundation. The effort built on Project Gutenberg’s transcriptions of the Catalog of Copyright Entries, which was published by the U.S. Copyright Office. . . .

Determining the copyright status of books has become a pressing issue as libraries and businesses develop plans to digitize materials and make works in the public domain widely available. In order to appropriately select books for digitization, these organizations need to determine efficiently and with some certainty the copyright status of each work in a large collection. The Determinator supports this process, bringing all 1923-1963 book-renewal records together in a single database and, more significantly, making searchable renewal records that had previously been distributed only in print.

U.S. works published from 1923 to 1963 are the only group of works for which renewal is now a concern. Renewals have expired for works published before 1923, and they are generally in the public domain. The 1976 Copyright Act made renewal automatic for works published after Jan. 1, 1964. Determining the renewal status of works published between 1923 and 1963 has been a challenge; the Copyright Office received renewals as early as 1950, but only records received by that office after 1977 are available in electronic form. Renewals received between 1950 and 1977 were announced and distributed only in a semi-annual print publication. For the Determinator databases, Stanford has converted the print records to machine-readable form and combined them with the electronic renewal records from the Copyright Office.
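The renewal rules summarized in the excerpt above amount to a simple triage by publication year. Here is a minimal Python sketch of that logic, assuming a book first published in the U.S. The function name `triage_us_book` and the `renewal_found` flag are hypothetical. The flag stands in for the result of searching renewal records, for example in the Copyright Renewal Database. The sketch deliberately simplifies. It ignores notice and registration formalities and foreign works, and its 1923 cutoff reflects the law at the time the press release was issued. That cutoff has since moved forward.

```python
from typing import Optional


def triage_us_book(pub_year: int, renewal_found: Optional[bool] = None) -> str:
    """Rough copyright triage for a U.S. book by publication year.

    Hypothetical helper that follows the rules as summarized in the
    Stanford press release; it is a simplification, not a full analysis.
    `renewal_found` is the outcome of a renewal-record search
    (None means the records have not been searched yet).
    """
    if pub_year < 1923:
        # Renewal terms have run out for these works.
        return "public domain"
    if pub_year <= 1963:
        # 1923-1963: status hinges on whether copyright was renewed.
        if renewal_found is None:
            return "search renewal records"
        if renewal_found:
            return "renewed: likely in copyright"
        return "not renewed: likely public domain"
    # 1964 onward: renewal is automatic, so presume protection.
    return "renewal automatic: presume in copyright"


if __name__ == "__main__":
    for year, renewed in [(1910, None), (1935, None), (1935, True), (1950, False), (1970, None)]:
        print(year, renewed, "->", triage_us_book(year, renewed))
```

The point of the middle branch is the one the Determinator addresses: only the 1923 to 1963 group requires a records search before a library can decide whether a book is safe to digitize and display in full.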

A Long Road Ahead for Digitization

The New York Times published an article today ("History, Digitized (and Abridged)") that examines the progress that has been made in digitization in the US. It doesn’t hold many surprises for those in the know, but it might be useful in orienting non-specialists to some of the challenges involved, especially those who think that everything is already on the Internet.

It also has some interesting tidbits, including a chart that shows the holdings of different types of materials in the National Archives and how many items have been digitized for each type.

It includes current cost data from the Library of Congress:

At the Library of Congress, for example, despite continuing and ambitious digitization efforts, perhaps only 10 percent of the 132 million objects held will be digitized in the foreseeable future. For one thing, costs are prohibitive. Scanning alone on smaller items ranges from $6 to $9 for a 35-millimeter slide, to $7 to $11 a page for presidential papers, to $12 to $25 for poster-size pieces.

It also discusses the copyright laws that apply to sound recordings and their impact on digitization efforts:

When it comes to sound recordings, copyright law can introduce additional complications. Recordings made before 1972 are protected under state rather than federal laws, and under a provision of the 1976 Copyright Act, may be entitled to protection under state law until 2067. Also, an additional copyright restriction often applies to the underlying musical composition.

A study published in 2005 by the Library of Congress and the Council on Library and Information Resources found that some 84 percent of historical sound recordings spanning jazz, blues, gospel, country and classical music in the United States, and made from 1890 to 1964, have become virtually inaccessible.

An interesting, well-written article that’s worth a read.

Source: Hafner, Katie. "History, Digitized (and Abridged)." The New York Times, 11 March 2007, BU YT 1, 8-9.

Princeton Joins Google Book Search Library Project

The Princeton University Library has announced that it has joined the Google Book Search Library Project.

From the press release:

A new partnership between the Princeton University Library and Google soon will make approximately 1 million books in Princeton’s collection available online in a searchable format.

In a move designed to open Princeton’s vast resources to a broad international audience, the library will work with Google over the next six years to digitize books that are in the public domain and no longer under copyright. . . .

"We will be working with Google in the next several months to choose the subject areas to be digitized and the timetable for the work," [Karin] Trainer said. "Library staff, faculty and students will be invited to suggest which parts of our distinctive collections should be digitized."

Princeton is the 12th institution to join the Google Books Library Project. Books available in the Google Book Search also include those from collections at Harvard, Oxford, Stanford, the University of California, the University of Michigan, the University of Texas-Austin, the University of Virginia, the University of Wisconsin-Madison, the New York Public Library, the University Complutense of Madrid and the National Library of Catalonia.

Google also announced the new partnership in its Inside Google Book Search blog.