"Ramping It Up: 10 Lessons Learned in Mass Digitisation"

Rose Holley, Manager of the Australian Newspapers Digitisation Program at the National Library of Australia, has self-archived "Ramping It Up: 10 Lessons Learned in Mass Digitisation" in E-LIS.

Here's an excerpt:

In 2007 the National Library of Australia (NLA) began a large-scale newspaper digitisation program that aimed to digitise one million pages (10 million articles) per year, with a view to increasing the volume over time and ramping up digitisation to include books and journals as well as newspapers. By the end of 2009 the NLA had learnt 10 key lessons about ramping up its digitisation activities into a mass-scale operation.

"Control of Museum Art Images: The Reach and Limits of Copyright and Licensing"

Melissa A. Brown and Kenneth D. Crews have self-archived "Control of Museum Art Images: The Reach and Limits of Copyright and Licensing" in SSRN.

Here's an excerpt:

Many museums and art libraries have digitized their collections of artworks. Digital imaging capabilities represent a significant development in the academic study of art, and they enhance the availability of art images to the public at large. The possible uses of these images are likewise broad. Many of these uses, however, are potentially defined by copyright law or by license agreements imposed by some museums and libraries that attempt to define allowable uses. Often, these terms and conditions will mean that an online image is not truly available for many purposes, including publication in the context of research or simple enjoyment. Not only do these terms and conditions restrict uses, they also have dubious legal standing after the Bridgeman case. This paper examines the legal premises behind claiming copyright in art images and the ability to impose license restrictions on their use.

This paper is one outcome of a study of museum licensing practices funded by The Samuel H. Kress Foundation. This paper is principally an introduction to the relevant law in the United States and a survey of examples of museum licenses. The project is in its early stages, with the expectation that later studies will expand on this introduction and provide greater analysis of the legal complications of copyright, the public domain, and the reach of license agreements as a means for controlling the use of artwork and potentially any other works, whether or not they fall within the scope of copyright protection.

National Library of the Netherlands Plans to Digitize All Dutch Books, Newspapers, and Periodicals from 1470

The National Library of the Netherlands has released its Strategic Plan 2010-2013.

In the "Strategic priority 1" section (page 6), the document states that the library intends to ultimately "digitise all Dutch books, newspapers and periodicals from 1470." It sets the following target for 2013: "10% of all Dutch books, newspapers and periodicals have been digitised (60 million pages by the KB, 13 million by third parties)." (Thanks to ResourceShelf.)

Here's an excerpt:

One of the large, labour-intensive challenges is to digitise all the books, periodicals and newspapers that have appeared in the Netherlands. A component of this undertaking is the digitisation of the special pre-1800 collections for which a number of Dutch university libraries and the KB have together drawn up a project plan. In addition, the KB has collected since 1995 born digital publications (publications which are only published in digital form, such as websites, digital periodicals, e-books, etc.). The KB will intensify this undertaking. The KB aims to be able to offer customers all publications with as few restrictions as possible. Naturally the KB does this in close consultation with publishers and right holder organisations.

Understanding the Costs of Digitisation: Detail Report

JISC has released Understanding the Costs of Digitisation: Detail Report.

Here's an excerpt:

This document is the detailed output of a study to synthesise the experiences of a range of digitisation projects to provide JISC and the digitisation community with an evidence base to support funding allocation, project planning and project and programme management. Case studies are drawn from five digitisation projects that supported this study, and links are provided to other resources that provide supporting information.

Also available: Understanding the Costs of Digitisation: A Briefing Paper.

Digitization Activities: Project Planning and Management Outline

The Federal Agencies Digitization Guidelines Initiative has released Digitization Activities: Project Planning and Management Outline.

Here's an excerpt from the announcement:

The aim of this document is to define activities relating to the digitization of original cultural materials, and to outline general steps for planning and management of this process. The activities described in this document address library/archival issues, imaging and conversion work, and IT infrastructure issues in particular, and were identified using project management outlines from several organizations with significant experience working with cultural materials. This document defines "digitization" as a complete process, and covers all project components from content selection through delivery of digitized objects into a repository environment.

"Removing All Restrictions: Cornell's New Policy on Use of Public Domain Reproductions"

Peter Hirtle, Cornell University Library's Senior Policy Advisor, is interviewed in "Removing All Restrictions: Cornell's New Policy on Use of Public Domain Reproductions," which has been published in the latest issue of Research Library Issues.

Here's an excerpt:

Restrictions on the use of public domain work, sometimes labeled "copyfraud," are generating increasing criticism from the scholarly community. With significant collections of public domain materials in their collections, research libraries are faced with the question of what restrictions, if any, to place on those who seek to scan or otherwise reproduce these resources with the intention of publication.

Cornell University Library has responded by adopting new permissions guidelines that open access by no longer requiring users to seek permission to publish public domain items duplicated from its collections. Users planning to scan and publish public domain material are still expected to determine that works are in the public domain where they live (since public domain determinations can vary internationally). Users must also respect noncopyright rights, such as the rights of privacy, publicity, and trademark. The Library will continue to charge service fees associated with the reproduction of analog material or the provision of versions of files different than what is freely available on the Web. The new guidelines are found at http://cdl.library.cornell.edu/guidelines.html.

Copyright and Cultural Institutions: Guidelines for Digitization for U.S. Libraries, Archives, and Museums

The Cornell University Library has published Copyright and Cultural Institutions: Guidelines for Digitization for U.S. Libraries, Archives, and Museums by Peter B. Hirtle, Emily Hudson, and Andrew T. Kenyon. A PDF copy of the book can be freely downloaded and the print version can be purchased from CreateSpace.

Here's an excerpt from the press release:

How can cultural heritage institutions legally use the Internet to improve public access to the rich collections they hold?

"Copyright and Cultural Institutions: Guidelines for Digitization for U.S. Libraries, Archives, and Museums," a new book published today by Cornell University Library, can help professionals at these institutions answer that question.

Based on a well-received Australian manual written by Emily Hudson and Andrew T. Kenyon of the University of Melbourne, the book has been developed by Cornell University Library's senior policy advisor Peter B. Hirtle, along with Hudson and Kenyon, to conform to American law and practice.

The development of new digital technologies has led to fundamental changes in the ways that cultural institutions fulfill their public missions of access, preservation, research, and education. Many institutions are developing publicly accessible Web sites that allow users to visit online exhibitions, search collection databases, access images of collection items, and in some cases create their own digital content. Digitization, however, also raises the possibility of copyright infringement. It is imperative that staff in libraries, archives, and museums understand fundamental copyright principles and how institutional procedures can be affected by the law.

"Copyright and Cultural Institutions" was written to assist understanding and compliance with copyright law. It addresses the basics of copyright law and the exclusive rights of the copyright owner, the major exemptions used by cultural heritage institutions, and stresses the importance of "risk assessment" when conducting any digitization project. Case studies on digitizing oral histories and student work are also included.

Hirtle is the former director of the Cornell Institute for Digital Collections, and the book evolved from his recognition of the need for such a guide when he led museum and library digitization projects. After reading Hudson and Kenyon's Australian guidelines, he realized that an American edition would be invaluable to anyone contemplating a digital edition.

Anne R. Kenney, the Carl A. Kroch University Librarian at Cornell University, noted: "The Library has a long tradition of making available to other professionals the products of its research and expertise. I am delighted that this new volume can join the ranks with award-winning library publications on digitization and preservation."

As an experiment in open-access publishing, the Library has made the work available in two formats. Print copies of the work are available from CreateSpace, an Amazon subsidiary. In addition, the entire text is available as a free download through eCommons, Cornell University's institutional repository, and from SSRN.com, which already distributes the Australian guidelines.

DiSCmap: Digitisation in Special Collections: Mapping, Assessment, Prioritisation. Final Report.

JISC has released DiSCmap: Digitisation in Special Collections: Mapping, Assessment, Prioritisation. Final Report.

Here's an excerpt:

In its widest sense the project contributes towards preliminary evidence on user-driven priorities which could help in the process of allocation of funding for digitisation projects. It also can help to define the purpose, value and impact of digitisation not on institutional basis but on UK HE scale. By development of a framework of user-driven prioritisation criteria, DiSCmap contributes towards the longer-term goal of developing a quantifiable and adjustable system of metrics in the digitisation life cycle especially addressing the selection phase.

The amount of collections nominated to the long list [of 945 collections nominated for digitisation] reached beyond the expectations of the project team. This list itself is a valuable outcome which should be enriched further in order to provide a broad and trustworthy basis for the future digitisation decisions. DiSCmap surveyed over 1000 intermediaries and end users; this report presents in a very condensed form only a small proportion of the total evidence on user demand gathered by the project team. Yet in analysing and representing fully the range of end user priorities, DiSCmap has made a considerable advance in identifying the actual digitisation needs of end users. It has done so with the aim of removing the element of guesswork and assumption hitherto inherent in our understanding of user requirements in this area. The combination of 'intermediary' and 'end user' studies provides a richness of view points which highlight the many important different aspects related to the user dimension in digitisation.

Yale University Library Gets Two Grants to Digitize Middle Eastern Materials

The Yale University Library has been awarded two grants totaling $890,000 to digitize Middle Eastern materials.

Here's an excerpt from "Yale Digitizes Documents":

The Yale University Library has received a $650,000 four-year grant from the U.S. Department of Education to digitize Syrian and Palestinian government records, and a $240,000 joint grant from the National Endowment for the Humanities and the Joint Information Systems Committee to digitize Middle Eastern scholarly materials, according to a press release Thursday. The library will use advanced technology to translate the digitized text into searchable text, which will be available online.

Harvard College Library and the National Library of China to Digitize 51,500-Volume Chinese Rare Book Collection

The Harvard College Library and the National Library of China will collaborate to digitize and make freely available the 51,500-volume Chinese rare book collection of Harvard-Yenching Library.

Here's an excerpt from the announcement:

Among the largest cooperative projects of its kind ever undertaken between China and US libraries, the project will digitize Harvard-Yenching Library's entire 51,500-volume Chinese rare book collection. One of the libraries which make up the Harvard College Library system, Harvard-Yenching is the largest university library for East Asian research in the Western world. When completed, the project will have a transformative effect on scholarship involving rare Chinese texts, Harvard-Yenching Librarian James Cheng predicted. . . .

The six-year project will be done in two three-year phases. The first phase, beginning in January 2010, will digitize books from the Song, Yuan and Ming dynasties, which date from about 960 AD to 1644. The second phase, starting in January 2013, will digitize books from the Qing Dynasty, which date from 1644 until 1795. The collection includes materials which cover an extensive range of subjects, including history, philosophy, drama, belles lettres and classics.

All of the rare books will have to be examined carefully to identify those that are fragile, damaged, or are sewn in a way that hides text along the binding margin. To determine which volumes may need conservation treatment, project manager Sharon Li-Shiuan Yang, head of access services at Harvard-Yenching Library, and her team will receive training in basic condition assessment from the Weissman Preservation Center, which treats Harvard's rare library materials. Items needing repair will be sent to the Weissman for treatment by conservators before being digitized.

The digitization work will be performed by HCL Imaging Services group in its state-of-the-art lab in Widener Library, where staff members have been working to design new equipment and workflows in preparation for the huge project, said Imaging Services head Bill Comstock.

"The scale of the project will present HCL and the National Library of China with many organizational and technical challenges," Comstock said. "We look forward to partnering with NLC staff, led by Dr. Zhi-geng Wang, the Director of the NLC's Department for Digital Resources and Services, to build innovative new tools and procedures that will make our work on this and other projects more robust and efficient."

No Contract Awarded for GPO Mass Digitization of All Federal Publications

The U.S. Government Printing Office has been unable to award a contract for the digitization of all Federal publications.

Here's an excerpt from the announcement:

In 2004, GPO proposed digitizing all retrospective Federal publications back to the earliest days of the Federal Government. Following the conduct of a pilot project in 2006 and its evaluation in 2007, we issued an RFP in 2008 for a cooperative relationship with a public or private sector participant or participants where the uncompressed, unaltered files created as a result of the conversion process would be delivered to GPO at no cost to the Government, for ingest into GPO's Federal Digital System (FDsys). Unfortunately, we were unable to make an award for this RFP in the allocated timeframe.

We are very disappointed in this setback, but are currently developing new digitization alternatives. In addition to our longstanding goal of serving as one of the repositories for electronic files through the submission of material to FDsys, our focus for digitization will be on coordinating projects among institutions, assisting in the establishment and implementation of preservation guidelines, maintaining a registry of digitization projects, and ensuring that there is appropriate bibliographic metadata for the titles in the collection.

Yale: "Digitization Project Derailed"

In "Digitization Project Derailed," Carol Hsin discusses the status of digitization efforts at the Yale University Library. (Thanks to ResourceShelf.)

Here's an excerpt:

Four months after Microsoft abruptly terminated its multi-million dollar book digitization deal with the University, Yale officials said they will have to wait for donations or grants to come in before they start another major book scanning project.

New York Public Library and Kirtas Technologies Make Half-Million Public Domain Books Available

The New York Public Library and Kirtas Technologies are making a half-million public domain books available for sale as digitized or printed copies.

Here's an excerpt from the press release:

Readers and researchers looking for hard-to-find books now have the opportunity to dip into the collections of one of the world's most comprehensive libraries to purchase digitized copies of public domain titles. Through their Digitize-on-Demand program, Kirtas Technologies has partnered with The New York Public Library to make 500,000 public domain works from the Library's collections available (to anyone in the world).

"New technology has allowed the Library to greatly expand access to its collections," said Paul LeClerc, President of The New York Public Library. "Now, for the first time, library users are able to order copies of specific items from our vast public domain collections that are useful to them. Additionally the program creates a digital legacy for future users of the same item and a revenue stream to support our operations. We are very pleased to participate in a program that is so beneficial to everyone involved."

Using existing information from NYPL's catalog records, Kirtas will make the library's public domain books available for sale through its retail site before they are ever digitized. Customers can search for a desired title on www.kirtasbooks.com and place an order for that book. When the order is placed, only then is it pulled from the shelf, digitized and made available as a high-quality reprint or digital file.

What makes this approach to digitization unique is that NYPL incurs no up-front printing, production or storage costs. It also provides the library with a self-funding, commercial model helping it to sustain its digitization programs in the future. Unlike other free or low-cost digitization programs, the library retains the rights and ownership to their own digitized content.

What to Withdraw: Print Collections Management in the Wake of Digitization

Ithaka has released What to Withdraw: Print Collections Management in the Wake of Digitization.

Here's an excerpt from the announcement:

Based on the expected continuing needs for print materials, this report considers the minimum time period for which access to the original will be required and assesses the number of print copies necessary to ensure that these goals are met. While complex, this methodology provides for a variety of risk profiles based on key characteristics, with preservation recommendations that similarly vary. For example, many materials that are adequately digitized and preserved in digital form, contain few images, and are held in certain quantities in system-wide print repositories may be safely withdrawn from local print holdings without impacting either preservation or access.

At the same time, the report warns that other print materials may not yet be ready for broad withdrawal without threatening both access and preservation goals. For these materials, a number of strategies are recommended to increase the flexibility available to libraries in the future.

Six TexTreasures Digitization Grants Awarded

The Texas State Library and Archives Commission has awarded digitization grants to six TexShare member libraries.

Here's an excerpt from the press release:

The exciting projects that have been funded are:

  • "Houston Oral History Project" ($25,000) – The Houston Public Library is partnering with Houston Mayor Bill White to preserve and make the video-recordings of significant Houstonians available on the web. This grant will convert an additional 288 hours of audiotapes from cassette or reel-to-reel to digital format along with transcripts for the collection.
  • "The Bexar Archives" ($19,930) – The Dolph Briscoe Center for American History at the University of Texas at Austin will create a research tool, called Bexar Archives Online, which joins digital images of the original Spanish documents with the corresponding English-language translations.
  • "Marion Butts Photography Negatives Project" ($17,571) – The Dallas Public Library will use the photographic records produced by Marion Butts, an African-American photographer and editor of the Dallas Express, as well as other primary source materials such as maps, Negro city directories and oral histories to develop a series of online Texas-focused, TEKS-based lesson plans targeting seventh grade students. The records chronicle Dallas and Texas history during the segregation and civil rights eras.
  • "Lady Bird Johnson Photo Collection Project" ($16,610) – The Lady Bird Johnson Wildflower Center at the University of Texas at Austin will digitize and provide access to a unique collection of photographs of Claudia Taylor "Lady Bird" Johnson. She is the wife of former President Lyndon B. Johnson, and was born in Karnack, Texas. As the First Lady of the United States from 1963-69, she was an advocate for nature, beautification and conservation of natural resources. Most of the photographs in this collection date after her return to Texas.
  • "Itinerant Photographer Collection" ($14,389) – The Harry Ransom Center at the University of Texas at Austin will preserve and digitize a collection of glass plate negatives depicting local business owners and employees in Corpus Christi, which were taken by an unidentified photographer in February 1934 during the Depression. The center will provide an online finding aid, an online catalog record and an online exhibit of the fragile items now in danger of emulsion loss.
  • "Tejano Voices Project" ($6,500) – The University of Texas at Arlington Library will digitize and describe 13 oral history interviews from notable Tejanos and Tejanas from across Texas conducted in 1992-2003 by Dr. Jose Angel Gutierrez, associate professor of political science at UT Arlington. Many of the interviews emphasize the personal struggles, from individuals of Mexican descent, who are the first in their communities elected or appointed to government offices. The interviews also reflect the history of the Tejano community as it pressed for an end to racial segregation in the state and access to political power in the post-WWII period.

Two Presentations from the ALA 2009 "Digital Library Hardware Showcase" Session

Two presentations from the ALA 2009 "Digital Library Hardware Showcase" session have been made available.

Presentations from JISC Digital Content Conference 2009

Presentations from the JISC Digital Content Conference 2009 are now available.

Here's an excerpt from the conference page:

In the context of the completion of Phase 2 of the JISC Digitisation Programme the JISC Digital Content Conference aims to discuss and decide the next steps that need to be taken to ensure the sustained integration of digitised content into research and education and is one of the most important events of 2009. It will consider the issues facing the UK's universities as they deal with creating, delivering, sustaining and using a whole range of digital content as well as looking into future opportunities and challenges. The following thematic strands will run throughout the conference: Managing Content; Content Development Strategies; Content In Education; User Engagement; Looking Into The Future.

Mass Digitisation: The IMPACT Project

Fifteen institutions from Europe and the UK have launched the IMPACT project.

Here's an excerpt from the press release:

Feeding into the EU's i2010 vision to significantly improve access to Europe's cultural heritage, the British Library and the University of Salford have teamed up with a group of 15 institutions from across the continent as part of the four-year IMPACT project—IMProving Access to Text—to remove the barriers that stand in the way of the mass digitisation of the European cultural heritage.

Led by the National Library of the Netherlands, Koninklijke Bibliotheek, the IMPACT project aims to share expertise from across Europe and establish international best practice guidelines with a view to speeding up, standardising and enhancing the quality of mass digitisation through establishing a Centre of Competence for text based digitisation. As one of the main participants, the British Library has taken the lead on one of IMPACT's four sub-projects, establishing the operational context of the work carried out by contributors to the project.

Mass digitisation has become one of the most prominent issues in the library world over the last 5 years, with a number of experienced libraries in Europe already scanning millions of pages each year. To help establish some standardisation over the course of the project, the British Library's team will lead work on a set of 'Decision Support Tools' in an effort to focus on practical implementation support, providing guidance on digitisation workflow, the capturing of material and the organisation of metadata based on the real world experiences of project partners. These measures, announced at the first IMPACT conference in April will help ensure new material can be digitised successfully and feed into existing workflows. . . .

With extensive experience working with the digitisation of historic material, the British Library has also been working closely with technical experts at the internationally distinguished Pattern Recognition and Image Analysis (PRImA) research group, University of Salford, exploring methods of improving Optical Character Recognition (OCR) for use in the digitisation of less standardised material. OCR technology was absolutely vital for the delivery of the Library's recent newspaper digitisation project of 19th Century UK newspapers (http://newspapers.bl.uk/blcs), allowing the text to be fully searchable, but the current technology has its limitations. . . .

Through collaboration IMPACT has already established methods for overcoming issues with geometric correction, border removal and binarisation, and is looking at examples of best practice from around the world, such as the Australian Newspaper Digitisation project's cutting edge application of collaborative user generated corrections, to increase resource discovery success for historic mass digitisation.

Planning and Managing the Digitization of Library and Archive Materials: A Multi-Model Approach Presentation

John Weaver et al. have made their "Planning and Managing the Digitization of Library and Archive Materials: A Multi-Model Approach" presentation available on SlideShare.

Here's an excerpt from the transcript:

This workshop will enable you to:

  • Identify different models and methods for digitizing library and archival materials
  • Identify the relative advantages and disadvantages of these models
  • Define and evaluate a potential digitization project at your library
  • Identify key considerations in planning and funding a digitization project
  • Identify and develop management and production processes for different types of digitization projects
  • Discover additional, relevant resources for planning and managing digitization projects

eIFL Case Studies on Low Cost Digitisation Projects: Final Report

eIFL.net has released eIFL Case Studies on Low Cost Digitisation Projects: Final Report.

Here's an excerpt:

This report summarizes the experiences with digitisation by some eIFL countries. Although there are probably many more examples of digitisation in eIFL countries, this report includes only those where the country responded to the survey.

The main objective of this study was to raise awareness about best practice digitisation projects that are: (1) affordable, (2) easily managed at the technical and organisation level, (3) sustainable, and (4) enable eIFL countries to preserve and promote their local content online.

Libraries in eIFL countries with digitisation projects were asked to complete a survey that asked them about the intent of their projects. Surveys were completed and returned by libraries with additional relevant information, such as pictures illustrating the digitisation (scanning). Subsequently, the respondents were interviewed briefly by phone or Skype about their survey answers.

Nevada Statewide Digital Planning 2008-2009: Final Report

The Nevada Statewide Digital Advisory Committee and the Nevada State Library and Archives have released the Nevada Statewide Digital Planning 2008-2009: Final Report. (Thanks to Virtual Library Notes.)

Here's an excerpt:

The Statewide Digital Plan (April, 2009) was developed under the leadership of the Nevada State Library and Archives (NSLA) and the Statewide Digital Advisory Committee (SDAC) (Appendix A). Through a series of activities that involved a wide range of Nevadans, including the cultural heritage community, K-12 community, and community arts organization, four goals and objectives were developed and activities prioritized.

Over the next five years the library and cultural heritage community will focus on these goals:

Goal I: Provide online access to digital collections held by Nevada cultural heritage organizations and allied information providers that are distributed throughout Nevada.

Goal II: Develop & implement standards/best practices that will support access to Nevada’s digital collections.

Goal III: Develop a leadership/governance structure that will support the growth and sustainability of a standards-compliant digital initiative created by Nevada’s cultural heritage organizations and allied information providers.

Goal IV: Establish a collaborative digitization model where the full range of types and sizes of Nevada cultural heritage organizations and allied information providers can participate.

Welsh Journals Online: Final Report

JISC has released Welsh Journals Online: Final Report.

Here's an excerpt:

Welsh Journals Online is the most challenging digitisation project ever undertaken by the National Library of Wales. It aimed to create a website giving free searchable and browsable access to the contents of back-numbers of the major journals relating to Wales or the Welsh language. These journals form the core of the Library’s collection of printed books and are its most-used resource.

The journals were chosen to represent the diversity of material available, and cover English- and Welsh-language titles including scholarly articles on topics from archaeology to zoology, poetry, fiction, reviews and obituaries. The project publishes 400,000 pages of text, from 52 titles; the 180,000 pages of Welsh content represents the single largest corpus of text in the language available on the web. Some of the titles are well-known and widely used as sources (eg Archaeologia Cambrensis), while others have been overlooked or are difficult to access (Yr Arloeswr). . . .

The website is fully exposed to Google and it is likely that many new users will find the resource through general searching of the web. For those who are unfamiliar with the journal literature of Wales some contextual help is provided in the form of factsheets; lesson plans based upon these have also been created to assist teachers wishing to use the Welsh Journals Online website to discuss the questions of copyright, searching, or referencing.

The majority of the material is covered by copyright, and licensing and rights management formed a significant part of the project. The need to control display at page level (so that where necessary a single article or photograph could be blanked) required detailed metadata to record permission, gathered in cooperation with the publishers. Of the titles included, the proportion of blanked pages is very low (less than 0.1%), but rights issues led to the exclusion of some titles completely. The Library did not offer any payment for permission and works by Dylan Thomas, Robert Graves, and R S Thomas are therefore not shown. Given that the cost-per-page of web publication is approximately £2, the payment of even minimal fees would transform the economics of mass-digitisation.