Portico Studying E-Book Preservation

Portico is launching an e-book preservation study, which will last the rest of the year.

Here's an excerpt from the press release:

In response to several requests from publishers and libraries, Portico is conducting a study in order to assess how to extend its archival infrastructure and service to respond to the emerging need to preserve e-books. During the study we will analyze the structure and preservation needs of e-books and determine what adjustments to Portico's existing, operational and technological infrastructure and the economic model developed to support e-journal preservation might be required in order to respond to this new genre. Portico's e-journal archiving service was developed through a pilot project that drew heavily upon engagement with publisher and library pilot participants. We anticipate that a similar process will be essential in understanding how best to respond to the challenges of e-book preservation. . . .

The current participants in the E-Book Preservation study include:

Publishers

  • American Math Society
  • Elsevier
  • Morgan Claypool
  • Taylor and Francis

Libraries

  • Case Western Reserve University
  • Cornell University Library
  • McGill University
  • SOLINET
  • Texas University Libraries
  • University College of London
  • Yale University Library

Official Release of the kopal Library for Retrieval and Ingest

The German National Library and SUB Göttingen have announced the official release of the kopal Library for Retrieval and Ingest on diglib.

Here's an excerpt from the message:

The kopal project (Co-operative Development of a Long-term Digital Information Archive) was dedicated to find a solution to providing not only bitstream preservation but long-term accessibility as well in the form of a cooperatively developed and operated long-term archive for digital data. The German National Library, the Goettingen State and University Library, the Gesellschaft fuer wissenschaftliche Datenverarbeitung mbH Goettingen, and IBM Germany have been working in close cooperation on a technological solution. The now released software tools mark the successful development of such an archiving solution.

The Open-Source-Software koLibRI is a framework to integrate a long term preservation system as the IBM Digital Information Archiving System (DIAS) into the infrastructure of any institution. In particular, koLibRi organizes the creation and the import of Archival Information Packages into DIAS, and offers functions to retrieve and to govern them. Preservation methods like data customization and migration of data are part of the tasks of long term preservation. koLibRi Version 1.0 provides modules that manage future migration procedures. koLibRI Version 1.0 provides a completely functional and stable condition. Nevertheless, in the context of connecting new partners to the existing long term preservation system, the software will be constantly adjusted to the needs of different partners.

A documentation has been published with the conclusive release that describes the installation and the adjustment of a functional koLibRi-system and the basic internal layout to make individual development possible. The described release is offered for free download. . . .

100 Year Archive Requirements Survey

The Storage Networking Industry Association has released the 100 Year Archive Requirements Survey. Access requires registration.

Here's an excerpt from the "Survey Highlights":

  • 80% of respondents declared they have information they must keep over 50 years and 68% of respondents said they must keep it over 100 years. . . .
  • Long-term generally means greater than 10 to 15 years—the period beyond which multiple migrations take place and information is at risk. . .
  • Database information (structured data) was considered to be most at risk of loss. . .
  • Over 40% of respondents are keeping e-Mail records over 10 years. . . .
  • Physical migration is a big problem. Only 30% declared they were doing it correctly at 3-5 year intervals. . . .
  • 60% of respondents say they are ‘highly dissatisfied’ that they will be able to read their retained information in 50 years. . .
  • Help is needed—current practices are too manual, too prone to error, too costly and lack adequate coordination across the organization. . . .

Preserving the Digital Heritage: Principles and Policies

The Netherlands National Commission for UNESCO and the European Commission on Preservation and Access have published Preserving the Digital Heritage: Principles and Policies.

Here's an excerpt from the "Preface":

In November 2005, the Netherlands National Commission for UNESCO, in collaboration with the Koninklijke Bibliotheek (National Library of the Netherlands) and UNESCO’s Information Society Division, organized a conference entitled Preserving the Digital Heritage (The Hague, The Netherlands, 4-5 November 2005). It focused on two important issues: the selection of material to be preserved, and the division of tasks and responsibilities between institutions. This publication contains the four speeches given by the keynote speakers, preceded by a synthesis report of the conference.

Australian Framework and Action Plan for Digital Heritage Collections

The Collections Council of Australia Ltd. has released Australian Framework and Action Plan for Digital Heritage Collections, Version 0.C3 for comment.

Here's an excerpt from the document:

This is the Collections Council of Australia's plan to prepare an Australian framework for digital heritage collections. It brings together information shared by people working in archives, galleries, libraries and museums at a Summit on Digital Collections held in 2006. It proposes an Action Plan to address issues shared by the Australian collections sector in relation to current and future management of digital heritage collections.

Curation of Scientific Data: Challenges for Institutions and Their Repositories Podcast

A podcast of Chris Rusbridge’s "Curation of Scientific Data: Challenges for Institutions and their Repositories" presentation at The Adaptable Repository conference is now available. Rusbridge is Director of the Digital Curation Centre in the UK.

The PowerPoint for the presentation is also available.

Report of the Sustainability Guidelines for Australian Repositories Project (SUGAR)

The Australian Partnership for Sustainable Repositories (APSR) has released Report of the Sustainability Guidelines for Australian Repositories Project (SUGAR).

Here’s an excerpt from the report:

The Sustainability Guidelines for Australian Repositories service (SUGAR) was intended to support people working in tertiary education institutions whose activities do not focus on digital preservation. The target community creates and digitises content for a range of purposes to support learning, teaching and research. While some have access to technical and administrative support many others may not be aware of what they need to know. The typical SUGAR user may have little interest in discussions surrounding metadata, interoperability or digital preservation, and may simply want to know the essential steps involved in achieving the task at hand.

A key challenge for SUGAR was to provide a suitable level and amount of information to meet the immediate focus of the user and their level of expertise while introducing and encouraging consideration of issues of digital sustainability. SUGAR was also intended to stand alone as an online service unsupported by a helpdesk.

Towards an Open Source Repository and Preservation System

The UNESCO Memory of the World Programme, with the support of the Australian Partnership for Sustainable Repositories, has published Towards an Open Source Repository and Preservation System: Recommendations on the Implementation of an Open Source Digital Archival and Preservation System and on Related Software Development.

Here’s an excerpt from the Executive Summary and Recommendations:

This report defines the requirements for a digital archival and preservation system using standard hardware and describes a set of open source software which could be used to implement it. There are two aspects of this report that distinguish it from other approaches. One is the complete or holistic approach to digital preservation. The report recognises that a functioning preservation system must consider all aspects of a digital repository: Ingest, Access, Administration, Data Management, Preservation Planning and Archival Storage, including storage media and management software. Secondly, the report argues that, for simple digital objects, the solution to digital preservation is relatively well understood, and that what is needed are affordable tools, technology and training in using those systems.

An assumption of the report is that there is no ultimate, permanent storage media, nor will there be in the foreseeable future. It is instead necessary to design systems to manage the inevitable change from system to system. The aim and emphasis in digital preservation is to build sustainable systems rather than permanent carriers. . . .

The way open source communities, providers and distributors achieve their aims provides a model on how a sustainable archival system might work, be sustained, be upgraded and be developed as required. Similarly, many cultural institutions, archives and higher education institutions are participating in the open source software communities to influence the direction of the development of those softwares to their advantage, and ultimately to the advantage of the whole sector.

A fundamental finding of this report is that a simple, sustainable system that provides strategies to manage all the identified functions for digital preservation is necessary. It also finds that for simple discrete digital objects this is nearly possible. This report recommends that UNESCO supports the aggregation and development of an open source archival system, building on, and drawing together existing open source programs.

This report also recommends that UNESCO participates through its various committees, in open source software development on behalf of the countries, communities, and cultural institutions, who would benefit from a simple, yet sustainable, digital archival and preservation system. . . .

The University of Maine and Two Public Libraries Adopt Emory’s Digitization Plan

Library Journal Academic Newswire reports that the University of Maine, the Toronto Public Library, and the Cincinnati Public Library will follow Emory University’s lead and digitize public domain works utilizing Kirtas scanners with print-on-demand copies being made available via BookSurge. (Also see the press release: "BookSurge, an Amazon Group, and Kirtas Collaborate to Preserve and Distribute Historic Archival Books.")

Source: "University of Maine, plus Toronto and Cincinnati Public Libraries Join Emory in Scan Alternative." Library Journal Academic Newswire, 21 June 2007.

Emory Will Use Kirtas Scanner to Digitize Rare Books

Emory University’s Woodruff Library will use a Kirtas robotic book scanner to digitize rare books and to create PDF files that will be made available on the Internet and sold as print-on-demand books on Amazon.

Here’s an excerpt from the press release:

"We believe that mass digitization and print-on-demand publishing is an important new model for digital scholarship that is going to revolutionize the management of academic materials," said Martin Halbert, director for digital programs and systems at Emory’s Woodruff Library. "Information will no longer be lost in the mists of time when books go out of print. This is a way of opening up the past to the future."

Emory’s Woodruff Library is one of the premier research libraries in the United States, with extensive holdings in the humanities, including many rare and special collections. To increase accessibility to these aging materials, and ensure their preservation, the university purchased a Kirtas robotic book scanner, which can digitize as many as 50 books per day, transforming the pages from each volume into an Adobe Portable Document Format (PDF). The PDF files will be uploaded to a Web site where scholars can access them. If a scholar wishes to order a bound, printed copy of a digitized book, they can go to Amazon.com and order the book on line.

Emory will receive compensation from the sale of digitized copies, although Halbert stressed that the print-on-demand feature is not intended to generate a profit, but simply help the library recoup some of its costs in making out-of-print materials available.

ALCTS PARS Defining Digital Preservation Weblog

The Preservation and Reformatting Section (PARS) of the Association for Library Collections & Technical Services (ALCTS) has started the Defining Digital Preservation Weblog to get feedback on the efforts of a working group that has the following charge: "to draft a definition for digital preservation that would be suitable for the needs of PARS and available to support the work of ALCTS and ALA, for use on the web, in policy statements, and other documents."

The REMAP Project: Record Management and Preservation in Digital Repositories

The REMAP Project at the University of Hull has been funded by JISC to investigate how record management and digital preservation functions can be best supported in digital repositories. It utilizes the Fedora system.

Here’s an excerpt from the Project Aims page (I have added the links in this excerpt):

The REMAP project has the following aims:

  • To develop Records Management and Digital Preservation (RMDP) workflow(s) in order to understand how a digital repository can support these activities
  • To embed digital repository interaction within working practices for RMDP purposes
  • To further develop the use of a WSBPEL orchestration tool to work with external Web services, including the PRONOM Web services, to provide appropriate metadata and file information for RMDP
  • To develop and test a notification layer that can interact with the orchestration tool and allow RSS syndication to individuals alerting them to RMDP tasks
  • To develop and test an intermediate persistence layer to underpin the notification layer and interact with the WSBPEL orchestration tool to allow orchestrated workflows to take place over time
  • To test and validate the use of the enhanced WSBPEL tool with institutional staff involved in RMDP activities

Position Papers from the NSF/JISC Repositories Workshop

Position papers from the NSF/JISC Repositories Workshop are now available.

Here’s an excerpt from the Workshop’s Welcome and Themes page:

Here is some background information. A series of recent studies and reports have highlighted the ever-growing importance for all academic fields of data and information in digital formats. Studies have looked at digital information in science and in the humanities; at the role of data in Cyberinfrastructure; at repositories for large-scale digital libraries; and at the challenges of archiving and preservation of digital information. The goal of this workshop is to unite these separate studies. The NSF and JISC share two principal objectives: to develop a road map for research over the next ten years and what to support in the near term.

Here are the position papers:

RLG DigiNews Changes

In the latest issue of RLG DigiNews, Jim Michalko and Lorcan Dempsey announce significant changes to this journal. RLG DigiNews has been a five-star journal and essential reading for digital library and preservation specialists. I’d encourage my readers to voice support for its continued excellence, as suggested in the excerpt from the article below:

The issue in front of you is the last of RLG DigiNews in its current form. As RLG continues to shape its combination with OCLC and create the new Programs and Research division, we are rethinking the publication program that will support our new agenda while providing readers and authors with the kind of vehicle that supports the re-invention of cultural institutions in the research, teaching, and learning process. RLG DigiNews will be an important part of this program. Expect to see it back with a renewed editorial direction. There’s much to do and coordinate but we’ve committed both the talent and the resources to make this happen. Watch for your next RLG DigiNews no later than January, 2008.

Thank you for your support. Let those responsible know that you’re looking forward to the future.

Thursday’s OAI5 Presentations

Presentations from Thursday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • Business Models for Digital Repositories (PowerPoint): "Those setting up, or planning to set up, a digital repository may be interested to know more about what has gone before them. What is involved, what is the cost, how many people are needed, how have others made the case to their institution, and how do you get anything into it once it is built? I have recently undertaken a study of European repository business models for the DRIVER project and will present an overview of the findings."
  • DRIVER: Building a Sustainable Infrastructure of European Scientific Repositories (PowerPoint): "Ten partners from eight countries have entered into an international partnership, to connect and network as a first step more than 50 physically distributed institutional repositories to one, large-scale, virtual Knowledge Base of European research."
  • On the Golden Road : Open Access Publishing in Particle Physics (RealVideo): "A working party works now to bring together funding agencies, laboratories and libraries into a single consortium, called SCOAP3 (Sponsoring Consortium for Open access Publishing in Particle Physics). This consortium will engage with publishers towards building a sustainable model for open access publishing. In this model, subscription fees from multiple institutions are replaced with contracts with publishers of open access journals where the SCOAP3 consortium is a single financial partner."
  • Open Access Forever—Or Five Years, Whichever Comes First: Progress on Preserving the Digital Scholarly Record (RealVideo): "The current state of the curation and preservation of digital scholarship over its entire lifecycle will be reviewed, and progress on problems of specific interest to scholarly communication will be examined. The difficulty of curating the digital scholarly record and preserving it for future generations has important implications for the movement to make that record more open and accessible to the world, so this is a timely topic for those who are interested in the future of scholarly communication."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007.)

OpenLOCKSS Project

Led by the University of Glasgow Library, the new JISC-funded OpenLOCKSS project will preserve selected UK open access publications.

Here’s an excerpt from the project proposal:

Although LOCKSS has initially concentrated on negotiations with society and commercial publishers, there has always been an interest in smaller open-access journals, as evidenced by the LOCKSS Humanities Project, where twelve major US libraries have collaborated to contact more than fifty predominantly North American open access journal titles, enabling them to be preserved within the LOCKSS system. . . .

At present, much open access content is under threat, and is difficult to preserve for posterity under standard arrangements, at least until the British Library, and the other UK national libraries, are able to take a more proactive and comprehensive stance in preserving websites comprising UK output. Many open access journals are small operations, often dependent on one or two enthusiastic editors, often based in university departments and/or small societies, concerned with producing the next issue, and often with very little interest in or knowledge of preservation considerations. Their long term survival beyond the first few issues can often be in doubt, but their content, where appropriate quality controls have been applied, is worthy of preservation.

LOCKSS is an ideal low-cost mechanism for ensuring preservation, provided that appropriate contacts can be made and plug-in developments completed, and sufficient libraries agree to host content, on the Humanities Project model. . . .

Earlier in 2006, a survey was carried out by the LOCKSS Pilot Project, to discover preferences for commercial/society publishers to approach with a view to participating in LOCKSS, and Content Complete Ltd have been undertaking this work, as well as negotiating with the NESLi2 publishers on their LOCKSS participation. . . .

We propose to consider initially the titles with at least six votes (it may not be appropriate to approach all these titles, for example we shall check that all are currently publishing and confirm that they appear to be of appropriate quality), followed by those with five or four votes. We propose that agreements for LOCKSS participation are concluded with at least twelve titles, with fifteen as a likely upper limit.
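The tiered approach described in the excerpt (titles with at least six votes first, then those with five or four, up to a cap of fifteen) can be sketched roughly as follows. This is only an illustration: the function name, the journal titles, and the vote counts are hypothetical, and the proposal's manual checks (whether a title is still publishing and of appropriate quality) are not modeled.

```python
def select_titles(votes, tiers=(6, 4), upper_limit=15):
    """Pick titles tier by tier (highest vote threshold first), up to upper_limit.

    votes: dict mapping journal title -> number of survey votes (hypothetical data).
    tiers: minimum-vote thresholds, processed in order (>= 6 first, then >= 4).
    """
    selected = []
    for threshold in tiers:
        # Titles that qualify at this threshold and were not already picked,
        # ordered by descending votes, with ties broken alphabetically.
        tier = sorted(
            (t for t, v in votes.items() if v >= threshold and t not in selected),
            key=lambda t: (-votes[t], t),
        )
        for title in tier:
            if len(selected) == upper_limit:
                return selected
            selected.append(title)
    return selected


# Hypothetical survey results, not taken from the proposal.
example_votes = {
    "Journal A": 7,
    "Journal B": 6,
    "Journal C": 5,
    "Journal D": 4,
    "Journal E": 2,
}

print(select_titles(example_votes))
print(select_titles(example_votes, upper_limit=3))
```

With the default cap, every title with four or more votes is returned; lowering `upper_limit` shows how the cap cuts off the lower tier first.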

Is It Time to Stop Printing Journals?

There has been lively discussion on Liblicense-l of late about whether it is time to stop printing journals (March archive and April archive).

Here’s my take.

There are two aspects to this question: (1) Is the print journal format still required for reading purposes? and (2) Is the print journal format still required to ensure full access to journals, given that many e-journals are under licenses (and are not owned by libraries) and digital preservation is still in its infancy?

It appears that the answer to (1) may finally be “no, for many users.” However, this may be contingent to some degree on the fact that many commercial e-journals are composed of article PDF files that allow users to print copies that replicate printed articles.

The answer to (2) is less clear, since continued access is contingent on periodic license negotiations and the changing business practices of publishers. Embargoes, ILL restrictions, incomplete back runs, and similar issues may give libraries pause. Very promising digital preservation efforts, such as LOCKSS and Portico, still need to pass the test of time. Few libraries believe that publishers by themselves can be relied on to preserve e-journals (for one thing, publishers go out of business).

However, the reality for many libraries is that they have no choice but to dump print whenever possible for strictly economic reasons: print plus electronic is increasingly unaffordable.

Last Call for the International Digital Preservation Systems Survey

The Getty Research Institute is conducting an International Digital Preservation Systems Survey. It should yield interesting results, so help out by filling it out. March 30th is the last day.

Here’s a brief description from Karim Boughida and Sally Hubbard at the Getty:

This survey is intended to provide an overview of digital preservation system (DPS) implementation. DPS is defined here as an assembly of computer hardware, software and policies equivalent to a TDR (trusted digital repository) "whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now, and in the future"[1].

The survey was produced by the Getty Research Institute departments of Digital Resource Management and Library Information Systems, and will be distributed primarily among members of the Digital Library Federation (DLF). Results will be shared at the DLF Spring Forum, April 23-25, 2007 (Pasadena, California, USA), and with all respondents who provide contact information. . . .

[1] RLG. 2002. Trusted Digital Repositories: Attributes and Responsibilities. Mountain View, Calif.: RLG, Inc. http://www.rlg.org/en/pdfs/repositories.pdf.

Library of Congress Digital Preservation Web Site Redesigned

The Library of Congress has rolled out a new version of its Digital Preservation Web site.

Here’s an excerpt from Digital Preservation News: March 2007:

As you may have already noticed, the Web site for the National Digital Information Infrastructure and Preservation Program (NDIIPP) has been completely redesigned, with new content sections and more user-friendly text. The goal of the redesign was to make the subject of digital preservation in general and NDIIPP in particular more accessible to a wider audience.

Mellon Grants to CLIR/DLF

The Andrew W. Mellon Foundation has given grants to both the Council on Library and Information Resources and the Digital Library Federation.

Here’s an excerpt from the CLIR grant press release:

The Council on Library and Information Resources (CLIR) has received a three-year, $2.19 million grant from The Andrew W. Mellon Foundation to support general operations. The award will allow CLIR to launch a range of new initiatives in six program areas: cyberinfrastructure, preservation, the next scholar, the emerging library, leadership, and new models. . . .

The breadth of CLIR’s new agenda is represented in six interrelated program areas:

Cyberinfrastructure defines the base technologies of computation and communication, the software programs, and the data-curation and data-preservation programs needed to manage large-scale multimedia data sets, particularly those pertaining to the digital record of our cultural heritage;

Preservation explores sustainable strategies for preserving all media in a complex technological, policy, and economic environment;

The Next Scholar explores and assesses new methodologies, fields of inquiry, strategies for data gathering and collaboration, and modes of communication that are likely to define the next generation of scholars;

The Emerging Library explores and articulates the changing concept of the library with particular focus on its core functions and the consequences for staffing, research and teaching, and economic modeling;

Leadership investigates and defines the skills and expertise needed to administer, inspire, and inform the next generation; and

New Models extrapolates from an array of CLIR’s findings and other related research how academic organizations, institutions, behaviors, and culture may evolve over the coming decade.

Here’s an excerpt from the DLF grant press release:

The Digital Library Federation (DLF) has received an $816,000 grant from The Andrew W. Mellon Foundation for a project designed to make distributed digital collections easier for scholars to use. The project, DLF Aquifer Development for Interoperability Across Scholarly Repositories: American Social History Online, will implement schemas, data models, and technologies to enable scholars to use digital collections as one in a variety of local environments. . . .

The project will address the difficulty that humanities and social science scholars face in finding and using digital materials located in a variety of environments with a bewildering array of interfaces, access protocols, and usage requirements. DLF Aquifer seeks to provide scholars with consistent access to digital library collections pertaining to nineteenth- and twentieth-century U.S. social history across institutional boundaries. The collections are in a variety of formats and include maps and photographs from the Library of Congress historical collections; sheet music from the Sam DeVincent Collection of American Sheet Music at Indiana University; and an array of regional collections, such as Michigan County Histories from the University of Michigan and Tennessee Documentary History from the University of Tennessee, that will facilitate cross-regional studies when combined.

By integrating American Social History Online into a variety of local environments, the project will bring the library to the scholar and make distributed collections available through locally supported tools. The project will take two years to develop and implement, from April 2007 to March 2009.

PRESERV Project Report on Digital Preservation in Institutional Repositories

The JISC PRESERV (Preservation Eprint Services) project has issued a report titled Laying the Foundations for Repository Preservation Services: Final Report from the PRESERV Project.

Here’s an excerpt from the Executive Summary:

The PRESERV project (2005-2007) investigated long-term preservation for institutional repositories (IRs), by identifying preservation services in conjunction with specialists, such as national libraries and archives, and building support for services into popular repository software, in this case EPrints. . . .

PRESERV was able to work with The National Archives, which has produced PRONOM-DROID, the pre-eminent tool for file format identification. Instead of linking PRONOM to individual repositories, we linked it to the widely used Registry of Open Access Repositories (ROAR), through an OAI harvesting service. As a result format profiles can be found for over 200 repositories listed in ROAR, what we call the PRONOM-ROAR service. . . .

The lubricant to ease the movement of data between the components of the services model is metadata, notably preservation metadata, which informs, describes and records a range of activities concerned with preserving specific digital objects. PRESERV identified a rich set of preservation metadata, based on the current standard in this area, PREMIS, and where this metadata could be generated in our model. . . .

The most important changes to EPrints software as a result of the project were the addition of a history module to record changes to an object and actions performed on an object, and application programs to package and disseminate data for delivery to an external service using either the Metadata Encoding and Transmission Standard (METS) or the MPEG-21 Part 2: Digital Item Declaration Language (DIDL). One change to the EPrints deposit interface is the option for authors to select a licence indicating rights for allowable use by service providers or users, and others. . . .

PRESERV has identified a powerful and flexible framework in which a wide range of preservation services from many providers can potentially be intermediated to many repositories by other types of repository services. It is proposed to develop and test this framework in the next phase of the project.
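To make the idea of a repository "format profile" mentioned in the excerpt more concrete, here is a minimal sketch that tallies file formats for a list of file names. It is not how the PRONOM-ROAR service works: the extension table, function name, and sample files are all hypothetical, and a real service would rely on a format-identification tool rather than on file extensions.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-format table, for illustration only.
EXTENSION_FORMATS = {
    ".pdf": "PDF",
    ".doc": "Microsoft Word",
    ".html": "HTML",
    ".ps": "PostScript",
}


def format_profile(file_names):
    """Count how many files of each (guessed) format a repository holds."""
    profile = Counter()
    for name in file_names:
        ext = PurePosixPath(name).suffix.lower()
        profile[EXTENSION_FORMATS.get(ext, "unknown")] += 1
    return profile


# Hypothetical harvest of file names from one repository.
sample_files = ["thesis.pdf", "paper.PDF", "notes.doc", "index.html", "data.xyz"]

for fmt, count in format_profile(sample_files).most_common():
    print(f"{fmt}: {count}")
```

A profile like this gives a quick view of which formats dominate a repository, which is the kind of information a preservation service would need before planning any migration.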

A Long Road Ahead for Digitization

The New York Times published an article today ("History, Digitized (and Abridged)") that examines the progress that has been made in digitization in the US. It doesn’t hold many surprises for those in the know, but it might be useful in orienting non-specialists to some of the challenges involved, especially those who think that everything is on the Internet.

It also has some interesting tidbits, including a chart that shows the holdings of different types of materials in the National Archives and how many items have been digitized for each type.

It also includes current cost data from the Library of Congress, quoted below:

At the Library of Congress, for example, despite continuing and ambitious digitization efforts, perhaps only 10 percent of the 132 million objects held will be digitized in the foreseeable future. For one thing, costs are prohibitive. Scanning alone on smaller items ranges from $6 to $9 for a 35-millimeter slide, to $7 to $11 a page for presidential papers, to $12 to $25 for poster-size pieces.
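For a sense of scale, the figures in this quote can be worked through in a few lines. This is a rough sketch only: the article says "perhaps only 10 percent," so the share is approximate, and the batch size of 1,000 pages is a hypothetical example, not a figure from the article.

```python
# Figures quoted in the article.
TOTAL_OBJECTS = 132_000_000
SHARE_PERCENT = 10  # "perhaps only 10 percent", so approximate

# Integer arithmetic avoids floating-point rounding noise.
digitized = TOTAL_OBJECTS * SHARE_PERCENT // 100
print(f"About {digitized:,} objects digitized, {TOTAL_OBJECTS - digitized:,} not")

# Per-item scanning cost ranges from the article, in US dollars (low, high).
COST_RANGES = {
    "35-millimeter slide": (6, 9),
    "presidential papers page": (7, 11),
    "poster-size piece": (12, 25),
}


def batch_cost(item_type, count):
    """Return the (low, high) scanning cost in dollars for `count` items."""
    low, high = COST_RANGES[item_type]
    return low * count, high * count


# Hypothetical batch: 1,000 pages of presidential papers.
low, high = batch_cost("presidential papers page", 1000)
print(f"1,000 pages: ${low:,} to ${high:,}")
```

Even at the low end of each range, scanning costs alone add up quickly, which is the article's point about why most of the collection will stay undigitized.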

It also discusses the copyright laws that apply to sound materials and their impact on digitization efforts:

When it comes to sound recordings, copyright law can introduce additional complications. Recordings made before 1972 are protected under state rather than federal laws, and under a provision of the 1976 Copyright Act, may be entitled to protection under state law until 2067. Also, an additional copyright restriction often applies to the underlying musical composition.

A study published in 2005 by the Library of Congress and the Council on Library and Information Resources found that some 84 percent of historical sound recordings spanning jazz, blues, gospel, country and classical music in the United States, and made from 1890 to 1964, have become virtually inaccessible.

An interesting, well-written article that’s worth a read.

Source: Hafner, Katie. "History, Digitized (and Abridged)." The New York Times, 11 March 2007, BU YT 1, 8-9.