Digital Preservation: Life2 Final Project Report

JISC has released Life2 Final Project Report.

Here's an excerpt:

LIFE Model v2 outlines a fully-revised lifecycle model taking into account feedback from user groups, the Case Studies and the wider digital preservation community.

Generic Preservation Model (GPM) summarises the update to the preservation model with an accompanying spreadsheet. This model allows institutions to estimate potential digital preservation costs for their collections. The GPM fits into the updated LIFE Model.

An Economic Evaluation of LIFE was written by economist Bo-Christer Björk on the approach used for both the first and second phases of LIFE. This independent review validates the LIFE approach for lifecycle costing.

The SHERPA DP Case Study outlines the mapping of the repository services that CeRch provides to the LIFE Model. The SHERPA-LEAP Case Study maps three very different HE repositories to the LIFE Model. Goldsmiths University of London, Royal Holloway University of London and UCL (University College London) each provide exemplars of varying collections. Each institution’s repository is at a different stage of development.

The Newspapers Case Study successfully maps both analogue and digital newspaper collections to the LIFE Model. This success means that LIFE could be developed into a fully-compatible predictive tool across both analogue and digital collections, allowing for comparison both throughout the lifecycles of a collection and across different types of collections.

SHERPA DP2: Developing Services for Archiving and Preservation in a Distributed Environment—Final Report

JISC has released SHERPA DP2: Developing Services for Archiving and Preservation in a Distributed Environment—Final Report.

Here's an excerpt:

The SHERPA DP2 project (2007-2009) was a two year project funded by the JISC under the Digital Preservation and Records Management Programme. The project was led by the Centre for e-Research at King's College London (formerly the Executive of the Arts and Humanities Data Service), which is working with several institutions to develop a preservation service that will cater for the requirements of a diverse range of digital resources and web-based resources. In summary, the project has the following objectives:

  1. Extend and refine the OAIS-based Shared Services model created for the initial SHERPA DP project to accommodate the requirements of different Content Providers and varied collaborative methods.
  2. Produce a set of services that will assist with the capture and return of research data stored in distributed locations, building upon existing software tools.
  3. Expand upon the work processes and software tools developed for SHERPA DP(1) and SOAPI to cater for the curation and preservation of increasingly diverse resource types.

Digital Preservation: Media Vault Program Interim Report

The Media Vault Program has released Media Vault Program Interim Report.

Here's an excerpt:

All major studies and reports on the sustainability of digital resources point to a multitude of barriers that can be clustered into four factors:

Economic: Who owns the problem, and who benefits from the solutions? Who pays for the services, long-term preservation, development, and curation? . . . .

Technical: Simple services are needed, but they are not simple to build, implement and support in our complex environment. Successful structures that can support digital scholarship must account for user needs, emerging technologies/file formats, adverse working contexts (fieldwork, offline, multi-platform), and should be supported at the enterprise scale. . . .

Political/Organizational: . . . . there are good reasons for the various service provider organizations to innovate on their own, but there is much to gain from working together on common goals and milestones. In fact, where communities have succeeded in softening the boundaries between content producers and consumers, supporters and beneficiaries, significant successes have been achieved. . . .

Social: We live in interesting times . . . and the prevalence of cheap/stolen media has produced an expectation that things should be always available, conveniently packaged, and free. Where some organizations, such as the Long Now Foundation, are hoping to "provide counterpoint to today's 'faster/cheaper' mind set and promote 'slower/better' thinking," it may be up to those of us who care deeply about the persistence of research data to step up as the seas continue to change.

Harvard University Library Launched Web Archive Collection Service (WAX)

The Harvard University Library has launched its Web Archive Collection Service (WAX).

Here's an excerpt from the press release (posted on DIGLIB@infoserv.nlc-bnc.ca):

WAX began as a pilot project in July 2006, funded by the University's Library Digital Initiative (LDI) to address the management of web sites by collection managers for long-term archiving. It was the first LDI project specifically oriented toward preserving "born-digital" material. . . .

During the pilot, we explored the legal terrain and implemented several methods of mitigating risks. We investigated various technologies and developed work flow efficiencies for the collection managers and the technologists. We analyzed and implemented the metadata and deposit requirements for long term preservation in our repository. We continue to look at ways to ease the labor intensive nature of the QA process, to improve display as the software matures and to assess additional requirements for long term preservation. . . .

WAX was built using several open source tools developed by the Internet Archive and other International Internet Preservation Consortium (IIPC) members. These IIPC tools include the Heritrix web crawler; the Wayback index and rendering tool; and the NutchWAX index and search tool. WAX also uses Quartz open source job scheduling software from OpenSymphony.

In February 2009, the pilot public interface was launched and announced to the University community. WAX has now transitioned to a production system supported by the University Library's central infrastructure.

English-Language Summary of A Future for Our Digital Memory: Permanent Access to Information in the Netherlands

The Netherlands Coalition for Digital Preservation has released an English-language summary of A Future for Our Digital Memory: Permanent Access to Information in the Netherlands.

Here's an excerpt:

In order to underpin its strategy, the NCDD decided to first build a detailed picture of the current situation in the public sector in the Netherlands. Can institutions or domains be identified which have successfully risen to the challenge of digital preservation and permanent access? Which categories of data are in danger of being lost? How can the risks be managed? This so-called National Digital Preservation Survey was funded by the Ministry of Education, Culture and Science.

After some preliminary consultancy work it was decided that the survey would best be carried out by researchers with both knowledge of the issues involved in digital preservation and of the three sectors, which were identified as: scholarly communications, government & archives, and culture & heritage. A team of three researchers was recruited from among NCDD member staff, with the NCDD coordinator leading the project. The initial objective, to conduct a statistically relevant quantitative survey, had to be abandoned early in the project. The field to be surveyed was vast and varied, and some of the target groups were quite unfamiliar with the specifics of digital preservation, making online surveys unproductive. Therefore, the research team decided on a methodology of (some seventy) semi-structured interviews with knowledgeable stakeholders, adding relevant information from both Dutch and foreign published sources. Five interviews were held with major private sector parties to establish whether the private sector has best practices to offer for the public sector to emulate.

Digital Preservation: Alpha Prototype of JHOVE2 Released

An alpha prototype of JHOVE2 is now available. JHOVE2 is a tool for the characterization (i.e., identification, validation, feature extraction, and assessment) of digital objects that is used for digital library and digital preservation purposes.

Here's an excerpt from the announcement:

An alpha prototype version of JHOVE2 is now available for download and evaluation (v. 0.5.2, 2009-08-05). Distribution packages (in zip and tar.gz form) are available on the JHOVE2 public wiki at (http://confluence.ucop.edu/display/JHOVE2Info/Download). The new JHOVE2 architecture reflected in this prototype is described in the attached architectural overview (also available at http://confluence.ucop.edu/display/JHOVE2Info/Architecture). . . .

The prototype supports the following features:

  • Appropriate recursive processing of directories and Zip files.
  • High performance buffered I/O using the Java nio package.
  • Message digesting for the following algorithms: Adler-32, CRC-32, MD2, MD5, SHA-1, SHA-256, SHA-384, SHA-512.
  • Results formatted as JSON, text (name/value pairs), and XML.
  • Use of the Spring Inversion-of-Control container for flexible module configuration.
  • A complete UTF-8 module.
  • A minimally functional Shapefile module.
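The message-digest feature in the list above can be illustrated with a minimal sketch. It is written in Python with the standard library, not in JHOVE2's own Java code, and the function name `message_digests` is hypothetical. MD2 is left out because Python's `hashlib` does not guarantee it is available.

```python
import hashlib
import zlib


def message_digests(data: bytes) -> dict[str, str]:
    """Compute several of the listed digests for one byte string (illustrative only)."""
    digests = {
        # zlib returns 32-bit unsigned integers; format them as 8 hex digits.
        "Adler-32": f"{zlib.adler32(data) & 0xFFFFFFFF:08x}",
        "CRC-32": f"{zlib.crc32(data) & 0xFFFFFFFF:08x}",
    }
    # Cryptographic digests that hashlib always provides.
    for label, name in [
        ("MD5", "md5"),
        ("SHA-1", "sha1"),
        ("SHA-256", "sha256"),
        ("SHA-384", "sha384"),
        ("SHA-512", "sha512"),
    ]:
        digests[label] = hashlib.new(name, data).hexdigest()
    return digests


if __name__ == "__main__":
    for algorithm, value in message_digests(b"hello").items():
        print(f"{algorithm}: {value}")
```

Computing several digests in one pass over the same bytes is a common way to record fixity information that can be checked again later.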

OCLC Presentations on Digital Curation and Web-scale Management Services

Below are streaming video OCLC presentations from ALA Annual 2009 on digital curation and Web-scale Management Services.

  • Integrating Technical Services and Preservation Workflows: Mainstreaming Digital Resources: "After an introduction from Geri Bunker Ingram of OCLC, Amy Rudersdorf (Director, Digital Information Management Program, The State Library of North Carolina) discusses integrating a whole host of systems into a digital curation workflow, including OCLC's Connexion tools, Digital Archive, WorldCat, Digital Collection Gateway and CONTENTdm."
  • OCLC Web-scale Management Services: "Presentation by Andrew Pace, OCLC Executive Director for Networked Library Services, ALA Annual 2009. Web-scale cooperative library management services, network-level tools for managing library collections through circulation and delivery, print and licensed acquisitions, and license management. These services complement existing OCLC Web-scale services, such as cataloging, resource sharing, and integrated discovery."

Digital Preservation: Repository of Authentic Digital Objects Source Code Released

The National Archive Institute of Portugal has released the Repository of Authentic Digital Objects source code.

RODA works in conjunction with the Fedora (Flexible Extensible Digital Object Repository Architecture) software.

Read more about it at "RODA—A Service-Oriented Repository to Preserve Authentic Digital Objects" and "Source Code Available from RODA 'Repository of Authentic Digital Objects'" (includes a QuickTime video about RODA).

California Digital Library's Web Archiving Service

The California Digital Library's Web Archiving Service's first collections are available at Web Archives: Yesterday's Web; Today's Archives.

Here's an excerpt from the press release:

Researchers and scholars now will be able to delve into archived Web sites captured by the California Digital Library's Web Archiving Service (WAS). This new tool enables faculty, researchers and librarians to capture, curate and preserve Web sites, thus creating permanent archives available to researchers everywhere. The social history of our times is now being preserved in archives as rich and varied as the contentious 2003 California recall election, hundreds of California state Web archives, the Guantanamo Bay Detention Camp Web archive and the Middle East Political Sites archive. New archives continually are being built and published and will appear along with the current archives, available at webarchives.cdlib.org/.

The Web has revolutionized our access to information. Documents and publications that once were difficult to find now are readily available to anyone at any time. Popular reactions to historical events unfold via blogs and personal Web sites, and we have an unprecedented view into popular culture and the formation of public policy. "This is a tool that can track censorship in China, political regimes in Iran, and social commentary around the world," states Laine Farley, California Digital Library's executive director. "CDL and the UC libraries are leading the way in building collections for the 21st century." . . .

CDL's Web Archiving Service is the result of a 4.5-year grant awarded by the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP). The program's mission is to develop a national strategy to collect, preserve and make available digital content, especially materials that are created only in digital formats, for current and future generations. Working with partners at the University of North Texas, New York University, Stanford University and the campuses of the University of California, the California Digital Library has built a service that is easy to use and allows librarians to begin preserving information that was slipping away. Martha Anderson, director of program management for NDIIPP at the Library of Congress, says, "There is a growing public interest in the archiving of public Web sites for future reference. The technical challenges of constantly changing sites and technologies and the enormity of the universe of potential content require immediate and focused action."

Proceedings of DigCCurr2009: Digital Curation: Practice, Promise, and Prospects

Helen R. Tibbo has published Proceedings of DigCCurr2009: Digital Curation: Practice, Promise, and Prospects on Lulu.

Here's the ad:

DigCCurr2009 was held on April 1-3, 2009 in Chapel Hill, North Carolina as part of the Preserving Access to Our Digital Future: Building an International Digital Curation Curriculum (DigCCurr) project. DigCCurr is a three-year (2006-2009), Institute of Museum and Library Services (IMLS)-funded project to develop a graduate-level curricular framework, course modules, and experiential components to prepare students for digital curation in various environments. Contributions to DigCCurr2009 take the form of long and short papers, posters and panels. Potential contributions were submitted for peer review by a rich and diverse panel of international experts. Reviewers evaluated the submissions based on clarity and organization of presentation and writing; originality, creativity and potential for new contributions to the field; and engagement (topics addressed would be appropriate for and engaging to the diverse audience of DigCCurr2009 participants).

DuraCloud to Test Cloud Technologies for Digital Preservation

DuraCloud will test cloud technologies for digital preservation purposes.

Here's an excerpt from the press release:

How long is long enough for our collective national digital heritage to be available and accessible? The Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) and DuraSpace have announced that they will launch a one-year pilot program to test the use of cloud technologies to enable perpetual access to digital content. The pilot will focus on a new cloud-based service, DuraCloud, developed and hosted by the DuraSpace organization. Among the NDIIPP partners participating in the DuraCloud pilot program are the New York Public Library and the Biodiversity Heritage Library.

Cloud technologies use remote computers to provide local services through the Internet. DuraCloud will let an institution provide data storage and access without having to maintain its own dedicated technical infrastructure.

For NDIIPP partners, it is not enough to preserve digital materials without also having strategies in place to make that content accessible. NDIIPP is concerned with many types of digital content, including geospatial, audiovisual, images and text. The NDIIPP partners will focus on deploying access-oriented services that make it easier to share important cultural, historical and scientific materials with the world. To ensure perpetual access, valuable digital materials must be stored in a durable manner. DuraCloud will provide both storage and access services, including content replication and monitoring services that span multiple cloud-storage providers.

Martha Anderson, director of NDIIPP Program Management, said, "Broad online public access to significant scientific and cultural collections depends on providing the communities who are responsible for curating these materials with affordable access to preservation services. The NDIIPP DuraCloud pilot project with the DuraSpace organization is an opportunity to demonstrate affordable preservation and access solutions for communities of users who need this kind of help."

Digital Preservation: Presentations from 2009 NDIIPP Partners Meeting

Presentations from the 2009 NDIIPP Partners Meeting are now available.

Here's a quick selection:

Digital Preservation: Two-Year Pilot Project Evaluation

The Chesapeake Project has released its Two-Year Pilot Project Evaluation.

Here's an excerpt:

The Chesapeake Project began as a collaborative, two-year pilot program with the goal of preserving born-digital legal information published directly to the Web. It was implemented in early 2007 by the Georgetown Law Library and the State Law Libraries of Maryland and Virginia. Having successfully completed its pilot phase, The Chesapeake Project's legal information archive is now expanding.

The following document comprises the final evaluation and account of The Chesapeake Project's accomplishments during its two-year pilot phase, spanning from February 27, 2007, to February 28, 2009.

During this time, the project's digital archive was populated with more than 4,300 digital items representing nearly 1,900 Web-published titles, the vast majority of which have no print counterpart. Each of these titles was harvested from the Web, stored within a secure digital archive and assigned a permanent archive URL. Today, each archived digital title remains accessible to users, regardless of whether the original digital files have been altered or removed from their original locations on the Web.

A 2008 analysis of the digital archive's content showed that more than eight percent of the titles archived by The Chesapeake Project had disappeared from their original URLs within the project's first year, but remained accessible thanks to the project's efforts. The current evaluation demonstrates that this figure has increased significantly over the past year. In fact, as of March 2009, nearly 14 percent of the project's archived titles—approximately one in seven—have disappeared from their original locations on the Web.

Blog Reports about the National Digital Information Infrastructure and Preservation Program Partners Meeting

Several blog reports are available about the recent National Digital Information Infrastructure and Preservation Program partners meeting.

Library of Congress Releases Bagit: Transferring Content for Digital Preservation Video

The Library of Congress has released a digital video, Bagit: Transferring Content for Digital Preservation.

Here's the description:

The Library of Congress's steadily growing digital collections arrive primarily over the network rather than on hardware media. But that data transfer can be difficult because different organizations have different policies and technologies.

The Library, with the California Digital Library and Stanford University, has developed guidelines for creating and moving standardized digital containers, called "bags." A bag functions like a physical envelope that is used to send content through the mail, but with bags, a user sends content from one computer to another.

Bags have a sparse, uncomplicated structure that transcends differences in institutional data, data architecture, formats and practices. A bag's minimal but essential metadata is machine readable, which makes it easy to automate ingest of the data. Bags can be sent over computer networks or physically moved using portable storage devices.

Bags have built-in inventory checking, to help ensure that content is transferred intact. Bags are flexible and can work in many different settings, including situations where the content is located in more than one place. This video describes the preparation and transfer of data over the network in bags.
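The bag structure and inventory checking described above can be sketched in a few lines of Python. This is a minimal illustration that loosely follows the published BagIt layout (a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest). It is not the Library of Congress tooling. The function names `make_bag` and `validate_bag` are hypothetical, and the version string is an assumption.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write payload files under data/, plus a bag declaration and an MD5 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(files.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Each manifest line pairs a checksum with a path relative to the bag root.
        manifest_lines.append(f"{hashlib.md5(content).hexdigest()}  data/{name}")
    # Bag declaration; the version value here is an assumption for illustration.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def validate_bag(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing or no longer match the manifest."""
    problems = []
    manifest = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        path = bag_dir / rel_path
        if not path.is_file() or hashlib.md5(path.read_bytes()).hexdigest() != expected:
            problems.append(rel_path)
    return problems
```

A receiver would run something like `validate_bag` after the transfer. An empty result means every file listed in the manifest arrived with the expected checksum.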

CoOL Moves to American Institute for Conservation

Conservation OnLine (CoOL) and the Conservation DistList are moving from the Stanford University Libraries to the American Institute for Conservation.

Here's an excerpt from the press release:

The American Institute for Conservation of Historic and Artistic Works (AIC) announced that they will now host Conservation OnLine (CoOL) after 22 years of its being hosted by Stanford University Libraries. CoOL is a web-based library of conservation information, covering a wide spectrum of topics of interest to those involved with the conservation of library, archives, and museum materials. It contains approximately 120,000 documents, including an online archive of the Journal of the American Institute for Conservation. It also includes the Conservation DistList, with 9,969 subscribers from at least 91 countries. CoOL serves as both an important resource for information and as a forum for conservation professionals all over the world.

AIC’s first priority is to make the DistList operational as soon as possible. Further announcements will be made as to the resumption of activity on the DistList and where other CoOL resources will be located in the future. We are continuing discussions with allied and affiliate organizations in order to make CoOL’s transition as seamless as possible.

Texas Conference on Digital Libraries 2009 Presentations

Presentations from the Texas Conference on Digital Libraries 2009 are now available.

Here are those by Texas Digital Library staff:

American Institute of Physics Will Use CLOCKSS Digital Archive

The American Institute of Physics will use the CLOCKSS (Controlled Lots of Copies Keep Stuff Safe) "dark" digital archive.

Here's an excerpt from the press release:

CLOCKSS will make AIP content freely available in the event that AIP is no longer able to provide access. . . .

The CLOCKSS initiative was created in response to the growing concern that digital content purchased by libraries may not always be available due to discontinuation of an electronic journal or because of a catastrophic event. CLOCKSS creates a secure, multi-site archive of web-published content that can be tapped into to provide ongoing access to researchers worldwide, free of charge.

"Today, when over one half of all our subscriptions are online only, we owe it to our customers more than ever to provide the best security possible for their electronic products," said Mark Cassar, AIP's Acting Publisher. "Our nearly three-year-old partnership with Portico, and now our participation in the CLOCKSS initiative, solidifies this commitment."

CLOCKSS' decentralized, geographically distributed preservation strategy ensures that the digital assets of the global research community will survive intact. Additionally, it satisfies the demand for locally situated archives with 15 archive nodes planned worldwide by 2010.

Curating Atmospheric Data for Long Term Use: Infrastructure and Preservation Issues for the Atmospheric Sciences Community

The Digital Curation Centre has released Curating Atmospheric Data for Long Term Use: Infrastructure and Preservation Issues for the Atmospheric Sciences Community, SCARP Case Study No. 2.

Here's an excerpt:

DCC SCARP aims to understand disciplinary approaches to data curation by substantial case studies based on an immersive approach. As part of the SCARP project we engaged with a number of archives, including the British Atmospheric Data Centre, the World Data Centre Archive at the Rutherford Appleton Laboratory and the European Incoherent Scatter Scientific Association (EISCAT). We developed a preservation analysis methodology which is discipline independent in application but none the less capable of identifying and drawing out discipline specific preservation requirements and issues. In this case study report we present the methodology along with its application to the Mesospheric Stratospheric Tropospheric (MST) radar dataset, which is currently supported by and accessed through the British Atmospheric Data Centre. We suggest strategies for the long term preservation of the MST data and make recommendations for the wider community.

Foundation Grants for Preservation in Libraries, Archives, and Museums, 2009 Edition

The Library of Congress and the Foundation Center have released Foundation Grants for Preservation in Libraries, Archives, and Museums, 2009 Edition.

Here's an excerpt from the announcement:

This publication lists 1,944 grants of $5,000 or more awarded by 488 foundations, from 2004 through the publication date of this guide. It covers grants to public, academic, research, school, and special libraries, and to archives and museums for activities related to conservation and preservation. This publication includes:

  • an introduction that explains the book's coverage, arrangement, entries, and how to research using the volume. Note: This PDF file contains hotlinks to free online tutorials that cover grant writing and provide an insight into the world of U.S. foundation giving offered by the Foundation Center, as well as to some other widely used non-profit guidance on preservation grants found on the Conservation Online web site.
  • a statistical analysis of grant funding in the area of preservation by foundation, recipient location, subject, recipient type (e.g., Library), grant size, and foundation generosity nationwide.
  • state-by-state descriptions of projects funded in preservation nationwide including the foundation's name, limitations on giving, recipient(s), size of grants, and purpose of the grant described. Note: This section is hot linked in the PDF version directly to more detailed descriptions of the foundations.
  • indexes by recipient, geographic area of the recipient, and subject. Note: If you do not find what you are looking for in the indices, use the find feature to search the text for your term.
  • a list of all foundations that have donated to preservation and conservation with their contact information and limitations on giving.

DPC What’s New in Digital Preservation, No. 20

DPC What's New in Digital Preservation number 20 has been published.

Here’s a description of the publication:

This is a summary of selected recent activity in the field of digital preservation compiled from a number of resources including the digital-preservation and padiforum-l mailing lists. Additional or related items of interest may also be included.