Digital Preservation: Alpha Prototype of JHOVE2 Released

An alpha prototype of JHOVE2 is now available. JHOVE2 is a tool for the characterization (i.e., identification, validation, feature extraction, and assessment) of digital objects that is used for digital library and digital preservation purposes.

Here's an excerpt from the announcement:

An alpha prototype version of JHOVE2 is now available for download and evaluation (v. 0.5.2, 2009-08-05). Distribution packages (in zip and tar.gz form) are available on the JHOVE2 public wiki at (http://confluence.ucop.edu/display/JHOVE2Info/Download). The new JHOVE2 architecture reflected in this prototype is described in the attached architectural overview (also available at http://confluence.ucop.edu/display/JHOVE2Info/Architecture). . . .

The prototype supports the following features:

  • Appropriate recursive processing of directories and Zip files.
  • High performance buffered I/O using the Java nio package.
  • Message digesting for the following algorithms: Adler-32, CRC-32, MD2, MD5, SHA-1, SHA-256, SHA-384, SHA-512.
  • Results formatted as JSON, text (name/value pairs), and XML.
  • Use of the Spring Inversion-of-Control container for flexible module configuration.
  • A complete UTF-8 module.
  • A minimally functional Shapefile module.
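
To make the message digesting feature concrete, here is a minimal Python sketch that computes several of the listed digests over the same bytes. It is an illustration only, not JHOVE2's code (JHOVE2 is a Java tool); the function name `digest_all` is hypothetical, and MD2 is omitted because Python's standard library does not guarantee it.

```python
import hashlib
import zlib


def digest_all(data: bytes) -> dict:
    """Return hex digests of `data` for several common algorithms.

    Hypothetical helper for illustration; not part of JHOVE2.
    """
    results = {
        # zlib returns unsigned 32-bit integers; format as 8 hex digits.
        "Adler-32": format(zlib.adler32(data) & 0xFFFFFFFF, "08x"),
        "CRC-32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }
    for label, name in [
        ("MD5", "md5"),
        ("SHA-1", "sha1"),
        ("SHA-256", "sha256"),
        ("SHA-384", "sha384"),
        ("SHA-512", "sha512"),
    ]:
        results[label] = hashlib.new(name, data).hexdigest()
    return results


if __name__ == "__main__":
    for algorithm, value in digest_all(b"digital object").items():
        print(f"{algorithm}: {value}")
```

Recording more than one digest per object lets an archive keep verifying fixity even if one algorithm is later considered too weak.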

OCLC Presentations on Digital Curation and Web-scale Management Services

Below are streaming video OCLC presentations from ALA Annual 2009 on digital curation and Web-scale Management Services.

  • Integrating Technical Services and Preservation Workflows: Mainstreaming Digital Resources: "After an introduction from Geri Bunker Ingram of OCLC, Amy Rudersdorf (Director, Digital Information Management Program, The State Library of North Carolina) discusses integrating a whole host of systems into a digital curation workflow, including OCLC's Connexion tools, Digital Archive, WorldCat, Digital Collection Gateway and CONTENTdm."
  • OCLC Web-scale Management Services: "Presentation by Andrew Pace, OCLC Executive Director for Networked Library Services, ALA Annual 2009. Web-scale cooperative library management services, network-level tools for managing library collections through circulation and delivery, print and licensed acquisitions, and license management. These services complement existing OCLC Web-scale services, such as cataloging, resource sharing, and integrated discovery."

Digital Preservation: Repository of Authentic Digital Objects Source Code Released

The National Archive Institute of Portugal has released the Repository of Authentic Digital Objects source code.

RODA works in conjunction with the Fedora (Flexible Extensible Digital Object Repository Architecture) software.

Read more about it at "RODA—A Service-Oriented Repository to Preserve Authentic Digital Objects" and "Source Code Available from RODA 'Repository of Authentic Digital Objects'" (includes a QuickTime video about RODA).

California Digital Library's Web Archiving Service

The first collections from the California Digital Library's Web Archiving Service are available at Web Archives: Yesterday's Web; Today's Archives.

Here's an excerpt from the press release:

Researchers and scholars now will be able to delve into archived Web sites captured by the California Digital Library's Web Archiving Service (WAS). This new tool enables faculty, researchers and librarians to capture, curate and preserve Web sites, thus creating permanent archives available to researchers everywhere. The social history of our times is now being preserved in archives as rich and varied as the contentious 2003 California recall election, hundreds of California state Web archives, the Guantanamo Bay Detention Camp Web archive and the Middle East Political Sites archive. New archives continually are being built and published and will appear along with the current archives, available at webarchives.cdlib.org/.

The Web has revolutionized our access to information. Documents and publications that once were difficult to find now are readily available to anyone at any time. Popular reactions to historical events unfold via blogs and personal Web sites, and we have an unprecedented view into popular culture and the formation of public policy. "This is a tool that can track censorship in China, political regimes in Iran, and social commentary around the world," states Laine Farley, California Digital Library's executive director. "CDL and the UC libraries are leading the way in building collections for the 21st century." . . .

CDL's Web Archiving Service is the result of a 4.5-year grant awarded by the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP). The program's mission is to develop a national strategy to collect, preserve and make available digital content, especially materials that are created only in digital formats, for current and future generations. Working with partners at the University of North Texas, New York University, Stanford University and the campuses of the University of California, the California Digital Library has built a service that is easy to use and allows librarians to begin preserving information that was slipping away. Martha Anderson, director of program management for NDIIPP at the Library of Congress, says, "There is a growing public interest in the archiving of public Web sites for future reference. The technical challenges of constantly changing sites and technologies and the enormity of the universe of potential content require immediate and focused action."

Proceedings of DigCCurr2009: Digital Curation: Practice, Promise, and Prospects

Helen R. Tibbo has published Proceedings of DigCCurr2009: Digital Curation: Practice, Promise, and Prospects on Lulu.

Here's the ad:

DigCCurr2009 was held on April 1-3, 2009 in Chapel Hill, North Carolina as part of the Preserving Access to Our Digital Future: Building an International Digital Curation Curriculum (DigCCurr) project. DigCCurr is a three-year (2006-2009), Institute of Museum and Library Services (IMLS)-funded project to develop a graduate-level curricular framework, course modules, and experiential components to prepare students for digital curation in various environments. Contributions to DigCCurr2009 take the form of long and short papers, posters and panels. Potential contributions were submitted for peer review by a rich and diverse panel of international experts. Reviewers evaluated the submissions based on clarity and organization of presentation and writing; originality, creativity and potential for new contributions to the field; and engagement (topics addressed would be appropriate for and engaging to the diverse audience of DigCCurr2009 participants).

DuraCloud to Test Cloud Technologies for Digital Preservation

DuraCloud will test cloud technologies for digital preservation purposes.

Here's an excerpt from the press release:

How long is long enough for our collective national digital heritage to be available and accessible? The Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) and DuraSpace have announced that they will launch a one-year pilot program to test the use of cloud technologies to enable perpetual access to digital content. The pilot will focus on a new cloud-based service, DuraCloud, developed and hosted by the DuraSpace organization. Among the NDIIPP partners participating in the DuraCloud pilot program are the New York Public Library and the Biodiversity Heritage Library.

Cloud technologies use remote computers to provide local services through the Internet. DuraCloud will let an institution provide data storage and access without having to maintain its own dedicated technical infrastructure.

For NDIIPP partners, it is not enough to preserve digital materials without also having strategies in place to make that content accessible. NDIIPP is concerned with many types of digital content, including geospatial, audiovisual, images and text. The NDIIPP partners will focus on deploying access-oriented services that make it easier to share important cultural, historical and scientific materials with the world. To ensure perpetual access, valuable digital materials must be stored in a durable manner. DuraCloud will provide both storage and access services, including content replication and monitoring services that span multiple cloud-storage providers.

Martha Anderson, director of NDIIPP Program Management, said, "Broad online public access to significant scientific and cultural collections depends on providing the communities who are responsible for curating these materials with affordable access to preservation services. The NDIIPP DuraCloud pilot project with the DuraSpace organization is an opportunity to demonstrate affordable preservation and access solutions for communities of users who need this kind of help."

Digital Preservation: Presentations from 2009 NDIIPP Partners Meeting

Presentations from the 2009 NDIIPP Partners Meeting are now available.

Digital Preservation: Two-Year Pilot Project Evaluation

The Chesapeake Project has released its Two-Year Pilot Project Evaluation.

Here's an excerpt:

The Chesapeake Project began as a collaborative, two-year pilot program with the goal of preserving born-digital legal information published directly to the Web. It was implemented in early 2007 by the Georgetown Law Library and the State Law Libraries of Maryland and Virginia. Having successfully completed its pilot phase, The Chesapeake Project's legal information archive is now expanding.

The following document comprises the final evaluation and account of The Chesapeake Project's accomplishments during its two-year pilot phase, spanning from February 27, 2007, to February 28, 2009.

During this time, the project's digital archive was populated with more than 4,300 digital items representing nearly 1,900 Web-published titles, the vast majority of which have no print counterpart. Each of these titles was harvested from the Web, stored within a secure digital archive and assigned a permanent archive URL. Today, each archived digital title remains accessible to users, regardless of whether the original digital files have been altered or removed from their original locations on the Web.

A 2008 analysis of the digital archive's content showed that more than eight percent of the titles archived by The Chesapeake Project had disappeared from their original URLs within the project's first year, but remained accessible thanks to the project's efforts. The current evaluation demonstrates that this figure has increased significantly over the past year. In fact, as of March 2009, nearly 14 percent of the project's archived titles—approximately one in seven—have disappeared from their original locations on the Web.

Blog Reports about the National Digital Information Infrastructure and Preservation Program Partners Meeting

Several blog reports are available about the recent National Digital Information Infrastructure and Preservation Program partners meeting.

Library of Congress Releases Bagit: Transferring Content for Digital Preservation Video

The Library of Congress has released a digital video, Bagit: Transferring Content for Digital Preservation.

Here's the description:

The Library of Congress's steadily growing digital collections arrive primarily over the network rather than on hardware media. But that data transfer can be difficult because different organizations have different policies and technologies.

The Library, with the California Digital Library and Stanford University, has developed guidelines for creating and moving standardized digital containers, called "bags." A bag functions like a physical envelope that is used to send content through the mail, but with bags a user sends content from one computer to another.

Bags have a sparse, uncomplicated structure that transcends differences in institutional data, data architecture, formats and practices. A bag's minimal but essential metadata is machine readable, which makes it easy to automate ingest of the data. Bags can be sent over computer networks or physically moved using portable storage devices.

Bags have built-in inventory checking, to help ensure that content is transferred intact. Bags are flexible and can work in many different settings, including situations where the content is located in more than one place. This video describes the preparation and transfer of data over the network in bags.
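
The inventory checking described above can be sketched in a few lines of Python. This is a simplified illustration rather than the Library of Congress tooling: it assumes a bag directory with a `data/` payload folder and a `manifest-md5.txt` file of "checksum path" lines, and the helper names `write_manifest` and `verify_manifest` are hypothetical. A real bag carries additional tag files that are omitted here.

```python
import hashlib
from pathlib import Path


def write_manifest(bag_dir: Path) -> Path:
    """Write manifest-md5.txt listing a checksum for every file under data/."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            checksum = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{checksum}  {path.relative_to(bag_dir).as_posix()}")
    manifest = bag_dir / "manifest-md5.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> list:
    """Return payload paths that are missing or whose checksum has changed."""
    failures = []
    text = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
        elif hashlib.md5(target.read_bytes()).hexdigest() != expected:
            failures.append(rel_path)
    return failures
```

The sender would call `write_manifest` before transfer and the receiver would call `verify_manifest` afterward; an empty list means every listed file arrived unchanged.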

CoOL Moves to American Institute for Conservation

Conservation OnLine (CoOL) and the Conservation DistList are moving from the Stanford University Libraries to the American Institute for Conservation.

Here's an excerpt from the press release:

The American Institute for Conservation of Historic and Artistic Works (AIC) announced that they will now host Conservation OnLine (CoOL) after 22 years of its being hosted by Stanford University Libraries. CoOL is a web-based library of conservation information, covering a wide spectrum of topics of interest to those involved with the conservation of library, archives, and museum materials. It contains approximately 120,000 documents, including an online archive of the Journal of the American Institute for Conservation. It also includes the Conservation DistList, with 9,969 subscribers from at least 91 countries. CoOL serves as both an important resource for information and as a forum for conservation professionals all over the world.

AIC’s first priority is to make the DistList operational as soon as possible. Further announcements will be made as to the resumption of activity on the DistList and where other CoOL resources will be located in the future. We are continuing discussions with allied and affiliate organizations in order to make CoOL’s transition as seamless as possible.

Texas Conference on Digital Libraries 2009 Presentations

Presentations from the Texas Conference on Digital Libraries 2009 are now available.

American Institute of Physics Will Use CLOCKSS Digital Archive

The American Institute of Physics will use the CLOCKSS (Controlled Lots of Copies Keep Stuff Safe) "dark" digital archive.

Here's an excerpt from the press release:

CLOCKSS will make AIP content freely available in the event that AIP is no longer able to provide access. . . .

The CLOCKSS initiative was created in response to the growing concern that digital content purchased by libraries may not always be available due to discontinuation of an electronic journal or because of a catastrophic event. CLOCKSS creates a secure, multi-site archive of web-published content that can be tapped into to provide ongoing access to researchers worldwide, free of charge.

"Today, when over one half of all our subscriptions are online only, we owe it to our customers more than ever to provide the best security possible for their electronic products," said Mark Cassar, AIP's Acting Publisher. "Our nearly three-year-old partnership with Portico, and now our participation in the CLOCKSS initiative, solidifies this commitment."

CLOCKSS' decentralized, geographically distributed preservation strategy ensures that the digital assets of the global research community will survive intact. Additionally, it satisfies the demand for locally situated archives with 15 archive nodes planned worldwide by 2010.

Curating Atmospheric Data for Long Term Use: Infrastructure and Preservation Issues for the Atmospheric Sciences Community

The Digital Curation Centre has released Curating Atmospheric Data for Long Term Use: Infrastructure and Preservation Issues for the Atmospheric Sciences Community, SCARP Case Study No. 2.

Here's an excerpt:

DCC SCARP aims to understand disciplinary approaches to data curation by substantial case studies based on an immersive approach. As part of the SCARP project we engaged with a number of archives, including the British Atmospheric Data Centre, the World Data Centre Archive at the Rutherford Appleton Laboratory and the European Incoherent Scatter Scientific Association (EISCAT). We developed a preservation analysis methodology which is discipline independent in application but none the less capable of identifying and drawing out discipline specific preservation requirements and issues. In this case study report we present the methodology along with its application to the Mesospheric Stratospheric Tropospheric (MST) radar dataset, which is currently supported by and accessed through the British Atmospheric Data Centre. We suggest strategies for the long term preservation of the MST data and make recommendations for the wider community.

Foundation Grants for Preservation in Libraries, Archives, and Museums, 2009 Edition

The Library of Congress and the Foundation Center have released Foundation Grants for Preservation in Libraries, Archives, and Museums, 2009 Edition.

Here's an excerpt from the announcement:

This publication lists 1,944 grants of $5,000 or more awarded by 488 foundations, from 2004 through the publication date of this guide. It covers grants to public, academic, research, school, and special libraries, and to archives and museums for activities related to conservation and preservation. This publication includes:

  • an introduction that explains the book's coverage, arrangement, entries, and how to research using the volume. Note: This PDF file contains hotlinks to free online tutorials that cover grant writing and provide an insight into the world of U.S. foundation giving offered by the Foundation Center, as well as to some other widely used non-profit guidance on preservation grants found on the Conservation Online web site.
  • a statistical analysis of grant funding in the area of preservation by foundation, recipient location, subject, recipient type (e.g., Library), grant size, and foundation generosity nationwide.
  • state-by-state descriptions of projects funded in preservation nationwide including the foundation's name, limitations on giving, recipient(s), size of grants, and purpose of the grant described. Note: This section is hot linked in the PDF version directly to more detailed descriptions of the foundations.
  • indexes by recipient, geographic area of the recipient, and subject. Note: If you do not find what you are looking for in the indices, use the find feature to search the text for your term.
  • a list of all foundations that have donated to preservation and conservation with their contact information and limitations on giving.

DPC What’s New in Digital Preservation, No. 20

DPC What's New in Digital Preservation number 20 has been published.

Here’s a description of the publication:

This is a summary of selected recent activity in the field of digital preservation compiled from a number of resources including the digital-preservation and padiforum-l mailing lists. Additional or related items of interest may also be included.

Keeping Research Data Safe 2: The Identification of Long-lived Digital Datasets for the Purposes of Cost Analysis: Project Plan

Charles Beagrie has released Keeping Research Data Safe 2: The Identification of Long-lived Digital Datasets for the Purposes of Cost Analysis: Project Plan.

Here's an excerpt from the project home page:

The Keeping Research Data Safe 2 project commenced on 31 March 2009 and will complete in December 2009. The project will identify and analyse sources of long-lived data and develop longitudinal data on associated preservation costs and benefits. We believe these outcomes will be critical to developing preservation costing tools and cost benefit analyses for justifying and sustaining major investments in repositories and data curation.

Digital Preservation: PARSE.Insight Project Reports on First Year Achievements

In "Annual Review Year 1: Goals and Achievements," The PARSE.Insight (Permanent Access to the Records of Science in Europe) Project reports on its first year achievements. This post includes links to a number of longer documents, including the PARSE.Insight Deliverable D2.1 Draft Roadmap.

Here's an excerpt from the PARSE.Insight Deliverable D2.1 Draft Roadmap:

The purpose of this document is to provide an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for science data. The infrastructure components presented here are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects, whether separated by discipline or time. Thus the infrastructure components are intended to play a general, unifying role in science data. While developed in the context of a European wide infrastructure, there would be great advantages for these types of infrastructure components to be available much more widely.

Safeguarding Collections at the Dawn of the 21st Century: Describing Roles & Measuring Contemporary Preservation Activities in ARL Libraries

The Association of Research Libraries has released Safeguarding Collections at the Dawn of the 21st Century: Describing Roles & Measuring Contemporary Preservation Activities in ARL Libraries.

Here's an excerpt from the press release:

The report is organized into three thematic sections:

  1. Reshaping the preservation functions in research libraries—Libraries must reconceptualize preservation as a core function that extends beyond activities within a preservation department. As preservation is advanced through a range of investments and partnerships, libraries are in the midst of reshaping priorities and reallocating resources to align with new services and conceptions of collections.

  2. The networked digital environment—ARL members need to expand their activities and deepen their practices related to preserving digital content through Web archiving, deployment of digital repositories, and efforts to preserve e-journals and other born digital content (whether purchased, licensed, or digitized by the library).

  3. Library collaborative strategies—Community-level activities are crucial, both to address the challenges presented by digital formats and to make traditional preservation activities more effective.