International Journal of Digital Curation Launched

The Digital Curation Centre has launched the International Journal of Digital Curation, which will be published twice a year in digital form (articles are PDF files). It is edited by Richard Waller, who also edits Ariadne. It is published by UKOLN at the University of Bath, using Open Journal Systems.

The journal is freely available. Although individual articles in the first volume do not have copyright statements, the Submissions page on the journal Web site has the following copyright statement:

Copyright for articles published in this journal is retained by the authors, with first publication rights granted to the University of Bath. By virtue of their appearance in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.

The first issue includes "Digital Curation, Copyright, and Academic Research"; "Digital Curation for Science, Digital Libraries, and Individuals"; "Scientific Publication Packages—A Selective Approach to the Communication and Archival of Scientific Output"; and other articles.

Digital Preservation via Emulation at Koninklijke Bibliotheek

In a two-year (2005-2007) joint project with Nationaal Archief of the Netherlands, Koninklijke Bibliotheek is developing an emulation system that will allow digital objects in outmoded formats to be utilized in their original form. Regarding the emulation approach, the Koninklijke Bibliotheek says:

Emulation is difficult, the main reason why it is not applied on a large scale. Developing an emulator is complex and time-consuming, especially because the emulated environment must appear authentic and must function accurately as well. When future users are interested in the contents of a file, migration remains the better option. When it is the authentic look and feel and functionality of a file they are after, emulation is worth the effort. This can be the case for PDF documents or websites. For multimedia applications, emulation is in fact the only suitable permanent access strategy.

J. R. van der Hoeven and H. van Wijngaarden’s paper "Modular Emulation as a Long-Term Preservation Strategy for Digital Objects" provides an overview of the emulation approach.

In a related development, a message to padiforum-l on 11/17/06 by Remco Verdegem of the Nationaal Archief of the Netherlands reported on a recent Emulation Expert Meeting, which issued a statement noting the following advantages of emulation for digital preservation purposes:

  • It preserves and permits access to each digital artifact in its original form and format; it may be the only viable approach to preserving digital artifacts that have significant executable and/or interactive behavior.
  • It can preserve digital artifacts of any form or format by saving the original software environments that were used to render those artifacts. A single emulator can preserve artifacts in a vast range of arbitrary formats without the need to understand those formats, and it can preserve huge corpuses without ever requiring conversion or any other processing of individual artifacts.
  • It enables the future generation of surrogate versions of digital artifacts directly from their original forms, thereby avoiding the cumulative corruption that would result from generating each such future surrogate from the previous one.
  • If all emulators are written to run on a stable, thoroughly-specified "emulation virtual machine" (EVM) platform and that virtual machine can be implemented on any future computer, then all emulators can be run indefinitely.
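
To make that last point a bit more concrete, here is a minimal sketch of the EVM idea, using entirely hypothetical class and method names (this is my illustration, not part of the meeting statement): each emulator is written only against a small, stable virtual-machine interface, so re-implementing that one interface on a future computer keeps every emulator, and every artifact it renders, usable.

    from abc import ABC, abstractmethod


    class EmulationVirtualMachine(ABC):
        """Hypothetical stable, thoroughly specified platform that every emulator targets.
        Only this small interface needs re-implementing on future hardware."""

        @abstractmethod
        def read_byte(self, address: int) -> int: ...

        @abstractmethod
        def write_byte(self, address: int, value: int) -> None: ...


    class ObsoletePlatformEmulator:
        """Illustrative emulator written solely against the EVM interface,
        so it keeps working wherever the EVM itself can be implemented."""

        def __init__(self, evm: EmulationVirtualMachine) -> None:
            self.evm = evm

        def run(self, preserved_image: bytes) -> None:
            # Load the preserved software environment into EVM memory;
            # the digital artifact itself is never converted or migrated.
            for address, value in enumerate(preserved_image):
                self.evm.write_byte(address, value)
            # ... a fetch/decode/execute loop for the obsolete platform would follow ...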

What’s in Your Digital Asset Catastrophe Plan?

Anything? You likely have a disaster plan that addresses digital asset issues. The potential problem with a disaster plan is that it can be grounded in assumptions of relative normalcy: the building burns down, a tornado hits, a lower-category hurricane strikes. It may assume severe damage within a confined area and an unimpaired ability of federal, state, and local agencies (as well as relief organizations) to respond. It may assume that workers are not at the disaster site, that they are relatively unaffected if they are, or that they can evacuate and return with relative ease and speed. It may assume that your offsite tape storage or "hot" backup site is far enough away to be unaffected.

What it probably doesn’t assume is the complete devastation of your city or town; widespread Internet, phone, power, and water outages that could last weeks or months; improbable multiple disasters across a wide region surrounding you; the inability of officials at all levels of government to adequately respond to a quickly deepening crisis; the lack of truly workable evacuation plans; depleted gas supplies for a hundred miles in all directions; your evacuated workers being scattered across a multiple-state area in motels, hotels, and the houses of friends and relatives after trips of 20 to 30 hours in massive traffic jams; your institution’s administration being relocated to a hotel in another city; evacuees ending up in new disaster zones and needing to evacuate yet again; and the possibility of more local post-catastrophe catastrophes in short order.

Here are some thoughts. You may need to have your backups and hot sites in a part of the country that is unlikely to be experiencing a simultaneous catastrophe. This will not be reliable or convenient if physical data transportation is involved. Your latest data could end up in a delivery service depot in your city or town when the event happens. Even if this doesn’t occur, how frequently will you ship out those updates? Daily? Weekly? Another frequency?

Obviously, a remote hot site is better than just backups. But, if hot sites were cheap, we’d all have them.

In terms of backups, how software/hardware-specific are your systems? Will you have to rebuild a complex hardware/software environment to create a live system? Will the components that you need be readily available? Will you have the means to acquire, house, and implement them?
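
One way to confront those questions before a catastrophe is to keep a machine-readable inventory of the environment alongside the backups themselves. The sketch below is purely illustrative, with invented field names and placeholder values, not a description of any particular system:

    # Hypothetical environment manifest stored with each backup set, so a rebuild
    # team knows what hardware and software it must reassemble to get a live system.
    environment_manifest = {
        "application": "digital-asset-repository",
        "last_verified": "2006-11-01",
        "hardware": ["server model and CPU count", "storage array type and capacity"],
        "operating_system": "distribution and exact version",
        "software": {
            "database": "name and exact version",
            "web_server": "name and exact version",
            "repository_application": "locally developed; see build notes",
        },
        "rebuild_contacts": ["systems office", "offsite storage vendor"],
    }

Even a list this short makes it clearer which components would have to be reacquired on the open market and which would have to be rebuilt from local documentation.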

Lots of copies do keep stuff safe, but there have to be lots of copies. Here are two key issues: copyright and will (no doubt there are many more).

You may have a treasure trove of locally produced digital materials, but, if they are under normal copyright arrangements, no one can replicate them. It took considerable resources to create your digital materials. It’s a natural tendency to want to protect them so that they are accessible, but still yours alone. The question to ask yourself is: what do I want to prevent users from doing, now and in the future, with these materials? The Creative Commons licenses offer options that bar commercial and derivative use, but still provide the freedom to replicate licensed data. True, if you allow replication, you will not really be able to have unified use statistics, but, in the final analysis, what’s more important: statistics or digital asset survival? If you allow derivative works, you may find others add value to your work in surprising and novel ways that benefit your users.

However, merely making your digital assets available doesn’t mean that anyone will go to the trouble of replicating or enhancing them. That requires will on the part of others, and they are busy with their own projects. Moreover, they assume that your digital materials will remain available, not disappear forever in the blink of an eye.

It strikes me that digital asset catastrophe planning may call for cooperative effort by libraries, IT centers, and other data-intensive nonprofit organizations. Perhaps, by working jointly, economic and logistical barriers can be overcome and cost-effective solutions can emerge.

FCLA Digital Archive

Since 2003, the Florida Center for Library Automation (FCLA) has been creating an IMLS-grant-funded Digital Archive (DA) to serve Florida’s public universities. The DA project’s goals are to: "1) establish a working and trusted digital archive, 2) identify costs involved in all aspects of archiving, and 3) disseminate tools, procedures and results for the widest national impact."

The DA will "accept submission packages from participating partners, ingest digital documents along with the appropriate metadata, and safely store on-site and off-site copies of the files."
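
As a rough illustration of that workflow, and not a description of the DA’s actual software (all function and variable names below are invented), a minimal ingest step might checksum each file in a submission package, record that as metadata, and write the file to both storage locations:

    import hashlib
    import shutil
    from pathlib import Path

    def ingest(package_dir: Path, onsite: Path, offsite: Path) -> dict:
        """Illustrative ingest of one submission package: record minimal
        metadata, checksum each file, and copy it to on-site and off-site storage."""
        manifest = {}
        for source in sorted(package_dir.rglob("*")):
            if not source.is_file():
                continue
            digest = hashlib.sha256(source.read_bytes()).hexdigest()
            relative = source.relative_to(package_dir)
            for target_root in (onsite, offsite):
                target = target_root / relative
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, target)
            manifest[str(relative)] = {"sha256": digest, "size": source.stat().st_size}
        return manifest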

The DA is a "dark" archive:

Our original idea, and the one thing that did not change over time, was the idea of building a dark archive. By “dark” I mean an archive with no real-time, online access to the content by anyone except repository staff. Dark archives are out of favor right now but we had some good reasons for it. We serve ten state universities and each of them has its own presentation systems and some have their own institutional repository systems. Some of the libraries use FCLA delivery applications but some use their own applications. Central Florida uses CONTENTdm, South Florida uses SiteSearch and Florida State uses DigiTool. At FCLA we don’t have the software to replicate these access functions and we don’t have any desire to; it would cost a great deal to acquire the software licenses, and it would take a huge amount of staff to support all these applications. So the idea of our offering presentation services on top of our stored repository content wasn’t feasible.

Real-life digital preservation efforts are always worth keeping an eye on, and this one is quite ambitious. You can track their progress through their grant page and their publications and presentations page.

The project’s most recent presentation by Priscilla Caplan ("Preservation Rumination: Digital Preservation and the Unfamiliar Future") is available from OCLC in both PowerPoint and MP3 formats.