Archive for the 'Digital Curation & Digital Preservation' Category

A Long Road Ahead for Digitization

Posted in Digital Curation & Digital Preservation, Digitization, Scholarly Communication on March 11th, 2007

The New York Times published an article today ("History, Digitized (and Abridged)") that examines the progress of digitization efforts in the US. It doesn’t hold many surprises for those in the know, but it may help orient non-specialists to some of the challenges involved, especially those who think that everything is already available on the Internet.

It also has some interesting tidbits, including a chart that shows the holdings of different types of materials in the National Archives and how many items have been digitized for each type.

It also includes some current cost data from the Library of Congress, quoted below:

At the Library of Congress, for example, despite continuing and ambitious digitization efforts, perhaps only 10 percent of the 132 million objects held will be digitized in the foreseeable future. For one thing, costs are prohibitive. Scanning alone on smaller items ranges from $6 to $9 for a 35-millimeter slide, to $7 to $11 a page for presidential papers, to $12 to $25 for poster-size pieces.

It also discusses the copyright laws that apply to sound materials and their impact on digitization efforts:

When it comes to sound recordings, copyright law can introduce additional complications. Recordings made before 1972 are protected under state rather than federal laws, and under a provision of the 1976 Copyright Act, may be entitled to protection under state law until 2067. Also, an additional copyright restriction often applies to the underlying musical composition.

A study published in 2005 by the Library of Congress and the Council on Library and Information Resources found that some 84 percent of historical sound recordings spanning jazz, blues, gospel, country and classical music in the United States, and made from 1890 to 1964, have become virtually inaccessible.

An interesting, well-written article that’s worth a read.

Source: Hafner, Katie. "History, Digitized (and Abridged)." The New York Times, 11 March 2007, BU YT 1, 8-9.

Trustworthy Repositories Audit & Certification: Criteria and Checklist Published

Posted in Digital Curation & Digital Preservation, Digital Repositories, Institutional Repositories, Scholarly Communication on March 9th, 2007

The Center for Research Libraries and RLG Programs have published the Trustworthy Repositories Audit & Certification: Criteria and Checklist.

Here’s an excerpt from the press release:

In 2003, RLG and the US National Archives and Records Administration created a joint task force to address digital repository certification. The goal of the RLG-NARA Task Force on Digital Repository Certification was to develop criteria to identify digital repositories capable of reliably storing, migrating, and providing access to digital collections. With partial funding from the NARA Electronic Records Archives Program, the international task force produced a set of certification criteria applicable to a range of digital repositories and archives, from academic institutional preservation repositories to large data archives and from national libraries to third-party digital archiving services. . . . .

In 2005, the Andrew W. Mellon Foundation awarded funding to the Center for Research Libraries to further establish the documentation requirements, delineate a process for certification, and establish appropriate methodologies for determining the soundness and sustainability of digital repositories. Under this effort, Robin Dale (RLG Programs) and Bernard F. Reilly (President, Center for Research Libraries) created an audit methodology based largely on the checklist and tested it on several major digital repositories, including the E-Depot at the Koninklijke Bibliotheek in the Netherlands, the Inter-University Consortium for Political and Social Research, and Portico.

Findings and methodologies were shared with those of related working groups in Europe who applied the draft checklist in their own domains: the Digital Curation Center (U.K.), DigitalPreservationEurope (Continental Europe) and NESTOR (Germany). The report incorporates the sum of knowledge and experience, new ideas, techniques, and tools that resulted from cross-fertilization between the U.S. and European efforts. It also includes a discussion of audit and certification criteria and how they can be considered from an organizational perspective.

New Digital Preservation Mailing List

Posted in Digital Curation & Digital Preservation on March 9th, 2007

The Preservation and Reformatting Section of ALCTS has started a new mailing list about digital preservation called DIGIPRES.

JISC Report Evaluates CLIR/ARL E-Journal Archiving Report

Posted in Digital Curation & Digital Preservation, E-Journals, Scholarly Communication on March 8th, 2007

JISC has just released an evaluation of the 2006 CLIR/ARL report e-Journal Archiving Metes and Bounds: A Survey of the Landscape. The new report, by Maggie Jones, is titled Review and Analysis of the CLIR Report E-Journal Archiving Metes and Bounds: A Survey of the Landscape.

Here is an excerpt from the Executive Summary:

Although both legal deposit legislation and institutional repositories are important developments, neither of them can reasonably be expected to provide practical solutions for libraries licensing access to e-journals. In the UK, the archiving clauses in the NESLI licence have provided a measure of security for libraries but in the absence of trusted repositories charged with managing e-journals, these have provided largely theoretical assurance.

There is a pressing requirement for trusted repositories focussed on archiving and preserving e-journals, which are independent of publishers, and which offer services which can safeguard content while sharing costs between libraries and publishers equitably. While the concerns of libraries are much the same as they were when the JISC consultancy on e-journals archiving reported in 2003, there are now a clearer set of options emerging. Over the past few years, a number of promising initiatives have been developed which provide much better prospects for continued access to licensed e-journal content and which offer cost-effective services for libraries and publishers. Twelve of these trusted repositories have been profiled in a recent CLIR survey. Many of them, including Portico, Pub Med Central, CLOCKSS, and LOCKSS are already familiar in the UK.

Despite a rapidly changing landscape, there is nevertheless a powerful momentum, as evidenced in the rapid take-up of two of the services, LOCKSS and Portico. It is also now possible to articulate a set of principles for archiving services, based on practical reality, which can guide decision-making. The CLIR survey provides a valuable catalyst, from which the forthcoming BL/DPC/JISC E-Journal Archiving and Preservation workshop (27th March 2007) and other mechanisms have the opportunity to take a significant step forward in this crucial area.

Nestor Project Will Continue until 2009

Posted in Digital Curation & Digital Preservation, Scholarly Communication on February 15th, 2007

The Nestor (Network of Expertise in Long-Term Storage of Digital Resources) Project will continue operations until 2009.

Here is a brief description of the project from its home page:

The project’s objective is to create a network of expertise in long-term storage of digital resources for Germany. As the perspective of current and future archive users is central to the project, the emphasis is put on long-term accessibility. Within the project the following offers will be created: a web-based information forum, a platform for information and communication, criteria for trusted digital repositories, recommendations for certification procedures of digital repositories, recommendations for collecting guidelines and selection criteria of digital resources to be archived, guidelines and policies, the concept for a permanent organisation form of the network of expertise in digital preservation. The long-term goal is a permanent distributed infrastructure for long-term preservation and long-term accessibility of digital resources in Germany comparable e.g. to the Digital Preservation Coalition in the UK.

Two new working groups have been established for phase two of the project: the "Standards for Metadata, Transfer of Objects to Digital Repositories and Object Access" Working Group and the "Interlinking of eScience and Long-Term Preservation" Working Group.

An English version of Nestor’s Criteria Catalogue for Trusted Repositories is now available.

Senate Poised to Slash NDIIPP Funding

Posted in Digital Curation & Digital Preservation, Scholarly Communication on February 12th, 2007

The Disruptive Library Technology Jester and Free Range Librarian blogs have sounded a warning that $47 million of unobligated current-year funding for the National Digital Information Infrastructure and Preservation Program is in serious danger of being rescinded.

House Joint Resolution 20 has been passed in the House and is now being considered by the Senate.

The NDIIPP 2005 Annual Review provides a detailed look at the work of this important Library of Congress program.

See Murray’s Jester posting for the cutback details and check out his protest letter to Ohio’s Senators.

International Journal of Digital Curation Launched

Posted in Digital Curation & Digital Preservation, E-Journals on November 25th, 2006

The Digital Curation Centre has launched the International Journal of Digital Curation, which will be published twice a year in digital form (articles are PDF files). It is edited by Richard Waller, who also edits Ariadne. It is published by UKOLN at the University of Bath, using Open Journal Systems.

The journal is freely available. Although individual articles in the first volume do not have copyright statements, the Submissions page on the journal Web site has the following copyright statement:

Copyright for articles published in this journal is retained by the authors, with first publication rights granted to the University of Bath. By virtue of their appearance in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.

The first issue includes "Digital Curation, Copyright, and Academic Research"; "Digital Curation for Science, Digital Libraries, and Individuals"; "Scientific Publication Packages—A Selective Approach to the Communication and Archival of Scientific Output"; and other articles.

Digital Preservation via Emulation at Koninklijke Bibliotheek

Posted in Digital Curation & Digital Preservation, Emerging Technologies on November 21st, 2006

In a two-year (2005-2007) joint project with Nationaal Archief of the Netherlands, Koninklijke Bibliotheek is developing an emulation system that will allow digital objects in outmoded formats to be utilized in their original form. Regarding the emulation approach, the Koninklijke Bibliotheek says:

Emulation is difficult, which is the main reason why it is not applied on a large scale. Developing an emulator is complex and time-consuming, especially because the emulated environment must appear authentic and must function accurately as well. When future users are interested in the contents of a file, migration remains the better option. When it is the authentic look and feel and functionality of a file they are after, emulation is worth the effort. This can be the case for PDF documents or websites. For multimedia applications, emulation is in fact the only suitable permanent access strategy.

J. R. van der Hoeven and H. van Wijngaarden’s paper "Modular Emulation as a Long-Term Preservation Strategy for Digital Objects" provides an overview of the emulation approach.

In a related development, a message to padiforum-l on 11/17/06 by Remco Verdegem of the Nationaal Archief of the Netherlands reported on a recent Emulation Expert Meeting, which issued a statement noting the following advantages of emulation for digital preservation purposes:

  • It preserves and permits access to each digital artifact in its original form and format; it may be the only viable approach to preserving digital artifacts that have significant executable and/or interactive behavior.
  • It can preserve digital artifacts of any form or format by saving the original software environments that were used to render those artifacts. A single emulator can preserve artifacts in a vast range of arbitrary formats without the need to understand those formats, and it can preserve huge corpuses without ever requiring conversion or any other processing of individual artifacts.
  • It enables the future generation of surrogate versions of digital artifacts directly from their original forms, thereby avoiding the cumulative corruption that would result from generating each such future surrogate from the previous one.
  • If all emulators are written to run on a stable, thoroughly-specified "emulation virtual machine" (EVM) platform and that virtual machine can be implemented on any future computer, then all emulators can be run indefinitely.

What’s in Your Digital Asset Catastrophe Plan?

Posted in Digital Curation & Digital Preservation on September 30th, 2005

Anything? You likely have a disaster plan that addresses digital asset issues. The potential problem with a disaster plan is that it can be grounded in assumptions of relative normalcy: the building burns down, a tornado hits, a lower-category hurricane strikes. It may assume severe damage within a confined area and an unimpaired ability of federal, state, and local agencies (as well as relief organizations) to respond. It may assume that workers are not at the disaster site, that they are relatively unaffected if they are, or that they can evacuate and return with relative ease and speed. It may assume that your offsite tape storage or "hot" backup site is far enough away to be unaffected.

What it probably doesn’t assume is the complete devastation of your city or town; widespread Internet, phone, power, and water outages that could last weeks or months; improbable multiple disasters across a wide region surrounding you; the inability of officials at all levels of government to adequately respond to a quickly deepening crisis; the lack of truly workable evacuation plans; depleted gas supplies for a hundred miles in all directions; your evacuated workers being scattered across a multiple-state area in motels, hotels, and the houses of friends and relatives after trips of 20 to 30 hours in massive traffic jams; your institution’s administration being relocated to a hotel in another city; evacuees ending up in new disaster zones and needing to evacuate yet again; and the possibility of more local post-catastrophe catastrophes in short order.

Here are some thoughts. You may need to have your backups and hot sites in a part of the country that is unlikely to be experiencing a simultaneous catastrophe. This will not be reliable or convenient if physical data transportation is involved. Your latest data could end up in a delivery service depot in your city or town when the event happens. Even if this doesn’t occur, how frequently will you ship out those updates? Daily? Weekly? Another frequency?

Obviously, a remote hot site is better than just backups. But, if hot sites were cheap, we’d all have them.

In terms of backups, how software/hardware-specific are your systems? Will you have to rebuild a complex hardware/software environment to create a live system? Will the components that you need be readily available? Will you have the means to acquire, house, and implement them?
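One concrete practice that underlies several of these questions is verifying that offsite copies actually match the originals before you ever need them. The following is a minimal, hypothetical sketch of that idea using a checksum manifest; the directories and filenames are illustrative stand-ins, not part of any institution's actual procedures:

```shell
#!/bin/sh
# Hypothetical sketch: build a checksum manifest for a set of master files,
# replicate them to an "offsite" location, then verify the copies against
# the manifest. Paths and filenames are illustrative only.
set -e

mkdir -p /tmp/da_demo/masters /tmp/da_demo/offsite
printf 'page image data' > /tmp/da_demo/masters/item001.tif
printf 'metadata record' > /tmp/da_demo/masters/item001.xml

# 1. Record a checksum manifest alongside the masters.
( cd /tmp/da_demo/masters && sha256sum ./* > ../manifest.sha256 )

# 2. Replicate to the offsite location (here, a local stand-in directory).
cp /tmp/da_demo/masters/* /tmp/da_demo/offsite/

# 3. Verify the offsite copies bit-for-bit against the manifest.
( cd /tmp/da_demo/offsite && sha256sum -c ../manifest.sha256 )
echo "offsite copy verified"
```

However the copies travel (over a network or on shipped media), a manifest created at the source and checked at the destination catches silent corruption that a simple file listing would miss.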

Lots of copies do keep stuff safe, but there have to be lots of copies. Here are two key issues: copyright and will (no doubt there are many more).

You may have a treasure trove of locally produced digital materials, but, if they are under normal copyright arrangements, no one can replicate them. It took considerable resources to create your digital materials. It’s a natural tendency to want to protect them so that they are accessible, but still yours alone. The question to ask yourself is: what do I want to prevent users from doing, now and in the future, with these materials? The Creative Commons licenses offer options that bar commercial and derivative use, but still provide the freedom to replicate licensed data. True, if you allow replication, you will not really be able to have unified use statistics, but, in the final analysis, what’s more important: statistics or digital asset survival? If you allow derivative works, you may find others add value to your work in surprising and novel ways that benefit your users.

However, merely making your digital assets available doesn’t mean that anyone will go to the trouble of replicating or enhancing them. That requires will on the part of others, and they are busy with their own projects. Moreover, they assume that your digital materials will remain available, not disappear forever in the blink of an eye.

It strikes me that digital asset catastrophe planning may call for cooperative effort by libraries, IT centers, and other data-intensive nonprofit organizations. Perhaps, by working jointly, economic and logistical barriers can be overcome and cost-effective solutions can emerge.

FCLA Digital Archive

Posted in Digital Curation & Digital Preservation, Digital Libraries on May 17th, 2005

Since 2003, the Florida Center for Library Automation (FCLA) has been creating an IMLS-grant-funded Digital Archive (DA) to serve Florida’s public universities. The DA project’s goals are to: "1) establish a working and trusted digital archive, 2) identify costs involved in all aspects of archiving, and 3) disseminate tools, procedures and results for the widest national impact."

The DA will "accept submission packages from participating partners, ingest digital documents along with the appropriate metadata, and safely store on-site and off-site copies of the files."

The DA is a "dark" archive:

Our original idea, and the one thing that did not change over time, was the idea of building a dark archive. By "dark" I mean an archive with no real-time, online access to the content by anyone except repository staff. Dark archives are out of favor right now but we had some good reasons for it. We serve ten state universities and each of them has its own presentation systems and some have their own institutional repository systems. Some of the libraries use FCLA delivery applications but some use their own applications. Central Florida uses CONTENTdm, South Florida uses SiteSearch and Florida State uses DigiTool. At FCLA we don’t have the software to replicate these access functions and we don’t have any desire to; it would cost a great deal to acquire the software licenses, and it would take a huge amount of staff to support all these applications. So the idea of our offering presentation services on top of our stored repository content wasn’t feasible.

Real-life digital preservation efforts are always worth keeping an eye on, and this one is quite ambitious. You can track their progress through their grant page and their publications and presentations page.

The project’s most recent presentation by Priscilla Caplan ("Preservation Rumination: Digital Preservation and the Unfamiliar Future") is available from OCLC in both PowerPoint and MP3 formats.



Digital Scholarship

Copyright © 2005-2020 by Charles W. Bailey, Jr.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.