Report on Ingest Tools for Digital Repositories

Posted in Digital Curation & Digital Preservation, Digital Repositories, Institutional Repositories, Metadata, Open Access, Scholarly Communication on May 22nd, 2007

The Cairo Project has released Cairo Tools Survey: A Survey of Tools Applicable to the Preparation of Digital Archives for Ingest into a Preservation Repository. It has also released a related report, Cairo Use Cases: A Survey of User Scenarios Applicable to the Cairo Ingest Tool.

Here’s a description of the Cairo Project from its home page:

Cairo will develop a tool for ingesting complex collections of born-digital materials, with basic descriptive, preservation and relationship metadata, into a preservation repository. The project is based on needs identified by the JISC-funded Paradigm project and the Wellcome Library’s Digital Curation in Action project. It is a key building block in the partner institutions’ strategy to develop digital repository architectures which can support the development of digital collections over the long-term.

The REMAP Project: Record Management and Preservation in Digital Repositories

Posted in Digital Curation & Digital Preservation, Digital Repositories, Fedora, Institutional Repositories, Open Access, Scholarly Communication on May 20th, 2007

The REMAP Project at the University of Hull has been funded by JISC to investigate how records management and digital preservation functions can best be supported in digital repositories. It uses the Fedora system.

Here’s an excerpt from the Project Aims page:

The REMAP project has the following aims:

  • To develop Records Management and Digital Preservation (RMDP) workflow(s) in order to understand how a digital repository can support these activities
  • To embed digital repository interaction within working practices for RMDP purposes
  • To further develop the use of a WSBPEL orchestration tool to work with external Web services, including the PRONOM Web services, to provide appropriate metadata and file information for RMDP
  • To develop and test a notification layer that can interact with the orchestration tool and allow RSS syndication to individuals alerting them to RMDP tasks
  • To develop and test an intermediate persistence layer to underpin the notification layer and interact with the WSBPEL orchestration tool to allow orchestrated workflows to take place over time
  • To test and validate the use of the enhanced WSBPEL tool with institutional staff involved in RMDP activities

Position Papers from the NSF/JISC Repositories Workshop

Posted in Data Curation, Open Data, and Research Data Management, Digital Curation & Digital Preservation, Digital Humanities, Digital Libraries, Digital Repositories, Institutional Repositories, Open Access, Scholarly Communication on April 24th, 2007

Position papers from the NSF/JISC Repositories Workshop are now available.

Here’s an excerpt from the Workshop’s Welcome and Themes page:

Here is some background information. A series of recent studies and reports have highlighted the ever-growing importance for all academic fields of data and information in digital formats. Studies have looked at digital information in science and in the humanities; at the role of data in Cyberinfrastructure; at repositories for large-scale digital libraries; and at the challenges of archiving and preservation of digital information. The goal of this workshop is to unite these separate studies. The NSF and JISC share two principal objectives: to develop a road map for research over the next ten years and what to support in the near term.

Here are the position papers:

RLG DigiNews Changes

Posted in Digital Curation & Digital Preservation, E-Journals on April 20th, 2007

In the latest issue of RLG DigiNews, Jim Michalko and Lorcan Dempsey announce significant changes to this journal. RLG DigiNews has been a five-star journal and essential reading for digital library and preservation specialists. I’d encourage my readers to voice their support for its continued excellence, as the following excerpt from the article requests:

The issue in front of you is the last of RLG DigiNews in its current form. As RLG continues to shape its combination with OCLC and create the new Programs and Research division, we are rethinking the publication program that will support our new agenda while providing readers and authors with the kind of vehicle that supports the re-invention of cultural institutions in the research, teaching, and learning process. RLG DigiNews will be an important part of this program. Expect to see it back with a renewed editorial direction. There’s much to do and coordinate but we’ve committed both the talent and the resources to make this happen. Watch for your next RLG DigiNews no later than January, 2008.

Thank you for your support. Let those responsible know that you’re looking forward to the future.

Thursday’s OAI5 Presentations

Posted in Digital Curation & Digital Preservation, Digital Repositories, E-Journals, Institutional Repositories, Open Access, Publishing, Scholarly Communication on April 19th, 2007

Presentations from Thursday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • Business Models for Digital Repositories (PowerPoint): "Those setting up, or planning to set up, a digital repository may be interested to know more about what has gone before them. What is involved, what is the cost, how many people are needed, how have others made the case to their institution, and how do you get anything into it once it is built? I have recently undertaken a study of European repository business models for the DRIVER project and will present an overview of the findings."
  • DRIVER: Building a Sustainable Infrastructure of European Scientific Repositories (PowerPoint): "Ten partners from eight countries have entered into an international partnership, to connect and network as a first step more than 50 physically distributed institutional repositories to one, large-scale, virtual Knowledge Base of European research."
  • On the Golden Road: Open Access Publishing in Particle Physics (RealVideo): "A working party works now to bring together funding agencies, laboratories and libraries into a single consortium, called SCOAP3 (Sponsoring Consortium for Open access Publishing in Particle Physics). This consortium will engage with publishers towards building a sustainable model for open access publishing. In this model, subscription fees from multiple institutions are replaced with contracts with publishers of open access journals where the SCOAP3 consortium is a single financial partner."
  • Open Access Forever—Or Five Years, Whichever Comes First: Progress on Preserving the Digital Scholarly Record (RealVideo): "The current state of the curation and preservation of digital scholarship over its entire lifecycle will be reviewed, and progress on problems of specific interest to scholarly communication will be examined. The difficulty of curating the digital scholarly record and preserving it for future generations has important implications for the movement to make that record more open and accessible to the world, so this is a timely topic for those who are interested in the future of scholarly communication."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007.)

OpenLOCKSS Project

Posted in Digital Curation & Digital Preservation, E-Journals, Open Access, Scholarly Communication on April 5th, 2007

Led by the University of Glasgow Library, the new JISC-funded OpenLOCKSS project will preserve selected UK open access publications.

Here’s an excerpt from the project proposal:

Although LOCKSS has initially concentrated on negotiations with society and commercial publishers, there has always been an interest in smaller open-access journals, as evidenced by the LOCKSS Humanities Project, where twelve major US libraries have collaborated to contact more than fifty predominantly North American open access journal titles, enabling them to be preserved within the LOCKSS system. . . .

At present, much open access content is under threat, and is difficult to preserve for posterity under standard arrangements, at least until the British Library, and the other UK national libraries, are able to take a more proactive and comprehensive stance in preserving websites comprising UK output. Many open access journals are small operations, often dependent on one or two enthusiastic editors, often based in university departments and/or small societies, concerned with producing the next issue, and often with very little interest in or knowledge of preservation considerations. Their long term survival beyond the first few issues can often be in doubt, but their content, where appropriate quality controls have been applied, is worthy of preservation.

LOCKSS is an ideal low-cost mechanism for ensuring preservation, provided that appropriate contacts can be made and plug-in developments completed, and sufficient libraries agree to host content, on the Humanities Project model. . . .

Earlier in 2006, a survey was carried out by the LOCKSS Pilot Project, to discover preferences for commercial/society publishers to approach with a view to participating in LOCKSS, and Content Complete Ltd have been undertaking this work, as well as negotiating with the NESLi2 publishers on their LOCKSS participation. . . .

We propose to consider initially the titles with at least six votes (it may not be appropriate to approach all these titles, for example we shall check that all are currently publishing and confirm that they appear to be of appropriate quality), followed by those with five or four votes. We propose that agreements for LOCKSS participation are concluded with at least twelve titles, with fifteen as a likely upper limit.

Is It Time to Stop Printing Journals?

Posted in Digital Curation & Digital Preservation, E-Journals, Licenses, Publishing, Scholarly Communication on April 1st, 2007

There has been lively discussion on Liblicense-l of late (March archive and April archive) about whether it is time to stop printing journals.

Here’s my take.

There are two aspects to this question: (1) Is the print journal format still required for reading purposes? (2) Is the print journal format still required to ensure full access to journals, given that many e-journals are licensed (rather than owned) by libraries and that digital preservation is still in its infancy?

It appears that the answer to (1) may finally be “no, for many users.” However, this may be contingent to some degree on the fact that many commercial e-journals are composed of article PDF files that allow users to print copies that replicate printed articles.

The answer to (2) is less clear, since continued access is contingent on periodic license negotiations and the changing business practices of publishers. Embargoes, ILL restrictions, incomplete back runs, and similar issues may give libraries pause. Very promising digital preservation efforts, such as LOCKSS and Portico, still need to pass the test of time. Few libraries believe that publishers by themselves can be relied on to preserve e-journals (for one thing, publishers can go out of business).

However, the reality for many libraries is that they have no choice but to dump print whenever possible for strictly economic reasons: paying for both print and electronic versions is increasingly unaffordable.

Last Call for the International Digital Preservation Systems Survey

Posted in Digital Curation & Digital Preservation on March 29th, 2007

The Getty Research Institute is conducting an International Digital Preservation Systems Survey. It should yield interesting results, so please help by completing it. March 30th is the last day.

Here’s a brief description from Karim Boughida and Sally Hubbard at the Getty:

This survey is intended to provide an overview of digital preservation system (DPS) implementation. DPS is defined here as an assembly of computer hardware, software and policies equivalent to a TDR (trusted digital repository) "whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now, and in the future"[1].

The survey was produced by the Getty Research Institute departments of Digital Resource Management and Library Information Systems, and will be distributed primarily among members of the Digital Library Federation (DLF). Results will be shared at the DLF Spring Forum, April 23-25, 2007 (Pasadena, California, USA), and with all respondents who provide contact information. . . .

[1] RLG. 2002. Trusted Digital Repositories: Attributes and Responsibilities. Mountain View, Calif.: RLG, Inc.

Library of Congress Digital Preservation Web Site Redesigned

Posted in Digital Curation & Digital Preservation on March 24th, 2007

The Library of Congress has rolled out a new version of its Digital Preservation Web site.

Here’s an excerpt from Digital Preservation News: March 2007:

As you may have already noticed, the Web site for the National Digital Information Infrastructure and Preservation Program (NDIIPP) has been completely redesigned, with new content sections and more user-friendly text. The goal of the redesign was to make the subject of digital preservation in general and NDIIPP in particular more accessible to a wider audience.

Mellon Grants to CLIR/DLF

Posted in Digital Curation & Digital Preservation, Digital Libraries, Scholarly Communication on March 20th, 2007

The Andrew W. Mellon Foundation has given grants to both the Council on Library and Information Resources and the Digital Library Federation.

Here’s an excerpt from the CLIR grant press release:

The Council on Library and Information Resources (CLIR) has received a three-year, $2.19 million grant from The Andrew W. Mellon Foundation to support general operations. The award will allow CLIR to launch a range of new initiatives in six program areas: cyberinfrastructure, preservation, the next scholar, the emerging library, leadership, and new models. . . .

The breadth of CLIR’s new agenda is represented in six interrelated program areas:

Cyberinfrastructure defines the base technologies of computation and communication, the software programs, and the data-curation and data-preservation programs needed to manage large-scale multimedia data sets, particularly those pertaining to the digital record of our cultural heritage;

Preservation explores sustainable strategies for preserving all media in a complex technological, policy, and economic environment;

The Next Scholar explores and assesses new methodologies, fields of inquiry, strategies for data gathering and collaboration, and modes of communication that are likely to define the next generation of scholars;

The Emerging Library explores and articulates the changing concept of the library with particular focus on its core functions and the consequences for staffing, research and teaching, and economic modeling;

Leadership investigates and defines the skills and expertise needed to administer, inspire, and inform the next generation; and

New Models extrapolates from an array of CLIR’s findings and other related research how academic organizations, institutions, behaviors, and culture may evolve over the coming decade.

Here’s an excerpt from the DLF grant press release:

The Digital Library Federation (DLF) has received an $816,000 grant from The Andrew W. Mellon Foundation for a project designed to make distributed digital collections easier for scholars to use. The project, DLF Aquifer Development for Interoperability Across Scholarly Repositories: American Social History Online, will implement schemas, data models, and technologies to enable scholars to use digital collections as one in a variety of local environments. . . .

The project will address the difficulty that humanities and social science scholars face in finding and using digital materials located in a variety of environments with a bewildering array of interfaces, access protocols, and usage requirements. DLF Aquifer seeks to provide scholars with consistent access to digital library collections pertaining to nineteenth- and twentieth-century U.S. social history across institutional boundaries. The collections are in a variety of formats and include maps and photographs from the Library of Congress historical collections; sheet music from the Sam DeVincent Collection of American Sheet Music at Indiana University; and an array of regional collections, such as Michigan County Histories from the University of Michigan and Tennessee Documentary History from the University of Tennessee, that will facilitate cross-regional studies when combined.

By integrating American Social History Online into a variety of local environments, the project will bring the library to the scholar and make distributed collections available through locally supported tools. The project will take two years to develop and implement, from April 2007 to March 2009.

PRESERV Project Report on Digital Preservation in Institutional Repositories

Posted in Digital Curation & Digital Preservation, Digital Repositories, Institutional Repositories, Open Access, Scholarly Communication on March 12th, 2007

The JISC PRESERV (Preservation Eprint Services) project has issued a report titled Laying the Foundations for Repository Preservation Services: Final Report from the PRESERV Project.

Here’s an excerpt from the Executive Summary:

The PRESERV project (2005-2007) investigated long-term preservation for institutional repositories (IRs), by identifying preservation services in conjunction with specialists, such as national libraries and archives, and building support for services into popular repository software, in this case EPrints. . . .

PRESERV was able to work with The National Archives, which has produced PRONOM-DROID, the pre-eminent tool for file format identification. Instead of linking PRONOM to individual repositories, we linked it to the widely used Registry of Open Access Repositories (ROAR), through an OAI harvesting service. As a result, format profiles can be found for over 200 repositories listed in ROAR, what we call the PRONOM-ROAR service. . . .

The lubricant to ease the movement of data between the components of the services model is metadata, notably preservation metadata, which informs, describes and records a range of activities concerned with preserving specific digital objects. PRESERV identified a rich set of preservation metadata, based on the current standard in this area, PREMIS, and where this metadata could be generated in our model. . . .

The most important changes to EPrints software as a result of the project were the addition of a history module to record changes to an object and actions performed on an object, and application programs to package and disseminate data for delivery to an external service using either the Metadata Encoding and Transmission Standard (METS) or the MPEG-21 Part 2: Digital Item Declaration Language (DIDL). One change to the EPrints deposit interface is the option for authors to select a licence indicating rights for allowable use by service providers or users, and others. . . .

PRESERV has identified a powerful and flexible framework in which a wide range of preservation services from many providers can potentially be intermediated to many repositories by other types of repository services. It is proposed to develop and test this framework in the next phase of the project.

A Long Road Ahead for Digitization

Posted in Digital Curation & Digital Preservation, Digitization, Scholarly Communication on March 11th, 2007

The New York Times published an article today ("History, Digitized (and Abridged)") that examines the progress that has been made in digitization in the US. It doesn’t hold many surprises for those in the know, but it might be useful in orienting non-specialists to some of the challenges involved, especially those who assume that everything is already on the Internet.

It also has some interesting tidbits, including a chart that shows the holdings of different types of materials in the National Archives and how many items have been digitized for each type.

It also includes some current cost data from the Library of Congress:

At the Library of Congress, for example, despite continuing and ambitious digitization efforts, perhaps only 10 percent of the 132 million objects held will be digitized in the foreseeable future. For one thing, costs are prohibitive. Scanning alone on smaller items ranges from $6 to $9 for a 35-millimeter slide, to $7 to $11 a page for presidential papers, to $12 to $25 for poster-size pieces.

It also discusses the copyright laws that apply to sound recordings and their impact on digitization efforts:

When it comes to sound recordings, copyright law can introduce additional complications. Recordings made before 1972 are protected under state rather than federal laws, and under a provision of the 1976 Copyright Act, may be entitled to protection under state law until 2067. Also, an additional copyright restriction often applies to the underlying musical composition.

A study published in 2005 by the Library of Congress and the Council on Library and Information Resources found that some 84 percent of historical sound recordings spanning jazz, blues, gospel, country and classical music in the United States, and made from 1890 to 1964, have become virtually inaccessible.

An interesting, well-written article that’s worth a read.

Source: Hafner, Katie. "History, Digitized (and Abridged)." The New York Times, 11 March 2007, BU YT 1, 8-9.

