Keeping Research Data Safe 2: The Identification of Long-lived Digital Datasets for the Purposes of Cost Analysis: Project Plan

Charles Beagrie has released Keeping Research Data Safe 2: The Identification of Long-lived Digital Datasets for the Purposes of Cost Analysis: Project Plan.

Here's an excerpt from the project home page:

The Keeping Research Data Safe 2 project commenced on 31 March 2009 and will complete in December 2009. The project will identify and analyse sources of long-lived data and develop longitudinal data on associated preservation costs and benefits. We believe these outcomes will be critical to developing preservation costing tools and cost benefit analyses for justifying and sustaining major investments in repositories and data curation.

Digital Preservation: PARSE.Insight Project Reports on First Year Achievements

In "Annual Review Year 1: Goals and Achievements," the PARSE.Insight (Permanent Access to the Records of Science in Europe) Project reports on its first-year achievements. This post includes links to a number of longer documents, including the PARSE.Insight Deliverable D2.1 Draft Roadmap.

Here's an excerpt from the PARSE.Insight Deliverable D2.1 Draft Roadmap:

The purpose of this document is to provide an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for science data. The infrastructure components presented here are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects, whether separated by discipline or time. Thus the infrastructure components are intended to play a general, unifying role in science data. While developed in the context of a European wide infrastructure, there would be great advantages for these types of infrastructure components to be available much more widely.

Safeguarding Collections at the Dawn of the 21st Century: Describing Roles & Measuring Contemporary Preservation Activities in ARL Libraries

The Association of Research Libraries has released Safeguarding Collections at the Dawn of the 21st Century: Describing Roles & Measuring Contemporary Preservation Activities in ARL Libraries.

Here's an excerpt from the press release:

The report is organized into three thematic sections:

  1. Reshaping the preservation functions in research libraries—Libraries must reconceptualize preservation as a core function that extends beyond activities within a preservation department. As preservation is advanced through a range of investments and partnerships, libraries are in the midst of reshaping priorities and reallocating resources to align with new services and conceptions of collections.

  2. The networked digital environment—ARL members need to expand their activities and deepen their practices related to preserving digital content through Web archiving, deployment of digital repositories, and efforts to preserve e-journals and other born digital content (whether purchased, licensed, or digitized by the library).

  3. Library collaborative strategies—Community-level activities are crucial, both to address the challenges presented by digital formats, but also to make traditional preservation activities more effective.

Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use

The Office of Digital Humanities in the National Endowment for the Humanities has released the final version of Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use.

Here's an excerpt from the announcement:

This project is about developing archival tools and best practices for preserving born-digital documents produced by contemporary authors. Traditionally, humanists have found great scholarly value in studying the papers, correspondence, and first drafts of authors, politicians, and other historical figures. In this white paper, the project director makes note that contemporary figures compose almost all of their materials on a computer. What challenges will this present to humanists, archivists, and librarians in the future? This very readable paper explores many of these issues with specific case studies involving a number of leading libraries and archives.

Reference Model for an Open Archival Information System (OAIS) Draft for Review

A near-final draft of the Reference Model for an Open Archival Information System (OAIS) has been made available for error-checking review.

Here's an excerpt:

This document is a technical Recommendation for use in developing a broader consensus on what is required for an archive to provide permanent, or indefinite long-term, preservation of digital information.

This Recommendation establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It allows existing and future archives to be more meaningfully compared and contrasted. It provides a basis for further standardization within an archival context and it should promote greater vendor awareness of, and support of, archival requirements.

DigitalKoans

JISC Project: Lifespan Initiative for the Research and Data Archive Repository

JISC's Lifespan Initiative for the Research and Data Archive Repository project started on 4/1/09.

Here's an excerpt from the project Web page:

The Lifespan Collection (www.lifespancollection.org.uk) represents an existing and unique research data set, which includes around 3,400 hours of audio-taped interviews, scorings and quantitative computerised data, capturing the lifetime experience of over 500 individuals. The outcomes of this project will be presented in terms of both a report on the processes and best-practice solutions for preserving and digitalising the data, including the creation of processes of submission of, and accessibility to, current and future critical datasets that ensure compliance with data security, copyright legislation, licensing, and associated audit functions. One or more detailed case studies will be produced that will not only inform the future development of this project but will act as illustrative examples for use by other similar start-up projects. This will lay the ground work for an exemplar implementation of the tools and solutions already delivered by JISC and other institutions.

Presentations from SCARP Workshop: Building and Curating Online Video Corpora

Presentations from the SCARP Workshop: Building and Curating Online Video Corpora are now available.

Here's an excerpt from the announcement:

This was a meeting of researchers and stakeholders in data service provision to discuss curation issues raised in our SCARP case study on the roles and re-usability of video data in social studies of interaction. This event aimed to raise mutual awareness of research communities' practices and needs for archiving, sharing and re-using digital video data; and identify how local and national research data services may contribute to the infrastructure for video data curation.

Infrastructure Planning and Data Curation: A Comparative Study of International Approaches to Enabling the Sharing of Research Data

JISC has released Infrastructure Planning and Data Curation: A Comparative Study of International Approaches to Enabling the Sharing of Research Data.

Here's an excerpt from the announcement:

The current methods of storing research data are as diverse as the disciplines that generate them and are necessarily driven by the myriad ways in which researchers need to subsequently access and exploit the information they contain. Institutional repositories, data centres and all other methods of storing data have to exist within an infrastructure that enables researchers to access and exploit the data, and variant models for this infrastructure can be conceptualised. Discussion of effective infrastructures for curating data is taking place at all levels, wherever research is reliant on the long-term stewardship of digital material. JISC has commissioned this study to survey the different national agendas that are addressing variant infrastructure models, to inform developments within the UK and for facilitating an internationally integrated approach to data curation.

The study of data sharing initiatives in the OECD countries confirmed the traditional perception that the policy instruments are clustered more in the upper end of the stakeholder taxonomy – i.e. at the level of national and research funding organisations whereas the services and practical tools are being developed by organisations at the lower end of the taxonomy. Despite the differences that exist between countries in terms of the models used for research funding, as well as the levels at which decisions are taken, there is agreement on the expected strata of responsibility for applying the instruments of data sharing. This supports the structure of stakeholder taxonomy used in the study.

The Internet Archive’s Wayback Machine Rebooted

The Internet Archive's Wayback Machine is now running on a Sun Modular Datacenter.

Here's an excerpt from "Wayback Machine Comes to Life in New Home":

The Wayback Machine is a 150 billion page web archive with a front end to serve it through the archive.org website.

Today the new machine came to life, so if you are using the service, you are using a 20' by 8' by 8' "machine" that sits in Santa Clara, courtesy of Sun Microsystems. It serves about 500 queries per second from the approximately 4.5 Petabytes (4.5 million gigabytes) of archived web data. We think of the cluster of computers and the Modular Datacenter as a single machine because it acts like one and looks like one. If that is true, then it might be one of the largest current computers.

Read more about Sun and the Internet Archive at "The Internet in a Box."

Historians’ Work Disrupted When Paper of Record Digital Archive Vanishes after Google Purchase

After Google purchased the Paper of Record digital archive, it took the site down, upsetting historians who relied on the collection of older newspapers. Although the site will be temporarily restored with Google's permission, the incident raises issues about the permanence and reliability of scholarly digital archives.

Read more about it at "Digital Archives That Disappear" and "'Paper of Record' Disappears, Leaving Historians in the Lurch."

Talis Interview with Peter Brantley, Director of the Internet Archive

Richard Wallis has posted a digital audio interview with Peter Brantley, the Internet Archive's new Director, on Panlibus.

Here's an excerpt from the post:

In this conversation we look back over the last couple of years at the DLF [Digital Library Federation] and then forward into his new challenge and opportunity at the Internet Archive.

We go on to discuss his thoughts and plans to make it easy to identify books and information and their locations in a way that is currently not possible with the processes and protocols we use today.

Center for Research Libraries to Assess and Certify Portico and HathiTrust

The Center for Research Libraries will conduct detailed assessments of Portico and HathiTrust with the objective of certifying them as trustworthy digital repositories.

Here's an excerpt from the press release:

Portico has agreed to cooperate with the CRL audit, with the goal of certification as a trustworthy digital repository. HathiTrust has asked CRL to assess its digital repository, which includes not only Google Books digitization content but a considerable amount of non-Google content as well.

Concurrently CRL is working with LOCKSS to assess the capabilities of the LOCKSS system for harvesting and archiving digitized primary source materials and related metadata. CRL is also gathering information about regional efforts to host licensed digital content locally. . . .

The general metrics to be used in the assessments will be the Trustworthy Repositories Audit and Certification checklist (TRAC). CRL has formed a panel of advisors who represent the various sectors of its membership, to further inform the assessment process. The Certification Advisory Panel will ensure that the certification process addresses the interests of the entire CRL community, and will include leaders in collection development, preservation, and information technology.

Draft Roadmap for Science Data Infrastructure

PARSE.Insight has released Draft Roadmap for Science Data Infrastructure.

Here's an excerpt from the announcement:

The draft roadmap provides an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for scientific data. The infrastructure components are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects. Thus the infrastructure components are intended to play a general, unifying role in scientific data. While developed in the context of a Europe-wide infrastructure, there would be great advantages for these types of infrastructure components to be available much more widely.

NEH Preservation and Access Research and Development Grants

The National Endowment for the Humanities is soliciting applications for Preservation and Access Research and Development grants, with a 7/30/09 deadline.

Here's an excerpt from the announcement:

Preservation and Access Research and Development grants support projects that address major challenges in preserving or providing access to humanities collections and resources. These challenges include the need to find better ways to preserve materials of critical importance to the nation's cultural heritage—from fragile artifacts and manuscripts to analog recordings and digital assets subject to technological obsolescence—and to develop advanced modes of searching, discovering, and using such materials. . . .

NEH especially encourages applications that address the following areas:

  • Digital Preservation: how to preserve digital humanities materials, including those for which no analog counterparts exist;
  • Recorded Sound and Moving Image Collections: how to preserve and increase access to the record of the twentieth century contained in these formats; and
  • Preventive Conservation: how to protect and slow the deterioration of humanities collections through the use of sustainable preservation strategies.

DOAJ and e-Depot to Preserve Open Access Journals

With support from the Swedish Library Association, the Directory of Open Access Journals and the e-Depot of the National Library of the Netherlands will preserve open access journals.

Here's an excerpt from the press release:

Long-term preservation of scholarly publications is of major importance for the research community. New formats of scholarly publications, new business models and new ways of dissemination are constantly being developed. To secure permanent access to scientific output for the future, focussed on the preservation of articles published in open access journals, a cooperation between Directory of Open Access Journals (DOAJ—www.doaj.org), developed and operated by Lund University Libraries and the e-Depot of the National Library of the Netherlands (www.kb.nl/e-Depot) has been initiated.

The composition of the DOAJ collection (currently 4000 journals) is characterized by a very large number of publishers (2.000+), each publishing a very small number of journals on different platforms, in different formats and in more than 50 different languages. Many of these publishers are—with a number of exceptions—fragile when it comes to financial, technical and administrative sustainability.

At present DOAJ and KB carry out a pilot project aimed at setting up a workflow for processing open access journals listed with DOAJ. In the pilot a limited number of open access journals will be subject to long term preservation. These activities will be scaled up shortly and long term archiving of the journals listed in the DOAJ at KB’s e-Depot will become an integral part of the service provided by the DOAJ.

DPE Briefing Paper: The Myths and Fallacies of Digital Photographs and Their Preservation

DigitalPreservationEurope has released The Myths and Fallacies of Digital Photographs and Their Preservation.

Here's an excerpt:

Digital photographs offer fascinating new possibilities and seem to be easier to store and preserve for the future than their analog counterpart, promising incredibly valuable, massive photo archives available at your fingertips. However, securely storing massive amounts of data, as well as ensuring that the file formats produced by professional cameras can be read in the near and long-term future, is a significant endeavour. This briefing paper reviews some of the core challenges in preserving digital photographs to make sure that the value of a digital photo archive remains and grows for the benefit of the photographer.

JISC Briefing Paper: Preservation of Web Resources

JISC has released Preservation of Web Resources.

Here's an excerpt:

There are institutional benefits to preserving web resources. Considerable time and money has been invested in the creation of digital outputs and content, and in their storage and maintenance. Although there are costs associated with launching a web preservation programme, it’s also money wasted if resources aren’t preserved. Institutions have responsibilities to: students and staff, who may make serious choices about their academic careers based on website information; and researchers and scholars, who may need to use the university’s resources in the future. Ensuring that the wider community has long-term access to research materials will be broadly beneficial.

There is also the matter of protecting institutions. Many risks are faced by organisations that choose to ignore web preservation. An institutional record may be required for the checking of strategic, legal, financial and contractual information, or simply for the day to day continued efficient running of the organisation. But there are external threats too. These include: data loss; loss of records and loss of resources; a failure to be information compliant (through not meeting Freedom of Information requests); risks of breaching copyright; and even risk of litigation from students or the public. Consider if a legal action were brought against an institution as a result of certain information that was exposed two years ago, and has since been taken down. Could the institution provide evidence, such as an audit trail, in court?

Repositories Support Project Podcasts Launched

The Repositories Support Project has launched a podcast series.

Here are titles of the initial podcasts:

  • Digital Preservation: Are Repositories Doing Enough for Preservation?
  • DRIVER: Promoting Digital Repositories across Europe
  • EPrints: Repository Software of the Future or of the Past?
  • Fedora: Optimum Repository Software or Overkill?

DCC Standards Watch Papers: Information Security Management: The ISO 27000 (ISO 27K) Series

The Digital Curation Centre has released Information Security Management: The ISO 27000 (ISO 27K) Series.

Here's an excerpt:

The flexibility of digital information can be regarded as a great strength. As software and hardware develop, data can be created, accessed, edited, manipulated and shared with increasing ease. The corollary is that data is vulnerable to unauthorised access, alteration or manipulation, which without checks can easily go undetected, and undermine its authoritative nature. Successful digital curation ensures that data is managed and protected so that its authority is maintained and retained throughout the curation lifecycle. To be authoritative data needs to remain authentic, reliable and useable, while retaining its integrity. These characteristics of data can be preserved through the implementation of an effective Information Security Management System (ISMS). . . .

The ISO/IEC 27000 is a series of standards which, when used together, specify the complete implementation of an ISMS. The series is still under development, with four of the planned standards currently published. Work is progressing on the completion of the remainder of standards ISO/IEC 27000 to ISO/IEC 27010. These cover the fundamental requirements of an ISMS, are applicable to any domain, and can be applied to any organisation regardless of size, structure or aim. ISO/IEC numbers after this have been reserved for sector specific implementation guidelines, most of which are still at the planning or pre-draft stage. The appendix summarises the development of the series to date.

Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives

The Council on Library and Information Resources has released Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives.

Here's an excerpt:

This report addresses the question of what libraries and archives are legally empowered to do to preserve and make accessible for research their holdings of unpublished pre-1972 sound recordings. The report's author, June M. Besek, is executive director of the Kernochan Center for Law, Media and the Arts at Columbia Law School.

Unpublished sound recordings are those created for private use, or even for broadcast, but that have not been distributed to the public in copies with the right holder's consent. Examples include tapes of live musical performances or of interviews conducted as part of field research or news gathering. Such recordings may find their way into library and archive collections through donations or purchase. Some may be the only record of a particular performance or event, and therefore may have considerable cultural and historical significance. The rights for use of unpublished recordings are distinct from those for use of commercial sound recordings, which are made with the authorization of rights holders and are intended for reproduction and sale to the public.

Using examples of specific types of sound recordings, the Besek study (1) describes the different bodies of law that protect pre-1972 sound recordings, (2) explains the difficulty in defining the precise contours of the law, and (3) provides guidance for libraries evaluating their activities with respect to unpublished pre-1972 sound recordings.