Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use

The Office of Digital Humanities in the National Endowment for the Humanities has released the final version of Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use.

Here's an excerpt from the announcement:

This project is about developing archival tools and best practices for preserving born-digital documents produced by contemporary authors. Traditionally, humanists have found great scholarly value in studying the papers, correspondence, and first drafts of authors, politicians, and other historical figures. In this white paper, the project director makes note that contemporary figures compose almost all of their materials on a computer. What challenges will this present to humanists, archivists, and librarians in the future? This very readable paper explores many of these issues with specific case studies involving a number of leading libraries and archives.

Reference Model for an Open Archival Information System (OAIS) Draft for Review

A near-final draft of the Reference Model for an Open Archival Information System (OAIS) has been made available for error-checking review.

Here's an excerpt:

This document is a technical Recommendation for use in developing a broader consensus on what is required for an archive to provide permanent, or indefinite long-term, preservation of digital information.

This Recommendation establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It allows existing and future archives to be more meaningfully compared and contrasted. It provides a basis for further standardization within an archival context and it should promote greater vendor awareness of, and support of, archival requirements.

DigitalKoans

JISC Project: Lifespan Initiative for the Research and Data Archive Repository

JISC's Lifespan Initiative for the Research and Data Archive Repository project started on 4/1/09.

Here's an excerpt from the project Web page:

The Lifespan Collection (www.lifespancollection.org.uk) represents an existing and unique research data set, which includes around 3,400 hours of audio-taped interviews, scorings and quantitative computerised data, capturing the lifetime experience of over 500 individuals. The outcomes of this project will be presented in terms of both a report on the processes and best-practice solutions for preserving and digitalising the data, including the creation of processes of submission of, and accessibility to, current and future critical datasets that ensure compliance with data security, copyright legislation, licensing, and associated audit functions. One or more detailed case studies will be produced that will not only inform the future development of this project but will act as illustrative examples for use by other similar start-up projects. This will lay the ground work for an exemplar implementation of the tools and solutions already delivered by JISC and other institutions.

Presentations from SCARP Workshop: Building and Curating Online Video Corpora

Presentations from the SCARP Workshop: Building and Curating Online Video Corpora are now available.

Here's an excerpt from the announcement:

This was a meeting of researchers and stakeholders in data service provision to discuss curation issues raised in our SCARP case study on the roles and re-usability of video data in social studies of interaction. This event aimed to raise mutual awareness of research communities' practices and needs for archiving, sharing and re-using digital video data; and identify how local and national research data services may contribute to the infrastructure for video data curation.

Infrastructure Planning and Data Curation: A Comparative Study of International Approaches to Enabling the Sharing of Research Data

JISC has released Infrastructure Planning and Data Curation: A Comparative Study of International Approaches to Enabling the Sharing of Research Data.

Here's an excerpt from the announcement:

The current methods of storing research data are as diverse as the disciplines that generate them and are necessarily driven by the myriad ways in which researchers need to subsequently access and exploit the information they contain. Institutional repositories, data centres and all other methods of storing data have to exist within an infrastructure that enables researchers to access and exploit the data, and variant models for this infrastructure can be conceptualised. Discussion of effective infrastructures for curating data is taking place at all levels, wherever research is reliant on the long-term stewardship of digital material. JISC has commissioned this study to survey the different national agendas that are addressing variant infrastructure models, to inform developments within the UK and for facilitating an internationally integrated approach to data curation.

The study of data sharing initiatives in the OECD countries confirmed the traditional perception that the policy instruments are clustered more in the upper end of the stakeholder taxonomy (i.e. at the level of national and research funding organisations), whereas the services and practical tools are being developed by organisations at the lower end of the taxonomy. Despite the differences that exist between countries in terms of the models used for research funding, as well as the levels at which decisions are taken, there is agreement on the expected strata of responsibility for applying the instruments of data sharing. This supports the structure of stakeholder taxonomy used in the study.

The Internet Archive’s Wayback Machine Rebooted

The Internet Archive's Wayback Machine is now running on a Sun Modular Datacenter.

Here's an excerpt from the "Wayback Machine Comes to Life in New Home":

The Wayback Machine is a 150 billion page web archive with a front end to serve it through the archive.org website.

Today the new machine came to life, so if you are using the service, you are using a 20' by 8' by 8' "machine" that sits in Santa Clara, courtesy of Sun Microsystems. It serves about 500 queries per second from the approximately 4.5 Petabytes (4.5 million gigabytes) of archived web data. We think of the cluster of computers and the Modular Datacenter as a single machine because it acts like one and looks like one. If that is true, then it might be one of the largest current computers.

Read more about Sun and the Internet Archive at "The Internet in a Box."
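
The figures quoted in the excerpt can be cross-checked with a little arithmetic. Here's a rough sketch in Python. It assumes decimal units (1 petabyte = 10^15 bytes), a query rate that holds steady around the clock, and an archive made up entirely of the 150 billion pages; none of these assumptions comes from the post, and the variable names are only illustrative.

```python
# Back-of-the-envelope check of the Wayback Machine figures quoted above.
# Assumptions (not from the post): decimal units, a constant query rate,
# and the whole archive consisting of the 150 billion pages.

BYTES_PER_GIGABYTE = 10**9
BYTES_PER_PETABYTE = 10**15

archive_bytes = 45 * BYTES_PER_PETABYTE // 10   # 4.5 petabytes
archived_pages = 150 * 10**9                     # 150 billion pages
queries_per_second = 500

archive_gigabytes = archive_bytes // BYTES_PER_GIGABYTE
queries_per_day = queries_per_second * 24 * 60 * 60
average_bytes_per_page = archive_bytes // archived_pages

print(f"{archive_gigabytes:,} gigabytes")
print(f"{queries_per_day:,} queries per day")
print(f"{average_bytes_per_page:,} bytes per page on average")
```

Under those assumptions, 4.5 petabytes does equal 4.5 million gigabytes, the service would answer about 43.2 million queries a day, and the average page would take up roughly 30,000 bytes. The per-page figure is only what the quoted totals imply, not a number reported by the Internet Archive.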

Historians’ Work Disrupted When Paper of Record Digital Archive Vanishes after Google Purchase

After Google purchased the Paper of Record digital archive, it brought the site down, upsetting historians who relied on the collection of older newspapers. Although the site will be temporarily restored with Google's permission, the incident raises issues about the permanence and reliability of scholarly digital archives.

Read more about it at "Digital Archives That Disappear" and "'Paper of Record' Disappears, Leaving Historians in the Lurch."

Talis Interview with Peter Brantley, Director of the Internet Archive

Richard Wallis has posted a digital audio interview with Peter Brantley, the Internet Archive's new Director, on Panlibus.

Here's an excerpt from the post:

In this conversation we look back over the last couple of years at the DLF [Digital Library Federation] and then forward in to his new challenge and opportunity at the Internet Archive.

We go on to discuss his thoughts and plans to make it easy to identify books and information and their locations in a way that is currently not possible with the processes and protocols we use today.

Center for Research Libraries to Assess and Certify Portico and HathiTrust

The Center for Research Libraries will conduct detailed assessments of Portico and HathiTrust with the objective of certifying them as trustworthy digital repositories.

Here's an excerpt from the press release:

Portico has agreed to cooperate with the CRL audit, with the goal of certification as a trustworthy digital repository. HathiTrust has asked CRL to assess its digital repository, which includes not only Google Books digitization content but a considerable amount of non-Google content as well.

Concurrently CRL is working with LOCKSS to assess the capabilities of the LOCKSS system for harvesting and archiving digitized primary source materials and related metadata. CRL is also gathering information about regional efforts to host licensed digital content locally. . . .

The general metrics to be used in the assessments will be the Trustworthy Repositories Audit and Certification checklist (TRAC). CRL has formed a panel of advisors who represent the various sectors of its membership, to further inform the assessment process. The Certification Advisory Panel will ensure that the certification process addresses the interests of the entire CRL community, and will include leaders in collection development, preservation, and information technology.

Draft Roadmap for Science Data Infrastructure

PARSE.Insight has released Draft Roadmap for Science Data Infrastructure.

Here's an excerpt from the announcement:

The draft roadmap provides an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for scientific data. The infrastructure components are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects. Thus the infrastructure components are intended to play a general, unifying role in scientific data. While developed in the context of a Europe-wide infrastructure, there would be great advantages for these types of infrastructure components to be available much more widely.

NEH Preservation and Access Research and Development Grants

The National Endowment for the Humanities is soliciting applications for Preservation and Access Research and Development grants, with a 7/30/09 deadline.

Here's an excerpt from the announcement:

Preservation and Access Research and Development grants support projects that address major challenges in preserving or providing access to humanities collections and resources. These challenges include the need to find better ways to preserve materials of critical importance to the nation's cultural heritage—from fragile artifacts and manuscripts to analog recordings and digital assets subject to technological obsolescence—and to develop advanced modes of searching, discovering, and using such materials. . . .

NEH especially encourages applications that address the following areas:

  • Digital Preservation: how to preserve digital humanities materials, including those for which no analog counterparts exist;
  • Recorded Sound and Moving Image Collections: how to preserve and increase access to the record of the twentieth century contained in these formats; and
  • Preventive Conservation: how to protect and slow the deterioration of humanities collections through the use of sustainable preservation strategies.

DOAJ and e-Depot to Preserve Open Access Journals

With support from the Swedish Library Association, the Directory of Open Access Journals and the e-Depot of the National Library of the Netherlands will preserve open access journals.

Here's an excerpt from the press release:

Long-term preservation of scholarly publications is of major importance for the research community. New formats of scholarly publications, new business models and new ways of dissemination are constantly being developed. To secure permanent access to scientific output for the future, focussed on the preservation of articles published in open access journals, a cooperation between Directory of Open Access Journals (DOAJ—www.doaj.org), developed and operated by Lund University Libraries and the e-Depot of the National Library of the Netherlands (www.kb.nl/e-Depot) has been initiated.

The composition of the DOAJ collection (currently 4000 journals) is characterized by a very large number of publishers (2.000+), each publishing a very small number of journals on different platforms, in different formats and in more than 50 different languages. Many of these publishers are—with a number of exceptions—fragile when it comes to financial, technical and administrative sustainability.

At present DOAJ and KB carry out a pilot project aimed at setting up a workflow for processing open access journals listed with DOAJ. In the pilot a limited number of open access journals will be subject to long term preservation. These activities will be scaled up shortly and long term archiving of the journals listed in the DOAJ at KB’s e-Depot will become an integral part of the service provided by the DOAJ.

DPE Briefing Paper: The Myths and Fallacies of Digital Photographs and Their Preservation

DigitalPreservationEurope has released The Myths and Fallacies of Digital Photographs and Their Preservation.

Here's an excerpt:

Digital photographs offer fascinating new possibilities and seem to be easier to store and preserve for the future than their analog counterpart, promising incredibly valuable, massive photo archives available at your fingertips. However, securely storing massive amounts of data, as well as ensuring that the file formats produced by professional cameras can be read in the near and long-term future, is a significant endeavour. This briefing paper reviews some of the core challenges in preserving digital photographs to make sure that the value of a digital photo archive remains and grows for the benefit of the photographer.

JISC Briefing Paper: Preservation of Web Resources

JISC has released Preservation of Web Resources.

Here's an excerpt:

There are institutional benefits to preserving web resources. Considerable time and money has been invested in the creation of digital outputs and content, and in their storage and maintenance. Although there are costs associated with launching a web preservation programme, it’s also money wasted if resources aren’t preserved. Institutions have responsibilities to: students and staff, who may make serious choices about their academic careers based on website information; and researchers and scholars, who may need to use the university’s resources in the future. Ensuring that the wider community has long-term access to research materials will be broadly beneficial.

There is also the matter of protecting institutions. Many risks are faced by organisations that choose to ignore web preservation. An institutional record may be required for the checking of strategic, legal, financial and contractual information, or simply for the day to day continued efficient running of the organisation. But there are external threats too. These include: data loss; loss of records and loss of resources; a failure to be information compliant (through not meeting Freedom of Information requests); risks of breaching copyright; and even risk of litigation from students or the public. Consider if a legal action were brought against an institution as a result of certain information that was exposed two years ago, and has since been taken down. Could the institution provide evidence, such as an audit trail, in court?

Repositories Support Project Podcasts Launched

The Repositories Support Project has launched a podcast series.

Here are titles of the initial podcasts:

  • Digital Preservation: Are Repositories Doing Enough for Preservation?
  • DRIVER: Promoting Digital Repositories across Europe
  • EPrints: Repository Software of the Future or of the Past?
  • Fedora: Optimum Repository Software or Overkill?

DCC Standards Watch Papers: Information Security Management: The ISO 27000 (ISO 27K) Series

The Digital Curation Centre has released Information Security Management: The ISO 27000 (ISO 27K) Series.

Here's an excerpt:

The flexibility of digital information can be regarded as a great strength. As software and hardware develop, data can be created, accessed, edited, manipulated and shared with increasing ease. The corollary is that data is vulnerable to unauthorised access, alteration or manipulation, which without checks can easily go undetected, and undermine its authoritative nature. Successful digital curation ensures that data is managed and protected so that its authority is maintained and retained throughout the curation lifecycle. To be authoritative data needs to remain authentic, reliable and useable, while retaining its integrity. These characteristics of data can be preserved through the implementation of an effective Information Security Management System (ISMS). . . .

The ISO/IEC 27000 is a series of standards which, when used together, specify the complete implementation of an ISMS. The series is still under development, with four of the planned standards currently published. Work is progressing on the completion of the remainder of standards ISO/IEC 27000 to ISO/IEC 27010. These cover the fundamental requirements of an ISMS, are applicable to any domain, and can be applied to any organisation regardless of size, structure or aim. ISO/IEC numbers after this have been reserved for sector specific implementation guidelines, most of which are still at the planning or pre-draft stage. The appendix summarises the development of the series to date.
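
One common way to make the "undetected alteration" mentioned in the excerpt detectable is a fixity check: record a cryptographic digest when an item is deposited, then recompute and compare it later. Here's a minimal sketch using SHA-256 from Python's standard hashlib module. It is an illustration only: the excerpt does not say that the ISO/IEC 27000 series prescribes this technique, and the function names record_digest and is_unaltered, like the sample data, are hypothetical.

```python
import hashlib


def record_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at deposit."""
    return record_digest(data) == recorded_digest


# Hypothetical deposit: the digest is recorded once and kept with the item's metadata.
original = b"Deposited dataset, version 1"
stored_digest = record_digest(original)

# Later checks: an untouched copy matches, an altered copy does not.
tampered = b"Deposited dataset, version 2"
print(is_unaltered(original, stored_digest))
print(is_unaltered(tampered, stored_digest))
```

A check like this only detects a change; it does not prevent one, and it is only as trustworthy as the place where the recorded digest is kept.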

Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives

The Council on Library and Information Resources has released Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives.

Here's an excerpt:

This report addresses the question of what libraries and archives are legally empowered to do to preserve and make accessible for research their holdings of unpublished pre-1972 sound recordings. The report's author, June M. Besek, is executive director of the Kernochan Center for Law, Media and the Arts at Columbia Law School.

Unpublished sound recordings are those created for private use, or even for broadcast, but that have not been distributed to the public in copies with the right holder's consent. Examples include tapes of live musical performances or of interviews conducted as part of field research or news gathering. Such recordings may find their way into library and archive collections through donations or purchase. Some may be the only record of a particular performance or event, and therefore may have considerable cultural and historical significance. The rights for use of unpublished recordings are distinct from those for use of commercial sound recordings, which are made with the authorization of rights holders and are intended for reproduction and sale to the public.

Using examples of specific types of sound recordings, the Besek study (1) describes the different bodies of law that protect pre-1972 sound recordings, (2) explains the difficulty in defining the precise contours of the law, and (3) provides guidance for libraries evaluating their activities with respect to unpublished pre-1972 sound recordings.

DigitalPreservationEurope Briefing Paper on Database Preservation

DigitalPreservationEurope has released a briefing paper on Database Preservation.

Here's an excerpt:

Information systems for most organizations are currently supported by databases. Preservation of these databases has to address problems including defining what is to be preserved, the creation and long-term evolution of the preserved objects, organizational support for preservation actions, and technologies that will keep the preserved objects accessible and trustworthy. Some of the issues in database preservation have already been addressed in electronic record preservation, but others result from the specific nature of databases.

How Long Should Institutional Repository Items Be Preserved?: Chris Rusbridge Discusses Results of Informal Surveys

In "Repository Preservation Revisited," Chris Rusbridge, Director of the Digital Curation Centre, discusses the findings of some informal surveys he conducted about how long institutional repository items should be preserved.

Here's an excerpt:

Note, I would not draw any conclusions from the actual numerical votes on their own, but perhaps we can from the values within each group. However, ever hasty if not foolhardy, here are my own tentative interpretations:

  • First, even "experts" are alarmed at the potential implications of the term "OAIS."
  • Second, repository managers don’t believe that keeping resources accessible and/or usable for 10 years (in the context of the types of material they currently manage in repositories) will give them major problems.
  • Third, repository managers don't identify "accessibility and/or usability of its contents for the long term" as implying the mechanisms of an OAIS (this is perhaps rather a stretch given my second conclusion).

DPE Digital Preservation Video Training Course

DigitalPreservationEurope has released its Digital Preservation Video Training Course, a series of digital videos recorded at the DPE/Planets/CASPAR/nestor Joint Training Event: Starting out: Preserving Digital Objects-Principles and Practice in October 2008.

Here's an excerpt from the course page:

The training introduces participants to a number of key digital preservation principles. Participants will leave with:

  • an awareness and understanding of key digital preservation issues and challenges,
  • an appreciation of the range of roles and responsibilities involved with digital preservation activity,
  • knowledge about the reference model for Open Archival Information System (OAIS),
  • a familiarity with file formats currently considered beneficial for preservation,
  • a developed understanding of the role and use of metadata and representation information,
  • knowledge of the preservation planning process and its benefits to overall digital preservation strategies,
  • an insight into the concepts of trust and trustworthiness in the context of digital preservation,
  • a working knowledge of the issues surrounding audit methodologies and self-certification of digital repositories.

iPRES 2008: Proceedings of The Fifth International Conference on Preservation of Digital Objects

The British Library has released iPRES 2008: Proceedings of The Fifth International Conference on Preservation of Digital Objects: Joined Up and Working: Tools and Methods for Digital Preservation, The British Library, London. 29–30 September.

Here's an excerpt:

This volume brings together the proceedings of iPRES 2008, the Fifth International Conference on Digital Preservation, held at The British Library on 29-30 September, 2008. From its beginnings five years ago, iPRES has retained its strong international flavour. This year, it brings together over 250 participants from 33 countries and four continents. iPRES has become a major international forum for the exchange of ideas and practice in Digital Preservation. . . .

The iPRES 2008 conference theme and the papers gathered together here represent a major shift in the state-of-the-art. For the first time, this progress enabled the Programme Committee to establish two distinct tracks. The practitioner track is designed for those with an interest in practically preserving digital content within their organisation. The technical track is designed for those with an interest in underpinning concepts and digital preservation technology. Readers will find valuable insights to draw from in both areas.