"Digital Preservation Outreach and Education (DPOE) Training Needs Assessment Survey: Executive Summary"

The Library of Congress Digital Preservation Outreach and Education (DPOE) initiative has released the "Digital Preservation Outreach and Education (DPOE) Training Needs Assessment Survey: Executive Summary."

Here's an excerpt from the announcement:

The survey was conducted in summer and fall 2010 by the Library’s Digital Preservation Outreach and Education initiative which seeks to foster outreach, education and collaboration nationwide to encourage organizations to preserve their digital content, regardless of staff or budget size or location.

The survey received 868 responses. Of the respondents, 40% were libraries, 34% were archives and 16% were museums. The rest consisted of state and local governments, corporations, nonprofit organizations, parks, and churches.

Among the survey’s major findings:

  • Just over half of the organizations who responded to the survey have less than 25 employees.
  • Only about one-third of respondents had full-time or part-time paid staff dedicated to digital preservation duties. One-half of respondents assigned digital preservation to various staff on an as-needed basis, one-fifth had no staff for this function, and one-tenth used volunteers (figures have been rounded off).
  • Among potential subject areas for digital preservation training, the most important area to respondents was technical training. Management planning, project management and strategic training all tied for second place.
  • The most preferred format for receiving training was small, in-person workshops. Proximity was significant—onsite training was the first choice, with training within a 100-mile radius the second choice.
  • A half-day to a full day was the most preferred length for training.
  • Digital content holdings for almost 95 percent of respondents consisted entirely of digitized versions of already-held collections (typically, paper-based materials), and about 5 percent of holdings were "born digital" content.

Some general observations can be gleaned from the survey. Most organizations only work on digital preservation when it is needed; few devote a full-time staff member to such duties. Most are digitizing paper collections rather than preserving "born digital" data. Short sessions of practical training are most needed; training should be provided on-site because most respondents are small organizations with limited training budgets.

| Digital Scholarship |

"Data Preservation in High Energy Physics"

David M. South has self-archived "Data Preservation in High Energy Physics" in arXiv.org.

Here's an excerpt:

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are in many cases unique. At the same time, HEP has no coherent strategy for data preservation and re-use, and many important and complex data sets are simply lost. In a period of a few years, several important and unique experimental programs will come to an end, including those at HERA, the b-factories and at the Tevatron. An inter-experimental study group on HEP data preservation and long-term analysis (DPHEP) was formed and a series of workshops were held to investigate this issue in a systematic way. The physics case for data preservation and the preservation models established by the group are presented, as well as a description of the transverse global projects and strategies already in place.

Digital Curation and Preservation Bibliography, Version 2

Version 2 of the Digital Curation and Preservation Bibliography is now available from Digital Scholarship as an XHTML website with live links to many included works. This selective bibliography includes over 500 articles, books, and technical reports that are useful in understanding digital curation and preservation. All included works are in English. It is available under a Creative Commons Attribution-Noncommercial 3.0 United States License.

Table of Contents

1 General Works about Digital Curation and Preservation
2 Digital Preservation Copyright Issues
3 Digital Preservation of Formats and Materials
3.1 General Works
3.2 Digital Data
3.3 Digital Media
3.4 E-journals
3.5 Other Digital Formats and Materials
3.6 World-Wide Web
4 Digital Preservation Metadata
5 Digital Preservation Models and Policies
6 Digital Preservation National and International Efforts
7 Digital Preservation Projects and Institutional Implementations
8 Digital Preservation Research
9 Digital Preservation Services
9.1 JSTOR
9.2 LOCKSS
9.3 Portico
10 Digital Preservation Strategies
11 Digital Repository Digital Preservation Issues
Appendix A. Related Bibliographies
Appendix B. About the Author

Cloud-Sourcing Research Collections: Managing Print in the Mass-Digitized Library Environment

OCLC has released Cloud-Sourcing Research Collections: Managing Print in the Mass-Digitized Library Environment.

Here's an excerpt from the press release:

The objective of the project was to examine the feasibility of outsourcing management of low-use print books held in academic libraries to shared service providers, including large-scale print and digital repositories. The study assessed the opportunity for library space saving and cost avoidance through the systematic and intentional outsourcing of local management operations for digitized books to shared service providers and progressive downsizing of local print collections in favor of negotiated access to the digitized corpus and regionally consolidated print inventory.

Some of the findings from the project that are detailed in the report include:

  • There is sufficient material in the mass-digitized library collection managed by the HathiTrust to duplicate a sizeable (and growing) portion of virtually any academic library in the United States, and there is adequate duplication between the shared digital repository and large-scale print storage facilities to enable a great number of academic libraries to reconsider their local print management operations.
  • The combination of a relatively small number of potential shared print providers, including the US Library of Congress, was sufficient to achieve more than 70% coverage of the digitized book collection, suggesting that shared service may not require a very large network of providers.
  • Substantial library space savings and cost avoidance could be achieved if academic institutions outsourced management of redundant low-use inventory to shared service providers.
  • Academic library directors can have a positive and profound impact on the future of academic print collections by adopting and implementing a deliberate strategy to build and sustain regional print service centers that can reduce the total cost of library preservation and access.

Digital Preservation: Major PRONOM Update

The US National Archives has announced that PRONOM has been significantly updated.

Here's an excerpt from the press release:

The National Archives has contributed to the update of a groundbreaking system—made available online today—that supports long-term preservation of and access to electronic records. The "new and improved" version of this "PRONOM" system was developed in partnership with the National Archives of the United Kingdom and the Georgia Tech Research Institute.

PRONOM is a web-based public technical registry of more than 750 different digital file formats that enables digital archivists, records managers and the public to precisely identify and confirm digital file formats. This identification is the first step to ensuring long-term electronic file preservation by enabling the identification of those file formats that are in danger of becoming obsolete. . . .

Technology from the National Archives contributed to a 25% increase in the number of entries in the PRONOM database, greatly enhancing PRONOM's range. "The National Archives is proud to share these technologies and contribute to PRONOM. Providing sustained access to valuable digital information is essential to preserving both our nation's records and valuable digital assets worldwide," said NCAST Director Kenneth Thibodeau. "The electronic records of the U.S. Government must be preserved for future generations, just as traditional paper and parchment records were preserved for us."
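The format identification described in the excerpt can be illustrated with a minimal sketch that matches the leading bytes of a file against known signatures. The signature table and the `identify_format` helper below are hypothetical and cover only three well-known formats; the actual PRONOM registry is far larger and its signature definitions are richer than a simple prefix check.

```python
# Hypothetical, tiny signature table for illustration only. It is not taken
# from PRONOM; it lists a few widely documented leading-byte signatures.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def identify_format(data: bytes) -> str:
    """Return a format label for the given bytes, or "unknown"."""
    for magic, name in SIGNATURES:
        # A format matches when the content begins with its signature bytes.
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n..."))
    print(identify_format(b"plain text with no signature"))
```

Identifying a format from its content rather than its file extension is what lets an archive flag files in at-risk formats even when they have been renamed.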

Memento Project Wins Digital Preservation Award 2010

The Memento Project has won the Digital Preservation Award 2010.

Here's an excerpt from the press release:

The Institute for Conservation and the Digital Preservation Coalition (DPC) are delighted to announce that the Memento Project led by Herbert Van De Sompel and colleagues of Los Alamos National Laboratory and Michael Nelson and colleagues of Old Dominion University, USA, has won the Digital Preservation Award 2010. . . .

"The ability to change and update pages is one of the web’s greatest advantages but it introduces a sort of structured instability which makes it hard to depend on web pages in the long term. For more than a decade services like the UK Web Archive and the Internet Archive have provided a stable but partial memory of a fragment of the web—but users had no way of linking between current content and earlier versions held by web archives."

"The Memento project resolves this by letting users set a time preference in their browser. The underlying technology then deploys basic, under-used features of the HTTP protocol to direct users to whichever archived copy of a website most closely matches their request." [Richard Ovenden, Chair of the Digital Preservation Coalition]
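The "time preference" mentioned above can be sketched as an HTTP request that carries the desired date in a header. The excerpt does not name the header; the sketch assumes `Accept-Datetime`, the name later standardized for Memento in RFC 7089. The TimeGate URL and the `build_memento_request` helper are hypothetical, and the request is only constructed, not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_memento_request(timegate_url: str, when: datetime) -> Request:
    """Build (but do not send) a request asking for a copy near `when`.

    `when` must be a timezone-aware datetime. The header name is an
    assumption based on RFC 7089; the excerpt does not specify it.
    """
    # HTTP dates are expressed in GMT, so convert to UTC before formatting.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_url, headers={"Accept-Datetime": http_date})


if __name__ == "__main__":
    # Hypothetical TimeGate endpoint on a reserved example domain.
    url = "http://timegate.example.org/timegate/http://www.example.com/"
    request = build_memento_request(url, datetime(2010, 12, 1, tzinfo=timezone.utc))
    print(request.header_items())
```

A server that supports this kind of negotiation would be expected to redirect the client to the archived copy closest to the requested date.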

"Selected Internet Resources on Digital Research Data Curation"

Brian Westra et al. have published "Selected Internet Resources on Digital Research Data Curation" in the latest issue of Issues in Science and Technology Librarianship.

Here's an excerpt:

In order to present a webliography of reasonable scope and length, the authors focused on resources applicable to the broader topic of digital research data curation as they relate to the natural sciences. Materials primarily or solely devoted to medical informatics, social sciences, and the humanities were not included. However, it should be noted that a number of the resources presented here are also applicable to research data curation in disciplines other than the sciences—for example, data repository software may be as useful to the social scientist as it is to a researcher in ecology. Additional scope specificity, when necessary, is provided in respective section listings below.

Guide for Research Libraries: The NSF Data Sharing Policy

ARL has released the Guide for Research Libraries: The NSF Data Sharing Policy.

Here's an excerpt:

The Association of Research Libraries has developed this guide primarily for librarians, to help them make sense of the new NSF requirement. It provides the context for, and an explanation of, the policy change and its ramifications for the grant-writing process. It investigates the role of libraries in data management planning, offering guidance in helping researchers meet the NSF requirement. In addition, the guide provides a resources page, where examples of responses from ARL libraries may be found, as well as guides for data management planning created by various NSF directorates and approaches to the topic created by international data archive and curation centers.

"Keeping Bits Safe: How Hard Can It Be?"

David S. H. Rosenthal has published "Keeping Bits Safe: How Hard Can It Be?" in ACM Queue.

Here's an excerpt:

There is an obvious question we should be asking: how many copies in storage systems with what reliability do we need to get a given probability that the data will be recovered when we need it? This may be an obvious question to ask, but it is a surprisingly hard question to answer. Let's look at the reasons why.

To be specific, let's suppose we need to keep a petabyte for a century and have a 50 percent chance that every bit will survive undamaged. This may sound like a lot of data and a long time, but there are already data collections bigger than a petabyte that are important to keep forever. The Internet Archive is already multiple petabytes.
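To get a feel for the scale of the petabyte-for-a-century example, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not spelled out in the excerpt: each bit decays independently with a fixed half-life, and a petabyte is taken as 10^15 bytes. The function name is hypothetical.

```python
import math


def required_bit_half_life(num_bits: float, years: float, survival_probability: float) -> float:
    """Half-life (in years) each bit needs so that all bits survive.

    Model assumption: a single bit survives t years with probability
    2 ** (-t / half_life), independently of every other bit. The chance
    that all bits survive is then 2 ** (-num_bits * years / half_life).
    """
    # Number of "halvings" of the overall survival probability we can afford.
    halvings = -math.log2(survival_probability)
    return num_bits * years / halvings


if __name__ == "__main__":
    petabyte_bits = 8 * 10**15  # assumes 1 PB = 10**15 bytes
    half_life = required_bit_half_life(petabyte_bits, years=100, survival_probability=0.5)
    print(f"Required bit half-life: {half_life:.1e} years")
```

Under these assumptions the required half-life per bit comes out at about 8 × 10^17 years, which suggests why the question is so hard to answer by direct measurement.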

E-Journal Archiving for UK HE Libraries: A Draft White Paper

JISC has released E-Journal Archiving for UK HE Libraries: A Draft White Paper for comment.

Here's an excerpt from the announcement:

Libraries are facing increasing space pressures and funding constraints. There is a growing interest in wherever possible moving more rapidly to e-only provision to help alleviate these pressures as well as to provide new electronic services to users. One of the most cited barriers and concerns both from library and faculty staff to moving to e-only has been sustaining and assuring long-term access to electronic content.

The aim of this white paper is to help universities and libraries implement policies and procedures in relation to e-journal archiving which can help support the move towards e-only provision of scholarly journals across the HE sector. The white paper is also contributing to complementary work JISC and other funders are commissioning on moving towards e-only provision of Journals.

Preserving Virtual Worlds II Gets $785,898 IMLS Grant

The Preserving Virtual Worlds II project has been awarded a $785,898 National Leadership Grant by the Institute of Museum and Library Services.

Here's an excerpt from the press release:

Preserving Virtual Worlds II: Methods for Evaluating and Preserving Significant Properties of Educational Games and Complex Interactive Environments (PVW2) is led by GSLIS Assistant Professor Jerome McDonough in partnership with the Rochester Institute of Technology, the University of Maryland, and Stanford University. PVW2 plans to help improve the capacity of libraries, museums, and archives to preserve computer games, video games, and interactive fiction.

The original Preserving Virtual Worlds project, funded by the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP), investigated what preservation issues arose with computer games and interactive fiction, and how existing metadata and packaging standards might be employed for the long-term preservation of these materials. PVW2 will focus on determining properties for a variety of educational games and game franchises in order to provide a set of best practices for preserving the materials through virtualization technologies and migration, as well as provide an analysis of how the preservation process is documented. PVW2 is a two-year project, to be conducted between October 2010 and September 2012.

Read more about it at "Preserving Virtual Worlds 2 Funded."

Report on Digital Preservation Practice and Plans amongst LIBER Members with Recommendations for Practical Action

EuropeanaTravel has released Report on Digital Preservation Practice and Plans amongst LIBER Members with Recommendations for Practical Action.

Here's an excerpt:

As part of Work package 1 concerned with planning digitisation, a survey was designed to collect information about digital preservation practice and plans amongst all LIBER member libraries to inform future activity of LIBER’s Working Group on Preservation and Digital Curation. The survey focused on the digital preservation of digitised material.

The major findings are as follows:

  • Some LIBER members have already been engaged in digitisation activities. The number of institutions with digitisation activities and the volume of digitised material are expected to grow further in the future.
  • There is a mismatch between the perceived high value of digitised material and the frequent lack of a written policy/procedure addressing the digital preservation of these collections. A number of the institutions without an according written policy stated they were working on developing and establishing one.
  • Storage and development of tools are areas where considerable investments are made by the majority of institutions surveyed. Those are also the fields where many of the institutions face difficulties.
  • Investments in staff assigned to digital preservation tasks are still inadequate at several institutions.
  • Some digital preservation practices and basic integrity measurements are more widespread than others. More than half of the institutions which responded already have an archive dedicated to digitised collections in place, use preservation metadata standards and format restrictions to support preservation, have processes of bitstream preservation implemented and provide staff training in the area of digital preservation. One can identify a clear tendency that emulation strategy is less commonly used than migration and other migration supporting practices.
  • Difficulties in establishing digital archives with a functioning preservation system, the frequent lack of institutional strategies concerning digitisation and digital preservation and funding problems seem to be amongst the most serious problems faced by LIBER members.

Preserving Virtual Worlds Final Report

Jerome McDonough et al. have self-archived Preserving Virtual Worlds Final Report in IDEALS.

Here's an excerpt from the announcement:

The report includes findings from the entire project team on issues relating to the preservation of video games and interactive fiction, including issues around library & archival collection development/management, bibliographic description, copyright & intellectual property, preservation strategies, metadata & packaging, and next steps for both the professional and research community with regards to these complex and important resources.

"Research Data: Who Will Share What, with Whom, When, and Why?"

Christine L. Borgman has self-archived "Research Data: Who Will Share What, with Whom, When, and Why?" in SelectedWorks.

Here's an excerpt:

The deluge of scientific research data has excited the general public, as well as the scientific community, with the possibilities for better understanding of scientific problems, from climate to culture. For data to be available, researchers must be willing and able to share them. The policies of governments, funding agencies, journals, and university tenure and promotion committees also influence how, when, and whether research data are shared. Data are complex objects. Their purposes and the methods by which they are produced vary widely across scientific fields, as do the criteria for sharing them. To address these challenges, it is necessary to examine the arguments for sharing data and how those arguments match the motivations and interests of the scientific community and the public. Four arguments are examined: to make the results of publicly funded data available to the public, to enable others to ask new questions of extant data, to advance the state of science, and to reproduce research. Libraries need to consider their role in the face of each of these arguments, and what expertise and systems they require for data curation.

"Keeping Research Data Safe Factsheet"

Charles Beagrie Limited has released the "Keeping Research Data Safe Factsheet."

Here's an excerpt:

This factsheet illustrates for institutions, researchers, and funders some of the key findings and recommendations from the JISC-funded Keeping Research Data Safe (KRDS1) and Keeping Research Data Safe 2 (KRDS2) projects.

Digital Preservation: PADI and Padiforum-L to Cease Operation

Established in 1997, the National Library of Australia's PADI subject gateway, which has over 3,000 resources on more than 60 topics, will be shut down at the end of this year.

Here's an excerpt from the announcement:

As is to be expected with any portal to Web-based documents, maintenance of web links becomes progressively more demanding over time. Websites are redesigned, migrated to new platforms, URLs are changed, projects and their websites cease, so-called persistent identifiers are not, and even when web documents or pages are archived in a web archive, questions arise as to which version of an archived page to link to (which date or even which archive as copies may be held in multiple web archives with different levels of completeness). The current structure of PADI requires the Library to commit around 0.5 of a fulltime staff member to locate, describe and enter links to new information sources and to maintain links to existing resources. Although originally conceived as a cooperative contribution model, increasingly the burden of adding material to PADI has fallen to the NLA as input from elsewhere has almost ceased.

The information-seeking and information-providing mechanisms of a community also change over time. After reviewing the gateway service the Library has concluded that the existing website, database and list no longer meet the current needs and that the Library’s resources are best invested elsewhere. While there may be more efficient ways of building a service like PADI today, using Web 2.0 tools, the Library is unable to make the investment in converting the existing service.

Reluctantly—because we still find PADI useful ourselves—we believe we cannot sustain PADI, and have decided to cease maintaining it.

A copy of the website has been archived in PANDORA, Australia’s Web Archive. The existing live website will remain available until the end of 2010; however no new resources have been added since the start of July 2010 and the existing links will not be actively managed. The archives of the padiforum-l list will continue to be available, however no new postings will be accepted from 30 September 2010.

The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age

The Council on Library and Information Resources and the Library of Congress have released The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age.

Here's an excerpt:

The publication of The State of Recorded Sound Preservation in the United States is a landmark achievement in the history of the archival preservation of audiovisual materials. The authors, Rob Bamberger and Sam Brylawski, have produced a study outlining the web of interlocking issues that now threaten the long-term survival of our sound recording history. This study tells us that major areas of America’s recorded sound heritage have already been destroyed or remain inaccessible to the public. It suggests that the lack of conformity between federal and state laws may adversely affect the long-term survival of pre-1972-era sound recordings in particular. And, it warns that the continued lack of national coordination among interested parties in the public and private sectors, in addressing the challenges in preservation, professional education and public access, may not yet be arresting permanent loss of irreplaceable sound recordings in all genres.

Long-Term Preservation Services: A Description of LTP Services in a Digital Library Environment

The British Library, Koninklijke Bibliotheek, Deutsche Nationalbibliothek, and Nasjonalbiblioteket have released Long-Term Preservation Services: A Description of LTP Services in a Digital Library Environment.

Here's an excerpt:

The main focus of this document is long-term preservation, but considered as an integral part of the overall digital library capability within a library and the corresponding workflows. We therefore seek information about long-term preservation within this broader context. Principles and implementation may vary greatly, and we are open to alternative approaches.

The document starts with an overview of all the types of services involved in LTP, and shows how different institutions might draw the boundaries between the LTP and a wider digital library capability. We then take the three core functions of an LTP system (to ingest, retain, and provide access to digital content) and show how the services work together to fulfill each function. Finally, we give a detailed description of each type of service.

Preserving Digital Public Television: Final Report

The NDIIPP-funded Preserving Digital Public Television project has released Preserving Digital Public Television: Final Report.

Here's an excerpt:

The goals of the PDPTV project were to:

  • Design and build a prototype preservation repository for born-digital public television content;
  • Develop a set of standards for metadata, file and encoding formats, and production workflow practices;
  • Recommend selection criteria for long-term retention;
  • Examine issues of long-term content accessibility and methods for sustaining digital preservation of public television materials, including IP concerns.
  • Introduce the importance of digital preservation to the public broadcasting community.

GPO Hires Its First Preservation Librarian

The U.S. Government Printing Office has hired its first preservation librarian, David Walls.

Here's an excerpt from the press release:

The U.S. Government Printing Office (GPO) is continuing its commitment to preserving the documents of our democracy by establishing the agency’s first preservation librarian position. GPO’s preservation librarian will be tasked with updating the Federal Depository Library Program (FDLP) collection management plan for the preservation of federal government documents. David Walls will serve as GPO’s first preservation librarian; he is a member of the American Library Association (ALA) and comes to the agency from Yale University where he worked as a preservation librarian for 12 years. While at Yale, Walls established practices for the digital conversion of library and special collection materials.

Digital preservation is an ongoing initiative for GPO. In 2009, the agency launched GPO’s Federal Digital System (FDsys), a content management system, preservation repository and advanced search engine that provides the public with permanent public access to federal government information. GPO is also a member of LOCKSS (Lots of Copies Keep Stuff Safe), a worldwide digital preservation alliance that collaborates with libraries and organizations on preservation initiatives.

Digital Preservation: PARSE.Insight Presentations and Report

PARSE.Insight (Permanent Access to the Records of Science in Europe) has released several presentations and reports.

Research Data Management: Incremental Project Releases Scoping Study And Implementation Plan

The Incremental Project has released the Scoping Study And Implementation Plan. The Cambridge University Library and Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow jointly run the project.

Here's a brief description of the project from its home page:

The project is a first step in improving and facilitating the day-to-day and long-term management of research data in higher education institutions (HEIs). We aim to increase researchers’ capacity and motivation for managing their digital research data, using existing tools and resources where possible and working to identify and fill gaps where additional tailored support and guidance is required. We aim to take a bottom-up approach, consulting a diverse set of researchers in each stage of the project.

Read more about it at "Scoping Study and Implementation Plan Released."

A Guide to Web Preservation

The JISC-funded PoWR project has released A Guide to Web Preservation.

Here's an excerpt:

The [JISC PoWR] project handbook was published in November 2008. Since then we have seen a growing awareness of the importance of digital preservation in general and in the preservation of web resources (including web pages, web-based applications and websites) in particular. The current economic crisis and the expected cuts across public sector organisations mean that a decade of growth and optimism is now over – instead we can expect to see reduced levels of funding available within the sector which will have an impact on the networked services which are used to support teaching and learning and research activities.

The need to manage the implications of these cutbacks is likely to result in a renewed interest in digital preservation. We are therefore pleased to be able to publish this new guide, based on the original PoWR: The Preservation of Web Resources Handbook, which provides practical advice to practitioners and policy makers responsible for the provision of web services.

A Future for Our Digital Memory (2): Strategic Agenda 2010-2013 for Long-Term Access to Digital Resources

The Netherlands Coalition for Digital Preservation has released A Future for Our Digital Memory (2): Strategic Agenda 2010-2013 for Long-Term Access to Digital Resources.

Here's an excerpt from the announcement:

The document proposes a dual-axis approach: on the one hand collaboration within domains and information chains must be strengthened. This process is to be facilitated by so-called network leaders: the National Archives for public records, the KB, National Library of the Netherlands, for scholarly publications, Data Archiving and Networked Services for research data and the Netherlands Institute for Sound and Vision for media. A fifth network leader for cultural heritage institutions such as museums, is yet to be announced. The NCDD itself is to facilitate cross-domain cooperation and knowledge exchanges.

See also A Future for Our Digital Memory (1): Permanent Access to Information in the Netherlands.

Presentations from Computer Forensics and Born-Digital Content in Cultural Heritage Collections Meeting

The Maryland Institute for Technology in the Humanities has released presentations from the Computer Forensics and Born-Digital Content in Cultural Heritage Collections meeting.

Here's an excerpt from the meeting's background document:

While such [computer forensics] activities may seem (happily) far removed from the concerns of the cultural heritage sector, the methods and tools developed by forensics experts represent a novel approach to key issues and challenges in the archives community. Libraries, special collections, and other repositories increasingly receive computer storage media (and sometimes entire computers) as part of their acquisition of "papers" from contemporary artists, writers, musicians, government officials, politicians, scholars, and other public figures. Cell phones, e-readers, and other data-rich devices will surely follow. The same forensics software that indexes a criminal suspect's hard drive allows the archivist to prepare a comprehensive manifest of the electronic files a donor has turned over for accession; the same hardware that allows the forensics specialist to create an algorithmically authenticated "image" of a file system allows the archivist to ensure the integrity of digital content once committed to an institutional repository; the same data recovery procedures that allow the specialist to discover, recover, and present as trial evidence an "erased" file may allow a scholar to reconstruct a lost or inadvertently deleted version of an electronic manuscript—and do so with enough confidence to stake reputation and career.
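The idea of an "algorithmically authenticated" image can be sketched with a cryptographic checksum: record a digest when content is ingested, then recompute it later to confirm nothing has changed. The excerpt does not name an algorithm or tool; SHA-256 via Python's standard `hashlib` module is an assumption here, and both helper names are hypothetical.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Reading in chunks keeps memory use flat even for large disk images.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, expected_hex: str) -> bool:
    """Check that the stream's current digest matches the recorded one."""
    return sha256_of_stream(stream) == expected_hex


if __name__ == "__main__":
    # Stand-in for a disk image; a real workflow would open a file in "rb" mode.
    image = b"contents of a donated disk image"
    recorded = sha256_of_stream(io.BytesIO(image))
    print(verify_fixity(io.BytesIO(image), recorded))
    print(verify_fixity(io.BytesIO(image + b"!"), recorded))
```

Storing the recorded digest alongside the accession record gives an archivist a simple way to re-check integrity after each copy or migration.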