"Selected Internet Resources on Digital Research Data Curation"

Brian Westra et al. have published "Selected Internet Resources on Digital Research Data Curation" in the latest issue of Issues in Science and Technology Librarianship.

Here's an excerpt:

In order to present a webliography of reasonable scope and length, the authors focused on resources applicable to the broader topic of digital research data curation as they relate to the natural sciences. Materials primarily or solely devoted to medical informatics, social sciences, and the humanities were not included. However, it should be noted that a number of the resources presented here are also applicable to research data curation in disciplines other than the sciences—for example, data repository software may be as useful to the social scientist as it is to a researcher in ecology. Additional scope specificity, when necessary, is provided in respective section listings below.

| Digital Scholarship |

Guide for Research Libraries: The NSF Data Sharing Policy

ARL has released the Guide for Research Libraries: The NSF Data Sharing Policy.

Here's an excerpt:

The Association for Research Libraries has developed this guide primarily for librarians, to help them make sense of the new NSF requirement. It provides the context for, and an explanation of, the policy change and its ramifications for the grant-writing process. It investigates the role of libraries in data management planning, offering guidance in helping researchers meet the NSF requirement. In addition, the guide provides a resources page, where examples of responses from ARL libraries may be found, as well as guides for data management planning created by various NSF directorates and approaches to the topic created by international data archive and curation centers.


"Keeping Bits Safe: How Hard Can It Be?"

David S. H. Rosenthal has published "Keeping Bits Safe: How Hard Can It Be?" in ACM Queue.

Here's an excerpt:

There is an obvious question we should be asking: how many copies in storage systems with what reliability do we need to get a given probability that the data will be recovered when we need it? This may be an obvious question to ask, but it is a surprisingly hard question to answer. Let's look at the reasons why.

To be specific, let's suppose we need to keep a petabyte for a century and have a 50 percent chance that every bit will survive undamaged. This may sound like a lot of data and a long time, but there are already data collections bigger than a petabyte that are important to keep forever. The Internet Archive is already multiple petabytes.

E-Journal Archiving for UK HE Libraries: A Draft White Paper

JISC has released E-Journal Archiving for UK HE Libraries: A Draft White Paper for comment.

Here's an excerpt from the announcement:

Libraries are facing increasing space pressures and funding constraints. There is a growing interest in moving, wherever possible, more rapidly to e-only provision to help alleviate these pressures as well as to provide new electronic services to users. One of the most frequently cited barriers to moving to e-only, for both library and faculty staff, has been sustaining and assuring long-term access to electronic content.

The aim of this white paper is to help universities and libraries implement policies and procedures in relation to e-journal archiving which can help support the move towards e-only provision of scholarly journals across the HE sector. The white paper is also contributing to complementary work JISC and other funders are commissioning on moving towards e-only provision of journals.

Preserving Virtual Worlds II Gets $785,898 IMLS Grant

The Preserving Virtual Worlds II project has been awarded a $785,898 National Leadership Grant by the Institute of Museum and Library Services.

Here's an excerpt from the press release:

Preserving Virtual Worlds II: Methods for Evaluating and Preserving Significant Properties of Educational Games and Complex Interactive Environments (PVW2) is led by GSLIS Assistant Professor Jerome McDonough in partnership with the Rochester Institute of Technology, the University of Maryland, and Stanford University. PVW2 plans to help improve the capacity of libraries, museums, and archives to preserve computer games, video games, and interactive fiction.

The original Preserving Virtual Worlds project, funded by the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP), investigated what preservation issues arose with computer games and interactive fiction, and how existing metadata and packaging standards might be employed for the long-term preservation of these materials. PVW2 will focus on determining properties for a variety of educational games and game franchises in order to provide a set of best practices for preserving the materials through virtualization technologies and migration, as well as provide an analysis of how the preservation process is documented. PVW2 is a two-year project, to be conducted between October 2010 and September 2012.

Read more about it at "Preserving Virtual Worlds 2 Funded."

Report on Digital Preservation Practice and Plans amongst LIBER Members with Recommendations for Practical Action

EuropeanaTravel has released Report on Digital Preservation Practice and Plans amongst LIBER Members with Recommendations for Practical Action.

Here's an excerpt:

As part of Work package 1 concerned with planning digitisation, a survey was designed to collect information about digital preservation practice and plans amongst all LIBER member libraries to inform future activity of LIBER’s Working Group on Preservation and Digital Curation. The survey focused on the digital preservation of digitised material.

The major findings are as follows:

  • Some LIBER members have already been engaged in digitisation activities. The number of institutions with digitisation activities and the volume of digitised material are expected to grow further in the future.
  • There is a mismatch between the perceived high value of digitised material and the frequent lack of a written policy/procedure addressing the digital preservation of these collections. A number of the institutions without such a written policy stated they were working on developing and establishing one.
  • Storage and development of tools are areas where considerable investments are made by the majority of institutions surveyed. These are also the fields where many of the institutions face difficulties.
  • Investments in staff assigned to digital preservation tasks are still inadequate at several institutions.
  • Some digital preservation practices and basic integrity measures are more widespread than others. More than half of the institutions which responded already have an archive dedicated to digitised collections in place, use preservation metadata standards and format restrictions to support preservation, have processes of bitstream preservation implemented and provide staff training in the area of digital preservation. There is a clear tendency for emulation to be used less commonly than migration and related migration-supporting practices.
  • Difficulties in establishing digital archives with a functioning preservation system, the frequent lack of institutional strategies concerning digitisation and digital preservation and funding problems seem to be amongst the most serious problems faced by LIBER members.

Preserving Virtual Worlds Final Report

Jerome McDonough et al. have self-archived Preserving Virtual Worlds Final Report in IDEALS.

Here's an excerpt from the announcement:

The report includes findings from the entire project team on issues relating to the preservation of video games and interactive fiction, including issues around library & archival collection development/management, bibliographic description, copyright & intellectual property, preservation strategies, metadata & packaging, and next steps for both the professional and research community with regards to these complex and important resources.

"Research Data: Who Will Share What, with Whom, When, and Why?"

Christine L. Borgman has self-archived "Research Data: Who Will Share What, with Whom, When, and Why?" in SelectedWorks.

Here's an excerpt:

The deluge of scientific research data has excited the general public, as well as the scientific community, with the possibilities for better understanding of scientific problems, from climate to culture. For data to be available, researchers must be willing and able to share them. The policies of governments, funding agencies, journals, and university tenure and promotion committees also influence how, when, and whether research data are shared. Data are complex objects. Their purposes and the methods by which they are produced vary widely across scientific fields, as do the criteria for sharing them. To address these challenges, it is necessary to examine the arguments for sharing data and how those arguments match the motivations and interests of the scientific community and the public. Four arguments are examined: to make the results of publicly funded data available to the public, to enable others to ask new questions of extant data, to advance the state of science, and to reproduce research. Libraries need to consider their role in the face of each of these arguments, and what expertise and systems they require for data curation.

"Keeping Research Data Safe Factsheet"

Charles Beagrie Limited has released the "Keeping Research Data Safe Factsheet."

Here's an excerpt:

This factsheet illustrates for institutions, researchers, and funders some of the key findings and recommendations from the JISC-funded Keeping Research Data Safe (KRDS1) and Keeping Research Data Safe 2 (KRDS2) projects.

Digital Preservation: PADI and Padiforum-L to Cease Operation

Established in 1997, the National Library of Australia's PADI subject gateway, which has over 3,000 resources on more than 60 topics, will be shut down at the end of this year.

Here's an excerpt from the announcement:

As is to be expected with any portal to Web-based documents, maintenance of web links becomes progressively more demanding over time. Websites are redesigned or migrated to new platforms, URLs change, projects and their websites cease, so-called persistent identifiers turn out not to be persistent, and even when web documents or pages are archived in a web archive, questions arise as to which version of an archived page to link to (which date, or even which archive, as copies may be held in multiple web archives with different levels of completeness). The current structure of PADI requires the Library to commit around 0.5 of a full-time staff member to locating, describing, and entering links to new information sources and to maintaining links to existing resources. Although originally conceived as a cooperative contribution model, the burden of adding material to PADI has increasingly fallen to the NLA as input from elsewhere has almost ceased.

The information-seeking and information-providing mechanisms of a community also change over time. After reviewing the gateway service the Library has concluded that the existing website, database and list no longer meet the current needs and that the Library’s resources are best invested elsewhere. While there may be more efficient ways of building a service like PADI today, using Web 2.0 tools, the Library is unable to make the investment in converting the existing service.

Reluctantly—because we still find PADI useful ourselves—we believe we cannot sustain PADI, and have decided to cease maintaining it.

A copy of the website has been archived in PANDORA, Australia’s Web Archive. The existing live website will remain available until the end of 2010; however, no new resources have been added since the start of July 2010, and the existing links will not be actively managed. The archives of the padiforum-l list will continue to be available; however, no new postings will be accepted after 30 September 2010.
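The link-maintenance burden PADI describes is exactly what automated link checkers try to reduce. A minimal sketch in Python (illustrative only, not the NLA's actual tooling):

```python
import urllib.error
import urllib.request

def check_links(urls, timeout=10):
    """Return {url: status}, where status is an HTTP status code or an
    error string. Run periodically over a gateway's records, this flags
    dead or moved links as candidates for repair."""
    results = {}
    for url in urls:
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as e:
            results[url] = e.code         # e.g. 404: page gone
        except (urllib.error.URLError, ValueError) as e:
            results[url] = f"error: {e}"  # DNS failure, malformed URL, ...
    return results
```

Even a checker like this only finds breakage; as the announcement notes, deciding what a broken link should now point to (a redesigned page, an archived copy, and which archived copy) still takes human judgment.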

The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age

The Council on Library and Information Resources and the Library of Congress have released The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age.

Here's an excerpt:

The publication of The State of Recorded Sound Preservation in the United States is a landmark achievement in the history of the archival preservation of audiovisual materials. The authors, Rob Bamberger and Sam Brylawski, have produced a study outlining the web of interlocking issues that now threaten the long-term survival of our sound recording history. This study tells us that major areas of America’s recorded sound heritage have already been destroyed or remain inaccessible to the public. It suggests that the lack of conformity between federal and state laws may adversely affect the long-term survival of pre-1972-era sound recordings in particular. And, it warns that the continued lack of national coordination among interested parties in the public and private sectors, in addressing the challenges in preservation, professional education and public access, may not yet be arresting permanent loss of irreplaceable sound recordings in all genres.

Long-Term Preservation Services: A Description of LTP Services in a Digital Library Environment

The British Library, Koninklijke Bibliotheek, Deutsche Nationalbibliothek, and Nasjonalbiblioteket have released Long-Term Preservation Services: A Description of LTP Services in a Digital Library Environment.

Here's an excerpt:

The main focus of this document is long-term preservation, but considered as an integral part of the overall digital library capability within a library and the corresponding workflows. We therefore seek information about long-term preservation within this broader context. Principles and implementation may vary greatly, and we are open to alternative approaches.

The document starts with an overview of all the types of services involved in LTP, and shows how different institutions might draw the boundaries between the LTP and a wider digital library capability. We then take the three core functions of an LTP system (to ingest, retain, and provide access to digital content) and show how the services work together to fulfill each function. Finally, we give a detailed description of each type of service.

Preserving Digital Public Television: Final Report

The NDIIPP-funded Preserving Digital Public Television project has released Preserving Digital Public Television: Final Report.

Here's an excerpt:

The goals of the PDPTV project were to:

  • Design and build a prototype preservation repository for born-digital public television content;
  • Develop a set of standards for metadata, file and encoding formats, and production workflow practices;
  • Recommend selection criteria for long-term retention;
  • Examine issues of long-term content accessibility and methods for sustaining digital preservation of public television materials, including IP concerns;
  • Introduce the importance of digital preservation to the public broadcasting community.

GPO Hires Its First Preservation Librarian

The U.S. Government Printing Office has hired its first preservation librarian, David Walls.

Here's an excerpt from the press release:

The U.S. Government Printing Office (GPO) is continuing its commitment to preserving the documents of our democracy by establishing the agency’s first preservation librarian position. GPO’s preservation librarian will be tasked with updating the Federal Depository Library Program (FDLP) collection management plan for the preservation of federal government documents. David Walls will serve as GPO’s first preservation librarian; he is a member of the American Library Association (ALA) and comes to the agency from Yale University, where he worked as a preservation librarian for 12 years. While at Yale, Walls established practices for the digital conversion of library and special collection materials.

Digital preservation is an ongoing initiative for GPO. In 2009, the agency launched GPO’s Federal Digital System (FDsys), a content management system, preservation repository and advanced search engine that provides the public with permanent public access to federal government information. GPO is also a member of LOCKSS (Lots of Copies Keep Stuff Safe), a worldwide digital preservation alliance that collaborates with libraries and organizations on preservation initiatives.

Digital Preservation: PARSE.Insight Presentations and Report

PARSE.Insight (Permanent Access to the Records of Science in Europe) has released several presentations and reports.

Research Data Management: Incremental Project Releases Scoping Study And Implementation Plan

The Incremental Project has released the Scoping Study And Implementation Plan. The Cambridge University Library and Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow jointly run the project.

Here's a brief description of the project from its home page:

The project is a first step in improving and facilitating the day-to-day and long-term management of research data in higher education institutions (HEIs). We aim to increase researchers’ capacity and motivation for managing their digital research data, using existing tools and resources where possible and working to identify and fill gaps where additional tailored support and guidance is required. We aim to take a bottom-up approach, consulting a diverse set of researchers in each stage of the project.

Read more about it at "Scoping Study and Implementation Plan Released."

A Guide to Web Preservation

The JISC-funded PoWR project has released A Guide to Web Preservation.

Here's an excerpt:

The [JISC PoWR] project handbook was published in November 2008. Since then we have seen a growing awareness of the importance of digital preservation in general, and of the preservation of web resources (including web pages, web-based applications, and websites) in particular. The current economic crisis and the expected cuts across public sector organisations mean that a decade of growth and optimism is now over; instead, we can expect to see reduced levels of funding available within the sector, which will have an impact on the networked services used to support teaching, learning, and research activities.

The need to manage the implications of these cutbacks is likely to result in a renewed interest in digital preservation. We are therefore pleased to be able to publish this new guide, based on the original PoWR: The Preservation of Web Resources Handbook, which provides practical advice to practitioners and policy makers responsible for the provision of web services.

A Future for Our Digital Memory (2): Strategic Agenda 2010-2013 for Long-Term Access to Digital Resources

The Netherlands Coalition for Digital Preservation has released A Future for Our Digital Memory (2): Strategic Agenda 2010-2013 for Long-Term Access to Digital Resources.

Here's an excerpt from the announcement:

The document proposes a dual-axis approach: on the one hand, collaboration within domains and information chains must be strengthened. This process is to be facilitated by so-called network leaders: the National Archives for public records; the KB, National Library of the Netherlands, for scholarly publications; Data Archiving and Networked Services for research data; and the Netherlands Institute for Sound and Vision for media. A fifth network leader, for cultural heritage institutions such as museums, is yet to be announced. On the other hand, the NCDD itself is to facilitate cross-domain cooperation and knowledge exchanges.

See also A Future for Our Digital Memory (1): Permanent Access to Information in the Netherlands.

Presentations from Computer Forensics and Born-Digital Content in Cultural Heritage Collections Meeting

The Maryland Institute for Technology in the Humanities has released presentations from the Computer Forensics and Born-Digital Content in Cultural Heritage Collections meeting.

Here's an excerpt from the meeting's background document:

While such [computer forensics] activities may seem (happily) far removed from the concerns of the cultural heritage sector, the methods and tools developed by forensics experts represent a novel approach to key issues and challenges in the archives community. Libraries, special collections, and other repositories increasingly receive computer storage media (and sometimes entire computers) as part of their acquisition of "papers" from contemporary artists, writers, musicians, government officials, politicians, scholars, and other public figures. Cell phones, e-readers, and other data-rich devices will surely follow. The same forensics software that indexes a criminal suspect's hard drive allows the archivist to prepare a comprehensive manifest of the electronic files a donor has turned over for accession; the same hardware that allows the forensics specialist to create an algorithmically authenticated "image" of a file system allows the archivist to ensure the integrity of digital content once committed to an institutional repository; the same data recovery procedures that allow the specialist to discover, recover, and present as trial evidence an "erased" file may allow a scholar to reconstruct a lost or inadvertently deleted version of an electronic manuscript—and do so with enough confidence to stake reputation and career.
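The fixity workflow the passage describes, hashing every acquired file into an authenticated manifest and later re-checking it, can be sketched generically (an illustration of the technique, not any particular forensics package):

```python
import hashlib
import os

def manifest(root):
    """Walk a directory tree and return {relative_path: sha256 digest},
    so an archivist can later verify that no file has changed."""
    result = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in chunks so large files never load fully into memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            result[os.path.relpath(path, root)] = h.hexdigest()
    return result

def verify(root, saved):
    """Re-hash the tree and report any additions, deletions, or changes
    relative to a previously saved manifest."""
    current = manifest(root)
    return {
        "added": sorted(set(current) - set(saved)),
        "missing": sorted(set(saved) - set(current)),
        "changed": sorted(p for p in saved
                          if p in current and current[p] != saved[p]),
    }
```

Forensics suites add much more (write-blocked imaging, deleted-file recovery, timeline analysis), but a cryptographic manifest like this is the core of the "algorithmically authenticated image" the archivist relies on.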

Presentations from the Changing Role Of Libraries in Support of Research Data Activities: A Public Symposium

The Board on Research Data and Information has released presentations from the Changing Role Of Libraries in Support of Research Data Activities: A Public Symposium.

Presentations included:

  • Deanna Marcum, Library of Congress: The Role of Libraries in Digital Data Preservation and Access—The Library of Congress Experience
  • Betsy Humphreys, National Library of Medicine: More Data, More Use, Less Lead Time: Scientific Data Activities at the National Library of Medicine
  • Joyce Ray, Institute for Museum and Library Services: Libraries in the New Research Environment
  • Karla Strieb, Association of Research Libraries: Supporting E-Science: Progress at Research Institutions and Their Libraries
  • Christine Borgman, UC, Los Angeles: Why Data Matters to Librarians—and How to Educate the Next Generation

Read more about it at "National Academies Sees Libraries as Leaders in Data Preservation."

The Idea of Order: Transforming Research Collections for 21st Century Scholarship

The Council on Library and Information Resources has released The Idea of Order: Transforming Research Collections for 21st Century Scholarship.

Here's an excerpt from the announcement:

The Idea of Order explores the transition from an analog to a digital environment for knowledge access, preservation, and reconstitution, and the implications of this transition for managing research collections. The volume comprises three reports. The first, "Can a New Research Library be All-Digital?" by Lisa Spiro and Geneva Henry, explores the degree to which a new research library can eschew print. The second, "On the Cost of Keeping a Book," by Paul Courant and Matthew "Buzzy" Nielsen, argues that from the perspective of long-term storage, digital surrogates offer a considerable cost savings over print-based libraries. The final report, "Ghostlier Demarcations," examines how well large text databases being created by Google Books and other mass-digitization efforts meet the needs of scholars, and the larger implications of these projects for research, teaching, and publishing.

JISC Project Report: Digitisation Programme: Preservation Study, April 2009

JISC, the Digital Preservation Coalition, Portico, and the University of London Computer Centre have released JISC Project Report: Digitisation Programme: Preservation Study, April 2009.

Here's an excerpt from the announcement:

The digital universe grew by 62% in 2009, but those adding to these resources need to think long term if they want to make best use of their public funding. Clearly stated preservation policies are essential in guaranteeing that researchers in the future will be able to access and use a digital resource, according to a new report funded by JISC. But the responsibility needs to be shared between funders, who must articulate the need for data curation, and universities, who need to implement a preservation policy for each digital collection. . . .

Alastair Dunning, programme manager at JISC, said: "Although our initial goal was to examine our own projects, the recommendations and outcomes are relevant to funders and projects in many different sectors."

Dr William Kilbride, Executive Director of the Digital Preservation Coalition, said: "JISC challenged us to work in fine detail and in broad strokes at the same time. We immersed ourselves in the detail of sixteen different projects with a brief to support these projects and use that experience for a strategic and lasting contribution based on hard empirical evidence."

The results of this work published today contain recommendations for institutions, funders and those assessing funding projects and programmes. The authors anticipate that the template used to survey the projects could also form a useful blueprint for funders and assessors in the future.

Digital Preservation: Data-PASS Project Gets Matching IMLS Support for $1.6 Million Project

The Data-PASS Project has been given "one-to-one matching funds for the $1.6 million dollar project" by the Institute of Museum and Library Services.

Here's an excerpt from the press release:

The Institute of Museum and Library Services has generously supported members of the Data-PASS Alliance through an award to develop a policy-based archival replication system for libraries, archives and museums. . . .

The archival community has largely recognized that a geographically and organizationally distributed approach is necessary to minimize long-term risks to digital materials. The new system will provide a way to ensure that replicated collections are both institutionally and geographically distributed and to allow for the development of increasingly measurable and auditable trusted repository requirements. The result will be to enable any library, museum, or archive to audit its content across an existing LOCKSS network and will allow groups of collaborating institutions to automatically and verifiably replicate each other's content.
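The kind of audit described above, checking that each collection exists at enough institutionally and geographically distinct replicas, can be sketched abstractly. The data model, thresholds, and example names below are hypothetical, not the actual LOCKSS or Data-PASS implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    collection: str
    institution: str
    region: str

def audit(replicas, min_institutions=3, min_regions=2):
    """Check each collection against a simple replication policy:
    copies must be spread across enough institutions and regions.
    Returns {collection: [problem descriptions]} for failures only."""
    by_collection = {}
    for r in replicas:
        by_collection.setdefault(r.collection, []).append(r)
    failures = {}
    for name, copies in by_collection.items():
        institutions = {c.institution for c in copies}
        regions = {c.region for c in copies}
        problems = []
        if len(institutions) < min_institutions:
            problems.append(f"only {len(institutions)} institution(s)")
        if len(regions) < min_regions:
            problems.append(f"only {len(regions)} region(s)")
        if problems:
            failures[name] = problems
    return failures
```

A production system would also verify content integrity at each replica (e.g., by comparing checksums across peers), which is what makes the audit trustworthy rather than merely a census of copies.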

The Data-PASS partnership was established as part of a previously funded Library of Congress NDIIPP program, and the replication system builds upon a prototype developed through that project.

Tools and training to facilitate the creation of archival replication policies and the auditing and management of a replication network will be released this year. We will also release extensions to the Dataverse Network System that enable curators of dataverse virtual archives to easily participate in these replication networks. These tools will be distributed as open source, and as self-contained packages for non-technical users.

Digital Preservation: Open Planets Foundation Established

The four-year Planets (Preservation and Long-term Access through Networked Services) project is ending. To build on its work, the Open Planets Foundation has been established.

The initial members are:

  • Austrian Institute of Technology
  • The British Library
  • Det Kongelige Bibliotek (The Royal Library of Denmark)
  • Koninklijke Bibliotheek (The National Library of the Netherlands)
  • Microsoft Research Limited
  • Nationaal Archief (The Dutch National Archives)
  • Österreichische Nationalbibliothek (The Austrian National Library)
  • Statsbiblioteket (The State & University Library, Denmark)
  • Tessella plc

Read more about it at "Welcome to the Open Planets Foundation."

Planets Project Deposits "Digital Genome" Time Capsule in Swiss Fort Knox

The Planets project has deposited a "Digital Genome" time capsule in Swiss Fort Knox.

Here's an excerpt from the press release:

Over the last decade the digital age has seen an explosion in the rate of data creation. Estimates from 2009 suggest that over 100 GB of data has already been created for every single individual on the planet ranging from holiday snaps to health records—that's over 1 trillion CDs worth of data, equivalent to 24 tons of books per person!

Yet research by the European Commission co-funded Planets project, co-ordinated by the British Library, highlights deep concerns regarding the preservation of these digital assets. Findings suggest that as hardware and software are superseded by more up-to-date technology, and older formats become increasingly inaccessible, the EU alone is losing over 3 billion euros worth of digital information every year.

Looking to ensure the preservation of our digital heritage, on 18 May 2010 the Planets project will deposit a time capsule containing a record of the "Digital Genome" inside Swiss Fort Knox—a high security digital storage facility hidden deep in the Swiss Alps—preserving the information and the tools to reconstruct highly valuable data long after the lifeline of supporting technology has disappeared.

Inside the Digital Time Capsule:

  • Five major at risk formats—JPEGs, JAVA source code, .Mov files, websites using HTML, and PDF documents
  • Versions of these files stored in archival standard formats—JPEG2000, PDFA, TIFF and MPEG4—to prolong lifespan for as long as possible
  • 2500 additional pieces of data—mapping the genetic code necessary to describe how to access these file formats in future
  • Translations of the required code into multiple languages to improve chances of being able to interpret in the future
  • Copies of all information stored on a complete range of storage media—from CD, DVD, USB, Blu-Ray, Floppy Disc, and Solid State Hard Drives to audio tape, microfilm and even paper print outs . . .

Since 2007 the volume of data produced globally has risen from 281 exabytes to over 700 exabytes—much of this is now considered to be at risk from the repeated discontinuation of storage formats and supporting software. Current studies suggest that common storage formats such as CDs and DVDs have an average life expectancy of less than 20 years, yet the proprietary file formats to access content often last as little as five to seven years and desktop hardware even less. Backing up this data is a start, but without the information and tools to access and read historical digital material it is clear huge gaps will open up in our digital heritage.

To meet this threat, in 2006 the European Commission established the Planets project—Preservation and Long-term Access through Networked Services—bringing together a coalition of European libraries, archives, research organisations, and technology institutions including the Austrian National Library, the University of Technology of Vienna, and the British Library to develop the software solutions to guarantee long-term access. Marking the end of the first phase of the project the deposit of the Planets "Digital Genome" in Swiss Fort Knox will help to highlight the fragility of modern data and help to protect our digital heritage from a whole range of human, environmental and technological risks.