NSF Solicits Grant Proposals for up to $20 Million for Dataset Access and Preservation

The National Science Foundation's Office of Cyberinfrastructure has announced the availability of grants to U.S. academic institutions under its Sustainable Digital Data Preservation and Access Network Partners (DataNet) program.

Here's an excerpt from the solicitation:

Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

The new types of organizations envisioned in this solicitation will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to:

  • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
  • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
  • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
  • serve as component elements of an interoperable data preservation and access network.

By demonstrating feasibility, identifying best practices, establishing viable models for long term technical and economic sustainability, and incorporating frontier research, these exemplar organizations can serve as the basis for rational investment in digital preservation and access by diverse sectors of society at the local, regional, national, and international levels, paving the way for a robust and resilient national and global digital data framework.

These organizations will provide:

  • a vision and rationale that meet critical data needs, create important new opportunities and capabilities for discovery, innovation, and learning, improve the way science and engineering research and education are conducted, and guide the organization in achieving long-term sustainability;
  • an organizational structure that provides for a comprehensive range of expertise and cyberinfrastructure capabilities, ensures active participation and effective use by a wide diversity of individuals, organizations, and sectors, serves as a capable partner in an interoperable network of digital preservation and access organizations, and ensures effective management and leadership; and
  • activities to provide for the full data management life cycle, facilitate research as resource and object, engage in computer science and information science research critical to DataNet functions, develop new tools and capabilities for learning that integrate research and education at all levels, provide for active community input and participation in all phases and all aspects of Partner activities, and include a vigorous and comprehensive assessment and evaluation program.

Potential applicants should note that this program is not intended to support narrowly-defined, discipline-specific repositories. . . .

Award Information

Anticipated Type of Award: Cooperative Agreement

Estimated Number of Awards: 5 — Two to three awards are anticipated in each of two review cycles (one review cycle for fiscal year FY2008 awards and one for FY2009) for a total of five awards, contingent on the quality of proposals received and pending the availability of funds. Each award is limited to a total of up to $20,000,000 (direct plus indirect costs) for up to 5 years. The initial term of each award is expected to be 5 years with the potential at NSF's sole discretion for one terminal renewal for another 5 years, subject to performance and the availability of funds. Such performance is to include serving the needs of the relevant science and engineering research and education communities and catalyzing new opportunities for progress. If a second five-year award is made, NSF funding is expected to decrease in each successive year of the award as the Partner transitions to a sustainable economic model with other sources of support. The actual amount of the annual decrease in NSF support will be established through the cooperative agreement. Note that the maximum period NSF will support a DataNet Partner is 10 years.

Anticipated Funding Amount: $100,000,000 — Up to $100,000,000 over a five year period is expected to be available contingent on the quality of proposals received and pending the availability of funds.

2007 Digital Preservation Award Goes to DROID

The Digital Preservation Coalition has given its 2007 Digital Preservation Award to The National Archives (UK) for its DROID (Digital Record Object Identification) software.

Here's an excerpt from the press release:

An innovative tool to analyse and identify computer file formats has won the 2007 Digital Preservation Award. DROID, developed by The National Archives in London, can examine any mystery file and identify its format. The tool works by gathering clues from the internal 'signatures' hidden inside every computer file, as well as more familiar elements such as the filename extension (.jpg, for example), to generate a highly accurate 'guess' about the software that will be needed to read the file.

Identifying file formats is a thorny issue for archivists. . . . But with rapidly changing technology and an unpredictable hardware base, preserving files is only half of the challenge. There is no guarantee that today's files will be readable or even recognisable using the software of the future.

Now, by using DROID and its big brother, the unique file format database known as PRONOM, experts at the National Archives are well on their way to cracking the problem. Once DROID has labelled a mystery file, PRONOM's extensive catalogue of software tools can advise curators on how best to preserve the file in a readable format. The database includes crucial information on software and hardware lifecycles, helping to avoid the obsolescence problem. And it will alert users if the program needed to read a file is no longer supported by manufacturers.
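The core idea behind DROID's identification can be shown with a minimal Python sketch: check a file's internal byte "signature" first, and fall back to the filename extension only when no signature matches. This is not DROID's implementation, which works from PRONOM signature files. The small signature table and the format labels are illustrative assumptions, although the leading bytes shown are the standard magic numbers for these formats.

```python
from pathlib import PurePath

# Illustrative signature table (an assumption, not PRONOM data):
# leading "magic" bytes -> format label.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"PK\x03\x04": "ZIP container",
}

# Extension hints, consulted only when no internal signature matches.
EXTENSIONS = {
    ".pdf": "PDF",
    ".png": "PNG image",
    ".jpg": "JPEG image",
    ".jpeg": "JPEG image",
    ".gif": "GIF image",
    ".txt": "Plain text",
}


def identify(filename: str, header: bytes) -> tuple[str, str]:
    """Return (format, basis), where basis is 'signature', 'extension' or 'unknown'."""
    # Internal signatures are treated as stronger evidence than the filename.
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt, "signature"
    ext = PurePath(filename).suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext], "extension"
    return "unknown", "unknown"


if __name__ == "__main__":
    # A PNG header wins even though the extension claims JPEG.
    print(identify("mystery.jpg", b"\x89PNG\r\n\x1a\n\x00\x00"))
    # No signature match, so the extension is used as a weaker clue.
    print(identify("notes.txt", b"Hello, world"))
```

Ordering the checks this way reflects the excerpt's point that internal signatures are more trustworthy than familiar but easily changed clues such as the extension.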

What’s New in Digital Preservation Published

The Digital Preservation Coalition and the National Library of Australia’s PADI program have published the 16th issue of What’s New in Digital Preservation.

Here’s an excerpt from the padi-forum announcement:

Issue 16 features news from a range of organisations and initiatives, including the Digital Preservation Coalition (DPC), Digital Curation Centre (DCC), JISC (UK), The National Archives (UK), DigitalPreservationEurope, nestor, the Koninklijke Bibliotheek (National Library of the Netherlands), the US National Digital Information Infrastructure and Preservation Program (NDIIPP), and the PLANETS and CASPAR projects.

Xena 4.0: Open Source Digital Preservation Software

The National Archives of Australia has released Xena 4.0, which is open source digital preservation software.

Here's a brief description of its capabilities from the project homepage:

Xena software aids digital preservation by performing two important tasks:

  • Detecting the file formats of digital objects
  • Converting digital objects into open formats for preservation

Blue Ribbon Task Force on Sustainable Digital Preservation and Access

Fran Berman, director of the San Diego Supercomputer Center, and Brian Lavoie, a research scientist at OCLC, have been named co-chairs of a Blue Ribbon Task Force on Sustainable Digital Preservation and Access, which is being funded by the National Science Foundation and the Andrew W. Mellon Foundation. The Library of Congress, the National Archives and Records Administration, the Council on Library and Information Resources, and JISC will also be involved in the task force.

Here's an excerpt from the press release:

Berman and co-chair Brian Lavoie . . . will convene an international group of prominent leaders to develop actionable recommendations on economic sustainability of digital information for the science and engineering, cultural heritage, academic, public, and private sectors. The Task Force is expected to meet over the next two years and gather testimony from a broad set of thought leaders in preparation for the Task Force’s Final Report. . . .

The Task Force will bring together a group of national and international leaders who will focus attention on this critical grand challenge of the Information Age. Task Force members will represent a cross-section of fields and disciplines including information and computer sciences, economics, entertainment, library and archival sciences, government, and business. Over the next two years, the Task Force will convene a broad set of international experts from the academic, public and private sectors who will participate in quarterly panels and discussions. . . .

In its final report, the Task Force is charged with developing a comprehensive analysis of current issues, and actionable recommendations for the future to catalyze the development of sustainable resource strategies for the reliable preservation of digital information. During its tenure, the Task Force also will produce a series of articles about the challenges and opportunities of digital information preservation, for both the scholarly community and the public.

Leslie Carr on What to Do with Dead Repositories

In his "Decommissioning Repositories" posting, EPrints guru Leslie Carr grapples with the issue of what to do with repositories that have served their purpose and that no one wants to maintain.

Here's an excerpt:

But now the party's over, there is no more funding, and none of the partner institutions has offered to keep the repository going in perpetuity. Not even the hosting institution or the ex-manager wants to keep their repositories going. We know that even if we don't turn them off their hosting hardware will fail in a few years. That sounds like very bad news because a repository is supposed to be forever! Was it irresponsible to create these repositories in the first place? Should it be forbidden to create a public repository whose life is guaranteed to be less than a decade? Or perhaps that should be factored into the original policy-making: "this repository and all its contents are guaranteed up to 31st December 2017 but not after." If that were machine readable then the community could have decided whether they want to mirror the collection, or selected bits of it.

Source: Carr, Leslie. "Decommissioning Repositories." RepositoryMan, 10 September 2007.
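Carr's suggestion that a repository's guaranteed lifetime be stated in machine-readable form might look something like the Python sketch below, which lets a harvester decide whether it is time to mirror a collection. The policy record, its field names (`repository`, `guaranteed_until`), the status labels and the 180-day warning window are all hypothetical; the posting does not propose any particular format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RepositoryPolicy:
    # Hypothetical machine-readable policy record.
    repository: str
    guaranteed_until: date  # last day the host commits to keeping the content


def mirroring_status(policy: RepositoryPolicy, today: date,
                     warning: timedelta = timedelta(days=180)) -> str:
    """Classify a repository so that others can decide whether to mirror it."""
    if today > policy.guaranteed_until:
        return "unguaranteed"   # past the stated commitment
    if policy.guaranteed_until - today <= warning:
        return "expiring-soon"  # inside the warning window: consider mirroring
    return "guaranteed"


if __name__ == "__main__":
    policy = RepositoryPolicy("example-project-repository", date(2017, 12, 31))
    print(mirroring_status(policy, date(2017, 9, 1)))   # expiring-soon
    print(mirroring_status(policy, date(2018, 1, 1)))   # unguaranteed
    print(mirroring_status(policy, date(2010, 1, 1)))   # guaranteed
```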

LIFE (Life Cycle Information for E-Literature) Project

LIFE (Life Cycle Information for E-Literature) is a joint, JISC-funded project of the University College London Library Services and the British Library that is investigating life cycle issues involved in collecting and preserving digital materials.

Here's an excerpt from the home page:

The LIFE Project has developed a methodology to model the digital lifecycle and calculate the costs of preserving digital information for the next 5, 10 or 100 years. For the first time, organisations can apply this process and plan effectively for the preservation of their digital collections.
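As a rough illustration of the kind of calculation a lifecycle costing methodology supports, the Python sketch below totals one-off and recurring costs over a chosen horizon of 5, 10 or 100 years. It is not the LIFE model itself. The stage names are loosely based on typical lifecycle stages (acquisition, ingest, metadata, access, storage, preservation), and the split into one-off and annual costs, the figures, and the absence of discounting or inflation are all simplifying assumptions.

```python
# Hypothetical per-stage costs in arbitrary currency units; NOT LIFE project data.
ONE_OFF = {"acquisition": 500.0, "ingest": 200.0, "metadata": 300.0}
ANNUAL = {"access": 50.0, "storage": 40.0, "preservation": 60.0}


def lifecycle_cost(years: int, one_off=ONE_OFF, annual=ANNUAL) -> float:
    """Total cost of keeping a collection for `years` years (no discounting)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return sum(one_off.values()) + years * sum(annual.values())


if __name__ == "__main__":
    for horizon in (5, 10, 100):
        print(horizon, lifecycle_cost(horizon))
```

With these made-up figures, recurring costs dominate over long horizons. That is the planning point the LIFE work makes about preserving collections over the next 5, 10 or 100 years.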

Currently the LIFE Project is in its second phase ("LIFE2"), an 18-month project running from March 2007 to August 2008.

Documentation from the first and second phases of the project is available.

The project has just established a weblog.

AONS: Scanning Repositories for Obsolete Digital Formats

The APSR AONS II project has released a beta version of the Automatic Obsolescence Notification System (AONS).

Here's an excerpt from the announcement on apsr_announcements:

Users can register with the service by providing a URL to a repository's format scan summary. The AONS service will display the summary and allow a repository manager to compare the formats of items in their repository with information from format registries such as PRONOM and Library of Congress. These registries flag any formats that are likely to become obsolete. Repository managers can then make curation decisions about any items at risk, such as upgrading their formats.

By downloading and installing an AONS locally, an institution can also take advantage of a pilot risk metrics implementation. . . .

The AONS software is the result of the AONS II project funded under APSR and developed by David Pearson, David Levy and Matthew Walker from the National Library of Australia (NLA) with an administrative user interface developed by David Berriman at ANU.

The software is able to be downloaded from Sourceforge at http://sourceforge.net/projects/aons and a mailing list is also available for support and feedback. As this is a beta release we welcome feedback to the Sourceforge mailing list to inform our testing which will continue until mid-September.

Please try out the pilot service by sending an email to cosi@apsr.edu.au to register with the service, and tell us which institution you are from. . . .
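The comparison AONS performs, matching the formats in a repository's format scan summary against registry information and flagging items at risk, can be sketched in Python as follows. This is not AONS code. The summary structure, the format names and the `at_risk` set, which stands in for PRONOM or Library of Congress registry data, are hypothetical.

```python
from collections import Counter

# Hypothetical format scan summary: format name -> number of items.
scan_summary = Counter({
    "PDF 1.4": 1200,
    "TIFF 6.0": 800,
    "WordPerfect 5.1": 35,
    "RealAudio 3": 4,
})

# Hypothetical registry flags standing in for PRONOM / LC format data.
at_risk = {"WordPerfect 5.1", "RealAudio 3"}


def obsolescence_report(summary: Counter, flagged: set) -> list[tuple[str, int]]:
    """Return (format, item count) pairs for flagged formats, largest first."""
    hits = [(fmt, n) for fmt, n in summary.items() if fmt in flagged]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for fmt, count in obsolescence_report(scan_summary, at_risk):
        print(f"{count} item(s) in {fmt} may need migration")
```

Sorting by item count puts the largest at-risk groups first. A repository manager could start curation decisions, such as format upgrades, with those.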

Portico Studying E-Book Preservation

Portico is launching an e-book preservation study, which will last the rest of the year.

Here's an excerpt from the press release:

In response to several requests from publishers and libraries, Portico is conducting a study in order to assess how to extend its archival infrastructure and service to respond to the emerging need to preserve e-books. During the study we will analyze the structure and preservation needs of e-books and determine what adjustments to Portico's existing, operational and technological infrastructure and the economic model developed to support e-journal preservation might be required in order to respond to this new genre. Portico's e-journal archiving service was developed through a pilot project that drew heavily upon engagement with publisher and library pilot participants. We anticipate that a similar process will be essential in understanding how best to respond to the challenges of e-book preservation. . . .

The current participants in the E-Book Preservation study include:

Publishers

  • American Math Society
  • Elsevier
  • Morgan Claypool
  • Taylor and Francis

Libraries

  • Case Western Reserve University
  • Cornell University Library
  • McGill University
  • SOLINET
  • Texas University Libraries
  • University College of London
  • Yale University Library

Official Release of the kopal Library for Retrieval and Ingest

The German National Library and SUB Göttingen have announced, in a message to diglib, the official release of the kopal Library for Retrieval and Ingest (koLibRI).

Here's an excerpt from the message:

The kopal project (Co-operative Development of a Long-term Digital Information Archive) was dedicated to finding a solution that provides not only bitstream preservation but also long-term accessibility, in the form of a cooperatively developed and operated long-term archive for digital data. The German National Library, the Goettingen State and University Library, the Gesellschaft fuer wissenschaftliche Datenverarbeitung mbH Goettingen, and IBM Germany have been working in close cooperation on a technological solution. The newly released software tools mark the successful development of such an archiving solution.

The open source software koLibRI is a framework for integrating a long-term preservation system, such as the IBM Digital Information Archiving System (DIAS), into the infrastructure of any institution. In particular, koLibRI organizes the creation of Archival Information Packages and their import into DIAS, and offers functions to retrieve and manage them. Preservation methods such as data customization and data migration are part of the tasks of long-term preservation, and koLibRI Version 1.0 provides modules that manage future migration procedures. koLibRI Version 1.0 is fully functional and stable. Nevertheless, as new partners are connected to the existing long-term preservation system, the software will be continually adjusted to their needs.

Documentation published with the final release describes how to install and customize a functional koLibRI system, as well as its basic internal design, to make individual development possible. The release is offered as a free download. . . .

100 Year Archive Requirements Survey

The Storage Networking Industry Association has released the 100 Year Archive Requirements Survey. Access requires registration.

Here's an excerpt from the "Survey Highlights":

  • 80% of respondents declared they have information they must keep over 50 years and 68% of respondents said they must keep it over 100 years. . . .
  • Long-term generally means greater than 10 to 15 years—the period beyond which multiple migrations take place and information is at risk. . .
  • Database information (structured data) was considered to be most at risk of loss. . .
  • Over 40% of respondents are keeping e-Mail records over 10 years. . . .
  • Physical migration is a big problem. Only 30% declared they were doing it correctly at 3-5 year intervals. . . .
  • 60% of respondents say they are ‘highly dissatisfied’ that they will be able to read their retained information in 50 years. . .
  • Help is needed—current practices are too manual, too prone to error, too costly and lack adequate coordination across the organization. . . .
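The survey's concern about repeated physical migration can be made concrete with simple arithmetic. At the 3 to 5 year intervals mentioned above, information retained for 100 years would be migrated roughly 20 to 33 times. The Python sketch below makes that count. It assumes a fixed interval and counts one migration at every interval boundary up to the end of the retention period; real migration schedules are rarely this regular.

```python
def migrations_needed(retention_years: int, interval_years: int) -> int:
    """Count migrations at each interval boundary within the retention period."""
    if interval_years <= 0:
        raise ValueError("interval_years must be positive")
    return retention_years // interval_years


if __name__ == "__main__":
    for retention in (50, 100):
        for interval in (3, 5):
            print(f"{retention} years at {interval}-year intervals: "
                  f"{migrations_needed(retention, interval)} migrations")
```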

Preserving the Digital Heritage: Principles and Policies

The Netherlands National Commission for UNESCO and the European Commission on Preservation and Access have published Preserving the Digital Heritage: Principles and Policies.

Here's an excerpt from the "Preface":

In November 2005, the Netherlands National Commission for UNESCO, in collaboration with the Koninklijke Bibliotheek (National Library of the Netherlands) and UNESCO’s Information Society Division, organized a conference entitled Preserving the Digital Heritage (The Hague, The Netherlands, 4-5 November 2005). It focused on two important issues: the selection of material to be preserved, and the division of tasks and responsibilities between institutions. This publication contains the four speeches given by the keynote speakers, preceded by a synthesis report of the conference.

Australian Framework and Action Plan for Digital Heritage Collections

The Collections Council of Australia Ltd. has released Australian Framework and Action Plan for Digital Heritage Collections, Version 0.C3 for comment.

Here's an excerpt from the document:

This is the Collections Council of Australia's plan to prepare an Australian framework for digital heritage collections. It brings together information shared by people working in archives, galleries, libraries and museums at a Summit on Digital Collections held in 2006. It proposes an Action Plan to address issues shared by the Australian collections sector in relation to current and future management of digital heritage collections.

Curation of Scientific Data: Challenges for Institutions and Their Repositories Podcast

A podcast of Chris Rusbridge’s "Curation of Scientific Data: Challenges for Institutions and their Repositories" presentation at The Adaptable Repository conference is now available. Rusbridge is Director of the Digital Curation Centre in the UK.

The PowerPoint for the presentation is also available.

Report of the Sustainability Guidelines for Australian Repositories Project (SUGAR)

The Australian Partnership for Sustainable Repositories (APSR) has released Report of the Sustainability Guidelines for Australian Repositories Project (SUGAR).

Here’s an excerpt from the report:

The Sustainability Guidelines for Australian Repositories service (SUGAR) was intended to support people working in tertiary education institutions whose activities do not focus on digital preservation. The target community creates and digitises content for a range of purposes to support learning, teaching and research. While some have access to technical and administrative support, many others may not be aware of what they need to know. The typical SUGAR user may have little interest in discussions surrounding metadata, interoperability or digital preservation, and may simply want to know the essential steps involved in achieving the task at hand.

A key challenge for SUGAR was to provide a suitable level and amount of information to meet the immediate focus of the user and their level of expertise while introducing and encouraging consideration of issues of digital sustainability. SUGAR was also intended to stand alone as an online service unsupported by a helpdesk.

Towards an Open Source Repository and Preservation System

The UNESCO Memory of the World Programme, with the support of the Australian Partnership for Sustainable Repositories, has published Towards an Open Source Repository and Preservation System: Recommendations on the Implementation of an Open Source Digital Archival and Preservation System and on Related Software Development.

Here’s an excerpt from the Executive Summary and Recommendations:

This report defines the requirements for a digital archival and preservation system using standard hardware and describes a set of open source software which could be used to implement it. There are two aspects of this report that distinguish it from other approaches. One is the complete or holistic approach to digital preservation. The report recognises that a functioning preservation system must consider all aspects of a digital repository: Ingest, Access, Administration, Data Management, Preservation Planning and Archival Storage, including storage media and management software. Secondly, the report argues that, for simple digital objects, the solution to digital preservation is relatively well understood, and that what is needed are affordable tools, technology and training in using those systems.

An assumption of the report is that there is no ultimate, permanent storage medium, nor will there be in the foreseeable future. It is instead necessary to design systems to manage the inevitable change from system to system. The aim and emphasis in digital preservation is to build sustainable systems rather than permanent carriers. . . .

The way open source communities, providers and distributors achieve their aims provides a model of how a sustainable archival system might work, be sustained, be upgraded and be developed as required. Similarly, many cultural institutions, archives and higher education institutions are participating in open source software communities to influence the direction of that software's development to their advantage, and ultimately to the advantage of the whole sector.

A fundamental finding of this report is that a simple, sustainable system that provides strategies to manage all the identified functions for digital preservation is necessary. It also finds that for simple discrete digital objects this is nearly possible. This report recommends that UNESCO supports the aggregation and development of an open source archival system, building on, and drawing together existing open source programs.

This report also recommends that UNESCO participates, through its various committees, in open source software development on behalf of the countries, communities, and cultural institutions that would benefit from a simple, yet sustainable, digital archival and preservation system. . . .

The University of Maine and Two Public Libraries Adopt Emory’s Digitization Plan

Library Journal Academic Newswire reports that the University of Maine, the Toronto Public Library, and the Cincinnati Public Library will follow Emory University’s lead and digitize public domain works utilizing Kirtas scanners with print-on-demand copies being made available via BookSurge. (Also see the press release: "BookSurge, an Amazon Group, and Kirtas Collaborate to Preserve and Distribute Historic Archival Books.")

Source: "University of Maine, plus Toronto and Cincinnati Public Libraries Join Emory in Scan Alternative." Library Journal Academic Newswire, 21 June 2007.

Emory Will Use Kirtas Scanner to Digitize Rare Books

Emory University’s Woodruff Library will use a Kirtas robotic book scanner to digitize rare books and to create PDF files that will be made available on the Internet and sold as print-on-demand books on Amazon.

Here’s an excerpt from the press release:

"We believe that mass digitization and print-on-demand publishing is an important new model for digital scholarship that is going to revolutionize the management of academic materials," said Martin Halbert, director for digital programs and systems at Emory’s Woodruff Library. "Information will no longer be lost in the mists of time when books go out of print. This is a way of opening up the past to the future."

Emory’s Woodruff Library is one of the premier research libraries in the United States, with extensive holdings in the humanities, including many rare and special collections. To increase accessibility to these aging materials, and ensure their preservation, the university purchased a Kirtas robotic book scanner, which can digitize as many as 50 books per day, transforming the pages from each volume into an Adobe Portable Document Format (PDF) file. The PDF files will be uploaded to a Web site where scholars can access them. If a scholar wishes to order a bound, printed copy of a digitized book, they can go to Amazon.com and order the book online.

Emory will receive compensation from the sale of digitized copies, although Halbert stressed that the print-on-demand feature is not intended to generate a profit, but simply help the library recoup some of its costs in making out-of-print materials available.

ALCTS PARS Defining Digital Preservation Weblog

The Preservation and Reformatting Section (PARS) of the Association for Library Collections & Technical Services (ALCTS) has started the Defining Digital Preservation Weblog to get feedback on the efforts of a working group that has the following charge: "to draft a definition for digital preservation that would be suitable for the needs of PARS and available to support the work of ALCTS and ALA, for use on the web, in policy statements, and other documents."