DCC Briefing Paper on "Interoperability"

The Digital Curation Centre has released a new briefing paper on "Interoperability."

Here's an excerpt:

Interoperability is the transfer and use of information in a uniform and efficient manner across multiple organisations and IT systems. Its purpose is to create a shared understanding of data.

Data exchange requires that the data be semantically matched (i.e., ensuring that the data describe the same thing) and that any differences in representation within the data models be eliminated or meaningfully handled. Data integration is the process that takes heterogeneous data and their structural information and produces a unified description and mapping information to allow seamless access to all existing data. Interpretation of these data must be unambiguous. More generally, interoperability goes beyond data compatibility: we also need interoperable hardware, software, and communication protocols to allow data to be interpreted correctly and unambiguously across system or organisational boundaries.
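
To make the idea concrete, here's a minimal Python sketch of data integration, with all field names invented for illustration: two systems describe the same measurement under different labels and units (a semantic match with a representational difference), and mapping functions produce the unified description the paper calls for.

    # Minimal data-integration sketch; all field names are invented.
    # System A reports temperature in Celsius; System B in Fahrenheit.
    record_a = {"sensor_id": "A-17", "temp_c": 21.5}
    record_b = {"device": "B-03", "temperature_f": 70.7}

    def from_system_a(rec):
        # Only renaming is needed; units already match the unified schema.
        return {"id": rec["sensor_id"], "temperature_c": rec["temp_c"]}

    def from_system_b(rec):
        # Semantic match: "temperature_f" describes the same property,
        # but its representation (the unit) differs and must be converted.
        return {"id": rec["device"],
                "temperature_c": round((rec["temperature_f"] - 32) * 5 / 9, 1)}

    unified = [from_system_a(record_a), from_system_b(record_b)]
    print(unified)  # both records now share one schema and one unit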

Digital Preservation: JHOVE2 Functional Requirements Version 1.3 Released

JHOVE2 Functional Requirements version 1.3 has been released. (Thanks to the File Formats Blog.)

Here's an excerpt from the JHOVE Project Scope:

JHOVE has proven to be a successful tool for format-specific digital object identification, validation, and characterization, and has been integrated into the workflows of most major international preservation institutions and programs. Using an extensible plug-in architecture, JHOVE provides support for a variety of digital formats commonly used to represent audio, image, and textual content.
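
Signature-based identification, the first step in format-specific processing of this kind, can be pictured with a short sketch. This is not JHOVE's actual code or API, just a hypothetical matcher using well-known magic numbers:

    # Hypothetical signature-based format identification; the magic
    # numbers are the published signatures, the code is illustrative.
    SIGNATURES = {
        b"%PDF-": "PDF",
        b"\x89PNG\r\n\x1a\n": "PNG",
        b"\xff\xd8\xff": "JPEG",
        b"GIF89a": "GIF",
    }

    def identify(path):
        with open(path, "rb") as f:
            header = f.read(16)
        for magic, fmt in SIGNATURES.items():
            if header.startswith(magic):
                return fmt  # a format-specific module could now validate it
        return "unknown"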

Issue 19: What's New in Digital Preservation?

Issue 19 of What's New in Digital Preservation? has been published.

Here's an excerpt from the announcement:

Issue 19 features news from a range of organisations and initiatives, including the Digital Preservation Coalition (DPC), Digital Curation Centre (DCC), JISC (UK), The British Library (BL), PLANETS (Preservation and Long-term Access through Networked Services), Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval (CASPAR), University of London Computing Centre (ULCC), Alliance for Permanent Access, The Library of Congress, the National Digital Information Infrastructure and Preservation Program (NDIIPP), and The National Archives (TNA).

Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation

The Blue Ribbon Task Force on Sustainable Digital Preservation and Access has released its interim report, Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation.

Here's an excerpt:

During 2008, as the Task Force heard testimony from a broad spectrum of institutions and enterprises with deep experience in digital access and preservation, two things became clear: First, the problem is urgent. Access to data tomorrow requires decisions concerning preservation today. Imagine future biological research without a long-term strategy to preserve the Protein Data Bank (PDB), a digital collection that drives new insights into human systems and drug therapies for disease, and represents an investment of 100 billion dollars in research funding over the last 37 years. Decisions about the future of the PDB and other digital reference collections—how they will be migrated to future information technologies without interruption, what kind of infrastructure will protect their digital content against damage and loss of data, and how such efforts will be supported—must be made now to drive future innovation.

Second, the difficulty in identifying appropriate economic models is not just a matter of finding funding or setting a price. In many institutions and enterprises, systemic challenges create barriers for sustainable digital access and preservation.

Also see the related document: A Selective Literature Review on Digital Preservation Sustainability.

DCC Releases "Database Archiving"

The Digital Curation Centre has released a new briefing paper on "Database Archiving."

Here's an excerpt:

Database archiving is usually seen as a subset of data archiving. In a computational context, data archiving means storing electronic documents, data sets, multimedia files, and so on, for a period of time. The primary goal is to maintain the data in case it is later requested for some particular purpose. Compliance with government regulations on data preservation, for example, is a main driver behind data archiving efforts. Database archiving focuses on archiving data that are maintained under the control of a database management system and structured under a database schema, e.g., a relational database.
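
What distinguishes database archiving from ordinary file archiving is that the schema must be captured along with the rows. Here's a minimal Python sketch of the idea, using an in-memory SQLite table and invented names throughout:

    # Sketch: archive a relational table together with its schema.
    # The table and its contents are invented so the example runs as-is.
    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE case_files (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO case_files VALUES (1, 'Permit application 4711')")

    def archive_table(conn, table):
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema = [{"name": c[1], "type": c[2]} for c in cols]
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        # Schema and rows travel together, so the archive stays readable
        # long after the original database management system is gone.
        return {"table": table, "schema": schema, "rows": rows}

    print(json.dumps(archive_table(conn, "case_files"), indent=2))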

OCLC and HathiTrust to Collaborate on Enhancing Access to Digital Repository Materials

OCLC and HathiTrust, a shared digital repository for research libraries, will collaborate on improving access to materials in HathiTrust's repository.

Here's an excerpt from the press release:

HathiTrust, a group of some of the largest research libraries in the United States collaborating to create a repository of their vast digital collections, and OCLC will work together to increase visibility of and access to items in HathiTrust’s shared digital repository.

Launched jointly by the 12-university consortium known as the Committee on Institutional Cooperation and the 11 university libraries of the University of California system, HathiTrust leverages the time-honored commitment to preservation and access to information that university libraries have valued for centuries. The group's digital collections, including millions of books, will be archived and preserved in a single repository hosted by HathiTrust. Materials in the public domain and those where rightsholders have given permission will be available for reading online.

OCLC and HathiTrust will work together to increase online visibility and accessibility of the digital collections by creating WorldCat records describing the content and linking to the collections via WorldCat.org and WorldCat Local. The organizations will launch a project in the coming months to develop specifications and determine next steps.

JSTOR and Ithaka Merge

JSTOR and Ithaka have merged; the combined organization is now known as Ithaka.

Here's an excerpt from the press release:

JSTOR was founded in 1995 by The Andrew W. Mellon Foundation as a shared digital library to help academic institutions save costs associated with the storage of library materials and to vastly improve access to scholarship. Today, more than 5,200 academic institutions and 600 scholarly publishers and content owners participate in JSTOR. Ithaka was started in 2003 by Kevin Guthrie, the original founder of JSTOR, with funding from the Mellon Foundation as well as The William and Flora Hewlett Foundation and the Stavros S. Niarchos Foundation. Ithaka was established to aid promising not-for-profit digital initiatives and to provide research and insight on important strategic issues facing the academic community. Ithaka has become known for its influential reports including the 2007 University Publishing in a Digital Age and the 2008 Sustainability and Revenue Models for Online Academic Resources. It is the organizational home to Portico, a digital preservation service, and NITLE, a suite of services supporting the use of technology in liberal arts education.

The new combined enterprise will be called Ithaka and will be dedicated to helping the academic community use digital technologies to advance scholarship and teaching and to reducing system-wide costs through collective action.

This is a natural step for these organizations. JSTOR and Ithaka already work closely together, sharing a common history, values, and fundamental purpose. During 2008, the Ithaka-incubated resource Aluka was integrated into JSTOR as an initial step, further strengthening ties between the organizations. JSTOR will now join Portico and NITLE as a coordinated set of offerings made available under the Ithaka organizational name. . .

In addition to JSTOR, Portico, and NITLE, Ithaka's existing research and strategic services groups will remain important parts of the enterprise. The board will be composed of Ithaka and JSTOR Trustees, with Henry Bienen, President of Northwestern University, serving as Chairman and Paul Brest, President of the Hewlett Foundation, as Vice Chairman.

"Digital Project Staff Survey of JPEG 2000 Implementation in Libraries"

David Lowe and Michael J. Bennett, both of the University of Connecticut Libraries, have made "Digital Project Staff Survey of JPEG 2000 Implementation in Libraries" available in DigitalCommons@UConn.

Here's an excerpt from the abstract:

JPEG 2000 is the product of thorough efforts toward an open standard by experts in the imaging field. With its key components for still images published officially by the ISO/IEC by 2002, it has been solidly stable for several years now, yet its adoption has been considered tenuous enough to cause imaging software developers to question the need for continued support. Digital archiving and preservation professionals must rely on solid standards, so in the fall of 2008 we undertook a survey among implementers (and potential implementers) to capture a snapshot of JPEG 2000’s status, with an eye toward gauging its perception in our community.

The survey results reveal several key areas that JPEG 2000’s user community will need to have addressed in order to further enhance adoption of the standard, including perspectives from cultural institutions that have adopted it already, as well as insights from institutions that do not currently have it in their workflows. Current users are concerned about limited compatible software capabilities with an eye toward needed enhancements. They realize also that there is much room for improvement in the area of educating and informing the cultural heritage community about the advantages of JPEG 2000. A small set of users, in addition, alerts us to serious problems of cross-codec consistency and relates file validation issues that would likely be easily resolved given a modicum of collaborative attention toward standardization.

Digital Curation Centre Releases "Archiving Web Resources"

The Digital Curation Centre has released "Archiving Web Resources," as part of its DCC Digital Curation Manual.

Here's the abstract:

The World Wide Web is among the most important information resources, and is certainly the most voluminous. In a relatively short time, it has become a vital medium for a range of academic and commercial publishers. However, until recently, little effort has been directed towards ensuring the long-term preservation of the digital assets that reside online. The web's dynamic nature makes it prone to frequent changes, and without a means for capture and preservation it is likely that vast quantities of content will be lost forever. Since the web is home to a vast range of materials with widely varying characteristics in terms of formats, scale and behaviour, there are inevitable issues that must be overcome to facilitate their collection, management and preservation.

William Kilbride Named Executive Director of the Digital Preservation Coalition

William Kilbride has been named Executive Director of the Digital Preservation Coalition.

Here's an excerpt from the announcement:

William has many years of experience in the digital preservation community. He is currently Research Manager for Glasgow Museums, where he has been involved in digital preservation and access aspects of Glasgow's museum collections, and in supporting the curation of digital images, sound recordings and digital art within the city's museums.

Previously he was Assistant Director of the Archaeology Data Service where he was involved in many digital preservation activities. He has contributed to workshops, guides and advice papers relating to digital preservation.

In the past William has worked with the DPC on the steering committee for the UK Needs Assessment, was a tutor on the Digital Preservation Training Programme and was a judge for the 2007 Digital Preservation Award.

Library of Congress to Scan 25,000th Book in Digitizing American Imprints Program

The Library of Congress will scan the 25,000th brittle book in its Digitizing American Imprints Program, which is supported by a $2 million grant from the Alfred P. Sloan Foundation.

Here's an excerpt from the press release:

The Library, which has contracted with the Internet Archive for digitization services, is combining its efforts with other libraries as part of the open content movement. The movement, which includes over 100 libraries, universities and cultural institutions, aims to digitize and make freely available public-domain books in a wide variety of subject areas.

Books scanned in this pilot project come primarily from the Library’s local history and genealogy sections of the General Collections. For many of these titles, only a few copies exist anywhere in the world, and a reader would need to travel to Washington to view the Library’s copy. . . .

All scanning operations are housed in the Library’s John Adams Building on Capitol Hill. Internet Archive staff work two shifts each day on 10 "Scribe" scanning stations. The operation can digitize up to 1,000 volumes each week. Shortly after scanning is complete, the books are available online at www.archive.org. Books can be read online or downloaded for more intensive study. The Library of Congress is actively working with the Internet Archive on the development of a full-featured, open-source page turner. A beta version, called the Flip Book, is currently available on the Internet Archive site.

PDF Beats Microforms for Long-Term Document Storage

An AIIM report, Content Creation and Delivery—The On-Ramps and Off-Ramps of ECM, indicates that PDF has surpassed microforms for long-term document storage.

Here's an excerpt from the press release:

Recent AIIM research found that 90% of organizations are using the PDF file format for long-term storage of scanned documents, and 89% are converting Office files to PDF for distribution and archive. Not surprisingly, paper is currently used by 100% of organizations, but when asked to predict the situation in 5 years time, use of paper for long-term storage dropped to 77%, whereas PDF rose to 93%. . . .

Time-honored storage on microfilm or fiche is still used by 43% of organizations, but this is expected to drop to 28% over the next five years. At the other end of the media spectrum, 34% of organizations are archiving digital video, rising to a projected 47% in 5 years. Digital audio archiving will rise from 30% to 37%.

Larry Carver Named Digital Preservation Pioneer

The National Digital Information Infrastructure and Preservation Program at the Library of Congress has named Larry Carver, retired Director of Library Technologies and Digital Initiatives at the University of California, Santa Barbara, as a digital preservation pioneer.

Here's an excerpt from the UCSB press release:

"We at the UCSB Library are thrilled that Larry Carver has received this important and well-deserved recognition," said Brenda Johnson, university librarian. "His tireless and innovative work in the development of the Map and Imagery Lab and the Alexandria Digital Library has brought international attention to our library and has benefited thousands of scholars, students, and members of the public from around the world. We offer him our heartiest congratulations on being named a Library of Congress ‘Pioneer of Digital Preservation.'" . . .

Carver began his career at the library where he helped build an impressive collection of maps, aerial photography, and satellite imagery that led to the development of the Map and Imagery Laboratory (MIL) in 1979. As the MIL collections grew, Carver felt that geospatial data presented a unique challenge to the library. He believed that coordinate-based collections should be managed differently than book-based collections. But not everyone agreed with him.

"It became apparent that handling traditional geospatial content in a typical library context was just not satisfactory and another means to control that data was important," he said. "It wasn't as easy as it sounds. I was in a very conservative environment, and they were not easily convinced that this was something a library should do."

Carver and others spent years developing an exhaustive set of requirements for building a geospatial information management system. The system had a number of innovative ideas. "We included traditional methods of handling metadata but also wanted to search by location on the Earth's surface," Carver said. "The idea was that if you point to a place on the Earth you could ask the question, 'What information do you have about that space?,' as opposed to a traditional way of having to know ahead of time who wrote about it."

An opportunity to develop that system arrived in 1994 when UCSB received funding from the National Science Foundation for Carver and his team to build the Alexandria Digital Library. "We produced the first operational digital library that was based on our research," Carver said. "Our concentration was to be able to develop a system that could search millions of records with latitude and longitude coordinates and present those results via the Internet."
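
The operation Carver describes, pointing at a spot on the Earth and asking what the collection knows about it, boils down to a spatial containment query. Here's a toy Python version with invented records and bounding boxes (not the Alexandria Digital Library's actual implementation):

    # Toy coordinate-based retrieval: each record carries a bounding box
    # (min_lon, min_lat, max_lon, max_lat). All data here is invented.
    records = [
        {"title": "Santa Barbara aerial photo", "bbox": (-119.9, 34.3, -119.6, 34.5)},
        {"title": "California coastline map", "bbox": (-124.5, 32.5, -117.0, 42.0)},
        {"title": "Gulf of Mexico satellite image", "bbox": (-98.0, 18.0, -80.0, 31.0)},
    ]

    def records_at(lon, lat):
        """Return every record whose footprint contains the given point."""
        return [r["title"] for r in records
                if r["bbox"][0] <= lon <= r["bbox"][2]
                and r["bbox"][1] <= lat <= r["bbox"][3]]

    print(records_at(-119.7, 34.4))  # both Santa Barbara-area items match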

The basic concepts behind the Alexandria Digital Library have been widely adopted by Google Earth, Wikipedia, and others. Carver couldn't be more delighted.

"I think it's wonderful," Carver said. "We weren't trying to be the only game in town. We were just trying to raise consciousness way back in the early 1980s that this was a viable way of handling geospatial material. This approach lets people interact with data in a realistic way without having a great deal of knowledge about an individual object. It was a new way of dealing with massive amounts of information in an environment that made finding and accessing information much easier."

Read more about it at "Digital Preservation Pioneer: Larry Carver."

Springer Digital Publications to be Archived in CLOCKSS

Springer Science+Business Media has announced that its digital publications will be archived in the dark CLOCKSS archive.

Here's an excerpt from the press release:

The CLOCKSS archive allows research libraries and scholarly publishers, who launched CLOCKSS as a pilot program, to preserve and store their electronic content. Once ingested, the e-content is kept safe and secure in a dark archive until a trigger event occurs and the CLOCKSS Board determines that the content should be copied from the archive and made freely available to all, regardless of prior subscription. Due to the success of the pilot program, the founding members unanimously agreed to incorporate and invite others to participate in CLOCKSS.

Participating CLOCKSS libraries and publishers govern the archive themselves via three tiers of governance—an executive board, a board of directors, and an advisory council. Research libraries working alongside publishers like Springer are able to help shape policy and practice in their communities.

"In a great show of confidence, Springer has joined the CLOCKSS initiatives, putting its complete trust in an archive they helped build," says Gordon Tibbitts, Co-Chair of CLOCKSS. "Springer is helping to shoulder the responsibility, alongside its publishing peers and research library customers, of keeping their scholarly assets safe and protected for future generations of scholars." . . .

In addition to storing Springer’s journal content with CLOCKSS, the publisher has submitted a proposal to the CLOCKSS Board outlining a pilot project to test the feasibility and legal issues surrounding preservation of eBook content. Because eBook contracts differ from journal contracts, Springer can only deposit eBook files when its authors' rights are protected.

CLOCKSS is a joint venture between the world’s leading scholarly publishers and research libraries. Its mission is to build a sustainable, geographically distributed dark archive with which to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community. Governing Libraries include the Australian National University, EDINA at the University of Edinburgh, Indiana University, New York Public Library, OCLC Online Computer Library Center, Rice University, Stanford University, the University of Alberta, the University of Hong Kong and the University of Virginia. Governing Publishers include the American Medical Association, the American Physiological Society, bepress, Elsevier, IOP Publishing, Nature Publishing Group, Oxford University Press, SAGE Publications, Springer, Taylor & Francis and Wiley-Blackwell.

JISC-PoWR Releases Preservation of Web Resources Handbook

JISC-PoWR has released the Preservation of Web Resources Handbook.

Here's an excerpt:

The Handbook is structured in two parts. The first part deals with web resources and makes practical suggestions for their management, capture, selection, appraisal and preservation. It includes observations on web content management systems, and a list of available tools for performing web capture. It concludes with a discussion of Web 2.0 issues, and a range of related case studies. The second part is more focussed on web resources within an institution. It offers advice about institutional drivers and policies for web archiving, along with suggestions for effecting change within an organisation; one such approach is the adoption of Information Lifecycle Management. There are separate appendices covering legal guidance (written by Jordan Hatcher) and records management.

The Handbook also contains a bibliography and a glossary of terms. The Handbook is aimed at an audience of information managers, asset managers, webmasters, IT specialists, system administrators, records managers, and archivists.

First Digital Curation Centre SCARP Case Study Released on Brain Image Preservation

The first Digital Curation Centre SCARP (Sharing Curation and Re-use Preservation) case study has been released: Curating Brain Images in a Psychiatric Research Group: Infrastructure and Preservation Issues.

Here's the description:

Curating neuroimaging research data for sharing and re-use involves practical challenges for those concerned in its use and preservation. These are exemplified in a case study of the Neuroimaging Group in the University of Edinburgh’s Division of Psychiatry. The study is one of the SCARP series, which encompasses two aims: firstly, to discover more about disciplinary approaches and attitudes to digital curation through 'immersion' in selected cases, in this instance drawing on ethnographic field study; secondly, to apply known good practice and, where possible, to identify new lessons from practice in the selected discipline areas, in this case using action research to assess risks to the long-term reusability of datasets and to identify challenges and opportunities for change.

Database Preservation: The International Challenge and the Swiss Solution

DigitalPreservationEurope has released Database Preservation: The International Challenge and the Swiss Solution.

Here's the abstract:

Most administrative records are stored in databases. Today’s challenge is preserving the information and making it accessible for years to come, ensuring knowledge transfer as well as administrative sustainability. Lack of standardization has hitherto rendered the task of archiving database content highly complex. The Swiss Federal Archives have developed a new XML-based format that permits long-term preservation of relational database content. The Software-Independent Archiving of Relational Databases (SIARD) format offers a unique solution for preserving data content and metadata, as well as the relations between them, in an ISO-conformant format.
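
To picture what such a format does, here's an illustrative Python sketch that serializes a table's structure and rows into XML. The element names are invented, not the actual SIARD schema:

    # Illustrative only: express a table's structure and rows as XML so
    # no DBMS is needed to read the archive. Element names are invented.
    import xml.etree.ElementTree as ET

    table = ET.Element("table", name="employees")
    columns = ET.SubElement(table, "columns")
    for col_name, col_type in [("id", "INTEGER"), ("name", "VARCHAR(100)")]:
        ET.SubElement(columns, "column", name=col_name, type=col_type)

    rows = ET.SubElement(table, "rows")
    for emp_id, emp_name in [(1, "Ada"), (2, "Grace")]:
        row = ET.SubElement(rows, "row")
        ET.SubElement(row, "id").text = str(emp_id)
        ET.SubElement(row, "name").text = emp_name

    ET.dump(table)  # prints the self-describing XML fragment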

Grant Awarded: DSpace Foundation and Fedora Commons for DuraSpace Planning

The DSpace Foundation and Fedora Commons have received a grant from the Andrew W. Mellon Foundation to support planning for DuraSpace.

Here's an excerpt from the press release:

Over the next six months funding from the planning grant will allow the organizations to jointly specify and design "DuraSpace," a new web-based service that will allow institutions to easily distribute content to multiple storage providers, both "cloud-based" and institution-based. The idea behind DuraSpace is to provide a trusted, value-added service layer to augment the capabilities of generic storage providers by making stored digital content more durable, manageable, accessible and sharable.
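
A value-added layer over multiple storage providers usually means one interface with pluggable back ends. Here's a hypothetical Python sketch of that shape (not DuraSpace code), where a single store() call replicates content to every configured provider:

    # Hypothetical storage-abstraction sketch in the DuraSpace spirit:
    # one store() call writes through to every configured provider.
    class LocalStore:
        def __init__(self):
            self.blobs = {}

        def put(self, key, data):
            self.blobs[key] = data

    class CloudStore:  # stand-in for a real cloud provider's client
        def __init__(self):
            self.blobs = {}

        def put(self, key, data):
            self.blobs[key] = data

    class DurableStorage:
        """Write-through layer that replicates to all providers."""
        def __init__(self, providers):
            self.providers = providers

        def store(self, key, data):
            for provider in self.providers:
                provider.put(key, data)  # survives loss of any one provider

    storage = DurableStorage([LocalStore(), CloudStore()])
    storage.store("thesis-2008.pdf", b"%PDF-...")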

Michele Kimpton, Executive Director of the DSpace Foundation, said, "Together we can leverage our expertise and open source value proposition to continue to provide integrated open solutions that support the scholarly mission of universities."

Sandy Payette, Executive Director of Fedora Commons, observes, "There is an important role for high-tech non-profit organizations in adding value to emerging cloud solutions. DuraSpace is designed with an eye towards enabling universities, libraries, and other types of organizations to take advantage of cloud storage while also addressing special requirements unique to areas such as digital archiving and scholarly communication."

The grant from the Mellon Foundation will support a needs analysis, focus groups, technical design sessions, and meetings with potential commercial partners. A working web-based demonstration will be completed during the six-month grant period to help validate the technical and business assumptions behind DuraSpace.

Digital Preservation: Two-Year JHOVE2 Project Funded

The National Digital Information Infrastructure and Preservation Program has funded the two-year JHOVE2 project, which will "develop a next-generation JHOVE2 architecture for format-aware characterization." Project participants are the California Digital Library, Portico, and Stanford University.

Here's an excerpt from the Digipres announcement:

Among the enhancements planned for JHOVE2 are:

  • Support for four specific aspects of characterization: signature-based identification, feature extraction, validation, and rules-based assessment (see the sketch after this list)
  • A more sophisticated data model supporting complex multi-file objects and arbitrarily-nested container objects
  • Streamlined APIs to facilitate the integration of JHOVE2 technology in systems, services, and workflows
  • Increased performance
  • Standardized error handling
  • A generic plug-in mechanism supporting stateful multi-module processing
  • Availability under the BSD open source license
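
As a rough illustration of how those four aspects of characterization might compose, here's a hypothetical Python sketch in which modules run in sequence and accumulate results; the interfaces are invented, not the JHOVE2 design:

    # Invented interfaces, not the JHOVE2 design: each module inspects
    # the object and adds its findings to a shared result dictionary.
    def identify(data, result):
        result["format"] = "PDF" if data.startswith(b"%PDF-") else "unknown"

    def extract_features(data, result):
        result["size_bytes"] = len(data)

    def validate(data, result):
        result["well_formed"] = result["format"] != "unknown"

    def assess(data, result):
        # Rules-based assessment: apply local policy to extracted facts.
        result["acceptable"] = result["well_formed"] and result["size_bytes"] > 0

    def characterize(data, modules=(identify, extract_features, validate, assess)):
        result = {}
        for module in modules:  # stateful multi-module processing
            module(data, result)
        return result

    print(characterize(b"%PDF-1.4 ..."))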

To help focus project activities we have recruited a distinguished advisory board to represent the interests of the larger stakeholder community. The board includes participants from the following international memory institutions, projects, and vendors:

  • Deutsche Nationalbibliothek (DNB)
  • Ex Libris
  • Fedora Commons
  • Florida Center for Library Automation (FCLA)
  • Harvard University / GDFR
  • Koninklijke Bibliotheek (KB)
  • MIT/DSpace
  • National Archives (TNA)
  • National Archives and Records Administration (NARA)
  • National Library of Australia (NLA)
  • National Library of New Zealand (NLNZ)
  • Planets project

The project partners are currently engaged in a public needs assessment and requirements gathering phase. A provisional set of use cases and functional requirements has already been reviewed by the JHOVE2 advisory board. . . .

The functional requirements, along with other project information, are available on the JHOVE2 project wiki. Feedback on project goals and deliverables can be submitted through the JHOVE2 public mailing lists.

Ex Libris Digital Preservation System Live at the National Library of New Zealand

After completing a successful beta test, the National Library of New Zealand has started using the Ex Libris Digital Preservation System in production mode. (Thanks to Library Technology Guides.)

Here's an excerpt from the press release:

Based on the Open Archival Information System (OAIS) model and conforming to trusted digital repository (TDR) requirements, the Ex Libris Digital Preservation System provides institutions with the infrastructure and technology needed to preserve and facilitate access to the collections under their guardianship.

The understanding that preservation and access belong together—that they are not mutually exclusive entities—dictated a design in which preservation support is built directly into the platform rather than serving as an add-on feature. This end-to-end solution offers full security, auditing, replication, and integrity checks that maintain the safety of collections over time, while persistent identifier tools and standard APIs (application programming interfaces) enable institutions to make their collections easily accessible to users.
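
The integrity checks mentioned here are, at bottom, fixity audits: recompute each stored object's checksum and compare it against the value recorded at ingest. Here's a minimal Python sketch with hypothetical paths and manifest values (not Ex Libris code):

    # Minimal fixity-audit sketch; the manifest and path are invented.
    import hashlib

    def sha256(path):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Checksums recorded at ingest time (values shortened for display).
    ingest_manifest = {"objects/report.pdf": "9f8e..."}

    def audit(manifest):
        # Returns {path: True/False}; False flags a damaged or altered object.
        return {path: sha256(path) == recorded
                for path, recorded in manifest.items()}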

The National Library of New Zealand is using the highly configurable and scalable Digital Preservation System to collect a range of digital material types from a wide variety of sources (such as publishers, government agencies, and Web sites in the New Zealand domain); to review, validate, and organize such materials; and to make them available to end users in accordance with user access rights. Risk analysis and conversion tools enable the system to provide meaningful access to the digital objects over time. The integration of the system with other National Library of New Zealand applications is facilitated by a built-in software development kit and the suite of APIs.

December 2008 will see the general release of the Digital Preservation System by Ex Libris Group.