Report on Library of Congress/San Diego Supercomputer Center Data Transfer and Storage Tests

The Library of Congress has published Data Center for Library of Congress Digital Holdings: A Pilot Project; Final Report.

Here's an excerpt from the "Introduction":

Between May 2006 and October 2007, the Library of Congress (LC) and the San Diego Supercomputer Center (SDSC) conducted data-transfer and storage tests. At the heart of the project was the issue of trust, specifically how the LC could trust SDSC to reliably store several terabytes of the LC’s data. By what means could SDSC prove to the LC that the data was intact, preserved, and well-cared for? What tests could the LC devise, and what metrics could SDSC produce, to guarantee the integrity of their remotely stored data?

The two main objectives of the project were:

  • For SDSC to host LC content reliably and return it intact at the end of the project
  • For LC to be able to remotely access, process, analyze, and manage that content . . . .

Inspired by SDSC’s staggering technological potential, the LC had devised several scenarios for the data tests. But ultimately, as the project progressed, the LC opted to keep its goals simple: data transfer, storage, and file manipulation. In the end, both partners were happy with the project’s success. The project also produced lessons and unexpected results, some of which will have deep implications for all cultural institutions regarding transfer and storage of their digital assets.

Institutional Repositories, Tout de Suite

Institutional Repositories, Tout de Suite, the latest Digital Scholarship publication, is designed to give the reader a very quick introduction to key aspects of institutional repositories and to foster further exploration of this topic through liberal use of relevant references to online documents and links to pertinent websites. It is under a Creative Commons Attribution-Noncommercial 3.0 United States License, and it can be freely used for any noncommercial purpose in accordance with the license.

Columbia University Libraries and Bavarian State Library Become Google Book Search Library Partners

Both the Columbia University Libraries and Bavarian State Library have joined the Google Book Search Library Project.

Here are the announcements:

Zotero/Internet Archive Alliance to Create Zotero Commons

Dan Cohen has announced a partnership between the Center for History and New Media's Zotero project and the Internet Archive that will create the Zotero Commons, a repository for scholarly materials, as well as personal, restricted-access storage for scholars. The Andrew W. Mellon Foundation is supporting the project.

University of Michigan Libraries Release the UMich OAI Toolkit

The University of Michigan Libraries have released the UMich OAI Toolkit.

Here's an excerpt from the announcement:

This toolkit contains both a harvester and a data provider, written in Perl. . . .

UMHarvester is a robust tool that uses LWP to harvest from nigh on every OAI data provider available. It allows for incremental harvesting, has multiple re-try options, and includes a batch harvest tool (Batch_UMHarvest) that can automatically perform incremental harvesting.

UMProvider relies heavily on libxml (XML::LibXML) and will store the data in nearly any relational database. It functions by harvesting from a database of records, making rights determinations from a separate database, and providing the resulting set of records.

Originally, only the UMHarvester was available from UM's DLXS software site. The UMProvider tool is newly developed and takes the place of our DLXS data provider tool.
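Incremental harvesting of the kind UMHarvester performs is plain OAI-PMH: the date of the previous harvest goes in the `from` argument, and partial responses continue via the `resumptionToken`, which the protocol makes an exclusive argument. A minimal sketch of that request logic in Python (rather than the toolkit's Perl; the base URL below is invented):

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL.

    Incremental harvesting passes the date of the previous harvest as
    'from'; to continue a partial response, only the resumptionToken
    is sent (it is an exclusive argument in OAI-PMH).
    """
    args = {"verb": "ListRecords"}
    if token:
        args["resumptionToken"] = token
    else:
        args["metadataPrefix"] = metadata_prefix
        if from_date:
            args["from"] = from_date
    return base_url + "?" + urlencode(args)

def extract_resumption_token(response_xml):
    """Return the resumptionToken from a ListRecords response, or None."""
    root = ET.fromstring(response_xml)
    el = root.find(".//" + OAI_NS + "resumptionToken")
    return el.text if el is not None and el.text else None

# A trimmed-down example response, as a data provider might return it.
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester then loops: fetch the URL, extract the token, and re-request with only the token until none is returned.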

Stable Version of SPECTRa Released: Software for Depositing Chemical Data into Repositories

A stable version of SPECTRa has been released. SPECTRa is designed to facilitate the deposit of chemical data into digital repositories.

The JISC-funded SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data in a Digital Repository for the Chemical Community) project's final report is also available.

University of Maryland Libraries Digital Collections Launched

The University of Maryland Libraries have launched their Digital Collections repository.

Here's an excerpt from the announcement:

This release marks two and a half years of work in the creation of a repository that serves the teaching and research mission of the University of Maryland Libraries. Many of the objects are digital versions from Maryland's Special Collections (such as A Treasury of World's Fairs Art and Architecture) or are new virtual collections (The Jim Henson Works). Other collections (such as Films@UM) support the teaching mission of the Libraries. This release also marks the integration of electronically available finding aids, ArchivesUM, into the repository architecture, creating a framework for digital objects to be dynamically discovered from finding aids.

The repository is based on the Fedora platform and uses Lucene for indexing and Helix for streaming video. The repository features almost 2,500 digital objects, with new objects added monthly. Object types currently delivered include full text (both TEI and EAD), video, and images. Objects can be discovered within a collection context or via a search across multiple collections. Cross-collection discovery is achieved through a common metadata scheme and controlled vocabulary. This metadata scheme also allows individual collections to have more granular, domain-specific metadata.
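Supporting a common scheme for cross-collection search while keeping richer domain-specific metadata usually comes down to a crosswalk per collection. A rough sketch of the idea (all field names here are hypothetical illustrations, not UMD's actual scheme):

```python
# Shared fields every collection maps into (hypothetical names).
COMMON_FIELDS = ("title", "creator", "date")

def to_common_record(collection, record, crosswalk):
    """Flatten one collection's granular metadata into the shared scheme.

    crosswalk maps common field names to that collection's local field
    names; granular fields with no common equivalent are kept under
    'extra' so collection-specific views can still use them.
    """
    common = {"collection": collection}
    for field in COMMON_FIELDS:
        local = crosswalk.get(field)
        common[field] = record.get(local) if local else None
    extra = {k: v for k, v in record.items() if k not in crosswalk.values()}
    return {"common": common, "extra": extra}

# Example: a film record whose 'director' plays the role of 'creator'.
films_crosswalk = {"title": "film_title", "creator": "director",
                   "date": "release_year"}
example = to_common_record(
    "films",
    {"film_title": "Sample Film", "director": "Henson, Jim",
     "release_year": "1968", "runtime": "90 min"},
    films_crosswalk,
)
```

A cross-collection search indexes only the `common` part; a collection-specific view can still surface fields such as `runtime` from `extra`.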

An FAQ for the repository is available.

National Science Digital Library Releases Initial Fedora-based NCore Components

The National Science Digital Library Core Integration team at Cornell University has released a partial version of NCore, a "general platform for building semantic and virtual digital libraries united by a common data model and interoperable applications," which is built upon Fedora.

Here's an excerpt from the NSDL posting:

The NCore platform consists of a central repository built on top of Fedora, a data model, an API, and a number of fundamental services such as full-text search or OAI-PMH. Innovative NSDL services and tools that empower users as content creators are now built on, or transitioning to, the NCore platform. These include: the Expert Voices blogging system (http://expertvoices.nsdl.org/); the NSDL Wiki (http://wiki.nsdl.org/index.php/NSDL_Wiki); the NSDL OAI-PMH metadata ingest aggregation system; the OAI-PMH service for distributing public NSDL metadata; the NSDL Collection System (NCS), derived from the DLESE Collection system (DCS); the NSDL Search service; and the OnRamp content management and distribution system (http://onramp.nsdl.org).

Because NCore is a general Fedora-based open source platform useful beyond NSDL, Core Integration developers at Cornell University have made the repository and API code components of NCore available for download at the NCore project on Sourceforge (http://sourceforge.net/projects/nsdl-core). Over the next six months, NSDL will release the code for major tools and services that comprise the full NCore suite on SourceForge.

For further information, see the NCore presentation.

Towards the Australian Data Commons: A Proposal for an Australian National Data Service

The Australian eResearch Infrastructure Council has released Towards the Australian Data Commons: A Proposal for an Australian National Data Service.

Here's an excerpt from the "Overview":

This paper is designed to encourage, inform and ultimately summarise the discussions around the appropriate strategic and technical descriptions of the Australian National Data Service; to fill in the outline in the Platforms for Collaboration investment plan.

To do so, the paper:

  • introduces the Australian National Data Service (ANDS) and the driving forces behind its creation;
  • provides a rationale for the services that ANDS will provide, and the programs through which the services will be offered; and
  • describes in detail the ANDS programs.

Part One (Background) provides a brief summary of the reasons to focus on data management, as well as an overview of ANDS, and identifies some issues associated with implementation.

Part Two (Rationale) sets out the systemic issues associated with achieving a research data commons, and provides the resultant rationale for the services that ANDS will offer and the programs through which they will be delivered.

Part Three (Detailed Descriptions of ANDS Programs) sets out in detail the Aim, Focus, Service Beneficiaries, Products and Community Engagement activities for each of the ANDS Programs.

Fedora Meets Web 2.0: Repository Redux Presentation from Access 2007

A digital video of Mark Leggott's (University Librarian, University of Prince Edward Island) presentation from Access 2007 is now available.

Here's an excerpt from the program that describes the talk:

The University of Prince Edward Island has embarked on a substantial project to support the institution's Administrative, Learning and Research communities using a Web 2.0/3.0 framework and the Fedora/Drupal/Moodle systems as the foundation. The session will describe the architecture and demo some of the core systems, such as Learn@UPEI, UPEI VRE (Virtual Research Environment) and some sample digital library collections.

Primary Research Group Publishes International Institutional Repository Survey

The Primary Research Group has published The International Survey of Institutional Digital Repositories. Paper and PDF versions are available at $89.50 each.

Here's an excerpt from the press release:

The study presents data from 56 institutional digital repositories from eleven countries, including the USA, Canada, Australia, Germany, South Africa, India, Turkey and other countries. The 121-page study presents more than 300 tables of data and commentary and is based on data from higher education libraries and other institutions involved in institutional digital repository development. . . .

Close to 41% of survey participants purchased software to develop their digital repositories. US-based institutions were much more likely than others to purchase software for this purpose. . . .

On average, slightly more than 12% of the content in the repositories came from pre-existing repositories maintained by academic departments or some other institutional unit.

A sixth of the libraries in the sample used Digital Commons software, and 28% of US-based repositories used this product. . . .

Those repositories in the sample that required less than 500 hours of labor per year had budgets of just less than $9,000 US. The largest repositories, those requiring 3,600 hours or more annually, had budgets averaging $145,444. 5.21% of the overall labor required to run the digital repositories in the sample came from academic departments not connected to the library. . . .

The mean number of journal articles held by the repositories in the sample was 772, with a median of 162. . . .

15.56% of the repositories in the sample were funded largely through grants.

Version 1.0 of SWORD, A Smart Deposit Tool for Repositories, Has Been Released

Version 1.0 of SWORD has been released. The release includes DSpace (1.5 only) and Fedora implementations, GUI/CLI clients, and the common Java library.

Here's an excerpt from the SWORD Wiki that describes the project:

SWORD (Simple Web-service Offering Repository Deposit) will take forward the Deposit protocol developed by a small working group as part of the JISC Digital Repositories Programme by implementing it as a lightweight web-service in four major repository software platforms: EPrints, DSpace, Fedora and IntraLibrary. The existing protocol documentation will be finalised by project partners and a prototype 'smart deposit' tool will be developed to facilitate easier and more effective population of repositories. The project intends to take an iterative approach to developing and revising the protocol, web-services and client implementation through evaluative testing and feedback mechanisms. Community acceptance and take-up will be sought through dissemination activities. The project is led by UKOLN, University of Bath, with partners at the University of Wales, Aberystwyth, the University of Southampton and Intrallect Ltd. The project aims to improve the efficiency and quality of repository deposit and to diversify and expedite the options for timely population of repositories with content whilst promoting a common deposit interface and supporting the Information Environment principles of interoperability.
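Under the hood, a SWORD deposit is an Atom Publishing Protocol POST of a packaged file to a repository collection URI. The sketch below only assembles such a request; the header names follow later SWORD 1.x revisions of the profile, and the collection URI and packaging identifier are illustrative values, not fixed by the protocol:

```python
import hashlib

def build_sword_deposit(collection_uri, package_bytes, filename,
                        packaging="http://purl.org/net/sword-types/METSDSpaceSIP"):
    """Assemble (but do not send) a SWORD package deposit request.

    SWORD profiles AtomPub: the client POSTs a package to a collection
    URI discovered from the service document, and the checksum lets
    the server verify the transfer arrived intact.
    """
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": "filename=" + filename,
        "Content-MD5": hashlib.md5(package_bytes).hexdigest(),
        "X-Packaging": packaging,  # header name from later SWORD 1.x revisions
    }
    return ("POST", collection_uri, headers, package_bytes)

# Invented collection URI for illustration.
method, url, headers, body = build_sword_deposit(
    "http://repo.example.edu/sword/deposit/collection-1",
    b"zip-bytes-here", "item.zip")
```

A real client would then send this request and parse the Atom entry the server returns, which identifies the created item.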

Development Pack about Managing Intellectual Property Rights for Digital Learning Materials in Repositories

The TrustDR (Trust in Digital Repositories) Digital Repository Project's Managing Intellectual Property Rights in Digital Learning Materials: A Development Pack for Institutional Repositories is available. The publication, which was the final output of the JISC-funded project, is under a Creative Commons Attribution License.

Here's an excerpt from the "Executive Introduction and Summary":

What is this pack for?

  • To help clarify and update IPR policy for the management and use of digital learning materials created within institutions and develop a sustainable infrastructure (human, technical, educational and organisational) for the effective use of e-learning particularly in support of delivering a more flexible curriculum.

Who is this pack aimed at?

  • Senior management with responsibilities in this area and those supporting them, individuals and teams tasked with overhauling institutional IPR policy, managers and consultants etc who are interested in developing viable e-learning infrastructures, managers of e-learning projects and those involved in planning for projects, partnerships and collaborations, people with a general interest in this increasingly important aspect of e-learning.

DigitalPreservationEurope Publishes Report on Copyright and Privacy Issues for Cooperating Repositories

DigitalPreservationEurope has published PO3.4: Report on the Legal Framework on Repository Infrastructure Impacting on Cooperation Across Member States.

Here's an excerpt from the "Introduction":

The focus of this paper is the legal framework for the management of the content of cooperating repositories, in particular the regulation of copyright and the protection of personal data. That copyright is important when managing data repositories is common knowledge. However, there is an increasing tendency among authors to deposit not only their published scientific work, scientific articles, dissertations or books, but also the underlying data. In addition, ordinary publicly available sources like Internet web pages contain personal data, often of a sensitive nature. Due to this emergent trend, repositories will have to comply with the rules governing the use and protection of personal data, especially in the medical and social sciences.

The scenario is the following:

  • National repositories acquire material from different sources and in different formats.
  • The repositories cooperate with repositories in other countries in the preservation of data.
  • There is some degree of specialisation: some repositories specialise in preserving certain formats, and others in the preservation of other formats.

This paper describes the legal framework regulating the two decisive actions which have to take place if this scenario is to become a reality:

  1. The reproduction of data
  2. The transfer of data to other repositories

Other copyright issues like the rules concerning communication with the public and the protection of databases will also be touched upon.

Boston Public Library/Open Content Alliance Contract Made Public

Boston Public Library has made public its digitization contract with the Open Content Alliance.

Some of the most interesting provisions include:

  • the intent of the Internet Archive to provide perpetual free and open access to the works;
  • the digitization cost arrangements (BPL pays for transport and provides bibliographic metadata; the Internet Archive pays for digitization-related costs);
  • the specification of file formats (e.g., JPEG 2000, color PDF, and various XML files);
  • the provision of digital copies to BPL (copies are available immediately after digitization for BPL to download via FTP or HTTP within 3 months); and
  • the use of copies (any use by either party as long as provenance metadata and/or bookplate data is not removed).

Open-Source IRStats Released: Use Statistics for EPrints and DSpace

Eprints.org has released IRStats, an open source use statistics analysis package that analyzes both EPrints (versions 2 and 3) and DSpace (beta functionality) logs. The software is under a BSD license, and it requires Perl, awstats, MySQL, Maxmind Organisation Database, ChartDirector, and a CGI-capable Web server.

A description of IRStats features is available as well as examples of its use. For additional information on the project, see "Introduction to IRS."
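At its core, a package like this attributes successful full-text requests in the web server log to repository items. A toy version of that step (the log lines are Apache-style, and the `/<eprint-id>/<position>/<file>` URL layout is an assumption for illustration; IRStats itself works through awstats and MySQL rather than a regex like this):

```python
import re
from collections import Counter

# Apache-style access log line; only the fields we need are captured.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')
# Assumed URL layout: full texts live at /<eprint-id>/<position>/<file>.
FULLTEXT = re.compile(r"^/(?P<eprint>\d+)/\d+/")

def count_downloads(lines):
    """Count successful (HTTP 200) full-text downloads per eprint ID."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "200":
            continue
        ft = FULLTEXT.match(m.group("path"))
        if ft:
            counts[ft.group("eprint")] += 1
    return counts

sample_log = [
    '1.2.3.4 - - [01/Dec/2007:10:00:00 +0000] "GET /123/1/paper.pdf HTTP/1.1" 200',
    '1.2.3.4 - - [01/Dec/2007:10:01:00 +0000] "GET /123/ HTTP/1.1" 200',  # abstract page, not a download
    '5.6.7.8 - - [01/Dec/2007:10:02:00 +0000] "GET /123/1/paper.pdf HTTP/1.0" 404',
    '5.6.7.8 - - [01/Dec/2007:10:03:00 +0000] "GET /45/2/data.zip HTTP/1.1" 200',
]
```

Real packages additionally filter out robots, deduplicate repeat hits from one visitor, and geolocate IPs, which is where the awstats and Maxmind dependencies come in.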

DSpace 1.5 Alpha Released

The 1.5 alpha version of the popular DSpace repository software has been released.

Here's an excerpt from "DSpace 1.5 Alpha with Experimental Binary Distribution" by Richard Jones:

There are big changes in this code base, both in terms of functionality and organisation. First, we are now using Maven to manage our build process, and have carved the application into a set of core modules which can be used to assemble your desired DSpace instance. . . .

The second big and most exciting thing is that Manakin is now part of our standard distribution, and we want to see it taking over from the JSP UI over the next few major releases. . . .

In addition to this, we have an Event System which should help us start to decouple tightly integrated parts of the repository. . . . Browsing is now done with a heavily configurable system . . . . Tim Donohue's much desired Configurable Submission system is now integrated with both JSP and Manakin interfaces and is part of the release too.

Further to this we have a bunch of other functionality including: IP Authentication, better metadata and schema registry import, move items from one collection to another, metadata export, configurable multilingualism support, Google and HTML sitemap generator, Community and Sub-Communities as OAI Sets, and Item metadata in XHTML head <meta> elements.
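The last item, metadata in XHTML head elements, makes an item's descriptive metadata visible to crawlers and reference tools on its display page. A small sketch of the general technique (the `DC.`-prefixed names follow the common convention for embedding Dublin Core in HTML, not necessarily DSpace's exact output):

```python
from xml.sax.saxutils import quoteattr

def head_meta_elements(metadata):
    """Render a dict of field -> list-of-values as XHTML <meta> elements.

    Repeating the element once per value (rather than joining values)
    keeps multi-valued fields such as creators machine-readable.
    """
    tags = []
    for name, values in metadata.items():
        for value in values:
            tags.append("<meta name=%s content=%s />"
                        % (quoteattr(name), quoteattr(value)))
    return "\n".join(tags)

html_head = head_meta_elements({
    "DC.title": ["A Sample Item"],
    "DC.creator": ["Smith, A.", "Jones, B."],
})
```

The resulting elements are dropped into the page's `<head>` by the UI layer, one per metadata value.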

A Study of Curation and Preservation Issues in the eCrystals Data Repository and Proposed Federation

JISC's eBank UK project, which is now in phase three, has released A Study of Curation and Preservation Issues in the eCrystals Data Repository and Proposed Federation, which addresses key issues related to the establishment of the eCrystals Federation.

Here's an excerpt from "eBank Phase 3: Transitioning to the eCrystals Federation" that explains the overall project:

This project will progress the establishment of a global Federation of data repositories for crystallography by performing a scoping study into the feasibility of constructing a network of data repositories: the eCrystals Federation. The Federation approach is presented as an innovative domain model to promote Open Access to data more widely and to facilitate take-up.

It builds on the work of the eBank project and has links to the Repository for the Laboratory (R4L), SPECTRa, and SMART Tea projects in chemistry. The Federation will contribute to the development of a digital repository e-infrastructure for research and will inform the Repository Support Project. . . .

In Phase 3, partners will assess organisational issues and promote advocacy, examine interoperability associated with research workflow and data deposit, harmonise the metadata application profiles from repositories operating on different platforms (EPrints, DSpace & ReciprocalNet), investigate aggregation issues arising from harvesting metadata from repositories situated within the information environments developed in other countries (EU, USA & Australia) and scope the issues of the Federation of institutional archives interoperating with an international subject archive (IUCr).

Brewster Kahle on Libraries Going Open

Brewster Kahle's "Libraries Going Open" document provides some details on where the Internet Archive and the Open Content Alliance are going with projects involving mass digitization of microfilm, mass digitization of journals, ILL of scanned out-of-print books, scanning books on demand, and other areas.

RUBRIC Toolkit: Institutional Repository Solutions Released

The RUBRIC Project has released the RUBRIC Toolkit: Institutional Repository Solutions.

Here's an excerpt from RUBRIC Toolkit: About the RUBRIC Project and the Toolkit page:

The RUBRIC Toolkit is a legacy of the RUBRIC Project, reflecting the discussions, investigation, phases, processes, issues and experiences surrounding the implementation of an Institutional Repository (IR). The sections are based on the collaborative experience of the eight Australian and New Zealand Universities involved in the project.

The content for the RUBRIC Toolkit developed organically and collaboratively in the project wiki over an extended period of time. It was then refined and developed. Project members have populated the Toolkit with useful resources and tools that can be used by other Project Managers and Institutions implementing an IR.

The RUBRIC Toolkit was released in October 2007 and will continue to be updated until the end of the RUBRIC Project in December 2007. As such the Toolkit captures the "best" of available advice, experience and outcomes available for IR development in 2007 and provides links to further reading wherever possible.