DSpace Foundation and Fedora Commons Announce Decision to Collaborate

The DSpace Foundation and Fedora Commons have announced that they will collaborate on future digital repository initiatives.

Here's an excerpt from the press release:

Today two of the largest providers of open source software for managing and providing access to digital content, the DSpace Foundation and Fedora Commons, announced plans to combine strengths to work on joint initiatives that will more closely align their organizations' goals and better serve both open source repository communities in the coming months. . . .

The collaboration is expected to benefit over 500 organizations from around the world who are currently using either DSpace (examples include MIT, Rice University, Texas Digital Library and University of Toronto) or Fedora (examples include the National Library of France, New York Public Library, Encyclopedia of Chicago and eSciDoc) open source software to create repositories for a wide variety of purposes. . . .

The decision to collaborate came out of meetings held this spring where members of DSpace and Fedora Commons communities discussed multiple dimensions of cooperation and collaboration between the two organizations. Ideas included leveraging the power and reach of open source knowledge communities by using the same services and standards in the future. The organizations will also explore opportunities to provide new capabilities for accessing and preserving digital content, developing common web services, and enabling interoperability across repositories.

In the spirit of advancing open source software, Fedora Commons and DSpace will look at ways to leverage and incubate ideas, community and culture to:

  1. Provide the best technology and services to open source repository framework communities.
  2. Evaluate and synchronize, where possible, both organizations' technology roadmaps to enable convergence and interoperability of key architectural components.
  3. Demonstrate how the DSpace and Fedora open source repository frameworks offer a unique value proposition compared to proprietary solutions.

The announcement came on the heels of an event sponsored by the Joint Information Systems Committee's (JISC) Common Repository Interface Group (CRIG) held at the Library of Congress. The event, known as "RepoCamp," was a forum where developers gathered to discuss innovative approaches to improving interoperability and web-orientation for digital repositories. Sandy Payette, Executive Director of Fedora Commons, and Michele Kimpton, Executive Director of the DSpace Foundation, reiterated their commitment to collaboration and encouraged input and participation from both communities as work gets underway.

DSpace Can Support One Million Items

A paper by researchers from the National Library of Medicine ("Testing the Scalability of a DSpace-based Archive") finds that DSpace can support an archive with a million items. The tested system "is built upon MIT's DSpace software (Version 1.4), with some modifications and enhancements to better facilitate batch based processing."

Here's an excerpt from the conclusion:

We conclude that the version of DSpace used in SPER (with MySQL database) shows acceptable ingest performance for a million-item archive. . . .

The experimental results shown here pertain to items with mostly one or two monochrome TIFF images, though a few items have up to 100 images. However, a number of inferences may be derived from these results.

  • No real problems were found in ingesting a million items to the archive, using a Sun X4500 server machine, in terms of either performance or reliability of the SPER/DSpace software architecture and implementation. . . .
  • With the increase in archive size, the average ingest time of an item increases in a smooth and predictable way.
  • With increasing number of TIFF images, the ingest time (per item) increases by three to four percent for each additional image.
  • If color TIFF images were used, the ingest times would increase slightly due to the overhead of copying additional data to the upload area, and to the archive's asset storage. However, other archival overheads should not change.
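The per-image figure can be turned into a rough estimate. The sketch below assumes the increase is linear: each image beyond the first adds a fixed 3 to 4 percent of a one-image baseline. That is one reading of the excerpt, not a model stated in the paper, and the baseline time is a hypothetical input.

```python
def estimated_ingest_time(baseline_seconds: float, images: int, rate: float = 0.035) -> float:
    """Estimate per-item ingest time for an item with `images` TIFF images.

    Assumption (not from the paper): each image beyond the first adds a fixed
    fraction `rate` of the one-image baseline, i.e. growth is linear, not compounding.
    """
    if images < 1:
        raise ValueError("an item needs at least one image")
    return baseline_seconds * (1 + rate * (images - 1))


if __name__ == "__main__":
    # Hypothetical 2-second baseline; compare the low (3%) and high (4%) ends of the range.
    for n in (1, 2, 100):
        low = estimated_ingest_time(2.0, n, 0.03)
        high = estimated_ingest_time(2.0, n, 0.04)
        print(f"{n:>3} images: {low:.2f}-{high:.2f} s")
```

With that hypothetical 2-second baseline, a 100-image item comes out at roughly 7.9 to 9.9 seconds, about four to five times the single-image time.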

Foresite Project OAI-ORE Resource Maps Software

The Foresite Project has released the foresite-toolkit.

Here's an excerpt from the announcement (footnotes removed):

The Foresite project is pleased to announce the initial code of two software libraries for constructing, parsing, manipulating and serialising OAI-ORE Resource Maps. These libraries are being written in Java and Python, and can be used generically to provide advanced functionality to OAI-ORE aware applications, and are compliant with the latest release (0.9) of the specification. The software is open source, released under a BSD licence, and is available from a Google Code repository . . . .

Foresite is a JISC funded project which aims to produce a demonstrator and test of the OAI-ORE standard by creating Resource Maps of journals and their contents held in JSTOR, and delivering them as ATOM documents via the SWORD interface to DSpace. DSpace will ingest these resource maps, and convert them into repository items which reference content which continues to reside in JSTOR. The Python library is being used to generate the resource maps from JSTOR and the Java library is being used to provide all the ingest, transformation and dissemination support required in DSpace.
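To make the idea of a Resource Map concrete, here is a minimal sketch that builds one by hand with Python's standard library. It does not use the foresite-toolkit's own API, and it serialises to RDF/XML rather than the Atom form Foresite delivers through SWORD. The structure follows the ORE model: the Resource Map `ore:describes` an Aggregation, which `ore:aggregates` its parts. All example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"

# Register prefixes so the output uses rdf: and ore: instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)


def resource_map(rem_uri: str, aggregation_uri: str, parts: list[str]) -> str:
    """Serialise a minimal OAI-ORE Resource Map as RDF/XML.

    The Resource Map (rem_uri) ore:describes the Aggregation (aggregation_uri),
    which ore:aggregates each URI in `parts`.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    rem = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": rem_uri})
    ET.SubElement(rem, f"{{{ORE}}}describes", {f"{{{RDF}}}resource": aggregation_uri})
    agg = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": aggregation_uri})
    ET.SubElement(agg, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ORE + "Aggregation"})
    for part in parts:
        ET.SubElement(agg, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": part})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical journal issue aggregating two articles that stay at their source.
    print(resource_map(
        "http://example.org/rem/issue-1",
        "http://example.org/aggregation/issue-1",
        ["http://example.org/article/1", "http://example.org/article/2"],
    ))
```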

DSpace Foundation and Fedora Commons Investigate Joint Collaboration

The DSpace Foundation and the Fedora Commons have been recently investigating the possibility of joint collaboration.

Here's an excerpt from a DSpace-General message:

Over the last few weeks, we (Michele Kimpton and Sandy Payette) have been discussing the possibilities of our organizations collaborating. . . .

Over the past couple of weeks, we have had informal discussions with members of our communities, leaders in libraries and higher education, and Board members to get initial feedback as to whether they would support collaboration and the outcomes they would like to see as a result.

This past week, we convened members of both communities during the PASIG conference to get input and ideas regarding a collaboration.

Thus far, all of the stakeholders we have had the opportunity to talk with have been extremely supportive and excited about the possibility of the Fedora and DSpace communities working together in some capacity.

As a result of these discussions, we have agreed to move forward in our exploration of collaborative possibilities. Over the next several weeks our organizations will meet to plan the next steps in the process. Our intent is to bring together the ideas and expertise within both communities to come up with the most compelling issues to work on to best serve our communities.

DSpace Version 1.5 Released

Version 1.5 of DSpace, which is a major upgrade, has been released.

Here's an excerpt from the announcement:

The DSpace community is pleased to announce the release of DSpace 1.5! This is an important release of DSpace with many new features, including a completely new theme-able Manakin user interface, SWORD integration, many new configurable options, and scalability improvements. . . .

New Features:

  • Maven: DSpace 1.5 introduces a new Maven-based build system. Maven is a software tool from Apache that allows developers to compile and distribute software projects. Maven also enables DSpace to be more modular by arranging the software into sub-components. In addition, it makes customizations easier by giving developers the tools to maintain customizations, and provides the ability to manage new features as DSpace continues its accelerating growth rate. . . .
  • Manakin: Customize your repository look-and-feel with the new Manakin theme-able user interface. Manakin introduces a new modular framework, enabling an institution to customize their interface according to the specific needs of the particular repository, community, or collection. . . .
  • Light Network Interface: Integrate DSpace with legacy or local systems that need to manage content in the repository through the new Light Network Interface. This interface provides a programmatic mechanism to manage content within the repository through a WebDAV or SOAP based protocol. . . .
  • SWORD: Integrate with the new SWORD (Simple Web-service Offering Repository Deposit) protocol. Based upon the Atom Publishing Protocol, this interface allows for cross-repository deposit of new content. This protocol may enable future tools that will provide for 'one click' deposit. . . .
  • Browsing: The browsing system has been completely re-implemented to provide improved scalability and configuration. The new browsing system enables administrators to easily create new browse indexes. . . .
  • Submissions: The item submission system is now more configurable by managing the steps a user follows when submitting a new item to the repository. The new submission system allows for these steps to be rearranged, removed, and even allows for new steps to be added. . . .
  • Events: Another under-the-hood improvement introduced in DSpace 1.5 is the event system, which improves scalability and modularity by introducing an event model to the architecture. This feature will allow future add-ons to automatically manage content in the repository based upon when an object has been added, modified, or removed from the system.
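The event model described in the last item is essentially publish/subscribe: the core announces that an object was added, modified, or removed, and add-ons react without being wired into the code that made the change. DSpace itself is written in Java, so the Python sketch below only illustrates the general pattern; the `Event`, `EventBus`, `subscribe`, and `publish` names are hypothetical, not DSpace's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str        # "add", "modify" or "remove"
    object_id: str   # hypothetical identifier of the repository object


class EventBus:
    """Minimal publish/subscribe dispatcher (illustrative, not DSpace's Java API)."""

    def __init__(self) -> None:
        # Consumers registered per event kind.
        self._consumers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, kind: str, consumer: Callable[[Event], None]) -> None:
        self._consumers[kind].append(consumer)

    def publish(self, event: Event) -> None:
        # Deliver the event to every consumer interested in its kind.
        for consumer in self._consumers[event.kind]:
            consumer(event)


if __name__ == "__main__":
    # A hypothetical search-index add-on that tracks additions and removals.
    indexed: set[str] = set()
    bus = EventBus()
    bus.subscribe("add", lambda e: indexed.add(e.object_id))
    bus.subscribe("remove", lambda e: indexed.discard(e.object_id))
    bus.publish(Event("add", "item-1"))
    bus.publish(Event("add", "item-2"))
    bus.publish(Event("remove", "item-1"))
    print(sorted(indexed))  # ['item-2']
```

The point of the design is decoupling: the code that removes an item never needs to know that an index, a statistics module, or a preservation add-on is listening.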

E-Print Preservation: SHERPA DP: Final Report of the SHERPA DP Project

JISC has released SHERPA DP: Final Report of the SHERPA DP Project.

Here's an excerpt from the "Executive Summary":

The SHERPA DP project (2005–2007) investigated the preservation of digital resources stored by institutional repositories participating in the SHERPA project. An emphasis was placed on the preservation of e-prints—research papers stored in an electronic format, with some support for other types of content, such as electronic theses and dissertations.

The project began with an investigation of the method that institutional repositories, as Content Providers, may interact with Service Providers. The resulting model, framed around the OAIS, established a Co-operating archive relationship, in which data and metadata is transferred into a preservation repository subsequent to it being made available. . . .

The Arts & Humanities Data Service produced a demonstrator of a Preservation Service, to investigate the operation of the preservation service and accepted responsibility for the preservation of the digital objects for a three-year period (two years of project funding, plus one year).

The most notable development of the Preservation Service demonstrator was the creation of a reusable service framework that allows the integration of a disparate collection of software tools and standards. The project adopted Fedora as the basis for the preservation repository and built a technical infrastructure necessary to harvest metadata, transfer data, and perform relevant preservation activities. Appropriate software tools and standards were selected, including JHOVE and DROID as software tools to validate data objects; METS as a packaging standard; and PREMIS as a basis on which to create preservation metadata. . . .

A number of requirements were identified that were essential for establishing a disaggregated service for preservation, most notably some method of interoperating with partner institutions and the establishment of appropriate preservation policies. . . . In its role as a Preservation Service, the AHDS developed a repository-independent framework to support the EPrints and DSpace-based repositories, using OAI-PMH as a common method of connecting to partner institutions and extracting digital objects.
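For readers unfamiliar with OAI-PMH, a harvest is a series of HTTP GET requests. The first asks for records in a given metadata format, and each response may end with a `resumptionToken` that the next request must send on its own. The sketch below builds those request URLs and parses one response page; it does not reproduce the AHDS framework, the base URL is hypothetical, and it asks for `oai_dc`, the Dublin Core format every OAI-PMH repository must support.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    The first request names a metadataPrefix; follow-up requests send only the
    resumptionToken, which the protocol treats as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the record identifiers and the resumptionToken (None when done)."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(f"{OAI}identifier") for h in root.iter(f"{OAI}header")]
    token = root.find(f".//{OAI}resumptionToken")
    return ids, (token.text if token is not None and token.text else None)


# A trimmed, hypothetical response page (real records also carry metadata).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://repo.example.org/oai/request"))
    print(parse_list_records(SAMPLE))
```

A harvester loops: fetch, parse, and request again with the returned token until it comes back empty.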

Institutional Repositories, Tout de Suite

Institutional Repositories, Tout de Suite, the latest Digital Scholarship publication, is designed to give the reader a very quick introduction to key aspects of institutional repositories and to foster further exploration of this topic through liberal use of relevant references to online documents and links to pertinent websites. It is under a Creative Commons Attribution-Noncommercial 3.0 United States License, and it can be freely used for any noncommercial purpose in accordance with the license.

Rice University Releases Travelers in the Middle East Archive

Rice University has released the Travelers in the Middle East Archive (TIMEA) under a Creative Commons Attribution 2.5 Generic License.

Here's an excerpt from the announcement:

TIMEA provides access to:

  • Nearly 1,000 images, including stereocards, postcards and book illustrations
  • More than 150 historical maps representing the Middle East as it was in the 19th and early 20th centuries
  • Interactive geographical information systems (GIS) maps that serve as an interface to the collection and present detailed information about features such as waterways, elevation and populated places
  • Successive editions of classic travel guides and major museum collection catalogues
  • Convenient educational modules that set materials from the collection in historical and geographic context and explore the research process

TIMEA is able to offer seamless access for researchers by providing a common user interface to digital objects housed in three repositories. Texts, historical maps and images reside in DSpace, an open-source digital repository system. Educational research modules are presented within Connexions, an open-content commons and publishing platform for educational materials. TIMEA also uses Google Maps and ESRI’s ArcIMS map server.

New Release of BioMed Central's Open Repository, a Hosted Institutional Repository Service

BioMed Central has released version 1.4.9 of Open Repository, its DSpace-based, hosted institutional repository service.

Here's an excerpt from the press release:

Open Repository version 1.4.9 has several new features that are designed to enhance the customer experience. The release offers an improved user interface, making it easier for customers to browse and submit their material online. Additionally, institutions can convert their Word, Excel, PowerPoint, Text and RTF documents to PDF format. Customers can also set up RSS feeds, and customize lists and search fields, adding value to the already robust platform.

Version 1.0 of SWORD, A Smart Deposit Tool for Repositories, Has Been Released

Version 1.0 of SWORD has been released. The release includes DSpace (1.5 only) and Fedora implementations, GUI/CLI clients, and the common Java library.

Here's an excerpt from the SWORD Wiki that describes the project:

SWORD (Simple Web-service Offering Repository Deposit) will take forward the Deposit protocol developed by a small working group as part of the JISC Digital Repositories Programme by implementing it as a lightweight web-service in four major repository software platforms: EPrints, DSpace, Fedora and IntraLibrary. The existing protocol documentation will be finalised by project partners and a prototype 'smart deposit' tool will be developed to facilitate easier and more effective population of repositories. The project intends to take an iterative approach to developing and revising the protocol, web-services and client implementation through evaluative testing and feedback mechanisms. Community acceptance and take-up will be sought through dissemination activities. The project is led by UKOLN, University of Bath, with partners at the University of Wales, Aberystwyth, the University of Southampton and Intrallect Ltd. The project aims to improve the efficiency and quality of repository deposit and to diversify and expedite the options for timely population of repositories with content whilst promoting a common deposit interface and supporting the Information Environment principles of interoperability.
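A SWORD deposit is, at bottom, an HTTP POST of a package to a collection's deposit URI, with extra headers describing the package. The sketch below only builds such a request with the standard library; it does not send it. The header names (`X-Packaging`, `X-On-Behalf-Of`, `Content-Disposition: filename=...`) follow my understanding of the SWORD 1.x profile and should be checked against the specification. The collection URI, credentials, and the packaging identifier in the usage example are assumptions; a real client should use whatever packaging formats the target repository's service document advertises.

```python
import base64
import urllib.request
from typing import Optional


def sword_deposit_request(collection_uri: str, package: bytes, filename: str,
                          packaging: str, user: str, password: str,
                          on_behalf_of: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a SWORD 1.x-style deposit POST for a zipped package."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,                 # identifies the package format
        "Authorization": f"Basic {credentials}",  # HTTP Basic authentication
    }
    if on_behalf_of:
        # Mediated deposit: the authenticated user deposits for someone else.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return urllib.request.Request(collection_uri, data=package,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = sword_deposit_request(
        "http://repo.example.org/sword/deposit/123456789/2",   # hypothetical collection
        b"...zip bytes...", "item.zip",
        "http://purl.org/net/sword-types/METSDSpaceSIP",       # assumed packaging URI
        "depositor", "secret")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would perform the actual deposit.
```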

Open-Source IRStats Released: Use Statistics for EPrints and DSpace

Eprints.org has released IRStats, an open source use statistics analysis package that analyzes both EPrints (versions 2 and 3) and DSpace (beta functionality) logs. The software is under a BSD license, and it requires Perl, awstats, MySQL, MaxMind Organisation Database, ChartDirector, and a CGI-capable Web server.

A description of IRStats features is available as well as examples of its use. For additional information on the project, see "Introduction to IRS."

DSpace 1.5 Alpha Released

The 1.5 alpha version of the popular DSpace repository software has been released.

Here's an excerpt from "DSpace 1.5 Alpha with Experimental Binary Distribution" by Richard Jones:

There are big changes in this code base, both in terms of functionality and organisation. First, we are now using Maven to manage our build process, and have carved the application into a set of core modules which can be used to assemble your desired DSpace instance. . . .

The second big and most exciting thing is that Manakin is now part of our standard distribution, and we want to see it taking over from the JSP UI over the next few major releases. . . .

In addition to this, we have an Event System which should help us start to decouple tightly integrated parts of the repository. . . . Browsing is now done with a heavily configurable system . . . . Tim Donohue's much desired Configurable Submission system is now integrated with both JSP and Manakin interfaces and is part of the release too.

Further to this we have a bunch of other functionality including: IP Authentication, better metadata and schema registry import, move items from one collection to another, metadata export, configurable multilingualism support, Google and html sitemap generator, Community and Sub-Communities as OAI Sets, and Item metadata in XHTML head <meta> elements.
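Exposing item metadata in the page head usually means emitting Dublin Core as `<meta>` elements, following the DCMI convention of a `schema.DC` link plus `DC.element` names. The excerpt does not say exactly which fields or naming DSpace 1.5 emits, so the sketch below shows only the general convention, with hypothetical field values; it escapes values so characters like `&` and quotes stay valid XHTML.

```python
from html import escape

DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(metadata: list[tuple[str, str]]) -> str:
    """Render Dublin Core fields as XHTML <head> elements.

    `metadata` holds (element, value) pairs such as ("title", "..."); repeated
    elements (e.g. several creators) simply produce repeated <meta> tags.
    """
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}" />']
    for element, value in metadata:
        # escape() also escapes double quotes by default, keeping attributes well-formed.
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}" />')
    return "\n".join(lines)


if __name__ == "__main__":
    print(dc_meta_tags([
        ("title", "Q&A on Institutional Repositories"),   # hypothetical item
        ("creator", "Smith, J."),
        ("date", "2008-03-01"),
    ]))
```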

University of Minnesota Launches the Digital Conservancy

The University of Minnesota has launched its institutional repository, the Digital Conservancy. It utilizes DSpace.

Here's a description from the University Digital Conservancy FAQ page:

The University Digital Conservancy is a program of the University of Minnesota, administered by the University Libraries. The program provides stewardship, reliable long-term open access, and broad dissemination of the digital scholarly and administrative works of University of Minnesota faculty, departments, centers and offices. Materials in the Conservancy are freely available online to the University community and to the public.


Update on the DSpace Foundation

Michele Kimpton, Executive Director of the DSpace Foundation, gave a talk about the foundation at the DSpace UK & Ireland User Group meeting in early July.

Her PowerPoint presentation is now available.

Source: Lewis, Stuart. "Presentations from Recent DSpace UK & Ireland User Group Meeting," Unilever Centre for Molecular Informatics, Cambridge—Jim Downing, 11 July 2007.

DSpace How-To Guide

Tim Donohue, Scott Phillips, and Dorothea Salo have published DSpace How-To Guide: Tips and Tricks for Managing Common DSpace Chores (Now Serving DSpace 1.4.2 and Manakin 1.1).

This 55-page booklet, which is under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License, will be a welcome addition to the virtual bookshelves of institutional repository managers struggling with the mysteries of DSpace.

Report on Chemistry Teaching/Research Data and Institutional Repositories

The JISC-funded SPECTRa project has released Project SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data): JISC Final Report, March 2007.

Here’s an excerpt from the Executive Summary:

Project SPECTRa’s principal aim was to facilitate the high-volume ingest and subsequent reuse of experimental data via institutional repositories, using the DSpace platform, by developing Open Source software tools which could easily be incorporated within chemists’ workflows. It focussed on three distinct areas of chemistry research—synthetic organic chemistry, crystallography and computational chemistry.

SPECTRa was funded by JISC’s Digital Repositories Programme as a joint project between the libraries and chemistry departments of the University of Cambridge and Imperial College London, in collaboration with the eBank UK project. . . .

Surveys of chemists at Imperial and Cambridge investigated their current use of computers and the Internet and identified specific data needs. The survey’s main conclusions were:

  • Much data is not stored electronically (e.g. lab books, paper copies of spectra)
  • A complex list of data file formats (particularly proprietary binary formats) being used
  • A significant ignorance of digital repositories
  • A requirement for restricted access to deposited experimental data

Distributable software tool development using Open Source code was undertaken to facilitate deposition into a repository, guided by interviews with key researchers. The project has provided tools which allow for the preservation aspects of data reuse. All legacy chemical file formats are converted to the appropriate Chemical Markup Language scheme to enable automatic data validation, metadata creation and long-term preservation needs. . . .

The deposition process adopted the concept of an "embargo repository" allowing unpublished or commercially sensitive material, identified through metadata, to be retained in a closed access environment until the data owner approved its release. . . .
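The "embargo repository" rule can be stated as a simple visibility check: material flagged as embargoed in its metadata stays closed until the data owner approves its release. SPECTRa's actual implementation is not described in the excerpt, so the sketch below is hypothetical throughout. The `embargoed` flag and the `approved_release` date are illustrative field names, and modelling approval as a release date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DepositedDataset:
    title: str
    embargoed: bool                          # hypothetical metadata flag set at deposit
    approved_release: Optional[date] = None  # hypothetical: set when the owner approves release


def is_publicly_visible(ds: DepositedDataset, today: date) -> bool:
    """Embargoed material stays closed until the owner has approved a release date."""
    if not ds.embargoed:
        return True
    return ds.approved_release is not None and ds.approved_release <= today


if __name__ == "__main__":
    crystal = DepositedDataset("unpublished crystal structure", embargoed=True)
    print(is_publicly_visible(crystal, date(2007, 3, 1)))  # False: no approval yet
    crystal.approved_release = date(2007, 6, 1)
    print(is_publicly_visible(crystal, date(2007, 6, 1)))  # True: released by the owner
```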

Among the project’s findings were the following:

  • it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organisational capability of digital repositories;
  • scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;
  • the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;
  • institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;
  • IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.