Index Data Releases Open Source Pazpar2 Z39.50 Client

Index Data has released Version 1.0.1 of Pazpar2, an open source Z39.50 client.

Here’s an excerpt from the press release:

Pazpar2 . . . can be viewed either as a high-performance metasearching middleware or a Z39.50 client with a webservice interface, depending on your perspective and needs. It is a fairly compact C program—a resident daemon—that incorporates the best we know how to do in terms of providing high performance, user-oriented federated searching. . . .

One cool thing it does is search many databases in parallel, and do it fast, without unduly loading up the user interface. . . It retrieves a set of records from each target, and performs merging, deduplication, ranking/sorting, and pulls browse facets from them. . . .

It doesn’t know anything about data models, so you can handle exotic data sources if you need to. . . you use XSLT to normalize data into an internal model—we provide examples for MARC21 and a DC-esque internal model, and configure ranking, facets, sorting, etc., from that. . . .
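The workflow the excerpt describes (search targets in parallel, then merge, deduplicate, rank, and extract facets) can be sketched in a few lines of Python. This is only an illustration of the general technique, not Pazpar2's implementation or API: the in-memory `TARGETS` data, the function names, and the ranking rule (records found in more targets rank first) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "targets"; a real client would query remote databases.
TARGETS = {
    "library_a": [
        {"title": "Digital Preservation", "author": "Smith", "year": 2005},
        {"title": "Metasearch Basics", "author": "Jones", "year": 2006},
    ],
    "library_b": [
        {"title": "digital preservation ", "author": "SMITH", "year": 2005},
        {"title": "Open Repositories", "author": "Lee", "year": 2007},
    ],
}


def search_target(name, query):
    """Return records from one target whose title contains the query."""
    q = query.lower()
    return [dict(r, source=name) for r in TARGETS[name] if q in r["title"].lower()]


def merge_key(record):
    """Normalized (title, author) pair used to detect duplicate records."""
    return (record["title"].strip().lower(), record["author"].strip().lower())


def metasearch(query):
    # Query every target concurrently; map() keeps results in target order.
    names = sorted(TARGETS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        result_sets = list(pool.map(lambda n: search_target(n, query), names))

    # Merge and deduplicate: records sharing a key collapse into one entry.
    merged = {}
    for records in result_sets:
        for rec in records:
            entry = merged.setdefault(merge_key(rec), {"record": rec, "sources": []})
            entry["sources"].append(rec["source"])

    # Rank: records held by more targets first, then alphabetically by key.
    ranked = sorted(
        merged.values(),
        key=lambda e: (-len(e["sources"]), merge_key(e["record"])),
    )

    # Facets: count merged records per publication year.
    facets = {}
    for entry in ranked:
        year = entry["record"]["year"]
        facets[year] = facets.get(year, 0) + 1
    return ranked, facets


if __name__ == "__main__":
    hits, facets = metasearch("digital")
    for hit in hits:
        print(hit["record"]["title"].strip(), hit["sources"])
    print(facets)
```

Threads suit this kind of work because each target search is dominated by waiting on the network rather than by computation.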

Towards an Open Source Repository and Preservation System

The UNESCO Memory of the World Programme, with the support of the Australian Partnership for Sustainable Repositories, has published Towards an Open Source Repository and Preservation System: Recommendations on the Implementation of an Open Source Digital Archival and Preservation System and on Related Software Development.

Here’s an excerpt from the Executive Summary and Recommendations:

This report defines the requirements for a digital archival and preservation system using standard hardware and describes a set of open source software which could be used to implement it. There are two aspects of this report that distinguish it from other approaches. One is the complete or holistic approach to digital preservation. The report recognises that a functioning preservation system must consider all aspects of a digital repository: Ingest, Access, Administration, Data Management, Preservation Planning and Archival Storage, including storage media and management software. Secondly, the report argues that, for simple digital objects, the solution to digital preservation is relatively well understood, and that what is needed are affordable tools, technology and training in using those systems.

An assumption of the report is that there is no ultimate, permanent storage media, nor will there be in the foreseeable future. It is instead necessary to design systems to manage the inevitable change from system to system. The aim and emphasis in digital preservation is to build sustainable systems rather than permanent carriers. . . .

The way open source communities, providers and distributors achieve their aims provides a model on how a sustainable archival system might work, be sustained, be upgraded and be developed as required. Similarly, many cultural institutions, archives and higher education institutions are participating in the open source software communities to influence the direction of the development of those softwares to their advantage, and ultimately to the advantage of the whole sector.

A fundamental finding of this report is that a simple, sustainable system that provides strategies to manage all the identified functions for digital preservation is necessary. It also finds that for simple discrete digital objects this is nearly possible. This report recommends that UNESCO supports the aggregation and development of an open source archival system, building on, and drawing together existing open source programs.

This report also recommends that UNESCO participates through its various committees, in open source software development on behalf of the countries, communities, and cultural institutions, who would benefit from a simple, yet sustainable, digital archival and preservation system. . . .

Report on Chemistry Teaching/Research Data and Institutional Repositories

The JISC-funded SPECTRa project has released Project SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data): JISC Final Report, March 2007.

Here’s an excerpt from the Executive Summary:

Project SPECTRa’s principal aim was to facilitate the high-volume ingest and subsequent reuse of experimental data via institutional repositories, using the DSpace platform, by developing Open Source software tools which could easily be incorporated within chemists’ workflows. It focussed on three distinct areas of chemistry research—synthetic organic chemistry, crystallography and computational chemistry.

SPECTRa was funded by JISC’s Digital Repositories Programme as a joint project between the libraries and chemistry departments of the University of Cambridge and Imperial College London, in collaboration with the eBank UK project. . . .

Surveys of chemists at Imperial and Cambridge investigated their current use of computers and the Internet and identified specific data needs. The survey’s main conclusions were:

  • Much data is not stored electronically (e.g. lab books, paper copies of spectra)
  • A complex list of data file formats (particularly proprietary binary formats) being used
  • A significant ignorance of digital repositories
  • A requirement for restricted access to deposited experimental data

Distributable software tool development using Open Source code was undertaken to facilitate deposition into a repository, guided by interviews with key researchers. The project has provided tools which allow for the preservation aspects of data reuse. All legacy chemical file formats are converted to the appropriate Chemical Markup Language scheme to enable automatic data validation, metadata creation and long-term preservation needs. . . .

The deposition process adopted the concept of an "embargo repository" allowing unpublished or commercially sensitive material, identified through metadata, to be retained in a closed access environment until the data owner approved its release. . . .

Among the project’s findings were the following:

  • it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organisational capability of digital repositories;
  • scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;
  • the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;
  • institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;
  • IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.

Archivists’ Toolkit Beta 1.1 Released

The Archivists’ Toolkit Beta 1.1 has been released for testing by interested parties.

Here’s a description of the Archivists’ Toolkit from the project’s home page:

Key Features:

  • Integrated support for managing archival materials from acquisition through processing:
    • Recording repository information
    • Tracking sources / donors
    • Recording accessions
    • Basic authority control for names and topical subjects
    • Describing archival resources and digital objects
    • Managing location information
  • Customizable interface:
    • Modify field labels
    • Establish default values for fields and notes where boilerplate text is used
    • Customize searchable fields and record browse lists
  • Ingest of legacy data in multiple formats: EAD 2002, MARC XML, and tab delimited accession data
  • Rapid data entry interface for creating container lists quickly
  • Management of user accounts, with a range of permission levels to control access to data
  • Tracking of database records, including username and date of record creation and most recent edit
  • Generation of over 30 different administrative and descriptive reports, such as acquisition statistics, accession records, shelf lists, subject guides, etc.
  • Export EAD 2002, MARC XML, METS, MODS, and Dublin Core
  • Support for desktop or networked, single- or multi-repository installations
DSpace Executive Director Appointed

    Michele Kimpton, formerly of the Internet Archive, has been appointed the Executive Director of the newly formed DSpace nonprofit organization.

    Here’s an excerpt from the announcement:

    I am happy to report that we are making good progress on establishing the new non-profit organization, and I would like to take this opportunity to announce that Michele Kimpton has accepted the position as Executive Director for the organization. The DSpace non-profit corporation will initially provide organizational, legal and financial support for the DSpace open source software project. Prior to joining DSpace, Michele Kimpton was one of the founding Directors at Internet Archive, in charge of Web archiving technology and services. . . .

    Michele developed an organization within Internet Archive to help support and fund open source software and web archiving programs, so she comes to us with a lot of experience in both open source software and long-term digital curation. Her organization worked primarily with National Libraries and Archives around the world, so she is familiar with large, widely diverse and distributed communities. Michele was one of the co-founders of the IIPC (International Internet Preservation Consortium, netpreserve.org), whose mission is to work collaboratively to develop tools, standards and processes for archiving and preservation of web material.

    The DSpace non-profit corporation is in the final stages of completing filing status as a not-for-profit corporation of Massachusetts. By summer 2007 we expect to have this legal entity in place, and a complete Board of Directors. Both MIT and Hewlett Packard have provided the start up funding to establish the organization over the next several years. . . .

    Fez 1.3 Released

    Christiaan Kortekaas has announced on the fedora-commons-users list that Fez 1.3 is now available from SourceForge.

    Here’s a summary of key changes from his message:

    • Primary XSDs for objects based on MODS instead of DC (can still handle your existing DC objects though)
    • Download statistics using apache logs and GeoIP
    • Object history logging (premis events)
    • Shibboleth support
    • Fulltext indexing (pdf only)
    • Import and Export of workflows and XSDs
    • Sanity checking to help make sure required external dependencies are working
    • OAI provider that respects FezACML authorisation rules

    For further information on Fez, see the prior post "Fez+Fedora Repository Software Gains Traction in US."

    E-Journal: A Drupal-Based E-Journal Publishing System

    Roman Chyla has developed E-Journal, an e-journal management and publishing system based upon the popular open-source Drupal content management system.

    Here is a description from the E-Journal site:

    This module allows you to create and control own electronic journals in Drupal—you can set up as many journals as you want, add authors and editors. Module gives you issue management and provides list of vocabularies (to browse) and archive of published articles. This module is more sophisticated than epublish.module and was inspired by Open Journal System. Our workflow is not so rigid though and because of the Drupal platform, you can do much more with e-journal than with OJS – potentially ;-).

    An example journal that uses E-Journal is Ikaros.

    (Prior postings about e-journal management and publishing systems.)

    Fez+Fedora Repository Software Gains Traction in US

    The February 2007 issue of Sustaining Repositories reports that more US institutions are using or investigating a combination of Fez and Fedora (see the quote below):

    Fez programmers at the University of Queensland (UQ) have been gratified by a surge in international interest in the Fez software. Emory University Libraries are building a Fez repository for electronic theses. Indiana University Libraries are also testing Fez+Fedora to see whether to replace their existing DSpace installation. The Colorado Alliance of Research Libraries (http://www.coalliance.org/) is using Fez+Fedora for their Alliance Digital Repository. Also in the US, the National Science Digital Library is using Fez+Fedora for their Materials Science Digital Library (http://matdl.org/repository/index.php).

    Oregon State University Libraries Release LibraryFind Metasearch Software

    The Oregon State University Libraries have released version 0.7 of LibraryFind, an open source metasearch program.

    LibraryFind features noted in the press release include:

    • 2-click user workflow (one click to find, one click to get)
    • Integrated OpenURL resolver
    • 2-tiered caching system to improve search response time
    • Customizable user interface
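The press release does not say how LibraryFind's 2-tiered cache works, so the following is only a generic sketch of the idea: a small, fast tier sits in front of a larger, slower one, and a search goes out to the remote targets only when both tiers miss. It is written in Python rather than the Ruby that LibraryFind uses, and the `TwoTierCache` class, its parameters, and the first-in, first-out eviction rule are all assumptions made for illustration.

```python
class TwoTierCache:
    """A small in-memory tier in front of a larger, slower tier.

    The second tier is a plain dict here; in practice it might be a
    database table. Both tiers are hypothetical stand-ins.
    """

    def __init__(self, fetch, memory_size=2):
        self._fetch = fetch          # function that runs the real search
        self._memory_size = memory_size
        self._memory = {}            # tier 1: insertion-ordered, oldest first
        self._store = {}             # tier 2: unbounded stand-in for a database
        self.hits = {"memory": 0, "store": 0, "miss": 0}

    def get(self, query):
        # Tier 1: fastest lookup.
        if query in self._memory:
            self.hits["memory"] += 1
            return self._memory[query]

        # Tier 2: slower, but still cheaper than searching remote targets.
        if query in self._store:
            self.hits["store"] += 1
            value = self._store[query]
        else:
            # Both tiers missed: run the real search and remember the result.
            self.hits["miss"] += 1
            value = self._fetch(query)
            self._store[query] = value

        self._promote(query, value)
        return value

    def _promote(self, query, value):
        # Evict the oldest entry when tier 1 is full (first in, first out).
        if len(self._memory) >= self._memory_size:
            del self._memory[next(iter(self._memory))]
        self._memory[query] = value


if __name__ == "__main__":
    cache = TwoTierCache(lambda q: "results for " + q, memory_size=1)
    for q in ["open access", "open access", "metadata", "open access"]:
        cache.get(q)
    print(cache.hits)
```

A production cache would also need an expiry policy so that stale search results are eventually refreshed; that is omitted here to keep the sketch short.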

    According to the installation instructions, the software requires Ruby 1.8.4 and Rails 1.1.6.

    MIT’s SIMILE Project

    MIT’s Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) project is producing a variety of open source software packages that will be of interest to librarians and others. One example is Piggy Bank, "a Firefox extension that turns your browser into a mashup platform, by allowing you to extract data from different web sites and mix them together."

    Here is an overview of the SIMILE project from the About SIMILE page:

    SIMILE is a joint project conducted by the MIT Libraries and MIT Computer Science and Artificial Intelligence Laboratory. SIMILE seeks to enhance inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services. A key challenge is that the collections which must inter-operate are often distributed across individual, community, and institutional stores. We seek to be able to provide end-user services by drawing upon the assets, schemata/vocabularies/ontologies, and metadata held in such stores.

    SIMILE will leverage and extend DSpace, enhancing its support for arbitrary schemata and metadata, primarily though the application of RDF and semantic web techniques. The project also aims to implement a digital asset dissemination architecture based upon web standards. The dissemination architecture will provide a mechanism to add useful "views" to a particular digital artifact (i.e. asset, schema, or metadata instance), and bind those views to consuming services.

    You can get a more detailed overview of the project from the SIMILE grant proposal and from other project documents.

    There is a SIMILE blog and a Wiki. There are also three mailing lists.

    Fedora 2.2 Released

    The Fedora Project has released version 2.2 of Fedora.

    From the announcement:

    This is a significant release of Fedora that includes a complete repackaging of the Fedora source and binary distribution so that Fedora can now be installed as a standalone web application (.war) in any web container. This is a first step in positioning Fedora to fit within a standard "enterprise system" environment. A new installer application makes it easy to setup and run Fedora. Fedora now uses Servlet Filters for authentication. To support digital object integrity, the Fedora repository can now be configured to calculate and store checksums for datastream content. This can be done globally, or on selected datastreams. The Fedora API also provides the ability to check content integrity based on checksums. The RDF-based Resource Index has been tuned for better performance. Also, a new high-performing triplestore, backed by Postgres, has been developed that can be plugged into the Resource Index. Fedora contains many other enhancements and bug fixes.
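The checksum feature mentioned in the announcement follows a common fixity-checking pattern: compute a digest when content is stored, keep it, and recompute it later to detect corruption. The sketch below shows that pattern with Python's standard `hashlib` module. It does not use or reproduce the Fedora API; the `ChecksumStore` class, its method names, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def compute_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal digest of the given content."""
    return hashlib.new(algorithm, data).hexdigest()


class ChecksumStore:
    """Hypothetical registry mapping a datastream ID to its stored checksum."""

    def __init__(self, algorithm: str = "sha256"):
        self.algorithm = algorithm
        self._checksums = {}

    def record(self, stream_id: str, data: bytes) -> str:
        # Calculate and store the checksum at ingest time.
        digest = compute_checksum(data, self.algorithm)
        self._checksums[stream_id] = digest
        return digest

    def verify(self, stream_id: str, data: bytes) -> bool:
        # Recompute the checksum and compare it with the stored value.
        expected = self._checksums.get(stream_id)
        if expected is None:
            raise KeyError("no checksum recorded for " + stream_id)
        return compute_checksum(data, self.algorithm) == expected


if __name__ == "__main__":
    store = ChecksumStore()
    store.record("DS1", b"original content")
    print(store.verify("DS1", b"original content"))
    print(store.verify("DS1", b"corrupted content"))
```

Storing the algorithm name alongside the digests, as `ChecksumStore` does, lets a repository change algorithms later without misreading older checksums.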

    Under the Hood of PLoS ONE: The Open Source TOPAZ E-Publishing System

    PLoS is building its innovative PLoS ONE e-journal, which will incorporate both traditional and open peer review, using the open source TOPAZ software. (For a detailed description of the PLoS ONE peer review process, check out "ONE for All: The Next Step for PLoS.")

    What is TOPAZ? Its Web site doesn’t provide specifics, but "PLoS ONE—Technical Background" by Richard Cave does:

    The core of TOPAZ is a digital information repository called Fedora (Flexible Extensible Digital Object Repository Architecture). Fedora is an Open Source content management application that supports the creation and management of digital objects. The digital objects contain metadata to express internal and external relationships in the repository, like articles in a journal or the text, images and video of an article. This relationship metadata can also be searched using semantic web query languages. Fedora is jointly developed by Cornell University’s computer science department and the University of Virginia Libraries.

    The metastore Kowari will be used with Fedora to support Resource Description Framework (RDF) http://en.wikipedia.org/wiki/Resource_Description_Framework metadata within the repository.

    The PLoS ONE web interface will be built with AJAX. Client-side APIs will create the community features (e.g. annotations, discussion threads, ratings, etc.) for the website. As more new features are available on the TOPAZ architecture, we will launch them on PLoS ONE.

    There was a TOPAZ Wiki at PLoS. It’s gone, but its pages are still cached by Google. The Wiki suggests that TOPAZ is likely to support Atom/RSS feeds, full-text search, and OAI-PMH among other possible features.

    For information about other open source e-journal publishing systems, see "Open Source Software for Publishing E-Journals."

    Open Source Software for Publishing E-Journals

    Want to publish an open access journal without licensing a commercial journal management system, developing your own system, or tediously hand-coding HTML? Here’s summary information about two existing open source e-journal management systems (and one emerging system) that may do the trick.

    HyperJournal

    • "HyperJournal is a software application that facilitates the administration of academic journals on the Web. Conceived for researchers in the Humanities and designed according to an intuitive and elegant layout, it permits the installation, personalization, and administration of a dedicated Web site at extremely low cost and without the need for special IT-competence. HyperJournal can be used not only to establish an online version of an existing paper periodical, but also to create an entirely new, solely electronic journal."
    • Overview
    • Documentation
    • Download

    Open Journal Systems, Public Knowledge Project

    • "Open Journal Systems (OJS) is a journal management and publishing system that has been developed by the Public Knowledge Project through its federally funded efforts to expand and improve access to research. OJS assists with every stage of the refereed publishing process, from submissions through to online publication and indexing. Through its management systems, its finely grained indexing of research, and the context it provides for research, OJS seeks to improve both the scholarly and public quality of referred research."
    • Open Journal Systems (Overview)
    • FAQ
    • OJS Technical Reference
    • Download

    DPubS (Digital Publishing System), Cornell University Library (In development)

    • "DPubS’ ground-breaking software system will enable publishers to cost-effectively organize, deliver, present and publish scholarly journals, monographs, conference proceedings, and other common and evolving means of academic discourse."
    • About DPubS
    • FAQ

    Postscript: Peter Suber suggests adding several other software packages, including:

    1. ePublishing Toolkit
    2. SciX Open Publishing Services (SOPS)