Open Harvester Systems 2.3.0 Released

The Public Knowledge Project has released Open Harvester Systems 2.3.0.

Here's an excerpt from the announcement:

This is a major rewrite of numerous parts of the Harvester code, including metadata storage and indexing. It increases indexing flexibility to support plugin-based indexing, including Lucene/SOLR support. It also adds OAI Data Provider support, including the potential to convert between metadata formats (currently from various formats into Dublin Core).

OCLC Makes New OAIster Interfaces Available

OCLC has made basic and advanced OAIster search interfaces available. Access is free.

OAIster is a database of over 23 million records from OAI-PMH-compliant digital repositories, which was originally developed by the University of Michigan Library. Initially, OCLC made OAIster available only as part of WorldCat and as a FirstSearch database (these access points remain). (Thanks to ResourceShelf.)

Read more about it at "OCLC Makes OAIster Records Available through WorldCat.org," "OCLC makes OAIster Records Available through WorldCat.org to Ensure Long-Term Public Access to Digital Resources," and "University of Michigan and OCLC Form Partnership to Ensure Long-Term Access to OAIster Database."

OAI-PMH: MOAI 1.0.7 Released

Infrae has released MOAI 1.0.7, a standalone OAI-PMH server that “can be used in combination with any repository software that comes with an OAI feed.”

Here's an excerpt from the announcement:

MOAI is a platform for aggregating content from different sources, and publishing it through the Open Archive Initiatives protocol for metadata harvesting. It's been built for academic institutional repositories dealing with relational metadata and asset files. . . .

More specifically MOAI has the ability to:

  • Harvest data from different kinds of sources
  • Serve many OAI feeds from one MOAI server, each with their own configuration
  • Turn metadata values into OAI sets on the fly, creating new collections
  • Use OAI sets to filter records shown in a feed, configurable for each feed
  • Work easily with relational data (e.g. if an author changes, the publication should also change)
  • Simple and robust authentication through integration with the Apache webserver
  • Serve assets via Apache while still using configurable authentication rules

OCLC to Offer Free OAIster-Only Database View in 2010 to Complement Integrated WorldCat Access

The transfer of the OAIster database to OCLC's WorldCat is now complete, and OCLC will offer a free OAIster-only database view in 2010 to complement integrated WorldCat Access.

Here's an excerpt from the press release:

The University of Michigan and OCLC today announced that they have successfully transitioned the OAIster database to OCLC to ensure continued public access to open-archive collections, and to expand the visibility of these collections to millions of information seekers through OCLC services.

OAIster records are now fully accessible through WorldCat.org, and will be included in WorldCat.org search results along with records from thousands of libraries worldwide that add their holdings to WorldCat. OCLC plans to release a freely accessible, discrete view of the OAIster records in January 2010 through a URL specific to OAIster. OAIster records will also continue to be available on the OCLC FirstSearch service to Base Package subscribers, providing another valuable access point for this rich database and a complement to other FirstSearch databases. OCLC will continue to develop and enhance access to open archive content.

"Adding records for open archive collections is a natural complement to WorldCat and will drive discovery and access of these collections for a broader community of scholars," said Chip Nilges, OCLC Vice President, Business Development. "OCLC is committed to building on the success of OAIster by identifying open archive collections of interest to researchers and libraries, and ensuring that open archive collections will be freely discoverable and accessible to information seekers worldwide."

"Integration of OAIster inside WorldCat.org is the result of many years of looking for a better home for OAIster, where its resources can be searched alongside other valuable, scholarly resources," said Kat Hagedorn, OAIster/Metadata Harvesting Librarian at the University of Michigan. "I am eagerly looking forward to its increased usefulness in the world of search and discovery."

OAIster is a union catalog of digital resources hosted at the University of Michigan since 2002. Launched with grant support from the Andrew W. Mellon Foundation, OAIster was developed to test the feasibility of building a portal to open archive collections using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAIster has grown to become one of the world's largest aggregations of records pointing to open archive collections with more than 23 million records contributed by over 1,100 organizations worldwide.

"The University of Michigan approached OCLC about managing future operations for the OAIster project to ensure its long-term viability," said John Wilkin, Associate University Librarian, University of Michigan Library, when the partnership was announced earlier this year. "OCLC plays a pivotal role in the business of metadata creation and distribution. Situating OAIster with OCLC helps to create an increasingly comprehensive discovery resource for users."

OCLC plans to release a freely accessible, discrete view of the OAIster database in 2010 that will be updated regularly. This will allow WorldCat.org searchers to view only items harvested through OAIster.

"OCLC has been very responsive to issues and needs brought up by the OAI community," said Ms. Hagedorn. "The creation of a free, separately accessible view of OAIster within OCLC is an example of their recognition of the value of OAIster in the world of metadata management."

Now that all OAIster records are accessible through WorldCat.org, the oaister.org Web site has been redirected to a new OAIster Web site at OCLC. For more information, visit the new OAIster Web site.

OAI-PMH: MOAI 1.0.6 Released

MOAI 1.0.6 has been released.

Here's an excerpt from the MOAI Web page:

MOAI has some interesting features not found in most OAI servers. Besides serving OAI, it can also harvest OAI. This makes it possible for MOAI to work as a pipe, where the OAI data can be reconfigured, cached, and enriched while it passes through the MOAI processing.

More specifically MOAI has the ability to:

  • Harvest data from different kinds of sources
  • Serve many OAI feeds from one MOAI server, each with their own configuration
  • Turn metadata values into OAI sets on the fly, creating new collections
  • Use OAI sets to filter records shown in a feed, configurable for each feed
  • Work easily with relational data (e.g. if an author changes, the publication should also change)
  • Simple and robust authentication through integration with the Apache webserver
  • Serve assets via Apache while still using configurable authentication rules

OAI-PMH: MOAI Server 1.0 Released

Infrae has released the MOAI Server 1.0, an open source OAI-PMH application.

Here's an excerpt from the press release:

MOAI is an open access server platform for institutional repositories. The server aggregates content from disparate sources, transforms it, stores it in a database, and (re)publishes the content, in one or many OAI feeds. Each feed has its own configuration.

The server has a flexible system for combining records into sets and uses these sets in the feed configuration. MOAI also comes with a simple yet flexible authentication scheme that can easily be customized. Besides providing authentication for the feeds, the authentication also controls access to the assets.

MOAI is a standalone system that can be used in combination with any repository software that comes with an OAI feed such as Fedora Commons, EPrints or DSpace. It can also be used directly with an SQL database or just a folder of XML files. . . .

MOAI has the ability to:

  • Harvest data from different kinds of sources
  • Serve many OAI feeds from one MOAI Server, each with their own configuration
  • Turn metadata values into OAI sets on the fly, creating new collections
  • Use OAI sets to filter records shown in a feed, configurable for each feed
  • Work easily with relational data (e.g. if an author changes, the publication should also change)
  • Provide simple and robust authentication through integration with the Apache webserver
  • Serve assets through Apache while still using configurable authentication rules
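The set-related features in the list above (deriving OAI sets from metadata values, then using those sets to filter a feed) can be illustrated with a minimal sketch. This is not MOAI's code or API; the record fields and the function names `build_sets` and `filter_feed` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records; the field names ("id", "type", "department") are
# illustrative assumptions and are not taken from MOAI.
records = [
    {"id": "oai:example:1", "type": "article", "department": "physics"},
    {"id": "oai:example:2", "type": "thesis", "department": "physics"},
    {"id": "oai:example:3", "type": "article", "department": "history"},
]


def build_sets(records, fields):
    """Group record ids into sets named "<field>:<value>"."""
    sets = defaultdict(list)
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is not None:
                sets[f"{field}:{value}"].append(record["id"])
    return dict(sets)


def filter_feed(records, sets, allowed_sets):
    """Keep only records that belong to at least one allowed set."""
    allowed_ids = {rid for name in allowed_sets for rid in sets.get(name, [])}
    return [r for r in records if r["id"] in allowed_ids]


sets = build_sets(records, ["type", "department"])
physics_feed = filter_feed(records, sets, ["department:physics"])
print(sorted(sets))
print([r["id"] for r in physics_feed])
```

Each feed would carry its own list of allowed sets, which is one way to read "configurable for each feed" in the list above.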

DigitalKoans

OCLC Research Releases Data Exchange Software for Museums

With support from a grant from the Andrew W. Mellon Foundation, OCLC Research has released data exchange software for museums.

Here's an excerpt from the press release:

COBOAT software is now available under a fee-free license for the purpose of publishing a CDWA Lite repository of collections information. It is a metadata publishing tool developed by Cognitive Applications Inc. (Cogapp) that transfers information between databases (such as collections management systems) and different formats. As configured for this project, COBOAT allows museums to extract standards-based records in the Categories for the Descriptions of Works of Art (CDWA) Lite XML data format out of Gallery Systems TMS, a leading collection management system in the museum industry. Configuration files allow COBOAT to be adjusted for extraction from different vendor-based or homegrown database systems, or locally divergent implementations of the same collections management systems.

OAICat Museum 1.0, an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) data content provider supporting CDWA Lite XML, is also available. It allows museums to share the data extracted with COBOAT using OAI-PMH.

eXtensible Catalog (XC) OAI Toolkit Released

The eXtensible Catalog project has released the eXtensible Catalog (XC) OAI Toolkit.

Here's an excerpt from the announcement:

The OAI Toolkit is used to make data stored in an institution's ILS or other repository available for harvesting via OAI-PMH, including other eXtensible Catalog applications. For an ILS, this is accomplished by exporting ILS metadata, converting it from MARC to MARCXML, and loading it into an OAI-PMH compliant repository. The repository (embedded in the OAI Toolkit) makes the data available for harvesting by other XC components.

The OAI Toolkit can be used as part of the XC system, or on its own to enable OAI-PMH harvestability of an existing repository. It is a server application written in Java and is only needed for ILS's and other repositories that do not already have the ability to act as OAI-PMH Repositories (OAI Servers).

Public Knowledge Project Releases Open Archives Harvester 2.3.0

The Public Knowledge Project has released Open Archives Harvester 2.3.0, an open source OAI-PMH harvester.

Here's an excerpt from the announcement:

This is a major rewrite of numerous parts of the Harvester code, including metadata storage and indexing. It increases indexing flexibility to support plugin-based indexing, including Lucene/SOLR support. It also adds OAI Data Provider support, including the potential to convert between metadata formats (currently from various formats into Dublin Core).

Clarifications about the Michigan/OCLC OAIster Deal

Dorothea Salo has posted "The Straight Story on OAIster and Its Move" on Caveat Lector in which the University of Michigan Library's Katrina Hagedorn answers questions about the future of OAIster.

Here's an excerpt:

Q. Once oaister.org ceases to exist, there will be no way to search the harvested records for free except through worldcat.org, is that right?

A. I think those details haven’t been hammered out yet. Worldcat.org is one choice, yes. There will likely be other products and services, and it’s likely you’ll be able to limit to just oaister records (for what that’s worth).

University of Michigan and OCLC Form OAIster Partnership

The University of Michigan and OCLC will jointly support the OAIster search engine for open access documents.

Here's an excerpt from the press release:

Launched in 2002 with grant support from the Andrew W. Mellon Foundation, OAIster was developed to test the feasibility of building a portal to open archive collections using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAIster has since grown to become one of the largest aggregations of records pointing to open archive collections in the world with over 19 million records contributed by over 1,000 organizations worldwide.

Under the partnership, OAIster.org will continue to function as the public interface to OAIster collections, through funding provided by OCLC to the University of Michigan. Later in 2009, metadata harvesting operations will transfer from the University of Michigan to OCLC. . . .

Starting in late January 2009, while OAIster continues to be freely available at the www.oaister.org Web site, OCLC will host a version of OAIster on OCLC's FirstSearch platform and make it available through subscriptions to the FirstSearch Base Package at no additional charge.

CiteSeerX and SeerSuite: Harvester + Search Engine + AI

In "CiteSeerX and SeerSuite—Adding to the Semantic Web," Avi Rappoport overviews beta versions of CiteSeerX and its open source, Java-based counterpart, SeerSuite.

Here's an excerpt:

Building on that experience, CiteSeerX is a completely new system, re-architected for scaling and modularity, to handle increasing demands from both researchers and digital library programmatic interfaces. The system uses artificial intelligence, machine learning, support vector machines, and other techniques to recognize and extract metadata for the articles found. It now uses the Lucene search engine and supports standards such as the Open Archives Initiative (OAI), including metadata browsing, and Z39.50. CiteSeerX has a simple but powerful internal structure for documents and citations. If it cannot access a document cited, it creates a virtual document as a place holder, which can then be filled when the document is available.
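The "virtual document" idea in the excerpt can be pictured with a small sketch. This is not CiteSeerX or SeerSuite code (SeerSuite is Java-based); the class `CitationIndex` and its field names are hypothetical, and the sketch only assumes that a cited but unavailable document gets a placeholder entry that is filled in later.

```python
class CitationIndex:
    """Toy index in which cited-but-unseen documents are placeholders."""

    def __init__(self):
        self.documents = {}

    def _entry(self, doc_id):
        # Create a placeholder ("virtual") entry the first time an id is seen.
        return self.documents.setdefault(
            doc_id, {"text": None, "virtual": True, "cited_by": []}
        )

    def cite(self, citing_id, cited_id):
        """Record a citation; the cited document may not be available yet."""
        self._entry(cited_id)["cited_by"].append(citing_id)

    def add_document(self, doc_id, text):
        """Fill in a document, keeping any citations already recorded."""
        entry = self._entry(doc_id)
        entry["text"] = text
        entry["virtual"] = False


index = CitationIndex()
index.cite("paper-A", "paper-B")          # paper-B is not available yet
was_virtual = index.documents["paper-B"]["virtual"]
index.add_document("paper-B", "Full text of paper B")
print(was_virtual, index.documents["paper-B"])
```

The point of the placeholder is that citations recorded before the document arrives are not lost when the real document is added.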

Australian National University's Harvester Service Released

The Australian National University has released its Harvester Service.

Here's an excerpt from the announcement:

The Harvester Service is a proxy harvester for processing and routing OAI-PMH Data Provider responses to various applications. It is intended to be used for integration with other applications requiring a harvesting service.

OAI2LODServer Version 0.2 Released

MediaSpaces has released Version 0.2 of the OAI2LODServer.

Here's a description from the software's home page:

The OAI2LOD Server exposes any OAI-PMH compliant metadata repository according to the Linked Data guidelines. This makes things and media objects accessible via HTTP URIs and queryable via the SPARQL protocol. Parts of the OAI2LOD architecture, especially the front-end, are based on the D2R Server implementation.

Further, it provides a configurable linking mechanism based on string similarity metrics. This allows the automatic linking of OAI-PMH data with other open data sets such as DBPedia or any other OAI-PMH repository exposed via the OAI2LOD Server.
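Linking by string similarity, as described above, can be sketched roughly as follows. The description does not say which metrics the OAI2LOD Server uses, so this example swaps in Python's standard `difflib.SequenceMatcher` ratio as a stand-in; the function names, labels, and the 0.9 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stand-in metric, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(local_labels, remote_labels, threshold=0.9):
    """Link each local label to its best remote match, if it is close enough."""
    links = []
    for local in local_labels:
        best = max(remote_labels, key=lambda remote: similarity(local, remote))
        if similarity(local, best) >= threshold:
            links.append((local, best))
    return links


# Hypothetical labels from a local repository and a remote data set.
local_labels = ["Vienna", "Wolfgang Amadeus Mozart", "Graz"]
remote_labels = ["Vienna", "Venice", "Wolfgang Amadeus Mozart."]

links = link_records(local_labels, remote_labels)
print(links)
```

Raising or lowering the threshold trades missed links against false ones, which is presumably why the mechanism is described as configurable.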

Repositories Support Project Releases Briefing Papers: Open Archives Initiative-Protocol for Metadata Harvesting and Workflows

The Repositories Support Project has released two briefing papers: Open Archives Initiative-Protocol for Metadata Harvesting and Workflows (i.e., digital repository submission workflows). Both briefing papers provide succinct introductions to the topic at hand.

Digital Library Federation and 10 Vendors/Developers Reach Accord about ILS Basic Discovery Interfaces

Ten vendors and application developers have agreed to support standard ILS interfaces that will permit integration and interoperability with emerging discovery services. These interfaces will be developed by the Digital Library Federation's ILS-Discovery Interface Committee. The participants are AquaBrowser, BiblioCommons, California Digital Library, Ex Libris, LibLime, OCLC, Polaris Library Systems, SirsiDynix, Talis, and VTLS.

Here's an excerpt from the announcement:

On March 6, representatives of the Digital Library Federation (DLF), academic libraries, and major library application vendors met in Berkeley, California to discuss a draft recommendation from the DLF for standard interfaces for integrating the data and services of the Integrated Library System (ILS) with new applications supporting user discovery. Such standard interfaces will allow libraries to deploy new discovery services to meet ever-growing user expectations in the Web 2.0 era, take full advantage of advanced ILS data management and services, and encourage a strong, innovative community and marketplace in next-generation library management and discovery applications.

At the meeting, participants agreed to support a set of essential functions through open protocols and technologies by deploying specific recommended standards.

These functions are:

  1. Harvesting. Functions to harvest data records for library collections, both in full, and incrementally based on recent changes. Harvesting options could include either the core bibliographic records, or those records combined with supplementary information (such as holdings or summary circulation data). Both full and differential harvesting options are expected to be supported through an OAI-PMH interface.
  2. Availability. Real-time querying of the availability of a bibliographic (or circulating) item. This functionality will be implemented through a simple REST interface to be specified by the ILS-DI task group.
  3. Linking. Linking in a stable manner to any item in an OPAC in a way that allows services to be invoked on it; for example, by a stable link to a page displaying the item's catalog record and providing links for requests for that item. This functionality will be implemented through a URL template defined for the OPAC as specified by the ILS-DI task group.
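The full and differential harvesting described in item 1 map onto standard OAI-PMH requests: a `ListRecords` request without a `from` argument asks for everything, and one with a `from` date asks only for records changed since then. A minimal sketch of building such request URLs follows; the endpoint `https://ils.example.org/oai` and the function name are hypothetical, and the ILS-DI recommendation itself may specify further details not shown here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real ILS would publish its own base URL.
BASE_URL = "https://ils.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    With from_date=None the request asks for all records (full harvest);
    with a date it asks only for records changed since that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # UTC date such as "2008-03-06"
    return f"{base_url}?{urlencode(params)}"


full_url = list_records_url(BASE_URL)
incremental_url = list_records_url(BASE_URL, from_date="2008-03-06")
print(full_url)
print(incremental_url)
```

A harvester would typically remember the date of its last successful run and pass it as `from_date` on the next one.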

STARGATE Report Investigates Issues with Software to Support Harvesting for Publishers without OAI-PMH-compliant Repositories

The JISC-funded extension of the STARGATE project has released the STARGATE Extension Final Report.

Here's an excerpt from the original STARGATE project page that explains its goals:

The Centre for Digital Library Research (CDLR) at the University of Strathclyde set out to implement a low-tech solution to OAI-based disclosure for small publishers. Their STARGATE project was based on the 'static repositories' model for using OAI-PMH . . . Instead of building an OAI-compliant repository, a publisher builds a static repository, effectively an XML file of the relevant metadata on an accessible server. A separate static repository gateway handles the technical aspects of making the metadata available for harvesting, i.e. the complexity is shifted away from the publisher.

Here's an excerpt from the report's "Executive Summary":

The extension has produced a functional branded gateway that the publishing community can use to explore the use of static repositories. It will be maintained for the next year. The gateway is available at http://stargate.cdlr.strath.ac.uk/gateway/.

The project concludes that, although functional, the software is not suitable for deployment by a novice user. It is also effectively still at the beta stage of development and it has only been used in a limited number of settings.

The project further suggests that the creation and maintenance of gateway(s) within the publishing community may be more suitably carried out in the same way that DOI and Purl provision is offered through a third-party service provider willing to work with developing open source software. Any deployment of a gateway by JISC to support wider participation in static repositories should also engage with the gateway software developers.

University of Michigan Libraries Release the UMich OAI Toolkit

The University of Michigan Libraries have released the UMich OAI Toolkit.

Here's an excerpt from the announcement:

This toolkit contains both harvester and data provider, both written in Perl. . . .

UMHarvester is a robust tool using LWP for harvesting nigh on every OAI data provider available. It allows for incremental harvesting, has multiple re-try options, and a batch harvest tool (Batch_UMHarvest) that can automatically perform incremental harvesting.

UMProvider relies heavily on libxml (XML::LibXML) and will store the data in nearly any relational database. It functions by harvesting from a database of records, making rights determinations from a separate database, and providing the resulting set of records.

Originally, only the UMHarvester was available from UM's DLXS software site. The UMProvider tool is newly developed and takes the place of our DLXS data provider tool.
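The combination of incremental harvesting and re-try options mentioned in the announcement can be sketched in a few lines. This is not UMHarvester's code (the toolkit is written in Perl); it is a Python illustration under stated assumptions: the `fetch` callable, the retry count, and the use of `OSError` for transient failures are all hypothetical.

```python
import time


def harvest_with_retries(fetch, last_harvest_date, max_retries=3, delay_seconds=0.0):
    """Call fetch(last_harvest_date), retrying on transient I/O errors.

    fetch is any callable that returns the records changed since the given
    date; it stands in for a real OAI-PMH request.
    """
    last_error = None
    for _attempt in range(max_retries):
        try:
            return fetch(last_harvest_date)
        except OSError as error:
            last_error = error
            time.sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"harvest failed after {max_retries} attempts") from last_error


# Simulated data provider that fails twice before answering.
calls = []


def flaky_fetch(from_date):
    calls.append(from_date)
    if len(calls) < 3:
        raise OSError("temporary network failure")
    return [f"record changed since {from_date}"]


harvested = harvest_with_retries(flaky_fetch, "2008-01-01")
print(harvested, len(calls))
```

A batch tool could loop over many data providers, calling a function like this for each one with that provider's last successful harvest date.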