In "CiteSeerX and SeerSuite—Adding to the Semantic Web," Avi Rappoport gives an overview of the beta versions of CiteSeerX and its open source, Java-based counterpart, SeerSuite.
Here's an excerpt:
Building on that experience, CiteSeerX is a completely new system, re-architected for scaling and modularity, to handle increasing demands from both researchers and digital library programmatic interfaces. The system uses artificial intelligence, machine learning, support vector machines, and other techniques to recognize and extract metadata for the articles found. It now uses the Lucene search engine and supports standards such as the Open Archives Initiative (OAI), including metadata browsing, and Z39.50. CiteSeerX has a simple but powerful internal structure for documents and citations. If it cannot access a document cited, it creates a virtual document as a place holder, which can then be filled when the document is available.
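The placeholder idea in the last two sentences of the excerpt can be illustrated with a small sketch. This is not CiteSeerX code: the `Document` and `Library` classes, the citation keys, and the method names are all hypothetical, chosen only to show how a citation to an unavailable document might create a "virtual" record that is filled in later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    key: str                          # hypothetical normalized citation key
    title: str
    full_text: Optional[str] = None   # None marks a virtual placeholder
    cited_by: List[str] = field(default_factory=list)

    @property
    def is_virtual(self) -> bool:
        return self.full_text is None


class Library:
    """Toy document store (an assumption, not the CiteSeerX data model)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def cite(self, citing_key: str, cited_key: str, cited_title: str) -> Document:
        """Record a citation, creating a placeholder if the target is unknown."""
        doc = self.docs.get(cited_key)
        if doc is None:
            doc = Document(key=cited_key, title=cited_title)
            self.docs[cited_key] = doc
        if citing_key not in doc.cited_by:
            doc.cited_by.append(citing_key)
        return doc

    def add_full_text(self, key: str, title: str, full_text: str) -> Document:
        """Store a retrieved document, filling an existing placeholder if any."""
        doc = self.docs.get(key)
        if doc is None:
            doc = Document(key=key, title=title)
            self.docs[key] = doc
        doc.title = title
        doc.full_text = full_text
        return doc
```

Because the placeholder and the eventual full document are the same object, citations recorded before the document was available carry over unchanged once it is filled in.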
Terry Reese has posted step-by-step instructions about how to harvest OAI-PMH records from the University of Michigan Libraries' MBooks digital books collection using his MarcEdit freeware program. The data can either be converted to the MARC format or stored as is. MarcEdit also includes a Z39.50 client and crosswalks, such as MARC to Dublin Core and MARC to EAD.
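For readers who want to see what an OAI-PMH harvest involves at the protocol level, here is a minimal sketch. It is not MarcEdit's workflow and not Reese's instructions; it only shows the standard `ListRecords` request and the `resumptionToken` paging loop. The `harvest` and `http_fetch` names are made up for this example, and the base URL must be supplied by the caller since the MBooks endpoint is not given here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(base_url: str, params: Dict[str, str]) -> bytes:
    """Default fetcher: issue an HTTP GET and return the raw XML response."""
    with urlopen(base_url + "?" + urlencode(params)) as response:
        return response.read()


def harvest(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    fetch: Callable[[str, Dict[str, str]], bytes] = http_fetch,
) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url, params))
        error = root.find(OAI_NS + "error")
        if error is not None:
            # Includes "noRecordsMatch"; a fuller client might treat that as empty.
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            return
        yield from list_records.findall(OAI_NS + "record")
        token = list_records.find(OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token means the list is complete
        # resumptionToken is an exclusive argument: send it without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing the fetcher in as a parameter keeps the paging logic testable without a network connection; the harvested records could then be stored as is or handed to a separate conversion step.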
The Z39.50 Target Directory from Index Data includes both Z39.50– and SRU/SRW-enabled systems.
It can be searched or browsed by name.
Index Data has released Version 1.0.1 of Pazpar2, an open source Z39.50 client.
Here’s an excerpt from the press release:
Pazpar2 . . . can be viewed either as a high-performance metasearching middleware or a Z39.50 client with a webservice interface, depending on your perspective and needs. It is a fairly compact C program—a resident daemon—that incorporates the best we know how to do in terms of providing high performance, user-oriented federated searching. . . .
One cool thing it does is search many databases in parallel, and do it fast, without unduly loading up the user interface. . . It retrieves a set of records from each target, and performs merging, deduplication, ranking/sorting, and pulls browse facets from them. . . .
It doesn’t know anything about data models, so you can handle exotic data sources if you need to. . . you use XSLT to normalize data into an internal model—we provide examples for MARC21 and a DC-esque internal model, and configure ranking, facets, sorting, etc., from that. . . .
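The pipeline the press release describes (parallel searching, merging, deduplication, ranking, and facets) can be sketched in a few lines. This is an illustration only, not Pazpar2's implementation, which is a C daemon configured through XSLT: the targets here are plain Python callables, and the merge key (normalized title plus author) and the ranking rule (records found in more targets first) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Target = Callable[[str], List[Record]]


def merge_key(record: Record) -> Tuple[str, str]:
    """Hypothetical dedup rule: same normalized title and author."""
    return (
        record.get("title", "").strip().lower(),
        record.get("author", "").strip().lower(),
    )


def metasearch(query: str, targets: Dict[str, Target]) -> List[dict]:
    """Query all targets in parallel, then merge, deduplicate, and rank."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        results = {name: future.result() for name, future in futures.items()}

    clusters: Dict[Tuple[str, str], dict] = {}
    for name, records in results.items():
        for record in records:
            # The first record seen for a key represents the whole cluster.
            cluster = clusters.setdefault(
                merge_key(record), {"record": record, "sources": []}
            )
            if name not in cluster["sources"]:
                cluster["sources"].append(name)

    # Assumed ranking: more sources first, then alphabetical by title.
    return sorted(
        clusters.values(),
        key=lambda c: (-len(c["sources"]), c["record"].get("title", "").lower()),
    )


def facets(clusters: List[dict], field_name: str) -> Dict[str, int]:
    """Count merged records per value of one field, e.g. year or author."""
    counts: Dict[str, int] = {}
    for cluster in clusters:
        value = cluster["record"].get(field_name)
        if value:
            counts[value] = counts.get(value, 0) + 1
    return counts
```

In a real deployment each target would wrap a Z39.50 or SRU connection, and records would first be normalized into a common internal model before merging.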