Podcasts from the CNI Fall 2007 Task Force Meeting

Podcasts are now available from CNI's Fall 2007 Task Force Meeting. Here's a selection:

Wikia Search Debuts to Pundits’ Criticism

An alpha version of Wikia's open source Wikia Search has gone public, but the consensus seems to be that this user-tuned search engine has a long way to go to compete with the likes of Google.

Read more about it at "Jimmy Wales Argues That His Wikia Needs More Time," "Wiki Citizens Taking on a New Area: Searching," "Wikia Launching Human-Powered Search," "Wikia Search Alpha Preview Leaves Much to Be Desired," "Wikia Search Is A Complete Letdown," and "Wikia Search—Miles Behind the Competition."

Institutional Repositories, Tout de Suite

Institutional Repositories, Tout de Suite, the latest Digital Scholarship publication, is designed to give the reader a very quick introduction to key aspects of institutional repositories and to foster further exploration of this topic through liberal use of relevant references to online documents and links to pertinent websites. It is under a Creative Commons Attribution-Noncommercial 3.0 United States License, and it can be freely used for any noncommercial purpose in accordance with the license.

University of Michigan Libraries Release the UMich OAI Toolkit

The University of Michigan Libraries have released the UMich OAI Toolkit.

Here's an excerpt from the announcement:

This toolkit contains both harvester and data provider, both written in Perl. . . .

UMHarvester is a robust tool using LWP for harvesting nigh on every OAI data provider available. It allows for incremental harvesting, has multiple re-try options, and a batch harvest tool (Batch_UMHarvest) that can automatically perform incremental harvesting.

UMProvider relies heavily on libxml (XML::LibXML) and will store the data in nearly any relational database. It functions by harvesting from a database of records, making rights determinations from a separate database, and providing the resulting set of records.

Originally, only the UMHarvester was available from UM's DLXS software site. The UMProvider tool is newly developed and takes the place of our DLXS data provider tool.
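The incremental harvesting mentioned in the excerpt corresponds, in OAI-PMH terms, to a ListRecords request limited by a "from" date, with large result sets paged through resumption tokens. Here's a minimal Python sketch of that idea. It is not UMHarvester's Perl code; the function names and the example.org base URL are hypothetical, and fetching, retries, and error handling are left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, last_harvest=None, token=None, prefix="oai_dc"):
    """Build a ListRecords request URL (hypothetical helper).

    When a resumption token is present, OAI-PMH expects it to be the only
    argument besides the verb. Otherwise, `from` restricts the harvest to
    records changed since the previous run (incremental harvesting).
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if last_harvest:
            params["from"] = last_harvest
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    # Only header identifiers live in the OAI namespace; dc:identifier does not.
    ids = [e.text for e in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    # An absent or empty token element means this was the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvester would call list_records_url with the date of its last successful run, fetch the page, and keep requesting with the returned token until parse_page reports None.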

National Science Digital Library Releases Initial Fedora-based NCore Components

The National Science Digital Library Core Integration team at Cornell University has released a partial version of NCore, a "general platform for building semantic and virtual digital libraries united by a common data model and interoperable applications," which is built upon Fedora.

Here's an excerpt from the NSDL posting:

The NCore platform consists of a central repository built on top of Fedora, a data model, an API, and a number of fundamental services such as full-text search or OAI-PMH. Innovative NSDL services and tools that empower users as content creators are now built on, or transitioning to, the NCore platform. These include: the Expert Voices blogging system (http://expertvoices.nsdl.org/); the NSDL Wiki (http://wiki.nsdl.org/index.php/NSDL_Wiki); the NSDL OAI-PMH metadata ingest aggregation system; the OAI-PMH service for distributing public NSDL metadata; the NSDL Collection System (NCS), derived from the DLESE Collection system (DCS); the NSDL Search service; and the OnRamp content management and distribution system (http://onramp.nsdl.org).

Because NCore is a general Fedora-based open source platform useful beyond NSDL, Core Integration developers at Cornell University have made the repository and API code components of NCore available for download at the NCore project on SourceForge (http://sourceforge.net/projects/nsdl-core). Over the next six months, NSDL will release the code for major tools and services that comprise the full NCore suite on SourceForge.

For further information, see the NCore presentation.

Perseus Digital Library Code and Content Now Freely Available

The Perseus Digital Library Project has released both the source code for Perseus 4.0 and a significant amount of the project's digital content. The Perseus Java Hopper code is open source; the content is under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license.

Here's a description of the Perseus Digital Library from the About page:

Since planning began in 1985, the Perseus Digital Library Project has explored what happens when libraries move online. . . .

Our flagship collection, under development since 1987, covers the history, literature and culture of the Greco-Roman world. We are applying what we have learned from Classics to other subjects within the humanities and beyond. We have studied many problems over the past two decades, but our current research centers on personalization: organizing what you see to meet your needs.

We collect texts, images, datasets and other primary materials. We assemble and carefully structure encyclopedias, maps, grammars, dictionaries and other reference works. At present, 1.1 million manually created and 30 million automatically generated links connect the 100 million words and 75,000 images in the core Perseus collections. 850,000 reference articles provide background on 450,000 people, places, organizations, dictionary definitions, grammatical functions and other topics.

Version 1.0 of SWORD, A Smart Deposit Tool for Repositories, Has Been Released

Version 1.0 of SWORD has been released. The release includes DSpace (1.5 only) and Fedora implementations, GUI/CLI clients, and the common Java library.

Here's an excerpt from the SWORD Wiki that describes the project:

SWORD (Simple Web-service Offering Repository Deposit) will take forward the Deposit protocol developed by a small working group as part of the JISC Digital Repositories Programme by implementing it as a lightweight web-service in four major repository software platforms: EPrints, DSpace, Fedora and IntraLibrary. The existing protocol documentation will be finalised by project partners and a prototype 'smart deposit' tool will be developed to facilitate easier and more effective population of repositories. The project intends to take an iterative approach to developing and revising the protocol, web-services and client implementation through evaluative testing and feedback mechanisms. Community acceptance and take-up will be sought through dissemination activities. The project is led by UKOLN, University of Bath, with partners at the University of Wales, Aberystwyth, the University of Southampton and Intrallect Ltd. The project aims to improve the efficiency and quality of repository deposit and to diversify and expedite the options for timely population of repositories with content whilst promoting a common deposit interface and supporting the Information Environment principles of interoperability.

Mellon Funds Phase 2 of the eXtensible Catalog Project

The Andrew W. Mellon Foundation has given the University of Rochester Libraries a grant to support continued work on its eXtensible Catalog project.

Here's an excerpt from the announcement:

A $749,000 grant from the Andrew W. Mellon Foundation to the University’s River Campus Libraries will be used toward building and deploying the eXtensible Catalog (XC), a set of open-source software applications libraries can use to share their collections. The grant money will also be used to support broad adoption of the software by the library community. The grant and additional funding from the University and partner institutions make up the $2.8 million needed for the project. The resulting system will allow libraries to simplify user access to all library resources, both digital and non-digital. . . .

It [XC] will provide a platform for local development and experimentation that will ultimately allow libraries to share their collections through a variety of applications, such as Web sites, institutional repositories, and content management systems.

University of Rochester staff will build XC in partnership with the following institutions: Notre Dame University, CARLI (Consortium of Academic and Research Libraries in Illinois), Rochester Institute of Technology, Oregon State University, the Georgia PINES Consortium, Cornell University, the University at Buffalo, Ohio State University, and Yale University. Each XC partner institution has committed staff time or monetary contributions toward the development of XC.

A second group of institutions will contribute to the project through the participation of its staff members in XC-user research, or by providing advisory support to the University’s development team. These institutions include the Library of Congress, OCLC, Inc., North Carolina State University, Darien (CT) Public Library, Ohio State University, and Yale University.

Creative Commons Seeks Feedback from Librarians about LiveDVD

Timothy Vollmer has announced on Lita-L (10/28/07 message) that the Creative Commons is looking for feedback about its LiveDVD for libraries, which is part of its LiveContent project.

Here's an excerpt from the message:

Creative Commons is working with Fedora on creating a LiveDVD for libraries that contains free, open source software (like OpenOffice, The Gimp, Inkscape, Firefox) and open content, including CC-licensed media such as audio, video, photographs, text and open educational resources. . . .

The next iteration we're working on is a LiveDVD for libraries, providing an informational resource and creative tool that would allow library patrons to test open source software, view (and rip, remix, reuse) open content, and even create new content with the software contained on the disc. . . .

We want to get some more feedback/comments/suggestions on the project and are also looking to identify librarians and interested groups to test out the LiveDVD!

DSpace 1.5 Alpha Released

The 1.5 alpha version of the popular DSpace repository software has been released.

Here's an excerpt from "DSpace 1.5 Alpha with Experimental Binary Distribution" by Richard Jones:

There are big changes in this code base, both in terms of functionality and organisation. First, we are now using Maven to manage our build process, and have carved the application into a set of core modules which can be used to assemble your desired DSpace instance. . . .

The second big and most exciting thing is that Manakin is now part of our standard distribution, and we want to see it taking over from the JSP UI over the next few major releases. . . .

In addition to this, we have an Event System which should help us start to decouple tightly integrated parts of the repository. . . . Browsing is now done with a heavily configurable system . . . . Tim Donohue's much desired Configurable Submission system is now integrated with both JSP and Manakin interfaces and is part of the release too.

Further to this we have a bunch of other functionality including: IP Authentication, better metadata and schema registry import, move items from one collection to another, metadata export, configurable multilingualism support, Google and html sitemap generator, Community and Sub-Communities as OAI Sets, and Item metadata in XHTML head <meta> elements.

Muradora 1.0, a Fedora Front-End, Released

DRAMA (Digital Repository Authorization Middleware Architecture) has released Muradora 1.0, a Fedora front-end that provides identity control (via Shibboleth), authorization (via XACML), and other functions. DRAMA is a sub-project of RAMP (Research Activityflow and Middleware Priorities Project). A Live DVD image simplifies installation.

Here’s an excerpt from the fedora-commons-users posting:

  • "Out-of-the-box" or customized deployment options
  • Intuitive access control editor allows end-users to specify their own access control criteria without editing any XML.
  • Hierarchical enforcement of access control policies. Access control can be set at the collection level, object level or datastream level.
  • Metadata input and validation for any well-formed metadata schema using XForms (a W3C standard). New metadata schemas can be supported via XForms scripts (no Muradora code modification required).
  • Flexible and extensible architecture based on the well known Java Spring enterprise framework.
  • Multiple deployments of Muradora (each customized for their own specific purpose) can talk to the one instance of Fedora.
  • Freely available as open source software (Apache 2 license). All dependent software is also open source.
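The hierarchical enforcement described in the list above (collection level, object level, datastream level) can be pictured as a lookup that walks from the most specific level to the least specific one. Here's a minimal Python sketch of that idea. It is not Muradora's XACML implementation: the function name, the policy table layout, the "most specific policy wins" rule, and the default-deny fallback are all assumptions made for illustration.

```python
def resolve_access(policies, collection, obj=None, datastream=None):
    """Return the decision from the most specific level that has a policy.

    `policies` maps (collection, object, datastream) tuples to "permit" or
    "deny"; None in a tuple position means "applies to everything below".
    This layout is hypothetical, not Muradora's actual policy format.
    """
    for key in ((collection, obj, datastream),   # datastream-level policy
                (collection, obj, None),         # object-level policy
                (collection, None, None)):       # collection-level policy
        if key in policies:
            return policies[key]
    return "deny"  # assumed default when no policy applies


policies = {
    ("theses", None, None): "permit",    # whole collection is readable
    ("theses", "t1", "PDF"): "deny",     # except one object's PDF datastream
}
```

With this table, a request for the PDF datastream of object t1 is denied, while its other datastreams inherit the collection-level permit.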

Omeka: The Open-Source, IMLS-funded Web Publishing System for Museums

The Center for History and New Media at George Mason University has provided further details about its IMLS grant for Omeka.

Here's an excerpt from the posting:

CHNM is also celebrating its IMLS funding for Omeka, a next-generation web-publishing platform for smaller history museums, historical societies, and historic sites. From the Swahili word meaning “to display” or “to lay out for discussion,” Omeka is designed for groups that may not have the resources or expertise necessary to create and maintain their own online tools. The free, open-source tool will allow many more museums to mount well-designed, professional-looking, and content-rich web sites without adding to their constrained budgets. It will also provide a standards-based interoperable system to share and use digital content in multiple contexts so that museums can design online exhibitions more efficiently. Beginning in October 2007, CHNM will plan, design, test, evaluate, and disseminate Omeka over four phases while working closely with our major partner, the Minnesota Historical Society (MHS). MHS represents a wide museum network and a broad range of history and heritage institutions of different sizes, audiences, and subject area interests. In addition, we will make Omeka available to other small museums through conference presentations, direct mailings, and the CHNM website.

Evergreen 1.2.0 Released

Version 1.2.0 of the open-source Evergreen ILS software has been released.

Here's an excerpt from the Frequently Asked Questions:

From a library perspective, what does Evergreen do? What modules or components are available?

Evergreen currently has modules for circulation, cataloging, web catalog, and statistical reporting. Evergreen also supports the SIP2 protocol for self-check and Internet/computer access control.

What does it not do?

Evergreen's Acquisitions and Serials modules are currently under joint development with the University of Windsor. Other features on our roadmap include a Z39.50 server, and telephony and credit card support.

Scriblio Beta Released: A WordPress-Based CMS and OPAC

The Scriblio beta version has been released.

Here's a description of Scriblio from the About Scriblio page:

Scriblio (formerly WPopac) is an award winning, free, open source CMS and OPAC with faceted searching and browsing features based on WordPress. Scriblio is a project of Plymouth State University, supported in part by the Andrew W. Mellon Foundation.

  • Free and open source
  • Represents bibliographic collections — library catalogs and such — in an easily searchable, highly remixable web-based format
  • Leverages WordPress to offer rich content management features for all a library’s content

Xena 4.0: Open Source Digital Preservation Software

The National Archives of Australia has released Xena 4.0, which is open source digital preservation software.

Here's a brief description of its capabilities from the project homepage:

Xena software aids digital preservation by performing two important tasks:

  • Detecting the file formats of digital objects
  • Converting digital objects into open formats for preservation
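The first of those two tasks, detecting file formats, is commonly done by checking a file's leading bytes against known signatures ("magic numbers"). Here's a minimal Python sketch of that general technique. It is not Xena's own detection code; the table and function name are hypothetical, and a real tool would cover many more formats and look beyond the first few bytes.

```python
# Well-known leading byte signatures for a handful of formats.
MAGIC_NUMBERS = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def detect_format(data):
    """Guess a format name from the first bytes of a digital object."""
    for signature, name in MAGIC_NUMBERS:
        if data.startswith(signature):
            return name
    return "unknown"
```

Once a format is identified, a preservation tool can choose the matching converter to an open format, which is the second task in the list.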

LibraryFind 0.8.2 Released

The Oregon State University Libraries have released LibraryFind 0.8.2.

Here’s an excerpt from the CODE4LIB announcement:

LibraryFind is metasearch software written in Ruby-on-Rails. It allows libraries to provide a unified search solution to their users, letting library users search across both licensed collections and local collections. LibraryFind is open source software (licensed under the GPL), and is free to download and use. More information on LibraryFind can be found at http://libraryfind.org.

Digital Assets Factory Version 2.0 Released Under GPL

Bibliotheca Alexandrina has released version 2.0 of its open-source Digital Assets Factory software.

Here’s an excerpt from the project home page:

DAF v2.0 provides all the necessary tools required to manage the whole process of a digitization workflow, including its various Phases, User management, file movement and archiving. It provides the flexibility to manage multiple simultaneous projects with a diversity of materials, covering books, journals, newspapers, manuscripts, unbound materials, audio, video, and slides.

Advancing Knowledge: The IMLS/NEH Digital Partnership Grants Awarded

The Institute of Museum and Library Services and the National Endowment for the Humanities have announced the award of three grants under their Advancing Knowledge: The IMLS/NEH Digital Partnership program.

Here's an excerpt from the press release:

  • $347,520 to Historical Society of Pennsylvania for its project: PhilaPlace: A Neighborhood History and Culture Project. The Historical Society of Pennsylvania in collaboration with the Philadelphia Department of Records and the University of Pennsylvania’s School of Design will develop PhilaPlace, an interactive Web resource chronicling the history, culture, and architecture of Philadelphia's neighborhoods. Complete with maps, historical records, photographs, and digital models of select neighborhoods, PhilaPlace will serve as a prototype website for communities wishing to digitize their cultural heritage.
  • $349,939 to Tufts University, Medford for its project: Scalable Named Entity Identification in Classical Studies. The Perseus Project and the Collections and Archives of Tufts University will construct a testing database of scholarly and cultural documents about the ancient world. In the second part of the project, Tufts will develop a digital reference tool allowing researchers and librarians to conduct context-based “smart searches” of un-indexed words from existing databases in the Tufts Digital Library. By developing this database, and allowing for much shorter and complete context-based searches, Tufts hopes to lead scholars and students to the next generation of digital tools.
  • $349,996 to University of California, Berkeley for its project: Context and Relationships: Ireland and Irish Studies. The University of California, Berkeley in collaboration with the Queen’s University, Belfast, will develop a digital database of Irish studies materials to test three open-source digital tools. The Context Finder, Context Builder, and Context Provider tools will be aimed at establishing scholarly context. Using a common word search feature in digital collections, these tools will allow users to access the ideas that are associated with the words, thereby creating context through maps, primary texts and secondary works.

TableSeer: Searching and Ranking PDF Table Data

Researchers at Penn State's College of Information Sciences and Technology's Cyber-Infrastructure Lab have developed open source software called TableSeer that can find, extract, search, and rank table data from PDF files. Source code will be available at the project's close.

Here's an extract from the press release:

Tables are an important data resource for researchers. In a search of 10,000 documents from journals and conferences, the researchers found that more than 70 percent of papers in chemistry, biology and computer science included tables. Furthermore, most of those documents had multiple tables.

But while some software can identify and extract tables from text, existing software cannot search for tables across documents. That means scientists and scholars must manually browse documents in order to find tables, a time-consuming and cumbersome process.

TableSeer automates that process and captures data not only within the table but also in tables' titles and footnotes. In addition, it enables column-name-based search so that a user can search for a particular column in a table.

In tests with documents from the Royal Society of Chemistry, TableSeer correctly identified and retrieved 93.5 percent of tables created in text-based formats. . . .

Information on TableSeer can be found in a paper, "TableSeer: Automatic Table Metadata Extraction and Searching in Digital Libraries," by Ying Liu, Kun Bai, Prasenjit Mitra, and C. Lee Giles of the Penn State College of Information Sciences and Technology.
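The column-name-based search mentioned in the press release can be pictured as matching a query against the column headers stored with each extracted table. Here's a minimal Python sketch of that idea. It is not TableSeer's code: the record layout and function name are hypothetical, the sample records are made up for illustration, and the extraction and ranking steps are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TableRecord:
    """Metadata kept for one extracted table (hypothetical layout)."""
    document: str
    title: str
    columns: list
    footnotes: list = field(default_factory=list)


def search_by_column(tables, column_query):
    """Return tables with a column name containing the query, ignoring case."""
    q = column_query.lower()
    return [t for t in tables if any(q in c.lower() for c in t.columns)]


# Made-up sample records, for illustration only.
tables = [
    TableRecord("paper-a.pdf", "Table 1. Reaction yields", ["Catalyst", "Yield (%)"]),
    TableRecord("paper-b.pdf", "Table 2. Data sets", ["Name", "Size"]),
]
```

Searching this list for "yield" returns only the first record, because only its column headers contain that word.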

UNIX Ruling: An Open Source Victory

In a blow to the SCO Group, Dale A. Kimball, a judge in the U.S. District Court for the District of Utah Central District, has ruled that Novell owns the disputed copyright to the UNIX operating system. The judge also ruled that SCO must drop its suits against IBM Corp. and Sequent as well as pay Novell part of its licensing fees from Sun and Microsoft.

Here's an excerpt from "Novell Wins Right to Unix, Dismissing SCO":

The ruling is good news for organizations that use open-source software products, said Jim Zemlin, executive director of the Linux Foundation. "From the perspective of someone who is adopting open-source solutions to run in the enterprise, it proves to them that the industry is going to defend the platform, and that when organizations attack it from a legal perspective, that the industry collectively will defend it," he said.

Here's an excerpt from "Judge Says Unix Copyrights Belong to Novell":

"The court's ruling has cut out the core of SCO's case and, as a result, eliminates SCO's threat to the Linux community based upon allegations of copyright infringement of Unix," said Joe LaSala, Novell's senior vice president and general counsel.

Sources: Gohring, Nancy. "Novell Wins Right to Unix, Dismissing SCO." InfoWorld, 10 August 2007; Markoff, John. "Judge Says Unix Copyrights Belong to Novell." The New York Times, 11 August 2007.

CommentPress 1.0 Theme Released: Paragraph-Level Commenting in WordPress

After a year-and-a-half of development effort, the Institute for the Future of the Book has released the open-source CommentPress 1.0 theme for WordPress, which allows paragraph-level comments that are displayed side-by-side with the associated paragraph.

Here’s an excerpt from the announcement:

This little tool is the happy byproduct of a year and a half spent hacking WordPress to see whether a popular net-native publishing form, the blog, which, most would agree, is very good at covering the present moment in pithy, conversational bursts but lousy at handling larger, slow-developing works requiring more than chronological organization—whether this form might be refashioned to enable social interaction around long-form texts. Out of this emerged a series of publishing experiments loosely grouped under the heading "networked books." . . .

In the course of our tinkering, we achieved one small but important innovation. Placing the comments next to rather than below the text turned out to be a powerful subversion of the discussion hierarchy of blogs, transforming the page into a visual representation of dialog, and re-imagining the book itself as a conversation. Several readers remarked that it was no longer solely the author speaking, but the book as a whole (author and reader, in concert). . . .

We can imagine a number of possibilities:

  • scholarly contexts: working papers, conferences, annotation projects, journals, collaborative glosses
  • educational: virtual classroom discussion around readings, study groups
  • journalism/public advocacy/networked democracy: social assessment and public dissection of government or corporate documents, cutting through opaque language and spin (like our version of the Iraq Study Group Report, or a copy of the federal budget, or a Walmart press release)
  • creative writing: workshopping story drafts, collaborative storytelling
  • recreational: social reading, book clubs
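The core idea behind the theme, comments attached to individual paragraphs and shown beside them, comes down to keying each comment by paragraph position instead of by page. Here's a minimal Python sketch of that data structure. It is not CommentPress code (the theme is built on WordPress); the class and method names are hypothetical, and paragraphs are assumed to be separated by blank lines.

```python
from collections import defaultdict


def split_paragraphs(text):
    """Split text into paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class CommentedText:
    """A long-form text with comments keyed by paragraph index."""

    def __init__(self, text):
        self.paragraphs = split_paragraphs(text)
        self.comments = defaultdict(list)  # paragraph index -> [(author, body)]

    def add_comment(self, paragraph_index, author, body):
        if not 0 <= paragraph_index < len(self.paragraphs):
            raise IndexError("no such paragraph")
        self.comments[paragraph_index].append((author, body))

    def side_by_side(self):
        """Pair each paragraph with its own comments, in reading order."""
        return [(p, list(self.comments.get(i, [])))
                for i, p in enumerate(self.paragraphs)]
```

Rendering the pairs from side_by_side in two columns gives the layout described in the announcement: the text on one side and the discussion of each paragraph next to it.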