Harvard University Library Launched Web Archive Collection Service (WAX)

The Harvard University Library has launched its Web Archive Collection Service (WAX).

Here's an excerpt from the press release (posted on DIGLIB@infoserv.nlc-bnc.ca):

WAX began as a pilot project in July 2006, funded by the University's Library Digital Initiative (LDI) to address the management of web sites by collection managers for long-term archiving. It was the first LDI project specifically oriented toward preserving "born-digital" material. . . .

During the pilot, we explored the legal terrain and implemented several methods of mitigating risks. We investigated various technologies and developed workflow efficiencies for the collection managers and the technologists. We analyzed and implemented the metadata and deposit requirements for long-term preservation in our repository. We continue to look at ways to ease the labor-intensive nature of the QA process, to improve display as the software matures, and to assess additional requirements for long-term preservation. . . .

WAX was built using several open source tools developed by the Internet Archive and other International Internet Preservation Consortium (IIPC) members. These IIPC tools include the Heritrix web crawler; the Wayback index and rendering tool; and the NutchWAX index and search tool. WAX also uses Quartz open source job scheduling software from OpenSymphony.

In February 2009, the pilot public interface was launched and announced to the University community. WAX has now transitioned to a production system supported by the University Library's central infrastructure.
