ACLS Humanities E-Book XML Conversion Experiment: Report on Workflow, Costs, and User Preferences

The American Council of Learned Societies has released "ACLS Humanities E-Book XML Conversion Experiment: Report on Workflow, Costs, and User Preferences."

Here's an excerpt:

In 2008, ACLS Humanities E-Book (HEB)—a subscription-based online collection of over 2,200 digital titles in the humanities—undertook an experiment to investigate the possibility of a future mass conversion of e-books preexisting in a scanned, page-image format into XML-encoded files. . . .

HEB had 20 sample page-image titles from its backlist converted to XML, using OCR-derived text files that had been created during the initial scanning process to enable searching. The books were tagged using a simplified version of HEB's standard specifications, to reduce the need for editorial intervention. . . . The cost of creating the XML titles was considerably greater than that associated with scanning (about $400 versus $170 per title).

The XML books were presented in the HEB collection side by side with their page-image counterparts. Despite any conversion-related flaws, our subsequent user survey indicated that readers preferred the XML format by a margin of about two to one, the most relevant factors cited in this regard being readability, accessible text, and additional features and functions not available in the page-image version.