Dave Mayo and Kate Bowers have published "The Devil's Shoehorn: A Case Study of EAD to ArchivesSpace Migration at a Large University" in the Code4Lib Journal.
Here's an excerpt:
A band of archivists and IT professionals at Harvard took on a project to convert nearly two million descriptions of archival collection components from marked-up text into the ArchivesSpace archival metadata management system. Starting in the mid-1990s, Harvard was an alpha implementer of EAD, an SGML (later XML) text markup language for electronic inventories, indexes, and finding aids that archivists use to wend their way through the sometimes quirky filing systems that bureaucracies establish for their records or the utter chaos in which some individuals keep their personal archives. These pathfinder documents, designed to cope with messy reality, can themselves be difficult to classify. Portions of them are rigorously structured, while other parts are narrative. Early documents predate the establishment of the standard; many feature idiosyncratic encoding that had been through several machine conversions, while others were freshly encoded and fairly consistent. In this paper, we will cover the practical and technical challenges involved in preparing a large (900MiB) corpus of XML for ingest into an open-source archival information system (ArchivesSpace). This case study will give an overview of the project, discuss problem discovery and problem solving, and address the technical challenges, analysis, solutions, and decisions and provide information on the tools produced and lessons learned.