Mass Digitisation: The IMPACT Project

Fifteen institutions from Europe and the UK have launched the IMPACT project.

Here's an excerpt from the press release:

Feeding into the EU's i2010 vision to significantly improve access to Europe's cultural heritage, the British Library and the University of Salford have teamed up with a group of 15 institutions from across the continent as part of the four-year IMPACT project—IMProving Access to Text—to remove the barriers that stand in the way of the mass digitisation of the European cultural heritage.

Led by the National Library of the Netherlands, Koninklijke Bibliotheek, the IMPACT project aims to share expertise from across Europe and establish international best practice guidelines with a view to speeding up, standardising and enhancing the quality of mass digitisation through establishing a Centre of Competence for text-based digitisation. As one of the main participants, the British Library has taken the lead on one of IMPACT's four sub-projects, establishing the operational context of the work carried out by contributors to the project.

Mass digitisation has become one of the most prominent issues in the library world over the last five years, with a number of experienced libraries in Europe already scanning millions of pages each year. To help establish some standardisation over the course of the project, the British Library's team will lead work on a set of 'Decision Support Tools' in an effort to focus on practical implementation support, providing guidance on digitisation workflow, the capturing of material and the organisation of metadata based on the real-world experiences of project partners. These measures, announced at the first IMPACT conference in April, will help ensure new material can be digitised successfully and feed into existing workflows. . . .

With extensive experience working with the digitisation of historic material, the British Library has also been working closely with technical experts at the internationally distinguished Pattern Recognition and Image Analysis (PRImA) research group, University of Salford, exploring methods of improving Optical Character Recognition (OCR) for use in the digitisation of less standardised material. OCR technology was absolutely vital for the delivery of the Library's recent newspaper digitisation project of 19th Century UK newspapers (http://newspapers.bl.uk/blcs), allowing the text to be fully searchable, but the current technology has its limitations. . . .

Through collaboration, IMPACT has already established methods for overcoming issues with geometric correction, border removal and binarisation, and is looking at examples of best practice from around the world, such as the Australian Newspaper Digitisation project's cutting-edge application of collaborative user-generated corrections, to increase resource discovery success for historic mass digitisation.