Update on the British Library/Microsoft Digitization Project
In his Information Today article "Progress Report: The British Library and Microsoft Digitization Partnership," Jim Ashling provides an update on the progress that the British Library and Microsoft have made in their project to digitize about 100,000 books for access in Live Book Search.
Here's an excerpt from the article:
Unlike previous BL digitization projects where material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material, collected by the BL during the 19th century. . . .
Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The scanners, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. In all, the data will amount to 30 to 40 terabytes. . . .
Obviously, then, an issue exists here for a collection of 19th-century literature when some authors may have lived beyond the late 1930s [British/EU law gives authors a copyright term of life plus 70 years]. An estimated 40 percent of the titles are also orphan works. Those two issues mean that item-by-item copyright checking would be an unmanageable task. Estimates for the total time required to check on the copyright issues involved vary from a couple of decades to a couple of hundred years. The BL’s approach is to use two databases of authors to identify those who were still living in 1936 and to remove their work from the collection before scanning. That, coupled with a wide publicity to encourage any rights holders to step forward, may solve the problem.
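The author-based screening described in the excerpt can be sketched in a few lines of Python. This is only an illustration of the rule as quoted (drop works by authors who were still living in 1936), not the BL's actual workflow. The `Title` record, the `cleared_for_scanning` function, and the sample entries are all hypothetical, and excluding titles whose author's death year is unknown is an assumption that the article does not address.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cutoff year taken from the excerpt: authors still living in 1936 are excluded.
CUTOFF_YEAR = 1936


@dataclass
class Title:
    """Hypothetical catalogue record; field names are illustrative only."""
    name: str
    author: str
    author_death_year: Optional[int]  # None when no death date is on record


def cleared_for_scanning(title: Title, cutoff_year: int = CUTOFF_YEAR) -> bool:
    """Return True only if the author is known to have died before the cutoff.

    Assumption: an unknown death year is treated as "not cleared".
    """
    if title.author_death_year is None:
        return False
    return title.author_death_year < cutoff_year


def filter_collection(titles: List[Title]) -> List[Title]:
    """Keep only the titles that pass the author-based screen."""
    return [t for t in titles if cleared_for_scanning(t)]


# Made-up sample records, for illustration only.
sample = [
    Title("Sample Novel", "Author A", 1870),    # died well before the cutoff
    Title("Sample Essays", "Author B", 1946),   # still living in 1936
    Title("Sample Pamphlet", "Author C", None), # death year unknown
]

kept = filter_collection(sample)
print([t.name for t in kept])
```

A screen of this kind replaces item-by-item checking with a single lookup per author, which is why it scales to a collection of this size. It does nothing for orphan works, which the excerpt says are handled through publicity instead.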