Update on the British Library/Microsoft Digitization Project
In his Information Today article "Progress Report: The British Library and Microsoft Digitization Partnership," Jim Ashling provides an update on the progress that the British Library and Microsoft have made in their project to digitize about 100,000 books for access in Live Book Search.
Here's an excerpt from the article:
Unlike previous BL digitization projects where material had been selected on an item-by-item basis, the sheer size of this project made such selectivity impossible. Instead, the focus is on English-language material, collected by the BL during the 19th century. . . .
Scanning produces high-resolution images (300 dpi) that are then transferred to a suite of 12 computers for OCR (optical character recognition) conversion. The scanners, which run 24/7, are specially tuned to deal with the spelling variations and old-fashioned typefaces used in the 1800s. The process creates multiple versions including PDFs and OCR text for display in the online services, as well as an open XML file for long-term storage and potential conversion to any new formats that may become future standards. In all, the data will amount to 30 to 40 terabytes. . . .
Obviously, then, an issue exists here for a collection of 19th-century literature when some authors may have lived beyond the late 1930s [British/EU law gives authors a copyright term of life plus 70 years]. An estimated 40 percent of the titles are also orphan works. Those two issues mean that item-by-item copyright checking would be an unmanageable task. Estimates for the total time required to check on the copyright issues involved vary from a couple of decades to a couple of hundred years. The BL’s approach is to use two databases of authors to identify those who were still living in 1936 and to remove their work from the collection before scanning. That, coupled with a wide publicity to encourage any rights holders to step forward, may solve the problem.
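To make the screening step in the excerpt concrete, here is a minimal Python sketch of a death-year filter. It is not the BL's actual process: the `Title` record, the `safe_to_scan` and `copyright_expired` helpers, and the sample data are all hypothetical. It assumes each title carries its author's death year when known, and it treats an unknown death year as a reason to exclude the title, which is a conservative guess rather than something the article states.

```python
from dataclasses import dataclass
from typing import Optional

COPYRIGHT_TERM_YEARS = 70  # life plus 70 years, as noted in the excerpt


@dataclass
class Title:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    author_death_year: Optional[int]  # None = death year unknown


def safe_to_scan(title: Title, cutoff_year: int = 1936) -> bool:
    """Keep a title only if its author is known to have died before the cutoff.

    Authors still living in the cutoff year (death year >= cutoff) are removed.
    Assumption: titles with an unknown death year are excluded as well.
    """
    if title.author_death_year is None:
        return False
    return title.author_death_year < cutoff_year


def copyright_expired(death_year: int, current_year: int) -> bool:
    """Life-plus-70 check: the term runs through the end of the 70th year after death."""
    return current_year > death_year + COPYRIGHT_TERM_YEARS


# Made-up sample data for illustration.
titles = [
    Title("Title A", 1870),
    Title("Title B", 1936),
    Title("Title C", 1950),
    Title("Title D", None),
]
kept = [t.name for t in titles if safe_to_scan(t)]
```

With this sample data, only "Title A" survives the filter. A cutoff of 1936 is slightly stricter than a life-plus-70 check run in 2008 would require, so it leaves a small safety margin.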