PALINET to Digitize 20 Million Textual Pages

With support from the Alfred P. Sloan Foundation, PALINET's Mass Digitization Collaborative plans to digitize 20 million textual pages of public domain material from participating member libraries. The scanned digital texts will be freely available from the Internet Archive.

Read more about it at "PALINET's Mass Digitization Collaborative Underway."

JISC Releases Report on Book Scanners

JISC has released Public Exhibition of Automated Book Scanners Hosted at the Bayerische StaatsBibliothek—Munich 18th-20th June 2008.

Here's an excerpt from the announcement:

Julian Ball, the author of the report, attended an event at the Munich Digitisation Centre (18-20 June 2008) where four vendors exhibited and demonstrated their scanners: Qidenus, Kirtas, Treventus and 4DigitalBooks.

The report lists basic specifications for each scanner, contact details and personal observations on the various products.

Committee on Institutional Cooperation and University of California Launch HathiTrust, Shared Digital Repository

The Committee on Institutional Cooperation and the University of California System's university libraries have launched the HathiTrust, a shared digital repository.

Here's an excerpt from the press release:

A group of the nation’s largest research libraries are collaborating to create a repository of their vast digital collections, including millions of books, organizers announced today. These holdings will be archived and preserved in a single repository called the HathiTrust. Materials in the public domain will be available for reading online. . . .

Launched jointly by the 12-university consortium known as the Committee on Institutional Cooperation (CIC) and the 11 university libraries of the University of California system, the HathiTrust leverages the time-honored commitment to preservation and access to information that university libraries have valued for centuries. UC’s participation will be coordinated by the California Digital Library (CDL), which brings its deep and innovative experience in digital curation and online scholarship to the HathiTrust.

"This effort combines the expertise and resources of some of the nation’s foremost research libraries and holds even greater promise as it seeks to grow beyond the initial partners," says John Wilkin, associate university librarian of the University of Michigan and the newly named executive director of HathiTrust. Hathi (pronounced hah-TEE), the Hindi word for elephant incorporated into the repository’s name, underscores the immensity of this undertaking, Wilkin says. Elephants also evoke memory, wisdom, and strength.

As of today, HathiTrust contains more than 2 million volumes and approximately ¾ of a billion pages, about 16 percent of which are in the public domain. Public domain materials will be available for reading online. Materials protected by copyright, although not available for reading online, are given the full range of digital archiving services, thereby offering member libraries a reliable means to preserve their collections. Organizers also expect to use those materials in the research and development of the Trust.

Volumes are added to the repository daily, and content will grow rapidly as the University of California, CIC member libraries, and other prospective partners contribute their digitized content. Also today, the founding partners announce that the University of Virginia is joining the initiative.

Each of the founding partners brings extensive and highly regarded expertise in the areas of information technology, digital libraries, and project management to this endeavor. Creation of the HathiTrust supports the digitization efforts of the CIC and the University of California, each of which has entered into collective agreements with Google to digitize portions of the collections of their libraries, more than 10 million volumes in total, as part of the Google Book Search project. Materials digitized through other means will also be made available through HathiTrust.

Read more about it at "University Libraries in Google Project to Offer Backup Digital Library."

Federal Agencies Digitization Guidelines Initiative Website Launched

The Federal Agencies Digitization Guidelines Initiative has launched its Website.

Here's a summary from the home page:

This site is a collaborative effort by federal agencies formed as a group in 2007 to define common guidelines, methods, and practices to digitize historical content in a sustainable manner. Recognizing that the effort would require specialized expertise, two separate working groups were formed with the possibility that more tightly focused groups might be necessary as the work progressed. The Federal Agencies Still Image Digitization Working Group will concentrate its efforts on image content such as books, manuscripts, maps, and photographic prints and negatives. The Federal Agencies Audio-Visual Working Group is focusing its work on sound, video, and motion picture film.

Bellinger Named Director of the Office of Digital Assets and Infrastructure at Yale

Meg Bellinger, Associate University Librarian for Integrated Access and Technical Services at the Yale University Library, has been named Director of the Office of Digital Assets and Infrastructure at Yale, a new position in the Provost's office that is responsible for university-wide digitization.

Read more about it at "Bellinger to Direct Digitizing Office."

Google Newspaper Digitization Project Announced

Google has announced a newspaper digitization project that will "make more old newspapers accessible and searchable online by partnering with newspaper publishers to digitize millions of pages of news archives."

Read more about it at "Bringing History Online, One Newspaper at a Time."

Technical Report: Doctoral Theses Digitisation

Ingrid Mason, Digital Research Repository Coordinator at the New Zealand Electronic Text Centre of the Victoria University of Wellington’s University Library, has deposited a report (Technical Report: Doctoral Theses Digitisation) about that library's doctoral theses digitization project in its institutional repository.

Here's an excerpt:

Doctoral theses (~1200) in the University Library’s collection have been digitised and uploaded into the Library’s two research repositories: RestrictedArchive@Victoria and ResearchArchive@Victoria. With a view to sharing learning and useful information, key considerations for other tertiary institutions undertaking a similar project are:

  • digital file sizes and server storage space
  • purpose of and standards of digitisation for access
  • data matching from library system and alumni database
  • database listing and tracking of theses and allied tasks
  • inventory listing and batching of theses into boxes
  • costs for digitisation, transportation and short term assistance

British Library Releases Its "Digitisation Strategy 2008-2011"

The British Library has released its "Digitisation Strategy 2008-2011."

Here's an excerpt:

Over the next 3 years we will build on our existing digitisation programme. Current projects include the digitisation of:

  • 20 million pages of 19th century literature [approximately 80,000 books];
  • 1 million pages of historic newspapers in addition to the 3m already digitised;
  • 4,000 hours of Archival Sound Recordings in addition to the 4,000 hours already digitised;
  • 100,000 pages of Greek manuscripts.

Our top priority digitisation programme in support of the Library's corporate strategy 2008-2011 is the digitisation of newspapers.

OCLC Announces WorldCat Copyright Evidence Registry Beta

OCLC has announced the WorldCat Copyright Evidence Registry beta, a union catalog of copyright information.

Here's an excerpt from the press release:

The WorldCat Copyright Evidence Registry is a community working together to build a union catalog of copyright evidence based on WorldCat, which contains more than 100 million bibliographic records describing items held in thousands of libraries worldwide. In addition to the WorldCat metadata, the Copyright Evidence Registry uses other data contributed by libraries and other organizations.

Digitization projects continue for books in the public domain, but books whose copyright status is unknown are destined to remain in print and on shelves until their status can be determined. The process to determine copyright status can be lengthy and labor intensive. The goal of the Copyright Evidence Registry is to encourage a cooperative environment to discover, create and share copyright evidence through a collaboratively created and maintained database, using the WorldCat cooperative model to eliminate duplicate efforts. . . .

The Copyright Evidence Registry six-month pilot was launched July 1 to test the concept and functionality. Users can search the Copyright Evidence Registry to find information about a book, learn what others have said about its copyright status, and share what they know. . . .

During a later stage of the pilot, OCLC will add a feature enabling pilot libraries to create and run automated copyright rules conforming to standards they define for determining copyright status. The rules will help libraries analyze the information available in the Copyright Evidence Registry and form their own conclusions about copyright status.
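To make the idea of "automated copyright rules" concrete, here is a minimal, hypothetical sketch of the kind of rule set a pilot library might define. The metadata fields, function names, and thresholds are assumptions for illustration (based on well-known US copyright rules as they stood in 2008); the Registry's actual rule format is not described in the press release.

```python
# Hypothetical sketch of an automated copyright rule set a library might
# define against Copyright Evidence Registry data. The Evidence fields and
# rule thresholds are illustrative assumptions, not the Registry's schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    pub_year: Optional[int]        # year of first publication, if known
    pub_country: Optional[str]     # e.g. "US"
    renewal_found: Optional[bool]  # True/False if a renewal search was done, None if not

def us_status(e: Evidence) -> str:
    """Apply a simplified US rule set and return a status label."""
    if e.pub_year is None:
        return "undetermined"
    if e.pub_year < 1923:
        # Works published in the US before 1923 were public domain (as of 2008).
        return "public domain"
    if e.pub_year < 1964 and e.renewal_found is False:
        # 1923-1963 US works whose copyright was not renewed entered the public domain.
        return "public domain"
    if e.renewal_found is None:
        return "undetermined"
    return "in copyright"

print(us_status(Evidence(1910, "US", None)))    # public domain
print(us_status(Evidence(1950, "US", False)))   # public domain
print(us_status(Evidence(1980, "US", True)))    # in copyright
```

A real rule engine would of course need far more nuance (foreign publication, unpublished works, notice requirements), which is exactly why the Registry lets each library encode its own standards rather than imposing one set of conclusions.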

Five TexTreasures Digitization Grants Awarded

The Texas State Library and Archives Commission has awarded digitization grants to five TexShare member libraries.

Here's an excerpt from the press release:

TSLAC received 28 TexTreasures grant proposals. The exciting projects that have been funded are:

  1. "Houston Oral History Project" ($17,474)—The Houston Public Library is partnering with Mayor Bill White to preserve and make the video-recordings of significant Houstonians available on the web.
  2. "Early Texas Newspapers: 1829-1861" ($24,637)—The University of North Texas Libraries and the Center for American History at the University of Texas at Austin will partner to microfilm, digitize, and provide free public access to the earliest Texas newspapers held by the Center for American History.
  3. "The Wittliff Collections" ($20,000)—The project creates an online exhibit accessing the primary source materials of researcher Dick J. Reavis held by the Southwestern Writers Collection at the Wittliff Collections at Texas State University about the siege of the Branch Davidians at Mount Carmel outside of Waco in 1993.
  4. "Austin History Center Glass Plate Negatives" ($12,889)—The Austin History Center, a division of the Austin Public Library, will digitize the complete Hubert Jones collection of 471 glass plate negatives containing subjects local to Austin and Texas.
  5. "Tejano Voices Project" ($20,000)—The University of Texas at Arlington Library will digitize and describe 60 of the 174 oral history interviews with notable Tejanos and Tejanas from across Texas conducted in 1992-2003 by Dr. Jose Angel Gutierrez, associate professor of political science at UT Arlington.

The Impact of Digitizing Special Collections on Teaching and Scholarship: Reflections on a Symposium about Digitization and the Humanities

OCLC Programs & Research has released The Impact of Digitizing Special Collections on Teaching and Scholarship: Reflections on a Symposium about Digitization and the Humanities.

Here's an excerpt:

University faculty and scholars demonstrated their uses of rare books and archives—in both digital and physical forms—to an audience of RLG Programs partners at a symposium in Philadelphia on June 4, 2008. Tony Grafton's recent article in The New Yorker provoked the theme of the symposium: we'll be travelling both the wide smooth road through the screen and the narrow difficult road of books and archives for a long time to come.

The audience of librarians, archivists, museum professionals and senior managers discussed administrative issues and opportunities for the use of digitized special collections. The academic speakers, however, spoke to us directly about their expectations of special collections and proposals for collaboration with scholars. These scholars emphasized the critical roles rare books, archives and other materials play in both teaching and research, and called for specific directions for libraries and archives to take in the near future. The primary users of primary resources presented clear imperatives for collections and custodians: work with faculty to understand current research methods and materials; go outside the library or archive to build collections and work with faculty; and continue to build digital and material collections for both teaching and research.

National Center for Research in Advanced Information and Digital Technologies to Be Established

A National Center for Research in Advanced Information and Digital Technologies will be established as part of the Higher Education Opportunity Act (see Sec. 802).

Here's an excerpt from the Digital Promise press release on its home page:

The new program is entitled the "National Center for Research in Advanced Information and Digital Technologies." It is a Congressionally originated 501(c)(3) nonprofit corporation located within the Department of Education. It will have a nine-member independent Board of Directors appointed by the Secretary of Education from nominations by members of Congress. Grants and contracts will be awarded on merit, and policies will be developed following the tested procedures of NSF and NIH. Given its status as a non-profit, independent corporation, the Center will be able to receive grants, contracts, and philanthropic contributions, as well as federal appropriations. . . .

Our next challenge is to secure FY09 appropriations for the Center. Because of the delay in passing the Higher Education Act, it was not possible for appropriations of the, until now, unauthorized National Center to be included in the Labor, HHS or Education funding bills that were passed in Committee in June. It is widely expected that final appropriations for FY09 will not be enacted until early next year. We are working hard to have funding for the National Center included in final appropriations legislation. We are requesting $50 million for FY09.

According to the About Digital Promise page, one of the functions of the center will be to "commission pre-competitive research and fund the development of prototypes to . . . Digitize America’s collected memory stored in our nation’s universities, libraries, museums and public television archives to make these materials available anytime and anywhere."

A Look at the British Library's Digitization Program's Copyright Challenges

Tracey Caldwell's recent "Scan and Deliver" article examines the copyright challenges that the British Library faces in its digitization program (e.g., copyright issues have to be considered for works going as far back as the 1860s). It also mentions the impact of the shutdown of Microsoft's book digitization program on the British Library (digitization costs were shared 50-50 with Microsoft).

Research Study: How Is Web 2.0 Viewed by Academics?

The Birmingham Museums and Art Gallery's Pre-Raphaelite digitization project has released a study (Pre-Raphaelite Resource Project: Audience Research Report) about academics' perceptions of the usefulness of Web 2.0 capabilities.

Here's an excerpt from the "Executive Summary":

Our research indicated that there is some readiness among the education community for Web 2.0 technologies but only in the context of academia as a status-conscious, competitive environment. Whilst there are clear benefits to be achieved from providing teachers and students with the opportunity to share ideas in the context of stimulus artefacts, many hold reservations about 'giving away' their intellectual property. Providing different levels of publishing privileges will help cater for the varying acceptance within the audience base for sharing their ideas publicly.

Social networking features are perceived by both HE students and lecturers as primarily for pleasure rather than for work so must be used sparingly in a resource of this nature. For younger students, however, the boundaries between work and life are increasingly blurred and the ability to contact experts and to personalise or control the space would be welcomed.

Care must be taken with positioning for the resource to be truly useful as a research tool; students and lecturers need to know that it has been created for them and has scholarly merit. Their main concern is to access reliable, relevant content and information, but the ability to form connections between these resources is one way of adding value to the collection.

Critique of the National Archives' The Founders Online Report

Peter Hirtle has posted a sharp critique of the National Archives' The Founders Online report on the LibraryLaw Blog that, among other points, questions whether the digitized works that result from the project will be free of copyright and access restrictions.

Here's an excerpt:

5. Perhaps the most problematic issues in the report surround its use of the term "open access." For some, open access means "digital, online, and free of charge." The report, while saying it wants to provide open access to the material, appears to recommend that all material be given to UVA's Rotunda system for delivery. Rotunda follows a subscription model—not open access—that is remarkably expensive considering that citizens have already paid for all of the editorial work on these volumes. How could this be open access? Apparently Rotunda might be willing to give up its subscription approach if a foundation were willing to pay for all of its costs. Unless such a commitment is in place, I find it disingenuous to describe a Rotunda delivery option as "open access." There is no discussion of other, free, delivery options, such as the willingness expressed by Deanna Marcum of the Library of Congress at the Senate Hearing to make all of the Founding Fathers papers accessible through LC (which already has a good site pointing to currently accessible papers).

6. Others argue that for true open access, information must be accessible outside of specific delivery systems (such as Rotunda) and made available in bulk. Open data and open interfaces allow for all sorts of interesting uses of material. For example, someone might want to mashup George Washington's papers to Google Maps in order to be able to easily visualize geographically the spread of information. Others might want to mesh manuscript material with published secondary literature. Rather than anticipating the widespread dispersal and re-use of the Founding Fathers papers, however, and hence the need for harvestable data, open APIs, distributed access, etc., the report calls instead for "a single, unified, and sustainable Web site"—apparently the locked-down Rotunda system.

NEH/DFG Bilateral US/German Humanities Digitization Grants

The National Endowment for the Humanities (NEH) and the German Research Foundation (Deutsche Forschungsgemeinschaft) have issued a call for bilateral US/German humanities digitization grant proposals.

Here's an excerpt from the call:

These grants provide funding for up to three years of development in any of the following areas:

  • new digitization projects and pilot projects;
  • the addition of important materials to existing digitization projects; and
  • the development of related infrastructure to support international digitization work and the use of those digitized resources.

Collaboration between U.S. and German partners is a key requirement for this grant category.

Registry of U.S. Government Publication Digitization Projects Enhanced

The Registry of U.S. Government Publication Digitization Projects has been significantly enhanced.

Here's an excerpt from the announcement:

The enhanced Registry provides the ability to:

  • Browse digitization projects by category or alphabetically by title.
  • Search the entire Registry or filter searches by category or fields.
  • Quickly access new and recently updated listings.
  • Utilize RSS feeds to keep informed of new and updated projects.
  • View listings by contributor.
  • Contact fellow digitization participants.
  • Recommend listings to others.
  • Report broken links.
  • And much more!

Coverage of the Demise of Microsoft's Mass Digitization Project

Microsoft's decision to end its Live Search Books program, which provided important funding for the Open Content Alliance, has been widely covered by newspapers, blogs, and other information sources.

Here's a selection of articles and posts: "Books Scanning to be Publicly Funded," "'It Ain’t Over Till It's Over': Impact of the Microsoft Shutdown," "Microsoft Abandons Live Search Books/Academic Scan Plan," "Microsoft Burns Book Search—Lacks 'High Consumer Intent,'" "Microsoft Shuts Down Two of Its Google 'Wannabe’s': Live Search Books and Live Search Academic," "Microsoft Will Shut Down Book Search Program," "Microsoft's Book-Search Project Has a Surprise Ending," "Post-Microsoft, Libraries Mull Digitization," "Publishers Surprised by Microsoft Move," "Why Killing Live Book Search Is Good for the Future of Books," and "Without Microsoft, British Library Keeps on Digitizing."

University of Florida Has Digitized 1.7 Million Pages, over 100,000 in Last Month Alone

The University of Florida Digital Library Center has announced that it has digitized over 1.7 million pages, with about 100,000 pages being added in the last month alone. Their digitization statistics are available online. (Thanks to Open Access News.)

Read more about it at "100,000 Pages a Month."

OCLC Announces Digital Archive Service

OCLC has announced the availability of a Digital Archive service.

Here's an excerpt from the press release:

The service provides a secure storage environment for libraries to easily manage and monitor master files and digital originals. The importance of preserving master files grows as a library's digital collections grow. Libraries need a workflow for capturing and managing master files that finds a balance between the acquisition of both digitized and born-digital content while not outpacing a library's capability to manage these large files. . . .

The Digital Archive service is a specially designed system in a controlled operating environment dedicated to the ongoing managed storage of digital content. OCLC has developed specific systems processes and procedures for the service tuned to the management of data for the long term.

From the time content arrives, the Digital Archive systems begin inspecting it to ensure continuity. OCLC systems perform quality checks and record the results in a "health record" for each file. Automated systems revisit these quality checks periodically so libraries receive up-to-date reports on the health of the collection. OCLC provides monthly updated information for all collections on the personal archive report portal.

For users of CONTENTdm, OCLC's digital collection management software for libraries and other cultural heritage institutions, the Digital Archive service is an optional capability integrated with various workflows for building collections. Master files are secured for ingest to the Digital Archive service using the CONTENTdm Acquisition Station, the Connexion digital import capability and the Web Harvesting service.

For users of other content management systems, the Digital Archive service provides a low-overhead mechanism for safely storing master files.
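The periodic quality checks and per-file "health record" described above amount to standard fixity checking. The sketch below shows one minimal way such a check might work, assuming SHA-256 checksums and a JSON record per file; OCLC's actual internal process and record format are not specified in the press release, so every name here is illustrative.

```python
# Minimal sketch of periodic fixity checking with a per-file "health record",
# in the spirit of the Digital Archive service's quality checks. The checksum
# algorithm and JSON record layout are assumptions, not OCLC's implementation.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_health(path: Path, record_path: Path) -> bool:
    """Verify a master file against its stored health record and append
    the result of this check to the record. The first check establishes
    the baseline checksum."""
    record = json.loads(record_path.read_text()) if record_path.exists() else {}
    digest = sha256(path)
    ok = record.get("sha256", digest) == digest
    record.setdefault("sha256", digest)
    record.setdefault("checks", []).append(
        {"time": datetime.now(timezone.utc).isoformat(), "ok": ok}
    )
    record_path.write_text(json.dumps(record, indent=2))
    return ok
```

Re-running `check_health` on a schedule yields exactly the kind of up-to-date collection health report the release describes: any silent corruption of a master file shows up as a failed check in that file's record.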

California Digital Library Puts Up Mass Digitization Projects Page

The California Digital Library has added a UC Libraries Mass Digitization Projects page to its Inside CDL Web site.

The Web site includes links to Frequently Asked Questions, contracts with digitization partners, and other information.

Of special interest in the FAQ are the questions "What rights to the digitized content does UC have in the projects; will access be limited in any way?" and "How will our patrons be able to access these texts, i.e. through MELVYL, or local catalogs, or a webpage, any search engine, or….?"