No Contract Awarded for GPO Mass Digitization of All Federal Publications

The U.S. Government Printing Office has been unable to award a contract for the digitization of all Federal publications.

Here's an excerpt from the announcement:

In 2004, GPO proposed digitizing all retrospective Federal publications back to the earliest days of the Federal Government. Following the conduct of a pilot project in 2006 and its evaluation in 2007, we issued an RFP in 2008 for a cooperative relationship with a public or private sector participant or participants where the uncompressed, unaltered files created as a result of the conversion process would be delivered to GPO at no cost to the Government, for ingest into GPO's Federal Digital System (FDsys). Unfortunately, we were unable to make an award for this RFP in the allocated timeframe.

We are very disappointed in this setback, but are currently developing new digitization alternatives. In addition to our longstanding goal of serving as one of the repositories for electronic files through the submission of material to FDsys, our focus for digitization will be on coordinating projects among institutions, assisting in the establishment and implementation of preservation guidelines, maintaining a registry of digitization projects, and ensuring that there is appropriate bibliographic metadata for the titles in the collection.

University of Illinois' IDEALS Repository Tops One Million Downloads

The University of Illinois' IDEALS institutional repository has topped one million downloads.

Here's an excerpt from the announcement:

The Illinois Digital Environment for Access to Learning and Scholarship (IDEALS), a digital repository for research and scholarship developed at the University of Illinois at Urbana-Champaign, has surpassed its one-millionth download.

The service, offered through the University Library and Campus Information Technologies and Educational Services (CITES), is sponsored by the Office of the Provost at Illinois and was launched in 2006. The campus institutional repository includes articles, working papers, preprints, technical reports, conference papers, and data sets in various digital formats provided by University faculty, staff, and graduate students. Although central to the University of Illinois, anyone can access and benefit from IDEALS collections and services. "Today, over 12,000 items have been uploaded into IDEALS," said Sarah Shreeves, associate professor and IDEALS coordinator. "The success of this service has surpassed what anyone envisioned two and a half years ago, and we hope that others in the Illinois community will take advantage of its services."

The mission of IDEALS is to preserve and provide persistent and reliable access to digital research and scholarship in order to give these works the greatest possible recognition and distribution. IDEALS endeavors to ensure that its materials appear in search engines such as Google, Google Scholar, and Bing and that the majority of the research is openly available for anyone to access. As a result of its efforts to disseminate research produced at the University of Illinois, IDEALS was recently ranked in the top 10 of institutional repositories worldwide. "I am delighted with the exposure that IDEALS has provided us with. Whenever we place a thesis or a report, the downloads start and never stop. We get many comments back from readers and researchers who have seen our work only on IDEALS," said Amr Elnashai, head, Civil and Environmental Engineering Department at the University of Illinois at Urbana-Champaign.

IDEALS contains a wealth of diverse information, from a Mid-America Earthquake Center report on the Kashmir Earthquake of 2005 to the Ethnography of the University Initiative’s publications and presentations, including campus folklore and cultural perceptions. "I appreciate that my thesis is archived in a stable location for reliable long-term access. The document is now freely available to anyone in the world, yet I retain the copyright," said David P. Hruska, an Illinois graduate. "Furthermore, my thesis is now displayed in search results returned by Google Scholar, improving the dissemination of my research."

Dean of University Libraries Candidates Interview at Indiana University

Candidates for the Ruth Lilly Dean of University Libraries position at Indiana University are interviewing this week. The candidates are Brenda Johnson (Dean of University Libraries at the University of California, Santa Barbara) and Diane Parr Walker (Deputy University Librarian at the University of Virginia).

Read more about it at "Library Dean Candidates Visit Today."

Deputy Director at UKOLN

UKOLN is recruiting a Deputy Director.

Here's an excerpt from the ad:

UKOLN is a centre of expertise in digital information management based at the University of Bath, providing advice and services to the library, information, education and cultural heritage communities.

We are seeking to recruit a Deputy Director to provide outstanding leadership and strategic direction to all technical activity within UKOLN, to assure our position at the forefront of innovative Digital Library (DL) developments. This is a key senior post within the UKOLN organisation and will be based at the University of Bath.

The post requires vision, strategic insight and innovation associated with the implementation and development of DLs within the education, research or cultural heritage sectors. Applicants should have extensive and in-depth technical knowledge of DLs and associated interoperability issues, knowledge of emerging Web technologies and an understanding of their potential for education and research. An established international reputation in the DL arena, together with a track record for leading and shaping innovative activities, is highly desirable.

The post also requires significant experience of securing funding awards and income generation. Applicants will have extensive experience of leading teams, directing multiple projects to a successful outcome/completion and be outstanding communicators with well-developed influencing and negotiating skills and a proven ability to produce high quality reports, papers and presentations.

SWORD PHP Library Version 0.9

The SWORD PHP library version 0.9 has been released. SWORD is "a lightweight protocol for depositing content from one location to another. It stands for Simple Web-service Offering Repository Deposit and is a profile of the Atom Publishing Protocol (known as APP or ATOMPUB)."

Here's an excerpt from the announcement:

  • Changed swordappservicedocument to build the service document from the XML response rather than having the swordappclient do the work. This allows the service document to be parsed at a later time.
  • Changed the swordappclient deposit method to stream the file being deposited straight from disk rather than via memory to avoid using excessive memory and potentially exceeding the PHP memory limit. I’ve successfully tested this against DSpace with deposits of 600MB CD images.
  • Added some validation to the SWAP/METS packager to allow it to cope with filenames and metadata containing ampersands.
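
The two techniques mentioned in the release notes, streaming a deposit from disk in chunks and escaping ampersands in metadata, can be illustrated with a minimal Python sketch. This is not the SWORD PHP library's code or API: the function names (copy_stream, title_element), the chunk size, and the title element are all hypothetical and chosen only for illustration.

```python
import io
from xml.sax.saxutils import escape


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time and return the bytes copied.

    Only one chunk is held in memory at any moment, so a very large
    file does not have to be loaded in full before it is sent.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def title_element(title):
    """Wrap a title in a (hypothetical) XML element, escaping &, < and >."""
    return "<title>" + escape(title) + "</title>"


if __name__ == "__main__":
    # In-memory streams stand in for a file on disk and a network connection.
    source = io.BytesIO(b"x" * 200_000)
    sink = io.BytesIO()
    print(copy_stream(source, sink))
    print(title_element("Smith & Jones"))
```

In a real client, the source would be a file opened in binary mode and the sink would be the HTTP request body; the point is only that memory use is bounded by the chunk size rather than by the file size.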

Gawronski v. Amazon.com: Amazon's New Kindle Deletion Rules

As a result of the settlement of the Gawronski et al. v. Amazon.com Inc et al. case (about the deletion of George Orwell e-books), Amazon.com will comply with new rules regarding deletion of digital works on Kindles.

Here's an excerpt:

For copies of Works purchased pursuant to TOS granting "the non-exclusive right to keep a permanent copy" of each purchased Work and to "view, use and display [such Works] an unlimited number of times, solely on the [Devices] . . . and solely for [the purchasers'] personal, non-commercial use," Amazon will not remotely delete or modify such Works from Devices purchased and being used in the United States unless (a) the user consents to such deletion or modification; (b) the user requests a refund for the Work or otherwise fails to pay for the Work (e.g., if a credit or debit card issuer declines to remit payment); (c) a judicial or regulatory order requires such deletion or modification; or (d) deletion or modification is reasonably necessary to protect the consumer or the operation of a Device or network through which the Device communicates (e.g., to remove harmful code embedded within a copy of a Work downloaded to a Device). This paragraph does not apply to (a) applications (whether developed or offered by Amazon or by third parties), software or other code; (b) transient content such as blogs; or (c) content that the publisher intends to be updated and replaced with newer content as newer content becomes available. With respect to newspaper and magazine subscriptions, nothing in this paragraph prohibits the current operational practice pursuant to which older issues are automatically deleted from the Device to make room for newer issues, absent affirmative action by the Device user to save older issues.

Read more about it at "Amazon Settles Kindle '1984' Lawsuit" and "Amazon.com to Pay $150,000 to Settle Suit Challenging Take-Back of 1984."

Vernor v. Autodesk: First Sale Doctrine Covers Licensed Software

U.S. District Court Judge Richard A. Jones has ruled that resale of licensed software from Autodesk is not a copyright violation.

Here's an excerpt:

The legislative history of § 109 and § 117 informs the court's decision in several respects. First, as the court noted, it suggests that "owner" not only had the same meaning when both sections were enacted, but that the meaning was that ascribed to the term in decisions like Wise. Congress did not amend the term "owner" when amending the statutes. Second, the legislative history reveals not only that Congress has modified § 117 and § 109 to specifically address computer software, but that when it does so, its modifications are not subtle. This makes it even more improbable that Congress ascribes two different meanings to "owner." Third, the legislative history shows that despite incentive and opportunity to modify the term "owner," Congress has not done so. . . .

Autodesk's claim that Mr. Vernor promotes piracy is unconvincing. Mr. Vernor's sales of AutoCAD packages promote piracy no more so than Autodesk's sales of the same packages. Piracy depends on the number of people willing to engage in piracy, and a pirate is presumably just as happy to unlawfully duplicate software purchased directly from Autodesk as he is to copy software purchased from a reseller like Mr. Vernor. The court notes, moreover, that even if CTA had never opened its AutoCAD packages, never installed the software on its computer, and thus never raised the possibility of piracy, Autodesk would still take the position that CTA's resale of those packages was a copyright violation.

Read more about it at "It's Still A Duck: Court Re-Affirms That First Sale Doctrine Can Apply to 'Licensed' Software."

Librarian (Systems) at the National Agricultural Library

The National Agricultural Library is recruiting a Librarian (Systems).

Here's an excerpt from the ad:

The duties that the incumbent performs involve tasks that require a full professional knowledge of theories, objectives, principles, and techniques of librarianship with an emphasis on library automation.

Duties include the following:

Tests and evaluates new or updated databases, software modifications, or changes to Technical Services Division and/or other Library automated systems as needed. Refines and improves systems based on testing and user input.

Manages the overall operations of the automated processing system for acquisitions.

Develops plans or contract actions for automation projects including file conversions, special file maintenance activities, electronic data interchange, etc.

Creates SQL queries and reports to identify management information, solve workflow issues, provide quality control, and support management decisions.

Prepares, reviews, or oversees the preparation and review of system requirements and writes functional specifications for system development projects within the Technical Services Division, or the Library as a whole.

Applies advanced knowledge and expertise of Integrated Library Systems (ILS), report writing, online files and processing procedures, system analysis techniques and bibliographic and holdings control standards to the solution of technical processing problems to improve operating efficiency within the branch. Makes recommendations to senior management for modifications that improve efficiency of operations and/or the quality of products.

Coordinates training or trains division personnel, contractors and library staff from NAL and outside agencies in the use of technical processing systems.

Serves as a microcomputer support coordinator in providing assistance to Branch staff using microcomputer software and hardware.

SWORD2 Project Final Report

JISC has released the SWORD2 Project Final Report.

Here's an excerpt:

The SWORD vision is about 'lowering the barriers to deposit', primarily for depositing content into repositories, and additionally, for depositing into any system which may wish to receive content from remote sources. The SWORD protocol defines a standard mechanism for depositing into repositories and other systems. The project and protocol were developed because there was previously no standardised way of doing this. A standard deposit interface allows repository services to be built that can offer functionality such as deposit from multiple locations, e.g. disparate repositories, desktop drag'n'drop tools, or from within standard office applications. SWORD can also facilitate deposit to multiple repositories, increasingly important for depositors who wish to deposit to funder, institutional or subject repositories. There are many other possibilities, including migration of content between repositories and transfer to preservation services. In addition to refining the existing SWORD application profile, the SWORD2 project has developed a number of tools and services to demonstrate these possibilities. It has also been pro-active in promoting SWORD and encouraging uptake within other repositories, services and tools, notably with its adoption into the Microsoft Article Authoring Add-in for Word 2007 and with the new Microsoft Zentity repository system.

The core aims of the project were to update the SWORD Protocol, the SWORD repository code libraries in the DSpace, Fedora, EPrints and Intrallect repositories, and the existing reference demonstrators. A Facebook application and validator have also been developed. Advocacy efforts include an e-learning case study, a briefing paper, a new SWORD website, and a range of additional dissemination activities, including conference papers, presentations, demonstrations and workshops at a number of national and international conferences and meetings.

University of Maryland: "What's the Opposite of a Pyrrhic Victory?: Lessons Learned from an Open Access Defeat"

In "What's the Opposite of a Pyrrhic Victory?: Lessons Learned from an Open Access Defeat," Tim Hackman examines the defeat of an open access resolution at the University of Maryland.

Here's an excerpt:

The "Faculty Voice" article on open access published in March 2009 had been the first of its kind at UM, and discussion and drafting of the resolution had taken place mostly behind closed doors within the Faculty Affairs Committee, without involving the rest of the Senate. A handful of interested departments (almost all of them in the sciences) had met with representatives from the libraries to discuss scholarly communication and open access, but the majority of faculty members had no direct contact with someone who could explain the issue and its importance and answer specific questions. It was hoped that the faculty newsletter article would help in this regard, but it was a case of too little too late. The lesson then is don't assume faculty understand the situation or sympathize with the library's point of view.

Yale: "Digitization Project Derailed"

In "Digitization Project Derailed," Carol Hsin discusses the status of digitization efforts at the Yale University Library. (Thanks to ResourceShelf.)

Here's an excerpt:

Four months after Microsoft abruptly terminated its multi-million dollar book digitization deal with the University, Yale officials said they will have to wait for donations or grants to come in before they start another major book scanning project.

Peter Suber on "Ten Challenges for Open-Access Journals"

Peter Suber has published "Ten Challenges for Open-Access Journals" in the latest issue of the SPARC Open Access Newsletter.

Here's an excerpt:

I start with three disparities: the gap between journal performance and what prevailing metrics say about journal performance (#1); the gap between the vision of OA embodied in the Budapest, Bethesda, and Berlin statements and the access policies at 85% of OA journals (#2); and the gap between a journal's quality and its prestige, even when the quality is high (#3). Then I move on to seven kinds of doubt: doubts about quality (#4), preservation (#5), honesty (#6), publication fees (#7), sustainability (#8), redirection (#9), and strategy (#10).

Digital Initiatives and Scholarly Communications Librarian at West Virginia University

The West Virginia University Libraries are recruiting a Digital Initiatives and Scholarly Communications Librarian.

Here's an excerpt from the ad:

The Digital Initiatives and Scholarly Communications Librarian works with librarian colleagues to develop policies and procedures for the management of digital content and metadata for varied digital projects, conducts outreach to inform the campus and the state about digital initiatives, and maintains the Libraries’ open access informational web pages. The Librarian will assist with a growing collection of digital exhibits, book digitization projects, and a well-established institutional repository, WVU Scholar. The Libraries currently work with DLXS, DigiTool, and ExLibris/Voyager. The Librarian will work with the Provost's Office and the Office of Information Technology to coordinate institutional repository policy and procedures governing submission, use, access, and preservation. This position reports to the Head of the Cataloging Department.

Walt Crawford on Open Access

Walt Crawford has dedicated an entire 34-page issue of Cites & Insights: Crawford at Large to a "Library Access to Scholarship" article on open access.

Here's an excerpt from the announcement:

A year’s worth of source material and commentary, organized into:
Mandates, Policies and Compacts
The Colors of OA
Numbers
Scandal!
Framing and Mysteries
The Problem(s) with Green OA
Quality, Value and Progress
Miscellany
Conclusion

Chances are, this is the last hurrah for Library Access to Scholarship and my semi-active independent commentary on open access.

Let's hope that Walt changes his mind about discontinuing "Library Access to Scholarship," which has always been interesting, thought-provoking, and informative reading.

E-Book Collections, SPEC Kit 313

The Association of Research Libraries has published E-Book Collections, SPEC Kit 313. The table of contents and executive summary are freely available.

Here's an excerpt from the announcement:

The Association of Research Libraries (ARL) has published E-book Collections, SPEC Kit 313, which examines the current use of e-books in ARL member libraries; their plans for implementing, increasing, or decreasing access to e-books; purchasing, cataloging, and collection management issues; and issues in marketing to and in usage by library clientele. . . .

According to survey responses, most institutions entered the e-book arena as part of a consortium which purchased an e-book package. The earliest forays occurred in the 1990s but the majority of libraries started e-book collections between 1999 and 2004. Purchasing at the collection level allowed libraries to acquire a mass of titles with a common interface, reducing some of the transition pains to the new format. The downside of collections is that libraries find they are often saddled with titles they would not have selected in print; also, each collection might have a different interface, adding to user frustration.

Those libraries reporting success with individually selected e-book titles cope with other problems: lag time between print and electronic publication (with electronic the lagging format), restrictive digital rights management, loss of access by ILL, and limited printing top the list of concerns. However, responses indicate a preference for title-by-title selection as a more efficient use of funds.

This SPEC Kit includes documentation from respondents in the form of collection development policies, e-book collection Web pages, e-book promotional materials, training materials for staff and users, and e-book reader loan policies.

Johns Hopkins University Sheridan Libraries' Data Conservancy Project Funded by $20 Million NSF Grant

The Johns Hopkins University Sheridan Libraries' Data Conservancy project has been funded by a $20 million NSF grant.

Here's an excerpt from the press release:

The Johns Hopkins University Sheridan Libraries have been awarded $20 million from the National Science Foundation (NSF) to build a data research infrastructure for the management of the ever-increasing amounts of digital information created for teaching and research. The five-year award, announced this week, was one of two for what is being called "data curation."

The project, known as the Data Conservancy, involves individuals from several institutions, with Johns Hopkins University serving as the lead and Sayeed Choudhury, Hodson Director of the Digital Research and Curation Center and associate dean of university libraries, as the principal investigator. In addition, seven Johns Hopkins faculty members are associated with the Data Conservancy, including School of Arts and Sciences professors Alexander Szalay, Bruce Marsh, and Katalin Szlavecz; School of Engineering professors Randal Burns, Charles Meneveau, and Andreas Terzis; and School of Medicine professor Jef Boeke. The Hopkins-led project is part of a larger $100 million NSF effort to ensure preservation and curation of engineering and science data.

Beginning with the life, earth, and social sciences, project members will develop a framework to more fully understand data practices currently in use and arrive at a model for curation that allows ease of access both within and across disciplines.

"Data curation is not an end but a means," said Choudhury. "Science and engineering research and education are increasingly digital and data-intensive, which means that new management structures and technologies will be critical to accommodate the diversity, size, and complexity of current and future data sets and streams. Our ultimate goal is to support new ways of inquiry and learning. The potential for the sharing and application of data across disciplines is incredible. But it’s not enough to simply discover data; you need to be able to access it and be assured it will remain available."

The Data Conservancy grant represents one of the first awards related to the Institute of Data Intensive Engineering and Science (IDIES), a collaboration between the Krieger School of Arts and Sciences, the Whiting School of Engineering, and the Sheridan Libraries. . . .

In addition to the $20 million grant announced today, the Libraries received a $300,000 grant from NSF to study the feasibility of developing, operating and sustaining an open access repository of articles from NSF-sponsored research. Libraries staff will work with colleagues from the Council on Library and Information Resources (CLIR), and the University of Michigan Libraries to explore the potential for the development of a repository (or set of repositories) similar to PubMedCentral, the open-access repository that features articles from NIH-sponsored research. This grant for the feasibility study will allow Choudhury's group to evaluate how to integrate activities under the framework of the Data Conservancy and will result in a set of recommendations for NSF regarding an open access repository.

Indiana University Bloomington Media Preservation Survey

Indiana University Bloomington has released its Media Preservation Survey.

Here's an excerpt:

The survey task force recommends a number of actions to facilitate the time-critical process of rescuing IUB’s audio, video, and film media.

  • Appoint a campus-wide taskforce to advise
    • the development of priorities for preservation action
    • the development of a campus-wide preservation plan
    • how units can leverage resources for the future
  • Create a centralized media preservation and digitization center that will serve the entire campus, using international standards for preservation transfer. As part of the planning for this center, hire a
    • media preservation specialist
    • film archivist
  • Develop special funding for the massive and rapid digitization of the treasures of IU over the next 10 years.
  • Create a centralized physical storage space appropriate for film, video, and audio.
  • Provide archival appraisal and control across campus to
    • assure quality of digitization for preservation
    • oversee plans for maintaining original media
  • Develop cataloging services for special collections to improve intellectual control to
    • accelerate research opportunities
    • improve access.

Software Developer (C++/C and Linux/Unix) at King's College London

The Centre for e-Research at King's College London is recruiting a Software Developer (C++/C and Linux/Unix) to work on the OCRopodium project, which is "investigating the use of the open source OCRopus software (http://sites.google.com/site/ocropus/) for applying Optical Character Recognition (OCR) to historical and archival material" (fixed-term contract for 18 months).

Here's an excerpt from the ad:

The successful applicant will be the key technical staff member for the project, and will be responsible for:

  • Carrying out technical investigations into the functionality and architecture of OCRopus. As OCRopus is an actively growing open source project and thus imperfectly documented, this will in itself require an ability to understand the source code and debug the software.
  • Developing and integrating software components for OCRing historical material, and enhancing existing components.
  • Contributing new and enhanced components to the OCRopus open source project.
  • Benchmarking and evaluation of OCRopus, in collaboration with our project partners at Queen's University, Belfast (QUB).
  • Integrating OCR within broader digitisation and digital library workflows.

Publishing and the Ecology of European Research Project Releases PEER Annual Report—Year 1

The Publishing and the Ecology of European Research project has released PEER Annual Report—Year 1.

Here's an excerpt:

PEER (Publishing and the Ecology of European Research), supported by the EC eContentplus programme, is investigating the effects of the large-scale, systematic depositing of authors' final peer reviewed manuscripts (so called Green Open Access or stage-two research output) on reader access, author visibility, and journal viability, as well as on the broader ecology of European research.

Peer-reviewed journals play a key role in scholarly communication and are essential for scientific progress and European competitiveness. The publishing and research communities share the view that increased access to the results of EU-funded research is necessary to maximise their use and impact. However, they hold different views on whether mandated deposit in open access repositories will achieve greater use and impact. There are also differences of opinion as to the most appropriate embargo periods. No consensus has been reached on a way forward so far.

The lack of consensus on these key issues stems from a lack of clear evidence of what impact the broad and systematic archiving of research outputs in open access repositories might be, but PEER aims to change this through building a substantial body of evidence, via the development of an "observatory" to monitor the effects of systematic archiving over time.

New York Public Library and Kirtas Technologies Make Half-Million Public Domain Books Available

The New York Public Library and Kirtas Technologies are making a half-million public domain books available for sale as digitized or printed copies.

Here's an excerpt from the press release:

Readers and researchers looking for hard-to-find books now have the opportunity to dip into the collections of one of the world's most comprehensive libraries to purchase digitized copies of public domain titles. Through their Digitize-on-Demand program, Kirtas Technologies has partnered with The New York Public Library to make 500,000 public domain works from the Library's collections available (to anyone in the world).

"New technology has allowed the Library to greatly expand access to its collections," said Paul LeClerc, President of The New York Public Library. "Now, for the first time, library users are able to order copies of specific items from our vast public domain collections that are useful to them. Additionally the program creates a digital legacy for future users of the same item and a revenue stream to support our operations. We are very pleased to participate in a program that is so beneficial to everyone involved."

Using existing information from NYPL's catalog records, Kirtas will make the library's public domain books available for sale through its retail site before they are ever digitized. Customers can search for a desired title on www.kirtasbooks.com and place an order for that book. When the order is placed, only then is it pulled from the shelf, digitized and made available as a high-quality reprint or digital file.

What makes this approach to digitization unique is that NYPL incurs no up-front printing, production or storage costs. It also provides the library with a self-funding, commercial model helping it to sustain its digitization programs in the future. Unlike other free or low-cost digitization programs, the library retains the rights and ownership to their own digitized content.

Greater Western Library Alliance Members Send Letter Supporting Federal Research Public Access Act of 2009 to Senators

Greater Western Library Alliance member universities have sent a letter supporting the Federal Research Public Access Act of 2009 to members of the U.S. Senate.

Here's an excerpt:

Timely, barrier-free access to the results of federally funded research supports the core mission of our academic institutions and is essential to fully utilize our collective investment in science. FRPAA will help us maximize this investment by increasing the sharing of research results, advancing the pace of discovery, and applying this knowledge for the benefit of our communities.

The FRPAA bill also expands on the success of the public access policy of the National Institutes of Health (NIH), the first U.S. agency to require public access to taxpayer-funded research. More than 450,000 unique users access material from the NIH repository each day. Under S.1373, we envision researchers and students working in fields of equal importance—from climate change to renewable energy—having the same access to federally funded research to advance their critical work.

This bill is a crucial step in realizing this goal and we look forward to working with you to secure the bill’s passage.

Frankfurt Book Fair Publisher Survey

The Frankfurt Book Fair has released a summary of the results of a recent survey of 840 international publishing company representatives.

Here's an excerpt from the press release:

As a general rule, digital products still only comprise a small fraction of sales: Around 60 per cent of those polled estimate that considerably less than ten per cent of their revenue will come from digital sources in 2009. However, this will change in the next two years in the opinion of those polled: 41 per cent of those polled calculate sales of up to ten per cent for 2011 and 58 per cent anticipate that digital products will comprise a considerably higher share of total sales. The percentage of those who assume that 26 to 100 per cent of their revenue will come from digital products in two years increased from 24 per cent (2009) to 38 per cent (2011).

The idea that digital content will generate more sales than the traditional book business is also gradually becoming more of a reality. Around 50 per cent of industry experts now see the year 2018 as the turning point: In a comparable survey taken one year ago, 40 per cent saw this date as a "changing of the guard." In 2008, 27 per cent were of the opinion that digital would never overtake print—today that number is only 22 per cent.

Library Systems Department Head at West Virginia University Libraries

The West Virginia University Libraries are recruiting a Library Systems Department Head.

Here's an excerpt from the ad:

The Head directs technology planning and customer service, and provides hands-on support for ExLibris/Voyager online catalog, library web services, digital collections, and an institutional repository. Supervises 5 staff. . . . The Systems Department maintains a library network with 600+ workstations, 120 laptops, 32 servers, a 5 terabyte storage area network, and a 5 terabyte preservation backup.

Mining a Million Scanned Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved OCR Grant Awarded

The NSF Division of Information & Intelligent Systems has awarded a grant to the Center for Intelligent Information Retrieval at UMass Amherst, the Perseus Digital Library Project at Tufts, and the Internet Archive for their "Mining a Million Scanned Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved OCR" proposal.

Here's an excerpt from the award abstract:

The Center for Intelligent Information Retrieval at UMass Amherst, the Perseus Digital Library Project at Tufts, and the Internet Archive are investigating large-scale information extraction and retrieval technologies for digitized book collections. To provide effective analysis and search for scholars and the general public, and to handle the diversity and scale of these collections, this project focuses on improvements in seven interlocking technologies: improved OCR accuracy through word spotting, creating probabilistic models using joint distributions of features, and building topic-specific language models across documents; structural metadata extraction, to mine headers, chapters, tables of contents, and indices; linguistic analysis and information extraction, to perform syntactic analysis and entity extraction on noisy OCR output; inferred document relational structure, to mine citations, quotations, translations, and paraphrases; latent topic modeling through time, to improve language modeling for OCR and retrieval, and to track the spread of ideas across periods and genres; query expansion for relevance models, to improve relevance in information retrieval by offline pre-processing of document comparisons; and interfaces for exploratory data analysis, to provide users of the document collection with efficient tools to update complex models of important entities, events, topics, and linguistic features. When applied across large corpora, these technologies reinforce each other: improved topic modeling enables more targeted language models for OCR; extracting structural metadata improves citation analysis; and entity extraction improves topic modeling and query expansion. The testbed for this project is the growing corpus of over one million open-access books from the Internet Archive.