Columbia University and Microsoft Book Digitization Project

The Columbia University Libraries have announced that they will work with Microsoft to digitize a "large number of books" that are in the public domain.

Here's an excerpt from the press release:

Columbia University and Microsoft Corp. are collaborating on an initiative to digitize a large number of books from Columbia University Libraries and make them available to Internet users. With the support of the Open Content Alliance (OCA), publicly available print materials in Columbia Libraries will be scanned, digitized, and indexed to make them readily accessible through Live Search Books. . . .

Columbia University Libraries is playing a key role in book selection and in setting quality standards for the digitized materials. Microsoft will digitize selected portions of the Libraries’ great collections of American history, literature, and humanities works, with the specific areas to be decided mutually by Microsoft and Columbia during the early phase of the project.

Microsoft will give the Library high-quality digital images of all the materials, allowing the Library to provide worldwide access through its own digital library and to share the content with non-commercial academic initiatives and non-profit organizations.

Read more about it at "Columbia University Joins Microsoft Scan Plan."

How to Harvest OAI-PMH Records with the Freeware MarcEdit Program

Terry Reese has posted step-by-step instructions about how to harvest OAI-PMH records from the University of Michigan Libraries' MBooks digital books collection using his MarcEdit freeware program. The data can either be converted to the MARC format or stored as is. MarcEdit also has a Z39.50 client as well as crosswalks, such as MARC to Dublin Core and MARC to EAD.
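For readers curious about what harvested OAI-PMH data looks like when it is "stored as is," here is a minimal Python sketch that parses a ListRecords response carrying unqualified Dublin Core (oai_dc). It is not MarcEdit's code: the sample response, the identifier, and the function name parse_list_records are hypothetical, and only the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample response; a real harvest would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:0001</identifier>
        <datestamp>2008-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Digitized Book</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>1899</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return one dict per record with its identifier, datestamp, and DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("./oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        entry = {
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", default="", namespaces=NS),
            "dc": {},
        }
        # Deleted records have a header but no metadata element.
        dc_root = record.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is not None:
            for element in dc_root:
                # Tags look like "{namespace}title"; keep only the local name.
                field = element.tag.split("}", 1)[-1]
                entry["dc"].setdefault(field, []).append((element.text or "").strip())
        records.append(entry)
    return records


if __name__ == "__main__":
    for rec in parse_list_records(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["dc"].get("title"))
```

Each Dublin Core field is kept as a list because the elements are repeatable; a crosswalk to MARC would start from a structure like this one.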

REPOMAN-L (Institutional Repository Managers' Mailing List) Launched

Richard Griscom, University of Pennsylvania, and Leah Vanderjagt, University of Alberta, have launched REPOMAN-L (Institutional Repository Managers' Mailing List).

Here's an excerpt from the announcement:

We have created REPOMAN-L (Institutional Repository Managers' Mailing List) as an open forum for the discussion of issues, great and small, that confront repository managers. We hope that you will subscribe and participate enthusiastically, and use this list for problem-solving and sharing of advice; for example:

  • to poll the group on practices at their institutions
  • to ask about any aspect of development from policy to outreach initiatives to software evaluation
  • to share links to useful tools and references
  • to explore rationale around decisions you're making about your repository. . . .

The list is purposefully unaffiliated with any institution, initiative, repository software platform, or conceptual idea such as open access; the list would of course not exclude discussion of these areas, but we ask subscribers to consider initiating these discussions on lists set up specifically for the topics and then bring summaries of relevance to this list.

Are There 200,000 "Duplicate" Articles in Journals Indexed by Medline?

Based on a recent study published in Nature, it is possible that there may be as many as 200,000 duplicate articles (either articles that were published in multiple journals or plagiarized) in journals indexed by Medline. To conduct the study, Mounir Errami and Harold Garner utilized the eTBLAST software to analyze samples of Medline article abstracts in order to estimate the prevalence of duplicate articles.
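To give a rough feel for how abstracts can be screened for possible duplication, here is a minimal Python sketch. It is an assumption-laden illustration, not eTBLAST's algorithm: it substitutes a plain Jaccard similarity over three-word shingles, and the function names, the 0.5 threshold, and the sample abstracts are all hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size` in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Return the shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def flag_candidate_duplicates(abstracts, threshold=0.5):
    """Return (id_a, id_b, score) for each pair scoring at or above the threshold."""
    ids = sorted(abstracts)
    flagged = []
    for index, id_a in enumerate(ids):
        for id_b in ids[index + 1:]:
            score = jaccard_similarity(abstracts[id_a], abstracts[id_b])
            if score >= threshold:
                flagged.append((id_a, id_b, score))
    return flagged


if __name__ == "__main__":
    # Hypothetical abstracts, invented for illustration only.
    sample = {
        "a": "Gene expression was measured in mouse liver tissue after treatment.",
        "b": "Gene expression was measured in mouse liver tissue after drug treatment.",
        "c": "We survey open content licences used by museums and archives.",
    }
    for pair in flag_candidate_duplicates(sample):
        print(pair)
```

A high score only marks a pair as a candidate; deciding whether it is legitimate republication, an update, or plagiarism still requires human review.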

Duplicate detection is an issue of great concern to both publishers and scholars. The CrossCheck project is allowing eight publishers to test duplicate checking as part of the editorial process in a closed-access environment. The project's home page states:

Currently, existing PD [plagiarism detection] systems do not index the majority of scholarly/professional content because it is inaccessible to crawlers directed at the open web. The only scholarly literature that is currently indexed by PD systems is that which is available openly (e.g. OA, Archived or illegitimately posted copies) or that which has been made available via third-party aggregators (e.g. ProQuest). This, in turn, means that any publisher who is interested in employing PD systems in their editorial work-flow is unable to do so effectively. Even if a particular publisher doesn't have a problem with plagiarized manuscripts, they should have an interest in making sure that their own published content is not plagiarized or otherwise illegitimately copied.

In order for CrossRef members to use existing PD systems, there needs to be a mechanism through which PD system vendors can, under acceptable terms & conditions, create and use databases of relevant scholarly and professional content.

Open access advocates have pointed out that one advantage of OA is that it allows the unrestricted analysis and manipulation of the full text of freely available works. Open access makes it possible for all interested parties, including scholars and others who might not have access to closed duplicate verification databases, to conduct whatever analysis they wish and to make the results public without having to consider potential business impacts.

Read more about it at: "Copycat Articles Seem Rife in Science Journals, a Digital Sleuth Finds" and "How Many Papers Are Just Duplicates?"

New Mailing Lists: JISC-SHIBBOLETH-LIBRARIES and Sword-app-tech

Two mailing lists have been recently established: JISC-SHIBBOLETH-LIBRARIES and sword-app-tech.

Excerpt from the JISC-SHIBBOLETH announcement:

Many institutions are now at the stage with their implementation of federated access management where issues directly impacting libraries are being considered and managed. This includes discovery processes for end-users, testing and changing access to federated service providers, dealing with different user definitions, managing license and resource information and changing end-user information.

To help support this process we have established a separate mailing list to enable discussion and exchange of views directly relating to library issues.

Excerpt from the Fedora-commons-users announcement:

A new mailing list has been created for discussion, bug reports, implementations questions and development ideas relating to SWORD (Simple Web-service Offering Repository Deposit).

SWORD is a protocol for interoperable deposit between repository platforms. It was developed by a JISC project during 2007, building on earlier work to define a deposit protocol, and is based on the Atom Publishing Protocol.

NIH Public Access Policy Implementation

On an updated Web page and a FAQ, the National Institutes of Health (NIH) has explained its implementation of the Public Access Policy required by Division G, Title II, Section 218 of PL 110-161 (Consolidated Appropriations Act, 2008).

Here's an excerpt from the NIH Public Access Policy Web page:

How to Comply

Address Copyright

Make sure that any copyright transfer or other publication agreements allow the article to be submitted to NIH in accordance with the Policy.

Submit Article

Authors may submit an article to the journal of their choice for publication.

  1. If you choose to publish your article in certain journals, you need do nothing further to comply with the submission requirement of the Policy. See http://publicaccess.nih.gov/submit_process_journals.htm for a list of these journals.
  2. For any journal other than one of those in this list, the author must:

    a. Inform the journal that the article is subject to the Public Access Policy when submitting it for publication.

    b. Make sure that any copyright transfer or other publication agreement allows the article to be submitted to NIH in accordance with the Policy. For more information, see the FAQ "Whose approval do I need to submit my article to PubMed Central?" and consult with your Institution.

    c. Submit the article to NIH, upon acceptance for publication. See the Submission Process for more information.

Cite Article

When citing their NIH-funded articles in NIH applications, proposals or progress reports, authors must include the PubMed Central reference number for each article.

Important Dates

  • April 7, 2008: As of April 7, 2008, all articles arising from NIH funds must be submitted to PubMed Central upon acceptance for publication.
  • May 25, 2008: As of May 25, 2008, NIH applications, proposals, and progress reports must include the PubMed Central reference number when citing an article that falls under the policy and is authored or co-authored by the investigator, or arose from the investigator’s NIH award. This policy includes applications submitted to the NIH for the May 25, 2008 due date and subsequent due dates.

Peter Suber has made some helpful comments about the policy implementation in "New FAQ for New NIH Policy" and "Text of the NIH OA Policy."

PublicDomainReprints.org Turns Digital Public Domain Books into Printed Books

PublicDomainReprints.org is offering an experimental service that allows users to convert about 1.7 million digital public domain books in the Internet Archive, Google Book Search, or the Universal Digital Library into printed books using the Lulu print-on-demand service.

Source: "Converting Google Book PDFs to Actual Books."

Institutional Repositories, Tout de Suite

Institutional Repositories, Tout de Suite, the latest Digital Scholarship publication, is designed to give the reader a very quick introduction to key aspects of institutional repositories and to foster further exploration of this topic through liberal use of relevant references to online documents and links to pertinent websites. It is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License, and it can be freely used for any noncommercial purpose in accordance with the license.

NIH Open Access Mandate Becomes Law

President Bush has signed the "Consolidated Appropriations Act, 2008," which includes the NIH open access mandate. The mandate states: "The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law."
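The 12-month limit in the mandate is simple date arithmetic, sketched below in Python. The function names are hypothetical, and counting "12 months" as calendar months (with February 29 clamped to February 28) is an assumption made for illustration; it is not NIH's stated method for computing the deadline.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month when needed.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


def latest_public_release(publication_date, embargo_months=12):
    """Return the last date on which public release still meets the limit."""
    return add_months(publication_date, embargo_months)


def is_within_embargo_limit(publication_date, release_date, embargo_months=12):
    """Return True if the release date is no later than the limit."""
    return release_date <= latest_public_release(publication_date, embargo_months)


if __name__ == "__main__":
    published = date(2008, 4, 7)  # hypothetical publication date
    print(latest_public_release(published))
```

The embargo length is a parameter so that the same check can be run for a publisher that releases articles earlier than the 12-month maximum.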

Read more about it at "OA Mandate at NIH Now Law" and "Public Access Mandate Made Law."

Columbia University Libraries and Bavarian State Library Become Google Book Search Library Partners

Both the Columbia University Libraries and Bavarian State Library have joined the Google Book Search Library Project.

Biomedical Digital Libraries and BioMed Central Part Company

According to "Biomedical Digital Libraries Moves to Open Journal Systems," Biomedical Digital Libraries will no longer be published by BioMed Central because "BMC's author payment model had become untenable for most of the authors wishing to publish in the journal." In the future, the journal will be published using Public Knowledge Project's Open Journal Systems without author fees.

BioMed Central has an article-processing charges waiver policy with case-by-case basis review, and it also offers a variety of article-processing charges discounts. It is not clear why these cost-reduction mechanisms did not meet author needs.

University of Michigan Libraries Release the UMich OAI Toolkit

The University of Michigan Libraries have released the UMich OAI Toolkit.

Here's an excerpt from the announcement:

This toolkit contains both harvester and data provider, both written in Perl. . . .

UMHarvester is a robust tool using LWP for harvesting nigh on every OAI data provider available. It allows for incremental harvesting, has multiple re-try options, and a batch harvest tool (Batch_UMHarvest) that can automatically perform incremental harvesting.

UMProvider relies heavily on libxml (XML::LibXML) and will store the data in nearly any relational database. It functions by harvesting from a database of records, making rights determinations from a separate database, and providing the resulting set of records.

Originally, only the UMHarvester was available from UM's DLXS software site. The UMProvider tool is newly developed and takes the place of our DLXS data provider tool.
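The incremental harvesting and re-try behavior mentioned in the announcement can be sketched in a few lines. The toolkit itself is written in Perl; this Python sketch is not UMHarvester's code, and the function names, the example base URL, and the linear back-off are assumptions. The request arguments (verb, metadataPrefix, from, resumptionToken) are those defined by OAI-PMH 2.0.

```python
import time
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    Passing `from_date` (the date of the last successful harvest) requests
    only records added or changed since then, which is what makes a harvest
    incremental.
    """
    if resumption_token:
        # resumptionToken is an exclusive argument: no other arguments
        # besides the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def fetch_with_retries(fetch, url, max_attempts=3, delay_seconds=1.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying on OSError with a linearly growing delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(delay_seconds * attempt)


if __name__ == "__main__":
    # Hypothetical base URL; substitute a real data provider's endpoint.
    print(build_list_records_url("http://example.org/oai", from_date="2008-01-01"))
```

The fetch function is passed in so that the retry logic can be exercised without a network connection; in practice it could be a small wrapper around urllib.request.urlopen, whose URLError is a subclass of OSError.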

Rice University Releases Travelers in the Middle East Archive

Rice University has released the Travelers in the Middle East Archive (TIMEA) under a Creative Commons Attribution 2.5 Generic License.

Here's an excerpt from the announcement:

TIMEA provides access to:

  • Nearly 1,000 images, including stereocards, postcards and book illustrations
  • More than 150 historical maps representing the Middle East as it was in the 19th and early 20th centuries
  • Interactive geographical information systems (GIS) maps that serve as an interface to the collection and present detailed information about features such as waterways, elevation and populated places
  • Successive editions of classic travel guides and major museum collection catalogues
  • Convenient educational modules that set materials from the collection in historical and geographic context and explore the research process

TIMEA is able to offer seamless access for researchers by providing a common user interface to digital objects housed in three repositories. Texts, historical maps and images reside in DSpace, an open-source digital repository system. Educational research modules are presented within Connexions, an open-content commons and publishing platform for educational materials. TIMEA also uses Google Maps and ESRI’s ArcIMS map server.

New Release of BioMed Central's Open Repository, a Hosted Institutional Repository Service

BioMed Central has released version 1.4.9 of Open Repository, its DSpace-based, hosted institutional repository service.

Here's an excerpt from the press release:

Open Repository version 1.4.9 has several new features that are designed to enhance the customer experience. The release offers an improved user interface, making it easier for customers to browse and submit their material online. Additionally, institutions can convert their Word, Excel, PowerPoint, Text and RTF documents to PDF format. Customers can also set up RSS feeds, and customize lists and search fields, adding value to the already robust platform.

Pitt's Libraries and University Press Establish Open Access Book Program

The University of Pittsburgh University Library System and the University of Pittsburgh University Press have established the University of Pittsburgh University Press Digital Editions, which offers free access to digitized versions of print books from the press.

Here's an excerpt from the press release:

The University of Pittsburgh’s University Library System (ULS) and University Press have formed a partnership to provide digital editions of press titles as part of the library system’s D-Scribe Digital Publishing Program. Thirty-nine books from the Pitt Latin American Series published by the University of Pittsburgh Press are now available online, freely accessible to scholars and students worldwide. Ultimately, most of the Press’ titles older than 2 years will be provided through this open access platform.

For the past decade, the University Library System has been building digital collections on the Web under its D-Scribe Digital Publishing Program, making available a wide array of historical documents, images and texts which can be browsed by collection and are fully searchable. The addition of the University of Pittsburgh Press Digital Editions collection marks the newest in an expanding number of digital collaborations between the University Library System and the University Press.

The D-Scribe Digital Publishing Program includes digitized materials drawn from Pitt collections and those of other libraries and cultural institutions in the region, pre-print repositories in several disciplines, the University’s mandatory electronic theses and dissertations program, and electronic journals. During the past eight years, sixty separate collections have been digitized and made freely accessible via the World Wide Web. Many of these projects have been carried out with content partners such as Pitt faculty members, other libraries and museums in the area, professional associations, and most recently, with the University of Pittsburgh Press with several professional journals and the new University of Pittsburgh Press Digital Editions. . . .

More titles will be added to the University of Pittsburgh Press Digital Editions each month until most of the current scholarly books published by the Press are available both in print and as digital editions. The collection will eventually include titles from the Pitt Series in Russian and East European Studies, the Pitt-Konstanz Series in the Philosophy and History of Science, the Pittsburgh Series in Composition, Literacy, and Culture, the Security Continuum: Global Politics in the Modern Age, the History of the Urban Environment, back issues of Cuban Studies, and numerous other scholarly titles in history, political science, philosophy, and cultural studies.

Stable Version of SPECTRa Released: Software for Depositing Chemical Data into Repositories

A stable version of SPECTRa has been released. SPECTRa is designed to facilitate the deposit of chemical data into digital repositories.

The JISC-funded SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data: a Digital Repository for the Chemical Community) project's final report is also available.

Institute of Physics Launches an Open Access Earth and Environmental Science Proceedings Service

The Institute of Physics has launched the IOP Conference Series: Earth and Environmental Science, an open access proceedings service. A FAQ is available.

Here's an excerpt from the press release:

Based on IOP Publishing’s highly successful open access proceedings in physics, EES allows conference organizers to create a comprehensive record of their event and make a valuable contribution to the open access literature that will be of long-lasting benefit to their research communities.

As part of the service’s launch, EES is waiving a total of US$5000 of publication fees for a number of conferences who expect to publish their proceedings during 2008.

We are delighted to announce that the first conference to qualify for this is the 14th International Symposium for the Advancement of Boundary Layer Remote Sensing (ISARS2008) which takes place on 23–25 June 2008, Risø National Laboratory, DTU, Roskilde, Denmark.

Eduserv Releases Study about the Use of Open Content Licenses By UK Heritage Organizations

The Eduserv Foundation has released Snapshot Study on the Use of Open Content Licences in the UK Cultural Heritage Sector (Appendices).

Here's an excerpt from the "Executive Summary":

This study investigates the awareness and use of open content licences in the UK cultural heritage community by way of a survey. Open content licensing generally grants a wide range of permission in copyright for use and re-use of works such as images, sounds, video, and text, whilst retaining a relatively small set of rights: often described as a ‘some rights reserved’ approach to copyright. For those wishing to share content using this model, Creative Archive (CA) and Creative Commons (CC) represent the two main sets of open content licences available for use in the United Kingdom.

The year of this survey, 2007, marks five years from the launch of the Creative Commons licences, two years since the launch of the UK-specific CC licences and two years as well since the launch of the UK-only Creative Archive licence.

This survey targeted UK cultural heritage organisations—primarily museums, libraries, galleries, archives, and those in the media community that conduct heritage activities (such as TV and radio broadcasters and film societies). In particular, this community produces trusted and highly valued content greatly desired by the general public and the research and education sectors. They are therefore a critical source of high-demand content and thus the focus for this project. The key objective has been to get a snapshot of current licensing practices in this area in 2007 for use by the sector and funding bodies wishing to do more work in this area.

Over 100 organisations responded to this web-based survey. Of these respondents:

  • Only 4 respondents out of 107 indicated that they held content but were not making it available online nor had plans to make it available online;
  • Images and text are the two content types most likely to be made available online;
  • Sound appears to be the most held content type not currently available online and with no plans to make it available in the future;
  • Many make some part of their collection available online without having done any formal analysis of the impact this may have;
  • 59 respondents were aware of Creative Archive or Creative Commons;
  • 10 use a CA or CC licence for some of their content; and
  • 12 have plans to use a CA or CC licence in the future.
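To put the counts above in proportion, the short Python sketch below converts them to percentages. It assumes that the base of 107 respondents given in the first bullet applies to every count, which the excerpt does not state explicitly; the labels and the function name are hypothetical.

```python
# Assumption: the "out of 107" base from the first bullet applies to all counts.
TOTAL_RESPONDENTS = 107

COUNTS = {
    "hold content with no online access or plans": 4,
    "aware of Creative Archive or Creative Commons": 59,
    "use a CA or CC licence for some content": 10,
    "plan to use a CA or CC licence": 12,
}


def percentage_shares(counts, total):
    """Return each count as a percentage of the total, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    for label, share in percentage_shares(COUNTS, TOTAL_RESPONDENTS).items():
        print(f"{share:5.1f}%  {label}")
```

On that assumption, a little over half of the respondents were aware of the licences, while roughly one in ten used them.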

House Doesn't Override Presidential Veto of Labor-HHS Bill Which Contains NIH OA Mandate

By two votes, the House failed to override President Bush's veto of the Departments of Labor, Health and Human Services, and Education, and Related Agencies Appropriations Act, 2008, which contained the NIH open access mandate (the vote was 277-141). Bloomberg reports that Senate Democrats have a new strategy:

Senate Majority Leader Harry Reid said Democrats will combine the 11 unfinished appropriations bills still needed to fund the federal government into one measure that exceeds the administration's request by $11 billion—half the $22 billion Democrats initially supported.

However, CQPolitics reports that:

The White House brushed off Reid’s proposal Thursday, as administration officials have done previously when Democrats have said they are willing to negotiate on funding levels.

"The president has been clear that Congress should adhere to the budgetary process and pass individual funding bills at reasonable and responsible spending levels," said Sean Kevelighan, a spokesman for the White House budget office. "Perhaps [the] Democratic leadership in Congress. . . should concern itself less with capturing political news cycles and more on their fundamental responsibility to fund the federal government."

Peter Suber had this to say about the override failure:

OK, on to Plan B.  The OA mandate for the NIH is a small part of a big bill to pay for about one-thirteenth of the federal government.  Some version of the appropriation will certainly pass and get the President's signature.  You can already see the jockeying between Congressional leaders and the White House about the contours of that version.  There are four grounds for optimism:

  1. The OA mandate was approved by both houses of Congress.  The easiest provisions to delete are those approved by just one chamber and kept by the House-Senate conference committee.
  2. The OA mandate has bipartisan support in Congress and Republican friends in the Executive Branch.
  3. The President has expressed strong objection to some of the policy provisions of the bill, but his stated concern about the OA provision is very mild by comparison.  If Congress deletes some of the more sensitive provisions in the spirit of compromise, it needn't touch the OA mandate.  In fact, deleting the OA provision would do virtually nothing to ingratiate the President.
  4. To reduce overall spending levels in the bill, Congress will cut some of the appropriations.   But the OA mandate is a policy change, not an appropriation.  There's no need to cut it to satisfy the President's fiscal objections to the current bill.   Stay tuned.

ALA Urgent Call for Action about the Presidential Veto of the Labor-HHS Bill

The American Library Association has issued an urgent call for action about the presidential veto of the FY 2008 Labor, Health and Human Services, Education, and Related Agencies appropriations bill, which includes the NIH Public Access Policy mandate and essential funding for library programs.

You can easily contact your senators using the ALA Action Alert Web form.

I've created a cut-and-paste version of prior ALA/Alliance for Taxpayer Access text about the NIH open access mandate and added brief information about key library programs funded by the bill. You can use this text to simplify the process of sending an e-mail via the ALA Action Alert Web form, but personalizing this text with an added sentence or two is recommended.

National Science Digital Library Releases Initial Fedora-based NCore Components

The National Science Digital Library Core Integration team at Cornell University has released a partial version of NCore, a "general platform for building semantic and virtual digital libraries united by a common data model and interoperable applications," which is built upon Fedora.

Here's an excerpt from the NSDL posting:

The NCore platform consists of a central repository built on top of Fedora, a data model, an API, and a number of fundamental services such as full-text search or OAI-PMH. Innovative NSDL services and tools that empower users as content creators are now built on, or transitioning to, the NCore platform. These include: the Expert Voices blogging system (http://expertvoices.nsdl.org/); the NSDL Wiki (http://wiki.nsdl.org/index.php/NSDL_Wiki); the NSDL OAI-PMH metadata ingest aggregation system; the OAI-PMH service for distributing public NSDL metadata; the NSDL Collection System (NCS), derived from the DLESE Collection system (DCS); the NSDL Search service, and the OnRamp content management and distribution system (http://onramp.nsdl.org).

Because NCore is a general Fedora-based open source platform useful beyond NSDL, Core Integration developers at Cornell University have made the repository and API code components of NCore available for download at the NCore project on Sourceforge (http://sourceforge.net/projects/nsdl-core). Over the next six months, NSDL will release the code for major tools and services that comprise the full NCore suite on SourceForge.

For further information, see the NCore presentation.