Contact the Senate about the NIH Public Access Policy by 9/28/07

The Alliance for Taxpayer Access, whose membership includes major library associations, has issued a new call to action about the NIH Public Access Policy that urges interested parties to contact their Senators by Friday, September 28, 2007. You can easily contact your Senators using the ALA Action Alert Web form with my cut-and-paste version of the ALA/ATA text, or you can fax your Senators using the fax numbers listed in the full press release.

Here's an excerpt from the press release:

As the Senate considers Appropriations measures for the 2008 fiscal year this fall, please take a moment to remind your Senators of your strong support for public access to publicly funded research and – specifically – ensuring the success of the National Institutes of Health (NIH) Public Access Policy by making deposit mandatory for researchers.

Earlier this summer, the House of Representatives passed legislation with language that directs the NIH to make this change (http://www.taxpayeraccess.org/media/release07-0720.html). The Senate Appropriations Committee approved a similar measure (http://www.taxpayeraccess.org/media/release07-0628.html). Now, as the Appropriations process moves forward, it is critically important that our Senators are reminded of the breadth and depth of support for enhanced public access to the results of NIH-funded research. Please take a moment to weigh in with your Senator now. . . .

Feel free to draw upon the following talking points:

  • American taxpayers are entitled to open access on the Internet to the peer-reviewed scientific articles on research funded by the U.S. government. Widespread access to the information contained in these articles is an essential, inseparable component of our nation's investment in science.
  • The Fiscal Year 2008 Labor/HHS Appropriations Bill reported out of committee contains language directing the National Institutes of Health (NIH) to change its Public Access Policy so that it requires NIH-funded researchers to deposit copies of agency-funded research articles into the National Library of Medicine’s online archive.
  • Over the more than two years since its implementation, the NIH's current voluntary policy has failed to achieve any of the agency's stated goals, attaining a deposit rate of less than 5% by individual researchers. A mandate is required to ensure deposit in NIH’s online archive of articles describing findings of all research funded by the agency.
  • We urge the Senate to support the inclusion of language put forth in the Labor/HHS Appropriations bill directing the NIH to implement a mandatory policy and ensuring free, timely access to all research articles stemming from NIH-funded research – without change – in any appropriate vehicle.

(We’ll be making additional resources for patient advocates, including the recording of our August 30 Web cast and specific talking points, available shortly as well.)

67 Plagiarized Papers from Turkey Removed from arXiv

The arXiv archive has removed 67 plagiarized papers, which were written by 15 Turkish physicists. Questions about the physics expertise of two of the authors emerged during their oral dissertation defenses, and the investigation widened from there.

Source: “Turkish Professors Uncover Plagiarism in Papers Posted on Physics Server.” The Chronicle of Higher Education News Blog, 6 September 2007.

SPARC Canadian Author Addendum

The Canadian Association of Research Libraries (CARL) and SPARC (the Scholarly Publishing and Academic Resources Coalition) have released the SPARC Canadian Author Addendum.

Here's an excerpt from the press release:

Traditional publishing agreements often require that authors grant exclusive rights to the publisher. The new SPARC Canadian Author Addendum enables authors to secure a more balanced agreement by retaining select rights, such as the rights to reproduce, reuse, and publicly present the articles they publish for non-commercial purposes. It will help Canadian researchers to comply with granting council public access policies, such as the Canadian Institutes of Health Research Policy on Access to Research Outputs. The Canadian Addendum reflects Canadian copyright law and is an adaptation of the original U.S. version of the SPARC Author Addendum. . . .

An explanatory brochure complements the Addendum. Both the brochure and addendum are available in French and English on the CARL and SPARC Web sites and will be widely distributed. SPARC, in conjunction with ARL and ACRL, has also introduced a free Web cast on Understanding Author Rights. See http://www.arl.org/sparc/author for details.

Publisher Author Agreements

According to today's SHERPA/RoMEO statistics, 36% of the 308 included publishers are green ("can archive pre-print and post-print"), 24% are blue ("can archive post-print (i.e. final draft post-refereeing)"), 11% are yellow ("can archive pre-print (i.e. pre-refereeing)"), and 28% are white ("archiving not formally supported"). Looked at another way, 72% of the publishers permit some form of self-archiving.
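A quick arithmetic check shows why the two ways of reading these figures differ by a point: the four rounded shares sum to 99%, so adding the three permissive categories gives 71%, while 100% minus the white share gives the 72% quoted above. This minimal sketch uses only the percentages reported in the post.

```python
# SHERPA/RoMEO shares reported above, in percent of the 308 publishers.
SHARES = {"green": 36, "blue": 24, "yellow": 11, "white": 28}

# Two readings of "permits some form of self-archiving".
permissive_sum = SHARES["green"] + SHARES["blue"] + SHARES["yellow"]
complement_of_white = 100 - SHARES["white"]

# The rounded shares do not add up to exactly 100.
total = sum(SHARES.values())

print(permissive_sum, complement_of_white, total)
```

Both readings fall within the rounding error of the published percentages.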

These are certainly encouraging statistics, and publishers who permit any form of self-archiving should be applauded; however, leaving aside Creative Commons licenses and author agreements that have been crafted by SPARC and others to promote rights retention, publishers' recently liberalized author agreements still raise issues that librarians and scholars should be aware of.

Looking deeper, there are publisher variations in terms of where e-prints can be self-archived. Typically, this might be some combination of the author's Website, institutional repository or Website, funding agency's server, or disciplinary archive. Some agreements allow deposit on any noncommercial or open access server. Restricting deposit to open access or noncommercial servers is perfectly legitimate in my view; more specific restrictions are, well, too restrictive. The problem arises when the agreement limits the author's deposit options to ones he or she doesn't have, such as only allowing deposit in an institutional repository when the author's institution doesn't have one or only allowing posting on an author's Website when the author doesn't have one.
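The mismatch problem is essentially a set intersection: an author can comply only if at least one location the agreement permits is one the author actually has. Here is a minimal sketch with hypothetical location labels; real agreements word these differently, and "any noncommercial or open access server" clauses would need more careful modeling.

```python
# Hypothetical location labels, for illustration only.
def usable_deposit_locations(permitted: set[str], available: set[str]) -> set[str]:
    """Return the deposit locations the agreement allows AND the author actually has."""
    return permitted & available

# An agreement that allows only the institutional repository,
# signed by an author whose institution does not have one.
permitted = {"institutional repository"}
available = {"disciplinary archive", "personal website"}

options = usable_deposit_locations(permitted, available)
print(options or "no permitted option is available to this author")
```

An empty result is exactly the situation described above: the agreement nominally permits self-archiving, but not anywhere this author can deposit.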

Another issue is publisher requirements for authors to remove e-prints on publication, to modify e-prints after publication to reflect citation and publisher contact information, to replace e-prints with published versions, or to create their own versions of postprints. Low deposit rates in institutional repositories without institutional mandates suggest that anything that involves extra effort by authors is a deterrent to deposit. The above kinds of publisher requirements are likely to have equally low rates of compliance, resulting in deposited e-prints that do not conform to author agreements. To be effective, such requirements would have to be policed by publishers or digital repositories. Otherwise, they are meaningless and are best deleted from author agreements.

A final issue is retrospective deposit. We can think of the journal literature as an inverted pyramid, with the broad top being currently published articles and the bottom being the first published journal articles. The papers published since the emergence of author agreements that permit self-archiving are a significant resource; however, much of the literature precedes such agreements. The vast majority of these articles are under standard copyright transfer agreements, with publishers holding all rights. Consequently, it is very important that publishers clarify whether their relatively new self-archiving policies can be applied retroactively. Elsevier has done so:

When Elsevier changes its policies to enable greater academic use of journal materials (such as the changes several years ago in our web-posting policies) or to clarify the rights retained by journal authors, Elsevier is prepared to extend those rights retroactively with respect to articles published in journal issues produced prior to the policy change.

Elsevier is pleased to confirm that, unless explicitly noted to the contrary, all policies apply retrospectively to previously published journal content. If, after reviewing the material noted above, you have any questions about such rights, please contact Global Rights.

Unfortunately, many publishers have not clarified this issue. Under these conditions, whether authors can deposit preprints or author-created postprints hinges on whether these works are viewed as being different works from the publisher version, and, hence, owned by the authors. Although some open access advocates believe this to be the case, to my knowledge this has never been decided in a court of law. Michael Carroll, who is a professor at the Villanova University School of Law and a member of the Board of the Creative Commons, has said in an analysis of whether authors can put preprints of articles published using standard author agreements under Creative Commons licenses:

Although technically distinct, the copyrights in the pre-print and the post-print overlap. The important point to understand is that copyright grants the owner the right to control exact duplicates and versions that are "substantially similar" to the copyrighted work. (This is under U.S. law, but most other jurisdictions similarly define the scope of copyright).

A pre-print will normally be substantially similar to the post-print. Therefore, when an author transfers the exclusive rights in the work to a publisher, the author precludes herself from making copies or distributing copies of any substantially similar versions of the work as well.

Much progress has been made in the area of author agreements, but authors must still pay careful attention to the details of agreements, which vary considerably by publisher. The SHERPA/RoMEO Publisher Copyright Policies & Self-Archiving database is a very useful and important tool, and users should actively participate in refining it; however, authors are well advised not to stop at the summary information presented in the database and to go to the agreement itself (if available). It would be very helpful if a set of standard author agreements that covered the major variations could be developed and put into use by the publishing industry.

House Passes H. R. 3043 and NIH Mandate Is Approved, but Bush May Veto Bill

By a 276 to 140 vote, the House approved H. R. 3043 (Making Appropriations for the Departments of Labor, Health and Human Services, and Education, and Related Agencies for the Fiscal Year Ending September 30, 2008, and for Other Purposes), which includes the following wording:

SEC. 217. The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine's PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.
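To see what the 12-month ceiling means in practice, here is a minimal sketch that computes the latest date a manuscript could become publicly available. It assumes "12 months after" means the same calendar day one year later, clamped to the last day of the month (so February 29 maps to February 28); the bill text does not define this, and NIH's implementation would govern.

```python
import calendar
from datetime import date

def latest_public_date(published: date, months: int = 12) -> date:
    """Add `months` calendar months (an assumed reading of the bill), clamping to month end."""
    month_index = published.month - 1 + months
    year = published.year + month_index // 12
    month = month_index % 12 + 1
    # monthrange returns (weekday of first day, number of days in month).
    day = min(published.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

print(latest_public_date(date(2008, 5, 15)))  # 2009-05-15
print(latest_public_date(date(2008, 2, 29)))  # 2009-02-28
```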

Due to concerns over increased spending, President Bush may veto the bill (see Peter Suber's "House Approves OA Mandate for NIH, but Bush May Veto" for details).

Here's the party breakdown on the vote:

  • Democrats: 223 yes, 1 no, 6 not voting.
  • Republicans: 53 yes, 139 no, 9 not voting.
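The party figures can be cross-checked against the reported 276 to 140 result. This small sketch simply sums the numbers listed above.

```python
# Party breakdown for H. R. 3043 as reported above.
VOTES = {
    "Democrats": {"yes": 223, "no": 1, "not voting": 6},
    "Republicans": {"yes": 53, "no": 139, "not voting": 9},
}

yes_total = sum(party["yes"] for party in VOTES.values())
no_total = sum(party["no"] for party in VOTES.values())
members_by_party = {name: sum(party.values()) for name, party in VOTES.items()}

print(f"{yes_total} to {no_total}")  # 276 to 140
print(members_by_party)              # {'Democrats': 230, 'Republicans': 201}
```

The totals match the reported result, and they show that the bill passed with 53 Republican votes.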

You can see a breakdown of votes by party, state, and other criteria at the Washington Post Votes Database page for the bill.

From the Washington Post, here are the House members who voted against the bill.

Robert Aderholt, Todd Akin, Rodney Alexander, Michele Bachmann, Spencer Bachus, Richard Baker, J. Gresham Barrett, Roscoe Bartlett, Joe Barton, Melissa Bean, Brian Bilbray, Rob Bishop, Marsha Blackburn, Roy Blunt, John Boehner, Jo Bonner, John Boozman, Charles Boustany, Kevin Brady, Henry Brown, Ginny Brown-Waite, Michael Burgess, Dan Burton, Steve Buyer, Dave Camp, John Campbell, Chris Cannon, Eric Cantor, John Carter, Steve Chabot, Howard Coble, Tom Cole, Michael Conaway, Ander Crenshaw, John Culberson, Geoff Davis, David Davis, Tom Davis, Nathan Deal, Mario Diaz-Balart, Lincoln Diaz-Balart, John Doolittle, Thelma Drake, David Dreier, John 'Jimmy' Duncan, Mary Fallin, Tom Feeney, Jeff Flake, Randy Forbes, Vito Fossella, Virginia Foxx, Trent Franks, Rodney Frelinghuysen, Elton Gallegly, Scott Garrett, Paul Gillmor, Phil Gingrey, Louie Gohmert, Virgil Goode, Bob Goodlatte, Kay Granger, Ralph Hall, J. Dennis Hastert, Doc Hastings, Dean Heller, Jeb Hensarling, Wally Herger, Peter Hoekstra, Duncan Hunter, Bob Inglis, Darrell Issa, Sam Johnson, Walter Jones, Jim Jordan, Steve King, Peter King, Jack Kingston, John Kline, Joe Knollenberg, Randy Kuhl, Doug Lamborn, Ron Lewis, Jerry Lewis, John Linder, Frank Lucas, Daniel Lungren, Connie Mack, Donald Manzullo, Kenny Marchant, Kevin McCarthy, Michael McCaul, Thad McCotter, Jim McCrery, Patrick McHenry, John Mica, Jeff Miller, Jerry Moran, Marilyn Musgrave, Sue Myrick, Randy Neugebauer, Devin Nunes, Stevan Pearce, Mike Pence, Thomas Petri, Joe Pitts, Ted Poe, Tom Price, Adam Putnam, George Radanovich, Thomas Reynolds, Cathy McMorris Rodgers, Hal Rogers, Dana Rohrabacher, Ileana Ros-Lehtinen, Peter Roskam, Edward Royce, Paul Ryan, Bill Sali, Jean Schmidt, Jim Sensenbrenner, Pete Sessions, John Shadegg, John Shimkus, Bill Shuster, Lamar Smith, Adrian Smith, Mark Souder, Cliff Stearns, John Sullivan, Lee Terry, Mac Thornberry, Todd Tiahrt, Pat Tiberi, Timothy Walberg, Greg Walden, Zachary Wamp, Lynn Westmoreland, Ed Whitfield, Roger Wicker, Joe Wilson

Should the need arise due to a veto, you can easily contact House and Senate members by e-mail using ALA's Action Alert form.

Publishers May Challenge NIH Mandate

According to a Library Journal Academic Newswire article, publishers may challenge the provisions of the NIH Public Access Policy mandate if it is made law. The issue arises from the wording of the House bill:

Sec. 217: The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

Regarding this wording, the Library Journal Academic Newswire article says:

While seemingly innocuous, that language almost certainly will form the basis for a challenge to the policy's implementation. In a letter to lawmakers, the Association of American Publishers (AAP) argued that "a mandate may not be consistent with copyright law," a position emphasized by Brian Crawford, chair of the AAP's Professional and Scholarly Publishing Division Executive Committee. "The copyright proviso in the Labor/HHS Appropriations language does not in itself provide sufficient assurance of copyright protection," Crawford told the LJ Academic Newswire. "The mandatory deposit of copyrighted articles in an online government site for worldwide distribution is in fundamental, inherent, and unavoidable conflict with the rights of copyright holders in those works."

Urgent: Send a Message to Congress about the NIH Public Access Policy

Peter Suber has pointed out that ALA has an Action Alert that allows you to just fill in a form to send a message to your Congressional representatives about the NIH Public Access Policy.

Under "Compose Message" in the form, I suggest that you shorten the Subject to "Support the NIH Public Access Policy." As an "Issue Area" you might use "Budget" or "Health." Be sure to fill in your salutation and phone number; they are required to send an e-mail even though the form does not show them as required fields.

I’ve made slight modifications to the talking points and created a Web page so that the talking points can simply be cut and pasted into the "Editable text to" section of the form as the message.

Friday’s OAI5 Presentations

Presentations from Friday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • Doctoral e-Theses; Experiences in Harvesting on a National and European Level (PowerPoint): "In the presentation we will show some lessons learned and the first results of the Demonstrator, an interoperable portal of European doctoral e-theses in five countries: Denmark, Germany, the Netherlands, Sweden and the UK."
  • Exploring Overlay Journals: The RIOJA project (PowerPoint): "This presentation introduces the RIOJA (Repository Interface to Overlaid Journal Archives) project, on which a group of cosmology researchers from the UK is working with UCL Library Services and Cornell University. The project is creating a tool to support the overlay of journals onto repositories, and will demonstrate a cosmology journal overlaid on top of arXiv."
  • Dissemination or Publication? Some Consequences from Smudging the Boundaries between Research Data and Research Papers (PDF): "Project StORe’s repository middleware will enable researchers to move seamlessly between the research data environment and its outputs, passing directly from an electronic article to the data from which it was developed, or linking instantly to all the publications that have resulted from a particular research dataset."
  • Open Archives, The Expectations of the Scientific Communities (RealVideo): "This analysis led the French CNRS to start the Hal project, a pluridisciplinary open archive strongly inspired by ArXiv, and directly connected to it. Hal actually automatically transfers data and documents to ArXiv for the relevant disciplines; similarly, it is connected to PubMed and PubMed Central for life sciences. Hal is customizable so that institutions can build their own portal within Hal, which then plays the role of an institutional archive (examples are INRIA, INSERM, ENS Lyon, and others)."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007).

The Depot: A UK Digital Repository

The JISC Repositories and Preservation program has established the Depot, so that researchers who do not have an institutional repository can deposit digital postprints and other digital objects.

Here’s an excerpt from the press release:

The general strategy being adopted in the UK is that every university should develop and establish its own institutional repository (IR), as part of a comprehensive ‘JISC RepositoryNet’. Many researchers can already make use of the IRs set up in their institution, but that is not (yet) the case for all. A key purpose for The Depot is to bridge that gap during the period before all have such provision, and to provide a deposit facility that will enable all UK researchers to expose their publications to readers under terms of Open Access.

The Depot will also have a re-direct function to link researchers to the appropriate home pages of their own institutional repositories. The end result should be more content in repositories, making it easier for researchers and policy makers to have peer-reviewed research results exposed to wider readership under Open Access. . . .

The principal focus for The Depot is the deposit of post-prints, digital versions of published journal articles and similar items. There are plans to include links to places for depositing other digital materials, such as research datasets and learning materials. As indicated, The Depot helps provide a level-playing field for all UK researchers and their institutions, especially when deposit under Open Access is required by grant funding bodies. It may also become a useful facility for institutions as they implement and manage their own repositories, helping to promote the habit of deposit among staff, with the simple message, ‘put it in the depot’.

The Depot is based on E-Prints software and is compliant with the Open Archive Initiative (OAI), which promotes standards for repository interoperability. Its contents will be harvested and searched through the Intute Repository Search project. It offers a redirect service, UK Repository Junction, to ensure that content that comes within the remit of an extant repository is correctly placed there instead of in The Depot.

Additionally, as IRs are created, The Depot will offer a transfer service for content deposited by authors based at those universities, to help populate the new IRs. The Depot will therefore act as a ‘keepsafe’ until a repository of choice becomes available for deposited scholarly content. In this way, The Depot will avoid competing with extant and emerging IRs while bridging gaps in the overall repository landscape and encouraging more open access deposits.
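The redirect and transfer behavior described in the excerpt can be sketched as a simple routing rule: deposit in the author's own IR if one exists; otherwise hold the item in The Depot and hand it over once the institution launches an IR. This is a hypothetical illustration only. The registry, URLs, and function names are invented and do not reflect how UK Repository Junction or The Depot are actually implemented.

```python
# Hypothetical sketch: names, URLs, and data structures are invented for illustration.
DEPOT = "The Depot"

def route_deposit(institution: str, registry: dict[str, str]) -> str:
    """Point the author to their institution's IR if one is registered, else to The Depot."""
    return registry.get(institution, DEPOT)

def transfer_on_new_ir(holdings: list[dict], institution: str,
                       new_ir_url: str, registry: dict[str, str]) -> list[dict]:
    """Register a newly launched IR, then remove and return that institution's Depot items."""
    registry[institution] = new_ir_url
    moved = [item for item in holdings if item["institution"] == institution]
    holdings[:] = [item for item in holdings if item["institution"] != institution]
    return moved

# Usage: one institution already has an IR; another launches one later.
registry = {"Example University": "https://ir.example.org/"}
depot_holdings = [{"title": "Paper A", "institution": "Example College"}]

print(route_deposit("Example University", registry))  # https://ir.example.org/
print(route_deposit("Example College", registry))     # The Depot
moved = transfer_on_new_ir(depot_holdings, "Example College",
                           "https://college-ir.example.org/", registry)
print(len(moved), len(depot_holdings))                 # 1 0
print(route_deposit("Example College", registry))     # https://college-ir.example.org/
```

Keeping the registry lookup separate from the transfer step mirrors the "keepsafe" idea: The Depot only holds content until a repository of choice exists.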

A Depot FAQ is available.

Open Access Repository Software Use By Country

Based on data from the OpenDOAR Charts service, here is a snapshot of the open access repository software that is in use in the top five countries that offer such repositories.

The countries are abbreviated in the table header row as follows: US = United States, DE = Germany, UK = United Kingdom, AU = Australia, and NL = Netherlands. The number in parentheses is the reported number of repositories in that country.

Read the country percentages downward in each column (they do not total to 100% across the rows).

Excluding "unknown" or "other" systems, the highest in-country percentage is marked with an asterisk (*).

Software/Country   US (248)  DE (109)  UK (93)   AU (50)   NL (44)
Bepress            17%       0%        2%        6%        0%
Cocoon             0%        0%        1%        0%        0%
CONTENTdm          3%        0%        2%        0%        0%
CWIS               1%        0%        0%        0%        0%
DARE               0%        0%        0%        0%        2%
Digitool           0%        0%        1%        0%        0%
DSpace             18%       4%        22%       14%       14%
eDoc               0%        2%        0%        0%        0%
ETD-db             4%        0%        0%        0%        0%
Fedora             0%        0%        0%        2%        0%
Fez                0%        0%        0%        2%        0%
GNU EPrints        19%*      8%        46%*      22%*      0%
HTML               2%        4%        4%        4%        0%
iTor               0%        0%        0%        0%        5%
Milees             0%        2%        0%        0%        0%
MyCoRe             0%        2%        0%        0%        0%
OAICat             0%        0%        0%        2%        0%
Open Repository    0%        0%        3%        0%        2%
OPUS               0%        43%*      2%        0%        0%
Other              6%        7%        2%        2%        0%
PORT               0%        0%        0%        0%        2%
Unknown            31%       28%       18%       46%       23%
Wildfire           0%        0%        0%        0%        52%*
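The rule for the highlighted entries (the highest in-country share, excluding "Unknown" and "Other") can be applied mechanically to the table data. This sketch uses the percentages above exactly as transcribed.

```python
COUNTRIES = ["US", "DE", "UK", "AU", "NL"]

# Percent of each country's repositories using each package (from the table above).
TABLE = {
    "Bepress": [17, 0, 2, 6, 0],
    "Cocoon": [0, 0, 1, 0, 0],
    "CONTENTdm": [3, 0, 2, 0, 0],
    "CWIS": [1, 0, 0, 0, 0],
    "DARE": [0, 0, 0, 0, 2],
    "Digitool": [0, 0, 1, 0, 0],
    "DSpace": [18, 4, 22, 14, 14],
    "eDoc": [0, 2, 0, 0, 0],
    "ETD-db": [4, 0, 0, 0, 0],
    "Fedora": [0, 0, 0, 2, 0],
    "Fez": [0, 0, 0, 2, 0],
    "GNU EPrints": [19, 8, 46, 22, 0],
    "HTML": [2, 4, 4, 4, 0],
    "iTor": [0, 0, 0, 0, 5],
    "Milees": [0, 2, 0, 0, 0],
    "MyCoRe": [0, 2, 0, 0, 0],
    "OAICat": [0, 0, 0, 2, 0],
    "Open Repository": [0, 0, 3, 0, 2],
    "OPUS": [0, 43, 2, 0, 0],
    "Other": [6, 7, 2, 2, 0],
    "PORT": [0, 0, 0, 0, 2],
    "Unknown": [31, 28, 18, 46, 23],
    "Wildfire": [0, 0, 0, 0, 52],
}
EXCLUDED = {"Unknown", "Other"}

def leader(country: str) -> tuple[str, int]:
    """Return the package with the highest share in one country, ignoring EXCLUDED rows."""
    col = COUNTRIES.index(country)
    candidates = {sw: row[col] for sw, row in TABLE.items() if sw not in EXCLUDED}
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

for c in COUNTRIES:
    print(c, *leader(c))
```

GNU EPrints leads in the US, UK, and Australia, OPUS in Germany, and Wildfire in the Netherlands.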

Snapshot Data from OpenDOAR Charts

OpenDOAR has introduced OpenDOAR Charts, a nifty new service that allows users to create and view charts that summarize data from its database of open access repositories.

Here’s what a selection of the default charts show today. Only double-digit percentage results are discussed.

  • Repositories by continent: Europe is the leader with 49% of repositories. North America places second with 33%.
  • Repositories by country: In light of the above, it is interesting that the US leads the pack with 29% of repositories. Germany (13%) and the UK (11%) follow.
  • Repository software: After the 28% of repositories whose software is unknown, EPrints takes the number two slot (21%), followed by DSpace (19%).
  • Repository types: By far, institutional repositories are the leader at 79%. Disciplinary repositories follow (13%).
  • Content types: ETDs lead (53%), followed by unpublished reports/working papers (48%), preprints/postprints (37%), conference/workshop papers (35%), books/chapters/sections (31%), multimedia/av (20%), postprints only (17%), bibliographic references (16%), special items (15%), and learning objects (13%).

This is a great service; however, I’d suggest that the University of Nottingham consider licensing it under a Creative Commons license so that snapshot charts could be freely used (at least for noncommercial purposes).

Census of Institutional Repositories in the United States

The Council on Library and Information Resources has published the Census of Institutional Repositories in the United States: MIRACLE Project Research Findings, which was written by members of the University of Michigan School of Information’s MIRACLE (Making Institutional Repositories a Collaborative Learning Environment) Project. The report is freely available in digital form.

Here is an excerpt from the CLIR press release:

In conducting the census, the authors sought to identify the wide range of practices, policies, and operations in effect at institutions where decision makers are contemplating planning, pilot testing, or implementing an IR; they also sought to learn why some institutions have ruled out IRs entirely.

The project team sent surveys to library directors at 2,147 institutions, representing all university main libraries and colleges, except for community colleges, in the United States. About 21% participated in the census. More than half of the responding institutions (53%) have done no IR planning. Twenty percent have begun to plan, 16% are actively planning and pilot testing IRs, and 11% have implemented an operational IR.
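Converting the census percentages into approximate institution counts makes the size of each group clearer. This is a rough sketch: the response rate is given only as "about 21%," so the counts it produces are estimates derived from rounded figures, not numbers reported by the study.

```python
SURVEYED = 2147
RESPONSE_RATE_PCT = 21  # reported as "about 21%", so this is approximate

# Share of responding institutions at each stage, in percent (sums to 100).
STAGES = {
    "no IR planning": 53,
    "begun to plan": 20,
    "actively planning and pilot testing": 16,
    "operational IR": 11,
}

respondents = round(SURVEYED * RESPONSE_RATE_PCT / 100)
approx_counts = {stage: round(respondents * pct / 100) for stage, pct in STAGES.items()}

print(respondents)
print(approx_counts)
```

On these assumptions, roughly 450 institutions responded, and only about 50 of them had an operational IR.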

While the study confirms a number of previous survey findings on operational IRs—such as the IR’s disproportionate representation at research institutions and the leading role of the library in planning, testing, implementing, and paying for IRs—the census also offers a wealth of new insights. Among them is the striking finding that half of the respondents who had not begun planning an IR intend to do so within 24 months.

Other institutional repository surveys include the ARL Institutional Repositories SPEC Kit and the DSpace community survey.

Economists’ Self-Archiving Behavior

Ted C. Bergstrom and Rosemarie Lavaty have deposited an eprint in eScholarship that studies the self-archiving behavior of economists ("How Often Do Economists Self-Archive?").

They summarize their findings in the paper’s abstract:

To answer the question of the paper’s title, we looked at the tables of contents from two recent issues of 33 economics journals and attempted to find a freely available online version of each article. We found that about 90 percent of articles in the most-cited economics journals and about 50 percent of articles in less-cited journals are available. We conduct a similar exercise for political science and find that only about 30 percent of the articles are freely available. The paper reports a regression analysis of the effects of author and article characteristics on likelihood of posting and it discusses the implications of self-archiving for the pricing of subscription-based academic journals.

Their conclusion suggests that significant changes in journal pricing could result from self-archiving:

As more content becomes available in open access archives, publishers are faced with greater availability of close substitutes for their products and library demand for journals is likely to become more price-elastic. The increased price-responsiveness means that profit-maximizing prices will fall. As a result, it can be hoped that commercial publishers will no longer be able to charge subscription prices greatly in excess of average cost. Thus the benefits of self-archiving to the academic community are twofold. There is the direct effect of making a greater portion of the body of research available to scholars everywhere and the secondary effect of reducing the prices charged by publishers who exploit their monopoly power.
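The price argument can be illustrated with the standard textbook markup rule for a profit-maximizing seller facing constant-elasticity demand: P = c × e / (e - 1), where c is marginal cost and e is the absolute price elasticity of demand (which must exceed 1). This is not the model used in Bergstrom and Lavaty's paper, the rule uses marginal rather than average cost, and the cost figure is hypothetical. The sketch shows that as demand becomes more elastic, the profit-maximizing price falls toward cost.

```python
def profit_max_price(marginal_cost: float, elasticity: float) -> float:
    """Textbook monopoly markup rule under constant-elasticity demand.

    `elasticity` is the absolute value of the price elasticity of demand;
    the rule applies only when it is greater than 1.
    """
    if elasticity <= 1:
        raise ValueError("markup rule requires elasticity > 1")
    return marginal_cost * elasticity / (elasticity - 1)

COST = 100.0  # hypothetical cost per subscription, arbitrary units

# More elastic demand (more close substitutes) means a smaller markup.
for e in (1.25, 2.0, 5.0, 20.0):
    print(e, round(profit_max_price(COST, e), 2))
```

With the hypothetical cost of 100, an elasticity of 2 gives a price of 200, while an elasticity of 20 gives about 105, close to cost.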

OAIster Hits 10,000,000 Records

Excerpt from the press release:

We live in an information-driven world—one in which access to good information defines success. OAIster’s growth to 10 million records takes us one step closer to that goal.

Developed at the University of Michigan’s Library, OAIster is a collection of digital scholarly resources. OAIster is also a service that continually gathers these digital resources to remain complete and fresh. As global digital repositories grow, so do OAIster’s holdings.

Popular search engines don’t have the holdings OAIster does. They crawl web pages and index the words on those pages. It’s an outstanding technique for fast, broad information from public websites. But scholarly information, the kind researchers use to enrich their work, is generally hidden from these search engines.

OAIster retrieves these otherwise elusive resources by tapping directly into the collections of a variety of institutions using harvesting technology based on the Open Archives Initiative (OAI) Protocol for Metadata Harvesting. These can be images, academic papers, movies and audio files, technical reports, books, as well as preprints (unpublished works that have not yet been peer reviewed). By aggregating these resources, OAIster makes it possible to search across all of them and return the results of a thorough investigation of complete, up-to-date resources. . . .

OAIster is good news for the digital archives that contribute material to open-access repositories. "[OAIster has demonstrated that]. . . OAI interoperability can scale. This is good news for the technology, since the proliferation is bound to continue and even accelerate," says Peter Suber, author of the SPARC Open Access Newsletter. As open-access repositories proliferate, they will be supported by a single, well-managed, comprehensive, and useful tool.

Scholars will find that searching in OAIster can provide better results than searching in web search engines. Roy Tennant, User Services Architect at the California Digital Library, offers an example: "In OAIster I searched ‘roma’ and ‘world war,’ then sorted by weighted relevance. The first hit nailed my topic—the persecution of the Roma in World War II. Trying ‘roma world war’ in Google fails miserably because Google apparently searches ‘Rome’ as well as ‘Roma.’ The ranking then makes anything about the Roma people drop significantly, and there is nothing in the first few screens of results that includes the word in the title, unlike the OAIster hit."

OAIster currently harvests 730 repositories from 49 countries on 6 continents. In three years, it has more than quadrupled in size and increased from 6.2 million to 10 million in the past year. OAIster is a project of the University of Michigan Digital Library Production Service.
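The harvesting approach described in the press release rests on OAI-PMH requests such as ListRecords, which return a batch of records plus a resumptionToken for fetching the next batch. The following is a minimal sketch of that loop using only the Python standard library and the unqualified Dublin Core (oai_dc) format. It is not OAIster's code, the base URL is a placeholder, and a production harvester would also handle OAI-PMH error responses, deleted records, selective harvesting by date or set, and retry delays.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union
from urllib.parse import urlencode

# XML namespaces used in OAI-PMH responses and oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def request_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token is None:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    else:
        params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(body: Union[str, bytes]) -> tuple[list[str], Optional[str]]:
    """Return the dc:title of each record and the resumptionToken (None when done)."""
    root = ET.fromstring(body)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.find(".//dc:title", NS)
        if title is not None and title.text:
            titles.append(title.text)
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the final batch.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token

def harvest(base_url: str, fetch: Callable[[str], Union[str, bytes]]) -> Iterator[str]:
    """Follow resumptionTokens until the repository signals the last batch."""
    token = None
    while True:
        titles, token = parse_list_records(fetch(request_url(base_url, token)))
        yield from titles
        if token is None:
            break
```

To run it against a live repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; passing it in as a parameter keeps the parsing logic testable without network access.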

ScientificCommons.org: Access to Over 13 Million Digital Documents

ScientificCommons.org is an initiative of the Institute for Media and Communications Management at the University of St. Gallen. It indexes both metadata and full text from digital repositories worldwide. It uses OAI-PMH to identify relevant documents. The full-text documents are in PDF, PowerPoint, RTF, Microsoft Word, and PostScript formats. After being retrieved from their original repository, the documents are cached locally at ScientificCommons.org. It has indexed about 13 million documents from over 800 repositories.

Here are some additional features from the About ScientificCommons.org page:

Identification of authors across institutions and archives: ScientificCommons.org identifies authors and assigns them their scientific publications across various archives. Additionally the social relations between the authors will be extracted and displayed. . . .

Semantic combination of scientific information: ScientificCommons.org structures and combines the scientific data into knowledge areas using ontologies. Lexical and statistical methods are used to identify, extract and analyze keywords. Based on these processes, ScientificCommons.org classifies the scientific data and uses it, e.g., for navigational and weighting purposes.

Personalization services: ScientificCommons.org offers the researchers the possibilities to inform themselves about new publications via our RSS Feed service. They can customize the RSS Feed to a special discipline or even to personalized list of keywords. Furthermore ScientificCommons.org will provide an upload service. Every researcher can upload his publication directly to ScientificCommons.org and assign already existing publications at ScientificCommons.org to his own researcher profile.

Notre Dame Institutional Digital Repository Phase I Final Report

The University of Notre Dame Libraries have issued a report about their year-long institutional repository pilot project. There is an abbreviated HTML version and a complete PDF version.

From the Executive Summary:

Here is the briefest of summaries regarding what we did, what we learned, and where we think future directions should go:

  1. What we did—In a nutshell we established relationships with a number of content groups across campus: the Kellogg Institute, the Institute for Latino Studies, Art History, Electrical Engineering, Computer Science, Life Science, the Nanovic Institute, the Kaneb Center, the School of Architecture, FTT (Film, Television, and Theater), the Gigot Center for Entrepreneurial Studies, the Institute for Scholarship in the Liberal Arts, the Graduate School, the University Intellectual Property Committee, the Provost’s Office, and General Counsel. Next, we collected content from many of these groups, "cataloged" it, and saved it into three different computer systems: DigiTool, ETD-db, and DSpace. Finally, we aggregated this content into a centralized cache to provide enhanced browsing, searching, and syndication services against the content.
  2. What we learned—We essentially learned four things: 1) metadata matters, 2) preservation now, not later, 3) the IDR requires dedicated people with specific skills, 4) copyright raises the largest number of questions regarding the fulfillment of the goals of the IDR.
  3. Where we are leaning in regards to recommendations—The recommendations take the form of a "Chinese menu" of options, and the options are be grouped into "meals." We recommend the IDR continue and include: 1) continuing to do the Electronic Theses & Dissertations, 2) writing and implementing metadata and preservation policies and procedures, 3) taking the Excellent Undergraduate Research to the next level, and 4) continuing to implement DigiTool. There are quite a number of other options, but they may be deemed too expensive to implement.

Will Self-Archiving Cause Libraries to Cancel Journal Subscriptions?

There has been a great deal of discussion of late about the impact of self-archiving on library journal subscriptions. Obviously, this is of great interest to journal publishers who do not want to wake up one morning, rub the sleep from their eyes, and find out over their first cup of coffee at work that libraries have en masse canceled subscriptions because a "tipping point" has been reached. Likewise, open access advocates do not want journal publishers to panic at the prospect of cancellations and try to turn back the clock on liberal self-archiving policies. So, this is not a scenario that anyone wants, except those who would like simply to scrap the existing journal publishing system and start over with a digital tabula rasa.

So, deep breath: Is the end near?

This question hinges on another: Will libraries accept any substitute for a journal that does not provide access to the full, edited, and peer-reviewed contents of that journal?

If the answer is "yes," publishers had better get out their survival kits and hunker down for the digital nuclear winter or else change business practices to embrace the new reality. Attempts to fight back by turning back the clock may just make the situation worse: the genie is out of the bottle.

If the answer is "no," preprints pose no threat, but postprints may, under some difficult-to-attain circumstances.

It is unlikely that a critical mass of author-created postprints (i.e., the author makes the preprint look like the postprint) will ever emerge. Authors would have to be extremely motivated for this to occur. If you don’t believe me, take a Word file that you submitted to a publisher and make it look exactly like the published article (don’t forget the pagination, because that might be a sticking point for libraries). That leaves publisher postprints (generally PDF files).

For the worst to happen, every author of every paper published in a journal would have to self-archive the final publisher PDF file (or the publishers themselves would have to do it for the authors under mandates).

But would that be enough? Wouldn’t the permanence and stability of the digital repositories housing these postprints be of significant concern to libraries? If such repositories could not be trusted, then libraries would have to attempt to archive the postprints in question themselves; however, since postprints are not by default under copyright terms that would allow this to happen (e.g., they are not under Creative Commons Licenses), libraries may be barred from doing so. There are other issues as well: journal and issue browsing capabilities, the value-added services of indexing and abstracting services, and so on. For now, let’s wave our hands briskly and say that these are all tractable issues.

If the above problems were overcome, a significant one remains: publishers add value in many ways to scholarly articles. Would libraries let the existing system of journal publishing collapse because of self-archiving without a viable substitute for these value-added functions being in place?

There have been proposals for and experiments with overlay journals for some time, as well as other ideas for new quality control strategies, but, to date, none have caught fire. Old-fashioned peer review, copy editing and fact checking, and publisher-based journal design and production still reign, even among the vast majority of e-journals that are not published by conventional publishers. In the Internet age, nothing technological stops tens of thousands of new e-journals using open source journal management software from blooming, but they haven’t so far, have they? Rather, if you use a liberal definition of open access, there are about 2,500 OA journals, a significant achievement; however, there are questions about the longevity of such journals if they are published by small non-conventional publishers such as groups of scholars (e.g., see "Free Electronic Refereed Journals: Getting Past the Arc of Enthusiasm"). Let’s face it: producing a journal is a lot of work, even a small journal that publishes fewer than a hundred papers a year.

Bottom line: a perfect storm is not impossible, but it is unlikely.

Results from the DSpace Community Survey

DSpace conducted an informal survey of its open source community in October 2006. Here are some highlights:

  • The vast majority of respondents (77.6%) used or planned to use DSpace for a university IR.
  • The majority of systems were in production (53.4%); pilot testing was second (35.3%).
  • Preservation and interoperability were the highest priority system features (61.2% each), followed by search engine indexing (57.8%) and open access to refereed articles (56.9%). (Percentage of respondents who rated these features "very important.") Only 5.2% thought that OA to refereed articles was unimportant.
  • The most common type of current IR content was refereed scholarly articles and theses/dissertations (55.2% each), followed by other (48.6%) and grey literature (47.4%).
  • The most popular types of content that respondents were planning to add to their IRs were datasets (53.4%), followed by audio and video (46.6% each).
  • The most frequently used type of metadata was customized Dublin Core (80.2%), followed by XML metadata (13.8%).
  • The most common update pattern was to regularly migrate to new versions; however, it took a "long time to merge in my customizations/configuration" (44.8%).
  • The most common types of modification were minor cosmetics (34.5%), new features (26.7%), and significant user interface customization (21.6%).
  • Only 30.2% were totally comfortable with editing/customizing DSpace; 56.9% were somewhat comfortable and 12.9% were not comfortable.
  • Plug-in use is light: for example, 11.2% use SRW/U, 8.6% use Manakin, and 5.2% use TAPIR (ETDs).
  • The most desired feature for the next version is a more easily customized user interface (17.5%), closely followed by improved modularity (16.7%).

For information about other recent institutional repository surveys, see "ARL Institutional Repositories SPEC Kit" and "MIRACLE Project’s Institutional Repository Survey."

MIRACLE Project’s Institutional Repository Survey

The MIRACLE (Making Institutional Repositories A Collaborative Learning Environment) project at the University of Michigan’s School of Information presented a paper at JCDL 2006 titled "Nationwide Census of Institutional Repositories: Preliminary Findings."

MIRACLE’s sample population was 2,147 library directors at four-year US colleges and universities. The paper presents preliminary findings from 273 respondents.

Respondents characterized their IR activities as: "(1) implementation of an IR (IMP), (2) planning & pilot testing an IR software package (PPT), (3) planning only (PO), or (4) no planning to date (NP)."

Of the 273 respondents, "28 (10%) have characterized their IR involvement as IMP, 42 (15%) as PPT, 65 (24%) as PO, and 138 (51%) as NP."
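The rounded percentages in that quotation can be reproduced from the raw counts. This short check uses only the numbers quoted above; the combined "implementing or pilot testing" figure is simple derived arithmetic, not a number reported by MIRACLE.

```python
# Respondent counts quoted from the MIRACLE preliminary findings.
counts = {"IMP": 28, "PPT": 42, "PO": 65, "NP": 138}
total = sum(counts.values())  # should equal the 273 respondents

# Each stage's share of respondents, rounded to whole percentages as in the paper.
shares = {stage: round(100 * n / total) for stage, n in counts.items()}

# Derived: institutions that had implemented or were pilot testing an IR.
active_share = round(100 * (counts["IMP"] + counts["PPT"]) / total)

if __name__ == "__main__":
    print(total, shares, active_share)
```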

The top-ranked benefits of having an IR were: "capturing the intellectual capital of your institution," "better service to contributors," and "longtime preservation of your institution’s digital output." The bottom-ranked benefits were "reducing user dependence on your library’s print collection," "providing maximal access to the results of publicly funded research," and "an increase in citation counts to your institution’s intellectual output."

On the question of IR staffing, the survey found:

Generally, PPT and PO decision-makers envision the library sharing operational responsibility for an IR. Decision-makers from institutions with full-fledged operational IRs choose responses that show library staff bearing the burden of responsibility for the IR.

Of those with operational IRs who identified their IR software, the survey found that they were using: "(1) 9 for Dspace, (2) 5 for bePress, (3) 4 for ProQuest’s Digital Commons, (4) 2 for local solutions, and (5) 1 each for Ex Libris’ DigiTools and Virginia Tech’s ETD." Of those who were pilot testing software: "(1) 17 for DSpace, (2) 9 for OCLC’s ContentDM, (3) 5 for Fedora, (4) 3 each for bePress, DigiTool, ePrints, and Greenstone, (5) 2 each for Innovative Interfaces, Luna, and ETD, and (6) 1 each for Digital Commons, Encompass, a local solution, and Opus."

In terms of number of documents in the IRs, by far the largest percentages were for less than 501 documents (IMP, 41%; and PPT, 67%).

The preliminary results also cover other topics, such as content recruitment, investigative decision-making activities, IR costs, and IR system features.

It is interesting to see how these preliminary results compare to those of the ARL Institutional Repositories SPEC Kit. For example, when asked "What are the top three benefits you feel your IR provides?," the ARL survey respondents said:

  1. Enhance visibility and increase dissemination of institution’s scholarship: 68%
  2. Free, open, timely access to scholarship: 46%
  3. Preservation of and long-term access to institution’s scholarship: 36%
  4. Preservation and stewardship of digital content: 36%
  5. Collecting, organizing assets in a central location: 24%
  6. Educate faculty about copyright, open access, scholarly communication: 8%

ARL Institutional Repositories SPEC Kit

The Institutional Repositories SPEC Kit is now available from the Association of Research Libraries (ARL). This document presents the results of a thirty-eight-question survey of 123 ARL members in early 2006 about their institutional repository practices and plans. The survey response rate was 71% (87 out of 123 ARL members responded). The front matter and nine-page Executive Summary are freely available. The document also presents detailed question-by-question results, a list of respondent institutions, representative documents from institutions, and a bibliography. It is 176 pages long.

Here is the bibliographic information: University of Houston Libraries Institutional Repository Task Force. Institutional Repositories. SPEC Kit 292. Washington, DC: Association of Research Libraries, 2006. ISBN: 1-59407-708-8.

The members of the University of Houston Libraries Institutional Repository Task Force who authored the document were Charles W. Bailey, Jr. (Chair); Karen Coombs; Jill Emery (now at UT Austin); Anne Mitchell; Chris Morris; Spencer Simons; and Robert Wright.

The creation of a SPEC Kit is a highly collaborative process. SPEC Kit Editor Lee Anne George and other ARL staff worked with the authors to refine the survey questions, mounted the Web survey, analyzed the data in SPSS, created a preliminary summary of survey question responses, and edited and formatted the final document. Given the amount of data that the survey generated, this was no small task. The authors would like to thank the ARL team for their hard work on the SPEC Kit.

Although the Executive Summary is much longer than the typical one (over 5,100 words vs. about 1,500 words), it should not be mistaken for a highly analytic research article. Its goal was to try to describe the survey’s main findings, which was quite challenging given the amount of survey data available. The full data is available in the "Survey Questions and Responses" section of the SPEC Kit.

Here are some quick survey results:

  • Thirty-seven ARL institutions (43% of respondents) had an operational IR (we called these respondents implementers), 31 (35%) were planning one by 2007, and 19 (22%) had no IR plans.
  • Looked at from the perspective of all 123 ARL members, 30% had an operational IR and, by 2007, that figure may reach 55%.
  • The mean cost of IR implementation was $182,550.
  • The mean annual IR operation cost was $113,543.
  • Most implementers did not have a dedicated budget for either start-up costs (56%) or ongoing operations (52%).
  • The vast majority of implementers identified a first-level IR support unit with a library reporting line rather than one with a campus IT or other campus unit reporting line.
  • DSpace was by far the most commonly used system: 20 implementers used it exclusively and 3 used it in combination with other systems.
  • ProQuest Digital Commons (or the bepress software it is based on) was the second choice of implementers: 7 implementers used this system.
  • While 28% of implementers have made no IR software modifications to enhance its functionality, 22% have made frequent changes to do so and 17% have made major modifications to the software.
  • Only 41% of implementers had no review of deposited documents. While review by designated departmental or unit officials was the most common method (35%), IR staff reviewed documents 21% of the time.
  • In a check all that apply question, 60% of implementers said that IR staff entered simple metadata for authorized users and 57% said that they enhanced such data. Thirty-one percent said that they cataloged IR materials completely using local standards.
  • In another check all that apply question, implementers clearly indicated that IR and library staff use a variety of strategies to recruit content: 83% made presentations to faculty and others, 78% identified and encouraged likely depositors, 78% had library subject specialists act as advocates, 64% offered to deposit materials for authors, and 50% offered to digitize materials and deposit them.
  • The most common digital preservation arrangement for implementers (47%) was to accept any file type, but only preserve specified file types using data migration and other techniques. The next most common arrangement (26%) was to accept and preserve any file type.
  • The mean number of digital objects in implementers’ IRs was 3,844.
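The response rate and the ARL-wide percentages above follow directly from the counts. This small check uses only figures stated in the summary: 123 ARL members, and 37 implementers, 31 planners, and 19 institutions with no plans among the respondents.

```python
ARL_MEMBERS = 123
implementers, planners, no_plans = 37, 31, 19
respondents = implementers + planners + no_plans  # should be 87

def pct(part, whole):
    """Percentage rounded to a whole number."""
    return round(100 * part / whole)

response_rate = pct(respondents, ARL_MEMBERS)                     # 87 of 123 members
operational_share = pct(implementers, ARL_MEMBERS)                # IRs in operation, all members
possible_2007_share = pct(implementers + planners, ARL_MEMBERS)   # if all planners deploy by 2007

if __name__ == "__main__":
    print(respondents, response_rate, operational_share, possible_2007_share)
```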

dLIST E-Print Archive Adds Use Statistics

Authors who deposit e-prints in dLIST (Digital Library of Information Science and Technology) can now see use statistics for their works (archive users can see use statistics as well). For example, at the record for the "Indian Digital Library in Engineering Science and Technology (INDEST) Consortium: Consortia-Based Subscription to Electronic Resources for Technical Education System in India: A Government of India Initiative," you would click on "View statistics for this eprint" to get the use statistics for this work. You can view use statistics for the past four weeks, this year, last year, or all years.

Archive-wide use statistics are also available from either an e-print record or the dLIST Statistics page. From either one, you can rank all e-prints by use for the same time periods as individual e-prints and show overall archive use by year/month or country.
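To make the idea concrete, here is a minimal sketch of the two kinds of archive-wide views described above: ranking e-prints by use within a time period, and totaling use by month or country. The download log format and its entries are hypothetical, and this is not how dLIST itself computes its statistics.

```python
from collections import Counter
from datetime import date

# Hypothetical download log: (e-print id, access date, country code).
LOG = [
    ("eprint-101", date(2007, 1, 15), "US"),
    ("eprint-102", date(2007, 1, 20), "IN"),
    ("eprint-101", date(2007, 2, 3), "GB"),
    ("eprint-101", date(2006, 11, 9), "US"),
    ("eprint-103", date(2007, 2, 28), "IN"),
    ("eprint-102", date(2007, 3, 1), "US"),
]

def rank_eprints(log, start=None, end=None):
    """Rank e-prints by downloads between start and end (inclusive); None means unbounded."""
    counts = Counter(
        eid for eid, day, _ in log
        if (start is None or day >= start) and (end is None or day <= end)
    )
    # Sort by descending count, then by id so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def archive_use_by(log, key):
    """Aggregate archive-wide use by 'month' (YYYY-MM) or by 'country'."""
    if key == "month":
        return Counter(day.strftime("%Y-%m") for _, day, _ in log)
    if key == "country":
        return Counter(country for _, _, country in log)
    raise ValueError("key must be 'month' or 'country'")

if __name__ == "__main__":
    print(rank_eprints(LOG))                                         # all years
    print(rank_eprints(LOG, date(2007, 1, 1), date(2007, 12, 31)))  # one year
    print(archive_use_by(LOG, "country"))
```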

Disclosure: I am now the Scholarly Communication subject editor for dLIST.

The E-Print Deposit Conundrum

How can scholars be motivated to deposit e-prints in disciplinary archives, institutional repositories, and other digital archives?

In "A Key-Stroke Koan for Our Open-Access Times," Stevan Harnad says:

Researchers themselves have hinted at the resolution to this koan: Yes, they need and want OA. But there are many other demands on their time too, and they will only perform the requisite keystrokes if their employers and/or funders require them to do it, just as it is already their employers and funders who require them to do the keystrokes to publish (or perish) in the first place. It is employers and funders who set researchers’ priorities, because it is employers and funders who reward researchers’ performance. Today, about 15% of research is self-archived spontaneously but 95% of researchers sampled report that they would self-archive if required to do so by their employers and/or funders: 81% of them willingly, 14% reluctantly; only 5% would not comply with the requirement. And in the two objective tests to date of this self-reported prediction, both have fully confirmed it, with over 90% self-archiving in the two cases where it was made a requirement (Southampton-ECS and CERN).

This is a very cogent point, but, if the solution to the problem is to have scholars’ employers compel them to deposit e-prints, the next logical question is: how can university administrators and other key decision makers be convinced to mandate this activity?

In the UK, a debate is raging between OA advocates and publishers about the UK Research Funding Councils’ (RCUK) self-archiving proposal, which would "mandate the web self-archiving of authors’ final drafts of all journal articles resulting from RCUK-funded research." The fact that this national policy debate is occurring at all is an enormous advance for open access. If RCUK mandates e-print deposit, UK university administrators will need no convincing.

In the US, we are a long way from reaching that point, although the NIH’s voluntary e-print deposit policy provides some faint glimmer of hope that key government agencies can be moved to take some kind of action. However, the US does not have an equivalent to RCUK that can make dramatic e-print policy changes affecting research universities in one fell swoop. It does have government agencies, such as NSF, that control federal grant funds; private foundations that control their own grant funds; and thousands of universities and colleges that, in theory, could establish policies. This is a diffuse and varied audience for the OA message to reach and convince, and the message will need to be tailored to the audience to be effective.

While that plays out, we should not forget scholars themselves, however dim we judge the prospects of changing their behavior to be. University librarians and IT staff know their institutions’ scholars and can work with them one-on-one or in groups to gradually influence change. True, it’s a "journey of a thousand miles" approach, but the number of librarians and IT staff who will be effective on a national stage is small, while the number who may be incrementally effective on the local level is large. The efforts are complementary, not mutually exclusive.

I would urge you to read Nancy Fried Foster and Susan Gibbons’ excellent article "Understanding Faculty to Improve Content Recruitment for Institutional Repositories" for a good example of how an IR can be personalized so that faculty have a greater sense of connection to it and how IR staff can change the way they talk about the IR to better match scholars’ world view.

Here are a few brief final thoughts.

First, as is often said, scholars care about the impact of their work, and it is likely that, if scholars could easily see detailed use statistics for their works (e.g., number of requests and domain breakdowns), they might be more inclined to deposit items if those statistics exceed their expectations. So, the challenge here is to incorporate this capability into commonly used archiving software programs if it is absent.
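A domain breakdown of the kind mentioned above is straightforward once an archive has the requesting hostnames. The sketch below assumes those hostnames were already obtained (for example, by reverse DNS lookups on web server log entries) and simply tallies requests by top-level domain; the function name and sample hosts are hypothetical, and country-code domains only approximate a requester's country.

```python
from collections import Counter

def domain_breakdown(hostnames):
    """Count requests per top-level domain (e.g. 'edu', 'uk') from client hostnames.

    Bare IP addresses, single-label names, and empty entries are grouped as 'unknown'.
    """
    counts = Counter()
    for host in hostnames:
        label = host.rstrip(".").rsplit(".", 1)[-1].lower()
        if not label or label.isdigit() or "." not in host:
            counts["unknown"] += 1
        else:
            counts[label] += 1
    return counts

if __name__ == "__main__":
    sample = ["lib.uh.edu", "www.soton.ac.uk", "192.0.2.7", "cs.mit.edu", ""]
    print(domain_breakdown(sample).most_common())
```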

Second, scholars are unlikely to stumble when entering bibliographic data about their works (although that data might not be quite as fully descriptive as purists would like), but entering subject keywords is another matter. Sure, they know what the work is about, but are they using terms that others would use and that group their work with similar works in retrieval results? Yes, a controlled vocabulary would help, although such vocabularies have their own challenges. But I wonder if user-generated "tags," such as those used in Technorati, might be another approach. The trick here is to make the tags and the frequency of their use visible to both authors and searchers. For authors, this helps them put their works where they will be found. For searchers, it helps them find the works.
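Here is a toy sketch of making tag frequencies visible on both sides: as an author types, the archive suggests existing tags ranked by how often they have been used, and a searcher can pull up every work sharing a tag. The `TagIndex` class is hypothetical and is not modeled on Technorati's or any archive's actual system.

```python
from collections import Counter

class TagIndex:
    """A toy tag index: records author-assigned tags and exposes their frequency."""

    def __init__(self):
        self.tag_counts = Counter()
        self.works_by_tag = {}

    def tag_work(self, work_id, tags):
        # Normalize case and whitespace so "Open Access" and "open access" merge.
        for tag in {t.strip().lower() for t in tags if t.strip()}:
            self.tag_counts[tag] += 1
            self.works_by_tag.setdefault(tag, set()).add(work_id)

    def suggest(self, prefix, limit=5):
        """For authors: popular existing tags that start with what they are typing."""
        prefix = prefix.lower()
        matches = [(t, n) for t, n in self.tag_counts.items() if t.startswith(prefix)]
        return sorted(matches, key=lambda m: (-m[1], m[0]))[:limit]

    def find(self, tag):
        """For searchers: all works that share a tag."""
        return sorted(self.works_by_tag.get(tag.lower(), set()))

if __name__ == "__main__":
    idx = TagIndex()
    idx.tag_work("w1", ["open access", "repositories"])
    idx.tag_work("w2", ["Open Access", "self-archiving"])
    idx.tag_work("w3", ["open source", "repositories"])
    print(idx.suggest("open"))       # most-used matching tags first
    print(idx.find("repositories"))
```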

Third, it might be helpful if an author could fill out a bibliographic template for a work once and, with a single keystroke, submit it to multiple designated digital archives and repositories. So, for example, a library author might choose to submit a work to his or her institutional repository, dLIST, and E-LIS all at once. Of course, this would require a minimal level of standardization of template information between systems and the development of appropriate import capabilities. Some will say: "Why bother?" True, OAI-PMH harvesting should, in theory, make duplicate deposit unnecessary given OAIster-like systems. But "lots of copies keep stuff safe," and users still take a single-archive searching approach in spite of OAI-PMH systems.
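One way to picture the "fill it out once" idea is to serialize a single template into simple Dublin Core (the oai_dc format that OAI-PMH repositories already share) and hand the same record to each chosen archive. In this sketch, `send` is a hypothetical stand-in for each archive's import mechanism, which would have to exist for this to work in practice. The record omits the schemaLocation attribute that a full oai_dc record carries.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def to_oai_dc(template):
    """Serialize a template (Dublin Core field -> value or list of values) as oai_dc XML."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, values in template.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")

def deposit_everywhere(template, targets, send):
    """Build the record once, then pass the same payload to every chosen archive.

    `send(target, payload)` is a hypothetical hook; its return value is collected per target.
    """
    payload = to_oai_dc(template)
    return {target: send(target, payload) for target in targets}

if __name__ == "__main__":
    record = {"title": "Sample Title", "creator": ["Author, A.", "Author, B."], "date": "2007"}
    results = deposit_everywhere(
        record,
        ["institutional repository", "dLIST", "E-LIS"],
        lambda target, payload: f"queued {len(payload)} bytes for {target}",
    )
    for target, status in results.items():
        print(target, "->", status)
```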