ARL Institutional Repositories

The Association of Research Libraries (ARL) currently has 123 member libraries in the US and Canada. Below is a list of operational institutional repositories at ARL libraries. This list was compiled by a quick examination of ARL libraries’ home pages, supplemented with a bit of Google searching. I certainly wouldn’t claim that it’s comprehensive, and I would welcome additions. (Quick note to ARL library Web site managers: put a highly visible link to your IR on your home page.)

While not perfect (what is?), this list does give us a rough snapshot of the level of IR activity in ARL libraries, and it provides some insight into how these large research libraries have chosen to structure and support their IRs (can you say bepress and DSpace?).

Two Views of IRs

Yesterday, Stevan Harnad offered extensive comments on my "Not Green Enough" posting. Here are my thoughts on those comments.

The crux of the matter is two very different views of institutional repositories (IRs), and, therefore, different perceptions about how quickly IRs will solve the self-archiving problem. My apologies in advance to Stevan if my capsule summary of his position is incorrect.

In Stevan’s view, the sole purpose of an IR is to provide free global access to e-prints. Once institutions adopt the Berlin 3 recommendations (which require faculty to self-archive in IRs and encourage them to publish in OA journals), establishing and running an IR is a cheap, simple technical problem. Therefore, it doesn’t matter whether publisher copyright agreements allow scholars to archive in disciplinary archives or in the Internet Archive’s universal repository. (I’m unclear about Stevan’s position on independent scholars who will never be able to self-archive in an IR because they are not affiliated with any institution, or on researchers who are affiliated with non-academic institutions that will never have IRs. Perhaps, in the last case, he believes that IRs will be universal for every non-academic institution.) IR managers who hold other views are obstructing progress because they are wasting time on nonessential issues, not correctly perceiving the urgency and simplicity of his self-archiving solution, and unnecessarily delaying the progress of OA.

My view of the basic function of an IR is best summed up by two quotes, the first by Clifford Lynch (Executive Director of the Coalition for Networked Information) and the second by me:

"In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution." [1]

"An institutional repository includes a variety of materials produced by scholars from many units, such as e-prints, technical reports, theses and dissertations, data sets, and teaching materials. Some institutional repositories are also being used as electronic presses, publishing e-books and e-journals." [2]

Given this vision of IRs, I see them as more technically complex than Stevan does. However, I see the primary challenges as being in the areas of achieving buy-in from university administrators and faculty, establishing a wide range of policies and procedures (e.g., acceptable types and formats of material, deposit control and facilitation strategies, copyright compliance procedures, and metadata utilization), recruiting content (including depositing items for faculty if required to help populate the IR), providing user support and training, and providing data migration services as file formats become obsolete. Of course, if IRs assume a formal publishing role, this adds new dimensions of complexity, but I’ll defer that point for now since it is only being done in a few IRs, such as the following two examples:

eScholarship Repository
http://repositories.cdlib.org/escholarship/

Internet-First University Press at Cornell University
http://dspace.library.cornell.edu/handle/1813/62

(To clarify one point of confusion, libraries are not generally expecting IRs to solve the e-journal preservation problem. They are turning to solutions such as LOCKSS to do that.)

I do not believe that getting faculty to voluntarily deposit e-prints will be easy. I’m not convinced that most university administrators are going to be quickly and effortlessly persuaded to endorse Berlin 3 unless it is, in effect, externally mandated (e.g., Research Councils UK proposal).

I think that at least a significant subset of universities will want some type of basic vetting of the copyright compliance status of submitted e-prints, and, given the current wide range of variations in publisher copyright agreements and a relatively low level of faculty awareness and interest in copyright matters, that this will be a thorny issue (and one that directly relates to my standard copyright agreement idea).

This is why Johanneke Sytsema of Oxford University said in her comment on "How Green Is My Publisher?"
(http://www.escholarlypub.com/digitalkoans/2005/04/26/how-green-is-my-publisher/#comments):

"I do agree with Charles Bailey that ‘green’ doesn’t automatically mean ‘go’. Being a repository manager myself, I never just ‘go’ when I encounter ‘green’ on the (invaluable) SHERPA Romeo list. First, I need to check whether the publisher allows archiving into an institutional repository, rather than just on a personal or departmental website. Secondly, I need to check the permitted format: some publisher[s] object to using the publisher PDF, other publishers require the use of the publisher PDF. Thirdly, I need to check on publisher policies every time I deposit, since publishers may change their policy from day to day. So, could the light get greener than it is now? I believe, it should."

Given my view of IRs, I agree with University of Rochester IR manager Susan Gibbons when she says that "the costs and efforts involved in maintaining an IR are substantial."
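Part of that effort is the per-deposit checking that Johanneke Sytsema describes above. As a rough illustration only, her three checks (repository type, permitted format, and policy freshness) could be sketched as follows. The `PublisherPolicy` record, its field names, and the format labels are hypothetical and are not taken from SHERPA or from any repository software.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PublisherPolicy:
    """Hypothetical record of one publisher's self-archiving terms."""
    allows_institutional_repository: bool  # not only personal/departmental sites
    permitted_formats: set                 # e.g. {"author_final_draft"}
    last_checked: date                     # when someone last read the policy


def deposit_problems(policy, file_format, today):
    """Return the reasons to hold a deposit back; an empty list means go."""
    problems = []
    # Check 1: is an institutional repository an allowed venue at all?
    if not policy.allows_institutional_repository:
        problems.append("policy does not cover institutional repositories")
    # Check 2: is this version of the file (e.g. publisher PDF) allowed?
    if file_format not in policy.permitted_formats:
        problems.append(f"format not permitted: {file_format}")
    # Check 3: policies can change, so re-read them on the day of deposit.
    if policy.last_checked != today:
        problems.append("policy not re-checked on the day of deposit")
    return problems


# Illustrative values only: a "green" publisher that still fails two checks.
policy = PublisherPolicy(True, {"author_final_draft"}, date(2005, 4, 20))
problems = deposit_problems(policy, "publisher_pdf", date(2005, 4, 27))
print(problems)
```

Even in this simplified form, a "green" listing alone does not decide the outcome; each deposit still needs a person (or a maintained data source) behind every field.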

Which of these two views of institutional repositories will prevail? Time will tell.

If my view prevails, IRs will take longer to become widespread than if Stevan’s view prevails. Academic authors who have papers accepted by publishers with restrictive author copyright agreements (i.e., those that bar deposit in disciplinary archives or in the universal repository) will have to wait to deposit papers in an OAI-PMH compliant archive. Lacking a way to self-archive with relative ease, they may simply choose not to do so. Non-academic authors may never be able to deposit their papers in an OAI-PMH compliant archive.

If Stevan’s view prevails, IRs will pop up like mushrooms and the above won’t matter, as long as authors enthusiastically deposit their old papers once their IRs are in place.

If the only barrier is a small investment of time and money (as Stevan describes below), it’s unclear to me why we don’t have universal IRs today:

"The 94% of authors at archiveless universities are one $2000 linux server plus a few days’ one-time sysad set-up time and a few annual sysaddays’ maintenance time away from having an institutional repository."

But, I say, Godspeed, Stevan. Prove me wrong, for that will mean that OA happens sooner, and scholars without access to IRs will be deprived of the benefits of depositing in an OAI-compliant repository (or depositing at all) for a shorter period of time.

And, I cheerfully give Stevan the last word on the matter (for now anyway).

1. Clifford A. Lynch, "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age," ARL: A Bimonthly Report on Research Library Issues and Actions from ARL, CNI, and SPARC, no. 226 (2003),
http://www.arl.org/newsltr/226/ir.html

2. Charles W. Bailey, Jr., Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (Washington, DC: Association of Research Libraries, 2005), xviii,
http://info.lib.uh.edu/cwb/oab.pdf

Not Green Enough

Yesterday, Stevan Harnad took the time to comment extensively on my "How Green Is My Publisher?" posting. Thanks for doing so, Stevan. Here are some further thoughts on the matter.

CB: My publication page, check. We don’t have an institutional repository yet, but I assume that "other external Web site" will cover that when we do, check. Wait a minute, what if I want to deposit the e-print in a disciplinary archive like E-LIS or I want to put it in the Internet Archive’s upcoming "OAI-compliant ‘universal repository’"? Looks to me like I’m out of luck. No way to immediately deposit the paper in an OAI-PMH compliant archive that will have a longer life than my Website and that can be harvested by OAI-PMH search services, such as OAIster.

SH: "The restrictions on 3rd-party archives are perfectly reasonable and no problem whatsoever at this time. The problem today (just so we keep our eyes on the ball!) is the non-archiving of 85% of articles, hence their inaccessibility to all those would-be users whose universities cannot afford access to the journal’s official version! It is cheap and easy for any university to create an OAI-compliant institutional archive, and OAIster can happily harvest the metadata.
http://archives.eprints.org/eprints.php?action=browse"

eprints.org’s Institutional Archives Registry currently shows a total of 424 archives. When we browse by archive type, we discover that there are 192 "Research Institutional or Departmental" registered archives worldwide. Of course, “Departmental” archives are not institutional repositories. They do not have an institutional scope of coverage, nor are they as likely as institutional archives to be permanent. True, departments are relatively stable, but their commitment to maintaining archives may not be (e.g., the archive may be the pet project of one or a few faculty members). By contrast, once an institution commits to having an archive, it’s likely to be a more permanent arrangement, especially if it is run by a library.

But, let’s wave our hands, and say 100% of the 424 registered archives are institutional repositories (IRs). Universities Worldwide, which is "based on the ‘World List of Universities 1997’ published by the International Association of Universities (IAU) and links discovered or posted here," currently lists 7,130 universities in 181 countries. Assuming that this is a good rough approximation, that means that about 6% of all universities have IRs. Meaning, of course, that 94% do not.
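For readers who want to check the arithmetic, here is a minimal sketch using only the three figures quoted above. The variable and function names are my own. Note that the roughly 6% figure corresponds to counting all 424 registered archives; counting only the 192 "Research Institutional or Departmental" archives would give a smaller share.

```python
registered_archives = 424          # all archives in the eprints.org registry
institutional_or_departmental = 192  # "Research Institutional or Departmental"
universities = 7130                # listed by Universities Worldwide


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


generous = share(registered_archives, universities)
strict = share(institutional_or_departmental, universities)

print(f"All registered archives: {generous:.1f}% of universities")
print(f"Institutional or departmental only: {strict:.1f}% of universities")
```

Either way, the share of universities without a repository stays above 94%, so the hand waving is generous to the other side of the argument.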

And that means that 94% of authors at universities cannot self-archive in an institutional repository (or, given the hand waving, in a departmental archive). True, they can self-archive on personal Web pages. The issues with this strategy are: (a) how many authors have up-to-date publication pages or have publication pages at all?, (b) how long will they last (i.e., authors change jobs, retire, and die)?, and (c) there is no OAI-PMH access to those pages, so they don’t show up in OAIster and similar search engines.

Now, disciplinary archives and the Internet Archive’s universal repository solve these problems. Moreover, they solve another problem: independent scholars, corporate researchers, and other non-academic authors may never have an institutional repository to self-archive in.

I don’t see this as "no problem whatsoever at this time." Quite the contrary. To be "no problem," we would have to believe that it doesn’t matter if articles are archived in OAI-PMH compliant repositories or archives. To be "no problem," we would have to not care whether scholars who will never have an institutional repository at their disposal can self-archive.

As to the question of it being "cheap and easy for any university to create an OAI-compliant institutional archive," I think there is some difference of opinion on that point. Susan Gibbons says [1] that "the costs and efforts involved in maintaining an IR are substantial," and she provides these annual IR cost estimates:

  1. $285,000, MIT
  2. $100,000 (Canadian), Queen’s University (for staffing only)
  3. $200,000, University of Rochester
  4. between 2,280 and 3,190 staff hours, University of Oregon

But, of course, these differences in perception about costs relate to some degree to Stevan’s next point:

SH: (And worrying about the preservation of non-existent contents is rather putting the cart before the horse. The self-archived OA versions of a goodly portion of the 15% of the articles that have been self-archived in the past 15 years are still online and OA to tell the tale to this day. All their publishers’ official versions are too. So fussing about the permanence of the non-contents of cupboards that are in any case meant to be access-supplements, not the official version of record, is rather misplaced, when what is immediately missing and urgently needed is their presence, not their permanence.)

I think that Stevan will find that few academic libraries are not going to worry about permanence. Not only will they worry about the permanence of digital objects in their repositories, they will also worry about the permanence of publishers’ archives. Librarians know that publishers are corporations, and that corporations change priorities, merge, and fail. As libraries increasingly abandon print subscriptions and go e-only for economic reasons, at some point there will be no permanent distributed print archive of new journal issues in libraries worldwide as there is today, and libraries are going to worry about that a great deal. Moreover, universities are not going to establish institutional repositories just to support OA. That may be one important item on the agenda, but there will be other archiving needs to be met as well, and factors associated with those digital objects will affect the perception of the need for overall IR preservation too.

Libraries are also going to offer new services to support IRs beyond technical support, ranging from convincing faculty to self-archive and helping them do so to training users in using IRs (as well as other e-print services worldwide). These services will cost money.

Don’t want libraries to lead the IR effort if this is true?

In the words of Bob Dylan:

I asked the captain what his name was
And how come he didn’t drive a truck
He said his name was Columbus
I just said, "Good luck."

Moving on.

CB: “The agreement also states that the e-print must contain a fair amount of information about the publisher and the paper: the published article’s citation and copyright date, the publisher’s address, information about the publisher’s document delivery service, and a link to the publisher’s home page.”

SH: That’s just fine too. It is only good scholarly practice to provide the full reference information and to link to the official version of record for the sake of all those potential users who can afford it. What is wrong with that, and why would any author not want to do that?

Sure, an author would want to provide a citation to the published paper and a link to it, but I suspect few will be excited about providing a fair amount of advertising information for the publisher in their e-prints, such as the publisher’s address, home page, and document delivery service. It’s not a deal killer, but it’s more work for authors or IR staff. The more individual publisher variations that there are in copyright transfer agreements, the harder it is for scholars and IR staff to meet these varying requirements.

CB: Second, it would be helpful if such directories could identify whether articles can be deposited in key types of archives. I know that we don’t want the color codes to look like SpeedyGrl.com’s Ultimate Color Table, but I think that this is an important factor in addition to the type of e-print permitted.

SH: They already do. The main distinction is the author’s own institutional archive versus central (3rd-party) archives. It is the former that are the critical ones. The rest can be done by metadata harvesting.

The SHERPA colors do not make this distinction. Neither do the otherwise helpful notes. You must look at each specific agreement (if there is a link to it).

CB: Fourth, although copyright transfer agreements have always been a confusing mess, now we want authors to actually read and evaluate them, not just mindlessly sign them like they did when digital archiving wasn’t an issue. And institutional repository managers (or archive managers) need to make sense of them post facto to determine if articles can be legally deposited and what terms apply to those deposits. So, maybe it’s time to tilt at a new windmill: a set of standardized copyright transfer agreements. I know, it’s like trying to herd several thousand hyperactive cats. But, a few years ago, getting standardized use statistics for electronic resources from publishers seemed hopeless, and some progress has been made on that score.

SH: No, it’s not more windmills or red herrings that researchers, their institutions, their funders, and research itself need: What they need is to go ahead and self-archive.

Developing clear, understandable standard copyright transfer agreements is a red herring? Let’s look at just one aspect of the problem: IR managers’ copyright concerns. I offer some quotes:

"One aspect of the survey [baseline survey of research material already held on departmental and personal Web pages in the ed.ac.uk domain] that is not shown in the results is the lack of consistency in dealing with copyright and IPR issues that scholars face when placing material online. Some academic units have responded by not self-archiving any material at all. A rather worrying example of this is the School of Law (—do they know something that we don’t?) A small percentage of individual scholars have responded by using general disclaimers that may or may not be effective. Others, generally well-established professors, have posted material online that is arguably in breach of copyright agreements, e.g. whole book chapters. Most, however, take a middle line of only posting papers from sympathetic publishers who allow some form of self-archiving. It is apparent that if institutional repositories are going to work, then this general confusion over copyright and IPR issues needs to be addressed right at the source." [2]

"Filling a repository for published and peer-reviewed papers is a slow process, and it is clear that it is a task that requires a significant amount of staff input from those charged with developing the repository. Although we have succeeded in adding a reasonable amount of content to the repository we have also been offered significant amounts of content that cannot be added because of restrictive publisher copyright agreements. In some cases academics have offered between ten and twenty articles and we have not been able to add any of them to the repository. This is a clear demonstration that major changes need to take place at a high level in order for repositories to be successful." [3]

Certainly, all OA advocates are eager to get on with the business of doing OA vs. simply reflecting on it, and few have done as much as Stevan to advance the cause, but, in my view, the issues I’ve raised warrant further consideration and action.

Notes

1. Susan Gibbons, "Establishing an Institutional Repository," Library Technology Reports 40, no. 4 (2004): 54, 56.

2. Theo Andrew, "Trends in Self-Posting of Research Material Online by Academic Staff," Ariadne, no. 37 (2003),
http://www.ariadne.ac.uk/issue37/andrew/intro.html.

3. Morag Mackie, "Filling Institutional Repositories: Practical Strategies from the DAEDALUS Project," Ariadne, no. 39 (2004),
http://www.ariadne.ac.uk/issue39/mackie/intro.html.

How Green Is My Publisher?

Back in the early 1990s, I began to fight to retain the copyright to my scholarly writings. First, the publishers thought I was kidding. Then, when it was clear that I wasn’t, they thought I was nuts. Generally, they weren’t willing to negotiate. So, I sought out the few journals that would comply with this strange whim or that had editors who would "forget" to have me sign an author agreement. Unfortunately, some of the more liberal journals got gobbled up by megapublishers, limiting my options and casting some doubt on handshake deals. Once e-only journals by nonconventional publishers took off, they became my venue of choice, since they typically allowed copyright retention by default.

Things have changed, in large part due to the growing influence of the open access movement. Now, many publishers allow self archiving of e-prints (electronic preprints or postprints), and this, in theory, means that authors can cheerfully assign their copyrights to those publishers. How many publishers do this? Well, we don’t know for sure, but according to Summary Statistics So Far (whose figures are based on the Romeo Project), 92% of the 8,450 processed journals are "green" (can archive postprint) or "pale green" (can archive preprint). (Gray means you can’t archive either one.)

If you want to self archive a scholarly article, the SHERPA Publisher Copyright Policies & Self-Archiving site is the place to go to determine whether the publisher of the journal you have in mind for your article will permit it. So, when approached recently about writing a paper for a library publisher (let’s call it X), I fired up Mozilla and looked X up. Good news, X is green, meaning "can archive pre-print and post-print." Not the dreaded white ("archiving not formally supported"), not yellow ("can archive pre-print (ie pre-refereeing)"), not even blue ("can archive post-print (ie final draft post-refereeing)"), but green. SHERPA did warn me of two conditions: "Published source must be acknowledged" and "Eprint server is non-profit." No problemo, right? Being ever cautious, I then used the handy link to the actual policy.

Here’s what I found. My "preprint distribution rights" allow "posting as electronic files on the contributor’s own Web site for personal or professional use, or on the contributor’s internal university/corporate intranet or network, or other external Web site at the contributor’s university or institution, but not for either commercial (for-profit) or systematic third party sales or dissemination, by which is meant any interlibrary loan or document delivery systems. The contributor may update the preprint with the final version of the article after review and revision by the journal’s editor(s) and/or editorial/peer-review board."

My publication page, check. We don’t have an institutional repository yet, but I assume that "other external Web site" will cover that when we do, check. Wait a minute, what if I want to deposit the e-print in a disciplinary archive like E-LIS or I want to put it in the Internet Archive’s upcoming "OAI-compliant ‘universal repository’"? Looks to me like I’m out of luck. No way to immediately deposit the paper in an OAI-PMH compliant archive that will have a longer life than my Website and that can be harvested by OAI-PMH search services, such as OAIster.
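To make "harvested by OAI-PMH" a little more concrete: a harvester asks a repository for its metadata with an ordinary HTTP request that names a protocol verb. Below is a minimal sketch that only builds such a request URL. `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification, but the base URL is a made-up placeholder, and the sketch says nothing about how OAIster itself is implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one named set in the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used for illustration only.
url = list_records_url("https://repository.example.edu/oai")
print(url)
```

A personal publication page has no such endpoint to query, which is why papers posted only there are invisible to services built on harvesting.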

The agreement also states that the e-print must contain a fair amount of information about the publisher and the paper: the published article’s citation and copyright date, the publisher’s address, information about the publisher’s document delivery service, and a link to the publisher’s home page. Guess I can do this when I’m modifying the article to incorporate the editorial changes. That should keep me off the streets.

So, what can we conclude from this brief dip into the murky waters of author agreements other than retaining rights may still be a good idea (if you can do it)?

First, there are swirling currents of complexity beneath the placid surface of color-coded copyright transfer agreement directories. This is not to say that such directories are not indispensable (or not doing a good job), but rather that, given the idiosyncratic nature of such agreements, authors still need to read the details if they want to be fully aware of their residual rights. They may not always like what they find, and what they find may affect their willingness to self archive if it’s too limiting or burdensome. "Green" may not always mean "go."

Second, it would be helpful if such directories could identify whether articles can be deposited in key types of archives. I know that we don’t want the color codes to look like SpeedyGrl.com’s Ultimate Color Table, but I think that this is an important factor in addition to the type of e-print permitted.

Third, if claims are going to be made about the number of "green" journals, maybe more consideration of what "green" means is in order, and perhaps OA advocates should agree on their color schemes. Is "can archive pre-print and post-print" enough for "green," or should it be "can archive pre-print and post-print on the author’s Website or in any noncommercial archive or repository"? If the latter, the heat should be turned up on publishers that don’t permit it by authors and OA advocates.

Fourth, although copyright transfer agreements have always been a confusing mess, now we want authors to actually read and evaluate them, not just mindlessly sign them like they did when digital archiving wasn’t an issue. And institutional repository managers (or archive managers) need to make sense of them post facto to determine if articles can be legally deposited and what terms apply to those deposits. So, maybe it’s time to tilt at a new windmill: a set of standardized copyright transfer agreements. I know, it’s like trying to herd several thousand hyperactive cats. But, a few years ago, getting standardized use statistics for electronic resources from publishers seemed hopeless, and some progress has been made on that score.
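The two candidate meanings of "green" raised in the third point can be put side by side in a small sketch. Everything here is illustrative: the `Agreement` record, the venue labels, and the encoding of publisher X (which takes my reading of its "preprint distribution rights" at face value, including the assumption that "other external Web site" covers an institutional repository) are my own simplifications, not SHERPA data.

```python
from dataclasses import dataclass, field


@dataclass
class Agreement:
    """Hypothetical summary of one copyright transfer agreement."""
    allows_preprint: bool
    allows_postprint: bool
    venues: set = field(default_factory=set)  # where deposit is permitted


# Assumed stand-in for "the author's Website or any noncommercial archive".
NONCOMMERCIAL_VENUES = {
    "author_website",
    "institutional_repository",
    "disciplinary_archive",
    "universal_repository",
}


def green_by_eprint_type(agreement):
    """Narrow definition: pre-print and post-print may both be archived."""
    return agreement.allows_preprint and agreement.allows_postprint


def green_by_type_and_venue(agreement):
    """Broader definition: also requires every noncommercial venue."""
    return green_by_eprint_type(agreement) and NONCOMMERCIAL_VENUES <= agreement.venues


publisher_x = Agreement(True, True, {"author_website", "institutional_repository"})
print(green_by_eprint_type(publisher_x), green_by_type_and_venue(publisher_x))
```

Under the narrow definition X counts as green; under the broader one it does not, and that gap is the one a journal count based on color alone would hide.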

Heading for the Exits

At SPARC: "Rick Johnson, SPARC’s founding Executive Director, has announced his decision to resign. Heather Joseph has been named to succeed him. Joseph is the founding President and Chief Operating Officer of BioOne, an innovative aggregation of high-impact bioscience research journals. The change in SPARC leadership is effective July 1, 2005." And, at BioMed Central: "Jan Velterop, Director and Publisher of BioMed Central, will be leaving to pursue independently his many engagements as an advocate of Open Access to societies, funding institutions and publishers. Matthew Cockerill and Anne Greenwood will take joint responsibility for publishing and other activities of BioMed Central as the business continues its rapid growth." (Thanks to Peter Suber for the second one.)

What are the chances that these two major figures in the scholarly publishing reform movement would have their resignations announced within a day of each other? Let’s hope it’s not a trend. Rick Johnson did a bang-up job of "creating change" at SPARC, and Jan Velterop vigorously led the OA journal charge at BioMed Central, fostering the development of over 100 journals. Kudos and best wishes to both. I’m sure we haven’t heard the last of them.

Family Entertainment and Copyright Act

In "House OKs Family Copyright Bill," Wired News reports on the passage of the Family Entertainment and Copyright Act, which "Exempts from copyright and trademark infringement, under certain circumstances: (1) making limited portions of the audio or video content of a motion picture for private home viewing imperceptible; or (2) the creation of technology that enables such editing."

Just imagine what Kill Bill looks like on ClearPlay. Not even time to eat your popcorn. If protecting the artistic integrity of movies doesn’t matter to you, I suppose this law is harmless enough, but is it the infamous "slippery slope"? First families in private showings in homes, then schools in public showings, then who knows? Or, first DVDs, then other digital media? Or, first sex and violence, then other potentially objectionable material? Maybe e-textbooks with that pesky evolution concept neatly excised on demand by concerned parents or schools. Or, maybe that’s creationism instead. After all, what is objectionable is in the eye of the beholder.

The Access Principle: The Case for Open Access to Research and Scholarship

John Willinsky’s book, The Access Principle: The Case for Open Access to Research and Scholarship, will be released in December by MIT Press. The blurb indicates: "A commitment to scholarly work, writes Willinsky, carries with it a responsibility to circulate that work as widely as possible: this is the access principle."

Interesting. OA as a "responsibility," perhaps even a moral obligation. Often OA advocates discuss the benefits to authors of widespread digital exposure through OA, which boils down to enlightened self-interest. And, of course, there is mandatory discussion of the need for access for the disenfranchised (not just the developing world, but anyone who can’t afford toll fees) in order to promote scholarship and other activities. (Let’s face it, who isn’t disenfranchised these days?) But, "responsibility," . . . hmmm, that heats up the dialog.

In any case, here’s a bit more: "Willinsky describes different types of access—the New England Journal of Medicine, for example, grants open access to issues six months after initial publication, and First Monday forgoes a print edition and makes its contents immediately accessible at no cost. He discusses the contradictions of copyright law, the reading of research, and the economic viability of open access. He also considers broader themes of public access to knowledge, human rights issues, lessons from publishing history, and ‘epistemological vanities.’"

By the way, Willinsky is a key figure in the Public Knowledge Project, which provides cool open source software such as Open Journal Systems and Open Conference Systems. (Thanks to Adrian Ho for the tip on this book.)