Two Views of IRs

Yesterday, Stevan Harnad offered extensive comments on my "Not Green Enough" posting. Here are my thoughts on those comments.

The crux of the matter is two very different views of institutional repositories (IRs), and, therefore, different perceptions about how quickly IRs will solve the self-archiving problem. My apologies in advance to Stevan if my capsule summary of his position is incorrect.

In Stevan’s view, the sole purpose of an IR is to provide free global access to e-prints. Once institutions adopt the Berlin 3 recommendations (which require faculty to self-archive in IRs and encourage them to publish in OA journals), establishing and running an IR is a cheap, simple technical problem. Therefore, it doesn’t matter whether publisher copyright agreements allow scholars to archive in disciplinary archives or in the Internet Archive’s universal repository. (I’m unclear about Stevan’s position about independent scholars who will never be able to self-archive in an IR because they are not affiliated with any institution or about researchers who are affiliated with non-academic institutions that will never have IRs. Perhaps, in the last case, he believes that IRs will be universal for every non-academic institution.) IR managers who hold other views are obstructing progress because they are wasting time on nonessential issues, not correctly perceiving the urgency and simplicity of his self-archiving solution, and unnecessarily delaying the progress of OA.

My view of the basic function of an IR is best summed up by two quotes, the first by Clifford Lynch (Executive Director of the Coalition for Networked Information) and the second by me:

"In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution." [1]

"An institutional repository includes a variety of materials produced by scholars from many units, such as e-prints, technical reports, theses and dissertations, data sets, and teaching materials. Some institutional repositories are also being used as electronic presses, publishing e-books and e-journals." [2]

Given this vision of IRs, I see them as more technically complex than Stevan does. However, I see the primary challenges as being in the areas of achieving buy-in from university administrators and faculty, establishing a wide range of policies and procedures (e.g., acceptable types and formats of material, deposit control and facilitation strategies, copyright compliance procedures, and metadata utilization), recruiting content (including depositing items for faculty if required to help populate the IR), providing user support and training, and providing data migration services as file formats become obsolete. Of course, if IRs assume a formal publishing role, this adds new dimensions of complexity, but I’ll defer that point for now since it is only being done in a few IRs, such as the following two examples:

eScholarship Repository
http://repositories.cdlib.org/escholarship/

Internet-First University Press at Cornell University
http://dspace.library.cornell.edu/handle/1813/62

(To clarify one point of confusion, libraries are not generally expecting IRs to solve the e-journal preservation problem. They are turning to solutions such as LOCKSS to do that.)

I do not believe that getting faculty to voluntarily deposit e-prints will be easy. I’m not convinced that most university administrators are going to be quickly and effortlessly persuaded to endorse Berlin 3 unless it is, in effect, externally mandated (e.g., Research Councils UK proposal).

I think that at least a significant subset of universities will want some type of basic vetting of the copyright compliance status of submitted e-prints, and, given the current wide range of variations in publisher copyright agreements and a relatively low level of faculty awareness and interest in copyright matters, that this will be a thorny issue (and one that directly relates to my standard copyright agreement idea).

This is why Johanneke Sytsema of Oxford University said in her comment about "How Green Is My Publisher"
(http://www.escholarlypub.com/digitalkoans/2005/04/26/how-green-is-my-publisher/#comments):

"I do agree with Charles Bailey that ‘green’ doesn’t automatically mean ‘go’. Being a repository manager myself, I never just ‘go’ when I encounter ‘green’ on the (invaluable) SHERPA Romeo list. First, I need to check whether the publisher allows archiving into an institutional repository, rather than just on a personal or departmental website. Secondly, I need to check the permitted format: some publisher[s] object to using the publisher PDF, other publishers require the use of the publisher PDF. Thirdly, I need to check on publisher policies every time I deposit, since publishers may change their policy from day to day. So, could the light get greener than it is now? I believe, it should."

Given my view of IRs, I agree with University of Rochester IR manager Susan Gibbons when she says that "the costs and efforts involved in maintaining an IR are substantial."

Which of these two views of institutional repositories will prevail? Time will tell.

If my view prevails, IRs will take longer than if Stevan’s view prevails. Academic authors who have papers accepted by publishers with restrictive author copyright agreements (i.e., those that bar deposit in disciplinary archives or in the universal repository) will have to wait to deposit papers in an OAI-PMH compliant archive. Lacking a way to self-archive with relative ease, they may simply choose not to do so. Non-academic authors may never be able to deposit their papers in an OAI-PMH compliant archive.

If Stevan’s view prevails, IRs will pop up like mushrooms and the above won’t matter, as long as authors enthusiastically deposit their old papers once their IRs are in place.

If the only barrier is a small investment of time and money (as Stevan describes below), it’s unclear to me why we don’t have universal IRs today:

"The 94% of authors at archiveless universities are one $2000 linux server plus a few days’ one-time sysad set-up time and a few annual sysaddays’ maintenance time away from having an institutional repository."

But, I say, Godspeed, Stevan. Prove me wrong, for that will mean that OA happens sooner, and scholars without access to IRs will be deprived of the benefits of depositing in an OAI-compliant repository (or depositing at all) for a shorter period of time.

And, I cheerfully give Stevan the last word on the matter (for now anyway).

1. Clifford A. Lynch, "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age," ARL: A Bimonthly Report on Research Library Issues and Actions from ARL, CNI, and SPARC, no. 226 (2003),
http://www.arl.org/newsltr/226/ir.html

2. Charles W. Bailey, Jr., Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (Washington, DC: Association of Research Libraries, 2005), xviii,
http://info.lib.uh.edu/cwb/oab.pdf

One thought on “Two Views of IRs”

Stevan Harnad says:

May 2, 2005 at 11:07 am

Charles Bailey wrote, in “Two Views of IRs”:

> CB: “The crux of the matter is two very different views of institutional repositories (IRs), and, therefore, different perceptions about how quickly IRs will solve the self-archiving problem.”

I think Charles has it exactly right here. There are (at least) 5 distinct aims for “IRs”:

“The 5 distinct aims for institutional repositories”

I. (RES) self-archiving institutional research output (preprints, postprints and theses)

II. (MAN) digital collection management (all kinds of digital content)

III. (PRES) digital preservation (all kinds of digital content)

IV. (TEACH) online teaching materials

V. (EPUB) electronic publication (journals and books)

There are hence also, I suppose, two different views of “self-archiving” — though I have to point out that the term “self-archiving” and even “open archives” preceded the “institutional repository” (IR) movement, was specific to (what has since come to be called) “open access” (OA) and its specific target content was the target content of (what has since come to be called) the “open access movement,” namely, refereed journal articles (both their preprints and their postprints, plus theses, i.e., RES, above). None of my disagreement with Charles is about semiotics, but words do get in our way.

The disagreement is about the agenda for II-V (MAN, PRES, TEACH and EPUB) holding back progress on the agenda for I (RES), which is the specific, focussed agenda of the OA movement:

“The literature that should be freely accessible online is that which scholars give to the world without expectation of payment. Primarily, this category encompasses their peer-reviewed journal articles, but it also includes any unreviewed preprints that they might wish to put online for comment or to alert colleagues to important research findings.”

RES (i.e., OA) is quite urgent and quite overdue. It (i.e., c. 100% OA) is also already easily within reach and now only requires a mandate from researchers’ institutions and funders. Researchers themselves have made it clear that this, and this only, is what it will take to get them to (OA) self-archive, and do so willingly. http://www.eprints.org/berlin3/ppts/02-AlmaSwan.ppt

The IR agenda (I-V, above, which does not even seem to assign any particular priority or urgency to I, which is RES, i.e., the OA subagenda of the IR agenda) has hence become (unwittingly and unintentionally, no doubt, but alas good intentions do not make up for untoward effects!) yet another “brake” on progress in OA self-archiving, adding to the long and long-standing list of groundless worries that have been holding back OA self-archiving for years now (“Zeno’s Paralysis”) further worries that don’t even have anything to do with OA self-archiving, but have to do with the IR agenda. (Indeed, although the replies to the 32 Zeno worries predated the IR terminology and the IR agenda, many of them could now be expressed as “This is a red herring for OA self-archiving and is based on conflating the OA self-archiving agenda with the IR agenda. Self-archive now, and worry about II-V later, or in parallel.”)

> CB: “In Stevan’s view, the sole purpose of an IR is to provide free global access to e-prints.”

No, RES is the sole purpose of the OA movement (as distinct from the IR movement), and it is the urgent and 100% feasible solution to the research access/usage/impact/progress problem. Librarians — bless their hearts, which are all in the right places, but saints preserve us (sic!) from some of their bibliocentric and sometimes fatally anachronistic instincts! — seem to understand (a portion of) the access component of this problem (the “serials crisis”), but not the usage, impact and progress components, nor their solutions, in the online age.

MAN, PRES and TEACH have nothing whatsoever to do with the OA problem or its solution. And EPUB (which does include “golden” or OA Journal publication) is the far slower and more uncertain road to 100% OA, hence the wrong horse to back (particularly at the individual institutional EPUB level). Yet, wouldn’t you know it, a goodly portion of the library community had a hand in the futile, one-sided “gold rush” of the past 3 years, which only served to keep us still longer from the greener pastures of OA self-archiving!

Let it be admitted, though, that librarians are far from being alone in their mismanagement of gold and green:

The Case Against Mixing Up Green and Gold

And now it is SPARC, which it took ever so long to wean from its one-sided, short-sighted preoccupation with just driving down journal prices

“A Role for SPARC in Freeing the Refereed Literature” (2000)

that first went overboard on gold and then, partially recovered, went on to submerge the focussed, urgent OA (green) agenda in the much more diffuse IR agenda, effectively piling further needless weight onto OA’s burdens instead of lightening them.

> CB: “I’m unclear about Stevan’s position about independent scholars who will never be able to self-archive in an IR because they are not affiliated with any institution or about researchers who are affiliated with non-academic institutions that will never have IRs. Perhaps, in the last case, he believes that IRs will be universal for every non-academic institution.”

I certainly hope that (OA) IRs will be universal to all research institutions, whether academic or not. There is certainly no reason they should not be, and every reason they should be, since doing it (n.b., *OA* IRs: RES) is cheap and simple, with the access/usage/impact/progress benefits amply rewarding the cost.

But what about unaffiliated researchers? The only way I can reply is in percentage terms. If those do not speak for themselves, I give up and rest my case!

Today we have about 15% OA. Another way to put this is that 85% of our research article output is needlessly losing potential impact daily, weekly, monthly, with the size of that lost impact being estimated at between 50% and over 300% across all fields. Let’s conservatively peg the OA impact advantage at 50%: That means 85% of the 2.5 million articles published yearly are losing 50% of their potential research impact today.
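The arithmetic in the paragraph above can be checked with a minimal sketch. It uses only the figures quoted there (15% OA, 2.5 million articles a year, a 50% impact advantage); the function and constant names are illustrative assumptions, and the result is only as reliable as those estimates.

```python
def non_oa_articles(total_articles: int, oa_share_percent: int) -> int:
    """Return the number of articles per year that are not yet open access."""
    return total_articles * (100 - oa_share_percent) // 100


# Figures quoted in the text; they are estimates, not measurements made here.
ARTICLES_PER_YEAR = 2_500_000
OA_SHARE_PERCENT = 15
IMPACT_ADVANTAGE_PERCENT = 50  # the "conservative" peg from the text

affected = non_oa_articles(ARTICLES_PER_YEAR, OA_SHARE_PERCENT)
print(
    f"{affected:,} articles a year forgo the estimated "
    f"{IMPACT_ADVANTAGE_PERCENT}% OA impact advantage"
)
```

On these figures, the 85% that is not yet OA corresponds to 2,125,000 articles a year.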

Now some more percentages. It is also the case that 92% of journals are green (i.e., have given their official green light to their authors to self-archive, immediately; note that I am not saying that the green light was needed, but for those who feel it is needed, it is already there, in 92% of cases). Yet only 15% of articles are self-archived.

More percentages: Charles himself has estimated that less than 6% of universities worldwide as yet have (OA) IRs. This, despite the fact that the software is free, and the set-up and maintenance costs are risibly small, especially in relation to the benefits (for OA IRs! not necessarily for the whole II-V IR kit-and-kaboodle [MAN, PRES, TEACH, EPUB], but, to repeat, we are not talking about that; that’s not what’s urgent, that’s not what’s needlessly bleeding daily research impact).

More percentages: the 2 JISC international, interdisciplinary author surveys have reported that 79% of authors say they will self-archive — and self-archive willingly — but only if their employers and/or funders require it. If we add those who say they will comply grumblingly, we reach 96%, meaning only 4% of authors who say they would not comply with a self-archiving mandate.

Now, in the face of all these percentages, Charles asks me to say what will become of the articles of unaffiliated scholars! (Can we please see to the welfare of the dog, before worrying about the welfare of the flea on its tail? I am a vegetarian, and I care about the welfare of all living organisms, but there are numbers and priorities to reckon too. There are obvious solutions for unaffiliated scholars; but can we please keep them in proportion, rather than amplifying them into yet another antigen in the pandemic of Zeno’s Paralysis?)

> CB: “(To clarify one point of confusion, libraries are not generally expecting IRs to solve the e-journal preservation problem. They are turning to solutions such as LOCKSS to do that.)”

Well thank goodness that canard, at least, has been removed from Zeno’s list of 32, but alas only by Charles! That’s one down and how many more librarians to convince that journal article preservation has absolutely nothing to do with OA self-archiving?

And does this mean that IR PRES will no longer be cited as a retardant on OA IR RES?

> CB: “I do not believe that getting faculty to voluntarily deposit e-prints will be easy. I’m not convinced that most university administrators are going to be quickly and effortlessly persuaded to endorse Berlin 3 unless it is, in effect, externally mandated (e.g., Research Councils UK proposal).”

There are two speculations here: (1) that it will be hard to persuade faculty to self-archive and (2) that it will be hard to persuade administrators to require faculty to self-archive. Charles’s speculation may or may not be right. (Faculty themselves seem to be saying (1) will be easy if only (2) happens.) I prefer to work to make (2) happen, rather than speculating about whether it will happen, and how hard it will be. (I already know, after over a decade of archivangelizing, how hard it is to persuade people to see and do the optimal and inevitable!)

By the way, we need both institutional and funder mandates (and the funder mandates should be for *institutional* self-archiving as the preferred mode, for many, many reasons).

A Simple Way to Optimize the NIH Public Access Policy

And, as a proof of the fact that it can be done, there are already some universities and research institutions that have adopted (and registered) self-archiving policies (may their tribes increase!):

But they are far too few; and meanwhile 50% of potential research impact continues to be lost — needlessly and cumulatively — for 85% of research output worldwide.

> CB: “I think that at least a significant subset of universities will want some type of basic vetting of the copyright compliance status of submitted e-prints, and, given the current wide range of variations in publisher copyright agreements and a relatively low level of faculty awareness and interest in copyright matters, that this will be a thorny issue (and one that directly relates to my standard copyright agreement idea).”

It will be as thorny an issue as we choose (needlessly and arbitrarily) to make it. Ninety-two percent of journals are green on institutional self-archiving. There is no need for the 8% tail to wag the 92% dog. 92% of research output can be self-archived immediately. For the time being, that portion of the 8% tail that is concerned about publisher policy can self-archive the metadata and full-text, and set the access for the full-text as institutional-internal for the time being, instead of OA, and the authors can email the full-text to all eprint-requesters (who will have seen the metadata for the 8%, along with the full-texts for the 92%). That cobbled solution will suffice to stanch all needless impact loss for the 92%, and most of it — if less conveniently — for the remaining 8% as well.

But not if we don’t do it, and instead keep fussing about permissions and copyright reform!

> CB: “This is why Johanneke Sytsema of Oxford University said [what she said]”

And that is why I replied what I replied:

Librarians must learn to distinguish OA from IRs (i.e., RES, from MAN, PRES, TEACH and EPUB) and to treat the former separately, on its own very special terms; and they must learn to understand the research access/usage/impact/progress problem from the point of view of the needs of research and researchers in the online age, not the habits and expectations of librarians in the on-paper age, reflexively carried over, kit and kaboodle, to the online medium. Some adaptations are in order, and I think it is librarian concepts and practice that need to adapt to research needs and the possibilities of the new medium, not vice versa.

> CB: “Which of these two views of institutional repositories will prevail? Time will tell. If my view prevails, IRs will take longer than if Stevan’s view prevails. Academic authors who have papers accepted by publishers with restrictive author copyright agreements (i.e., those that bar deposit in disciplinary archives or in the universal repository) will have to wait to deposit papers in an OAI-PMH compliant archive. Lacking a way to self-archive with relative ease, they may simply choose not to do so. Non-academic authors may never be able to deposit their papers in an OAI-PMH compliant archive.”

I am working for the prevalence of a practice, and the benefits it brings, not for the prevalence of a view. The practice I advocate has been demonstrated to work, and to deliver the promised benefits, because it is already being practised by 15% of researchers. The bottleneck for the remaining 85% is purely mental, not practical or legal or financial. The library community (whose benign motivations are not for a moment in doubt, and who have already done a great deal to awaken the research community to the budgetary side of the access problem) have a choice now as to whether they want to become a part of the solution or the problem, insofar as OA self-archiving is concerned. If they opt for promoting the view that OA self-archiving needs to be subordinated to wider IR (MAN, PRES, TEACH, EPUB) and publishing/copyright reform agendas, they opt (in my view) to become part of the (OA) problem, rather than the (already tested and proven) solution.

> CB: “If the only barrier is a small investment of time and money (as Stevan describes …), it’s unclear to me why we don’t have universal IRs today”

But it is quite clear to me precisely what is still missing: the universal adoption of institutional (OA) IR-filling policies, i.e., institutional self-archiving policies along the lines recommended by Berlin 3:

http://www.eprints.org/berlin3/outcomes.html
http://www.ecs.soton.ac.uk/~harnad/Temp/berlin3-harnad.ppt
http://www.eprints.org/signup/sign.php

Stevan Harnad
