Not Green Enough

Yesterday, Stevan Harnad took the time to comment extensively on my "How Green Is My Publisher?" posting. Thanks for doing so, Stevan. Here are some further thoughts on the matter.

CB: My publication page, check. We don’t have an institutional repository yet, but I assume that "other external Web site" will cover that when we do, check. Wait a minute, what if I want to deposit the e-print in a disciplinary archive like E-LIS or I want to put it in the Internet Archive’s upcoming "OAI-compliant ‘universal repository’"? Looks to me like I’m out of luck. No way to immediately deposit the paper in an OAI-PMH compliant archive that will have a longer life than my Website and that can be harvested by OAI-PMH search services, such as OAIster.

SH: "The restrictions on 3rd-party archives are perfectly reasonable and no problem whatsoever at this time. The problem today (just so we keep our eyes on the ball!) is the non-archiving of 85% of articles, hence their inaccessibility to all those would-be users whose universities cannot afford access to the journal’s official version! It is cheap and easy for any university to create an OAI-compliant institutional archive, and OAIster can happily harvest the metadata.
http://archives.eprints.org/eprints.php?action=browse"

eprints.org’s Institutional Archives Registry currently shows a total of 424 archives. When we browse by archive type, we discover that there are 192 "Research Institutional or Departmental" registered archives worldwide. Of course, “Departmental” archives are not institutional repositories. They do not have an institutional scope of coverage, nor are they as likely as institutional archives to be permanent. True, departments are relatively stable, but their commitment to maintaining archives may not be (e.g., the archive may be the pet project of one or a few faculty members). By contrast, once an institution commits to having an archive, it’s likely to be a more permanent arrangement, especially if it is run by a library.

But, let’s wave our hands, and say 100% of them are institutional repositories (IRs). Universities Worldwide, which is "based on the ‘World List of Universities 1997’ published by the International Association of Universities (IAU) and links discovered or posted here," currently lists 7,130 universities in 181 countries. Assuming that this is a good rough approximation, that means that about 6% of all universities have IRs. Meaning, of course, that 94% do not.
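The arithmetic behind that estimate can be checked with a short sketch. It assumes, as the hand waving above does, at most one archive per university, and it uses only the figures quoted in this post; the constant and function names are illustrative. The roughly 6% figure matches the registry's total of 424 archives divided by 7,130 universities; counting only the 192 "Research Institutional or Departmental" archives gives a smaller share.

```python
# Figures quoted in the post: registry totals and the Universities Worldwide count.
TOTAL_REGISTERED_ARCHIVES = 424
INSTITUTIONAL_OR_DEPARTMENTAL = 192
UNIVERSITIES_WORLDWIDE = 7130


def share_percent(archives: int, universities: int) -> float:
    """Percentage of universities with an archive, assuming one archive per university."""
    return 100.0 * archives / universities


all_archives_share = share_percent(TOTAL_REGISTERED_ARCHIVES, UNIVERSITIES_WORLDWIDE)
research_share = share_percent(INSTITUTIONAL_OR_DEPARTMENTAL, UNIVERSITIES_WORLDWIDE)

print(f"all registered archives: {all_archives_share:.1f}%")
print(f"institutional or departmental only: {research_share:.1f}%")
```

Either way, the large majority of universities are left without a registered archive.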

And that means that 94% of authors at universities cannot self-archive in an institutional repository (or, given the hand waving, in a departmental archive). True, they can self-archive on personal Web pages. The issues with this strategy are: (a) how many authors have up-to-date publication pages or have publication pages at all?, (b) how long will they last (i.e., authors change jobs, retire, and die)?, and (c) there is no OAI-PMH access to those pages, so they don’t show up in OAIster and similar search engines.
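To make point (c) concrete, here is a minimal sketch of the kind of request an OAI-PMH harvester sends to a repository. The base URL is hypothetical; "ListRecords" and the "oai_dc" metadata prefix come from the OAI-PMH protocol itself. A personal Web page has no endpoint that answers such a request, which is why a harvester cannot pick it up.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used for illustration only.
url = list_records_url("https://repository.example.edu/oai")
print(url)
```

The sketch only builds the request; fetching and parsing the XML response is left out.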

Now, disciplinary archives and the Internet Archive’s universal repository solve these problems. Moreover, they solve another problem: independent scholars, corporate researchers, and other non-academic authors may never have an institutional repository to self-archive in.

I don’t see this as "no problem whatsoever at this time." Quite the contrary. To be "no problem," we would have to believe that it doesn’t matter if articles are archived in OAI-PMH compliant repositories or archives. To be "no problem," we would have to not care whether scholars who will never have an institutional repository at their disposal can self-archive.

As to the question of it being "cheap and easy for any university to create an OAI-compliant institutional archive," I think there is some difference of opinion on that point. Susan Gibbons says [1] that "the costs and efforts involved in maintaining an IR are substantial," and she provides these annual IR cost estimates:

  1. $285,000, MIT
  2. $100,000 (Canadian), Queen’s University (for staffing only)
  3. $200,000, University of Rochester
  4. between 2,280 and 3,190 staff hours, University of Oregon

But, of course, these differences in perception about costs relate to some degree to Stevan’s next point:

SH: (And worrying about the preservation of non-existent contents is rather putting the cart before the horse. The self-archived OA versions of a goodly portion of the 15% of the articles that have been self-archived in the past 15 years are still online and OA to tell the tale to this day. All their publishers’ official versions are too. So fussing about the permanence of the non-contents of cupboards that are in any case meant to be access-supplements, not the official version of record, is rather misplaced, when what is immediately missing and urgently needed is their presence, not their permanence.)

I think that Stevan will find that few academic libraries are not going to worry about permanence. Not only will they worry about the permanence of digital objects in their repositories, they will also worry about the permanence of publisher’s archives. Librarians know that publishers are corporations, and that corporations change priorities, merge, and fail. As libraries increasingly abandon print subscriptions and go e-only for economic reasons, at some point there will be no permanent distributed print archive of new journal issues in libraries worldwide as there is today, and libraries are going to worry about that a great deal. Moreover, universities are not going to establish institutional repositories just to support OA. That may be one important item on the agenda, but there will be other archiving needs to be met as well, and factors associated with those digital objects will affect the perception of the need for overall IR preservation too.

Libraries are also going to provide new services to provide IR support in addition to technical support, ranging from convincing faculty to self-archive and helping them do so to training users in using IRs (as well as other e-print services worldwide). These services will cost money.

Don’t want libraries to lead the IR effort if this is true?

In the words of Bob Dylan:

I asked the captain what his name was
And how come he didn’t drive a truck
He said his name was Columbus
I just said, "Good luck."

Moving on.

CB: “The agreement also states that the e-print must contain a fair amount of information about the publisher and the paper: the published article’s citation and copyright date, the publisher’s address, information about the publisher’s document delivery service, and a link to the publisher’s home page.”

SH: That’s just fine too. It is only good scholarly practice to provide the full reference information and to link to the official version of record for the sake of all those potential users who can afford it. What is wrong with that, and why would any author not want to do that?

Sure, an author would want to provide a citation to the published paper and a link to it, but I suspect few will be excited about providing a fair amount of advertising information for the publisher in their e-prints, such as the publisher’s address, home page, and document delivery service. It’s not a deal killer, but it’s more work for authors or IR staff. The more individual publisher variations that there are in copyright transfer agreements, the harder it is for scholars and IR staff to meet these varying requirements.

CB: Second, it would be helpful if such directories could identify whether articles can be deposited in key types of archives. I know that we don’t want the color codes to look like SpeedyGrl.com’s Ultimate Color Table, but I think that this is an important factor in addition to the type of e-print permitted.

SH: They already do. The main distinction is the author’s own institutional archive versus central (3rd-party) archives. It is the former that are the critical ones. The rest can be done by metadata harvesting.

The SHERPA colors do not make this distinction. Neither do the otherwise helpful notes. You must look at each specific agreement (if there is a link to it).

CB: Fourth, although copyright transfer agreements have always been a confusing mess, now we want authors to actually read and evaluate them, not just mindlessly sign them like they did when digital archiving wasn’t an issue. And institutional repository managers (or archive managers) need to make sense of them post facto to determine if articles can be legally deposited and what terms apply to those deposits. So, maybe it’s time to tilt at a new windmill: a set of standardized copyright transfer agreements. I know, it’s like trying to herd several thousand hyperactive cats. But, a few years ago, getting standardized use statistics for electronic resources from publishers seemed hopeless, and some progress has been made on that score.

SH: No, it’s not more windmills or red herrings that researchers, their institutions, their funders, and research itself need: What they need is to go ahead and self-archive.

Developing clear, understandable standard copyright transfer agreements is a red herring? Let’s look at just one aspect of the problem: IR managers’ copyright concerns. I offer some quotes:

"One aspect of the survey [baseline survey of research material already held on departmental and personal Web pages in the ed.ac.uk domain] that is not shown in the results is the lack of consistency in dealing with copyright and IPR issues that scholars face when placing material online. Some academic units have responded by not self-archiving any material at all. A rather worrying example of this is the School of Law (—do they know something that we don’t?) A small percentage of individual scholars have responded by using general disclaimers that may or may not be effective. Others, generally well-established professors, have posted material online that is arguably in breach of copyright agreements, e.g. whole book chapters. Most, however, take a middle line of only posting papers from sympathetic publishers who allow some form of self-archiving. It is apparent that if institutional repositories are going to work, then this general confusion over copyright and IPR issues needs to be addressed right at the source." [2]

"Filling a repository for published and peer-reviewed papers is a slow process, and it is clear that it is a task that requires a significant amount of staff input from those charged with developing the repository. Although we have succeeded in adding a reasonable amount of content to the repository we have also been offered significant amounts of content that cannot be added because of restrictive publisher copyright agreements. In some cases academics have offered between ten and twenty articles and we have not been able to add any of them to the repository. This is a clear demonstration that major changes need to take place at a high level in order for repositories to be successful." [3]

Certainly, all OA advocates are eager to get on with the business of doing OA vs. simply reflecting on it, and few have done as much as Stevan to advance the cause, but, in my view, the issues I’ve raised warrant further consideration and action.

Notes

1. Susan Gibbons, "Establishing an Institutional Repository," Library Technology Reports 40, no. 4 (2004): 54, 56.

2. Theo Andrew, "Trends in Self-Posting of Research Material Online by Academic Staff," Ariadne, no. 37 (2003),
http://www.ariadne.ac.uk/issue37/andrew/intro.html.

3. Morag Mackie, "Filling Institutional Repositories: Practical Strategies from the DAEDALUS Project," Ariadne, no. 39 (2004),
http://www.ariadne.ac.uk/issue39/mackie/intro.html.

One thought on “Not Green Enough”

  1. On Thu, 28 Apr 2005, Charles W. Bailey, Jr. wrote:


    “Not Green Enough”

    CB: “‘Departmental’ archives are not institutional
    repositories. They do not have an institutional scope of coverage,
    nor are they as likely as institutional archives to be permanent.”

    But that is all irrelevant. The immediate and urgent purpose of
    self-archiving is access-provision (to maximize research usage, impact and
    progress), by supplementing the limited access to the publisher’s official
    toll-access version for all would-be users who cannot afford the tolls. It
    is the publisher’s official version that has the preservation problem,
    not the author’s supplementary access version. (Having said that, the
    departmental archives have been doing a lot better job of providing
    immediate *and continuing* access to their research output than those who
    just keep fussing over preservation and permissions…)

    CB: “Institutional Archives Registry currently shows a total of 424
    archives… 192 “Research Institutional or Departmental” registered
    archives worldwide… let’s… say 100% of them are institutional
    repositories (IRs). Universities Worldwide (IAU, 1997)… lists 7,130
    universities in 181 countries… that means that about 6% of all
    universities have IRs. Meaning, of course, that 94% do not.”

    Good estimate. Now you are facing the problem: Far too few archives,
    and most of those that already exist, still not being filled. Total
    percentage of the planet’s annual 2.5 million peer-reviewed journal
    output being self-archived annually to date: about 15%:

    http://www.crsc.uqam.ca/lab/chawki/ch.htm

    http://citebase.eprints.org/isi_study/

    Now, having described the size of the problem accurately, this is
    where Charles (having already worried about permissions and
    preservation) takes us:

    CB: “And that means that 94% of authors at universities cannot
    self-archive in an institutional repository”

    The 94% of authors at archiveless universities are one $2000 linux server
    — plus a few days’ one-time sysad set-up time and a few annual sysad
    days’ maintenance time — away from having an institutional repository:

    http://www.arl.org/sparc/pubs/enews/aug01.html#6

    Meanwhile the 6% of authors at universities with archives are just waiting
    for their university administration to stop fussing about permissions and
    preservation and get around to policy-making:

    http://www.eprints.org/berlin3/ppts/02-AlmaSwan.ppt

    CB: “disciplinary archives and the Internet Archive’s universal
    repository solve these problems.”

    By all means, all authors impatient to self-archive now can deposit
    their papers in a central archive, if a suitable one exists. And those
    who are diffident about 3rd-party archiving can just download the free
    eprints software and create their own OAI-compliant archive. But only
    institutions can adopt a policy mandating the self-archiving of their
    own research output (though funders can and should help too: by
    mandating that their fundees self-archive their funded research output
    — in their own institutional archives; that will help encourage
    universities to set them up, and to adopt a policy for filling them).

    http://cogprints.org/4122/

    CB: “To be “no problem,” we would have to believe that it doesn’t
    matter if articles are archived in OAI-PMH compliant repositories or
    archives.”

    OAI-compliance is better than vanilla self-archiving, but any
    self-archiving is better than no self-archiving. And 85% of articles
    are not being self-archived any which way today, so why are we fussing
    about preservation, permissions and OAI-compliance, when the cupboard
    is bare?

    CB: “To be “no problem,” we would have to not care whether scholars
    who will never have an institutional repository at their disposal
    can self-archive.”

    The absence of archives is not the problem! Even existing archives
    are near-empty. The absence of institutional (and funder) self-archiving
    policy is the problem.

    CB: “As to the question of it being “cheap and easy for any university
    to create an OAI-compliant institutional archive,” I think there is
    some difference of opinion on that point.
    1. $285,000, MIT
    2. $100,000 (Canadian), Queens University (for staffing only)
    3. $200,000, University of Rochester
    4. between 2,280 and 3,190 staff hours, University of Oregon”

    Without putting too fine a point on it: There are differences of opinion
    about what institutional repositories are for. Those with expensive
    opinions have expensive (though not necessarily filled!) archives. The
    target content for the OA movement is the annual 2.5 million articles
    published in the planet’s 24,000 peer-reviewed journals. Any given
    research-active university might publish from 1000 – 10,000 of those
    annual articles. The server, set-up and maintenance costs are as I
    described them. One can do more, of course, but the only further
    thing that is *necessary* is a specific, targeted institutional
    self-archiving *policy* along the lines of:

    http://www.eprints.org/signup/sign.php

    http://software.eprints.org/handbook/departments.php

    CB: “I think that Stevan will find that few academic libraries
    are not going to worry about permanence. Not only will they
    worry about the permanence of digital objects in their
    repositories, they will also worry about the permanence of
    publisher’s archives.”

    Everyone should do what they are best at doing. Some are better at
    worrying about permanence, others are better at creating and filling
    archives with their target contents. There are librarians of both
    kinds. But it is a fact that the only ones in a position to provide
    the target content are authors, not librarians; and the only ones in
    the position to mandate that authors do so are their employers and
    funders, not their librarians.

    But I must point out again (and again) that the problem of the
    permanence of the publishers’ archives, containing the official version
    of the 2.5 million annual journal articles *has nothing whatsoever to
    do with OA or self-archiving*. Nor does the permanence of digital
    objects other than journal articles. So there’s a few things to get off
    worriers’ minds so they can devote their efforts and ingenuity to the
    real task, which is getting authors self-archiving, and (institutional)
    archives filled.

    CB: “universities are not going to establish institutional
    repositories just to support OA.”

    If they establish them for other reasons, with other expenses,
    that’s fine. But don’t blame those further expenses on OA; and
    don’t let them further retard self-archiving, which has long been
    100% feasible and is well overdue:

    “EPrints, DSpace or ESpace?”

    CB: “Libraries are also going to provide new services to provide
    IR support in addition to technical support, ranging from convincing
    faculty to self-archive and helping them do so to training users
    in using IRs (as well as other e-print services worldwide). These
    services will cost money.”

    If the money’s available, it’s welcome, and well-spent. But if it is
    not, then let that not be cited as a deterrent from creating a vanilla OA
    archive and adopting the all-important component: an institutional
    self-archiving policy.

    > > SH: “The main distinction is the author’s
    > > own institutional archive versus central (3rd-party)
    > > archives. It is the former that are the critical ones. The
    > > rest can be done by metadata harvesting.”

    CB: “The SHERPA colors do not make this distinction. Neither do
    the otherwise helpful notes. You must look at each specific
    agreement (if there is a link to it).”

    Nor should they make that distinction. The only relevant datum is preprint
    green, postprint green, or gray. The default is to self-archive the
    final, accepted draft in the author’s own institutional archive. (The
    website/archive distinction is 100% bogus.) The publisher’s PDF is
    unnecessary, and should always be linked to (if a URL exists). That’s all.

    CB: “Developing clear, understandable standard copyright transfer
    agreements is a red herring? Let’s look at just one aspect
    of the problem: IR managers’ copyright concerns.”

    Let us hope that this brand-new portfolio — “IR managers” — either gets
    up to speed with what OA self-archiving is really about, and for, and how
    to go about it, or it (the portfolio!) rapidly goes extinct. Authors
    need to self-archive; their universities and funders need to mandate
    the deposit of the metadata and full-text; and the “Keystroke Policy”
    takes care of the rest.

    Stevan Harnad
