The Economics of Free, Scholar-Produced E-Journals

While highly visible, large-scale STM open access publishing ventures such as BioMed Central loom large in the free e-journal scene, small-scale scholar-produced e-journals continue to quietly publish new scholarly articles as they have done for at least 18 years now.

I won’t detour into a lengthy history lesson for those readers who weren’t there. The short version of the story is that New Horizons in Adult Education is typically seen as the first scholarly e-journal published on the Internet (it was established in Fall 1987); however, it’s important to recognize that those were primitive times Internet-wise, when distributing ASCII article files via list servers and FTP servers was a cutting-edge venture. So, as you would imagine, finding tools were informal and few and far between. ARL’s publication of the Directory of Electronic Journals, Newsletters, and Academic Discussion Lists in July 1991 was a landmark event that made the invisible visible.

For some reason, there was a mini-surge of activity in the 1989-1991 period, with the emergence of the Bryn Mawr Classical Review, EJournal, Electronic Journal of Communication, Journal of the International Academy of Hospitality Research, Postmodern Culture, Psycoloquy, The Public-Access Computer Systems Review, Surfaces, and other journals. Several editors (myself, Stevan Harnad, and John Unsworth) rocked the house at the Association of Research Libraries’ 1992 Symposium on Scholarly Publishing on the Electronic Networks to the dismay of the assembled conventional publishers, who thought we were mad as hatters because we thought that: (a) e-journals were viable, (b) we could anoint ourselves as publishers, and (c) we were giving it away for free. My recollection is that, after the last speech, there was a stunned silence followed by a spattering of applause and a frenzy of generally hostile, astonished questions.

And, as they say, the rest is history. Peter Suber’s Timeline of the Open Access Movement is a good way to get a handle on subsequent events. Someday, I’ll write more about the early e-volution of e-journals.

So, on to the topic at hand. What are the economics of free, scholar-produced e-journals?

Let’s delimit the field a bit. We are not talking about journals produced by university presses or professional associations. Scholar-produced e-journals are generally labors of love, supported by a small group of scholars who serve without pay as editors, editorial board members, and journal production staff.

They often leverage existing technical infrastructure (e.g., Web servers) at the editors’ institutions. The volume of published papers is typically fairly modest, and the papers themselves are frequently not graphically complex. Editors or other volunteers manage the peer review process (usually via electronic means) as well as copy edit and format articles. HTML and PDF are the usual distribution formats, requiring HTML editors, Word, Acrobat, or similar low-cost or free programs. Increasingly, electronic journal management systems are used to automate editorial functions and simplify journal site creation and maintenance (a prime example is the free Open Journal Systems software). "Marketing" is often done by free electronic means: journal mailing lists, table of contents messages sent to targeted subject-related mailing lists, RSS alerts, etc. Since the content is free and electronic, there is no overhead for subscription/licensing management. Since no one gets paid, human resources functions are not needed. If authors retain copyright or content is under a Creative Commons or similar license, no permissions support is needed. Since existing facilities are used (at work or at home), there is no need to rent or purchase office space. Since no money is changing hands in any form, accounting support is unnecessary.

So, what are the economics of free, scholar-produced journals? The glib answer is that there are none. But the real answer is that the costs are so low and the functions so integral to scholarship that they are easily absorbed into ongoing operational costs of universities. Even if they weren’t and scholars had to do it all on their own, server hosting solutions are so ubiquitous and cheap, free open source software is so functional and pervasive, and commercial PC software is so powerful and cheap (especially at academic discounts) that these minor costs would act as no real barrier to the production of scholar-produced e-journals.

Of course, this is not to say that there are not issues associated with the viability and sustainability of these journals, the perpetual preservation of their contents, and other difficulties, but these are topics for another day.

One-Page Open Access Resources Handout

Need a very short (one-page) handout that identifies a few key open access resources? My OA co-presenter (Sara Ranger) and I did, so we created one. It’s at:

http://www.escholarlypub.com/cwb/OAHandout.pdf

It’s available under a Creative Commons Attribution-NonCommercial License.

Obviously, a number of very valuable resources had to be omitted, but, hopefully, users can employ these core resources to discover them.

BMC’s Impact Factors: Elsevier’s Take and Reactions to It

A growing body of research suggests that open access may increase the impact of scholarly literature (see Steve Hitchcock’s "Effect of Open Access and Downloads ("Hits") on Citation Impact: A Bibliography of Studies"). Consequently, "impact factors" play an important part in the ongoing dialog about the desirability of the open access model.

On June 23, 2005, BioMed Central issued a press release entitled "Open Access Journals Get Impressive Impact Factors" that discussed the impact factors for their journals. You can consult the press release for the details, but the essence of it was expressed in this quote from Matthew Cockerill, Director of Operations at BioMed Central:

These latest impact factors show that BioMed Central’s Open Access journals have joined the mainstream of science publishing, and can compete with traditional journals on their own terms. The impact factors also demonstrate one of the key benefits that Open Access offers authors: high visibility and, as a result, a high rate of citation.

On July 8, 2005, Tony McSean, Director of Library Relations for Elsevier, sent an e-mail message to SPARC-OAForum@arl.org ("OA and Impressive Impact Factors—Non Propter Hoc") that presented Elsevier’s analysis of the BMC data, putting it "into context with those of the major subscription-based publishers." Again, I would encourage you to read this analysis. The gist of the argument is as follows:

This comparison with four major STM publishers demonstrates that BMC’s overall IF results are unremarkable, and that they certainly do not provide evidence to support the common assertion that the open access publishing model increases impact factor scores.

My reaction was as follows.

These interesting observations do not appear to account for one difference between BMC journals and the journals of other publishers: their age. Well-established, older journals are more likely to have attained the credibility required for high IFs than newer ones (if they ever will attain such credibility).

Moreover, there is another difference: BMC journals are primarily e-journals, not print journals with derivative electronic counterparts. Although true e-journals have gained significant ground, I suspect that they still start out with a steeper hill to climb credibility-wise than traditional print journals.

Third, since it involves paying a fee, the author-pays model requires a higher motivation on the part of the author to publish in such journals, likely leading to a smaller pool of potential authors. To obtain high journal IFs, these had better be good authors. And, for good authors to publish in such journals, they must hold them in high regard because they have other alternatives.

So, if this analysis is correct, for BMC journals to have attained "unremarkable" IFs is a notable accomplishment because they have attained parity with conventional journals that have some significant advantages.

Earlier in the day, Dr. David Goodman, Associate Professor of the Palmer School of Library and Information Science, commented (unbeknownst to me since I read the list in digest form):

1/ I doubt anyone is contending that at this point any of the BMC titles are better than the best titles from other publishers. The point is that they are at least as good as the average, and the best of them well above average. For a new publisher, that is a major accomplishment—and one that initially seemed rather doubtful. . . .

2/ Normally, publishing in a relative obscure and newly founded journal would come at some disadvantage to the author, regardless of how the journal was financed. . . .

3/ You can’t judge OA advantage from IF alone. IF refers to journals, OA advantage refers to individual articles. The most convincing studies on OA advantage are those with paired comparisons of articles, as Stevan Harnad has explained in detail.

4/ Most of the BMC titles, the ones beginning with the BMC journal of…, are OA completely. For the ones with Toll Access reviews etc., there is obviously much less availability of those portions than the OA primary research, so I doubt the usual review journal effect applies to the same extent as usual.

On July 9, 2005, Matt Cockerill sent a rebuttal to the SPARC-OAForum that said in part:

Firstly, the statistics you give are based on the set of journals that have ISI impact factors (in fact, they cover only journals which had 2003 Impact Factors). . . . Many of BioMed Central’s best journals are not yet tracked by ISI.

Secondly, comparing the percentage of Impact Factors going up or down does not seem a particularly meaningful metric. What is important, surely, is the actual value of the Impact Factor (relative to others in the field). In that regard, BioMed Central titles have done extremely well, and several are close to the top of their disciplines. . . .

Thirdly, you raise the point that review articles can boost a journal’s Impact Factor, and that many journals publish review articles specifically with the intention of improving their Impact Factor. This is certainly true, but of BioMed Central’s 130+ journals, all but six are online research journals, and publish virtually no review articles whatsoever. . . .

No reply yet from Elsevier, but, whether there is or not, I’m sure that we have not heard the last of the "impact factor" argument.

Stevan Harnad has made it clear that what he calls the "journal-affordability problem" is not the focus of open access (this is perhaps best expressed in Harnad et al.’s "The Access/Impact Problem and the Green and Gold Roads to Open Access"). The real issue is the "research article access/impact problem":

Merely to do the research and then put your findings in a desk drawer is no better than not doing the research at all. Researchers must submit their research to peer review and then "publish or perish," so others can use and apply their findings. But getting findings peer-reviewed and published is not enough either. Other researchers must find the findings useful, as proved by their actually using and citing them. And to be able to use and cite them, they must first be able to access them. That is the research article access/impact problem.

To see that the journal-affordability problem and the article access/impact problem are not the same one need only note that even if all 24,000 peer-reviewed research journals were sold to universities at cost (i.e., with not a penny of profit) it would still be true that almost no university has anywhere near enough money to afford all or even most of the 24,000 journals, even at minimal access-tolls (http://fisher.lib.virginia.edu/cgi-local/arlbin/arl.cgi?task=setuprank). Hence, it would remain true even then that not all would-be users could access all of the yearly 2.5 million articles, and hence that that potential research impact would continue to be lost.

So although the two problems are connected (lower journal prices would indeed generate somewhat more access), solving the journal-affordability problem does not solve the research access/impact problem.

Of course, there are different views of open access, but, for the moment, let’s say that this view is the prevailing one and that this is the most compelling argument to win the hearts and minds of scholars for open access. Open access will rise or fall based on its demonstrated ability to significantly boost impact factors, and the battle to prove or disprove this effect will be fierce indeed.

Open Access News Update

From June 24, 2005 to June 30, 2005, Open Access News was down, and I posted Peter Suber’s e-mail updates here. OAN is now up, and Peter has updated it with the missing postings. My updates have been deleted from this posting.

Links to the OAN messages in question are below.

June 30 posting (2 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2063.html

June 30 posting (7 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2062.html

June 29 posting (1 item)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2061.html

June 29 posting (5 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2060.html

June 28 posting (4 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2059.html

June 28 posting (2 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2056.html

June 27 posting (2 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2055.html

June 27 posting (6 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2054.html

June 26 posting (5 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2053.html

June 25 posting (11 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2051.html

June 24 posting (2 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2048.html

June 24 posting (7 items)
https://mx2.arl.org/Lists/SPARC-OAForum/Message/2043.html

Key Open Access Concepts

An excerpt from the Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (OAB) that provides a brief overview of OA concepts is now available in HTML-tagged format. Additional links have been added, and old links checked and updated. As part of the OAB, it is under a Creative Commons Attribution-NonCommercial License.

http://www.escholarlypub.com/oab/keyoaconcepts.htm

Will You Only Harvest Some?

The Digital Library for Information Science and Technology has announced DL-Harvest, an OAI-PMH service provider that harvests and makes searchable metadata about information science materials from the following archives and repositories:

  • ALIA e-prints
  • arXiv
  • Caltech Library System Papers and Publications
  • DLIST
  • Documentation Research and Training Centre
  • DSpace at UNC SILS
  • E-LIS
  • Metadata of LIS Journals
  • OCLC Research Publications
  • OpenMED@NIC
  • WWW Conferences Archive

DL-Harvest is a much needed, innovative discipline-based search service. Big kudos to all involved.

DLIST also just announced the formation of an advisory board.

The following musings, inspired by the DL-Harvest announcement, are not intended to detract from the fine work that DLIST is doing or from the very welcome addition of DL-Harvest to their service offerings.

Discipline-focused metadata can be relatively easily harvested from OAI-PMH-compliant systems that are organized along disciplinary lines (e.g., the entire archive/repository is discipline-based or an organized subset is discipline-based). No doubt these are very rich, primary veins of discipline-specific information, but how about the smaller veins and nuggets that are hard to identify and harvest because they are in systems or subsets that focus on another discipline?

Here’s an example. An economist, who is not part of a research center or other group that might have its own archive, writes extensively about the economics of the scholarly publishing business. This individual’s papers end up in the economics department section of his or her institutional repository and in EconWPA. They are highly relevant to librarians and information scientists, but will their metadata records be harvested for use in services like DL-Harvest using OAI-PMH since they are in the wrong conceptual bins (e.g., set in the case of the IR)?
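
To make the "wrong conceptual bin" problem concrete, here is a minimal Python sketch. It assumes a hypothetical repository base URL, hypothetical set names (dept_lis, dept_economics), and made-up sample records; only the ListRecords verb and the metadataPrefix and set request arguments come from OAI-PMH itself. Nothing is fetched over the network, and this is an illustration of the idea, not a description of how DL-Harvest works.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL; nothing is fetched."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # The optional "set" argument limits the harvest to one set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical sample records standing in for harvested metadata.
SAMPLE_RECORDS = [
    {"title": "Cataloging workflows", "set": "dept_lis",
     "subjects": ["cataloging"]},
    {"title": "Journal pricing and library budgets", "set": "dept_economics",
     "subjects": ["scholarly publishing", "economics"]},
    {"title": "Labor market trends", "set": "dept_economics",
     "subjects": ["labor economics"]},
]


def select_by_set(records, wanted_set):
    """Keep only records filed under the given set (selective harvesting)."""
    return [r for r in records if r["set"] == wanted_set]


def select_by_subject(records, keyword):
    """Keep records whose subject terms mention the keyword, whatever the set."""
    return [r for r in records if any(keyword in s for s in r["subjects"])]


if __name__ == "__main__":
    # Hypothetical base URL and set name.
    print(list_records_url("https://repository.example.edu/oai", set_spec="dept_lis"))
    print([r["title"] for r in select_by_set(SAMPLE_RECORDS, "dept_lis")])
    print([r["title"] for r in select_by_subject(SAMPLE_RECORDS, "scholarly publishing")])
```

In this toy data, harvesting only the library and information science set never sees the economist’s paper on journal pricing, while a subject-based pass over every set finds it. The catch is that the second approach requires harvesting everything and trusting the subject metadata.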

Coleman et al. point to one solution in their intriguing "Integration of Non-OAI Resources for Federated Searching in DLIST, an Eprints Repository" paper. But (lots of hand waving here), if using automatic metadata extraction was an easy and simple way to supplement conventional OAI-PMH harvesting, the bottom line question is: how good is good enough? In other words, what’s an acceptable level of accuracy for the automatic metadata extraction? (I won’t even bring up the dreaded "controlled vocabulary" notion.)

No doubt this problem falls under the 80/20 Rule, and the 20 is most likely in the low-hanging fruit OAI-PMH-wise, but wouldn’t it be nice to have more fruit?

Joint Institutional Repository Evaluation Project

The Johns Hopkins University Digital Knowledge Center, in conjunction with MIT and the University of Virginia, is working on a Mellon Foundation-funded "A Technology Analysis of Repositories and Services" project to: "conduct an architecture and technology evaluation of repository software and services such as e-learning, e-publishing, and digital preservation. The result will be a set of best practices and recommendations that will inform the development of repositories, services, and appropriate interfaces."

The grant proposal and a presentation given at the CNI Spring 2005 Task Force Meeting provide further details about the project.

Is the Access Spectrum a Red Herring or Are Green and Gold Too Black and White?

Stevan Harnad has commented extensively on my "The Spectrum of E-Journal Access Policies: Open to Restricted Access" DigitalKoans posting. Thanks for doing so, Stevan. Here are my thoughts on your comments.

First, let me concede that if you look at this question from Stevan’s particular open-access-centric point of view, then, of course, the spectrum of publisher access policies is a complete and utter waste of time. I don’t recall suggesting that this was a new open access model per se, even though it includes open access in it as a component and it makes some further distinctions between open access and free access journals. Rather, it is what it says it is: a model that presents a range of publisher access policies from the least restrictive to the most restrictive. The color codes merely enhance the model slightly; they are not central to it (and, of course, as Stevan says, he created this color coding Frankenstein to begin with). The model says nothing about e-prints.

That said, Stevan’s view that open access equals free access (period) is not, as he well knows, universal, and his green and gold models are based on this premise.

Here is how Peter Suber defines OA in "Open Access Overview: Focusing on Open Access to Peer-Reviewed Research Articles and Their Preprints" (boldface is mine):

  • OA should be immediate, rather than delayed, and OA should apply to the full-text, not just to abstracts or summaries.
  • OA removes price barriers (subscriptions, licensing fees, pay-per-view fees) and permission barriers (most copyright and licensing restrictions).
  • There is some flexibility about which permission barriers to remove. For example, some OA providers permit commercial re-use and some do not. Some permit derivative works and some do not. But all of the major public definitions of OA agree that merely removing price barriers, or limiting permissible uses to "fair use" ("fair dealing" in the UK), is not enough.
  • Here’s how the Budapest Open Access Initiative put it: "There are many degrees and kinds of wider and easier access to this literature. By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited."
  • Here’s how the Bethesda and Berlin statements put it: "For a work to be OA, the copyright holder must consent in advance to let users ‘copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship….’"
  • The Budapest (February 2002), Bethesda (June 2003), and Berlin (October 2003) definitions of "open access" are the most central and influential for the OA movement. Sometimes I refer to them collectively, or to their common ground, as the BBB definition.

So, by most OA definitions, a journal that "makes all of its articles immediately and permanently accessible to all would-be users webwide toll-free" is not OA unless it also uses a Creative Commons or similar license that permits use with minimal restrictions. It is FA (Free Access). As I have said in an earlier dialog, we can count on no journal to be "permanently accessible" unless some trusted archive other than the publisher makes it so, a point on which Stevan apparently disagrees, believing that publishers never go out of business.

I note that Stevan has deviated from his "chrononomic parsimony" principle by having both "Green" and "Pale-Green" in his model and then lumping them both together in his discussions as "GREEN." (In his Summary Statistics So Far site he also introduces the color Grey, for "neither yet.") If preprints and postprints are of equal value, why not just code them Green? If they are not of equal value (i.e., postprints that accurately incorporate the changes that occur during the peer-review process are the only real substitute for the published article), then, in reality, those 15.5% of "Pale-Green" journals are of limited value in terms of self-archiving, and the real GREEN journal number is 76.2%, not 92%.

I must admit to some confusion on his latest stand that all types of self-archiving are equal. In "Ten Years After," he seems to be expressing a different sentiment regarding author home pages:

That said, there was a naive element to the Subversive Proposal, too, since Harnad’s plan would have led to researchers posting their papers on thousands of isolated FTP sites. This would have meant that anyone wanting to access the papers would have needed prior knowledge of the papers’ existence and the whereabouts of every relevant archive. They would then have had to search each archive separately. Today, Harnad concedes that "anonymous FTP sites and arbitrary Web sites are more like common graves, insofar as searching is concerned."

Perhaps I misunderstand what is meant by "arbitrary Web sites."

As the prior DigitalKoans dialog beginning with "How Green Is My Publisher?" shows, we clearly disagree on many points related to the importance of author copyright agreements (e.g., they have to permit deposit in disciplinary archives), the importance of deposit in OAI-PMH-compliant archives, and the mission and scope of institutional repositories.

A series of DigitalKoans postings that start with "The View from the IR Trenches, Part 1" provides numerous quotes from the literature that bolster my case.

Second, while I admire Stevan’s unflagging advocacy of open access (by which he really means free access), open access is not the only issue in the e-journal publishing world that is of concern to librarians, to whom this missive was mainly addressed. This is because librarians, while hopefully working to build a better future, have to deal with the messy existing realities of the e-publishing environment to do their jobs and to make decisions about how to allocate scarce resources. Consequently, librarians have to scan the e-publishing environment, analyze it, categorize it, and make evaluative judgements about it. They have to make models of e-publishing reality to better understand it. They don’t have the luxury of only dreaming about what that reality should be.

Thus, while Stevan is indifferent to many of those 894,302 free full-text articles from 857 HighWire-hosted journals (a number which likely dwarfs all articles available from OA/free journals), librarians are not. Paying attention to them is important. While many are not immediately free, they are free nonetheless after some embargo period. And EA (Embargoed Access) journals are better than RA (Restricted Access) journals in practical terms for users who have no other current access. And even limited access to more restrictive PA (Partial Access) journals is likely to be welcomed by users who today would have no access otherwise. As a user, I know that I welcome both kinds of access.

This is not to say that we shouldn’t strive for journals to move up the spectrum from red to green, but it is to say that: (1) some free access is better than no free access for journals that will never move further up the spectrum, and (2) it may be that some journals have to move step-by-step, not in one leap, for the change to take place, and, if they start higher, it may be easier to encourage them to move further and faster. (But we have to know which ones have this potential based on their current status.)

Stevan’s model has colors, but, in reality, each color is black and white: Gold and nothing, GREEN and grey. All or nothing. And, as long as you accept his premises, it works, and it allows him to focus on his free-access goal with single-minded determination, undistracted by the knotted complexities of the e-scholarly publishing environment. Long may he run.

For those who have a different view of OA or who have broader concerns, it’s too "black and white."

I give him the last word on this matter.

The Spectrum of E-Journal Access Policies: Open to Restricted Access

As journal publishing continues to evolve, the access policies of publishers become more differentiated. The open access movement has been an important catalyst for change in this regard, prodding publishers to reexamine their access policies and, in some cases, to move towards new access models.

To fully understand where things stand with journal access policies, we need to clarify and name the policies in use. While the list below may not be comprehensive, it attempts to provide a first-cut model for key journal access policies, adopting the now-popular use of colors as a second form of shorthand for identifying the policy types.

  1. Open Access journals (OA journals, color code: green): These journals provide free access to all articles and utilize a form of licensing that puts minimal restrictions on the use of articles, such as the Creative Commons Attribution License. Example: Biomedical Digital Libraries.
  2. Free Access journals (FA journals, color code: cyan): These journals provide free access to all articles and utilize a variety of copyright statements (e.g., the journal copyright statement may grant liberal educational copying provisions), but they do not use a Creative Commons Attribution License or similar license. Example: The Public-Access Computer Systems Review.
  3. Embargoed Access journals (EA journals, color code: yellow): These journals provide free access to all articles after a specified embargo period and typically utilize conventional copyright statements. Example: Learned Publishing.
  4. Partial Access journals (PA journals, color code: orange): These journals provide free access to selected articles and typically utilize conventional copyright statements. Example: College & Research Libraries.
  5. Restricted Access journals (RA journals, color code: red): These journals provide no free access to articles and typically utilize conventional copyright statements. Example: Library Administration and Management. (Available in electronic form from Library Literature & Information Science Full Text and other databases.)
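
For readers who think in code, the five policy types above can be sketched as a small Python data structure with a classification rule. The function name, its parameters, and the "all"/"some"/"none" values are my own illustrative assumptions, and the rule deliberately ignores edge cases (for example, a journal that frees only selected articles after an embargo); it is a rough reading of the list, not a definitive scheme.

```python
from enum import Enum


class AccessPolicy(Enum):
    """The five policy types from the list above, keyed to their color codes."""
    OA = "green"    # Open Access
    FA = "cyan"     # Free Access
    EA = "yellow"   # Embargoed Access
    PA = "orange"   # Partial Access
    RA = "red"      # Restricted Access


def classify(free_articles, embargoed=False, open_license=False):
    """Classify a journal (hypothetical helper).

    free_articles: "all", "some", or "none" of the articles are free.
    embargoed: free access begins only after an embargo period.
    open_license: a Creative Commons Attribution or similar license is used.
    """
    if free_articles == "none":
        return AccessPolicy.RA
    if free_articles == "some":
        return AccessPolicy.PA
    if free_articles != "all":
        raise ValueError("free_articles must be 'all', 'some', or 'none'")
    if embargoed:
        return AccessPolicy.EA
    # All articles are free immediately; the license decides OA versus FA.
    return AccessPolicy.OA if open_license else AccessPolicy.FA


if __name__ == "__main__":
    policy = classify("all", open_license=False)
    print(policy.name, policy.value)
```

The last branch is the one that matters for this posting: two journals can both be free to read, and only the license separates green from cyan.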

Using this taxonomy, an examination of the contents of the Directory of Open Access Journals quickly reveals that, in reality, it is the Directory of Open and Free Access Journals, because many listed journals do not use a Creative Commons Attribution License or similar license.

Some may argue that the distinction between OA and FA journals is meaningless; however, to do so suggests that the below sections of the "Budapest Open Access Initiative" in italics are meaningless and, consequently, that the Open Access movement is really just the Free Access movement.

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

Not that there would be anything wrong with the Free Access movement, but some may feel that the broader scope of the Open Access movement is much more desirable.

In any case, the journal universe is not just green or red, and it’s a pity that we don’t know the breakdown of the spectrum (e.g., x number of green journals and y number of cyan journals), for that would give us a better handle on how the world has changed from the days when all journals were red journals.

Institutional Repository Overviews: A Brief Bibliography

You want a good introduction to institutional repositories. What should you read? Try one or more of the works below. For a quick overview, try Drake, Johnson, or Lynch. For more detail, try Crow or Ware. For an in-depth, library-oriented overview, Gibbons can’t be beat.

Crow, Raym. The Case for Institutional Repositories: A SPARC Position Paper. Washington, DC: The Scholarly Publishing and Academic Resources Coalition, 2002.

Drake, Miriam A. "Institutional Repositories: Hidden Treasures." Searcher 12, no. 5 (2004): 41-45.

Gibbons, Susan. "Establishing an Institutional Repository." Library Technology Reports 40, no. 4 (2004). (Available on Academic Search Premier.)

Johnson, Richard K. "Institutional Repositories: Partnering with Faculty to Enhance Scholarly Communication." D-Lib Magazine 8 (November 2002).

Lynch, Clifford A. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age." ARL: A Bimonthly Report on Research Library Issues and Actions from ARL, CNI, and SPARC, no. 226 (2003): 1-7.

Ware, Mark. Pathfinder Research on Web-based Repositories. London: Publisher and Library/Learning Solutions, 2004.

The View from the IR Trenches, Part 4

Today, we’ll look at an article that describes the results of a one-year study at the University of Rochester, River Campus Libraries to "understand the current work practices of faculty in different disciplines in order to see how an IR might naturally support existing ways of work."

Foster, Nancy Fried, and Susan Gibbons. "Understanding Faculty to Improve Content Recruitment for Institutional Repositories." D-Lib Magazine 11, no. 1 (2005).
http://www.dlib.org/dlib/january05/foster/01foster.html

Selected quotes from the article are below; the headings are mine. Caveat emptor: selected quotes are just that. It’s always a good idea to read the full paper. I would hope that these brief quotes entice you to do so.

Faculty Needs

The people we interviewed want most to be able to. . .

  • Work with co-authors
  • Keep track of different versions of the same document
  • Work from different computers and locations, both Mac and PC
  • Make their own work available to others
  • Have easy access to other people’s work
  • Keep up in their fields
  • Organize their materials according to their own scheme
  • Control ownership, security, and access
  • Ensure that documents are persistently viewable or usable
  • Have someone else take responsibility for servers and digital tools
  • Be sure not to violate copyright issues
  • Keep everything related to computers easy and flawless
  • Reduce chaos or at least not add to it
  • Not be any busier

Using Standard IR Terminology Doesn’t Work

Accordingly, when we tried to recruit content using typical IR promotional language, faculty members and researchers did not respond enthusiastically. This is because they did not perceive the relevance of almost any of the IR features as stated in the terms used by librarians, archivists, computer programmers, and others who were setting up and running the IR for the institution. One reason faculty have not rushed to put their work into IRs, therefore, is that they do not recognize its benefits to them in their own terms.

Another reason that faculty have expressed little interest in IRs is related to the way the IR is named and organized. The term ‘institutional repository’ implies that the system is designed to support and achieve the needs and goals of the institution, not necessarily those of the individual. Moreover, it suggests that contributions of materials into the repository serve to highlight the achievements of the institution, rather than those of individual researchers and authors. . . .

Faculty Are Most Interested in Communicating with Colleagues Worldwide

When it comes to research, a faculty member’s strongest ties are usually with a small circle of colleagues from around the world who share an interest in the same field of research, such as plasma astrophysics or contemporary European critical thought. It is with these colleagues, many of them at other institutions, that researchers most want to communicate and share their work. But most organizations have mapped their IR communities to their academic departments rather than to the subtle, shifting communities of scholars engaged in interrelated research projects. . . . In the absence of a strong connection that would naturally bring these documents together into a collection that other scholars would look for, find, and use, there is no compelling reason for the authors to make the submission.

One-on-One Librarian-Faculty Sessions Are Best Way to Interest Faculty

Rather than approach faculty with a set, one-size-fits-all promotional spiel, these library liaisons operate under the guidance that a personalized, tailored approach works best. As we learned from the work-practice study, what faculty members care most about is their research. . . . Throughout the conversation, the library liaison is listening for opportunities to demonstrate how the benefits of the IR respond directly to the faculty member’s web-related research needs. . . .

IR Benefits Must Be Stated in Terms That Faculty Relate To

By contrast to the language previously used to describe the features and benefits of the IR, we are now describing the IR in language drawn from faculty interviews. Thus, we tell faculty that the IR will enable them to. . .

  • Make their own work easily accessible to others on the web through Google searches and searches within the IR itself
  • Preserve digital items far into the future, safe from loss or damage
  • Give out links to their work so that they do not have to spend time finding files and sending them out as email attachments
  • Maintain ownership of their own work and control who sees it
  • Not have to maintain a server
  • Not have to do anything complicated

Scholarly Communication Web Sites at ARL Libraries

The Association of Research Libraries (ARL) currently has 123 member libraries in the US and Canada. Below is a list of scholarly communication web sites at ARL libraries. This list was compiled by a quick examination of ARL libraries’ home pages, supplemented by some Google searching. It’s not comprehensive, and I would welcome additions.

More on OhioLINK’s Digital Resource Commons

David F. Kohl has self-archived a PowerPoint presentation about the DRC at E-LIS. It’s called "Cooperating Beyond the ‘Buying Club’: Digital Resource Commons (DRC): Making the Impossible Possible in Ohio."

To quote from the abstract:

Each institution can ‘brand’ itself in the system and may host a discrete and customized interface to all of its content. To the end user it will appear as an institutional resource as if it were hosted on your own servers. There will also be a collective OhioLINK level branding and ability for searches to retrieve across the institutional collections. . . . You will have complete control of your own content and how it is accessed. Multi-tiered security levels will allow your content to be shared only to the extent desired. . . .

Alternatively content can be restricted to an individual department, to an institution, or to the OhioLINK membership. Each institution can set its own policies governing the content in its repositories. Likewise custom workflows can be established to make the most of the personnel involved in each project and expedite the content creation and capture process. The service will include robust and flexible cataloging tools to aid in the creation of records that can be searched and browsed effectively by all types of users. Catalog records can be exported in international standard XML formats such as the Open Archives Initiative Protocol for Metadata Harvesting. Through OhioLINK’s unique collaboration with the Ohio Supercomputer Center your content is stored on enterprise class servers and storage networks. . . . A huge storage area network allows virtually unlimited storage space on our disks. . . . Programming or system administration skills and experience are not required. The system is flexible and adaptable and provides services superior to ‘DSpace’ and ‘ContentDM’ without the associated costs.

OhioLINK’s Digital Resource Commons

Peter Murray, Assistant Director of Multimedia Systems at OhioLINK, recently posted a job announcement on LITA-L (I’d link, but given the way ALA safeguards access to its lists, it’s simply impossible) that brought to my attention a bold OhioLINK project called the Digital Resource Commons, which is part of an even bolder project called the Ohio Digital Commons for Education. The quote from the job ad below describes the Digital Resource Commons. An earlier part of the ad indicates that Fedora will be used as the DRC’s platform.

OhioLINK’s Digital Resource Commons (DRC) is an Ohio Board of Regents-funded project to create a federated repository service that ingests, preserves, presents, and mediates administration of the educational and research materials of participating institutions. With the capability to store and deliver a virtually unlimited variety of digital file types and formats (including text, data sets, image, audio, video, streaming video, multimedia presentations, animations, etc.) the DRC is positioned to capture digital content from student and faculty researchers as it is produced and return it to users of the DRC upon request. The DRC offers wide and flexible control to member institutions and the communities within institution to define how content is added, preserved, and displayed to repository users. With federated community administration features, lead contacts at member institutions can create communities and delegate up to a complete subset of their privileges within the system to the editors/moderators of those new communities. The ability to scope and brand content to a particular community and institution is offered while retaining the ability to search for content across the entire repository. As both an Open Archives Initiative Data Provider and Service Provider, the DRC is positioned to become the premier point for the discovery of knowledge by and about Ohio’s scholars. In conjunction with the other parts of the Ohio Board of Regents grant funding, the DRC is one piece of a larger effort to build the Ohio Digital Commons for Education—a powerful vision for the future of learning and research in the state of Ohio.

The quote below from the DRC Web site describes the Ohio Digital Commons for Education.

The Digital Resource Commons is one of three projects funded by an Ohio Board of Regents Technology Initiatives grant collectively called the Ohio Digital Commons for Education (ODCE). The three components—this resource repository, the state-wide licensing and development of course management systems (WebCT and Blackboard), and a common access control mechanism (Shibboleth)—combine to offer a powerful vision for learning and research for the state of Ohio.

Impressive. As Daniel Hudson Burnham said: "Make no little plans; they have no magic to stir men’s blood and probably themselves will not be realized."

The View from the IR Trenches, Part 3

Today, we’ll look at an article that provides a UK academic library’s view of its institutional repository responsibilities:

Nixon, William J. "The Evolution of an Institutional E-Prints Archive at the University Of Glasgow." Ariadne, no. 32 (2002).
http://www.ariadne.ac.uk/issue32/eprint-archives/

Selected quotes from the article are below; the headings are mine. Caveat emptor: selected quotes are just that. It’s always a good idea to read the full paper. I would hope that these brief quotes entice you to do so.

Library IR Roles

(The below quotes are from a summary list of library roles in the article.)

IR Advocate

Encouraging members of the University to deposit material into the ePrints archives. At Glasgow we have started an Advocacy campaign to demonstrating that this service has a broader context beyond Glasgow . . . A recent event to raise awareness about the issues of Scholarly Communication provided us with an opportunity to launch our e-prints service and to raise its profile

Copyright Advisory Service

Providing advice to members of the University about copyright and journal embargo policies for material which they would like to deposit in our archive, and as appropriate liaising directly with the Journal in question. This will become a pivotal role in the acceptance of our e-prints service since copyright is the number one question which members of the University ask about

Digitization Service

Converting material to a suitable format such as HTML or PDF for import into the archive. It may also be necessary to ensure that HTML which is submitted is properly formatted and cross-browser compatible

Deposit Service

Depositing material directly on behalf of members of the University who do not, or cannot self-archive their material. In instances in which we have deposited papers on behalf of individuals, we have created a new account for them and used that to submit their content. . . .

Metadata Review and Creation Service

Reviewing the metadata of content which has been self-archived to maintain the quality of the record and to add any additional subject headings and keywords as appropriate.

The View from the IR Trenches, Part 2

Today, we’ll look at an article about the challenges involved in populating an institutional repository:

Mackie, Morag. "Filling Institutional Repositories: Practical Strategies from the DAEDALUS Project." Ariadne, no. 39 (2004).
http://www.ariadne.ac.uk/issue39/mackie/

The DAEDALUS Project is at the University of Glasgow. This article is an especially interesting case study, and it details a number of useful, imaginative strategies for populating an IR.

Selected quotes from the article are below; the headings are mine. Caveat emptor: selected quotes are just that. It’s always a good idea to read the full paper. I would hope that these brief quotes entice you to do so.

Faculty Do Not Want to Deposit Works Themselves

Despite a generally encouraging response, this did not translate into real content being deposited in the repository. . . . We found that it was difficult to get staff to give or send us electronic copies of their papers, even when they had promised to do so. This was our first indication that while staff may be sympathetic many of them do not have the time or the inclination to contribute. They were happy to give us permission to do the work on their behalf, but could not commit to doing the work themselves. Clearly the advantages of institutional repositories were not yet sufficiently convincing to academics to persuade them to play an active part in the process.

Determining Which Articles Can be Legally Deposited Is Difficult and Time Consuming

[T]he majority of academics we contacted were happy for us to establish which of their publications could be added to the repository.

While an extremely useful resource and one that is growing all the time, the [SHERPA] list does not cover all publishers. . . . it has been necessary to track down policies from publishers’ Web sites, or to contact publishers directly where these do not exist or where they do not address the issue of whether an author is permitted to make his or her paper available in a repository. No two publisher polices are exactly the same, and many do not explicitly state what rights authors have in relation to repositories. . . . Interpreting publisher copyright policies is also a difficult area, particularly as there is no real precedent and no case law.

Where copyright policies did not exist or where they were unclear, we contacted the publishers directly and asked for permission. . . . Although some publishers reply quickly, others may take some weeks and some do not reply at all. We found that publishers were more likely to give permission for specific papers to be added than to outline their general policy on the issue. Consequently permissions for most articles have to be established on a case-by-case basis.

It Is Challenging to Identify Possible Depositors Using Open Access Journals

It would be useful to be able to identify additional content in other open access journals, but so far we have not found an easy way of doing this. The Directory of Open Access Journals. . . is very useful, but it does not enable searching by institution or author affiliation.

For IRs to Be Filled, Deposit May Need to be Mandated

Although we have succeeded in adding a reasonable amount of content to the repository we have also been offered significant amounts of content that cannot be added because of restrictive publisher copyright agreements. . . . This is a clear demonstration that major changes need to take place at a high level in order for repositories to be successful. Although some academics have taken the decision to try and avoid publishing in the journals of publishers with restrictive policies, this is still relatively rare. We can inform staff about the issues, but we cannot and should not dictate in which journals they publish. Change is only likely to happen if staff are required, either by the funding councils or by their institution, to make their publications available either by publishing in open access journals or in journals that permit deposit in a repository.

The View from the IR Trenches, Part 1

It may be helpful in understanding IRs to examine some of the articles mentioned in yesterday’s "Early Adopters of IRs: A Brief Bibliography" posting in more detail.

Today, we’ll look at:

Andrew, Theo. "Trends in Self-Posting of Research Material Online by Academic Staff." Ariadne, no. 37 (2003).
http://www.ariadne.ac.uk/issue37/andrew/

This paper presents findings from "a baseline survey of research material already held on departmental and personal Web pages in the ed.ac.uk domain" (this is the University of Edinburgh’s domain).

Selected quotes from the article are below; the headings are mine. Caveat emptor: selected quotes are just that. It’s always a good idea to read the full paper. I would hope that these brief quotes entice you to do so.

Self-Archiving Disciplinary Differences Matter

As expected, there is a clear difference between academic areas. The average percentage of self-archiving scholars in each College supports this view. Within the College of Science and Engineering (S&E) this figure is 14.81%, which drops to 3.18% within Humanities and Social Science (HSS) and 0.32% within Medicine and Veterinary Medicine (MVM).

However, the situation is more complex than a simple trend of self-archiving being better established in S&E. Looking at the averages between Schools shows that even within Colleges there is a wide distribution of values. In S&E this ranges from 32.67% in Informatics to 6.99% in Engineering and Electronics. . . and in HSS from 12.70% in Philosophy, Psychology and Language Sciences to 0% in Divinity and Law . . . .

Even within individual Schools there is a noticeable change in self-archiving attitudes. For example, self-archiving percentages within the School of GeoScience range from 29.41% in Meteorology down to 0% in Geography. . . .

Disciplinary Archives May Not Be Generally Trusted

Considering the wide-ranging self-archiving trends between academic Colleges and even within Schools, it seems there is a direct correlation between willingness to self-archive and the existence of subject-based repositories. . . . because the ArXiv has become so successful . . . academics trust it as their ‘natural’ repository for self-archived material. The same degree of trust may not yet obtain in the case of the subject repositories mentioned above, which leads to additional self-archiving in home institution repositories. . . . where there is a pre-existing culture of self-archiving eprints in subject repositories, scholars are more likely to post research material on their own Web pages, until such time as those subject repositories become trusted for their comprehensiveness and persistence.

Low Number of Preprints Found on Personal Web Pages

A surprising finding from the baseline survey is the relatively low volume of preprints found on personal Web pages. This could be related to the success of eprint repositories. . . . Preprints do not have anywhere near the same impact factor as those papers from accredited journal titles, so it is possible that researchers would favour only putting their most impressive work in their online CV.

Scholars Are Confused by Copyright Agreements

One aspect of the survey that is not shown in the results is the lack of consistency in dealing with copyright and IPR issues that scholars face when placing material online. Some academic units have responded by not self-archiving any material at all. . . . A small percentage of individual scholars have responded by using general disclaimers that may or may not be effective. Others, generally well-established professors, have posted material online that is arguably in breach of copyright agreements. . . . Most, however, take a middle line of only posting papers from sympathetic publishers who allow some form of self-archiving. It is apparent that if institutional repositories are going to work, then this general confusion over copyright and IPR issues needs to be addressed right at the source.

Early Adopters of IRs: A Brief Bibliography

In "Two Views of IRs," I discussed institutional repositories in the abstract. A useful exercise, but we don’t need to just conjecture about how IRs will be structured and supported. Nor do we need to simply speculate about the issues that they will face. IRs exist, and we can "ask" their managers these questions by examining the articles that have been written about them. (Yesterday’s "ARL Institutional Repositories" posting provides another way to investigate operational IRs: try them out.)

Below is a brief bibliography of interesting articles about IRs that are notable for providing insider views. You’ll note that many of them are about UK IRs. The UK has been in the forefront of the IR movement.

Andrew, Theo. "Trends in Self-Posting of Research Material Online by Academic Staff." Ariadne, no. 37 (2003).
http://www.ariadne.ac.uk/issue37/andrew/

Ashworth, Susan. "The DAEDALUS Project." Serials 16, no. 3 (2003): 249-253.
https://dspace.gla.ac.uk/handle/1905/149

Ashworth, Susan, Morag Mackie, and William J. Nixon. "The DAEDALUS Project, Developing Institutional Repositories at Glasgow University: The Story So Far." Library Review 53, no. 5 (2004): 259-264.
http://eprints.gla.ac.uk/archive/00000408/

Barton, Mary R., and Julie Harford Walker. "Building a Business Plan for DSpace, MIT Libraries’ Digital Institutional Repository." Journal of Digital Information 4, no. 2 (2003).
http://jodi.ecs.soton.ac.uk/Articles/v04/i02/Barton/

Baudoin, Patsy, and Margret Branschofsky. "Implementing an Institutional Repository: The DSpace Experience at MIT." Science & Technology Libraries 24, no. 1/2 (2003): 31-45.

Foster, Nancy Fried, and Susan Gibbons. "Understanding Faculty to Improve Content Recruitment for Institutional Repositories." D-Lib Magazine 11, no. 1 (2005).
http://www.dlib.org/dlib/january05/foster/01foster.html

Hey, Jessie. "Targeting Academic Research with Southampton’s Institutional Repository." Ariadne, no. 40 (2004).
http://www.ariadne.ac.uk/issue40/hey/

Mackie, Morag. "Filling Institutional Repositories: Practical Strategies from the DAEDALUS Project." Ariadne, no. 39 (2004).
http://www.ariadne.ac.uk/issue39/mackie/

Nixon, William J. "DAEDALUS: Freeing Scholarly Communication at the University of Glasgow." Ariadne, no. 34 (2003).
http://www.ariadne.ac.uk/issue34/nixon/

________. "The Evolution of an Institutional E-Prints Archive at the University Of Glasgow." Ariadne, no. 32 (2002).
http://www.ariadne.ac.uk/issue32/eprint-archives/

Soehner, Catherine. "The eScholarship Repository: A University of California Response to the Scholarly Communication Crisis." Science & Technology Libraries 22, no. 3/4 (2002): 29-37.

ARL Institutional Repositories

The Association of Research Libraries (ARL) currently has 123 member libraries in the US and Canada. Below is a list of operational institutional repositories at ARL libraries. This list was compiled by a quick examination of ARL libraries’ home pages, supplemented with a bit of Google searching. I certainly wouldn’t claim that it’s comprehensive, and I would welcome additions. (Quick note to ARL library Web site managers: put a highly visible link to your IR on your home page.)

While not perfect (what is?), this list does give us a rough snapshot of the level of IR activity in ARL libraries, and it provides some insight into how these large research libraries have chosen to structure and support their IRs (can you say bepress and DSpace?).

Two Views of IRs

Yesterday, Stevan Harnad offered extensive comments on my "Not Green Enough" posting. Here are my thoughts on those comments.

The crux of the matter is two very different views of institutional repositories (IRs), and, therefore, different perceptions about how quickly IRs will solve the self-archiving problem. My apologies in advance to Stevan if my capsule summary of his position is incorrect.

In Stevan’s view, the sole purpose of an IR is to provide free global access to e-prints. Once institutions adopt the Berlin 3 recommendations (which require faculty to self-archive in IRs and encourage them to publish in OA journals), establishing and running an IR is a cheap, simple technical problem. Therefore, it doesn’t matter whether publisher copyright agreements allow scholars to archive in disciplinary archives or in the Internet Archive’s universal repository. (I’m unclear about Stevan’s position about independent scholars who will never be able to self-archive in an IR because they are not affiliated with any institution or about researchers who are affiliated with non-academic institutions that will never have IRs. Perhaps, in the last case, he believes that IRs will be universal for every non-academic institution.) IR managers who hold other views are obstructing progress because they are wasting time on nonessential issues, not correctly perceiving the urgency and simplicity of his self-archiving solution, and unnecessarily delaying the progress of OA.

My view of the basic function of an IR is best summed up by two quotes (the first by Clifford Lynch, Executive Director of the Coalition for Networked Information, and the second by me):

"In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution." [1]

"An institutional repository includes a variety of materials produced by scholars from many units, such as e-prints, technical reports, theses and dissertations, data sets, and teaching materials. Some institutional repositories are also being used as electronic presses, publishing e-books and e-journals." [2]

Given this vision of IRs, I see them as more technically complex than Stevan does. However, I see the primary challenges as being in the areas of achieving buy-in from university administrators and faculty, establishing a wide range of policies and procedures (e.g., acceptable types and formats of material, deposit control and facilitation strategies, copyright compliance procedures, and metadata utilization), recruiting content (including depositing items for faculty if required to help populate the IR), providing user support and training, and providing data migration services as file formats become obsolete. Of course, if IRs assume a formal publishing role, this adds new dimensions of complexity, but I’ll defer that point for now since it is only being done in a few IRs, such as the following two examples:

eScholarship Repository
http://repositories.cdlib.org/escholarship/

Internet-First University Press at Cornell University
http://dspace.library.cornell.edu/handle/1813/62

(To clarify one point of confusion, libraries are not generally expecting IRs to solve the e-journal preservation problem. They are turning to solutions such as LOCKSS to do that.)

I do not believe that getting faculty to voluntarily deposit e-prints will be easy. I’m not convinced that most university administrators are going to be quickly and effortlessly persuaded to endorse Berlin 3 unless it is, in effect, externally mandated (e.g., by the Research Councils UK proposal).

I think that at least a significant subset of universities will want some type of basic vetting of the copyright compliance status of submitted e-prints, and, given the current wide range of variations in publisher copyright agreements and a relatively low level of faculty awareness and interest in copyright matters, that this will be a thorny issue (and one that directly relates to my standard copyright agreement idea).

This is why Johanneke Sytsema of Oxford University said in her comment about "How Green Is My Publisher?"
(http://www.escholarlypub.com/digitalkoans/2005/04/26/how-green-is-my-publisher/#comments):

"I do agree with Charles Bailey that ‘green’ doesn’t automatically mean ‘go’. Being a repository manager myself, I never just ‘go’ when I encounter ‘green’ on the (invaluable) SHERPA Romeo list. First, I need to check whether the publisher allows archiving into an institutional repository, rather than just on a personal or departmental website. Secondly, I need to check the permitted format: some publisher[s] object to using the publisher PDF, other publishers require the use of the publisher PDF. Thirdly, I need to check on publisher policies every time I deposit, since publishers may change their policy from day to day. So, could the light get greener than it is now? I believe, it should."

Given my view of IRs, I agree with University of Rochester IR manager Susan Gibbons when she says that "the costs and efforts involved in maintaining an IR are substantial."

Which of these two views of institutional repositories will prevail? Time will tell.

If my view prevails, IRs will take longer to establish than if Stevan’s view prevails. Academic authors who have papers accepted by publishers with restrictive author copyright agreements (i.e., those that bar deposit in disciplinary archives or in the universal repository) will have to wait to deposit papers in an OAI-PMH compliant archive. Lacking a way to self-archive with relative ease, they may simply choose not to do so. Non-academic authors may never be able to deposit their papers in an OAI-PMH compliant archive.

If Stevan’s view prevails, IRs will pop up like mushrooms and the above won’t matter, as long as authors enthusiastically deposit their old papers once their IRs are in place.

If the only barrier is a small investment of time and money (as Stevan describes below), it’s unclear to me why we don’t have universal IRs today:

"The 94% of authors at archiveless universities are one $2000 linux server plus a few days’ one-time sysad set-up time and a few annual sysaddays’ maintenance time away from having an institutional repository."

But, I say, Godspeed, Stevan. Prove me wrong, for that will mean that OA happens sooner, and scholars without access to IRs will be deprived of the benefits of depositing in an OAI-compliant repository (or depositing at all) for a shorter period of time.

And, I cheerfully give Stevan the last word on the matter (for now anyway).

1. Clifford A. Lynch, "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age," ARL: A Bimonthly Report on Research Library Issues and Actions from ARL, CNI, and SPARC, no. 226 (2003),
http://www.arl.org/newsltr/226/ir.html

2. Charles W. Bailey, Jr., Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (Washington, DC: Association of Research Libraries, 2005), xviii,
http://info.lib.uh.edu/cwb/oab.pdf

Not Green Enough

Yesterday, Stevan Harnad took the time to comment extensively on my "How Green Is My Publisher?" posting. Thanks for doing so, Stevan. Here are some further thoughts on the matter.

CB: My publication page, check. We don’t have an institutional repository yet, but I assume that "other external Web site" will cover that when we do, check. Wait a minute, what if I want to deposit the e-print in a disciplinary archive like E-LIS or I want to put it in the Internet Archive’s upcoming "OAI-compliant ‘universal repository’"? Looks to me like I’m out of luck. No way to immediately deposit the paper in an OAI-PMH compliant archive that will have a longer life than my Website and that can be harvested by OAI-PMH search services, such as OAIster.

SH: "The restrictions on 3rd-party archives are perfectly reasonable and no problem whatsoever at this time. The problem today (just so we keep our eyes on the ball!) is the non-archiving of 85% of articles, hence their inaccessibility to all those would-be users whose universities cannot afford access to the journal’s official version! It is cheap and easy for any university to create an OAI-compliant institutional archive, and OAIster can happily harvest the metadata.
http://archives.eprints.org/eprints.php?action=browse"

eprints.org’s Institutional Archives Registry currently shows a total of 424 archives. When we browse by archive type, we discover that there are 192 "Research Institutional or Departmental" registered archives worldwide. Of course, "Departmental" archives are not institutional repositories. They do not have an institutional scope of coverage, nor are they as likely as institutional archives to be permanent. True, departments are relatively stable, but their commitment to maintaining archives may not be (e.g., the archive may be the pet project of one or a few faculty members). By contrast, once an institution commits to having an archive, it’s likely to be a more permanent arrangement, especially if it is run by a library.

But let’s wave our hands and say that all 424 registered archives are institutional repositories (IRs). Universities Worldwide, which is "based on the ‘World List of Universities 1997’ published by the International Association of Universities (IAU) and links discovered or posted here," currently lists 7,130 universities in 181 countries. Assuming that this is a good rough approximation, that means that about 6% of all universities have IRs. Meaning, of course, that 94% do not.
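For readers who want to check the arithmetic, here is a minimal sketch of the calculation. The counts come from the figures quoted above; the variable and function names are my own, and the calculation assumes at most one archive per university, which is itself part of the hand waving.

```python
# Figures quoted in the post (eprints.org registry and Universities Worldwide).
TOTAL_ARCHIVES = 424        # all registered archives
RESEARCH_ARCHIVES = 192     # "Research Institutional or Departmental" archives
UNIVERSITIES = 7130         # universities listed worldwide


def share(archives, universities):
    """Percentage of universities with an archive, assuming one archive each."""
    return 100 * archives / universities


generous = share(TOTAL_ARCHIVES, UNIVERSITIES)    # counts every archive as an IR
strict = share(RESEARCH_ARCHIVES, UNIVERSITIES)   # counts only the research category

print(f"Generous estimate: {generous:.1f}% with IRs, {100 - generous:.1f}% without")
print(f"Stricter estimate: {strict:.1f}% with IRs, {100 - strict:.1f}% without")
```

The generous figure rounds to the 6% used here. Counting only the 192 research institutional or departmental archives would push the share below 3%.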

And that means that 94% of authors at universities cannot self-archive in an institutional repository (or, given the hand waving, in a departmental archive). True, they can self-archive on personal Web pages. The issues with this strategy are: (a) how many authors have up-to-date publication pages or have publication pages at all?, (b) how long will they last (i.e., authors change jobs, retire, and die)?, and (c) there is no OAI-PMH access to those pages, so they don’t show up in OAIster and similar search engines.

Now, disciplinary archives and the Internet Archive’s universal repository solve these problems. Moreover, they solve another problem: independent scholars, corporate researchers, and other non-academic authors may never have an institutional repository to self-archive in.

I don’t see this as "no problem whatsoever at this time." Quite the contrary. To be "no problem," we would have to believe that it doesn’t matter if articles are archived in OAI-PMH compliant repositories or archives. To be "no problem," we would have to not care whether scholars who will never have an institutional repository at their disposal can self-archive.
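To make "OAI-PMH compliant" a bit more concrete, here is a minimal sketch of the kind of request a harvester such as OAIster sends to a repository. The `ListRecords` verb and the `oai_dc` metadata prefix are part of the OAI-PMH protocol; the base URL and the function name are hypothetical, and the code only builds the URL without contacting any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real repository publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.edu/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network call is made)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


harvest_url = list_records_url(BASE_URL)
print(harvest_url)
```

A personal publication page offers no such endpoint, which is why its contents stay invisible to metadata harvesters.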

As to the question of it being "cheap and easy for any university to create an OAI-compliant institutional archive," I think there is some difference of opinion on that point. Susan Gibbons says [1] that "the costs and efforts involved in maintaining an IR are substantial," and she provides these annual IR cost estimates:

  1. $285,000, MIT
  2. $100,000 (Canadian), Queen’s University (for staffing only)
  3. $200,000, University of Rochester
  4. between 2,280 and 3,190 staff hours, University of Oregon

But, of course, these differences in perception about costs relate to some degree to Stevan’s next point:

SH: (And worrying about the preservation of non-existent contents is rather putting the cart before the horse. The self-archived OA versions of a goodly portion of the 15% of the articles that have been self-archived in the past 15 years are still online and OA to tell the tale to this day. All their publishers’ official versions are too. So fussing about the permanence of the non-contents of cupboards that are in any case meant to be access-supplements, not the official version of record, is rather misplaced, when what is immediately missing and urgently needed is their presence, not their permanence.)

I think that Stevan will find that most academic libraries are going to worry about permanence. Not only will they worry about the permanence of digital objects in their repositories, they will also worry about the permanence of publishers’ archives. Librarians know that publishers are corporations, and that corporations change priorities, merge, and fail. As libraries increasingly abandon print subscriptions and go e-only for economic reasons, at some point there will be no permanent distributed print archive of new journal issues in libraries worldwide as there is today, and libraries are going to worry about that a great deal. Moreover, universities are not going to establish institutional repositories just to support OA. That may be one important item on the agenda, but there will be other archiving needs to be met as well, and factors associated with those digital objects will affect the perception of the need for overall IR preservation too.

Libraries are also going to offer new IR support services in addition to technical support, ranging from convincing faculty to self-archive and helping them do so to training users in using IRs (as well as other e-print services worldwide). These services will cost money.

Don’t want libraries to lead the IR effort if this is true?

In the words of Bob Dylan:

I asked the captain what his name was
And how come he didn’t drive a truck
He said his name was Columbus
I just said, "Good luck."

Moving on.

CB: “The agreement also states that the e-print must contain a fair amount of information about the publisher and the paper: the published article’s citation and copyright date, the publisher’s address, information about the publisher’s document delivery service, and a link to the publisher’s home page.”

SH: That’s just fine too. It is only good scholarly practice to provide the full reference information and to link to the official version of record for the sake of all those potential users who can afford it. What is wrong with that, and why would any author not want to do that?

Sure, an author would want to provide a citation to the published paper and a link to it, but I suspect few will be excited about providing a fair amount of advertising information for the publisher in their e-prints, such as the publisher’s address, home page, and document delivery service. It’s not a deal killer, but it’s more work for authors or IR staff. The more individual publisher variations that there are in copyright transfer agreements, the harder it is for scholars and IR staff to meet these varying requirements.

CB: Second, it would be helpful if such directories could identify whether articles can be deposited in key types of archives. I know that we don’t want the color codes to look like SpeedyGrl.com’s Ultimate Color Table, but I think that this is an important factor in addition to the type of e-print permitted.

SH: They already do. The main distinction is the author’s own institutional archive versus central (3rd-party) archives. It is the former that are the critical ones. The rest can be done by metadata harvesting.

The SHERPA colors do not make this distinction. Neither do the otherwise helpful notes. You must look at each specific agreement (if there is a link to it).

CB: Fourth, although copyright transfer agreements have always been a confusing mess, now we want authors to actually read and evaluate them, not just mindlessly sign them like they did when digital archiving wasn’t an issue. And institutional repository managers (or archive managers) need to make sense of them post facto to determine if articles can be legally deposited and what terms apply to those deposits. So, maybe it’s time to tilt at a new windmill: a set of standardized copyright transfer agreements. I know, it’s like trying to herd several thousand hyperactive cats. But, a few years ago, getting standardized use statistics for electronic resources from publishers seemed hopeless, and some progress has been made on that score.

SH: No, it’s not more windmills or red herrings that researchers, their institutions, their funders, and research itself need: What they need is to go ahead and self-archive.

Developing clear, understandable standard copyright transfer agreements is a red herring? Let’s look at just one aspect of the problem: IR managers’ copyright concerns. I offer some quotes:

"One aspect of the survey [baseline survey of research material already held on departmental and personal Web pages in the ed.ac.uk domain] that is not shown in the results is the lack of consistency in dealing with copyright and IPR issues that scholars face when placing material online. Some academic units have responded by not self-archiving any material at all. A rather worrying example of this is the School of Law (—do they know something that we don’t?) A small percentage of individual scholars have responded by using general disclaimers that may or may not be effective. Others, generally well-established professors, have posted material online that is arguably in breach of copyright agreements, e.g. whole book chapters. Most, however, take a middle line of only posting papers from sympathetic publishers who allow some form of self-archiving. It is apparent that if institutional repositories are going to work, then this general confusion over copyright and IPR issues needs to be addressed right at the source." [2]

"Filling a repository for published and peer-reviewed papers is a slow process, and it is clear that it is a task that requires a significant amount of staff input from those charged with developing the repository. Although we have succeeded in adding a reasonable amount of content to the repository we have also been offered significant amounts of content that cannot be added because of restrictive publisher copyright agreements. In some cases academics have offered between ten and twenty articles and we have not been able to add any of them to the repository. This is a clear demonstration that major changes need to take place at a high level in order for repositories to be successful." [3]

Certainly, all OA advocates are eager to get on with the business of doing OA vs. simply reflecting on it, and few have done as much as Stevan to advance the cause, but, in my view, the issues I’ve raised warrant further consideration and action.

Notes

1. Susan Gibbons, "Establishing an Institutional Repository," Library Technology Reports 40, no. 4 (2004): 54, 56.

2. Theo Andrew, "Trends in Self-Posting of Research Material Online by Academic Staff," Ariadne, no. 37 (2003),
http://www.ariadne.ac.uk/issue37/andrew/intro.html.

3. Morag Mackie, "Filling Institutional Repositories: Practical Strategies from the DAEDALUS Project," Ariadne, no. 39 (2004),
http://www.ariadne.ac.uk/issue39/mackie/intro.html.

How Green Is My Publisher?

Back in the early 1990s, I began to fight to retain the copyright to my scholarly writings. First, the publishers thought I was kidding. Then, when it was clear that I wasn’t, they thought I was nuts. Generally, they weren’t willing to negotiate. So, I sought out the few journals that would comply with this strange whim or that had editors who would "forget" to have me sign an author agreement. Unfortunately, some of the more liberal journals got gobbled up by megapublishers, limiting my options and casting some doubt on handshake deals. Once e-only journals by nonconventional publishers took off, they became my venue of choice, since they typically allowed copyright retention by default.

Things have changed, in large part due to the growing influence of the open access movement. Now, many publishers allow self-archiving of e-prints (electronic preprints or postprints), and this, in theory, means that authors can cheerfully assign their copyrights to those publishers. How many publishers do this? Well, we don’t know for sure, but according to Summary Statistics So Far (whose figures are based on the Romeo Project), 92% of the 8,450 processed journals are "green" (can archive postprint) or "pale green" (can archive preprint). (Gray means you can’t archive either one.)

If you want to self-archive a scholarly article, the SHERPA Publisher Copyright Policies & Self-Archiving site is the place to go to determine whether the publisher of the journal you have in mind for your article will permit it. So, when approached recently about writing a paper for a library publisher (let’s call it X), I fired up Mozilla and looked X up. Good news, X is green, meaning "can archive pre-print and post-print." Not the dreaded white ("archiving not formally supported"), not yellow ("can archive pre-print (ie pre-refereeing)"), not even blue ("can archive post-print (ie final draft post-refereeing)"), but green. SHERPA did warn me of two conditions: "Published source must be acknowledged" and "Eprint server is non-profit." No problemo, right? Being ever cautious, I then used the handy link to the actual policy.

Here’s what I found. My "preprint distribution rights" allow "posting as electronic files on the contributor’s own Web site for personal or professional use, or on the contributor’s internal university/corporate intranet or network, or other external Web site at the contributor’s university or institution, but not for either commercial (for-profit) or systematic third party sales or dissemination, by which is meant any interlibrary loan or document delivery systems. The contributor may update the preprint with the final version of the article after review and revision by the journal’s editor(s) and/or editorial/peer-review board."

My publication page, check. We don’t have an institutional repository yet, but I assume that "other external Web site" will cover that when we do, check. Wait a minute, what if I want to deposit the e-print in a disciplinary archive like E-LIS or I want to put it in the Internet Archive’s upcoming "OAI-compliant ‘universal repository’"? Looks to me like I’m out of luck. No way to immediately deposit the paper in an OAI-PMH compliant archive that will have a longer life than my Website and that can be harvested by OAI-PMH search services, such as OAIster.

The agreement also states that the e-print must contain a fair amount of information about the publisher and the paper: the published article’s citation and copyright date, the publisher’s address, information about the publisher’s document delivery service, and a link to the publisher’s home page. Guess I can do this when I’m modifying the article to incorporate the editorial changes. That should keep me off the streets.

So, what can we conclude from this brief dip into the murky waters of author agreements other than retaining rights may still be a good idea (if you can do it)?

First, there are swirling currents of complexity beneath the placid surface of color-coded copyright transfer agreement directories. This is not to say that such directories are not indispensable (or not doing a good job), but rather that, given the idiosyncratic nature of such agreements, authors still need to read the details if they want to be fully aware of their residual rights. They may not always like what they find, and what they find may affect their willingness to self-archive if it’s too limiting or burdensome. "Green" may not always mean "go."

Second, it would be helpful if such directories could identify whether articles can be deposited in key types of archives. I know that we don’t want the color codes to look like SpeedyGrl.com’s Ultimate Color Table, but I think that this is an important factor in addition to the type of e-print permitted.

Third, if claims are going to be made about the number of "green" journals, maybe more consideration about what "green" means is in order, and perhaps OA advocates should agree on their color schemes. Is "can archive pre-print and post-print" enough for "green," or should it be "can archive pre-print and post-print on the author’s Website or in any noncommercial archive or repository"? If the latter, authors and OA advocates should turn up the heat on publishers that don’t permit it.

Fourth, although copyright transfer agreements have always been a confusing mess, now we want authors to actually read and evaluate them, not just mindlessly sign them like they did when digital archiving wasn’t an issue. And institutional repository managers (or archive managers) need to make sense of them post facto to determine if articles can be legally deposited and what terms apply to those deposits. So, maybe it’s time to tilt at a new windmill: a set of standardized copyright transfer agreements. I know, it’s like trying to herd several thousand hyperactive cats. But, a few years ago, getting standardized use statistics for electronic resources from publishers seemed hopeless, and some progress has been made on that score.

The Access Principle: The Case for Open Access to Research and Scholarship

John Willinsky’s book, The Access Principle: The Case for Open Access to Research and Scholarship, will be released in December by MIT Press. The blurb indicates: "A commitment to scholarly work, writes Willinsky, carries with it a responsibility to circulate that work as widely as possible: this is the access principle."

Interesting. OA as a "responsibility," perhaps even a moral obligation. Often OA advocates discuss the benefits to authors of widespread digital exposure through OA, which boils down to enlightened self-interest. And, of course, there is mandatory discussion of the need for access for the disenfranchised (not just the developing world, but anyone who can’t afford toll fees) in order to promote scholarship and other activities. (Let’s face it, who isn’t disenfranchised these days?) But, "responsibility," . . . hmmm, that heats up the dialog.

In any case, here’s a bit more: "Willinsky describes different types of access—the New England Journal of Medicine, for example, grants open access to issues six months after initial publication, and First Monday forgoes a print edition and makes its contents immediately accessible at no cost. He discusses the contradictions of copyright law, the reading of research, and the economic viability of open access. He also considers broader themes of public access to knowledge, human rights issues, lessons from publishing history, and ‘epistemological vanities.’"

By the way, Willinsky is a key figure in the Public Knowledge Project, which provides cool open source software such as Open Journal Systems and Open Conference Systems. (Thanks to Adrian Ho for the tip on this book.)