Report About Users’ Digital Repository Needs at the University of Hull

The RepoMMan Project at the University of Hull has published The RepoMMan User Needs Analysis report.

Here’s an excerpt from the JISC-REPOSITORIES announcement:

The document covers the repository needs of users in the research, learning & teaching, and administration areas. Whilst based primarily on needs expressed in interviews at the University of Hull the document is potentially of wider applicability, drawing from an on-line survey of researchers elsewhere and a survey of the L&T community undertaken by the CD-LOR Project.

DRAMA Project’s Fedora Authentication Code Alpha Release

The DRAMA (Digital Repository Authorization Middleware Architecture) project has released an alpha version of its Fedora authentication code. DRAMA is part of the RAMP (Research Activityflow and Middleware Priorities) project.

Here’s an excerpt from the fedora-commons-users announcement about the release’s features:

  • Federated authentication (using Shibboleth) for Fedora.
  • Extended XACML engine support via the introduction of an XML database for storing and querying policies and XACML requests over web services.
  • Re-factoring of Fedora XACML authorization into an interceptor layer which is separate from Fedora.
  • A new web GUI for Fedora nicknamed "mura" (note: we will be changing the GUI name to a new one soon).
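To make the interceptor idea above concrete, here is a minimal sketch, in Python, of an authorization layer that asks an external XACML policy decision point (PDP) for a Permit/Deny decision before letting a request through to the repository. The PDP URL, request body, and Permit-matching below are hypothetical illustrations of the pattern, not DRAMA's actual interfaces.

```python
# A toy sketch of an authorization interceptor that consults an external
# XACML policy decision point (PDP) before admitting a repository request.
# The PDP endpoint and response handling here are hypothetical.
import urllib.request

PDP_URL = "http://localhost:8080/pdp/evaluate"  # hypothetical PDP endpoint

XACML_REQUEST = """<?xml version="1.0" encoding="UTF-8"?>
<Request xmlns="urn:oasis:names:tc:xacml:2.0:context:schema:os">
  <Subject>
    <Attribute AttributeId="urn:oasis:names:tc:xacml:1.0:subject:subject-id"
               DataType="http://www.w3.org/2001/XMLSchema#string">
      <AttributeValue>alice</AttributeValue>
    </Attribute>
  </Subject>
  <Resource>
    <Attribute AttributeId="urn:oasis:names:tc:xacml:1.0:resource:resource-id"
               DataType="http://www.w3.org/2001/XMLSchema#string">
      <AttributeValue>demo:1</AttributeValue>
    </Attribute>
  </Resource>
  <Action>
    <Attribute AttributeId="urn:oasis:names:tc:xacml:1.0:action:action-id"
               DataType="http://www.w3.org/2001/XMLSchema#string">
      <AttributeValue>read</AttributeValue>
    </Attribute>
  </Action>
</Request>"""

def is_permitted(xacml_request: str) -> bool:
    """POST the XACML request to the PDP; treat a Permit decision as true."""
    req = urllib.request.Request(
        PDP_URL,
        data=xacml_request.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        return b"<Decision>Permit</Decision>" in resp.read()

if __name__ == "__main__":
    print("Permit" if is_permitted(XACML_REQUEST) else "Deny")
```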

UK Council of Research Repositories Established

SHERPA Plus has announced the launch of the UK Council of Research Repositories.

It is described as follows: "UKCoRR will be an independent professional body to allow repository managers to share experiences and discuss issues of common concern. It will give repository managers a group voice in national discussions and policy development independent of projects or temporary initiatives."

Digital Object Prototypes Framework Released

Kostas Saidis has released the Digital Object Prototypes Framework. It is available from the DOPs download page.

Here is an excerpt from the fedora-commons-users announcement:

At a glance, DOPs is a framework for the effective management and manipulation of diverse and heterogeneous digital material, providing repository-independent, type-consistent abstractions of stored digital objects. In DOPs, individual objects are treated as instances of their prototype and, hence, conform to its specifications automatically, regardless of the underlying storage format used to store and encode the objects.

The framework also provides inherent support for collections/sub-collections hierarchies and compound objects, while it allows DL-pertinent services to compose type-specific object behavior effectively. A DO Storage module is also available, which allows one to use the framework atop Fedora (thoroughly tested with Fedora version 2.0).
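The "objects as instances of their prototype" idea can be illustrated with a short sketch. The class and field names below are invented for illustration; they are not the DOPs API.

```python
# An illustrative (not DOPs-actual) sketch of the prototype idea: a digital
# object prototype declares required metadata fields and datastream types,
# and any stored object viewed through it is normalized to that shape,
# whatever the underlying storage encoding was.
from dataclasses import dataclass

@dataclass
class Prototype:
    name: str
    required_fields: list
    datastream_types: list

@dataclass
class StoredObject:
    raw: dict  # repository-specific encoding

class DigitalObject:
    """Type-consistent view of a stored object as an instance of a prototype."""
    def __init__(self, prototype: Prototype, stored: StoredObject):
        self.prototype = prototype
        missing = [f for f in prototype.required_fields if f not in stored.raw]
        if missing:
            raise ValueError(
                f"object does not conform to {prototype.name}: missing {missing}")
        self.fields = {f: stored.raw[f] for f in prototype.required_fields}

book = Prototype("book", ["title", "creator"], ["PDF", "OCR_TEXT"])
obj = DigitalObject(book, StoredObject({"title": "Phaedrus", "creator": "Plato"}))
print(obj.fields["title"])
```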

PRESERV Project Report on Digital Preservation in Institutional Repositories

The JISC PRESERV (Preservation Eprint Services) project has issued a report titled Laying the Foundations for Repository Preservation Services: Final Report from the PRESERV Project.

Here’s an excerpt from the Executive Summary:

The PRESERV project (2005-2007) investigated long-term preservation for institutional repositories (IRs), by identifying preservation services in conjunction with specialists, such as national libraries and archives, and building support for services into popular repository software, in this case EPrints. . . .

PRESERV was able to work with The National Archives, which has produced PRONOM-DROID, the pre-eminent tool for file format identification. Instead of linking PRONOM to individual repositories, we linked it to the widely used Registry of Open Access Repositories (ROAR) through an OAI harvesting service. As a result, format profiles can be found for over 200 repositories listed in ROAR, in what we call the PRONOM-ROAR service. . . .

The lubricant to ease the movement of data between the components of the services model is metadata, notably preservation metadata, which informs, describes and records a range of activities concerned with preserving specific digital objects. PRESERV identified a rich set of preservation metadata, based on the current standard in this area, PREMIS, and where this metadata could be generated in our model. . . .

The most important changes to EPrints software as a result of the project were the addition of a history module to record changes to an object and actions performed on an object, and application programs to package and disseminate data for delivery to an external service using either the Metadata Encoding and Transmission Standard (METS) or the MPEG-21 Part 2: Digital Item Declaration Language (DIDL). One change to the EPrints deposit interface is the option for authors to select a licence indicating rights for allowable use by service providers, users, and others. . . .

PRESERV has identified a powerful and flexible framework in which a wide range of preservation services from many providers can potentially be intermediated to many repositories by other types of repository services. It is proposed to develop and test this framework in the next phase of the project.
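Two of the mechanisms the excerpt mentions are easy to illustrate. First, the PRONOM-ROAR service is built on OAI harvesting; the following minimal Python sketch walks a repository's OAI-PMH ListRecords responses and tallies the dc:format values it finds. The base URL is a placeholder, and a real format-profiling service would identify files rather than just trust the metadata.

```python
# Minimal OAI-PMH harvest of the kind that underpins a service like
# PRONOM-ROAR: page through ListRecords and count dc:format values.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_formats(base_url: str) -> dict:
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    formats = {}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        for fmt in root.iter(DC + "format"):
            name = (fmt.text or "unknown").strip()
            formats[name] = formats.get(name, 0) + 1
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return formats
        # Subsequent requests carry only the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

profile = harvest_formats("http://example.org/oai")  # placeholder base URL
for fmt, count in sorted(profile.items()):
    print(fmt, count)
```

Second, the report's preservation metadata work is based on PREMIS. Here is a sketch of a minimal PREMIS-style event record of the kind a repository history module might write; the element names follow PREMIS 2.x, but treat the exact schema details as an assumption rather than the project's actual output.

```python
# A minimal PREMIS-style preservation event (element names per PREMIS 2.x;
# exact schema details are an assumption, not the PRESERV project's output).
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS)

event = ET.Element(f"{{{PREMIS}}}event")
ET.SubElement(event, f"{{{PREMIS}}}eventType").text = "fixity check"
ET.SubElement(event, f"{{{PREMIS}}}eventDateTime").text = "2007-03-01T12:00:00Z"
outcome_info = ET.SubElement(event, f"{{{PREMIS}}}eventOutcomeInformation")
ET.SubElement(outcome_info, f"{{{PREMIS}}}eventOutcome").text = "pass"

print(ET.tostring(event, encoding="unicode"))
```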

Trustworthy Repositories Audit & Certification: Criteria and Checklist Published

The Center for Research Libraries and RLG Programs have published the Trustworthy Repositories Audit & Certification: Criteria and Checklist.

Here’s an excerpt from the press release:

In 2003, RLG and the US National Archives and Records Administration created a joint task force to address digital repository certification. The goal of the RLG-NARA Task Force on Digital Repository Certification was to develop criteria to identify digital repositories capable of reliably storing, migrating, and providing access to digital collections. With partial funding from the NARA Electronic Records Archives Program, the international task force produced a set of certification criteria applicable to a range of digital repositories and archives, from academic institutional preservation repositories to large data archives and from national libraries to third-party digital archiving services. . . .

In 2005, the Andrew W. Mellon Foundation awarded funding to the Center for Research Libraries to further establish the documentation requirements, delineate a process for certification, and establish appropriate methodologies for determining the soundness and sustainability of digital repositories. Under this effort, Robin Dale (RLG Programs) and Bernard F. Reilly (President, Center for Research Libraries) created an audit methodology based largely on the checklist and tested it on several major digital repositories, including the E-Depot at the Koninklijke Bibliotheek in the Netherlands, the Inter-University Consortium for Political and Social Research, and Portico.

Findings and methodologies were shared with those of related working groups in Europe who applied the draft checklist in their own domains: the Digital Curation Centre (U.K.), DigitalPreservationEurope (Continental Europe), and NESTOR (Germany). The report incorporates the sum of knowledge and experience, new ideas, techniques, and tools that resulted from cross-fertilization between the U.S. and European efforts. It also includes a discussion of audit and certification criteria and how they can be considered from an organizational perspective.

Fez 1.3 Released

Christiaan Kortekaas has announced on the fedora-commons-users list that Fez 1.3 is now available from SourceForge.

Here’s a summary of key changes from his message:

  • Primary XSDs for objects based on MODS instead of DC (can still handle your existing DC objects though)
  • Download statistics using Apache logs and GeoIP (a rough sketch of this approach appears after the list)
  • Object history logging (PREMIS events)
  • Shibboleth support
  • Fulltext indexing (PDF only)
  • Import and export of workflows and XSDs
  • Sanity checking to help make sure required external dependencies are working
  • OAI provider that respects FezACML authorisation rules
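The download-statistics item is a good example of a simple, widely used technique. The sketch below illustrates the general approach (it is not Fez's actual code): parse Apache access-log lines, count successful GETs of fulltext files, and bucket hits by the client's country. The country lookup is stubbed out; Fez uses a GeoIP database for this step.

```python
# An illustration of log-based download statistics (not Fez's actual code):
# parse Apache combined-log lines, count successful PDF downloads, and
# bucket them by the client IP's country (GeoIP lookup stubbed out).
import re
from collections import Counter

LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def country_of(ip: str) -> str:
    return "??"  # placeholder for a GeoIP database lookup

def download_stats(log_lines):
    by_file, by_country = Counter(), Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "200" and m.group("path").endswith(".pdf"):
            by_file[m.group("path")] += 1
            by_country[country_of(m.group("ip"))] += 1
    return by_file, by_country

sample = ['127.0.0.1 - - [01/Mar/2007:10:00:00 +0000] '
          '"GET /eserv/uq:1/paper.pdf HTTP/1.1" 200 1234']
files, countries = download_stats(sample)
print(files.most_common(), countries.most_common())
```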

For further information on Fez, see the prior post "Fez+Fedora Repository Software Gains Traction in US."

Fez+Fedora Repository Software Gains Traction in US

The February 2007 issue of Sustaining Repositories reports that more US institutions are using or investigating a combination of Fez and Fedora (see the quote below):

Fez programmers at the University of Queensland (UQ) have been gratified by a surge in international interest in the Fez software. Emory University Libraries are building a Fez repository for electronic theses. Indiana University Libraries are also testing Fez+Fedora to see whether to replace their existing DSpace installation. The Colorado Alliance of Research Libraries (http://www.coalliance.org/) is using Fez+Fedora for their Alliance Digital Repository. Also in the US, the National Science Digital Library is using Fez+Fedora for their Materials Science Digital Library (http://matdl.org/repository/index.php).

Wildfire Institutional Repository Software

One of the interesting findings of my brief investigation of open access repository software by country was the heavy use of Wildfire in the Netherlands.

Wildfire was created by Henk Druiven, University of Groningen, and it is used by over 70 repositories. It runs on a PHP, MySQL, and Apache platform.

Here is a brief description from In Between:

Wildfire is the software our library uses for our OAI compatible repositories. It is a flexible system for setting up a large number of repositories that at the same time allows them to be aggregated in groups. A group acts like yet another repository with its own harvest address and user interface.

There are several descriptive documents about Wildfire, but most are not in English.

Open Access Repository Software Use By Country

Based on data from the OpenDOAR Charts service, here is a snapshot of the open access repository software in use in the top five countries offering such repositories.

The countries are abbreviated in the table header row as follows: US = United States, DE = Germany, UK = United Kingdom, AU = Australia, and NL = Netherlands. The number in parentheses is the reported number of repositories in that country.

Read the country percentages downward in each column (they do not total to 100% across the rows).

Excluding "unknown" or "other" systems, the highest in-country percentage is shown in boldface.

Software/Country    US (248)   DE (109)   UK (93)   AU (50)   NL (44)
Bepress             17%        0%         2%        6%        0%
Cocoon              0%         0%         1%        0%        0%
CONTENTdm           3%         0%         2%        0%        0%
CWIS                1%         0%         0%        0%        0%
DARE                0%         0%         0%        0%        2%
Digitool            0%         0%         1%        0%        0%
DSpace              18%        4%         22%       14%       14%
eDoc                0%         2%         0%        0%        0%
ETD-db              4%         0%         0%        0%        0%
Fedora              0%         0%         0%        2%        0%
Fez                 0%         0%         0%        2%        0%
GNU EPrints         19%*       8%         46%*      22%*      0%
HTML                2%         4%         4%        4%        0%
iTor                0%         0%         0%        0%        5%
Milees              0%         2%         0%        0%        0%
MyCoRe              0%         2%         0%        0%        0%
OAICat              0%         0%         0%        2%        0%
Open Repository     0%         0%         3%        0%        2%
OPUS                0%         43%*       2%        0%        0%
Other               6%         7%         2%        2%        0%
PORT                0%         0%         0%        0%        2%
Unknown             31%        28%        18%       46%       23%
Wildfire            0%         0%         0%        0%        52%*

Snapshot Data from OpenDOAR Charts

OpenDOAR has introduced OpenDOAR Charts, a nifty new service that allows users to create and view charts that summarize data from its database of open access repositories.

Here’s what a selection of the default charts show today. Only double-digit percentage results are discussed.

  • Repositories by continent: Europe is the leader with 49% of repositories. North America places second with 33%.
  • Repositories by country: In light of the above, it is interesting that the US leads the pack with 29% of repositories. Germany (13%) and the UK (11%) follow.
  • Repository software: After the 28% of unknown software, EPrints takes the number two slot (21%), followed by DSpace (19%).
  • Repository types: By far, institutional repositories are the leader at 79%. Disciplinary repositories follow (13%).
  • Content types: ETDs lead (53%), followed by unpublished reports/working papers (48%), preprints/postprints (37%), conference/workshop papers (35%), books/chapters/sections (31%), multimedia/av (20%), postprints only (17%), bibliographic references (16%), special items (15%), and learning objects (13%).

This is a great service; however, I’d suggest that the University of Nottingham consider licensing it under a Creative Commons license so that snapshot charts could be freely used (at least for noncommercial purposes).

Census of Institutional Repositories in the United States

The Council on Library and Information Resources has published the Census of Institutional Repositories in the United States: MIRACLE Project Research Findings, which was written by members of the University of Michigan School of Information’s MIRACLE (Making Institutional Repositories a Collaborative Learning Environment) Project. The report is freely available in digital form.

Here is an excerpt from the CLIR press release:

In conducting the census, the authors sought to identify the wide range of practices, policies, and operations in effect at institutions where decision makers are contemplating planning, pilot testing, or implementing an IR; they also sought to learn why some institutions have ruled out IRs entirely.

The project team sent surveys to library directors at 2,147 institutions, representing all university main libraries and colleges, except for community colleges, in the United States. About 21% participated in the census. More than half of the responding institutions (53%) have done no IR planning. Twenty percent have begun to plan, 16% are actively planning and pilot testing IRs, and 11% have implemented an operational IR.

While the study confirms a number of previous survey findings on operational IRs—such as the IR’s disproportionate representation at research institutions and the leading role of the library in planning, testing, implementing, and paying for IRs—the census also offers a wealth of new insights. Among them is the striking finding that half of the respondents who had not begun planning an IR intend to do so within 24 months.

Other institutional repository surveys include the ARL Institutional Repositories SPEC Kit and the DSpace community survey.

MIT’s SIMILE Project

MIT’s Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) project is producing a variety of interesting open source software packages for librarians and others, such as Piggy Bank, "a Firefox extension that turns your browser into a mashup platform, by allowing you to extract data from different web sites and mix them together."

Here is an overview of the SIMILE project from the About SIMILE page:

SIMILE is a joint project conducted by the MIT Libraries and MIT Computer Science and Artificial Intelligence Laboratory. SIMILE seeks to enhance inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services. A key challenge is that the collections which must inter-operate are often distributed across individual, community, and institutional stores. We seek to be able to provide end-user services by drawing upon the assets, schemata/vocabularies/ontologies, and metadata held in such stores.

SIMILE will leverage and extend DSpace, enhancing its support for arbitrary schemata and metadata, primarily through the application of RDF and semantic web techniques. The project also aims to implement a digital asset dissemination architecture based upon web standards. The dissemination architecture will provide a mechanism to add useful "views" to a particular digital artifact (i.e. asset, schema, or metadata instance), and bind those views to consuming services.
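As a small illustration of the RDF techniques mentioned above, the following sketch uses the third-party rdflib library to describe a digital asset with Dublin Core properties and serialize the resulting graph. This is generic RDF usage, not SIMILE's own code, and the asset URI is a placeholder.

```python
# A tiny illustration of describing a digital asset as RDF triples with
# rdflib (generic usage, not SIMILE's code).
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")

g = Graph()
asset = URIRef("http://example.org/assets/42")  # placeholder asset URI
g.add((asset, DC.title, Literal("Sample dataset")))
g.add((asset, DC.creator, Literal("A. Researcher")))
g.add((asset, DC.format, Literal("text/csv")))

# Serialize the graph as Turtle; other formats (RDF/XML, N-Triples) work too.
print(g.serialize(format="turtle"))
```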

You can get a more detailed overview of the project from the SIMILE grant proposal and from other project documents.

There is a SIMILE blog and a Wiki. There are also three mailing lists.

Fedora 2.2 Released

The Fedora Project has released version 2.2 of Fedora.

From the announcement:

This is a significant release of Fedora that includes a complete repackaging of the Fedora source and binary distribution so that Fedora can now be installed as a standalone web application (.war) in any web container. This is a first step in positioning Fedora to fit within a standard "enterprise system" environment. A new installer application makes it easy to set up and run Fedora. Fedora now uses Servlet Filters for authentication.

To support digital object integrity, the Fedora repository can now be configured to calculate and store checksums for datastream content. This can be done globally or on selected datastreams. The Fedora API also provides the ability to check content integrity based on checksums.

The RDF-based Resource Index has been tuned for better performance. Also, a new high-performing triplestore, backed by Postgres, has been developed that can be plugged into the Resource Index. Fedora contains many other enhancements and bug fixes.
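The checksum feature is the announcement's most concrete integrity mechanism, and the underlying idea is simple: record a digest of each datastream at ingest and recompute it later to detect silent corruption. Here is the idea in miniature, as generic Python rather than Fedora's API:

```python
# The idea behind datastream checksums, in miniature (generic Python, not
# Fedora's API): store a digest at ingest, recompute later, flag mismatches.
import hashlib

def digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of a datastream's content."""
    return hashlib.new(algorithm, data).hexdigest()

content = b"datastream content"   # stand-in for a stored datastream
recorded = digest(content)        # checksum recorded at ingest time

# ... later, after retrieving the datastream again ...
if digest(content) != recorded:
    print("checksum mismatch: content may be corrupt")
else:
    print("integrity verified:", recorded)
```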

Notre Dame Institutional Digital Repository Phase I Final Report

The University of Notre Dame Libraries have issued a report about their year-long institutional repository pilot project. There is an abbreviated HTML version and a complete PDF version.

From the Executive Summary:

Here is the briefest of summaries regarding what we did, what we learned, and where we think future directions should go:

  1. What we did—In a nutshell we established relationships with a number of content groups across campus: the Kellogg Institute, the Institute for Latino Studies, Art History, Electrical Engineering, Computer Science, Life Science, the Nanovic Institute, the Kaneb Center, the School of Architecture, FTT (Film, Television, and Theater), the Gigot Center for Entrepreneurial Studies, the Institute for Scholarship in the Liberal Arts, the Graduate School, the University Intellectual Property Committee, the Provost’s Office, and General Counsel. Next, we collected content from many of these groups, "cataloged" it, and saved it into three different computer systems: DigiTool, ETD-db, and DSpace. Finally, we aggregated this content into a centralized cache to provide enhanced browsing, searching, and syndication services against the content.
  2. What we learned—We essentially learned four things: 1) metadata matters, 2) preservation now, not later, 3) the IDR requires dedicated people with specific skills, 4) copyright raises the largest number of questions regarding the fulfillment of the goals of the IDR.
  3. Where we are leaning in regards to recommendations—The recommendations take the form of a "Chinese menu" of options, and the options are grouped into "meals." We recommend the IDR continue and include: 1) continuing to do the Electronic Theses & Dissertations, 2) writing and implementing metadata and preservation policies and procedures, 3) taking the Excellent Undergraduate Research to the next level, and 4) continuing to implement DigiTool. There are quite a number of other options, but they may be deemed too expensive to implement.

Will Self-Archiving Cause Libraries to Cancel Journal Subscriptions?

There has been a great deal of discussion of late about the impact of self-archiving on library journal subscriptions. Obviously, this is of great interest to journal publishers, who do not want to wake up one morning, rub the sleep from their eyes, and find out over their first cup of coffee at work that libraries have en masse canceled subscriptions because a "tipping point" has been reached. Likewise, open access advocates do not want journal publishers to panic at the prospect of cancellations and try to turn back the clock on liberal self-archiving policies. So, this is not a scenario that anyone wants, except those who would like to simply scrap the existing journal publishing system and start over with a digital tabula rasa.

So, deep breath: Is the end near?

This question hinges on another: Will libraries accept any substitute for a journal that does not provide access to the full, edited, and peer-reviewed contents of that journal?

If the answer is "yes," publishers better get out their survival kits and hunker down for the digital nuclear winter or else change business practices to embrace the new reality. Attempts to fight back by rolling back the clock may just make the situation worse: the genie is out of the bottle.

If the answer is "no," preprints pose no threat, but postprints may under some difficult-to-attain circumstances.

It is unlikely that a critical mass of author-created postprints (i.e., the author makes the preprint look like the postprint) will ever emerge. Authors would have to be extremely motivated for this to occur. If you don’t believe me, take a Word file that you submitted to a publisher and make it look exactly like the published article (don’t forget the pagination, because that might be a sticking point for libraries). That leaves publisher postprints (generally PDF files).

For the worst to happen, every author of every paper published in a journal would have to self-archive the final publisher PDF file (or the publishers themselves would have to do it for the authors under mandates).

But would that be enough? Wouldn’t the permanence and stability of the digital repositories housing these postprints be of significant concern to libraries? If such repositories could not be trusted, then libraries would have to attempt to archive the postprints in question themselves; however, since postprints are not by default under copyright terms that would allow this to happen (e.g., they are not under Creative Commons Licenses), libraries may be barred from doing so. There are other issues as well: journal and issue browsing capabilities, the value-added services of indexing and abstracting services, and so on. For now, let’s wave our hands briskly and say that these are all tractable issues.

If the above problems were overcome, a significant one remains: publishers add value in many ways to scholarly articles. Would libraries let the existing system of journal publishing collapse because of self-archiving without a viable substitute for these value-added functions being in place?

There have been proposals for and experiments with overlay journals for some time, as well as other ideas for new quality control strategies, but, to date, none have caught fire. Old-fashioned peer review, copy editing and fact checking, and publisher-based journal design and production still reign, even among the vast majority of e-journals that are not published by conventional publishers. In the Internet age, nothing technological stops tens of thousands of new e-journals using open source journal management software from blooming, but they haven’t so far, have they? Rather, if you use a liberal definition of open access, there are about 2,500 OA journals—a significant achievement; however, there are questions about the longevity of such journals if they are published by small non-conventional publishers such as groups of scholars (e.g., see "Free Electronic Refereed Journals: Getting Past the Arc of Enthusiasm"). Let’s face it—producing a journal is a lot of work, even a small journal that publishes fewer than a hundred papers a year.

Bottom line: a perfect storm is not impossible, but it is unlikely.

Certifying Digital Repositories: DINI Draft

The Electronic Publishing Working Group of the Deutsche Initiative für Netzwerkinformation (DINI) has released an English draft of its DINI-Certificate Document and Publication Services 2007.

It outlines criteria for repository author support; indexing; legal aspects; long-term availability; logs and statistics; policies; security, authenticity and data integrity; and service visibility. It also provides examples.

Details on Open Repositories 2007 Talks

Details about the Open Repositories 2007 conference sessions are now available, including keynotes, poster sessions, presentations, and user groups. For DSpace, EPrints, and Fedora techies, the user group sessions look like must-attend events, with talks by luminaries such as John Ockerbloom and MacKenzie Smith. The presentation sessions include talks by Andrew Treloar, Carl Lagoze and Herbert Van de Sompel, Leslie Johnston, and Simeon Warner, among other notables. Open Repositories 2007 will be held in San Antonio, January 23-26.

Hopefully, the conference organizers plan to make streaming audio and/or video files available post-conference, but even PowerPoint files, as were made available for Open Repositories 2006, would be useful.

Results from the DSpace Community Survey

DSpace conducted an informal survey of its open source community in October 2006. Here are some highlights:

  • The vast majority of respondents (77.6%) used or planned to use DSpace for a university IR.
  • The majority of systems were in production (53.4%); pilot testing was second (35.3%).
  • Preservation and interoperability were the highest priority system features (61.2% each), followed by search engine indexing (57.8%) and open access to refereed articles (56.9%). (Percentage of respondents who rated these features "very important.") Only 5.2% thought that OA to refereed articles was unimportant.
  • The most common type of current IR content was refereed scholarly articles and theses/dissertations (55.2% each), followed by other (48.6%) and grey literature (47.4%).
  • The most popular types of content that respondents were planning to add to their IRs were datasets (53.4%), followed by audio and video (46.6% each).
  • The most frequently used type of metadata was customized Dublin Core (80.2%), followed by XML metadata (13.8%).
  • The most common update pattern was to regularly migrate to new versions; however, respondents reported that it took a "long time to merge in my customizations/configuration" (44.8%).
  • The most common types of modification were minor cosmetics (34.5%), new features (26.7%), and significant user interface customization (21.6%).
  • Only 30.2% were totally comfortable with editing/customizing DSpace; 56.9% were somewhat comfortable and 12.9% were not comfortable.
  • Plug-in use is light: for example, 11.2% use SRW/U, 8.6% use Manakin, and 5.2% use TAPIR (ETDs).
  • The most desired feature for the next version is a more easily customized user interface (17.5%), closely followed by improved modularity (16.7%).

For information about other recent institutional repository surveys, see "ARL Institutional Repositories SPEC Kit" and "MIRACLE Project’s Institutional Repository Survey."

OAI’s Object Reuse and Exchange Initiative

The Open Archives Initiative has announced its Object Reuse and Exchange (ORE) initiative:

Object Reuse and Exchange (ORE) will develop specifications that allow distributed repositories to exchange information about their constituent digital objects. These specifications will include approaches for representing digital objects and repository services that facilitate access and ingest of these representations. The specifications will enable a new generation of cross-repository services that leverage the intrinsic value of digital objects beyond the borders of hosting repositories. . . . its real importance lies in the potential for these distributed repositories and their contained objects to act as the foundation of a new digitally-based scholarly communication framework. Such a framework would permit fluid reuse, refactoring, and aggregation of scholarly digital objects and their constituent parts—including text, images, data, and software. This framework would include new forms of citation, allow the creation of virtual collections of objects regardless of their location, and facilitate new workflows that add value to scholarly objects by distributed registration, certification, peer review, and preservation services. Although scholarly communication is the motivating application, we imagine that the specifications developed by ORE may extend to other domains.

OAI-ORE is being funded by the Andrew W. Mellon Foundation for a two-year period.

Presentations from the Augmenting Interoperability across Scholarly Repositories meeting are a good source of further information about the thinking behind the initiative, as is the "Pathways: Augmenting Interoperability across Scholarly Repositories" preprint.

MIRACLE Project’s Institutional Repository Survey

The MIRACLE (Making Institutional Repositories A Collaborative Learning Environment) project at the University of Michigan’s School of Information presented a paper at JCDL 2006 titled "Nationwide Census of Institutional Repositories: Preliminary Findings."

MIRACLE’s sample population was 2,147 library directors at four-year US colleges and universities. The paper presents preliminary findings from 273 respondents.

Respondents characterized their IR activities as: "(1) implementation of an IR (IMP), (2) planning & pilot testing an IR software package (PPT), (3) planning only (PO), or (4) no planning to date (NP)."

Of the 273 respondents, "28 (10%) have characterized their IR involvement as IMP, 42 (15%) as PPT, 65 (24%) as PO, and 138 (51%) as NP."

The top-ranked benefits of having an IR were: "capturing the intellectual capital of your institution," "better service to contributors," and "longtime preservation of your institution’s digital output." The bottom-ranked benefits were "reducing user dependence on your library’s print collection," "providing maximal access to the results of publicly funded research," and "an increase in citation counts to your institution’s intellectual output."

On the question of IR staffing, the survey found:

Generally, PPT and PO decision-makers envision the library sharing operational responsibility for an IR. Decision-makers from institutions with full-fledged operational IRs choose responses that show library staff bearing the burden of responsibility for the IR.

Of those with operational IRs who identified their IR software, the survey found that they were using: "(1) 9 for DSpace, (2) 5 for bePress, (3) 4 for ProQuest’s Digital Commons, (4) 2 for local solutions, and (5) 1 each for Ex Libris’ DigiTools and Virginia Tech’s ETD." Of those who were pilot testing software: "(1) 17 for DSpace, (2) 9 for OCLC’s ContentDM, (3) 5 for Fedora, (4) 3 each for bePress, DigiTool, ePrints, and Greenstone, (5) 2 each for Innovative Interfaces, Luna, and ETD, and (6) 1 each for Digital Commons, Encompass, a local solution, and Opus."

In terms of the number of documents in the IRs, by far the largest percentages were for fewer than 501 documents (IMP, 41%; and PPT, 67%).

The preliminary results also cover other topics, such as content recruitment, investigative decision-making activities, IR costs, and IR system features.

It is interesting to see how these preliminary results compare to those of the ARL Institutional Repositories SPEC Kit. For example, when asked "What are the top three benefits you feel your IR provides?," the ARL survey respondents said:

  1. Enhance visibility and increase dissemination of institution’s scholarship: 68%
  2. Free, open, timely access to scholarship: 46%
  3. Preservation of and long-term access to institution’s scholarship: 36%
  4. Preservation and stewardship of digital content: 36%
  5. Collecting, organizing assets in a central location: 24%
  6. Educate faculty about copyright, open access, scholarly communication: 8%

ARL Institutional Repositories SPEC Kit

The Institutional Repositories SPEC Kit is now available from the Association of Research Libraries (ARL). This document presents the results of a thirty-eight-question survey of 123 ARL members in early 2006 about their institutional repositories practices and plans. The survey response rate was 71% (87 out of 123 ARL members responded). The front matter and nine-page Executive Summary are freely available. The document also presents detailed question-by-question results, a list of respondent institutions, representative documents from institutions, and a bibliography. It is 176 pages long.

Here is the bibliographic information: University of Houston Libraries Institutional Repository Task Force. Institutional Repositories. SPEC Kit 292. Washington, DC: Association of Research Libraries, 2006. ISBN: 1-59407-708-8.

The members of the University of Houston Libraries Institutional Repository Task Force who authored the document were Charles W. Bailey, Jr. (Chair); Karen Coombs; Jill Emery (now at UT Austin); Anne Mitchell; Chris Morris; Spencer Simons; and Robert Wright.

The creation of a SPEC Kit is a highly collaborative process. SPEC Kit Editor Lee Anne George and other ARL staff worked with the authors to refine the survey questions, mounted the Web survey, analyzed the data in SPSS, created a preliminary summary of survey question responses, and edited and formatted the final document. Given the amount of data that the survey generated, this was no small task. The authors would like to thank the ARL team for their hard work on the SPEC Kit.

Although the Executive Summary is much longer than the typical one (over 5,100 words vs. about 1,500 words), it should not be mistaken for a highly analytic research article. Its goal was to try to describe the survey’s main findings, which was quite challenging given the amount of survey data available. The full data is available in the "Survey Questions and Responses" section of the SPEC Kit.

Here are some quick survey results:

  • Thirty-seven ARL institutions (43% of respondents) had an operational IR (we called these respondents implementers), 31 (35%) were planning one by 2007, and 19 (22%) had no IR plans.
  • Looked at from the perspective of all 123 ARL members, 30% had an operational IR and, by 2007, that figure may reach 55%.
  • The mean cost of IR implementation was $182,550.
  • The mean annual IR operation cost was $113,543.
  • Most implementers did not have a dedicated budget for either start-up costs (56%) or ongoing operations (52%).
  • The vast majority of implementers identified first-level IR support units that had a library reporting line rather than a campus IT or other campus unit reporting line.
  • DSpace was by far the most commonly used system: 20 implementers used it exclusively and 3 used it in combination with other systems.
  • ProQuest DigitalCommons (or the Bepress software on which it is based) was the second choice of implementers: 7 implementers used this system.
  • While 28% of implementers have made no IR software modifications to enhance its functionality, 22% have made frequent changes to do so and 17% have made major modifications to the software.
  • Forty-one percent of implementers had no review of deposited documents. While review by designated departmental or unit officials was the most common method (35%), IR staff reviewed documents 21% of the time.
  • In a check all that apply question, 60% of implementers said that IR staff entered simple metadata for authorized users and 57% said that they enhanced such data. Thirty-one percent said that they cataloged IR materials completely using local standards.
  • In another check all that apply question, implementers clearly indicated that IR and library staff use a variety of strategies to recruit content: 83% made presentations to faculty and others, 78% identified and encouraged likely depositors, 78% had library subject specialists act as advocates, 64% offered to deposit materials for authors, and 50% offered to digitize materials and deposit them.
  • The most common digital preservation arrangement for implementers (47%) was to accept any file type, but only preserve specified file types using data migration and other techniques. The next most common arrangement (26%) was to accept and preserve any file type.
  • The mean number of digital objects in implementers’ IRs was 3,844.

ARL Institutional Repositories, Version 2

The Association of Research Libraries (ARL) currently has 123 member libraries in the US and Canada. Below is an update of an earlier list of operational institutional repositories at ARL libraries.

More on How Can Scholars Retain Copyright Rights?

Peter Suber has made the following comment on Open Access News about "How Can Scholars Retain Copyright Rights?":

This is a good introduction to the options. I’d only make two additions.

  1. Authors needn’t retain full copyright in order to provide OA to their own work. They only need to retain the right of OA archiving—which, BTW, about 70% of journals already give to authors in the copyright transfer agreement.
  2. Charles mentions the author addenda from SPARC and Science Commons, but there’s also one from MIT.

Peter is right on both points; however, my document has a broader rights retention focus than providing OA to scholars’ work, although that is an important aspect of it.

For example, there is a difference between simply making an article available on the Internet and making it available under a Creative Commons Attribution-NonCommercial 2.5 License. The former allows the user to freely read, download, and print the article for personal use. The latter allows users to make any noncommercial use of the article without permission as long as proper attribution is made, including creating derivative works. So professor X could print professor Y’s article and distribute it in class without permission and without worrying about fair use considerations. (Peter, of course, understands these distinctions, and he is just trying to make sure that authors understand that they don’t have to do anything but sign agreements that grant them appropriate self-archiving rights in order to provide OA access to their articles.)

I considered the MIT addendum but thought it might be too institution-specific. On closer reading, however, it could be used without alteration.