Dealing with Data: Roles, Rights, Responsibilities and Relationships

JISC has released its Dealing with Data: Roles, Rights, Responsibilities and Relationships: Consultancy Report, which was written as part of its Digital Repositories Programme’s Data Cluster Consultancy.

Here’s an excerpt from the Executive Summary:

This Report explores the roles, rights, responsibilities and relationships of institutions, data centres and other key stakeholders who work with data. It concentrates primarily on the UK scene with some reference to other relevant experience and opinion, and is framed as "a snapshot" of a relatively fast-moving field. . . .

The Report is largely based on two methodological approaches: a consultation workshop and a number of semi-structured interviews with stakeholder representatives.

It is set within the context of the burgeoning "data deluge" emanating from e-Science applications, increasing momentum behind open access policy drivers for data, and developments to define requirements for a co-ordinated e-infrastructure for the UK. The diversity and complexity of data are acknowledged, and developing typologies are referenced.

Council of Australian University Librarians ETD Survey Report

The Council of Australian University Librarians has released Australasian Digital Theses Program: Membership Survey 2006.

Here’s an excerpt from the "Key Findings" section:

1. The average percentage of records for digital theses added to ADT is 95% when digital submission is mandatory and 17% when it is not mandatory. . . .

2. 59% of respondents will have mandatory digital submission in place in 2007.

3. With this level of mandatory submission it is predicted that 60% of all theses produced in Australia and New Zealand in 2007 will have a digital copy recorded in ADT. . . .

5. The overwhelming majority of respondents offer a mediated submission service, either only having a mediated service or offering both mediated and self-submission services. When mediated and self-submission are both available, the percentage self-submitted is polarised with some achieving over a 75% self-submission rate.

6. Over half the respondents have a repository already and most are using it to manage digital theses.

7. 87% will have a repository by the end of this year, and the rest are in the initial planning stage.
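
As a rough check on how findings 1 and 2 combine, the sketch below weights the two submission rates by the share of respondents with and without a mandate. It assumes every respondent produces the same number of theses, which the survey excerpt does not state, and the variable names are illustrative only.

```python
# Figures taken from findings 1 and 2 in the excerpt above.
rate_mandatory = 0.95    # share of theses recorded in ADT when submission is mandatory
rate_voluntary = 0.17    # share recorded when submission is not mandatory
share_mandatory = 0.59   # share of respondents with mandatory submission in 2007

# Assumption (not from the report): every institution produces the same
# number of theses, so respondent share can stand in for thesis share.
expected_coverage = (
    share_mandatory * rate_mandatory
    + (1 - share_mandatory) * rate_voluntary
)

print(f"Estimated ADT coverage: {expected_coverage:.1%}")
```

Under that equal-weighting assumption the estimate comes out at roughly 63%, close to but not identical to the 60% predicted in finding 3, which may rest on a different weighting.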

CIC’s Digitization Contract with Google

Library Journal Academic Newswire has published a must-read article ("Questions Emerge as Terms of the CIC/Google Deal Become Public") about the Committee on Institutional Cooperation’s Google Book Search Library Project contract.

The article includes quotes from Peter Brantley, Digital Library Federation Executive Director, from his "Monetizing Libraries" posting about the contract (another must-read piece).

Here’s an excerpt from Brantley’s posting:

In other words—pretty much, unless Google ceases business operations, or there is a legal ruling or agreement with publishers that expressly permits these institutions (excepting Michigan and Wisconsin which have contracts of precedence) to receive digitized copies of In-Copyright material, it will be held in escrow until such time as it becomes public domain.

That could be a long wait. . . .

In an article early this year in The New Yorker, "Google’s Moon Shot," Jeffrey Toobin discusses possible outcomes of the antagonism this project has generated between Google and publishers. Paramount among them, in his mind, is a settlement. . . .

A settlement between Google and publishers would create a barrier to entry in part because the current litigation would not be resolved through court decision; any new entrant would be faced with the unresolved legal issues and required to re-enter the settlement process on their own terms. That, beyond the costs of mass digitization itself, is likely to deter almost any other actor in the market.

Report on Chemistry Teaching/Research Data and Institutional Repositories

The JISC-funded SPECTRa project has released Project SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data): JISC Final Report, March 2007.

Here’s an excerpt from the Executive Summary:

Project SPECTRa’s principal aim was to facilitate the high-volume ingest and subsequent reuse of experimental data via institutional repositories, using the DSpace platform, by developing Open Source software tools which could easily be incorporated within chemists’ workflows. It focussed on three distinct areas of chemistry research—synthetic organic chemistry, crystallography and computational chemistry.

SPECTRa was funded by JISC’s Digital Repositories Programme as a joint project between the libraries and chemistry departments of the University of Cambridge and Imperial College London, in collaboration with the eBank UK project. . . .

Surveys of chemists at Imperial and Cambridge investigated their current use of computers and the Internet and identified specific data needs. The survey’s main conclusions were:

  • Much data is not stored electronically (e.g. lab books, paper copies of spectra)
  • A complex list of data file formats (particularly proprietary binary formats) being used
  • A significant ignorance of digital repositories
  • A requirement for restricted access to deposited experimental data

Distributable software tool development using Open Source code was undertaken to facilitate deposition into a repository, guided by interviews with key researchers. The project has provided tools which allow for the preservation aspects of data reuse. All legacy chemical file formats are converted to the appropriate Chemical Markup Language scheme to enable automatic data validation, metadata creation and long-term preservation needs. . . .

The deposition process adopted the concept of an "embargo repository" allowing unpublished or commercially sensitive material, identified through metadata, to be retained in a closed access environment until the data owner approved its release. . . .

Among the project’s findings were the following:

  • it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organisational capability of digital repositories;
  • scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;
  • the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;
  • institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;
  • IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.
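
The "embargo repository" idea described in the excerpt can be pictured with a minimal sketch. This is not SPECTRa's implementation; the `Deposit` class, its field names, and the sample identifiers are hypothetical and only illustrate material being held in closed access until the data owner approves release.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    identifier: str
    embargoed: bool = False       # hypothetical flag, set from metadata at deposit time
    owner_approved: bool = False  # hypothetical flag, set when the data owner approves release

    def is_public(self) -> bool:
        # Open material is visible at once; embargoed material
        # stays closed until the owner approves its release.
        return (not self.embargoed) or self.owner_approved


def public_view(deposits):
    """Return only the deposits that may be exposed under open access."""
    return [d for d in deposits if d.is_public()]


deposits = [
    Deposit("spectrum-001"),
    Deposit("crystal-002", embargoed=True),
]
print([d.identifier for d in public_view(deposits)])

deposits[1].owner_approved = True  # the owner releases the embargoed data set
print([d.identifier for d in public_view(deposits)])
```

Keeping the embargo decision in metadata, rather than in a separate store, mirrors the excerpt's description of sensitive material being "identified through metadata".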

NIH Public Access Policy Mandate Needs Immediate Support

The Alliance for Taxpayer Access has issued an action alert regarding a change in the NIH Public Access Policy that would mandate deposit of articles resulting from NIH-funded research. Peter Suber has discussed this issue in relation to a call by ACRL for an NIH mandate.

Here is the alert:

The NIH Public Access Policy is currently under consideration by Congress, as part of the larger FY08 Labor/HHS, Education, and Related Agencies Appropriations Bill. The House is expected to mark up the FY08 Labor/HHS Appropriations Bill on Thursday, June 7th.

Please take action now to express your support for a shift to a mandatory policy. Fax your House Representative a letter as soon as possible.

Visit http://www.house.gov for contact information. Constituents of the House Appropriations Labor/HHS Subcommittee are especially encouraged to write. (http://appropriations.house.gov/Subcommittees/sub_lhhse.shtml)

For talking points and background on the NIH Public Access Policy and recent legislative measures, please see the ATA Web site at http://www.taxpayeraccess.org/nih.html.

NIH Policy Status

The House is expected to mark up the FY08 Labor/HHS Appropriations Bill within the week. The bill will then move to the full Appropriations committee. Please stand by for an announcement about House activities from the Alliance for Taxpayer Access in the coming days.

The Senate Appropriations Committee—Labor/HHS Subcommittee is expected to review their versions of appropriations bills later this month.

Google Library Project Adds Committee on Institutional Cooperation (CIC)

The Google Book Search Library Project has an important new participant—the Committee on Institutional Cooperation (CIC). The CIC members are the University of Chicago, the University of Illinois, Indiana University, the University of Iowa, the University of Michigan, Michigan State University, the University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, and the University of Wisconsin-Madison. As many as 10 million volumes will be digitized from the collections of these major research libraries.

Here’s an excerpt from the CIC press release:

This partnership between our 12 member universities and Google is unprecedented. What makes this work so exciting is that we will literally open the pages of millions of books that have been assembled on our library shelves over more than a century. In literally seconds, we’ll be able to browse across the content of thousands of volumes, searching for words or phrases, and making links across those texts that would have taken weeks or months or years of dedicated and scrupulous analysis. It is an extraordinary effort, blending the efforts and aspirations of librarians, university administrators, and scholars from across 12 world-class research universities. And our corporate partner possesses unparalleled expertise in creating and opening the digital world to coherent and comprehensive searching.

The effort is not entirely without controversy—no great undertaking ever is. But our universities believe strongly in the power of information to change the world, and in preserving, protecting and extending access to information. We have carefully weighed and considered the intellectual property issues and believe that our effort is firmly within the guidelines of current copyright law, while providing some flexibility as those laws are tested in the new digital environment in the coming years.

Repositories as Platforms for Researchers e-Portfolios Podcast

The Australian Partnership for Sustainable Repositories (APSR) has made a podcast of Susan Gibbons’s "Repositories as Platforms for Researchers e-Portfolios" presentation at the Adaptable Repository workshop at the University of Sydney.

Powerpoints from the workshop’s presentations are also available.

Happy Birthday Open Access News!

Open Access News is five today. OAN’s indefatigable primary author Peter Suber has written over 10,800 OAN postings during this period. Going further back to 2001, he has written 109 issues of the SPARC Open Access Newsletter (formerly called the Free Online Scholarship Newsletter) as well as important papers on open access.

Thanks, Peter. The open access movement owes you a huge debt of gratitude for this fine work.

The REMAP Project: Record Management and Preservation in Digital Repositories

The REMAP Project at the University of Hull has been funded by JISC to investigate how record management and digital preservation functions can be best supported in digital repositories. It utilizes the Fedora system.

Here’s an excerpt from the Project Aims page (I have added the links in this excerpt):

The REMAP project has the following aims:

  • To develop Records Management and Digital Preservation (RMDP) workflow(s) in order to understand how a digital repository can support these activities
  • To embed digital repository interaction within working practices for RMDP purposes
  • To further develop the use of a WSBPEL orchestration tool to work with external Web services, including the PRONOM Web services, to provide appropriate metadata and file information for RMDP
  • To develop and test a notification layer that can interact with the orchestration tool and allow RSS syndication to individuals alerting them to RMDP tasks
  • To develop and test an intermediate persistence layer to underpin the notification layer and interact with the WSBPEL orchestration tool to allow orchestrated workflows to take place over time
  • To test and validate the use of the enhanced WSBPEL tool with institutional staff involved in RMDP activities
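
To make the RSS notification aim more concrete, here is a minimal sketch of how pending tasks could be serialised as an RSS 2.0 feed. It is not REMAP code: the function name, the task dictionary layout, and the example.org address are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_task_feed(feed_title, feed_link, tasks):
    """Serialise a list of task dicts as an RSS 2.0 document (hypothetical helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Pending RMDP tasks"
    for task in tasks:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = task["title"]
        ET.SubElement(item, "description").text = task["description"]
    return ET.tostring(rss, encoding="unicode")


feed = build_task_feed(
    "RMDP tasks for a records manager",
    "https://repository.example.org/tasks",  # placeholder address
    [{"title": "Review retention period", "description": "Object due for review"}],
)
print(feed)
```

A per-person feed like this would let staff follow their tasks in any feed reader, which seems to be the point of syndicating the alerts rather than emailing them.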

SWORD (Simple Web-service Offering Repository Deposit) Project

Led by UKOLN, the JISC SWORD (Simple Web-service Offering Repository Deposit) Project is developing "a prototype ‘smart deposit’ tool" to "facilitate easier and more effective population of repositories."

Here’s an excerpt from the project plan:

The effective and efficient population of repositories is a key concern for the repositories community. Deposit is a crucial step in the repository workflow; without it a repository has no content and can fulfill no further function. Currently most repositories exist in a fairly linear context, accepting deposits from a single interface and putting them into a single repository. Further deployment of repositories, encouraged by JISC and other funders, means that this situation is changing and we are beginning to see an increasingly complex and dynamic ecology of interactions between repositories and other services and systems. By and large developers are not creating repository systems and software from scratch, rather they are considering how repositories interface with other applications within institutions and the wider information landscape. A single repository, or multiple repositories, might interact with other components, such as VLEs, authoring tools, packaging tools, name authority services, classification services and research systems. In terms of content, resources may be deposited in a repository by both human and software agents, e.g. packaging tools that push content into repositories or a drag-and-drop desktop tool. The type of resource being deposited will also influence the choice of deposit mechanism. If the resources are complex packaged objects then a web service will need to support the ingest of multiple packaging standards.

There is currently no standard mechanism for accepting content into repositories, yet there already exists a stable and widely implemented service for harvesting metadata from repositories (OAI-PMH—Open Archives Initiative Protocol for Metadata Harvesting). This project will implement a similarly open protocol or specification for deposit. By taking a similar approach, the project and the resulting protocol and implementations will gain easier acceptance by a community already familiar with the OAI-PMH.

This project aims to develop a Simple Web-service Offering Repository Deposit (SWORD)—a lightweight deposit protocol that will be implemented as a simple web service within EPrints, DSpace, Fedora and IntraLibrary and tested against a prototype ‘smart deposit’ tool. The project plans to take forward the lightweight protocol originally formulated by a small group working within the Digital Repositories Programme (the ‘Deposit API’ work). The project is aligned with the Object Reuse and Exchange (ORE) Mellon-funded two-year project by the Open Archives Initiative, which commenced in October 2006. Members of the SWORD project team are represented on its Technical and Liaison Committees. . . . The SWORD project is not attempting to duplicate work being done by ORE, but seeks to build on existing work to support UK-specific requirements whilst feeding into the ongoing ORE project.

Position Papers from the NSF/JISC Repositories Workshop

Position papers from the NSF/JISC Repositories Workshop are now available.

Here’s an excerpt from the Workshop’s Welcome and Themes page:

Here is some background information. A series of recent studies and reports have highlighted the ever-growing importance for all academic fields of data and information in digital formats. Studies have looked at digital information in science and in the humanities; at the role of data in Cyberinfrastructure; at repositories for large-scale digital libraries; and at the challenges of archiving and preservation of digital information. The goal of this workshop is to unite these separate studies. The NSF and JISC share two principal objectives: to develop a road map for research over the next ten years and what to support in the near term.

Here are the position papers:

Friday’s OAI5 Presentations

Presentations from Friday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • Doctoral e-Theses; Experiences in Harvesting on a National and European Level (PowerPoint): "In the presentation we will show some lessons learned and the first results of the Demonstrator, an interoperable portal of European doctoral e-theses in five countries: Denmark, Germany, the Netherlands, Sweden and the UK."
  • Exploring Overlay Journals: The RIOJA project (PowerPoint): "This presentation introduces the RIOJA (Repository Interface to Overlaid Journal Archives) project, on which a group of cosmology researchers from the UK is working with UCL Library Services and Cornell University. The project is creating a tool to support the overlay of journals onto repositories, and will demonstrate a cosmology journal overlaid on top of arXiv."
  • Dissemination or Publication? Some Consequences from Smudging the Boundaries between Research Data and Research Papers (PDF): "Project StORe’s repository middleware will enable researchers to move seamlessly between the research data environment and its outputs, passing directly from an electronic article to the data from which it was developed, or linking instantly to all the publications that have resulted from a particular research dataset."
  • Open Archives, The Expectations of the Scientific Communities (RealVideo): "This analysis led the French CNRS to start the Hal project, a pluridisciplinary open archive strongly inspired by ArXiv, and directly connected to it. Hal actually automatically transfers data and documents to ArXiv for the relevant disciplines; similarly, it is connected to Pub Med and Pub Med Central for life sciences. Hal is customizable so that institutions can build their own portal within Hal, which then plays the role of an institutional archive (examples are INRIA, INSERM, ENS Lyon, and others)."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007).

Thursday’s OAI5 Presentations

Presentations from Thursday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • Business Models for Digital Repositories (PowerPoint): "Those setting up, or planning to set up, a digital repository may be interested to know more about what has gone before them. What is involved, what is the cost, how many people are needed, how have others made the case to their institution, and how do you get anything into it once it is built? I have recently undertaken a study of European repository business models for the DRIVER project and will present an overview of the findings."
  • DRIVER: Building a Sustainable Infrastructure of European Scientific Repositories (PowerPoint): "Ten partners from eight countries have entered into an international partnership, to connect and network as a first step more than 50 physically distributed institutional repositories to one, large-scale, virtual Knowledge Base of European research."
  • On the Golden Road: Open Access Publishing in Particle Physics (RealVideo): "A working party works now to bring together funding agencies, laboratories and libraries into a single consortium, called SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics). This consortium will engage with publishers towards building a sustainable model for open access publishing. In this model, subscription fees from multiple institutions are replaced with contracts with publishers of open access journals where the SCOAP3 consortium is a single financial partner."
  • Open Access Forever—Or Five Years, Whichever Comes First: Progress on Preserving the Digital Scholarly Record (RealVideo): "The current state of the curation and preservation of digital scholarship over its entire lifecycle will be reviewed, and progress on problems of specific interest to scholarly communication will be examined. The difficulty of curating the digital scholarly record and preserving it for future generations has important implications for the movement to make that record more open and accessible to the world, so this is a timely topic for those who are interested in the future of scholarly communication."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007).

OpenDOAR API

The OpenDOAR project has announced the availability of an API for accessing digital repository data in their database.

Here’s an excerpt from the press release:

OpenDOAR, as a SHERPA project, is pleased to announce the release of an API that lets developers use OpenDOAR data in their applications. It is a machine-to-machine interface that can run a wide variety of queries against the OpenDOAR Database and get back XML data. Developers can choose to receive just repository titles & URLs, all the available OpenDOAR data, or intermediate levels of detail. They can then incorporate the output into their own applications and ‘mash-ups’, or use it to control processes such as OAI-PMH harvesting. . . .

OpenDOAR is a continuing project hosted at the University of Nottingham under the SHERPA Partnership. OpenDOAR maintains and builds on a quality-assured list of the world’s Open Access Repositories. OpenDOAR acts as a bridge between repository administrators and the service providers who make use of information held in repositories to offer search and other services to researchers and scholars worldwide.

A key feature of OpenDOAR is that all of the repositories we list have been visited by project staff, tested and assessed by hand. We currently decline about a quarter of candidate sites as being broken, empty, out of scope, etc. This gives a far higher quality assurance to the listings we hold than results gathered by just automatic harvesting. OpenDOAR has now surveyed over 1,100 repositories, producing a classified Directory of over 800 freely available archives of academic information.
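
As an illustration of the use the press release mentions, feeding directory data into OAI-PMH harvesting, the sketch below parses an XML listing and collects harvest targets. The element names (`repository`, `name`, `oaiBaseUrl`) and the sample document are invented for this example and are not taken from the OpenDOAR API; the real response format should be checked against OpenDOAR's own documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: element names are assumptions, not OpenDOAR's schema.
SAMPLE_RESPONSE = """
<repositories>
  <repository>
    <name>Example University Repository</name>
    <oaiBaseUrl>https://repository.example.org/oai</oaiBaseUrl>
  </repository>
  <repository>
    <name>Example Archive Without OAI</name>
  </repository>
</repositories>
"""


def harvest_targets(xml_text):
    """Return (name, OAI base URL) pairs for repositories that list a base URL."""
    root = ET.fromstring(xml_text.strip())
    targets = []
    for repo in root.findall("repository"):
        base_url = repo.findtext("oaiBaseUrl")
        if base_url:  # skip entries with no harvesting endpoint
            targets.append((repo.findtext("name"), base_url))
    return targets


for name, base_url in harvest_targets(SAMPLE_RESPONSE):
    print(name, base_url)
```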

Wednesday’s OAI5 Presentations

Presentations from Wednesday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • MESUR: Metrics from Scholarly Usage of Resources (PowerPoint): "The two-year MESUR project, funded by the Andrew W. Mellon Foundation, aims to define and validate a range of usage-based impact metrics, and issue guidelines with regards to their characteristics and proper application. The MESUR project is constructing a large-scale semantic model of the scholarly community that seamlessly integrates a wide range of bibliographic, citation and usage data."
  • OAI Object Re-Use and Exchange (PowerPoint): "In this presentation, we will give an overview of the current activities, including: defining the problem of compound documents within the web architecture, enumerating and exploring several use cases, and identifying likely adopters of OAI-ORE."
  • OpenDOAR Policy Tools and Applications (RealVideo): "OpenDOAR has developed a set of policy generator tools for repository administrators and is contacting administrators to advocate policy development."
  • State of OAI-PMH (PowerPoint): "The OAI-PMH was released in 2001 and stabilized at v2.0 in 2002. Since then there has been steady growth in adoption of the protocol. Support for the OAI-PMH is assumed for base-level interoperability between institutional repositories, and is also provided for many other collections of scholarly material. I will review the current landscape and reflect on some milestones and issues."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007).
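
For readers unfamiliar with the OAI-PMH discussed in the last highlight, a harvesting request is an HTTP request whose query string names a verb and its arguments. The sketch below builds a `ListRecords` request URL using the protocol's standard `verb`, `metadataPrefix`, `from`, and `set` arguments; the base URL is a placeholder and the helper function is hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; substitute a real repository's OAI-PMH base URL.
print(list_records_url("https://repository.example.org/oai", from_date="2007-01-01"))
```

The `oai_dc` prefix (unqualified Dublin Core) is used as the default because the protocol requires every repository to support it.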

The Depot: A UK Digital Repository

The JISC Repositories and Preservation program has established the Depot, so that researchers who do not have an institutional repository can deposit digital postprints and other digital objects.

Here’s an excerpt from the press release:

The general strategy being adopted in the UK is that every university should develop and establish its own institutional repository (IR), as part of a comprehensive ‘JISC RepositoryNet’. Many researchers can already make use of the IRs set up in their institution, but that is not (yet) the case for all. A key purpose for The Depot is to bridge that gap during the period before all have such provision, and to provide a deposit facility that will enable all UK researchers to expose their publications to readers under terms of Open Access.

The Depot will also have a re-direct function to link researchers to the appropriate home pages of their own institutional repositories. The end result should be more content in repositories, making it easier for researchers and policy makers to have peer-reviewed research results exposed to wider readership under Open Access. . . .

The principal focus for The Depot is the deposit of post-prints, digital versions of published journal articles and similar items. There are plans to include links to places for depositing other digital materials, such as research datasets and learning materials. As indicated, The Depot helps provide a level playing field for all UK researchers and their institutions, especially when deposit under Open Access is required by grant funding bodies. It may also become a useful facility for institutions as they implement and manage their own repositories, helping to promote the habit of deposit among staff, with the simple message, ‘put it in the depot’.

The Depot is based on EPrints software and is compliant with the Open Archives Initiative (OAI), which promotes standards for repository interoperability. Its contents will be harvested and searched through the Intute Repository Search project. It offers a redirect service, UK Repository Junction, to ensure that content that comes within the remit of an extant repository is correctly placed there instead of in The Depot.

Additionally, as IRs are created, The Depot will offer a transfer service for content deposited by authors based at those universities, to help populate the new IRs. The Depot will therefore act as a ‘keepsafe’ until a repository of choice becomes available for deposited scholarly content. In this way, The Depot will avoid competing with extant and emerging IRs while bridging gaps in the overall repository landscape and encouraging more open access deposits.

A Depot FAQ is available.

OpenLOCKSS Project

Led by the University of Glasgow Library, the new JISC-funded OpenLOCKSS project will preserve selected UK open access publications.

Here’s an excerpt from the project proposal:

Although LOCKSS has initially concentrated on negotiations with society and commercial publishers, there has always been an interest in smaller open-access journals, as evidenced by the LOCKSS Humanities Project, where twelve major US libraries have collaborated to contact more than fifty predominantly North American open access journal titles, enabling them to be preserved within the LOCKSS system. . . .

At present, much open access content is under threat, and is difficult to preserve for posterity under standard arrangements, at least until the British Library, and the other UK national libraries, are able to take a more proactive and comprehensive stance in preserving websites comprising UK output. Many open access journals are small operations, often dependent on one or two enthusiastic editors, often based in university departments and/or small societies, concerned with producing the next issue, and often with very little interest in or knowledge of preservation considerations. Their long term survival beyond the first few issues can often be in doubt, but their content, where appropriate quality controls have been applied, is worthy of preservation.

LOCKSS is an ideal low-cost mechanism for ensuring preservation, provided that appropriate contacts can be made and plug-in developments completed, and sufficient libraries agree to host content, on the Humanities Project model. . . .

Earlier in 2006, a survey was carried out by the LOCKSS Pilot Project, to discover preferences for commercial/society publishers to approach with a view to participating in LOCKSS, and Content Complete Ltd have been undertaking this work, as well as negotiating with the NESLi2 publishers on their LOCKSS participation. . . .

We propose to consider initially the titles with at least six votes (it may not be appropriate to approach all these titles, for example we shall check that all are currently publishing and confirm that they appear to be of appropriate quality), followed by those with five or four votes. We propose that agreements for LOCKSS participation are concluded with at least twelve titles, with fifteen as a likely upper limit.

Repository 66: OA Digital Repository Map Mashup

Stuart Lewis of the University of Wales Aberystwyth has created a Google Map mashup called Repository 66 that shows worldwide open access digital repositories using data from ROAR and OpenDOAR. (Route 66 was a famous highway in the US.)

Dr. John Hoey Joins the Scholarly Exchange Board

Julian Fisher, Managing Director of the Scholarly Exchange, has announced that Dr. John Hoey has joined the Scholarly Exchange Board.

Here’s an excerpt from the SPARC-OAForum announcement:

Dr. Hoey is the former editor-in-chief of the Canadian Medical Association Journal and long an advocate of open access publishing. A specialist in community medicine and internal medicine, he is Professor of Medicine (adjunct) at Queen’s University and a Special Advisor to the Principal on Public Health.

Scholarly Exchange, Inc. has eliminated a major obstacle in starting open access journals by providing a free and fully supported e-publishing platform. Combining Open Journal Systems public-domain software with complete hosting and support, this service offers scholars unrivaled freedom and flexibility to produce academic journals at a price that fosters the open access model. It also develops tools and methods to promote and support open access journals.

Report About Users’ Digital Repository Needs at the University of Hull

The RepoMMan Project at the University of Hull has published The RepoMMan User Needs Analysis report.

Here’s an excerpt from the JISC-REPOSITORIES announcement:

The document covers the repository needs of users in the research, learning & teaching, and administration areas. Whilst based primarily on needs expressed in interviews at the University of Hull the document is potentially of wider applicability, drawing from an on-line survey of researchers elsewhere and a survey of the L&T community undertaken by the CD-LOR Project.