Repository Presentations from the DataShare Project

The DataShare project has released two recent presentations about its activities: "Data Documentation Initiative (DDI)" and "Guidelines and Tools for Repository Planning and Assessment." A recent briefing paper, The Data Documentation Initiative (DDI) and Institutional Repositories, is also available.

Here's a description of the DataShare project from its home page:

DISC-UK DataShare, led by EDINA, arises from an existing UK consortium of data support professionals working in departments and academic libraries in universities (Data Information Specialists Committee-UK), and builds on an international network with a tradition of data sharing and data archiving dating back to the 1960s in the social sciences. By working together across four universities and internally with colleagues already engaged in managing open access repositories for e-prints, this partnership will introduce and test a new model of data sharing and archiving to UK research institutions. By supporting academics within the four partner institutions who wish to share datasets on which written research outputs are based, this network of institution-based data repositories develops a niche model for deposit of 'orphaned datasets' currently filled neither by centralised subject-domain data archives/centres/grids nor by e-print based institutional repositories (IRs).

DISC-UK Report on Web 2.0 Data Visualization Tools

JISC has released DISC-UK DataShare: Web 2.0 Data Visualisation Tools: Part 1—Numeric Data.

Here's an excerpt from the "Introduction":

Part 1 of this briefing paper will highlight some examples of new collaborative web services using Web 2.0 technologies which venture into the numeric data visualisation arena. These mashups allow researchers to upload and analyse their own data in ‘open’ and dynamic environments. Broadly speaking the numeric data being referred to could be micro-data (data about the individual), macro-data or country-level data, derived or summary data.

Stewardship of Digital Research Data: A Framework of Principles and Guidelines

The Research Information Network (RIN) has published Stewardship of Digital Research Data: A Framework of Principles and Guidelines: Responsibilities of Research Institutions and Funders, Data Managers, Learned Societies and Publishers.

Here's an excerpt from the Web page describing the document:

Research data are an increasingly important and expensive output of the scholarly research process, across all disciplines. . . . But we shall realise the value of data only if we move beyond research policies, practices and support systems developed in a different era. We need new approaches to managing and providing access to research data.

In order to address these issues, the RIN established a group to produce a framework of key principles and guidelines, and we consulted on a draft document in 2007. The framework is founded on the fundamental policy objective that ideas and knowledge, including data, derived from publicly-funded research should be made available for public use, interrogation, and scrutiny, as widely, rapidly and effectively as practicable. . . .

The framework is structured around five broad principles which provide a guide to the development of policy and practice for a range of key players: universities, research institutions, libraries and other information providers, publishers, and research funders as well as researchers themselves. Each of these principles serves as a basis for a series of questions which serve a practical purpose by pointing to how the various players might address the challenges of effective data stewardship.

Towards the Australian Data Commons: A Proposal for an Australian National Data Service

The Australian eResearch Infrastructure Council has released Towards the Australian Data Commons: A Proposal for an Australian National Data Service.

Here's an excerpt from the "Overview":

This paper is designed to encourage, inform and ultimately summarise the discussions around the appropriate strategic and technical descriptions of the Australian National Data Service; to fill in the outline in the Platforms for Collaboration investment plan.

To do so, the paper:

  • introduces the Australian National Data Service (ANDS) and the driving forces behind its creation;
  • provides a rationale for the services that ANDS will provide, and the programs through which the services will be offered; and
  • describes in detail the ANDS programs.

Part One (Background) provides a brief summary of the reasons to focus on data management, as well as an overview of ANDS, and identifies some issues associated with implementation.

Part Two (Rationale) sets out the systemic issues associated with achieving a research data commons, and provides the resultant rationale for the services that ANDS will offer and the programs through which they will be delivered.

Part Three (Detailed Descriptions of ANDS Programs) sets out in detail the Aim, Focus, Service Beneficiaries, Products and Community Engagement activities for each of the ANDS Programs.

Digital Library Federation Forum for NSF DataNet Grant Proposals

The Digital Library Federation has established a forum for those who want to collaborate or get further information about the NSF's Sustainable Digital Data Preservation and Access Network Partners (DataNet) grant program. Participation in the forum is open, but registration is required.

Podcasts about the Long-Term Use of Research Data

The Australian Partnership for Sustainable Repositories has released MP3 and PDF files from its Long-lived Collections: The Future of Australia's Research Data Presentations symposium.

Here are selected MP3 files:

NSF Solicits Grant Proposals for up to $20 Million for Dataset Access and Preservation

The National Science Foundation's Office of Cyberinfrastructure has announced the availability of grants to U.S. academic institutions under its Sustainable Digital Data Preservation and Access Network Partners (DataNet) program.

Here's an excerpt from the solicitation:

Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

The new types of organizations envisioned in this solicitation will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to:

  • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
  • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
  • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
  • serve as component elements of an interoperable data preservation and access network.

By demonstrating feasibility, identifying best practices, establishing viable models for long term technical and economic sustainability, and incorporating frontier research, these exemplar organizations can serve as the basis for rational investment in digital preservation and access by diverse sectors of society at the local, regional, national, and international levels, paving the way for a robust and resilient national and global digital data framework.

These organizations will provide:

  • a vision and rationale that meet critical data needs, create important new opportunities and capabilities for discovery, innovation, and learning, improve the way science and engineering research and education are conducted, and guide the organization in achieving long-term sustainability;
  • an organizational structure that provides for a comprehensive range of expertise and cyberinfrastructure capabilities, ensures active participation and effective use by a wide diversity of individuals, organizations, and sectors, serves as a capable partner in an interoperable network of digital preservation and access organizations, and ensures effective management and leadership; and
  • activities to provide for the full data management life cycle, facilitate research as resource and object, engage in computer science and information science research critical to DataNet functions, develop new tools and capabilities for learning that integrate research and education at all levels, provide for active community input and participation in all phases and all aspects of Partner activities, and include a vigorous and comprehensive assessment and evaluation program.

Potential applicants should note that this program is not intended to support narrowly-defined, discipline-specific repositories. . . .

Award Information

Anticipated Type of Award: Cooperative Agreement

Estimated Number of Awards: 5 — Two to three awards are anticipated in each of two review cycles (one review cycle for fiscal year FY2008 awards and one for FY2009) for a total of five awards, contingent on the quality of proposals received and pending the availability of funds. Each award is limited to a total of up to $20,000,000 (direct plus indirect costs) for up to 5 years. The initial term of each award is expected to be 5 years with the potential at NSF's sole discretion for one terminal renewal for another 5 years, subject to performance and the availability of funds. Such performance is to include serving the needs of the relevant science and engineering research and education communities and catalyzing new opportunities for progress. If a second five-year award is made, NSF funding is expected to decrease in each successive year of the award as the Partner transitions to a sustainable economic model with other sources of support. The actual amount of the annual decrease in NSF support will be established through the cooperative agreement. Note that the maximum period NSF will support a DataNet Partner is 10 years.

Anticipated Funding Amount: $100,000,000 — Up to $100,000,000 over a five year period is expected to be available contingent on the quality of proposals received and pending the availability of funds.

Legal Aspects of Data Access and Reuse in Collaborative Research

The Open Access to Knowledge Law Project and the Legal Framework for e-Research Project have released Building the Infrastructure for Data Access and Reuse in Collaborative Research: An Analysis of the Legal Context.

Here's an excerpt from the "Executive Summary":

This Report examines the broad legal framework within which research data is generated, managed, disseminated and used. The background to the Report is the growing support for systems that enable research data generated in publicly-funded research projects to be made available for access and use by others in the research community.

The Report provides an overview of the operation of copyright law, contract and confidentiality laws, as well as a range of legislation—privacy, public records and freedom of information legislation, etc—that is of relevance to research data. The Report considers how these legal rules apply to define rights in research data and regulate the generation, management and sharing of data. In any given research project there will be a multitude of different parties with varying interests. . . The Report examines the relationships between these parties and the legal arrangements that must be implemented to ensure that research data is properly and effectively managed, so that it can be accessed and used by other researchers.

Important in the context of collaborative research and open access, the Report describes and explains current practices and attitudes towards data sharing. . . . Often these practices are informed by international and national policies on access and use, formulated by international organisations and conferences, research funders and research bodies. The Report considers these policies at length and canvasses the development of the open access to research data movement.

Finally, the Report encourages researchers and research organisations to adopt proper management and legal frameworks for research data outputs. . . . The Report describes best practice strategies and mechanisms for organising, preserving and enabling access to and reuse of research data, including data management policies and principles, data management plans and data management toolkits. Proposals are made for further work to be undertaken on data access policies, frameworks, strategies and mechanisms.

Dealing with Data: Roles, Rights, Responsibilities and Relationships

JISC has released its Dealing with Data: Roles, Rights, Responsibilities and Relationships: Consultancy Report, which was written as part of its Digital Repositories Programme’s Data Cluster Consultancy.

Here’s an excerpt from the Executive Summary:

This Report explores the roles, rights, responsibilities and relationships of institutions, data centres and other key stakeholders who work with data. It concentrates primarily on the UK scene with some reference to other relevant experience and opinion, and is framed as "a snapshot" of a relatively fast-moving field. . . .

The Report is largely based on two methodological approaches: a consultation workshop and a number of semi-structured interviews with stakeholder representatives.

It is set within the context of the burgeoning "data deluge" emanating from e-Science applications, increasing momentum behind open access policy drivers for data, and developments to define requirements for a co-ordinated e-infrastructure for the UK. The diversity and complexity of data are acknowledged, and developing typologies are referenced.

Report on Chemistry Teaching/Research Data and Institutional Repositories

The JISC-funded SPECTRa project has released Project SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data): JISC Final Report, March 2007.

Here’s an excerpt from the Executive Summary:

Project SPECTRa’s principal aim was to facilitate the high-volume ingest and subsequent reuse of experimental data via institutional repositories, using the DSpace platform, by developing Open Source software tools which could easily be incorporated within chemists’ workflows. It focussed on three distinct areas of chemistry research—synthetic organic chemistry, crystallography and computational chemistry.

SPECTRa was funded by JISC’s Digital Repositories Programme as a joint project between the libraries and chemistry departments of the University of Cambridge and Imperial College London, in collaboration with the eBank UK project. . . .

Surveys of chemists at Imperial and Cambridge investigated their current use of computers and the Internet and identified specific data needs. The survey’s main conclusions were:

  • Much data is not stored electronically (e.g. lab books, paper copies of spectra)
  • A complex list of data file formats (particularly proprietary binary formats) being used
  • A significant ignorance of digital repositories
  • A requirement for restricted access to deposited experimental data

Distributable software tool development using Open Source code was undertaken to facilitate deposition into a repository, guided by interviews with key researchers. The project has provided tools which allow for the preservation aspects of data reuse. All legacy chemical file formats are converted to the appropriate Chemical Markup Language scheme to enable automatic data validation, metadata creation and long-term preservation needs. . . .

The deposition process adopted the concept of an "embargo repository" allowing unpublished or commercially sensitive material, identified through metadata, to be retained in a closed access environment until the data owner approved its release. . . .

Among the project’s findings were the following:

  • it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organisational capability of digital repositories;
  • scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;
  • the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;
  • institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;
  • IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.
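
The "embargo repository" idea in the excerpt above can be illustrated with a short sketch. This is not SPECTRa's or DSpace's actual implementation; the class names, the `embargoed` metadata flag, and the owner check are all hypothetical, chosen only to show material being held in closed access until the data owner approves its release.

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """A deposited dataset and the metadata that controls access (hypothetical model)."""
    identifier: str
    owner: str
    embargoed: bool = True  # assumption: a metadata flag marks unpublished or sensitive material


@dataclass
class EmbargoRepository:
    """Keeps deposits in closed access until the data owner approves release."""
    _items: dict = field(default_factory=dict)

    def deposit(self, item: Deposit) -> None:
        # Store the deposit under its identifier; embargoed items stay closed.
        self._items[item.identifier] = item

    def approve_release(self, identifier: str, requested_by: str) -> None:
        # Only the recorded data owner may lift the embargo.
        item = self._items[identifier]
        if requested_by != item.owner:
            raise PermissionError("only the data owner may approve release")
        item.embargoed = False

    def open_access_ids(self) -> list:
        # Identifiers of deposits that are currently visible to everyone.
        return sorted(i.identifier for i in self._items.values() if not i.embargoed)
```

In this sketch a deposit is embargoed by default, so nothing becomes openly visible until `approve_release` is called by the owner named in the deposit's metadata.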

Position Papers from the NSF/JISC Repositories Workshop

Position papers from the NSF/JISC Repositories Workshop are now available.

Here’s an excerpt from the Workshop’s Welcome and Themes page:

Here is some background information. A series of recent studies and reports have highlighted the ever-growing importance for all academic fields of data and information in digital formats. Studies have looked at digital information in science and in the humanities; at the role of data in Cyberinfrastructure; at repositories for large-scale digital libraries; and at the challenges of archiving and preservation of digital information. The goal of this workshop is to unite these separate studies. The NSF and JISC share two principal objectives: to develop a road map for research over the next ten years and to determine what to support in the near term.

Here are the position papers:

Friday’s OAI5 Presentations

Presentations from Friday’s sessions of the 5th Workshop on Innovations in Scholarly Communication in Geneva are now available.

Here are a few highlights from this major conference:

  • Doctoral e-Theses; Experiences in Harvesting on a National and European Level (PowerPoint): "In the presentation we will show some lessons learned and the first results of the Demonstrator, an interoperable portal of European doctoral e-theses in five countries: Denmark, Germany, the Netherlands, Sweden and the UK."
  • Exploring Overlay Journals: The RIOJA project (PowerPoint): "This presentation introduces the RIOJA (Repository Interface to Overlaid Journal Archives) project, on which a group of cosmology researchers from the UK is working with UCL Library Services and Cornell University. The project is creating a tool to support the overlay of journals onto repositories, and will demonstrate a cosmology journal overlaid on top of arXiv."
  • Dissemination or Publication? Some Consequences from Smudging the Boundaries between Research Data and Research Papers (PDF): "Project StORe’s repository middleware will enable researchers to move seamlessly between the research data environment and its outputs, passing directly from an electronic article to the data from which it was developed, or linking instantly to all the publications that have resulted from a particular research dataset."
  • Open Archives, The Expectations of the Scientific Communities (RealVideo): "This analysis led the French CNRS to start the Hal project, a pluridisciplinary open archive strongly inspired by arXiv, and directly connected to it. Hal actually automatically transfers data and documents to arXiv for the relevant disciplines; similarly, it is connected to PubMed and PubMed Central for life sciences. Hal is customizable so that institutions can build their own portal within Hal, which then plays the role of an institutional archive (examples are INRIA, INSERM, ENS Lyon, and others)."

(You may want to download PowerPoint Viewer 2007 if you don’t have PowerPoint 2007.)

Report on Sharing and Re-Use of Geospatial Data in Repositories

The GRADE project has released a report titled Designing a Licensing Strategy for Sharing and Re-Use of Geospatial Data in the Academic Sector.

The JISC-REPOSITORIES announcement indicates that the report presents "a licensing strategy for the sharing and re-use of geospatial data within the UK research and education sector," and that it "puts forward a conceptual framework for resolving those described rights management issues raised in relation to repositories."

Here is an excerpt from the report that describes it further:

Geospatial material created in the education sector can be highly complex, incorporating data created elsewhere either as found, or customised to fit the particular need of the academic or lecturer. The downstream rights can become very complex, as it is necessary to ensure that permissions have been gained to reuse or repurpose the data, and it is usually essential that correct attribution is made. There are currently concerns and confusion over the assertion of IPR and copyright of created geospatial data particularly where third party data are included.

This report considers a licensing strategy for the sharing and re-use of geospatial data within the UK research and education sector.