Open Access to and Reuse of Research Data—The State of the Art in Finland

The Finnish Social Science Data Archive has published Open Access to and Reuse of Research Data—The State of the Art in Finland.

Here's an excerpt:

In 2006, the Ministry of Education in Finland allocated resources to the Finnish Social Science Data Archive (FSD) to chart national and international practices related to open access to research data. Consequently, the FSD carried out an online survey targeting professors of human sciences, social sciences and behavioural sciences in Finnish universities. Some respondents were senior staff at research institutes. The respondents were asked about the state and use of data collected in their department/institute. Almost half of the respondents considered the preservation and use of digital research data to be relevant to their department. The number of respondents (150) is large enough to warrant statistical analysis even though response rate was low at 28%.

National Research Data Management for the UK: UKRDS Interim Report Released

The UK Research Data Service has released the UKRDS Interim Report.

The report recommends adopting a "Hybrid/Umbrella" model for managing research data in the UK. Here's an excerpt:

In this model ["Hybrid/Umbrella"], UKRDS acts as an umbrella organisation, representing the interests of many UK data repositories, both those based around single institutions and those based on storage for a single discipline. Such an organisation would be well-placed to act as a mediator, as a standards-setting body and as source of information about data archiving and repositories, perhaps in a similar fashion to the Digital Curation Centre (DCC). In time it might become a data repository in its own right or take on other functions as required. This approach brings the Shared Services model into the current environment of grid computing and cloud-based data storage, with an emphasis on distributed shared services, rather than centralised shared services. Although there are still risks associated with this model, they are lower than the previous two and more manageable. The exact structure of such an organisation would be dependent on circumstance and would need to take into account the requirements of the member organisations.

Oxford Releases Report on Digital Repository Services for Research Data Management

The Oxford University Office of the Director of IT has released Findings of the Scoping Study Interviews and the Research Data Management Workshop: Scoping Digital Repository Services for Research Data Management.

Here's an excerpt from the report's Web page:

The scoping study interviews aimed to document data management practices from Oxford researchers as well as to capture their requirements for services to help them manage their data more effectively. In order to do this, 37 face-to-face interviews were conducted between May and June with researchers from 27 colleges, departments and faculties. In addition to this, the Research Data Management Workshop was organised to complement the findings of the scoping study interviews.

APSR Releases Investigating Data Management Practices in Australian Universities

The Australian Partnership for Sustainable Repositories has released Investigating Data Management Practices in Australian Universities.

Here's an excerpt from the report's Web page:

In late 2007, The University of Queensland undertook a survey of data management practices among the university’s researchers. This was done in response to the increasing realisation that repositories need to include research data, in addition to the research outputs in print form already included, and to provide information which would enhance the support provided for those engaged in eResearch.

The survey was carried out using the Apollo software developed at The Australian National University and adapted by APSR. Two other universities, The University of Melbourne and the Queensland University of Technology, have now replicated the survey among their own communities, while adding some questions of local interest.

The survey covers questions such as the types of digital data being created (spreadsheets, documents, experimental data, images, fieldwork data, etc), the size of the data collection, software used for data analysis, data storage and backup, application of a data management plan, roles and responsibilities around data management, copyright frameworks, usage of high capacity computing, and much more.

Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library

Cornell University Library's Data Working Group has deposited its Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library report in the eCommons@Cornell repository.

Here's the abstract:

Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, are giving rise to new modes of conducting research. Infrastructure to promote and support the curation of digital research data is not yet fully-developed in all research disciplines, scales, and contexts. Organizations of all kinds are examining and staking out their potential roles in the areas of cyberinfrastructure development, data-driven scholarship, and data curation. The purpose of the Cornell University Library's (CUL) Data Working Group (DaWG) is to exchange information about CUL activities related to data curation, to review and exchange information about developments and activities in data curation in general, and to consider and recommend strategic opportunities for CUL to engage in the area of data curation. This white paper aims to fulfill this last element of the DaWG's charge.

Survey of Canadian and International Data Management Initiatives Released

The Canadian Association of Research Libraries (CARL) has released Survey of Canadian and International Data Management Initiatives.

Here's an excerpt from the "Introduction":

Research libraries have a role to play in this emerging data-intensive environment. A 2007 CARL survey found that most CARL members are interested in managing research data, but few have a formal data archiving policy. CARL has formed a Research Data Management Working Group to assist members in collecting, organizing, preserving and providing access to the research data and to formulate a cooperative approach for CARL.

The purpose of this report is to provide an overview of the types of data management activities being undertaken in Canada and internationally. This review documents the various options available for libraries, and will pave the way for a more detailed investigation by the Working Group of the potential roles for libraries.

RIN Publishes To Share or not to Share: Publication and Quality Assurance of Research Data Outputs

The Research Information Network has published To Share or not to Share: Publication and Quality Assurance of Research Data Outputs. The report has a separate Annex file.

This report presents the findings from a study of whether or not researchers do in fact make their research data available to others, and the issues they encounter when doing so. The study is set in a context where the amount of digital data being created and gathered by researchers is increasing rapidly; and there is a growing recognition by researchers, their employers and their funders of the potential value in making new data available for sharing, and in curating them for re-use in the long term.

Presentations from the Open Access Collections Workshop Now Available

Presentations from the Australian Partnership for Sustainable Repositories' Open Access Collections workshop are now available. Presentations are in HTML/PDF, MP3, and digital video formats. The workshop was held in association with the Queensland University Libraries Office of Cooperation and the University of Queensland Library.

Dealing with Research Data in a Federated Digital Repository: Oxford University Planning Document Released

The Oxford e-Research Centre has released Scoping Digital Repository Services for Research Data Management, a project plan for determining the requirements for handling data in a federated digital repository at Oxford University.

Here's an excerpt from the "Aims and Objectives" section:

Objectives:

  • Capture and document researchers’ requirements for digital repository services to handle research data.
  • Participate actively in the development of an interoperability framework for the federated digital repository at Oxford.
  • Make recommendations to improve and coordinate the provision of digital repository services for research data.
  • Initiate and develop collaborations with the different repository activities already occurring to ensure that communication takes place in between them.
  • Raise awareness at Oxford of the importance and advantages of the active management of research data.
  • Communicate significant national and international developments in repositories to relevant Oxford stakeholders, in order to stimulate the adoption of best practices.

Repository Presentations from the DataShare Project

The DataShare project has released two recent presentations about its activities: "Data Documentation Initiative (DDI)" and "Guidelines and Tools for Repository Planning and Assessment." A recent briefing paper, The Data Documentation Initiative (DDI) and Institutional Repositories, is also available.

Here's a description of the DataShare project from its home page:

DISC-UK DataShare, led by EDINA, arises from an existing UK consortium of data support professionals working in departments and academic libraries in universities (Data Information Specialists Committee-UK), and builds on an international network with a tradition of data sharing and data archiving dating back to the 1960s in the social sciences. By working together across four universities and internally with colleagues already engaged in managing open access repositories for e-prints, this partnership will introduce and test a new model of data sharing and archiving to UK research institutions. By supporting academics within the four partner institutions who wish to share datasets on which written research outputs are based, this network of institution-based data repositories develops a niche model for deposit of 'orphaned datasets' currently filled neither by centralised subject-domain data archives/centres/grids nor by e-print based institutional repositories (IRs).

DISC-UK Report on Web 2.0 Data Visualization Tools

JISC has released DISC-UK DataShare: Web 2.0 Data Visualisation Tools: Part 1—Numeric Data.

Here's an excerpt from the "Introduction":

Part 1 of this briefing paper will highlight some examples of new collaborative web services using Web 2.0 technologies which venture into the numeric data visualisation arena. These mashups allow researchers to upload and analyse their own data in ‘open’ and dynamic environments. Broadly speaking the numeric data being referred to could be micro-data (data about the individual), macro-data or country-level data, derived or summary data.

Stewardship of Digital Research Data: A Framework of Principles and Guidelines

The Research Information Network (RIN) has published Stewardship of Digital Research Data: A Framework of Principles and Guidelines: Responsibilities of Research Institutions and Funders, Data Managers, Learned Societies and Publishers.

Here's an excerpt from the Web page describing the document:

Research data are an increasingly important and expensive output of the scholarly research process, across all disciplines. . . . But we shall realise the value of data only if we move beyond research policies, practices and support systems developed in a different era. We need new approaches to managing and providing access to research data.

In order to address these issues, the RIN established a group to produce a framework of key principles and guidelines, and we consulted on a draft document in 2007. The framework is founded on the fundamental policy objective that ideas and knowledge, including data, derived from publicly-funded research should be made available for public use, interrogation, and scrutiny, as widely, rapidly and effectively as practicable. . . .

The framework is structured around five broad principles which provide a guide to the development of policy and practice for a range of key players: universities, research institutions, libraries and other information providers, publishers, and research funders as well as researchers themselves. Each of these principles serves as a basis for a series of questions which serve a practical purpose by pointing to how the various players might address the challenges of effective data stewardship.

Towards the Australian Data Commons: A Proposal for an Australian National Data Service

The Australian eResearch Infrastructure Council has released Towards the Australian Data Commons: A Proposal for an Australian National Data Service.

Here's an excerpt from the "Overview":

This paper is designed to encourage, inform and ultimately summarise the discussions around the appropriate strategic and technical descriptions of the Australian National Data Service; to fill in the outline in the Platforms for Collaboration investment plan.

To do so, the paper:

  • introduces the Australian National Data Service (ANDS) and the driving forces behind its creation;
  • provides a rationale for the services that ANDS will provide, and the programs through which the services will be offered; and
  • describes in detail the ANDS programs.

Part One (Background) provides a brief summary of the reasons to focus on data management, as well as an overview of ANDS, and identifies some issues associated with implementation.

Part Two (Rationale) sets out the systemic issues associated with achieving a research data commons, and provides the resultant rationale for the services that ANDS will offer and the programs that they will be delivered through.

Part Three (Detailed Descriptions of ANDS Programs) sets out in detail the Aim, Focus, Service Beneficiaries, Products and Community Engagement activities for each of the ANDS Programs.

Digital Library Federation Forum for NSF DataNet Grant Proposals

The Digital Library Federation has established a forum for those who want to collaborate or get further information about the NSF's Sustainable Digital Data Preservation and Access Network Partners (DataNet) grant program. Participation in the forum is open, but registration is required.

Podcasts about the Long-Term Use of Research Data

The Australian Partnership for Sustainable Repositories has released MP3 and PDF files from its Long-lived Collections: The Future of Australia's Research Data Presentations symposium.

NSF Solicits Grant Proposals for up to $20 Million for Dataset Access and Preservation

The National Science Foundation's Office of Cyberinfrastructure has announced the availability of grants to U.S. academic institutions under its Sustainable Digital Data Preservation and Access Network Partners (DataNet) program.

Here's an excerpt from the solicitation:

Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

The new types of organizations envisioned in this solicitation will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to:

  • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
  • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
  • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
  • serve as component elements of an interoperable data preservation and access network.

By demonstrating feasibility, identifying best practices, establishing viable models for long term technical and economic sustainability, and incorporating frontier research, these exemplar organizations can serve as the basis for rational investment in digital preservation and access by diverse sectors of society at the local, regional, national, and international levels, paving the way for a robust and resilient national and global digital data framework.

These organizations will provide:

  • a vision and rationale that meet critical data needs, create important new opportunities and capabilities for discovery, innovation, and learning, improve the way science and engineering research and education are conducted, and guide the organization in achieving long-term sustainability;
  • an organizational structure that provides for a comprehensive range of expertise and cyberinfrastructure capabilities, ensures active participation and effective use by a wide diversity of individuals, organizations, and sectors, serves as a capable partner in an interoperable network of digital preservation and access organizations, and ensures effective management and leadership; and
  • activities to provide for the full data management life cycle, facilitate research as resource and object, engage in computer science and information science research critical to DataNet functions, develop new tools and capabilities for learning that integrate research and education at all levels, provide for active community input and participation in all phases and all aspects of Partner activities, and include a vigorous and comprehensive assessment and evaluation program.

Potential applicants should note that this program is not intended to support narrowly-defined, discipline-specific repositories. . . .

Award Information

Anticipated Type of Award: Cooperative Agreement

Estimated Number of Awards: 5 — Two to three awards are anticipated in each of two review cycles (one review cycle for fiscal year FY2008 awards and one for FY2009) for a total of five awards, contingent on the quality of proposals received and pending the availability of funds. Each award is limited to a total of up to $20,000,000 (direct plus indirect costs) for up to 5 years. The initial term of each award is expected to be 5 years with the potential at NSF's sole discretion for one terminal renewal for another 5 years, subject to performance and the availability of funds. Such performance is to include serving the needs of the relevant science and engineering research and education communities and catalyzing new opportunities for progress. If a second five-year award is made, NSF funding is expected to decrease in each successive year of the award as the Partner transitions to a sustainable economic model with other sources of support. The actual amount of the annual decrease in NSF support will be established through the cooperative agreement. Note that the maximum period NSF will support a DataNet Partner is 10 years.

Anticipated Funding Amount: $100,000,000 — Up to $100,000,000 over a five year period is expected to be available contingent on the quality of proposals received and pending the availability of funds.

Legal Aspects of Data Access and Reuse in Collaborative Research

The Open Access to Knowledge Law Project and the Legal Framework for e-Research Project have released Building the Infrastructure for Data Access and Reuse in Collaborative Research: An Analysis of the Legal Context.

Here's an excerpt from the "Executive Summary":

This Report examines the broad legal framework within which research data is generated, managed, disseminated and used. The background to the Report is the growing support for systems that enable research data generated in publicly-funded research projects to be made available for access and use by others in the research community.

The Report provides an overview of the operation of copyright law, contract and confidentiality laws, as well as a range of legislation—privacy, public records and freedom of information legislation, etc—that is of relevance to research data. The Report considers how these legal rules apply to define rights in research data and regulate the generation, management and sharing of data. In any given research project there will be a multitude of different parties with varying interests. . . The Report examines the relationships between these parties and the legal arrangements that must be implemented to ensure that research data is properly and effectively managed, so that it can be accessed and used by other researchers.

Important in the context of collaborative research and open access, the Report describes and explains current practices and attitudes towards data sharing. . . . Often these practices are informed by international and national policies on access and use, formulated by international organisations and conferences, research funders and research bodies. The Report considers these policies at length and canvasses the development of the open access to research data movement.

Finally, the Report encourages researchers and research organisations to adopt proper management and legal frameworks for research data outputs. . . . The Report describes best practice strategies and mechanisms for organising, preserving and enabling access to and reuse of research data, including data management policies and principles, data management plans and data management toolkits. Proposals are made for further work to be undertaken on data access policies, frameworks, strategies and mechanisms.

Dealing with Data: Roles, Rights, Responsibilities and Relationships

JISC has released its Dealing with Data: Roles, Rights, Responsibilities and Relationships: Consultancy Report, which was written as part of its Digital Repositories Programme’s Data Cluster Consultancy.

Here’s an excerpt from the Executive Summary:

This Report explores the roles, rights, responsibilities and relationships of institutions, data centres and other key stakeholders who work with data. It concentrates primarily on the UK scene with some reference to other relevant experience and opinion, and is framed as "a snapshot" of a relatively fast-moving field. . . .

The Report is largely based on two methodological approaches: a consultation workshop and a number of semi-structured interviews with stakeholder representatives.

It is set within the context of the burgeoning "data deluge" emanating from e-Science applications, increasing momentum behind open access policy drivers for data, and developments to define requirements for a co-ordinated e-infrastructure for the UK. The diversity and complexity of data are acknowledged, and developing typologies are referenced.

Report on Chemistry Teaching/Research Data and Institutional Repositories

The JISC-funded SPECTRa project has released Project SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data): JISC Final Report, March 2007.

Here’s an excerpt from the Executive Summary:

Project SPECTRa’s principal aim was to facilitate the high-volume ingest and subsequent reuse of experimental data via institutional repositories, using the DSpace platform, by developing Open Source software tools which could easily be incorporated within chemists’ workflows. It focussed on three distinct areas of chemistry research—synthetic organic chemistry, crystallography and computational chemistry.

SPECTRa was funded by JISC’s Digital Repositories Programme as a joint project between the libraries and chemistry departments of the University of Cambridge and Imperial College London, in collaboration with the eBank UK project. . . .

Surveys of chemists at Imperial and Cambridge investigated their current use of computers and the Internet and identified specific data needs. The survey’s main conclusions were:

  • Much data is not stored electronically (e.g. lab books, paper copies of spectra)
  • A complex list of data file formats (particularly proprietary binary formats) being used
  • A significant ignorance of digital repositories
  • A requirement for restricted access to deposited experimental data

Distributable software tool development using Open Source code was undertaken to facilitate deposition into a repository, guided by interviews with key researchers. The project has provided tools which allow for the preservation aspects of data reuse. All legacy chemical file formats are converted to the appropriate Chemical Markup Language scheme to enable automatic data validation, metadata creation and long-term preservation needs. . . .

The deposition process adopted the concept of an "embargo repository" allowing unpublished or commercially sensitive material, identified through metadata, to be retained in a closed access environment until the data owner approved its release. . . .

Among the project’s findings were the following:

  • it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organisational capability of digital repositories;
  • scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;
  • the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;
  • institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;
  • IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.