Cross-Disciplinary Data Tools Development: Cornell Establishes DISCOVER Research Service Group

Cornell University has launched its DISCOVER Research Service Group to support its data-driven science efforts.

Here's an excerpt from the press release:

Cornell University announced today the establishment of the DISCOVER Research Service Group (DRSG) to facilitate data-driven science at Cornell by developing cross-disciplinary data archival and discovery tools. DISCOVER will conduct pilot projects in selected strategic areas such as the development of data discovery portals using access-layer protocols now under development at Fedora Commons and the Virtual Observatory. . . .

Cornell's Department of Astronomy and the University Library, in partnership with the Cornell Center for Advanced Computing, will work closely with DISCOVER, which is comprised of research groups from multiple disciplines and core data management and curation staff. . . .

The overarching goal of the DISCOVER Research Service Group is to provide accessible paths for the curation, preservation, and mining of scientific data. Systems are needed to make data sets accessible physically over both space (over a wide network) and time (for the indefinite future) and also transparently, using modern Web-based tools that are expected to evolve.

CERN’s Grid: 100,000 Processors at 140 Scientific Institutions

The Worldwide Large Hadron Collider Computing Grid consortium’s grid is ready to process an anticipated 15 million gigabytes per year of data from the collider. It’s composed of 100,000 processors distributed among 140 scientific institutions.

Read more about it at “CERN Officially Unveils Its Grid: 100,000 Processors, 15 Petabytes a Year” and “The Grid Powers Up to Save Lives and Seek the God Particle.”
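The two figures quoted for the grid's data volume ("15 million gigabytes" in the post and "15 Petabytes a Year" in the article title) describe the same quantity. A minimal sketch of the conversion, assuming decimal (SI) prefixes, which neither source states explicitly:

```python
# Assumption: decimal (SI) prefixes, so 1 petabyte = 1,000,000 gigabytes.
GB_PER_PB = 1_000_000


def gigabytes_to_petabytes(gigabytes: float) -> float:
    """Convert a size in gigabytes to petabytes using decimal prefixes."""
    return gigabytes / GB_PER_PB


annual_gb = 15_000_000  # "15 million gigabytes per year" from the post
annual_pb = gigabytes_to_petabytes(annual_gb)
print(annual_pb)  # 15.0
```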

NISO Holds Final Thought Leader Meeting on Research Data

NISO (the National Information Standards Organization) has held its final Thought Leader meeting on the topic of research data. A short summary of the meeting is available at “NISO Brings Together Data Thought Leaders.”

Earlier this year, NISO held Thought Leader meetings on institutional repositories, digital libraries and collections, and e-learning and course management systems. Final reports are available for the institutional repositories and digital libraries and collections meetings.

Open Access to and Reuse of Research Data—The State of the Art in Finland

The Finnish Social Science Data Archive has published Open Access to and Reuse of Research Data—The State of the Art in Finland.

Here's an excerpt:

In 2006, the Ministry of Education in Finland allocated resources to the Finnish Social Science Data Archive (FSD) to chart national and international practices related to open access to research data. Consequently, the FSD carried out an online survey targeting professors of human sciences, social sciences and behavioural sciences in Finnish universities. Some respondents were senior staff at research institutes. The respondents were asked about the state and use of data collected in their department/institute. Almost half of the respondents considered the preservation and use of digital research data to be relevant to their department. The number of respondents (150) is large enough to warrant statistical analysis even though response rate was low at 28%.
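The excerpt gives the number of respondents and the response rate but not the number of people surveyed. That number can be estimated from the other two, assuming the 28% rate was calculated as respondents divided by invitations (the excerpt does not say how it was calculated). The function name below is hypothetical:

```python
def implied_invitations(respondents: int, response_rate: float) -> float:
    """Estimate how many people were invited, given respondents and response rate.

    Assumes response_rate = respondents / invitations, which the excerpt
    does not confirm.
    """
    return respondents / response_rate


estimate = implied_invitations(150, 0.28)
print(round(estimate))  # 536
```

On that assumption, roughly 536 people were invited to take part. Because the 28% figure is rounded, this is only an approximation.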

National Research Data Management for the UK: UKRDS Interim Report Released

The UK Research Data Service has released the UKRDS Interim Report.

The report recommends adopting a "Hybrid/Umbrella" model for managing research data in the UK. Here's an excerpt:

In this model ["Hybrid/Umbrella"], UKRDS acts as an umbrella organisation, representing the interests of many UK data repositories, both those based around single institutions and those based on storage for a single discipline. Such an organisation would be well-placed to act as a mediator, as a standards-setting body and as source of information about data archiving and repositories, perhaps in a similar fashion to the Digital Curation Centre (DCC). In time it might become a data repository in its own right or take on other functions as required. This approach brings the Shared Services model into the current environment of grid computing and cloud-based data storage, with an emphasis on distributed shared services, rather than centralised shared services. Although there are still risks associated with this model, they are lower than the previous two and more manageable. The exact structure of such an organisation would be dependent on circumstance and would need to take into account the requirements of the member organisations.

Oxford Releases Report on Digital Repository Services for Research Data Management

The Oxford University Office of the Director of IT has released Findings of the Scoping Study Interviews and the Research Data Management Workshop: Scoping Digital Repository Services for Research Data Management.

Here's an excerpt from the report's Web page:

The scoping study interviews aimed to document data management practices from Oxford researchers as well as to capture their requirements for services to help them manage their data more effectively. In order to do this, 37 face-to-face interviews were conducted between May and June with researchers from 27 colleges, departments and faculties. In addition to this, the Research Data Management Workshop was organised to complement the findings of the scoping study interviews.

APSR Releases Investigating Data Management Practices in Australian Universities

The Australian Partnership for Sustainable Repositories has released Investigating Data Management Practices in Australian Universities.

Here's an excerpt from the report's Web page:

In late 2007, The University of Queensland undertook a survey of data management practices among the university’s researchers. This was done in response to the increasing realisation that repositories need to include research data, in addition to the research outputs in print form already included, and to provide information which would enhance the support provided for those engaged in eResearch.

The survey was carried out using the Apollo software developed at The Australian National University and adapted by APSR. Two other universities, The University of Melbourne and the Queensland University of Technology, have now replicated the survey among their own communities, while adding some questions of local interest.

The survey covers questions such as the types of digital data being created (spreadsheets, documents, experimental data, images, fieldwork data, etc), the size of the data collection, software used for data analysis, data storage and backup, application of a data management plan, roles and responsibilities around data management, copyright frameworks, usage of high capacity computing, and much more.

Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library

Cornell University Library's Data Working Group has deposited its Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library report in the eCommons@Cornell repository.

Here's the abstract:

Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, are giving rise to new modes of conducting research. Infrastructure to promote and support the curation of digital research data is not yet fully-developed in all research disciplines, scales, and contexts. Organizations of all kinds are examining and staking out their potential roles in the areas of cyberinfrastructure development, data-driven scholarship, and data curation. The purpose of the Cornell University Library's (CUL) Data Working Group (DaWG) is to exchange information about CUL activities related to data curation, to review and exchange information about developments and activities in data curation in general, and to consider and recommend strategic opportunities for CUL to engage in the area of data curation. This white paper aims to fulfill this last element of the DaWG's charge.

Survey of Canadian and International Data Management Initiatives Released

The Canadian Association of Research Libraries (CARL) has released Survey of Canadian and International Data Management Initiatives.

Here's an excerpt from the "Introduction":

Research libraries have a role to play in this emerging data-intensive environment. A 2007 CARL survey found that most CARL members are interested in managing research data, but few have a formal data archiving policy. CARL has formed a Research Data Management Working Group to assist members in collecting, organizing, preserving and providing access to the research data and to formulate a cooperative approach for CARL.

The purpose of this report is to provide an overview of the types of data management activities being undertaken in Canada and internationally. This review documents the various options available for libraries, and will pave the way for a more detailed investigation by the Working Group of the potential roles for libraries.

RIN Publishes To Share or not to Share: Publication and Quality Assurance of Research Data Outputs

The Research Information Network has published To Share or not to Share: Publication and Quality Assurance of Research Data Outputs. The report has a separate Annex file.

This report presents the findings from a study of whether or not researchers do in fact make their research data available to others, and the issues they encounter when doing so. The study is set in a context where the amount of digital data being created and gathered by researchers is increasing rapidly; and there is a growing recognition by researchers, their employers and their funders of the potential value in making new data available for sharing, and in curating them for re-use in the long term.

Presentations from the Open Access Collections Workshop Now Available

Presentations from the Australian Partnership for Sustainable Repositories' Open Access Collections workshop are now available. Presentations are in HTML/PDF, MP3, and digital video formats. The workshop was held in association with the Queensland University Libraries Office of Cooperation and the University of Queensland Library.

Dealing with Research Data in a Federated Digital Repository: Oxford University Planning Document Released

The Oxford e-Research Centre has released Scoping Digital Repository Services for Research Data Management, a project plan for determining the requirements for handling data in a federated digital repository at Oxford University.

Here's an excerpt from the "Aims and Objectives" section:

Objectives:

  • Capture and document researchers’ requirements for digital repository services to handle research data.
  • Participate actively in the development of an interoperability framework for the federated digital repository at Oxford.
  • Make recommendations to improve and coordinate the provision of digital repository services for research data.
  • Initiate and develop collaborations with the different repository activities already occurring to ensure that communication takes place in between them.
  • Raise awareness at Oxford of the importance and advantages of the active management of research data.
  • Communicate significant national and international developments in repositories to relevant Oxford stakeholders, in order to stimulate the adoption of best practices.

Repository Presentations from the DataShare Project

The DataShare project has released two recent presentations about its activities: "Data Documentation Initiative (DDI)" and "Guidelines and Tools for Repository Planning and Assessment." A recent briefing paper, The Data Documentation Initiative (DDI) and Institutional Repositories, is also available.

Here's a description of the DataShare project from its home page:

DISC-UK DataShare, led by EDINA, arises from an existing UK consortium of data support professionals working in departments and academic libraries in universities (Data Information Specialists Committee-UK), and builds on an international network with a tradition of data sharing and data archiving dating back to the 1960s in the social sciences. By working together across four universities and internally with colleagues already engaged in managing open access repositories for e-prints, this partnership will introduce and test a new model of data sharing and archiving to UK research institutions. By supporting academics within the four partner institutions who wish to share datasets on which written research outputs are based, this network of institution-based data repositories develops a niche model for deposit of 'orphaned datasets' currently filled neither by centralised subject-domain data archives/centres/grids nor by e-print based institutional repositories (IRs).

DISC-UK Report on Web 2.0 Data Visualization Tools

JISC has released DISC-UK DataShare: Web 2.0 Data Visualisation Tools: Part 1—Numeric Data.

Here's an excerpt from the "Introduction":

Part 1 of this briefing paper will highlight some examples of new collaborative web services using Web 2.0 technologies which venture into the numeric data visualisation arena. These mashups allow researchers to upload and analyse their own data in ‘open’ and dynamic environments. Broadly speaking the numeric data being referred to could be micro-data (data about the individual), macro-data or country-level data, derived or summary data.

Stewardship of Digital Research Data: A Framework of Principles and Guidelines

The Research Information Network (RIN) has published Stewardship of Digital Research Data: A Framework of Principles and Guidelines: Responsibilities of Research Institutions and Funders, Data Managers, Learned Societies and Publishers.

Here's an excerpt from the Web page describing the document:

Research data are an increasingly important and expensive output of the scholarly research process, across all disciplines. . . . But we shall realise the value of data only if we move beyond research policies, practices and support systems developed in a different era. We need new approaches to managing and providing access to research data.

In order to address these issues, the RIN established a group to produce a framework of key principles and guidelines, and we consulted on a draft document in 2007. The framework is founded on the fundamental policy objective that ideas and knowledge, including data, derived from publicly-funded research should be made available for public use, interrogation, and scrutiny, as widely, rapidly and effectively as practicable. . . .

The framework is structured around five broad principles which provide a guide to the development of policy and practice for a range of key players: universities, research institutions, libraries and other information providers, publishers, and research funders as well as researchers themselves. Each of these principles serves as a basis for a series of questions which serve a practical purpose by pointing to how the various players might address the challenges of effective data stewardship.

Towards the Australian Data Commons: A Proposal for an Australian National Data Service

The Australian eResearch Infrastructure Council has released Towards the Australian Data Commons: A Proposal for an Australian National Data Service.

Here's an excerpt from the "Overview":

This paper is designed to encourage, inform and ultimately summarise the discussions around the appropriate strategic and technical descriptions of the Australian National Data Service; to fill in the outline in the Platforms for Collaboration investment plan.

To do so, the paper:

  • introduces the Australian National Data Service (ANDS) and the driving forces behind its creation;
  • provides a rationale for the services that ANDS will provide, and the programs through which the services will be offered; and
  • describes in detail the ANDS programs.

Part One (Background) provides a brief summary of the reasons to focus on data management, as well as an overview of ANDS, and identifies some issues associated with implementation.

Part Two (Rationale) sets out the systemic issues associated with achieving a research data commons, and provides the resultant rationale for the services that ANDS will offer and the programs that they will be delivered through.

Part Three (Detailed Descriptions of ANDS Programs) sets out in detail the Aim, Focus, Service Beneficiaries, Products and Community Engagement activities for each of the ANDS Programs.

Digital Library Federation Forum for NSF DataNet Grant Proposals

The Digital Library Federation has established a forum for those who want to collaborate or get further information about the NSF's Sustainable Digital Data Preservation and Access Network Partners (DataNet) grant program. Participation in the forum is open, but registration is required.

Podcasts about the Long-Term Use of Research Data

The Australian Partnership for Sustainable Repositories has released MP3 and PDF files from its Long-lived Collections: The Future of Australia's Research Data Presentations symposium.

NSF Solicits Grant Proposals for up to $20 Million for Dataset Access and Preservation

The National Science Foundation's Office of Cyberinfrastructure has announced the availability of grants to U.S. academic institutions under its Sustainable Digital Data Preservation and Access Network Partners (DataNet) program.

Here's an excerpt from the solicitation:

Science and engineering research and education are increasingly digital and increasingly data-intensive. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

The new types of organizations envisioned in this solicitation will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to:

  • provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
  • continuously anticipate and adapt to changes in technologies and in user needs and expectations;
  • engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and
  • serve as component elements of an interoperable data preservation and access network.

By demonstrating feasibility, identifying best practices, establishing viable models for long term technical and economic sustainability, and incorporating frontier research, these exemplar organizations can serve as the basis for rational investment in digital preservation and access by diverse sectors of society at the local, regional, national, and international levels, paving the way for a robust and resilient national and global digital data framework.

These organizations will provide:

  • a vision and rationale that meet critical data needs, create important new opportunities and capabilities for discovery, innovation, and learning, improve the way science and engineering research and education are conducted, and guide the organization in achieving long-term sustainability;
  • an organizational structure that provides for a comprehensive range of expertise and cyberinfrastructure capabilities, ensures active participation and effective use by a wide diversity of individuals, organizations, and sectors, serves as a capable partner in an interoperable network of digital preservation and access organizations, and ensures effective management and leadership; and
  • activities to provide for the full data management life cycle, facilitate research as resource and object, engage in computer science and information science research critical to DataNet functions, develop new tools and capabilities for learning that integrate research and education at all levels, provide for active community input and participation in all phases and all aspects of Partner activities, and include a vigorous and comprehensive assessment and evaluation program.

Potential applicants should note that this program is not intended to support narrowly-defined, discipline-specific repositories. . . .

Award Information

Anticipated Type of Award: Cooperative Agreement

Estimated Number of Awards: 5 — Two to three awards are anticipated in each of two review cycles (one review cycle for fiscal year FY2008 awards and one for FY2009) for a total of five awards, contingent on the quality of proposals received and pending the availability of funds. Each award is limited to a total of up to $20,000,000 (direct plus indirect costs) for up to 5 years. The initial term of each award is expected to be 5 years with the potential at NSF's sole discretion for one terminal renewal for another 5 years, subject to performance and the availability of funds. Such performance is to include serving the needs of the relevant science and engineering research and education communities and catalyzing new opportunities for progress. If a second five-year award is made, NSF funding is expected to decrease in each successive year of the award as the Partner transitions to a sustainable economic model with other sources of support. The actual amount of the annual decrease in NSF support will be established through the cooperative agreement. Note that the maximum period NSF will support a DataNet Partner is 10 years.

Anticipated Funding Amount: $100,000,000 — Up to $100,000,000 over a five year period is expected to be available contingent on the quality of proposals received and pending the availability of funds.
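The award figures in the solicitation are consistent with each other, as the arithmetic sketch below shows. The even per-year split is an assumption made only for illustration; the solicitation does not say how funds are distributed within an award's term.

```python
# Figures taken from the solicitation excerpt above.
MAX_PER_AWARD = 20_000_000   # up to $20,000,000 per award (direct plus indirect)
NUM_AWARDS = 5               # estimated number of awards
INITIAL_TERM_YEARS = 5       # initial term of each award

# Total program funding if every award reaches its ceiling.
total_funding = MAX_PER_AWARD * NUM_AWARDS
print(total_funding)  # 100000000

# Assumption (not in the solicitation): funds are spread evenly over the term.
average_per_year = MAX_PER_AWARD // INITIAL_TERM_YEARS
print(average_per_year)  # 4000000
```

Five awards at the $20,000,000 ceiling account for the full $100,000,000 anticipated funding amount.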