Draft Roadmap for Science Data Infrastructure

PARSE.Insight has released Draft Roadmap for Science Data Infrastructure.

Here's an excerpt from the announcement:

The draft roadmap provides an overview and initial details of a number of specific components, both technical and non-technical, which would be needed to supplement existing and already planned infrastructures for scientific data. The infrastructure components are aimed at bridging the gaps between islands of functionality, developed for particular purposes, often by other European projects. Thus the infrastructure components are intended to play a general, unifying role in scientific data. While developed in the context of a Europe-wide infrastructure, there would be great advantages for these types of infrastructure components to be available much more widely.

DCC Releases "Database Archiving"

The Digital Curation Centre has released a new briefing paper on "Database Archiving."

Here's an excerpt:

Database archiving is usually seen as a subset of data archiving. In a computational context, data archiving means to store electronic documents, data sets, multimedia files, and so on, for a period of time. The primary goal is to maintain the data in case it is later requested for some particular purpose. Complying with government regulations on data preservation [is] for example a main driver behind data archiving efforts. Database archiving focuses on archiving data that are maintained under the control of a database management system and structured under a database schema, e.g., a relational database.

Rufus Pollock on Open Data and Licensing

In "Open Data Openness and Licensing," Rufus Pollock, a Cambridge University economist, tackles the question of whether open research data should be licensed.

Here's an excerpt:

Over the last couple of years there has been substantial discussion about the licensing (or not) of (open) data and what "open" should mean. In this debate there [are] two distinct, but related, strands:

  1. Some people have argued that licensing is inappropriate (or unnecessary) for data.
  2. Disagreement about what "open" should mean. Specifically: does openness allow for attribution and share-alike "requirements" or should "open" data mean "public domain" data?

These points are related because arguments for the inappropriateness of licensing data usually go along the lines: data equates to facts over which no monopoly IP rights can or should be granted; as such all data is automatically in the public domain and hence there is nothing to license (and worse "licensing" amounts to an attempt to "enclose" the public domain).

However, even those who think that open data can/should only be public domain data still agree that it is reasonable and/or necessary to have some set of community "rules" or "norms" governing usage of data. Therefore, the question of what requirements should be allowed for "open" data is a common one, whatever one's stance on the PD question.

ARL Report: Current Models of Digital Scholarly Communication

The Association of Research Libraries has released Current Models of Digital Scholarly Communication by Nancy L. Maron and K. Kirby Smith, plus a database of associated examples.

Here's an excerpt from the press release:

In the spring of 2008, ARL engaged Ithaka’s Strategic Services Group to conduct an investigation into the range of online resources valued by scholars, paying special attention to those projects that are pushing beyond the boundaries of traditional formats and are considered innovative by the faculty who use them. The networked digital environment has enabled the creation of many new kinds of works, and many of these resources have become essential tools for scholars conducting research, building scholarly networks, and disseminating their ideas and work, but the decentralized distribution of these new-model works has made it difficult to fully appreciate their scope and number.

Ithaka’s findings are based on a collection of resources identified by a volunteer field team of over 300 librarians at 46 academic institutions in the US and Canada. Field librarians talked with faculty members on their campuses about the digital scholarly resources they find most useful and reported the works they identified. The authors evaluated each resource gathered by the field team and conducted interviews of project leaders of 11 representative resources. Ultimately, 206 unique digital resources spanning eight formats were identified that met the study’s criteria.

The study’s innovative qualitative approach yielded a rich cross-section of today’s state of the art in digital scholarly resources. The report profiles each of the eight genres of resources, including discussion of how and why the faculty members reported using the resources for their work, how content is selected for the site, and what financial sustainability strategies the resources are employing. Each section draws from the in-depth interviews to provide illustrative anecdotes and representative examples.

Highlights from the study’s findings include:

  • While some disciplines seem to lend themselves to certain formats of digital resource more than others, examples of innovative resources can be found across the humanities, social sciences, and scientific/technical/medical subject areas.

  • Of all the resources suggested by faculty, almost every one that contained an original scholarly work operates under some form of peer review or editorial oversight.

  • Some of the resources with greatest impact are those that have been around a long while.

  • While some resources serve very large audiences, many digital publications—capable of running on relatively small budgets—are tailored to small, niche audiences.

  • Innovations relating to multimedia content and Web 2.0 functionality appear in some cases to blur the lines between resource types.

  • Projects of all sizes—especially open-access sites and publications—employ a range of support strategies in the search for financial sustainability.

Presentations from the Oxford Institutional and National Services for Research Data Management Workshop

Presentations from the Institutional and National Services for Research Data Management Workshop at the Oxford Said Business School are now available.

Presentations from Reinventing Science Librarianship: Models for the Future

Presentations (usually digital audio and PowerPoint slides) about data curation, e-science, virtual organizations and other topics from the ARL/CNI Fall Forum on Reinventing Science Librarianship: Models for the Future are now available.

Speakers included Sayeed Choudhury, Ron Larsen, Liz Lyon, Richard Luce, and others.

Presentations from eResearch Australasia 2008

Presentations from the eResearch Australasia 2008 conference are available.

Cross-Disciplinary Data Tools Development: Cornell Establishes DISCOVER Research Service Group

Cornell University has launched its DISCOVER Research Service Group to support its data-driven science efforts.

Here's an excerpt from the press release:

Cornell University announced today the establishment of the DISCOVER Research Service Group (DRSG) to facilitate data-driven science at Cornell by developing cross-disciplinary data archival and discovery tools. DISCOVER will conduct pilot projects in selected strategic areas such as the development of data discovery portals using access-layer protocols now under development at Fedora Commons and the Virtual Observatory. . . .

Cornell's Department of Astronomy and the University Library, in partnership with the Cornell Center for Advanced Computing, will work closely with DISCOVER, which is comprised of research groups from multiple disciplines and core data management and curation staff. . . .

The overarching goal of the DISCOVER Research Service Group is to provide accessible paths for the curation, preservation, and mining of scientific data. Systems are needed to make data sets accessible physically over both space (over a wide network) and time (for the indefinite future) and also transparently, using modern Web-based tools that are expected to evolve.

CERN’s Grid: 100,000 Processors at 140 Scientific Institutions

The Worldwide Large Hadron Collider Computing Grid consortium’s grid is ready to process an anticipated 15 million gigabytes per year of data from the collider. It’s composed of 100,000 processors distributed among 140 scientific institutions.

Read more about it at “CERN Officially Unveils Its Grid: 100,000 Processors, 15 Petabytes a Year” and “The Grid Powers Up to Save Lives and Seek the God Particle.”

NISO Holds Final Thought Leader Meeting on Research Data

NISO (the National Information Standards Organization) has held its final Thought Leader meeting on the topic of research data. A short summary of the meeting is available at “NISO Brings Together Data Thought Leaders.”

Earlier this year, NISO held Thought Leader meetings on institutional repositories, digital libraries and collections, and e-learning and course management systems. Final reports are available for the institutional repositories and digital libraries and collections meetings.

Open Access to and Reuse of Research Data—The State of the Art in Finland

The Finnish Social Science Data Archive has published Open Access to and Reuse of Research Data—The State of the Art in Finland.

Here's an excerpt:

In 2006, the Ministry of Education in Finland allocated resources to the Finnish Social Science Data Archive (FSD) to chart national and international practices related to open access to research data. Consequently, the FSD carried out an online survey targeting professors of human sciences, social sciences and behavioural sciences in Finnish universities. Some respondents were senior staff at research institutes. The respondents were asked about the state and use of data collected in their department/institute. Almost half of the respondents considered the preservation and use of digital research data to be relevant to their department. The number of respondents (150) is large enough to warrant statistical analysis even though response rate was low at 28%.

National Research Data Management for the UK: UKRDS Interim Report Released

The UK Research Data Service has released the UKRDS Interim Report.

The report recommends adopting a "Hybrid/Umbrella" model for managing research data in the UK. Here's an excerpt:

In this model ["Hybrid/Umbrella"], UKRDS acts as an umbrella organisation, representing the interests of many UK data repositories, both those based around single institutions and those based on storage for a single discipline. Such an organisation would be well-placed to act as a mediator, as a standards-setting body and as source of information about data archiving and repositories, perhaps in a similar fashion to the Digital Curation Centre (DCC). In time it might become a data repository in its own right or take on other functions as required. This approach brings the Shared Services model into the current environment of grid computing and cloud-based data storage, with an emphasis on distributed shared services, rather than centralised shared services. Although there are still risks associated with this model, they are lower than the previous two and more manageable. The exact structure of such an organisation would be dependent on circumstance and would need to take into account the requirements of the member organisations.

Oxford Releases Report on Digital Repository Services for Research Data Management

The Oxford University Office of the Director of IT has released Findings of the Scoping Study Interviews and the Research Data Management Workshop: Scoping Digital Repository Services for Research Data Management.

Here's an excerpt from the report's Web page:

The scoping study interviews aimed to document data management practices from Oxford researchers as well as to capture their requirements for services to help them manage their data more effectively. In order to do this, 37 face-to-face interviews were conducted between May and June with researchers from 27 colleges, departments and faculties. In addition to this, the Research Data Management Workshop was organised to complement the findings of the scoping study interviews.

APSR Releases Investigating Data Management Practices in Australian Universities

The Australian Partnership for Sustainable Repositories has released Investigating Data Management Practices in Australian Universities.

Here's an excerpt from the report's Web page:

In late 2007, The University of Queensland undertook a survey of data management practices among the university’s researchers. This was done in response to the increasing realisation that repositories need to include research data, in addition to the research outputs in print form already included, and to provide information which would enhance the support provided for those engaged in eResearch.

The survey was carried out using the Apollo software developed at The Australian National University and adapted by APSR. Two other universities, The University of Melbourne and the Queensland University of Technology, have now replicated the survey among their own communities, while adding some questions of local interest.

The survey covers questions such as the types of digital data being created (spreadsheets, documents, experimental data, images, fieldwork data, etc.), the size of the data collection, software used for data analysis, data storage and backup, application of a data management plan, roles and responsibilities around data management, copyright frameworks, usage of high capacity computing, and much more.

Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library

Cornell University Library's Data Working Group has deposited its Digital Research Data Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library report in the eCommons@Cornell repository.

Here's the abstract:

Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, are giving rise to new modes of conducting research. Infrastructure to promote and support the curation of digital research data is not yet fully-developed in all research disciplines, scales, and contexts. Organizations of all kinds are examining and staking out their potential roles in the areas of cyberinfrastructure development, data-driven scholarship, and data curation. The purpose of the Cornell University Library's (CUL) Data Working Group (DaWG) is to exchange information about CUL activities related to data curation, to review and exchange information about developments and activities in data curation in general, and to consider and recommend strategic opportunities for CUL to engage in the area of data curation. This white paper aims to fulfill this last element of the DaWG's charge.

Survey of Canadian and International Data Management Initiatives Released

The Canadian Association of Research Libraries (CARL) has released Survey of Canadian and International Data Management Initiatives.

Here's an excerpt from the "Introduction":

Research libraries have a role to play in this emerging data-intensive environment. A 2007 CARL survey found that most CARL members are interested in managing research data, but few have a formal data archiving policy. CARL has formed a Research Data Management Working Group to assist members in collecting, organizing, preserving and providing access to the research data and to formulate a cooperative approach for CARL.

The purpose of this report is to provide an overview of the types of data management activities being undertaken in Canada and internationally. This review documents the various options available for libraries, and will pave the way for a more detailed investigation by the Working Group of the potential roles for libraries.

RIN Publishes To Share or not to Share: Publication and Quality Assurance of Research Data Outputs

The Research Information Network has published To Share or not to Share: Publication and Quality Assurance of Research Data Outputs. The report has a separate Annex file.

This report presents the findings from a study of whether or not researchers do in fact make their research data available to others, and the issues they encounter when doing so. The study is set in a context where the amount of digital data being created and gathered by researchers is increasing rapidly; and there is a growing recognition by researchers, their employers and their funders of the potential value in making new data available for sharing, and in curating them for re-use in the long term.

Presentations from the Open Access Collections Workshop Now Available

Presentations from the Australian Partnership for Sustainable Repositories' Open Access Collections workshop are now available. Presentations are in HTML/PDF, MP3, and digital video formats. The workshop was held in association with the Queensland University Libraries Office of Cooperation and the University of Queensland Library.

Dealing with Research Data in a Federated Digital Repository: Oxford University Planning Document Released

The Oxford e-Research Centre has released Scoping Digital Repository Services for Research Data Management, a project plan for determining the requirements for handling data in a federated digital repository at Oxford University.

Here's an excerpt from the "Aims and Objectives" section:

  • Capture and document researchers’ requirements for digital repository services to handle research data.
  • Participate actively in the development of an interoperability framework for the federated digital repository at Oxford.
  • Make recommendations to improve and coordinate the provision of digital repository services for research data.
  • Initiate and develop collaborations with the different repository activities already occurring to ensure that communication takes place in between them.
  • Raise awareness at Oxford of the importance and advantages of the active management of research data.
  • Communicate significant national and international developments in repositories to relevant Oxford stakeholders, in order to stimulate the adoption of best practices.

Repository Presentations from the DataShare Project

The DataShare project has released two recent presentations about its activities: "Data Documentation Initiative (DDI)" and "Guidelines and Tools for Repository Planning and Assessment." A recent briefing paper, The Data Documentation Initiative (DDI) and Institutional Repositories, is also available.

Here's a description of the DataShare project from its home page:

DISC-UK DataShare, led by EDINA, arises from an existing UK consortium of data support professionals working in departments and academic libraries in universities (Data Information Specialists Committee-UK), and builds on an international network with a tradition of data sharing and data archiving dating back to the 1960s in the social sciences. By working together across four universities and internally with colleagues already engaged in managing open access repositories for e-prints, this partnership will introduce and test a new model of data sharing and archiving to UK research institutions. By supporting academics within the four partner institutions who wish to share datasets on which written research outputs are based, this network of institution-based data repositories develops a niche model for deposit of 'orphaned datasets' currently filled neither by centralised subject-domain data archives/centres/grids nor by e-print based institutional repositories (IRs).