How to License Research Data

The Digital Curation Centre has released How to License Research Data.

Here's an excerpt:

This guide will help you decide how to apply a licence to your research data, and which licence would be most suitable. It should provide you with an awareness of why licensing data is important, the impact licences have on future research, and the potential pitfalls to avoid. It concentrates on the UK context, though some aspects apply internationally; it does not, however, provide legal advice. The guide should interest both the principal investigators and researchers responsible for the data, and those who provide access to them through a data centre, repository or archive.


MIT Libraries Awarded $650,000 Grant from the Library of Congress for Exhibit 3.0 Project

The MIT Libraries have been awarded a $650,000 grant from the Library of Congress for the Exhibit 3.0 Project.

Here's an excerpt from the press release:

The MIT Libraries has been awarded a $650,000 grant from the Library of Congress for work in collaboration with the MIT Computer Science and Artificial Intelligence Lab (CSAIL) and Zepheira, Inc. on "Exhibit 3.0," a new project to redesign and expand upon Exhibit, the popular open source software tool for searching, browsing and visualizing data on the Web. The goal is to provide libraries, cultural institutions and other organizations grappling with large amounts of digital content, with an enhanced tool that is scalable and useful for data management, visualization and navigation. According to the Library of Congress, "It is the Library's intent that this work also will further contribute to the collaborative knowledge sharing among the broader communities concerned about the critical infrastructure that will ensure sustainability and accessibility of digital content over time."

"This innovative work has already made a considerable impact on digital content communities whose data is diverse and complex. The visualizations bring new understanding to users and curators alike," said Martha Anderson, Director of the National Digital Information Infrastructure and Preservation Program at the Library of Congress. "We're extremely fortunate to have the support of the Library of Congress on this important research," said Ann Wolpert, director of the MIT Libraries. "Our hope is that Exhibit 3.0 will be a useful tool in tackling the daunting challenge all libraries face in ensuring the future sustainability and accessibility of our digital content."

Exhibit was originally developed as part of the MIT Simile Project (simile.mit.edu), an ambitious collaboration of the MIT Libraries, the MIT CSAIL, and the World Wide Web Consortium (W3C) to explore applications of the Semantic Web to problems of information management across both large-scale digital libraries and small-scale personal collections. Exhibit runs inside a Web browser and supports many types of information using common Web standards for data publishing. Since its release, Exhibit has been used by thousands of websites worldwide across a range of diverse industries including cultural heritage, libraries, publishers, medical research, life science and government. Most recently Exhibit has been used by DATA.GOV (http://data.gov/), an Open Government Initiative by President Obama's administration to increase public access to high value data generated by the Executive Branch of the Federal Government. The application has been used to help demonstrate new ways of visualizing government data. . . .

The Exhibit 3.0 project will redesign and re-implement Exhibit to scale from small collections to very large data collections of the magnitude created by the Library of Congress and its National Digital Information Infrastructure and Preservation Program (NDIIPP). The redesigned Exhibit will be as simple to use as the current tool but more scalable, more modular, and easier to integrate into a variety of information management systems and websites—offering an improved user experience.

In addition to the Library of Congress, the MIT Libraries and other organizations that manage large quantities of data will collaborate on the project for their own collections. A major focus of the project will be to build a lively community around Exhibit, of both users of the software and software developers, to help continuously improve the open source tool. Another aspect of the new project will incorporate research by students at MIT's CSAIL (Computer Science and Artificial Intelligence Lab) on personal information management. The research will focus on improving the user experience working with data in Exhibit, and incorporating new data visualization techniques that allow users to explore data in novel ways. "Impressive data-interactive sites abound on the web, but right now you need a team of developers to create them. Exhibit demonstrated that authoring data-interactive sites can be as easy as authoring a static web page. With Exhibit 3.0 we can move from a prototype to a robust platform that anyone can use to author (not program) rich interactive information visualizations that effectively communicate with their users," said David Karger, computer science professor with CSAIL.

The project will begin in January for a period of one year, and a new website and other communication channels will be publicized soon. For more information see http://similewidgets.org/exhibit3.
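
For readers unfamiliar with how Exhibit handles data, here is a rough sketch (Python, not part of the project) that writes a handful of records as the simple JSON items file an Exhibit-powered HTML page typically loads. The layout and field names reflect my assumptions about the commonly documented Exhibit data format; check them against the Exhibit documentation before relying on them.

```python
# Illustrative sketch: writing a small record set as an Exhibit-style
# JSON data file. The {"items": [...]} layout with "label" and "type"
# properties reflects the commonly documented Exhibit data format, but
# verify against the Exhibit documentation before relying on it.
import json

records = [
    {"label": "Apollo 11 press kit", "type": "Document", "year": 1969},
    {"label": "1900 census schedules", "type": "Dataset", "year": 1900},
]

with open("exhibit-data.json", "w", encoding="utf-8") as fh:
    json.dump({"items": records}, fh, indent=2)

print(f"Wrote {len(records)} items for an Exhibit page to load.")
```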


"Data Preservation in High Energy Physics"

David M. South has self-archived "Data Preservation in High Energy Physics" in arXiv.org.

Here's an excerpt:

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are in many cases unique. At the same time, HEP has no coherent strategy for data preservation and re-use, and many important and complex data sets are simply lost. In a period of a few years, several important and unique experimental programs will come to an end, including those at HERA, the b-factories and at the Tevatron. An inter-experimental study group on HEP data preservation and long-term analysis (DPHEP) was formed and a series of workshops were held to investigate this issue in a systematic way. The physics case for data preservation and the preservation models established by the group are presented, as well as a description of the transverse global projects and strategies already in place.


Unchartered Waters—The State of Open Data in Europe

CSC has released Unchartered Waters—The State of Open Data in Europe.

Here's an excerpt:

This study analyses the current state of the open data policy ecosystem and open government data offerings in nine European Member States. Since none of the countries studied currently offers a national open data portal, this study compares the statistics offices’ online data offerings. The analysis shows that they fulfill a number of open data principles but that there is still a lot of room for improvement. This study underlines that the development of data catalogues and portals should not be seen as means to an end.


America COMPETES Act Establishes Interagency Public Access Committee

The signing of the America COMPETES Reauthorization Act of 2010 by President Obama establishes a new Interagency Public Access Committee. The International Association of Scientific, Technical & Medical Publishers (STM) has issued a press release that "applauds the efforts of US legislators in crafting the charter of the Interagency Public Access Committee."

Here's an excerpt from the Act:

SEC. 103. INTERAGENCY PUBLIC ACCESS COMMITTEE.

(a) ESTABLISHMENT.—The Director shall establish a working group under the National Science and Technology Council with the responsibility to coordinate Federal science agency research and policies related to the dissemination and long-term stewardship of the results of unclassified research, including digital data and peer-reviewed scholarly publications, supported wholly, or in part, by funding from the Federal science agencies.

(b) RESPONSIBILITIES.—The working group shall—

(1) identify the specific objectives and public interests that need to be addressed by any policies coordinated under (a);

(2) take into account inherent variability among Federal science agencies and scientific disciplines in the nature of research, types of data, and dissemination models;

(3) coordinate the development or designation of standards for research data, the structure of full text and metadata, navigation tools, and other applications to maximize interoperability across Federal science agencies, across science and engineering disciplines, and between research data and scholarly publications, taking into account existing consensus standards, including international standards;

(4) coordinate Federal science agency programs and activities that support research and education on tools and systems required to ensure preservation and stewardship of all forms of digital research data, including scholarly publications;

(5) work with international science and technology counterparts to maximize interoperability between United States based unclassified research databases and international databases and repositories;

(6) solicit input and recommendations from, and collaborate with, non-Federal stakeholders, including the public, universities, nonprofit and for-profit publishers, libraries, federally funded and non federally funded research scientists, and other organizations and institutions with a stake in long term preservation and access to the results of federally funded research;

(7) establish priorities for coordinating the development of any Federal science agency policies related to public access to the results of federally funded research to maximize the benefits of such policies with respect to their potential economic or other impact on the science and engineering enterprise and the stakeholders thereof;

(8) take into consideration the distinction between scholarly publications and digital data;

(9) take into consideration the role that scientific publishers play in the peer review process in ensuring the integrity of the record of scientific research, including the investments and added value that they make; and

(10) examine Federal agency practices and procedures for providing research reports to the agencies charged with locating and preserving unclassified research.

(c) PATENT OR COPYRIGHT LAW.—Nothing in this section shall be construed to undermine any right under the provisions of title 17 or 35, United States Code.

(d) APPLICATION WITH EXISTING LAW.—Nothing defined in section (b) shall be construed to affect existing law with respect to Federal science agencies’ policies related to public access.

(e) REPORT TO CONGRESS.—Not later than 1 year after the date of enactment of this Act, the Director shall transmit a report to Congress describing—

(1) the specific objectives and public interest identified under (b)(1);

(2) any priorities established under subsection (b)(7);

(3) the impact the policies described under (a) have had on the science and engineering enterprise and the stakeholders, including the financial impact on research budgets;

(4) the status of any Federal science agency policies related to public access to the results of federally funded research; and

(5) how any policies developed or being developed by Federal science agencies, as described in subsection (a), incorporate input from the non-Federal stakeholders described in subsection (b)(6).

(f) FEDERAL SCIENCE AGENCY DEFINED.—For the purposes of this section, the term "Federal science agency" means any Federal agency with an annual extramural research expenditure of over $100,000,000.


Guide for Research Libraries: The NSF Data Sharing Policy

ARL has released the Guide for Research Libraries: The NSF Data Sharing Policy.

Here's an excerpt:

The Association of Research Libraries has developed this guide primarily for librarians, to help them make sense of the new NSF requirement. It provides the context for, and an explanation of, the policy change and its ramifications for the grant-writing process. It investigates the role of libraries in data management planning, offering guidance in helping researchers meet the NSF requirement. In addition, the guide provides a resources page, where examples of responses from ARL libraries may be found, as well as guides for data management planning created by various NSF directorates and approaches to the topic created by international data archive and curation centers.


Riding the Wave—How Europe Can Gain from the Rising Tide of Scientific Data

The High-Level Group on Scientific Data has released Riding the Wave—How Europe Can Gain from the Rising Tide of Scientific Data.

Here's an excerpt:

A fundamental characteristic of our age is the rising tide of data — global, diverse, valuable and complex. In the realm of science, this is both an opportunity and a challenge. This report, prepared for the European Commission's Directorate-General for Information Society and Media, identifies the benefits and costs of accelerating the development of a fully functional e-infrastructure for scientific data — a system already emerging piecemeal and spontaneously across the globe, but now in need of a far-seeing, global framework. The outcome will be a vital scientific asset: flexible, reliable, efficient, cross-disciplinary and cross-border.

The benefits are broad. With a proper scientific e-infrastructure, researchers in different domains can collaborate on the same data set, finding new insights. They can share a data set easily across the globe, but also protect its integrity and ownership. They can use, re-use and combine data, increasing productivity. They can more easily solve today's Grand Challenges, such as climate change and energy supply. Indeed, they can engage in whole new forms of scientific inquiry, made possible by the unimaginable power of the e-infrastructure to find correlations, draw inferences and trade ideas and information at a scale we are only beginning to see. For society as a whole, this is beneficial. It empowers amateurs to contribute more easily to the scientific process, politicians to govern more effectively with solid evidence, and the European and global economy to expand.

NSF Data Sharing Policy Released

The National Science Foundation has released its revised NSF Data Sharing Policy. As of January 18, 2011, NSF proposals must include a "Data Management Plan" of no more than two pages, in accordance with the Grant Proposal Guide, chapter II.C.2.j (see the excerpt below).

Here's an excerpt from the Award and Administration Guide, chapter VI.D.4:

b. Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing. Privileged or confidential information should be released only in a form that protects the privacy of individuals and subjects involved. General adjustments and, where essential, exceptions to this sharing expectation may be specified by the funding NSF Program or Division/Office for a particular field or discipline to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate the legitimate interest of investigators. A grantee or investigator also may request a particular adjustment or exception from the cognizant NSF Program Officer.

c. Investigators and grantees are encouraged to share software and inventions created under the grant or otherwise make them or their products widely available and usable.

d. NSF normally allows grantees to retain principal legal rights to intellectual property developed under NSF grants to provide incentives for development and dissemination of inventions, software and publications that can enhance their usefulness, accessibility and upkeep. Such incentives do not, however, reduce the responsibility that investigators and organizations have as members of the scientific and engineering community, to make results, data and collections available to other researchers.

Here's an excerpt from the Grant Proposal Guide, chapter II.C.2.j:

Plans for data management and sharing of the products of research. Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan”. This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:

  1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  4. policies and provisions for re-use, re-distribution, and the production of derivatives; and
  5. plans for archiving data, samples, and other research products, and for preservation of access to them.

A May 2010 NSF press release ("Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans") discussed the background for the policy:

"Science is becoming data-intensive and collaborative," noted Ed Seidel, acting assistant director for NSF's Mathematical and Physical Sciences directorate. "Researchers from numerous disciplines need to work together to attack complex problems; openly sharing data will pave the way for researchers to communicate and collaborate more effectively."

"This is the first step in what will be a more comprehensive approach to data policy," added Cora Marrett, NSF acting deputy director. "It will address the need for data from publicly-funded research to be made public."

"Why Linked Data is Not Enough for Scientists"

Sean Bechhofer et al. have self-archived "Why Linked Data is Not Enough for Scientists" in the ECS EPrints Repository.

Here's an excerpt:

Scientific data stands to represent a significant portion of the linked open data cloud and science itself stands to benefit from the data fusion capability that this will afford. However, simply publishing linked data into the cloud does not necessarily meet the requirements of reuse. Publishing has requirements of provenance, quality, credit, attribution, methods in order to provide the reproducibility that allows validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first class citizens for sharing and publishing.
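
To make the gap the authors describe a little more concrete, here is a small sketch (Python with rdflib, not taken from the paper) that builds a bare linked-data description of a dataset and then adds the kind of provenance and attribution statements reuse depends on. All URIs, property choices, and values are illustrative assumptions, not the paper's Research Object model.

```python
# Illustrative sketch only: describing a dataset as linked data and
# attaching provenance/attribution metadata with Dublin Core terms.
# All URIs and values are hypothetical; the paper itself proposes
# richer "Research Objects" rather than this minimal pattern.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
dataset = URIRef("http://example.org/dataset/42")      # hypothetical dataset URI
creator = URIRef("http://example.org/people/asmith")   # hypothetical researcher URI

# Bare linked data: the dataset exists and has a title.
g.add((dataset, RDF.type, VOID.Dataset))
g.add((dataset, DCTERMS.title, Literal("Sea-surface temperature readings, 2009")))

# The extra statements reuse depends on: who made it, how, and when.
g.add((dataset, DCTERMS.creator, creator))
g.add((creator, FOAF.name, Literal("A. Smith")))
g.add((dataset, DCTERMS.created, Literal("2009-11-30")))
g.add((dataset, DCTERMS.provenance,
       Literal("Derived from cruise logs; cleaned with script v1.2")))

print(g.serialize(format="turtle"))
```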

"Research Data: Who Will Share What, with Whom, When, and Why?"

Christine L. Borgman has self-archived "Research Data: Who Will Share What, with Whom, When, and Why?" in SelectedWorks.

Here's an excerpt:

The deluge of scientific research data has excited the general public, as well as the scientific community, with the possibilities for better understanding of scientific problems, from climate to culture. For data to be available, researchers must be willing and able to share them. The policies of governments, funding agencies, journals, and university tenure and promotion committees also influence how, when, and whether research data are shared. Data are complex objects. Their purposes and the methods by which they are produced vary widely across scientific fields, as do the criteria for sharing them. To address these challenges, it is necessary to examine the arguments for sharing data and how those arguments match the motivations and interests of the scientific community and the public. Four arguments are examined: to make the results of publicly funded data available to the public, to enable others to ask new questions of extant data, to advance the state of science, and to reproduce research. Libraries need to consider their role in the face of each of these arguments, and what expertise and systems they require for data curation.

"Keeping Research Data Safe Factsheet"

Charles Beagrie Limited has released the "Keeping Research Data Safe Factsheet."

Here's an excerpt:

This factsheet illustrates for institutions, researchers, and funders some of the key findings and recommendations from the JISC-funded Keeping Research Data Safe (KRDS1) and Keeping Research Data Safe 2 (KRDS2) projects.

Data Mash-Ups and the Future of Mapping

JISC has released Data Mash-Ups and the Future of Mapping.

Here's an excerpt:

The term 'mash-up' refers to websites that weave data from different sources into new Web services. The key to a successful Web service is to gather and use large datasets and harness the scale of the Internet through what is known as network effects. This means that data sources are just as important as the software that 'mashes' them, and one of the most profound pieces of data that a user has at any one time is his or her location. . . .

Since, as this report makes clear, data mash-ups that make use of geospatial data in some form or other are by far the most common mash-ups to date, then they are likely to provide useful lessons for other forms of data. In particular, the education community needs to understand the issues around how to open up data, how to allow data to be added to in ways that do not compromise accuracy and quality and how to deal with issues such as privacy and working with commercial and non-profit third parties—and the GeoWeb is a test ground for much of this. Thirdly, new location-based systems are likely to have educational uses by, for example, facilitating new forms of fieldwork. Understanding the technology behind such systems and the way it is developing is likely to be of benefit to teachers and lecturers who are thinking about new ways to engage with learners. And finally, there is a future watching aspect. Data mash-ups in education and research are part of an emerging, richer information environment with greater integration of mobile applications, sensor platforms, e-science, mixed reality, and semantic, machine-computable data. This report starts to speculate on forms that these might take, in the context of map-based data.
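
To make the mash-up idea concrete, here is a toy sketch (not from the report) that joins two small data sources on a shared location key and emits GeoJSON for mapping; the datasets, field names, and values are invented purely for illustration.

```python
# Toy mash-up sketch: join two data sources on location and emit GeoJSON.
# The datasets and field names are invented for illustration only.
import json

# Source 1: points of interest with coordinates (e.g., scraped or from an API).
places = [
    {"name": "Museum A", "lon": -0.1276, "lat": 51.5074},
    {"name": "Library B", "lon": -2.2426, "lat": 53.4808},
]

# Source 2: an attribute keyed by the same place names (e.g., visitor counts).
visits = {"Museum A": 1200, "Library B": 800}

# "Mash" them into a single GeoJSON FeatureCollection for mapping.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [p["lon"], p["lat"]]},
        "properties": {"name": p["name"], "visits": visits.get(p["name"])},
    }
    for p in places
]

print(json.dumps({"type": "FeatureCollection", "features": features}, indent=2))
```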

Research Data Management: Incremental Project Releases Scoping Study and Implementation Plan

The Incremental Project has released its Scoping Study and Implementation Plan. The project is jointly run by Cambridge University Library and the Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow.

Here's a brief description of the project from its home page:

The project is a first step in improving and facilitating the day-to-day and long-term management of research data in higher education institutions (HEI's). We aim to increase researchers’ capacity and motivation for managing their digital research data, using existing tools and resources where possible and working to identify and fill gaps where additional tailored support and guidance is required. We aim to take a bottom-up approach, consulting a diverse set of researchers in each stage of the project.

Read more about it at "Scoping Study and Implementation Plan Released."

Open Data: Panton Principles Authors Named SPARC Innovators

The authors of the Panton Principles have been named as SPARC Innovators.

Here's an excerpt from the press release:

Science is based on building on, reusing, and openly criticizing the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavors, it is crucial that science data be made open.

That's the belief of four leaders who have put forth a groundbreaking set of recommendations for scientists to more easily share their data—The Panton Principles—and who have been named the latest SPARC Innovators for their work.

The authors of The Panton Principles are:

  • Peter Murray-Rust, chemist at the University of Cambridge;
  • Cameron Neylon, biochemist at the Rutherford Appleton Laboratory in Didcot, England;
  • Rufus Pollock, co-founder of the Open Knowledge Foundation and Mead Fellow in Economics, Emmanuel College, University of Cambridge;
  • John Wilbanks, vice president for Science, Creative Commons, San Francisco.

The authors advocate making data freely available on the Internet for anyone to download, copy, analyze, reprocess, pass to software or use for any purpose without financial, legal or technical barriers. Through the Principles, the group aimed to develop clear language that explicitly defines how a scientist's rights to his own data could be structured so others can freely reuse or build on it. The goal was to craft language simple enough that a scientist could easily follow it, and then focus on doing science rather than law.

The Panton Principles were publicly launched in February of 2010, with a Web site at www.pantonprinciples.org to spread the word and an invitation to endorse. About 100 individuals and organizations have endorsed the Principles so far.

"This is the first time we're seeing diverse viewpoints crystallize around the pragmatic idea that we have to start somewhere, agree on the basics, and set the tone," says Heather Joseph, Executive Director of SPARC (the Scholarly Publishing and Academic Resources Coalition). "The authors are all leading thinkers in this area—as well as producers and consumers of data. They each approached the idea of open data from different directions, yet with the same drive to open up science, and ended up on common ground."

According to Pollock, "It's commonplace that we advance by building on the work of colleagues and predecessors—standing on the shoulders of giants. In a digital age, to build on the work of others we need something very concrete: access to the data of others and the freedom to use and reuse it. That's what the Panton Principles are about."

To read the full June 2010 SPARC Innovator profile, visit http://www.arl.org/sparc/innovator.

Presentations from The Changing Role of Libraries in Support of Research Data Activities: A Public Symposium

The Board on Research Data and Information has released presentations from The Changing Role of Libraries in Support of Research Data Activities: A Public Symposium.

Presentations included:

  • Deanna Marcum, Library of Congress: The Role of Libraries in Digital Data Preservation and Access—The Library of Congress Experience
  • Betsy Humphreys, National Library of Medicine: More Data, More Use, Less Lead Time: Scientific Data Activities at the National Library of Medicine
  • Joyce Ray, Institute for Museum and Library Services: Libraries in the New Research Environment
  • Karla Strieb, Association of Research Libraries: Supporting E-Science: Progress at Research Institutions and Their Libraries
  • Christine Borgman, University of California, Los Angeles: Why Data Matters to Librarians—and How to Educate the Next Generation

Read more about it at "National Academies Sees Libraries as Leaders in Data Preservation."

Addressing the Research Data Gap: A Review of Novel Services for Libraries

The Canadian Association of Research Libraries (CARL) has released Addressing the Research Data Gap: A Review of Novel Services for Libraries.

Here's an excerpt:

This document presents the results of a review of novel opportunities for libraries in the area of research data services. The activities were identified through a review of the literature and a scan of projects being undertaken at libraries and other institutions worldwide. For the purpose of this report, research data services have been organized into five distinct areas (although it should be noted that there are significant overlaps between them): awareness and advocacy; support and training; access and discovery; archiving and preservation; and virtual research environments. Each section contains a general description of the area accompanied by a number of examples. The examples are not meant to be a comprehensive account of existing projects, but rather to highlight the range of possibilities available.

Open Data Study

The Open Society Institute's Transparency and Accountability Initiative has released the Open Data Study.

Here's an excerpt:

There are substantial social and economic gains to be made from opening government data to the public. The combination of geographic, budget, demographic, services, education and other data, publicly available in an open format on the web, promises to improve services as well as create future economic growth.

This approach has been recently pioneered by governments in the United States and the United Kingdom (with the launch of two web portals – www.data.gov and www.data.gov.uk respectively) inspired in part by applications developed by grassroots civil society organisations ranging from bicycle accidents maps to sites breaking down how and where tax money is spent. In the UK, the data.gov.uk initiative was spearheaded by Tim Berners-Lee, the inventor of the World Wide Web.

This research, commissioned by a consortium of funders and NGOs under the umbrella of the Transparency and Accountability Initiative, seeks to explore the feasibility of applying this approach to open data in relevant middle income and developing countries. Its aim is to identify the strategies used in the US and UK contexts with a view to building a set of criteria to guide the selection of pilot countries, which in turn suggests a template strategy to open government data.

Open Source Data Registry Software: CKAN (Comprehensive Knowledge Archive Network) Version 1.0 Released

The Open Knowledge Foundation has released CKAN (Comprehensive Knowledge Archive Network) version 1.0.

Here's an excerpt from the announcement:

As well as being used to power http://ckan.net and http://data.gov.uk, CKAN is now helping run 7 data catalogues around the world, including ones in Canada (http://datadotgc.ca / http://ca.ckan.net), Germany (http://de.ckan.net/) and Norway (http://no.ckan.net).

CKAN.net has also continued to grow steadily and now has over 940 registered packages.

Here's a description of CKAN from the project page:

CKAN is a registry or catalogue system for datasets or other "knowledge" resources. CKAN aims to make it easy to find, share and reuse open content and data, especially in ways that are machine automatable.
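
For a sense of how a CKAN catalogue can be queried programmatically, here is a minimal sketch in Python. It uses the package_search action API found in later CKAN releases; the base URL, endpoint path, and field names are assumptions and may not match the CKAN 1.0-era instances listed above.

```python
# Minimal sketch: free-text search against a CKAN catalogue.
# Assumes the instance exposes CKAN's action API (package_search);
# older CKAN 1.0 sites used different API paths, so treat the URL
# layout below as an assumption to verify.
import json
import urllib.parse
import urllib.request

BASE_URL = "https://demo.ckan.org"  # hypothetical instance; substitute your own

def search_packages(query, rows=5):
    """Return the first few dataset records matching a free-text query."""
    url = (f"{BASE_URL}/api/3/action/package_search"
           f"?q={urllib.parse.quote(query)}&rows={rows}")
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["result"]["results"]

if __name__ == "__main__":
    for package in search_packages("geospatial"):
        print(package["name"], "-", package.get("title", ""))
```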

Review of the State of the Art of the Digital Curation of Research Data

Alex Ball has deposited Review of the State of the Art of the Digital Curation of Research Data in Opus.

Here's an excerpt:

The aim of this report is to present the state of the art of the digital curation of research data, in terms of both theoretical understanding and practical application, and note points of particular interest to the ERIM Project. The report begins by reviewing the concepts of data curation and digital curation, and then exploring the terminologies currently in use for describing digital repositories and data lifecycles. Some parallels are also drawn between digital curation practice and design and engineering practice. Existing guidance on data curation from research funders, established data centres and the Digital Curation Centre is summarized in section 3. A review of some important standards and tools that have been developed to assist in research data management and digital repository management is presented in section 4. Finally, a short case study of implementing a new data management plan is presented in section 5, followed by some conclusions and recommendations in section 6.

"BioTorrents: A File Sharing Service for Scientific Data"

Morgan G. I. Langille and Jonathan A. Eisen have published "BioTorrents: A File Sharing Service for Scientific Data" in PLoS ONE.

Here's an excerpt:

The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net.

Recommendations for Independent Scholarly Publication of Data Sets

Creative Commons has released Recommendations for Independent Scholarly Publication of Data Sets, a working paper.

Here's an excerpt:

In an ideal world, any data collected by a research study would be available to anyone interested in validating or building on that data, just as is the documentation describing the study itself. Some data has value that goes beyond the study for which it is generated, and getting the data to those who can use it for reanalysis, meta-analysis, and other applications unimagined by the study authors is to everyone's benefit. Data reuse failure is receiving growing recognition as a problem for the research community and the general public. The road to reuse is perilous, involving as it does a series of difficult steps:

  1. The author must be professionally motivated to publish the data
  2. The effort and economic burden of publication must be acceptable
  3. The data must become accessible to potential users
  4. The data must remain accessible over time
  5. The data must be discoverable by potential users
  6. The user's use of the data must be permitted
  7. The user must be able to understand what was measured and how (materials and methods)
  8. The user must be able to understand all computations that were applied and their inputs
  9. The user must be able to apply standard tools to all file formats
  10. The user must be able to understand the data in detail (units, symbols)

This report considers how the genre of the data paper, suitably construed, might be used to help a data set survive these trials.

California Digital Library Becomes Founding Member of DataCite Consortium

The California Digital Library has become a founding member of the DataCite Consortium.

Here's an excerpt from the press release:

One of today's most important priorities for academic scholarship and research is providing long-term access to datasets. Data are now seen as the building blocks of scholarship and research in the sciences and humanities. Scholars and archivists recognize the potential for increasing collaboration and synthesis when data are archived, published, and shared, forging the possibility for new discoveries built upon the research of others. . . .

DataCite offers an easy way to connect an article published in a scholarly journal with the underlying data and allows authors to take control of the management and distribution of their research. Additionally, DataCite provides the means for researchers to share and get credit for datasets; establish easier access to research data; increase acceptance of research data as legitimate, citable contributions to the scholarly record; and to support data archiving that permits results to be verified and re-purposed for future study.

A pragmatic first step towards managing, or "curating," data is to register the existence of datasets publicly and permanently. Mirroring accepted publishing practice, DataCite's services make it easy for data producers to obtain permanent catalog records and persistent identifiers that are visible through familiar mechanisms, such as library systems, CrossRef and search engines. . . .

Stephen P. Miller, head of the Geological Data Center, Scripps Institution of Oceanography at the University of California, San Diego, says, "It is critical for research community data operations to keep in close communications with DataCite, maintaining a forum to discuss challenges and to share resources and innovative tools. For example, the 'Rolling Deck to Repository (R2R)' project was recently launched to capture all routine underway data on U.S. oceanographic research vessels, approximately 500 expeditions per year, conducted by 18 independent operating institutions. In recent years there has been a change in the cultural patterns in the marine science and other communities. Data are being exchanged, and re-used, more than ever. Much of the re-use is accomplished without the direct involvement of the original data collector… It is now a general practice to combine data from various online resources even before you go to sea, and to submit your data to a repository for others to use after returning."

In addition to the CDL, the DataCite consortium includes the German National Library of Science and Technology, the British Library, the Library of the ETH Zurich, the French Institute for Scientific and Technical Information, the Technical Information Center of Denmark, the Dutch TU Delft Library, Canada Institute for Scientific and Technical Information, the Australian National Data Service and Purdue University.
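
As a small illustration of what registering a dataset under a persistent identifier enables, the sketch below resolves a dataset DOI and retrieves citation metadata through standard DOI content negotiation. This reflects how DataCite identifiers can be looked up today rather than anything specified in the press release, and the DOI shown is a placeholder.

```python
# Sketch: looking up citation metadata for a DataCite-registered dataset DOI
# via standard DOI content negotiation at doi.org. The DOI used here is a
# placeholder, not a real dataset identifier.
import json
import urllib.request

def fetch_citation_metadata(doi):
    """Return CSL JSON metadata for a DOI, if the registrar supports it."""
    request = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    record = fetch_citation_metadata("10.1234/example-dataset")  # placeholder DOI
    print(record.get("title"), "-", record.get("publisher"))
```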