Research Data Curation Bibliography, Version 4

Digital Scholarship has released version 4 of the Research Data Curation Bibliography. This selective bibliography includes over 320 English-language articles and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions.

The "digital curation" concept is still evolving. In "Digital Curation and Trusted Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo define digital curation as follows:

Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts.

Most sources have been published from January 2009 through June 2014; however, a limited number of earlier key sources are also included.

The bibliography includes links to freely available versions of included works. If such versions are unavailable, links to the publishers' descriptions are provided.

It is available under a Creative Commons Attribution-Noncommercial 3.0 United States License.

For broader coverage of the digital curation literature, see the author's Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which presents over 650 English-language articles, books, and technical reports, and the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement, which presents over 130 additional sources.

Digital Scholarship | Digital Scholarship Publications Overview | Sitemap

"The ‘Digital’ Scholarship Disconnect"

Clifford Lynch has published "The 'Digital' Scholarship Disconnect" in EDUCAUSE Review.

Here's an excerpt:

Still, in all of these examples of digital scholarship, a key challenge remains: How can we curate and manage data now that so much of it is being produced and collected in digital form? How can we ensure that it will be discovered, shared, and reused to advance scholarship? We are struggling through the establishment of institutions, funding models, policies and practices, and even new legal requirements and community norms—ranging from cultural changes about who can use data (and when) to economic decisions about who should pay for what. Some disciplines are less contentious than others: for example, astronomy data is technically well-understood and usually not terribly sensitive. Reputation, rather than commercial reward, is wrapped up in astronomical discoveries, and there is no institutional review board to ensure the safety and dignity of astronomical objects. On the other hand, human subjects and their data raise an enormous number of questions about informed consent, privacy, and anonymization; when there are genetic markers or possible treatments to be discovered or validated, serious high-value commercial interests may be at stake. All of these factors tend to work against the free and convenient sharing of data.

"The University Library as Incubator for Digital Scholarship"

Bryan Sinclair has published "The University Library as Incubator for Digital Scholarship" in EDUCAUSE Review.

Here's an excerpt:

The campus of the future will be increasingly connected and collaborative, and the library can be the community center and beta test kitchen for new forms of interdisciplinary inquiry. Libraries have always been in the business of knowledge creation and transfer, and the digital scholarship incubator within the library can serve as a natural extension of this essential function. In an age of visualization, analytics, big data, and new forms of online publishing, these central spaces can facilitate knowledge creation and transfer by connecting people, data, and technology in a shared collaborative space.

"Developing a Research Data Management Service—A Case Study"

Jeff Moon has published "Developing a Research Data Management Service—A Case Study" in Partnership.

Here's an excerpt:

Publicly-funded, researcher-generated data has been on the front burner lately, driven by a variety of factors, including evolving funding-agency policies and journal publisher requirements. In this context, Queen's University Library (QUL) developed and implemented a Research Data Management (RDM) Service to meet researchers' needs. This process is described here, framed around four main themes: planning, building, educating, and doing.

"Research Data Sharing: Developing a Stakeholder-Driven Model for Journal Policies"

Paul Sturges et al. have self-archived "Research Data Sharing: Developing a Stakeholder-Driven Model for Journal Policies."

Here's an excerpt:

The Journal Research Data (JoRD) Project was a JISC (Joint Information Systems Committee) funded feasibility study on the possible shape of a central service on journal research data policies. The objectives of the study included, amongst other considerations: to identify the current state of journal data sharing policies and to investigate the views and practices of stakeholders to data sharing. The project confirmed that a large percentage of journals do not have a policy on data sharing, and that there are inconsistencies between the traceable journal data sharing policies. Such a state leaves authors unsure of whether they should deposit data relating to articles and where and how to share that data. In the absence of a consolidated infrastructure for the easy sharing of data, a journal data sharing model policy was developed. The model policy was developed from comparing the quantitative information gathered from analysing existing journal data policies with qualitative data collected from the stakeholders concerned. This article summarises the information gathered, outlines the process by which the model was developed and presents the model journal data sharing policy in full.

"PLOS Data Policy: Catalyst for a Better Research Process"

Emma Ganley has published "PLOS Data Policy: Catalyst for a Better Research Process" in College & Research Libraries News.

Here's an excerpt:

PLOS is seeking to ensure the ongoing utility of research, as making a paper openly accessible is enhanced enormously if that paper is linked seamlessly to the data from which it was constructed. In a time when post-publication peer review is more prevalent and data frequently come under intense public scrutiny, with whistle-blowers, blogs, and websites dedicated to investigating the validity and veracity of scientific publications, requiring access to the relevant data leads to a more rigorous scientific record.

"PyRDM: A Python-Based Library for Automating the Management and Online Publication of Scientific Software And Data"

Christian T. Jacobs et al. have self-archived "PyRDM: A Python-Based Library for Automating the Management and Online Publication of Scientific Software And Data."

Here's an excerpt:

The recomputability and reproducibility of results from scientific software requires access to both the source code and all associated input and output data. However, the full collection of these resources often does not accompany the key findings published in journal articles, thereby making it difficult or impossible for the wider scientific community to verify the correctness of a result or to build further research on it. This paper presents a new Python-based library, PyRDM, whose functionality aims to automate the process of sharing the software and data via online, citable repositories such as Figshare. The library is integrated into the workflow of an open-source computational fluid dynamics package, Fluidity, to demonstrate an example of its usage.

"Peer Review of Datasets: When, Why, and How"

Matthew S. Mayernik et al. have published "Peer Review of Datasets: When, Why, and How" in the Bulletin of the American Meteorological Society.

Here's an excerpt:

This paper discusses issues related to data peer review, in particular the peer review processes, needs, and challenges related to the following scenarios: 1) Data analyzed in traditional scientific articles, 2) Data articles published in traditional scientific journals, 3) Data submitted to open access data repositories, and 4) Datasets published via articles in data journals.

U.S. Open Data Action Plan

The White House has released the U.S. Open Data Action Plan.

Here's an excerpt:

The Smithsonian Cooper-Hewitt National Design Museum Collection plans to make all digitized collections metadata public domain, and digitized collection images without copyright or other restriction publicly available at the highest available resolution for non-commercial, educational use. . . .

The Smithsonian Freer Gallery of Art and Arthur M. Sackler Gallery plans to make all digitized collections metadata public domain, and digitized collection images without copyright or other restriction publicly available at the highest available resolution for non-commercial, educational use. . . .

After a successful limited release of an API of the Smithsonian American Art Museum collection and hackathon that resulted in a number of working prototypes, the Smithsonian American Art Museum is planning a staged release, from open metadata, like artist or medium, to an open API of digitized collections images without copyright or other restriction available for non-commercial, educational use.

Big Data: Seizing Opportunities, Preserving Values

The Executive Office of the President has released Big Data: Seizing Opportunities, Preserving Values.

Here's an excerpt:

On January 17, in a speech at the Justice Department about reforming the United States' signals intelligence practices, President Obama tasked his Counselor John Podesta with leading a comprehensive review of the impact big data technologies are having, and will have, on a range of economic, social, and government activities. Podesta was joined in this effort by Secretary of Commerce Penny Pritzker, Secretary of Energy Ernest Moniz, the President's Science Advisor John Holdren, the President's Economic Advisor Jeffrey Zients, and other senior government officials. The President's Council of Advisors for Science & Technology conducted a parallel report to take measure of the underlying technologies. Their findings underpin many of the technological assertions in this report.

This review was conceived as fundamentally a scoping exercise. Over 90 days, the review group engaged with academic experts, industry representatives, privacy advocates, civil rights groups, law enforcement agents, and other government agencies. The White House Office of Science and Technology Policy jointly organized three university conferences, at the Massachusetts Institute of Technology, New York University, and the University of California, Berkeley. The White House Office of Science & Technology Policy also issued a "Request for Information" seeking public comment on issues of big data and privacy and received more than 70 responses. In addition, the WhiteHouse.gov platform was used to conduct an unscientific survey of public attitudes about different uses of big data and various big data technologies. A list of the working group's activities can be found in the Appendix.

What Drives Academic Data Sharing?

RatSWD has released What Drives Academic Data Sharing?

Here's an excerpt:

Based on a systematic review of 98 scholarly papers and an empirical survey among 603 secondary data users, we develop a conceptual framework that explains the process of data sharing from the primary researcher’s point of view. We show that this process can be divided into six descriptive categories: Data donor, research organization, research community, norms, data infrastructure, and data recipients. Drawing from our findings, we discuss theoretical implications regarding knowledge creation and dissemination as well as research policy measures to foster academic collaboration.

"Data Publication Consensus and Controversies"

F1000Research has released an eprint of "Data Publication Consensus and Controversies."

Here's an excerpt:

As data publication venues proliferate, significant debate continues over formats, processes, and terminology. Here, we present an overview of data publication initiatives underway and the current conversation, highlighting points of consensus and issues still in contention.

How to Discover Requirements for Research Data Management Services

The DCC and DataONE have released How to Discover Requirements for Research Data Management Services.

Here's an excerpt:

This guide is meant for people whose role involves developing services or tools to support research data management (RDM) and digital curation, whether in a Higher Education Institution or a project working across institutions. Your RDM development role might be embedded with the research groups concerned, or at a more centralised level, such as a library or computing service. You will need a methodical approach to plan, elicit, analyse, document and prioritise a range of users' requirements.

The Value and Impact of Data Sharing and Curation: A Synthesis of Three Recent Studies of UK Research Data Centres

JISC has released The Value and Impact of Data Sharing and Curation: A Synthesis of Three Recent Studies of UK Research Data Centres.

Here's an excerpt from the announcement:

The data centre studies combined quantitative and qualitative approaches in order to quantify value in economic terms and present other, non-economic, impacts and benefits. Uniquely, the studies cover both users and depositors of data, and we believe the surveys of depositors undertaken are the first of their kind. All three studies show a similar pattern of findings, with data sharing via the data centres having a large measurable impact on research efficiency and on return on investment in the data and services. These findings are important for funders, both for making the economic case for investment in data curation and sharing and research data infrastructure, and for ensuring the sustainability of such research data centres.

"Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets"

Christopher W. Belter has published "Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets" in PLOS ONE.

Here's an excerpt:

Evaluation of scientific research is becoming increasingly reliant on publication-based bibliometric indicators, which may result in the devaluation of other scientific activities—such as data curation—that do not necessarily result in the production of scientific publications. This issue may undermine the movement to openly share and cite data sets in scientific publications because researchers are unlikely to devote the effort necessary to curate their research data if they are unlikely to receive credit for doing so. This analysis attempts to demonstrate the bibliometric impact of properly curated and openly accessible data sets by attempting to generate citation counts for three data sets archived at the National Oceanographic Data Center. My findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years. I also find that methods of citing and referring to these data sets in scientific publications are highly inconsistent, despite the fact that a formal citation format is suggested for each data set. These findings have important implications for developing a data citation format, encouraging researchers to properly curate their research data, and evaluating the bibliometric impact of individuals and institutions.

"Response to Elsevier’s Text and Data Mining Policy: A LIBER Discussion Paper"

LIBER has released "Response to Elsevier's Text and Data Mining Policy: A LIBER Discussion Paper."

Here's an excerpt from the announcement:

LIBER believes that the right to read is the right to mine and that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive. Furthermore, as this discussion paper highlights, licensing has the potential to limit the innovative potential of digital research methods by:

  1. restricting the tools that researchers can use
  2. limiting the way in which research results can be made available
  3. impacting on the transparency and reproducibility of research results.

Exemplar Good Governance Structures and Data Policies

APARSEN has released Exemplar Good Governance Structures and Data Policies.

Here's an excerpt:

This report summarises the level of preparedness for interoperable governance and data policies based on both desktop research on selected data policies and online survey conducted during this study. It is important to understand what current data policies address and if they miss out on important topics, such as specific requirements for data preservation. This will give an indication on the possible impact of such data policies on the individual communities and allows recommendations to be drawn up to guide forthcoming policies. This report concludes with selected recommendations that should be taken into account when drawing up data policies concerning digital preservation.

PLOS Clarifies Open Data Policy

PLOS has clarified its open data policy.

Here's an excerpt:

In the previous post, and also on our site for PLOS ONE Academic Editors, an attempt to simplify our policy did not represent the policy correctly and we sincerely apologize for that and for the confusion it has caused. We are today correcting that post and hoping it provides the clarity many have been seeking. . . .

Two key things to summarize about the policy are:

  1. The policy does not aim to say anything new about what data types, forms and amounts should be shared.
  2. The policy does aim to make transparent where the data can be found, and says that it shouldn't be just on the authors' own hard drive.

Correction

We have struck out the paragraph in the original PLOS ONE blog post headed "What do we mean by data", as we think it led to much of the confusion. Instead we offer this guidance to authors planning to submit to a PLOS journal.

What data do I need to make available?

We ask you to make available the data underlying the findings in the paper, which would be needed by someone wishing to understand, validate or replicate the work. Our policy has not changed in this regard. What has changed is that we now ask you to say where the data can be found.

As the PLOS data policy applies to all fields in which we publish, we recognize that we'll need to work closely with authors in some subject areas to ensure adherence to the new policy. Some fields have very well established standards and practices around data, while others are still evolving, and we would like to work with any field that is developing data standards. We are aiming to ensure transparency about data availability.

Geospatial Data Stewardship: Key Online Resources

The National Digital Stewardship Alliance has released Geospatial Data Stewardship: Key Online Resources.

Here's an excerpt:

This document lists online resources that highlight key concepts and practices supporting the preservation and stewardship of digital geospatial data and information. GIS practitioners take the initial preservation actions in the decisions they make regarding data creation and management. Librarians, archivists and museum professionals are often called on to support access and the long-term historical and temporal analysis of these same materials. The resources below offer a starting point to methods, tools and approaches across the information lifecycle to assist in understanding current best practices in the stewardship of geospatial data.

"An Introduction to the Coverage of the Data Citation Index (Thomson-Reuters): Disciplines, Document Types and Repositories"

Daniel Torres-Salinas, Alberto Martín-Martín, and Enrique Fuente-Gutiérrez have self-archived "An Introduction to the Coverage of the Data Citation Index (Thomson-Reuters): Disciplines, Document Types and Repositories" in arXiv.org.

Here's an excerpt:

In the past years, the movement of data sharing has been enjoying great popularity. Within this context, Thomson Reuters launched at the end of 2012 a new product inside the Web of Knowledge family: the Data Citation Index. The aim of this tool is to enable discovery and access, from a single place, to data from a variety of data repositories from different subject areas and from around the world. In this short note we present some preliminary results from the analysis of the Data Citation Index. Specifically, we address the following issues: discipline coverage, data types present in the database, and repositories that were included at the time of the study.

PLOS Mandates Immediate Open Access to Article-Related Data

PLOS has mandated that authors provide immediate open access to article-related data upon publication.

Here's an excerpt from the announcement:

In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.

Feet on the Ground: A Practical Approach to the Cloud—Nine Things to Consider When Assessing Cloud Storage

AudioVisual Preservation Solutions has released Feet on the Ground: A Practical Approach to the Cloud—Nine Things to Consider When Assessing Cloud Storage.

Here's an excerpt:

There is no all-in-one solution that will fulfill every archives' needs for preservation storage. Often, cloud storage services fulfill a portion of an organization's larger preservation infrastructure, providing secure back up for preservation copies or supporting delivery of access files from low-latency storage. Vetting and selection is therefore the alignment of organizational and collection needs with the offerings and functionality of a service. This means defining your acceptance criteria for optimal functionality and understanding how a service will fit in your preservation environment.

APA/C-DAC International Conference on Digital Preservation and Development of Trusted Digital Repositories 2014 Proceedings

The APA/C-DAC International Conference on Digital Preservation and Development of Trusted Digital Repositories 2014 proceedings have been released.

Presentations and session videos are also available.

"E-Science as a Catalyst for Transformational Change in University Research Libraries"

Mary E. Piorun has self-archived her dissertation "E-Science as a Catalyst for Transformational Change in University Research Libraries."

Here's an excerpt:

Changes in how research is conducted, from the growth of e-science to the emergence of big data, have lead to new opportunities for librarians to become involved in the creation and management of research data, at the same time the duties and responsibilities of university libraries continue to evolve. This study examines those roles related to e-science while exploring the concept of transformational change and leadership issues in bringing about such a change. Using the framework established by Levy and Merry for first- and second-order change, four case studies of libraries whose institutions are members in the Association of Research Libraries (ARL) are developed.

Open Science Win: Johnson & Johnson Clinical Trial Data Sharing Agreement

Johnson & Johnson has announced a clinical trial data sharing agreement with the Yale School of Medicine.

Here's an excerpt from the announcement:

Johnson & Johnson today announced that its subsidiary, Janssen Research and Development, LLC, has entered into a novel agreement with Yale School of Medicine's Open Data Access (YODA) Project that will extend its commitment to sharing clinical trials data to enhance public health and advance science and medicine. Under the agreement, YODA will serve as an independent body to review requests from investigators and physicians seeking access to anonymized clinical trials data from Janssen, the pharmaceutical companies of Johnson & Johnson, and make final decisions on data sharing. This is the first time any company has collaborated with a completely independent third party to review and make decisions regarding every request for clinical data.