Research Data Stewardship at UNC: Recommendations for Scholarly Practice and Leadership

The University of North Carolina at Chapel Hill School of Information and Library Science has released Research Data Stewardship at UNC: Recommendations for Scholarly Practice and Leadership.

Here's an excerpt:

This working report emanates from efforts to identify policy options for digital research data stewardship at UNC. In January 2011, the UNC Provost charged a task force on the stewardship of digital research data to make recommendations about storage and maintenance of digital data produced in the course of UNC-based research (see Appendix 1 for the task force charge). During the 2011 calendar year, the task force conducted an environmental scan of research data stewardship policies and trends, discussed issues, collected data on campus using interviews and a survey, and developed a set of principles and associated courses of action for the campus to consider (see Appendix 2 for a list of task force meetings). We believe that the principles are in concert with the UNC mission and its academic plan and can serve as the basis for policies and implementations. We recognize, however, that scholarly data and processes are highly diverse and that the technologies and economics of stewardship are changing rapidly. We thus view the implementation alternatives and recommendations here as first steps in what should be an ongoing process that serves the research data stewardship needs of scholars, the campus, and humanity. We offer this document as a working report that we hope will serve as an adaptable framework for research data stewardship across disciplines at UNC and beyond.

| Digital Curation and Preservation Bibliography 2010: "If you're looking for a reading list that will keep you busy from now until the end of time, this is your one-stop shop for all things digital preservation."— "Digital Preservation Reading List," Preservation Services at Dartmouth College weblog, February 21, 2012. | Digital Scholarship |

"Fact Sheet: Big Data Across the Federal Government"

The White House Office of Science and Technology Policy has released "Fact Sheet: Big Data Across the Federal Government."

Here's an excerpt:

Below are highlights of ongoing Federal government programs that address the challenges of, and tap the opportunities afforded by, the big data revolution to advance agency missions and further scientific discovery and innovation.

The Value and Benefits of Text Mining

JISC has released The Value and Benefits of Text Mining.

Here's an excerpt:

Vast amounts of new information and data are generated every day through economic, academic and social activities. This sea of data, predicted to increase at a rate of 40% p.a., has significant potential economic and societal value. Techniques such as text and data mining and analytics are required to exploit this potential. . . .

To date there has been no systematic analysis of the value and benefits of text mining to UK further and higher education (UKFHE), nor of the additional value and benefits that might result from the exceptions to copyright proposed by Hargreaves. JISC thus commissioned this analysis of 'The Value and Benefits of Text Mining to UK Further and Higher Education'.

We have explored the costs, benefits, barriers and risks associated with text mining within UKFHE research using the approach to welfare economics laid out in the UK Treasury best practice guidelines for evaluation [2]. We gathered our evidence from consultations with key stakeholders and a set of case studies.
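The 40% p.a. growth rate quoted above is easier to grasp as a doubling time. This is a minimal sketch that assumes constant compound growth, which simplifies any real forecast; the function names are illustrative.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years for a quantity growing at a constant compound annual rate to double."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate: float, years: int) -> float:
    """Multiplier after `years` of compound growth at `annual_rate`."""
    return (1 + annual_rate) ** years


rate = 0.40  # the 40% p.a. figure quoted in the excerpt
print(f"Doubling time: {doubling_time(rate):.2f} years")
print(f"Growth over 5 years: {growth_factor(rate, 5):.2f}x")
```

At 40% a year, the volume of data roughly doubles every two years (about 2.06 years) and grows about 5.4-fold over five years.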

| Institutional Repository and ETD Bibliography 2011 | Digital Scholarship |

"The Informatics Transform: Re-engineering Libraries for the Data Decade"

Liz Lyon has published "The Informatics Transform: Re-engineering Libraries for the Data Decade" in the latest issue of the International Journal of Digital Curation.

Here's an excerpt:

In this paper, Liz Lyon explores how libraries can re-shape to better reflect the requirements and challenges of today's data-centric research landscape. The Informatics Transform presents five assertions as potential pathways to change, which will help libraries to re-position, re-profile, and re-structure to better address research data management challenges. The paper deconstructs the institutional research lifecycle and describes a portfolio of ten data support services which libraries can deliver to support the research lifecycle phases. Institutional roles and responsibilities for research data management are also unpacked, building on the framework from the earlier Dealing with Data Report. Finally, the paper examines critical capacity and capability challenges and proposes some innovative steps towards addressing the significant skills gaps.

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

"Peer-Reviewed Open Research Data: Results of a Pilot"

Marjan Grootveld and Jeff van Egmond have self-archived "Peer-Reviewed Open Research Data: Results of a Pilot" in E-LIS.

Here's an excerpt:

Peer review of publications is at the core of science and is primarily seen as an instrument for ensuring research quality. However, it is less common to independently assess the quality of the underlying data as well. In light of the "data deluge", it makes sense to extend peer review to the data itself and in this way evaluate the degree to which the data are fit for re-use. This paper describes a pilot study at EASY, the electronic archive for (open) research data at our institution. In EASY, researchers can archive their data and add metadata themselves. As an archive devoted to open access and data sharing, we are interested in further enriching these metadata with peer reviews.

As a pilot, we established a workflow in which researchers who had downloaded data sets from the archive were asked to review them. This paper describes the details of the pilot, including the findings, both quantitative and qualitative. Finally, we discuss issues that need to be solved before such a pilot can be turned into structural peer review functionality of the archiving system.

| Digital Scholarship |

The Open Data Handbook

The Open Knowledge Foundation has released The Open Data Handbook.

Here's an excerpt from the announcement:

From a basic introduction of the "what and why" of open data, the Handbook goes on to discuss the practicalities of making data open – the "how". It gives advice on everything from choosing a file format and applying a license, to motivating the community and telling the world. Clear explanations, illustrative examples and technical recommendations make the Handbook suitable for people with all levels of experience, from the absolute beginner to the seasoned open data professional.

The Handbook is divided into short chapters which cover individual aspects of open data. It can be read in a single sitting, or dipped into as a reference work.

| Digital Curation and Preservation Bibliography | Digital Scholarship |

Review of Data Management Lifecycle Models

Alex Ball has self-archived Review of Data Management Lifecycle Models in the University of Bath institutional repository.

Here's an excerpt:

The importance of lifecycle models is that they provide a structure for considering the many operations that will need to be performed on a data record throughout its life. Many curatorial actions can be made considerably easier if they have been prepared for in advance – even at or before the point of record creation. For example, a repository can be more certain of the preservation actions it can perform if the rights and licensing status of the data has already been clarified, and researchers are more likely to be able to detail the methodologies and workflows they used if they record them at the time.
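The point about preparing in advance can be made concrete with a small data structure: if rights and methodology are captured when a record is created, later curation decisions become simple checks. This is a minimal sketch; the class and field names, and the rule that format migration and public access require clarified rights and a licence, are hypothetical assumptions rather than anything taken from the review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RightsStatus(Enum):
    UNKNOWN = "unknown"
    CLARIFIED = "clarified"  # rights/licence settled at or before creation


@dataclass
class DataRecord:
    identifier: str
    rights: RightsStatus = RightsStatus.UNKNOWN
    license: Optional[str] = None
    # Methods and workflows noted at the time the data were produced.
    methodology_notes: List[str] = field(default_factory=list)


def permitted_actions(record: DataRecord) -> List[str]:
    """Hypothetical repository policy: actions it can plan with confidence."""
    actions = ["bit-level storage"]  # assumed safe whatever the rights status
    if record.rights is RightsStatus.CLARIFIED and record.license:
        # Assumption: migration and public access need settled rights and a licence.
        actions += ["format migration", "public access"]
    return actions


legacy = DataRecord("rec-001")
planned = DataRecord("rec-002", RightsStatus.CLARIFIED, "CC0",
                     ["instrument settings logged at capture"])
print(permitted_actions(legacy))   # ['bit-level storage']
print(permitted_actions(planned))  # ['bit-level storage', 'format migration', 'public access']
```

The record created with rights already clarified unlocks more actions without any retrospective investigation, which is the advantage the excerpt attributes to lifecycle planning.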

Data-Intensive Research: Community Capability Model Framework (Consultation Draft)

The Community Capability Model for Data-Intensive Research project has released a consultation draft of the Community Capability Model Framework.

Here's an excerpt:

The Community Capability Model Framework is a tool developed by UKOLN, University of Bath, and Microsoft Research to assist institutions, research funders and researchers in growing the capability of their communities to perform data-intensive research by

  • profiling the current readiness or capability of the community,
  • indicating priority areas for change and investment, and
  • developing roadmaps for achieving a target state of readiness.

The Framework comprises eight capability factors representing human, technical and environmental issues. Within each factor is a series of community characteristics that are relevant for determining the capability or readiness of that community to perform data-intensive research.

| E-science and Academic Libraries Bibliography | Digital Scholarship |

Collaborative Yet Independent: Information Practices in the Physical Sciences

The Research Information Network, the Institute of Physics, Institute of Physics Publishing, and the Royal Astronomical Society have released Collaborative Yet Independent: Information Practices in the Physical Sciences.

Here's an excerpt:

In many ways, the physical sciences are at the forefront of using digital tools and methods to work with information and data. However, the fields and disciplines that make up the physical sciences are by no means uniform, and physical scientists find, use, and disseminate information in a variety of ways. This report examines information practices in the physical sciences across seven cases, and demonstrates the richly varied ways in which physical scientists work, collaborate, and share information and data.

| Digital Bibliographies | Digital Scholarship |

Open Access: Online Survey on Scientific Information in the Digital Age

The European Commission has released the Online Survey on Scientific Information in the Digital Age.

Here's an excerpt:

Respondents were asked whether they agreed that there is no problem with access to scientific publications in Europe: 84% disagreed or strongly disagreed. The high prices of journals/subscriptions (89%) and limited library budgets (85%) were signalled as the most important barriers to accessing scientific publications. More than 1,000 respondents (90%) supported the idea that publications resulting from publicly funded research should, as a matter of principle, be in open access (OA) mode. An even higher share of respondents (91%) agreed or strongly agreed that OA increased access to and dissemination of scientific publications. Self-archiving ("green OA"), alone or in combination with OA publishing ("gold OA"), was identified as the preferred way for public research policy to increase the number and share of scientific publications available in OA. In the case of self-archiving ("green OA"), respondents were asked what the desirable embargo period is (the period of time during which a publication is not yet open access): a six-month period was favoured by 56% of respondents (although 25% disagreed with this option).

| Transforming Scholarly Publishing through Open Access: A Bibliography | Digital Scholarship Publications Overview |

ARL, Johns Hopkins University Libraries, and SPARC Reply to White House RFI on Public Access to Digital Data

The Association of Research Libraries, the Johns Hopkins University Libraries, and SPARC have replied to the White House's Request for Information: Public Access to Digital Data Resulting from Federally Funded Scientific Research.

Here's an excerpt:

Question 1

What specific Federal policies would encourage public access to and the preservation of broadly valuable digital data resulting from federally funded scientific research, to grow the U.S. economy and improve the productivity of the American scientific enterprise?

Comment 1

The most effective Federal policies in this regard would mandate digital data deposit into publicly accessible repositories. In the absence of such policies, there are already cases of digital data which have been lost or remain inaccessible or accessible only with high barriers. While laudable efforts such as the NSF and NIH data management plans move the community in the direction of supporting U.S. economic growth and productivity, the reality is that many researchers continue to strictly interpret the requirement as sharing data based on specific requests or personal provisions. The Federal policy framework should move public access to digital data away from the current idiosyncratic environment to a systematic approach that lowers barriers to data access, discovery, sharing and re-use.

Instead of relying upon individual investigators to interpret and support public access through a point-to-point network (e.g., researcher provides digital data upon request), Federal policies should ensure that public access can occur through well-managed, sustained preservation archives that enable a legally and policy-compliant peer-to-peer model for sharing. A useful metric for full-fledged public access to digital data is whether someone (or some machine) other than the original data producer can discover, access, interpret and use the digital data without contacting the original data producer.

See also Columbia University Libraries/Information Services' reply and the Creative Commons' reply.

| Transforming Scholarly Publishing through Open Access: A Bibliography | Digital Scholarship |

Three New Documents about Creative Commons Licenses for Data

The Creative Commons has released three new documents about the use of its licenses for data: "Data," "Data and CC Licenses," and "CC0 Use for Data."

Here's an excerpt from the announcement by Sarah Hinchliff Pearson:

We have done a lot of thinking about data in the past year. As a result, we have recently published a set of detailed FAQs designed to help explain how CC licenses work with data and databases.

These FAQs are intended to:

  1. alert CC licensors that some uses of their data and databases may not trigger the license conditions,
  2. reiterate to licensees that CC licenses do not restrict them from doing anything they are otherwise permitted to do under the law, and
  3. clear up confusion about how the version 3.0 CC licenses treat sui generis database rights.

| Digital Scholarship's Weblogs and Tweets | Digital Scholarship |

"The Open Knowledge Foundation: Open Data Means Better Science"

Jennifer C. Molloy has published "The Open Knowledge Foundation: Open Data Means Better Science" in PLoS Biology.

Here's an excerpt:

Data provides the evidence for the published body of scientific knowledge, which is the foundation for all scientific progress. The more data is made openly available in a useful manner, the greater the level of transparency and reproducibility and hence the more efficient the scientific process becomes, to the benefit of society. This viewpoint is becoming mainstream among many funders, publishers, scientists, and other stakeholders in research, but barriers to achieving widespread publication of open data remain. The Open Data in Science working group at the Open Knowledge Foundation is a community that works to develop tools, applications, datasets, and guidelines to promote the open sharing of scientific data. This article focuses on the Open Knowledge Definition and the Panton Principles for Open Data in Science. We also discuss some of the tools the group has developed to facilitate the generation and use of open data and the potential uses that we hope will encourage further movement towards an open scientific knowledge commons.

| Digital Scholarship's Digital Bibliographies | Digital Scholarship |

"Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results"

Jelte M. Wicherts, Marjan Bakker, and Dylan Molenaar have published "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results" in PLoS ONE.

Here's an excerpt:

We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.
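One common way to detect apparent reporting errors is to recompute the p-value from the reported test statistic and compare it with the reported p-value. The sketch below is a simplification, not the authors' procedure: it handles only two-tailed z tests using the Python standard library, whereas results reported as t, F, or chi-square statistics would need their corresponding distributions. The function names, the rounding rule, and alpha = .05 are assumptions.

```python
import math


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_result(z: float, reported_p: float,
                          decimals: int = 2, alpha: float = 0.05):
    """Compare a reported p-value with the one implied by its z statistic."""
    recomputed = two_tailed_p_from_z(z)
    # Reported p-values are usually rounded, so compare at the reported precision.
    inconsistent = round(recomputed, decimals) != round(reported_p, decimals)
    # A gross inconsistency changes the significance decision at alpha.
    decision_flips = (recomputed < alpha) != (reported_p < alpha)
    return recomputed, inconsistent, decision_flips


print(check_reported_result(1.80, 0.04))  # recomputed p is about .072: inconsistent, decision flips
print(check_reported_result(2.58, 0.01))  # recomputed p is about .0099: consistent
```

A z of 1.80 corresponds to a two-tailed p of about .07, so a reported "p = .04" is flagged both as inconsistent and as an error with a bearing on statistical significance.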

Costs and Benefits of Data Provision: Report to the Australian National Data Service

The Australian National Data Service has released Costs and Benefits of Data Provision: Report to the Australian National Data Service by John Houghton.

Here's an excerpt:

This report presents case studies exploring the costs and benefits that PSI [Public Sector Information] producing agencies and their users experience in making information freely available, and preliminary estimates of the wider economic impacts of open access to PSI. In doing so, it outlines a possible method for cost-benefit analysis at the agency level and explores the data requirements for such an analysis, recognising that few agencies will have all of the data required. . . .

What this study demonstrates is that the direct and measurable benefits of making PSI available freely and without restrictions on use typically outweigh the costs. When one adds the longer-term benefits that we cannot fully measure, and may not even foresee, the case for open access appears to be strong.
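The claim that benefits "outweigh the costs" is usually expressed with discounted flows. The generic formulation below is the textbook net-present-value framing, offered as an illustration and not necessarily the exact model used in the report. Here B_t and C_t are benefits and costs in year t, r is the discount rate, and T is the time horizon.

```latex
\mathrm{NPV} = \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^t},
\qquad
\mathrm{BCR} = \frac{\sum_{t=0}^{T} B_t/(1+r)^t}{\sum_{t=0}^{T} C_t/(1+r)^t}
```

Provided discounted costs are positive, BCR > 1 exactly when NPV > 0. Longer-term benefits that cannot be measured would add to the B_t terms, so leaving them out makes such an estimate conservative.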

A Surfboard for Riding the Wave—Towards a Four Country Action Programme on Research Data

The Knowledge Exchange has released A Surfboard for Riding the Wave—Towards a Four Country Action Programme on Research Data.

Here's an excerpt from the announcement:

The report not only offers an overview of the present activities and challenges in the field of research data in Denmark, Germany, the Netherlands and the United Kingdom but also outlines an action programme for the four countries in realising a collaborative data infrastructure. This report is a response to the Riding the Wave report which was published by the High Level Expert Group on Scientific Data. . . .

In the report four key drivers are addressed: incentives for researchers, training in relation to researchers in their role as data producers and users of information infrastructure, organisational and technical infrastructure and, finally, the funding of the infrastructure. The report offers recommendations for actions in each of these fields for the partners and others, not only in the four partner countries, but also beyond these borders.

Based on the overview of the present situation in the four Knowledge Exchange partner countries, the report formulates three long-term strategic goals:

  1. Data sharing will be part of the academic culture
  2. Data logistics will be an integral component of academic professional life
  3. Data infrastructure will be sound, both operationally and financially.

Report on Integration of Data and Publications

The Alliance for Permanent Access has released Report on Integration of Data and Publications.

Here's an excerpt:

This report sets out to identify examples of integration between datasets and publications. Findings from existing studies carried out by PARSE.Insight, RIN, SURF and various recent publications are synthesized and examined in relation to three distinct disciplinary groups in order to identify opportunities in the integration of data.

| Scholarly Electronic Publishing Bibliography 2010 | Digital Scholarship |

"Linking to Data—Effect on Citation Rates in Astronomy"

Edwin A. Henneken and Alberto Accomazzi have self-archived "Linking to Data—Effect on Citation Rates in Astronomy" in arXiv.org.

Here's an excerpt:

Is there a difference in citation rates between articles that were published with links to data and articles that were not? Besides being interesting from a purely academic point of view, this question is also highly relevant for the process of furthering science. Data sharing not only helps the process of verification of claims, but also the discovery of new findings in archival data. However, linking to data is still far from being a "practice", especially when it comes to authors providing these links during the writing and submission process. You need to have both a willingness and a publication mechanism in order to create such a practice. Showing that articles with links to data get higher citation rates might increase the willingness of scientists to take the extra steps of linking data sources to their publications. In this presentation we will show this is indeed the case: articles with links to data result in higher citation rates than articles without such links.

| New: E-science and Academic Libraries Bibliography | Digital Scholarship |

"Openness as Infrastructure"

John Wilbanks has published "Openness as Infrastructure" in the Journal of Cheminformatics.

Here's an excerpt:

The advent of open access to peer reviewed scholarly literature in the biomedical sciences creates the opening to examine scholarship in general, and chemistry in particular, to see where and how novel forms of network technology can accelerate the scientific method. This paper examines broad trends in information access and openness with an eye towards their applications in chemistry.

Data Management Planning: Open Source DMPTool Launched by University of California Curation Center and Others

The University of California Curation Center has announced the launch of DMPTool.

Here's an excerpt from the press release:

The University of California and several other major research institutions have partnered to develop the DMPTool, a flexible online application to help researchers generate data management plans—simple but effective documents for ensuring good data stewardship. These plans increasingly are being required by funders such as the National Science Foundation (NSF), the National Institutes of Health (NIH) and the Gordon and Betty Moore Foundation (GBMF). The DMPTool supports data management plans and funder requirements across the disciplines, including the humanities and physical, medical and social sciences. . . .

The DMPTool is open source, freely available and easily configurable to reflect an institution's local policies and information. Users of the DMPTool can view sample plans, preview funder requirements and view the latest changes to their plans. It permits the user to create an editable document for submission to a funding agency and can accommodate different versions as funding requirements change. Not only can researchers use the tool to generate plans compliant with funder requirements, but institutions also can use the tool to present information and policies relevant to data management and to foster collaboration among faculty, the institutional libraries, contracts and grants offices, and academic computing. . . .

Project partners include the University of California Curation Center (UC3) at the California Digital Library, the UCLA Library, the UC San Diego Libraries, the Smithsonian Institution, the University of Virginia Library, the University of Illinois at Urbana-Champaign, DataONE, and the United Kingdom's Digital Curation Centre. Working collaboratively, these institutions have consolidated their expertise and reduced their costs.

"Federal Funding Agencies: Data Management and Sharing Policies"

The California Digital Library has released "Federal Funding Agencies: Data Management and Sharing Policies."

Here's an excerpt:

The Office of Management and Budget (OMB) Circular A-110 provides the federal administrative requirements for grants and agreements with institutions of higher education, hospitals and other non-profit organizations. In 1999 Circular A-110 was revised to provide public access under some circumstances to research data through the Freedom of Information Act (FOIA).

Funding agencies have implemented the OMB requirement in various ways. The table below summarizes the data management and sharing requirements of primary US federal funding agencies.

Cite Datasets and Link to Publications

The Digital Curation Centre has released Cite Datasets and Link to Publications.

Here's an excerpt:

This guide will help you create links between your academic publications and the underlying datasets, so that anyone viewing the publication will be able to locate the dataset and vice versa. It provides a working knowledge of the issues and challenges involved, and of how current approaches seek to address them. This guide should interest researchers and principal investigators working on data-led research, as well as the data repositories with which they work.
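A dataset citation typically combines creator, publication year, title, publisher, and a persistent identifier, and the dataset and article records point at each other. The sketch below loosely follows DataCite-style citation elements; the class, the example DOIs, and the record layout are hypothetical, though "IsSupplementTo" and "IsSupplementedBy" are relation types in the DataCite metadata vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, List

DOI_RESOLVER = "https://doi.org/"


@dataclass
class DatasetCitation:
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI without the resolver prefix

    def url(self) -> str:
        return DOI_RESOLVER + self.doi

    def format(self) -> str:
        """Creator (Year): Title. Publisher. Identifier."""
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}): {self.title}. {self.publisher}. {self.url()}"


def link_records(dataset_doi: str, article_doi: str) -> Dict[str, Dict[str, str]]:
    """Two-way links so each record points at the other."""
    return {
        "dataset": {"relationType": "IsSupplementTo",
                    "target": DOI_RESOLVER + article_doi},
        "article": {"relationType": "IsSupplementedBy",
                    "target": DOI_RESOLVER + dataset_doi},
    }


# Hypothetical example values, not a real dataset.
citation = DatasetCitation(["Doe, J.", "Roe, R."], 2011,
                           "Survey responses on data sharing",
                           "Example Data Archive", "10.1234/example-dataset")
print(citation.format())
# Doe, J.; Roe, R. (2011): Survey responses on data sharing. Example Data Archive. https://doi.org/10.1234/example-dataset
print(link_records(citation.doi, "10.1234/example-article"))
```

Storing the link on both sides is what lets a reader of the publication find the dataset "and vice versa", as the guide puts it.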

E-science and Academic Libraries Bibliography

Digital Scholarship has released the E-science and Academic Libraries Bibliography. It includes English-language articles, books, editorials, and technical reports that are useful in understanding the broad role of academic libraries in e-science efforts. The scope of this brief selective bibliography is narrow, and it does not cover data curation and research data management issues in libraries in general. Most sources were published from 2007 through October 18, 2011; however, a limited number of key sources published prior to 2007 are also included. The bibliography includes links to freely available versions of included works, such as e-prints and open access articles.
