Data Management Planning: Open Source DMPTool Launched by University of California Curation Center and Others

The University of California Curation Center has announced the launch of DMPTool.

Here's an excerpt from the press release:

The University of California and several other major research institutions have partnered to develop the DMPTool, a flexible online application to help researchers generate data management plans—simple but effective documents for ensuring good data stewardship. These plans increasingly are being required by funders such as the National Science Foundation (NSF), the National Institutes of Health (NIH) and the Gordon and Betty Moore Foundation (GBMF). The DMPTool supports data management plans and funder requirements across the disciplines, including the humanities and physical, medical and social sciences. . . .

The DMPTool is open source, freely available and easily configurable to reflect an institution's local policies and information. Users of the DMPTool can view sample plans, preview funder requirements and view the latest changes to their plans. It permits the user to create an editable document for submission to a funding agency and can accommodate different versions as funding requirements change. Not only can researchers use the tool to generate plans compliant to funder requirements, but institutions also can use the tool to present information and policies relevant to data management and to foster collaboration among faculty, the institutional libraries, contracts and grants offices, and academic computing. . . .

Project partners include the University of California Curation Center (UC3) at the California Digital Library, the UCLA Library, the UC San Diego Libraries, the Smithsonian Institution, the University of Virginia Library, the University of Illinois at Urbana-Champaign, DataONE, and the United Kingdom's Digital Curation Centre. Working collaboratively, these institutions have consolidated their expertise and reduced their costs.

| Digital Curation and Preservation Bibliography 2010 | Digital Scholarship |

"Federal Funding Agencies: Data Management and Sharing Policies"

The California Digital Library has released "Federal Funding Agencies: Data Management and Sharing Policies."

Here's an excerpt:

The Office of Management and Budget (OMB) Circular A-110 provides the federal administrative requirements for grants and agreements with institutions of higher education, hospitals and other non-profit organizations. In 1999 Circular A-110 was revised to provide public access under some circumstances to research data through the Freedom of Information Act (FOIA).

Funding agencies have implemented the OMB requirement in various ways. The table below summarizes the data management and sharing requirements of primary US federal funding agencies.

Cite Datasets and Link to Publications

The Digital Curation Centre has released Cite Datasets and Link to Publications.

Here's an excerpt:

This guide will help you create links between your academic publications and the underlying datasets, so that anyone viewing the publication will be able to locate the dataset and vice versa. It provides a working knowledge of the issues and challenges involved, and of how current approaches seek to address them. This guide should interest researchers and principal investigators working on data-led research, as well as the data repositories with which they work.

E-science and Academic Libraries Bibliography

Digital Scholarship has released the E-science and Academic Libraries Bibliography. It includes English-language articles, books, editorials, and technical reports that are useful in understanding the broad role of academic libraries in e-science efforts. The scope of this brief selective bibliography is narrow, and it does not cover data curation and research data management issues in libraries in general. Most sources have been published from 2007 through October 18, 2011; however, a limited number of key sources published prior to 2007 are also included. The bibliography includes links to freely available versions of included works, such as e-prints and open access articles.

Changing the Conduct of Science in the Information Age

The National Science Foundation has released Changing the Conduct of Science in the Information Age.

Here's an excerpt:

The U.S. National Science Foundation (NSF) held a workshop titled "Changing the Conduct of Science in the Information Age" on November 12, 2010, to promote international cooperation in such policy areas as the promotion of data access, the development of technical solutions for open data platforms, and attribution for research contributions. This report describes the discussions, findings, and suggestions generated by the distinguished group of international workshop participants. . . .

There was a strong consensus that this vision could be achieved with the help of a concerted, collaborative effort by international funding agencies to:

  1. Establish a system of persistent identifiers for researchers and their outputs;
  2. Develop national and international pilot projects that compare different technical solutions for establishing and maintaining open data platforms, fostering the replication of scientific research, and ensuring attribution for the intellectual contributions of researchers; and
  3. Foster formal and informal training to develop scientists' skills in knowledge and data access, as well as data analysis.

| New: Institutional Repository and ETD Bibliography 2011 | Digital Scholarship |

Data Centres: Their Use, Value and Impact

The Research Information Network has released Data Centres: Their Use, Value and Impact.

Here's an excerpt:

In recent years, the value of data as a primary research output has begun to be increasingly recognised. New technology has made it possible to create, store and reuse datasets, either for new analysis or for combination with other data in order to answer different questions. In the UK, academic researchers, funders and institutions have responded to these possibilities by supporting a number of data centres: organisations with responsibility for supplying research data to the academic community, and in some cases for collecting, storing and curating such data as well. . . .

This study sought to understand usage of UK data centres among researchers, and to examine the impact of such use upon their work. We undertook a series of initial interviews with research funders to understand the role and importance of data and data centres within various academic fields, followed by a survey of the users of five data centres. Finally, through the interviews and surveys, a set of case studies was identified where the data centre had benefited a researcher's work, and in some cases that work had gone on to have an impact in wider society.

"Extracting, Transforming and Archiving Scientific Data"

Daniel Lemire and Andre Vellino have self-archived "Extracting, Transforming and Archiving Scientific Data" in arXiv.org.

Here's an excerpt:

It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.

| Digital Scholarship |

Data Privacy Legislation: An Analysis of the Current Legislative Landscape and the Implications for Higher Education

EDUCAUSE has released Data Privacy Legislation: An Analysis of the Current Legislative Landscape and the Implications for Higher Education.

Here's an excerpt:

With the ubiquity of mobile devices and the increases in data breaches, Congress has responded with bipartisan support for comprehensive privacy legislation. As of August 2011, 18 bills have been introduced in the 112th Congress concerning data privacy. . . .

These privacy bills generally fall into three distinct areas: comprehensive online privacy protection, geolocation and mobile devices, and data security and breach notification. If enacted, many of the bills have implications for data collection, storage, and use that could affect higher education and campus IT operations and academic research.

European Commission Launches Public Consultation on Digital Scientific Information Access and Preservation

The European Commission has launched a public consultation on digital scientific information access and preservation.

Here's an excerpt from the press release:

A public consultation on access to, and preservation of, digital scientific information has been launched by the European Commission on the initiative of European Commission Vice President for the Digital Agenda Neelie Kroes and Commissioner for Research and Innovation, Máire Geoghegan-Quinn. European researchers, engineers and entrepreneurs must have easy and fast access to scientific information, to compete on an equal footing with their counterparts across the world. Modern digital infrastructures can play a key role in facilitating access. However, a number of challenges remain, such as high and rising subscription prices to scientific publications, an ever-growing volume of scientific data, and the need to select, curate and preserve research outputs. Open access, defined as free access to scholarly content over the Internet, can help address this. Scientists, research funding organisations, universities, and other interested parties are invited to send their contributions on how to improve access to scientific information. The consultation will run until 9 September 2011. . . .

Interested parties are invited to express their views on the following key science policy questions:

  • how scientific articles could become more accessible to researchers and society at large
  • how research data can be made widely available and how it could be re-used
  • how permanent access to digital content can be ensured and what barriers are preventing the preservation of scientific output

| Digital Curation and Preservation Bibliography 2010 | Electronic Theses and Dissertations Bibliography | Google Books Bibliography | Institutional Repository Bibliography | Transforming Scholarly Publishing through Open Access: A Bibliography | Scholarly Electronic Publishing Bibliography 2010 | Digital Scholarship Publications Overview |

"Who Shares? Who Doesn’t? Factors Associated with Openly Archiving Raw Research Data"

Heather A. Piwowar has published "Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data" in PLoS One.

Here's an excerpt:

First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available.

| Digital Curation and Preservation Bibliography 2010 | Institutional Repository Bibliography | Transforming Scholarly Publishing through Open Access: A Bibliography | Scholarly Electronic Publishing Bibliography 2010 |

JISC Managing Research Data Programme Issues Call for Grant Proposals

The JISC Managing Research Data Programme has issued a call for grant proposals.

Here's an excerpt from the notice:

A total of approximately £4.6M will be available, divided across three strands. The deadline for submissions will be 28 July 2011. . . .

The strands are as follows:

Strand A: Institutional Research Data Management Infrastructure: divided between A(1) Start-up projects to help institutions that are at an early stage of developing a research data management infrastructure; and A(2) Embedding projects to help institutions enhance and extend an existing pilot research data management infrastructure. . . .

Strand B: Research Data Management Planning: projects to design and implement research data management plans for specific projects/departments; including supporting systems and tools. . . .

Strand C: Projects to develop and implement institutional data management planning tools/workflows.

| Digital Scholarship | Digital Scholarship Publications Overview | Digital Curation and Preservation Bibliography 2010 |

Managing and Sharing Data: Best Practice for Researchers

The UK Data Archive has released a new edition of Managing and Sharing Data: Best Practice for Researchers.

Here's an excerpt from the announcement:

To support researchers in producing high quality research data for long-term use, the UK Data Archive has revised and expanded its popular and highly cited Managing and Sharing Data: best practice for researchers, first published in 2009.

The new third edition is 36 pages covering:

  • why and how to share research data
  • data management planning and costing
  • documenting data
  • formatting data
  • storing data
  • ethics and consent issues
  • data copyright
  • data management strategies for large investments

Open Data: UK Engineering and Physical Sciences Research Council Adopts EPSRC Policy Framework on Research Data

The UK Engineering and Physical Sciences Research Council, which is "the main UK government agency for funding research and training in engineering and the physical sciences," has adopted the EPSRC Policy Framework on Research Data.

Here's an excerpt from the document:

This policy framework sets out EPSRC's expectations concerning the management and provision of access to EPSRC-funded research data. EPSRC recognises that a range of institutional policies and practices can satisfy these expectations, and encourages research organisations to develop specific approaches which, while aligned with EPSRC's expectations, are appropriate to their own structures and cultures.

The expectations arise from seven core principles which align with the core RCUK principles on data sharing. Two of the principles are of particular importance: firstly, that publicly funded research data should generally be made as widely and freely available as possible in a timely and responsible manner; and, secondly, that the research process should not be damaged by the inappropriate release of such data.

"Tragedy of the Data Commons"

Jane Yakowitz has self-archived "Tragedy of the Data Commons" in SSRN.

Here's an excerpt:

Accurate data is vital to enlightened research and policymaking, particularly publicly available data that are redacted to protect the identity of individuals. Legal academics, however, are campaigning against data anonymization as a means to protect privacy, contending that the wealth of information available on the Internet enables malfeasors to reverse-engineer the data and identify individuals within them. Privacy scholars advocate for new legal restrictions on the collection and dissemination of research data. This Article challenges the dominant wisdom, arguing that properly de-identified data is not only safe, but of extraordinary social utility. It makes three core claims. First, legal scholars have misinterpreted the relevant literature from computer science and statistics, and thus have significantly overstated the futility of anonymizing data. Second, the available evidence demonstrates that the risks from anonymized data are theoretical – they rarely, if ever, materialize. Finally, anonymized data is crucial to beneficial social research, and constitutes a public resource – a commons – under threat of depletion. The Article concludes with a radical proposal: since current privacy policies overtax valuable research without reducing any realistic risks, law should provide a safe harbor for the dissemination of research data.

"Joining in the Enterprise of Response in the Wake of the NSF Data Management Planning Requirement"

Patricia Hswe and Ann Holt have published "Joining in the Enterprise of Response in the Wake of the NSF Data Management Planning Requirement" in the latest issue of Research Library Issues.

Here's an excerpt:

This article affords an overview of the new, leading roles libraries can adopt in the provision of data services, thus blending appraisal with advocacy. How are libraries currently giving assistance in data management planning? What recommendations can libraries make that draw from, and build on, these efforts? The article also reports on new communities of practice forming around the challenges of digital data issues, bringing together much needed knowledge and expertise not only from libraries but also from various other sectors of a university, including IT divisions, grant administration offices, and research institutes.

Digital Research Data: What Researchers Want

The SURFfoundation has released What Researchers Want.

Here's an excerpt from the announcement:

This publication reviews recent literature describing what researchers want with regard to data storage and access. It was commissioned by SURFfoundation. Fifteen recent sources were studied, covering the Netherlands, the UK, the USA, Australia, and Europe. . . .

The following factors play a role in making storage successful:

  • Tools and services must be in tune with researchers’ workflows, which are often discipline-specific (and sometimes even project-specific).
  • Researchers resist top-down and/or mandatory schemes.
  • Researchers favour a “cafeteria” model in which they can pick and choose from a set of services.
  • Tools and services must be easy to use.
  • Researchers must be in control of what happens to their data, who has access to it, and under what conditions. Consequently, they want to be sure that whoever is dealing with their data (data centre, library, etc.) will respect their interests.
  • Researchers expect tools and services to support their day-to-day work within the research project; long-term/public requirements must be subordinate to that interest.
  • The benefits of the support must be clearly visible – not in three years’ time, but now.
  • Support must be local, hands-on, and available when needed.

| Digital Scholarship | Digital Scholarship Publications Overview | Reviews of Digital Scholarship Publications |

How to License Research Data

The Digital Curation Centre has released How to License Research Data.

Here's an excerpt:

This guide will help you decide how to apply a licence to your research data, and which licence would be most suitable. It should provide you with an awareness of why licensing data is important, the impact licences have on future research, and the potential pitfalls to avoid. It concentrates on the UK context, though some aspects apply internationally; it does not, however, provide legal advice. The guide should interest both the principal investigators and researchers responsible for the data, and those who provide access to them through a data centre, repository or archive.

| Digital Scholarship | Digital Scholarship Publications Overview |

MIT Libraries Awarded $650,000 Grant from the Library of Congress for Exhibit 3.0 Project

The MIT Libraries have been awarded a $650,000 grant from the Library of Congress for the Exhibit 3.0 Project.

Here's an excerpt from the press release:

The MIT Libraries has been awarded a $650,000 grant from the Library of Congress for work in collaboration with the MIT Computer Science and Artificial Intelligence Lab (CSAIL) and Zepheira, Inc. on "Exhibit 3.0," a new project to redesign and expand upon Exhibit, the popular open source software tool for searching, browsing and visualizing data on the Web. The goal is to provide libraries, cultural institutions and other organizations grappling with large amounts of digital content, with an enhanced tool that is scalable and useful for data management, visualization and navigation. According to the Library of Congress, "It is the Library's intent that this work also will further contribute to the collaborative knowledge sharing among the broader communities concerned about the critical infrastructure that will ensure sustainability and accessibility of digital content over time."

"This innovative work has already made a considerable impact on digital content communities whose data is diverse and complex. The visualizations bring new understanding to users and curators alike," said Martha Anderson, Director of the National Digital Information Infrastructure and Preservation Program at the Library of Congress. "We're extremely fortunate to have the support of the Library of Congress on this important research," said Ann Wolpert, director of the MIT Libraries. "Our hope is that Exhibit 3.0 will be a useful tool in tackling the daunting challenge all libraries face in ensuring the future sustainability and accessibility of our digital content."

Exhibit was originally developed as part of the MIT Simile Project (simile.mit.edu), an ambitious collaboration of the MIT Libraries, the MIT CSAIL, and the World Wide Web Consortium (W3C) to explore applications of the Semantic Web to problems of information management across both large-scale digital libraries and small-scale personal collections. Exhibit runs inside a Web browser and supports many types of information using common Web standards for data publishing. Since its release, Exhibit has been used by thousands of websites worldwide across a range of diverse industries including cultural heritage, libraries, publishers, medical research, life science and government. Most recently Exhibit has been used by DATA.GOV (http://data.gov/), an Open Government Initiative by President Obama's administration to increase public access to high value data generated by the Executive Branch of the Federal Government. The application has been used to help demonstrate new ways of visualizing government data. . . .

The Exhibit 3.0 project will redesign and re-implement Exhibit to scale from small collections to very large data collections of the magnitude created by the Library of Congress and its National Digital Information Infrastructure and Preservation Program (NDIIPP). The redesigned Exhibit will be as simple to use as the current tool but more scalable, more modular, and easier to integrate into a variety of information management systems and websites—offering an improved user experience.

In addition to the Library of Congress, the MIT Libraries and other organizations that manage large quantities of data will collaborate on the project for their own collections. A major focus of the project will be to build a lively community around Exhibit, of both users of the software and software developers, to help continuously improve the open source tool. Another aspect of the new project will incorporate research by students at MIT's CSAIL (Computer Science and Artificial Intelligence Lab) on personal information management. The research will focus on improving the user experience working with data in Exhibit, and incorporating new data visualization techniques that allow users to explore data in novel ways. "Impressive data-interactive sites abound on the web, but right now you need a team of developers to create them. Exhibit demonstrated that authoring data-interactive sites can be as easy as authoring a static web page. With Exhibit 3.0 we can move from a prototype to a robust platform that anyone can use to author (not program) rich interactive information visualizations that effectively communicate with their users," said David Karger, computer science professor with CSAIL.

The project will begin in January for a period of one year, and a new website and other communication channels will be publicized soon. For more information see http://similewidgets.org/exhibit3.

"Data Preservation in High Energy Physics"

David M. South has self-archived "Data Preservation in High Energy Physics" in arXiv.org.

Here's an excerpt:

Data from high-energy physics (HEP) experiments are collected with significant financial and human effort and are in many cases unique. At the same time, HEP has no coherent strategy for data preservation and re-use, and many important and complex data sets are simply lost. In a period of a few years, several important and unique experimental programs will come to an end, including those at HERA, the b-factories and at the Tevatron. An inter-experimental study group on HEP data preservation and long-term analysis (DPHEP) was formed and a series of workshops were held to investigate this issue in a systematic way. The physics case for data preservation and the preservation models established by the group are presented, as well as a description of the transverse global projects and strategies already in place.

Unchartered Waters—The State of Open Data in Europe

CSC has released Unchartered Waters—The State of Open Data in Europe.

Here's an excerpt:

This study analyses the current state of the open data policy ecosystem and open government data offerings in nine European Member States. Since none of the countries studied currently offers a national open data portal, this study compares the statistics offices’ online data offerings. The analysis shows that they fulfill a number of open data principles but that there is still a lot of room for improvement. This study underlines that the development of data catalogues and portals should not be seen as means to an end.

America COMPETES Act Establishes Interagency Public Access Committee

The signing of the America COMPETES Reauthorization Act of 2010 by President Obama establishes a new Interagency Public Access Committee. The International Association of Scientific, Technical & Medical Publishers (STM) has issued a press release that "applauds the efforts of US legislators in crafting the charter of the Interagency Public Access Committee."

Here's an excerpt from the Act:

SEC. 103. INTERAGENCY PUBLIC ACCESS COMMITTEE.

(a) ESTABLISHMENT.—The Director shall establish a working group under the National Science and Technology Council with the responsibility to coordinate Federal science agency research and policies related to the dissemination and long-term stewardship of the results of unclassified research, including digital data and peer-reviewed scholarly publications, supported wholly, or in part, by funding from the Federal science agencies.

(b) RESPONSIBILITIES.—The working group shall—

(1) identify the specific objectives and public interests that need to be addressed by any policies coordinated under (a);

(2) take into account inherent variability among Federal science agencies and scientific disciplines in the nature of research, types of data, and dissemination models;

(3) coordinate the development or designation of standards for research data, the structure of full text and metadata, navigation tools, and other applications to maximize interoperability across Federal science agencies, across science and engineering disciplines, and between research data and scholarly publications, taking into account existing consensus standards, including international standards;

(4) coordinate Federal science agency programs and activities that support research and education on tools and systems required to ensure preservation and stewardship of all forms of digital research data, including scholarly publications;

(5) work with international science and technology counterparts to maximize interoperability between United States based unclassified research databases and international databases and repositories;

(6) solicit input and recommendations from, and collaborate with, non-Federal stakeholders, including the public, universities, nonprofit and for-profit publishers, libraries, federally funded and non federally funded research scientists, and other organizations and institutions with a stake in long term preservation and access to the results of federally funded research;

(7) establish priorities for coordinating the development of any Federal science agency policies related to public access to the results of federally funded research to maximize the benefits of such policies with respect to their potential economic or other impact on the science and engineering enterprise and the stakeholders thereof;

(8) take into consideration the distinction between scholarly publications and digital data;

(9) take into consideration the role that scientific publishers play in the peer review process in ensuring the integrity of the record of scientific research, including the investments and added value that they make; and

(10) examine Federal agency practices and procedures for providing research reports to the agencies charged with locating and preserving unclassified research.

(c) PATENT OR COPYRIGHT LAW.—Nothing in this section shall be construed to undermine any right under the provisions of title 17 or 35, United States Code.

(d) APPLICATION WITH EXISTING LAW.—Nothing defined in section (b) shall be construed to affect existing law with respect to Federal science agencies’ policies related to public access.

(e) REPORT TO CONGRESS.—Not later than 1 year after the date of enactment of this Act, the Director shall transmit a report to Congress describing—

(1) the specific objectives and public interest identified under (b)(1);

(2) any priorities established under subsection (b)(7);

(3) the impact the policies described under (a) have had on the science and engineering enterprise and the stakeholders, including the financial impact on research budgets;

(4) the status of any Federal science agency policies related to public access to the results of federally funded research; and

(5) how any policies developed or being developed by Federal science agencies, as described in subsection (a), incorporate input from the non-Federal stakeholders described in subsection (b)(6).

(f) FEDERAL SCIENCE AGENCY DEFINED.—For the purposes of this section, the term "Federal science agency" means any Federal agency with an annual extramural research expenditure of over $100,000,000.
