Research Data Curation Bibliography, Version 2

Digital Scholarship has released version 2 of the Research Data Curation Bibliography. This selective bibliography includes over 200 English-language articles and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions. It has doubled in size since version 1.

Most sources were published between 2000 and 2012; however, a limited number of key earlier sources are also included.

The bibliography includes links to freely available versions of included works. If such versions are unavailable, italicized links to the publishers' descriptions are provided.

It is available under a Creative Commons Attribution-Noncommercial 3.0 United States License.

The Case for International Sharing of Scientific Data: A Focus on Developing Countries: Proceedings of a Symposium

The National Academies Press has released The Case for International Sharing of Scientific Data: A Focus on Developing Countries: Proceedings of a Symposium. (Downloading the report requires registration.)

Here's an excerpt:

The culture of science has been international and open for centuries. Indeed, the scientific enterprise can only work when all information is open and accessible, because science works through critical analysis and replication of results. In recent years, as some scientific data, and especially technological data, have increased in economic value, [this] frequently has caused us to be far less open with information than business and free enterprise require us to be. Indeed, the worldwide shift to what is known as open innovation is strengthening every day.

| Research Data Curation Bibliography | Digital Scholarship |

"DMP Online and DMPTool: Different Strategies Towards a Shared Goal"

Andrew Sallans and Martin Donnelly have published "DMP Online and DMPTool: Different Strategies Towards a Shared Goal" in the latest issue of The International Journal of Digital Curation.

Here's an excerpt:

This paper provides a comparative discussion of the strategies employed in the UK's DMP Online tool and the US's DMPTool, both designed to provide a structured environment for research data management planning (DMP) with explicit links to funder requirements. Following the Sixth International Digital Curation Conference, held in Chicago in December 2010, a number of US institutions partnered with the Digital Curation Centre's DMP Online team to learn from their experiences while developing a US counterpart. DMPTool arrived in beta in August 2011 and released a production version in November 2011. This joint paper will compare and contrast use cases, organizational and national/cultural characteristics that have influenced the development decisions, outcomes achieved so far, and planned future developments.

Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works Cover

| Digital Scholarship | Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works |

Fit for Purpose: Developing Business Cases for New Services in Research Libraries Webinar Recording

DuraSpace has released a recording of its Fit for Purpose: Developing Business Cases for New Services in Research Libraries webinar.

Here's an excerpt from the announcement:

Mike Furlough, Associate Dean of Research and Scholarly Communications, Penn State, and David Minor, Chronopolis Program Manager and Director of Digital Preservation Initiatives, University of California San Diego Library/SDSC, presented "Fit for Purpose: Developing Business Cases for New Services in Research Libraries" to participants in the DuraSpace/ARL/DLF E-Science Institute. In this webinar, the presenters discussed the CLIR/DLF-funded research project Fit for Purpose, which aims to present a structured, disciplined approach for making decisions about creating and maintaining new services in research libraries.

| Digital Curation Resource Guide | Digital Scholarship |

Purdue University Libraries Launches the Data Curation Profiles Directory

The Purdue University Libraries have launched the Data Curation Profiles Directory.

Here's an excerpt from the announcement:

Data Curation Profiles (DCP) are in-depth publications which provide detailed descriptions of research data sets and collections. The DCP, and the associated Toolkit which provides instructions and advice on composing them, are the results of research funded by the Institute of Museum and Library Services (IMLS).

Working with Purdue University Libraries Scholarly Publishing Services, the Data Curation Profiles Directory provides a suite of services to support publication, including: assigning a DOI and citation for each published DCP, improved visibility for Profiles through inclusion in indexing and discovery tools, and a commitment to the preservation of DCPs through CLOCKSS and Portico.

Academic Libraries and Research Data Services: Current Practices and Plans for the Future

ACRL has released Academic Libraries and Research Data Services: Current Practices and Plans for the Future.

Here's an excerpt:

This study surveyed a cross section of academic library members of the Association of College and Research Libraries (ACRL) in the United States and Canada to provide a baseline assessment of the current state of and future plans for research data services in academic libraries in these countries.

Long-Term Sustainability of Data Archives: EUDAT Sustainability Plan

The EUDAT project has released the EUDAT Sustainability Plan.

Here's an excerpt:

We survey the current provision of infrastructure and long-term data archival services in Europe and review recent efforts to assess the costs involved in preserving research data (Chapters 1 to 4). To focus and constrain sustainability planning, we introduce a number of candidate guiding principles for EUDAT (Chapter 5) and suggest an overall logical model of its future shape, and a number of possible mechanisms for realising this model (Chapter 6). We discuss possible mechanisms to define levels of service and provide funding for a future EUDAT CDI, and introduce our intent to measure actual costs of delivering EUDAT services through an activity-based cost modelling exercise (Chapters 7 and 8).

Status and Outlook for University of Michigan Research Profile Data Strategy

Natsuko Nicholls has self-archived Status and Outlook for University of Michigan Research Profile Data Strategy in Deep Blue.

Here's an excerpt:

My investigation into various faculty expertise efforts and activities across institutions shows that many universities have not yet developed or adopted a centralized, comprehensive university-wide system for expertise data collection and activity reporting. There is still substantial variation in procedures across departments and colleges within institutions and considerable duplication of effort across campus units. However, it is indeed the recent trend that many institutions—including the University of Michigan—have actively engaged in campus-wide discussions about research profile data curation needs, concluding that a more centralized system would provide incentives for timely data-entry, guarantee currency of the expertise data, and increase overall efficiency and data quality. This study also sheds light on the role of the academic library as an important stakeholder in expertise data collection and management. My findings suggest that various attributes of an academic library make it an ideal driver for research profile data management. The academic library is a strong resource for information technology expertise as well as information management and dissemination at any institution. Further, it tends to be a neutral and trusted entity, especially with employees who regularly engage with researchers and have a good understanding of the academic landscape and the needs of the research community. In addition to providing an overview of the research landscape where profiling needs are quickly rising and where benefits from a well-managed profile data system are widely understood, this study also illuminates the conventional use of expertise databases and research networking/discovery tools as well as Current Research Information Systems (CRIS).

"Digital Curation in the Academic Library Job Market"

Jeonghyun Kim, Edward Warga, and William Moen have published "Digital Curation in the Academic Library Job Market" in ASIST 2012: Proceedings of the 75th ASIS&T Annual Meeting.

Here's an excerpt:

This study of job advertisements for academic library positions is one activity of a current capacity building project, Information: Curate, Archive, Manage, Preserve (iCAMP). In this project, we are developing a four-course masters level curriculum for digital curation and data management. It deploys a competency-based curriculum approach (Moen, Kim, Warga, Wakefield, & Halbert, 2011). This analysis of job advertisements was carried out to identify and define knowledge, skills, and abilities as a part of the competency development process.

| Digital Scholarship |

UNC at Chapel Hill Offers Post-Masters Certificate in Data Curation

The School of Information and Library Science at the University of North Carolina at Chapel Hill is now offering a Post-Masters Certificate in Data Curation.

Here's an excerpt from the announcement:

With a two-week intensive kick-off on the UNC at Chapel Hill campus during summer session (May 2013), the remainder of the program will be taught online and includes guided projects that arise from a student's work experience. The 30 credit program can be completed in two years.

As defined by Drs. Helen Tibbo, alumni distinguished professor, and Christopher (Cal) Lee, associate professor at SILS, "Digital/data curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials a commitment to long-term preservation. Digital/data curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts."

DuraSpace Gets $861,000 Grant to Develop DuraCloud Data Services

DuraSpace has received a two-year $861,000 grant from the Gordon and Betty Moore Foundation to develop DuraCloud data services.

Here's an excerpt from the press release:

Currently, DuraCloud provides a reliable way to preserve and archive research materials in the cloud, a solution developed within the academic community for academic institutions. During the next phase of DuraCloud development, additional applications, features, and services will be built to extend the cloud in order to facilitate data archiving and content management. DuraSpace offers DuraCloud as a software as a service that enables archiving, preserving, and managing institutional content using cloud storage and intends to expand its service offerings in the next phase of development.

Curating for Quality: Ensuring Data Quality to Enable New Science

The UNC School of Information & Library Science has released Curating for Quality: Ensuring Data Quality to Enable New Science.

Here's an excerpt:

The National Science Foundation sponsored a workshop on September 10 and 11, 2012, in Arlington, Virginia on "Curating for Quality: Ensuring Data Quality to Enable New Science." Individuals from government, academic and industry settings gathered to discuss issues, strategies and priorities for ensuring quality in collections of data. This workshop aimed to define data quality research issues and potential solutions. The workshop objectives were organized into four clusters: (1) data quality criteria and contexts, (2) human and institutional factors, (3) tools for effective and painless curation, and (4) metrics for data quality. . . .

The workshop identified several key challenges that include:

  • selection strategies—how to determine what is most valuable to preserve
  • how much and which context to include—how to ensure that data is interpretable and usable in the future, what metadata to include
  • tools and techniques to support painless curation—creating and sharing tools and techniques that apply across disciplines
  • cost and accountability models—how to balance selection, context decisions with cost constraints.

| Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works | Digital Scholarship |

SURA Research Data Management Group Releases "A Step-By-Step Guide to Data Management"

The SURA Research Data Management Group has released "A Step-By-Step Guide to Data Management."

Here's an excerpt from the press release:

SURA has launched an institutional tool for Research Data Management (RDM), developed by a working group formed with the Association of Southeastern Research Libraries (ASERL). The working group brings together CIOs and library professionals from SURA member institutions to explore collaborations for improving their ability to manage the rapidly growing volume of research data.

The working group produced an institutional "Step-By-Step Guide to Data Management," which is being used to identify gaps in existing RDM processes and guide future efforts of the group. The group has also built a discipline specific metadata scheme directory to assist researchers in finding existing metadata models for their research data.

Thomson Reuters Launches Data Citation Index

Thomson Reuters has launched the Data Citation Index within the Web of Knowledge.

Here's an excerpt from the press release:

This new research resource from Thomson Reuters creates a single source of discovery for scientific, social sciences and arts and humanities information. It provides a single access point to discover foundational research within data repositories around the world in the broader context of peer-reviewed literature in journals, books, and conference proceedings already indexed in the Web of Knowledge. . . .

The Thomson Reuters Data Citation Index makes research within the digital universe discoverable, citable and viewable within the context of the output the data has informed. Thomson Reuters partnered with numerous data repositories worldwide to capture bibliographic records and cited references for digital research, facilitating visibility, author attribution, and ultimately the measurement of impact of this growing body of scholarship.

"Public Availability of Published Research Data in High-Impact Journals"

Alawi A. Alsheikh-Ali et al. have published "Public Availability of Published Research Data in High-Impact Journals" in PLOS ONE.

Here's an excerpt:

We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.
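The counts in the excerpt above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the excerpt; the variable names and the rounding to whole percentages are illustrative choices, not part of the paper.

```python
# Counts quoted in the excerpt above.
total_journals = 50
journals_with_statement = 44
total_papers = 500
no_policy = 149          # papers not subject to any data availability policy
non_adherent = 208       # covered papers that did not fully adhere
full_raw_data = 47       # papers that deposited full primary raw data online

covered = total_papers - no_policy   # papers subject to some policy
adherent = covered - non_adherent    # covered papers that adhered


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


summary = {
    "journals_with_statement_pct": pct(journals_with_statement, total_journals),
    "no_policy_pct": pct(no_policy, total_papers),
    "covered": covered,
    "non_adherent_pct": pct(non_adherent, covered),
    "adherent": adherent,
    "full_raw_data_pct": pct(full_raw_data, total_papers),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the 59% figure is computed against the 351 papers covered by a policy, while the 30% and 9% figures are computed against all 500 papers.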

Intellectual Property Rights for Digital Preservation

The Digital Preservation Coalition has released Intellectual Property Rights for Digital Preservation.

Here's an excerpt:

While a number of legal issues colour contemporary approaches to, and practices of, digital preservation, it is arguable that intellectual property law, represented principally by copyright and its related rights, has been by far the most dominant, and often intractable, influence. It is thus essential for those engaging in digital preservation to understand the letter of the law as it applies to digital preservation, but equally important to be able to identify and implement practical and pragmatic strategies for handling legal risks relating to intellectual property rights in the pursuit of preservation objectives. . . .

This report is aimed primarily at depositors, archivists and researchers/re-users of digital works, but will provide a concise introduction to the subject matter for policymakers and the general public.

"A Sample of Research Data Curation and Management Courses"

Andrew T. Creamer et al. have published "A Sample of Research Data Curation and Management Courses" in the latest issue of the Journal of eScience Librarianship.

Here's an excerpt:

This paper identifies a sample of research data curation and management courses available at American Library Association-accredited Library and Information Science (LIS) Programs in North America. . . .

Only 13 (22%) of LIS programs currently offer a course focused on the management and curation of research data. . . .

Although the literature supports LIS professionals adopting new roles and engaging in eScience and data management, most LIS data-related programs do not have a separate course solely focused on research data management. More LIS programs will need to adapt their curricula in order to help students and practicing professionals develop the needed competencies in research data curation and management.

California Digital Library and Partners Launch DataUp Data Management Tool

The California Digital Library and its partners have launched the DataUp data management tool.

Here's an excerpt from the press release:

Researchers struggling to meet new data management requirements from funders, journals and their own institutions now can use the DataUp Web application and a Microsoft Excel add-in to document and archive their tabular data. . . .

The DataUp add-in operates within a program many researchers already use: Microsoft Excel. The Web application allows users to upload tabular data in either Excel format or comma-separated value (CSV) format. Both the add-in and the Web application allow users to:

  • Perform a "best practices check" to ensure data are well-formatted and organized
  • Create standardized metadata, or a description of the data, using a wizard-style template
  • Retrieve a unique identifier for their dataset from their data repository
  • Post their datasets and associated metadata to the repository.
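To give a rough sense of what a "best practices check" on tabular data might involve, here is a minimal sketch in Python. It is not DataUp's implementation: the press release does not list the specific checks, so the function name `check_tabular` and the three rules it applies (blank column names, duplicate column names, rows whose cell count differs from the header) are assumptions chosen for illustration.

```python
import csv
import io


def check_tabular(text):
    """Return a list of human-readable problems found in CSV text.

    Hypothetical checks only; these are not DataUp's actual rules.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header, body = rows[0], rows[1:]
    names = [name.strip() for name in header]
    problems = []

    # Every column should have a name, and names should be unique.
    if any(name == "" for name in names):
        problems.append("blank column name in header")
    if len(set(names)) != len(names):
        problems.append("duplicate column names in header")

    # Every data row should have the same number of cells as the header.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} cells, expected {len(header)}"
            )
    return problems


sample = "site,temp,temp\nA,1,2\nB,3\n"
for problem in check_tabular(sample):
    print(problem)
```

A real tool would likely also look at units, missing-value codes, and mixed data types within a column; those are omitted here to keep the sketch short.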

Although hundreds of data repositories are available for archiving, many scientific researchers either are unaware of their existence or do not know how to access them. One of the major outcomes of the DataUp project is the ONEShare repository, created specifically for DataUp, where users can deposit tabular data and metadata directly from the tool.

An added advantage of ONEShare is its connection to the DataONE network of repositories. DataONE links existing data centers and enables users to search for data across participating repositories by using a single search interface. Data deposited into ONEShare will be indexed and made available by any DataONE user, facilitating collaboration and enabling data re-use.

"Academic Libraries as Data Quality Hubs"

Michael Joseph Giarlo has self-archived a preprint of "Academic Libraries as Data Quality Hubs" in ScholarSphere.

Here's an excerpt:

This position paper argues that academic libraries have a critical role to play serving as data quality hubs on campus, based on the need for increased data quality for "e-science" and on academic libraries' record of providing digital curation and preservation services. Scientific data are shown to be sufficiently at risk to demonstrate a clear niche for such services to be provided. Data quality measurements are defined, and digital curation processes are explained and mapped to these measurements in order to establish that academic libraries already have sufficient competencies "in-house" to provide data quality services. Opportunities for improvement and challenges are identified as areas that are fruitful for future research and exploration.

"The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation"

Matthew S. Mayernik, G. Sayeed Choudhury, Tim DiLauro, Elliot Metsger, Barbara Pralle, Mike Rippin, and Ruth Duerr have published "The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation" in the latest issue of D-Lib Magazine.

Here's an excerpt:

Digital research data can only be managed and preserved over time through a sustained institutional commitment. Research data curation is a multi-faceted issue, requiring technologies, organizational structures, and human knowledge and skills to come together in complementary ways. This article provides a high-level description of the Data Conservancy Instance, an implementation of infrastructure and organizational services for data collection, storage, preservation, archiving, curation, and sharing. While comparable to institutional repository systems and disciplinary data repositories in some aspects, the DC Instance is distinguished by featuring a data-centric architecture, discipline-agnostic data model, and a data feature extraction framework that facilitates data integration and cross-disciplinary queries. The Data Conservancy Instance is intended to support, and be supported by, a skilled data curation staff, and to facilitate technical, financial, and human sustainability of organizational data curation services. The Johns Hopkins University Data Management Services (JHU DMS) are described as an example of how the Data Conservancy Instance can be deployed.

Middleware and Managing Data and Knowledge in a Data-Rich World

The Trans-European Research and Education Networking Association has released Middleware and Managing Data and Knowledge in a Data-Rich World.

Here's an excerpt:

This report explores the important aspects of data handling and storage in the context of future research networks and the associated services. The study encompasses networking requirements, storage, middleware, data policies, and data origin, each of which is considered from the standpoint of five disciplines: Genomics, High Energy Physics, Digital Cultural Heritage, Radio Astronomy, and Distributed Music Performance.
