"Operationalizing the Replication Standard: A Case Study of the Data Curation and Verification Workflow for Scholarly Journals"

Thu-Mai Christian et al. have self-archived "Operationalizing the Replication Standard: A Case Study of the Data Curation and Verification Workflow for Scholarly Journals."

Here's an excerpt:

In response to widespread concerns about the integrity of research published in scholarly journals, several initiatives have emerged that are promoting research transparency through access to data underlying published scientific findings. Journal editors, in particular, have made a commitment to research transparency by issuing data policies that require authors to submit their data, code, and documentation to data repositories to allow for public access to the data. In the case of the American Journal of Political Science (AJPS) Data Replication Policy, the data also must undergo an independent verification process in which materials are reviewed for quality as a condition of final manuscript publication and acceptance. Aware of the specialized expertise of the data archives, AJPS called upon the Odum Institute Data Archive to provide a data review service that performs data curation and verification of replication datasets. This article presents a case study of the collaboration between AJPS and the Odum Institute Data Archive to develop a workflow that bridges manuscript publication and data review processes. The case study describes the challenges and the successes of the workflow integration, and offers lessons learned that may be applied by other data archives that are considering expanding their services to include data curation and verification services to support reproducible research.

Research Data Curation Bibliography, Version 9 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework"

Hagen Peukert has published "Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework" in the International Journal of Digital Curation.

Here's an excerpt:

Handling heterogeneous data, subject to minimal costs, can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation. It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in a traditional sense. Rather, the practice of curating humanities research data, the specifications and adjustments of the model suggested here reveal an intertwined process, in which knowledge of both strategic management and solid information technology have to be considered. Thus, suggestions on the strategic positioning of research data, which can be used as an analytical tool to understand the proposed workflow mechanisms, and the definition of workflow modules, which can be flexibly used in designing new standard workflows to configure research data repositories, are put forward.

"How Important is Data Curation? Gaps and Opportunities for Academic Libraries"

Lisa R Johnston et al. have published "How Important is Data Curation? Gaps and Opportunities for Academic Libraries" in the Journal of Librarianship and Scholarly Communication.

Here's an excerpt:

INTRODUCTION Data curation may be an emerging service for academic libraries, but researchers actively "curate" their data in a number of ways—even if terminology may not always align. Building on past user-needs assessments performed via survey and focus groups, the authors sought direct input from researchers on the importance and utilization of specific data curation activities. METHODS Between October 21, 2016, and November 18, 2016, the study team held focus groups with 91 participants at six different academic institutions to determine which data curation activities were most important to researchers, which activities were currently underway for their data, and how satisfied they were with the results. RESULTS Researchers are actively engaged in a variety of data curation activities, and while they considered most data curation activities to be highly important, a majority of the sample reported dissatisfaction with the current state of data curation at their institution. DISCUSSION Our findings demonstrate specific gaps and opportunities for academic libraries to focus their data curation services to more effectively meet researcher needs. CONCLUSION Research libraries stand to benefit their users by emphasizing, investing in, and/or heavily promoting the highly valued services that may not currently be in use by many researchers.

2017 Fixity Survey Report: An NDSA Report

The National Digital Stewardship Alliance has released the 2017 Fixity Survey Report: An NDSA Report.

Here's an excerpt:

Fixity checking, or the practice of algorithmically reviewing digital content to insure that it has not changed over time, is a complex but essential aspect in digital preservation management. To date, there have been no broadly established best practices surrounding fixity checking, perhaps largely due to the wide variety of digital preservation systems and solutions employed by cultural heritage organizations. In an attempt to understand the common practices that exist for fixity checking, as well as the challenges institutions face when implementing a fixity check routine, the National Digital Stewardship Alliance (NDSA) Fixity Working Group developed and published a survey on fixity practices in fall of 2017. A total of 164 survey responses were recorded, of which 89 completed surveys were used in results analysis.
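
Fixity checking is commonly done by recording a cryptographic digest for each file and later recomputing it. The following is a minimal Python sketch of that idea, assuming SHA-256 as the algorithm and an in-memory dictionary as the manifest. The function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical and are not drawn from the NDSA report, which surveys practices rather than prescribing an implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file in chunks and return its hex SHA-256 digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a baseline digest for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Recompute digests and compare them with a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Same path, different bytes: content has changed since the baseline.
        "changed": [f for f in manifest if f in current and current[f] != manifest[f]],
        # Listed in the baseline but no longer present on disk.
        "missing": [f for f in manifest if f not in current],
        # Present on disk but never recorded in the baseline.
        "added": [f for f in current if f not in manifest],
    }
```

Where the manifest is stored and how often the check runs are left open here; these are the kinds of local choices for which the excerpt notes there are no broadly established best practices.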

"ARCHANGEL: Trusted Archives of Digital Public Documents"

John Collomosse et al. have self-archived "ARCHANGEL: Trusted Archives of Digital Public Documents."

Here's an excerpt:

We present ARCHANGEL; a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation—trust at face value in a centralised authority, like a national government archive or University. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture built over the Ethereum infrastructure. We report early evaluation and feedback of ARCHANGEL from stakeholders in the research data archives space.
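
ARCHANGEL itself is built on Ethereum, and its design is not reproduced here. As a simplified illustration of why a hash-chained record makes tampering detectable, the Python sketch below keeps a local, single-writer chain of document fingerprints. The `Ledger` class and its methods are hypothetical, and a single local list provides none of the decentralisation that a distributed ledger adds.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class Ledger:
    """Append-only, hash-chained log of document fingerprints (illustrative only)."""
    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _hash(payload: dict) -> str:
        # Canonical JSON (sorted keys) so equal payloads always hash identically.
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def append(self, doc_id: str, doc_bytes: bytes) -> dict:
        """Record a document's digest, linking it to the previous entry's hash."""
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {"doc_id": doc_id,
                "doc_hash": hashlib.sha256(doc_bytes).hexdigest(),
                "prev_hash": prev}
        entry = {**body, "entry_hash": self._hash(body)}
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Recompute every link; editing any earlier entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("doc_id", "doc_hash", "prev_hash")}
            if e["prev_hash"] != prev or e["entry_hash"] != self._hash(body):
                return False
            prev = e["entry_hash"]
        return True

    def verify_document(self, doc_id: str, doc_bytes: bytes) -> bool:
        """Check that the given bytes match a digest recorded for doc_id."""
        digest = hashlib.sha256(doc_bytes).hexdigest()
        return any(e["doc_id"] == doc_id and e["doc_hash"] == digest
                   for e in self.entries)
```

The chain only shows that the record is internally consistent; the point of a distributed ledger, as the excerpt describes, is that no single custodian can quietly rewrite the whole chain.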

"Sharing Selves: Developing an Ethical Framework for Curating Social Media Data"

Sara Mannheimer and Elizabeth A. Hull have published "Sharing Selves: Developing an Ethical Framework for Curating Social Media Data" in the International Journal of Digital Curation.

Here's an excerpt:

Open sharing of social media data raises new ethical questions that researchers, repositories and data curators must confront, with little existing guidance available. In this paper, the authors draw upon their experiences in their multiple roles as data curators, academic librarians, and researchers to propose the STEP framework for curating and sharing social media data. The framework is intended to be used by data curators facilitating open publication of social media data. Two case studies from the Dryad Digital Repository serve to demonstrate implementation of the STEP framework. The STEP framework can serve as one important 'step' along the path to achieving safe, ethical, and reproducible social media research practice.

"A Framework for the Preservation of a Docker Container"

Iain Emsley and David De Roure have published "A Framework for the Preservation of a Docker Container" in the International Journal of Digital Curation.

Here's an excerpt:

Reliably building and maintaining systems across environments is a continuing problem. A project or experiment may run for years. Software and hardware may change as can the operating system. Containerisation is a technology that is used in a variety of companies, such as Google, Amazon and IBM, and scientific projects to rapidly deploy a set of services repeatably. Using Dockerfiles to ensure that a container is built repeatably, to allow conformance and easy updating when changes take place, is becoming common within projects. It's seen as part of sustainable software development. Containerisation technology occupies a dual space: it is both a repository of software and software itself. In considering Docker in this fashion, we should verify that the Dockerfile can be reproduced. Using a subset of the Dockerfile specification, a domain specific language is created to ensure that Docker files can be reused at a later stage to recreate the original environment. We provide a simple framework to address the question of the preservation of containers and its environment. We present experiments on an existing Dockerfile and conclude with a discussion of future work. Taking our work, a pipeline was implemented to check that a defined Dockerfile conforms to our desired model, extracts the Docker and operating system details. This will help the reproducibility of results by creating the machine environment and package versions. It also helps development and testing through ensuring that the system is repeatably built and that any changes in the software environment can be equally shared in the Dockerfile. This work supports not only the citation process but also the open scientific one by providing environmental details of the work. As a part of the pipeline to create the container, we capture the processes used and put them into the W3C PROV ontology. This provides the potential for providing it with a persistent identifier and traceability of the processes used to preserve the metadata. Our future work will look at the question of linking this output to a workflow ontology to preserve the complete workflow with the commands and parameters to be given to the containers. We see this provenance within the build process useful to provide a complete overview of the workflow.
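
The authors define a domain-specific language over a subset of the Dockerfile specification; that DSL is not given in the excerpt. The Python sketch below only illustrates the general idea of a conformance check that also extracts environment details. Its two rules (the base image must carry an explicit tag or digest other than `latest`, and every `apt-get install` package must be pinned as `name=version`) and all function names are hypothetical assumptions, and the parser handles only simple Dockerfiles.

```python
import shlex


def parse_dockerfile(text: str) -> list[tuple[str, str]]:
    """Split a Dockerfile into (INSTRUCTION, arguments) pairs.

    Joins backslash continuations and skips blank and comment lines.
    Parser directives, heredocs and custom escape characters are not handled.
    """
    instructions: list[tuple[str, str]] = []
    buf = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        buf += line
        keyword, _, args = buf.partition(" ")
        instructions.append((keyword.upper(), args.strip()))
        buf = ""
    return instructions


def apt_packages(args: str) -> list[str]:
    """Return the package arguments of every `apt-get install` in a RUN command."""
    tokens = shlex.split(args)
    packages = []
    for i, tok in enumerate(tokens[:-1]):
        if tok == "apt-get" and tokens[i + 1] == "install":
            for arg in tokens[i + 2:]:
                if arg in ("&&", "||", ";"):
                    break  # end of this install command
                if not arg.startswith("-"):
                    packages.append(arg)  # skip options such as -y
    return packages


def check_conformance(text: str) -> dict:
    """Apply two hypothetical pinning rules and extract environment details."""
    problems: list[str] = []
    base_image = None
    pinned: dict[str, str] = {}
    for keyword, args in parse_dockerfile(text):
        if keyword == "FROM":
            # Skip flags such as --platform=...; an "AS stage" suffix is ignored.
            base_image = next(t for t in args.split() if not t.startswith("--"))
            last = base_image.rsplit("/", 1)[-1]  # don't mistake a registry port for a tag
            if "@sha256:" not in base_image and (":" not in last or last.endswith(":latest")):
                problems.append(f"base image not pinned: {base_image}")
        elif keyword == "RUN":
            for pkg in apt_packages(args):
                name, sep, version = pkg.partition("=")
                if sep:
                    pinned[name] = version
                else:
                    problems.append(f"package not pinned: {name}")
    return {"base_image": base_image, "packages": pinned,
            "problems": problems, "conforms": not problems}
```

The extracted `base_image` and `packages` are the kind of environment details the excerpt says the pipeline records; serialising them as W3C PROV, as the authors do, is omitted from this sketch.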

Research Data Curation Bibliography, Version 8 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Archiving Large-Scale Legacy Multimedia Research Data: A Case Study"

Claudia Yogeswaran and Kearsy Cormier have published "Archiving Large-Scale Legacy Multimedia Research Data: A Case Study" in the International Journal of Digital Curation.

Here's an excerpt:

In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets. We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, the handling of large-size multimedia data, as well as the complexity of multi-project data from a number of researchers and legacy data from eleven years of research.

"If These Crawls Could Talk: Studying and Documenting Web Archives Provenance"

Emily Maemura et al. have self-archived "If These Crawls Could Talk: Studying and Documenting Web Archives Provenance."

Here's an excerpt:

This study examines the decision space of web archives and its role in shaping what is and what is not captured in the web archiving process. By comparing how three different web archives collections were created and documented, we investigate how curatorial decisions interact with technical and external factors and we compare commonalities and differences. The findings reveal the need to understand both the social and technical context that shapes those decisions and the ways in which these individual decisions interact. Based on the study, we propose a framework for documenting key dimensions of a collection that addresses the situated nature of the organizational context, technical specificities, and unique characteristics of web materials that are the focus of a collection.

"The State of Assessing Data Stewardship Maturity —An Overview"

Ge Peng has published "The State of Assessing Data Stewardship Maturity —An Overview" in Data Science Journal.

Here's an excerpt:

Data stewardship encompasses all activities that preserve and improve the information content, accessibility, and usability of data and metadata. Recent regulations, mandates, policies, and guidelines set forth by the U.S. government, federal and other funding agencies, scientific societies and scholarly publishers, have levied stewardship requirements on digital scientific data. This elevated level of requirements has increased the need for a formal approach to stewardship activities that supports compliance verification and reporting. Meeting or verifying compliance with stewardship requirements requires assessing the current state, identifying gaps, and, if necessary, defining a roadmap for improvement. This, however, touches on standards and best practices in multiple knowledge domains. Therefore, data stewardship practitioners, especially those at data repositories or data service centers or associated with data stewardship programs, can benefit from knowledge of existing maturity assessment models. This article provides an overview of the current state of assessing stewardship maturity for federally funded digital scientific data. A brief description of existing maturity assessment models and related application(s) is provided. This helps stewardship practitioners to readily obtain basic information about these models. It allows them to evaluate each model’s suitability for their unique verification and improvement needs.

"Text Data Mining from the Author’s Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Christine L. Borgman has self-archived "Text Data Mining from the Author's Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Here's an excerpt:

Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from concerns of the researcher wishing to mine data to include concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefits.

"The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science"

Kyle Chard et al. have published "The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science" in PeerJ.

Here's an excerpt:

In this article, we first define the problems that research data portals address, introduce the legacy approach, and examine its limitations. We then introduce the MRDP design pattern and describe its realization via the integration of two elements: Science DMZs (Dart et al., 2013) (high-performance network enclaves that connect large-scale data servers directly to high-speed networks) and cloud-based data management and authentication services such as those provided by Globus (Chard, Tuecke & Foster, 2014). We then outline a reference implementation of the MRDP design pattern, also provided in its entirety on the companion web site, https://docs.globus.org/mrdp, that the reader can study—and, if they so desire, deploy and adapt to build their own high-performance research data portal. We also review various deployments to show how the MRDP approach has been applied in practice: examples like the National Center for Atmospheric Research's Research Data Archive, which provides for high-speed data delivery to thousands of geoscientists; the Sanger Imputation Service, which provides for online analysis of user-provided genomic data; the Globus data publication service, which provides for interactive data publication and discovery; and the DMagic data sharing system for data distribution from light sources. We conclude with a discussion of related technologies and summary.

"A Longitudinal Assessment of the Persistence of Twitter Datasets"

Arkaitz Zubiaga has self-archived "A Longitudinal Assessment of the Persistence of Twitter Datasets."

Here's an excerpt:

With social media datasets being increasingly shared by researchers, it also presents the caveat that those datasets are not always completely replicable. Having to adhere to requirements of platforms like Twitter, researchers cannot release the raw data and instead have to release a list of unique identifiers, which others can then use to recollect the data from the platform themselves. This leads to the problem that subsets of the data may no longer be available, as content can be deleted or user accounts deactivated. To quantify the impact of content deletion in the replicability of datasets in a long term, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include over 147 million tweets. . . . Even though the ratio of available tweets keeps decreasing as the dataset gets older, we find that the textual content of the recollected subset is still largely representative of the whole dataset that was originally collected. The representativity of the metadata, however, keeps decreasing over time, both because the dataset shrinks and because certain metadata, such as the users' number of followers, keeps changing.
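
The measurement described here compares the originally collected tweets with the subset that can still be recollected from their IDs. The Python sketch below computes two simple quantities along those lines, assuming each dataset is represented as a dictionary mapping tweet ID to the author's follower count at collection time. That representation and the function name are hypothetical; the paper's actual metrics are not given in the excerpt.

```python
def persistence_report(original: dict[str, int],
                       recollected: dict[str, int]) -> dict[str, float]:
    """Compare an original collection with a later recollection of the same IDs.

    Both arguments map tweet ID -> author's follower count when collected.
    IDs that appear only in `recollected` are ignored.
    """
    # Tweets still retrievable (not deleted, account not deactivated).
    available = [tid for tid in original if tid in recollected]
    available_ratio = len(available) / len(original) if original else 0.0
    # Among surviving tweets, how often has this piece of metadata drifted?
    drifted = sum(1 for tid in available if recollected[tid] != original[tid])
    drift_ratio = drifted / len(available) if available else 0.0
    return {"available_ratio": available_ratio,
            "followers_changed_ratio": drift_ratio}
```

The two ratios mirror the two effects the excerpt distinguishes: the dataset shrinking as content disappears, and metadata such as follower counts changing even for tweets that survive.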

"Andrew W. Mellon Foundation Awards Grant to the Internet Archive for Long Tail Journal Preservation"

The Internet Archive has released "Andrew W. Mellon Foundation Awards Grant to the Internet Archive for Long Tail Journal Preservation."

Here's an excerpt:

The Andrew W. Mellon Foundation has awarded a research and development grant to the Internet Archive to address the critical need to preserve the "long tail" of open access scholarly communications. The project, Ensuring the Persistent Access of Long Tail Open Access Journal Literature, builds on prototype work identifying at-risk content held in web archives by using data provided by identifier services and registries. Furthermore, the project expands on work acquiring missing open access articles via customized web harvesting, improving discovery and access to these materials from within extant web archives, and developing machine learning approaches, training sets, and cost models for advancing and scaling this project’s work.

"Data Sustainability and Reuse Pathways of Natural Resources and Environmental Scientists"

Yi Shen has self-archived "Data Sustainability and Reuse Pathways of Natural Resources and Environmental Scientists."

Here's an excerpt:

This paper presents a multifarious examination of natural resources and environmental scientists' adventures navigating the policy change towards open access and cultural shift in data management, sharing, and reuse. Situated in the institutional context of Virginia Tech, a focus group and multiple individual interviews were conducted exploring the domain scientists' all-around experiences, performances, and perspectives on their collection, adoption, integration, preservation, and management of data. . . . Based on these findings, this study provides suggestions on data modeling and knowledge representation strategies to support the long-term viability, stewardship, accessibility, and sustainability of scientific data.

"Portage Releases Draft Institutional RDM Strategy Template"

The Portage Network has released "Portage Releases Draft Institutional RDM Strategy Template."

Here's an excerpt:

In response to the anticipated Tri-Agency research data management (RDM) policy, the Portage Institutional RDM Strategy Working Group has released a draft template and supporting guidance document that are designed to assist Canadian research institutions in developing an overarching strategy for RDM. These resources will exist as living documents, to be updated by the Working Group as needed.

See also: Template—Institutional Research Data Management Strategy and Institutional Research Data Management Strategy: Guidance Document.

"From Passive to Active, From Generic to Focused: How Can an Institutional Data Archive Remain Relevant in a Rapidly Evolving Landscape?"

Maria Cruz et al. have self-archived "From Passive to Active, From Generic to Focused: How Can an Institutional Data Archive Remain Relevant in a Rapidly Evolving Landscape?"

Here's an excerpt:

Founded in 2008 as an initiative of the libraries of three of the four technical universities in the Netherlands, the 4TU.Centre for Research Data (4TU.Research Data) provides since 2010 a fully operational, cross-institutional, long-term archive that stores data from all subjects in applied sciences and engineering.

"Stewardship in the ‘Age of Algorithms’"

Clifford Lynch has published "Stewardship in the 'Age of Algorithms'" in First Monday.

Here's an excerpt:

This paper explores pragmatic approaches that might be employed to document the behavior of large, complex socio-technical systems (often today shorthanded as "algorithms") that centrally involve some mixture of personalization, opaque rules, and machine learning components. Thinking rooted in traditional archival methodology–focusing on the preservation of physical and digital objects, and perhaps the accompanying preservation of their environments to permit subsequent interpretation or performance of the objects–has been a total failure for many reasons, and we must address this problem. The approaches presented here are clearly imperfect, unproven, labor-intensive, and sensitive to the often hidden factors that the target systems use for decision-making (including personalization of results, where relevant); but they are a place to begin, and their limitations are at least outlined. Numerous research questions must be explored before we can fully understand the strengths and limitations of what is proposed here. But it represents a way forward.

"CLIR Receives Sloan Foundation Grants for Software and Data Curation Fellows, Energy Fellows"

CLIR has released "CLIR Receives Sloan Foundation Grants for Software and Data Curation Fellows, Energy Fellows."

Here's an excerpt:

A $521,200 grant from Sloan's Energy and Environment program—its first to CLIR—will create a cohort of CLIR/Digital Library Federation (DLF) Postdoctoral Fellows in Data Curation for Energy Economics, a new area of focus for the postdoctoral fellowship program. Energy fellows will have joint appointments between energy research centers and libraries at four major universities for two years starting in 2018.

A $925,361 grant from Sloan's Digital Information Technology program, which has funded research data curation fellowships since 2012, will help support eight new scholar-practitioners to take leading roles in the development of sustainable approaches to software and research data curation in the sciences and social sciences.

"ARL Awarded Sloan Grant to Help Preserve Software, Save Cultural Record, Advance Discovery"

ARL has released "ARL Awarded Sloan Grant to Help Preserve Software, Save Cultural Record, Advance Discovery."

Here's an excerpt:

The Association of Research Libraries (ARL) has been awarded a $315,000 grant from the Alfred P. Sloan Foundation to develop and disseminate a Code of Best Practices in Fair Use for Software Preservation. This code will give individuals and institutions clear guidance on the legality of archiving software, in order to ensure continued access to digital files of all kinds and to offer hands-on understanding of the history of technology.

Staffing for Effective Digital Preservation 2017: An NDSA Report

The National Digital Stewardship Alliance has released Staffing for Effective Digital Preservation 2017: An NDSA Report.

Here's an excerpt:

The 2017 Digital Preservation Staffing Survey provides a useful snapshot of the way digital preservation is accomplished in 2017 and how its practitioners feel about the effectiveness of their current organizational structures. It also builds on the 2012 survey and begins to establish data with which the digital preservation community can identify trends in staffing in the field.

"The Evolution, Approval and Implementation of the U.S. Geological Survey Science Data Lifecycle Model"

John L. Faundeen and Vivian B. Hutchison have published "The Evolution, Approval and Implementation of the U.S. Geological Survey Science Data Lifecycle Model" in the Journal of eScience Librarianship.

Here's an excerpt:

This paper details how the U.S. Geological Survey (USGS) Community for Data Integration (CDI) Data Management Working Group developed a Science Data Lifecycle Model, and the role the Model plays in shaping agency-wide policies and data management applications. Starting with an extensive literature review of existing data lifecycle models, representatives from various backgrounds in USGS attended a two-day meeting where the basic elements for the Science Data Lifecycle Model were determined. Refinements and reviews spanned two years, leading to finalization of the model and documentation in a formal agency publication.