"The Changing Influence of Journal Data Sharing Policies on Local RDM Practices"

Dylanne Dearborn et al. have published "The Changing Influence of Journal Data Sharing Policies on Local RDM Practices" in the International Journal of Digital Curation.

Here's an excerpt:

The purpose of this study was to examine changes in research data deposit policies of highly ranked journals in the physical and applied sciences between 2014 and 2016, as well as to develop an approach to examining the institutional impact of deposit requirements. Policies from the top ten journals (ranked by impact factor from the Journal Citation Reports) were examined in 2014 and again in 2016 in order to determine if data deposits were required or recommended, and which methods of deposit were listed as options. For all 2016 journals with a required data deposit policy, publication information (2009-2015) for the University of Toronto was pulled from Scopus and departmental affiliation was determined for each article. The results showed that the number of high-impact journals in the physical and applied sciences requiring data deposit is growing. In 2014, 71.2% of journals had no policy, 14.7% had a recommended policy, and 13.9% had a required policy (n=836). In contrast, in 2016, there were 58.5% with no policy, 19.4% with a recommended policy, and 22.0% with a required policy (n=880). It was also evident that U of T chemistry researchers are by far the most heavily affected by these journal data deposit requirements, having published 543 publications, representing 32.7% of all publications in the titles requiring data deposit in 2016. The Python scripts used to retrieve institutional publications based on a list of ISSNs have been released on GitHub so that other institutions can conduct similar research.
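
The retrieval step the authors describe (pulling an institution's publications from Scopus for a list of ISSNs) can be sketched as follows. This is an illustrative sketch, not the authors' released scripts; the helper names and the affiliation ID are hypothetical, though `ISSN()` and `AF-ID()` are standard Scopus Search API field codes.

```python
# Illustrative sketch of querying Scopus for an institution's publications
# in a given set of journals. ISSN() and AF-ID() are Scopus Search API
# field codes; the affiliation ID below is a placeholder, and these helper
# names do not come from the authors' released scripts.

def build_scopus_query(issn, affiliation_id):
    """Restrict a Scopus search to one journal and one institution."""
    return f"ISSN({issn}) AND AF-ID({affiliation_id})"

def build_queries(issns, affiliation_id):
    """One query per journal ISSN on the required-deposit list."""
    return [build_scopus_query(issn, affiliation_id) for issn in issns]
```

Each query string would then be sent to the Scopus Search API (with an API key), and the affiliations on the returned articles mapped to departments.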

Research Data Curation Bibliography, Version 9 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"A Framework for Aggregating Private and Public Web Archives"

Mat Kelly, Michael L. Nelson, and Michele C. Weigle have self-archived "A Framework for Aggregating Private and Public Web Archives."

Here's an excerpt:

Personal and private Web archives are proliferating due to the increase in the tools to create them and the realization that Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private (e.g., banking) Web pages. We introduce a framework to mitigate issues of aggregation in private, personal, and public Web archives without compromising potential sensitive information contained in private captures. We amend Memento syntax and semantics to allow TimeMap enrichment to account for additional attributes to be expressed inclusive of the requirements for dereferencing private Web archive captures. We provide a method to involve the user further in the negotiation of archival captures in dimensions beyond time. We introduce a model for archival querying precedence and short-circuiting, as needed when aggregating private and personal Web archive captures with those from public Web archives through Memento. Negotiation of this sort is novel to Web archiving and allows for the more seamless aggregation of various types of Web archives to convey a more accurate picture of the past Web.
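
For context, a Memento TimeMap (RFC 7089) is an application/link-format document listing an original resource and its archived captures. A minimal sketch of the kind of enrichment the authors describe might look like the following; the `access` attribute is a hypothetical illustration of an added attribute, not the paper's actual syntax.

```
<http://example.com/page>; rel="original",
<http://aggregator.example/timemap/http://example.com/page>; rel="self";
  type="application/link-format",
<http://public.archive.example/20160101000000/http://example.com/page>;
  rel="memento"; datetime="Fri, 01 Jan 2016 00:00:00 GMT",
<http://private.archive.example/captures/1234>;
  rel="memento"; datetime="Sat, 02 Jan 2016 00:00:00 GMT"; access="private"
```

An aggregator could use such an attribute to decide whether dereferencing a capture requires authentication before merging it with public results.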

"A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository"

Amy M Pienta et al. have published "A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository" in the International Journal of Digital Curation.

Here's an excerpt:

Social scientists are producing an ever-expanding volume of data, leading to questions about appraisal and selection of content given finite resources to process data for reuse. We analyze users’ search activity in an established social science data repository to better understand demand for data and more effectively guide collection development. By applying a data-driven approach, we aim to ensure curation resources are applied to make the most valuable data findable, understandable, accessible, and usable. We analyze data from a domain repository for the social sciences that includes over 500,000 annual searches in 2014 and 2015 to better understand trends in user search behavior. Using a newly created search-to-study ratio technique, we identified gaps in the domain data repository’s holdings and leveraged this analysis to inform our collection and curation practices and policies. The evaluative technique we propose in this paper will serve as a baseline for future studies looking at trends in user demand over time at the domain data repository being studied with broader implications for other data repositories.
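
The search-to-study ratio can be read as a simple demand-to-supply measure. The sketch below is an assumption about its form (searches for a topic divided by studies held on that topic), not the paper's exact formula.

```python
# Hypothetical search-to-study ratio: search demand divided by the number
# of studies the repository holds on each topic. A high ratio (or a topic
# with no holdings at all) flags a potential collection gap.

def search_to_study_ratio(searches, holdings):
    """searches and holdings are dicts mapping topic -> count."""
    return {
        topic: searches[topic] / holdings[topic] if holdings.get(topic) else float("inf")
        for topic in searches
    }
```

For example, 120 searches against 60 held studies yields a ratio of 2.0, while any searched topic with zero holdings is flagged as an outright gap.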

"Modelling the Research Data Lifecycle"

Stacy T. Kowalczyk has published "Modelling the Research Data Lifecycle" in the International Journal of Digital Curation.

Here's an excerpt:

This paper develops and tests a lifecycle model for the preservation of research data by investigating the research practices of scientists. This research is based on a mixed-method approach. An initial study was conducted using case study analytical techniques; insights from these case studies were combined with grounded theory in order to develop a novel model of the Digital Research Data Lifecycle. A broad-based quantitative survey was then constructed to test and extend the components of the model. The major contribution of these research initiatives is the creation of the Digital Research Data Lifecycle, a data lifecycle that provides a generalized model of the research process to better describe and explain both the antecedents and barriers to preservation. The antecedents and barriers to preservation are data management, contextual metadata, file formats, and preservation technologies. The availability of data management support and preservation technologies, the ability to create and manage contextual metadata, and the choices of file formats all significantly affect the preservability of research data.

Research Data Preservation in Canada: A White Paper

The Portage Network has released Research Data Preservation in Canada: A White Paper.

Here’s an excerpt from the announcement:

The Preservation Expert Group (PEG) was created to advise Portage on developing research data management (RDM) infrastructure and best practices for preserving research data and metadata in Canada. The members of PEG have written this White Paper as a foundation document to describe the current digital preservation landscape, highlighting some of the digital preservation work already being undertaken in Canada, and to identify challenges that need to be addressed by Portage and other stakeholders to develop and improve RDM capacity and infrastructure across the country.

"Frictionless Data: Making Research Data Quality Visible"

Dan Fowler, Jo Barratt, and Paul Walsh have published "Frictionless Data: Making Research Data Quality Visible" in the International Journal of Digital Curation.

Here's an excerpt:

There is significant friction in the acquisition, sharing, and reuse of research data. It is estimated that eighty percent of data analysis is invested in the cleaning and mapping of data (Dasu and Johnson, 2003). This friction hampers researchers not well versed in data preparation techniques from reusing an ever-increasing amount of data available within research data repositories. Frictionless Data is an ongoing project at Open Knowledge International focused on removing this friction. We are doing this by developing a set of tools, specifications, and best practices for describing, publishing, and validating data. The heart of this project is the "Data Package", a containerization format for data based on existing practices for publishing open source software. This paper will report on current progress toward that goal.

"Implementing a Research Data Policy at Leiden University"

Fieke Schoots et al. have published "Implementing a Research Data Policy at Leiden University" in the International Journal of Digital Curation.

Here's an excerpt:

In this paper, we discuss the various stages of the institution-wide project that led to the adoption of the data management policy at Leiden University in 2016. We illustrate this process by highlighting how we have involved all stakeholders. Each organisational unit was represented in the project teams. Results were discussed in a sounding board with both academic and support staff. Senior researchers acted as pioneers and raised awareness and commitment among their peers. By way of example, we present pilot projects from two faculties. We then describe the comprehensive implementation programme that will create facilities and services that must allow implementing the policy as well as monitoring and evaluating it. Finally, we will present lessons learnt and steps ahead. The engagement of all stakeholders, as well as explicit commitment from the Executive Board, has been an important key factor for the success of the project and will continue to be an important condition for the steps ahead.

"How Valid is your Validation? A Closer Look Behind the Curtain of JHOVE"

Michelle Lindlar and Yvonne Tunnat have published "How Valid is your Validation? A Closer Look Behind the Curtain of JHOVE" in the International Journal of Digital Curation.

Here's an excerpt:

Validation is a key task of any preservation workflow and often JHOVE is the first tool of choice for characterizing and validating common file formats. Due to the tool’s maturity and high adoption, decisions if a file is indeed fit for long-term availability are often made based on JHOVE output. But can we trust a tool simply based on its wide adoption and maturity by age? How does JHOVE determine the validity and well-formedness of a file? Does a module really support all versions of a file format family? How much of the file formats’ standards do we need to know and understand in order to interpret the output correctly? Are there options to verify JHOVE-based decisions within preservation workflows? While the software has been a long-standing favourite within the digital curation domain for many years, a recent look at JHOVE as a vital decision supporting tool is currently missing. This paper presents a practice report which aims to close this gap.

"Support Your Data: A Research Data Management Guide for Researchers"

John A Borghi et al. have published "Support Your Data: A Research Data Management Guide for Researchers" in Research Ideas and Outcomes.

Here's an excerpt:

Researchers are faced with rapidly evolving expectations about how they should manage and share their data, code, and other research materials. To help them meet these expectations and generally manage and share their data more effectively, we are developing a suite of tools which we are currently referring to as "Support Your Data". These tools, which include a rubric designed to enable researchers to self-assess their current data management practices and a series of short guides which provide actionable information about how to advance practices as necessary or desired, are intended to be easily customizable to meet the needs of researchers working in a variety of institutional and disciplinary contexts.

"Operationalizing the Replication Standard: A Case Study of the Data Curation and Verification Workflow for Scholarly Journals"

Thu-Mai Christian et al. have self-archived "Operationalizing the Replication Standard: A Case Study of the Data Curation and Verification Workflow for Scholarly Journals."

Here's an excerpt:

In response to widespread concerns about the integrity of research published in scholarly journals, several initiatives have emerged that are promoting research transparency through access to data underlying published scientific findings. Journal editors, in particular, have made a commitment to research transparency by issuing data policies that require authors to submit their data, code, and documentation to data repositories to allow for public access to the data. In the case of the American Journal of Political Science (AJPS) Data Replication Policy, the data also must undergo an independent verification process in which materials are reviewed for quality as a condition of final manuscript publication and acceptance. Aware of the specialized expertise of the data archives, AJPS called upon the Odum Institute Data Archive to provide a data review service that performs data curation and verification of replication datasets. This article presents a case study of the collaboration between AJPS and the Odum Institute Data Archive to develop a workflow that bridges manuscript publication and data review processes. The case study describes the challenges and the successes of the workflow integration, and offers lessons learned that may be applied by other data archives that are considering expanding their services to include data curation and verification services to support reproducible research.

"Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework "

Hagen Peukert has published "Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework" in the International Journal of Digital Curation.

Here's an excerpt:

Handling heterogeneous data, subject to minimal costs, can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation. It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in a traditional sense. Rather, the practice of curating humanities research data and the specifications and adjustments of the model suggested here reveal an intertwined process, in which knowledge of both strategic management and solid information technology has to be considered. Thus, suggestions on the strategic positioning of research data, which can be used as an analytical tool to understand the proposed workflow mechanisms, and the definition of workflow modules, which can be flexibly used in designing new standard workflows to configure research data repositories, are put forward.

"How Important is Data Curation? Gaps and Opportunities for Academic Libraries"

Lisa R Johnston et al. have published "How Important is Data Curation? Gaps and Opportunities for Academic Libraries" in the Journal of Librarianship and Scholarly Communication.

Here's an excerpt:

INTRODUCTION Data curation may be an emerging service for academic libraries, but researchers actively "curate" their data in a number of ways—even if terminology may not always align. Building on past user-needs assessments performed via survey and focus groups, the authors sought direct input from researchers on the importance and utilization of specific data curation activities. METHODS Between October 21, 2016, and November 18, 2016, the study team held focus groups with 91 participants at six different academic institutions to determine which data curation activities were most important to researchers, which activities were currently underway for their data, and how satisfied they were with the results. RESULTS Researchers are actively engaged in a variety of data curation activities, and while they considered most data curation activities to be highly important, a majority of the sample reported dissatisfaction with the current state of data curation at their institution. DISCUSSION Our findings demonstrate specific gaps and opportunities for academic libraries to focus their data curation services to more effectively meet researcher needs. CONCLUSION Research libraries stand to benefit their users by emphasizing, investing in, and/or heavily promoting the highly valued services that may not currently be in use by many researchers.

2017 Fixity Survey Report: An NDSA Report

The National Digital Stewardship Alliance has released the 2017 Fixity Survey Report: An NDSA Report.

Here's an excerpt:

Fixity checking, or the practice of algorithmically reviewing digital content to ensure that it has not changed over time, is a complex but essential aspect of digital preservation management. To date, there have been no broadly established best practices surrounding fixity checking, perhaps largely due to the wide variety of digital preservation systems and solutions employed by cultural heritage organizations. In an attempt to understand the common practices that exist for fixity checking, as well as the challenges institutions face when implementing a fixity check routine, the National Digital Stewardship Alliance (NDSA) Fixity Working Group developed and published a survey on fixity practices in the fall of 2017. A total of 164 survey responses were recorded, of which 89 completed surveys were used in results analysis.
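
In its simplest form, fixity checking recomputes a cryptographic checksum for each file and compares it against a stored manifest. The sketch below uses SHA-256; surveyed institutions use a variety of algorithms and tools.

```python
# Minimal fixity check: recompute SHA-256 digests and report any files
# whose current digest no longer matches the stored manifest value.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large objects do not exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(manifest):
    """manifest: {path: expected hex digest}. Returns paths that changed."""
    return [path for path, expected in manifest.items()
            if sha256_of(path) != expected]
```

A routine like this would typically run on a schedule, with any non-empty result triggering repair from a replica.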

"ARCHANGEL: Trusted Archives of Digital Public Documents"

John Collomosse et al. have self-archived "ARCHANGEL: Trusted Archives of Digital Public Documents."

Here's an excerpt:

We present ARCHANGEL; a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation—trust at face value in a centralised authority, like a national government archive or university. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture built over the Ethereum infrastructure. We report early evaluation of ARCHANGEL and feedback from stakeholders in the research data archives space.
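
The core DLT idea can be illustrated in a few lines: only a hash of each document goes on the ledger, so integrity is verifiable without exposing content. This is a conceptual sketch only; ledger submission is omitted, and the function names are illustrative, not ARCHANGEL's actual API.

```python
# Conceptual sketch: record a document's SHA-256 digest (not the document)
# so a ledger entry can later confirm an archived copy is unaltered.
import hashlib
import json

def ledger_record(doc_bytes, doc_id):
    """Build the JSON payload that would be committed to the ledger."""
    return json.dumps(
        {"id": doc_id, "sha256": hashlib.sha256(doc_bytes).hexdigest()},
        sort_keys=True,
    )

def verify(doc_bytes, record):
    """Check a retrieved document against its on-ledger digest."""
    return json.loads(record)["sha256"] == hashlib.sha256(doc_bytes).hexdigest()
```

Because the ledger stores only digests, a tampered document fails verification while the archive's holdings remain private.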

"Sharing Selves: Developing an Ethical Framework for Curating Social Media Data"

Sara Mannheimer and Elizabeth A. Hull have published "Sharing Selves: Developing an Ethical Framework for Curating Social Media Data" in the International Journal of Digital Curation.

Here's an excerpt:

Open sharing of social media data raises new ethical questions that researchers, repositories and data curators must confront, with little existing guidance available. In this paper, the authors draw upon their experiences in their multiple roles as data curators, academic librarians, and researchers to propose the STEP framework for curating and sharing social media data. The framework is intended to be used by data curators facilitating open publication of social media data. Two case studies from the Dryad Digital Repository serve to demonstrate implementation of the STEP framework. The STEP framework can serve as one important 'step' along the path to achieving safe, ethical, and reproducible social media research practice.

"A Framework for the Preservation of a Docker Container"

Iain Emsley and David De Roure have published "A Framework for the Preservation of a Docker Container" in the International Journal of Digital Curation.

Here's an excerpt:

Reliably building and maintaining systems across environments is a continuing problem. A project or experiment may run for years. Software and hardware may change, as can the operating system. Containerisation is a technology used by a variety of companies, such as Google, Amazon and IBM, and by scientific projects to rapidly and repeatably deploy a set of services. Using Dockerfiles to ensure that a container is built repeatably, allowing conformance checking and easy updating when changes take place, is becoming common within projects; it is seen as part of sustainable software development. Containerisation technology occupies a dual space: it is both a repository of software and software itself. In considering Docker in this fashion, we should verify that the Dockerfile can be reproduced. Using a subset of the Dockerfile specification, a domain-specific language is created to ensure that Dockerfiles can be reused at a later stage to recreate the original environment. We provide a simple framework to address the question of the preservation of containers and their environment. We present experiments on an existing Dockerfile and conclude with a discussion of future work. Building on this work, a pipeline was implemented to check that a defined Dockerfile conforms to our desired model and to extract the Docker and operating system details. This will help the reproducibility of results by recreating the machine environment and package versions. It also helps development and testing by ensuring that the system is repeatably built and that any changes in the software environment can be equally shared in the Dockerfile. This work supports not only the citation process but also the open scientific one by providing environmental details of the work. As part of the pipeline to create the container, we capture the processes used and put them into the W3C PROV ontology. This provides the potential for assigning a persistent identifier and for tracing the processes used to preserve the metadata. Our future work will look at linking this output to a workflow ontology to preserve the complete workflow, with the commands and parameters to be given to the containers. We see this provenance within the build process as useful for providing a complete overview of the workflow.
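
A conformance check of the kind described (verifying that a Dockerfile pins the details needed to rebuild the environment) might look like the sketch below. The two rules shown are illustrative assumptions, not the subset the paper's DSL actually defines.

```python
# Illustrative Dockerfile conformance check: flag unpinned base images and
# unpinned apt package installs, since either makes rebuilds irreproducible.
# These two rules are assumptions, not the paper's Dockerfile subset.

def check_dockerfile(text):
    issues = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("FROM") and ":" not in line:
            issues.append(f"unpinned base image: {line}")
        if line.startswith("RUN") and "apt-get install" in line and "=" not in line:
            issues.append(f"unpinned package install: {line}")
    return issues
```

Run against `FROM ubuntu` plus an unversioned `apt-get install`, this reports two issues; pinning the tag and the package version clears them.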

"Archiving Large-Scale Legacy Multimedia Research Data: A Case Study"

Claudia Yogeswaran and Kearsy Cormier have published "Archiving Large-Scale Legacy Multimedia Research Data: A Case Study" in the International Journal of Digital Curation.

Here's an excerpt:

In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets. We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, the handling of large-size multimedia data, as well as the complexity of multi-project data from a number of researchers and legacy data from eleven years of research.

"If These Crawls Could Talk: Studying and Documenting Web Archives Provenance"

Emily Maemura et al. have self-archived "If These Crawls Could Talk: Studying and Documenting Web Archives Provenance."

Here's an excerpt:

This study examines the decision space of web archives and its role in shaping what is and what is not captured in the web archiving process. By comparing how three different web archives collections were created and documented, we investigate how curatorial decisions interact with technical and external factors and we compare commonalities and differences. The findings reveal the need to understand both the social and technical context that shapes those decisions and the ways in which these individual decisions interact. Based on the study, we propose a framework for documenting key dimensions of a collection that addresses the situated nature of the organizational context, technical specificities, and unique characteristics of web materials that are the focus of a collection.

"The State of Assessing Data Stewardship Maturity—An Overview"

Ge Peng has published "The State of Assessing Data Stewardship Maturity—An Overview" in Data Science Journal.

Here's an excerpt:

Data stewardship encompasses all activities that preserve and improve the information content, accessibility, and usability of data and metadata. Recent regulations, mandates, policies, and guidelines set forth by the U.S. government, other federal and funding agencies, scientific societies, and scholarly publishers have levied stewardship requirements on digital scientific data. This elevated level of requirements has increased the need for a formal approach to stewardship activities that supports compliance verification and reporting. Meeting or verifying compliance with stewardship requirements requires assessing the current state, identifying gaps, and, if necessary, defining a roadmap for improvement. This, however, touches on standards and best practices in multiple knowledge domains. Therefore, data stewardship practitioners, especially those at data repositories or data service centers or associated with data stewardship programs, can benefit from knowledge of existing maturity assessment models. This article provides an overview of the current state of assessing stewardship maturity for federally funded digital scientific data. A brief description of existing maturity assessment models and related application(s) is provided. This helps stewardship practitioners to readily obtain basic information about these models. It allows them to evaluate each model’s suitability for their unique verification and improvement needs.

"Text Data Mining from the Author’s Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Christine L. Borgman has self-archived "Text Data Mining from the Author's Perspective: Whose Text, Whose Mining, and to Whose Benefit?"

Here's an excerpt:

Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from concerns of the researcher wishing to mine data to include concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefits.

"The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science"

Kyle Chard et al. have published "The Modern Research Data Portal: A Design Pattern for Networked, Data-Intensive Science" in PeerJ.

Here's an excerpt:

In this article, we first define the problems that research data portals address, introduce the legacy approach, and examine its limitations. We then introduce the MRDP design pattern and describe its realization via the integration of two elements: Science DMZs (Dart et al., 2013) (high-performance network enclaves that connect large-scale data servers directly to high-speed networks) and cloud-based data management and authentication services such as those provided by Globus (Chard, Tuecke & Foster, 2014). We then outline a reference implementation of the MRDP design pattern, also provided in its entirety on the companion web site, https://docs.globus.org/mrdp, that the reader can study—and, if they so desire, deploy and adapt to build their own high-performance research data portal. We also review various deployments to show how the MRDP approach has been applied in practice: examples like the National Center for Atmospheric Research's Research Data Archive, which provides for high-speed data delivery to thousands of geoscientists; the Sanger Imputation Service, which provides for online analysis of user-provided genomic data; the Globus data publication service, which provides for interactive data publication and discovery; and the DMagic data sharing system for data distribution from light sources. We conclude with a discussion of related technologies and summary.

"A Longitudinal Assessment of the Persistence of Twitter Datasets"

Arkaitz Zubiaga has self-archived "A Longitudinal Assessment of the Persistence of Twitter Datasets."

Here's an excerpt:

As social media datasets are increasingly shared by researchers, a caveat arises: those datasets are not always completely replicable. Having to adhere to the requirements of platforms like Twitter, researchers cannot release the raw data and instead have to release a list of unique identifiers, which others can then use to recollect the data from the platform themselves. This leads to the problem that subsets of the data may no longer be available, as content can be deleted or user accounts deactivated. To quantify the impact of content deletion on the replicability of datasets in the long term, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include over 147 million tweets. . . . Even though the ratio of available tweets keeps decreasing as the dataset gets older, we find that the textual content of the recollected subset is still largely representative of the whole dataset that was originally collected. The representativity of the metadata, however, keeps decreasing over time, both because the dataset shrinks and because certain metadata, such as the users' number of followers, keeps changing.
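
The persistence measurement itself reduces to a ratio of still-retrievable tweet IDs to the original IDs; the recollection ("hydration") step against the Twitter API is omitted in this sketch.

```python
# Sketch of the core persistence metric: the fraction of a dataset's
# original tweet IDs that are still retrievable at recollection time.

def persistence_ratio(original_ids, recollected_ids):
    original = set(original_ids)
    return len(original & set(recollected_ids)) / len(original)
```

Computing this ratio at successive recollection dates yields the longitudinal decay curve the study describes.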