"Preserving Transactional Data"

Sara Day Thomson has published "Preserving Transactional Data" in the International Journal of Digital Curation.

Here's an excerpt:

This paper discusses requirements for preserving transactional data and the accompanying challenges facing the companies and institutions who aim to re-use these data for analysis or research. It presents a range of use cases—examples of transactional data—in order to describe the characteristics and difficulties of these 'big' data for long-term access. Based on the overarching trends discerned in these use cases, the paper will define the challenges facing the preservation of these data early in the curation lifecycle. It will point to potential solutions within current legal and ethical frameworks, but will focus on positioning the problem of re-using these data from a preservation perspective.

Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

Version 7 of the Research Data Curation Bibliography Released

Digital Scholarship has released Version 7 of the Research Data Curation Bibliography. This selective bibliography includes over 620 English-language articles, books, and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions.

The Research Data Curation Bibliography covers topics such as research data creation, acquisition, metadata, provenance, repositories, management, policies, support services, funding agency requirements, peer review, publication, citation, sharing, reuse, and preservation.

Most sources have been published from January 2009 through December 2016; however, a limited number of earlier key sources are also included. The bibliography includes links to freely available versions of included works. If such versions are unavailable, links to the publishers' descriptions are provided.

Abstracts are included in this bibliography if a work is under a Creative Commons Attribution License (BY and national/international variations), a Creative Commons public domain dedication (CC0), or a Creative Commons Public Domain Mark and this is clearly indicated in the work.

The Research Data Curation Bibliography is under a Creative Commons Attribution 4.0 International License.

"Piracy, Public Access, and Preservation: An Exploration of Sustainable Accessibility in a Public Torrent Index"

John Martin has self-archived "Piracy, Public Access, and Preservation: An Exploration of Sustainable Accessibility in a Public Torrent Index."

Here's an excerpt:

Using a snapshot of torrents on the site, this study considers the potential for torrent networks to preserve and provide access to cultural materials in the form of digital media content. Metadata from 2.1 million torrents were categorized by media type and the robustness of given torrents was assessed. Trends over time, such as number of uploads and volume, were also investigated. This study found that relatively few torrents exhibit long-term survivability, even though the overall volume in the index shows continuous increase.

"Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content"

Shawn M. Jones et al. have published "Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content" in PLOS ONE.

Here's an excerpt:

Increasingly, scholarly articles contain URI references to "web at large" resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web at large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource's content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web at large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so in a two-step approach that relies on various well-established similarity measures to compare textual content. In a first step, we use 19 web archives to find snapshots of referenced web at large resources that have textual content that is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In a second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.

iPRES 2016: 13th International Conference on Digital Preservation Proceedings

The iPRES 2016: 13th International Conference on Digital Preservation Proceedings is available as a 169-page PDF.

Here's an excerpt:

In keeping with previous years, the iPRES 2016 programme is organised into research and practice streams. This format ensures visibility and promotion of both academic research work and the projects and initiatives of institutions involved in digital preservation practices. Furthermore, workshops and tutorials provide opportunities for participants to share information, knowledge and best practices, and explore opportunities for collaboration on new approaches.

"Current Status of Scientific Data Curation Research and Practices in Mainland China"

Shiyan Ou and Yu Zhou have published "Current Status of Scientific Data Curation Research and Practices in Mainland China" in LIBRES.

Here's an excerpt:

With the rapid growth in the body of scientific data, scientific research depends more and more on finding theories and knowledge from the data, and thus data-intensive scientific discovery has become the fourth paradigm of scientific research. Therefore, it is urgent to develop and adopt methods to support the collection, collation, preservation and utilization of scientific data. This paper provides an overview of scientific data curation research and practices in mainland China. Firstly, it reviews Chinese research articles on data curation and outlines the research status and progress in this area. Secondly, it surveys existing scientific data repositories or platforms in mainland China, and analyzes the gaps between China's and other countries' data curation practices.

"The Digitized Archival Document Trustworthiness Scale"

Devan Ray Donaldson has published "The Digitized Archival Document Trustworthiness Scale" in the International Journal of Digital Curation.

Here's an excerpt:

Designated communities are central to validation of preservation. If a designated community is able to understand and use information found within a digital repository, the assumption is that the information has been properly preserved. As judging the trustworthiness of information requires at least some level of understanding of that information, this paper presents results of a study aimed at developing a tool for measuring designated community members' perceptions of trustworthiness for preserved information found within a digital repository. The study focuses on genealogists at the Washington State Digital Archives who routinely interact with digitized genealogical records, including digitized marriage, death, and birth records. Results of the study include construction of an original Digitized Archival Document Trustworthiness Scale (DADTS). DADTS is a ready-made tool for digital curators to use to measure the trustworthiness perceptions of their designated community members. Implications of this study include the feasibility of engaging members of a designated community in the construction of a scale for measuring trustworthiness perception, thereby providing deeper insight into the understandability and usability of preserved information by that designated community.

Two Reports on Disk Image Formats from the Harvard Library Digital Preservation Program

The Harvard Library Digital Preservation Program has released "Disk Image Content Model and Metadata Analysis ACTIVITY 1: Comparative Format Matrix Analysis" and "Disk Image Content Model and Metadata Analysis ACTIVITY 2: Metadata Analysis."

Here's an excerpt from the announcement:

Harvard Library collections include a variety of computer media that will be imaged using forensic disk imaging techniques and preserved in the Library's preservation and access repository—the Digital Repository Service (DRS). As a first step towards providing support for this material in the DRS, the Library contracted AVPreserve in late 2015 to assist with the analysis. The goals of the analysis were:

  • Recommended disk image formats to accept and prefer for the DRS
  • Recommended technical metadata schema(s) to use for disk image file formats
  • DRS content models for these objects
  • Recommendations for enhancing Harvard Library's FITS tool to better support these objects

See also: Disk Image Format Matrix spreadsheet.

"The Durability and Fragility of Knowledge Infrastructures: Lessons Learned from Astronomy"

Christine L. Borgman, Peter T. Darch, Ashley E. Sands, and Milena S. Golshan have self-archived "The Durability and Fragility of Knowledge Infrastructures: Lessons Learned from Astronomy."

Here's an excerpt:

Infrastructures are not inherently durable or fragile, yet all are fragile over the long term. Durability requires care and maintenance of individual components and the links between them. Astronomy is an ideal domain in which to study knowledge infrastructures, due to its long history, transparency, and accumulation of observational data over a period of centuries. Research reported here draws upon a long-term study of scientific data practices to ask questions about the durability and fragility of infrastructures for data in astronomy. Methods include interviews, ethnography, and document analysis. As astronomy has become a digital science, the community has invested in shared instruments, data standards, digital archives, metadata and discovery services, and other relatively durable infrastructure components. Several features of data practices in astronomy contribute to the fragility of that infrastructure. These include different archiving practices between ground- and space-based missions, between sky surveys and investigator-led projects, and between observational and simulated data. Infrastructure components are tightly coupled, based on international agreements. However, the durability of these infrastructures relies on much invisible work—cataloging, metadata, and other labor conducted by information professionals. Continual investments in care and maintenance of the human and technical components of these infrastructures are necessary for sustainability.

"Happy Beta Release Day, Omeka S!!"

The Roy Rosenzweig Center for History and New Media at George Mason University has published "Happy Beta Release Day, Omeka S!!"

Here's an excerpt:

Omeka S is the next-generation, open source web-publishing platform that is fully integrated into the scholarly communications ecosystem and designed to serve the needs of medium to large institutional users who wish to launch, monitor, and upgrade many sites from a single installation.

Though Omeka S is a completely new software package, it shares the same goals and principles of Omeka Classic that users have come to love: a commitment to cost-effective deployment and design, an intuitive user interface, open access to data and resources, and interoperability through standardized data.

Created with funding from The Andrew W. Mellon Foundation and the Institute of Museum and Library Services, Omeka S is engineered to ease the burdens of administrators who want to make it possible for their end-user communities to easily build their own sites that showcase digital cultural heritage materials.

See also: Omeka S Beta Technical Specs.

"Provenance in Support of ANDS’ Four Transformations"

Andrew E. Treloar and Mingfang Wu have published "Provenance in Support of ANDS' Four Transformations" in the International Journal of Digital Curation.

Here's an excerpt:

This article introduces the provenance activities that are being carried out at the Australia National Data Services (ANDS). Since its beginning, ANDS has been promoting four data transformations so that Australia's research data become more valuable and reusable by researchers. Among many other activities that enable the four transformations, ANDS has been encouraging ANDS partners to capture and describe rich context at the time when a data collection is created. In 2015, ANDS funded a number of external projects that had provenance components. In addition, ANDS is working on the interoperability between the schema that is used by the ANDS research data registration and discovery service – Research Data Australia (RDA) – and the W3C recommended provenance standard, Provenance Ontology (PROV-O), and investigating how to enrich the schema to access provenance information. The article concludes by discussing the lessons we learnt and our future planned activity.

"OSS4EVA: Using Open-Source Tools to Fulfill Digital Preservation Requirements"

Marty Gengenbach et al. have published "OSS4EVA: Using Open-Source Tools to Fulfill Digital Preservation Requirements" in Code4Lib Journal.

Here's an excerpt:

This paper builds on the findings of a workshop held at the 2015 International Conference on Digital Preservation (iPRES), entitled, "Using Open-Source Tools to Fulfill Digital Preservation Requirements" (OSS4PRES hereafter). This day-long workshop brought together participants from across the library and archives community, including practitioners, proprietary vendors, and representatives from open-source projects. The resulting conversations were surprisingly revealing: while OSS' significance within the preservation landscape was made clear, participants noted that there are a number of roadblocks that discourage or altogether prevent its use in many organizations. Overcoming these challenges will be necessary to further widespread, sustainable OSS adoption within the digital preservation community. This article will mine the rich discussions that took place at OSS4PRES to (1) summarize the workshop's key themes and major points of debate, (2) provide a comprehensive analysis of the opportunities, gaps, and challenges that using OSS entails at a philosophical, institutional, and individual level, and (3) offer a tangible set of recommendations for future work designed to broaden community engagement and enhance the sustainability of open source initiatives, drawing on both participants' experience as well as additional research.

"Cobweb: Collaborative Collection Development for Web Archives"

The California Digital Library has released "Cobweb: Collaborative Collection Development for Web Archives."

Here's an excerpt:

A partnership between the CDL, Harvard Library, and UCLA Library has been awarded funding from IMLS to create Cobweb, a collaborative collection development platform for web archiving, https://github.com/CobwebOrg/cobweb.

See also the grant proposal.

"Organizational Assessment Frameworks for Digital Preservation: A Literature Review and Mapping"

Emily Maemura et al. have self-archived "Organizational Assessment Frameworks for Digital Preservation: A Literature Review and Mapping."

Here's an excerpt:

As the field of digital preservation matures, there is an increasing need to systematically assess an organization's abilities to achieve its digital preservation goals, and a wide variety of assessment tools have been created for this purpose. To map the landscape of research in this area, evaluate the current maturity of knowledge on this central question in DP and provide direction for future research, this paper reviews assessment frameworks in digital preservation through a systematic literature search and categorizes the literature by type of research. The analysis shows that publication output around assessment in digital preservation has increased markedly over time, but most existing work focuses on developing new models rather than rigorous evaluation and validation of existing frameworks.

"From Plan to Action: Successful Data Management Plan Implementation in a Multidisciplinary Project"

Margaret H. Burnette, Sarah C. Williams, and Heidi J. Imker have published "From Plan to Action: Successful Data Management Plan Implementation in a Multidisciplinary Project" in the Journal of eScience Librarianship.

Here's an excerpt:

A case study was designed to gather insights from the research group through semi-structured interviews. Questions focused on which of the recommended data management strategies were adopted and how those strategies affected the project in terms of cost, time, effectiveness, and long-term data use.

"Campus Support Systems for Technical Researchers Navigating Big Data Ethics"

Bonnie Tijerina has published "Campus Support Systems for Technical Researchers Navigating Big Data Ethics" in EDUCAUSE Review.

Here's an excerpt:

A team at Data & Society recently conducted interviews and campus visits with computer science researchers and librarians at eight U.S. universities to examine the role of research librarians in assisting technical researchers as they navigate emerging issues of privacy, ethics, and equitable access to data at different phases of the research process.

"Research Data Management in Social Sciences and Humanities: A Survey at the University of Lille (France)"

Joachim Schöpfel and Hélène Prost have published "Research Data Management in Social Sciences and Humanities: A Survey at the University of Lille (France)" in LIBREAS.

Here's an excerpt:

The paper presents results from a campus-wide survey at the University of Lille (France) on research data management in social sciences and humanities. The survey received 270 responses, equivalent to 15% of the whole sample of scientists, scholars, PhD students, administrative and technical staff (research management, technical support services); all disciplines were represented. The responses show a wide variety of practice and usage. The results are discussed regarding job status and disciplines and compared to other surveys. Four groups can be distinguished, i.e. pioneers (20-25%), motivated (25-30%), unaware (30%) and reluctant (5-10%). Finally, the next steps to improve the research data management on the campus are presented.

"The Pathways of Research Software Preservation: An Educational and Planning Resource for Service Development"

Fernando Rios has published "The Pathways of Research Software Preservation: An Educational and Planning Resource for Service Development" in D-Lib Magazine.

Here's an excerpt:

Research communities, funders, publishers, and academic libraries have put much effort towards ensuring that research data are preserved. However, the same level of attention has not been given to the associated software used to process and analyze it. As a guide to those tasked with preserving research outputs, a novel visual representation of preservation approaches relevant to research software, termed the Pathways of Research Software Preservation, is presented. The Pathways are discussed in the context of service development within the Data Management Services group at Johns Hopkins University.

"Towards Narrowing the Curation Gap—Theoretical Considerations and Lessons Learned from Decades of Practice"

Ana Sesartić, Andreas Fischlin, and Matthias Töwe have published "Towards Narrowing the Curation Gap-Theoretical Considerations and Lessons Learned from Decades of Practice" in the ISPRS International Journal of Geo-Information.

Here's an excerpt:

Research as a digital enterprise has created new, often poorly addressed challenges for the management and curation of research to ensure continuity, transparency, and accountability. There is a common misunderstanding that curation can be considered at a later point in the research cycle or delegated or that it is too burdensome or too expensive due to a lack of efficient tools. This creates a curation gap between research practice and curation needs. We argue that this gap can be narrowed if curators provide attractive support that befits research needs and if researchers consistently manage their work according to generic concepts consistently from the beginning. A rather uniquely long-term case study demonstrates how such concepts have helped to pragmatically implement a research practice intentionally using only minimalist tools for sustained, self-contained archiving since 1989. The paper sketches the concepts underlying three core research activities. (i) handling of research data, (ii) reference management as part of scholarly publishing, and (iii) advancing theories through modelling and simulation. These concepts represent a universally transferable best research practice, while technical details are obviously prone to continuous change. We hope it stimulates researchers to manage research similarly and that curators gain a better understanding of the curation challenges research practice actually faces.

"The Academic Data Librarian Profession in Canada: History and Future Directions"

S. Vincent Gray and Elizabeth Hill have self-archived "The Academic Data Librarian Profession in Canada: History and Future Directions."

Here's an excerpt:

From the 1970s onward, Canadians have been active in developing services and establishing structures to support the dissemination of data. In recent years the academic data profession in Canada has largely developed around access to data from the national statistics agency, Statistics Canada, and around the services which have been developed to permit access to these data. This chapter will provide a historical background for these activities and explain how current and emerging trends continue to affect the profession.

Research Data Curation Bibliography, Version 6. Over 560 works. Over 200 works added. Live links. Selected abstracts. OA. CC-BY License. Covers topics such as research data creation, acquisition, metadata, repositories, provenance, management, policies, support services, funding agency requirements, peer review, publication, citation, sharing, reuse, and preservation.

"Scholarly Communication and Data"

Hailey Mooney has self-archived "Scholarly Communication and Data."

Here's an excerpt:

The purpose of this chapter is to provide foundational knowledge for the data librarian by developing an understanding of the place of data within the current paradigm of networked digital scholarly communication. This includes defining the nature of data and data publications, examining the open science movement and its effects on data sharing, and delving into the challenges inherent to the wider integration of data into the scholarly communication system and the academic library.

Preserving Transactional Data

The Digital Preservation Coalition, UK Data Service, and Charles Beagrie Ltd. have released Preserving Transactional Data.

Here's an excerpt from the announcement:

This report tackles the requirements for preserving transactional data and the accompanying challenges facing companies and institutions that aim to re-use these data for analysis or research, presenting the issues and strategies which emphasize preservation practices that facilitate re-use and reproducibility.

"Revisiting the Data Lifecycle with Big Data Curation"

Line Pouchard has published "Revisiting the Data Lifecycle with Big Data Curation" in the International Journal of Digital Curation.

Here's an excerpt:

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented. In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research. As librarians and data managers are developing the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. We use the methodological framework of the data life cycle model, assessing two models developed in the context of Big Data projects and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity. We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science. 
We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.

The article is under a Creative Commons Attribution 2.0 UK: England & Wales License.

Research Data Curation Bibliography, Version 6

Digital Scholarship has released Version 6 of the Research Data Curation Bibliography. This selective bibliography includes over 560 English-language articles, books, and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions. Over 200 new works have been added to the bibliography since Version 5.

The Research Data Curation Bibliography covers topics such as research data creation, acquisition, metadata, repositories, provenance, management, policies, support services, funding agency requirements, peer review, publication, citation, sharing, reuse, and preservation.

Most sources have been published from January 2009 through May 2016; however, a limited number of earlier key sources are also included. The bibliography includes links to freely available versions of included works. If such versions are unavailable, links to the publishers' descriptions are provided.

Abstracts are included in this bibliography if a work is under a Creative Commons Attribution License (BY and national/international variations), a Creative Commons public domain dedication (CC0), or a Creative Commons Public Domain Mark and this is clearly indicated in the work.

The Research Data Curation Bibliography is under a Creative Commons Attribution 4.0 International License.
