"DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization"


Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users’ ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives’ and institutional repositories’ ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.

https://arxiv.org/abs/2305.18358

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Open Access Books through Open Data Sources: Assessing Prevalence, Providers, and Preservation"


In total, 396,995 unique records were identified from the OA book bibliometric sources, of which 19% were found to be included in at least one of the preservation services. The results give reason for concern about the long tail of OA books distributed across thousands of different web domains, as these include volatile cloud storage and, in some cases, no longer contained the files at all.

https://doi.org/10.1108/JD-02-2023-0016

Digital Scholarship Has Released the Academic Libraries and Research Data Management Bibliography

The Academic Libraries and Research Data Management Bibliography includes over 345 selected English-language articles and books that are useful in understanding how academic libraries plan for, implement, provide, evaluate, and conduct studies about research data management (RDM) services. Most sources have been published from 2012 through 2023. It includes full abstracts for works under certain Creative Commons Licenses. It is available as a website and a website PDF with live links.

Digital Scholarship’s other bibliographies about research data curation include the Research Data Curation and Management Bibliography (over 800 works), the Research Data Publication and Citation Bibliography (over 225 works), and the Research Data Sharing and Reuse Bibliography (over 200 works).

"FAIR in Action — A Flexible Framework to Guide FAIRification"


The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness of both existing and future clinical and molecular datasets. We validated the framework in collaboration with several major public-private partnership projects, demonstrating and delivering improvements across all aspects of FAIR and across a variety of datasets and their contexts. We therefore managed to establish the reproducibility and far-reaching applicability of our approach to FAIRification tasks.

https://doi.org/10.1038/s41597-023-02167-2

Replayed: Essential Writings on Software Preservation and Game Histories


Since the early 2000s, Henry Lowood has led or had a key role in numerous initiatives devoted to the preservation and documentation of virtual worlds, digital games, and interactive simulations, establishing himself as a major scholar in the field of game studies. . . . Replayed consolidates Lowood’s far-flung and significant publications on these subjects into a single volume.

https://www.press.jhu.edu/books/title/12805/replayed

"Global Trends in Digital Preservation: Outsourcing versus In-House Practices"


This study aimed to investigate trends in digital preservation, in terms of in-house activities versus outsourcing, by systematically reviewing the extant literature. . . . The meta-analysis of the final studies affirms a strong global preference among libraries, archives, and other cultural and memory organizations for in-house preservation of their digital objects and collections over outsourcing digital preservation activities to third parties.

https://doi.org/10.1177/09610006231173461

"Reference Rot in the Digital Humanities Literature: An Analysis of Citations Containing Website Links in DHQ"


Link rot is likely most familiar in the form of "404 Not Found" error messages, but there are other less prominent obstacles to accessing web content. Our study examines instances of link rot in Digital Humanities Quarterly articles and its impact on the ability to access the online content referenced in these articles after their publication. . . .

Our data shows that a significant number of works cited no longer exist, are inaccessible, or have additional barriers to access. Instances of link rot increase with time. Additionally, there is a higher frequency and higher proportion of links contained in DHQ articles, showing that internet resources are a critical part of the DH literature. Taken together, the combined result is a persistent and cumulative threat to the integrity and stability of the DH literature, and one that is even more alarming when compared to other disciplines.

https://bit.ly/42iZq3t

"Integrating Preservation into Librarian Workflows"


The core mission of libraries is to ensure perpetual access to the record of knowledge. As a review of the NASIG (formerly North American Serials Interest Group) webinar "Integrating Preservation into Librarian Workflows," by Jill Emery and Sunshine Carter, this article examines working models constructed to sustain perpetual access for their institutional communities. In reflecting on these data-intensive practices, both presenters now recognize that previously impactful collection development business decisions were being made in the dark. Reviewing the webinar also reveals that preservation access has two critically distinct aspects, which should not be treated as interchangeable: one is concerned with long-term preservation, and the other addresses a library's ability to provide post-cancellation access to its user community, given budgetary or physical space constraints. The following is an analysis of how effectively the processes explored in the webinar address both post-cancellation access and long-term perpetual access goals. A 2018 NASIG survey indicated that many organizations in scholarly communications lacked preservation policies. In June 2022, in response to the survey, NASIG released a model digital preservation policy as a template to guide consequential and explicit decision-making by addressing issues including scope, roles, responsibilities, tools, and techniques. Librarians should understand these policy issues before negotiating content licenses, when sustaining long-term discovery and access, and when developing collaborative access frameworks to address collection development and maintenance challenges.

https://doi.org/10.1629/uksg.614

WorldFAIR Project (D13.2) Cultural Heritage Image Sharing Recommendations Report


Deliverable 13.2 aims to build on our understanding of what it means to support FAIR in the sharing of image data derived from GLAM (galleries, libraries, archives, and museums) collections. This report looks at previous efforts by the sector towards FAIR alignment and presents five recommendations designed to be implemented and tested at the Digital Repository of Ireland (DRI) that are also broadly applicable to the work of GLAMs. The recommendations are ultimately a roadmap for the DRI to follow in improving repository services, as well as a call for continued dialogue around "what is FAIR?" within the cultural heritage research data landscape.

https://doi.org/10.5281/zenodo.7897243

"Rethinking Transparency and Rigor from a Qualitative Open Science Perspective"


To further complicate matters, many qualitative researchers would posit that while secondary data are a combination of the researcher’s perceptions and observations, even primary data, such as interview transcripts, are filtered to some extent through the researcher. This is because, in qualitative research, the researcher is an instrument of both data collection and analysis . . . .

The researcher-as-instrument tradition also complicates discussions around reproducibility (i.e., the ability of another researcher to look at someone's data and reproduce the analyses), one of the key components of rigor as it is currently discussed in the open science movement (NIH, n.d.). Quantitative researchers' focus on reproducibility is often contrary to the tenets of qualitative research, particularly in methodologies aiming to uncover new ways of knowing, such as constructivist and grounded theory approaches. If one understands the researcher as a data collection instrument and a filter through which data is processed, strict quantitative-focused reproducibility becomes less likely, not through misconduct or error, but because ultimately, people conduct research, and people are not likely to have exactly the same perspectives. Guidelines that reinforce reproducibility without addressing this tension are not going to be useful for all researchers.

https://bit.ly/3MEbtnk

"A Pilot Study to Locate Historic Scientific Data in a University Archive"


Historic data in analog (or print) format is a valuable resource used by scientists in many fields. This type of data may be found in various locations on university campuses, including offices, labs, storage facilities, and archives. This study investigates whether biological data held in one university's archives could be identified, described, and thus made potentially useful for contemporary life scientists. Scientific data was located, approximately half of it was deemed to be of some value to current researchers, and about 20% included enough information for the study to be repeated. Locating individual data sets in the collections at the University Archives at the University of Minnesota proved challenging. This preliminary work points to possible ways to make raw data in university archives collections more discoverable and more likely to be reused. It raises questions that can help inform future work in this area.

https://bit.ly/41JBMNb

"Initial Insight Into Three Modes of Data Sharing: Prevalence of Primary Reuse, Data Integration and Dataset Release in Research Articles"


While data sharing has received research interest in recent times, its real status remains unclear owing to the ambiguity of the concept. To understand the current status of data sharing, this study examined primary reuse, data integration, and dataset release as the actual practices of data sharing. A total of 963 articles, chosen from those published in 2018 and registered in the Web of Science global citation database, were manually checked. Existing data were reused in the mode of data integration (13.3%) as frequently as in the mode of primary reuse (12.1%). Dataset release was the least common mode (9.0%). The results show the variation in data sharing and indicate the need to standardize data descriptions in articles, based on thorough registration in and expansion of public data archives, to close the loop that results in a virtuous cycle of research data.

https://doi.org/10.1002/leap.1546

"‘We Share All Data with Each Other’: Data-Sharing in Peer-to-Peer Relationships"


The analysis identifies three social forms of data-sharing in peer-to-peer relationships: (a) closed communal sharing, which is based on a feeling of belonging together; (b) closed associative sharing, in which the participants act on the basis of an agreement; and (c) open associative sharing, which is oriented to “institutional imperatives” (Merton) and to formal regulations. The study shows that far more data-sharing is occurring in scientific practice than seems to be apparent from a concept of open data alone.

https://doi.org/10.1007/s11024-023-09487-y

FADGI: Technical Guidelines for Digitizing Cultural Heritage Materials, Third Edition


The Technical Guidelines for Digitizing Cultural Heritage Materials: Third Edition (linked below) were developed by the Still Image Working Group in 2022-2023. This document is an update of the 2016 Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files. The latest revision of the guidelines expands on earlier works and incorporates new material reflecting the advances in imaging science and cultural heritage imaging best practice. The Guidelines include shared best practices for still image materials (e.g., textual content, maps, and photographic prints and negatives) followed by agencies participating in the Federal Agencies Digital Guidelines Initiative (FADGI). These guidelines are intended to be used in conjunction with digital image conformance evaluation targets and software. Together, these guidelines and appropriate testing and monitoring systems provide the foundation for a FADGI-conforming digitization program.

https://bit.ly/3Bn6hOm

Open Science: A Practical Guide for Early-Career Researchers


Beginning researchers are an important link in the transition to Open Science, so this guide is aimed at PhD candidates, Research Master Students, and early-career researchers from all disciplines at Dutch universities and research institutes. [This guide will be very useful to non-Dutch researchers.] It is designed to accompany researchers in every step of their research, from the phase of preparing your research project and discovering relevant resources (chapter 2) to the phase of data collection and analysis (chapter 3), writing and publishing articles, data, and other research output (chapter 4), and outreach and assessment (chapter 5). Every chapter provides you with the best tools and practices to implement immediately.

https://doi.org/10.5281/zenodo.7716152

Digital Scholarship Has Released Digital Curation Certificate and Master’s Degree Programs

Digital Scholarship has released Digital Curation Certificate and Master’s Degree Programs. This document describes digital curation certificate and master’s degree programs in North America, identifying those that are online. It does not cover individualized certificate programs, such as those at Indiana University Bloomington or the University of Illinois Urbana-Champaign. Nor does it cover digital curation specializations within MLS and other master’s degree programs in iSchools. It is available as a website and a website PDF with live links.

"How and Why Do Researchers Reference Data? A Study of Rhetorical Features and Functions of Data References in Academic Articles"


Data reuse is a common practice in the social sciences. While published data play an essential role in the production of social science research, they are not consistently cited, which makes it difficult to assess their full scholarly impact and give credit to the original data producers. Furthermore, it can be challenging to understand researchers’ motivations for referencing data. Like references to academic literature, data references perform various rhetorical functions, such as paying homage, signaling disagreement, or drawing comparisons. This paper studies how and why researchers reference social science data in their academic writing. We develop a typology to model relationships between the entities that anchor data references, along with their features (access, actions, locations, styles, types) and functions (critique, describe, illustrate, interact, legitimize). We illustrate the use of the typology by coding multidisciplinary research articles (n = 30) referencing social science data archived at the Inter-university Consortium for Political and Social Research (ICPSR). We show how our typology captures researchers’ interactions with data and purposes for referencing data. Our typology provides a systematic way to document and analyze researchers’ narratives about data use, extending our ability to give credit to data that support research.

https://doi.org/10.5334/dsj-2023-010

Good, Better, Best: Practices in Archiving & Preserving Open Access Monographs


Good, Better, Best: Practices in Archiving & Preserving Open Access Monographs brings together the project’s growing knowledge and understanding around this community of practice, as well as reports on the Work Package’s research and development over the course of the project.

Following an introductory chapter giving a brief background landscape summary alongside the methodologies employed, Chapter 2, "A basic guidebook for the small and scholar-led press," considers good, better, and best practices around file formats, metadata, content packaging, existing routes to digital publication archives, archiving and preservation workflows, and challenges surrounding copyright, reuse, and licensing. Additional chapters detail the repository workflow experiments, both manual and automated, as well as successful proof-of-concept archiving in two online repositories: one an institutional repository, the other the Internet Archive. Along with a chapter (Chapter 6) that explores current understanding of the implications for archiving and preserving complex and experimental monographs, two further chapters (7 and 8) look at future work: the expansion and development of the Thoth Archiving Network and the new Open Book Futures project, beginning May 2023. Appendices include signposting to toolkits, guides, and resources, as well as a brief glossary that provides links to more comprehensive archiving and preservation glossaries already in existence. We hope this will be a useful resource for the small and scholar-led press community and beyond.

https://doi.org/10.5281/zenodo.7876047

"Data Sharing in the Context of Community-Engaged Research Partnerships"


Over the past 20 years, the National Institutes of Health (NIH) has implemented several policies designed to improve sharing of research data, such as the NIH public access policy for publications, the NIH genomic data sharing policy, and the National Cancer Institute (NCI) Cancer Moonshot public access and data sharing policy. . . . As data sharing expands, we must consider important questions: to whom do the benefits of data sharing accrue, and to whom do they not? In an era of growing efforts to engage diverse communities in research, we must consider the impact of data sharing for all research participants and the communities that they represent.

We examine the issue of data sharing through a community-engaged research lens, informed by a long-standing partnership between community-engaged researchers and a key community health organization (Kruse et al., 2022). We contend that without effective community engagement and rich contextual knowledge, biases resulting from data sharing can remain unchecked. We provide several recommendations that would allow better community engagement related to data sharing to ensure both community and researcher understanding of the issues involved and move toward shared benefits. By identifying good models for evaluating the impact of data sharing on communities that contribute data, and then using those models systematically, we will advance the consideration of the community perspective and increase the likelihood of benefits for all.

https://doi.org/10.1016/j.socscimed.2023.115895

"Estimating Social Bias in Data Sharing Behaviours: An Open Science Experiment"


Open data sharing is critical for scientific progress. Yet, many authors refrain from sharing scientific data, even when they have promised to do so. Through a preregistered, randomized audit experiment (N = 1,634), we tested possible ethnic, gender and status-related bias in scientists’ data-sharing willingness. 814 (54%) authors of papers where data were indicated to be ‘available upon request’ responded to our data requests, and 226 (14%) either shared or indicated willingness to share all or some data. While our preregistered hypotheses regarding bias in data-sharing willingness were not confirmed, we observed systematically lower response rates for data requests made by putatively Chinese treatments compared to putatively Anglo-Saxon treatments. Further analysis indicated a theoretically plausible heterogeneity in the causal effect of ethnicity on data-sharing. In interaction analyses, we found indications of lower responsiveness and data-sharing willingness towards male but not female data requestors with Chinese names. These disparities, which likely arise from stereotypic beliefs about male Chinese requestors’ trustworthiness and deservingness, impede scientific progress by preventing the free circulation of knowledge.

https://doi.org/10.1038/s41597-023-02129-8

Paywall: "We Need a Plan D"


Researchers, institutions and funders should collaborate to develop an overarching strategy for data preservation — a plan D. There will doubtless be calls for a ‘PubMed Central for data’. But what we really need is a federated system of repositories with functionality tailored to the information that they archive. This will require domain experts to agree standards for different types of data from different fields: what should be archived and when, which format, where, and for how long.

https://doi.org/10.1038/s41592-023-01817-y

"Continuity and Discontinuity in Web Archives"


Web archival materials are not direct traces of the web; they are direct traces of crawlers. By design, the structure of web archives limits our collective capacity to explore the memory of the Web. These structural issues induce temporal discontinuities in the archives, such as inconsistency, redundancy, and blindness. In this paper, we address the question of re-injecting continuity within large corpora of web archives. We thus introduce the notions of persistences (series of time-stable snapshots of archived web pages) and continuity spaces (networks of time-consistent persistences). We demonstrate how, on the basis of a quality score, persistences can be used to select subsets of web archives within which in-depth historical analysis can be conducted at scale. We next propose using a new visualization approach called the web cernes to graphically reconstruct the multi-level evolution of an archived web site. We finally apply our framework to study the archives of the firsttuesday movement: a constellation of networking web sites that acted in the interest of the economic growth of the web in the early 2000s.

https://hal.science/hal-04057507

"Science Journals Integrate Dryad to Simplify Data Deposition and Strengthen Scientific Reproducibility"


The Science family journals have announced a partnership with the nonprofit data repository Dryad that simplifies the process by which authors deposit data underlying new work — a critical step to facilitating data’s routine reuse. The partnership is yet another step taken by the Science journals to ensure data the scientific community requires to verify, replicate and reanalyze new research is openly available. . . .

Because the partnership with Dryad integrates Dryad's platform with the Science family journals' submission process, authors will have the option to deposit data at Dryad directly from the submission site of any Science family journal. As authors submit research to the journals, they will be prompted about data availability and will be welcome to deposit their data in any suitable disciplinary repository. But if data do not yet have a home, authors will have the opportunity to upload their data to Dryad. . . .

To ensure that this service is widely available, the Science journals will cover costs of Dryad data publication for accepted papers.

http://bit.ly/43wtVoD

"Guest Post — Why Interoperability Matters for Open Research — And More Than Ever"


The question remains: why have we not achieved more in delivering connectivity across the research system? While funding for this kind of underpinning infrastructure is notable in its absence (or, where it is available, is often too temporary in nature), the other major challenge is securing adoption among the service providers (funders, publishers, and institutions among the key players) that would maximize the use and potential of building those connections. It is notoriously hard for organisations to tweak or adapt existing workflows and legacy systems, and to demonstrate at an individual organisation level the benefits (and hence prioritise the work) that may seem obvious at a system level.

https://cutt.ly/K7hxFQz
