Internet Archive: "New Feature Alert: Access Archived Webpages Directly through Google Search"


In a significant step forward for digital preservation, Google Search is now making it easier than ever to access the past. Starting today, users everywhere can view archived versions of webpages directly through Google Search, with a simple link to the Internet Archive’s Wayback Machine. . . .

To access this new feature, conduct a search on Google as usual. Next to each search result, you’ll find three dots—clicking on these will bring up the “About this Result” panel. Within this panel, select “More About This Page” to reveal a link to the Wayback Machine page for that website.

Through this direct link, you’ll be able to view previous versions of a webpage via the Wayback Machine, offering a snapshot of how it appeared at different points in time.

https://tinyurl.com/ms749s28
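
For readers who would rather reach archived copies without the Google panel, the sketch below builds Wayback Machine links by hand. It assumes the widely used `web.archive.org/web/<timestamp>/<url>` address pattern and the Internet Archive's availability endpoint; the function names `snapshot_url` and `availability_query` are hypothetical and are not part of the announcement above. The code only constructs URLs and makes no network requests.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed public address patterns; not taken from the announcement itself.
WAYBACK_BASE = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build a link to the capture closest to `timestamp`.

    `timestamp` is a digit string from YYYY up to YYYYMMDDhhmmss.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits, e.g. 20240911")
    return f"{WAYBACK_BASE}/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL that asks whether a capture of `page_url` exists."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(snapshot_url("https://example.com/", "20240911"))
    print(availability_query("example.com", "20240911"))
```

Opening the first printed link in a browser should land on the capture nearest the requested date, if one exists; the second link returns a small JSON description of the closest capture.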

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"3D Data Long-Term Preservation in Cultural Heritage"


The report explores the challenges and strategies for preserving 3D digital data in cultural heritage. It discusses the issue of technological obsolescence, emphasising the need for sustainable storage solutions and ongoing data management strategies. Key topics include understanding technological obsolescence, the lifecycle of digital content, digital continuity, data management plans (DMP), FAIR principles, and the use of public repositories. The report also covers the importance of metadata in long-term digital preservation, including types of metadata and strategies for building valuable metadata. It examines the evolving standards and interoperability in 3D format preservation and the importance of managing metadata and paradata. The document provides a comprehensive overview of the challenges and solutions for preserving 3D cultural heritage data in the long term.

https://arxiv.org/abs/2409.04507

"What Needs to be Learned by U.S. Cultural Heritage Professionals? Results from the Digital Preservation Outreach & Education Network"


With the current proliferation of training opportunities available in digital preservation, this study asks: what are the most in demand digital preservation instruction topics? To answer this question, we did a qualitative content analysis of 168 Professional Development Support applications received by the Digital Preservation Outreach and Education Network (DPOE-N) between September 2020 and December 2023. The study finds that the management of digital records and metadata/cataloging standards were the most requested training topics, and that general and broadly applicable skills tend to be the most sought after. This indicates that there is a continuing need to provide education focusing on the core elements of digital preservation and knowledge, and that we have not moved on yet to a place where cultural heritage professionals are solely seeking skills in more advanced or specialized digital preservation topics.

https://doi.org/10.1515/pdtc-2024-0024

Paywall: British Library: "User-Centred Collecting for Emerging Formats"


This paper provides an overview of the work conducted at legal deposit libraries to better understand access requirements for emerging formats, from a user’s perspective and with a focus on web-based interactive narratives. . . . It also considers how existing tools and methodologies, such as web archiving, can be adapted and built to support the collection of emerging formats. Finally, it delves into different research projects conducted at the British Library around archiving and performing quality assurance for interactive narratives, collecting contextual information, and lessons learnt from exhibiting born-digital content in a physical space.

https://doi.org/10.1080/13614568.2024.2389101

Paywall: "Research on the Generation Mechanism and Action Mechanism of Scientific Data Reuse Behavior"


Specifically, this study takes scientific data reuse attitudes as a breakthrough to discuss the factors that influence researchers’ scientific data reuse attitudes and the extent to which these factors influence scientific data reuse behaviors. It also further explores the impact of scientific data reuse behavior on research and innovation performance and the moderating effect of scientific data services on scientific data reuse behavior.

https://doi.org/10.1016/j.acalib.2024.102921

"Artificial Intelligence Assisted Curation of Population Groups in Biomedical Literature"


Curation of the growing body of published biomedical research is of great importance to both the synthesis of contemporary science and the archiving of historical biomedical literature. Each of these tasks has become increasingly challenging given the expansion of journal titles, preprint repositories and electronic databases. Added to this challenge is the need for curation of biomedical literature across population groups to better capture study populations for improved understanding of the generalizability of findings. To address this, our study aims to explore the use of generative artificial intelligence (AI) in the form of large language models (LLMs) such as GPT-4 as an AI curation assistant for the task of curating biomedical literature for population groups. We conducted a series of experiments which qualitatively and quantitatively evaluate the performance of OpenAI’s GPT-4 in curating population information from biomedical literature. Using OpenAI’s GPT-4 and curation instructions, executed through prompts, we evaluate the ability of GPT-4 to classify study ‘populations’, ‘continents’ and ‘countries’ from a previously curated dataset of public health COVID-19 studies.

Using three different experimental approaches, we examined performance by: A) evaluation of accuracy (concordance with human curation) using both exact and approximate string matches within a single experimental approach; B) evaluation of accuracy across experimental approaches; and C) conducting a qualitative phenomenology analysis to describe and classify the nature of difference between human curation and GPT curation. Our study shows that GPT-4 has the potential to provide assistance in the curation of population groups in biomedical literature. Additionally, phenomenology provided key information for prompt design that further improved the LLM’s performance in these tasks. Future research should aim to improve prompt design, as well as explore other generative AI models to improve curation performance. An increased understanding of the populations included in research studies is critical for the interpretation of findings, and we believe this study provides keen insight on the potential to increase the scalability of population curation in biomedical studies.

https://doi.org/10.2218/ijdc.v18i1.950

"Unfolding the Downloads of Datasets: A Multifaceted Exploration of Influencing Factors"


Scientific data are essential to advancing scientific knowledge and are increasingly valued as scholarly output. Understanding what drives dataset downloads is crucial for their effective dissemination and reuse. Our study, analysing 55,473 datasets from 69 data repositories, identifies key factors driving dataset downloads, focusing on interpretability, reliability, and accessibility. We find that while lengthy descriptive texts can deter users due to complexity and time requirements, readability boosts a dataset’s appeal. Reliability, evidenced by factors like institutional reputation and citation counts of related papers, also significantly increases a dataset’s attractiveness and usage. Additionally, our research shows that open access to datasets increases their downloads and amplifies the importance of interpretability and reliability. This indicates that easy access enhances the overall attractiveness and usage of datasets in the scholarly community. By emphasizing interpretability, reliability, and accessibility, this study offers a comprehensive framework for future research and guides data management practices toward ensuring clarity, credibility, and open access to maximize the impact of scientific datasets.

https://doi.org/10.1038/s41597-024-03591-8

"Curation is Communal: Transparency, Trust, and (In)visible Labour"


Research about trust and transparency within the realm of research data management and sharing typically centres on accreditation and compliance. Missing from many of these conversations are the social systems and enabling structures that are built on interpersonal connections. As members of the Data Curation Network (DCN), a consortium of United States-based institutional and non-profit data repositories, we have experienced first-hand the effort required to develop and sustain interpersonal trust and the benefits it provides to curation. In this paper, we reflect on the well-documented realities of curator and labour invisibility; the importance of fostering active communities (such as the DCN); and how trust, vulnerability and connectivity among colleagues leads to better curation practices. Through an investigation into data curators in the DCN, we found that, while curation can be isolating and invisible work, having a network of trusted peers helps alleviate these burdens and makes us better curators. We conclude with practical suggestions for implementing trust and transparency in relationships with colleagues and researchers.

https://doi.org/10.2218/ijdc.v18i1.938

"Closing Gaps: A Model of Cumulative Curation and Preservation Levels for Trustworthy Digital Repositories"


Curation and preservation measures carried out by digital repository staff are an important building block in maintaining the accessibility and usability of digital resources over time. The measures adequate to achieve long-term usability for a given audience strongly depend on scenarios of (re)use, the (intended) users’ needs and skills, the organisational setting (e.g., mission, resources, policies), as well as the characteristics of the digital objects to be preserved. The assessment of curation and preservation measures also forms an important part of existing certification procedures for trustworthy digital repositories (TDRs) as offered, for example, by the CoreTrustSeal foundation, the nestor network, or ISO.

The digital curation community is presented with the challenge of finding community-, organisation-, and object-specific approaches to curation and preservation at the same time as defining the minimum level of curation and preservation measures expected from a TDR in sufficiently generic terms to ensure applicability to a wide array of repositories. Against this backdrop, this paper discusses the need for and benefits of community-agreed levels of curation and preservation to address this challenge, and considers the tiered model proposed by the CoreTrustSeal Board as an example.

The proposed model is then applied in an analysis of successful CoreTrustSeal applications from 2018–2022 in an effort to better understand the capacity of the curation and preservation levels to capture the respective practices of repositories and to identify potential gaps.

https://doi.org/10.2218/ijdc.v18i1.926

"Ten Simple Rules for Recognizing Data and Software Contributions in Hiring, Promotion, and Tenure"


The ways in which promotion and tenure committees operate vary significantly across universities and departments. While committees often have the capability to evaluate the rigor and quality of articles and monographs in their scientific field, assessment with respect to practices concerning research data and software is a recent development and one that can be harder to implement, as there are few guidelines to facilitate the process. More specifically, the guidelines given to tenure and promotion committees often reference data and software in general terms, with some notable exceptions such as guidelines in [5] and are almost systematically trumped by other factors such as the number and perceived impact of journal publications. The core issue is that many colleges establish a scholarship versus service dichotomy: Peer-reviewed articles or monographs published by university presses are considered scholarship, while community service, teaching, and other categories are given less weight in the evaluation process. This dichotomy unfairly disadvantages digital scholarship and community-based scholarship, including data and software contributions [6]. In addition, there is a lack of resources for faculties to facilitate the inclusion of responsible data and software metrics into evaluation processes or to assess faculty’s expertise and competencies to create, manage, and use data and software as research objects. As a result, the outcome of the assessment by the tenure and promotion committee is as dependent on the guidelines provided as on the committee members’ background and proficiency in the data and software domains.

The presented guidelines aim to help alleviate these issues and align the academic evaluation processes to the principles of open science. We focus here on hiring, tenure, and promotion processes, but the same principles apply to other areas of academic evaluation at institutions. While these guidelines are by no means sufficient for handling the complexity of a multidimensional process that involves balancing a large set of nuanced and diverse information, we hope that they will support an increasing adoption of processes that recognize data and software as key research contributions.

https://doi.org/10.1371/journal.pcbi.1012296

"Transparent Disclosure, Curation & Preservation of Dynamic Digital Resources"


This paper explores an enhanced curation lifecycle being developed at the UK Data Service (UKDS), with our Data Product Builder. Through a Graphical User Interface, we aim to provide the researcher with a tailored digital resource. We detail the threefold motivation behind this initiative: data dissemination scalability, researcher satisfaction and the reduction of nationwide duplication of research effort.

Subsequent sections detail the technical components and challenges involved. In addition to more standard data subsetting, filtering and linking components, this data dissemination platform offers dynamic disclosure assessments – identifying combinations of variables that present a potential disclosure risk. All components are underpinned by the Data Documentation Initiative’s new Cross-Domain Integration standard (DDI-CDI), designed to handle the many structures in which data may be organised.

Ever conscious of the scale of the task we are embarking on, we remain motivated by the need for such advances in data dissemination and optimistic of the feasibility of such a system to meet the needs of the researcher while balancing the data disclosivity concerns of the data depositor.

https://doi.org/10.2218/ijdc.v18i1.937

"Sharing Practices of Software Artefacts and Source Code for Reproducible Research"


While source code of software and algorithms depicts an essential component in all fields of modern research involving data analysis and processing steps, it is uncommonly shared upon publication of results throughout disciplines. Simple guidelines to generate reproducible source code have been published. Still, code optimization supporting its repurposing to different settings is often neglected and even less thought of to be registered in catalogues for a public reuse. Though all research output should be reasonably curated in terms of reproducibility, it has been shown that researchers are frequently non-compliant with availability statements in their publications. These do not even include the use of persistent unique identifiers that would allow referencing archives of code artefacts at certain versions and time for long-lasting links to research articles. In this work, we provide an analysis on current practices of authors in open scientific journals in regard to code availability indications, FAIR principles applied to code and algorithms. We present common repositories of choice among authors. Results further show disciplinary differences of code availability in scholarly publications over the past years. We advocate proper description, archiving and referencing of source code and methods as part of the scientific knowledge, also appealing to editorial boards and reviewers for supervision.

https://doi.org/10.1007/s41060-024-00617-7

"Back to Basics: Considering Categories of Data Services Consults"


Consultations are fundamental to data librarianship, serving as a vital means of one-on-one support for researchers. However, the topics and forms of support unique to data services consults are not always carefully considered. This commentary addresses five common services offered by data librarians—dataset reference, data management support, data analysis and software support, data curation, and data management (and sharing) plan writing—and considers strategies for successful patron support within the boundaries of a consultation.

https://doi.org/10.7191/jeslib.931

Paywall: "Constructing Risk in Trustworthy Digital Repositories"


This article investigates the construction of risk within trustworthy digital repository audits. It contends that risk is a social construct, and social factors influence how stakeholders in digital preservation processes comprehend and react to risk.

https://doi.org/10.1108/JD-08-2023-0157

"Promoting Data Sharing: The Moral Obligations of Public Funding Agencies"


Sharing research data has great potential to benefit science and society. However, data sharing is still not common practice. Since public research funding agencies have a particular impact on research and researchers, the question arises: Are public funding agencies morally obligated to promote data sharing? We argue from a research ethics perspective that public funding agencies have several pro tanto obligations requiring them to promote data sharing. However, there are also pro tanto obligations that speak against promoting data sharing in general as well as with regard to particular instruments of such promotion. We examine and weigh these obligations and conclude that all things considered funders ought to promote the sharing of data. Even the instrument of mandatory data sharing policies can be justified under certain conditions.

https://doi.org/10.1007/s11948-024-00491-3

"Sustaining a Spatial Collaboration: Leveraging Social Infrastructure to Support Technological Advancement in Geospatial Data Discovery"


The Big Ten Academic Alliance (BTAA) Geospatial Information Network (GIN) serves as a prime example of an enduring, successful collaboration across multiple institutions. The proliferation and significance of geospatial data have outpaced the development of adequate search tools and high-quality metadata, prompting the necessity for streamlined geospatial data discovery. In response, the BTAA-GIN established, maintains, and continuously enhances a geoportal that federates metadata from public geospatial data providers in addition to geospatial resources from member institutions. This article provides a reflective analysis of the evolution of the BTAA-GIN over the past nine years, an exploration of the ever-shifting open-source technology landscape, possible future directions and potential expansions of scope, and highlights key features contributing to its success. Examining the trajectory of the BTAA-GIN and the factors behind its achievements yields valuable insights for comparable large-scale, multi-institutional endeavors.

https://doi.org/10.1080/15420353.2024.2388576

Paywall: "Data Quality Assurance Practices in Research Data Repositories — A Systematic Literature Review"


This study conducted a systematic analysis of data quality assurance (DQA) practices in RDRs, guided by activity theory and data quality literature, resulting in conceptualizing a data quality assurance model (DQAM) for RDRs. DQAM outlines a DQA process comprising evaluation, intervention, and communication activities and categorizes 17 quality dimensions into intrinsic and product-level data quality. It also details specific improvement actions for data products and identifies the essential roles, skills, standards, and tools for DQA in RDRs. By comparing DQAM with existing DQA models, the study highlights its potential to improve these models by adding a specific DQA activity structure.

https://doi.org/10.1002/asi.24948

Paywall: "Metadata Management in Data Lake Environments: A Survey"


Data lakes are storage repositories that contain large amounts of data in its native format, either structured, semi-structured or unstructured, to be used when needed. . . . This survey congregates different facets of metadata management in data lakes and presents a global view along with the technological implications and the required features for building successful metadata management systems. Besides, this survey summarizes and discusses research gaps, open problems and main challenges facing both industrialists and academics.

https://doi.org/10.1080/19386389.2024.2359310

"Reproducible and Attributable Materials Science Curation Practices: A Case Study"


While small labs produce much of the fundamental experimental research in Material Science and Engineering (MSE), little is known about their data management and sharing practices and the extent to which they promote trust in and transparency of the published research. In this research, a case study is conducted on a leading MSE research lab [at MIT] to characterize the limits of current data management and sharing practices concerning reproducibility and attribution. The workflows are systematically reconstructed, underpinning four research projects by combining interviews, document review, and digital forensics. Then, information graph analysis and computer-assisted retrospective auditing are applied to identify where critical research information is unavailable or at risk.

Data management and sharing practices in this leading lab protect against computer and disk failure; however, they are insufficient to ensure reproducibility or correct attribution of work, especially when a group member withdraws before the project completion. Therefore, recommendations for adjustments in MSE data management and sharing practices are proposed to promote trustworthiness and transparency by adding lightweight automated file-level auditing and automated data transfer processes.

https://doi.org/10.2218/ijdc.v18i1.940

Paywall: "What Is Research Data ‘Misuse’? And How Can It Be Prevented or Mitigated?"


In the article, we emphasize the challenge of defining misuse broadly and identify various forms that misuse can take, including methodological mistakes, unauthorized reuse, and intentional misrepresentation. We pay particular attention to underscoring the complexity of defining misuse, considering different epistemological perspectives and the evolving nature of scientific methodologies. We propose a theoretical framework grounded in the critical analysis of interdisciplinary literature on the topic of misusing research data, identifying similarities and differences in how data misuse is defined across a variety of fields, and propose a working definition of what it means to "misuse" research data. Finally, we speculate about possible curatorial interventions that data intermediaries can adopt to prevent or respond to instances of misuse.

https://doi.org/10.1002/asi.24944

Paywall: "Digital Curation Practices on Web and Social Media Archiving in Libraries and Archives"


Qualitative research was undertaken to explore the archiving practices through semi-structured interviews with 13 practitioners working in international libraries and archives at national and institutional levels across three continents. . . . Challenges were found in barriers to social media acquisition, lack of awareness, limited resources for preservation, uneven technical capacity, copyright and privacy concerns, and meeting user demands.

https://doi.org/10.1177/09610006241252661

"Training to Act FAIR: A Pre-Post Study on Teaching FAIR Guiding Principles to (Future) Researchers in Higher Education"


With a pre-post test design, the study evaluates the short-term effectiveness of FAIR training on students’ scientific suggestions and justifications in line with FAIR’s guiding principles. The study also assesses the influence of university legal frameworks on students’ inclination towards FAIR training. Before FAIR training, 81.1% of students suggested that scientific actions were not in line with the FAIR guiding principles. However, there is a 3.75-fold increase in suggestions that adhere to these principles after the training. Interestingly, the training does not significantly impact how students justify FAIR actions. The study observes a positive correlation between the presence of university legal frameworks on FAIR guiding principles and students’ inclination towards FAIR training.

https://doi.org/10.1007/s10805-024-09547-2

"Privacy Protection Framework for Open Data: Constructing and Assessing an Effective Approach"


This framework [Privacy Protection Framework for Open Data] aims to establish clear privacy protection measures and safeguard individuals’ privacy rights. Existing privacy protection practices were examined using content analysis, and 36 indicators across five dimensions were developed and validated through an empirical study with 437 participants. The PPFOD offers comprehensive guidelines for data openness, empowering individuals to identify privacy risks, guiding businesses to ensure legal compliance and prevent data leaks, and assisting libraries and data institutions in implementing effective privacy education and training programs, fostering a more privacy-conscious and secure data era.

https://doi.org/10.1016/j.lisr.2024.101312
