“Understanding and Improving Data Repurposing”


We live in an age of unprecedented opportunities to use existing data for tasks not anticipated when those data were collected, resulting in widespread data repurposing. This commentary defines and maps the scope of data repurposing to highlight its importance for organizations and society and the need to study data repurposing as a frontier of data management. We explain how repurposing differs from original data use and data reuse and then develop a framework for data repurposing consisting of concepts and activities for adapting existing data to new tasks. The framework and its implications are illustrated using two examples of repurposing, one in healthcare and one in citizen science. We conclude by suggesting opportunities for research to better understand data repurposing and enable more effective data repurposing practices.

https://www.arxiv.org/abs/2506.09073

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

“Data Curation: Introducing a Competency Framework for the Social Sciences”


Research data management includes more than the question of how researchers handle their data. In the sense of the FAIR principles, it is also about the sustainable safeguarding and organized reusability of research data. For social science, data-intensive research, research data centers and their data curating staff are therefore becoming increasingly important: data curators usually take on curation-specific tasks such as data preparation, securing research data in suitable archival environments, ensuring data accessibility, and the related control of the conditions of data re-use by third parties. Hence, they are specialized in the entire data curation process and, in particular, take on tasks of archiving and providing research data for reuse. Although the standards of comprehensive research data management are becoming more and more specific, this trend has not yet arrived in the corresponding training and further education measures. As a result, there is a gap between the growing demands on data curators and the development of competencies in the field of research data management with a focus on data curation. The competency framework presented in this article is intended to help close this gap: based on a Data Curation Lifecycle Model, a competency framework has been developed to support the development of targeted training and continuing education programs in the field of data curation, the formulation of learning objectives, and the evaluation of the corresponding trainings. The article points out the necessity to advance the development of competencies for this field, illustrates the schematic substructure of the data curation lifecycle, describes the development as well as the central core elements of the presented competency framework and discusses its perspectives. Overall, this competency framework is aimed in particular at (future) data curators and can serve as a schematic basis for the training of the relevant personnel.
The focus is primarily on the data-intensive discipline of social sciences, although large parts can certainly be adapted for other disciplines and the corresponding data curation. The competency framework and this companion article are thereby intended to assist in advancing the sustainable professionalization of the previously understudied competency field of data curation.

https://doi.org/10.2218/ijdc.v19i1.889

“Linking Data Citation to Repository Visibility: An Empirical Study”


In today’s data-driven research landscape, dataset visibility and accessibility play a crucial role in advancing scientific knowledge. At the same time, data citation is essential for maintaining academic integrity, acknowledging contributions, validating research outcomes, and fostering scientific reproducibility. As a critical link, it connects scholarly publications with the datasets that drive scientific progress. This study investigates whether repository visibility influences data citation rates. We hypothesize that repositories with higher visibility, as measured by search engine metrics, are associated with increased dataset citations. Using OpenAlex data and repository impact indicators (including the visibility index from Sistrix, the h-index of repositories, and citation metrics such as mean and median citations), we analyze datasets in Social Sciences and Economics to explore their relationship. Our findings suggest that datasets hosted on more visible web domains tend to receive more citations, with a positive correlation observed between web domain visibility and dataset citation counts, particularly for datasets with at least one citation. However, when analyzing domain-level citation metrics, such as the h-index, mean, and median citations, the correlations are inconsistent and weaker. While higher visibility domains tend to host datasets with greater citation impact, the distribution of citations across datasets varies significantly. These results suggest that while visibility plays a role in increasing citation counts, it is not the sole factor influencing dataset citation impact. Other elements, such as dataset quality, research trends, and disciplinary norms, also contribute significantly to citation patterns.

https://arxiv.org/abs/2506.09530

“Preservation and Digital Repositories: Connections, Possibilities, and Needs”


This chapter aims to explore certain aspects of the challenges of digital preservation and digital repositories, including their roles, significance, and associated costs. . . . Beginning with a necessary delineation of the relationship between digital preservation, digital repositories, and their digital assets, the chapter proceeds to conduct a brief analysis of the perceived needs for these components. These needs primarily encompass organizational aspects (policy, planning, actions), financial considerations (costs), and technological factors (standardization) crucial for supporting digital preservation and repositories.

https://tinyurl.com/5y5bfbdr

“Navigating Data Science and Artificial Intelligence Integration in Library and Information Science: Insights from Four National Libraries”


This chapter examines the integration of artificial intelligence (AI) and data science in library and information science, using insights from four national libraries: the British Library, the National Library of France, the Royal Library of Belgium, and the Royal Danish Library. . . . This study adopts a qualitative approach, drawing on in-depth interviews with key personnel and analyses of strategic documents to explore the challenges and opportunities posed by AI. The findings highlight critical organizational issues such as resistance to change, cross-departmental collaboration, resource allocation, and the need for skill development.

https://tinyurl.com/2s4ahcta

“Managing Retractions and their Afterlife: A Tripartite Framework for Research Datasets”


Retractions serve as a critical, albeit last-resort, post-publication correction mechanism in scholarly publishing, playing an important role in upholding the integrity of the scientific record. By formally retracting flawed or misleading research, the scientific community mitigates the harm caused by errors or misconduct that may have escaped detection during peer review. While retractions of research articles have been extensively discussed across scientific disciplines and are well-integrated into most publishers’ workflows, the retraction of research datasets remains underexplored and rarely implemented. This paper seeks to address this gap by reviewing recent developments in this area, analyzing a sample of publicly available retracted dataset records considering existing recommendations and guidelines, and putting forward a few points for discussion—particularly for cases where datasets have been published and correction is no longer feasible, or when all efforts to amend the dataset have been exhausted. These considerations are framed into three main categories: (1) preventive actions and timely response, (2) purposeful damage control, and (3) community engagement and shared standards. Although still preliminary, this framework aims to help entertain future debates and inform actionable strategies for addressing the unique challenges of managing retracted datasets where scientific rigor has been compromised. By contributing to the discussion on dataset retractions, this work seeks to better equip data curators, repository managers, and other stakeholders with tools to enhance accountability and transparency throughout the data preservation process, while also helping to mitigate the error cascade effect in science.

https://doi.org/10.2218/ijdc.v19i1.1062

“The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text”


Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text presents a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1, an eight terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources that span diverse domains including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more. Crucially, we validate our efforts by training two 7 billion parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain competitive performance to LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B. In addition to releasing the Common Pile v0.1 itself, we also release the code used in its creation as well as the training mixture and checkpoints for the Comma v0.1 models.

https://arxiv.org/abs/2506.05209

“Request of Endocrinology and Metabolism Journals for Data Sharing Statements in Clinical Trial Reports: A Survey”


Background: To enhance reproducibility and transparency, the International Committee of Medical Journal Editors (ICMJE) required that all trial reports submitted after July 2018 must include a data sharing statement (DSS). Accordingly, emerging biomedical journals required trial authors to include a DSS in submissions for publication if trial reports were accepted. Nevertheless, it was unclear whether endocrinology and metabolism journals had this request for DSS of clinical trial reports. Therefore, we aimed to explore whether endocrinology and metabolism journals requested DSS in clinical trial submissions, and their compliance with the declared request in published trial reports.

Methods: Journals that were from the category of “Endocrinology & Metabolism” defined by Journal Citation Reports (JCR, as of June 2023) and published clinical trial reports between 2019 and 2022, were included for analysis. The primary outcome was whether a journal explicitly requested a DSS in its manuscript submission instructions for clinical trials, which was extracted and verified in December 2023. We also evaluated whether these journals indeed included a DSS in their published trial reports that were published between December 2023 and May 2024.

Results: A total of 141 endocrinology and metabolism journals were included for analysis, among which 125 (88.7%) requested DSS in clinical trial submissions. Journals requesting DSS had a significantly lower JCR quartile and higher impact factor when compared with those journals without DSS request. Among the 90 journals requesting DSS, 14 (15.6%) journals indeed did not publish any DSS in their published trial reports between December 2023 and May 2024.

Conclusion: Over 10% of endocrinology and metabolism journals did not request DSS in clinical trial submissions. More than 15% of the journals declaring to request DSS from their submission instructions, did not publish DSS in their published trial reports. More efforts are needed to improve the practice of endocrinology and metabolism journals in requesting and publishing DSS of clinical trial reports.
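The percentages quoted in the Results and Conclusion can be rechecked from the counts given in the abstract. The following is a minimal sketch, not part of the study; the helper name `percent` is hypothetical, and the denominator of 90 for the second figure is taken as reported (the excerpt does not say why it differs from 125).

```python
# Recompute the percentages quoted in the abstract from its own counts.
# `percent` is a hypothetical helper introduced only for this illustration.

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

TOTAL_JOURNALS = 141      # journals included for analysis
REQUESTING_DSS = 125      # journals requesting a DSS in submissions
CHECKED_FOR_DSS = 90      # denominator reported for the publication check
NO_DSS_PUBLISHED = 14     # journals that published no DSS in the period

requested = percent(REQUESTING_DSS, TOTAL_JOURNALS)
not_requested = percent(TOTAL_JOURNALS - REQUESTING_DSS, TOTAL_JOURNALS)
not_published = percent(NO_DSS_PUBLISHED, CHECKED_FOR_DSS)

print(f"requested DSS:     {requested}%")
print(f"did not request:   {not_requested}%")
print(f"no DSS published:  {not_published}%")
```

The recomputed values agree with the abstract's "88.7%", "over 10%", and "15.6%".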

https://doi.org/10.3389/fmed.2025.1518399

Paywall: “From Data Lifecycle to Research Activity Model: Research Data Management in Data-Intensive Social Sciences and Humanities Research”


Unmet needs in terms of existing infrastructure (e.g. repositories) and services are affecting the research data management practices in data-intensive social sciences and humanities research, where less common tasks include data sharing and reuse. Based on these perceived requirements, an improved version of the Data Documentation Initiative Lifecycle that includes the support needs required for effectively managing data throughout the research process is developed.

https://doi.org/10.1108/AJIM-12-2024-0959

Paywall: “A Checklist to Publish Collections as Data in GLAM Institutions”


The purpose of this study is to offer a checklist that can be used for both creating and evaluating digital collections, which are also sometimes referred to as data sets as part of the collections as data movement, suitable for computational use.

https://doi.org/10.1108/GKMC-06-2023-0195

“Maturity Model for Organizational Research Data Management Services”


Developing research data management (RDM) services has become an international trend in response to the movement promoting open science. There is an urgent need to establish support systems for universities and research institutions to strengthen governance. However, the diversity of RDM services and the absence of a universally applicable model create challenges in implementation. To address this, we propose a maturity model for organizational RDM services. By analyzing existing RDM service maturity models, we extract six key dimensions (awareness, data policy, budget, services, user needs, and IT infrastructure) and develop a structured evaluation framework with a five-level rating system. The model is validated through a step-by-step approach: author evaluation, domain evaluation, and practical setting evaluation via a national survey of Japanese institutions. The results demonstrate the model’s applicability across institutions of varying sizes and types, enabling RDM managers to quantitatively assess service maturity and compare progress against national benchmarks. Furthermore, we discuss the potential value and utilization of the framework through two case studies. This study provides an organizational benchmark for RDM services that is applicable to institutions of diverse sizes and natures. It also helps identify issues in the future implementation of organizational RDM services and highlights priority areas for investment.

https://doi.org/10.5334/dsj-2025-018

“Making Reproducibility a Reality By 2035? Enabling Publisher Collaboration for Enhanced Data Policy Enforcement”


This paper describes a project which identified practical and pragmatic ways to increase the FAIRness and reproducibility of published research. Academic journals have supported Open Science through the implementation of data sharing policies for over ten years; some evidence has since emerged on the additional time, resources and expertise that policy enforcement requires as part of an editorial workflow. A series of publisher workshops facilitated by the EC-funded TIER2 project aimed to identify the key checks needed to enforce strengthened journal data sharing policies and to understand which editorial roles have the capacity to undertake such enforcement. The intended outcome of this work was to establish the workflows and resourcing which can support academic journals to enforce stronger data sharing policies in future.

https://doi.org/10.2218/ijdc.v19i1.1064

“AI and Open Science: Implications and Library Practice Recommendations”


With the increasing proliferation of artificial intelligence (AI) in higher education and science, technology, engineering, and mathematics research, what are the implications for open science? As the open science movement advocates for increased transparency and openness in the research process, where do AI and machine learning fit in? And where does that leave library and information science professionals in roles related to open science? This article explores several approaches and considerations for how AI impacts open science, including whether AI has sufficient openness and transparency to align with the goals of open science, whether AI can be used to further open science goals, and the effects of AI use on researcher and public attitudes and actions. The article provides recommendations for library practice, including knowledge-building, connections and advocacy, consultations and liaison work, licensing, and science communication and engagement.

https://dx.doi.org/10.1353/lib.2025.a961191

“Obstacles to Dataset Citation Using Bibliographic Management Software”


Governmental, funder, and scholarly publisher mandates for FAIR and open data are pushing researchers to archive data with persistent identifiers in repositories and link datasets in journal articles. Data citations enable transparency in research and credit and impact metrics for data reuse. However, numerous adoption barriers still exist, including that bibliographic reference management software commonly used by researchers to ease the referencing process may not yet be equipped to handle datasets. This paper examines the readiness of commonly used reference management software to support researchers in importing bibliographic metadata for datasets and generating references that comply with leading practices for data citation. Using seven major reference managers and datasets sampled across 14 Earth, space, and environmental sciences repositories, we identify and analyze common errors in reference-manager-facilitated metadata capture, storage, and citation export, using quantitative content analysis to compare repository-provided recommended citations, reference manager results, and DataCite metadata records. We find that a majority of frequently used reference managers do not adequately support data citation, obstructing uptake of data citation by researchers and thereby limiting the growth of credit and incentives for data sharing and reuse. The range and scale of issues uncovered are broadly extensible and relevant to data citation across disciplines. We present actionable recommendations for reference manager, data repository, scholarly publisher, and researcher stakeholders for increasing the ease, efficiency, and accuracy of bibliographic management software-facilitated data citation.

https://doi.org/10.5334/dsj-2025-017

“How Are Research Data Referenced? The Use Case of the Research Data Repository RADAR”


Publishing research data aims to improve the transparency of research results and facilitate the reuse of datasets. In both cases, referencing the datasets that were used is recommended. Research data repositories can support data referencing through various measures and also benefit from it, for example using this information to demonstrate their impact. However, the literature shows that the practice of formally citing research data is not widespread, data metrics are not yet established, and effective incentive structures are lacking. This article examines how often and in what form datasets published via the research data repository RADAR are referenced. For this purpose, the data sources Google Scholar, DataCite Event Data and the Data Citation Corpus were analyzed. The analysis shows that 27.9 % of the datasets in the repository were referenced at least once. 21.4 % of these references were (also) present in the reference lists and are therefore considered data citations. Datasets were referenced often in data availability statements. A comparison of the three data sources showed that there was little overlap in the coverage of references. In most cases (75.8 %), data and referencing objects were published in the same year. Two definition approaches were considered to investigate data reuse. 118 RADAR datasets were referenced more than once. Only 21 references had no overlaps in the authorship information — these datasets were referenced by researchers that were not involved in data collection.

https://arxiv.org/abs/2505.08533

Paywall: “Organizational Structure and Representation in Digital Institutional Repository Collections”


This study finds that most digital repositories favor a flat organizational structure, largely due to technological constraints and user interface design choices. This approach often neglects the original hierarchical structure of archival collections, leading to user frustration and difficulties in information retrieval.

https://doi.org/10.1108/DLP-06-2024-0092

“Recommendations on Open Science Rewards and Incentives: Guidance for Multiple Stakeholders in Research”


Open Science contributes to the collective building of scientific knowledge and societal progress. However, academic research currently fails to recognise and reward efforts to share research outputs. Yet it is crucial that such activities be valued, as they require considerable time, energy, and expertise to make scientific outputs usable by others, as stated by the FAIR principles. To address this challenge, several bottom-up and top-down initiatives have emerged to explore ways to assess and credit Open Science activities (e.g., Research Data Alliance, RDA) and to promote the assessment of a broad spectrum of research outputs, including datasets and software (e.g., Coalition for Advancing Research Assessment, CoARA). As part of the RDA-SHARC (SHAring Rewards and Credit) interest group, we have developed a set of recommendations to help implement various rewarding schemes at different levels. The recommendations target a broad range of stakeholders. For instance, institutions are encouraged to provide digital services and infrastructure, organise training and cover expenses associated with making data available for the community. Funders should establish policies requiring Open Access to data produced by funded research and provide corresponding support. Publishers should favour open peer-review models and Open Access to articles, data, and software. Government policymakers should set up a comprehensive Open Science strategy, as recommended by UNESCO and followed by a growing number of countries. The present work details different measures that are proposed to the stakeholders. The need to include sharing activities in research evaluation schemes as an overarching mechanism to promote Open Science practices is specifically emphasised.

https://doi.org/10.5334/dsj-2025-015

“Open Science? Responsiveness to Requests for Data in A Review of Smoking Cessation Interventions”


Objective

Little research has examined rates or correlates of adherence to Open Science practices such as data sharing. We investigated how often researchers share data for inclusion in a meta-analysis and their reasons for not sharing data, and tested factors that could be associated with data sharing.

Methods

We requested data for 189 studies (167 authors) as part of a National Cancer Institute-funded meta-analysis of quit intentions and smoking cessation. Authors were contacted via email up to 4 times. We tracked responses, reasons for not sharing data, and coded 23 features of the author team (eg, number of authors and h-index), the request (eg, amount of information requested), and the study (eg, year of publication and preregistration).

Results

Thirty-five percent of authors provided the requested data, 21% responded but did not provide data, and 44% never responded to our request. Of the 37 reasons offered for not sharing data, the most common were loss of access to data (76%) and lack of time (11%). More recent trials, fewer citations, publication in medical (vs. behavioral) journals, and study preregistration were each associated with providing the requested data (Ps < .05).

Conclusions

Contacting authors for unpublished data resulted in a moderate response rate (56%) and modest provision of data (35%). Barriers to data sharing such as access and time constraints highlight challenges faced by behavioral health researchers in promoting transparency. The factors associated with responsiveness underscore the importance of journal policies and Open Science practices in enhancing data sharing.
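The response rate in the Conclusions follows from the three outcome shares given in the Results. A minimal sketch of that arithmetic, using only the percentages quoted in the abstract (the dictionary keys are hypothetical labels chosen for this illustration):

```python
# Outcome shares (in percent of contacted authors) as quoted in the Results.
# The key names are hypothetical labels, not terms used by the study.
outcome_shares = {
    "provided_data": 35,
    "responded_without_data": 21,
    "never_responded": 44,
}

# The three outcomes are mutually exclusive, so they should cover everyone.
total_share = sum(outcome_shares.values())

# "Response rate" counts every author who replied, with or without data.
response_rate = (
    outcome_shares["provided_data"] + outcome_shares["responded_without_data"]
)

print(f"total share:   {total_share}%")
print(f"response rate: {response_rate}%")
```

The shares sum to 100%, and the combined share of authors who replied matches the 56% response rate stated in the Conclusions.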

https://doi.org/10.1093/abm/kaaf029

“What to Do About Data Distance? Responsible Alternatives to Data Sharing”


Building on their extensive expertise as both scholars and developers of methods for data circulation, Christine Borgman and Paul Groth (2025) highlight the crucial impact of sociotechnical dimensions in shaping attempts to bridge the distance between data creators and users, thereby making it possible to transfer and develop knowledge across contexts and domains. . . .

In this commentary, I take issue with one assumption that underlies Borgman and Groth’s arguments; that is, the idea that data reuse requires the sharing of data, and that transparency is therefore a key principle guiding data work, including the practices of data formatting, cleaning, filtering, modeling, curation, and visualization carried out by knowledge intermediaries. By contrast, I argue that data distance is sometimes so large and fraught with challenges, that a better way to facilitate data reuse is to employ intelligent methods of data governance and interpretation that do not involve the sharing of data. I focus on two such methods in this commentary: mining algorithms facilitating data analysis (sometimes also called ‘data visiting’ methods) and narratives (‘data stories’) forged to contextualize and interpret data in specific ways. These methods have a key characteristic in common: they require the explicit articulation of specific visions for prospective data use, thereby moving away from the quest to open data to any possible usage, and rather placing emphasis on the need to account for how choices are made when circulating data end up affecting—and, indeed, constraining—the interpretation of such data in new contexts. In this sense, these methods foster responsible approaches to data reuse, which take account of and potentially help address the scientific and social challenges involved in bridging data distance, while at the same time recognizing that there is no such thing as ‘neutral’ data processing. All data work unavoidably encompasses human judgments around how data may or may not be used, what phenomena they can help study, and how their interpretation may inform knowledge and decision-making; the best one can do to facilitate reliable data interpretation is to make such judgments explicit. 
I conclude that data management discussions need to move away from simplistic reliance on data sharing and related notions of transparency focused on disclosure (Elliott, 2022; Rappert, 2025) and invest instead in skills, methods, and training to foster strategic forms of data mining and storytelling.

https://doi.org/10.1162/99608f92.e95b5c26

“From Data Creator to Data Reuser: Distance Matters”


Sharing research data is necessary, but not sufficient, for data reuse. Open science policies focus more heavily on data sharing than on reuse, yet both are complex, labor-intensive, expensive, and require infrastructure investments by multiple stakeholders. The value of data reuse lies in relationships between creators and reusers. By addressing knowledge exchange rather than mere transactions between stakeholders, investments in data management and knowledge infrastructures can be made more wisely. Drawing upon empirical studies of data sharing and reuse, we develop the metaphor of distance between data creator and data reuser, identifying six dimensions of distance that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. We explore how social and socio-technical aspects of these dimensions may decrease – or increase – distances to be traversed between creators and reusers. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies. ‘It takes a village’ to share research data – and a village to reuse data. Our aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data, and to identify criteria for investments at each stage of data and research life cycles.

https://doi.org/10.1162/99608f92.35d32cfc


“Humanities Data Reuse: Humanity First”


After a detailed introduction to a case study in data reuse within the humanities, this article uses that initial discussion to provide a detailed discussion of Borgman and Groth (2025). We point out that data reuse in the humanities enables us to transform the relationship between specialist research and the intellectual life of society. Research data, originally designed for a relatively narrow audience of specialists, can make primary sources, composed in languages other than English and in unfamiliar cultural contexts, newly accessible both to specialists from other areas and to the public as a whole (Crane et al., 2023). Data sharing and reuse can, in this way, transform the relationship between specialist research in the humanities and the intellectual life of society as a whole. The more efficiently and effectively specialists can reuse data, the more effectively we will be able to contribute tangible value to nonspecialists. The goal is not to simplify complexity but to provide pathways from an initial, cursory engagement into the richer material and ultimately to as much expertise as individuals from around the world wish to develop. In this model, specialist data reaches new audiences and realizes value that was not feasible in print culture. The outcome of efficient data sharing is to revitalize the social contract between the humanities and society and thus to invigorate and expand the humanities at every level.

https://tinyurl.com/5n7m9h6w


“Realising Open Data Principles In UK Research Institutions”


We report on the state of open research data (ORD) policy and practice across UK research institutions through the STAR (Sustainable & TrAnsparent Research data) project. Through qualitative interviews, focus groups, and workshops involving 52 university staff across 21 UK institutions, we investigated the progress and challenges in ORD practices since the 2016 publication of the Concordat on Open Research Data.

We observed that while institutions have made progress establishing ORD specialist roles, developing policies, and creating repository infrastructures, systematic monitoring processes and widespread adoption remain stalled. Key challenges include capacity constraints in institutional repositories, limited workload recognition, insufficient funding for long-term archiving, and varying disciplinary interpretations of ORD relevance.

Based on workshops with participants, we recommend recognition of ORD in academic career frameworks, development of disciplinary-relevant data sharing practices, improved infrastructure for monitoring ORD practices, and enhanced support for external disciplinary repositories. The study emphasizes the need for a values-driven rather than compliance-driven approach to ORD implementation, calling for deeper engagement with diverse academic communities to ensure ORD requirements remain meaningful and relevant across disciplines. These findings provide insights for research institutions and funding bodies in developing more effective and inclusive ORD policies.

https://doi.org/10.2218/ijdc.v19i1.1052


“Electronic Health Data Reuse Purposes”


This chapter elaborates on several fields of electronic health data (EHD) reuse in healthcare, mainly for public interest reasons. Real-life examples of EHD reuse in epidemiology, including insights into how EHD is applied in surveillance and occupational health, are provided in the first section. The second section elaborates how EHD can be reused in supporting institutional activities and policy making: project examples carried out by eminent health institutions around the globe, such as the global World Health Organization (WHO), the continental European Centre for Disease Prevention and Control (ECDC), the American Centers for Disease Control and Prevention (CDC), and some regional institutions, such as the National Institute for Health and Care Excellence (NICE), are illustrated. The third section explores the application of EHD reuse for improving healthcare systems and for carrying out research activities. Specifically, some of the related areas covered include how EHD can be reused in learning healthcare systems, how to advance personalized medicine, how to improve healthcare quality and safety, and how to carry out various research activities. Finally, the fourth section is dedicated to the reuse of EHD for the artificial intelligence (AI) market, which has been experiencing an expansion in healthcare, addressing relevant topics such as administrative costs and associated burden reduction but also training and developing innovative AI-based tools for telemedicine to identify patients at risk for other reuses.

https://doi.org/10.1007/978-3-031-88497-9_2


“Reusing Chemical Data Across Disciplines: Initiatives and Common Challenges”


This work discusses reuse of chemical data across disciplines and the role of various data initiatives and projects including PARC, NORMAN-SLE, MassBank, WorldFAIR, PSDI and NFDI4Chem to facilitate increased data sharing. Improved machine-readable chemical data supports global research and interdisciplinary methodologies crucial for sustainable development and achievement of UNESCO’s Open Science priorities and the UN Sustainable Development Goals. Examples of success and ongoing approaches include integrating toxicology and chemical exposure data using ontologies, linking specialised chemical data collections with larger repositories such as PubChem, and developing IUPAC International Chemical Identifier (InChI) extensions for nanomaterials and mixtures. National data infrastructure projects in the UK and Germany focus on digitising and standardising chemical research data management workflows, aiding scientists in data collection, storage, processing, analysis, disclosure, and reuse. These global initiatives aim to enhance chemical data interoperability to solve real-world problems, foster collaboration, and promote innovation while considering sustainable data resources beyond individual projects.

https://doi.org/10.1515/ci-2025-0203


“Enabling Factors and Opportunities to Maximize Health Data Reuse”

This chapter looks at future developments and maximization of health data reuse in public health. The digital maturity of healthcare systems is, for example, a crucial factor in enabling the availability of electronic health data and their sharing through interconnected databases. The frontiers opened by artificial intelligence to improve health surveillance, disease detection, and resource allocation are changing public health programmes and population well-being by enabling targeted health promotion efforts, identifying high-risk populations, enhancing communication strategies tailored to specific patient subgroups, optimizing logistics in healthcare delivery, and supporting professionals’ decision-making processes. The common data spaces, which are going to be built in the EU to promote data sharing and innovation, are sustained and strengthened by important reforms, such as the European Health Data Space Regulation, which aims to standardize eHealth data exchange, empower individuals, and facilitate the secondary use of health data for research, innovation, and policy making by providing precise rules for health data governance, interoperability, and safe data sharing across EU Member States.

https://doi.org/10.1007/978-3-031-88497-9_3
