Data Curation, Open Data, and Research Data Management

“What’s in a Name? Exploring How Voluntary Library Data Literacy Workshop Titles and Descriptions Affect Learner Motivations to Enroll”

This study examined a large teaching and research-intensive university’s data library that offers several data literacy workshops. Although the data library’s voluntary data literacy workshops can be popular, with some workshops waitlisted, interest ebbs and flows. One way to improve the situation is to better market library workshops through effectively crafting workshop titles and descriptions that encourage engagement. Duke and Tucker (2007) state that it is important to market academic library services to increase service use and meet the needs of its users. Understanding marketing barriers is essential to improving workshop engagement.

https://doi.org/10.1016/j.acalib.2025.103045

“Enhancing FAIR Data Practices in the Norwegian Research Data Archive: Towards Research Objects and Improved Interoperability”

The increasing volume and complexity of research data necessitate robust data management practices to ensure data is Findable, Accessible, Interoperable, and Reusable (FAIR). The Norwegian Research Data Archive (NRDA) is at the forefront of efforts to create a comprehensive platform for researchers to share and archive their data. This paper discusses NRDA’s ongoing initiatives to enhance its infrastructure in alignment with FAIR principles, emphasizing the integration of Research Objects (ROs) and RO-Crate technologies. These improvements aim to facilitate better data discoverability, accessibility, and interoperability, thereby fostering a more integrated and sustainable data ecosystem. The paper also highlights NRDA’s collaborative efforts with other platforms via the use of Research Objects to support data sharing and reuse across repositories. By focusing on standardized metadata, persistent identifiers, and interoperability, NRDA is advancing Open Science practices, ultimately contributing to a more transparent, efficient, and collaborative research environment. The challenges and future directions of these initiatives are also explored, providing insights into the ongoing efforts to create a more open and interconnected scientific landscape.

https://doi.org/10.52825/ocp.v5i.1202
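
For readers unfamiliar with the RO-Crate packaging mentioned in the abstract above, the following is a minimal sketch of the kind of `ro-crate-metadata.json` document an RO-Crate 1.1 package carries. It is not NRDA's implementation: the function name, dataset name, file name, date, and license below are hypothetical values chosen for illustration, and a production crate would normally describe authors, identifiers, and much richer context.

```python
import json


def build_ro_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 metadata document as a plain dict.

    All argument values are supplied by the caller; nothing here is taken
    from NRDA. This only illustrates the general JSON-LD layout.
    """
    graph = [
        # Metadata descriptor: identifies this file and points at the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root data entity: the dataset being packaged.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One File entity per payload file listed in hasPart.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example values, for illustration only.
crate = build_ro_crate_metadata(
    name="Example survey dataset",
    description="Illustrative dataset packaged as a Research Object.",
    date_published="2025-01-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["data/responses.csv"],
)
print(json.dumps(crate, indent=2))
```

Because the metadata is ordinary JSON-LD that travels with the files, another repository can read the same document without knowing anything about the platform that produced it, which is the interoperability benefit the abstract describes.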

“The New Zealand Thesis Project: Connecting a Nation’s Dissertations Using Wikidata”

Introduction: Libraries hold large amounts of bibliographic data, with great potential for enrichment with linked open data. The New Zealand Thesis Project explored this potential by uploading thesis metadata records from New Zealand institutional repositories to Wikidata, a collaborative linked data knowledge base.

Description of Project: Nine New Zealand tertiary institutions collaborated with four Wikidata experts to upload a combined national dataset of doctoral and master’s theses. Thesis records, including author and advisor names and richly described with main subject statements, were extracted from each repository, combined, and data cleaned before being uploaded to Wikidata. The team then undertook additional data enrichment, round-tripped Wikidata’s QID identifiers back to individual repositories, and used the new records to cite theses on authors’ Wikipedia pages. Wikidata queries and other visualizations were created to demonstrate how connecting the thesis metadata to records for authors, advisors, institutions, and subjects allows new insights into our collections.

Next Steps: Documentation is being fine-tuned to support future similar projects, and a second combined upload is under discussion to continue growing the New Zealand Thesis Project. There is considerable scope to continue enriching Wikidata records, some of which is already underway by Wikidata volunteers.

https://doi.org/10.31274/jlsc.18295

“The Quest to Share Data”

Data sharing in scientific research is widely acknowledged as crucial for accelerating progress and innovation. Mandates from funders, such as the NIH’s updated Data Sharing Policy, have been beneficial in promoting data sharing. However, the effectiveness of such mandates relies heavily on the motivation of data providers. Despite policy-imposed requirements, many researchers may only comply minimally, resulting in data that is inadequately reusable. Here, we discuss the multifaceted challenges of incentivizing data sharing and the complex interplay of factors involved. Our paper delves into the motivations of various stakeholders, including funders, investigators, and data users, highlighting the differences in perspectives and concerns. We discuss the role of guidelines, such as the FAIR principles, in promoting good data management practices but acknowledge the practical and ethical challenges in implementation. We also examine the impact of infrastructure on data sharing effectiveness, emphasizing the need for systems that support efficient data discovery, access, and analysis. We address disparities in resources and expertise among researchers and concerns related to data misuse and misinterpretation. Here, we advocate for a holistic approach to incentivizing data sharing beyond mere compliance with mandates. It calls for the development of reward systems, financial incentives, and supportive infrastructure to encourage researchers to share data enthusiastically and effectively. By addressing these challenges collaboratively, the scientific community can realize the full potential of data sharing to advance knowledge and innovation.

https://doi.org/10.3389/fninf.2025.1570568

“What Are Journals and Reviewers Concerned about in Data Papers? Evidence From Journal Guidelines and Review Reports”

The evolution of data journals and the increase in data papers call for associated peer review, which is intricately linked yet distinct from traditional scientific paper review. This study investigates the data paper review guidelines of 22 scholarly journals that publish data papers and analyses 131 data papers’ review reports from the journal Data. Peer review is an essential part of scholarly publishing. Although the 22 data journals employ disparate review models, their review purposes and requirements exhibit similarities. Journal guidelines provide authors and reviewers with comprehensive references for reviewing, which cover the entire life cycle of data. Reviewer attitudes predominantly encompass Suggestion, Inquiry, Criticism and Compliment during the specific review process, focusing on 18 key targets including manuscript writing, diagram presentation, data process and analysis, references and review and so forth. In addition, objective statements and other general opinions are also identified. The findings show the distinctive characteristics of data publication assessment and summarise the main concerns of journals and reviewers regarding the evaluation of data papers.

https://doi.org/10.1002/leap.2001

“Are Data Papers Cited as Research Data? Preliminary Analysis on Interdisciplinary Data Paper Citations”

Introduction. Research data sharing and reuse have become increasingly important in modern science, and data papers represent a new academic publication genre aimed at enhancing the visibility, sharing, and reuse of research data. However, whether citations to data papers reflect actual data reuse remains largely unexplored. This paper presents preliminary findings from a project designed to address this gap.

Method. We conducted a content analysis to manually annotate 437 citation sentences from 309 research articles referencing 50 data papers published in Data in Brief, a chief academic journal that only publishes data papers. The data papers were sampled from five knowledge domains based on a paper-level classification system.

Results. Our results show that most citations to all selected data papers (89%) are unrelated to the research data being described in the paper, instead focusing on the research findings or methodologies. This suggests that data papers are being cited similarly to traditional research articles, despite their unique purpose and content.

Conclusion. These findings raise questions about the effectiveness of data papers as representations of research data within the scholarly communication system, as well as their utility in quantitative studies on data reuse.

https://tinyurl.com/3f5u33fs

Paywall: “Challenges in Tracking Archive’s Data Reuse in Social Sciences”

Identifying data reuse is challenging, due to technical reasons, and, in particular, incorrect citation practices among scholars. This paper aims to propose an automatic method to track the reuse of data deposited in the archives joined to the CESSDA (Consortium of European Social Science Data Archives) infrastructure. The paper also offers an overview on the identified data to understand the characteristics of the most reused data sets.

https://doi.org/10.1108/DLP-07-2024-0112

“To be FAIR: Theory Specification Needs an Update”

Innovations in open science and meta-science have focused on rigorous *theory testing*, yet methods for specifying, sharing, and iteratively improving theories remain underdeveloped. To address these limitations, we introduce *FAIR theory*: A standard for specifying theories as Findable, Accessible, Interoperable, and Reusable information artifacts. FAIR theories are Findable in well-established archives, Accessible in practical terms and in terms of their ability to be understood, Interoperable for specific purposes, e.g., to guide control variable selection, and Reusable so that they can be iteratively improved through collaborative efforts. This paper adapts the FAIR principles for theory, reflects on the FAIRness of contemporary theoretical practices in psychology, introduces a workflow for FAIRifying theory, and explores FAIR theories’ potential impact in terms of reducing research waste, enabling meta-research on the structure and development of theories, and incorporating theory into reproducible research workflows – from hypothesis generation to simulation studies. We make use of well-established open science infrastructure, including Git for version control, GitHub for collaboration, and Zenodo for archival and search indexing. By applying the principles and infrastructure that have already revolutionized sharing of data and publications to theory, we establish a sustainable, transparent, and collaborative approach to theory development. FAIR theory equips scholars with a standard for systematically specifying and refining theories, bridging a critical gap in open research practices and supporting the renewed interest in theory development in psychology and beyond. FAIR theory provides a structured, cumulative framework for theory development, increasing efficiency and potentially accelerating the pace of cumulative knowledge acquisition.

https://doi.org/10.31234/osf.io/t53np_v1

“Implementing and Learning from a Summer Research Data Management Training Program for Student Researchers”

Background

This study explores a library-led research data management (RDM) training program at a Canadian post-secondary institution that targeted students participating in summer research assistantships as well as their faculty supervisors. This paper describes the program in detail and shares findings from a student reflection assignment about practicing RDM for the first time.

Methods

The RDM training program included four requirements: attending an introductory RDM session; attending a data management plan (DMP) workshop; submitting a DMP for feedback; and completing a reflection assignment. Where consent was obtained (n=19), reflection assignments were analyzed using a qualitative content analysis approach.

Results

35 faculty supervisors registered 53 students to participate. 62.2% (n=33) of students completed all components of the program. Perceived benefits of completing a DMP included improved project planning, supporting best practices, potential for data reuse, and team communication. Perceived challenges included the inflexibility of DMPs, difficulty populating DMPs, demands on researchers’ time, and lack of long-term utility. 73.6% of students (n=14/19) reported that building a DMP helped them with their summer projects.

Conclusion

Through instruction, practical engagement, and reflection within the context of real-world research, the program supported participants in learning about and practicing RDM, and provided insights for academic librarians who wish to refine or develop training in their local contexts as they continue to navigate emerging expectations from funders and publishers.

https://doi.org/10.21083/partnership.v19i2.7753

“Frontiers introduces FAIR² Data Management”

FAIR² Data Management leverages AI-assisted curation to structure research data for publication, making it easier to find, reuse, and analyze—both by humans and machines—so researchers can focus on discovery rather than data preparation. By making datasets shareable and optimized for reuse, FAIR² Data Management enhances research efficiency and reproducibility, accelerating breakthroughs in global health, planetary sustainability, and scientific innovation. . . .

FAIR² (FAIR Squared) extends the FAIR principles by defining a formal specification that makes research data AI-ready, aligned with Responsible AI principles, and structured for deep scientific reuse. Compatible with MLCommons Croissant’s AI-ready format, it integrates essential elements for scientific rigor, reproducibility, and interoperability. FAIR² ensures data is richly documented and linked to provenance, methodology, and a detailed data dictionary, creating a context-rich representation of each dataset. It also integrates with TensorFlow, JAX, and PyTorch, enabling AI-driven analysis and easy sharing on Kaggle and Hugging Face, amplifying its impact across disciplines.

https://tinyurl.com/3bwjbsw6

“Developing Practices for FAIR and Linked Data in Heritage Science”

Heritage Science has a lot to gain from the Open Science movement but faces major challenges due to the interdisciplinary nature of the field, as a vast array of technological and scientific methods can be applied to any imaginable material. Historical and cultural contexts are as significant as the methods and material properties, which is something the scientific templates for research data management rarely take into account. While the FAIR data principles are a good foundation, they do not offer enough practical help to researchers facing increasing demands from funders and collaborators. In order to identify the issues and needs that arise “on the ground floor”, the staff at the Heritage Laboratory at the Swedish National Heritage Board took part in a series of workshops with case studies. The results were used to develop guides for good data practices and a list of recommended online vocabularies for standardised descriptions, necessary for findable and interoperable data. However, the project also identified areas where there is a lack of useful vocabularies and the consequences this could have for discoverability of heritage studies on materials from areas of the world that have historically been marginalised by Western culture. If Heritage Science as a global field of study is to reach its full potential this must be addressed.

https://doi.org/10.1038/s40494-025-01598-x

“The Economic Impact of Open Science: A Scoping Review”

This paper summarised a comprehensive scoping review of the economic impact of Open Science (OS), examining empirical evidence from 2000 to 2023. It focuses on Open Access (OA), Open/FAIR Data (OFD), Open Source Software (OSS), and Open Methods, assessing their contributions to efficiency gains in research production, innovation enhancement, and economic growth. Evidence, although limited, indicates that OS accelerates research processes, reduces the related costs, fosters innovation by improving access to data and resources and this ultimately generates economic growth. Specific sectors, such as life sciences, are researched more and the literature exhibits substantial gains, mainly thanks to OFD and OA. OSS supports productivity, while the very limited studies on Open Methods indicate benefits in terms of productivity gains and innovation enhancement. However, gaps persist in the literature, particularly in fields like Citizen Science and Open Evaluation, for which no empirical findings on economic impact could be detected. Despite limitations, empirical evidence on specific cases highlights economic benefits. This review underscores the need for further metrics and studies across diverse sectors and regions to fully capture OS’s economic potential.

https://doi.org/10.31222/osf.io/kqse5_v1

“Datafication and Cultural Heritage Collections Data Infrastructures: Critical Perspectives on Documentation, Cataloguing and Data-sharing in Cultural Heritage Institutions”

The role of cultural heritage collections within the research ecosystem is rapidly changing. From often-passive primary source or reference point for humanities research, cultural heritage collections are now becoming an integral part of large-scale interdisciplinary inquiries using computational-driven methods and tools. This new status for cultural heritage collections, in the ‘collections-as-data’ era, would not be possible without foundational work that was and is still going on ‘behind the scenes’ in cultural heritage institutions through cataloguing, documentation and curation of cultural heritage records. This article assesses the landscape for cultural heritage collections data infrastructure in the UK through an empirical and critical perspective, presenting insights on the infrastructure that cultural heritage organisations use to record and manage their collections, exploring the range of systems being used, the levels of complexity or ease at which collections data can be accessed, and the shape of interactions between software suppliers, cultural heritage organisations, and third-party partners. The paper goes on to include a critical analysis of the findings based on the sector’s approach to ‘3s’, that is standards, skill sets and scale, and how that applies to different cultural heritage organisations throughout the data lifecycle, from data creation, stewardship to sharing and re-using.

https://doi.org/10.5334/johd.277

“Building as They Come: Comparative Case Studies of Co-constructing Data Visualization Services with Academic Communities”

Academic libraries are well-situated to be strong supporters of democratizing and building knowledge and expertise in the use of data and data visualization as they cut across all of academia, regardless of discipline or department. Within the past decade, many academic libraries across North America have added data visualization services to their offerings. This has been done in several ways, from existing librarians with related portfolios like GIS or research data learning new skills to libraries creating new positions with a portfolio focused on data visualization. This chapter presents and compares two case studies of building data visualization services at York University Libraries and McMaster University Library.

https://hdl.handle.net/10315/42647

“Data and Code Availability in Political Science Publications from 1995 to 2022”

In this paper, we assess the availability of reproduction archives in political science. By “reproduction archive,” we mean the data and code supporting quantitative research articles that allows others to reproduce the computations described in the published paper. We collect a random sample of quantitative research articles published in political science from 1995 to 2022. We find that, even in 2022, most quantitative research articles do not point to a reproduction archive. However, practices are improving. In 2014, when the DA-RT symposium was published in PS, about 12% of quantitative research articles point to the data and code. Eight years later, in 2022, that has increased to 31%. This underscores a massive shift in norms, requirements, and infrastructure. Still, only a minority of articles share the supporting data and code.

https://doi.org/10.31235/osf.io/a5yxe_v2

“Peer Review of Data Papers: Does It Achieve Expectations for Facilitating Data Sharing and Reuse?”

This paper presents a qualitative study of open peer review reports of data papers in a data journal Earth System Science Data. We examine to what extent the actual review practices of data papers align with identifying the most valuable datasets and promoting data reuse. We conclude that peer reviewers adopted a variety of criteria to evaluate data papers, but it is still challenging for reviewers to identify the most valuable datasets that should be reused. In addition, our findings demonstrate the correlation between data paper evaluations and subsequent reuse of the underlying datasets.

https://dx.doi.org/10.2139/ssrn.5130257

CODATA: “Official Publication of DDI Cross-Domain Integration (DDI-CDI) Version 1.0”

DDI-CDI extends traditional DDI metadata to describe data beyond the social, behavioral, and economic (SBE) domains, addressing the need for broader capabilities. It supports descriptions of event and sensor data (“long” data), key-value data (often associated with “big” data and no-SQL data), and multi-dimensional data. By integrating these with traditional “wide” (or “rectangular”) DDI data descriptions, DDI-CDI enables the management and production of integrated data sets from diverse sources.

Further description from “DDI-CDI (DDI Cross-Domain Integration)”:

DDI-CDI is a new standard which is designed to be used with research data from any domain. While it minimally describes metadata for cataloguing and citation, its fundamental purpose is to describe data and process. The specification is domain-neutral and covers the majority of data structures in common use today: Wide, Long, Multi-Dimensional and Key-Value. It offers, for the first time, a mechanism to interoperate disparate data from multiple disciplines and domains at the lowest level of granularity i.e. the datum itself. While it is designed to complement its siblings in the DDI Alliance Product Suite – DDI-Codebook and DDI-Lifecycle, which operate in the Social, Behavioral and Economic domain – it is also intended to work with a wide variety of other domain-specific and generic metadata specifications. Integration is a first-order consideration in DDI-CDI and so it is designed from the ground up to work well with controlled vocabularies from any domain as well as with other standards.

https://tinyurl.com/yvph3r68
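
The wide, long, and key-value layouts named in the description above can be illustrated with a small sketch. This is not the DDI-CDI model or any DDI tooling; the station records, field names, and helper functions are hypothetical and serve only to show how the same datums look in each layout.

```python
# Hypothetical "wide" (rectangular) records: one row per unit, one column per variable.
wide_rows = [
    {"station": "A", "temp_c": 11.5, "humidity_pct": 80},
    {"station": "B", "temp_c": 9.0, "humidity_pct": 72},
]


def wide_to_long(rows, id_field):
    """Reshape wide rows into "long" rows: one row per (unit, variable, value)."""
    long_rows = []
    for row in rows:
        for variable, value in row.items():
            if variable == id_field:
                continue  # the identifier is carried on every long row instead
            long_rows.append(
                {id_field: row[id_field], "variable": variable, "value": value}
            )
    return long_rows


def long_to_key_value(rows, id_field):
    """Flatten long rows into a key-value mapping keyed by (unit, variable)."""
    return {(row[id_field], row["variable"]): row["value"] for row in rows}


long_rows = wide_to_long(wide_rows, "station")
key_value = long_to_key_value(long_rows, "station")

for row in long_rows:
    print(row)
print(key_value[("A", "temp_c")])
```

Each individual value (the datum) survives unchanged across the three layouts; only its addressing differs, which is the level of granularity at which the description says DDI-CDI aims to make data interoperable.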

“Data Stewardship Decoded: Mapping Its Diverse Manifestations and Emerging Relevance at a Time of AI”

Data stewardship has become a critical component of modern data governance, especially with the growing use of artificial intelligence (AI). Despite its increasing importance, the concept of data stewardship remains ambiguous and varies in its application. This paper explores four distinct manifestations of data stewardship to clarify its emerging position in the data governance landscape. These manifestations include a) data stewardship as a set of competencies and skills, b) a function or role within organizations, c) an intermediary organization facilitating collaborations, and d) a set of guiding principles. The paper subsequently outlines the core competencies required for effective data stewardship, explains the distinction between data stewards and Chief Data Officers (CDOs), and details the intermediary role of stewards in bridging gaps between data holders and external stakeholders. It also explores key principles aligned with the FAIR framework (Findable, Accessible, Interoperable, Reusable) and introduces the emerging principle of AI readiness to ensure data meets the ethical and technical requirements of AI systems. The paper emphasizes the importance of data stewardship in enhancing data collaboration, fostering public value, and managing data reuse responsibly, particularly in the era of AI. It concludes by identifying challenges and opportunities for advancing data stewardship, including the need for standardized definitions, capacity building efforts, and the creation of a professional association for data stewardship.

https://arxiv.org/abs/2502.10399

U.S. Research Data Summit: Strengthening Cooperation Across Organizations and Sectors: Proceedings of a Workshop

On October 10-11, 2023, the National Academies of Sciences, Engineering, and Medicine hosted the U.S. Research Data Summit at the National Academy of Sciences Building in Washington, DC. The summit was undertaken by a planning committee organized under the U.S. National Committee for CODATA. The summit was informed by input from 29 organizations, including leaders from federal government agencies, the private sector, public and nonprofit organizations, and research institutions. This publication summarizes the presentations and discussion of the summit.

https://tinyurl.com/yjbuhkwz

“Supporting the Research Data Management Journey of a Postgraduate Student at the University of St Andrews”

Most research funders have requirements for data management plans and open data to foster good research data management practices. In order to embed these practices in the postgraduate research (PGR) student journey we have introduced the requirement for a data management plan as part of the first-year progress review and the encouragement to make data underpinning theses publicly available. To support students through these processes we provide a suite of training workshops and are available for one-to-one consultations. User feedback and frequently asked questions are used to review and improve our support offering.

This brief report discusses the planning and implementation processes for data management plan requirement and encouragement of underpinning data. It dives deeper into the workflows, especially for the data deposit, and describes training and support available to students. Statistics on training uptake, data management plan submissions and annual trends for data deposit are also presented. The report concludes with lessons learnt and the team’s plans for the near future.

https://doi.org/10.2218/ijdc.v19i1.980

“How Will We Prepare for an Uncertain Future? The Value of Open Data and Code for Unborn Generations Facing Climate Change”

What is the unit of knowledge that we would most like to protect for future generations? Is it the scientific publication? Or is it our datasets? Datasets are snapshots in space and time of n-dimensional hypervolumes of information that are resources in and of themselves—each giving numerous insights into the measured world [134,135]. New publishing paradigms, such as Octopus, allow researchers to link multiple ‘Analysis’ and/or ‘Interpretation’ publications to a single ‘Results’ publication as alternative analyses and interpretations of the same data [159]. A more traditional research paper, on the other hand, is one realization of many possible assessments of the data that were originally collected, and a wide diversity of results can be obtained when many individuals analyse one dataset with the same research question in mind [160,161]. That is, publications are one version of an oversimplified projection through n-dimensional space which communicate stories that our human minds can comprehend. Manuscript narratives, by necessity, leave out information to craft such a story.

This is not to say that scientific publications in and of themselves are not useful. On the contrary, they frame our current and historical understanding of the world and put scientific inquiry into the relevant spatial and temporal context. Scientific articles offer analysis and interpretation of data which will allow future generations to understand why certain policies, management actions, or approaches were attempted and/or abandoned. However, if future researchers are not granted access to our (past) data, future humans will have to repeat costly (e.g. time and resources) experiments, laboriously extract information directly from figures, tables and text in the articles themselves (assuming the relevant information is available and detailed enough, although there is evidence that this is not the case in at least some disciplines [55,162]) or will have to trust our analytical procedures and our intuitions and perceptions about the data we collected [160,161].

https://doi.org/10.1098/rspb.2024.1515

“Leveraging Task-Specific Large Language Models to Enhance Research Data Management Services”

Applying prompt engineering and RAG [Retrieval-Augmented Generation] to research data management and sharing activities offers numerous opportunities for enhancing institutional research data support services. Here, we present just a few illustrative examples that highlight how these technologies could significantly improve service efficiencies, reduce researcher burden, and support adherence with evolving policies. These examples aim to inspire further exploration and future work rather than serve as extensive case studies.

Task-Specific, Agent-Based Chatbots for Data Management and Sharing Plans (DMSPs): Agent-based chatbots can assist researchers in drafting DMSPs by prompting for specific information based on funder requirements. This would offer researchers an interactive, guided experience that streamlines the process of developing a DMSP. The chatbot can be pre-loaded with knowledge of DMSP policies, institutional resources, and common pitfalls observed during plan reviews. Moreover, by incorporating review criteria, these chatbots could also provide real-time feedback on draft plans, allowing researchers to refine their submissions before institutional review.

Automated Text Extraction for Structured Compliance Reporting: Using these approaches, institutions can also automate the extraction of key details from narrative-based DMSPs and transform them into structured, formatted fields. This could be particularly useful for converting narrative-based DMSPs into actionable steps for researchers, service providers, and compliance officers, enabling efficient monitoring and follow-up on data management and sharing commitments.

Customized Knowledge Retrieval for Policy Guidance and Updates: Institutions can further leverage these approaches to develop tools that offer researchers up-to-date guidance on data management and sharing policies from major funders and publishers as well as institutional requirements. For instance, a researcher could query these tools to receive the latest mandates, institutional requirements, or best practices related to data management and sharing. This capability would reduce the burden for researchers in tracking down the most recent policy update.

https://tinyurl.com/bdee5u29

Paywall: Data Culture in Academic Libraries: A Practical Guide to Building Communities, Partnerships, and Collaborations

In five parts, Data Culture in Academic Libraries: A Practical Guide to Building Communities, Partnerships, and Collaborations can help you foster an institutional culture that favors the curation, creation, and wider use of datasets.

Data at all Levels

Data Services and Instruction

Data Outreach

Data Communities

Data Partnerships

https://tinyurl.com/ydsmdjbj

“From Data Creator to Data Reuser: Distance Matters”

Sharing research data is necessary, but not sufficient, for data reuse. Open science policies focus more heavily on data sharing than on reuse, yet both are complex, labor-intensive, expensive, and require infrastructure investments by multiple stakeholders. The value of data reuse lies in relationships between creators and reusers. By addressing knowledge exchange, rather than mere transactions between stakeholders, investments in data management and knowledge infrastructures can be made more wisely. Drawing upon empirical studies of data sharing and reuse, we develop the metaphor of distance between data creator and data reuser, identifying six dimensions of distance that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. We explore how social and socio-technical aspects of these dimensions may decrease – or increase – distances to be traversed between creators and reusers. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies. ‘It takes a village’ to share research data – and a village to reuse data. Our aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data; and to identify criteria for investments at each stage of data and research life cycles.

https://tinyurl.com/3429p526