Open Science – DigitalKoans

“Enhancing FAIR Data Practices in the Norwegian Research Data Archive: Towards Research Objects and Improved Interoperability”

The increasing volume and complexity of research data necessitate robust data management practices to ensure data is Findable, Accessible, Interoperable, and Reusable (FAIR). The Norwegian Research Data Archive (NRDA) is at the forefront of efforts to create a comprehensive platform for researchers to share and archive their data. This paper discusses NRDA’s ongoing initiatives to enhance its infrastructure in alignment with FAIR principles, emphasizing the integration of Research Objects (ROs) and RO-Crate technologies. These improvements aim to facilitate better data discoverability, accessibility, and interoperability, thereby fostering a more integrated and sustainable data ecosystem. The paper also highlights NRDA’s collaborative efforts with other platforms via the use of Research Objects to support data sharing and reuse across repositories. By focusing on standardized metadata, persistent identifiers, and interoperability, NRDA is advancing Open Science practices, ultimately contributing to a more transparent, efficient, and collaborative research environment. The challenges and future directions of these initiatives are also explored, providing insights into the ongoing efforts to create a more open and interconnected scientific landscape.

https://doi.org/10.52825/ocp.v5i.1202

“The New Zealand Thesis Project: Connecting a Nation’s Dissertations Using Wikidata”

Introduction: Libraries hold large amounts of bibliographic data, with great potential for enrichment with linked open data. The New Zealand Thesis Project explored this potential by uploading thesis metadata records from New Zealand institutional repositories to Wikidata, a collaborative linked data knowledge base.

Description of Project: Nine New Zealand tertiary institutions collaborated with four Wikidata experts to upload a combined national dataset of doctoral and master’s theses. Thesis records, including author and advisor names and richly described with main subject statements, were extracted from each repository, combined, and data cleaned before being uploaded to Wikidata. The team then undertook additional data enrichment, round-tripped Wikidata’s QID identifiers back to individual repositories, and used the new records to cite theses on authors’ Wikipedia pages. Wikidata queries and other visualizations were created to demonstrate how connecting the thesis metadata to records for authors, advisors, institutions, and subjects allows new insights into our collections.

Next Steps: Documentation is being fine-tuned to support future similar projects, and a second combined upload is under discussion to continue growing the New Zealand Thesis Project. There is considerable scope to continue enriching Wikidata records, some of which is already underway by Wikidata volunteers.

https://doi.org/10.31274/jlsc.18295

“The Quest to Share Data”

Data sharing in scientific research is widely acknowledged as crucial for accelerating progress and innovation. Mandates from funders, such as the NIH’s updated Data Sharing Policy, have been beneficial in promoting data sharing. However, the effectiveness of such mandates relies heavily on the motivation of data providers. Despite policy-imposed requirements, many researchers may only comply minimally, resulting in data that is inadequately reusable. Here, we discuss the multifaceted challenges of incentivizing data sharing and the complex interplay of factors involved. Our paper delves into the motivations of various stakeholders, including funders, investigators, and data users, highlighting the differences in perspectives and concerns. We discuss the role of guidelines, such as the FAIR principles, in promoting good data management practices but acknowledge the practical and ethical challenges in implementation. We also examine the impact of infrastructure on data sharing effectiveness, emphasizing the need for systems that support efficient data discovery, access, and analysis. We address disparities in resources and expertise among researchers and concerns related to data misuse and misinterpretation. Here, we advocate for a holistic approach to incentivizing data sharing beyond mere compliance with mandates. It calls for the development of reward systems, financial incentives, and supportive infrastructure to encourage researchers to share data enthusiastically and effectively. By addressing these challenges collaboratively, the scientific community can realize the full potential of data sharing to advance knowledge and innovation.

https://doi.org/10.3389/fninf.2025.1570568

“What Are Journals and Reviewers Concerned about in Data Papers? Evidence From Journal Guidelines and Review Reports”

The evolution of data journals and the increase in data papers call for associated peer review, which is intricately linked yet distinct from traditional scientific paper review. This study investigates the data paper review guidelines of 22 scholarly journals that publish data papers and analyses 131 data papers’ review reports from the journal Data. Peer review is an essential part of scholarly publishing. Although the 22 data journals employ disparate review models, their review purposes and requirements exhibit similarities. Journal guidelines provide authors and reviewers with comprehensive references for reviewing, which cover the entire life cycle of data. Reviewer attitudes predominantly encompass Suggestion, Inquiry, Criticism and Compliment during the specific review process, focusing on 18 key targets including manuscript writing, diagram presentation, data process and analysis, references and review and so forth. In addition, objective statements and other general opinions are also identified. The findings show the distinctive characteristics of data publication assessment and summarise the main concerns of journals and reviewers regarding the evaluation of data papers.

https://doi.org/10.1002/leap.2001

“Cambridge to Conduct ‘Radical’ Review of Open Research ”

Cambridge University Press is to conduct a “radical, community-led” review of the open research publishing ecosystem. The review aims to identify bold and workable solutions that support innovation and researchers’ needs in a manner that’s sustainable for all major stakeholders.

The project will focus on four areas crucial to the future of open research:

The link between publishing, reward and recognition

Equity in research dissemination

Research integrity

Technological change and the future of research publishing

https://tinyurl.com/2879upe8

“Are Data Papers Cited as Research Data? Preliminary Analysis on Interdisciplinary Data Paper Citations”

Introduction. Research data sharing and reuse have become increasingly important in modern science, and data papers represent a new academic publication genre aimed at enhancing the visibility, sharing, and reuse of research data. However, whether citations to data papers reflect actual data reuse remains largely unexplored. This paper presents preliminary findings from a project designed to address this gap.

Method. we conducted a content analysis to manually annotate 437 citation sentences from 309 research articles referencing 50 data papers published in Data in Brief, a chief academic journal that only publishes data papers. The data papers were sampled from five knowledge domains based on a paper-level classification system.

Results. Our results show that most citations to all selected data papers (89%) are unrelated to the research data being described in the paper, instead focusing on the research findings or methodologies. This suggests that data papers are being cited similarly to traditional research articles, despite their unique purpose and content.

Conclusion. These findings raise questions about the effectiveness of data papers as representations of research data within the scholarly communication system, as well as their utility in quantitative studies on data reuse.

https://tinyurl.com/3f5u33fs

“To be FAIR: Theory Specification Needs an Update”

Innovations in open science and meta-science have focused on rigorous *theory testing*, yet methods for specifying, sharing, and iteratively improving theories remain underdeveloped. To address these limitations, we introduce *FAIR theory*: A standard for specifying theories as Findable, Accessible, Interoperable, and Reusable information artifacts. FAIR theories are Findable in well-established archives, Accessible in practical terms and in terms of their ability to be understood, Interoperable for specific purposes, e.g., to guide control variable selection, and Reusable so that they can be iteratively improved through collaborative efforts. This paper adapts the FAIR principles for theory, reflects on the FAIRness of contemporary theoretical practices in psychology, introduces a workflow for FAIRifying theory, and explores FAIR theories’ potential impact in terms of reducing research waste, enabling meta-research on the structure and development of theories, and incorporating theory into reproducible research workflows – from hypothesis generation to simulation studies. We make use of well-established open science infrastructure, including Git for version control, GitHub for collaboration, and Zenodo for archival and search indexing. By applying the principles and infrastructure that have already revolutionized sharing of data and publications to theory, we establish a sustainable, transparent, and collaborative approach to theory development. FAIR theory equips scholars with a standard for systematically specifying and refining theories, bridging a critical gap in open research practices and supporting the renewed interest in theory development in psychology and beyond. FAIR theory provides a structured, cumulative framework for theory development, increasing efficiency and potentially accelerating the pace of cumulative knowledge acquisition.

https://doi.org/10.31234/osf.io/t53np_v1

“openRxiv Launch to Sustain and Expand Preprint Sharing in Life and Health Sciences”

Since their launches in 2013 and 2019, respectively, preprint servers bioRxiv and medRxiv have transformed how scientific findings are communicated. They have hosted more than 325,000 reports of new discoveries, enabling scientists worldwide to collaborate, iterate, and build upon each other’s work at an unprecedented pace. . . .

Establishing openRxiv aims to accelerate the value of these preprint servers, making it easier for these resources to grow and adapt. Created as services of Cold Spring Harbor Laboratory in partnership with other institutions, bioRxiv and medRxiv now move under openRxiv’s researcher-driven governance, ensuring that preprint sharing remains independent, sustainable, and responsive to researchers’ evolving needs.

https://tinyurl.com/2auerw5t

“The Academic Impact of Open Science: A Scoping Review”

Open Science seeks to make research processes and outputs more accessible, transparent and inclusive, ensuring that scientific findings can be freely shared, scrutinized and built upon by researchers and others. To date, there has been no systematic synthesis of the extent to which Open Science (OS) reaches these aims. We use the PRISMA scoping review methodology to partially address this gap, scoping evidence on the academic (but not societal or economic) impacts of OS. We identify 485 studies related to all aspects of OS, including Open Access (OA), Open/FAIR Data (OFD), Open Code/Software, Open Evaluation and Citizen Science (CS). Analysing and synthesizing findings, we show that the majority of studies investigated effects of OA, CS and OFD. Key areas of impact studied are citations, quality, efficiency, equity, reuse, ethics and reproducibility, with most studies reporting positive or at least mixed impacts. However, we also identified significant unintended negative impacts, especially those regarding equity, diversity and inclusion. Overall, the main barrier to academic impact of OS is lack of skills, resources and infrastructure to effectively re-use and build on existing research. Building on this synthesis, we identify gaps within this literature and draw implications for future research and policy.

https://doi.org/10.1098/rsos.241248

“Frontiers introduces FAIR² Data Management”

FAIR² Data Management leverages AI-assisted curation to structure research data for publication, making it easier to find, reuse, and analyze—both by humans and machines—so researchers can focus on discovery rather than data preparation. By making datasets shareable and optimized for reuse, FAIR² Data Management enhances research efficiency and reproducibility, accelerating breakthroughs in global health, planetary sustainability, and scientific innovation. . . .

FAIR² (FAIR Squared) extends the FAIR principles by defining a formal specification that makes research data AI-ready, aligned with Responsible AI principles, and structured for deep scientific reuse. Compatible with MLCommons Croissant’s AI-ready format, it integrates essential elements for scientific rigor, reproducibility, and interoperability. FAIR² ensures data is richly documented and linked to provenance, methodology, and a detailed data dictionary, creating a context-rich representation of each dataset. It also integrates with TensorFlow, JAX, and PyTorch, enabling AI-driven analysis and easy sharing on Kaggle and Hugging Face, amplifying its impact across disciplines.

https://tinyurl.com/3bwjbsw6

“Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs”

Paywalls, licenses and copyright rules often restrict the broad dissemination and reuse of scientific knowledge. We take the position that it is both legally and technically feasible to extract the scientific knowledge in scholarly texts. Current methods, like text embeddings, fail to reliably preserve factual content, and simple paraphrasing may not be legally sound. We urge the community to adopt a new idea: convert scholarly documents into Knowledge Units using LLMs. These units use structured data capturing entities, attributes and relationships without stylistic content. We provide evidence that Knowledge Units: (1) form a legally defensible framework for sharing knowledge from copyrighted research texts, based on legal analyses of German copyright law and U.S. Fair Use doctrine, and (2) preserve most (~95%) factual knowledge from original text, measured by MCQ performance on facts from the original copyrighted text across four research domains. Freeing scientific knowledge from copyright promises transformative benefits for scientific research and education by allowing language models to reuse important facts from copyrighted text. To support this, we share open-source tools for converting research documents into Knowledge Units. Overall, our work posits the feasibility of democratizing access to scientific knowledge while respecting copyright.

https://arxiv.org/abs/2502.19413

“The Economic Impact of Open Science: A Scoping Review”

This paper summarised a comprehensive scoping review of the economic impact of Open Science (OS), examining empirical evidence from 2000 to 2023. It focuses on Open Access (OA), Open/FAIR Data (OFD), Open Source Software (OSS), and Open Methods, assessing their contributions to efficiency gains in research production, innovation enhancement, and economic growth. Evidence, although limited, indicates that OS accelerates research processes, reduces the related costs, fosters innovation by improving access to data and resources and this ultimately generates economic growth. Specific sectors, such as life sciences, are researched more and the literature exhibits substantial gains, mainly thanks to OFD and OA. OSS supports productivity, while the very limited studies on Open Methods indicate benefits in terms of productivity gains and innovation enhancement. However, gaps persist in the literature, particularly in fields like Citizen Science and Open Evaluation, for which no empirical findings on economic impact could be detected. Despite limitations, empirical evidence on specific cases highlight economic benefits. This review underscores the need for further metrics and studies across diverse sectors and regions to fully capture OS’s economic potential.

https://doi.org/10.31222/osf.io/kqse5_v1

“The Economic Impact of Open Science: A Scoping Review”

This paper summarised a comprehensive scoping review of the economic impact of Open Science (OS), examining empirical evidence from 2000 to 2023. It focuses on Open Access (OA), Open/FAIR Data (OFD), Open Source Software (OSS), and Open Methods, assessing their contributions to efficiency gains in research production, innovation enhancement, and economic growth. Evidence, although limited, indicates that OS accelerates research processes, reduces the related costs, fosters innovation by improving access to data and resources and this ultimately generates economic growth. Specific sectors, such as life sciences, are researched more and the literature exhibits substantial gains, mainly thanks to OFD and OA. OSS supports productivity, while the very limited studies on Open Methods indicate benefits in terms of productivity gains and innovation enhancement. However, gaps persist in the literature, particularly in fields like Citizen Science and Open Evaluation, for which no empirical findings on economic impact could be detected. Despite limitations, empirical evidence on specific cases highlight economic benefits. This review underscores the need for further metrics and studies across diverse sectors and regions to fully capture OS’s economic potential.

https://osf.io/preprints/metaarxiv/kqse5_v1

“Open Infrastructures for Responsible Research Assessment: the CoARA Working Group publishes Its First Report”

The OI4RRA report stresses that transitioning to OIs requires institutions and stakeholders to identify the advantages that OIs offer in comparison with closed systems. These can be summarised in four key contributions.

In contrast to the traditional focus on publications and journal-based metrics, OIs support the consideration of a broad range of scholarly contributions in research evaluations.

OIs have the ability to integrate data-driven indicators with the nuance of contextual and narrative based information.

Thirdly, interoperability paired with community-driven governance for evaluations promote the uptake of best practices and foster trust.

Lastly, an emphasis on transparent data automation streamlines workflows, allowing researchers to devote more time to actual research.

https://tinyurl.com/vasmmrsh

Open Infrastructures for Responsible Research Assessment: Principles and Framework

“Data and Code Availability in Political Science Publications from 1995 to 2022”

In this paper, we assess the availability of reproduction archives in political science. By “reproduction archive,” we mean the data and code supporting quantitative research articles that allows others to reproduce the computations described in the published paper. We collect a random sample of quantitative research articles published in political science from 1995 to 2022. We find that—even in 2022—most quantitative research articles do not point a reproduction archive. However, practices are improving. In 2014, when the DA-RT symposium was published in PS, about 12% of quantitative research articles point to the data and code. Eight years later, in 2022, that has increased to 31%. This underscores a massive shift in norms, requirements, and infrastructure. Still, only a minority of articles share the supporting data and code.

https://doi.org/10.31235/osf.io/a5yxe_v2

“Peer Review of Data Papers: Does It Achieve Expectations for Facilitating Data Sharing and Reuse?”

This paper presents a qualitative study of open peer review reports of data papers in a data journal Earth System Science Data. We examine to what extent the actual review practices of data papers align with identifying the most valuable datasets and promoting data reuse. We conclude that peer reviewers adopted a variety of criteria to evaluate data papers, but it is still challenging for reviewers to identify the most valuable datasets that should be reused. In addition, our findings demonstrate the correlation between data paper evaluations and subsequent reuse of the underlying datasets.

https://dx.doi.org/10.2139/ssrn.5130257

U.S. Research Data Summit: Strengthening Cooperation Across Organizations and Sectors: Proceedings of a Workshop

On October 10-11, 2023, the National Academies of Sciences, Engineering, and Medicine hosted the U.S. Research Data Summit at the National Academy of Sciences Building in Washington, DC. The summit was undertaken by a planning committee organized under the U.S. National Committee for CODATA. The summit was informed by input from 29 organizations, including leaders from federal government agencies, the private sector, public and nonprofit organizations, and research institutions. This publication summarizes the presentations and discussion of the summit.

https://tinyurl.com/yjbuhkwz

“Unveiling the Report Findings from IOI’s Study on the State of Open Research Software Infrastructure”

Key recommendations

Surface hidden information – One of the biggest challenges we discovered during the study is a scarcity of available, standardized, and meaningful data. This information gap limits the visibility of what is happening in research software and in the development of infrastructure to support it. There is a pressing need to give time and attention at the field level to identify and subsequently gather the needed data to fill the information gaps.

Strengthen the scaffolding – As the field matures, its actors need stronger scaffolding to support norms and activities. Scaffolding, in this instance, can be defined as elements that, with appropriate instantiation, might become the backbone (social, technical, administrative) infrastructure supporting the field. There is a need to shift the priority from creating to integrating and maintaining, and to encourage and enable consolidation, specialization, mergers, and handoffs.

Grow the market – One of the challenges we have noticed is that research software infrastructures are leaning on the same funding sources, and those funding sources may not last. We’ve seen this in other fields as well. There is a need to figure out how to identify the research software users and how those users connect to customers. Understanding how that user connects to the dollars necessary to keep the research infrastructures running is also essential. This is not about profit but keeping things running and having a dependable system.

Invest in coordination – Research software is still in its infancy and lacks well-established practices, scaffolding, and market structures. With these conditions, no single actor can succeed alone in this evolving field, especially amid today’s challenging fiscal and political landscape for open science. Philanthropic funders can step in with targeted investments that build the foundational architecture of research software infrastructure. Such investments would bolster individual projects, programs, and organizations and create the necessary environment — providing time, space, tools, and structured support — across training, packaging, hosting, socialization, and advocacy) to collaborate across disciplines and geographies.

https://tinyurl.com/bddnzd7c

The State of Open Research Software Infrastructure

“From Data Creator to Data Reuser: Distance Matters”

Sharing research data is necessary, but not sufficient, for data reuse. Open science policies focus more heavily on data sharing than on reuse, yet both are complex, labor-intensive, expensive, and require infrastructure investments by multiple stakeholders. The value of data reuse lies in relationships between creators and reusers. By addressing knowledge exchange, rather than mere transactions between stakeholders, investments in data management and knowledge infrastructures can be made more wisely. Drawing upon empirical studies of data sharing and reuse, we develop the metaphor of distance between data creator and data reuser, identifying six dimensions of distance that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. We explore how social and socio-technical aspects of these dimensions may decrease – or increase – distances to be traversed between creators and reusers. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies. ‘It takes a village’ to share research data – and a village to reuse data. Our aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data; and to identify criteria for investments at each stage of data and research life cycles.

https://tinyurl.com/3429p526

“Charting Open Science Landscapes: A Systematized Review of US Academic Libraries’ Engagement in Open Research Practices”

Open Science aims to make research publicly accessible, transparent, and reusable, promoting collaboration across disciplines and fostering relationships among government, academia, industry, and society. International and regional reviews have explored the roles of academic libraries in promoting open science on both global and local scales. However, practices within U.S. academic libraries have not been examined comprehensively. This study addresses this gap. We employ a systematized literature review methodology to map U.S. academic library engagement in key areas of open science (e.g., open access, open data, open educational resources) and overlap analysis is used to assess shifts from discrete initiatives (e.g., open access, research data management) to holistic, integrated services that span the research lifecycle. Using a comprehensive search strategy, we identified 3,752 publications for inclusion in the study. We find that U.S. academic libraries are actively engaged in open science practices, with the most extensive involvement in open access and the provision of infrastructure to support open science. However, engagement in activities related to citizen science remains limited. Through thematic overlap analysis, we find that ~50% of publications report activities across two or more themes of open science, suggesting a possible shift toward more comprehensive practices. A key challenge reported by libraries is the need for continuous professional development to address technical skills gaps. As research needs and corresponding librarian responsibilities continue to evolve, maintaining librarian professional development opportunities will remain crucial for equipping librarians with the skills necessary to continue supporting and advancing open science initiatives.

https://osf.io/pv7k2/

“Towards the Interoperability of Scholarly Repository Registries”

The enactment of Open Science relies on scholarly repositories that make research products findable and accessible, while scholarly repository registries maintain authoritative metadata and persistent identifiers (PIDs) to help researchers and infrastructure providers discover and access needed repositories. However, the proliferation of repositories targeting different research products (e.g., publications, data, and software) or serving specific disciplines has led to the creation of multiple registries whose scope is not mutually exclusive. . . . While favouring the existence of a plurality of registries, this paper advocates for their interoperability, which is essential to eliminate the aforementioned barriers and enable their full, unambiguous utilisation. We analyse the data models of four prominent registries—FAIRsharing, re3data, OpenDOAR, and ROAR—and classify their properties and overlap. We provide a crosswalk between their data models and suggest a common data model shared across the examined registries to pave the way toward interoperability. As a means of validation, we include a coverage evaluation of the proposed data model.The paper adopts a pragmatic approach towards scholarly registry interoperability and suggests a common metadata model to foster the exchange of information across these platforms.

https://doi.org/10.1007/s00799-025-00414-y

"Open Science at the Generative AI Turn: An Exploratory Analysis of Challenges and Opportunities"

Technology influences Open Science (OS) practices, because conducting science in transparent, accessible, and participatory ways requires tools and platforms for collaboration and sharing results. Due to this relationship, the characteristics of the employed technologies directly impact OS objectives. Generative Artificial Intelligence (GenAI) is increasingly used by researchers for tasks such as text refining, code generation/editing, reviewing literature, and data curation/analysis. Nevertheless, concerns about openness, transparency, and bias suggest that GenAI may benefit from greater engagement with OS. GenAI promises substantial efficiency gains but is currently fraught with limitations that could negatively impact core OS values, such as fairness, transparency, and integrity, and may harm various social actors. In this paper, we explore the possible positive and negative impacts of GenAI on OS. We use the taxonomy within the UNESCO Recommendation on Open Science to systematically explore the intersection of GenAI and OS. We conclude that using GenAI could advance key OS objectives by broadening meaningful access to knowledge, enabling efficient use of infrastructure, improving engagement of societal actors, and enhancing dialogue among knowledge systems. However, due to GenAI’s limitations, it could also compromise the integrity, equity, reproducibility, and reliability of research. Hence, sufficient checks, validation, and critical assessments are essential when incorporating GenAI into research workflows.

https://doi.org/10.1162/qss_a_00337

"An Analysis of the Effects of Sharing Research Data, Code, and Preprints on Citations"

In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122’000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% (±.7) on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% (±.8) on average. However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

https://doi.org/10.1371/journal.pone.0311493

Institutionally Based Research Data Services: Current Developments and Future Direction

The Summit for Academic Institutional Readiness in Data Sharing (STAIRS) was a multi-phased project that brought together a diverse group of representatives from academic institutions across the United States who support research data sharing efforts. Building off preliminary assessment work and a virtual learning series, this was a unique chance to discuss the opportunities and challenges in supporting researchers’ data sharing needs within and across institutions. This report captures the details of the project, including the preliminary assessment work as well as the summit. Following a description of the broad themes and overarching takeaways from this multi-phased effort, we conclude with next steps and future directions for the academic data services community.

https://tinyurl.com/3v8b5xc3

"Supporting Data Discovery: Comparing Perspectives of Support Specialists and Researchers Authors"

Purpose: Much of the research in data discovery is centered on the users’ viewpoint, frequently overlooking the perspective of those who develop and maintain the discovery infrastructure. Our goal is to conduct a comparative study on research data discovery, examining both support specialists’ and researchers’ views by merging new analysis with prior research insights.

Methods: This work summarizes the studies the authors have conducted over the last seven years investigating the data discovery practices of support specialists from different disciplines. Although support specialists were not the main target of some of these studies, data about their perspectives was collected. Our corpus comprises in-depth interviews with 6 social science support specialists, interviews with 19 researchers and 3 support specialists from multiple disciplines, a global survey with 1630 researchers and 47 support specialists, and a use case analysis of 25 support specialists. In the analysis section, we juxtapose the fresh insights on support specialists’ views with the already documented perspectives of researchers for a holistic understanding. The latter is primarily discussed in the literature review, with references made in the analysis section to draw comparisons.

Results: We found that support specialists’ views on data discovery are not entirely different from those of the researchers. There are, however, some differences that we have identified, most notably the interconnection of data discovery with general web search, literature search, and social networks. . . .

We conclude by proposing recommendations for different types of support work to better support researchers’ data discovery practices.

https://doi.org/10.5334/dsj-2024-048