"Open Science at the Generative AI Turn: An Exploratory Analysis of Challenges and Opportunities"


Technology influences Open Science (OS) practices, because conducting science in transparent, accessible, and participatory ways requires tools and platforms for collaboration and sharing results. Due to this relationship, the characteristics of the employed technologies directly impact OS objectives. Generative Artificial Intelligence (GenAI) is increasingly used by researchers for tasks such as text refining, code generation/editing, reviewing literature, and data curation/analysis. Nevertheless, concerns about openness, transparency, and bias suggest that GenAI may benefit from greater engagement with OS. GenAI promises substantial efficiency gains but is currently fraught with limitations that could negatively impact core OS values, such as fairness, transparency, and integrity, and may harm various social actors. In this paper, we explore the possible positive and negative impacts of GenAI on OS. We use the taxonomy within the UNESCO Recommendation on Open Science to systematically explore the intersection of GenAI and OS. We conclude that using GenAI could advance key OS objectives by broadening meaningful access to knowledge, enabling efficient use of infrastructure, improving engagement of societal actors, and enhancing dialogue among knowledge systems. However, due to GenAI’s limitations, it could also compromise the integrity, equity, reproducibility, and reliability of research. Hence, sufficient checks, validation, and critical assessments are essential when incorporating GenAI into research workflows.

https://doi.org/10.1162/qss_a_00337

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"An Analysis of the Effects of Sharing Research Data, Code, and Preprints on Citations"


In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122,000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% (±.7) on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% (±.8) on average. However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

https://doi.org/10.1371/journal.pone.0311493

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Institutionally Based Research Data Services: Current Developments and Future Direction"


The Summit for Academic Institutional Readiness in Data Sharing (STAIRS) was a multi-phased project that brought together a diverse group of representatives from academic institutions across the United States who support research data sharing efforts. Building off preliminary assessment work and a virtual learning series, this was a unique chance to discuss the opportunities and challenges in supporting researchers’ data sharing needs within and across institutions. This report captures the details of the project, including the preliminary assessment work as well as the summit. Following a description of the broad themes and overarching takeaways from this multi-phased effort, we conclude with next steps and future directions for the academic data services community.

https://tinyurl.com/3v8b5xc3

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Supporting Data Discovery: Comparing Perspectives of Support Specialists and Researchers"


Purpose: Much of the research in data discovery is centered on the users’ viewpoint, frequently overlooking the perspective of those who develop and maintain the discovery infrastructure. Our goal is to conduct a comparative study on research data discovery, examining both support specialists’ and researchers’ views by merging new analysis with prior research insights.

Methods: This work summarizes the studies the authors have conducted over the last seven years investigating the data discovery practices of support specialists from different disciplines. Although support specialists were not the main target of some of these studies, data about their perspectives was collected. Our corpus comprises in-depth interviews with 6 social science support specialists, interviews with 19 researchers and 3 support specialists from multiple disciplines, a global survey with 1630 researchers and 47 support specialists, and a use case analysis of 25 support specialists. In the analysis section, we juxtapose the fresh insights on support specialists’ views with the already documented perspectives of researchers for a holistic understanding. The latter is primarily discussed in the literature review, with references made in the analysis section to draw comparisons.

Results: We found that support specialists’ views on data discovery are not entirely different from those of the researchers. There are, however, some differences that we have identified, most notably the interconnection of data discovery with general web search, literature search, and social networks. . . .

We conclude by proposing recommendations for different types of support work to better support researchers’ data discovery practices.

https://doi.org/10.5334/dsj-2024-048

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"IOP Publishing Study Reveals Varied Adoption and Barriers in Open Data Sharing Among Physical Research Communities"


Environmental scientists are the most open with their research data, yet legal constraints related to third-party ownership often limit their ability to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles. Physicists are also willing to share data but have concerns about the accessibility and understanding of the formats used. Engineering and materials scientists face the most significant barriers to sharing FAIR data due to concerns over confidentiality and sensitivity.

https://tinyurl.com/2s3jjzft

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"The Paradox of Competition: How Funding Models Could Undermine the Uptake of Data Sharing Practices"


Although beneficial to scientific development, data sharing is still uncommon in many research areas. Various organisations, including funding agencies that endorse open science, aim to increase its uptake. However, estimating the large-scale implications of different policy interventions on data sharing by funding agencies, especially in the context of intense competition among academics, is difficult empirically. Here, we built an agent-based model to simulate the effect of different funding schemes (i.e., highly competitive large grants vs. distributive small grants), and varying intensity of incentives for data sharing on the uptake of data sharing by academic teams strategically adapting to the context. Our results show that more competitive funding schemes may lead to higher rates of data sharing in the short term, but lower rates in the long term, because the uncertainty associated with competitive funding negatively affects the cost/benefit ratio of data sharing. At the same time, more distributive grants do not allow academic teams to cover the costs and time required for data sharing, limiting uptake. Our findings suggest that without support services and infrastructure to minimise the costs of data sharing and other ancillary conditions (e.g., university policy support, reputational rewards and benefits of data sharing for academic teams), it is unlikely that funding agencies alone can play a leading role for the uptake of data sharing. Therefore, any attempt to reform reward and recognition systems towards open science principles should carefully consider the potential impact of their proposed policies and their long-term side effects.

https://doi.org/10.31222/osf.io/gb4v2

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Changes to Data Management and Sharing (DMS) Plan Progress Reporting and the Submission of Revised DMS Plans Are Coming on October 1"


On October 1, NIH is adding several new Data Management and Sharing (DMS) questions to Research Performance Progress Reports (RPPRs) and updating the process for submitting revised DMS Plans to NIH for review. In brief:

  • As mentioned in a May 2024 Guide Notice, NIH is including several new questions about DMS activities in RPPRs submitted on or after October 1, 2024 (See Guide Notice NOT-OD-24-175). For awards for which the NIH DMS Policy applies, recipients will now be asked:
      • Whether data has been generated or shared to date
      • What repositories any data was shared to and under what unique digital identifier
      • If data has not been generated and/or shared per the award’s DMS Plan, why and what corrective actions have or will be taken to comply with the plan
      • If significant changes to the DMS Plan are anticipated in the coming year, recipients will be asked to explain them and provide a revised DMS Plan for approval.

https://tinyurl.com/4mxwtn8k

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Knowledge Infrastructures are Growing Up: The Case for Institutional (Data) Repositories 10 Years After the Holdren Memo"


Institutional data repositories are uniquely positioned to support researchers in sharing scholarly outputs. As funding agencies develop and institute policies for research data access and sharing, institutional data repositories have emerged as a critical feature in ecosystems for data stewardship and sharing. We show that institutional data repositories can meet and exceed the requirements and recommendations of federal data policy, thereby maximizing the benefits of data sharing. We present results of a mixed-method study which explores the adoption and usage of institutional repositories to share data from 2017 to 2023. Data from two previous studies were combined with data collected in 2023 on the data sharing solutions of Association of Research Libraries member institutions in the United States and Canada. The analysis of the aggregated data indicates that data stewardship has increased in both institutional repositories and institutional data repositories with an increase in complementary infrastructure to support data sharing. We then conduct an “infrastructural inversion” (Bowker & Star, 1999) to ‘surface invisible work’ of making data repositories function well, and demonstrate that institutional data repositories have advantages for providing sustainable stewardship, curation, and sharing of research data. Finally, we show that institutional data repositories may produce additional benefits through established infrastructure, local interoperability, and control.

https://doi.org/10.5334/dsj-2024-046

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Plan S: "New Tool to Assess Equity in Scholarly Communication Models"


The tool [https://tinyurl.com/2crwwhes], which was inspired by the “How Open Is It?” framework, is targeted at institutions, library consortia, funders and publishers, i.e. the stakeholders either investing or receiving funds for publishing services. It offers users the opportunity to rate scholarly communication models and arrangements across seven criteria:

  • Access to Read
  • Publishing immediate Open Access
  • Maximizing participation
  • Re-use rights
  • Pricing and fee transparency
  • Promoting and encouraging open research practices: data and code
  • Promoting and encouraging open research practices: preprints and open peer review

https://tinyurl.com/ycwmp3nk

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"The Living Library: A Process-Based Tool for Open Literature Review, Probing the Boundaries of Open Science"


In this paper, we present a new tool for open science research, the Living Library. The Living Library provides an online platform and methodological framework for open, continuous literature reviewing. As a research medium, it explores what openness means in light of the human dimension and interpretive nature of engaging with societal questions. As a tool, the Living Library allows researchers to collectively sort, dynamically interpret and openly discuss the evolving literature on a topic of interest. The interface is built around a timeline along which articles can be filtered, themes with which articles are coded, and an open researcher logbook that documents the development of the library. The first rendition of a Living Library can be found via this link: https://eduvision-living-library.web.app/, and the code to develop your own Living Library can be found via this link: https://github.com/Simon-Dirks/living-library.

https://doi.org/10.1007/s43545-024-00964-z

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Creating a Fully Open Environment for Research Code and Data"


Quantitative research in the social and natural sciences is increasingly dependent on new datasets and forms of code. Making these resources open and accessible is a key aspect of open research and underpins efforts to maintain research integrity. Erika Pastrana explains how Springer Nature developed Nature Computational Science to be fully compliant with open research and data principles.

https://tinyurl.com/7uwdxrrz

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "The FAIRification Process for Data Stewardship: A Comprehensive Discourse on the Implementation of the FAIR Principles for Data Visibility, Interoperability and Management"


Using a systematic literature review, the study focuses on the implementation of these [FAIR] principles in research data management and their applicability in data repositories and data centres. It highlights the importance of implementing these principles systematically, allowing stakeholders to choose the minimum requirements and provide a vision for implementing them in data repositories and data centres. The article also highlights the steps in the FAIRification process, which can enhance data interoperability, discovery and reusability.

https://doi.org/10.1177/03400352241270692

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"An Analysis of the Impact of Gold Open Access Publications in Computer Science"


There has been some concern about the impact of predatory publishers on scientific research for some time. Recently, publishers that might previously have been considered ‘predatory’ have established their bona fides, at least to the extent that they are included in citation impact scores such as the field-weighted citation impact (FWCI). These are sometimes called ‘grey’ publishers (MDPI, Frontiers, Hindawi). In this paper, we show that the citation landscape for these grey publications is significantly different from the mainstream landscape and that affording publications in these venues the same status as publications in mainstream journals may significantly distort metrics such as the FWCI.

https://arxiv.org/abs/2408.10262

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "Research on the Generation Mechanism and Action Mechanism of Scientific Data Reuse Behavior"


Specifically, this study takes scientific data reuse attitudes as a breakthrough to discuss the factors that influence researchers’ scientific data reuse attitudes and the extent to which these factors influence scientific data reuse behaviors. It also further explores the impact of scientific data reuse behavior on research and innovation performance and the moderating effect of scientific data services on scientific data reuse behavior.

https://doi.org/10.1016/j.acalib.2024.102921

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Lawmakers Raise New Licensing Concerns over White House Open Access Mandate"


While Republican appropriators in the House have previously tried to entirely block the White House’s open access policy, now appropriators in both chambers of Congress have advanced legislation that would block federal agencies from limiting authors’ ability to choose how to license their work. . . .

This language used in the House report and Senate report regarding researcher choice is identical, though the House goes further by advising federal agencies not to “exert broad ‘federal purpose’ authority over peer reviewed articles” or “otherwise force use of an open license.”

House Republicans also propose that the White House be prohibited from using any funding to implement the policy, as they attempted in last year’s legislation.

https://tinyurl.com/46y42ecr

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Unfolding the Downloads of Datasets: A Multifaceted Exploration of Influencing Factors"


Scientific data are essential to advancing scientific knowledge and are increasingly valued as scholarly output. Understanding what drives dataset downloads is crucial for their effective dissemination and reuse. Our study, analysing 55,473 datasets from 69 data repositories, identifies key factors driving dataset downloads, focusing on interpretability, reliability, and accessibility. We find that while lengthy descriptive texts can deter users due to complexity and time requirements, readability boosts a dataset’s appeal. Reliability, evidenced by factors like institutional reputation and citation counts of related papers, also significantly increases a dataset’s attractiveness and usage. Additionally, our research shows that open access to datasets increases their downloads and amplifies the importance of interpretability and reliability. This indicates that easy access enhances the overall attractiveness and usage of datasets in the scholarly community. By emphasizing interpretability, reliability, and accessibility, this study offers a comprehensive framework for future research and guides data management practices toward ensuring clarity, credibility, and open access to maximize the impact of scientific datasets.

https://doi.org/10.1038/s41597-024-03591-8

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Infra Finder: a New Tool to Enhance Transparency, Discoverability and Trust in Open Infrastructure"


This paper describes Infra Finder, a new tool built by Invest in Open Infrastructure to help institutional budget holders and libraries make more informed decisions around adoption of and investment in open infrastructure. Through increased transparency and discoverability, we aim for this tool to foster trust in the decision-making process and to help build connections between services, users, and funders. The design of Infra Finder is intended to contribute to ongoing discussions and developments regarding trust and transparency in open scholarly infrastructure, as well as help level the playing field between organizations with limited resources to conduct extensive due diligence processes and those with their own analyst teams. In this work, we describe the landscape analysis that led to the creation of Infra Finder, the use cases for the tool, and the approach IOI is taking to create and foster use of Infra Finder in the open infrastructure environment. We also address some of the principles of trust in open source and open infrastructure that have informed and impacted the Infra Finder project and our work in creating this tool.

https://doi.org/10.2218/ijdc.v18i1.927

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Ten Simple Rules for Recognizing Data and Software Contributions in Hiring, Promotion, and Tenure"


The ways in which promotion and tenure committees operate vary significantly across universities and departments. While committees often have the capability to evaluate the rigor and quality of articles and monographs in their scientific field, assessment with respect to practices concerning research data and software is a recent development and one that can be harder to implement, as there are few guidelines to facilitate the process. More specifically, the guidelines given to tenure and promotion committees often reference data and software in general terms, with some notable exceptions such as guidelines in [5] and are almost systematically trumped by other factors such as the number and perceived impact of journal publications. The core issue is that many colleges establish a scholarship versus service dichotomy: Peer-reviewed articles or monographs published by university presses are considered scholarship, while community service, teaching, and other categories are given less weight in the evaluation process. This dichotomy unfairly disadvantages digital scholarship and community-based scholarship, including data and software contributions [6]. In addition, there is a lack of resources for faculties to facilitate the inclusion of responsible data and software metrics into evaluation processes or to assess faculty’s expertise and competencies to create, manage, and use data and software as research objects. As a result, the outcome of the assessment by the tenure and promotion committee is as dependent on the guidelines provided as on the committee members’ background and proficiency in the data and software domains.

The presented guidelines aim to help alleviate these issues and align the academic evaluation processes to the principles of open science. We focus here on hiring, tenure, and promotion processes, but the same principles apply to other areas of academic evaluation at institutions. While these guidelines are by no means sufficient for handling the complexity of a multidimensional process that involves balancing a large set of nuanced and diverse information, we hope that they will support an increasing adoption of processes that recognize data and software as key research contributions.

https://doi.org/10.1371/journal.pcbi.1012296

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Sharing Practices of Software Artefacts and Source Code for Reproducible Research"


While source code of software and algorithms depicts an essential component in all fields of modern research involving data analysis and processing steps, it is uncommonly shared upon publication of results throughout disciplines. Simple guidelines to generate reproducible source code have been published. Still, code optimization supporting its repurposing to different settings is often neglected and even less thought of to be registered in catalogues for a public reuse. Though all research output should be reasonably curated in terms of reproducibility, it has been shown that researchers are frequently non-compliant with availability statements in their publications. These do not even include the use of persistent unique identifiers that would allow referencing archives of code artefacts at certain versions and time for long-lasting links to research articles. In this work, we provide an analysis on current practices of authors in open scientific journals in regard to code availability indications, FAIR principles applied to code and algorithms. We present common repositories of choice among authors. Results further show disciplinary differences of code availability in scholarly publications over the past years. We advocate proper description, archiving and referencing of source code and methods as part of the scientific knowledge, also appealing to editorial boards and reviewers for supervision.

https://doi.org/10.1007/s41060-024-00617-7

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Back to Basics: Considering Categories of Data Services Consults"


Consultations are fundamental to data librarianship, serving as a vital means of one-on-one support for researchers. However, the topics and forms of support unique to data services consults are not always carefully considered. This commentary addresses five common services offered by data librarians—dataset reference, data management support, data analysis and software support, data curation, and data management (and sharing) plan writing—and considers strategies for successful patron support within the boundaries of a consultation.

https://doi.org/10.7191/jeslib.931

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Promoting Data Sharing: The Moral Obligations of Public Funding Agencies"


Sharing research data has great potential to benefit science and society. However, data sharing is still not common practice. Since public research funding agencies have a particular impact on research and researchers, the question arises: Are public funding agencies morally obligated to promote data sharing? We argue from a research ethics perspective that public funding agencies have several pro tanto obligations requiring them to promote data sharing. However, there are also pro tanto obligations that speak against promoting data sharing in general as well as with regard to particular instruments of such promotion. We examine and weigh these obligations and conclude that all things considered funders ought to promote the sharing of data. Even the instrument of mandatory data sharing policies can be justified under certain conditions.

https://doi.org/10.1007/s11948-024-00491-3

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"The State of Open Infrastructure Funding: A Recap of IOI’s Community Conversation"


In July, IOI hosted its second State of Open Infrastructure Community Conversation — this time, exploring the state of open infrastructure grant funding.

To set the stage, IOI’s senior researcher Gail Steinhart provided an overview of the methods that were used to gather over $415M USD in grant funding data for open infrastructures (OIs) and broke down some of the key findings from the analysis. To dive further into the topic of funding data, IOI Executive Director Kaitlin Thaney facilitated a panel conversation that featured Steinhart, collaborators Cameron Neylon and Karl Huang from the Curtin Open Knowledge Initiative (COKI), and John Mohr, CIO of Information Technology for the MacArthur Foundation and co-founder of the Philanthropy Data Commons. With their extensive experience in grant funding from diverse perspectives of the scholarly ecosystem, the panel shed light on the trends, impact, and limitations of grant funding for OIs. . . . .

Across the grants the team mapped for the 36 open infrastructures represented in this dataset, awards were categorized to reflect whether they provide direct support to an OI, indirect support (meaning the OI is referenced in the award title or abstract, but the funding does not directly support the OI though it may provide some indication of an OI’s broader impact), adoption support (funding that supports the implementation of an instance of an OI at a local or community scale), and grants we were unable to classify (unknown). While a significant amount (42%) of funding goes to direct support, the majority of the funding (52%) goes to indirect support.

https://tinyurl.com/ye2yfzsr

Video

Dataset

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |