"Tracing Data: A Survey Investigating Disciplinary Differences in Data Citation"


Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers’ motivations for citing — or not citing — data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers’ current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.

https://doi.org/10.1162/qss_a_00264

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Expanding the Data Ark: An Attempt to Make the Data from Highly Cited Social Science Papers Publicly Available"


Access to scientific data can enable independent reuse and verification; however, most data are not available and become increasingly irrecoverable over time. This study aimed to retrieve and preserve important datasets from 160 of the most highly-cited social science articles published between 2008-2013 and 2015-2018. We asked authors if they would share data in a public repository — the Data Ark — or provide reasons if data could not be shared. Of the 160 articles, data for 117 (73%, 95% CI [67% – 80%]) were not available and data for 7 (4%, 95% CI [0% – 12%]) were available with restrictions. Data for 36 (22%, 95% CI [16% – 30%]) articles were available in unrestricted form: 29 of these datasets were already available and 7 datasets were made available in the Data Ark. Most authors did not respond to our data requests and a minority shared reasons for not sharing, such as legal or ethical constraints. These findings highlight an unresolved need to preserve important scientific datasets and increase their accessibility to the scientific community.

https://doi.org/10.31222/osf.io/w9crz

"Towards a Toolbox for Automated Assessment of Machine-Actionable Data Management Plans"


Most research funders require Data Management Plans (DMPs). The review process can be time consuming, since reviewers read text documents submitted by researchers and provide their feedback. Moreover, it requires specific expert knowledge in data stewardship, which is scarce. Machine-actionable Data Management Plans (maDMPs) and semantic technologies increase the potential for automatic assessment of information contained in DMPs. However, the level of automation and new possibilities are still not well-explored and leveraged. This paper discusses methods for the automation of DMP assessment. It goes beyond generating human-readable reports. It explores how the information contained in maDMPs can be used to provide automated pre-assessment or to fetch further information, allowing reviewers to better judge the content. We map the identified methods to various reviewer goals.

https://doi.org/10.5334/dsj-2023-028

"Engaging with Researchers and Raising Awareness of FAIR and Open Science through the FAIR+ Implementation Survey Tool (FAIRIST)"


Seven years after the publication of the seminal paper on FAIR, which introduced the concept of making research outputs Findable, Accessible, Interoperable, and Reusable, researchers still struggle to understand how to implement the principles. For many researchers, FAIR promises long-term benefits for near-term effort, requires skills not yet acquired, and is one more thing in a long list of unfunded mandates and onerous requirements for scientists. Even for those required to, or who are convinced that they must make time for FAIR research practices, their preference is for just-in-time advice properly sized to the scientific artifacts and process. Because of the generality of most FAIR implementation guidance, it is difficult for a researcher to adjust to the advice according to their situation. Technological advances, especially in the area of artificial intelligence (AI) and machine learning (ML), complicate FAIR adoption, as researchers and data stewards ponder how to make software, workflows, and models FAIR and reproducible. The FAIR+ Implementation Survey Tool (FAIRIST) mitigates the problem by integrating research requirements with research proposals in a systematic way. FAIRIST factors in new scholarly outputs, such as nanopublications and notebooks, and the various research artifacts related to AI research (data, models, workflows, and benchmarks). Researchers step through a self-serve survey process and receive a table ready for use in their data management plan (DMP) and/or work plan, while gaining awareness of the FAIR Principles and Open Science concepts. FAIRIST is a model that uses part of the proposal process as a way to do outreach, raise awareness of FAIR dimensions and considerations, while providing timely assistance for competitive proposals.

https://doi.org/10.5334/dsj-2023-032

"The Effects of Research Data Management Services: Associating the Data Curation Lifecycle with Open Research Output"


This study seeks to understand the relationship between research data management (RDM) services framed in the data curation life cycle and the production of open data. An electronic questionnaire was distributed to US researchers and RDM specialists, and the results were analyzed using Chi-Square tests for association. The data curation life cycle does associate with the production of open data and shareable research, but tasks like data management plans have stronger associations with the production of open data. The findings analyze the intersection of these concepts and provide insight into RDM services that facilitate the production of open data and shareable research.

https://doi.org/10.5860/crl.84.5.751
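
As a rough illustration of the chi-square test for association mentioned in the abstract above, the sketch below computes Pearson's chi-square statistic (without continuity correction) for a 2x2 contingency table. The counts and the row and column meanings are hypothetical and are not taken from the study; the function name `chi_square_statistic` is likewise only illustrative.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts, NOT from the study.
# Rows: used an RDM service (yes, no); columns: produced open data (yes, no).
observed = [[30, 10], [20, 40]]

stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With one degree of freedom, a statistic above 3.841 would indicate an association at the 0.05 level. The study itself may have used different table shapes or software.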

"re3data — Indexing the Global Research Data Repository Landscape Since 2012"


For more than ten years, re3data, a global registry of research data repositories (RDRs), has been helping scientists, funding agencies, libraries, and data centers with finding, identifying, and referencing RDRs. As the world’s largest directory of RDRs, re3data currently describes over 3,000 RDRs on the basis of a comprehensive metadata schema. The service allows searching for RDRs of any type and from all disciplines, and users can filter results based on a wide range of characteristics. The re3data RDR descriptions are available as Open Data accessible through an API and are utilized by numerous Open Science services. re3data is engaged in various initiatives and projects concerning data management and is mentioned in the policies of many scientific institutions, funding organizations, and publishers. This article reflects on the ten-year experience of running re3data and discusses ten key issues related to the management of an Open Science service that caters to RDRs worldwide.

https://doi.org/10.1038/s41597-023-02462-y

"Data Management Plan Implementation, Assessments, and Evaluations: Implications and Recommendations"


Data management plans (DMPs) have become nearly a worldwide requirement for research funding. To meet these new funding agency expectations, information professionals across domains and the world have worked to create resources and services to successfully implement and sometimes assess DMPs. This essay presents a series of case studies from different institutions across the globe to highlight current practices and share recommendations for future work. A summary of various projects related to DMP implementation, assessment, and evaluation in different contexts provides a useful overview of current practices. The essay concludes with recommendations for practical oversight and scoring to improve DMPs’ utility in enabling the sharing of data.

https://doi.org/10.5334/dsj-2023-027

"Computational Reproducibility of Jupyter Notebooks from Biomedical Publications"


Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. We address computational reproducibility at two levels: First, using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks related to publications indexed in PubMed Central. We identified such notebooks by mining the articles' full text, locating them on GitHub and re-running them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. Second, this study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over two years. Out of 27271 notebooks from 2660 GitHub repositories associated with 3467 articles, 22578 notebooks were written in Python, including 15817 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 10388 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 1203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. We zoom in on common problems, highlight trends and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.

https://arxiv.org/abs/2308.07333
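
To make the scale of the drop-off easier to see, the notebook counts quoted in the abstract above can be tallied as a simple funnel. This is only a sketch: the counts come from the abstract, while the stage labels are paraphrases and the helper name `stage_rates` is an illustrative choice, not anything from the paper.

```python
# Counts quoted in the abstract; the stage labels are paraphrased.
funnel = [
    ("notebooks found", 27271),
    ("written in Python", 22578),
    ("dependencies declared", 15817),
    ("dependencies installed", 10388),
    ("ran without errors", 1203),
    ("identical results", 879),
]


def stage_rates(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    total = stages[0][1]
    previous = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / total))
        previous = count
    return rows


for label, count, of_previous, of_total in stage_rates(funnel):
    print(f"{label:24s} {count:6d}  {of_previous:6.1%} of previous  {of_total:6.1%} of all")
```

Read this way, roughly 8.5% of the 10388 notebooks that were re-run (879) reproduced the originally reported results, and the 879 identical plus 324 differing notebooks account for all 1203 error-free runs.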

More about Jupyter notebooks.

The Jupyter Notebook is an interactive computing environment that enables users to author notebook documents that include code, interactive widgets, plots, narrative text, equations, images and even video! The Jupyter name comes from 3 programming languages: Julia, Python, and R. It is a popular tool for literate programming. Donald Knuth first defined literate programming as a script, notebook, or computational document that contains an explanation of the program logic in a natural language (e.g. English or Mandarin), interspersed with snippets of macros and source code, which can be compiled and rerun. You can think of it as an executable paper!

"Actually Accessible Data: An Update and a Call to Action"


As funder, journal, and disciplinary norms and mandates have foregrounded obligations of data sharing and opportunities for data reuse, the need to plan for and curate data sets that can reach researchers and end-users with disabilities has become even more urgent. We begin by exploring the disability studies literature, describing the need for advocacy and representation of disabled scholars as data creators, subjects, and users. We then survey the landscape of data repositories, curation guidelines, and research-data-related standards, finding little consideration of accessibility for people with disabilities. We suggest three sets of minimal good practices for moving toward truly accessible research data: 1) ensuring Web accessibility for data repositories; 2) ensuring accessibility of common text formats, including those used in documentation; and 3) enhancement of visual and audiovisual materials. We point to some signs of progress in regard to truly accessible data by highlighting exemplary practices by repositories, standards, and data professionals. Accessibility needs to become a mainstream component of curation practice included in every training, manual, and primer.

https://tinyurl.com/2p8p4dau

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"A Decade of Surveys on Attitudes to Data Sharing Highlights Three Factors for Achieving Open Science"


Over a 10-year period Carol Tenopir of DataONE and her team conducted a global survey of scientists, managers and government workers involved in broad environmental science activities about their willingness to share data and their opinion of the resources available to do so. . . .

The most surprising result was that a higher willingness to share data corresponded with a decrease in satisfaction with data sharing resources across nations (e.g., skills, tools, training) (Fig.1). That is, researchers who did not want to share data were satisfied with the available resources, and those that did want to share data were dissatisfied. Researchers appear to only discover that the tools are insufficient when they begin the hard work of engaging in open science practices. This indicates that a cultural shift in the attitudes of researchers needs to precede the development of support and tools for data management.

https://tinyurl.com/4sx54c6d

Data Sharing for Research: A Compendium of Case Studies, Analysis, and Recommendations


This report contains eight case studies that look at specific corporate/academic data-sharing partnerships in depth, from initiation through the publication of research findings. These case studies illuminate practical challenges for implementing corporate data sharing with researchers. Some common themes that emerged from the case studies include:

  • Successful data-sharing partnerships use Data-Sharing Agreements that require both the company and researchers to take steps to protect privacy.
  • Some of the challenges of data sharing include technical knowledge and infrastructure gaps between companies and researchers, and the continuing need for ethics and privacy review for industry-based research.
  • Promising practices for data sharing include the use of Privacy Enhancing Technologies and company-created, public-facing data-sharing menus to facilitate new partnerships.
  • While data sharing has significant costs and inherent risks, the risks can be managed, and the benefits to researchers, companies, and society make data sharing worth the effort.

https://tinyurl.com/a9axcscp

"Research Reproducibility Activities in Health Sciences Libraries"


Within medical and health sciences libraries, research reproducibility work and services are seldom described in those terms, and are often hidden within other data services. RR work is highly dependent on institutional context, such as availability of partners and institutional needs. Most of the RR work is handled by individuals or teams who tend to focus on data services broadly. Meaningful assessment of the work is not done well at present. Getting administrators, researchers, and other stakeholders to associate the library with RR is a particular challenge. Librarians who are interested in RR could learn from others who are doing the work, understand their institutional context, identify relevant institutional partners, and model RR practices in their own work.

https://doi.org/10.7191/jeslib.650

"Data Journals: Where Data Sharing Policy Meets Practice"


Data journals incorporate elements of traditional scholarly communications practices—reviewing for quality and rigor through editorial and peer-review—and the data sharing / open data movement—prioritizing broad dissemination through repositories, sometimes with curation or technical checks. Their goals for dataset review and sharing are recorded in journal-based data policies and operationalized through workflows. In this qualitative, small cohort semi-structured interview study of eight different journals that review and publish research data, we explored (1) journal data policy requirements, (2) data review standards, and (3) implementation of standardized data evaluation workflows. Differences among the journals can be understood by considering editors’ approaches to balancing the interests of varied stakeholders. Assessing data quality for reusability is primarily conditional on fitness for use which points to an important distinction between disciplinary and discipline-agnostic data journals.

https://doi.org/10.17615/nqtz-b568

"Who Re-Uses Data? A Bibliometric Analysis of Dataset Citations"


Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed. Understanding where and how the sharing of data between researchers, institutions, and countries takes place is essential to developing open research practices.

https://arxiv.org/abs/2308.04379

"ARL Awarded Grant to Continue Research on Institutional Expenses for Public Access to Research Data"


The US Institute of Museum and Library Services (IMLS) has awarded the Association of Research Libraries (ARL), in collaboration with Duke University, the University of Minnesota, and Washington University in St. Louis, all of whom are members of the Data Curation Network (DCN), a $741,921 National Leadership Grant to examine institutional expenses for public access to research data. This research builds upon ARL’s existing Realities of Academic Data Sharing initiative.

https://tinyurl.com/378dzab6

"Association of Research Libraries and California Digital Library Receive Grant to Advance Data Management and Sharing"


The Association of Research Libraries (ARL) and the California Digital Library (CDL) have received a $668,048 National Leadership Grant from the US Institute of Museum and Library Services (IMLS) to assist institutions in managing and sharing federally funded research data. This project will build a machine-actionable data-management plan (maDMP) tool by enhancing and developing new DMPTool features utilizing persistent identifiers (PIDs). CDL and ARL will work together to further strengthen institutional capacity for tracking research outputs by piloting the institutional integration of maDMPs across an academic campus and building community across institutions for maDMPs.

https://tinyurl.com/35x9d45z

"New at Dryad: Support for NIH-funded researchers"


Dryad provides a simple submission process that makes it easy for researchers to upload their datasets, apply metadata that makes them discoverable and reusable, and get a persistent identifier (DOI) they can use in grant reporting. Once submitted, datasets are made publicly accessible so they can be reused by others in order to advance scientific discovery and collaboration across disciplines. Dryad also provides an extensive library of existing datasets from various sources, including those funded by NIH grants, that are completely free to access and reuse.

https://tinyurl.com/4uu9tz2r

"Data Sharing Implementation in Top 10 Ophthalmology Journals in 2021"


Background/Aims: Deidentified individual participant data (IPD) sharing has been implemented in the International Committee of Medical Journal Editors journals since 2017. However, there were some published clinical trials that did not follow the newly implemented policy. This study examines the number of clinical trials that endorsed IPD sharing policy among top ophthalmology journals.

Method: All original articles published in 2021 in the 10 highest-ranking ophthalmology journals according to the 2020 journal impact factor were included. Clinical trials were determined by the WHO definition of clinical trials. Each article was then thoroughly searched for the IPD sharing statement either in the manuscript or in the clinical trial registry. We collected the number of published clinical trials that implemented IPD sharing policy as our primary outcome.

Results: 1852 published articles in the top 10 ophthalmology journals were identified, and 9.45% were clinical trials. Of these clinical trials, 44% had clinical trial registrations and 49.14% declared IPD sharing statements. Only 42 (48.83%) clinical trials were willing to share IPD, and 5 (10.21%) of these share IPD via an online repository platform. In terms of sharing period, 37 clinical trials were willing to share right after the publication and only 2 showed the ending of sharing period.

Conclusion: This report shows that the number of clinical trials in top ophthalmology journals that endorsed the IPD sharing policy and the number of registrations is lower than half even though the policy has been implemented for several years. Future updates are necessary as policy evolves.

http://dx.doi.org/10.1136/bmjophth-2023-001276

"eLife and PREreview to Enhance the ‘Publish, Review, Curate’ Ecosystem Through Adoption of COAR Notify"


The project will put in place the basic infrastructure and protocols needed for all-round and standardised connections between preprint repositories, community-led preprint review platforms, journals, and preprint review aggregation and curation platforms. The aim is to lower existing technological and cost barriers so that as many of these services as possible can more easily participate in the ‘publish, review, curate’ future for research.

https://tinyurl.com/36emyk9b

"Springer Nature Continues Open Research Drive with Acquisition of protocols.io"


Scientific advancement depends on data credibility and work that can be verified, built upon and reproduced. Sharing all elements of research, including data, methods and materials, and even negative results, makes research more efficient, enables reproducibility and therefore builds trust in science. Studies show that lack of awareness of existing work or negative results leads to unnecessary duplication and could waste up to €26 billion in Europe alone.

By laying out detailed step-by-step instructions for research methods, aiming to standardise the process, ensure accuracy of results, and enable research to be reproduced, protocols have a vital role to play in addressing this. With protocols.io joining Springer Nature’s leading protocol offering, researchers will now have the option to make their protocols openly available on the protocols.io platform (fully OA) as well as publishing them in peer-reviewed publications (searchable via the Springer Nature Experiments).

https://tinyurl.com/3j4kn49w

"Trends in Research Data Management and Academic Health Sciences Libraries"


Spurred by the National Institutes of Health mandating a data management and sharing plan as a requirement of grant funding, research data management has exploded in importance for librarians supporting researchers and research institutions. This editorial examines the role and direction of libraries in this process from several viewpoints. Key markers of success include collaboration, establishing new relationships, leveraging existing relationships, accessing multiple avenues of communication, and building niche expertise and cachet as a valued and trustworthy partner. [Article includes case studies.]

https://doi.org/10.1080/02763869.2023.2218776

SPARC: "Oppose Section 552 That Will Block Taxpayer Access to Research"


The U.S. House Appropriations Subcommittee on Commerce, Justice, and Science (CJS) has released an appropriations bill containing language that would block implementation of the 2022 updated OSTP policy guidance (the Nelson Memo) that would ensure immediate, free access to taxpayer-funded research. If enacted, this will prevent American taxpayers from seeing the benefits of the more than $90 billion in scientific research that the U.S. government funds each year. . . .

Write to Congress

Look up contact details for your Representatives and Senators, then customize the text in this template letter.

Call Congress

Look up contact details for your Representatives and Senators, then call the office and tell them to remove Section 552 of the House CJS bill.

https://tinyurl.com/3mbbmwxw

"How Are Exclusively Data Journals Indexed in Major Scholarly Databases? An Examination of the Web of Science, Scopus, Dimensions, and OpenAlex"


As part of the data-driven paradigm and open science movement, the data paper is becoming a popular way for researchers to publish their research data, based on academic norms that cross knowledge domains. Data journals have also been created to host this new academic genre. The growing number of data papers and journals has made them an important large-scale data source for understanding how research data is published and reused in our research system. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges for future efforts to study them quantitatively. As a result, we argue that efforts should be made by data journals and databases to improve the quality of metadata for this emerging genre.

https://arxiv.org/abs/2307.09704
