"Tracing Data: A Survey Investigating Disciplinary Differences in Data Citation"


Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers’ motivations for citing — or not citing — data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers’ current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.

https://doi.org/10.1162/qss_a_00264

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Expanding the Data Ark: An Attempt to Make the Data from Highly Cited Social Science Papers Publicly Available"


Access to scientific data can enable independent reuse and verification; however, most data are not available and become increasingly irrecoverable over time. This study aimed to retrieve and preserve important datasets from 160 of the most highly-cited social science articles published between 2008-2013 and 2015-2018. We asked authors if they would share data in a public repository — the Data Ark — or provide reasons if data could not be shared. Of the 160 articles, data for 117 (73%, 95% CI [67% – 80%]) were not available and data for 7 (4%, 95% CI [0% – 12%]) were available with restrictions. Data for 36 (22%, 95% CI [16% – 30%]) articles were available in unrestricted form: 29 of these datasets were already available and 7 datasets were made available in the Data Ark. Most authors did not respond to our data requests and a minority shared reasons for not sharing, such as legal or ethical constraints. These findings highlight an unresolved need to preserve important scientific datasets and increase their accessibility to the scientific community.
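The point estimates in the abstract above can be checked with a few lines of arithmetic. This is a minimal sketch using only the counts quoted there; the variable names are illustrative, and the confidence intervals are not reproduced because the abstract does not state how they were computed.

```python
# Counts reported in the abstract above (160 articles in total).
total = 160
counts = {
    "not available": 117,
    "available with restrictions": 7,
    "available unrestricted": 36,
}
already_public = 29      # unrestricted datasets that were already available
added_to_data_ark = 7    # unrestricted datasets newly deposited in the Data Ark

# The three categories should partition the full sample,
# and the two unrestricted subgroups should add up to 36.
assert sum(counts.values()) == total
assert already_public + added_to_data_ark == counts["available unrestricted"]

# Share of each category as a percentage of all 160 articles.
shares = {label: 100 * n / total for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

The computed shares fall within a percentage point of the rounded figures quoted in the abstract.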

https://doi.org/10.31222/osf.io/w9crz


"Towards a Toolbox for Automated Assessment of Machine-Actionable Data Management Plans"


Most research funders require Data Management Plans (DMPs). The review process can be time consuming, since reviewers read text documents submitted by researchers and provide their feedback. Moreover, it requires specific expert knowledge in data stewardship, which is scarce. Machine-actionable Data Management Plans (maDMPs) and semantic technologies increase the potential for automatic assessment of information contained in DMPs. However, the level of automation and new possibilities are still not well-explored and leveraged. This paper discusses methods for the automation of DMP assessment. It goes beyond generating human-readable reports. It explores how the information contained in maDMPs can be used to provide automated pre-assessment or to fetch further information, allowing reviewers to better judge the content. We map the identified methods to various reviewer goals.

https://doi.org/10.5334/dsj-2023-028


"Survey of Open Science Practices and Attitudes in the Social Sciences"


Since September of 2019, a task group within the European Open Science Cloud – EOSC Nordic Project, work-package 5 (T5.3.2), has focused its attention on machine-actionable Data Management Plans (maDMPs). A delivery working-paper from the group (Hasan et al. 2021) concluded in summary that extracting useful information from traditional free-text based DMPs is problematic, whereas maDMPs are generally more FAIR compliant and, as such, accessible to both humans and machines, more interoperable with other systems, and able to serve different stakeholders for processing, sharing, evaluation and reuse. Different DMP tools and templates have developed independently and, to a varying degree, allow for the creation of genuinely machine-actionable DMPs. Here we will describe the first three tools or projects for creating maDMPs that were central parts of the original task group mission. We will give a more detailed account of one of these, specifically the Stockholm University – EOSC Nordic maDMP project using the DMP Online tool, as described by Philipson (2021). We will also briefly touch upon some other current tools and projects for creating maDMPs that are compliant with the RDA DMP Common Standard (RDCS), aiming for integration with other research information systems or research data repositories. A possible conclusion from this overview is that the development of tools for maDMPs is progressing fast and seems to converge towards a common standard. Nonetheless, there remains an immense amount of work to get there.

https://doi.org/10.1038/s41467-023-41111-1


"Engaging with Researchers and Raising Awareness of FAIR and Open Science through the FAIR+ Implementation Survey Tool (FAIRIST)"


Seven years after the publication of the seminal paper on FAIR, which introduced the concept of making research outputs Findable, Accessible, Interoperable, and Reusable, researchers still struggle to understand how to implement the principles. For many researchers, FAIR promises long-term benefits for near-term effort, requires skills not yet acquired, and is one more thing in a long list of unfunded mandates and onerous requirements for scientists. Even for those required to, or who are convinced that they must make time for FAIR research practices, their preference is for just-in-time advice properly sized to the scientific artifacts and process. Because of the generality of most FAIR implementation guidance, it is difficult for a researcher to adjust the advice to their situation. Technological advances, especially in the area of artificial intelligence (AI) and machine learning (ML), complicate FAIR adoption, as researchers and data stewards ponder how to make software, workflows, and models FAIR and reproducible. The FAIR+ Implementation Survey Tool (FAIRIST) mitigates the problem by integrating research requirements with research proposals in a systematic way. FAIRIST factors in new scholarly outputs, such as nanopublications and notebooks, and the various research artifacts related to AI research (data, models, workflows, and benchmarks). Researchers step through a self-serve survey process and receive a table ready for use in their data management plan (DMP) and/or work plan, while gaining awareness of the FAIR Principles and Open Science concepts. FAIRIST is a model that uses part of the proposal process as a way to do outreach and raise awareness of FAIR dimensions and considerations, while providing timely assistance for competitive proposals.

https://doi.org/10.5334/dsj-2023-032


"The Effects of Research Data Management Services: Associating the Data Curation Lifecycle with Open Research Output"


This study seeks to understand the relationship between research data management (RDM) services framed in the data curation life cycle and the production of open data. An electronic questionnaire was distributed to US researchers and RDM specialists, and the results were analyzed using Chi-Square tests for association. The data curation life cycle does associate with the production of open data and shareable research, but tasks like data management plans have stronger associations with the production of open data. The findings analyze the intersection of these concepts and provide insight into RDM services that facilitate the production of open data and shareable research.

https://doi.org/10.5860/crl.84.5.751


"A Decade of Surveys on Attitudes to Data Sharing Highlights Three Factors for Achieving Open Science"


Over a 10-year period, Carol Tenopir of DataONE and her team conducted a global survey of scientists, managers and government workers involved in broad environmental science activities about their willingness to share data and their opinion of the resources available to do so. . . .

The most surprising result was that a higher willingness to share data corresponded with a decrease in satisfaction with data sharing resources across nations (e.g., skills, tools, training) (Fig.1). That is, researchers who did not want to share data were satisfied with the available resources, and those that did want to share data were dissatisfied. Researchers appear to only discover that the tools are insufficient when they begin the hard work of engaging in open science practices. This indicates that a cultural shift in the attitudes of researchers needs to precede the development of support and tools for data management.

https://tinyurl.com/4sx54c6d


"Code Sharing Increases Citations, but Remains Uncommon"


Overall, R code was only available in 49 of the 1001 papers examined (4.9%) (Figure 1). When included, code was most often in the Supplemental Information (41%), followed by Github (20%), Figshare (6%), or other repositories (33%). Open-access publications were 70% more likely to include code than closed access publications (7.21% vs. 4.22%, χ² = 4.442, p < 0.05). Code-sharing was estimated to increase at 0.5% / year, but this trend was not significant (p = 0.11). The years 2021 and 2022 showed a shift towards more frequent sharing, but the percentage of code-sharing has been consistently below 15% over the past decade (Figure 1).

We found papers including code disproportionately impact the literature (Figure 2), and accumulate citations faster (i.e., a marginally significant year-by-code-inclusion interaction; p = 0.0863). Further, we found a significant interaction between Open Access and code inclusion (p = 0.0265), with publications meeting both Open Science criteria (i.e., open code and open access) having highest overall predicted citation rates (Figure 2). For example, Open Science papers are expected to receive more than double the citations (96.25 vs. 36.89) in year 13 post-publication compared with fully closed papers (Figure 2).
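The headline ratios in the excerpt above follow directly from the figures it quotes. The sketch below is a rough check. It assumes that "70% more likely" means the ratio of the two reported percentages, which the excerpt does not state explicitly, and all variable names are illustrative.

```python
# Figures quoted in the excerpt above.
papers_total = 1001
papers_with_code = 49
oa_share = 7.21           # % of open-access papers that include code
closed_share = 4.22       # % of closed-access papers that include code
open_science_cites = 96.25  # predicted citations, open code + open access, year 13
closed_cites = 36.89        # predicted citations, fully closed papers, year 13

# Overall share of papers with code available.
overall_share = 100 * papers_with_code / papers_total

# Relative increase in code inclusion for open-access papers
# (assumption: simple ratio of the two reported percentages).
relative_increase = 100 * (oa_share / closed_share - 1)

# Ratio of predicted citations between the two groups.
citation_ratio = open_science_cites / closed_cites

print(f"Papers with code: {overall_share:.1f}%")
print(f"Open access vs. closed: {relative_increase:.0f}% more likely to include code")
print(f"Citation ratio in year 13: {citation_ratio:.2f}x")
```

Under that reading, the computed values agree with the excerpt: just under 5% of papers share code, open-access papers are roughly 70% more likely to do so, and the predicted citation ratio exceeds 2.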

https://doi.org/10.21203/rs.3.rs-3222221/v1


NASA’s Public Access Plan for Increasing Access to the Results of Scientific Research


This section highlights the significant changes to this document since the original plan was released in 2014. To wit:

  • There shall be no publication embargo period for peer-reviewed publications
  • Data that support peer-reviewed publications shall be made available in a public archive at the time of publication
  • Software should be included as part of Open Access, subject to NASA software release requirements
  • Software used to generate research findings/results should be made available in a public archive at the time of publication
  • Other data products beyond peer-reviewed publications and software should be considered as part of Open Access

https://tinyurl.com/4h9ezkk8


"The Rights of UC Authors Are at Stake. Here’s What We Are Doing about It."


"We have learned that many publishers are requiring UC authors to sign misleading License to Publish agreements, which undermine the spirit and intent of [UC’s open access policies]," wrote Susan Cochran, Chair of the faculty Academic Senate.

By purporting to restrict an author’s abilities to reuse their own work, "these agreements essentially turn faculty authors into readers, as opposed to creators and owners of their own work," the Academic Senate chair concludes.

The team that leads negotiations with scholarly publishers on behalf of the university, including representatives from UC’s California Digital Library, the 10 campus libraries, and the Academic Senate, is now taking up the charge, making author rights the next frontier in advocating for the UC research community.

https://tinyurl.com/mry3hczw


"The Future of Open Source Is Still Very Much in Flux"


Today, 96% of all code bases incorporate open-source software. GitHub, the biggest platform for the open-source community, is used by more than 100 million developers worldwide. The Biden administration’s Securing Open Source Software Act of 2022 publicly recognized open-source software as critical economic and security infrastructure. Even AWS, Amazon’s money-making cloud arm, supports the development and maintenance of open-source software; it committed its portfolio of patents to an open use community in December of last year. Over the last two years, while public trust in private technology companies has plummeted, organizations including Google, Spotify, the Ford Foundation, Bloomberg, and NASA have established new funding for open-source projects and their counterparts in open science efforts—an extension of the same values applied to scientific research.

https://tinyurl.com/4ksns2ha

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Care to Share? Experimental Evidence on Code Sharing Behavior in the Social Sciences"


Transparency and peer control are cornerstones of good scientific practice and entail the replication and reproduction of findings. The feasibility of replications, however, hinges on the premise that original researchers make their data and research code publicly available. This applies in particular to large-N observational studies, where analysis code is complex and may involve several ambiguous analytical decisions. To investigate which specific factors influence researchers’ code sharing behavior upon request, we emailed code requests to 1,206 authors who published research articles based on data from the European Social Survey between 2015 and 2020. In this preregistered multifactorial field experiment, we randomly varied three aspects of our code request’s wording in a 2x4x2 factorial design: the overall framing of our request (enhancement of social science research, response to replication crisis), the appeal why researchers should share their code (FAIR principles, academic altruism, prospect of citation, no information), and the perceived effort associated with code sharing (no code cleaning required, no information). Overall, 37.5% of successfully contacted authors supplied their analysis code. Of our experimental treatments, only framing affected researchers’ code sharing behavior, though in the opposite direction we expected: Scientists who received the negative wording alluding to the replication crisis were more likely to share their research code. Taken together, our results highlight that the availability of research code will hardly be enhanced by small-scale individual interventions but instead requires large-scale institutional norms.
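The 2x4x2 factorial design described above yields 16 distinct request wordings. The following sketch enumerates them from the factor levels named in the abstract and randomly assigns one to each recipient. The `assign_conditions` helper, the seed, and the author identifiers are hypothetical and are not taken from the study's materials.

```python
from itertools import product
import random

# Factor levels as listed in the abstract above.
factors = {
    "framing": [
        "enhancement of social science research",
        "response to replication crisis",
    ],
    "appeal": [
        "FAIR principles",
        "academic altruism",
        "prospect of citation",
        "no information",
    ],
    "effort": [
        "no code cleaning required",
        "no information",
    ],
}

# Full factorial crossing: every combination of one level per factor.
conditions = [
    dict(zip(factors, combo)) for combo in product(*factors.values())
]

def assign_conditions(author_ids, seed=0):
    """Randomly assign one experimental condition to each author (illustrative)."""
    rng = random.Random(seed)
    return {author: rng.choice(conditions) for author in author_ids}

# Hypothetical identifiers standing in for the 1,206 contacted authors.
assignment = assign_conditions(range(1206))
print(len(conditions), "conditions;", len(assignment), "authors assigned")
```

Simple random choice per author is only one option; a real experiment might use blocked randomization to balance cell sizes, which the abstract does not specify.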

https://doi.org/10.1371/journal.pone.0289380


"Actually Accessible Data: An Update and a Call to Action"


As funder, journal, and disciplinary norms and mandates have foregrounded obligations of data sharing and opportunities for data reuse, the need to plan for and curate data sets that can reach researchers and end-users with disabilities has become even more urgent. We begin by exploring the disability studies literature, describing the need for advocacy and representation of disabled scholars as data creators, subjects, and users. We then survey the landscape of data repositories, curation guidelines, and research-data-related standards, finding little consideration of accessibility for people with disabilities. We suggest three sets of minimal good practices for moving toward truly accessible research data: 1) ensuring Web accessibility for data repositories; 2) ensuring accessibility of common text formats, including those used in documentation; and 3) enhancement of visual and audiovisual materials. We point to some signs of progress in regard to truly accessible data by highlighting exemplary practices by repositories, standards, and data professionals. Accessibility needs to become a mainstream component of curation practice included in every training, manual, and primer.

https://tinyurl.com/2p4au2ar


"Data Journals: Where Data Sharing Policy Meets Practice"


Data journals incorporate elements of traditional scholarly communications practices—reviewing for quality and rigor through editorial and peer-review—and the data sharing / open data movement—prioritizing broad dissemination through repositories, sometimes with curation or technical checks. Their goals for dataset review and sharing are recorded in journal-based data policies and operationalized through workflows. In this qualitative, small cohort semi-structured interview study of eight different journals that review and publish research data, we explored (1) journal data policy requirements, (2) data review standards, and (3) implementation of standardized data evaluation workflows. Differences among the journals can be understood by considering editors’ approaches to balancing the interests of varied stakeholders. Assessing data quality for reusability is primarily conditional on fitness for use which points to an important distinction between disciplinary and discipline-agnostic data journals.

https://doi.org/10.17615/nqtz-b568


"Who Re-Uses Data? A Bibliometric Analysis of Dataset Citations"


Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed. Understanding where and how the sharing of data between researchers, institutions, and countries takes place is essential to developing open research practices.

https://arxiv.org/abs/2308.04379


"Policy Recommendations to Ensure That Research Software Is Openly Accessible and Reusable"


There is now an opportunity to expand US federal policies in similar ways and align their research software sharing aspects across agencies.

To do this, we recommend:

  1. As part of their updated policy plans submitted in response to the 2022 OSTP memo, US federal agencies should, at a minimum, articulate a pathway for developing guidance on research software sharing, and, at a maximum, incorporate research software sharing requirements as a necessary extension of any data sharing policy and a critical strategy to make data truly FAIR (as these principles have been adapted to apply to research software [12]).
  2. As part of sharing requirements, federal agencies should specify that research software should be deposited in trusted, public repositories that maximize discovery, collaborative development, version control, long-term preservation, and other key elements of the National Science and Technology Council’s "Desirable Characteristics of Data Repositories for Federally Funded Research" [13], as adapted to fit the unique considerations of research software.
  3. US federal agencies should encourage grantees to use non-proprietary software and file formats, whenever possible, to collect and store data. We realize that for some research areas and specialized techniques, viable non-proprietary software may not exist for data collection. However, in many cases, files can be exported and shared using non-proprietary formats or scripts can be provided to allow others to open files.
  4. Consistent with the US Administration’s approach to cybersecurity [14], federal agencies should provide clear guidance on measures grantees are expected to undertake to ensure the security and integrity of research software. This guidance should encompass the design, development, dissemination, and documentation of research software. Examples include the National Institute of Standards and Technology’s secure software development framework and Linux Foundation’s open source security foundation.
  5. As part of the allowable costs that grantees can request to help them meet research sharing requirements, US federal agencies should include reasonable costs associated with developing and maintaining research software needed to maximize data accessibility and reusability for as long as it is practical. Federal agencies should ensure that such costs are additive to proposal budgets, rather than consuming funds that would otherwise go to the research itself.
  6. US federal agencies should encourage grantees to apply licenses to their research software that facilitate replication, reuse, and extensibility, while balancing individual and institutional intellectual property considerations. Agencies can point grantees to guidance on desirable criteria for distribution terms and approved licenses from the Open Source Initiative.
  7. In parallel with the actions listed above that can be immediately incorporated into new public access plans, US federal agencies should also explore long-term strategies to elevate research software to co-equal research outputs and further incentivize its maintenance and sharing to improve research reproducibility, replicability, and integrity.

https://doi.org/10.1371/journal.pbio.3002204


"Trends in Research Data Management and Academic Health Sciences Libraries"


Spurred by the National Institutes of Health mandating a data management and sharing plan as a requirement of grant funding, research data management has exploded in importance for librarians supporting researchers and research institutions. This editorial examines the role and direction of libraries in this process from several viewpoints. Key markers of success include collaboration, establishing new relationships, leveraging existing relationships, accessing multiple avenues of communication, and building niche expertise and cachet as a valued and trustworthy partner. [Article includes case studies.]

https://doi.org/10.1080/02763869.2023.2218776


"Building a Framework for Open Research Skills at the University of York"


This case study describes the development of an open research skills framework at the University of York. The framework responds to a need for more comprehensive training, clarity and understanding around open research practices across disciplines at York, in line with the University’s commitment to the long-term development of an open research culture. The framework was developed by Library, Archives and Learning Services (LALS) in partnership with practitioners from different disciplines across the University’s research community. We summarize the background of open research activities at York since 2020, describe how the project was initiated and progressed during the summer of 2022, then provide an overview of the framework itself including areas for future development and consideration. We conclude with some early indicators of usage and reflections on the project, and we hope that this case study will prove useful for research support staff who may be considering developing a similar framework for their own institution.

https://doi.org/10.1629/uksg.618


SPARC: "Oppose Section 552 That Will Block Taxpayer Access to Research"


The U.S. House Appropriations Subcommittee on Commerce, Justice, and Science (CJS) has released an appropriations bill containing language that would block implementation of the 2022 updated OSTP policy guidance (the Nelson Memo) that would ensure immediate, free access to taxpayer-funded research. If enacted, this will prevent American taxpayers from seeing the benefits of the more than $90 billion in scientific research that the U.S. government funds each year. . . .

Write to Congress

Look up contact details for your Representatives and Senators, then customize the text in this template letter.

Call Congress

Look up contact details for your Representatives and Senators, then call the office and tell them to remove Section 552 of the House CJS bill.

https://tinyurl.com/3mbbmwxw


"How Are Exclusively Data Journals Indexed in Major Scholarly Databases? An Examination of the Web of Science, Scopus, Dimensions, and OpenAlex"


As part of the data-driven paradigm and open science movement, the data paper is becoming a popular way for researchers to publish their research data, based on academic norms that cross knowledge domains. Data journals have also been created to host this new academic genre. The growing number of data papers and journals has made them an important large-scale data source for understanding how research data is published and reused in our research system. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges for future efforts to study them quantitatively. As a result, we argue that efforts should be made by data journals and databases to improve the quality of metadata for this emerging genre.

https://arxiv.org/abs/2307.09704


Ithaka S+R Draft for Comment: The Second Digital Transformation of Scholarly Publishing: Strategic Context and Shared Infrastructure


The issue that we identified as the biggest gap today is the perceived need for a secure digital identity for legitimate scholars, to help editors triage submissions into more and less trusted categories. We see opportunities for researcher identifiers to be used as the hub for much greater information about digital identity, in part by allowing publishers and other parties to submit markers of identity into identifier records. As examples, publishers that have processed APC transactions using credit cards have substantial signs of verified identity, as do universities that have securely linked an email address.

The boundaries of the scholarly record represent another aspect of research integrity that requires new forms of infrastructure. Of course the record has never had absolute boundaries. But in a subscription landscape, libraries played an important role in establishing the metes and bounds of the scholarly record (and what would be preserved over time) based on their selection decision-making. In a gold or diamond open access environment, libraries may have a reduced role and so other forms of boundary-setting may be required. Journal rankings may increasingly serve to set the boundaries of the scholarly record, although whether that is the right form of shared infrastructure, or whether it has the right governance and business model to allow it to serve this role without fear or favor, is not yet settled.

https://tinyurl.com/mr2ce748


"Over 1000 Institutions Now Covered by RSC (Royal Society of Chemistry) Read & Publish Agreements"


The Royal Society of Chemistry has signed a Read & Publish agreement with CRUE (Conferencia de Rectores de las Universidades Españolas, the national consortium of Spanish Universities), taking the number of institutions in the RSC’s R&P community to more than one thousand covering 32 countries.

https://tinyurl.com/3jc9juus


"Prevalence and Predictors of Data and Code Sharing in the Medical and Health Sciences: Systematic Review with Meta-Analysis of Individual Participant Data"


The review found that public code sharing was persistently low across medical research. Declarations of data sharing were also low, increasing over time, but did not always correspond to actual sharing of data. The effectiveness of mandatory data sharing policies varied substantially by journal and type of data, a finding that might be informative for policy makers when designing policies and allocating resources to audit compliance.

https://doi.org/10.1136/bmj-2023-075767
