2.6 Billion Total Downloads: arXiv Annual Report 2022


Our critical priorities during 2022 were to secure additional funding, hire technical and program directors, and ramp up our efforts to modernize arXiv’s software by moving it to the cloud, which will provide better stability, scalability and maintainability. I’m pleased to report that we were able to make significant progress on all of these fronts. arXiv brought in more funding than expected in the form of grants, memberships, and donations, and we hired Stephanie Orphan as program director and Charles Frankston as technical director. Both bring strong and complementary expertise to the team. Moving the technical operations of arXiv—a service with a 30 year history—off of Cornell’s on-premises servers is a major, complicated task. The move to the cloud is currently in progress and on track

bit.ly/41exRsX

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"arXiv Announces New Policy on ChatGPT and Similar Tools"

In view of this, we

  1. continue to require authors to report in their work any significant use of sophisticated tools, such as instruments and software; we now include in particular text-to-text generative AI among those that should be reported consistent with subject standards for methodology.
  2. remind all colleagues that by signing their name as an author of a paper, they each individually take full responsibility for all its contents, irrespective of how the contents were generated. If generative AI language tools generate inappropriate language, plagiarized content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).
  3. generative AI language tools should not be listed as an author; instead authors should refer to (1).

bit.ly/3wKlx5J

"Motivations, Concerns and Selection Biases When Posting Preprints: A Survey of bioRxiv Authors"


Since 2013, the usage of preprints as a means of sharing research in biology has rapidly grown, in particular via the preprint server bioRxiv. Recent studies have found that journal articles that were previously posted to bioRxiv received a higher number of citations or mentions/shares on other online platforms compared to articles in the same journals that were not posted. However, the exact causal mechanism for this effect has not been established, and may in part be related to authors’ biases in the selection of articles that are chosen to be posted as preprints. We aimed to investigate this mechanism by conducting a mixed-methods survey of 1,444 authors of bioRxiv preprints, to investigate the reasons that they post or do not post certain articles as preprints, and to make comparisons between articles they choose to post and not post as preprints. We find that authors are most strongly motivated to post preprints to increase awareness of their work and increase the speed of its dissemination; conversely, the strongest reasons for not posting preprints centre around a lack of awareness of preprints and reluctance to publicly post work that has not undergone a peer review process. We additionally find evidence that authors do not consider quality, novelty or significance when posting or not posting research as preprints, however, authors retain an expectation that articles they post as preprints will receive more citations or be shared more widely online than articles not posted.

https://doi.org/10.1371/journal.pone.0274441

"medRxiv to PLOS: Direct Preprint Transfers"

PLOS has released "medRxiv to PLOS: Direct Preprint Transfers."

Here's an excerpt:

Authors with preprints on the new health sciences preprint server medRxiv now have the option to transfer their manuscripts for publication consideration at relevant PLOS journals in the topic area, PLOS Medicine, PLOS NTDs, or PLOS ONE. PLOS is excited to be among the first publishers to offer direct transfer service from the new server.

Research Data Curation Bibliography, Version 10 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"Meta-Research: Tracking the Popularity and Outcomes of All bioRxiv Preprints"

Richard J Abdill and Ran Blekhman have self-archived "Meta-Research: Tracking the Popularity and Outcomes of All bioRxiv Preprints."

Here's an excerpt:

The growth of preprints in the life sciences has been reported widely and is driving policy changes for journals and funders, but little quantitative information has been published about preprint usage. Here, we report how we collected and analyzed data on all 37,648 preprints uploaded to bioRxiv.org, the largest biology-focused preprint server, in its first five years. The rate of preprint uploads to bioRxiv continues to grow (exceeding 2,100 in October 2018), as does the number of downloads (1.1 million in October 2018). We also find that two-thirds of preprints posted before 2017 were later published in peer-reviewed journals, and find a relationship between the number of downloads a preprint has received and the impact factor of the journal in which it is published. We also describe Rxivist.org, a web application that provides multiple ways to interact with preprint metadata.

Research Data Curation Bibliography, Version 9 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"arXiv and the Symbiosis of Physics Preprints and Journal Review Articles: A Model"

Brian Simboli has self-archived "arXiv and the Symbiosis of Physics Preprints and Journal Review Articles: A Model."

Here's an excerpt:

This paper recommends a publishing model that can help achieve the goal of reforming physics publishing. It distinguishes two complementary needs in scholarly communication. Preprints, increasingly important in science, are properly the vehicle for claiming priority of discovery and for eliciting feedback that will help with versioning. Traditional journal publishing, however, should focus on providing synthesis in the form of overlay journals that play the same role as review articles.

"New Release: arXiv Search v0.1"

Cornell University has released "New Release: arXiv Search v0.1."

Here's an excerpt:

Today we launched a reimplementation of our search system. As part of our broader strategy for arXiv-NG, we are incrementally decoupling components from the classic arXiv codebase, and replacing them with more modular services developed in Python. Our goal was to replace the aging Lucene search backend, achieve feature-parity with the classic search system, and give the search interface an opportunistic face-lift. . . .The most important win for us in this milestone is that the new backend lays the groundwork for more dramatic improvements to search, our APIs, and other components targeted for reimplementation in arXiv-NG.

"The Next Stage of SocArXiv’s Development: Bringing Greater Transparency and Efficiency to the Peer Review Process"

Philip Cohen has published "The Next Stage of SocArXiv's Development: Bringing Greater Transparency and Efficiency to the Peer Review Process" in LSE Impact of Social Sciences.

Here's an excerpt:

Looking ahead to the next stage of its development, Philip Cohen considers how SocArXiv might challenge the peer review system to be more efficient and transparent, firstly by confronting the bias that leads many who benefit from the status quo to characterise mooted alternatives as extreme. The value and implications of openness at the various decision points in the system must be debated, as should potentially more disruptive innovations such as non-exclusive review and publication or crowdsourcing reviews.

Research Data Curation Bibliography, Version 8 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

Lots of Institutional Repositories Keep E-prints Safe

The seductive allure of a commercial mega repository is two-fold: (1) everything is conveniently in one place, and (2) a company is taking care of the dreary and expensive business of running it.

Everything seems fine: problem solved! That is, until something goes wrong, such as the repository being bought and controlled by a publisher or being threatened with lawsuits by a coterie of publishers.

Then it's important to remember: it's a company, and companies exist to make a profit.

Heh, companies are great. I wouldn't have just had that tasty cup of coffee without them. But, we should be very clear about what motivates companies and controls their behavior. And we shouldn't be shocked if they do things that aren't motivated by lofty goals.

I know: institutional repositories are hard work. The bloom is off the rose. But they exist to serve higher education, not make money, and they are part of the academic communities they serve. And they can't be bought. And their universities don't often go out of business. And there are a lot of them. And they are not likely to be attractive targets for lawsuits unless something has gone very, very wrong at the local level.

Copyright is complicated. No one is advocating that we ignore it and just shove e-prints into IRs willy-nilly. Getting faculty to understand the ins and outs of e-print copyright is no picnic, nor is monitoring for compliance. But the battle is easier to fight at the local level, where one-on-one faculty-to-librarian communication is possible.

For self-archiving to flourish in the long run, institutional repositories must flourish. By and large, librarians establish, run, and support them, and they are the quiet heroes of green open access who will continue to provide a sustainable and reliable infrastructure for self-archiving.

"Has the Open Access Movement Delayed the Revolution?"

Richard Poynder has published "Has the Open Access Movement Delayed the Revolution?" in Open and Shut?.

Here's an excerpt:

As I said, publishers are also co-opting green OA. They are doing this by buying up repository platforms like SSRN and bepress, for instance, and by imposing lengthy embargoes before green OA papers can be made freely available. Again, the OA movement has assisted in this by, for instance, advocating for and supporting OA policies that accept publisher-imposed embargoes as a given, and by partnering with publishers in initiatives that turn repositories into little more than search interfaces. This has the effect of directing users away from repositories to legacy publishers’ sites (see here for instance, and here).

ACS Launches ChemRxiv

ACS has launched ChemRxiv.

Here's an excerpt from the announcement:

ChemRxiv, a new chemistry preprint server for the global chemistry community, is now available in a fully functioning Beta version for use and feedback by researchers. The Beta launch has been undertaken with initial strategic input from the American Chemical Society (ACS), Royal Society of Chemistry, German Chemical Society and other not-for profit organizations, as well as other scientific publishers and preprint services. The free-of-charge service, originally announced late last year, is managed on behalf of the chemical science community by ACS and is powered by Figshare, an online digital repository for academic research.

Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

SSRN Launches ChemRN (Chemistry Research Network)

SSRN has launched ChemRN.

Here's an excerpt from the announcement:

Chemistry researchers can share ideas and other early stage research, including posting preprints and working papers on ChemRN. Users can quickly upload and read papers for free, across all of Chemistry, including the fields of Energy, Environmental and Materials Sciences.

SSRN Launches Biology Research Network (BioRN)

SSRN has launched the Biology Research Network (BioRN).

Here's an excerpt from the announcement:

Biology researchers are able to post preprints and working papers on BioRN, share ideas and other early stage research, and collaborate. It allows users to quickly upload and read abstracts and full-text papers, free of charge. A preprint is the author’s own write-up of research results and analysis that has not been peer-reviewed or had any value added to it by a publisher (such as formatting, copy-editing, technical enhancements). A preprint server, or working paper repository as they are also known, allows users to share these documents.
