This mapping review addresses scientometric indicators that quantify open scholarship. The goal is to determine which open scholarship metrics are currently being applied and which are discussed, e.g. in policy papers. The paper contributes to a better understanding of how open scholarship is quantitatively recorded in research assessment and where gaps can be identified. The review is based on a search in four databases, each with 22 queries. Of 3385 hits, we coded 248 documents chosen according to the research questions. The review discusses the open scholarship metrics in the documents as well as the topics addressed in the publications, the disciplines the publications come from, and the journals in which they were published. The results indicate that research and teaching practices are unequally represented with regard to open scholarship metrics. Open research material is a central and extensively covered topic in the publications. Open teaching practices, on the other hand, play a role in the discussion and strategy papers of the review, but open teaching material is not recorded using concrete scientometric indicators. Here we see a research gap and discuss potentials for further research and investigation.
During career advancement and funding allocation decisions in biomedicine, reviewers have traditionally depended on journal-level measures of scientific influence like the impact factor. Prestigious journals are thought to pursue a reputation of exclusivity by rejecting large quantities of papers, many of which may be meritorious. It is possible that this process could create a system whereby some influential articles are prospectively identified and recognized by journal brands but most influential articles are overlooked. Here, we measure the degree to which journal prestige hierarchies capture or overlook influential science. We quantify the fraction of scientists’ articles that would receive recognition because (a) they are published in journals above a chosen impact factor threshold, or (b) are at least as well-cited as articles appearing in such journals. We find that the number of papers cited at least as well as those appearing in high-impact factor journals vastly exceeds the number of papers published in such venues. At the investigator level, this phenomenon extends across gender, racial, and career stage groupings of scientists. We also find that approximately half of researchers never publish in a venue with an impact factor above 15, which under journal-level evaluation regimes may exclude them from consideration for opportunities. Many of these researchers publish equally influential work, however, raising the possibility that the traditionally chosen journal-level measures that are routinely considered under decision-making norms, policy, or law, may recognize as little as 10-20% of the work that warrants recognition.
Most publications are created with input from multiple co-authors. Traditional citation metrics give each co-author the same citation impact, even though the actual contribution of each researcher will not have been equal. . . .
We have now added a new feature to capture the following authorship positions or types:
- First author: The first author mentioned in the publication
- Last author: The last author mentioned in the publication
- Corresponding author: An author is marked as the corresponding author in the publication. Since June 2020, newly released documents in Scopus can contain more than one corresponding author. . .
- Co-author: For documents with more than one author, co-authors are any author that is not a first, last or corresponding author
- Single author: An author is the only author of a publication
Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers’ motivations for citing — or not citing — data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers’ current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.
The emergence of mega-journals (MJs) has influenced scholarly communication. One concrete manifestation of this impact is that more citations have been generated. Citations are the foundation of many evaluation metrics to assess the scientific impact of journals, disciplines, and regions. We focused on searching for citation beneficiaries and quantifying the relative benefit at the journal, discipline and region levels. More specifically, we examined the distribution and contribution to citation-based metrics of citations generated by the five discipline-specific mega-journals (DSMJs) categorized as Environmental Sciences (ES) on Web of Science (WoS) from Clarivate Analytics in 2021: Sustainability, International Journal of Environmental Research and Public Health, Environmental Science and Pollution Research, Journal of Cleaner Production and Science of the Total Environment. Analysis of the distribution of citing data of the five DSMJs shows a pattern with wide coverage but skewness by region and the WoS category; that is, papers in the five DSMJs contributed 26.66% of their citations in 2021 to Mainland China and 22.48% to the ES. Moreover, 15 journals within the ES had their JIFs boosted by more than 20%, benefitting from the high citing rates of the five DSMJs. More importantly, the analysis provides clear evidence that DSMJs can contribute to JIF scores throughout a discipline through their volume of references. Overall, DSMJs can widely impact scholarly evaluation because they contribute citation benefits and improve the evaluation index performance of different scientific entities at different levels. Considering the important application of citation indicators in the academic evaluation system and the increase in citations, it is important to reconsider the real research impact that citations can reflect.
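The JIF-boost mechanism described above can be illustrated with a short, purely hypothetical calculation. The function mirrors the standard two-year Journal Impact Factor definition (citations in year Y to items published in Y−1 and Y−2, divided by the citable items from those two years); all counts below are invented for illustration:

```python
def jif(citations_to_prior_two_years, citable_items_prior_two_years):
    """JIF for year Y: citations in Y to items from Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_to_prior_two_years / citable_items_prior_two_years

citable_items = 400          # items the journal published in 2019-2020 (hypothetical)
citations_total = 1200       # citations received in 2021 (hypothetical)
citations_from_dsmjs = 250   # portion contributed by the five DSMJs (hypothetical)

jif_with = jif(citations_total, citable_items)
jif_without = jif(citations_total - citations_from_dsmjs, citable_items)
boost = (jif_with - jif_without) / jif_without * 100

print(f"JIF with DSMJ citations:    {jif_with:.2f}")
print(f"JIF without DSMJ citations: {jif_without:.2f}")
print(f"Boost attributable to DSMJs: {boost:.1f}%")
```

With these invented counts the DSMJ citations lift the JIF from 2.38 to 3.00, a boost above the 20% threshold the study uses to flag beneficiary journals.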
This study finds that open access journals strengthen international academic communication and cooperation, build cross-border and cross-regional knowledge-sharing projects, realize interdisciplinary knowledge sharing and exchange, and, most importantly, provide a one-stop service for readers. This research indicates that, through the use of open impact metrics, it is possible to identify the portraits of open access journals, thus providing a new method to construct and reform open access journal evaluation systems.
Open data is receiving increased attention and support in academic environments, with one justification being that shared data may be re-used in further research. But what evidence exists for such re-use, and what is the relationship between the producers of shared datasets and researchers who use them? Using a sample of data citations from OpenAlex, this study investigates the relationship between creators and citers of datasets at the individual, institutional, and national levels. We find that the vast majority of datasets have no recorded citations, and that most cited datasets only have a single citation. Rates of self-citation by individuals and institutions tend towards the low end of previous findings and vary widely across disciplines. At the country level, the United States is by far the most prominent exporter of re-used datasets, while importation is more evenly distributed. Understanding where and how the sharing of data between researchers, institutions, and countries takes place is essential to developing open research practices.
This is the story of how a publisher and a citation index turned the science communication system into a highly profitable global industry. Over the course of seventy years, academic journal articles have become commodities, and their meta-data a further source of revenue. . . . During the 1950s, two men — Robert Maxwell and Eugene Garfield — began to experiment with their blueprint for the research economy. Maxwell created an ‘international’ publisher — Pergamon Press — charming the editors of elite, not-for-profit society journals into signing commercial contracts. Garfield invented the Science Citation Index to help librarians manage this growing flow of knowledge. . . . Sixty years later, the global science system has become a citation economy, with academic credibility mediated by the currency produced by the two dominant commercial citation indexes: Elsevier’s Scopus and Clarivate’s Web of Science. The reach of these citation indexes and their data analytics is amplified by digitisation, computing power and financial investment. . . . Non-Anglophone journals are disproportionately excluded from these indexes, reinforcing the stratification of academic credibility geographies and endangering long-established knowledge ecosystems.
This paper analyses whether journals that had been removed from the Directory of Open Access Journals (DOAJ) in 2018 due to suspected misconduct were cited within journals indexed in the Scopus database. Our analysis showed that Scopus contained over 15,000 references to the identified removed journals.
The report focuses on four key areas:
- Individuals and their publications: Addresses excessive self-citation in research publications, identifying outliers by examining the distinctive self-citation patterns observed among Highly Cited Researchers, while accounting for variations in citation rates between fields.
- Future research trends: Research Fronts identifies current areas of research attention by analyzing frequently cited, recent papers that cluster together, providing valuable insights for research planning, resource management and policy decisions.
- Journals and their characteristics: The profile and value of a journal in the Web of Science is more than its Journal Impact Factor. We explore how the indicator of national orientation (INO) offers new perspectives on journals, helping researchers choose the best venues for their papers.
- Influence of international collaboration: Simple metrics mask the influence of well-cited, internationally co-authored papers and so cannot properly assess them. Collaborative Citation Impact (Collab-CNCI) allows impact to be deconstructed, enabling better evaluation of domestic and international activity.
Based on the results, researchers should seek out grant funding and generously incorporate literature into their co-authored publications to increase their publications’ potential for future impact. These factors may influence article quality, resulting in more citations over time. Further research is needed to better understand their influence and the influence of other factors.
This work conducts a comprehensive exploration into the proficiency of OpenAI’s ChatGPT-4 in sourcing scientific references within an array of research disciplines. Our in-depth analysis encompasses a wide scope of fields including Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as their more specialized sub-domains. Our empirical findings indicate a significant variance in ChatGPT-4’s performance across these disciplines. Notably, the validity rate of suggested articles in CS, BME, and Medicine surpasses 65%, whereas in the realms of ME and EE, the model fails to verify any article as valid. Further, in the context of retrieving articles pertinent to niche research topics, ChatGPT-4 tends to yield references that align with the broader thematic areas as opposed to the narrowly defined topics of interest. This observed disparity underscores the pronounced variability in accuracy across diverse research fields, indicating the potential requirement for model refinement to enhance its functionality in academic research. Our investigation offers valuable insights into the current capacities and limitations of AI-powered tools in scholarly research, thereby emphasizing the indispensable role of human oversight and rigorous validation in leveraging such models for academic pursuits.
The relationship between open access and academic impact (usually measured as citations received from academic publications) has been extensively studied but remains a very controversial topic. However, the effect of open access on policy impact (measured as citations received from policy documents) is still unknown. The purpose of this study was to examine the effect of open access on the policy impact, which might initiate a new controversial topic. . . . Linear regression models, logit regression models, four other matching methods, open access status provided by different databases, and different sizes of data samples were used to check the robustness of the main results. This study revealed that open access had significant and positive effects on the policy impact.
Data reuse is a common practice in the social sciences. While published data play an essential role in the production of social science research, they are not consistently cited, which makes it difficult to assess their full scholarly impact and give credit to the original data producers. Furthermore, it can be challenging to understand researchers’ motivations for referencing data. Like references to academic literature, data references perform various rhetorical functions, such as paying homage, signaling disagreement, or drawing comparisons. This paper studies how and why researchers reference social science data in their academic writing. We develop a typology to model relationships between the entities that anchor data references, along with their features (access, actions, locations, styles, types) and functions (critique, describe, illustrate, interact, legitimize). We illustrate the use of the typology by coding multidisciplinary research articles (n = 30) referencing social science data archived at the Inter-university Consortium for Political and Social Research (ICPSR). We show how our typology captures researchers’ interactions with data and purposes for referencing data. Our typology provides a systematic way to document and analyze researchers’ narratives about data use, extending our ability to give credit to data that support research.
Scholarly communication is a complicated sector, with numerous participants and multiple mechanisms for communicating and reviewing materials created in an increasing variety of formats by researchers across the globe. In turn, the researcher who seeks to use the products of this system wishes to discover, access, and use relevant and trustworthy materials as effortlessly as possible. The work of driving efficiency into this complex sector while bringing its multiple strands together seamlessly for the reader (or, increasingly, for a computational user) rests on a foundation of infrastructure, much of it shared across multiple publishers. In this landscape review, we seek to provide a high-level overview of the shared infrastructure that supports scholarly communication.
In 2014, a union of German research organisations established Projekt DEAL, a national-level project to negotiate licensing agreements with large scientific publishers. Negotiations between DEAL and Elsevier began in 2016, and broke down without a successful agreement in 2018; in this time, around 200 German research institutions cancelled their license agreements with Elsevier, leading Elsevier to restrict journal access at those institutions. We investigated the effect on researchers’ publishing and citing behaviours from a bibliometric perspective, using a dataset of ~400,000 articles published by researchers at DEAL institutions between 2012–2020. We further investigated these effects with respect to the timing of contract cancellations, research disciplines, collaboration patterns, and article open-access status. We find evidence for a decrease in Elsevier’s market share of articles from DEAL institutions, with the largest year-on-year market share decreases occurring from 2018 to 2020 following the implementation of access restrictions. We also observe year-on-year decreases in the proportion of citations, although the decrease is smaller. We conclude that negotiations with Elsevier and access restrictions have led to some reduced willingness to publish in Elsevier journals, but that researchers are not strongly affected in their ability to cite Elsevier articles, implying that researchers use other methods to access scientific literature.
Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors without financial support from participating in OA publishing and from benefiting from the citation advantage of OA articles. OA may therefore exacerbate existing inequalities in the publication system rather than overcome them. To investigate this, we studied 522,411 articles published by Springer Nature. Employing correlation and regression analyses, we describe the relationship between authors affiliated with countries at different income levels, their choice of publishing model, and the citation impact of their papers. A machine learning classification method helped us explore the importance of different features in predicting the publishing model. The results show that authors eligible for APC waivers publish more in gold-OA journals than other authors. In contrast, authors eligible for an APC discount have the lowest ratio of OA publications, suggesting that this discount insufficiently motivates authors to publish in gold-OA journals. We found a strong correlation between journal rank and publishing model in gold-OA journals, whereas the OA option is mostly avoided in hybrid journals. The results also show that a country’s income level, and an author’s seniority and experience with OA publications, are the most predictive factors for OA publishing in hybrid journals.
The dominance of journal impact factors as a proxy for research quality and impact has been challenged, to the extent that academic impacts are being eroded from definitions of research impact all together. It’s one of many bandwagons that seem logical to jump on, but which don’t necessarily hold up under scrutiny. The publishing community needs to demonstrate that it is a following wind, not a headwind.
This study examined the impact of open peer review (OPR) on the usage and citations of scientific articles using a dataset of 6441 articles published in six Public Library of Science (PLoS) journals in 2020–2021. We compared OPR articles with their non-OPR counterparts in the same journal to determine whether OPR increased the visibility and citations of the articles. Our results demonstrated a positive association between OPR and higher article page views, saving, sharing, and a greater HTML to PDF conversion rate. However, we also found that OPR articles had a lower PDF to citations conversion rate compared to non-OPR articles.
These findings indicate that Facebook mentions of LIS papers mainly reflect institutional-level advocacy and attention, with a low level of engagement, and may be influenced by several features, including collaboration patterns and research topics.
Nearly two dozen journals from two of the fastest growing open-access publishers, including one of the world’s largest journals by volume, will no longer receive a key scholarly imprimatur. On 20 March, the Web of Science database said it delisted the journals along with dozens of others, stripping them of an impact factor, the citation-based measure of quality that, although controversial, carries weight with authors and institutions. . . . Clarivate initially did not name any of the delisted journals or provide specific reasons. But it confirmed to Science the identities of 19 Hindawi journals and two MDPI titles after reports circulated about their removals.
Biomedical fields have seen a remarkable increase in hybrid Gold open access articles. However, it is uncertain whether the hybrid Gold open access option confers a citation advantage, that is, an increase in the citations of articles made immediately available as open access, regardless of the article’s quality or whether it concerns a trending topic of discussion. This study aimed to compare the citation counts of hybrid Gold open access articles with those of subscription articles published in hybrid journals, and to ascertain whether hybrid Gold open access publication yields a citation advantage. This cross-sectional study included the list of hybrid journals under 59 categories in the "Clinical Medicine" group from Clarivate’s Journal Citation Reports (JCR) during 2018–2021. The number of citable items with ‘Gold Open Access’ and ‘Subscription and Free to Read’ in each journal, as well as the number of citations of those citable items, were extracted from JCR. A hybrid Gold open access citation advantage was computed by dividing the number of citations per citable item with hybrid Gold open access by the number of citations per citable item with a subscription. A total of 498, 636, 1009, and 1328 hybrid journals in the 2018 JCR, 2019 JCR, 2020 JCR, and 2021 JCR, respectively, were included in this study. The citation advantage of hybrid Gold open access articles over subscription articles in 2018 was 1.45 (95% confidence interval (CI), 1.24–1.65); in 2019, it was 1.31 (95% CI, 1.20–1.41); in 2020, it was 1.30 (95% CI, 1.20–1.39); and in 2021, it was 1.31 (95% CI, 1.20–1.42). In the ‘Clinical Medicine’ discipline, articles published in hybrid journals as hybrid Gold open access received more citations than those published under a subscription, self-archived, or otherwise openly accessible option.
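The citation-advantage ratio defined above is a simple division of per-item citation rates; a minimal sketch with invented journal-level counts:

```python
def citations_per_item(citations, citable_items):
    """Average citations received per citable item."""
    return citations / citable_items

def oa_citation_advantage(oa_citations, oa_items, sub_citations, sub_items):
    """Citations per hybrid Gold OA item divided by citations per
    subscription item; a ratio above 1 indicates an OA citation advantage."""
    return (citations_per_item(oa_citations, oa_items)
            / citations_per_item(sub_citations, sub_items))

# Hypothetical counts in the style of JCR data:
advantage = oa_citation_advantage(oa_citations=520, oa_items=100,
                                  sub_citations=1600, sub_items=400)
print(f"Citation advantage: {advantage:.2f}")  # 5.2 / 4.0 = 1.30
```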
This study examines the open access citation advantage of gold open access (OA) journal articles published at a large U.S. research university. Most studies that examine the open access citation advantage focus on specific journals, disciplines, countries or global output. Local citation patterns may differ from these larger patterns. . . . This study reports on a method and compares average citation counts for subscription and gold OA journal articles using Web of Science. Gold OA physics journals showed a definite open access citation advantage, whereas other disciplines showed no difference or no open access citation advantage.
Altmetrics are web-based quantitative impact or attention indicators for academic articles that have been proposed to supplement citation counts. This article reports the first assessment of the extent to which mature altmetrics from Altmetric.com and Mendeley associate with individual article quality scores. It exploits expert norm-referenced peer review scores from the UK Research Excellence Framework 2021 for 67,030+ journal articles in all fields 2014–2017/2018, split into 34 broadly field-based Units of Assessment (UoAs). Altmetrics correlated more strongly with research quality than previously found, although less strongly than raw and field normalized Scopus citation counts. Surprisingly, field normalizing citation counts can reduce their strength as a quality indicator for articles in a single field. For most UoAs, Mendeley reader counts are the best altmetric (e.g., three Spearman correlations with quality scores above 0.5), tweet counts are also a moderate strength indicator in eight UoAs (Spearman correlations with quality scores above 0.3), ahead of news (eight correlations above 0.3, but generally weaker), blogs (five correlations above 0.3), and Facebook (three correlations above 0.3) citations, at least in the United Kingdom. In general, altmetrics are the strongest indicators of research quality in the health and physical sciences and weakest in the arts and humanities.
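The Spearman correlations reported above can be reproduced in miniature. The following is a pure-Python sketch with invented data (the study used REF 2021 peer-review scores and altmetric counts at a far larger scale); Spearman's rho is the Pearson correlation of the two rank vectors:

```python
def ranks(values):
    """1-based average ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank of positions i..j
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    return rank

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical REF-style quality scores (1-4) and Mendeley reader counts:
quality_scores = [4, 3, 4, 2, 1, 3, 2, 4, 1, 3]
reader_counts = [80, 35, 60, 20, 5, 40, 15, 90, 10, 30]
rho = spearman(quality_scores, reader_counts)
print(f"Spearman rho = {rho:.2f}")
```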
We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts.
PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms "ophthalmology," "radiology," and "neurology." Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile.
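The percentile-rank-to-tertile labelling described above can be sketched in a few lines. The impact factors below are hypothetical, and the mapping follows the centile bands given in the text (first tertile 0-33rd, second 34th-66th, third 67th-100th):

```python
def percentile_ranks(values):
    """Percentile rank (0-100) of each value among all values in the list."""
    n = len(values)
    return [100 * sum(v < x for v in values) / (n - 1) for x in values]

def tertile(pct):
    """Map a percentile rank to the tertile bands used in the study."""
    if pct <= 33:
        return 1  # 0-33rd centile
    if pct <= 66:
        return 2  # 34th-66th centile
    return 3      # 67th-100th centile

impact_factors = [1.2, 2.5, 0.8, 6.4, 3.1, 10.2]  # hypothetical same-year JIFs
pcts = percentile_ranks(impact_factors)
labels = [tertile(p) for p in pcts]
print(list(zip(impact_factors, pcts, labels)))
```

Each article then inherits the tertile label of its accepting journal, which becomes the three-class target for the BERT, XGBoost, and logistic regression models.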
There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression.
Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.