"Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories"

Alberto Martín-Martín et al. have self-archived "Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories."

Here's an excerpt:

Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2,299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. . . . Despite the many unique GS citing sources, Spearman correlations between citation counts in GS and WoS or Scopus are high (0.78-0.99). They are lower in the Humanities, and lower between GS and WoS than between GS and Scopus. The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.
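The Spearman correlations cited in the excerpt compare how two sources rank the same documents by citation count, not the raw counts themselves. The following is a minimal sketch of that calculation in plain Python; the citation counts are made up for illustration, the function names are hypothetical, and this is not the authors' code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical citation counts for the same five documents in two sources.
gs_counts = [120, 85, 300, 40, 150]
wos_counts = [60, 50, 210, 10, 45]
rho = spearman(gs_counts, wos_counts)
print(round(rho, 2))
```

Because only ranks matter, one source can report far more citations than another and still correlate highly with it, which is consistent with the excerpt's observation.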

Research Data Curation Bibliography, Version 9 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

Search Results Ranking Using Machine-Learning Algorithms: "Best Match: New Relevance Search for PubMed"

Nicolas Fiorini et al. have published "Best Match: New Relevance Search for PubMed" in PLOS Biology.

Here's an excerpt:

PubMed is a free search engine for biomedical literature accessed by millions of users from around the world each day. With the rapid growth of biomedical literature—about two articles are added every minute on average—finding and retrieving the most relevant papers for a given query is increasingly challenging. We present Best Match, a new relevance search algorithm for PubMed that leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the traditional date sort order. The Best Match algorithm is trained with past user searches with dozens of relevance-ranking signals (factors), the most important being the past usage of an article, publication date, relevance score, and type of article. This new algorithm demonstrates state-of-the-art retrieval performance in benchmarking experiments as well as an improved user experience in real-world testing (over 20% increase in user click-through rate). Since its deployment in June 2017, we have observed a significant increase (60%) in PubMed searches with relevance sort order: it now assists millions of PubMed searches each week. In this work, we hope to increase the awareness and transparency of this new relevance sort option for PubMed users, enabling them to retrieve information more effectively.

"Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories"

Alberto Martín-Martín et al. have self-archived "Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories."

Here's an excerpt:

Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2,299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%-96%), far ahead of Scopus (35%-77%) and WoS (27%-73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%-38%). . . . The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.
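The two kinds of percentages in the excerpt (each source's share of all citations found, and the share of one source's citations that another source also found) reduce to simple set arithmetic. Here is a minimal sketch with made-up citation identifiers; the function names and data are assumptions for illustration, not the study's method or results.

```python
def coverage(found_by):
    """Map each source to the fraction of all known citations it found."""
    all_citations = set().union(*found_by.values())
    return {db: len(ids) / len(all_citations) for db, ids in found_by.items()}


def share_also_found(reference, other):
    """Fraction of the reference source's citations that the other source also found."""
    return len(reference & other) / len(reference)


# Hypothetical citing-document identifiers found by each source.
found_by = {
    "GS": {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"},
    "WoS": {"c1", "c2", "c3", "c10"},
    "Scopus": {"c1", "c2", "c4", "c5", "c10"},
}

shares = coverage(found_by)
wos_in_gs = share_also_found(found_by["WoS"], found_by["GS"])
scopus_in_gs = share_also_found(found_by["Scopus"], found_by["GS"])
print(shares, wos_in_gs, scopus_in_gs)
```

A source is a strict superset of another only when the second figure reaches 1.0; the excerpt's "essentially a superset" corresponds to values close to, but below, that.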

"Evidence of Open Access of Scientific Publications in Google Scholar: A Large-Scale Analysis"

Alberto Martín-Martín et al. have self-archived "Evidence of Open Access of Scientific Publications in Google Scholar: A Large-Scale Analysis."

Here's an excerpt:

This article uses Google Scholar (GS) as a source of data to analyse Open Access (OA) levels across all countries and fields of research. All articles and reviews with a DOI and published in 2009 or 2014 and covered by the three main citation indexes in the Web of Science (2,269,022 documents) were selected for study. The links to freely available versions of these documents displayed in GS were collected. To differentiate between more reliable (sustainable and legal) forms of access and less reliable ones, the data extracted from GS was combined with information available in DOAJ, CrossRef, OpenDOAR, and ROAR. This allowed us to distinguish the percentage of documents in our sample that are made OA by the publisher (23.1%, including Gold, Hybrid, Delayed, and Bronze OA) from those available as Green OA (17.6%), and those available from other sources (40.6%, mainly due to ResearchGate). The data shows an overall free availability of 54.6%, with important differences at the country and subject category levels. The data extracted from GS yielded very similar results to those found by other studies that analysed similar samples of documents, but employed different methods to find evidence of OA, thus suggesting a relative consistency among methods.

"Can Microsoft Academic Assess the Early Citation Impact of In-Press Articles? A Multi-Discipline Exploratory Analysis"

Kayvan Kousha et al. have self-archived "Can Microsoft Academic Assess the Early Citation Impact of In-Press Articles? A Multi-Discipline Exploratory Analysis."

Here's an excerpt:

For over 65,000 Scopus in-press articles from 2016 and 2017 across 26 fields, Microsoft Academic found 2-5 times as many citations as Scopus, depending on year and field. From manual checks of 1,122 Microsoft Academic citations not found in Scopus, Microsoft Academic's citation indexing was faster but not much wider than Scopus for journals. It achieved this by associating citations to preprints with their subsequent in-press versions and by extracting citations from in-press articles. In some fields its coverage of scholarly digital libraries, such as arXiv.org, was also an advantage. Thus, Microsoft Academic seems to be a more comprehensive automatic source of citation counts for in-press articles than Scopus.

"Try Our New, Experimental PubMed Search and User Interface in PubMed Labs"

NCBI has released "Try Our New, Experimental PubMed Search and User Interface in PubMed Labs."

Here's an excerpt:

NLM needs your input. We are experimenting with a new PubMed search algorithm, as well as a modern, mobile-first user interface, and want to know what you think. You can try out these experimental elements at PubMed Labs, a website we created for the very purpose of giving potential new PubMed features a test drive and gathering user opinions.

"EU Research Committee Wants to Gift Publishers New Rights to Restrict Access to Scientific Research"

COMMUNIA has released "EU Research Committee Wants to Gift Publishers New Rights to Restrict Access to Scientific Research."

Here's an excerpt:

Last week the Culture and Education Committee (CULT) and the Committee on Industry, Research and Energy (ITRE) voted on their final opinions on the Commission’s Directive on Copyright in the Digital Single Market. . . .

The introduction of a new right for press publishers (aka the “link tax”) to extract fees from search engines for incorporating short snippets of—or even linking to—their content in article 11 is one of the most controversial issues of the proposed directive. Adopting this type of ancillary right at the EU level would have a strong negative impact on all stakeholders, including publishers, authors, journalists, researchers, online service providers, and readers. . . .

In the votes last week in the CULT and ITRE committees, the press publishers right was also carried through – and even expanded. Both of the recent opinions remove the restriction that the right applies to digital uses only, meaning that if adopted it would cover all uses—both digital and in print. Even worse, ITRE—the committee responsible for policy relating to the promotion of research—voted to extend the press publishers right to cover scientific publications.

"An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians’ Practice and Research Agenda"

Jody C. Fagan has self-archived "An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians' Practice and Research Agenda."

Here's an excerpt:

While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention. Recent studies have much to tell us about the coverage and utility of Google Scholar, its coverage of the sciences, and its utility for evaluating researcher impact. But other aspects have been understudied, such as coverage of the arts and humanities, books, and non-Western, non-English publications. User research has also tapered off. A small number of articles hint at the opportunity for librarians to become expert advisors concerning opportunities of scholarly communication made possible or enhanced by these platforms. This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years with a mind to informing practice and setting a research agenda. Selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.

"The Coverage of Microsoft Academic: Analyzing the Publication Output of a University"

Sven E. Hug and Martin P. Braendle have self-archived "The Coverage of Microsoft Academic: Analyzing the Publication Output of a University."

Here's an excerpt:

This is the first in-depth study on the coverage of Microsoft Academic (MA). The coverage of a verified publication list of a university was analyzed on the level of individual publications in MA, Scopus, and Web of Science (WoS). Citation counts were analyzed and issues related to data retrieval and data quality were examined. . . . MA surpasses Scopus and WoS clearly with respect to book-related document types and conference items but falls slightly behind Scopus with regard to journal articles. MA shows the same biases as Scopus and WoS with regard to the coverage of the social sciences and humanities, non-English publications, and open-access publications. Rank correlations of citation counts are high between MA and the benchmark databases. . . . Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA data are still lacking.

"'Just Google It'—The Scope of Freely Available Information Sources for Doctoral Thesis Writing"

Vincas Grigas et al. have published "'Just Google It'—The Scope of Freely Available Information Sources for Doctoral Thesis Writing" in Information Research.

Here's an excerpt:

Library collections and subscribed databases could cover up to 80 per cent of all information resources used in doctoral theses. Among the most significant findings to emerge from this study is the fact that on average more than half (57 per cent) of all utilised information resources were freely available or were accessed without library support. We may presume that the library as a direct intermediator for information users is potentially important and irreplaceable only in four out of ten attempts of PhD students to seek information.

Creative Commons Releases CC Search Beta

Creative Commons has released CC Search Beta.

Here's an excerpt from the announcement:

Our goal is to cover the whole commons, but we wanted to develop something people could test and react to that would be useful at launch. To build our beta, we settled on a goal to represent one percent of the known Commons, or about 10 million works, and we chose a vertical slice of images only, to fully explore a purpose-built interface that represented one type but many providers. . . .

After a detailed review of potential sources, the available APIs, and the quality of their datasets, we selected the Rijksmuseum, Flickr, 500px, and the New York Public Library as our initial sources. Later, after discussions with the Metropolitan Museum of Art regarding their collection of public domain works, which were released under CC0 on February 7, 2017, we incorporated their 200,000 CC0 images as well. . . .

The prototype of this tool focuses on photos as its first media and uses open APIs in order to index the available works. The search filters allow users to search by license type, title, creator, tags, collection, and type of institution.

CC Search Beta also provides social features, allowing users to create and share lists as well as add tags and favorites to the objects in the commons, and save their searches. Finally, it incorporates one-click attribution, giving users pre-formatted copy for easy attribution.

"Citation Analysis with Microsoft Academic"

Sven E. Hug, Michael Ochsner, and Martin P. Braendle have self-archived "Citation Analysis with Microsoft Academic."

Here's an excerpt:

We explored if and how Microsoft Academic (MA) could be used for bibliometric analyses. First, we examined the Academic Knowledge API (AK API), an interface to access MA data. Second, we performed a comparative citation analysis of researchers by normalizing data from MA and Scopus. We found that MA offers structured and rich metadata, which facilitates data retrieval, handling and processing. In addition, the AK API allows retrieving histograms. These features have to be considered a major advantage of MA over Google Scholar. However, there are two serious limitations regarding the available metadata. First, MA does not provide the document type of a publication and, second, the 'fields of study' are dynamic, too fine-grained and field-hierarchies are incoherent. Nevertheless, we showed that average-based indicators as well as distribution-based indicators can be calculated with MA data. We postulate that MA has the potential to be used for fully-fledged bibliometric analyses.

"2016 Scholar Metrics Released"

Google has published "2016 Scholar Metrics Released."

Here's an excerpt:

Scholar Metrics provide an easy way for authors to quickly gauge the visibility and influence of recent articles in scholarly publications. Today, we are releasing the 2016 version of Scholar Metrics. This release covers articles published in 2011-2015 and includes citations from all articles that were indexed in Google Scholar as of June 2016.

The top 100 publications include e-print servers and open access journals, such as arXiv Cosmology and Extragalactic Astrophysics (astro-ph.CO), arXiv High Energy Physics – Experiment (hep-ex), PLoS ONE, and PLoS Genetics.

"A Two-Sided Academic Landscape: Portrait of Highly-Cited Documents in Google Scholar (1950-2013)"

Alberto Martín-Martín et al. have self-archived "A Two-Sided Academic Landscape: Portrait of Highly-Cited Documents in Google Scholar (1950-2013)."

Here's an excerpt:

Since the existence of a full-text link does not guarantee the disposal of the full-text (some links actually refer to publisher's abstracts), the results (40% of the documents had a free full-text link) might be somewhat overestimated. In any case, these values are consistent with those published by Archambault et al. (2013), who found that over 40% of the articles from their sample were freely accessible; higher than those by Khabsa and Giles (2014) and Björk et al. (2010), who found only a 24% and 20.4% of open access documents respectively; and much lower than Jamali and Nabavi (2015) and Pitol and De Groote (2014), who found 61.1% and 70% respectively.

The different nature of the samples makes it difficult to draw comparisons among these studies. Nonetheless, the sample used in this study (64,000 documents) is the largest ever used to date.

"The Economics of Book Digitization and the Google Books Litigation"

Hannibal Travis has self-archived "The Economics of Book Digitization and the Google Books Litigation."

Here's an excerpt:

This piece explores the digitization and uploading to the Internet of full-text books, book previews in the form of chapters or snippets, and databases that index the contents of book collections. Along the way, it will describe the economics of copyright, the "digital dilemma," and controversies surrounding fair use arguments in the digital environment. It illustrates the deadweight losses from restricting digital libraries, book previews, copyright litigation settlements, and dual-use technologies that enable infringement but also fair use.

"Policy: Google Books: The Final Chapter?"

Walt Crawford has published "Policy: Google Books: The Final Chapter?" in Cites & Insights: Crawford at Large.

Here's an excerpt:

On Monday, April 18, 2016, the U.S. Supreme Court declined to hear the Authors Guild appeal of a district court decision finding, once again, that Google Books Search is fair use. . . .

That should be the final chapter in this decade-long epic case, and maybe I should stop right here.

But let's look at a couple of the early commentaries after the denial (two of many), then go back for the usual chronological citations and notes on items since the last coverage of this legal marathon. The question mark in the essay's title? Well, the Authors Litigation Guild (the middle word isn't part of the name, but maybe it should be) seems as incapable of admitting defeat as it apparently is of recognizing that it only represents the interests of a few hundred or few thousand writers. And, of course, there's the enticing if unlikely counter possibility: what if Google asked to recover its legal costs, which must surely be in the millions of dollars?

See also: “Google Case Ends, but Copyright Fight Goes On.”
