Category: Search Engines and Discovery Systems
"Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories"
Alberto Martín-Martín et al. have self-archived "Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories."
Here's an excerpt:
Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2,299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%-96%), far ahead of Scopus (35%-77%) and WoS (27%-73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%-38%).. . . The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.
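The coverage and overlap percentages quoted in the excerpt can be read as simple set ratios. The following is a minimal sketch under that assumption; the function names and the toy citation identifiers are hypothetical and are not taken from the study, whose actual matching procedure is not described in the excerpt.

```python
def coverage(sources):
    """Return each source's share (in percent) of the union of all citations."""
    union = set().union(*sources.values())
    return {name: 100 * len(found) / len(union) for name, found in sources.items()}


def overlap(a, b):
    """Return the percentage of citations in b that are also found in a."""
    return 100 * len(a & b) / len(b)


# Toy data: hypothetical citation identifiers, not figures from the study.
sources = {
    "GS": {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"},
    "WoS": {"c1", "c2", "c3", "c10"},
    "Scopus": {"c1", "c2", "c4", "c5", "c10"},
}

shares = coverage(sources)                           # share of all citations per source
gs_of_wos = overlap(sources["GS"], sources["WoS"])   # share of WoS citations GS also found
print(shares, gs_of_wos)
```

In this reading, a source is "essentially a superset" of another when the overlap value approaches 100 percent.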
Academic Library as Scholarly Publisher Bibliography | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap
"AP Exclusive: Google Tracks Your Movements, Like It or Not"
"1science Launches 1findr, the World’s Largest Curated Collection of Peer-Reviewed Articles"
"Evidence of Open Access of Scientific Publications in Google Scholar: A Large-Scale Analysis"
Alberto Martín-Martín et al. have self-archived "Evidence of Open Access of Scientific Publications in Google Scholar: A Large-Scale Analysis."
Here's an excerpt:
This article uses Google Scholar (GS) as a source of data to analyse Open Access (OA) levels across all countries and fields of research. All articles and reviews with a DOI and published in 2009 or 2014 and covered by the three main citation indexes in the Web of Science (2,269,022 documents) were selected for study. The links to freely available versions of these documents displayed in GS were collected. To differentiate between more reliable (sustainable and legal) forms of access and less reliable ones, the data extracted from GS was combined with information available in DOAJ, CrossRef, OpenDOAR, and ROAR. This allowed us to distinguish the percentage of documents in our sample that are made OA by the publisher (23.1%, including Gold, Hybrid, Delayed, and Bronze OA) from those available as Green OA (17.6%), and those available from other sources (40.6%, mainly due to ResearchGate). The data shows an overall free availability of 54.6%, with important differences at the country and subject category levels. The data extracted from GS yielded very similar results to those found by other studies that analysed similar samples of documents, but employed different methods to find evidence of OA, thus suggesting a relative consistency among methods.
Research Data Curation Bibliography, Version 8 | Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap
"Can Microsoft Academic Assess the Early Citation Impact of In-Press Articles? A Multi-Discipline Exploratory Analysis"
Kayvan Kousha et al. have self-archived "Can Microsoft Academic Assess the Early Citation Impact of In-Press Articles? A Multi-Discipline Exploratory Analysis."
Here's an excerpt:
For over 65,000 Scopus in-press articles from 2016 and 2017 across 26 fields, Microsoft Academic found 2-5 times as many citations as Scopus, depending on year and field. From manual checks of 1,122 Microsoft Academic citations not found in Scopus, Microsoft Academic's citation indexing was faster but not much wider than Scopus for journals. It achieved this by associating citations to preprints with their subsequent in-press versions and by extracting citations from in-press articles. In some fields its coverage of scholarly digital libraries, such as arXiv.org, was also an advantage. Thus, Microsoft Academic seems to be a more comprehensive automatic source of citation counts for in-press articles than Scopus.
"Try Our New, Experimental PubMed Search and User Interface in PubMed Labs"
NCBI has released "Try Our New, Experimental PubMed Search and User Interface in PubMed Labs."
Here's an excerpt:
NLM needs your input. We are experimenting with a new PubMed search algorithm, as well as a modern, mobile-first user interface, and want to know what you think. You can try out these experimental elements at PubMed Labs, a website we created for the very purpose of giving potential new PubMed features a test drive and gathering user opinions.
"EU Research Committee Wants to Gift Publishers New Rights to Restrict Access to Scientific Research"
COMMUNIA has released "EU Research Committee Wants to Gift Publishers New Rights to Restrict Access to Scientific Research."
Here's an excerpt:
Last week the Culture and Education Committee (CULT) and the Committee on Industry, Research and Energy (ITRE) voted on their final opinions on the Commission’s Directive on Copyright in the Digital Single Market. . . .
The introduction of a new right for press publishers (aka the “link tax”) to extract fees from search engines for incorporating short snippets of—or even linking to—their content in article 11 is one of the most controversial issues of the proposed directive. Adopting this type of ancillary right at the EU level would have a strong negative impact on all stakeholders, including publishers, authors, journalists, researchers, online service providers, and readers. . . .
In the votes last week in the CULT and ITRE committees, the press publishers right was also carried through – and even expanded. Both of the recent opinions remove the restriction that the right applies to digital uses only, meaning that if adopted it would cover all uses—both digital and in print. Even worse, ITRE—the committee responsible for policy relating to the promotion of research—voted to extend the press publishers right to cover scientific publications.
Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap
"An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians’ Practice and Research Agenda"
Jody C. Fagan has self-archived "An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians' Practice and Research Agenda."
Here's an excerpt:
While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention. Recent studies have much to tell us about the coverage and utility of Google Scholar, its coverage of the sciences, and its utility for evaluating researcher impact. But other aspects have been understudied, such as coverage of the arts and humanities, books, and non-Western, non-English publications. User research has also tapered off. A small number of articles hint at the opportunity for librarians to become expert advisors concerning opportunities of scholarly communication made possible or enhanced by these platforms. This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years with a mind to informing practice and setting a research agenda. Selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.
"The Coverage of Microsoft Academic: Analyzing the Publication Output of a University"
Sven E. Hug and Martin P. Braendle have self-archived "The Coverage of Microsoft Academic: Analyzing the Publication Output of a University."
Here's an excerpt:
This is the first in-depth study on the coverage of Microsoft Academic (MA). The coverage of a verified publication list of a university was analyzed on the level of individual publications in MA, Scopus, and Web of Science (WoS). Citation counts were analyzed and issues related to data retrieval and data quality were examined. . . . MA surpasses Scopus and WoS clearly with respect to book-related document types and conference items but falls slightly behind Scopus with regard to journal articles. MA shows the same biases as Scopus and WoS with regard to the coverage of the social sciences and humanities, non-English publications, and open-access publications. Rank correlations of citation counts are high between MA and the benchmark databases. . . . Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA data are still lacking.
"'Just Google It'—The Scope of Freely Available Information Sources for Doctoral Thesis Writing"
Vincas Grigas et al. have published "'Just Google It'—The Scope of Freely Available Information Sources for Doctoral Thesis Writing" in Information Research.
Here's an excerpt:
Library collections and subscribed databases could cover up to 80 per cent of all information resources used in doctoral theses. Among the most significant findings to emerge from this study is the fact that on average more than half (57 per cent) of all utilised information resources were freely available or were accessed without library support. We may presume that the library as a direct intermediator for information users is potentially important and irreplaceable only in four out of ten attempts of PhD students to seek information.
Creative Commons Releases CC Search Beta
The Creative Commons has released CC Search Beta.
Here's an excerpt from the announcement:
Our goal is to cover the whole commons, but we wanted to develop something people could test and react to that would be useful at launch. To build our beta, we settled on a goal to represent one percent of the known Commons, or about 10 million works, and we chose a vertical slice of images only, to fully explore a purpose-built interface that represented one type but many providers. . . .
After a detailed review of potential sources, the available APIs, and the quality of their datasets, we selected the Rijksmuseum, Flickr, 500px, and the New York Public Library as our initial sources. Later, after discussions with the Metropolitan Museum of Art regarding their collection of public domain works, which were released under CC0 on February 7, 2017, we incorporated their 200,000 CC0 images as well. . . .
The prototype of this tool focuses on photos as its first media and uses open APIs in order to index the available works. The search filters allow users to search by license type, title, creator, tags, collection, and type of institution.
CC Search Beta also provides social features, allowing users to create and share lists as well as add tags and favorites to the objects in the commons, and save their searches. Finally, it incorporates one-click attribution, giving users pre-formatted copy for easy attribution.
"Citation Analysis with Microsoft Academic"
Sven E. Hug, Michael Ochsner, and Martin P. Braendle have self-archived "Citation Analysis with Microsoft Academic."
Here's an excerpt:
We explored if and how Microsoft Academic (MA) could be used for bibliometric analyses. First, we examined the Academic Knowledge API (AK API), an interface to access MA data. Second, we performed a comparative citation analysis of researchers by normalizing data from MA and Scopus. We found that MA offers structured and rich metadata, which facilitates data retrieval, handling and processing. In addition, the AK API allows retrieving histograms. These features have to be considered a major advantage of MA over Google Scholar. However, there are two serious limitations regarding the available metadata. First, MA does not provide the document type of a publication and, second, the 'fields of study' are dynamic, too fine-grained and field-hierarchies are incoherent. Nevertheless, we showed that average-based indicators as well as distribution-based indicators can be calculated with MA data. We postulate that MA has the potential to be used for fully-fledged bibliometric analyses.
"2016 Scholar Metrics Released"
Google has published "2016 Scholar Metrics Released."
Here's an excerpt:
Scholar Metrics provide an easy way for authors to quickly gauge the visibility and influence of recent articles in scholarly publications. Today, we are releasing the 2016 version of Scholar Metrics. This release covers articles published in 2011-2015 and includes citations from all articles that were indexed in Google Scholar as of June 2016.
The top 100 publications include e-print servers and open access journals, such as arXiv Cosmology and Extragalactic Astrophysics (astro-ph.CO), arXiv High Energy Physics – Experiment (hep-ex), PLoS ONE, and PLoS Genetics.
"A Two-Sided Academic Landscape: Portrait of Highly-Cited Documents in Google Scholar (1950-2013)"
Alberto Martín-Martín et al. have self-archived "A Two-Sided Academic Landscape: Portrait of Highly-Cited Documents in Google Scholar (1950-2013)."
Here's an excerpt:
Since the existence of a full-text link does not guarantee the availability of the full text (some links actually refer to publishers' abstracts), the results (40% of the documents had a free full-text link) might be somewhat overestimated. In any case, these values are consistent with those published by Archambault et al. (2013), who found that over 40% of the articles from their sample were freely accessible; higher than those by Khabsa and Giles (2014) and Björk et al. (2010), who found only 24% and 20.4% of documents to be open access, respectively; and much lower than Jamali and Nabavi (2015) and Pitol and De Groote (2014), who found 61.1% and 70%, respectively.
The different nature of the samples makes it difficult to draw comparisons among these studies. Nonetheless, the sample used in this study (64,000 documents) is the largest ever used to date.
"The Economics of Book Digitization and the Google Books Litigation"
Hannibal Travis has self-archived "The Economics of Book Digitization and the Google Books Litigation."
Here's an excerpt from the announcement:
This piece explores the digitization and uploading to the Internet of full-text books, book previews in the form of chapters or snippets, and databases that index the contents of book collections. Along the way, it will describe the economics of copyright, the "digital dilemma," and controversies surrounding fair use arguments in the digital environment. It illustrates the deadweight losses from restricting digital libraries, book previews, copyright litigation settlements, and dual-use technologies that enable infringement but also fair use.
"Policy: Google Books: The Final Chapter?"
Walt Crawford has published "Policy: Google Books: The Final Chapter?" in Cites & Insights: Crawford at Large.
Here's an excerpt:
On Monday, April 18, 2016, the U.S. Supreme Court declined to hear the Authors Guild appeal of a district court decision finding, once again, that Google Books Search is fair use. . . .
That should be the final chapter in this decade-long epic case, and maybe I should stop right here.
But let's look at a couple of the early commentaries after the denial (two of many), then go back for the usual chronological citations and notes on items since the last coverage of this legal marathon. The question mark in the essay's title? Well, the Authors Litigation Guild (the middle word isn't part of the name, but maybe it should be) seems as incapable of admitting defeat as it apparently is of recognizing that it only represents the interests of a few hundred or few thousand writers. And, of course, there's the enticing if unlikely counter possibility: what if Google asked to recover its legal costs, which must surely be in the millions of dollars?
See also: “Google Case Ends, but Copyright Fight Goes On.”
"Back to the Past: On the Shoulders of an Academic Search Engine Giant"
Alberto Martín-Martín et al. have self-archived "Back to the Past: On the Shoulders of an Academic Search Engine Giant."
Here's an excerpt:
A study released by the Google Scholar team found an apparently increasing fraction of citations to old articles from studies published in the last 24 years (1990-2013). To demonstrate this finding we conducted a complementary study using a different data source (Journal Citation Reports), metric (aggregate cited half-life), time span (2003-2013), and set of categories (53 Social Science subject categories and 167 Science subject categories). Although the results obtained confirm and reinforce the previous findings, the possible causes of this phenomenon remain unclear. We finally hypothesize that the first page results syndrome, in conjunction with the fact that Google Scholar favours the most cited documents, suggests that the growing trend of citing old documents is partly caused by Google Scholar.
Massive Yahoo News Feed Dataset Released
Yahoo has released a massive News Feed dataset.
Here's an excerpt from the announcement:
The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate. The dataset stands at a massive ~110B lines (1.5TB bzipped) of user-news item interaction data, collected by recording the user-news item interaction of about 20M users from February 2015 to May 2015.
"Google Scholar as a Tool for Discovering Journal Articles in Library and Information Science"
Dirk Lewandowski has self-archived "Google Scholar as a Tool for Discovering Journal Articles in Library and Information Science."
Here's an excerpt:
We found that only some journals are completely indexed by Google Scholar, that the ratio of versions available depends on the type of publisher, and that availability varies a lot from journal to journal. Google Scholar cannot substitute for abstracting and indexing services in that it does not cover the complete literature of the field. However, it can be used in many cases to easily find available full texts of articles already found using another tool.
"Fair Use in the Digital Age: Reflections on the Fair Use Doctrine in Copyright Law"
The Program on Information Justice and Intellectual Property at the American University Washington College of Law has released a digital video of Judge Pierre N. Leval's "Fair Use in the Digital Age: Reflections on the Fair Use Doctrine in Copyright Law" lecture.
Here's an excerpt from the announcement:
At the Fourth Annual Peter A. Jaszi Distinguished Lecture in Intellectual Property, Judge Pierre N. Leval of the United States Court of Appeals for the Second Circuit will present a lecture on the role of the fair use doctrine within the structure of copyright law. Judge Leval is responsible for introducing the concept of transformative use to United States fair use jurisprudence and will discuss the development of the doctrine to date. He is the author of the court's opinion in Authors Guild Inc., et al. v. Google, Inc. (October 16, 2015) in which the court held that Google's digitization of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. Judge Leval also authored Toward a Fair Use Standard, 103 HARV. L. REV. 1105 (1990).
"Does Google Scholar Contain All Highly Cited Documents (1950-2013)?"
Alberto Martín-Martín et al. have self-archived "Does Google Scholar Contain All Highly Cited Documents (1950-2013)?"
Here's an excerpt:
The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light about the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below:
1. Which are the most cited documents in GS?
2. Which are the most cited document types in GS?
3. In what languages are the most cited documents in GS written?
4. How many highly cited documents are freely accessible?
4.1 What file types are the most commonly used to store these highly cited documents?
4.2 Which are the main providers of these documents?
5. How many of the highly cited documents indexed by GS are also indexed by WoS?
6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of citations they have received in WoS?
7. How many versions of these highly cited documents has GS detected?
8. Is there a correlation between the number of versions GS has detected for these documents, and the number of citations they have received?
9. Is there a correlation between the number of versions GS has detected for these documents, and their position in the search engine result pages?
10. Is there some relation between the positions these documents occupy in the search engine result pages, and the number of citations they have received?
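Questions 6, 8, 9, and 10 above all ask whether two per-document measures move together. The excerpt does not say which statistic the authors used; one common choice for skewed citation data is the Spearman rank correlation, sketched below in plain Python. The function names and the toy citation counts are hypothetical and serve only as an illustration.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: hypothetical citation counts for five documents.
gs_citations = [1200, 950, 800, 400, 300]
wos_citations = [600, 500, 450, 100, 150]
rho = spearman(gs_citations, wos_citations)
print(round(rho, 3))
```

A value of rho near 1 would indicate that the two sources rank the documents in nearly the same order, even if the absolute counts differ widely.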
Digital Scholarship | "A Quarter-Century as an Open Access Publisher"
Google Settles American Society of Media Photographers, Inc. et al. v. Google Inc.
Google has settled the American Society of Media Photographers, Inc. et al. v. Google Inc. lawsuit. The agreement is confidential.
Here's an excerpt from the press release:
The agreement resolves a copyright infringement lawsuit filed against Google in April, 2010, bringing to an end more than four years of litigation. It does not involve any admission of liability by Google. As the settlement is between the parties to the litigation, the court is not required to approve its terms.
This settlement does not affect Google's current litigation with the Authors Guild or otherwise address the underlying questions in that suit.
"EFF Urges Appeals Court to Keep Protecting Fair Use"
EFF has released "EFF Urges Appeals Court to Keep Protecting Fair Use."
Here's an excerpt:
In this latest appeal, the Authors Guild (and its supporters) claim that fair use is being unjustly expanded, portraying Judge Chin's ruling and other recent court opinions as some kind of fair-use creep, stretching beyond the original intent of the doctrine. Specifically, the Guild argues that the first of the four statutory fair use factors—the purpose of the use, which asks whether the use of the copyrighted material is transformative and/or non-commercial—weighs against Google. The Authors Guild and its amici insist that a use cannot be transformative if it doesn't add new creative expression to the pre-existing work. But as Judge Chin so rightly recognized, a use can be transformative if it serves a new and distinct purpose.
"The Dark Side of Open Access in Google and Google Scholar: The Case of Latin-American Repositories"
Enrique Orduña-Malea et al. have self-archived "The Dark Side of Open Access in Google and Google Scholar: The Case of Latin-American Repositories."
Here's an excerpt:
The main objective of this study is to ascertain the presence and visibility of Latin American repositories in Google and Google Scholar through the application of page count and visibility indicators. For a sample of 137 repositories, the results indicate that the indexing ratio is low in Google, and virtually nonexistent in Google Scholar; they also indicate a complete lack of correspondence between the repository records and the data produced by these two search tools. These results are mainly attributable to limitations arising from the use of description schemas that are incompatible with Google Scholar (repository design) and the reliability of web indicators (search engines). We conclude that neither Google nor Google Scholar accurately represent the actual size of open access content published by Latin American repositories; this may indicate a non-indexed, hidden side to open access, which could be limiting the dissemination and consumption of open access scholarly literature.
Digital Scholarship | Digital Scholarship Publications Overview | Sitemap