"2016 Scholar Metrics Released"

Google has published "2016 Scholar Metrics Released."

Here's an excerpt:

Scholar Metrics provide an easy way for authors to quickly gauge the visibility and influence of recent articles in scholarly publications. Today, we are releasing the 2016 version of Scholar Metrics. This release covers articles published in 2011-2015 and includes citations from all articles that were indexed in Google Scholar as of June 2016.

The top 100 publications include e-print servers and open access journals, such as arXiv Cosmology and Extragalactic Astrophysics (astro-ph.CO), arXiv High Energy Physics – Experiment (hep-ex), PLoS ONE, and PLoS Genetics.

Digital Curation and Digital Preservation Works | Open Access Works | Digital Scholarship | Digital Scholarship Sitemap

"A Two-Sided Academic Landscape: Portrait of Highly-Cited Documents in Google Scholar (1950-2013)"

Alberto Martin-Martin et al. have self-archived "A Two-Sided Academic Landscape: Portrait of Highly-Cited Documents in Google Scholar (1950-2013)."

Here's an excerpt:

Since the existence of a full-text link does not guarantee the disposal of the full-text (some links actually refer to publisher's abstracts), the results (40% of the documents had a free full-text link) might be somewhat overestimated. In any case, these values are consistent with those published by Archambault et al. (2013), who found that over 40% of the articles from their sample were freely accessible; higher than those by Khabsa and Giles (2014) and Björk et al. (2010), who found only a 24% and 20.4% of open access documents respectively; and much lower than Jamali and Nabavi (2015) and Pitol and De Groote (2014), who found 61.1% and 70% respectively.

The different nature of the samples makes it difficult to draw comparisons among these studies. Nonetheless, the sample used in this study (64,000 documents) is the largest ever used to date.

"The Economics of Book Digitization and the Google Books Litigation"

Hannibal Travis has self-archived "The Economics of Book Digitization and the Google Books Litigation."

Here's an excerpt from the announcement:

This piece explores the digitization and uploading to the Internet of full-text books, book previews in the form of chapters or snippets, and databases that index the contents of book collections. Along the way, it will describe the economics of copyright, the "digital dilemma," and controversies surrounding fair use arguments in the digital environment. It illustrates the deadweight losses from restricting digital libraries, book previews, copyright litigation settlements, and dual-use technologies that enable infringement but also fair use.

"Policy: Google Books: The Final Chapter?"

Walt Crawford has published "Policy: Google Books: The Final Chapter?" in Cites & Insights: Crawford at Large.

Here's an excerpt:

On Monday, April 18, 2016, the U.S. Supreme Court declined to hear the Authors Guild appeal of a district court decision finding, once again, that Google Books Search is fair use. . . .

That should be the final chapter in this decade-long epic case, and maybe I should stop right here.

But let's look at a couple of the early commentaries after the denial (two of many), then go back for the usual chronological citations and notes on items since the last coverage of this legal marathon. The question mark in the essay's title? Well, the Authors Litigation Guild (the middle word isn't part of the name, but maybe it should be) seems as incapable of admitting defeat as it apparently is of recognizing that it only represents the interests of a few hundred or few thousand writers. And, of course, there's the enticing if unlikely counter possibility: what if Google asked to recover its legal costs, which must surely be in the millions of dollars?

See also: "Google Case Ends, but Copyright Fight Goes On."

"Back to the Past: On the Shoulders of an Academic Search Engine Giant"

Alberto Martin-Martin et al. have self-archived "Back to the Past: On the Shoulders of an Academic Search Engine Giant."

Here's an excerpt:

A study released by the Google Scholar team found an apparently increasing fraction of citations to old articles from studies published in the last 24 years (1990-2013). To demonstrate this finding we conducted a complementary study using a different data source (Journal Citation Reports), metric (aggregate cited half-life), time span (2003-2013), and set of categories (53 Social Science subject categories and 167 Science subject categories). Although the results obtained confirm and reinforce the previous findings, the possible causes of this phenomenon keep unclear. We finally hypothesize that first page results syndrome in conjunction with the fact that Google Scholar favours the most cited documents are suggesting the growing trend of citing old documents is partly caused by Google Scholar.

Digital Scholarship | Digital Scholarship Sitemap

Massive Yahoo News Feed Dataset Released

Yahoo has released a massive News Feed dataset.

Here's an excerpt from the announcement:

The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate. The dataset stands at a massive ~110B lines (1.5TB bzipped) of user-news item interaction data, collected by recording the user-news item interaction of about 20M users from February 2015 to May 2015.

"Google Scholar as a Tool for Discovering Journal Articles in Library and Information Science"

Dirk Lewandowski has self-archived "Google Scholar as a Tool for Discovering Journal Articles in Library and Information Science."

Here's an excerpt:

We found that only some journals are completely indexed by Google Scholar, that the ratio of versions available depends on the type of publisher, and that availability varies a lot from journal to journal. Google Scholar cannot substitute for abstracting and indexing services in that it does not cover the complete literature of the field. However, it can be used in many cases to easily find available full texts of articles already found using another tool.

"Fair Use in the Digital Age: Reflections on the Fair Use Doctrine in Copyright Law"

The Program on Information Justice and Intellectual Property at the American University Washington College of Law has released a digital video of Judge Pierre N. Leval's "Fair Use in the Digital Age: Reflections on the Fair Use Doctrine in Copyright Law" lecture.

Here's an excerpt from the announcement:

At the Fourth Annual Peter A. Jaszi Distinguished Lecture in Intellectual Property, Judge Pierre N. Leval of the United States Court of Appeals for the Second Circuit will present a lecture on the role of the fair use doctrine within the structure of copyright law. Judge Leval is responsible for introducing the concept of transformative use to United States fair use jurisprudence and will discuss the development of the doctrine to date. He is the author of the court's opinion in Authors Guild Inc., et al. v. Google, Inc. (October 16, 2015) in which the court held that Google's digitization of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. Judge Leval also authored Toward a Fair Use Standard, 103 HARV. L. REV. 1105 (1990).

"Does Google Scholar Contain All Highly Cited Documents (1950-2013)?"

Alberto Martín-Martín et al. have self-archived "Does Google Scholar Contain All Highly Cited Documents (1950-2013)?"

Here's an excerpt:

The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light about the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below:

1. Which are the most cited documents in GS?
2. Which are the most cited document types in GS?
3. What languages are the most cited documents written in GS?
4. How many highly cited documents are freely accessible?
4.1 What file types are the most commonly used to store these highly cited documents?
4.2 Which are the main providers of these documents?
5. How many of the highly cited documents indexed by GS are also indexed by WoS?
6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of citations they have received in WoS?
7. How many versions of these highly cited documents has GS detected?
8. Is there a correlation between the number of versions GS has detected for these documents, and the number citations they have received?
9. Is there a correlation between the number of versions GS has detected for these documents, and their position in the search engine result pages?
10. Is there some relation between the positions these documents occupy in the search engine result pages, and the number of citations they have received?

Digital Scholarship | "A Quarter-Century as an Open Access Publisher"

Google Settles American Society of Media Photographers, Inc. et al. v. Google Inc.

Google has settled the American Society of Media Photographers, Inc. et al. v. Google Inc. lawsuit. The agreement is confidential.

Here's an excerpt from the press release:

The agreement resolves a copyright infringement lawsuit filed against Google in April, 2010, bringing to an end more than four years of litigation. It does not involve any admission of liability by Google. As the settlement is between the parties to the litigation, the court is not required to approve its terms.

This settlement does not affect Google's current litigation with the Authors Guild or otherwise address the underlying questions in that suit.

"EFF Urges Appeals Court to Keep Protecting Fair Use"

EFF has released "EFF Urges Appeals Court to Keep Protecting Fair Use."

Here's an excerpt:

In this latest appeal, the Authors Guild (and its supporters) claim that fair use is being unjustly expanded, portraying Judge Chin's ruling and other recent court opinions as some kind of fair-use creep, stretching beyond the original intent of the doctrine. Specifically, the Guild argues that the first of the four statutory fair use factors—the purpose of the use, which asks whether the use of the copyrighted material is transformative and/or non-commercial—weighs against Google. The Authors Guild and its amici insist that a use cannot be transformative if it doesn't add new creative expression to the pre-existing work. But as Judge Chin so rightly recognized, a use can be transformative if it serves a new and distinct purpose.

"The Dark Side of Open Access in Google and Google Scholar: The Case of Latin-American Repositories"

Enrique Orduña-Malea et al. have self-archived "The Dark Side of Open Access in Google and Google Scholar: The Case of Latin-American Repositories."

Here's an excerpt:

The main objective of this study is to ascertain the presence and visibility of Latin American repositories in Google and Google Scholar through the application of page count and visibility indicators. For a sample of 137 repositories, the results indicate that the indexing ratio is low in Google, and virtually nonexistent in Google Scholar; they also indicate a complete lack of correspondence between the repository records and the data produced by these two search tools. These results are mainly attributable to limitations arising from the use of description schemas that are incompatible with Google Scholar (repository design) and the reliability of web indicators (search engines). We conclude that neither Google nor Google Scholar accurately represent the actual size of open access content published by Latin American repositories; this may indicate a non-indexed, hidden side to open access, which could be limiting the dissemination and consumption of open access scholarly literature.

Digital Scholarship | Digital Scholarship Publications Overview | Sitemap

"Empirical Evidences in Citation-Based Search Engines: Is Microsoft Academic Search Dead?"

Enrique Orduna-Malea et al. have self-archived "Empirical Evidences in Citation-Based Search Engines: Is Microsoft Academic Search Dead?"

Here's an excerpt:

The goal of this working paper is to summarize the main empirical evidences provided by the scientific community as regards the comparison between the two main citation based academic search engines: Google Scholar and Microsoft Academic Search, paying special attention to the following issues: coverage, correlations between journal rankings, and usage of these academic search engines. Additionally, self-elaborated data is offered, which are intended to provide current evidence about the popularity of these tools on the Web, by measuring the number of rich files PDF, PPT and DOC in which these tools are mentioned, the amount of external links that both products receive, and the search queries frequency from Google Trends. The poor results obtained by MAS led us to an unexpected and unnoticed discovery: Microsoft Academic Search is outdated since 2013.

"The Number of Scholarly Documents on the Public Web"

Madian Khabsa and C. Lee Giles have published "The Number of Scholarly Documents on the Public Web" in PLOS ONE.

Here's an excerpt:

The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%.
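As a rough illustration of the capture/recapture idea mentioned in the excerpt, here is a minimal Python sketch using the Lincoln-Petersen estimator. The excerpt does not say which estimator the authors used, so the estimator, the function name, and the toy numbers below are all assumptions for illustration, not the paper's method or data.

```python
def lincoln_petersen(n_first: int, n_second: int, n_overlap: int) -> float:
    """Estimate a total population size from two overlapping samples.

    n_first   -- items found by the first source (hypothetical search engine A)
    n_second  -- items found by the second source (hypothetical search engine B)
    n_overlap -- items found by both sources
    """
    if n_overlap <= 0:
        raise ValueError("the two samples must overlap to produce an estimate")
    # If the sources sample independently, the share of A's items that B also
    # finds approximates B's coverage of the whole population.
    return n_first * n_second / n_overlap


# Toy numbers only: 100 documents in A, 80 in B, 40 in both.
estimate = lincoln_petersen(100, 80, 40)
print(estimate)
```

The smaller the overlap relative to the two samples, the larger the estimated population, which is why the method depends on the two sources being reasonably independent.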

"Checking In With Google Books, HathiTrust, and the DPLA"

Naomi Eichenlaub has published "Checking In With Google Books, HathiTrust, and the DPLA" in Computers in Libraries.

Here's an excerpt:

Google Books and HathiTrust have been making headlines in the library world and beyond for years now, while a new player, the Digital Public Library of America (DPLA), has only recently entered the scene. This article will provide a "state of the environment" update for these digital library projects including project history and background. It will also examine some challenges common to all three projects including copyright, orphan works, metadata, and quality issues.

"Google Scholar as Replacement for Systematic Literature Searches: Good Relative Recall and Precision Are Not Enough"

Martin Boeker, Werner Vach and Edith Motschall have published "Google Scholar as Replacement for Systematic Literature Searches: Good Relative Recall and Precision Are Not Enough" in BMC Medical Research Methodology.

Here's an excerpt:

The objectives of this work are to measure the relative recall and precision of searches with Google Scholar under conditions which are derived from structured search procedures conventional in scientific literature retrieval; and to provide an overview of current advantages and disadvantages of the Google Scholar search interface in scientific literature retrieval. . . .

The reported relative recall must be interpreted with care. It is a quality indicator of Google Scholar confined to an experimental setting which is unavailable in systematic retrieval due to the severe limitations of the Google Scholar search interface. Currently, Google Scholar does not provide necessary elements for systematic scientific literature retrieval such as tools for incremental query optimization, export of a large number of references, a visual search builder or a history function. Google Scholar is not ready as a professional searching tool for tasks where structured retrieval methodology is necessary.
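For readers unfamiliar with the two measures named in the excerpt, here is a minimal Python sketch of how relative recall and precision are commonly computed for a single search. The function name, the document identifiers, and the reference set are hypothetical; the article's own procedure may differ in detail.

```python
def relative_recall_and_precision(retrieved: set, reference_relevant: set):
    """Return (relative_recall, precision) for one search.

    retrieved          -- identifiers returned by the search being evaluated
    reference_relevant -- identifiers already known to be relevant, e.g. the
                          included studies of an existing systematic review
                          (an assumption about how the reference set is built)
    """
    hits = retrieved & reference_relevant
    # Relative recall: share of the known relevant documents that were found.
    relative_recall = len(hits) / len(reference_relevant) if reference_relevant else 0.0
    # Precision: share of the retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return relative_recall, precision


# Toy identifiers only.
recall, precision = relative_recall_and_precision(
    retrieved={"a", "b", "c", "d"},
    reference_relevant={"a", "b", "e"},
)
print(recall, precision)
```

Recall is "relative" here because it is measured against a known reference set rather than against every relevant document in existence.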

"Just Google It—Digital Research Practices of Humanities Scholars"

Max Kemman, Martijn Kleppe, and Stef Scagliola have self-archived "Just Google It—Digital Research Practices of Humanities Scholars" in arXiv.org.

Here's an excerpt:

The transition from analogue to digital archives and the recent explosion of online content offers researchers novel ways of engaging with data. The crucial question for ensuring a balance between the supply and demand-side of data, is whether this trend connects to existing scholarly practices and to the average search skills of researchers. To gain insight into this process we conducted a survey among nearly three hundred (N= 288) humanities scholars in the Netherlands and Belgium with the aim of finding answers to the following questions: 1) To what extent are digital databases and archives used? 2) What are the preferences in search functionalities 3) Are there differences in search strategies between novices and experts of information retrieval? Our results show that while scholars actively engage in research online they mainly search for text and images. General search systems such as Google and JSTOR are predominant, while large-scale collections such as Europeana are rarely consulted.

"A Perspective on the Merits of the Antitrust Objections to the Failed Google Books Settlement"

Pamela Samuelson has published "A Perspective on the Merits of the Antitrust Objections to the Failed Google Books Settlement" in the Harvard Journal of Law & Technology Occasional Paper Series.

Here's an excerpt:

This Article responds to critics of the antitrust objections to the ASA [Amended Settlement Agreement] by making three main points. Part II explains that Judge Chin's incomplete and unpersuasive analysis of the antitrust objections to the proposed settlement agreement is best understood as an effort to encourage the settling parties to adopt more competitive terms in any revised settlement agreement. Part III points out that the DOJ did not reach definitive conclusions on antitrust issues posed by the ASA. The DOJ was, however, obliged to submit an interim analysis because Judge Chin wanted the government's input before he ruled on whether the settlement should be approved and the DOJ did a creditable job under the circumstances. Part IV contends that there was more merit to the DOJ's antitrust concerns about the proposed settlement than some commentators have recognized.

"Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting"

Emilio Delgado López-Cózar, Nicolás Robinson-García, and Daniel Torres-Salinas have self-archived "Manipulating Google Scholar Citations and Google Scholar Metrics: Simple, Easy and Tempting" in arXiv.org.

Here's an excerpt:

The launch of Google Scholar Citations and Google Scholar Metrics may provoke a revolution in the research evaluation field as it places within every researcher's reach tools that allow bibliometric measuring. In order to alert the research community over how easily one can manipulate the data and bibliometric indicators offered by Google's products we present an experiment in which we manipulate the Google Citations profiles of a research group through the creation of false documents that cite their documents, and consequently, the journals in which they have published modifying their H index. . . . We analyse the malicious effect this type of practices can cause to Google Scholar Citations and Google Scholar Metrics. Finally, we conclude with several deliberations over the effects these malpractices may have and the lack of control tools these tools offer.
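To make concrete why a few added citations can shift the H index discussed in the excerpt, here is a minimal Python sketch of the standard h-index definition (the largest h such that h documents have at least h citations each). The citation counts are invented toy values, not data from the paper, and the function name is hypothetical.

```python
def h_index(citation_counts):
    """Return the largest h such that h documents have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Toy profile: five documents with these citation counts.
before = h_index([10, 8, 5, 4, 3])
# The same profile after a handful of extra citations to the two
# least-cited documents.
after = h_index([10, 8, 5, 5, 5])
print(before, after)
```

In this toy profile, three additional citations spread over two documents are enough to raise the index by one, since only the documents near the threshold matter.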

| Digital Scholarship's Digital/Print Books | Digital Scholarship |

Authors Guild et al. v. Google: "Brief of Amici Curiae Academic Authors in Support of Defendant-Appellant and Reversal"

Pamela Samuelson and David R. Hansen have self-archived "Brief of Amici Curiae Academic Authors in Support of Defendant-Appellant and Reversal" in SSRN.

Here's an excerpt:

Summary of argument: Class certification was improperly granted below because the District Court failed to conduct a rigorous analysis of the adequacy of representation factor, as Rule 23(a)(4) requires. The three individual plaintiffs who claim to be class representatives are not academics and do not share the commitment to broad access to knowledge that predominates among academics. . . .

Academic authors desire broad public access to their works such as that which the Google Books project provides. Although the District Court held that the plaintiffs had inadequately represented the interests of academic authors in relation to the proposed settlement, it failed to recognize that pursuit of this litigation would be even more adverse to the interests of academic authors than the proposed settlement was. . . .

In short, a "win" in this case for the class representatives would be a "loss" for academic authors. It is precisely this kind of conflict that courts have long recognized should prevent class certification due to inadequate representation. The District Court failed to adequately address this fundamental conflict in its certification order, though it was well aware of the conflict through submissions and objections received from the settlement fairness hearing through to the hearings on the most recent class certification motions. Because of that failure, the order certifying the class should be reversed.

| Google Books Bibliography | Digital Scholarship |

Digital Copyright: Google Asks Court to Reverse Class Certification Decision in The Authors Guild et al. v. Google Inc.

In a brief, Google has asked the U.S. Second Circuit Court of Appeals to reverse the class certification decision by the United States District Court for the Southern District of New York in The Authors Guild et al. v. Google Inc. case.

Here's the brief.

Read more about it at "Google Asks Court to Ax Book-Scanning Suit from Authors Guild."

| Scholarly Electronic Publishing Bibliography 2010 | Digital Scholarship |

"Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. Google"

Matthew L. Jockers, Matthew Sag, and Jason Schultz have self-archived "Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. Google" in SSRN.

Here's an excerpt:

The brief argues that, just as copyright law has long recognized the distinction between protection for an author's original expression (e.g., the narrative prose describing the plot) and the public's right to access the facts and ideas contained within that expression (e.g., a list of characters or the places they visit), the law must also recognize the distinction between copying books for expressive purposes (e.g., reading) and nonexpressive purposes, such as extracting metadata and conducting macroanalyses. We amici urge the court to follow established precedent with respect to Internet search engines, software reverse engineering, and plagiarism detection software and to hold that the digitization of books for text-mining purposes is a form of incidental or intermediate copying to be regarded as fair use as long as the end product is also nonexpressive or otherwise non-infringing.

Google and Publishers Settle Seven-Year-Old Copyright Lawsuit over Google Library Project

Google and the Association of American Publishers have settled the copyright lawsuit over Google Library Project. The related Authors Guild lawsuit has not been settled.

Here's an excerpt from the Google press release:

The agreement settles a copyright infringement lawsuit filed against Google on October 19, 2005 by five AAP member publishers. As the settlement is between the parties to the litigation, the court is not required to approve its terms.

The settlement acknowledges the rights and interests of copyright-holders. US publishers can choose to make available or choose to remove their books and journals digitized by Google for its Library Project. Those deciding not to remove their works will have the option to receive a digital copy for their use.

Apart from the settlement, US publishers can continue to make individual agreements with Google for use of their other digitally-scanned works. . . .

Google Books allows users to browse up to 20% of books and then purchase digital versions through Google Play. Under the agreement, books scanned by Google in the Library Project can now be included by publishers.

See also the AAP press release.

"It Was Never a Universal Library: Three Years of the Google Book Settlement"

Walt Crawford has published "It Was Never a Universal Library: Three Years of the Google Book Settlement" in Cites & Insights: Crawford at Large.

Here's an excerpt:

Remember the Google Books settlement? It was going to settle a four-year-old pair of lawsuits (four years old then, eight years old now) against Google (by the Association of American Publishers, AAP, and the Authors Guild, AG) asserting that Google was infringing on copyright through its two-line snippets from in-copyright books scanned in the Google Library Project—and by the scanning itself. Later, a third group representing media photographers also sued Google for the same actions. . . .

This is a long set of notes and comments (cites & insights). It strikes me that the topic and complexity deserve that length—but note that I'm offering much briefer excerpts and comments on most items than I normally would in this sort of roundup.

After two sets of general notes and overviews (one before the settlement was rejected, one after) I'm breaking the discussion down by topics rather than chronologically.

"Teaching with Google Books: Research, Copyright, and Data Mining"

Nathan Rinne has self-archived "Teaching with Google Books: Research, Copyright, and Data Mining" in E-LIS.

Here's an excerpt:

Google's Google Books site is a rich resource that is probably underutilized by most educators. It has all kinds of potential for a) getting students into the research process in a way that they will enjoy (for example, they can see how a famous quote has been used/quoted, find out which books cite the journal article they are interested in, or check to see if a specific book covers a topic that they want to explore, etc.); b) teaching them about the deeper civic purpose and the evolving state of copyright law; and, c) exploring, with the help of Google Book's Ngram viewer, the promise and ethics surrounding the issue of data-mining and "non-consumptive" research, or research that is accomplished by "mining" books for data, as opposed to reading them.
