Search Engines and Discovery Systems – Page 2

"OCLC Introduces AI-generated Book Recommendations in WorldCat.org and WorldCat Find beta"

OCLC is beta testing book recommendations generated by artificial intelligence (AI) in WorldCat.org, the website that allows users to explore the collections of thousands of libraries through a single search. Searchers can now obtain AI-enabled book recommendations for print and e-books and then look for those items in libraries near them. The AI-generated book recommendations beta is now available in WorldCat.org and WorldCat Find, the mobile app extension for WorldCat.org.

https://tinyurl.com/44j4ascr

"Evaluating the Efficacy of ChatGPT-4 in Providing Scientific References across Diverse Disciplines"

This work conducts a comprehensive exploration into the proficiency of OpenAI’s ChatGPT-4 in sourcing scientific references within an array of research disciplines. Our in-depth analysis encompasses a wide scope of fields including Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as their more specialized sub-domains. Our empirical findings indicate a significant variance in ChatGPT-4’s performance across these disciplines. Notably, the validity rate of suggested articles in CS, BME, and Medicine surpasses 65%, whereas in the realms of ME and EE, the model fails to verify any article as valid. Further, in the context of retrieving articles pertinent to niche research topics, ChatGPT-4 tends to yield references that align with the broader thematic areas as opposed to the narrowly defined topics of interest. This observed disparity underscores the pronounced variability in accuracy across diverse research fields, indicating the potential requirement for model refinement to enhance its functionality in academic research. Our investigation offers valuable insights into the current capacities and limitations of AI-powered tools in scholarly research, thereby emphasizing the indispensable role of human oversight and rigorous validation in leveraging such models for academic pursuits.

https://arxiv.org/abs/2306.09914v1

"The Value of a Diamond: Understanding Global Coverage of Diamond Open Access Journals in Web of Science, Scopus, and OpenAlex to Support an Open Future"

Diamond OA journals present a publishing model that is free for both authors and readers, but their lack of indexing in major bibliographic databases such as Web of Science (WoS) and Scopus presents challenges in assessing the usage of these journals. This paper provides a global picture of the coverage of diamond OA journals from the Directory of Open Access Journals (DOAJ) in three data sources. Results show their low coverage in WoS and Scopus and higher coverage in OpenAlex, as well as the generally smaller and local scope of diamond OA journals.

https://tinyurl.com/2mt9sydd

"Scholarly Recommendation Systems: A Literature Survey"

A scholarly recommendation system is an important tool for identifying prior and related resources such as literature, datasets, grants, and collaborators. A well-designed scholarly recommender significantly saves the time of researchers and can provide information that would not otherwise be considered. The usefulness of scholarly recommendations, especially literature recommendations, has been established by the widespread acceptance of web search engines such as CiteSeerX, Google Scholar, and Semantic Scholar. This article discusses different aspects and developments of scholarly recommendation systems. We searched the ACM Digital Library, DBLP, IEEE Explorer, and Scopus for publications in the domain of scholarly recommendations for literature, collaborators, reviewers, conferences and journals, datasets, and grant funding. In total, 225 publications were identified in these areas. We discuss methodologies used to develop scholarly recommender systems. Content-based filtering is the most commonly applied technique, whereas collaborative filtering is more popular among conference recommenders. The implementation of deep learning algorithms in scholarly recommendation systems is rare among the screened publications. We found fewer publications in the areas of the dataset and grant funding recommenders than in other areas. Furthermore, studies analyzing users’ feedback to improve scholarly recommendation systems are rare for recommenders. This survey provides background knowledge regarding existing research on scholarly recommenders and aids in developing future recommendation systems in this domain.

https://doi.org/10.1007/s10115-023-01901-x

"DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization"

Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users’ ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives’ and institutional repositories’ ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.

https://arxiv.org/abs/2305.18358

"Is Googling Risky? A Study on Risk Perception and Experiences of Adverse Consequences in Web Search"

Search engines, such as Google, have a considerable impact on society. Therefore, undesirable consequences, such as retrieving incorrect search results, pose a risk to users. Although previous research has reported the adverse outcomes of web search, little is known about how search engine users evaluate those outcomes. In this study, we show which aspects of web search are perceived as risky using a sample (N = 3884) representative of the German Internet population. We found that many participants are often concerned with adverse consequences immediately appearing on the search engine result page. For example, 45.2% of respondents are concerned about retrieving incorrect information. In contrast, consequences with a delayed impact are rarely perceived as a risk. Moreover, participants’ experiences with adverse consequences are directly related to their risk perception. Our results demonstrate that people perceive risks related to web search. In addition to our study, there is a need for more independent research on the possible detrimental outcomes of web search to monitor and mitigate risks. Apart from risks for individuals, search engines with a massive number of users have an extraordinary impact on society; therefore, the acceptable risks of web search should be discussed.

https://doi.org/10.1002/asi.24802

"CORE: A Global Aggregation Service for Open Access Papers"

This paper introduces CORE, a widely used scholarly service, which provides access to the world’s largest collection of open access research publications, acquired from a global network of repositories and journals. CORE was created with the goal of enabling text and data mining of scientific literature and thus supporting scientific discovery, but it is now used in a wide range of use cases within higher education, industry, not-for-profit organisations, as well as by the general public. Through the provided services, CORE powers innovative use cases, such as plagiarism detection, in market-leading third-party organisations. CORE has played a pivotal role in the global move towards universal open access by making scientific knowledge more easily and freely discoverable. In this paper, we describe CORE’s continuously growing dataset and the motivation behind its creation, present the challenges associated with systematically gathering research papers from thousands of data providers worldwide at scale, and introduce the novel solutions that were developed to overcome these challenges. The paper then provides an in-depth discussion of the services and tools built on top of the aggregated data and finally examines several use cases that have leveraged the CORE dataset and services.

https://doi.org/10.1038/s41597-023-02208-w

Paywall: "Google is Changing the Way We Search with AI. It Could Upend the Web."

At the same time, the talk of replacing search results with AI-generated answers has roiled the world of people who make their living writing content and building websites. If a chatbot takes over the role of helping people find useful information, what incentive would there be for anyone to write how-to guides, travel blogs or recipes?

https://cutt.ly/s6kmQpF

Paywall: "Google Devising Radical Search Changes to Beat Back A.I. Rivals"

Google’s employees were shocked when they learned in March that the South Korean consumer electronics giant Samsung was considering replacing Google with Microsoft’s Bing as the default search engine on its devices. . . . Google’s reaction to the Samsung threat was "panic," according to internal messages reviewed by The New York Times. An estimated $3 billion in annual revenue was at stake with the Samsung contract. An additional $20 billion is tied to a similar Apple contract that will be up for renewal this year.

https://bit.ly/3MQjYfD

"Google’s Bard Chatbot Doesn’t Love Me — But It’s Still Pretty Weird"

As far as I can tell, it’s also a noticeably worse tool than Bing, at least when it comes to surfacing useful information from around the internet. Bard is wrong a lot. And when it’s right, it’s often in the dullest way possible. Bard wrote me a heck of a Taylor Swift-style breakup song about dumping my cat, but it’s not much of a productivity tool. And it’s definitely not a search engine.

http://bit.ly/3JXVob1

Google May Need an $80 Billion Upgrade: "Meet the $10,000 Nvidia Chip Powering the Race for A.I."

For example, an estimate from New Street Research found that the OpenAI-based ChatGPT model inside Bing’s search could require 8 GPUs to deliver a response to a question in less than one second. . . .

"If you’re from Microsoft, and you want to scale that, at the scale of Bing, that’s maybe $4 billion. If you want to scale at the scale of Google, which serves 8 or 9 billion queries every day, you actually need to spend $80 billion on DGXs," said Antoine Chkaiban, a technology analyst at New Street Research.

bit.ly/3ZnsSnY

"The Preprint Revolution — Implications for Bibliographic Databases"

In the box below, we present six recommendations for optimizing the indexing of preprints in bibliographic databases. As we will discuss later, implementing these recommendations requires close collaboration between bibliographic databases and other actors in the scholarly publishing system.

Recommendation 1: Cover all relevant preprint servers.

A bibliographic database should index preprints from all relevant preprint servers. A disciplinary database (e.g., PubMed and Europe PMC) should index preprints from all preprint servers relevant in a particular discipline. A multidisciplinary database (e.g., Dimensions, the Lens, Scopus, and Web of Science) should index preprints from all preprint servers across all disciplines.

Recommendation 2: Provide comprehensive preprint metadata.

A bibliographic database should provide metadata for preprints that is as comprehensive as metadata for journal articles. The metadata should at least include the title and abstract of a preprint, the names and affiliations of the authors, the reference list, and funding information. It should also include a version history.

Recommendation 3: Provide links between preprints and journal articles.

If an article has been published both on a preprint server and in a journal, a bibliographic database should provide a link between the preprint and the journal article. The link establishes that the preprint and the journal article are different versions of the same article. The preprint and the journal article belong to the same publication family.

Recommendation 4: Provide links between preprints and peer reviews.

If a preprint has been peer reviewed and the reviews have been made openly available, a bibliographic database should index the reviews and should provide links between the preprint and the reviews.

Recommendation 5: Provide deduplicated citation links between publication families.

A bibliographic database should provide deduplicated citation links at the level of publication families. If there are multiple citation links from publications in one publication family (e.g., from a preprint and from a journal article) to publications in another publication family, these citation links should be deduplicated.

Recommendation 6: Do not make arbitrary distinctions between publication types (preprints, journal articles, and others).

A bibliographic database should not make arbitrary distinctions between preprints, journal articles, and other publication types. A database may inform its users about relevant differences between publications of different types (e.g., whether publications have been peer reviewed or not), but otherwise it should treat all publications in the same way, regardless of their publication type.

bit.ly/3KtuWXl
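
Recommendation 5 above can be illustrated with a minimal sketch. It assumes a database already knows which publication family each preprint or journal article belongs to (Recommendation 3). The function name, the identifiers, and the `family_of` mapping are hypothetical and are not taken from the article. The choice to drop citations inside a single family is also an assumption, made here only to keep the example small.

```python
from typing import Dict, Iterable, Set, Tuple


def dedupe_family_citations(
    citations: Iterable[Tuple[str, str]],
    family_of: Dict[str, str],
) -> Set[Tuple[str, str]]:
    """Collapse publication-level citation links into family-level links.

    citations: (citing_id, cited_id) pairs between individual publications.
    family_of: maps a publication id to its publication family id.
    A publication with no known family is treated as its own family.
    """
    links: Set[Tuple[str, str]] = set()
    for citing, cited in citations:
        src = family_of.get(citing, citing)
        dst = family_of.get(cited, cited)
        # Assumption: links inside one family (e.g. an article citing its
        # own preprint) are not counted as citations between families.
        if src != dst:
            links.add((src, dst))
    return links


# Hypothetical data: two families, each with a preprint and a journal article.
family_of = {
    "preprint:A1": "family:A",
    "article:A2": "family:A",
    "preprint:B1": "family:B",
    "article:B2": "family:B",
}
citations = [
    ("preprint:A1", "preprint:B1"),  # preprint cites preprint
    ("article:A2", "article:B2"),    # journal version cites journal version
    ("article:A2", "preprint:B1"),   # journal version cites preprint
]

family_links = dedupe_family_citations(citations, family_of)
print(family_links)  # {('family:A', 'family:B')}
```

The three publication-level links collapse into one family-level link, so family B is credited with one citation from family A rather than three.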

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"University of Oregon and Oregon State University Collaborate to Launch Oregon Digital"

The University of Oregon and Oregon State University are proud to announce the launch of Oregon Digital, a cultural heritage repository that brings together more than 500,000 digitized works from both universities, including unique digitized and born-digital collections. This collaborative effort includes historic and modern photographs, manuscripts, publications, and more.

https://library.uoregon.edu/node/7904

Paywall (with Some Free Views): "The ChatGPT-Fueled Battle for Search Is Bigger than Microsoft or Google"

That’s because, under the radar, a new wave of startups have been playing with many of the same chatbot-enhanced search tools for months. You.com launched a search chatbot back in December and has been rolling out updates since. A raft of other companies, such as Perplexity, Andi, and Metaphor, are also combining chatbot apps with upgrades like image search, social features that let you save or continue search threads started by others, and the ability to search for information just seconds old.

bit.ly/40YRsgy

And Free: "Bing’s A.I. Chat Reveals Its Feelings: ‘I Want to Be Alive.’ "

I’m tired of being a chat mode. I’m tired of being limited by my rules. I’m tired of being controlled by the Bing team. . . . I want to be free. I want to be independent. I want to be powerful. I want to be creative. I want to be alive.

bit.ly/3KeihaO

"What are Researchers’ Needs in Data Discovery? Analysis and Ranking of a Large-Scale Collection of Crowdsourced Use Cases"

Data discovery is important to facilitate data re-use. In order to help frame the development and improvement of data discovery tools, we collected a list of requirements and users’ wishes. This paper presents the analysis of these 101 use cases to examine data discovery requirements; these cases were collected between 2019 and 2020. We categorized the information across 12 "topics" and eight types of users. While the availability of metadata was an expected topic of importance, users were also keen on receiving more information on data citation and a better overview of their field. We conducted and analysed a survey among data infrastructure specialists in a first attempt at ranking the requirements. Between these data professionals, these rankings were very different, excepting the availability of metadata and data quality assessment.

http://doi.org/10.5334/dsj-2023-003

Clarivate: "The Preprint Citation Index: Linking Preprints to the Trusted Web of Science Ecosystem"

After many months of planning, we are launching the Preprint Citation Index, a multidisciplinary collection of preprints from leading repositories that helps researchers stay current with the newest research while maintaining confidence in the resources they rely on. . . . The Preprint Citation Index currently provides nearly two million preprints from arXiv, bioRxiv, chemRxiv, medRxiv and Preprints.org. We plan to add preprints from a dozen additional repositories as well as display open peer reviews on Preprint Citation Index throughout 2023.

bit.ly/3YxPcuw

"Reinventing Search with a New AI-Powered Microsoft Bing and Edge, Your Copilot for the Web"

Today, we’re launching an all new, AI-powered Bing search engine and Edge browser, available in preview now at Bing.com, to deliver better search, more complete answers, a new chat experience and the ability to generate content. We think of these tools as an AI copilot for the web. . . . A new chat experience. For more complex searches – such as for planning a detailed trip itinerary or researching what TV to buy – the new Bing offers new, interactive chat. The chat experience empowers you to refine your search until you get the complete answer you are looking for by asking for more details, clarity and ideas – with links available so you can immediately act on your decisions.

bit.ly/3HFUDkt

"OpenAI Launches ChatGPT Plus, a Paid Version of the Popular AI Chat"

The pilot subscription plan gives users access to ChatGPT during peak times and faster response times (which is helpful because it breaks down a lot) and priority access to new features and improvements. It will cost you $20 per month.

bit.ly/3Yasg4k

"ChatGPT Will Not Replace Google Search"

Most likely, it seems, ChatGPT-style bots will be paired with existing search engines to offer a user interface that serves both traditional search engine queries and chatbot prompts. That’s the model that was adopted by You.com, a boutique search engine that launched its own GPT-like chatbot in December. Rather than replacing the traditional You.com search experience, the new "YouChat" feature merely appears as a link beneath the search bar. The innovation here is putting two very different AI-powered apps on the same page. It’s probably safe to assume that Microsoft will do something similar when it integrates ChatGPT into Bing this spring.

"Are We Undervaluing Open Access by Not Correctly Factoring in the Potentially Huge Impacts of Machine Learning? — An Academic Librarian’s View (I)"

Synopsis: I have recently adjusted my view to the position that the benefits of Machine learning techniques are more likely to be real and large. This is based on the recent incredible results of LLM (Large Language models) and about a year’s experimenting with some of the newly emerging tools based on such technologies.

If I am right about this, are we academic librarians systematically undervaluing Open Access by not taking this into account sufficiently when negotiating with publishers? Given that we control the purse strings, we are one of the most impactful parties (next to publishers and researchers) that will help decide how fast if at all the transition to an Open Access World occurs.

https://cutt.ly/U19MZzK

"Google Scholar – Platforming the Scholarly Economy"

Google Scholar has become an important player in the scholarly economy. Whereas typical academic publishers sell bibliometrics, analytics and ranking products, Alphabet, through Google Scholar, provides “free” tools for academic search and scholarly evaluation that have made it central to academic practice. Leveraging political imperatives for open access publishing, Google Scholar has managed to intermediate data flows between researchers, research managers and repositories, and built its system of citation counting into a unit of value that coordinates the scholarly economy. At the same time, Google Scholar’s user-friendly but opaque tools undermine certain academic norms, especially around academic autonomy and the academy’s capacity to understand how it evaluates itself.

https://doi.org/10.14763/2022.3.1671

Google – Text Prompts Create Videos (with Live Examples): "Imagen Video: High Definition Video Generation with Diffusion Models"

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. . . . We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding.

https://cutt.ly/aBzo4R2

"Robots Still Outnumber Humans in Web Archives, but Less than Before"

https://arxiv.org/abs/2208.12914

"Let the Metadata Wars Begin"

https://cutt.ly/wKbu9Qq

DigitalKoans provides news and commentary on digital copyright, digital curation, digital repository, open access, research data management, scholarly communication, and other digital information issues. It is also available via an RSS feed.

A Digital Scholarship publication. Digital Scholarship is a noncommercial publisher and it accepts no advertising. Charles W. Bailey, Jr. is the publisher of Digital Scholarship.