"Google Will No Longer Back Up the Internet: Cached Webpages Are Dead"


Google will no longer be keeping a backup of the entire Internet. Google Search’s "cached" links have long been an alternative way to load a website that was down or had changed, but now the company is killing them off. Google "Search Liaison" Danny Sullivan confirmed the feature removal in an X post, saying the feature "was meant for helping people access pages when way back, you often couldn’t depend on a page loading. These days, things have greatly improved."

http://tinyurl.com/uznbyacn

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"US Repository Network Launches Pilot to Enhance Discoverability of Open Access Content in Repositories"


In November, the US Repository Network (USRN) will launch a pilot project aimed at improving the discoverability of articles in repositories. This pilot project involves the use of services from CORE, a not-for-profit aggregator based at Open University in the UK, to evaluate and improve local repository practices. Additional technical support will be provided by Antleaf Ltd.

As part of the project, CORE will aggregate the metadata and full text of articles from a subset of US repositories, allowing them to be findable through a centralized discovery service with prominent links back to the original full text of the repository. At the same time, the project will assess current practices related to metadata quality, the tracking of Open Access deposits, the use of PIDs, technical support for OAI-PMH, and the adoption of more recent protocols, such as FAIR Signposting. At the level of the centralized aggregation, CORE will enrich the existing US metadata with information from its larger international aggregation. A Dashboard service for participating institutions will be provided, enabling them to assess, validate and monitor their practices.

https://tinyurl.com/2utfpvj3
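
For readers unfamiliar with OAI-PMH, the harvesting protocol named in the excerpt above, here is a minimal sketch of how a harvester might form requests. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification; the endpoint URL and the helper name `oai_pmh_url` are hypothetical, and nothing here reflects how CORE or the USRN pilot actually harvests.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.edu/oai"


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return f"{base_url}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = oai_pmh_url(BASE_URL, "Identify")

# Ask for records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
list_records_url = oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_records_url)
```

A real harvester would also fetch each URL, parse the XML response, and follow resumption tokens for large result sets; those steps are omitted here.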

"Google SGE [Search Generative Experience]: A New Way to Search, Teach, and Resist"


Google SGE removes many of the barriers that make us doubt our search abilities. We already know that users rarely look past the first page of results or scroll past the fold of a webpage, but with SGE you get exactly what you think is "good enough." However, the more I searched the more disappointed I was that Google continued to serve up the same kinds of sources you usually find at the top of the algorithm, such as Wikipedia pages, blog posts, news, and popular media. The only disclaimer that SGE gives is "Info quality may vary."

https://tinyurl.com/4tntbsbh

| Artificial Intelligence and Libraries Bibliography |

"Digitization and the Market for Physical Works: Evidence from the Google Books Project"


We study the impact of the Google Books digitization project on the market for physical books. We find that digitization significantly boosts the demand for physical versions and provide evidence for the discovery channel. Moreover, digitization allows independent publishers to introduce new editions for existing books, further increasing sales.

https://tinyurl.com/2pbuzty2

"Creating a Full Multitenant Back End User Experience in Omeka S with the Teams Module"


When Omeka S appeared as a beta release in 2016, it offered the opportunity for researchers or larger organizations to publish multiple Omeka sites from the same installation. Multisite functionality was and continues to be a major advance for what had become the premiere platform for scholarly digital exhibits produced by libraries, museums, researchers, and students. However, while geared to larger institutional contexts, Omeka S poses some user experience challenges on the back end for larger organizations with numerous users creating different sites. These challenges include a "cluttered" effect for many users seeing resources they do not need to access and data integrity challenges due to the possibility of users editing resources that other users need in their current state. The University of Illinois Library, drawing on two local use cases as well as two additional external use cases, developed the Teams module to address these challenges. This article describes the needs leading to the decision to create the module, the project requirement gathering process, and the implementation and ongoing development of Teams. The module and findings are likely to be of interest to other institutions adopting Omeka S but also, more generally, to libraries seeking to contribute successfully to larger open-source initiatives.

https://journal.code4lib.org/articles/17389

"A Very Small Pond: Discovery Systems That Can Be Used with FOLIO in Academic Libraries"


FOLIO, an open source library services platform, does not have a front end patron interface for searching and using library materials. Any library installing FOLIO will need at least one other software to perform those functions. This article evaluates which systems, in a limited marketplace, are available for academic libraries to use with FOLIO.

https://journal.code4lib.org/articles/17433

ChatGPT Proof-of-Concept: "Searching for Meaning Rather Than Keywords and Returning Answers Rather Than Links"


Large language models (LLMs) have transformed the largest web search engines: for over ten years, public expectations of being able to search on meaning rather than just keywords have become increasingly realised. Expectations are now moving further: from a search query generating a list of "ten blue links" to producing an answer to a question, complete with citations.

This article describes a proof-of-concept that applies the latest search technology to library collections by implementing a semantic search across a collection of 45,000 newspaper articles from the National Library of Australia’s Trove repository, and using OpenAI’s ChatGPT4 API to generate answers to questions on that collection that include source article citations. It also describes some techniques used to scale semantic search to a collection of 220 million articles.

https://journal.code4lib.org/articles/17443
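
The idea of searching on meaning rather than keywords, as described in the excerpt above, is often implemented by comparing embedding vectors. The following is a minimal sketch under that assumption, not the article's implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings, and the article identifiers and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (assumed values, for illustration only).
DOCS = {
    "article-1": [0.9, 0.1, 0.0],
    "article-2": [0.1, 0.8, 0.1],
    "article-3": [0.7, 0.3, 0.0],
}
QUERY = [1.0, 0.2, 0.0]

results = top_k(QUERY, DOCS, k=2)
print(results)
```

In a question-answering setup of the kind the excerpt describes, the top-ranked articles would then be passed to a language model as context so that the generated answer can cite them. At the scale of millions of articles, an approximate nearest-neighbour index would typically replace this exhaustive scan.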

"Impact and Development of an Open Web Index for Open Web Search"


Web search is a crucial technology for the digital economy. Dominated by a few gatekeepers focused on commercial success, however, web publishers have to optimize their content for these gatekeepers, resulting in a closed ecosystem of search engines as well as the risk of publishers sacrificing quality. To encourage an open search ecosystem and offer users genuine choice among alternative search engines, we propose the development of an Open Web Index (OWI). We outline six core principles for developing and maintaining an open index, based on open data principles, legal compliance, and collaborative technology development. The combination of an open index with what we call declarative search engines will facilitate the development of vertical search engines and innovative web data products (including, e.g., large language models), enabling a fair and open information space. This framework underpins the EU-funded project OpenWebSearch.EU, marking the first step towards realizing an Open Web Index.

https://doi.org/10.1002/asi.24818

"Powering Research with Dimensions AI Assistant"


Imagine using AI to leverage the power of Dimensions with the click of a button. That’s exactly what you can do with Dimensions AI Assistant: your interaction with the world’s research knowledge is assisted by a powerful AI that takes you beyond keywords to a semantically rich summary with references, fully contextualizing the results and linking them with the literature. Digital Science has announced a closed beta release of Dimensions AI Assistant, which will allow users to achieve their goals quicker by helping them find the most relevant research and receive relevant synopses, leveraging the power of the Dimensions large language model, Dimensions General Science-BERT, and OpenAI’s GPT models.

https://tinyurl.com/4w2jfukt

"Elsevier takes Scopus to the Next Level with Generative AI"


Scopus AI will help early-career researchers and seasoned academics alike through:

  • Summarized views based on Scopus abstracts: Researchers obtain a concise and trustworthy snapshot of any research topic, complete with academic references, reducing lengthy reading time and the risk of hallucinations.
  • Easy navigation to “Go Deeper Links” for extended exploration: Scopus AI provides relevant queries for further exploration, leading to hidden insights in various research topics.
  • Natural language queries: Researchers can ask questions about a subject in a natural, conversational manner.
  • A soon-to-be-added graphical representation, offering new perspectives of interconnected research themes: Scopus AI visually maps search results, offering a comprehensive overview that allows researchers to navigate complex relationships easily.

https://tinyurl.com/27xxj465

Paywall: "Human-AI Interaction for Exploratory Search & Recommender Systems with Application to Cultural Heritage"


This dissertation introduces three primary contributions through publicly deployed systems and datasets. First, we demonstrate how the construction of large-scale cultural heritage datasets using machine learning can answer interdisciplinary questions in library & information science and the humanities (Chapter 2). Second, based on the feedback of users of these cultural heritage datasets, we introduce open faceted search, an extension of faceted search that leverages human-AI interaction affordances to empower users to define their own facets in an open domain fashion (Chapter 3). Third, encountering similar challenges with the deluge of scientific papers, we explore the question of how to improve recommender systems through human-AI interaction and tackle the broad challenge of advice taking for opaque machine learners (Chapter 4).

https://tinyurl.com/yc59txc5

"Comparing Different Search Methods for the Open Access Journal Recommendation Tool B!Son"


Finding a suitable open access journal to publish academic work is a complex task: Researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. A systematic requirements analysis was conducted in the form of a survey. The developed tool suggests open access journals based on title, abstract and references provided by the user. The recommendations are built on open data, publisher-independent and work across domains and languages. Transparency is provided by its open source nature, an open application programming interface (API) and by specifying which matches the shown recommendations are based on. The recommendation quality has been evaluated using two different evaluation techniques, including several new recommendation methods. We were able to improve the results from our previous paper with a pre-trained transformer model. The beta version of the tool received positive feedback from the community and in several test sessions. We developed a recommendation system for open access journals to help researchers find a suitable journal. The open tool has been extensively tested, and we found possible improvements for our current recommendation technique. Development by two German academic libraries ensures the longevity and sustainability of the system.

https://doi.org/10.1007/s00799-023-00372-3

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"OCLC Introduces AI-generated Book Recommendations in WorldCat.org and WorldCat Find beta"


OCLC is beta testing book recommendations generated by artificial intelligence (AI) in WorldCat.org, the website that allows users to explore the collections of thousands of libraries through a single search. Searchers can now obtain AI-enabled book recommendations for print and e-books and then look for those items in libraries near them. The AI-generated book recommendations beta is now available in WorldCat.org and WorldCat Find, the mobile app extension for WorldCat.org.

https://tinyurl.com/44j4ascr

"Evaluating the Efficacy of ChatGPT-4 in Providing Scientific References across Diverse Disciplines"


This work conducts a comprehensive exploration into the proficiency of OpenAI’s ChatGPT-4 in sourcing scientific references within an array of research disciplines. Our in-depth analysis encompasses a wide scope of fields including Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as their more specialized sub-domains. Our empirical findings indicate a significant variance in ChatGPT-4’s performance across these disciplines. Notably, the validity rate of suggested articles in CS, BME, and Medicine surpasses 65%, whereas in the realms of ME and EE, the model fails to verify any article as valid. Further, in the context of retrieving articles pertinent to niche research topics, ChatGPT-4 tends to yield references that align with the broader thematic areas as opposed to the narrowly defined topics of interest. This observed disparity underscores the pronounced variability in accuracy across diverse research fields, indicating the potential requirement for model refinement to enhance its functionality in academic research. Our investigation offers valuable insights into the current capacities and limitations of AI-powered tools in scholarly research, thereby emphasizing the indispensable role of human oversight and rigorous validation in leveraging such models for academic pursuits.

https://arxiv.org/abs/2306.09914v1

"The Value of a Diamond: Understanding Global Coverage of Diamond Open Access Journals in Web of Science, Scopus, and OpenAlex to Support an Open Future"


Diamond OA journals present a publishing model that is free for both authors and readers, but their lack of indexing in major bibliographic databases such as Web of Science (WoS) and Scopus presents challenges in assessing the usage of these journals. This paper provides a global picture of the coverage of diamond OA journals from the Directory of Open Access Journals (DOAJ) in three data sources. Results show their low coverage in WoS and Scopus and higher coverage in OpenAlex, as well as the generally smaller and local scope of diamond OA journals.

https://tinyurl.com/2mt9sydd

"Scholarly Recommendation Systems: A Literature Survey"


A scholarly recommendation system is an important tool for identifying prior and related resources such as literature, datasets, grants, and collaborators. A well-designed scholarly recommender significantly saves the time of researchers and can provide information that would not otherwise be considered. The usefulness of scholarly recommendations, especially literature recommendations, has been established by the widespread acceptance of web search engines such as CiteSeerX, Google Scholar, and Semantic Scholar. This article discusses different aspects and developments of scholarly recommendation systems. We searched the ACM Digital Library, DBLP, IEEE Explorer, and Scopus for publications in the domain of scholarly recommendations for literature, collaborators, reviewers, conferences and journals, datasets, and grant funding. In total, 225 publications were identified in these areas. We discuss methodologies used to develop scholarly recommender systems. Content-based filtering is the most commonly applied technique, whereas collaborative filtering is more popular among conference recommenders. The implementation of deep learning algorithms in scholarly recommendation systems is rare among the screened publications. We found fewer publications in the areas of the dataset and grant funding recommenders than in other areas. Furthermore, studies analyzing users’ feedback to improve scholarly recommendation systems are rare for recommenders. This survey provides background knowledge regarding existing research on scholarly recommenders and aids in developing future recommendation systems in this domain.

https://doi.org/10.1007/s10115-023-01901-x

"DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization"


Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter-university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine-readability of data and its documentation. There are opportunities to enhance dataset search by improving users’ ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot-based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives’ and institutional repositories’ ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.

https://arxiv.org/abs/2305.18358

"Is Googling Risky? A Study on Risk Perception and Experiences of Adverse Consequences in Web Search"


Search engines, such as Google, have a considerable impact on society. Therefore, undesirable consequences, such as retrieving incorrect search results, pose a risk to users. Although previous research has reported the adverse outcomes of web search, little is known about how search engine users evaluate those outcomes. In this study, we show which aspects of web search are perceived as risky using a sample (N = 3884) representative of the German Internet population. We found that many participants are often concerned with adverse consequences immediately appearing on the search engine result page. For example, 45.2% of respondents are concerned about retrieving incorrect information. In contrast, consequences with a delayed impact are rarely perceived as a risk. Moreover, participants’ experiences with adverse consequences are directly related to their risk perception. Our results demonstrate that people perceive risks related to web search. In addition to our study, there is a need for more independent research on the possible detrimental outcomes of web search to monitor and mitigate risks. Apart from risks for individuals, search engines with a massive number of users have an extraordinary impact on society; therefore, the acceptable risks of web search should be discussed.

https://doi.org/10.1002/asi.24802

"CORE: A Global Aggregation Service for Open Access Papers"


This paper introduces CORE, a widely used scholarly service, which provides access to the world’s largest collection of open access research publications, acquired from a global network of repositories and journals. CORE was created with the goal of enabling text and data mining of scientific literature and thus supporting scientific discovery, but it is now used in a wide range of use cases within higher education, industry, not-for-profit organisations, as well as by the general public. Through the provided services, CORE powers innovative use cases, such as plagiarism detection, in market-leading third-party organisations. CORE has played a pivotal role in the global move towards universal open access by making scientific knowledge more easily and freely discoverable. In this paper, we describe CORE’s continuously growing dataset and the motivation behind its creation, present the challenges associated with systematically gathering research papers from thousands of data providers worldwide at scale, and introduce the novel solutions that were developed to overcome these challenges. The paper then provides an in-depth discussion of the services and tools built on top of the aggregated data and finally examines several use cases that have leveraged the CORE dataset and services.

https://doi.org/10.1038/s41597-023-02208-w

Paywall: "Google is Changing the Way We Search with AI. It Could Upend the Web."


At the same time, the talk of replacing search results with AI-generated answers has roiled the world of people who make their living writing content and building websites. If a chatbot takes over the role of helping people find useful information, what incentive would there be for anyone to write how-to guides, travel blogs or recipes?

https://cutt.ly/s6kmQpF

Paywall: "Google Devising Radical Search Changes to Beat Back A.I. Rivals"


Google’s employees were shocked when they learned in March that the South Korean consumer electronics giant Samsung was considering replacing Google with Microsoft’s Bing as the default search engine on its devices. . . . Google’s reaction to the Samsung threat was "panic," according to internal messages reviewed by The New York Times. An estimated $3 billion in annual revenue was at stake with the Samsung contract. An additional $20 billion is tied to a similar Apple contract that will be up for renewal this year.

https://bit.ly/3MQjYfD

"Google’s Bard Chatbot Doesn’t Love Me — But It’s Still Pretty Weird"


As far as I can tell, it’s also a noticeably worse tool than Bing, at least when it comes to surfacing useful information from around the internet. Bard is wrong a lot. And when it’s right, it’s often in the dullest way possible. Bard wrote me a heck of a Taylor Swift-style breakup song about dumping my cat, but it’s not much of a productivity tool. And it’s definitely not a search engine.

http://bit.ly/3JXVob1

Google May Need an $80 Billion Upgrade: "Meet the $10,000 Nvidia Chip Powering the Race for A.I."


For example, an estimate from New Street Research found that the OpenAI-based ChatGPT model inside Bing’s search could require 8 GPUs to deliver a response to a question in less than one second. . . .

"If you’re from Microsoft, and you want to scale that, at the scale of Bing, that’s maybe $4 billion. If you want to scale at the scale of Google, which serves 8 or 9 billion queries every day, you actually need to spend $80 billion on DGXs." said Antoine Chkaiban, a technology analyst at New Street Research.

bit.ly/3ZnsSnY
