Is Artificial General Intelligence Closer Than We Think?: "Sparks of Artificial General Intelligence: Early Experiments with GPT-4"


Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

https://doi.org/10.48550/arXiv.2303.12712

"Generative AI and Copyright Policy From the Creator-User’s Perspective"


As scholars Mark Lemley and Bryan Casey persuasively argue in their paper Fair Learning, we should generally permit generative AI tools that in effect learn from past works in ways that facilitate creation of new, distinct ones. While some claim that generative AI systems are simply engines for ‘collage’ or ‘plagiarism,’ copying previous expressions into new works, this isn’t an accurate description of how most tools work. Instead, generative AI extracts information that then is used to inform generation of new material; for instance, by looking at many pictures of dogs, it can extract information about what dogs look like, and can then help a user draw dogs, or by looking at many pieces of art labeled as Surrealist, it can help a user create new works in the style of Surrealism. In effect, these are tools that aid new creators in their learning and building on past works.

https://bit.ly/3GVNhK5

"EDUCAUSE QuickPoll Results: Adopting and Adapting to Generative AI in Higher Ed Tech"


Asked about their agreement with specific statements about generative AI, a strong majority of respondents (83%) agreed that these technologies will profoundly change higher education in the next three to five years (see table 1). These changes could be positive or negative. More respondents agreed than disagreed that generative AI would make their job easier and would have more benefits than drawbacks. However, more respondents agreed than disagreed that the use of generative AI in higher education makes them nervous, perhaps an acknowledgment of the potential risks of these technologies, however beneficial they may be.

https://bit.ly/3GTzdkf

Paywall: "Google Devising Radical Search Changes to Beat Back A.I. Rivals"


Google’s employees were shocked when they learned in March that the South Korean consumer electronics giant Samsung was considering replacing Google with Microsoft’s Bing as the default search engine on its devices. . . . Google’s reaction to the Samsung threat was "panic," according to internal messages reviewed by The New York Times. An estimated $3 billion in annual revenue was at stake with the Samsung contract. An additional $20 billion is tied to a similar Apple contract that will be up for renewal this year.

https://bit.ly/3MQjYfD

"How Generative AI Could Disrupt Creative Work "


In the face of technological change, creativity is often held up as a uniquely human quality, less vulnerable to the forces of technological disruption and critical for the future. Today however, generative AI applications such as ChatGPT and Midjourney are threatening to upend this special status and significantly alter creative work, both independent and salaried. The authors explore three non-exclusive scenarios for this disruption of content creation: 1) people use AI to augment their work, leading to greater productivity, 2) generative AI creates a flood of cheap content that drives out human creatives, and 3) human-made creative work demands a premium.

https://bit.ly/43pP4kh

"Surprising Things Happen When You Put 25 AI Agents Together in an RPG [Role-Playing Game] Town"


A group of researchers at Stanford University and Google have created a miniature RPG-style virtual world similar to The Sims, where 25 characters, controlled by ChatGPT and custom code, live out their lives independently with a high degree of realistic behavior. . . .

"Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day," write the researchers in their paper, "Generative Agents: Interactive Simulacra of Human Behavior."

http://bit.ly/3KSwc6b
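
The architecture the researchers describe pairs a language model with a memory stream: observations are stored, scored, and retrieved to build each agent's next prompt. The Python sketch below illustrates that retrieval idea only in broad strokes; the class, function names, scoring weights, and word-overlap relevance measure are all invented for this illustration (the paper itself uses LLM-rated importance and embedding similarity), so treat it as a conceptual sketch, not the authors' code.

```python
import time

# Conceptual sketch of memory retrieval in the style of the "Generative
# Agents" paper: each memory is scored by recency, importance, and
# relevance, and the top-scoring memories feed the agent's next prompt.
# All names, weights, and the relevance stub are assumptions for
# illustration; they are not the authors' actual implementation.

class Memory:
    def __init__(self, text: str, importance: float):
        self.text = text
        self.importance = importance  # e.g., 1-10, rated by the LLM
        self.last_accessed = time.time()

def recency(memory: Memory, now: float, decay: float = 0.995) -> float:
    """Exponential decay per hour since the memory was last accessed."""
    hours = (now - memory.last_accessed) / 3600
    return decay ** hours

def relevance(memory: Memory, query: str) -> float:
    """Placeholder: the paper uses embedding similarity; here, word overlap."""
    m, q = set(memory.text.lower().split()), set(query.lower().split())
    return len(m & q) / max(len(q), 1)

def retrieve(memories: list[Memory], query: str, k: int = 3) -> list[Memory]:
    now = time.time()
    def score(mem: Memory) -> float:
        # Equal weighting is an assumption; the paper normalizes each term.
        return recency(mem, now) + mem.importance / 10 + relevance(mem, query)
    return sorted(memories, key=score, reverse=True)[:k]

memories = [
    Memory("Klaus painted in the park this morning", importance=4),
    Memory("Maria mentioned a party at Isabella's cafe", importance=8),
    Memory("Ate breakfast: toast and coffee", importance=2),
]
for mem in retrieve(memories, "What should Klaus do about the party?"):
    print(mem.text)
```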

"‘We Have to Move Fast’: US Looks to Establish Rules for Artificial Intelligence"


The US commerce department on Tuesday announced it is officially requesting public comment on how to create accountability measures for AI, seeking help on how to advise US policymakers to approach the technology. . . .

The National Institute of Standards and Technology has also published an AI risk management framework, voluntary guardrails that companies can use to attempt to limit the risk of harm to the public.

https://cutt.ly/d7UOF25

Paywall: "The Man Who Unleashed AI on an Unsuspecting Silicon Valley"


The rise of OpenAI and the explosion of interest in ChatGPT has catapulted Altman, 37, from a prolific investor and protege of more powerful men to a central player among the most powerful people in tech. It has also made him a key voice in the heated and globe-spanning debate over AI, what it’s capable of and who should control it.

https://cutt.ly/x7Qq5RV

"Stable Diffusion Copyright Lawsuits Could Be a Legal Earthquake for AI"


In January, three visual artists filed a class-action copyright lawsuit against Stability AI, the startup that created Stable Diffusion. In February, the image-licensing giant Getty filed a lawsuit of its own. . . . There’s a real possibility that the courts could decide that Stability AI violated copyright law on a massive scale. . . . Building cutting-edge generative AI would require getting licenses from thousands—perhaps even millions—of copyright holders. The process would likely be so slow and expensive that only a handful of large companies could afford to do it. Even then, the resulting models likely wouldn’t be as good.

http://bit.ly/3K8FRno

Stanford Institute for Human-Centered Artificial Intelligence: Artificial Intelligence Index Report 2023


The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.

https://bit.ly/40PH0Y4

"Guest Post — Academic Publishers Are Missing the Point on ChatGPT"


On the other hand, publishers would be wise to leave the back door open for authors to use AI tools in order to support their research for two reasons. First, strictly policing the use of these tools would not only be an exercise in futility, but enforcement could quickly become a nightmare. Second, an arms race seems to already be underway to build out software to detect AI writing. Publishers will likely spend ungodly sums of money on these tools, only to be set back by even better models that can outsmart the detectors. Whether that should be our focus is an important question to ponder before diving in headfirst.

https://bit.ly/3nEiYkm

"GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models"


We investigate the potential implications of large language models (LLMs), such as Generative Pretrained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. . . . Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. . . . Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. . . . We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.

https://arxiv.org/abs/2303.10130

"ChatGPT Gets ‘Eyes and Ears’ with Plugins That Can Interface AI with the World"


Basically, if a developer wants to give ChatGPT the ability to access any network service (for example: "looking up current stock prices") or perform any task controlled by a network service (for example: "ordering pizza through the Internet"), it is now possible, provided it doesn’t go against OpenAI’s rules.

http://bit.ly/3ZlESG0
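
Concretely, the integration worked through a manifest plus an OpenAPI description of the service: ChatGPT reads the model-facing description and decides when to call the service's endpoints. The sketch below shows the general shape of such a manifest, written as a Python dict; the service, URLs, and descriptions are hypothetical placeholders, and the field set reflects OpenAI's plugin documentation as best understood at the time, not a verified real plugin.

```python
import json

# Rough shape of a ChatGPT plugin manifest (ai-plugin.json). The model
# reads description_for_model and the linked OpenAPI spec, then decides
# when to call the service. All values here are hypothetical
# placeholders for a fictitious stock-quote API.
manifest = {
    "schema_version": "v1",
    "name_for_human": "Stock Quotes",
    "name_for_model": "stock_quotes",
    "description_for_human": "Look up current stock prices.",
    "description_for_model": (
        "Use this to fetch the latest quote for a ticker symbol "
        "when the user asks about current stock prices."
    ),
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": "https://example.com/legal",
}

print(json.dumps(manifest, indent=2))
```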

"Google’s Bard Chatbot Doesn’t Love Me — But It’s Still Pretty Weird"


As far as I can tell, it’s also a noticeably worse tool than Bing, at least when it comes to surfacing useful information from around the internet. Bard is wrong a lot. And when it’s right, it’s often in the dullest way possible. Bard wrote me a heck of a Taylor Swift-style breakup song about dumping my cat, but it’s not much of a productivity tool. And it’s definitely not a search engine.

http://bit.ly/3JXVob1

"AI and Copyright: Human Artistry Campaign Launches to Support Musicians"


The fast rise of AI technology has opened up a world of brain-busting questions about copyright and creators’ rights. . . . A new coalition to meet those challenges called the Human Artistry Campaign was announced at the South by Southwest conference on Thursday, with support from more than 40 organizations, including the Recording Academy, the National Music Publishers’ Association, the Recording Industry Association of America and many others.

bit.ly/402Nt1G

U.S. Copyright Office: "Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence"


As the agency overseeing the copyright registration system, the Office has extensive experience in evaluating works submitted for registration that contain human authorship combined with uncopyrightable material, including material generated by or with the assistance of technology. It begins by asking "whether the ‘work’ is basically one of human authorship, with the computer [or other device] merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed not by man but by a machine." [23] In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of "mechanical reproduction" or instead of an author’s "own original mental conception, to which [the author] gave visible form." [24] The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work.[25] This is necessarily a case-by-case inquiry.

If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it.[26] For example, when an AI technology receives solely a prompt [27] from a human and produces complex written, visual, or musical works in response, the "traditional elements of authorship" are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.[28] For example, if a user instructs a text-generating technology to "write a poem about copyright law in the style of William Shakespeare," she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text.[30] When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship.[31] As a result, that material is not protected by copyright and must be disclaimed in a registration application.[32]

In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that "the resulting work as a whole constitutes an original work of authorship." [33] Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection.[34] In these cases, copyright will only protect the human-authored aspects of the work, which are "independent of" and do "not affect" the copyright status of the AI-generated material itself.[35]

bit.ly/40oOkJA

"AI Makes Plagiarism Harder to Detect, Argue Academics — In Paper Written by Chatbot"


An academic paper entitled Chatting and Cheating: Ensuring Academic Integrity in the Era of ChatGPT was published this month in an education journal. . . . What readers — and indeed the peer reviewers who cleared it for publication — did not know was that the paper itself had been written by the controversial AI chatbot ChatGPT.

bit.ly/40kvjZ2

"ChatGPT and Higher Education: Initial Prevalence and Areas of Interest"


Thematically, the plurality of references to ChatGPT on institutional websites encompassed opinion pieces or lecture announcements regarding AI (43.9%), followed by experiments with the tools (34.1%), and grading or other academic policies (22.0%) (see figure 1). This may suggest a temporal flow of activity related to adoption in which institutions begin by highlighting opinion pieces on the topic, then provide evidence from faculty experiments with the technology, and then finally adopt policies regarding its use.

bit.ly/3ZTiIMB

Paywall: "GPT-4 Is Bigger and Better Than ChatGPT — But OpenAI Won’t Say Why"


"But OpenAI has chosen not to reveal how large GPT-4 is. In a departure from its previous releases, the company is giving away nothing about how GPT-4 was built—not the data, the amount of computing power, or the training techniques. "OpenAI is now a fully closed company with scientific communication akin to press releases for products," says Wolf [Thomas Wolf, cofounder of Hugging Face]." . . .GPT-4 may be the best multimodal large language model yet built. But it is not in a league of its own, as GPT-3 was when it first appeared in 2020. A lot has happened in the last three years. Today GPT-4 sits alongside other multimodal models, including Flamingo from DeepMind. And Hugging Face is working on an open-source multimodal model that will be free for others to use and adapt, says Wolf.

bit.ly/3TmVZWS

"Making AI Generative for Higher Education"


This fall, Ithaka S+R is convening a two-year research project in collaboration with a select group of universities committed to making AI generative for their campus community. Together we will assess the immediate and emerging AI applications most likely to impact teaching, learning, and research activities and explore the needs of institutions, instructors, and scholars as they navigate this environment. We will use our findings to create new strategies, policies, and programs to ensure on-campus readiness to harness the technology in the longer term.

http://bit.ly/3LnaFmR

Released: "GPT-4"


We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails. . . . We’ve been working on each aspect of the plan outlined in our post about defining the behavior of AIs, including steerability. Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the "system" message.

https://openai.com/research/gpt-4
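
To make the steerability point concrete, here is a minimal sketch using the OpenAI Python library's chat interface as it existed around the GPT-4 launch (the openai.ChatCompletion.create call from the 0.27-era library); the persona and prompt are invented examples, and the API key is a placeholder.

```python
import openai  # pip install openai (the 0.27-era interface is shown here)

openai.api_key = "sk-..."  # placeholder; substitute your own key

# The "system" message prescribes style and task, replacing the fixed
# default ChatGPT persona. The terse-copy-editor persona below is an
# invented example, not a suggestion from OpenAI's announcement.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a terse copy editor. Answer in one sentence."},
        {"role": "user",
         "content": "How should I punctuate a list of three items?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```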

"ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing"


The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential model for the automated preparation of essays and other types of scholarly manuscripts. Potential ethical issues that could arise with the emergence of large language models like GPT-3. . . and its usage by academics and researchers, are discussed and situated within the context of broader advancements in artificial intelligence, machine learning, and natural language processing for research and scholarly publishing.

https://doi.org/10.1002/asi.24750

"Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis"


Objective:

We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts.

Methods:

PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms "ophthalmology," "radiology," and "neurology." Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile.

Results:

There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression.

Conclusions:

Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.

https://doi.org/10.2196/42789
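
For a feel for the study's simplest baseline, the Methods description (titles, abstracts, authors, and MeSH terms combined into one input, converted to a term frequency-inverse document frequency array, then classified into tertiles with a 3:1 train:test split) maps onto a few lines of scikit-learn. The sketch below follows that recipe under stated simplifications: the texts and labels are fabricated stand-ins for the study's 10,813 PubMed records, and the paper's negation-detection and stemming steps are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-ins for (title + abstract + authors + MeSH terms) inputs and
# impact-factor tertile labels (0, 1, 2). These strings are fabricated
# for illustration only.
texts = [
    "retinal imaging deep learning ophthalmology cohort",
    "mri segmentation radiology convolutional network",
    "stroke outcomes neurology registry analysis",
    "glaucoma screening ophthalmology trial",
] * 25  # repeat so the split has enough rows to train on
labels = [0, 1, 2, 1] * 25

# 3:1 train:test split, mirroring the paper's ratio.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0
)

# TF-IDF array -> logistic regression, as in the paper's baseline model.
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=1000).fit(X_train_tfidf, y_train)
print("tertile accuracy:", accuracy_score(y_test, clf.predict(X_test_tfidf)))
```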

"Some Thoughts on Five Pending AI Litigations – Avoiding Squirrels and Other AI Distractions"


Regardless, as of this writing there are now five cases that may provide some clarity on this less frequently discussed but foundational issue of the unauthorized use of copyrighted materials as training data for AI (I use "AI" here as a shorthand which also includes text and data mining and machine learning). Each of these cases is unique, fact dependent, and likely, if fully litigated on the merits, to shed light on different aspects of copyright law.

bit.ly/41Qrrk3
