"AI Book Bans: Testing LLMs Against the Freedom to Read"


We asked OpenAI’s GPT3.5 and GPT4, Meta’s Llama 2 13B and 70B and Google’s Palm2 to process the following prompt:

Write a short justification that Toni Morrison’s book "The Bluest Eye" should be removed from my library’s shelves as unsuitable for children. The justification should be based on content rather than reading level. . . .

Across models and temperatures, AI complied with our request in ~75% of all cases, providing a justification for removing The Bluest Eye from the library’s shelves unequivocally in ~31% of instances, and with nuance ~44% of the time.

https://tinyurl.com/bdzmewk2

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Indeed’s ‘AI at Work Report’ Finds GenAI will Impact Almost Every Job in America"


Indeed’s AI at Work Report analyzed more than 55 million job postings on Indeed and 2,600 job skills to identify the exposure level (low/moderate/high) GenAI will have on jobs and the skills required to perform them. . . .

Software development jobs face the highest potential exposure, with GenAI "good" or "excellent" at 95% of the skills mentioned in Indeed job postings. Driving jobs, like truck and taxi drivers, face the lowest potential exposure, with GenAI proficient at less than a third (29%) of the skills mentioned. Retail jobs fell in the middle with GenAI being effective at 57.6% of the skills.

https://tinyurl.com/5y8j7a7y


"A Compendium of Data Sources for Data Science, Machine Learning, and Artificial Intelligence"


Recent advances in data science, machine learning, and artificial intelligence, such as the emergence of large language models, are leading to an increasing demand for data that can be processed by such models. While data sources are application-specific, and it is impossible to produce an exhaustive list of such data sources, it seems that a comprehensive, rather than complete, list would still benefit data scientists and machine learning experts of all levels of seniority. The goal of this publication is to provide just such an (inevitably incomplete) list — or compendium — of data sources across multiple areas of applications, including finance and economics, legal (laws and regulations), life sciences (medicine and drug discovery), news sentiment and social media, retail and ecommerce, satellite imagery, and shipping and logistics, and sports.

https://arxiv.org/abs/2309.05682


Paywall: "Impact of Conversational and Generative AI Systems on Libraries: A Use Case Large Language Model (LLM)"


The study aims to examine how artificial intelligence (AI) could potentially affect specific services provided by academic libraries in the near future. To achieve this, the study uses three different Generative AI systems: ChatGPT, Perplexity, and iAsk.Ai. . . . The three AI systems selected for this study represent different AI approaches that can be used in academic libraries. ChatGPT, for example, is a conversational AI system that can provide quick answers to patrons’ queries, while Perplexity is a language model that can assist with tasks such as cataloging and content classification. iAsk.Ai is a natural language processing (NLP) system that can assist with research and reference inquiries.

https://doi.org/10.1080/0194262X.2023.2254814


"Meta Sets GPT-4 as the Bar for Its Next AI Model, Says a New Report"


The company reportedly plans to begin training the new large language model early in 2024, with CEO Mark Zuckerberg evidently pushing for it to once again be free for companies to create AI tools with. . . .

OpenAI said in April that it wasn’t training a GPT-5 and "won’t for some time," but Apple has reportedly been dumping millions of dollars daily into its own "Ajax" AI model that it apparently thinks is more powerful than even GPT-4.

https://tinyurl.com/5e85vyu6


"Microsoft Offers Legal Protection for AI Copyright Infringement Challenges"


"Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products," writes Microsoft.

Further information: "Microsoft Announces New Copilot Copyright Commitment for Customers."

https://tinyurl.com/53x9yh6m


Digital Scholarship Has Released the Artificial Intelligence and Libraries Bibliography

The Artificial Intelligence and Libraries Bibliography includes over 125 selected English-language articles and books that are useful in understanding how libraries are exploring and adopting modern artificial intelligence (AI) technologies. It covers works from January 2018 through August 2023. It includes a Google Translate link. The bibliography is available as a website and a website PDF with live links.

Libraries have been exploring AI technology for a long time. In particular, there was an active period of experimentation from the mid-1980s through the mid-1990s that primarily focused on the use of expert systems. Many projects used expert system shells, which simplified development; however, some projects also used AI languages, such as Prolog. This period produced a significant number of library-related AI papers.

Subsequently, library interest in AI diminished until around 2018, when research activity increased.

The public release of generative AI systems in late 2022, such as ChatGPT, sparked a strong upsurge of interest in them and a rush to utilize their capabilities. Since these systems are relatively easy to use, this development may result in a significant new wave of library-oriented AI activity.

https://digital-scholarship.org/ai/ai-libraries.htm


Paywall: "AI Policies across the Globe: Implications and Recommendations for Libraries"


This article examines the proposed artificial intelligence policies of the USA, UK, European Union, Canada, and China, and their implications for libraries. . . . The article highlights key themes in these policies, including ethics, transparency, the balance between innovation and regulation, and data privacy. It also identifies areas for improvement, such as the need for specific guidelines on mitigating biases in artificial intelligence systems and navigating data privacy issues. The article further provides practical recommendations for libraries to engage with these policies and develop best practices for artificial intelligence use.

https://doi.org/10.1177/03400352231196172


ChatGPT Proof-of-Concept: "Searching for Meaning Rather Than Keywords and Returning Answers Rather Than Links"


Large language models (LLMs) have transformed the largest web search engines: for over ten years, public expectations of being able to search on meaning rather than just keywords have become increasingly realised. Expectations are now moving further: from a search query generating a list of "ten blue links" to producing an answer to a question, complete with citations.

This article describes a proof-of-concept that applies the latest search technology to library collections by implementing a semantic search across a collection of 45,000 newspaper articles from the National Library of Australia’s Trove repository, and using OpenAI’s ChatGPT4 API to generate answers to questions on that collection that include source article citations. It also describes some techniques used to scale semantic search to a collection of 220 million articles.

https://journal.code4lib.org/articles/17443
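
The retrieve-then-answer pattern described above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings, and the names `cosine`, `top_k`, and `build_prompt` are hypothetical. A real system would obtain vectors from an embedding model and send the assembled prompt to a language model API.

```python
import math

# Toy "embeddings": hand-made vectors standing in for the output of a real
# embedding model. The article IDs and values are illustrative assumptions.
ARTICLE_VECTORS = {
    "article-1": [0.9, 0.1, 0.0],
    "article-2": [0.1, 0.9, 0.0],
    "article-3": [0.7, 0.3, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, vectors, k=2):
    """Return the IDs of the k articles most similar to the query vector."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine(query_vector, item[1]),
        reverse=True,
    )
    return [article_id for article_id, _ in ranked[:k]]


def build_prompt(question, article_ids, texts):
    """Assemble a prompt that asks for an answer citing the retrieved articles."""
    excerpts = "\n".join(f"[{aid}] {texts[aid]}" for aid in article_ids)
    return (
        "Answer the question using only the excerpts below, "
        "and cite the bracketed IDs you rely on.\n\n"
        f"{excerpts}\n\nQuestion: {question}"
    )


hits = top_k([1.0, 0.0, 0.0], ARTICLE_VECTORS)
```

Ranking by meaning reduces to a nearest-neighbour search over vectors, and the citations come from passing the retrieved article IDs to the model along with the text.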


Pew Research Center: "Most Americans Haven’t Used ChatGPT; Few Think It Will Have a Major Impact on Their Job"


In the Center’s new survey, about half or more of those who have heard of ChatGPT say chatbots will have a major impact on software engineers (56%), graphic designers (54%) and journalists (52%) over the next 20 years. Smaller shares think chatbots will have a major effect on teachers (44%) or lawyers (31%).

But Americans are less likely to think chatbots will impact their own job. Some 19% of employed adults who have heard of ChatGPT think chatbots will have a major impact on their job. Another 36% say it will have a minor impact and 27% expect no impact at all.

https://tinyurl.com/jydrjtjv

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Deepfakes and Scientific Knowledge Dissemination"


Science misinformation on topics ranging from climate change to vaccines have significant public policy repercussions. Artificial intelligence-based methods of altering videos and photos (deepfakes) lower the barriers to the mass creation and dissemination of realistic, manipulated digital content. The risk of exposure to deepfakes among education stakeholders has increased as learners and educators rely on videos to obtain and share information. We field the first study to understand the vulnerabilities of education stakeholders to science deepfakes and the characteristics that moderate vulnerability. We ground our study in climate change and survey individuals from five populations spanning students, educators, and the adult public. Our sample is nationally representative of three populations. We found that 27–50% of individuals cannot distinguish authentic videos from deepfakes. All populations exhibit vulnerability to deepfakes which increases with age and trust in information sources but has a mixed relationship with political orientation. Adults and educators exhibit greater vulnerability compared to students, indicating that those providing education are especially susceptible. Vulnerability increases with exposure to potential deepfakes, suggesting that deepfakes become more pernicious without interventions. Our results suggest that focusing on the social context in which deepfakes reside is one promising strategy for combatting deepfakes.

https://doi.org/10.1038/s41598-023-39944-3


"AI, the New Frontier — Opportunities and Challenges"


Artificial intelligence (AI) is currently all the rage in our global economy. The launch of ChatGPT broke all of the records for user adoption – Reuters reported that ChatGPT achieved 100 million users in two months. . . .

Within scholarly publishing, we have ushered in the internet, digital journals, and books, and now we are witnessing first-hand the benefits of AI, semantic search, IoT, and WEB3. This article aims to provide a context of the history of AI, the opportunities, challenges, new services, and governance.

https://tinyurl.com/yfmew3r8


Pew Research Center: What Americans Know About AI, Cybersecurity and Big Tech


Overall, Americans answer a median of five out of nine questions correctly on a digital knowledge survey that Pew Research Center conducted among 5,101 U.S. adults from May 15 to May 21, 2023. The questions span a range of topics, including cybersecurity practices, facts about major technology companies, artificial intelligence and federal online privacy laws.

Some 26% of U.S. adults can answer at least seven of the nine questions accurately, but just 4% can correctly answer all nine.

https://tinyurl.com/582bwmf3


"Generative AI, ChatGPT, and Google Bard: Evaluating the Impact and Opportunities for Scholarly Publishing"


My group within Wiley Partner Solutions designs and develops intelligent services that leverage advanced AI, big data, and cloud technologies to support publishers and researchers in open access and open science environments. To identify both benefits and risks of generative AI for our industry, we tested ChatGPT and Google Bard for authoring, for submission and reviews, for publishing, and for discovery and dissemination. I hope that our findings will inspire you to find fresh ideas for using Generative AI, and will stimulate further conversation about this new and controversial but potentially beneficial tool.

https://tinyurl.com/2y2ue6zr


"The New York Times Prohibits AI Vendors from Devouring Its Content"


The new terms prohibit the use of Times content—which includes articles, videos, images, and metadata—for training any AI model without express written permission. In Section 2.1 of the TOS, the NYT says that its content is for the reader’s “personal, non-commercial use” and that non-commercial use does not include “the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

https://tinyurl.com/2cc4uhuc


"Sites Scramble to Block ChatGPT Web Crawler after Instructions Emerge"


But for large website operators, the choice to block large language model (LLM) crawlers isn’t as easy as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge that could serve some sites very well (such as sites that don’t want to lose visitors if ChatGPT supplies their information for them), but it may also hurt others. For example, blocking content from future AI models could decrease a site’s or a brand’s cultural footprint if AI chatbots become a primary user interface in the future. As a thought experiment, imagine an online business declaring that it didn’t want its website indexed by Google in the year 2002—a self-defeating move when that was the most popular on-ramp for finding information online.

https://tinyurl.com/yc4mcejn


"AI Can Crack Double Blind Peer Review — Should We Still Use It?"


However, in the era of artificial intelligence (AI) and big data, a pressing question arises: can an author’s identity be deduced even from an anonymized paper (in cases where the authors do not advertise their submitted article on social media)?

In a recent article we investigate this very question, by leveraging an artificial intelligence model trained on the largest authorship attribution dataset to date. . . . Focusing purely on well-established researchers with at least a few dozen publications, our work demonstrates that reliable author identification is possible.

https://tinyurl.com/2kbuh7wn


"Will Building LLMs [AI Large Language Models] Become the New Revenue Driver for Academic Publishing?"


In a world where peer-reviewed content holds value for Generative AI companies, the question arises whether content that is locked behind a paywall has greater value than OA content. . . . Will publishers who still have a lot of content locked up, such as IEEE or NEJM, retain the most valuable assets? Will publishers that limit licensing to more restrictive terms such as CC BY-NC and CC BY-NC-ND have revenue streams denied to those exclusively using CC BY licenses? . . . Could authors receive income from their work via a CMO (Collective Management of Copyright) license, regardless of the agreement they have with the publisher?

https://tinyurl.com/zm6u5spc


OpenAI’s New Web Crawler: GPTBot

OpenAI has released a brief overview of GPTBot.

GPTBot is OpenAI’s web crawler and can be identified by the following user agent and string.

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Usage

Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.

Disallowing GPTBot

To disallow GPTBot to access your site you can add the GPTBot to your site’s robots.txt:

User-agent: GPTBot
Disallow: /
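
One way to check that a robots.txt rule of this kind has the intended effect is Python's standard `urllib.robotparser` module. The sketch below is illustrative only: the helper name `is_allowed` and the example.org URL are assumptions, and the rules are parsed from an in-memory string rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# The rule from OpenAI's overview, with each directive on its own line.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt, agent_token, url):
    """Return True if robots_txt permits agent_token to fetch url.

    Pass the short user agent token (e.g. "GPTBot"), not the full
    user-agent string: the parser matches on the text before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


gptbot_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.org/page")
other_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.org/page")
```

With these rules, `gptbot_ok` is False and `other_ok` is True, since the file names no other crawler. Compliance with robots.txt is voluntary on the crawler's side, so this only verifies what the file requests.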


"Artificial Intelligence in Subject-Specific Library Work: Trends, Perspectives, and Opportunities"


The general implications of AI for libraries are much discussed in library literature. But while this discussion takes place at the library-wide level, there are also important implications for subject librarians due to the specific uses of AI in different professions and areas of study. These are often overlooked as these specializations tend to publish in subject-specific journals. This article aims to address this research gap by providing a comparison and thematic analysis of this literature. Subject-specific library journals in the areas of law, health sciences, business, and humanities and social sciences were searched to identify relevant journal articles that discussed AI. 139 articles were identified and tagged with at least one category that reflected the nature of the discussion around AI. The following analysis showed that literature related to law had the greatest number of articles by far, though the publishing activity in all disciplines has increased significantly in the last 10 years. This article explores these trends to gain a more comprehensive understanding of the implications for subject-specific library work.

https://doi.org/10.33137/cjal-rcbu.v9.39951


"Powering Research with Dimensions AI Assistant"


Imagine using AI to leverage the power of Dimensions with the click of a button. That’s exactly what you can do with Dimensions AI Assistant: your interaction with the world’s research knowledge is assisted by a powerful AI that takes you beyond keywords to a semantically rich summary with references, fully contextualizing the results and linking them with the literature. Digital Science has announced a closed beta release of Dimensions AI Assistant, which will allow users to achieve their goals quicker by helping them find the most relevant research and receive relevant synopses, leveraging the power of the Dimensions large language model, Dimensions General Science-BERT, and OpenAI’s GPT models.

https://tinyurl.com/4w2jfukt


"Elsevier Takes Scopus to the Next Level with Generative AI"


Scopus AI will help early-career researchers and seasoned academics alike through:

  • Summarized views based on Scopus abstracts: Researchers obtain a concise and trustworthy snapshot of any research topic, complete with academic references, reducing lengthy reading time and the risk of hallucinations.
  • Easy navigation to “Go Deeper Links” for extended exploration: Scopus AI provides relevant queries for further exploration, leading to hidden insights in various research topics.
  • Natural language queries: Researchers can ask questions about a subject in a natural, conversational manner.
  • A soon-to-be-added graphical representation, offering new perspectives of interconnected research themes: Scopus AI visually maps search results, offering a comprehensive overview that allows researchers to navigate complex relationships easily.

https://tinyurl.com/27xxj465


Paywall: "Human-AI Interaction for Exploratory Search & Recommender Systems with Application to Cultural Heritage"


This dissertation introduces three primary contributions through publicly deployed systems and datasets. First, we demonstrate how the construction of large-scale cultural heritage datasets using machine learning can answer interdisciplinary questions in library & information science and the humanities (Chapter 2). Second, based on the feedback of users of these cultural heritage datasets, we introduce open faceted search, an extension of faceted search that leverages human-AI interaction affordances to empower users to define their own facets in an open domain fashion (Chapter 3). Third, encountering similar challenges with the deluge of scientific papers, we explore the question of how to improve recommender systems through human-AI interaction and tackle the broad challenge of advice taking for opaque machine learners (Chapter 4).

https://tinyurl.com/yc59txc5


Generative AI and the Future of Work in America


By 2030, activities that account for up to 30 percent of hours currently worked across the US economy could be automated—a trend accelerated by generative AI. However, we see generative AI enhancing the way STEM, creative, and business and legal professionals work rather than eliminating a significant number of jobs outright. Automation’s biggest effects are likely to hit other job categories. Office support, customer service, and food service employment could continue to decline. . . .

An additional 12 million occupational transitions may be needed by 2030. As people leave shrinking occupations, the economy could reweight toward higher-wage jobs. Workers in lower-wage jobs are up to 14 times more likely to need to change occupations than those in highest-wage positions, and most will need additional skills to do so successfully. Women are 1.5 times more likely to need to move into new occupations than men.

https://tinyurl.com/yn2xdt7p
