"Meta Sets GPT-4 as the Bar for Its Next AI Model, Says a New Report"


The company reportedly plans to begin training the new large language model early in 2024, with CEO Mark Zuckerberg evidently pushing for it to once again be free for companies to create AI tools with. . . .

OpenAI said in April that it wasn’t training a GPT-5 and "won’t for some time," but Apple has reportedly been dumping millions of dollars daily into its own "Ajax" AI model that it apparently thinks is more powerful than even GPT-4.

https://tinyurl.com/5e85vyu6

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Microsoft Offers Legal Protection for AI Copyright Infringement Challenges"


"Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products," writes Microsoft.

Further information: "Microsoft Announces New Copilot Copyright Commitment for Customers."

https://tinyurl.com/53x9yh6m

Digital Scholarship Has Released the Artificial Intelligence and Libraries Bibliography

The Artificial Intelligence and Libraries Bibliography includes over 125 selected English-language articles and books that are useful in understanding how libraries are exploring and adopting modern artificial intelligence (AI) technologies. It covers works from January 2018 through August 2023. It includes a Google Translate link. The bibliography is available as a website and a website PDF with live links.

Libraries have been exploring AI technology for a long time. In particular, there was an active period of experimentation from the mid-1980s through the mid-1990s that primarily focused on the use of expert systems. Many projects used expert system shells, which simplified development; however, some projects also used AI languages, such as Prolog. This period produced a significant number of library-related AI papers.

Subsequently, library interest in AI diminished until around 2018, when research activity increased.

The public release of generative AI systems in late 2022, such as ChatGPT, sparked a strong upsurge of interest in them and a rush to utilize their capabilities. Since these systems are relatively easy to use, this development may result in a significant new wave of library-oriented AI activity.

https://digital-scholarship.org/ai/ai-libraries.htm

Paywall: "AI Policies across the Globe: Implications and Recommendations for Libraries"


This article examines the proposed artificial intelligence policies of the USA, UK, European Union, Canada, and China, and their implications for libraries. . . . The article highlights key themes in these policies, including ethics, transparency, the balance between innovation and regulation, and data privacy. It also identifies areas for improvement, such as the need for specific guidelines on mitigating biases in artificial intelligence systems and navigating data privacy issues. The article further provides practical recommendations for libraries to engage with these policies and develop best practices for artificial intelligence use.

https://doi.org/10.1177/03400352231196172

ChatGPT Proof-of-Concept: "Searching for Meaning Rather Than Keywords and Returning Answers Rather Than Links"


Large language models (LLMs) have transformed the largest web search engines: for over ten years, public expectations of being able to search on meaning rather than just keywords have become increasingly realised. Expectations are now moving further: from a search query generating a list of "ten blue links" to producing an answer to a question, complete with citations.

This article describes a proof-of-concept that applies the latest search technology to library collections by implementing a semantic search across a collection of 45,000 newspaper articles from the National Library of Australia’s Trove repository, and using OpenAI’s ChatGPT4 API to generate answers to questions on that collection that include source article citations. It also describes some techniques used to scale semantic search to a collection of 220 million articles.

https://journal.code4lib.org/articles/17443
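
The ranking step behind a semantic search of this kind can be illustrated with a minimal sketch. This is not the article's implementation: the article titles and the three-dimensional vectors below are hypothetical stand-ins for the embeddings that a real system would obtain from an embedding model, and the `search` helper is an assumed name. The sketch only shows how documents can be ordered by cosine similarity to a query vector rather than by keyword overlap.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical articles with made-up 3-dimensional "embeddings".
# A real system would compute much longer vectors with an embedding model.
ARTICLES = {
    "Flood damages river towns": [0.9, 0.1, 0.0],
    "Cricket team wins test match": [0.0, 0.2, 0.9],
    "Heavy rain closes roads": [0.8, 0.3, 0.1],
}


def search(query_vector, articles, top_k=2):
    """Rank article titles by similarity to the query vector, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in articles.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Made-up vector standing in for the embedding of a weather-related question.
query = [1.0, 0.2, 0.0]
results = search(query, ARTICLES)
print(results)
```

Both weather-related articles rank above the cricket article even though the query shares no keywords with them, which is the behaviour the excerpt calls searching on meaning. In an answer-generating setup, the top-ranked articles would then be passed to a language model together with the question.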

Pew Research Center: "Most Americans Haven’t Used ChatGPT; Few Think It Will Have a Major Impact on Their Job"


In the Center’s new survey, about half or more of those who have heard of ChatGPT say chatbots will have a major impact on software engineers (56%), graphic designers (54%) and journalists (52%) over the next 20 years. Smaller shares think chatbots will have a major effect on teachers (44%) or lawyers (31%).

But Americans are less likely to think chatbots will impact their own job. Some 19% of employed adults who have heard of ChatGPT think chatbots will have a major impact on their job. Another 36% say it will have a minor impact and 27% expect no impact at all.

https://tinyurl.com/jydrjtjv

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Deepfakes and Scientific Knowledge Dissemination"


Science misinformation on topics ranging from climate change to vaccines has significant public policy repercussions. Artificial intelligence-based methods of altering videos and photos (deepfakes) lower the barriers to the mass creation and dissemination of realistic, manipulated digital content. The risk of exposure to deepfakes among education stakeholders has increased as learners and educators rely on videos to obtain and share information. We field the first study to understand the vulnerabilities of education stakeholders to science deepfakes and the characteristics that moderate vulnerability. We ground our study in climate change and survey individuals from five populations spanning students, educators, and the adult public. Our sample is nationally representative of three populations. We found that 27–50% of individuals cannot distinguish authentic videos from deepfakes. All populations exhibit vulnerability to deepfakes which increases with age and trust in information sources but has a mixed relationship with political orientation. Adults and educators exhibit greater vulnerability compared to students, indicating that those providing education are especially susceptible. Vulnerability increases with exposure to potential deepfakes, suggesting that deepfakes become more pernicious without interventions. Our results suggest that focusing on the social context in which deepfakes reside is one promising strategy for combatting deepfakes.

https://doi.org/10.1038/s41598-023-39944-3

"AI, the New Frontier — Opportunities and Challenges"


Artificial intelligence (AI) is currently all the rage in our global economy. The launch of ChatGPT broke all of the records for user adoption – Reuters reported that ChatGPT achieved 100 million users in two months. . . .

Within scholarly publishing, we have ushered in the internet, digital journals, and books, and now we are witnessing first-hand the benefits of AI, semantic search, IoT, and WEB3. This article aims to provide a context of the history of AI, the opportunities, challenges, new services, and governance.

https://tinyurl.com/yfmew3r8

Pew Research Center: What Americans Know About AI, Cybersecurity and Big Tech


Overall, Americans answer a median of five out of nine questions correctly on a digital knowledge survey that Pew Research Center conducted among 5,101 U.S. adults from May 15 to May 21, 2023. The questions span a range of topics, including cybersecurity practices, facts about major technology companies, artificial intelligence and federal online privacy laws.

Some 26% of U.S. adults can answer at least seven of the nine questions accurately, but just 4% can correctly answer all nine.

https://tinyurl.com/582bwmf3

"Generative AI, ChatGPT, and Google Bard: Evaluating the Impact and Opportunities for Scholarly Publishing"


My group within Wiley Partner Solutions designs and develops intelligent services that leverage advanced AI, big data, and cloud technologies to support publishers and researchers in open access and open science environments. To identify both benefits and risks of generative AI for our industry, we tested ChatGPT and Google Bard for authoring, for submission and reviews, for publishing, and for discovery and dissemination. I hope that our findings will inspire you to find fresh ideas for using Generative AI, and will stimulate further conversation about this new and controversial but potentially beneficial tool.

https://tinyurl.com/2y2ue6zr

"The New York Times Prohibits AI Vendors from Devouring Its Content"


The new terms prohibit the use of Times content—which includes articles, videos, images, and metadata—for training any AI model without express written permission. In Section 2.1 of the TOS, the NYT says that its content is for the reader’s “personal, non-commercial use” and that non-commercial use does not include “the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

https://tinyurl.com/2cc4uhuc

"Sites Scramble to Block ChatGPT Web Crawler after Instructions Emerge"


But for large website operators, the choice to block large language model (LLM) crawlers isn’t as easy as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge that could serve some sites very well (such as sites that don’t want to lose visitors if ChatGPT supplies their information for them), but it may also hurt others. For example, blocking content from future AI models could decrease a site’s or a brand’s cultural footprint if AI chatbots become a primary user interface in the future. As a thought experiment, imagine an online business declaring that it didn’t want its website indexed by Google in the year 2002—a self-defeating move when that was the most popular on-ramp for finding information online.

https://tinyurl.com/yc4mcejn

"AI Can Crack Double Blind Peer Review — Should We Still Use It?"


However, in the era of artificial intelligence (AI) and big data, a pressing question arises: can an author’s identity be deduced even from an anonymized paper (in cases where the authors do not advertise their submitted article on social media)?

In a recent article we investigate this very question, by leveraging an artificial intelligence model trained on the largest authorship attribution dataset to date. . . . Focusing purely on well-established researchers with at least a few dozen publications, our work demonstrates that reliable author identification is possible.

https://tinyurl.com/2kbuh7wn

"Will Building LLMs [AI Large Language Models] Become the New Revenue Driver for Academic Publishing?"


In a world where peer-reviewed content holds value for Generative AI companies, the question arises whether content that is locked behind a paywall has greater value than OA content. . . . Will publishers who still have a lot of content locked up, such as IEEE or NEJM, retain the most valuable assets? Will publishers that limit licensing to more restrictive terms such as CC BY-NC and CC BY-NC-ND have revenue streams denied to those exclusively using CC BY licenses? . . . Could authors receive income from their work via a CMO (Collective Management of Copyright) license, regardless of the agreement they have with the publisher?

https://tinyurl.com/zm6u5spc

OpenAI’s New Web Crawler: GPTBot

OpenAI has released a brief overview of GPTBot.

GPTBot is OpenAI’s web crawler and can be identified by the following user agent and string.

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Usage

Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.

Disallowing GPTBot

To disallow GPTBot to access your site you can add the GPTBot to your site’s robots.txt:

User-agent: GPTBot
Disallow: /
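
One way to confirm that such a rule behaves as intended is to parse it with Python's standard `urllib.robotparser` module. The following is a minimal sketch rather than anything from OpenAI's overview: the `is_allowed` helper, the `example.org` URL, and the `SomeOtherBot` name are assumptions made for illustration. The check uses the short user agent token (`GPTBot`), not the full user-agent string.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted above, written as the two lines of a robots.txt file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL and a made-up second crawler name, for comparison.
page = "https://example.org/page.html"
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", page)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)
print(gptbot_allowed, other_allowed)
```

With only this rule in place, GPTBot is refused everywhere on the site, while crawlers that the file does not name remain unrestricted. A robots.txt rule is a request that well-behaved crawlers honor; it does not technically prevent access.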

"Artificial Intelligence in Subject-Specific Library Work: Trends, Perspectives, and Opportunities"


The general implications of AI for libraries are much discussed in library literature. But while this discussion takes place at the library-wide level, there are also important implications for subject librarians due to the specific uses of AI in different professions and areas of study. These are often overlooked as these specializations tend to publish in subject-specific journals. This article aims to address this research gap by providing a comparison and thematic analysis of this literature. Subject-specific library journals in the areas of law, health sciences, business, and humanities and social sciences were searched to identify relevant journal articles that discussed AI. 139 articles were identified and tagged with at least one category that reflected the nature of the discussion around AI. The following analysis showed that literature related to law had the greatest number of articles by far, though the publishing activity in all disciplines has increased significantly in the last 10 years. This article explores these trends to gain a more comprehensive understanding of the implications for subject-specific library work.

https://doi.org/10.33137/cjal-rcbu.v9.39951

"Powering Research with Dimensions AI Assistant"


Imagine using AI to leverage the power of Dimensions with the click of a button. That’s exactly what you can do with Dimensions AI Assistant: your interaction with the world’s research knowledge is assisted by a powerful AI that takes you beyond keywords to a semantically rich summary with references, fully contextualizing the results and linking them with the literature. Digital Science has announced a closed beta release of Dimensions AI Assistant, which will allow users to achieve their goals quicker by helping them find the most relevant research and receive relevant synopses, leveraging the power of the Dimensions large language model, Dimensions General Science-BERT, and OpenAI’s GPT models.

https://tinyurl.com/4w2jfukt

"Elsevier Takes Scopus to the Next Level with Generative AI"


Scopus AI will help early-career researchers and seasoned academics alike through:

  • Summarized views based on Scopus abstracts: Researchers obtain a concise and trustworthy snapshot of any research topic, complete with academic references, reducing lengthy reading time and the risk of hallucinations.
  • Easy navigation to “Go Deeper Links” for extended exploration: Scopus AI provides relevant queries for further exploration, leading to hidden insights in various research topics.
  • Natural language queries: Researchers can ask questions about a subject in a natural, conversational manner.
  • A soon-to-be-added graphical representation, offering new perspectives of interconnected research themes: Scopus AI visually maps search results, offering a comprehensive overview that allows researchers to navigate complex relationships easily.

https://tinyurl.com/27xxj465

Paywall: "Human-AI Interaction for Exploratory Search & Recommender Systems with Application to Cultural Heritage"


This dissertation introduces three primary contributions through publicly deployed systems and datasets. First, we demonstrate how the construction of large-scale cultural heritage datasets using machine learning can answer interdisciplinary questions in library & information science and the humanities (Chapter 2). Second, based on the feedback of users of these cultural heritage datasets, we introduce open faceted search, an extension of faceted search that leverages human-AI interaction affordances to empower users to define their own facets in an open domain fashion (Chapter 3). Third, encountering similar challenges with the deluge of scientific papers, we explore the question of how to improve recommender systems through human-AI interaction and tackle the broad challenge of advice taking for opaque machine learners (Chapter 4).

https://tinyurl.com/yc59txc5

Generative AI and the Future of Work in America


By 2030, activities that account for up to 30 percent of hours currently worked across the US economy could be automated—a trend accelerated by generative AI. However, we see generative AI enhancing the way STEM, creative, and business and legal professionals work rather than eliminating a significant number of jobs outright. Automation’s biggest effects are likely to hit other job categories. Office support, customer service, and food service employment could continue to decline. . . .

An additional 12 million occupational transitions may be needed by 2030. As people leave shrinking occupations, the economy could reweight toward higher-wage jobs. Workers in lower-wage jobs are up to 14 times more likely to need to change occupations than those in highest-wage positions, and most will need additional skills to do so successfully. Women are 1.5 times more likely to need to move into new occupations than men.

https://tinyurl.com/yn2xdt7p

Paywall: "An Initial Interpretation of the U.S. Department of Education’s AI Report: Implications and Recommendations for Academic Libraries"


This article provides an analysis of the U.S. Department of Education’s report on Artificial Intelligence (AI) and its implications for academic libraries. It delves into the report’s key points, including the importance of AI literacy, the need for educator involvement in AI design and implementation, and the necessity of preparing for AI related issues. The author discusses how these points impact academic libraries and offers actionable recommendations for library leaders. It emphasizes the need for libraries to promote AI literacy, involve librarians in AI implementation, develop guidelines for AI use, prepare for AI issues, and collaborate with other stakeholders.

https://doi.org/10.1016/j.acalib.2023.102761

"Reproducibility in Machine Learning-Driven Research"


Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. This is also the case in machine learning (ML) and artificial intelligence (AI) research. Often, this is the case due to unpublished data and/or source-code, and due to sensitivity to ML training conditions. Although different solutions to address this issue are discussed in the research community such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) reflect on the current situation of ML reproducibility in various research fields, (ii) identify reproducibility issues and barriers that exist in these research fields applying ML, and (iii) identify potential drivers such as tools, practices, and interventions that support ML reproducibility. With this, we hope to contribute to decisions on the viability of different solutions for supporting ML reproducibility.

https://arxiv.org/abs/2307.10320

"Analyzing and Navigating Electronic Theses and Dissertations"


This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as to support end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop high quality training datasets required for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support the various digital library services such as search, browsing, and recommendation.

https://tinyurl.com/33ay562h

Webinar Recording: "ACRL LDG A Mutualistic View of AI in the Library or a Continuation of Craft by Thomas Padilla"


During this session, Thomas Padilla [Deputy Director, Archiving and Data Services at the Internet Archive] will present a critical and generative position aimed at empowering GLAM professionals on their journey to develop a mutually beneficial relationship with AI. The discussion will cover the individual, organizational, and community impacts of AI in the library landscape.

https://www.youtube.com/watch?v=hh5PTyBT6AA

"Wikipedia’s Moment of Truth"


The new A.I. chatbots have typically swallowed Wikipedia’s corpus. . . . While estimates of its influence can vary, Wikipedia is probably the most important single source in the training of A.I. models. "Without Wikipedia, generative A.I. wouldn’t exist," says Nicholas Vincent. . . . Yet as bots like ChatGPT become increasingly popular and sophisticated, Vincent and some of his colleagues wonder what will happen if Wikipedia, outflanked by A.I. that has cannibalized it, suffers from disuse and dereliction.

https://tinyurl.com/bdbxrdbk
