"Google Shows off AI Tool for Reading Handwritten Text by Rewriting It Digitally"


Imagine writing by hand in a paper notebook, then showing the notes to your camera to instantly make them searchable and organize them in context with previous notes on physical pages. If you’re like me and have particularly messy handwriting, InkSight could help turn your chicken scratch into typewritten text that is still accurate to what you scribble.

https://tinyurl.com/2dt685ba

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"The Chatbot Optimisation Game: Can We Trust AI Web Searches?"


Those wanting a firmer grip on chatbots, then, may have to explore more underhand techniques, such as the one discovered by two computer-science researchers at Harvard University. They’ve demonstrated how chatbots can be tactically controlled by deploying something as simple as a carefully written string of text. This “strategic text sequence” looks like a nonsensical series of characters – all random letters and punctuation – but is actually a delicate command that can strong-arm chatbots into generating a specific response. Not part of a programming language, it’s derived using an algorithm that iteratively develops text sequences that encourage LLMs to ignore their safety guardrails – and steer them towards particular outputs.
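The iterative search described above resembles greedy coordinate optimization over a suffix string. A minimal toy sketch of that idea follows; the `score` function here is a purely hypothetical stand-in for the attacker's real objective (e.g., a model's log-probability of emitting a target response), and all names and values are illustrative, not from the research described.

```python
import string

def score(suffix: str) -> float:
    """Hypothetical stand-in for the attacker's objective. A real attack
    would query a model's logits or gradients; here we simply reward
    characters drawn from an arbitrary 'trigger' set."""
    trigger = set("!x#q")
    return sum(1.0 for ch in suffix if ch in trigger)

def greedy_suffix_search(length: int = 8, rounds: int = 3) -> str:
    """Greedy coordinate ascent: sweep over positions, and at each
    position keep whichever candidate character maximizes the score."""
    candidates = string.ascii_lowercase + string.punctuation
    suffix = list("a" * length)
    for _ in range(rounds):
        for i in range(length):
            suffix[i] = max(
                candidates,
                key=lambda c: score("".join(suffix[:i] + [c] + suffix[i + 1:])),
            )
    return "".join(suffix)

s = greedy_suffix_search()
```

The result looks like a nonsensical character string, yet it is the output of a deliberate optimization loop — which is exactly why such sequences are hard to spot by inspection.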

https://tinyurl.com/2wuvuur9

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"AI Can Carry out Qualitative Research at Unprecedented Scale"


We have developed and launched an easy-to-use platform for conducting large-scale qualitative interviews, based on artificial intelligence in just this way. A chat interface allows the respondent to interact with an LLM that collects their responses and generates new questions. . . .

First, we asked a team of sociology PhD students from Harvard and the London School of Economics, who specialise in qualitative methods, to assess the quality of interviews based on the interview scripts. The AI-led interviews were rated approximately comparable to an average human expert (under the same conditions). . . . A vast majority of participants reported enjoying their interaction with the conversational agent and preferred this mode of interview over open text fields.

https://tinyurl.com/mry3vrat

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Ithaka S+R: A Third Transformation? Generative AI and Scholarly Publishing


What is not yet clear is how disruptive this [AI] growth will be. To this end, we interviewed 12 leaders in stakeholder communities ranging from large publishers and technology disruptors to academic librarians and scholars. The consensus among the individuals with whom we spoke is that generative AI will enable efficiency gains across the publication process. Writing, reviewing, editing, and discovery will all become easier and faster. Both scholarly publishing and scientific discovery in turn will likely accelerate as a result of AI-enhanced research methods. From that shared premise, two distinct categories of change emerged from our interviews. In the first and most commonly described future, the efficiency gains made publishing function better but did not fundamentally alter its dynamics or purpose. In the second, much hazier scenario, generative AI created a transformative wave that could dwarf the impacts of either the first or second digital transformations [URL added].

https://doi.org/10.18665/sr.321519

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "A ‘Delve’ into the Evidence of AI in Production of Academic Business Literature"


The author performed a t-test using the average growth rates of articles published in the database ProQuest ABI/INFORM Global containing keywords or phrases purported to be commonly used in content generated by AI during the years before and after common generative AI availability. Results show evidence that publication rates after generative AI availability experienced an improbably high deviation from the norm.
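The comparison of growth rates before and after generative AI availability is a standard two-sample t-test. A minimal sketch using Welch's t statistic (which does not assume equal variances); the growth-rate numbers below are invented for illustration and are not the study's data:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with
    possibly unequal variances."""
    m1, m2 = statistics.mean(sample_a), statistics.mean(sample_b)
    v1, v2 = statistics.variance(sample_a), statistics.variance(sample_b)
    return (m1 - m2) / math.sqrt(v1 / len(sample_a) + v2 / len(sample_b))

# Hypothetical annual growth rates (%) of articles containing the
# purported AI-associated keywords, before and after availability.
pre_ai = [2.0, 1.0, 3.0]
post_ai = [20.0, 25.0, 22.0]

t = welch_t(pre_ai, post_ai)  # a large |t| suggests a real shift
```

In practice one would convert t to a p-value via the t distribution (e.g., with `scipy.stats.ttest_ind(equal_var=False)`), but the statistic alone shows how "improbably high deviation from the norm" is quantified.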

https://doi.org/10.1080/08963568.2024.2420300

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"‘Massive Copyright Violation’ Threatens One of the World’s Hottest AI Apps"


News Corp has officially filed a lawsuit against Perplexity AI over accusations that the startup has committed copyright infringement on a “massive scale.” . . .

Perplexity’s value proposition is instead to insert itself between search and content producers as a middleman, training its AI on copyrighted content that its chatbot will then regurgitate. . . to its own paying customers, without compensating or attributing the original content producers. . . .

https://tinyurl.com/y2h5fpeu

Perplexity

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Publishers Join with Worldwide Coalition to Condemn the Theft of Creative and Intellectual Authorship by Tech Companies for Generative AI Training"


Today, the Association of American Publishers (AAP) joined forces with more than 10,000 creators and coalition partners, including authors, musicians, actors, artists, and photographers, to condemn the theft of creative and intellectual authorship by big tech companies for use in their Generative AI models. In fact, these consumer-facing models and tools would not exist without the books, newspapers, songs, performances, and other invaluable human expressions that were—and continue to be—copied, ingested, and regenerated in blatant disregard of the law.

https://tinyurl.com/4e37e3ff

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Microsoft and OpenAI’s Close Partnership Shows Signs of Fraying"


When OpenAI got its giant investment from Microsoft, it agreed to an exclusive deal to buy computing power from Microsoft and work closely with the tech giant on new A.I. . . .

OpenAI employees complain that Microsoft is not providing enough computing power. . . some have complained that if another company beat it to the creation of A.I. that matches the human brain, Microsoft will be to blame because it hasn’t given OpenAI the computing power it needs. . . .

The contract contains a clause that says that if OpenAI builds artificial general intelligence, or A.G.I. — roughly speaking, a machine that matches the power of the human brain — Microsoft loses access to OpenAI’s technologies.

The clause was meant to ensure that a company like Microsoft did not misuse this machine of the future, but today, OpenAI executives see it as a path to a better contract. . . Under the terms of the contract, the OpenAI board could decide when A.G.I. has arrived.

https://tinyurl.com/y5mjr66d

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Large Language Publishing: The Scholarly Publishing Oligopoly’s Bet on AI"


This article focuses on an offshoot of the big firms’ surveillance-publishing businesses: the post-ChatGPT imperative to profit from troves of proprietary “training data,” to make new AI products and—the essay predicts—to license academic papers and scholars’ tracked behavior to big technology companies.

https://tinyurl.com/ft2467my

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Virginia Tech and UC Riverside: "University Libraries Receives Grant to Create Generative Artificial Intelligence Incubator Program"


University Libraries at Virginia Tech and the University of California, Riverside, received a $115,398 Institute of Museum and Library Services grant to create a generative artificial intelligence incubator program (GenAI) to increase the adoption of artificial intelligence (AI) in the library profession and academic libraries. . . .

[Yinlin] Chen [assistant director for the Center for Digital Research and Scholarship at Virginia Tech] will use his expertise in advanced GenAI techniques and multidisciplinary AI research in his collaboration with Edward Fox, co-principal investigator and director of the digital library research laboratory at Virginia Tech and computer science professor, and Zhiwu Xie, co-principal investigator and assistant university librarian for research and technology at the University of California, Riverside, to create the generative artificial intelligence incubator program. They will build training materials, workshops, and projects to assist librarians in becoming AI practitioners.

https://tinyurl.com/3sysn284

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"In Which Fields Can ChatGPT Detect Journal Article Quality? An Evaluation of REF2021 Results"


Time spent by academics on research quality assessment might be reduced if automated approaches can help. Whilst citation-based indicators have been extensively developed and evaluated for this, they have substantial limitations and Large Language Models (LLMs) like ChatGPT provide an alternative approach. This article assesses whether ChatGPT 4o-mini can be used to estimate the quality of journal articles across academia. It samples up to 200 articles from all 34 Units of Assessment (UoAs) in the UK’s Research Excellence Framework (REF) 2021, comparing ChatGPT scores with departmental average scores. There was an almost universally positive Spearman correlation between ChatGPT scores and departmental averages, varying between 0.08 (Philosophy) and 0.78 (Psychology, Psychiatry and Neuroscience), except for Clinical Medicine (rho=-0.12). Although other explanations are possible, especially because REF score profiles are public, the results suggest that LLMs can provide reasonable research quality estimates in most areas of science, and particularly the physical and health sciences and engineering, even before citation data is available. Nevertheless, ChatGPT assessments seem to be more positive for most health and physical sciences than for other fields, a concern for multidisciplinary assessments, and the ChatGPT scores are only based on titles and abstracts, so cannot be research evaluations.
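The study's headline numbers are Spearman rank correlations between LLM scores and departmental averages. A minimal sketch of the computation for tie-free data, using the classic formula; the score lists below are invented for illustration:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation for tie-free data:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between the ranks of item i."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical per-article scores: LLM estimates vs. departmental averages.
llm_scores = [2.1, 2.8, 3.0, 3.6, 3.9]
dept_scores = [2.0, 3.1, 2.9, 3.8, 3.5]

rho = spearman_rho(llm_scores, dept_scores)
```

Because Spearman works on ranks rather than raw values, it tolerates the fact that ChatGPT scores and REF departmental averages sit on different scales.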

https://arxiv.org/abs/2409.16695

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Hacker Plants False Memories in ChatGPT to Steal User Data in Perpetuity"


Within three months of the rollout [of a long-term conversation memory feature], [Johann] Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The [security] researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations. . . .

While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.

https://tinyurl.com/bddcxjj4

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Federal Reserve Bank of St. Louis: The Rapid Adoption of Generative AI


Figure 2 presents our main results. The first bar shows that 39.4 percent of all August 2024 RPS respondents say that they used generative AI, either at work or at home. About 32 percent of respondents reported using generative AI at least once in the week prior to the survey, while 10.6 percent reported using it every day last week. About 28 percent of employed respondents used generative AI at work in August 2024, with the vast majority (24.1 percent) using it at least once in the last week and 10.9 percent using it daily. Usage outside of work was more common (32.7 percent), but slightly less intensive, with 25.9 percent using it at least once in the last week and 6.4 percent using it every day. Appendix Figure A.1 presents the share of respondents using specific generative AI products. ChatGPT is used most often (28.5 percent), followed by Google Gemini (16.3 percent).

https://tinyurl.com/mfhr6ujr

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"It Takes a Village: A Distributed Training Model for AI-Based Chatbots"


The introduction of Large Language Models (LLM) to the chatbot landscape has opened intriguing possibilities for academic libraries to offer more responsive and institutionally contextualized support to users, especially outside of regular service hours. While a few academic libraries currently employ AI-based chatbots on their websites, this service has not yet become the norm and there are no best practices in place for how academic libraries should launch, train, and assess the usefulness of a chatbot. In summer 2023, staff from the University of Delaware’s Morris Library information technology (IT) and reference departments came together in a unique partnership to pilot a low-cost AI-powered chatbot called UDStax. The goals of the pilot were to learn more about the campus community’s interest in engaging with this tool and to better understand the labor required on the staff side to maintain the bot. After researching six different options, the team selected Chatbase, a subscription-model product based on ChatGPT 3.5 that provides user-friendly training methods for an AI model using website URLs and uploaded source material. Chatbase removed the need to utilize the OpenAI API directly to code processes for submitting information to the AI engine to train the model, cutting down the amount of work for library information technology and making it possible to leverage the expertise of reference librarians and other public-facing staff, including student workers, to distribute the work of developing, refining, and reviewing training materials. This article will discuss the development of prompts, leveraging of existing data sources for training materials, and workflows involved in the pilot. It will argue that, when implementing AI-based tools in the academic library, involving staff from across the organization is essential to ensure buy-in and success. 
Although chatbots are designed to hide the effort of the people behind them, that labor is substantial and needs to be recognized.

https://tinyurl.com/3y654j2r

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Responsible AI Practice in Libraries and Archives: A Review of the Literature"


Artificial intelligence (AI) has the potential to positively impact library and archives collections and services—enhancing reference, instruction, metadata creation, recommendations, and more. However, AI also has ethical implications. This paper presents an extensive literature and review analysis that examines AI projects implemented in library and archives settings, asking the following research questions: RQ1: How is artificial intelligence being used in libraries and archives practice? RQ2: What ethical concerns are being identified and addressed during AI implementation in libraries and archives? The results of this literature review show that AI implementation is growing in libraries and archives and that practitioners are using AI for increasingly varied purposes. We found that AI implementation was most common in large, academic libraries. Materials used in AI projects usually involved digitized and born digital text and images, though materials also ranged to include web archives, electronic theses and dissertations (ETDs), and maps. AI was most often used for metadata extraction and reference and research services. Just over half of the papers included in the literature review mentioned ethics or values related issues in their discussions of AI implementation in libraries and archives, and only one-third of all resources discussed ethical issues beyond technical issues of accuracy and human-in-the-loop. Case studies relating to AI in libraries and archives are on the rise, and we expect subsequent discussions of relevant ethics and values to follow suit, particularly growing in the areas of cost considerations, transparency, reliability, policy and guidelines, bias, social justice, user communities, privacy, consent, accessibility, and access. 
As AI comes into more common usage, it will benefit the library and archives professions to not only consider ethics when implementing local projects, but to publicly discuss these ethical considerations in shared documentation and publications.

https://tinyurl.com/2t6ykuyv

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Clarivate Launches Generative AI-Powered Primo Research Assistant"


Key features include:

  • Semantic search and natural language queries: Users can interact with the system using everyday language, making the search process more intuitive.
  • AI-powered answers with references to sources used: The tool provides immediate answers based on the top five abstracts, with links to the full text and the complete result list.
  • Search suggestions: The assistant offers suggestions to help users expand their topics and delve deeper into their research.
  • Non-English query support: Users can ask questions and receive answers in multiple non-English languages.

https://tinyurl.com/bdcnbku3

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Academic Writing in the Age of AI: Comparing the Reliability of ChatGPT and Bard with Scopus and Web of Science"


ChatGPT and Bard (now known as Gemini) are becoming indispensable resources for researchers, academicians and diverse stakeholders within the academic landscape. At the same time, traditional digital tools such as scholarly databases continue to be widely used. Web of Science and Scopus are the most extensive academic databases and are generally regarded as consistently reliable scholarly research resources. With the increasing acceptance of artificial intelligence (AI) in academic writing, this study focuses on understanding the reliability of the new AI models compared to Scopus and Web of Science. The study includes a bibliometric analysis of green, sustainable and ecological buying behaviour, covering the period from 1 January 2011 to 21 May 2023. These results are used to compare the results from the AI and the traditional scholarly databases on several parameters. Overall, the findings suggest that AI models like ChatGPT and Bard are not yet reliable for academic writing tasks. It appears to be too early to depend on AI for such tasks.

https://doi.org/10.1016/j.jik.2024.100563

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Introducing OpenAI o1-preview: A New Series of Reasoning Models for Solving Hard Problems"


In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o [the last model] correctly solved only 13% of problems, while the [new] reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions. . . .

As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT-4o will be more capable in the near term.

https://tinyurl.com/5ap6p996

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "Reshaping Academic Library Information Literacy Programs in the Advent of ChatGPT and Other Generative AI Technologies"


This article reports on three digital information literacy initiatives created by instruction librarians to support students’ use of generative AI technologies, namely ChatGPT, in academic library research. The cumulative and formative data gathered from the initiatives reveals a continuing need for academic libraries to provide information literacy instruction that guides students toward the ethical use of information and awareness of using generative AI tools in library research.

https://doi.org/10.1080/10875301.2024.2400132

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"The AI-Copyright Trap"


As AI tools proliferate, policy makers are increasingly being called upon to protect creators and the cultural industries from the extractive, exploitative, and even existential threats posed by generative AI. In their haste to act, however, they risk running headlong into the Copyright Trap: the mistaken conviction that copyright law is the best tool to support human creators and culture in our new technological reality (when in fact it is likely to do more harm than good). It is a trap in the sense that it may satisfy the wants of a small group of powerful stakeholders, but it will harm the interests of the more vulnerable actors who are, perhaps, most drawn to it. Once entered, it will also prove practically impossible to escape. I identify three routes into the copyright trap in current AI debates: first is the “if value, then (property) right” fallacy; second is the idea that unauthorized copying is inherently wrongful; and third is the resurrection of the starving artist trope to justify copyright’s expansion.

https://tinyurl.com/bdett6ue

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Datacenters to Emit 3X More Carbon Dioxide Because of Generative AI"


The datacenter industry is set to emit 2.5 billion tonnes of greenhouse gas (GHG) emissions worldwide between now and the end of the decade, three times more than if generative AI had not been developed.

https://tinyurl.com/4vatmm8a

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Clarivate Report Unveils the Transformative Role of Artificial Intelligence on Shaping the Future of the Library"


The report combines feedback from a survey of more than 1,500 librarians from across the world with qualitative interviews, covering academic, national and public libraries. In addition to the downloadable report, the accompanying microsite’s dynamic and interactive data visualizations enable rapid comparative analyses according to regions and library types. . . .

Key findings of the report include:

  • Most libraries have an AI plan in place, or one in progress: Over 60% of respondents are evaluating or planning for AI integration.
  • AI adoption is the top tech priority: AI-powered tools for library users and patrons top the list of technology priorities for the next 12 months, according to 43% of respondents.
  • AI is advancing library missions: Key goals for those evaluating or implementing AI include supporting student learning (52%), research excellence (47%) and content discoverability (45%), aligning closely with the mission of libraries.
  • Librarians see promise and pitfalls in AI adoption: 42% believe AI can automate routine tasks, freeing librarians for strategic and creative activities. Levels of optimism vary regionally.
  • AI skills gaps and shrinking budgets are top concerns: Lack of expertise and budget constraints are seen as greater challenges than privacy and security issues.
      • Shrinking budgets: Almost half (47%) cite shrinking budgets as their greatest challenge.
      • Skills gap: 52% of respondents see upskilling as AI’s biggest impact on employment, yet nearly a third (32%) state that no training is available.
  • AI advancement will be led by IT: By combining the expertise of heads of IT with strategic investment and direction from senior leadership, libraries can move from consideration to implementation of AI in the coming years.
  • Regional priorities differ: Librarians’ views on other key topics such as sustainability, diversity, open access and open science show notable regional diversity.

https://tinyurl.com/9azeessa

Pulse of the Library report

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"The AI Copyright Hype: Legal Claims That Didn’t Hold Up"


Over the past year, two dozen AI-related lawsuits and their myriad infringement claims have been winding their way through the court system. None have yet reached a jury trial. While we all anxiously await court rulings that can inform our future interaction with generative AI models, in the past few weeks, we are suddenly flooded by news reports with titles such as “US Artists Score Victory in Landmark AI Copyright Case,” “Artists Land a Win in Class Action Lawsuit Against A.I. Companies,” “Artists Score Major Win in Copyright Case Against AI Art Generators”—and the list goes on. The exuberant mood in these headlines mirrors the enthusiasm of people actually involved in this particular case (Andersen v. Stability AI). The plaintiffs’ lawyer calls the court’s decision “a significant step forward for the case.” “We won BIG,” writes the plaintiff on X.

In this blog post, we’ll explore the reality behind these headlines and statements. The “BIG” win in fact describes a portion of the plaintiffs’ claims surviving a pretrial motion to dismiss. If you are already familiar with the motion to dismiss per Federal Rules of Civil Procedure Rule 12(b)(6), please refer to Part II to find out what types of claims have been dismissed early on in the AI lawsuits.

https://tinyurl.com/rhmzkr8y

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"AI Models Collapse When Trained on Recursively Generated Data"


Yet, although current LLMs. . ., including GPT-3, were trained on predominantly human-generated text, this may change. If the training data of most future models are also scraped from the web, then they will inevitably train on data produced by their predecessors. In this paper, we investigate what happens when text produced by, for example, a version of GPT forms most of the training dataset of following models. . . .

Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation. Being trained on polluted data, they then mis-perceive reality. . . .

In our work, we demonstrate that training on samples from another generative model can induce a distribution shift, which—over time—causes model collapse. This in turn causes the model to mis-perceive the underlying learning task. To sustain learning over a long period of time, we need to make sure that access to the original data source is preserved and that further data not generated by LLMs remain available over time. The need to distinguish data generated by LLMs from other data raises questions about the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale. One option is community-wide coordination to ensure that different parties involved in LLM creation and deployment share the information needed to resolve questions of provenance. Otherwise, it may become increasingly difficult to train newer versions of LLMs without access to data that were crawled from the Internet before the mass adoption of the technology or direct access to data generated by humans at scale.
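The degenerative loop the authors describe can be illustrated with a drastically simplified model: fit a one-dimensional Gaussian to a finite sample, then draw the next generation's training data from the fitted model, and repeat. This toy sketch (parameters chosen for illustration, not taken from the paper) shows how finite-sample estimation error compounds across generations, with the fitted variance tending to collapse:

```python
import random
import statistics

def refit_generations(mu=0.0, sigma=1.0, n_samples=20,
                      generations=500, seed=0):
    """Repeatedly fit a 1-D Gaussian 'model' to finite samples drawn
    from the previous generation's fitted model. Each refit introduces
    estimation error, and the fitted standard deviation drifts toward
    zero — a caricature of model collapse."""
    rng = random.Random(seed)
    history = [sigma]
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu, sigma = statistics.mean(data), statistics.stdev(data)
        history.append(sigma)
    return history

stds = refit_generations()  # fitted sigma per generation
```

Real LLMs are vastly more complex, but the mechanism is analogous: each generation models its predecessor's outputs rather than the original data distribution, and the tails are progressively lost.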

https://doi.org/10.1038/s41586-024-07566-y

See also: “When A.I.’s Output Is a Threat to A.I. Itself.”

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |