Ithaka S+R: "Generative AI and Scholarly Publishing: Announcing a New Research Project"


To help, Ithaka S+R is launching a new study of the strategic implications of generative AI for scholarly publishing, with support from STM Solutions and a group of its members. The following key questions will guide our inquiry:

  • Will generative AI be integrated into the existing goals, processes, and infrastructures for scholarly publishing? Or, does this represent a transformative technology that will require fundamental restructuring of those goals, processes, and infrastructures?
  • Could generative AI effectively render our current assumptions about the role and purpose of publishers obsolete? What new roles could publishers play in a radically transformed information environment?
  • Which potential transformations should publishers encourage, and which risks require immediate coordinated responses while the technology is still taking root in the sector?
  • What new kinds of shared technical and/or social infrastructure are needed to support the ethical adoption of generative AI in support of the goals of scholarship and scholarly publishing? What systems and structures will be necessary to balance the needs of authors, readers, rights holders, publishers, and aggregators?

https://tinyurl.com/2s432pfh

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "Rethinking Copyright Exceptions in the Era of Generative AI: Balancing Innovation and Intellectual Property Protection"


In response to these identified [copyright and AI] challenges, this paper proposes a hybrid model for TDM exceptions, along with recommended specific mechanisms. The model divides exceptions into noncommercial and commercial uses, providing a nuanced solution to complex copyright issues in AI training. Recommendations incorporate mandatory exceptions for noncommercial uses, an opt-out clause for commercial uses, enhanced transparency measures, and a searchable portal for copyright owners. In conclusion, striking a delicate equilibrium between technological progress and the incentive for creative expression is of paramount importance. These suggested solutions aim to establish a harmonious foundation that nurtures innovation and creativity while honoring creators’ rights, facilitating AI development, promoting transparency, and ensuring fair compensation for creators.

https://doi.org/10.1111/jwip.12301

"The Emerging AI Divide in the United States"


In this study, we characterize spatial differences in U.S. residents’ knowledge of a new generative AI tool, ChatGPT, through an analysis of state- and county-level search query data. In the first six months after the tool’s release, we observe the highest rates of users searching for ChatGPT in West Coast states and persistently low rates of search in Appalachian and Gulf states. Counties with the highest rates of search are relatively more urbanized and have proportionally more educated, more economically advantaged, and more Asian residents in comparison with other counties or with the U.S. average. In multilevel models adjusting for socioeconomic and demographic factors as well as industry makeup, education is the strongest positive predictor of rates of search for generative AI tooling. Although generative AI technologies may be novel, early differences in uptake appear to be following familiar paths of digital marginalization.

https://arxiv.org/abs/2404.11988

"More CNI Spring ’24 Meeting Videos Live"

CNI has released eight new videos from its Spring 2024 meeting.

Digital Scholarship and DigitalKoans Are Now 19 Years Old

Digital Scholarship and DigitalKoans were established on 4/20/2005. Digital Scholarship provides information and commentary about artificial intelligence, digital copyright, digital curation, open access, research data management, scholarly communication, and other digital information issues. Digital Scholarship is an open access noncommercial publisher. All of its publications are currently under a Creative Commons Attribution License.

DigitalKoans has published over 16,200 posts. Since 2008, over 5,600 job ads have been posted, with slightly over 4,000 of them for digital library jobs.

Digital Scholarship has published the following books and book supplements: the Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (2005; published with the Association of Research Libraries), the Scholarly Electronic Publishing Bibliography: 2008 Annual Edition (2009), Digital Scholarship 2009 (2010), Transforming Scholarly Publishing through Open Access: A Bibliography (2010), the Scholarly Electronic Publishing Bibliography 2010 (2011), the Digital Curation and Preservation Bibliography 2010 (2011), the Institutional Repository and ETD Bibliography 2011 (2011), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works (2012), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement (2013), and the Research Data Curation and Management Bibliography (2021).

It has also published and updated the following bibliographies, webliographies, and weblogs: the Scholarly Electronic Publishing Bibliography (1996-2011), the Scholarly Electronic Publishing Weblog (2001-2013), the Electronic Theses and Dissertations Bibliography (2005-2021), the Google Books Bibliography (2005-2011), the Institutional Repository Bibliography (2009-2011), the Open Access Journals Bibliography (2010), the Digital Curation and Preservation Bibliography (2010-2011), the E-science and Academic Libraries Bibliography (2011), the Digital Curation Resource Guide (2012), the Research Data Curation Bibliography (2012-2019), the Altmetrics Bibliography (2013), the Transforming Peer Review Bibliography (2014), the Academic Library as Scholarly Publisher Bibliography (2018-2023), the Research Data Sharing and Reuse Bibliography (2021), the Research Data Publication and Citation Bibliography (2022), Digital Curation Certificate and Master’s Degree Programs (2023), the Academic Libraries and Research Data Management Bibliography (2023), and the Artificial Intelligence and Libraries Bibliography (2023).

"Author Granted Copyright over Book with AI-Generated Text—with a Twist"


The USCO’s notice granting Shupe copyright registration of her book does not recognize her as author of the whole text as is conventional for written works. Instead she is considered the author of the "selection, coordination, and arrangement of text generated by artificial intelligence." This means no one can copy the book without permission, but the actual sentences and paragraphs themselves are not copyrighted and could theoretically be rearranged and republished as a different book.

https://tinyurl.com/bd97jbw6

Stanford: Artificial Intelligence Index Report 2024


AI has surpassed human performance on several benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning. . . .

According to AI Index estimates, the training costs of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute. . . .

New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models. . . .

Despite a decline in overall AI private investment last year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion.

https://tinyurl.com/53wsjxyj

"Is ChatGPT Transforming Academics’ Writing Style?"


Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT’s writing style in their abstracts by means of a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of ChatGPT-revised abstracts is estimated to be approximately 35%, if we take the output of one of the simplest prompts, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of ChatGPT into academics’ writing style.

https://arxiv.org/abs/2404.08627v1
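
As a rough illustration of what "word frequency changes" can mean, the sketch below compares how often each word appears in two sets of abstracts. It is a simplified stand-in, not the calibrated statistical model the paper describes. The function names, the smoothing constant, and the sample sentences are all hypothetical.

```python
from collections import Counter
import re


def word_rates(abstracts):
    """Return the relative frequency of each lowercase word across the abstracts."""
    counts = Counter()
    for text in abstracts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def rate_ratios(before, after, smoothing=1e-6):
    """Ratio of each word's rate in `after` to its rate in `before`.

    The smoothing term (an assumption of this sketch) avoids division by
    zero for words that appear in only one of the two sets.
    """
    rates_before, rates_after = word_rates(before), word_rates(after)
    words = set(rates_before) | set(rates_after)
    return {
        w: (rates_after.get(w, 0.0) + smoothing) / (rates_before.get(w, 0.0) + smoothing)
        for w in words
    }


# Hypothetical sample data, invented for illustration only.
earlier = ["we study the model"]
later = ["we delve into the model", "we delve deeper"]
ratios = rate_ratios(earlier, later)
```

Words with ratios well above 1 became more common in the later set. A real analysis would need far larger samples and a noise model before drawing any conclusion.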

"Generative AI Can Turn Your Most Precious Memories into Photos That Never Existed"


Dozens of people have now had their memories turned into images in this way via Synthetic Memories, a project run by Domestic Data Streamers. The studio uses generative image models, such as OpenAI’s DALL-E, to bring people’s memories to life. Since 2022, the studio, which has received funding from the UN and Google, has been working with immigrant and refugee communities around the world to create images of scenes that have never been photographed, or to re-create photos that were lost when families left their previous homes.

https://tinyurl.com/yekzh6sy

"Is ChatGPT Corrupting Peer Review? Telltale Words Hint at AI Use"


A study that identified buzzword adjectives that could be hallmarks of AI-written text in peer-review reports suggests that researchers are turning to ChatGPT and other artificial intelligence (AI) tools to evaluate others’ work. . . .

Their analysis suggests that up to 17% of the peer-review reports have been substantially modified by chatbots — although it’s unclear whether researchers used the tools to construct reviews from scratch or just to edit and improve written drafts.

https://www.nature.com/articles/d41586-024-01051-2

"AI Race Heats Up as OpenAI, Google and Mistral Release New Models"


OpenAI, Google, and the French artificial intelligence startup Mistral have all released new versions of their frontier AI models within 12 hours of one another, as the industry prepares for a burst of activity over the summer.

The unprecedented flurry of releases comes as the sector readies for the expected launch of the next major version of GPT, the system that underpins OpenAI’s hit chatbot ChatGPT.

https://tinyurl.com/36zmymwp

"Towards a Books Data Commons for AI Training"


This white paper describes ways of building a books data commons: a responsibly designed, broadly accessible data set of digitized books to be used in training AI models. This report, written in partnership with Creative Commons and Proteus Strategies, is based on a series of workshops that brought together practitioners building AI models, legal and policy scholars, and experts working with collections of digitized books.

In the paper, we first explain why books matter for AI training and how broader access could be beneficial. We then summarize two tracks that might be considered for developing such a resource, highlighting existing projects that help foreground the potential challenges. One track relies on public domain and permissively licensed books, while the other depends on exceptions to copyright to enable training on in-copyright books. The report also presents several key design choices and next steps that could advance further development of this approach.

https://tinyurl.com/2fu47552

"PubTator 3.0: An AI-Powered Literature Resource for Unlocking Biomedical Knowledge"


PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0’s online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

https://doi.org/10.1093/nar/gkae235

"Advancing the Search Frontier with AI Agents"


As many of us in the information retrieval (IR) research community know and appreciate, search is far from being a solved problem. Millions of people struggle with tasks on search engines every day. Often, their struggles relate to the intrinsic complexity of their task and the failure of search systems to fully understand the task and serve relevant results. The task motivates the search, creating the gap/problematic situation that searchers attempt to bridge/resolve and drives search behavior as they work through different task facets. Complex search tasks require more than support for rudimentary fact finding or re-finding. Research on methods to support complex tasks includes work on generating query and website suggestions, personalizing and contextualizing search, and developing new search experiences, including those that span time and space. The recent emergence of generative artificial intelligence (AI) and the arrival of assistive agents, based on this technology, has the potential to offer further assistance to searchers, especially those engaged in complex tasks. There are profound implications from these advances for the design of intelligent systems and for the future of search itself. This article, based on a keynote by the author at the 2023 ACM SIGIR Conference, explores these issues and how AI agents are advancing the frontier of search system capabilities, with a special focus on information interaction and complex task completion.

https://arxiv.org/abs/2311.01235

"How Tech Giants Cut Corners to Harvest Data for A.I."


The volume of data is crucial [to train AIs]. Leading chatbot systems have learned from pools of digital text spanning as many as three trillion words, or roughly twice the number of words stored in Oxford University’s Bodleian Library, which has collected manuscripts since 1602. The most prized data, A.I. researchers said, is high-quality information, such as published books and articles, which have been carefully written and edited by professionals. . . .

Tech companies are so hungry for new data that some are developing "synthetic" information. This is not organic data created by humans, but text, images and code that A.I. models produce — in other words, the systems learn from what they themselves generate.

https://tinyurl.com/3uxuwekh

"Unleashing the Power of AI. A Systematic Review of Cutting-Edge Techniques in AI-Enhanced Scientometrics, Webometrics, and Bibliometrics"


Findings: (i) Regarding scientometrics, the application of AI yields various distinct advantages, such as conducting analyses of publications, citations, research impact prediction, collaboration, research trend analysis, and knowledge mapping, in a more objective and reliable framework. (ii) In terms of webometrics, AI algorithms are able to enhance web crawling and data collection, web link analysis, web content analysis, social media analysis, web impact analysis, and recommender systems. (iii) Moreover, automation of data collection, analysis of citations, disambiguation of authors, analysis of co-authorship networks, assessment of research impact, text mining, and recommender systems are considered as the potential of AI integration in the field of bibliometrics.

https://arxiv.org/abs/2403.18838

"Now You Can Use ChatGPT without an Account"


OpenAI will no longer require an account to use ChatGPT, the company’s free AI platform. However, this only applies to ChatGPT, as other OpenAI products, like DALL-E 3, cost money to access and will still require an account for access. . . .

OpenAI said it introduced "additional content safeguards for this experience," including blocking prompts in a wider range of categories, but did not expound more on what these categories are. The option to opt out of model training will still be available, even to those without accounts.

https://tinyurl.com/582ehjhm

Paywall: "Developing a Foundation for the Informational Needs of Generative AI Users through the Means of Established Interdisciplinary Relationships"


University faculty immediately had many questions and concerns in response to the public proliferation of generative artificial intelligence programs leveraging large language models to generate complex text responses to simple prompts. Librarians at the University of South Florida (USF) pooled their skills and existing relationships with faculty and professional staff across campus to provide information that answered common questions raised by those faculty on generative artificial intelligence usage within research-related topics. Faculty concerns regarding plagiarism, how to instruct students to use the new tools, and how to discern the reliability of information generated by artificial intelligence tools were placed at the forefront.

https://doi.org/10.1016/j.acalib.2024.102876

"Generative AI for Trustworthy, Open, and Equitable Scholarship"


We focus on the potential of GenAI to address known problems for the alignment of science practice and its underlying core values. As institutions culturally charged with the curation and preservation of the world’s knowledge and cultural heritage, libraries are deeply invested in promoting a durable, trustworthy, and sustainable scholarly knowledge commons. With public trust in academia and in research waning [reference] and in the face of recent high-profile instances of research misconduct [reference], the scholarly community must act swiftly to develop policies, frameworks, and tools for leveraging the power of GenAI in ways that enhance, rather than erode, the trustworthiness of scientific communications, the breadth of scientific impact, and the public’s trust in science, academia, and research.

https://doi.org/10.21428/e4baedd9.567bfd15

"Evolving AI Strategies in Libraries: Insights from Two Polls of ARL Member Representatives over Nine Months—Report Published"


To effectively chart this [AI] transition, two quick polls were conducted among members of the Association of Research Libraries (ARL) to capture changing perspectives on the potential impact of AI, assess the extent of AI exploration and implementation within libraries, and identify AI applications relevant to the current library environment.

Today, ARL has released the results of the two polls—analyzing and juxtaposing the outcomes of these two surveys to better understand how library leaders are managing the complexities of integrating AI into their operations and services. The report also includes recommendations for ARL research libraries.

https://tinyurl.com/2t9nywcv

Report

"TDM & AI Rights Reserved? Fair Use & Evolving Publisher Copyright Statements"


Earlier this year, we noticed that some academic publishers have revised the copyright notices on their websites to state they reserve rights to text and data mining (TDM) and AI training (for example, see the website footers for Elsevier and Wiley). . . . SPARC asked Kyle K. Courtney, Director of Copyright and Information Policy for Harvard Library, to address key questions regarding these revised copyright statements and the continuing viability of fair use justifications for TDM.

https://tinyurl.com/4prkfbb3

"Use ‘Jan’ to Chat with AI without the Privacy Concerns"


Jan is a free and open source application that makes it easy to download multiple large language models and start chatting with them. There are simple installers for Windows, macOS, and Linux. Now, this isn’t perfect. The models aren’t necessarily as good as the latest ones from OpenAI or Google, and depending on how powerful your computer is, the results might take a while to come in.

https://tinyurl.com/4m8p4b82

"Human-Centered Explainable Artificial Intelligence: An Annual Review of Information Science and Technology (Arist) Paper"


Explainability is central to trust and accountability in artificial intelligence (AI) applications. The field of human-centered explainable AI (HCXAI) arose as a response to mainstream explainable AI (XAI) which was focused on algorithmic perspectives and technical challenges, and less on the needs and contexts of the non-expert, lay user. HCXAI is characterized by putting humans at the center of AI explainability. . . . This review identifies the foundational ideas of HCXAI, how those concepts are operationalized in system design, how legislation and regulations might normalize its objectives, and the challenges that HCXAI must address as it matures as a field.

https://doi.org/10.1002/asi.24889

"Exploring the Potential of Large Language Models and Generative Artificial Intelligence (GPT): Applications in Library and Information Science"


The presented study offers a systematic overview of the potential application of large language models (LLMs) and generative artificial intelligence tools, notably the GPT model and the ChatGPT interface, within the realm of library and information science (LIS). The paper supplements and extends the outcomes of a comprehensive information survey on the subject matter with the author’s own experiences and examples showcasing possible applications, demonstrated through illustrative instances. This study does not involve testing available LLMs or selecting the most suitable tool; instead, it targets information professionals, specialists, librarians, and scientists, aiming to inspire them in various ways.

https://doi.org/10.1177/09610006241241066

"The Latest ‘Crisis’ — Is the Research Literature Overrun with ChatGPT- and LLM-generated Articles?"


Elsevier has been under the spotlight this month for publishing a paper that contains a clearly ChatGPT-written portion of its introduction. The first sentence of the paper’s Introduction reads, "Certainly, here is a possible introduction for your topic:. . . ." To date, the article remains unchanged, and unretracted. A second paper, containing the phrase "I’m very sorry, but I don’t have access to real-time information or patient-specific data, as I am an AI language model" was subsequently found, and similarly remains unchanged. This has led to a spate of amateur bibliometricians scanning the literature for similar common AI-generated phrases, with some alarming results.

https://tinyurl.com/4a8bjmzy
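
The kind of scan described above can be approximated with a few lines of Python. This is a minimal sketch, assuming plain-text input. The phrase list holds only the two fragments quoted in the excerpt, and the function name is hypothetical. A match shows only that a phrase is present, not that a paper was machine-written.

```python
import re

# Fragments taken from the two papers quoted in the excerpt above.
# A real scan would need a longer, vetted list.
TELLTALE_PHRASES = [
    "certainly, here is a possible introduction for your topic",
    "as i am an ai language model",
]


def find_telltale_phrases(text, phrases=TELLTALE_PHRASES):
    """Return the phrases found in text, ignoring case and line breaks."""
    # Collapse runs of whitespace so phrases split across lines still match.
    normalized = re.sub(r"\s+", " ", text).lower()
    return [phrase for phrase in phrases if phrase in normalized]


sample = "Certainly, here is a possible\nintroduction for your topic: ..."
hits = find_telltale_phrases(sample)
```

Exact substring matching keeps false positives low but misses paraphrases, so a hit is best treated as a prompt for human review.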
