"AI Deception: A Survey of Examples, Risks, and Potential Solutions"


This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta’s CICERO) and general-purpose AI systems (including large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing control of AI. Finally, we outline several potential solutions: first, regulatory frameworks should subject AI systems that are capable of deception to robust risk-assessment requirements; second, policymakers should implement bot-or-not laws; and finally, policymakers should prioritize the funding of relevant research, including tools to detect AI deception and to make AI systems less deceptive. Policymakers, researchers, and the broader public should work proactively to prevent AI deception from destabilizing the shared foundations of our society.

https://doi.org/10.1016/j.patter.2024.100988

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

AI-native Platform: "Reimagining Research Impact: Introducing Web of Science Research Intelligence"


Currently being developed in partnership with leading academic institutions, Web of Science Research Intelligence is an AI-native platform that embodies a vision centered on three pillars: unification, innovation and impact. It seamlessly integrates funding data with research outputs that include publications, patents, conference proceedings, books, policy documents and more. Based on these data, the platform identifies relevant funding opportunities within emerging research areas, equipping institutions and researchers to innovate.

  • A conversational assistant powered by generative AI enables all users, from data scientists to those with limited analysis experience, to gain insights and create qualitative narratives for more balanced impact assessment.
  • Tailored recommendations for collaboration and funding help early career researchers build their networks and all researchers position themselves to win.
  • A new framework for measuring societal impact beyond traditional citation metrics will empower researchers and institutions to showcase the broader impacts of their work.

https://tinyurl.com/2zdshm6b

"A Literature Review of User Privacy Concerns in Conversational Chatbots: A Social Informatics Approach: An Annual Review of Information Science and Technology (ARIST) Paper"


Since the introduction of OpenAI’s ChatGPT in late 2022, conversational chatbots have gained significant popularity. These chatbots are designed to offer a user-friendly interface for individuals to engage with technology using natural language in their daily interactions. However, these interactions raise user privacy concerns due to the data shared and the potential for misuse in these conversational information exchanges. Furthermore, there are no overarching laws and regulations governing such conversational interfaces in the United States. Thus, there is a need to investigate the user privacy concerns. To understand these concerns in the existing literature, this paper presents a literature review and analysis of 38 papers out of 894 retrieved papers that focus on user privacy concerns arising from interactions with text-based conversational chatbots through the lens of social informatics. The review indicates that the primary user privacy concern that has consistently been addressed is self-disclosure. This review contributes to the broader understanding of privacy concerns regarding chatbots and highlights the need for further exploration in this domain. As these chatbots continue to evolve, this paper acts as a foundation for future research endeavors and informs potential regulatory frameworks to safeguard user privacy in an increasingly digitized world.

https://doi.org/10.1002/asi.24898

"Artificial Intelligence’s Role in Digitally Preserving Historic Archives"


The term "Artificial Intelligence" (AI) has steadily entered public consciousness in recent years, especially within academia and libraries. AI in libraries has been a subject of interest for some time, because a library’s many departments each contribute to its mission. It is therefore important to consider AI’s influence on the digital preservation of historic documents. Throughout history, libraries, archives, and museums have grappled with the challenge of preserving historical collections, and many traditional preservation methods are costly and labor-intensive. This paper traces how technological advances have driven the evolution of preservation methods. As a catalyst for transformation, AI could change this reality and perhaps redefine the process of preservation; this paper therefore explores the emerging trend of incorporating AI into preservation practices and offers predictions about AI’s future role in preservation. It addresses two questions: Could AI bring about a paradigm shift in how preservation is done? And could it change the way history is safeguarded?

https://doi.org/10.1515/pdtc-2023-0050

"Association of Research Libraries and Coalition for Networked Information Publish AI-Influenced Scenarios for Research Environment"


The Association of Research Libraries (ARL) and the Coalition for Networked Information (CNI) are pleased to announce the publication of The ARL/CNI 2035 Scenarios: AI-Influenced Futures in the Research Environment. These scenarios explore potential futures shaped by the rapid growth of artificial intelligence (AI) and its integration within the research environment.

Developed through a robust, member-driven process, these scenarios serve as a strategic resource to aid leaders in the research environment in navigating the complex landscape of AI technologies. Library directors, IT leaders, funding agencies, academic presidents and provosts, and those working in scholarly publishing are among the many individuals who will find these scenarios useful. By examining diverse futures, ARL and CNI aim to equip their members with the foresight needed to proactively address the challenges and opportunities that AI presents.

https://tinyurl.com/24c7s7wn

"Thousands of AI Authors on the Future of AI"


In the largest survey of its kind, 2,778 researchers who had published in top-tier artificial intelligence (AI) venues gave predictions on the pace of AI progress and the nature and impacts of advanced AI systems. The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating a song indistinguishable from a new song by a popular musician, and autonomously downloading and fine-tuning a large language model. If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027, and 50% by 2047. The latter estimate is 13 years earlier than that reached in a similar survey we conducted only one year earlier [Grace et al., 2022]. However, the chance of all human occupations becoming fully automatable was forecast to reach 10% by 2037, and 50% as late as 2116 (compared to 2164 in the 2022 survey).

Most respondents expressed substantial uncertainty about the long-term value of AI progress: While 68.3% thought good outcomes from superhuman AI are more likely than bad, of these net optimists 48% gave at least a 5% chance of extremely bad outcomes such as human extinction, and 59% of net pessimists gave 5% or more to extremely good outcomes. Between 38% and 51% of respondents gave at least a 10% chance to advanced AI leading to outcomes as bad as human extinction. More than half suggested that "substantial" or "extreme" concern is warranted about six different AI-related scenarios, including misinformation, authoritarian control, and inequality. There was disagreement about whether faster or slower AI progress would be better for the future of humanity. However, there was broad agreement that research aimed at minimizing potential risks from AI systems ought to be prioritized more.

https://arxiv.org/abs/2401.02843v2

Paywall: "Transforming Academic Librarianship through AI Reskilling: Insights from the GPT-4 Exploration Program"


This case study examines the GPT-4 Exploration Program at the University of New Mexico’s College of University Libraries and Learning Sciences, which aimed to foster a culture of continuous learning and innovation by providing hands-on experience with advanced AI technology. . . . The study reveals that effective AI reskilling involves cultivating a culture of continuous learning, adaptability, and collaborative exploration, anchored in a practical, hands-on approach. Participants reported significant improvements in AI literacy and confidence in applying AI tools to their work.

https://doi.org/10.1016/j.acalib.2024.102883

"ChatGPT Shows Better Moral Judgment than a College Undergrad"


In "Attributions toward artificial agents in a modified Moral Turing Test," which was recently published in Nature’s online, open-access journal Scientific Reports, the researchers found that morality judgments given by ChatGPT4 were "perceived as superior in quality to humans" along a variety of dimensions like virtuosity and intelligence. But before you start to worry that philosophy professors will soon be replaced by hyper-moral AIs, there are some important caveats to consider.

https://tinyurl.com/y4jtds4h

AI for Scientific Discovery: Proceedings of a Workshop


While AI in the context of scientific investigation has existed for decades, advances in computational technology and sensing in the physical world have created opportunities to integrate AI into science in unexpected ways, with capabilities that are rapidly accelerating. As a result, AI has been leveraged by an expanding collection of disciplines in the physical and biological sciences, as well as engineering domains. While the opportunities for AI in scientific discovery seem endless, there are numerous questions about what makes for trustworthy and reliable discovery, whether such investigation should be performed without human oversight or intervention, and how best to prioritize the research agenda and allocation of resources without magnifying disparities for individuals and nations alike.

https://tinyurl.com/zf6vy9ca

"The Collective Use and Evaluation of Generative AI Tools in Digital Humanities Research: Survey-Based Results"


By investigating DH scholars’ use of GenAI tools in their research, this survey study makes several contributions. First, our findings demonstrate GenAI’s important role in enriching DH research, detailing specific, effective instances of its application that may inform DH scholars planning to apply GenAI tools in their future research. Second, the incorporation of GenAI in DH research raises important ethical and social concerns. Our study illuminates the potential risks, such as disputes over authorship, the emergence of biases, and the need for greater transparency and accountability in AI-involved DH research.

https://arxiv.org/abs/2404.12458

Ithaka S+R: "Generative AI and Scholarly Publishing: Announcing a New Research Project"


To help, Ithaka S+R is launching a new study of the strategic implications of generative AI for scholarly publishing, with support from STM Solutions and a group of its members. The following key questions will guide our inquiry:

  • Will generative AI be integrated into the existing goals, processes, and infrastructures for scholarly publishing? Or, does this represent a transformative technology that will require fundamental restructuring of those goals, processes, and infrastructures?
  • Could generative AI effectively render our current assumptions about the role and purpose of publishers obsolete? What new roles could publishers play in a radically transformed information environment?
  • Which potential transformations should publishers encourage, and which risks require immediate coordinated responses while the technology is still taking root in the sector?
  • What new kinds of shared technical and/or social infrastructure are needed to support the ethical adoption of generative AI in support of the goals of scholarship and scholarly publishing? What systems and structures will be necessary to balance the needs of authors, readers, rights holders, publishers, and aggregators?

https://tinyurl.com/2s432pfh

Paywall: "Rethinking Copyright Exceptions in the Era of Generative AI: Balancing Innovation and Intellectual Property Protection"


In response to these identified [copyright and AI] challenges, this paper proposes a hybrid model for text and data mining (TDM) exceptions, along with specific mechanisms for implementing it. The model divides exceptions into noncommercial and commercial uses, providing a nuanced solution to complex copyright issues in AI training. Recommendations include mandatory exceptions for noncommercial uses, an opt-out clause for commercial uses, enhanced transparency measures, and a searchable portal for copyright owners. In conclusion, striking a delicate equilibrium between technological progress and the incentive for creative expression is of paramount importance. These suggested solutions aim to establish a harmonious foundation that nurtures innovation and creativity while honoring creators’ rights, facilitating AI development, promoting transparency, and ensuring fair compensation for creators.

https://doi.org/10.1111/jwip.12301
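
The hybrid model's core allocation rule can be expressed as a small decision function. This is a minimal sketch under our own assumptions, not the paper's specification: `TdmRequest`, `tdm_permitted`, and the opt-out registry (standing in for the proposed searchable portal) are hypothetical names, and the transparency and compensation measures are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TdmRequest:
    """A hypothetical request to mine one work for AI training."""
    work_id: str
    commercial: bool


def tdm_permitted(request: TdmRequest, opt_out_registry: set[str]) -> bool:
    """Return True if the hybrid model, as summarized above, would allow this use.

    opt_out_registry is a hypothetical set of work IDs whose rights holders
    have opted out of commercial TDM (a stand-in for the searchable portal).
    """
    if not request.commercial:
        # Mandatory exception: noncommercial TDM is always permitted.
        return True
    # Commercial TDM is permitted unless the rights holder has opted out.
    return request.work_id not in opt_out_registry
```

Keeping the registry as a plain set makes the opt-out check a constant-time lookup; a real portal would also need to reconcile identifiers across editions and formats, which this sketch ignores.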

"The Emerging AI Divide in the United States"


In this study, we characterize spatial differences in U.S. residents’ knowledge of a new generative AI tool, ChatGPT, through an analysis of state- and county-level search query data. In the first six months after the tool’s release, we observe the highest rates of users searching for ChatGPT in West Coast states and persistently low rates of search in Appalachian and Gulf states. Counties with the highest rates of search are relatively more urbanized and have proportionally more educated, more economically advantaged, and more Asian residents in comparison with other counties or with the U.S. average. In multilevel models adjusting for socioeconomic and demographic factors as well as industry makeup, education is the strongest positive predictor of rates of search for generative AI tooling. Although generative AI technologies may be novel, early differences in uptake appear to be following familiar paths of digital marginalization.

https://arxiv.org/abs/2404.11988

"More CNI Spring ’24 Meeting Videos Live"

CNI has released eight new videos from its Spring 2024 meeting.

Here are three examples:

Digital Scholarship and DigitalKoans Are Now 19 Years Old

Digital Scholarship and DigitalKoans were established on 4/20/2005. Digital Scholarship provides information and commentary about artificial intelligence, digital copyright, digital curation, open access, research data management, scholarly communication, and other digital information issues. Digital Scholarship is an open access noncommercial publisher. All of its publications are currently under a Creative Commons Attribution License.

DigitalKoans has published over 16,200 posts. Since 2008, over 5,600 job ads have been posted, with slightly over 4,000 of them for digital library jobs.

Digital Scholarship has published the following books and book supplements: the Open Access Bibliography: Liberating Scholarly Literature with E-Prints and Open Access Journals (2005; published with the Association of Research Libraries), the Scholarly Electronic Publishing Bibliography: 2008 Annual Edition (2009), Digital Scholarship 2009 (2010), Transforming Scholarly Publishing through Open Access: A Bibliography (2010), the Scholarly Electronic Publishing Bibliography 2010 (2011), the Digital Curation and Preservation Bibliography 2010 (2011), the Institutional Repository and ETD Bibliography 2011 (2011), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works (2012), the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement (2013), and the Research Data Curation and Management Bibliography (2021).

It has also published and updated the following bibliographies, webliographies, and weblogs: the Scholarly Electronic Publishing Bibliography (1996-2011), the Scholarly Electronic Publishing Weblog (2001-2013), the Electronic Theses and Dissertations Bibliography (2005-2021), the Google Books Bibliography (2005-2011), the Institutional Repository Bibliography (2009-2011), the Open Access Journals Bibliography (2010), the Digital Curation and Preservation Bibliography (2010-2011), the E-science and Academic Libraries Bibliography (2011), the Digital Curation Resource Guide (2012), the Research Data Curation Bibliography (2012-2019), the Altmetrics Bibliography (2013), the Transforming Peer Review Bibliography (2014), the Academic Library as Scholarly Publisher Bibliography (2018-2023), the Research Data Sharing and Reuse Bibliography (2021), the Research Data Publication and Citation Bibliography (2022), Digital Curation Certificate and Master’s Degree Programs (2023), the Academic Libraries and Research Data Management Bibliography (2023), and the Artificial Intelligence and Libraries Bibliography (2023).

"Author Granted Copyright over Book with AI-Generated Text—with a Twist"


The USCO’s notice granting Shupe copyright registration of her book does not recognize her as author of the whole text as is conventional for written works. Instead she is considered the author of the "selection, coordination, and arrangement of text generated by artificial intelligence." This means no one can copy the book without permission, but the actual sentences and paragraphs themselves are not copyrighted and could theoretically be rearranged and republished as a different book.

https://tinyurl.com/bd97jbw6

Stanford: Artificial Intelligence Index Report 2024


AI has surpassed human performance on several benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning. . . .

According to AI Index estimates, the training costs of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute. . . .

New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models. . . .

Despite a decline in overall AI private investment last year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion.

https://tinyurl.com/53wsjxyj

"Is ChatGPT Transforming Academics’ Writing Style?"


Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT’s writing style in their abstracts by means of a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of ChatGPT-revised abstracts is estimated to be approximately 35%, if we take the output of one of the simplest prompts, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of ChatGPT into academics’ writing style.

https://arxiv.org/abs/2404.08627v1
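
The idea of estimating a share of revised abstracts from word-frequency shifts can be sketched as a one-parameter least-squares fit. This is a simplified stand-in, assumed for illustration, not the authors' calibrated model: it treats each word's observed frequency as a blend of a human-written baseline and a fully ChatGPT-revised profile, and solves for the blend weight `alpha`. The function name and inputs are hypothetical.

```python
def estimate_revised_fraction(observed, human, revised):
    """Least-squares alpha in: observed ≈ (1 - alpha) * human + alpha * revised.

    Each argument maps a word to its relative frequency. Only words present
    in all three profiles are used. The result is clipped to [0, 1].
    """
    words = observed.keys() & human.keys() & revised.keys()
    # Closed-form solution of min_alpha sum_w (f_w - p_w - alpha * (q_w - p_w))^2
    num = sum((observed[w] - human[w]) * (revised[w] - human[w]) for w in words)
    den = sum((revised[w] - human[w]) ** 2 for w in words)
    if den == 0:
        raise ValueError("human and revised profiles do not differ on shared words")
    return min(1.0, max(0.0, num / den))
```

If the observed frequencies are constructed exactly as a 35/65 blend of the two profiles, the function returns approximately 0.35; on real data the estimate depends heavily on which words are included and how the reference profiles were built, which is what the paper's calibration and noise analysis address.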

"Generative AI Can Turn Your Most Precious Memories into Photos That Never Existed"


Dozens of people have now had their memories turned into images in this way via Synthetic Memories, a project run by Domestic Data Streamers. The studio uses generative image models, such as OpenAI’s DALL-E, to bring people’s memories to life. Since 2022, the studio, which has received funding from the UN and Google, has been working with immigrant and refugee communities around the world to create images of scenes that have never been photographed, or to re-create photos that were lost when families left their previous homes.

https://tinyurl.com/yekzh6sy

"Is ChatGPT Corrupting Peer Review? Telltale Words Hint at AI Use"


A study that identified buzzword adjectives that could be hallmarks of AI-written text in peer-review reports suggests that researchers are turning to ChatGPT and other artificial intelligence (AI) tools to evaluate others’ work. . . .

Their analysis suggests that up to 17% of the peer-review reports have been substantially modified by chatbots — although it’s unclear whether researchers used the tools to construct reviews from scratch or just to edit and improve written drafts.

https://www.nature.com/articles/d41586-024-01051-2
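
A crude version of the telltale-word signal can be computed by counting marker adjectives per 1,000 words across a set of reviews. This sketch assumes a hand-picked marker list (`MARKERS`, illustrative only, not the study's list) and simple regex tokenization; `marker_rate_per_1000` is a hypothetical helper, not the study's method.

```python
import re

# Illustrative marker adjectives; an assumption for this sketch only.
MARKERS = {"commendable", "meticulous", "intricate", "innovative", "notable", "versatile"}


def marker_rate_per_1000(texts, markers=MARKERS):
    """Occurrences of marker words per 1,000 word tokens across all texts."""
    hits = 0
    total = 0
    for text in texts:
        # Lowercase and keep runs of ASCII letters as word tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        total += len(tokens)
        hits += sum(1 for token in tokens if token in markers)
    return 1000 * hits / total if total else 0.0
```

Comparing this rate for reviews written before and after ChatGPT's release gives a rough trend signal; as the article notes, a rise does not show whether reviewers drafted with the tools or only used them to edit.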

"AI Race Heats Up as OpenAI, Google and Mistral Release New Models"


OpenAI, Google, and the French artificial intelligence startup Mistral have all released new versions of their frontier AI models within 12 hours of one another, as the industry prepares for a burst of activity over the summer.

The unprecedented flurry of releases comes as the sector readies for the expected launch of the next major version of GPT, the system that underpins OpenAI’s hit chatbot ChatGPT.

https://tinyurl.com/36zmymwp

"Towards a Books Data Commons for AI Training"


This white paper describes ways of building a books data commons: a responsibly designed, broadly accessible data set of digitized books to be used in training AI models. This report, written in partnership with Creative Commons and Proteus Strategies, is based on a series of workshops that brought together practitioners building AI models, legal and policy scholars, and experts working with collections of digitized books.

In the paper, we first explain why books matter for AI training and how broader access could be beneficial. We then summarize two tracks that might be considered for developing such a resource, highlighting existing projects that help foreground the potential challenges. One track relies on public domain and permissively licensed books, while the other depends on exceptions to copyright to enable training on in-copyright books. The report also presents several key design choices and next steps that could advance further development of this approach.

https://tinyurl.com/2fu47552

"PubTator 3.0: An AI-Powered Literature Resource for Unlocking Biomedical Knowledge"


PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0’s online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

https://doi.org/10.1093/nar/gkae235
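
Precision in the top 20 results is the fraction of the first 20 returned articles that are judged relevant to the query. A minimal sketch, assuming results arrive as a ranked list of article IDs (for example, PMIDs as strings) and relevance judgments as a set; `precision_at_k` is a hypothetical helper, not part of the PubTator API.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Share of the top-k ranked results that are relevant.

    Divides by k (the common convention), so a system that returns fewer
    than k results is not rewarded for returning less.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k
```

To compare search systems this way, one would compute the score per entity-pair query for each system and average across queries; how relevance is judged matters as much as the metric itself.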

"Advancing the Search Frontier with AI Agents"


As many of us in the information retrieval (IR) research community know and appreciate, search is far from being a solved problem. Millions of people struggle with tasks on search engines every day. Often, their struggles relate to the intrinsic complexity of their task and the failure of search systems to fully understand the task and serve relevant results. The task motivates the search: it creates the gap, or problematic situation, that searchers attempt to bridge or resolve, and it drives search behavior as they work through different task facets. Complex search tasks require more than support for rudimentary fact finding or re-finding. Research on methods to support complex tasks includes work on generating query and website suggestions, personalizing and contextualizing search, and developing new search experiences, including those that span time and space. The recent emergence of generative artificial intelligence (AI) and the arrival of assistive agents, based on this technology, has the potential to offer further assistance to searchers, especially those engaged in complex tasks. There are profound implications from these advances for the design of intelligent systems and for the future of search itself. This article, based on a keynote by the author at the 2023 ACM SIGIR Conference, explores these issues and how AI agents are advancing the frontier of search system capabilities, with a special focus on information interaction and complex task completion.

https://arxiv.org/abs/2311.01235

"How Tech Giants Cut Corners to Harvest Data for A.I."


The volume of data is crucial [to train AIs]. Leading chatbot systems have learned from pools of digital text spanning as many as three trillion words, or roughly twice the number of words stored in Oxford University’s Bodleian Library, which has collected manuscripts since 1602. The most prized data, A.I. researchers said, is high-quality information, such as published books and articles, which have been carefully written and edited by professionals. . . .

Tech companies are so hungry for new data that some are developing "synthetic" information. This is not organic data created by humans, but text, images and code that A.I. models produce — in other words, the systems learn from what they themselves generate.

https://tinyurl.com/3uxuwekh
