Artificial Intelligence/Robots – Page 6

"Towards a Books Data Commons for AI Training"

This white paper describes ways of building a books data commons: a responsibly designed, broadly accessible data set of digitized books to be used in training AI models. This report, written in partnership with Creative Commons and Proteus Strategies, is based on a series of workshops that brought together practitioners building AI models, legal and policy scholars, and experts working with collections of digitized books.

In the paper, we first explain why books matter for AI training and how broader access could be beneficial. We then summarize two tracks that might be considered for developing such a resource, highlighting existing projects that help foreground the potential challenges. One track relies on public domain and permissively licensed books, while the other depends on exceptions to copyright to enable training on in-copyright books. The report also presents several key design choices and next steps that could advance further development of this approach.

https://tinyurl.com/2fu47552

"PubTator 3.0: An AI-Powered Literature Resource for Unlocking Biomedical Knowledge"

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0’s online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

https://doi.org/10.1093/nar/gkae235

"Advancing the Search Frontier with AI Agents"

As many of us in the information retrieval (IR) research community know and appreciate, search is far from being a solved problem. Millions of people struggle with tasks on search engines every day. Often, their struggles relate to the intrinsic complexity of their task and the failure of search systems to fully understand the task and serve relevant results. The task motivates the search, creating the gap/problematic situation that searchers attempt to bridge/resolve and drives search behavior as they work through different task facets. Complex search tasks require more than support for rudimentary fact finding or re-finding. Research on methods to support complex tasks includes work on generating query and website suggestions, personalizing and contextualizing search, and developing new search experiences, including those that span time and space. The recent emergence of generative artificial intelligence (AI) and the arrival of assistive agents, based on this technology, has the potential to offer further assistance to searchers, especially those engaged in complex tasks. There are profound implications from these advances for the design of intelligent systems and for the future of search itself. This article, based on a keynote by the author at the 2023 ACM SIGIR Conference, explores these issues and how AI agents are advancing the frontier of search system capabilities, with a special focus on information interaction and complex task completion.

https://arxiv.org/abs/2311.01235

"How Tech Giants Cut Corners to Harvest Data for A.I."

The volume of data is crucial [to train AIs]. Leading chatbot systems have learned from pools of digital text spanning as many as three trillion words, or roughly twice the number of words stored in Oxford University’s Bodleian Library, which has collected manuscripts since 1602. The most prized data, A.I. researchers said, is high-quality information, such as published books and articles, which have been carefully written and edited by professionals. . . .

Tech companies are so hungry for new data that some are developing "synthetic" information. This is not organic data created by humans, but text, images and code that A.I. models produce — in other words, the systems learn from what they themselves generate.

https://tinyurl.com/3uxuwekh

"Unleashing the Power of AI. A Systematic Review of Cutting-Edge Techniques in AI-Enhanced Scientometrics, Webometrics, and Bibliometrics"

Findings: (i) Regarding scientometrics, the application of AI yields various distinct advantages, such as conducting analyses of publications, citations, research impact prediction, collaboration, research trend analysis, and knowledge mapping, in a more objective and reliable framework. (ii) In terms of webometrics, AI algorithms are able to enhance web crawling and data collection, web link analysis, web content analysis, social media analysis, web impact analysis, and recommender systems. (iii) Moreover, automation of data collection, analysis of citations, disambiguation of authors, analysis of co-authorship networks, assessment of research impact, text mining, and recommender systems are considered as the potential of AI integration in the field of bibliometrics.

https://arxiv.org/abs/2403.18838

"Now You Can Use ChatGPT without an Account"

OpenAI will no longer require an account to use ChatGPT, the company’s free AI platform. However, this only applies to ChatGPT, as other OpenAI products, like DALL-E 3, cost money to access and will still require an account for access. . . .

OpenAI said it introduced "additional content safeguards for this experience," including blocking prompts in a wider range of categories, but did not expound more on what these categories are. The option to opt out of model training will still be available, even to those without accounts.

https://tinyurl.com/582ehjhm

Paywall: "Developing a Foundation for the Informational Needs of Generative AI Users through the Means of Established Interdisciplinary Relationships"

University faculty immediately had many questions and concerns in response to the public proliferation of generative artificial intelligence programs leveraging large language models to generate complex text responses to simple prompts. Librarians at the University of South Florida (USF) pooled their skills and existing relationships with faculty and professional staff across campus to provide information that answered common questions raised by those faculty on generative artificial intelligence usage within research-related topics. Faculty concerns about plagiarism, how to instruct students to use the new tools, and how to discern the reliability of information generated by artificial intelligence tools were placed at the forefront.

https://doi.org/10.1016/j.acalib.2024.102876

"Generative AI for Trustworthy, Open, and Equitable Scholarship"

We focus on the potential of GenAI to address known problems for the alignment of science practice and its underlying core values. As institutions culturally charged with the curation and preservation of the world’s knowledge and cultural heritage, libraries are deeply invested in promoting a durable, trustworthy, and sustainable scholarly knowledge commons. With public trust in academia and in research waning [reference] and in the face of recent high-profile instances of research misconduct [reference], the scholarly community must act swiftly to develop policies, frameworks, and tools for leveraging the power of GenAI in ways that enhance, rather than erode, the trustworthiness of scientific communications, the breadth of scientific impact, and the public’s trust in science, academia, and research.

https://doi.org/10.21428/e4baedd9.567bfd15

"Evolving AI Strategies in Libraries: Insights from Two Polls of ARL Member Representatives over Nine Months—Report Published"

To effectively chart this [AI] transition, two quick polls were conducted among members of the Association of Research Libraries (ARL) to capture changing perspectives on the potential impact of AI, assess the extent of AI exploration and implementation within libraries, and identify AI applications relevant to the current library environment.

Today, ARL has released the results of the two polls—analyzing and juxtaposing the outcomes of these two surveys to better understand how library leaders are managing the complexities of integrating AI into their operations and services. The report also includes recommendations for ARL research libraries.

https://tinyurl.com/2t9nywcv


"TDM & AI Rights Reserved? Fair Use & Evolving Publisher Copyright Statements"

Earlier this year, we noticed that some academic publishers have revised the copyright notices on their websites to state they reserve rights to text and data mining (TDM) and AI training (for example, see the website footers for Elsevier and Wiley). . . . SPARC asked Kyle K. Courtney, Director of Copyright and Information Policy for Harvard Library, to address key questions regarding these revised copyright statements and the continuing viability of fair use justifications for TDM.

https://tinyurl.com/4prkfbb3

"Use ‘Jan’ to Chat with AI without the Privacy Concerns"

Jan is a free and open source application that makes it easy to download multiple large language models and start chatting with them. There are simple installers for Windows, macOS, and Linux. Now, this isn’t perfect. The models aren’t necessarily as good as the latest ones from OpenAI or Google, and depending on how powerful your computer is, the results might take a while to come in.

https://tinyurl.com/4m8p4b82

"Human-Centered Explainable Artificial Intelligence: An Annual Review of Information Science and Technology (Arist) Paper"

Explainability is central to trust and accountability in artificial intelligence (AI) applications. The field of human-centered explainable AI (HCXAI) arose as a response to mainstream explainable AI (XAI) which was focused on algorithmic perspectives and technical challenges, and less on the needs and contexts of the non-expert, lay user. HCXAI is characterized by putting humans at the center of AI explainability. . . . This review identifies the foundational ideas of HCXAI, how those concepts are operationalized in system design, how legislation and regulations might normalize its objectives, and the challenges that HCXAI must address as it matures as a field.

https://doi.org/10.1002/asi.24889

"Exploring the Potential of Large Language Models and Generative Artificial Intelligence (GPT): Applications in Library and Information Science"

The presented study offers a systematic overview of the potential application of large language models (LLMs) and generative artificial intelligence tools, notably the GPT model and the ChatGPT interface, within the realm of library and information science (LIS). The paper supplements and extends the outcomes of a comprehensive information survey on the subject matter with the author’s own experiences and examples showcasing possible applications, demonstrated through illustrative instances. This study does not involve testing available LLMs or selecting the most suitable tool; instead, it targets information professionals, specialists, librarians, and scientists, aiming to inspire them in various ways.

https://doi.org/10.1177/09610006241241066

"The Latest ‘Crisis’ — Is the Research Literature Overrun with ChatGPT- and LLM-generated Articles?"

Elsevier has been under the spotlight this month for publishing a paper that contains a clearly ChatGPT-written portion of its introduction. The first sentence of the paper’s Introduction reads, "Certainly, here is a possible introduction for your topic:. . . ." To date, the article remains unchanged, and unretracted. A second paper, containing the phrase "I’m very sorry, but I don’t have access to real-time information or patient-specific data, as I am an AI language model" was subsequently found, and similarly remains unchanged. This has led to a spate of amateur bibliometricians scanning the literature for similar common AI-generated phrases, with some alarming results.

https://tinyurl.com/4a8bjmzy
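The kind of phrase scanning described in the item above can be illustrated with a minimal Python sketch. It assumes only the two telltale phrases quoted in the excerpt; the function names and the normalization steps are hypothetical choices made for illustration, not the method anyone in the article actually used.

```python
import re

# Telltale phrases taken from the two papers quoted above. A real scan would
# need a longer, curated list (an assumption of this sketch).
TELLTALE_PHRASES = [
    "certainly, here is a possible introduction for your topic",
    "as i am an ai language model",
]


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def find_telltale_phrases(text):
    """Return the telltale phrases that occur in the given text."""
    cleaned = normalize(text)
    return [phrase for phrase in TELLTALE_PHRASES if phrase in cleaned]
```

A match is only a prompt for a human to read the passage; a hit on a phrase list does not by itself establish how an article was written.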

"Fair Use Rights to Conduct Text and Data Mining and Use Artificial Intelligence Tools Are Essential for UC Research and Teaching"

The UC Libraries invest more than $60 million each year licensing systemwide electronic content needed by scholars for these and other studies. (Indeed, the $60 million figure represents license agreements made at the UC systemwide and multi-campus levels. But each individual campus also licenses electronic resources, adding millions more in total expenditures.) Our libraries secure campus access to a broad range of digital resources including books, scientific journals, databases, multimedia resources, and other materials. In doing so, the UC Libraries must negotiate licensing terms that ensure scholars can make both lawful and comprehensive use of the materials the libraries have procured. Increasingly, however, publishers and vendors are presenting libraries with content license agreements that attempt to preclude, or charge additional and unsupportable fees for, fair uses like training AI tools in the course of conducting TDM. . . .

If the UC Libraries are unable to protect these fair uses, UC scholars will be at the mercy of publishers aggregating and controlling what may be done with the scholarly record. Further, UC scholars’ pursuit of knowledge will be disproportionately stymied relative to academic colleagues in other global regions, given that a large proportion of other countries preclude contractual override of research exceptions.

Indeed, in more than forty countries—including all those within the European Union (EU)—publishers are prohibited from using contracts to abrogate exceptions to copyright in non-profit scholarly and educational contexts. Article 3 of the EU’s Directive on Copyright in the Digital Single Market preserves the right for scholars within research organizations and cultural heritage institutions (like those researchers at UC) to conduct TDM for scientific research, and further proscribes publishers from invalidating this exception by license agreements (see Article 7). Moreover, under AI regulations recently adopted by the European Parliament, copyright owners may not opt out of having their works used in conjunction with artificial intelligence tools in TDM research—meaning copyrighted works must remain available for scientific research that is reliant on AI training, and publishers cannot override these AI training rights through contract. Publishers are thus obligated to—and do—preserve fair use-equivalent research exceptions for TDM and AI within the EU, and can do so in the United States, too. . . .

In all events, adaptable licensing language can address publishers’ concerns by reiterating that the licensed products may be used with AI tools only to the extent that doing so would not: i. create a competing or commercial product or service for use by third parties; ii. unreasonably disrupt the functionality of the subscribed products; or iii. reproduce or redistribute the subscribed products for third parties. In addition, license agreements can require commercially reasonable security measures (as also required in the EU) to extinguish the risk of content dissemination beyond permitted uses. In sum, these licensing terms can replicate the research rights that are unequivocally reserved for scholars elsewhere.

https://tinyurl.com/4fvpdz35

"Microsoft Is Developing Tech That Would Let Users Write with Their Eyes, a Huge Win for Accessibility"

Microsoft published a new patent for a device called the Eye-Gaze, which would allow users to communicate and interact with electronic devices without the use of hands and fingers for typing. . . .

The only other peripheral that comes to mind that’s remotely similar to the Eye-Gaze is the Apple Vision Pro, but that’s in a mixed reality setting which still requires some hand movements.

https://tinyurl.com/2s443y86

"Responsible Artificial Intelligence: A Structured Literature Review"

Our research endeavors to advance the concept of responsible artificial intelligence (AI), a topic of increasing importance within EU policy discussions. The EU has recently issued several publications emphasizing the necessity of trust in AI, underscoring the dual nature of AI as both a beneficial tool and a potential weapon. This dichotomy highlights the urgent need for international regulation. Concurrently, there is a need for frameworks that guide companies in AI development, ensuring compliance with such regulations. Our research aims to assist lawmakers and machine learning practitioners in navigating the evolving landscape of AI regulation, identifying focal areas for future attention. This paper introduces a comprehensive and, to our knowledge, the first unified definition of responsible AI. Through a structured literature review, we elucidate the current understanding of responsible AI. Drawing from this analysis, we propose an approach for developing a future framework centered around this concept. Our findings advocate for a human-centric approach to Responsible AI. This approach encompasses the implementation of AI methods with a strong emphasis on ethics, model explainability, and the pillars of privacy, security, and trust.

https://arxiv.org/abs/2403.06910

"An OpenAI Spinoff Has Built an AI Model That Helps Robots Learn Tasks Like Humans"

Now three of OpenAI’s early research scientists say the startup they spun off in 2017, called Covariant, has solved that problem and unveiled a system that combines the reasoning skills of large language models with the physical dexterity of an advanced robot. . . .

This represents a leap forward, Chen told me, in robots that can adapt to their environment using training data rather than the complex, task-specific code that powered the previous generation of industrial robots. It’s also a step toward worksites where managers can issue instructions in human language without concern for the limitations of human labor. ("Pack 600 meal-prep kits for red pepper pasta using the following recipe. Take no breaks!")

https://tinyurl.com/3nek7xx2

Paywall: "The Obscene Energy Demands of A.I."

It’s been estimated that ChatGPT is responding to something like two hundred million requests per day, and, in so doing, is consuming more than half a million kilowatt-hours of electricity. (For comparison’s sake, the average U.S. household consumes twenty-nine kilowatt-hours a day.)

https://tinyurl.com/ynrd4k4p
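To put the figures quoted above on a per-request and per-household footing, here is a short arithmetic sketch in Python. It treats "more than half a million kilowatt-hours" as exactly 500,000 kWh, so both results are lower bounds; the variable names are illustrative.

```python
# Figures as quoted in the excerpt above.
REQUESTS_PER_DAY = 200_000_000   # "something like two hundred million requests"
CHATGPT_KWH_PER_DAY = 500_000    # "more than half a million", used as a lower bound
HOUSEHOLD_KWH_PER_DAY = 29       # average U.S. household, per the excerpt

# Energy per request, converted from kilowatt-hours to watt-hours.
wh_per_request = CHATGPT_KWH_PER_DAY * 1000 / REQUESTS_PER_DAY

# Number of average U.S. households that would use the same daily energy.
household_equivalents = CHATGPT_KWH_PER_DAY / HOUSEHOLD_KWH_PER_DAY

print(f"At least {wh_per_request:.1f} Wh per request")
print(f"At least {household_equivalents:,.0f} household-days of electricity per day")
```

On these numbers, each request works out to at least 2.5 watt-hours, and the daily total matches the consumption of roughly 17,000 average U.S. households.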

Generative AI in Higher Education: The Product Landscape

Since last fall, Ithaka S+R has been partnering with 19 colleges and universities from the US and Canada to assess GAI’s impact on higher education and make evidence-based, proactive decisions about how to manage the far-ranging effects of GAI.[3] As part of this project, Ithaka S+R has been cataloging GAI applications geared towards teaching, learning, and research in the higher education context. Today, we are excited to make our Product Tracking tool (https://sr.ithaka.org/our-work/generative-ai-product-tracker/) publicly available. . . .

This issue brief is designed to enrich the descriptive data captured in the Product Tracker. In the brief’s first section, we provide a typology of existing products and value propositions. In the second, we offer observations about what the product landscape suggests about the future of teaching, learning, and research practices, and speculations on the near-term future of the academic GAI market.

https://doi.org/10.18665/sr.320394

Paywall: "I Used Generative AI to Turn My Story into a Comic—and You Can Too"

After more than a year in development, Lore Machine is now available to the public for the first time. For $10 a month, you can upload 100,000 words of text (up to 30,000 words at a time) and generate 80 images for short stories, scripts, podcast transcripts, and more. There are price points for power users too, including an enterprise plan costing $160 a month that covers 2.24 million words and 1,792 images. The illustrations come in a range of preset styles, from manga to watercolor to pulp ’80s TV show.

https://tinyurl.com/54mj6t77

OCUL [Ontario Council of University Libraries] Machine Learning/Artificial Intelligence Report and Strategy: Interim Report

This report describes use cases for machine learning relevant to the OCUL consortium and recommends projects utilizing machine learning technologies. It also considers key contextual issues such as ethical concerns, technical capacity, available expertise, and infrastructure needs. All sections are drafts with some sections more fully developed than others.

https://tinyurl.com/38cjdn9p

"Responsible AI at the Vanderbilt Television News Archive: A Case Study"

We provide an overview of the use of machine-learning and artificial intelligence at the Vanderbilt Television News Archive (VTNA). After surveying our major initiatives to date, which include the full transcription of the collection using a custom language model deployed on Amazon Web Services (AWS), we address some ethical considerations we encountered, including the possibility of staff downsizing and misidentification of individuals in news recordings.

https://doi.org/10.7191/jeslib.805

"Using AI/Machine Learning to Extract Data from Japanese American Confinement Records"

Purpose: This paper examines the use of Artificial Intelligence/Machine Learning to extract a more comprehensive data set from a structured “standardized” form used to document Japanese American incarcerees during World War II.

Setting/Participants/Resources: The Bancroft Library partnered with Densho, a community memory organization, and Doxie.AI to complete this work.

Brief Description: The project digitized the complete set of Form WRA-26 "individual record" for more than 110,000 Japanese Americans incarcerated in War Relocation Authority camps during WWII. The library utilized AI/machine learning to automate text extraction from over 220,000 images of a structured "standardized" form; our goal was to improve upon and collect information not previously recorded in the Japanese American Internee Data file held by the National Archives and Records Administration. The project team worked with technical, academic, legal, and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records.

https://doi.org/10.7191/jeslib.850

"The Implementation of Keenious at Carnegie Mellon University"

In the fall of 2022, the Carnegie Mellon University (CMU) Libraries began investigating Keenious, an artificial intelligence (AI)-based article recommender tool, for a possible trial implementation to improve pathways to resource discovery and assist researchers in more effectively searching for relevant research. This process led to numerous discussions within the library regarding the unique nature of AI-based tools when compared with traditional library resources, including ethical questions surrounding data privacy, algorithmic transparency, and the impact on the research process. This case study explores these topics and how they were negotiated up to and immediately following CMU’s implementation of Keenious in January 2023, and highlights the need for more frameworks for evaluating AI-based tools in academic settings.

https://doi.org/10.7191/jeslib.800