Paywall: "An Initial Interpretation of the U.S. Department of Education’s AI Report: Implications and Recommendations for Academic Libraries"


This article provides an analysis of the U.S. Department of Education’s report on Artificial Intelligence (AI) and its implications for academic libraries. It delves into the report’s key points, including the importance of AI literacy, the need for educator involvement in AI design and implementation, and the necessity of preparing for AI-related issues. The author discusses how these points impact academic libraries and offers actionable recommendations for library leaders. It emphasizes the need for libraries to promote AI literacy, involve librarians in AI implementation, develop guidelines for AI use, prepare for AI issues, and collaborate with other stakeholders.

https://doi.org/10.1016/j.acalib.2023.102761

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Reproducibility in Machine Learning-Driven Research"


Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. This is also the case in machine learning (ML) and artificial intelligence (AI) research. Often, this is the case due to unpublished data and/or source-code, and due to sensitivity to ML training conditions. Although different solutions to address this issue are discussed in the research community such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) reflect on the current situation of ML reproducibility in various research fields, (ii) identify reproducibility issues and barriers that exist in these research fields applying ML, and (iii) identify potential drivers such as tools, practices, and interventions that support ML reproducibility. With this, we hope to contribute to decisions on the viability of different solutions for supporting ML reproducibility.

https://arxiv.org/abs/2307.10320

"Analyzing and Navigating Electronic Theses and Dissertations"


This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as to support end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop high quality training datasets required for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support the various digital library services such as search, browsing, and recommendation.

https://tinyurl.com/33ay562h

Webinar Recording: "ACRL LDG A Mutualistic View of AI in the Library or a Continuation of Craft by Thomas Padilla"


During this session, Thomas Padilla [Deputy Director, Archiving and Data Services at the Internet Archive] will present a critical and generative position aimed at empowering GLAM professionals on their journey to develop a mutually beneficial relationship with AI. The discussion will cover the individual, organizational, and community impacts of AI in the library landscape.

https://www.youtube.com/watch?v=hh5PTyBT6AA

"Wikipedia’s Moment of Truth"


The new A.I. chatbots have typically swallowed Wikipedia’s corpus. . . . While estimates of its influence can vary, Wikipedia is probably the most important single source in the training of A.I. models. "Without Wikipedia, generative A.I. wouldn’t exist," says Nicholas Vincent. . . . Yet as bots like ChatGPT become increasingly popular and sophisticated, Vincent and some of his colleagues wonder what will happen if Wikipedia, outflanked by A.I. that has cannibalized it, suffers from disuse and dereliction.

https://tinyurl.com/bdbxrdbk

"Meta Is Expanding Its Generative A.I. Arsenal with a New Tool It’s Touting as a ‘State-of-the-Art’ Breakthrough"


Currently, there is a divide between A.I. image generators and A.I. text generators, like OpenAI’s ChatGPT. . . . Meta’s tool breaks down that divide with a model that allows for the input and generation of text and images, and allows for the creation of captions (or image-to-text generation) and images with "super-resolution."

https://tinyurl.com/mr25z6zd

"The Future of Academic Publishing"


Ultimately, we might be forced to rethink publication. If scientific research is mostly read by machines, the question arises of whether it is relevant to package it into a single coherent narrative that is adapted to the limitations of human cognition. This seems like a lot of busywork for scientists. We could unbundle scientific research from the constraints of journal formatting, as suggested by Neuromatch Open Publishing. In this view, research will be a living compendium of code, datasets, graphs and narrative content remixable and always up to date. Open and freely accessible research will be more valuable and influential because it will be seen by LLMs.

https://doi.org/10.1038/s41562-023-01637-2

"Authors Join the Brewing Legal Battle over AI"


Neither Meta nor OpenAI has yet responded to the author suits. But multiple copyright lawyers told PW on background that the claims likely face an uphill battle in court. Even if the suits get past the threshold issues associated with the alleged copying at issue and how AI training actually works—which is no sure thing—lawyers say there is ample case law to suggest fair use. For example, a recent case against plagiarism detector TurnItIn.com held that works could be ingested to create a database used to expose plagiarism by students. The landmark Kelly v. Arriba Soft case held that the reproduction and display of photos as thumbnails was fair use.

https://tinyurl.com/bddvrykh

"Writing with ChatGPT: An Illustration of Its Capacity, Limitations & Implications for Academic Writers"


Rather than being alarmed or anxious, writers need to understand ChatGPT’s strengths and weaknesses. It is better at structure than it is at content. It is a good brainstorming tool (think titles, outlines, counter-arguments), but you must double check everything it tells you, especially if you’re outside your domain of expertise. It can provide summaries of complex ideas, and connect them with other ideas, but only if you have put a lot of thought into the incremental prompting needed to shift it from its generic default and train it to focus on what you care about. Its access to information is limited to what it was originally trained on, therefore your own training phase is essential to identify gaps and inaccuracies. It can be used for labor, such as reformatting abstracts or reducing the length of sections, but it can’t replace the thinking a writer does to determine why some paragraphs or ideas deserve more words and others can be cut back. It can be inaccurate: in fact, rather stubbornly so, persisting with inaccuracies even after they are pointed out, while at the same time presenting its next attempt as corrected.

https://doi.org/10.5334/pme.1072

"CORE-GPT: Combining Open Access Research and Large Language Models for Credible, Trustworthy Question Answering"


In this paper, we present CORE-GPT, a novel question-answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE. We first demonstrate that GPT3.5 and GPT4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT’s performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.

https://arxiv.org/abs/2307.04683

"Claude 2: ChatGPT Rival Launches Chatbot That Can Summarise a Novel"


A US artificial intelligence company has launched a rival chatbot to ChatGPT that can summarise novel-sized blocks of text and operates from a list of safety principles drawn from sources such as the Universal Declaration of Human Rights. . . .

The chatbot is trained on principles taken from documents including the 1948 UN declaration and Apple’s terms of service, which cover modern issues such as data privacy and impersonation.

https://tinyurl.com/ms44eccd

10 AI Researchers on How AI Can Either Improve the World or Destroy It

Steve Rose of The Guardian interviews the experts.

Five Ways AI Could Improve the World: ‘We Can Cure All Diseases, Stabilise Our Climate, Halt Poverty’

Five Ways AI Might Destroy the World: ‘Everyone on Earth Could Fall over Dead in the Same Second’

"SSP Conference Debate: AI and the Integrity of Scholarly Publishing"


At the annual meeting of the Society for Scholarly Publishing held in Portland, Oregon last month, the closing plenary session was a formal debate on the proposition "Resolved: Artificial intelligence will fatally undermine the integrity of scholarly publishing." Arguing in favor of the proposition was Tim Vines, founder of DataSeer and a Scholarly Kitchen Chef. Arguing against was Jessica Miles, Vice President for Strategy and Investments at Holtzbrinck Publishing Group.

https://tinyurl.com/ururdfvw

AI Is Training AI: "Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks"


Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. To investigate this concern, we conducted a case study on the prevalence of LLM usage by crowd workers. We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimate that 33-46% of crowd workers used LLMs when completing the task. Although generalization to other, less LLM-friendly tasks is unclear, our results call for platforms, researchers, and crowd workers to find new ways to ensure that human data remain human, perhaps using the methodology proposed here as a stepping stone.

https://arxiv.org/abs/2306.07899

"OCLC Introduces AI-generated Book Recommendations in WorldCat.org and WorldCat Find beta"


OCLC is beta testing book recommendations generated by artificial intelligence (AI) in WorldCat.org, the website that allows users to explore the collections of thousands of libraries through a single search. Searchers can now obtain AI-enabled book recommendations for print and e-books and then look for those items in libraries near them. The AI-generated book recommendations beta is now available in WorldCat.org and WorldCat Find, the mobile app extension for WorldCat.org.

https://tinyurl.com/44j4ascr

"Evaluating the Efficacy of ChatGPT-4 in Providing Scientific References across Diverse Disciplines"


This work conducts a comprehensive exploration into the proficiency of OpenAI’s ChatGPT-4 in sourcing scientific references within an array of research disciplines. Our in-depth analysis encompasses a wide scope of fields including Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as their more specialized sub-domains. Our empirical findings indicate a significant variance in ChatGPT-4’s performance across these disciplines. Notably, the validity rate of suggested articles in CS, BME, and Medicine surpasses 65%, whereas in the realms of ME and EE, the model fails to verify any article as valid. Further, in the context of retrieving articles pertinent to niche research topics, ChatGPT-4 tends to yield references that align with the broader thematic areas as opposed to the narrowly defined topics of interest. This observed disparity underscores the pronounced variability in accuracy across diverse research fields, indicating the potential requirement for model refinement to enhance its functionality in academic research. Our investigation offers valuable insights into the current capacities and limitations of AI-powered tools in scholarly research, thereby emphasizing the indispensable role of human oversight and rigorous validation in leveraging such models for academic pursuits.

https://arxiv.org/abs/2306.09914v1

"European Lawmakers Vote to Adopt EU AI Act"


European Union lawmakers have passed the EU AI Act that will govern use and deployment of artificial intelligence technology within the EU. . . . Changes introduced by MEPs to the original commission draft act include some top-level regulation of general-purpose AI tools such as ChatGPT. These foundation models will require mandatory labelling for AI-generated content and the forced disclosure of training data covered by copyright. . . . Other changes include a fine-tuned list of prohibited practices, extended to include subliminal techniques, biometric categorisation, predictive policing, and internet-scraped facial recognition databases.

https://tinyurl.com/nhet5ckd

Congressional Research Service: Generative Artificial Intelligence: Overview, Issues, and Questions for Congress


The recent public release of many GenAI tools, and the race by companies to develop ever-more powerful models, have generated widespread discussion of their capabilities, potential concerns with their use, and debates about their governance and regulation. This CRS InFocus describes the development and uses of GenAI, concerns raised by the use of GenAI tools, and considerations for Congress. For additional considerations related to data privacy, see CRS Report R47569, Generative Artificial Intelligence and Data Privacy: A Primer, by Kristen E. Busch.

https://tinyurl.com/bdrpkzcj

"New ChatGPT Course at ASU Gives Students a Competitive Edge"


A new Arizona State University course will provide students with those skills, providing expertise that is becoming increasingly sought after.

Basic Prompt Engineering with ChatGPT: An Introduction is open this summer to students in any major, and despite the name, is not really about engineering.

https://tinyurl.com/rcfnt39d

"Guest Post — Accessibility Powered by AI: How Artificial Intelligence Can Help Universalize Access to Digital Content"


More than 1 billion people around the world have some type of disability (including visual, hearing, cognitive, learning, mobility, and other disabilities) that affects how they access digital content. No wonder we spend so much time talking about accessibility tools!

Digital transformation can revolutionize the world, turning it into an inclusive place for people with and without disabilities, with accessibility powered by artificial intelligence. This post provides an overview of how AI can improve accessibility in different ways, illustrated with real-world applications and examples.

https://tinyurl.com/3s64tvm7

"AI Is about to Turn Book Publishing Upside-down"


The latest generation of AI is a game changer. Not incremental change—something gentle, something gradual: this AI changes everything, fast. Scary fast.

I believe that every function in trade book publishing today can be automated with the help of generative AI. And, if this is true, then the trade book publishing industry as we know it will soon be obsolete. We will need to move on.

https://tinyurl.com/2p9z6pr6

"Top AI Researchers and CEOs Warn against ‘Risk of Extinction’ in 22-Word Statement"


The 22-word statement, trimmed short to make it as broadly acceptable as possible, reads as follows: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

https://cutt.ly/EwqXnHn9

Pew Research Center: "A Majority of Americans Have Heard of ChatGPT, but Few Have Tried It Themselves"


However, few U.S. adults have themselves used ChatGPT for any purpose. Just 14% of all U.S. adults say they have used it for entertainment, to learn something new, or for their work. This lack of uptake is in line with a Pew Research Center survey from 2021 that found that Americans were more likely to express concerns than excitement about increased use of artificial intelligence in daily life.

https://cutt.ly/Ywqld1X7

"EU’s New AI Law Targets Big Tech Companies but Is Probably Only Going to Harm the Smallest Ones"


In a bold stroke, the EU’s amended AI Act would ban American companies such as OpenAI, Amazon, Google, and IBM from providing API access to generative AI models. The amended act, voted out of committee on Thursday, would sanction American open-source developers and software distributors, such as GitHub, if unlicensed generative models became available in Europe. While the act includes open source exceptions for traditional machine learning models, it expressly forbids safe-harbor provisions for open source generative systems.

Any model made available in the EU, without first passing extensive, and expensive, licensing, would subject companies to massive fines of the greater of €20,000,000 or 4% of worldwide revenue.

(Quote from Technomancers.ai.)

https://bit.ly/3ociZwo

Paywall: "Microsoft Says New A.I. Shows Signs of Human Reasoning"


When computer scientists at Microsoft started to experiment with a new artificial intelligence system last year, they asked it to solve a puzzle that should have required an intuitive understanding of the physical world. . . . The clever suggestion [by the AI] made the researchers wonder whether they were witnessing a new kind of intelligence. In March, they published a 155-page research paper arguing that the system was a step toward artificial general intelligence, or A.G.I., which is shorthand for a machine that can do anything the human brain can do.

https://bit.ly/42FkLp1
