"Responsible AI at the Vanderbilt Television News Archive: A Case Study"


We provide an overview of the use of machine-learning and artificial intelligence at the Vanderbilt Television News Archive (VTNA). After surveying our major initiatives to date, which include the full transcription of the collection using a custom language model deployed on Amazon Web Services (AWS), we address some ethical considerations we encountered, including the possibility of staff downsizing and misidentification of individuals in news recordings.

https://doi.org/10.7191/jeslib.805

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"Using AI/Machine Learning to Extract Data from Japanese American Confinement Records"


Purpose: This paper examines the use of Artificial Intelligence/Machine Learning to extract a more comprehensive data set from a structured “standardized” form used to document Japanese American incarcerees during World War II.

Setting/Participants/Resources: The Bancroft Library partnered with Densho, a community memory organization, and Doxie.AI to complete this work.

Brief Description: The project digitized the complete set of Form WRA-26 "individual record" for more than 110,000 Japanese Americans incarcerated in War Relocation Authority camps during WWII. The library utilized AI/machine learning to automate text extraction from over 220,000 images of a structured "standardized" form; our goal was to improve upon and collect information not previously recorded in the Japanese American Internee Data file held by the National Archives and Records Administration. The project team worked with technical, academic, legal, and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records.

https://doi.org/10.7191/jeslib.850
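
The abstract does not say how Doxie.AI extracted the text, so the following is only a sketch of one common approach for fixed-layout ("standardized") forms, template or zonal extraction: OCR yields words with bounding boxes, and each word is assigned to whichever predefined field region contains its center. The field names and coordinates in `TEMPLATE` are hypothetical and are not taken from Form WRA-26.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, point: tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Token:
    """One OCR word and where it sits on the page."""
    text: str
    box: Box


# Hypothetical template: field name -> region of the page where that field is printed.
TEMPLATE = {
    "family_name": Box(0, 0, 300, 50),
    "given_name": Box(300, 0, 600, 50),
    "birth_year": Box(0, 50, 150, 100),
}


def extract_fields(tokens: list[Token], template: dict[str, Box]) -> dict[str, str]:
    """Assign each OCR token to the first template region that contains its center."""
    fields: dict[str, list[str]] = {name: [] for name in template}
    # Read tokens top-to-bottom, left-to-right so multi-word values stay in order.
    for token in sorted(tokens, key=lambda t: (t.box.y0, t.box.x0)):
        for name, region in template.items():
            if region.contains(token.box.center()):
                fields[name].append(token.text)
                break  # tokens outside every region are simply ignored
    return {name: " ".join(words) for name, words in fields.items()}


tokens = [
    Token("Doe", Box(10, 10, 60, 40)),
    Token("Jane", Box(310, 10, 360, 40)),
    Token("1920", Box(10, 60, 60, 90)),
    Token("stamp", Box(700, 700, 760, 720)),  # noise outside any field region
]
print(extract_fields(tokens, TEMPLATE))
```

Real scans are rarely perfectly aligned, so a production pipeline would likely register each image against the template first, and handwritten entries would need a handwriting-capable OCR model.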

"The Implementation of Keenious at Carnegie Mellon University"


In the fall of 2022, the Carnegie Mellon University (CMU) Libraries began investigating Keenious—an artificial intelligence (AI)-based article recommender tool—for a possible trial implementation to improve pathways to resource discovery and assist researchers in more effectively searching for relevant research. This process led to numerous discussions within the library regarding the unique nature of AI-based tools when compared with traditional library resources, including ethical questions surrounding data privacy, algorithmic transparency, and the impact on the research process. This case study explores these topics and how they were negotiated up to and immediately following CMU’s implementation of Keenious in January, 2023, and highlights the need for more frameworks for evaluating AI-based tools in academic settings.

https://doi.org/10.7191/jeslib.800

Paywall: "Evaluating the Performance of ChatGPT and Perplexity AI in Business Reference"


The Thomas Mahaffey Jr. Business Library conducted a study to assess the performance of two competing generative AI products, ChatGPT and Perplexity AI, in answering business reference questions. The study used a data set consisting of a sample of anonymized reference questions submitted through the library’s ServiceNow ticketing system between January 2018 and May 2022. The questions were input as prompts to each competing AI. . . . Results showed similar and underwhelming performance between each AI at the composite level. Analysis of scores in each individual scoring dimension showed greater variance in the score distributions between the competing AI. Through the evaluation process, key strengths, weaknesses, and trends emerged between each AI.

https://doi.org/10.1080/08963568.2024.2317534

Paywall: "Leveraging ChatGPT and Bard for Academic Librarians and Information Professionals: A Case Study of Developing Pedagogical Strategies Using Generative AI Models"


This study focuses on improving pedagogical strategies by integrating artificial intelligence (AI) chatbots and library databases. Examples from ChatGPT and Bard were used to demonstrate the quality of information. A cross-examination using a research validation template was conducted; it revealed that no artificial hallucinations were produced. However, the information provided by both AI chatbots was slightly outdated based on organizational changes and did not provide an in-depth analysis of the company.

https://doi.org/10.1080/08963568.2024.2321729

"Anthropic Says Its Latest AI Bot Can Beat Gemini and ChatGPT"


Anthropic, the AI company started by several former OpenAI employees, says the new Claude 3 family of AI models performs as well as or better than leading models from Google and OpenAI. Unlike earlier versions, Claude 3 is also multimodal, able to understand text and photo inputs.

Anthropic says Claude 3 will answer more questions, understand longer instructions, and be more accurate. Claude 3 can understand more context, meaning it can process more information.

https://tinyurl.com/yb3dw8u7

Artificial Intelligence News 03/04/24

"Adobe Is Testing a New AI Tool That Can Create Music From Text Prompts"

"The Best Generative AI Courses Money Can Buy"

"Generative AI Is Challenging a 234-Year-Old Law"

"How to Picture A.I."

"Is ChatGPT Making Scientists Hyper-Productive? The Highs and Lows of Using AI"

"StarCoder Is a Code-Generating AI That Runs on Most GPUs"

"‘Up to 1,000X Faster’: AI Startup Wants to Make GPU Training Obsolete with an Extraordinary Piece of Tech — Meet the Tsetlin Machine Which May Come To a Device near You Sooner than You Think"

"Digital Science Announces Innovative New Research Solutions from Dimensions, Delivering Insights from ~350 Million Publications, Grants, Patents and Clinical Trials through ChatGPT’s Conversational Interface."


With Dimensions Research GPT, any Plus or Enterprise ChatGPT Store subscriber can gain insight from the tens of millions of Open Access publications available through Dimensions, providing a quick and effective search of open research on the user-friendly ChatGPT platform. Search results include clickable links that take users to Dimensions, where they can learn more about source materials.

Dimensions Research GPT Enterprise is available to mutual ChatGPT Enterprise and organization-wide Dimensions customers, and incorporates insights from all publications, grants, clinical trials and patents within the Dimensions database. Queries such as "tell me about clinical trials for malaria vaccines", followed by "tell me about the key research articles that underpin these trials" provide answers that come fully linked to the research literature behind the responses, ensuring that researchers can access and check the source of the information.

https://www.digital-science.com/news/dimensions-research-gpt/

Here is an inquiry I made to Dimensions Research GPT:

Describe the history of the open access movement

The history of the open access movement, which aims to make scholarly research freely available to all, is rich and diverse, involving various stakeholders including researchers, libraries, publishers, and educational institutions. The top results from Dimensions include a range of perspectives and approaches, with a total of 946,585 related documents found.

Perspectives on Open Access Publishing:

Particle Physics and Open Access: Travis C. Brooks (2007) discusses the adoption of open access in particle physics, highlighting the field’s longstanding tradition of preprints and the European efforts towards fully open access publishing models (Brooks, 2007).

Scholarly Publishing System Critique: Molly Kleinman (2017) offers a critical view of the traditional scholarly publishing system and presents the open access movement as a remedy to its drawbacks, emphasizing its benefits for faculty, universities, and the public (Kleinman, 2017).

Academic Publishing Evolution: Bernard Forgues and Sébastien Liarte (2013) reflect on the evolution of academic publishing, discussing the challenges of subscription prices and time embargoes, and how the open access movement aims to reshape scholarly communication (Forgues & Liarte, 2013).

Innovative Open Access Models and Reflections:

Open Access Archives for Education: Bonnie Gordon et al. (2016) describe how Interference Archive uses an open stacks policy for education, allowing for a participatory approach in deciding how histories are told and accessed (Gordon et al., 2016).

Scholarly Knowledge Freedom: Peter Baldwin (2023) examines the past, present, and future of the open access movement, arguing for the free dissemination of scholarly knowledge (Baldwin, 2023).

University Presses and Open Access: Paul Ayris and Lara Speicher (2015) discuss the launch of UCL Press, the UK’s first fully open access university press, highlighting the global movement towards open access and open science (Ayris & Speicher, 2015).

Other Relevant Insights:

MOOCs and Licensing Challenges: David Kernohan (2013) examines the challenges of licensing and intellectual property in the massive open online course (MOOC) movement, advocating for open licenses to maximize content management benefits (Kernohan, 2013).

National Parks and Open Access: Kristian Martinus Mennen (2023) explores the history of the national parks movement in England and Wales, emphasizing its role in providing public access to natural spaces (Mennen, 2023).

In conclusion, the history of the open access movement is characterized by efforts across various fields to make scholarly knowledge freely available, addressing the challenges posed by traditional publishing models and exploring innovative solutions. With 946,585 documents identified, the movement’s impact and evolution continue to be a rich area for exploration and discussion.

This is what Microsoft’s Copilot said:

The Open Access Movement has a rich history spanning several decades. Let’s delve into some key milestones:

Project Gutenberg (1971): Michael Hart launched Project Gutenberg, an initiative to digitize and archive cultural works, making them freely accessible [1].

Psycoloquy (1989): Stevan Harnad launched the free online journal Psycoloquy, which later became a peer-reviewed journal in January 1990 [1].

The Public-Access Computer Systems Review (1989): Charles W. Bailey Jr. initiated this free online journal, which transitioned to peer-reviewed status in April 1992 [1].

Tim Berners-Lee and the World Wide Web (1990-1991): Tim Berners-Lee proposed the concept of the World Wide Web, leading to its launch in March 1991. This pivotal development facilitated global information sharing [1].

ArXiv (1991): Physicist Paul Ginsparg established ArXiv, an open repository for preprints in physics and related fields [1].

Emergence of E-Journals (Early 1990s): Several e-journals, such as Ejournal, Electronic Journal of Communication, and Postmodern Culture, emerged during this period [1].

Standards for the World Wide Web (1991): Tim Berners-Lee released the standards for the World Wide Web, shaping its growth and accessibility [1].

Project MUSE (1993): The Milton S. Eisenhower Library, in partnership with Johns Hopkins University Press, launched Project MUSE, a pioneer in online distribution that allowed full-text searches [1].

Self-Archiving Proposal (1994): Stevan Harnad proposed self-archiving, advocating for authors to freely share their research [1].

Social Science Research Network (SSRN) (1994): Wayne Marr and Michael Jensen founded SSRN, a platform for sharing working papers and preprints [1].

Open Access Conversion (1994): The print journal Florida Entomologist transitioned to an open-access format, making all previous issues available for free [1].

Networked Digital Library of Theses and Dissertations (NDLTD) (1996): Virginia Polytechnic Institute and State University launched NDLTD, promoting open access to theses and dissertations [1].

Internet Archive (1996): Brewster Kahle established the Internet Archive, preserving digital content for public access [1].

Medline and PubMed (1997): Medline, initially launched in 1966, became open access and was incorporated into PubMed [1].

SciELO (1997): SciELO (Scientific Electronic Library Online) began development as a project supported by FAPESP and BIREME [1].

The Open Access Movement continues to evolve, driven by the vision of making scholarly knowledge freely available to all.

"OpenAI: ‘The New York Times Paid Someone to Hack Us’"


OpenAI accuses The New York Times of paying someone to hack OpenAI’s products. This was allegedly done to gather evidence for the copyright infringement complaint the newspaper filed late last year. This lawsuit fails to meet The Times’ "famously rigorous journalistic standards," the defense argues, asking the New York federal court to dismiss it in part.

http://tinyurl.com/44j8wchb

"Your Organization Isn’t Designed to Work with GenAI"


A good example is the customer service model employed by Jerry, a company valued at $450 million with over five million customers that serves as a one stop-shop for car owners to get insurance and financing. Jerry receives over 200,000 messages a month from customers. With such high volume, the company struggled to respond to customer queries within 24 hours, let alone minutes or seconds. By installing their GenAI solution in May 2023, they moved from having humans in the lead in the entirety of the customer service process and answering only 54% of customer inquiries within 24 hours or less to having AI in the lead 100% of the time and answering over 96% of inquiries within 30 seconds by June 2023. They project $4 million in annual savings from this transformation.
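
The percentages above can be turned into monthly message counts. This sketch treats "over 200,000 messages a month" as exactly 200,000, so the results are lower bounds. Note also that the before and after shares use different targets (24 hours versus 30 seconds), so the after figure describes a much stricter standard. The helper name is illustrative only.

```python
MONTHLY_MESSAGES = 200_000  # "over 200,000 messages a month", taken as a lower bound


def answered_within_target(share: float, monthly: int = MONTHLY_MESSAGES) -> int:
    """Messages per month answered within the target time, given the share that met it."""
    return round(monthly * share)


before = answered_within_target(0.54)  # humans in the lead: answered within 24 hours
after = answered_within_target(0.96)   # AI in the lead: answered within 30 seconds
print(f"before: {before:,}  after: {after:,}  difference: {after - before:,}")
```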

U.S. Copyright Office Update on Its Artificial Intelligence Initiatives


In March 2023, the Office announced a broad initiative to examine the copyright implications of the current forms of generative AI. Although we had previously examined the scope of copyright in works created using AI, the increasing sophistication and public adoption of generative AI tools raised new questions about the process of training and the legal status of the outputs. Our goal was to gather information from a full range of knowledgeable and interested parties in order to produce a report to assist Congress, the courts, and others in formulating policy in this area. In taking this initiative forward, we are monitoring related work being done in other agencies, including the U.S. Patent and Trademark Office (USPTO) and the Federal Trade Commission, and communicating with them on an ongoing basis.

This letter summarizes the Office’s work so far and describes our agenda for the rest of 2024, including the release of the report, updates to the Compendium of U.S. Copyright Office Practices, and the publication of a proposed economic research agenda.

http://tinyurl.com/4tpeyw3t

Paywall: "GPT-4 Developer Tool Can Hack Websites without Human Help"


OpenAI’s artificial intelligence model GPT-4 has the capability to hack websites and steal information from online databases without human help, researchers have found. That suggests individuals or organisations without hacking expertise could unleash AI agents to carry out cyber attacks.

http://tinyurl.com/24u7bnud

"A Descriptive Analysis of AI Guides in Academic Libraries"


The purpose of this work is to do an exploratory descriptive analysis of topics included in Artificial Intelligence (AI) electronic library guides of academic libraries. AI library guides from members of the Association of Research Libraries (ARL) are chosen for selection. Of the 48 libraries randomly selected, 25 have AI guides. Data collected was subject to a content analysis process which included the use of terms co-occurrence using the VOSviewer software. The results show three clusters of terms, which are complemented with the discussion of materials found in the guides.

https://digitalcommons.unl.edu/libphilprac/8033/
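
Term co-occurrence analysis, as used in the study above, starts from a simple count: how many documents mention both terms of a pair. VOSviewer itself goes further, normalizing these counts and clustering the terms; the minimal sketch below covers only the counting step, with single-word terms and whole-document windows as simplifying assumptions. The guide snippets are invented for illustration and are not data from the study.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents: list[str], vocabulary: list[str]) -> Counter:
    """For each pair of vocabulary terms, count the documents that contain both."""
    vocab = {term.lower() for term in vocabulary}
    counts: Counter = Counter()
    for doc in documents:
        words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        present = sorted(vocab & words)  # sorted so each pair has one canonical order
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical guide snippets; not data from the study.
guides = [
    "ChatGPT citation guidance and plagiarism policy",
    "Using ChatGPT for research: citation styles",
    "Plagiarism and academic integrity policy",
]
counts = cooccurrence_counts(guides, ["ChatGPT", "citation", "plagiarism", "policy"])
print(counts[("chatgpt", "citation")], counts[("plagiarism", "policy")], counts[("chatgpt", "policy")])
```

The resulting pair counts are the edge weights of a term network; a tool like VOSviewer would then normalize and cluster that network to produce groupings such as the three clusters reported.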

"Using ChatGPT-Generated Essays in Library Instruction"


This case study details a library instruction activity developed by a team of academic librarians, which intended to leverage experiential learning to make students and faculty aware of the function, capabilities, and limitations of text-generating artificial intelligence (AI) tools like OpenAI’s ChatGPT. The activity is described, with its development connected to key instructional theories and frameworks.

https://doi.org/10.1016/j.acalib.2024.102863

Generative AI News

Adobe Acrobat Adds Generative AI to "Easily Chat With Documents"

ChatGPT Spat Out Gibberish for Many Users Overnight before OpenAI Fixed It

Exploring ChatGPT for Next-Generation Information Retrieval: Opportunities and Challenges

Generative AI’s Environmental Costs Are Soaring — And Mostly Secret

Google Releases "Open" AI Models after Meta

Generative AI Market Share Reach USD 126.5 Billion by 2031, Impacting Factors That Could Escalate Market’s Rapid Growth

Grammarly and Harris Poll Find Using Generative AI for Communication Could Save Up to $1.6 Trillion Annually in U.S. Productivity

One Month with Microsoft’s AI Vision of the Future: Copilot Pro

OpenAI Can’t Register "GPT" as a Trademark — Yet

Top 20 Generative AI Tools & Applications in 2024

What the EU’s Tough AI Law Means for Research and ChatGPT

Why the New York Times Might Win Its Copyright Lawsuit against OpenAI

"Guest Post — There is More to Reliable Chatbots than Providing Scientific References: The Case of ScopusAI"


In October, my institution was granted access to the Beta version of ScopusAI. I have tested it using a concept connected to my PhD dissertation in physics, an "electromagnon". In this post, I want to share my experience and use it to illustrate the many dimensions the design and assessment of such tools need to consider. . . .

[The author provides an extensive description and analysis of the performed tests as well as their broader implications.]

And if AI is only as good as its underlying data, let’s not forget who owns the scholarly data and regulates access to it. Big scholarly publishers have long been using content as a resource to capitalize on. AI tools amplify existing imbalances in access to scholarly text: if a publisher owns the exclusive right to a text, they can train their own AI on it and make this content unavailable to competing AI projects, profiting from the copyright yet again. Currently, most AI research assistants are grounded with abstracts, but the real value is contained in the full text of articles, and accessing them remains very difficult.

"Will Traffic from Search Engines Fall 25% by 2026?"


Will traffic from search engines to your website drop 25% by 2026 as a result of the rise of generative AI answer engines? That’s what Gartner, a technological research and consulting firm, is predicting. . . . Gartner previously predicted that "organic search traffic will decrease by 50% or more [by 2028] as consumers embrace generative AI-powered search."

http://tinyurl.com/mtxvt8kj

"OpenAI Introduces Sora, Its Text-to-Video AI Model"


The AI company says Sora "can create realistic and imaginative scenes from text instructions." The text-to-video model allows users to create photorealistic videos up to a minute long — all based on prompts they’ve written. . . . The model can also generate a video based on a still image, as well as fill in missing frames on an existing video or extend it.

http://tinyurl.com/y6jfbyd6

"OpenAI Wants to Eat Google Search’s Lunch"


OpenAI is reportedly developing a search app that would directly compete with Google Search . . . Microsoft Bing would allegedly power the service from Sam Altman, which could be the most serious threat Google Search has ever faced. Current AI-enabled search engines from Google and Perplexity answer your questions with a clear AI-generated answer, usually in one to two sentences. Then, the engine provides links to its sources below, like a hybrid between an AI chatbot and a search engine. The report says this new search product could be faster than ChatGPT, without sacrificing its powerful summarizing abilities.

http://tinyurl.com/yc65hb5p

"The Text File That Runs the Internet"


But robots.txt is not a legal document — and 30 years after its creation, it still relies on the good will of all parties involved. Disallowing a bot on your robots.txt page. . . sends a message, but it’s not going to stand up in court. Any crawler that wants to ignore robots.txt can simply do so, with little fear of repercussions. . . . As the AI companies continue to multiply, and their crawlers grow more unscrupulous, anyone wanting to sit out or wait out the AI takeover has to take on an endless game of whac-a-mole. . . . If AI is in fact the future of search, as Google and others have predicted, blocking AI crawlers could be a short-term win but a long-term disaster.

http://tinyurl.com/5n8s72bz
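
To illustrate the point that robots.txt is only a request, the sketch below uses Python's standard-library `urllib.robotparser` to evaluate a sample file. `GPTBot` is the user agent OpenAI publishes for its crawler; the site paths, `SomeOtherBot`, and the `allowed` helper are hypothetical. The parser reports what a compliant crawler should do; nothing in the file or the protocol stops a non-compliant crawler from fetching the pages anyway.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block OpenAI's crawler entirely, keep /private/ off-limits to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """What a *compliant* crawler should conclude; nothing enforces this answer."""
    return parser.can_fetch(user_agent, url)


print(allowed("GPTBot", "https://example.org/articles/1"))        # False
print(allowed("SomeOtherBot", "https://example.org/articles/1"))  # True
print(allowed("SomeOtherBot", "https://example.org/private/x"))   # False
```

The check runs entirely on the crawler's side, which is why the excerpt describes opting out as a game of whac-a-mole: each new crawler has to be named, and each has to choose to honor the rule.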

"Court Dismisses Authors’ Copyright Infringement Claims Against OpenAI"


Several authors, including comedian Sarah Silverman, have suffered an early loss in their copyright battle against OpenAI. The authors accused OpenAI of using pirated copies of their books to train its models. A California federal court dismissed the vicarious copyright infringement and DMCA violation claims. However, the lawsuit isn’t over yet.

http://tinyurl.com/478vm6kw

2024 EDUCAUSE AI Landscape Study


Moving from reaction to action, higher education stakeholders are currently exploring the opportunities afforded by AI for teaching, learning, and work. . . . To aid in these efforts, we present this inaugural EDUCAUSE AI Landscape Study, in which we summarize the higher education community’s current sentiments and experiences related to strategic planning and readiness, policies and procedures, workforce, and the future of AI in higher education.

http://tinyurl.com/4fhprhs6

"WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI"


Using WARC-GPT, you can ask specific questions in natural language against a collection of WARC files. Rather than relying on keyword searches and metadata filters to sort through search results, WARC-GPT provides a new starting point for search using multi-document full-text search with summarization to explore the contents of web archives. WARC-GPT lists the sources used to generate the response and relevant text excerpts, which you can use to verify the information provided and identify points of interest within a collection of web archives.

http://tinyurl.com/3vvpsyj9
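
WARC-GPT's own code is not shown here. The sketch below only illustrates the retrieve-and-cite pattern described above: score text excerpts from archived pages against a question, keep the best matches, and return them with their source URLs so a reader can verify the answer. Tools of this kind typically rank with embedding models and then pass the excerpts to a language model for summarization; here a simple word-count cosine similarity stands in for the embeddings, the summarization step is omitted, and the excerpts and example.org URLs are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, excerpts: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (source_url, text) excerpts most similar to the question."""
    q = tokenize(question)
    scored = [(cosine(q, tokenize(text)), url, text) for url, text in excerpts]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Keep the source URL with each excerpt so the answer can be checked against it.
    return [(url, text) for score, url, text in scored[:k] if score > 0]


archive = [
    ("https://example.org/a", "The library archived election websites in 2020."),
    ("https://example.org/b", "Recipes for sourdough bread and pizza dough."),
    ("https://example.org/c", "Election night coverage was archived as WARC files."),
]
for url, text in retrieve("Which election websites were archived?", archive):
    print(url, "->", text)
```

In a full retrieval-augmented pipeline, the returned excerpts would be placed in the language model's prompt and the URLs listed alongside its answer.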

"Generative Artificial Intelligence in Higher Education: Evidence from an Analysis of Institutional Policies and Guidelines"


In this paper we examined documents produced by 116 US universities categorized as high research activity or R1 institutions to comprehensively understand GenAI related advice and guidance given to institutional stakeholders. Through an extensive analysis, we found the majority of universities (N=73, 63%) encourage the use of GenAI and many provide detailed guidance for its use in the classroom (N=48, 41%). More than half of all institutions provided sample syllabi (N=65, 56%) and half (N=58, 50%) provided sample GenAI curriculum and activities that would help instructors integrate and leverage GenAI in their classroom. Notably, most guidance for activities focused on writing, whereas code and STEM-related activities were mentioned half the time and vaguely even when they were (N=58, 50%).

https://arxiv.org/abs/2402.01659
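
As a quick consistency check, each percentage reported above matches its count divided by the 116 institutions studied, rounded to the nearest whole percent. The labels in `reported` are shorthand, not the paper's wording.

```python
TOTAL_INSTITUTIONS = 116

# Counts (N) as reported in the abstract; labels are shorthand.
reported = {
    "encourage GenAI use": 73,
    "detailed classroom guidance": 48,
    "sample syllabi": 65,
    "sample curriculum and activities": 58,
}


def percent(n: int, total: int = TOTAL_INSTITUTIONS) -> int:
    """Share of institutions as a whole-number percentage."""
    return round(100 * n / total)


for label, n in reported.items():
    print(f"{label}: {n}/{TOTAL_INSTITUTIONS} = {percent(n)}%")
```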

"How (and Why) the University of Michigan Built Its Own Closed Generative AI Tools"


On August 21, U-M did have three unique generative AI tools ready for returning students and employees.

  1. U-M GPT is the tool that most resembles ChatGPT. It is able to answer questions, produce written content, and make recommendations. Additionally, U-M GPT supports multiple commercial and open-source language models and AI art generators, broadening its utility and applications. . . .
  2. U-M Maizey is a no-code platform that allows users to build unique and customized chat programs by using their own datasets in combination with U-M’s AI language models. . . .
  3. U-M GPT Toolkit is designed for AI developers who require full control over the AI model and environment that they are building, training, and hosting. Researchers and developers who want to use the U-M GPT Toolkit must contact the ITS AI team for access.

http://tinyurl.com/yu8ym4j8
