"The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces"


Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces — even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we’ve developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We’ve also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers — Discovery, Efficiency, Comprehension, Synthesis, and Accessibility — and present an overview of our progress and remaining open challenges.

https://arxiv.org/abs/2303.14334

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "Google is Changing the Way We Search with AI. It Could Upend the Web."


At the same time, the talk of replacing search results with AI-generated answers has roiled the world of people who make their living writing content and building websites. If a chatbot takes over the role of helping people find useful information, what incentive would there be for anyone to write how-to guides, travel blogs or recipes?

https://cutt.ly/s6kmQpF

"Is There a Case for Accepting Machine Translated Scholarly Content in Repositories?"


Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research communications landscape. However, multilingualism presents a particular challenge for the discovery of research outputs. Although researchers and other information seekers may only be able to read in one or two languages, they may want to know about all the relevant research in their area, regardless of the language in which it is published. Conversely, information seekers may want to discover research outputs in their own language(s) more easily. To facilitate this, the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories has been developing and promoting good practices for repositories in managing multilingual and non-English content. In the course of our work, the topic of machine translation (MT) has sparked a heated discussion within the Task Group and we would like to share with you the nature of this discussion.

https://bit.ly/42D1nbF

"Quick Poll Results: ARL Member Representatives on Generative AI in Libraries"


We conducted a quick poll of Association of Research Libraries (ARL) member representatives in April 2023 to gather insights into their current perspectives on generative AI adoption, its potential implications, and the role of libraries in AI-driven environments. In this blog post, we summarize, synthesize, and provide recommendations based on the survey responses, aiming to offer valuable insights for senior library directors navigating the AI landscape.

https://bit.ly/3M9yVc2

2023 EDUCAUSE Horizon Report: Teaching and Learning Edition


This report profiles key trends and emerging technologies and practices shaping the future of teaching and learning, and envisions a number of scenarios and implications for that future. . . .

Artificial intelligence (AI) has taken the world by storm, with new AI-powered tools such as ChatGPT opening up new opportunities in higher education for content creation, communication, and learning, while also raising new concerns about the misuses and overreach of technology. Our shared humanity has also become a key focal point within higher education, as faculty and leaders continue to wrestle with understanding and meeting the diverse needs of students and to find ways of cultivating institutional communities that support student well-being and belonging.

https://bit.ly/3panaJd

"Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond"


This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. . . . Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. . . . Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice.

https://arxiv.org/abs/2304.13712

"AI Is Tearing Wikipedia Apart"


The current draft policy notes that anyone unfamiliar with the risks of large language models should avoid using them to create Wikipedia content. . . . The community is also divided on whether large language models should be allowed to train on Wikipedia content. While open access is a cornerstone of Wikipedia’s design principles, some worry the unrestricted scraping of internet data allows AI companies like OpenAI to exploit the open web to create closed commercial datasets for their models. This is especially a problem if the Wikipedia content itself is AI-generated, creating a feedback loop of potentially biased information, if left unchecked.

https://bit.ly/3NLrc50

Paywall: "‘The Godfather of A.I.’ Leaves Google and Warns of Danger Ahead"


Down the road, he is worried that future versions of the technology pose a threat to humanity because they often learn unexpected behavior from the vast amounts of data they analyze. This becomes an issue, he said, as individuals and companies allow A.I. systems not only to generate their own computer code but actually run that code on their own. And he fears a day when truly autonomous weapons — those killer robots — become reality.

"The idea that this stuff could actually get smarter than people — a few people believed that," he said. "But most people thought it was way off. And I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that."

https://bit.ly/3VoA9Dh

Is Artificial General Intelligence Closer Than We Think?: "Sparks of Artificial General Intelligence: Early Experiments with GPT-4"


Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

https://doi.org/10.48550/arXiv.2303.12712

"Generative AI and Copyright Policy From the Creator-User’s Perspective"


As scholars Mark Lemley and Bryan Casey persuasively argue in their paper Fair Learning, we should generally permit generative AI tools that in effect learn from past works in ways that facilitate creation of new, distinct ones. While some claim that generative AI systems are simply engines for ‘collage’ or ‘plagiarism,’ copying previous expressions into new works, this isn’t an accurate description of how most tools work. Instead, generative AI extracts information that then is used to inform generation of new material; for instance, by looking at many pictures of dogs, it can extract information about what dogs look like, and can then help a user draw dogs, or by looking at many pieces of art labeled as Surrealist, it can help a user create new works in the style of Surrealism. In effect, these are tools that aid new creators in their learning and building on past works.

https://bit.ly/3GVNhK5

"EDUCAUSE QuickPoll Results: Adopting and Adapting to Generative AI in Higher Ed Tech"


Asked about their agreement with specific statements about generative AI, a strong majority of respondents (83%) agreed that these technologies will profoundly change higher education in the next three to five years (see table 1). These changes could be positive or negative. More respondents agreed than disagreed that generative AI would make their job easier and would have more benefits than drawbacks. However, more respondents agreed than disagreed that the use of generative AI in higher education makes them nervous, perhaps an acknowledgment of the potential risks of these technologies, however beneficial.

https://bit.ly/3GTzdkf

Paywall: "Google Devising Radical Search Changes to Beat Back A.I. Rivals"


Google’s employees were shocked when they learned in March that the South Korean consumer electronics giant Samsung was considering replacing Google with Microsoft’s Bing as the default search engine on its devices. . . . Google’s reaction to the Samsung threat was "panic," according to internal messages reviewed by The New York Times. An estimated $3 billion in annual revenue was at stake with the Samsung contract. An additional $20 billion is tied to a similar Apple contract that will be up for renewal this year.

https://bit.ly/3MQjYfD

"How Generative AI Could Disrupt Creative Work"


In the face of technological change, creativity is often held up as a uniquely human quality, less vulnerable to the forces of technological disruption and critical for the future. Today however, generative AI applications such as ChatGPT and Midjourney are threatening to upend this special status and significantly alter creative work, both independent and salaried. The authors explore three non-exclusive scenarios for this disruption of content creation: 1) people use AI to augment their work, leading to greater productivity, 2) generative AI creates a flood of cheap content that drives out human creatives, and 3) human-made creative work demands a premium.

https://bit.ly/43pP4kh

"Surprising Things Happen When You Put 25 AI Agents Together in an RPG [Role-Playing Game] Town"


A group of researchers at Stanford University and Google have created a miniature RPG-style virtual world similar to The Sims, where 25 characters, controlled by ChatGPT and custom code, live out their lives independently with a high degree of realistic behavior. . . .

"Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day," write the researchers in their paper, "Generative Agents: Interactive Simulacra of Human Behavior."

http://bit.ly/3KSwc6b

"‘We Have to Move Fast’: US Looks to Establish Rules for Artificial Intelligence"


The US commerce department on Tuesday announced it is officially requesting public comment on how to create accountability measures for AI, seeking help on how to advise US policymakers to approach the technology. . . .

The National Institute of Standards and Technology has also published an AI risk management framework, voluntary guardrails that companies can use to attempt to limit the risk of harm to the public.

https://cutt.ly/d7UOF25

Paywall: "The Man Who Unleashed AI on an Unsuspecting Silicon Valley"


The rise of OpenAI and the explosion of interest in ChatGPT has catapulted Altman, 37, from a prolific investor and protege of more powerful men to a central player among the most powerful people in tech. It has also made him a key voice in the heated and globe-spanning debate over AI, what it’s capable of and who should control it.

https://cutt.ly/x7Qq5RV

"Stable Diffusion Copyright Lawsuits Could Be a Legal Earthquake for AI"


In January, three visual artists filed a class-action copyright lawsuit against Stability AI, the startup that created Stable Diffusion. In February, the image-licensing giant Getty filed a lawsuit of its own. . . . There’s a real possibility that the courts could decide that Stability AI violated copyright law on a massive scale. . . . Building cutting-edge generative AI would require getting licenses from thousands—perhaps even millions—of copyright holders. The process would likely be so slow and expensive that only a handful of large companies could afford to do it. Even then, the resulting models likely wouldn’t be as good.

http://bit.ly/3K8FRno

Stanford Institute for Human-Centered Artificial Intelligence: Artificial Intelligence Index Report 2023


The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.

https://bit.ly/40PH0Y4

"Guest Post — Academic Publishers Are Missing the Point on ChatGPT"


On the other hand, publishers would be wise to leave the back door open for authors to use AI tools in order to support their research for two reasons. First, strictly policing the use of these tools would not only be an exercise in futility, but enforcement could quickly become a nightmare. Second, an arms race seems to already be underway to build out software to detect AI writing. Publishers will likely spend ungodly sums of money on these tools, only to be set back by even better models that can outsmart the detectors. Whether that should be our focus is an important question to ponder before diving in headfirst.

https://bit.ly/3nEiYkm

"GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models"


We investigate the potential implications of large language models (LLMs), such as Generative Pretrained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. . . . Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. . . . Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. . . . We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.

https://arxiv.org/abs/2303.10130

"ChatGPT Gets ‘Eyes and Ears’ with Plugins That Can Interface AI with the World"


Basically, if a developer wants to give ChatGPT the ability to access any network service (for example: "looking up current stock prices") or perform any task controlled by a network service (for example: "ordering pizza through the Internet"), it is now possible, provided it doesn’t go against OpenAI’s rules.

http://bit.ly/3ZlESG0

"Google’s Bard Chatbot Doesn’t Love Me — But It’s Still Pretty Weird"


As far as I can tell, it’s also a noticeably worse tool than Bing, at least when it comes to surfacing useful information from around the internet. Bard is wrong a lot. And when it’s right, it’s often in the dullest way possible. Bard wrote me a heck of a Taylor Swift-style breakup song about dumping my cat, but it’s not much of a productivity tool. And it’s definitely not a search engine.

http://bit.ly/3JXVob1

"AI and Copyright: Human Artistry Campaign Launches to Support Musicians"


The fast rise of AI technology has opened up a world of brain-busting questions about copyright and creators’ rights. . . . A new coalition to meet those challenges called the Human Artistry Campaign was announced at the South by Southwest conference on Thursday, with support from more than 40 organizations, including the Recording Academy, the National Music Publishers Association, the Recording Industry Association of America and many others.

https://bit.ly/402Nt1G

U.S. Copyright Office: "Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence"


As the agency overseeing the copyright registration system, the Office has extensive experience in evaluating works submitted for registration that contain human authorship combined with uncopyrightable material, including material generated by or with the assistance of technology. It begins by asking "whether the ‘work’ is basically one of human authorship, with the computer [or other device] merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed not by man but by a machine." [23] In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of "mechanical reproduction" or instead of an author’s "own original mental conception, to which [the author] gave visible form." [24] The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work.[25] This is necessarily a case-by-case inquiry.

If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it.[26] For example, when an AI technology receives solely a prompt [27] from a human and produces complex written, visual, or musical works in response, the "traditional elements of authorship" are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.[28] For example, if a user instructs a text-generating technology to "write a poem about copyright law in the style of William Shakespeare," she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text.[30] When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship.[31] As a result, that material is not protected by copyright and must be disclaimed in a registration application.[32]

In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that "the resulting work as a whole constitutes an original work of authorship." [33] Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection.[34] In these cases, copyright will only protect the human-authored aspects of the work, which are "independent of" and do "not affect" the copyright status of the AI-generated material itself.[35]

https://bit.ly/40oOkJA