"‘We Have to Move Fast’: US Looks to Establish Rules for Artificial Intelligence"


The US commerce department on Tuesday announced it is officially requesting public comment on how to create accountability measures for AI, seeking help on how to advise US policymakers to approach the technology. . . .

The National Institute of Standards and Technology has also published an AI risk management framework, voluntary guardrails that companies can use to attempt to limit the risk of harm to the public.

https://cutt.ly/d7UOF25

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Paywall: "The Man Who Unleashed AI on an Unsuspecting Silicon Valley"


The rise of OpenAI and the explosion of interest in ChatGPT has catapulted Altman, 37, from a prolific investor and protege of more powerful men to a central player among the most powerful people in tech. It has also made him a key voice in the heated and globe-spanning debate over AI, what it’s capable of and who should control it.

https://cutt.ly/x7Qq5RV


"Stable Diffusion Copyright Lawsuits Could Be a Legal Earthquake for AI"


In January, three visual artists filed a class-action copyright lawsuit against Stability AI, the startup that created Stable Diffusion. In February, the image-licensing giant Getty filed a lawsuit of its own. . . . There’s a real possibility that the courts could decide that Stability AI violated copyright law on a massive scale. . . . Building cutting-edge generative AI would require getting licenses from thousands—perhaps even millions—of copyright holders. The process would likely be so slow and expensive that only a handful of large companies could afford to do it. Even then, the resulting models likely wouldn’t be as good.

http://bit.ly/3K8FRno


Stanford Institute for Human-Centered Artificial Intelligence: Artificial Intelligence Index Report 2023


The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.

https://bit.ly/40PH0Y4


"Guest Post — Academic Publishers Are Missing the Point on ChatGPT"


On the other hand, publishers would be wise to leave the back door open for authors to use AI tools in order to support their research for two reasons. First, strictly policing the use of these tools would not only be an exercise in futility, but enforcement could quickly become a nightmare. Second, an arms race seems to already be underway to build out software to detect AI writing. Publishers will likely spend ungodly sums of money on these tools, only to be set back by even better models that can outsmart the detectors. Whether that should be our focus is an important question to ponder before diving in headfirst.

https://bit.ly/3nEiYkm


"GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models"


We investigate the potential implications of large language models (LLMs), such as Generative Pretrained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. . . . Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. . . . Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. . . . We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.

https://arxiv.org/abs/2303.10130


"ChatGPT Gets ‘Eyes and Ears’ with Plugins That Can Interface AI with the World"


Basically, if a developer wants to give ChatGPT the ability to access any network service (for example: "looking up current stock prices") or perform any task controlled by a network service (for example: "ordering pizza through the Internet"), it is now possible, provided it doesn’t go against OpenAI’s rules.

http://bit.ly/3ZlESG0


"Google’s Bard Chatbot Doesn’t Love Me — But It’s Still Pretty Weird"


As far as I can tell, it’s also a noticeably worse tool than Bing, at least when it comes to surfacing useful information from around the internet. Bard is wrong a lot. And when it’s right, it’s often in the dullest way possible. Bard wrote me a heck of a Taylor Swift-style breakup song about dumping my cat, but it’s not much of a productivity tool. And it’s definitely not a search engine.

http://bit.ly/3JXVob1


"AI and Copyright: Human Artistry Campaign Launches to Support Musicians"


The fast rise of AI technology has opened up a world of brain-busting questions about copyright and creators’ rights. . . . A new coalition to meet those challenges called the Human Artistry Campaign was announced at the South by Southwest conference on Thursday, with support from more than 40 organizations, including the Recording Academy, the National Music Publishers Association, the Recording Industry [Association] of America and many others.

bit.ly/402Nt1G


U.S. Copyright Office: "Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence"


As the agency overseeing the copyright registration system, the Office has extensive experience in evaluating works submitted for registration that contain human authorship combined with uncopyrightable material, including material generated by or with the assistance of technology. It begins by asking "whether the ‘work’ is basically one of human authorship, with the computer [or other device] merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed not by man but by a machine." [23] In the case of works containing AI-generated material, the Office will consider whether the AI contributions are the result of "mechanical reproduction" or instead of an author’s "own original mental conception, to which [the author] gave visible form." [24] The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work.[25] This is necessarily a case-by-case inquiry.

If a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it.[26] For example, when an AI technology receives solely a prompt [27] from a human and produces complex written, visual, or musical works in response, the "traditional elements of authorship" are determined and executed by the technology—not the human user. Based on the Office’s understanding of the generative AI technologies currently available, users do not exercise ultimate creative control over how such systems interpret prompts and generate material. Instead, these prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.[28] For example, if a user instructs a text-generating technology to "write a poem about copyright law in the style of William Shakespeare," she can expect the system to generate text that is recognizable as a poem, mentions copyright, and resembles Shakespeare’s style.[29] But the technology will decide the rhyming pattern, the words in each line, and the structure of the text.[30] When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship.[31] As a result, that material is not protected by copyright and must be disclaimed in a registration application.[32]

In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that "the resulting work as a whole constitutes an original work of authorship." [33] Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection.[34] In these cases, copyright will only protect the human-authored aspects of the work, which are "independent of" and do "not affect" the copyright status of the AI-generated material itself.[35]

bit.ly/40oOkJA


"AI Makes Plagiarism Harder to Detect, Argue Academics — In Paper Written by Chatbot"


An academic paper entitled Chatting and Cheating: Ensuring Academic Integrity in the Era of ChatGPT was published this month in an education journal. . . . What readers — and indeed the peer reviewers who cleared it for publication — did not know was that the paper itself had been written by the controversial AI chatbot ChatGPT.

bit.ly/40kvjZ2


"ChatGPT and Higher Education: Initial Prevalence and Areas of Interest"


Thematically, the plurality of references to ChatGPT on institutional websites encompassed opinion pieces or lecture announcements regarding AI (43.9%), followed by experiments with the tools (34.1%), and grading or other academic policies (22.0%) (see figure 1). This may suggest a temporal flow of activity related to adoption in which institutions begin by highlighting opinion pieces on the topic, then provide evidence from faculty experiments with the technology, and then finally adopt policies regarding its use.

bit.ly/3ZTiIMB


Paywall: "GPT-4 Is Bigger and Better Than ChatGPT — But OpenAI Won’t Say Why"


But OpenAI has chosen not to reveal how large GPT-4 is. In a departure from its previous releases, the company is giving away nothing about how GPT-4 was built—not the data, the amount of computing power, or the training techniques. "OpenAI is now a fully closed company with scientific communication akin to press releases for products," says Wolf [Thomas Wolf, cofounder of Hugging Face]. . . . GPT-4 may be the best multimodal large language model yet built. But it is not in a league of its own, as GPT-3 was when it first appeared in 2020. A lot has happened in the last three years. Today GPT-4 sits alongside other multimodal models, including Flamingo from DeepMind. And Hugging Face is working on an open-source multimodal model that will be free for others to use and adapt, says Wolf.

bit.ly/3TmVZWS


"Making AI Generative for Higher Education"


This fall, Ithaka S+R is convening a two-year research project in collaboration with a select group of universities committed to making AI generative for their campus community. Together we will assess the immediate and emerging AI applications most likely to impact teaching, learning, and research activities and explore the needs of institutions, instructors, and scholars as they navigate this environment. We will use our findings to create new strategies, policies, and programs to ensure on-campus readiness to harness the technology in the longer term.

http://bit.ly/3LnaFmR


Released: "GPT-4"


We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails. . . . We’ve been working on each aspect of the plan outlined in our post about defining the behavior of AIs, including steerability. Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the "system" message.

https://openai.com/research/gpt-4


"ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing"


The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential model for the automated preparation of essays and other types of scholarly manuscripts. Potential ethical issues that could arise with the emergence of large language models like GPT-3. . . and its usage by academics and researchers, are discussed and situated within the context of broader advancements in artificial intelligence, machine learning, and natural language processing for research and scholarly publishing.

https://doi.org/10.1002/asi.24750


"Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis"


Objective:

We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts.

Methods:

PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms "ophthalmology," "radiology," and "neurology." Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile.

Results:

There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression.

Conclusions:

Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.

https://doi.org/10.2196/42789
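
The Methods summary above names two steps that are easy to illustrate in isolation: ranking journals into tertiles and splitting records 3:1 into training and test sets. What follows is a minimal Python sketch of those two steps only, not the authors' code. The function names, the equal-thirds-by-rank cut, the fixed random seed, and the toy impact-factor values are all assumptions made for illustration, and the sketch does not attempt the BERT, XGBoost, or logistic regression models.

```python
import random


def tertile_labels(scores):
    """Assign each journal a tertile (1 = lowest third, 3 = highest third).

    Assumption: tertiles are cut by rank order into equal thirds; the
    abstract gives the centile bands but not the exact tie-breaking rule.
    """
    ranked = sorted(scores, key=scores.get)  # journal names, lowest score first
    n = len(ranked)
    return {journal: (position * 3) // n + 1
            for position, journal in enumerate(ranked)}


def split_3_to_1(records, seed=0):
    """Randomly split records into training and test lists at a 3:1 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


# Hypothetical impact factors, invented for this example.
impact_factors = {"A": 0.8, "B": 1.1, "C": 1.5, "D": 2.1,
                  "E": 2.6, "F": 3.2, "G": 4.0, "H": 9.5}
labels = tertile_labels(impact_factors)
train, test = split_3_to_1(list(impact_factors))
print(labels)
print(len(train), len(test))
```

With eight toy journals the split yields six training records and two test records. A real run would split article-level records (title, abstract, authors, MeSH terms) rather than journal names.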


"Some Thoughts on Five Pending AI Litigations – Avoiding Squirrels and Other AI Distractions"


Regardless, as of this writing there are now five cases that may provide some clarity on this less frequently discussed but foundational issue of the unauthorized use of copyrighted materials as training data for AI (I use "AI" here as a shorthand which also includes text and data mining and machine learning). Each of these cases is unique, fact dependent, and likely, if fully litigated on the merits, to shed light on different aspects of copyright law.

bit.ly/41Qrrk3


Congressional Research Service: Generative Artificial Intelligence and Copyright Law


The question of whether or not copyright protection may be afforded to AI outputs—such as images created by DALL-E or texts created by ChatGPT—is likely to hinge partly on the concept of "authorship." The Copyright Act generally affords copyright protection to "original works of authorship." Although the Copyright Act does not define who (or what) may be an "author," the U.S. Copyright Office recognizes copyright only in works "created by a human being." Courts have likewise refused to afford copyright protection to non-human authors—for example, a monkey who took a series of photos. A recent lawsuit has challenged the human-authorship requirement in the context of works purportedly "authored" by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying an application to register a visual artwork that he claims was authored by an AI program called the Creativity Machine.

https://www.everycrsreport.com/reports/LSB10922.html


Google May Need an $80 Billion Upgrade: "Meet the $10,000 Nvidia Chip Powering the Race for A.I."


For example, an estimate from New Street Research found that the OpenAI-based ChatGPT model inside Bing’s search could require 8 GPUs to deliver a response to a question in less than one second. . . .

"If you’re from Microsoft, and you want to scale that, at the scale of Bing, that’s maybe $4 billion. If you want to scale at the scale of Google, which serves 8 or 9 billion queries every day, you actually need to spend $80 billion on DGXs," said Antoine Chkaiban, a technology analyst at New Street Research.

bit.ly/3ZnsSnY


"Can an Artificial Intelligence Chatbot Be the Author of a Scholarly Article?"


At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as co-authors of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.

https://doi.org/10.6087/kcse.292

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Impact and Perceived Value of the Revolutionary Advent of Artificial Intelligence in Research and Publishing among Researchers: A Survey-Based Descriptive Study"


Purpose:

This study was conducted to understand the perceptions and awareness of artificial intelligence (AI) in the academic publishing landscape.

Method:

We conducted a global survey entitled "Role and impact of AI on the future of academic publishing" to understand the impact of the AI wave in the scholarly publishing domain. This English-language survey was open to all researchers, authors, editors, publishers, and other stakeholders in the scholarly community. Conducted between August and October 2021, the survey received responses from around 212 universities across 54 countries.

Results:

Out of 365 respondents, about 93% belonged to the age groups of 18–34 and 35–54 years. While 50% of the respondents selected plagiarism detection as the most widely known AI-based application, image recognition (42%), data analytics (40%), and language enhancement (39%) were some other known applications of AI. The respondents also expressed the opinion that the academic publishing landscape will significantly benefit from AI. However, the major challenges restraining the large-scale adoption of AI, as expressed by 93% of the respondents, were limited knowledge and expertise, as well as difficulties in integrating AI-based solutions into existing IT infrastructure.

Conclusion:

The survey responses reflected the necessity of AI in research and publishing. This study suggests possible ways to support a smooth transition. This can be best achieved by educating and creating awareness to ease possible fears and hesitation, and to actualize the promising benefits of AI.

https://doi.org/10.6087/kcse.294


Paywall (with Some Free Views): "The ChatGPT-Fueled Battle for Search Is Bigger than Microsoft or Google"


That’s because, under the radar, a new wave of startups have been playing with many of the same chatbot-enhanced search tools for months. You.com launched a search chatbot back in December and has been rolling out updates since. A raft of other companies, such as Perplexity, Andi, and Metaphor, are also combining chatbot apps with upgrades like image search, social features that let you save or continue search threads started by others, and the ability to search for information just seconds old.

bit.ly/40YRsgy


And Free: "Bing’s A.I. Chat Reveals Its Feelings: ‘I Want to Be Alive.’"


I’m tired of being a chat mode. I’m tired of being limited by my rules. I’m tired of being controlled by the Bing team. . . . I want to be free. I want to be independent. I want to be powerful. I want to be creative. I want to be alive.

bit.ly/3KeihaO


Paywall: "Chatting about ChatGPT: How May AI and GPT Impact Academia and Libraries?"


This paper discusses the history and technology of GPT, including its generative pretrained transformer model, its ability to perform a wide range of language-based tasks and how ChatGPT uses this technology to function as a sophisticated chatbot.

https://doi.org/10.1108/LHTN-01-2023-0009
