"AI Makes Plagiarism Harder to Detect, Argue Academics — In Paper Written by Chatbot"


An academic paper entitled Chatting and Cheating: Ensuring Academic Integrity in the Era of ChatGPT was published this month in an education journal. . . . What readers — and indeed the peer reviewers who cleared it for publication — did not know was that the paper itself had been written by the controversial AI chatbot ChatGPT.

bit.ly/40kvjZ2

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

"ChatGPT and Higher Education: Initial Prevalence and Areas of Interest"


Thematically, the plurality of references to ChatGPT on institutional websites encompassed opinion pieces or lecture announcements regarding AI (43.9%), followed by experiments with the tools (34.1%), and grading or other academic policies (22.0%) (see figure 1). This may suggest a temporal flow of activity related to adoption in which institutions begin by highlighting opinion pieces on the topic, then provide evidence from faculty experiments with the technology, and then finally adopt policies regarding its use.

bit.ly/3ZTiIMB


Paywall: "GPT-4 Is Bigger and Better Than ChatGPT — But OpenAI Won’t Say Why"


But OpenAI has chosen not to reveal how large GPT-4 is. In a departure from its previous releases, the company is giving away nothing about how GPT-4 was built: not the data, the amount of computing power, or the training techniques. "OpenAI is now a fully closed company with scientific communication akin to press releases for products," says Wolf [Thomas Wolf, cofounder of Hugging Face]. . . . GPT-4 may be the best multimodal large language model yet built. But it is not in a league of its own, as GPT-3 was when it first appeared in 2020. A lot has happened in the last three years. Today GPT-4 sits alongside other multimodal models, including Flamingo from DeepMind. And Hugging Face is working on an open-source multimodal model that will be free for others to use and adapt, says Wolf.

bit.ly/3TmVZWS


"Making AI Generative for Higher Education"


This fall, Ithaka S+R is convening a two-year research project in collaboration with a select group of universities committed to making AI generative for their campus community. Together we will assess the immediate and emerging AI applications most likely to impact teaching, learning, and research activities and explore the needs of institutions, instructors, and scholars as they navigate this environment. We will use our findings to create new strategies, policies, and programs to ensure on-campus readiness to harness the technology in the longer term.

http://bit.ly/3LnaFmR


Released: "GPT-4"


We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails. . . . We’ve been working on each aspect of the plan outlined in our post about defining the behavior of AIs, including steerability. Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the "system" message.

https://openai.com/research/gpt-4


"ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing"


The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential model for the automated preparation of essays and other types of scholarly manuscripts. Potential ethical issues that could arise with the emergence of large language models like GPT-3. . . and its usage by academics and researchers, are discussed and situated within the context of broader advancements in artificial intelligence, machine learning, and natural language processing for research and scholarly publishing.

https://doi.org/10.1002/asi.24750


"Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis"


Objective:

We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts.

Methods:

PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms "ophthalmology," "radiology," and "neurology." Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile.

Results:

There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression.

Conclusions:

Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.

https://doi.org/10.2196/42789
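
The labeling and splitting steps described in the Methods above (percentile ranks mapped to tertiles, then a random 3:1 train:test split) can be sketched roughly as follows. This is only an illustration and not the authors' code: the function names, the handling of fractional percentiles at the tertile boundaries, and the fixed random seed are all assumptions.

```python
import random


def percentile_ranks(scores):
    """Percentile rank of each score: share of all scores less than or equal to it."""
    n = len(scores)
    return [100.0 * sum(s <= x for s in scores) / n for x in scores]


def tertile(percentile):
    """Map a percentile rank (0-100) to tertile 1, 2, or 3.

    Assumption: fractional percentiles below 34 fall in the first tertile
    and those below 67 fall in the second.
    """
    if percentile < 34:
        return 1
    if percentile < 67:
        return 2
    return 3


def split_3_to_1(items, seed=0):
    """Shuffle a copy of the items and split it 3:1 into train and test lists."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Made-up impact factors, not data from the study.
    impact_factors = [0.8, 1.1, 2.1, 2.6, 4.0, 9.5]
    labels = [tertile(p) for p in percentile_ranks(impact_factors)]
    train, test = split_3_to_1(list(zip(impact_factors, labels)))
    print(labels, len(train), len(test))
```

The classifiers themselves (BERT, XGBoost, and logistic regression) are left out of this sketch; it covers only how each journal could be given a tertile label and how the articles could be divided before training.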


"Some Thoughts on Five Pending AI Litigations – Avoiding Squirrels and Other AI Distractions"


Regardless, as of this writing there are now five cases that may provide some clarity on this less frequently discussed but foundational issue of the unauthorized use of copyrighted materials as training data for AI (I use "AI" here as a shorthand which also includes text and data mining and machine learning). Each of these cases is unique, fact dependent, and likely, if fully litigated on the merits, to shed light on different aspects of copyright law.

bit.ly/41Qrrk3


Congressional Research Service: Generative Artificial Intelligence and Copyright Law


The question of whether or not copyright protection may be afforded to AI outputs (such as images created by DALL-E or texts created by ChatGPT) is likely to hinge partly on the concept of "authorship." The Copyright Act generally affords copyright protection to "original works of authorship." Although the Copyright Act does not define who (or what) may be an "author," the U.S. Copyright Office recognizes copyright only in works "created by a human being." Courts have likewise refused to afford copyright protection to non-human authors; for example, a monkey who took a series of photos. A recent lawsuit has challenged the human-authorship requirement in the context of works purportedly "authored" by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying an application to register a visual artwork that he claims was authored by an AI program called the Creativity Machine.

https://www.everycrsreport.com/reports/LSB10922.html


Google May Need an $80 Billion Upgrade: "Meet the $10,000 Nvidia Chip Powering the Race for A.I."


For example, an estimate from New Street Research found that the OpenAI-based ChatGPT model inside Bing’s search could require 8 GPUs to deliver a response to a question in less than one second. . . .

"If you’re from Microsoft, and you want to scale that, at the scale of Bing, that’s maybe $4 billion. If you want to scale at the scale of Google, which serves 8 or 9 billion queries every day, you actually need to spend $80 billion on DGXs," said Antoine Chkaiban, a technology analyst at New Street Research.

bit.ly/3ZnsSnY


"Can an Artificial Intelligence Chatbot Be the Author of a Scholarly Article?"


At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as co-authors of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.

https://doi.org/10.6087/kcse.292

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"Impact and Perceived Value of the Revolutionary Advent of Artificial Intelligence in Research and Publishing among Researchers: A Survey-Based Descriptive Study"


Purpose:

This study was conducted to understand the perceptions and awareness of artificial intelligence (AI) in the academic publishing landscape.

Method:

We conducted a global survey entitled "Role and impact of AI on the future of academic publishing" to understand the impact of the AI wave in the scholarly publishing domain. This English-language survey was open to all researchers, authors, editors, publishers, and other stakeholders in the scholarly community. Conducted between August and October 2021, the survey received responses from around 212 universities across 54 countries.

Results:

Out of 365 respondents, about 93% belonged to the age groups of 18–34 and 35–54 years. While 50% of the respondents selected plagiarism detection as the most widely known AI-based application, image recognition (42%), data analytics (40%), and language enhancement (39%) were some other known applications of AI. The respondents also expressed the opinion that the academic publishing landscape will significantly benefit from AI. However, the major challenges restraining the large-scale adoption of AI, as expressed by 93% of the respondents, were limited knowledge and expertise, as well as difficulties in integrating AI-based solutions into existing IT infrastructure.

Conclusion:

The survey responses reflected the necessity of AI in research and publishing. This study suggests possible ways to support a smooth transition. This can be best achieved by educating and creating awareness to ease possible fears and hesitation, and to actualize the promising benefits of AI.

https://doi.org/10.6087/kcse.294


Paywall (with Some Free Views): "The ChatGPT-Fueled Battle for Search Is Bigger than Microsoft or Google"


That’s because, under the radar, a new wave of startups have been playing with many of the same chatbot-enhanced search tools for months. You.com launched a search chatbot back in December and has been rolling out updates since. A raft of other companies, such as Perplexity, Andi, and Metaphor, are also combining chatbot apps with upgrades like image search, social features that let you save or continue search threads started by others, and the ability to search for information just seconds old.

bit.ly/40YRsgy


And Free: "Bing’s A.I. Chat Reveals Its Feelings: ‘I Want to Be Alive.’"


I’m tired of being a chat mode. I’m tired of being limited by my rules. I’m tired of being controlled by the Bing team. . . . I want to be free. I want to be independent. I want to be powerful. I want to be creative. I want to be alive.

bit.ly/3KeihaO


Paywall: "Chatting about ChatGPT: How May AI and GPT Impact Academia and Libraries?"


This paper discusses the history and technology of GPT, including its generative pretrained transformer model, its ability to perform a wide range of language-based tasks and how ChatGPT uses this technology to function as a sophisticated chatbot.

https://doi.org/10.1108/LHTN-01-2023-0009


"How Artificial Intelligence Might Change Academic Library Work: Applying the Competencies Literature and the Theory of the Professions"


As theoretical lenses to guide the analysis the paper draws on both the library and information science (LIS) literature on librarians’ competencies and the notions of jurisdiction and hybrid logics drawn from the sociological theory of the professions. The paper starts by outlining these theories and then reviews the nature of AI and the range of its potential uses in academic libraries. The main focus of the paper is on the application of AI to knowledge discovery. Eleven different potential approaches libraries might adopt to such AI applications are analyzed and their likelihood evaluated. Then it is considered how a range of internal and external factors might influence the adoption of AI.

https://doi.org/10.1002/asi.24635


"Reinventing Search with a New AI-Powered Microsoft Bing and Edge, Your Copilot for the Web"


Today, we’re launching an all new, AI-powered Bing search engine and Edge browser, available in preview now at Bing.com, to deliver better search, more complete answers, a new chat experience and the ability to generate content. We think of these tools as an AI copilot for the web. . . . A new chat experience. For more complex searches, such as for planning a detailed trip itinerary or researching what TV to buy, the new Bing offers new, interactive chat. The chat experience empowers you to refine your search until you get the complete answer you are looking for by asking for more details, clarity and ideas, with links available so you can immediately act on your decisions.

bit.ly/3HFUDkt

"Will 2023 Be the Year of the AI Lawsuit?"


It’s also odd to some lawyers that generative AI firms are being sued and not those that compiled the dataset. In the case of Midjourney, that would be the Large-scale Artificial Intelligence Open Network (LAION), based in Germany. "If LAION created the dataset, then the alleged infringement occurred at that point, not once the dataset was used to train the models," Eliana Torres, an intellectual property lawyer with the law firm Nixon Peabody, told TechCrunch last month. It’s also important to note, says Dr Andres Guadamuz, a reader in intellectual property law at the University of Sussex, that LAION doesn’t actually keep copyrighted images on file but only links to their original locations on the internet, which, he adds, is perfectly acceptable to mine under European and German law.

bit.ly/40qUOZh


"OpenAI launches ChatGPT Plus, a Paid Version of the Popular AI chat"


The pilot subscription plan gives users access to ChatGPT during peak times and faster response times (which is helpful because it breaks down a lot) and priority access to new features and improvements. It will cost you $20 per month.

bit.ly/3Yasg4k


"arXiv Announces New Policy on ChatGPT and Similar Tools"

In view of this, we

  1. continue to require authors to report in their work any significant use of sophisticated tools, such as instruments and software; we now include in particular text-to-text generative AI among those that should be reported consistent with subject standards for methodology.
  2. remind all colleagues that by signing their name as an author of a paper, they each individually take full responsibility for all its contents, irrespective of how the contents were generated. If generative AI language tools generate inappropriate language, plagiarized content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).
  3. generative AI language tools should not be listed as an author; instead authors should refer to (1).

bit.ly/3wKlx5J


"ChatGPT Will Not Replace Google Search"


Most likely, it seems, ChatGPT-style bots will be paired with existing search engines to offer a user interface that serves both traditional search engine queries and chatbot prompts. That’s the model that was adopted by You.com, a boutique search engine that launched its own GPT-like chatbot in December. Rather than replacing the traditional You.com search experience, the new "YouChat" feature merely appears as a link beneath the search bar. The innovation here is putting two very different AI-powered apps on the same page. It’s probably safe to assume that Microsoft will do something similar when it integrates ChatGPT into Bing this spring.


"Scientists Create Shapeshifting Humanoid Robot That Can Liquefy and Reform"


They even had a little humanoid version—shaped like a Lego figure—melt to escape a little prison cell, seeping through the bars and re-forming on the other side in homage to a scene from the movie Terminator 2.

Video.

https://cutt.ly/L9E3q9s


"Science Journals Ban Listing of ChatGPT as Co-author on Papers"


The publishers of thousands of scientific journals have banned or restricted contributors’ use of an advanced AI-driven chatbot amid concerns that it could pepper academic literature with flawed and even fabricated research.

https://cutt.ly/r9E9vr9


"National Artificial Intelligence Research Resource Task Force Releases Final Report"


Today, the National Artificial Intelligence Research Resource (NAIRR) Task Force released its final report, a roadmap for standing up a national research infrastructure that would broaden access to the resources essential to artificial intelligence (AI) research and development.

While AI research and development (R&D) in the United States is advancing rapidly, opportunities to pursue cutting-edge AI research and new AI applications are often inaccessible to researchers beyond those at well-resourced companies, organizations, and academic institutions. A NAIRR would change that by providing AI researchers and students with significantly expanded access to computational resources, high-quality data, educational tools, and user support—fueling greater innovation and advancing AI that serves the public good.

https://cutt.ly/l9vL9BY


AI May Pass MBE Component of the Bar Exam in Near Future: "GPT Takes the Bar Exam"


Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice. To even sit for the exam, most jurisdictions require that an applicant completes at least seven years of post-secondary education, including three years at an accredited law school. In addition, most test-takers also undergo weeks to months of further, exam-specific preparation. Despite this significant investment of time and capital, approximately one in five test-takers still score under the rate required to pass the exam on their first try. In the face of a complex task that requires such depth of knowledge, what, then, should we expect of the state of the art in "AI?" In this research, we document our experimental evaluation of the performance of OpenAI’s "text-davinci-003" model, often-referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. While we find no benefit in fine-tuning over GPT-3.5’s zero-shot performance at the scale of our training data, we do find that hyperparameter optimization and prompt engineering positively impacted GPT-3.5’s zero-shot performance. For best prompt and parameters, GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5’s ranking of responses is also highly-correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.

https://arxiv.org/abs/2212.14402
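
The "top two" and "top three" figures in the abstract above are top-k accuracies: a question counts as correct when the right answer appears among the model's k highest-ranked choices. A minimal sketch of that metric follows. The function name and the toy rankings are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_choices, answers, k):
    """Fraction of questions whose correct answer is among the first k ranked choices."""
    hits = sum(answer in ranking[:k] for ranking, answer in zip(ranked_choices, answers))
    return hits / len(answers)


if __name__ == "__main__":
    # Made-up rankings for four four-option questions, best guess first.
    ranked = [
        ["A", "B", "C", "D"],
        ["B", "A", "D", "C"],
        ["C", "D", "A", "B"],
        ["D", "C", "B", "A"],
    ]
    answers = ["A", "A", "A", "A"]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(ranked, answers, k))
    # With four options per question, uniform random guessing gives 1/4 at k = 1,
    # which matches the 25% baseline guessing rate the abstract mentions.
```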
