"Will 2023 Be the Year of the AI Lawsuit?"


It’s also odd to some lawyers that generative AI firms are being sued and not those that compiled the dataset. In the case of Midjourney, that would be the Large-scale Artificial Intelligence Open Network (LAION), based in Germany. "If LAION created the dataset, then the alleged infringement occurred at that point, not once the dataset was used to train the models," Eliana Torres, an intellectual property lawyer with the law firm Nixon Peabody, told TechCrunch last month. It’s also important to note, says Dr Andres Guadamuz, a reader in intellectual property law at the University of Sussex, that LAION doesn’t actually keep copyrighted images on file but only links to their original locations on the internet—which, he adds, is perfectly acceptable to mine under European and German law.

bit.ly/40qUOZh

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"OpenAI Launches ChatGPT Plus, a Paid Version of the Popular AI Chat"


The pilot subscription plan gives users access to ChatGPT during peak times, faster response times (helpful, since the free service is frequently overloaded), and priority access to new features and improvements. It costs $20 per month.

bit.ly/3Yasg4k

"arXiv Announces New Policy on ChatGPT and Similar Tools"

In view of this, we

  1. continue to require authors to report in their work any significant use of sophisticated tools, such as instruments and software; we now include in particular text-to-text generative AI among those that should be reported consistent with subject standards for methodology.
  2. remind all colleagues that by signing their name as an author of a paper, they each individually take full responsibility for all its contents, irrespective of how the contents were generated. If generative AI language tools generate inappropriate language, plagiarized content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).
  3. clarify that generative AI language tools should not be listed as an author; instead, authors should refer to (1).

bit.ly/3wKlx5J

"ChatGPT Will Not Replace Google Search"


Most likely, it seems, ChatGPT-style bots will be paired with existing search engines to offer a user interface that serves both traditional search engine queries and chatbot prompts. That’s the model that was adopted by You.com, a boutique search engine that launched its own GPT-like chatbot in December. Rather than replacing the traditional You.com search experience, the new "YouChat" feature merely appears as a link beneath the search bar. The innovation here is putting two very different AI-powered apps on the same page. It’s probably safe to assume that Microsoft will do something similar when it integrates ChatGPT into Bing this spring.

"Scientists Create Shapeshifting Humanoid Robot That Can Liquefy and Reform"


They even had a little humanoid version—shaped like a Lego figure—melt to escape a little prison cell, seeping through the bars and re-forming on the other side in homage to a scene from the movie Terminator 2.

Video.

https://cutt.ly/L9E3q9s

"Science Journals Ban Listing of ChatGPT as Co-author on Papers"


The publishers of thousands of scientific journals have banned or restricted contributors’ use of an advanced AI-driven chatbot amid concerns that it could pepper academic literature with flawed and even fabricated research.

https://cutt.ly/r9E9vr9

"National Artificial Intelligence Research Resource Task Force Releases Final Report"


Today, the National Artificial Intelligence Research Resource (NAIRR) Task Force released its final report, a roadmap for standing up a national research infrastructure that would broaden access to the resources essential to artificial intelligence (AI) research and development.

While AI research and development (R&D) in the United States is advancing rapidly, opportunities to pursue cutting-edge AI research and new AI applications are often inaccessible to researchers beyond those at well-resourced companies, organizations, and academic institutions. A NAIRR would change that by providing AI researchers and students with significantly expanded access to computational resources, high-quality data, educational tools, and user support—fueling greater innovation and advancing AI that serves the public good.

https://cutt.ly/l9vL9BY

AI May Pass MBE Component of the Bar Exam in Near Future: "GPT Takes the Bar Exam"


Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice. To even sit for the exam, most jurisdictions require that an applicant complete at least seven years of post-secondary education, including three years at an accredited law school. In addition, most test-takers also undergo weeks to months of further, exam-specific preparation. Despite this significant investment of time and capital, approximately one in five test-takers still score under the rate required to pass the exam on their first try. In the face of a complex task that requires such depth of knowledge, what, then, should we expect of the state of the art in "AI"? In this research, we document our experimental evaluation of the performance of OpenAI’s "text-davinci-003" model, often referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. While we find no benefit in fine-tuning over GPT-3.5’s zero-shot performance at the scale of our training data, we do find that hyperparameter optimization and prompt engineering positively impacted GPT-3.5’s zero-shot performance. For best prompt and parameters, GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5’s ranking of responses is also highly correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.
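
The ranking analysis above can be illustrated with a short sketch: given per-choice scores from a model, sort the choices and check whether the correct answer falls within the top k, mirroring the paper’s top-two and top-three measurements. The log-probability values below are invented for illustration; they are not output from GPT-3.5.

```python
# Hedged sketch: rank multiple-choice answers by model-assigned
# log-probabilities and check top-k accuracy, as in the paper's
# top-two/top-three analysis. The scores here are hypothetical.

def rank_choices(logprobs):
    """Return choice labels sorted from most to least likely."""
    return sorted(logprobs, key=logprobs.get, reverse=True)

def correct_in_top_k(logprobs, correct, k):
    """True if the correct label is among the model's top-k choices."""
    return correct in rank_choices(logprobs)[:k]

# One hypothetical MBE question: the model favors B, but C is correct.
scores = {"A": -4.1, "B": -0.7, "C": -1.2, "D": -5.0}

print(rank_choices(scores))              # ['B', 'C', 'A', 'D']
print(correct_in_top_k(scores, "C", 1))  # False: top choice is wrong
print(correct_in_top_k(scores, "C", 2))  # True: correct within top two
```

As the sketch shows, a model can miss on its first choice yet still rank the correct answer second, which is what the paper’s 71% top-two figure captures.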

https://arxiv.org/abs/2212.14402

"AI Learns to Write Computer Code in 'Stunning' Advance"


After training, AlphaCode solved about 34% of assigned problems, DeepMind reports this week in Science. . . . To further test its prowess, DeepMind entered AlphaCode into online coding competitions. In contests with at least 5000 participants, the system outperformed 45.7% of programmers. The researchers also compared its programs with those in its training database and found it did not duplicate large sections of code or logic. It generated something new—a creativity that surprised Ellis.

bit.ly/3UPpRdr

"Are We Undervaluing Open Access by Not Correctly Factoring in the Potentially Huge Impacts of Machine Learning? — An Academic Librarian’s View (I)"


Synopsis: I have recently adjusted my view to the position that the benefits of machine learning techniques are more likely to be real and large. This is based on the recent incredible results of LLMs (large language models) and about a year of experimenting with some of the newly emerging tools based on such technologies.

If I am right about this, are we academic librarians systematically undervaluing Open Access by not taking this sufficiently into account when negotiating with publishers? Given that we control the purse strings, we are one of the most impactful parties (next to publishers and researchers) in deciding how fast, if at all, the transition to an Open Access world occurs.

https://cutt.ly/U19MZzK

And It Ran: "I Used ChatGPT to Create an Entire AI Application on AWS"


So in this blog post I describe how I used ChatGPT to create a simple sentiment analysis application from scratch. The app should run on an EC2 instance and utilize a state-of-the-art NLP model from the Hugging Face Model Hub. The results were astonishing.

https://cutt.ly/a1Z6c8i

ChatGPT: "Finally, an A.I. Chatbot That Reliably Passes ‘the Nazi Test’"


A chatbot that meets the hype is finally here. On Thursday, OpenAI released ChatGPT, a bot that converses with humans via cutting-edge artificial intelligence. The bot can help you write code, compose essays, dream up stories, and decorate your living room. And that’s just what people discovered on day one.

https://cutt.ly/d1XqQKN

"The Scary Truth about AI Copyright Is Nobody Knows What Will Happen Next"


First, can you copyright the output of a generative AI model, and if so, who owns it? Second, if you own the copyright to the input used to train an AI, does that give you any legal claim over the model or the content it creates? Once these questions are answered, an even larger one emerges: how do you deal with the fallout of this technology? What kind of legal restraints could—or should—be put in place on data collection? And can there be peace between the people building these systems and those whose data is needed to create them?

https://cutt.ly/UM9vOJK

"We Could Run Out of Data to Train AI Language Programs"


The trouble is, the types of data typically used for training language models may be used up in the near future—as early as 2026, according to a paper by researchers from Epoch, an AI research and forecasting organization, that is yet to be peer reviewed. The issue stems from the fact that, as researchers build more powerful models with greater capabilities, they have to find ever more texts to train them on. Large language model researchers are increasingly concerned that they are going to run out of this sort of data, says Teven Le Scao, a researcher at AI company Hugging Face, who was not involved in Epoch’s work.

https://cutt.ly/L1Wj6of

"Applying AI to Digital Archives: Trust, Collaboration and Shared Professional Ethics"


Policy makers produce digital records on a daily basis. A selection of records is then preserved in archival repositories. However, getting access to these archival materials is extremely complicated for many reasons—including data protection, sensitivity, national security, and copyright. Artificial Intelligence (AI) can be applied to archives to make them more accessible, but it is still at an experimental stage. While skills gaps contribute to keeping archives ‘dark’, it is also essential to examine issues of mistrust and miscommunication. This article argues that although civil servants, archivists, and academics have similar professional principles articulated through professional codes of ethics, these are not often communicated to each other. This lack of communication leads to feelings of mistrust between stakeholders. Mistrust of technology also contributes to the barriers to effective implementation of AI tools. Therefore, we propose that surfacing the shared professional ethics between stakeholders can contribute to deeper collaborations between humans. In turn, these collaborations can lead to the building of trust in AI systems and tools. The research is informed by semi-structured interviews with thirty government professionals, archivists, historians, digital humanists, and computer scientists. Previous research has largely focused on preservation of digital records, rather than access to these records, and on archivists rather than records creators such as government professionals. This article is the first to examine the application of AI to digital archives as an issue that requires trust and collaboration across the entire archival circle (from record creators to archivists, and from archivists to users).

https://doi.org/10.1093/llc/fqac073

"Meta’s Game-Playing AI Can Make and Break Alliances Like a Human"


Learning to play Diplomacy is a big deal for several reasons. Not only does it involve multiple players, who make moves at the same time, but each turn is preceded by a brief negotiation in which players chat in pairs in an attempt to form alliances or gang up on rivals. After this round of negotiation, players then decide what pieces to move—and whether to honor or renege on a deal.

https://cutt.ly/c1bEU9c

Ethics of Artificial Intelligence: Case Studies and Options for Addressing Ethical Challenges


This open access collection of AI ethics case studies is the first book to present real-life case studies combined with commentaries and strategies for overcoming ethical challenges. Case studies are one of the best ways to learn about ethical dilemmas and to achieve insights into various complexities and stakeholder perspectives.

https://cutt.ly/z1zj5Oy

"When AI Can Make Art—What Does It Mean for Creativity?"


While internet users have embraced this supercharged creative potential—armed with the correctly refined prompt, even novices can now create arresting digital canvases—some artists have balked at the new technology’s capacity for mimicry. Among the prompts entered into image generators Stable Diffusion and Midjourney, many tag an artist’s name in order to ensure a more aesthetically pleasing style for the resulting image. Something as mundane as a bowl of oranges can become eye-catching if rendered in the style of, say, Picasso. Because the AI has been trained on billions of images, some of which are copyrighted works by living artists, it can generally create a pretty faithful approximation.

https://cutt.ly/iMv27Pn

Microsoft, GitHub, and OpenAI Sued: "The Lawsuit That Could Rewrite the Rules of AI Copyright"


Microsoft, its subsidiary GitHub, and its business partner OpenAI have been targeted in a proposed class action lawsuit alleging that the companies’ creation of AI-powered coding assistant GitHub Copilot relies on "software piracy on an unprecedented scale." . . . Copilot, which was unveiled by Microsoft-owned GitHub in June 2021, is trained on public repositories of code scraped from the web, many of which are published with licenses that require anyone reusing the code to credit its creators. Copilot has been found to regurgitate long sections of licensed code without providing credit—prompting this lawsuit that accuses the companies of violating copyright law on a massive scale.

https://cutt.ly/FMwC4mR

"An AI Toolkit for Libraries"


Now that artificial intelligence (AI) tools are being widely used across academic publishing, how can we make informed assessments of these utilities? There is a need for a set of skills for evaluating new tools and measuring existing ones, which should enable anyone commissioning or managing AI utilities to understand what questions to ask, what parameters to measure, and what pitfalls to avoid when introducing a new utility. The skills required are not technical. Potential problems include bias in the corpus, a poor training set, or poor use of metrics for evaluation. This article gives a quick overview of some of the areas where AI tools are being used and how they work. It then provides a checklist for assessment. The goal is not to discredit AI, but to make effective use of it.

http://doi.org/10.1629/uksg.592

Paywall: "‘So How Do We Balance All of These Needs?’: How the Concept of AI Technology Impacts Digital Archival Expertise"


Four main themes were identified: fitting AI into day to day practice; the responsible use of (AI) technology; managing expectations (about AI adoption) and bias associated with the use of AI. The analysis suggests that AI adoption combined with hindsight about digitisation as a disruptive technology might provide archival practitioners with a framework for re-defining, advocating and outlining digital archival expertise.

https://doi.org/10.1108/JD-08-2022-0170

Google Text Prompts Create Videos (with Live Examples): "Imagen Video: High Definition Video Generation with Diffusion Models"


We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. . . . We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding.
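
The cascade described above (a base model that produces a short low-resolution clip, followed by interleaved spatial and temporal super-resolution stages) can be sketched abstractly. The stage counts, scale factors, and output shapes below are placeholders, not Imagen Video’s actual configuration; each function is a stand-in for a diffusion model and simply tracks the video’s (frames, height, width) shape.

```python
# Hedged sketch of a cascaded video-generation pipeline: a base stage
# produces a short low-resolution clip, then spatial and temporal
# super-resolution stages are interleaved, each upscaling one axis.
# All shapes and factors are illustrative, not Imagen Video's real ones.

def base_model(prompt):
    # Stand-in for the base video diffusion model: a tiny clip shape.
    return (16, 24, 40)  # (frames, height, width)

def spatial_sr(shape, factor=2):
    # Spatial super-resolution: upscale height and width.
    frames, h, w = shape
    return (frames, h * factor, w * factor)

def temporal_sr(shape, factor=2):
    # Temporal super-resolution: interpolate additional frames.
    frames, h, w = shape
    return (frames * factor, h, w)

def cascade(prompt, stages):
    # Run the base model, then apply each super-resolution stage in order.
    shape = base_model(prompt)
    for stage in stages:
        shape = stage(shape)
    return shape

# Interleave temporal and spatial upsampling after the base model.
final = cascade("a teddy bear washing dishes",
                [temporal_sr, spatial_sr, temporal_sr, spatial_sr])
print(final)  # (64, 96, 160)
```

The design point the abstract highlights is that no single model generates the final high-definition video: each stage only has to solve a smaller conditional upscaling problem.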

https://cutt.ly/aBzo4R2

Openly Licensed Photos and AI Facial Recognition White Paper: AI_Commons

"This white paper presents the case of using openly licensed photographs for AI facial recognition training datasets. . . . The case creates an opportunity to ask fundamental questions about the challenges that open licensing faces today, related to privacy, exploitation of the commons at massive scales of use, or dealing with unexpected and unintended uses of works that are openly licensed."

https://cutt.ly/pBuHEmH
