"NVIDIA: Copyrighted Books Are Just Statistical Correlations to Our AI Models"


Earlier this year, several authors sued NVIDIA over alleged copyright infringement. The class action lawsuit alleged that the company’s AI models were trained on copyrighted works and specifically mentioned Books3 data [a database of over 180,000 pirated books]. Since this happened without permission, the rightsholders demand compensation. . . .

The company believes that AI companies should be allowed to use copyrighted books to train their AI models, as these books are made up of “uncopyrightable facts and ideas” that are already in the public domain. . . .

“[AI] Training measures statistical correlations in the aggregate, across a vast body of data, and encodes them into the parameters of a model. Plaintiffs do not try to claim a copyright over those statistical correlations, asserting instead that the training data itself is ‘copied’ for the purposes of infringement,” NVIDIA writes [to the court hearing the case].

According to NVIDIA, the lawsuit boils down to two related questions. First, whether the authors’ direct infringement claim is essentially an attempt to claim copyright on facts and grammar. Second, whether making copies of the books is fair use.

https://tinyurl.com/mpa6e8jj


Author: Charles W. Bailey, Jr.
