"Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft"

Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

https://tinyurl.com/ymen65js

Author: Charles W. Bailey, Jr.
