OpenAI is facing yet another lawsuit. The artificial intelligence company has been sued by two US authors in San Francisco federal court over its alleged unauthorised collection of information from across the web to train its chatbot, ChatGPT. The authors claim that ChatGPT infringes the copyrights to their novels.
The complaint, filed on Wednesday in San Francisco federal court, alleges that OpenAI “relied on harvesting mass quantities” of copyright-protected works and that it used those works “without consent, without credit, and without compensation.”
OpenAI has allegedly used over 300,000 books, including books from “shadow libraries” that offer copyrighted works without permission, to train its AI system. The case joins other legal challenges over material used to train AI systems: plaintiffs in those suits include source-code owners against OpenAI and Microsoft’s GitHub, and visual artists against Stability AI, Midjourney and DeviantArt.
“They copied the books from a website called Smashwords.com that hosts unpublished novels that are available to readers at no cost,” states the complaint. “Those novels, however, are largely under copyright. They were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.”
The authors’ lawyer, Joseph Saveri, who also represents programmers in the proposed class action against OpenAI and Microsoft, wrote, “These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called ‘Books3’ includes a recreation of the Bibliotik collection and contains nearly 200,000 books.”