Comedian Sarah Silverman has teamed up with two novelists to bring a potentially groundbreaking case against Meta and OpenAI, alleging that their copyrighted material was used to train chatbots without permission.
The proposed class actions, brought by Silverman alongside authors Christopher Golden and Richard Kadrey, claim that their books were “ingested” to train OpenAI’s ChatGPT and Meta’s LLaMA without their permission, according to documents filed in a San Francisco court on Friday night.
ChatGPT and LLaMA are both large language models (LLMs), a type of AI algorithm that is trained on vast amounts of data scraped from multiple sources. This has led to accusations that developers are pilfering information from publicly available but protected works.
In the suit against OpenAI, lawyers for Silverman and the other plaintiffs said that the company had admitted it used databases of books while training ChatGPT.
While the exact datasets used have not been disclosed, the lawyers inferred that one was the public-domain digital library Project Gutenberg. The other was likely to be a “shadow library” such as Library Genesis, Z-Library or Sci-Hub, which often contain copyrighted materials.
To prove their point, lawyers asked ChatGPT to summarize works by each of the authors.
Though some details were wrong, they argued that the results show ChatGPT “retains knowledge of particular works in the training dataset and is able to output similar textual content.”
A ‘larger fight’ for artists’ rights
The suits are led by San Francisco’s Joseph Saveri Law Firm, together with lawyer and author Matthew Butterick. In addition to the Silverman suits, the same team has filed a separate class action against OpenAI on behalf of authors Paul Tremblay and Mona Awad.
Speaking late last month, Joseph Saveri said it was “critical that we recognize and protect the rights of authors such as these against unlawful theft and fraud.”
“GPT-3.5 and GPT-4 are not just an infringement of authors’ rights; whether they aim to or not, models such as this will eliminate ‘author’ as a viable career path,” argued Saveri. “This case represents a larger fight for preserving ownership rights for all artists and other creators.”
The rapid rise of AI has caused controversy and existential angst for artists across creative industries. Speaking on a podcast in May, megastar Tom Hanks revealed that actors are now scrambling to secure the digital rights to their images thanks to the AI boom.
Last week, Japan took an early stance on the issue, deciding that model trainers can gather public data without having to license or secure permission from owners.
Elsewhere, the question of what counts as copyright infringement is far from settled.
The European Union has proposed that companies will have to disclose any copyrighted material used in developing their systems. The U.S. Copyright Office has launched an initiative to examine the law and policy issues raised by AI.