Silverman Goes for the Gold
The term "generative AI" may be a bit of a misnomer. AI platforms like ChatGPT may seem like they produce answers to queries using humanlike intelligence and creativity, but the reality is that their intelligence is an illusion created by complex systems working off of broad and deep datasets.
Generative AI doesn't create so much as parse, pick, and cobble together relevant ideas from billions of data points in a form and format that resembles human writing. The datasets used for training AI aren't just made up of random bits and bytes. They consist of an inconceivable number of books, articles, papers, poems, blogs, and other written material that exists out in the world: anything the AIs can get their digital hands on.
In other words: ChatGPT and other generative AI aren't creators; they're incredibly sophisticated plagiarists.
And you know who doesn't like plagiarism? Sarah Silverman — among other human creators.
Silverman Spins Gold
Sarah Silverman has been in the funny business since the early '90s. She's done standup, written for "Saturday Night Live," starred in her own series on Comedy Central, and played prominent roles in TV and film. She's taken on a variety of comedy roles, from the sketch series Mr. Show to Disney's Wreck-It Ralph movies. Her combination of acerbic wit, transgressive material, and consistently hilarious performances has earned her a slew of awards and a permanent place in the pantheon of American comedy.
Silverman published a memoir in 2010 called The Bedwetter: Stories of Courage, Redemption, and Pee. That book happens to be at the center of two lawsuits against OpenAI and Meta. It's an unlikely starting point for one seminal class-action suit that may help determine the future of copyright law and artificial intelligence, let alone two.
No Permission; No "Bedwetting"
Two interesting lawsuits were filed on July 7, 2023, in federal district court in San Francisco. One suit targeted Facebook's parent company, Meta. The other took aim at OpenAI, the company responsible for ChatGPT.
The suits' targets weren't especially notable; people sue big companies all the time. But the names of the plaintiffs and their claims raised more than a few eyebrows. The plaintiffs are Sarah Silverman and two other authors, Christopher Golden and Richard Kadrey, who allege that the companies committed copyright infringement.
According to news reports and the website of the law firm representing the plaintiffs, OpenAI and Meta included The Bedwetter and works by the other two authors in the datasets used to train their AIs, without asking for or receiving permission to do so. The plaintiffs contend that the inclusion of copyrighted materials in the AIs' training datasets, whether willful or accidental, constitutes "direct copyright infringement, vicarious copyright infringement, violations of section 1202(b) of the Digital Millennium Copyright Act, unjust enrichment, violations of the California and common law unfair competition laws, and negligence," in the words of the lawsuit against OpenAI.
In legal and creative circles, infringement on someone else's copyright is considered foul play. Creative types — obviously and understandably — don't like it when someone messes with their intellectual property. And legal types don't like it for an equally simple and predictable reason: because it's illegal.
The legal question at hand isn't whether it's OK to use someone else's work without their permission. That's obviously neither legal nor cool. If Silverman and company win their lawsuits, they could force Meta and OpenAI to stop using illegally obtained materials to train their AIs. The real question, then, is: what happens next?
An Open AI Question
The lawsuit against OpenAI alleges that the company used hundreds of thousands of books downloaded (read: pirated) from "shadow libraries" like Library Genesis, Z-Library, Sci-Hub, and Bibliotik to train ChatGPT. Meta, for its part, disclosed that its AI platform LLaMA was partly trained using a dataset called The Pile, which includes 196,640 books sourced from Bibliotik. And that's just the tip of the iceberg.
All the available disclosures, evidence, and informed conjecture suggest that both companies used a huge volume of copyrighted material to train their AIs. And that's material the companies almost certainly did not get permission to use.
The way forward for Meta, OpenAI, and other AI companies will remain unclear until the lawsuits are decided. Until then, we're left to wonder: how will they handle the hundreds of thousands of claims they'll have to settle if they lose? What will the future of copyright law look like if they win? Will artificial intelligence be held to the same standards as humans? Or will carve-outs be made in the law to allow them to operate without regard to copyright?