AI Company Anthropic Defends Against (Some) Copyright Claims with Fair Use Doctrine

By Vaidehi Mehta, Esq.

If you’ve been following the world of artificial intelligence (AI), you know it’s been a wild ride lately — especially when it comes to how these systems are trained. But what happens when an AI company builds its models by copying millions of books (many of them pirated), and the authors cry foul?

You might not recognize the name Anthropic PBC, but you might be more familiar with its flagship product: Claude. A competitor of ChatGPT and Google Gemini, Claude is an AI chatbot that can answer questions, write stories, and mimic human writing with uncanny skill.

But Claude didn’t learn to write in a vacuum. Like other large language models (LLMs), it was trained on massive amounts of text, including millions of books.

This practice led to legal headaches for Anthropic, culminating in a major copyright lawsuit.

Books and Bots

So, about those millions of books used to train Claude: many of them were downloaded from pirate websites like LibGen and Books3. If you were ever a broke college student, you too might have experience downloading free books from sites like LibGen.

Of course, it’s not legal—for humans, anyway. Is it any different if an AI does the same? After all, chatbots (or at least the companies that make them) can break the law, too.

Other books were bought in bulk (sometimes used), stripped of their bindings, scanned page by page, and then shredded, leaving only digital files behind.

All told, Anthropic amassed a digital “library” of over seven million books, including works by published authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson.

Bartz wrote four novels that were copied and used by Anthropic: The Lost Night, The Herd, We Were Never Here, and The Spare Room. Graeber and Johnson are both nonfiction writers. Their books copied by Anthropic include a couple of Graeber’s works on medical true crime and the history of cancer, as well as several of Johnson’s books on niche American history.

Anthropic copied these works by the established authors without their authorization and used them to train artificial intelligence models like Claude.

Suspicious Authors Sue

How do the authors know their books were copied and used? To be honest, we don’t exactly know, but we have a few hunches. These three are by no means the first authors to accuse an AI company of training models on copyrighted books.

They probably first suspected their books were being used after seeing public reports about AI training datasets built from pirated book collections. It’s likely that the authors sued Anthropic on this strong hunch, and the evidence to back it up came later.

As AI companies face increasing scrutiny over how they train their models, researchers and journalists have begun examining leaked or publicly available lists of books included in datasets like Books3 and LibGen. These lists often circulate online and can be searched for specific titles or authors. The authors’ legal team also likely retained experts who combed through Anthropic’s datasets (obtained during discovery in the lawsuit) and matched them to the plaintiffs’ works.

In any event, the authors took Anthropic to federal court with a multi-pronged legal challenge rooted in the core protections of U.S. copyright law.

At the heart of their argument was a straightforward claim: Anthropic had copied their books without permission, both by downloading pirated digital copies and by purchasing and scanning physical books, all to build an internal library and train its AI models. The plaintiffs contended that this conduct amounted to textbook copyright infringement.

Anthropic’s Defense: ‘Fair Use’

The case mainly turned on the question of “fair use.” Fair use is a legal doctrine under Section 107 of the Copyright Act that allows for the limited use of copyrighted material without getting permission from the owners. It acts as a defense against claims of copyright infringement. This doctrine allows for uses like criticism, comment, news reporting, teaching, scholarship, or research.

Determining whether a use is "fair" involves weighing four statutory factors: (1) the purpose and character of the use, including whether it is commercial or transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the market for the original. These were the factors the parties battled over in court.

Anthropic’s primary defense was that its use of the authors’ books—including copying from both pirated and purchased sources—was protected as “fair use.” Anthropic argued that training LLMs like Claude on copyrighted works is “spectacularly transformative,” akin to how humans learn by reading widely. In their view, this does not infringe copyright because the AI’s outputs are new creations, not copies or knock-offs.

Anthropic also argued that any copying, even of entire books, was reasonably necessary for training LLMs. The copying, it claimed, was not for the purpose of reproducing or distributing the works, but for training AI to generate new text. Anthropic further noted that no outputs from its Claude service reproduced or knocked off the plaintiffs’ works, and thus the copying did not serve as a market substitute.

Authors’ Arguments

The authors rebutted that Anthropic’s copying was not transformative in any meaningful sense. Unlike uses that add new meaning or context (like parody or commentary), Anthropic’s use was, in their view, purely exploitative. It took the authors’ creative expression and fed it into a commercial product—Claude—that generated enormous revenue for Anthropic.

The authors argued that training an AI model to “memorize” and internalize their unique writing style and content did not create a new work or serve a different purpose; rather, it risked substituting for the original works and undermining the market for them.

The plaintiffs also highlighted the amount and substantiality of the copying. They pointed out that Anthropic didn’t just take snippets or excerpts; it copied their entire books (sometimes multiple times over) from both pirated digital libraries and destructively scanned print editions. This wholesale copying, they argued, went far beyond what courts have typically allowed under fair use, especially when the works at issue are highly creative and expressive.

The authors further argued that the commercial stakes cut against fair use. Far from running a nonprofit research project or educational endeavor, Anthropic was a billion-dollar company whose business model depended on using massive amounts of copyrighted material — including their books — to power its AI services.

Turning to market harm, the authors claimed the copying damaged both the actual and potential markets for their works. They warned that by building a permanent internal library of pirated books — and then using those books to train AI models capable of generating similar content — Anthropic threatened to displace demand for legitimate copies and undermine emerging markets for licensing literary works for AI training.

In other words, if companies like Anthropic could simply take whatever they wanted from authors without paying, there would be little incentive left for writers to create new works.

Finally, the plaintiffs challenged Anthropic’s attempt to excuse its piracy by claiming it was necessary for technological progress. The authors insisted that convenience or cost-saving could never justify outright theft—especially when lawful alternatives (like licensing or purchasing) were available but deliberately avoided. They argued that allowing such conduct would set a dangerous precedent, effectively carving out an AI exception to copyright law and eroding creators’ rights in the digital age.

Judge Finds Mostly Fair Use

In his ruling handed down Monday, federal Judge William Alsup of the Northern District of California parsed out Anthropic’s different actions to evaluate whether each was permitted under the fair use doctrine.

Regarding Anthropic’s use of the copyrighted books to train its LLM, the judge found fair use. He agreed the use was sufficiently “transformative” because the purpose of the copying (generating new text) differs fundamentally from the purpose of the original works, even where entire books were copied. He compared the process to how humans read and internalize books to improve their own writing, noting that copyright law does not require readers to pay royalties each time they recall or are inspired by a book.

Secondly, the judge ruled that Anthropic’s practice of digitizing purchased print books also constitutes fair use, so long as the physical copy was destroyed in the process and the resulting digital copies were not distributed outside of the AI company. He pointed out that the format change simply replaced the purchased print copy with a more convenient, space-saving, and searchable digital version for Anthropic’s internal library. He cited precedent that held that once a purchaser of copyrighted material (say, an old VHS tape) has legally purchased it, they are not in violation of copyright laws if they convert the material to another format (say, DVD) as long as neither version is shared or sold.  

Anthropic On the Hook for Pirating

So Anthropic won its fair use defense on those two grounds — but it lost on one important front. Judge Alsup ruled that the company’s downloading of millions of pirated books to build a permanent internal “research library” was not fair use. He rejected Anthropic’s argument that acquiring and retaining these pirated copies could be justified merely because some might eventually be used for transformative purposes like AI training.

The judge found it especially problematic that Anthropic kept the pirated copies “forever,” even after deciding not to use them for training—as this seemed to imply that the primary use was simply to amass a vast library without paying for it. This conduct, the court held, was not transformative and directly displaced legitimate demand for the authors’ works.

Importantly, the judge noted that even if Anthropic later purchased legitimate copies of the same books it had pirated, this does not absolve it of liability for the initial infringement (though it may affect the calculation of statutory damages).

As a result, the case will proceed to trial specifically to determine the damages Anthropic owes for this infringement. At trial, the jury will consider both actual and statutory damages, including whether Anthropic’s conduct was willful, which could increase the amount owed. Thus, while Anthropic succeeded on some fair use arguments, it still faces significant financial exposure for its unauthorized acquisition and retention of pirated books.

Although writers everywhere might lament the ruling as a setback for copyright protections they thought they had, it’s also a reminder to companies like Anthropic that legal and ethical lines still matter. As Judge Alsup put it: “There is no carveout, however, from the Copyright Act for AI companies.”
