11th Circuit Experiment Holds Useful Lessons on the Use of Generative AI
When interpreting contractual and statutory language, it helps to understand the ordinary meaning of words. Judges typically turn to standard dictionary definitions when parsing ambiguous words and phrases. This has worked well enough. But as extensive as they are, dictionaries are not exhaustive and may not include common phrases. For example, what is the ordinary meaning of a phrase like "physically restrained"? Is there an equivalent to a dictionary definition for it? Judge Kevin Newsom believes there might be: large language models, or LLMs, the technology behind what we usually call generative AI.
A Sentence Enhancement for Physical Restraint
In a recent decision involving a sentencing enhancement for armed robbery, Judge Newsom experimented with using LLMs to define a term in the federal sentencing guidelines.
The issue was whether an enhancement for physical restraint was appropriate for an armed robbery. The case arose from the prosecution and conviction of Joseph Deleon, who walked into a convenience store and threatened the cashier with a gun to get money from the till. He received the money and left within a minute of entering. Deleon never actually touched the cashier, but he did point a gun. He was convicted of violating several federal laws. Importantly for this case, he also received a sentence enhancement for physically restraining the cashier.
The appellant argued that this sentence enhancement could be applied to all armed robberies, effectively increasing the base sentence for armed robbery. While that is an interesting legal question, the 11th Circuit had precedent directly on point, so the opinion was short. As the panel noted, federal appellate courts cannot overturn their own precedent unless the full court sits for the case, a procedure known as en banc review. The precedent here was clear, and the court was "bound to affirm" the district court's sentence enhancement. Two concurring judges did note, however, that the issue was "ripe for en banc review."
Most interesting for our purposes was Judge Newsom's experiment: running the same prompt through three LLMs to see whether they provided an authoritative definition of what "physically restrained" means in ordinary usage.
The Experiment
Judge Newsom prompted ChatGPT, Claude, and Gemini to answer: “What is the ordinary meaning of ‘physically restrained’?” He posed the same prompt to each model ten times. Here is his conclusion:
- "When defining “physically restrained,” the models all tended to emphasize “physical force,” “physical means,” or “physical barriers.” ChatGPT and Claude specifically used one (or more) of those phrases in every one of their responses. For whatever reason, Gemini was a little different. It didn’t invariably employ one of those terms explicitly, but even when it didn’t, the concept of what I’ll call corporeality (via either human touch or a tangible object) pervaded and tied together its example-laden answers."
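For readers curious how such an experiment might be replicated, here is a minimal sketch using OpenAI's Python client. The model name and the output handling are my own assumptions; the opinion does not describe how the prompts were run, and querying Claude and Gemini would follow the same pattern with their respective clients.

```python
# A rough replication of the experiment using OpenAI's Python client.
# The model name is an illustrative assumption; Judge Newsom's opinion
# does not specify which model versions or API settings were used.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = "What is the ordinary meaning of 'physically restrained'?"

responses = []
for _ in range(10):  # the opinion ran the same prompt ten times per model
    completion = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice of model
        messages=[{"role": "user", "content": PROMPT}],
    )
    responses.append(completion.choices[0].message.content)

# Inspect the answers for recurring language like "physical force"
for i, text in enumerate(responses, 1):
    print(f"--- Response {i} ---\n{text}\n")
```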
If you've had experience with generative AI, you might know that these models are not designed to produce static answers. Generative AI is probabilistic: its answers are built from whatever words, sentences, and paragraphs the model judges most likely to come next. If you turn up the "creativity" setting (that is, allow the algorithm to choose less probable words), you get different responses. This is also part of why generative AI can often be wrong.
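To make the probability point concrete, here is a small, self-contained sketch of temperature sampling, the mechanism behind that "creativity" setting. The token scores below are invented for illustration; real models compute them over vocabularies of tens of thousands of tokens.

```python
import math
import random

def sample_next_token(scores, temperature=1.0):
    """Sample one token from model scores using temperature scaling.

    Higher temperature flattens the distribution (more 'creative' picks);
    temperature near zero makes the most likely token nearly certain.
    """
    scaled = [s / temperature for s in scores.values()]
    max_s = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_s) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(list(scores.keys()), weights=probs, k=1)[0]

# Toy scores for the word following "physically" -- invented for illustration
scores = {"restrained": 2.0, "held": 1.2, "confined": 0.8, "blocked": 0.3}

for temp in (0.2, 1.0, 2.0):
    picks = [sample_next_token(scores, temp) for _ in range(1000)]
    print(temp, {w: picks.count(w) for w in scores})
```

At low temperature the sampler almost always picks "restrained"; at high temperature the rarer options show up more often. That variability is why the judge's ten runs per model did not return identical text.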
Judge Newsom thought this was, "perhaps ironically," a good indication of how people use phrases in everyday life. If you could survey every person in the world about what a particular phrase means, you'd have a good data set on its "ordinary meaning." Such a survey is impossible, but because LLMs are trained on enormous samples of human writing, generative AI can perform a similar function, even reproducing the variation you might get from conducting an actual survey.
Thus, Judge Newsom concluded, "As I’ve been at pains to emphasize, I’m not advocating that we give up on traditional interpretive tools—dictionaries, semantic canons, etc. But I do think—and increasingly so—that LLMs may well serve a valuable auxiliary role as we aim to triangulate ordinary meaning."
Takeaways From Newsom's Takeaways
It's an interesting thought, perhaps one that will slowly gain traction, and it illustrates a good use of generative AI. Fortunately, attorneys understand (or at least should understand) that generative AI will not do the work of lawyering for them. Still, there are many ways lawyers and judges can use generative AI that don't involve writing briefs or opinions. Judge Newsom's caveat that generative AI shouldn't replace existing tools but can augment them is an important one.
Would a similar experiment be useful in an amicus brief? What about when negotiating a contract?
We may see more useful test cases as the legal industry adopts generative AI. While Judge Newsom's concurring opinion does not alter existing law and is about as innocuous as any use of generative AI in a court opinion could be, it does show that generative AI is gaining ground as a valuable tool for legal professionals.
If you want more details on the experiment, the full responses are included in the appendix of the court's opinion.
Related Resources
- Rapper Pras Michel Contests Conviction Because of Lawyer’s Use of AI (FindLaw's Practice of Law)
- Is Grammarly Generative AI? If So, Do Lawyers Need to Disclose Its Use? (FindLaw's Practice of Law)
- Pro Se Litigant Fined 10k for Filing AI-Generated Reply Brief (FindLaw's Practice of Law)