What's the value of a word?
Ponder these mind-boggling statistics, courtesy of the ABA 2012 Litigation Section Annual Conference:
Some companies estimate that for every 340,000 pages of information preserved for litigation, only 1 is actually used. In addition, discovery comprises approximately 50% of the cost of litigation.
Like a dog chasing its own tail, technology has been forced to generate new solutions to deal with the escalating costs and burdens associated with legal review of massive amounts of electronically stored information.
Welcome to computer-assisted document coding and review, better known in the legal industry as predictive coding. Thanks in part to three recent cases on predictive coding (Da Silva Moore, Kleen Products, LLC, and Global Aerospace Inc.), this relatively novel technique is now garnering recognition and, in one seminal case, judicial approval.
In the ground-breaking case of Da Silva Moore v. Publicis Groupe, Case No. 11-cv-01279 (S.D.N.Y. April 26, 2012), the U.S. District Court for the Southern District of New York became the first court to officially approve the use of predictive coding as an acceptable way of reviewing electronically stored documents in certain cases.
What is Predictive Coding?
Although definitions can differ, what is commonly referred to as predictive coding -- perhaps more appropriately called computer-assisted document coding and review -- is a human-driven, human-supervised technique of utilizing computer technology to review, analyze, and process large sets of electronically stored data for relevance, privilege, priority, issue-relation, or other thematic patterns.
According to a report submitted at the ABA 2012 Litigation Section Annual Conference, predictive coding involves the development of decision-making criteria that are based on a training set and then applied to a larger body of data for the purpose of making predictions. At the heart of predictive coding lies the concept of "supervised learning," defined as "an algorithm that learns from human decisions and then has the ability to apply those decisions to new data."
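To make "supervised learning" concrete, here is a minimal sketch of the idea: a reviewer codes a handful of documents, and a simple word-counting classifier applies those decisions to new text. The classifier (a naive Bayes variant), the function names, the labels, and the sample documents are all illustrative assumptions; commercial predictive coding tools use their own, typically more sophisticated, methods.

```python
import math
from collections import Counter

LABELS = ("relevant", "not_relevant")  # hypothetical coding categories

def tokenize(text):
    return text.lower().split()

def train(labeled_docs):
    """Learn word counts per label from (text, label) pairs coded by a human."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in labeled_docs:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts

def predict(model, text):
    """Apply the learned human decisions to a document nobody has coded yet."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Smoothed prior and word likelihoods, so unseen words do not zero out a score.
        score = math.log((doc_counts[label] + 1) / (total_docs + len(LABELS)))
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training set: the "human decisions" the algorithm learns from.
training_set = [
    ("merger pricing agreement with competitor", "relevant"),
    ("pricing discussion before the merger vote", "relevant"),
    ("lunch menu for the office party", "not_relevant"),
    ("parking garage closed on friday", "not_relevant"),
]
model = train(training_set)
prediction_a = predict(model, "draft merger pricing memo")
prediction_b = predict(model, "office lunch on friday")
```

With only four training documents the predictions rest on very little evidence, which is why the size and quality of the training set matter so much in practice.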
Da Silva Moore: A Revealing Look
Although the parties agreed to use predictive coding in Da Silva Moore, they disagreed about its implementation. A closer look at the case reveals how the process plays out in litigation and raises practical questions, including whether proper implementation will require the use of experts.
Central to the opinion issued by Magistrate Judge Peck approving predictive coding (later adopted by Judge Carter on April 26) is a stipulation submitted by the parties that details the protocols governing the defendants' e-discovery production.
The Method
The stipulation specifies that the defendants first identify a small number of documents, called the "initial seed set," which is representative of the categories to be reviewed and coded. The seed sets are then each applied to the relevant category, which begins the "software training process" -- meaning the software uses the seed sets to prioritize and identify similar documents within the larger body of stored documents.
Defendants then review and code a "judgmental sample" of the "software-suggested" documents, ensuring they are properly categorized, and "calibrate" the process by recoding the documents. In this manner, the software is trained from the new coding, and the entire process is repeated in what is called an iterative training process.
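The loop described in the stipulation can be sketched roughly as follows. This is a simplified illustration, not the protocol's actual software: the word-weight scoring, the choice to review the top-scored documents each round, the `reviewer` callback standing in for the human coder, and all names and sample documents are assumptions made for the example.

```python
from collections import Counter

def learn_weights(coded):
    """Weight each word by how often it appears in relevant vs. non-relevant coded documents."""
    weights = Counter()
    for text, is_relevant in coded.items():
        for word in set(text.lower().split()):
            weights[word] += 1 if is_relevant else -1
    return weights

def score(weights, text):
    """Higher scores mean the document resembles the relevant coded documents."""
    return sum(weights[word] for word in set(text.lower().split()))

def iterative_training(corpus, seed_set, reviewer, rounds=3, sample_size=2):
    """Train on the seed set, let a human code the suggestions, retrain, and repeat."""
    coded = dict(seed_set)  # document text -> True (relevant) / False (not relevant)
    for _ in range(rounds):
        weights = learn_weights(coded)
        uncoded = [doc for doc in corpus if doc not in coded]
        if not uncoded:
            break
        # "Software-suggested" documents: the highest-scoring uncoded ones.
        suggested = sorted(uncoded, key=lambda doc: score(weights, doc), reverse=True)
        for doc in suggested[:sample_size]:
            coded[doc] = reviewer(doc)  # the human decision feeds the next round
    return coded, learn_weights(coded)

# Invented corpus and seed set, with a stand-in for the human reviewer.
corpus = [
    "merger pricing memo",
    "pricing call with competitor",
    "office lunch menu",
    "parking garage notice",
    "merger vote schedule",
    "holiday party lunch",
]
seed_set = {"merger pricing memo": True, "office lunch menu": False}

def reviewer(doc):
    return "merger" in doc or "pricing" in doc

coded, weights = iterative_training(corpus, seed_set, reviewer)
```

Each pass adds human-verified codes to the training data, so the weights used in the next round reflect both the original seed set and every correction made since.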
Plaintiffs' Objection
The Da Silva Moore plaintiffs objected to the stipulation's protocols, arguing, inter alia, that the predictive coding methodology utilized by the defendants lacked generally accepted standards of reliability, and violated Federal Rule of Civil Procedure 26 and Federal Rule of Evidence 702.
Judge Carter's Opinion
Judge Carter disagreed, however, pointing out that the protocol contains standards for measuring reliability, requires active participation from plaintiffs at various stages of the process, and provides for judicial review in the event of a dispute prior to final production.
Specifically, Judge Carter concluded that plaintiffs' arguments challenging the reliability of the method were premature, stating:
"It is difficult to ascertain that the predictive software is less reliable than the traditional keyword search. Experts were present during the February 8 conference and Judge Peck heard from these experts. The lack of a formal evidentiary hearing at the conference is a minor issue because if the method appears unreliable as the litigation continues and the parties continue to dispute its effectiveness, the Magistrate Judge may then conduct an evidentiary hearing."
Da Silva Moore paves the way for the use of predictive coding as a defensible method of discovery in appropriate cases, but perhaps foreshadows potentially thorny pretrial issues for its future. For example, how will parties agree upon which custodians will be searched? How will issue tags used in the coding process be determined? How many documents must be reviewed in proportion to the overall corpus in order to ensure a statistically reliable number of representative and responsive documents? How many iterative rounds are necessary?
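On the sampling question, one textbook starting point is the standard sample-size formula for estimating a proportion, with a finite population correction. The sketch below is only an illustration of that formula: the function name, the 95% confidence level, and the 2% margin of error are assumptions chosen for the example, not figures taken from the Da Silva Moore protocol.

```python
import math
from statistics import NormalDist

def sample_size(population, margin_of_error=0.02, confidence=0.95, expected_rate=0.5):
    """Documents to sample so the estimated responsive rate falls within the margin of error.

    expected_rate=0.5 is the most conservative guess (it maximizes the sample size).
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    # Finite population correction shrinks the sample for small collections.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

for corpus_size in (10_000, 1_000_000, 10_000_000):
    print(corpus_size, sample_size(corpus_size))
```

Under these assumptions the required sample levels off at roughly 2,400 documents no matter how large the corpus grows, so the answer depends far more on the precision the parties demand than on the size of the collection.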
The Rise of the Evidentiary Expert . . .
All of which raises the question: Will recent court acceptance of predictive coding result in the rise of an "evidentiary expert"?
The question arises because some critics of predictive coding claim that even small errors can balloon into massive deficiencies. Arguably, one small mistake, especially early in the initial coding process, can turn into big problems and potentially result in false positives or missed documents.
As one commentator has noted, "An expert in the case must carefully train the program in order for it to be able to identify the correct documents with accuracy equal to that of a human reviewer; even a tiny mistake in the algorithm can turn into huge deficiencies in quality. It is better to use a less advanced tool very well than to place an extremely complex tool in the hands of someone who doesn't know how to use it."
Whether this requires the expertise of a linguist to code and determine the initial sample set used in computer training, then correct and recode during iterative review; a statistician who calculates error rates and can defend the process as relevant and its methodologies as reliable; or a technology expert who determines the number of iterative rounds required to stabilize training of the software, the technique requires a legal team -- at least one member of which will likely need to be an expert.
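The error rates such a statistician might report are usually built from a simple comparison of the software's calls against human review on a validation sample. The following is a minimal sketch with invented data; the function name and the choice of precision and recall as the reported measures are assumptions for illustration.

```python
def error_rates(predicted, actual):
    """Compare software predictions with human coding on the same sample of documents."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p and a for p, a in pairs)
    false_pos = sum(p and not a for p, a in pairs)   # flagged, but not actually relevant
    false_neg = sum(a and not p for p, a in pairs)   # relevant, but missed by the software
    true_neg = sum(not p and not a for p, a in pairs)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return {
        "false_positives": false_pos,
        "missed_documents": false_neg,
        "true_negatives": true_neg,
        "precision": precision,  # share of flagged documents that are truly relevant
        "recall": recall,        # share of truly relevant documents that were found
    }

# Invented validation sample: True means "relevant".
software_calls = [True, True, True, False, False, False, True, False]
human_coding = [True, True, False, False, True, False, True, False]
report = error_rates(software_calls, human_coding)
```

Low precision corresponds to the false positives critics worry about, while low recall corresponds to relevant documents being missed altogether.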
The Bottom Line
Although it is the first case to officially approve the use of predictive coding, Da Silva Moore is not an e-discovery panacea that opens the floodgates for using predictive coding in every case. If one digs a little deeper, there are many questions yet unanswered about the future of predictive coding, as well as valuable lessons to be learned from Da Silva Moore, which we'll discuss in a future post.
Meanwhile, do you think predictive coding will continue to gain acceptance in the courts, resulting in increased demand for certain kinds of specialized evidentiary experts?
This article was originally published in BullsEye, an expert witness and litigation news blog published by IMS ExpertServices.