The legal technology world has been discussing Technology Assisted Review ("predictive coding," "TAR," or any of numerous other monikers) for some time. That discussion has primarily centered on the defensibility and viability of the process. However, with case law consistently supporting the use of TAR, it is really past time to put that conversation to rest, accept that TAR is here to stay, and focus instead on how to best put it to work as part of a routine e-discovery process.
The primary purpose of Technology Assisted Review is to assist e-discovery practitioners by automating the culling, coding and categorization of large document collections in order to speed up the review process. Beyond this assistance, a properly trained application will also provide improved review consistency and, in many instances, increased overall accuracy. Chuck Rothman, for eDiscovery Journal, defines TAR as "a process whereby a definition, made up of various rules, is created. Records in a collection are then evaluated to determine how well they match the definition." This defines a process, rather than a technology, and viewing TAR as a process reminds us that human engagement remains essential, although technology really is the centerpiece of an assisted review scenario. So how does that technology work?
By analyzing the coding choices made by subject-expert reviewers, the TAR-enabled review platform writes rules to determine its own coding choices. Then, as it tests its own choices against a representative sample of documents the experts have already coded, it reports its accuracy rates to the human review team. The team will then decide, ideally in conversation with opposing counsel, what accuracy rate is acceptable. At this stage, the choices made by the TAR apparatus are limited to those that lend themselves to automatic review: questions of responsiveness, for instance. These choices, however limited, constitute a massive portion of the work currently performed by human reviewers. As multiple studies have shown, this process not only decreases the time and cost of review but also increases overall accuracy.
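To make the learn-then-test loop described above more concrete, here is a minimal sketch in Python. It is not how any particular TAR product works; it assumes a very simple word-weighting scheme (log-odds with add-one smoothing) in place of a vendor's actual rules, and the function names and sample documents are invented for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(coded_docs):
    """Learn per-word weights from (text, is_responsive) pairs coded by experts."""
    counts = {True: Counter(), False: Counter()}
    for text, responsive in coded_docs:
        counts[responsive].update(set(tokenize(text)))
    n_pos = sum(1 for _, responsive in coded_docs if responsive)
    n_neg = len(coded_docs) - n_pos
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing keeps words seen in only one class from yielding log(0).
    return {
        word: math.log((counts[True][word] + 1) / (n_pos + 2))
        - math.log((counts[False][word] + 1) / (n_neg + 2))
        for word in vocab
    }


def predict(weights, text):
    """Code a document as responsive when its summed word weights are positive."""
    score = sum(weights.get(word, 0.0) for word in set(tokenize(text)))
    return score > 0


def accuracy(weights, sample):
    """Share of an expert-coded sample on which the system agrees with the expert."""
    correct = sum(predict(weights, text) == responsive for text, responsive in sample)
    return correct / len(sample)


# Hypothetical expert-coded training documents (True = responsive).
training = [
    ("pricing agreement with supplier", True),
    ("supplier contract pricing terms", True),
    ("lunch menu for friday", False),
    ("office party friday invitation", False),
]
# Hypothetical held-out sample, also coded by the experts.
sample = [
    ("revised supplier pricing", True),
    ("friday lunch plans", False),
]

weights = train(training)
reported_accuracy = accuracy(weights, sample)
```

The value in reported_accuracy is the figure the review team would weigh against whatever threshold it has agreed to accept; a real engine would use far larger samples and a more sophisticated model.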
It should be noted that, while all TAR applications share the above characteristics, each takes a different approach. All providers of TAR software should therefore be open to explaining, at a functional level, how their technology operates and which workflows it supports. Spreading genuine understanding allows the users of TAR software to customize and develop their own human and technological workflows to realize the maximum return on product investment. Hiding TAR in a black box helps no one but the vendor trying to market it as a costly and exclusive commodity, when in reality the technology has matured to the point where TAR workflows should be an integrated part of every e-discovery software platform.
If TAR and its utility still seem a bit abstract, then perhaps the best way to understand TAR as a process is to look at it in comparison to a traditional review workflow.
The Traditional Legal Review Workflow
Traditionally, when incoming evidence exits a processing workflow, it brings very little value with it other than its potential for review. In order to realize this potential value -- after the enormous expense already accrued in the collection, sequestration, conversion, decryption and promotion to review -- the document must undergo, at minimum, a first-pass review. This review typically begins with a request from an attorney to a supervising paralegal and a team of non-specialist paralegals, contract attorneys or junior associates. Ideally, this collection will already have undergone a mostly automated filtration process (often referred to as "Early Case Assessment," or "ECA"), which has attempted to identify and to cull the raw data of duplicates, corrupted files, unsupported file types and system files. However, particularly with large case files having hundreds of thousands of documents exchanged, ECA is only one preliminary step toward the first-pass review.
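The automated filtration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of supported file types is hypothetical, exact duplicates are detected by hashing file contents, and an empty file stands in for a corrupted one. Real ECA tools apply many more checks.

```python
import hashlib
from pathlib import PurePath

# Hypothetical allow-list; a real platform would define its own supported types.
SUPPORTED = {".txt", ".msg", ".pdf", ".docx"}


def cull(files):
    """Filter (name, content_bytes) pairs; return (kept_names, dropped_with_reason)."""
    seen_digests = set()
    kept, dropped = [], []
    for name, content in files:
        suffix = PurePath(name).suffix.lower()
        digest = hashlib.sha256(content).hexdigest()
        if suffix not in SUPPORTED:
            # Covers system files and any other type the platform cannot review.
            dropped.append((name, "unsupported type"))
        elif not content:
            # Stand-in for a corruption check.
            dropped.append((name, "empty or unreadable"))
        elif digest in seen_digests:
            # Byte-for-byte copy of a file already kept.
            dropped.append((name, "duplicate"))
        else:
            seen_digests.add(digest)
            kept.append(name)
    return kept, dropped


# Hypothetical collection coming out of processing.
collection = [
    ("memo.txt", b"pricing terms"),
    ("memo_copy.txt", b"pricing terms"),
    ("driver.dll", b"\x00\x01"),
    ("scan.pdf", b""),
]
kept, dropped = cull(collection)
```

Only memo.txt survives here; the other three files are dropped with a recorded reason, which is the kind of audit trail a review supervisor would want before the first-pass review begins.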
After ECA has been performed, the review team supervisor parcels out the documents in equal numbers to the team. Each reviewer then, under considerable time pressure and with considerable overtime hours billed, reviews hundreds, if not thousands, of documents each day and "codes" each one individually, according to a predetermined categorization scheme. In essence, these first-pass reviewers are not dissimilar to a TAR engine. They are, however, human, and therefore vulnerable to subjective inconsistencies and concentration fatigue. And, as document productions continue to increase in size, these reviewers are likewise increasingly over-worked and under-stimulated as their projects stretch further out into the indefinite future. Meanwhile, each reviewer bills an hourly rate to the client and represents ongoing overhead for the attorney. The end result is a process that perpetually increases expenses while concurrently decreasing in efficacy.
The TAR Workflow
TAR provides an iterative workflow in which high-level reviewers examine and code a representative portion of the documents, thereby "teaching" the system how to code other documents. The system then takes documents coded by the user, applies similar coding to documents with similar content, and waits for the reviewer to perform quality assessment on its work. Again, this is not unlike, in function, the work of a traditional review team.
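As a rough illustration of "applies similar coding to documents with similar content," the sketch below copies a reviewer's code to any uncoded document whose word overlap with a coded one is high enough, and queues everything else for a human. The similarity measure (Jaccard overlap of word sets), the 0.5 threshold, and all names and sample texts are assumptions chosen for brevity, not a description of a specific TAR engine.

```python
def tokens(text):
    """Represent a document as the set of its lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Overlap between two word sets: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate(seed, uncoded, threshold=0.5):
    """Suggest codes for uncoded texts based on the most similar seed document.

    seed: list of (text, code) pairs coded by high-level reviewers.
    Returns (suggested, needs_review); suggestions still await quality assessment.
    """
    suggested, needs_review = [], []
    for text in uncoded:
        best_code, best_sim = None, 0.0
        for seed_text, code in seed:
            sim = jaccard(tokens(text), tokens(seed_text))
            if sim > best_sim:
                best_code, best_sim = code, sim
        if best_sim >= threshold:
            suggested.append((text, best_code))
        else:
            # Too unlike anything coded so far: leave it for a human reviewer.
            needs_review.append(text)
    return suggested, needs_review


# Hypothetical seed set and uncoded documents.
seed = [
    ("supplier pricing agreement draft", "responsive"),
    ("friday lunch menu", "not responsive"),
]
uncoded = [
    "supplier pricing agreement final",
    "friday lunch menu update",
    "quarterly board minutes",
]
suggested, needs_review = propagate(seed, uncoded)
```

In an iterative workflow, the reviewer would check the suggestions, code the documents left in needs_review, add both to the seed set, and run the step again.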
TAR does differ from traditional review, however, by automating a process that has always been better suited to automation, resulting in welcome savings of both time and money. Grossman and Cormack's analysis of the TREC Legal 2009 Interactive Task Test Collection compares the results of a team of attorneys and law students, managed by a document review vendor, with the results of two different TAR engines. Their quantitative results demonstrate that both TAR technologies achieved greater recall and precision than manual review. Moreover, the qualitative analysis finds that the human reviewers even mischaracterized documents that humans would presumably be better suited to assess -- for example, documents whose meaning depends on context. These findings are a large part of why courts have essentially been able to put the TAR defensibility question to rest.
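For readers less familiar with the two measures mentioned above: recall is the share of truly responsive documents that a review finds, and precision is the share of documents marked responsive that really are. The short Python sketch below computes both from a set of coding decisions. The numbers are invented solely to show the arithmetic and have nothing to do with the Grossman and Cormack results.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans.

    predicted: the review's coding decisions (True = marked responsive).
    actual: the authoritative answer for the same documents.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented example: five documents, four of which are actually responsive.
predicted = [True, True, True, False, False]
actual = [True, True, False, True, True]
precision, recall = precision_recall(predicted, actual)
```

Here three documents were marked responsive and two of those were correct, so precision is 2/3; four documents were actually responsive and two were found, so recall is 1/2.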
Once expert reviewers train the TAR engine with a sufficient and representative sample of documents, the process can proceed as quickly as its hardware will allow and achieve a consistent degree of accuracy across its review. Humans, on the other hand, will make more errors as they respond to the stressful demand to work faster. They will necessarily pay less attention to the details before them. The TAR engine does not experience this pressure and, therefore, saves time while increasing accuracy. This, as a result, saves money by both reducing the number of reviewers required (along with related overhead) and reducing the amount of time for which those reviewers are billing.
A TAR-enabled review platform can efficiently analyze coding choices made by subject-expert reviewers and convert its analysis into a powerful decision tree, able to conduct analysis on its own. Traditional review has required humans to make these limited choices according to prescribed rules; they are not choices that would require -- or benefit from -- the human ability to interpret or describe complex processes. These choices are routine and call for automation. When tied to traditional solutions, they remain sources of great expense as reviewers continue to face exponentially increasing numbers of documents. Those who utilize the new generation of solutions TAR provides, on the other hand, will realize a measured and significant return on their investment, both in time and money.
With defensibility largely settled and comparisons showing how much more efficient a TAR-enabled workflow can be, it's time for organizations to set aside any lingering objections to the process. Predictive coding should no longer be considered the 'next big thing,' frightening or risky, but instead just a routine part of the e-discovery process.