Culling and searching occur throughout the discovery process and are tightly related components of any solid and defensible processing strategy. Simply defined, culling is the process of programmatically removing content that is irrelevant, while searching is the process of identifying content that is most likely relevant and will require review. Together, when implemented effectively, culling and searching will reduce and focus the reviewable content universe, saving clients time and money for higher-value downstream activities.
More than 90% of collected electronic content can be non-responsive. When developing a culling and searching strategy, the objective should always be to identify the most relevant content first and move it downstream to the review team. To accomplish this, the discovery team needs to develop a general understanding of the collection. This understanding includes answers to questions such as:
- How much data will we be receiving? How many custodians? What is the average amount per custodian?
- What types of files (e.g., emails, word processing documents, spreadsheets, etc.) will we be receiving?
- How will the data be delivered (e.g., on hard drives, DVDs, etc.)?
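The first two questions above can be answered from a simple inventory of the collection. The following is a minimal sketch, assuming a hypothetical inventory of (custodian, file name, size in bytes) records; the sample values and the `profile_collection` helper are illustrative and do not come from any particular tool.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical inventory: (custodian, file name, size in bytes).
inventory = [
    ("alice", "budget.xlsx", 2_000_000),
    ("alice", "memo.docx", 500_000),
    ("bob", "inbox.pst", 7_500_000),
]

def profile_collection(items):
    """Summarize total volume, custodian count, average per custodian, and file types."""
    bytes_per_custodian = Counter()
    files_per_extension = Counter()
    for custodian, name, size in items:
        bytes_per_custodian[custodian] += size
        files_per_extension[PurePath(name).suffix.lower()] += 1
    total = sum(bytes_per_custodian.values())
    custodians = len(bytes_per_custodian)
    return {
        "total_bytes": total,
        "custodians": custodians,
        # Guard against an empty inventory to avoid dividing by zero.
        "avg_bytes_per_custodian": total / custodians if custodians else 0,
        "files_per_extension": dict(files_per_extension),
    }

profile = profile_collection(inventory)
```

Counting by extension is only a first approximation of file type; it is enough for early planning but can be misled by renamed files.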
There are myriad approaches and technologies for culling and searching. The key is to find the tools that best work for your unique requirements in a particular matter. Prior to selecting a particular culling or search technique or tool, it is important to understand what objectives the review team is trying to accomplish.
Understanding Background to Set Priorities
The volume of data being created is enormous and growing all the time. A triage and prioritization strategy is therefore essential to moving ahead in a timely and efficient manner.
As discussed in the Collection section, proper analysis of the key issues and collections can save significant time and money later in the discovery process. The collection of information was originally driven by factors such as:
- Key custodians or issues
- Preparation for key legal dates (depositions, hearings, filings)
- Ease of collection
In the triage and prioritization stage, we can apply those basic factors and then become more granular, addressing the technical issues that arise from more detailed management of electronic data.
Planning
Even after collection and culling, the volume and variety of data can be overwhelming. As with any other large project, following a methodology is important for breaking down the work ahead. The following process helps set priorities and make full use of processing throughput:
- Understand key legal issues to identify and process the most critical data as the top priority.
- Plan to process data in line with key legal deadlines. Identify potential issues and address potential changes with the court or the other side.
- Prioritize by key positions, departments and then by specific department members.
- Analyze data types and accessibility (e.g., backup or legacy types).
- Consider review team factors such as availability of experts, legal subject expertise, financial analysis, and foreign language.
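One way to turn the factors above into a processing queue is a multi-key sort. This is a sketch only: the `DataSet` fields, their ordering, and the sample entries are assumptions chosen for illustration, and review team factors are left out because they usually need human judgment rather than a sort key.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataSet:
    name: str
    key_issue: bool        # touches a key legal issue
    deadline: date         # nearest legal deadline the data supports
    custodian_rank: int    # 1 = key position; larger numbers are less central
    accessible: bool       # False for backup or legacy media

def processing_order(data_sets):
    """Key issues first, then earliest deadline, then custodian rank, then accessibility."""
    return sorted(
        data_sets,
        key=lambda d: (not d.key_issue, d.deadline, d.custodian_rank, not d.accessible),
    )

# Hypothetical queue for illustration.
queue = [
    DataSet("legacy backup tapes", False, date(2024, 9, 1), 3, False),
    DataSet("CFO mailbox", True, date(2024, 6, 1), 1, True),
    DataSet("finance share", True, date(2024, 6, 1), 2, True),
]
ordered = [d.name for d in processing_order(queue)]
# ordered == ["CFO mailbox", "finance share", "legacy backup tapes"]
```

The order of the sort keys encodes the relative weight of each factor, so a team that weights deadlines above key issues would simply swap the first two keys.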
Even with excellent culling techniques, the data needs to be evaluated further. Understanding the data can help eliminate more of it, identify future challenges, and plan the processing of certain data. A critical analysis of the types of files in the collection can help you understand even the largest collections. Knowing the file types gives insight into which applications the users relied on, as well as the likely speed of the review and the challenges it will face.
File Analysis
File analysis is a process whereby an application is used to give statistics on what types of files are in the collection. This application can be run in-house, by a forensics expert, or by an electronic discovery vendor who may be processing the data. Some tools rely on the file extension, but more sophisticated tools analyze the file's header information to determine the file type regardless of the extension. A file's extension can be changed (for example, to document.old), and files can be renamed to try to hide critical information. Tools that identify file type by file header are useful when renamed files are suspected.
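Header-based identification can be sketched as a comparison of a file's leading bytes against known signatures. The example below is a minimal illustration, not a replacement for a forensic tool: the signature table covers only a handful of well-known formats, and the `EXPECTED` extension sets and helper names are assumptions made for this sketch.

```python
from pathlib import PurePath

# Well-known leading byte signatures for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),                      # also .docx, .xlsx, .pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole-compound"),  # legacy Office, .msg
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]

# Extensions considered consistent with each detected type (illustrative, not exhaustive).
EXPECTED = {
    "pdf": {".pdf"},
    "zip-container": {".zip", ".docx", ".xlsx", ".pptx"},
    "ole-compound": {".doc", ".xls", ".ppt", ".msg"},
    "png": {".png"},
    "jpeg": {".jpg", ".jpeg"},
}

def detect_type(header: bytes) -> str:
    """Return the type implied by the file header, or 'unknown'."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return "unknown"

def is_suspect(name: str, header: bytes) -> bool:
    """Flag files whose header identifies a type that the extension does not match."""
    kind = detect_type(header)
    if kind == "unknown":
        return False  # no signature matched, so there is nothing to compare against
    return PurePath(name).suffix.lower() not in EXPECTED[kind]
```

In practice the header would be read from disk, for example with `open(path, "rb").read(16)`. Here a file named document.old that begins with `%PDF-` is flagged, while report.pdf with the same header is not.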
File types help determine which files will be processed for review and which will not. Some file types are generated by users in the normal course of business; others are non-custodian files, such as those that come with a computer's operating system. In most cases the custodian-created files are the ones that move on to be processed and reviewed.
It is important to keep a custodian's data together as much as possible going forward so that it can be managed as a whole during processing and review. At this point the data falls into three groups: data that can be processed, data that will not be processed, and data with special handling needs.
Special handling needs include files that require special, non-standard applications to process or view. Companies may develop their own internal applications or use specialized applications such as accounting or computer-aided design (CAD) programs, both of which can be critical in a matter. Certain files or certain custodians might require further forensic analysis depending on the nature of the matter, which would entail a forensics professional working with the collected data or going back to the original media to avoid spoliation.
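The three-way split described above, kept together per custodian, can be sketched as follows. The extension sets are illustrative assumptions only (a real matter would define them from the file analysis), and `bucket_for` and `triage` are hypothetical helper names.

```python
from collections import defaultdict
from pathlib import PurePath

# Illustrative extension sets; a real matter would build these from its own file analysis.
SYSTEM_EXTENSIONS = {".dll", ".exe", ".sys"}   # assumed non-custodian files
SPECIAL_EXTENSIONS = {".dwg", ".qbw"}          # assumed CAD and accounting files

def bucket_for(name):
    """Assign a file to 'exclude', 'special_handling', or 'process' by extension."""
    ext = PurePath(name).suffix.lower()
    if ext in SYSTEM_EXTENSIONS:
        return "exclude"
    if ext in SPECIAL_EXTENSIONS:
        return "special_handling"
    return "process"

def triage(files):
    """files: iterable of (custodian, file name). Returns {custodian: {bucket: [names]}}."""
    result = defaultdict(lambda: defaultdict(list))
    for custodian, name in files:
        result[custodian][bucket_for(name)].append(name)
    return result

buckets = triage([
    ("alice", "plan.dwg"),
    ("alice", "memo.docx"),
    ("alice", "kernel32.dll"),
])
```

Grouping by custodian first keeps each custodian's data together, so the buckets can be handed off separately without losing track of whose data they came from.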
Review Team Factors
Knowing in advance that certain file types will be difficult to handle can keep them from delaying other information in the process. Files that contain complex data, such as relational databases, can be routed outside the standard process so that they can be analyzed for truly responsive data and formatted appropriately for production.
Prioritization should also take into account the people who will be reviewing the information. Review team considerations include legal expertise, domain knowledge (e.g., scientists or accountants), and foreign languages.
Using Key Words for Prioritization
In some cases the collected data will serve as a central repository that is searched again and again to retrieve key documents. When the collection is used this way, keyword searching can help prioritize the data. This approach should offer flexible options for when to retrieve the same documents again and when to eliminate duplicates, depending on the organization and needs of the matter.
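A minimal sketch of keyword-based prioritization with optional duplicate elimination is shown below. It assumes documents are available as (id, text) pairs and treats two documents as duplicates only when their text is byte-for-byte identical; the `prioritize` function and the sample documents are hypothetical.

```python
import hashlib
import re

def prioritize(documents, keywords, drop_duplicates=True):
    """documents: iterable of (doc_id, text). Returns [(doc_id, hit_count)], most hits first."""
    # Whole-word, case-insensitive patterns; keywords are escaped so they match literally.
    patterns = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords]
    seen = set()
    scored = []
    for doc_id, text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if drop_duplicates and digest in seen:
            continue  # identical text already scored under an earlier document
        seen.add(digest)
        hits = sum(len(p.findall(text)) for p in patterns)
        if hits:
            scored.append((doc_id, hits))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents for illustration.
docs = [
    ("d1", "The merger closes Friday."),
    ("d2", "Merger pricing and merger timeline."),
    ("d3", "The merger closes Friday."),   # exact duplicate of d1
    ("d4", "Lunch?"),
]
ranked = prioritize(docs, ["merger", "pricing"])
# ranked == [("d2", 3), ("d1", 1)]
```

The `drop_duplicates` flag reflects the flexibility described above: passing `drop_duplicates=False` returns the duplicate d3 as well, which may be preferable when each custodian's copy must be reviewed.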
Source: EDRM (edrm.net)