The Complexity of Metadata
The Review and Produce steps of the discovery process often focus on the format(s) in which the parties exchange information. Some of the most frequently litigated issues have been: whether paper production suffices; discoverability of metadata and/or embedded data; and whether to request/produce in native format or in "uniform image format" (TIFF or PDF). Other oft-disputed issues are: the (in)sufficiency of unorganized sets of electronic information; and the producibility of databases and/or proprietary software to the requesting party.
However, it is still unsettled whether a potential litigant should be sufficiently clairvoyant to anticipate the ultimate discoverability of metadata.
Metadata is "data about data." In electronic discovery, the three principal kinds of such data are: E-mail; File System; and Document (imbedded/embedded). Each requires slightly different review and production strategies. Thus, each may require different retention/destruction/preservation strategies.
In data processing systems, metadata provides information about all files and/or other data managed within an application or environment. Most programs create at least six basic types of "file system" metadata. Many systems create much more extensive metadata, including prior versions of files.
File System Metadata
File System metadata - created by, e.g., Word and Excel - encompasses what a user sees in Windows Explorer, including:
- File name;
- Original author;
- Information regarding by whom and when revisions were made;
- Number of pages;
- Number of characters;
- File size;
- Template used to create it;
- Date created;
- Date modified; and
- Date printed.
Note that some categories, such as original author, may be very misleading whenever the current file started out as a "File...Save As" version of a predecessor file.
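As a rough illustration of the operating-system portion of this list, the following Python sketch reads a file's name, size and timestamps using only the standard library. The function name and the dictionary keys are hypothetical, chosen for this example. Fields such as author, template or page count are not exposed by this call and are outside the sketch.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_system_metadata(path):
    """Return basic file system metadata for one file as a dict."""
    p = Path(path)
    st = p.stat()
    return {
        "file_name": p.name,
        "size_bytes": st.st_size,
        # Time the file's contents were last modified, reported in UTC.
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        # Caution: st_ctime has historically meant creation time on Windows
        # but metadata-change time on most Unix systems.
        "ctime_utc": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
    }
```

Note that copying or re-saving a file can change these values, which is one reason collection methods matter when such dates may be at issue.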
E-mail Metadata
E-mail metadata is a broad category, in part because it includes specialized file metadata, such as From, To, cc, Subject, Date and Time Sent and the like. Other, less transparent e-mail file metadata can provide additional information, including the sender's domain, the route a message has traveled over the Internet, and where delays may have occurred between sending and receipt.
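The fields described above can be read from a message's headers. The sketch below uses Python's standard email package on a made-up sample message; the addresses, host names and the email_metadata function are all hypothetical. It assumes each relay added a Received header in the usual way, newest first, so reversing them approximates the route traveled.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Made-up sample message; every address and host name here is illustrative.
RAW_MESSAGE = """\
Received: from relay.example.net by mx.example.org; Tue, 01 Mar 2005 10:00:07 -0500
Received: from mail.example.com by relay.example.net; Tue, 01 Mar 2005 10:00:02 -0500
From: Alice Sender <alice@example.com>
To: Bob Reader <bob@example.org>
Cc: carol@example.org
Subject: Draft agreement
Date: Tue, 01 Mar 2005 10:00:00 -0500

Please see the attached draft.
"""


def email_metadata(raw):
    """Extract the commonly reviewed header fields from one raw message."""
    msg = message_from_string(raw)
    sender = getaddresses(msg.get_all("From", []))
    return {
        "from": msg["From"],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "cc": [addr for _, addr in getaddresses(msg.get_all("Cc", []))],
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]),
        "sender_domain": sender[0][1].rpartition("@")[2],
        # Each relay prepends its own Received header, so reverse the list
        # to see the hops in the order the message traveled.
        "route": list(reversed(msg.get_all("Received", []))),
    }
```

Comparing the timestamps at the end of successive Received headers is one way to see where delays occurred, although the clocks of different servers may not agree.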
Understanding what types of metadata are available and what is required to collect and retain this metadata is critical to effective records retention. For example, getting a complete picture of all recipients of a given e-mail is often impossible through simply looking at the e-mail on an e-mail server:
- If the e-mail was sent to a distribution list, knowing who was on that given distribution list at the time of distribution list expansion requires capturing distribution list membership at the time of transmission. Technologies such as Microsoft Exchange envelope journaling can be enabled to capture this information in Exchange 2003 or higher, but this functionality is not available in previous versions of Exchange Server.
- If the e-mail was Blind Carbon Copied (Bcc-ed) to any recipients, this information will not be retained in any stored e-mail. Technologies such as Bcc Journaling in Exchange 2003 or higher can be enabled to capture this information, but are unavailable in previous versions of Exchange.
These examples underscore the need to understand what is going on technically in order to ensure that e-mail records are retained in a manner that meets expectations.
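The gap between what a stored message shows and who actually received it can be sketched as follows. This is not Exchange's journaling interface. It is a minimal illustration that assumes some journaling mechanism has captured the full list of delivery ("envelope") recipients at transmission time; the function names, addresses and sample data are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def header_recipients(msg):
    """Addresses visible in the stored message's To and Cc headers."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(fields)}


def recipients_missing_from_headers(raw_message, envelope_recipients):
    """Return envelope recipients that the headers alone would not reveal."""
    msg = message_from_string(raw_message)
    envelope = {addr.lower() for addr in envelope_recipients}
    return sorted(envelope - header_recipients(msg))


# Stored copy: addressed only to a distribution list, with no Bcc header.
STORED = """\
From: alice@example.com
To: sales-team@example.com
Subject: Q1 pricing

See attached.
"""

# Hypothetical journal entry: the list expanded to two members, plus one Bcc.
ENVELOPE = ["bob@example.com", "carol@example.com", "counsel@example.org"]

missing = recipients_missing_from_headers(STORED, ENVELOPE)
```

Here none of the three actual recipients appears in the stored headers, so a review limited to the stored e-mail would identify only the distribution list address.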
Document Metadata (a/k/a Embedded/Imbedded Data)
Embedded data can generally yield more surprises. For example, in Word or Excel, embedded data can track/capture:
- Changes made;
- Reviewer name(s); and
- The sequence in which changes were made.
Embedded data is not necessarily revealed when a file's creator/modifier opens the file. However, it can be revealed if either:
- Deleted text is still present in the bowels of the file; or
- The Track Changes feature was not used or was improperly used.
Tracked changes are only the tip of the iceberg, regardless of whether a file is in native format or has been converted to .pdf. Because this feature is commonly used and familiar to many users, it serves as a good illustration. If the creator or modifier merely hid the tracked changes from view, the recipient of the file can activate Word's Track Changes tool or Markup tool and thereby reveal the revisions that were hidden.
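To show how hidden revisions can remain readable inside a file, the sketch below lists tracked insertions and deletions in a Word document. It assumes the later .docx (Office Open XML) format, in which the document body is an XML part inside a zip archive and revisions are recorded as w:ins and w:del elements carrying author and date attributes. It does not read the older binary .doc format, and the tracked_changes function is a hypothetical name for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_path):
    """List (kind, author, date, text) for each tracked insertion or deletion."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    changes = []
    # Inserted text is stored in w:t elements, deleted text in w:delText.
    for kind, text_tag in (("ins", "t"), ("del", "delText")):
        for el in root.iter(f"{{{W_NS}}}{kind}"):
            text = "".join(t.text or "" for t in el.iter(f"{{{W_NS}}}{text_tag}"))
            changes.append((
                "insertion" if kind == "ins" else "deletion",
                el.get(f"{{{W_NS}}}author"),
                el.get(f"{{{W_NS}}}date"),
                text,
            ))
    return changes
```

The sketch covers only the main document body. Comments, headers, footers and other parts of the file would need separate handling.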
Upon conversion to .pdf via Acrobat's PDFMaker, improperly handled tracked changes may migrate to the .pdf file in unusual circumstances - namely when:
- The Word file itself is incorporated in native format into the .pdf file;
- Tracked changes are visible before and after the PDFMaker conversion; or
- One's Word printing configuration is set to also print tracked changes.
In the Word scenario, prior revisions can reside in multiple places in the file. Even in the .pdf scenario, some file metadata does migrate. In a converted file whose metadata has not been scrubbed, pressing "Ctrl+D" in Acrobat displays the document properties, including the Title and the original Author recorded when the file was first created in its original format.
Native vs. Electronic Document Discovery (EDD) [Platforms]
A "native data" file is one "[i]n the original file format in which [it was] created (i.e., in the specific software applications used to create each individual document)." Examples are Microsoft Word, Microsoft Excel and WordPerfect.
In contrast, "uniform" or "standard" image format is an agreed-upon file format into which all different types of native files are converted solely for review and/or production in civil litigation. Often the uniform format is Tagged Image File Format (TIFF) plus a searchable index; sometimes it is Portable Document Format (PDF).
"Searchable TIFF" is an oxymoron. It is a litigation fiction, reflecting the exchange of a set of imaged electronic files, which are accompanied by searchable text associated with those files.
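The arrangement described above, images that cannot themselves be searched paired with separately stored text, can be sketched as a simple data structure. The class, field and file names below are hypothetical and do not reflect any particular vendor's load-file format.

```python
from dataclasses import dataclass


@dataclass
class ProducedPage:
    """One page of a production: an image file plus its separately stored text."""
    image_file: str      # e.g. a TIFF image, which has no searchable text of its own
    extracted_text: str  # text associated with the image for search purposes


def search_production(pages, term):
    """Return image file names whose associated text contains the term."""
    needle = term.lower()
    return [p.image_file for p in pages if needle in p.extracted_text.lower()]


pages = [
    ProducedPage("ABC0001.tif", "Draft supply agreement between the parties"),
    ProducedPage("ABC0002.tif", "Meeting notes regarding pricing"),
]

hits = search_production(pages, "pricing")
```

The search runs only against the associated text, so anything missing from that text, such as metadata that was never extracted, cannot be found by searching the production.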
Many strategies and cost issues determine whether to review, produce and/or seek files in their native format(s). The relative technological and financial resources of the parties are likely to play a big role. So is the significance, or lack thereof, of metadata - such as spreadsheet formulas, tracked changes, creation date, e-mail fields, cross-file links, etc. In some instances, it may be better for a party to review and/or produce in native format. In other instances, an EDD platform/database may be preferable.
Metadata - When Is It Discoverable?
Metadata and embedded data are discoverable when needed or relevant to the matter at hand. The proposed amendment to Federal Rule 26(b)(2)(C) intentionally avoids specific reference to metadata; yet the associated comment evinces a desire to keep metadata from being produced absent an affirmative showing of need.
- Electronic production was ordered in In re Honeywell Int'l, Inc. Securities Litig., 2003 U.S. Dist. LEXIS 20602, 2003 WL 22722961 (S.D.N.Y. Nov. 18, 2003) (in putative securities class action, third party accounting firm's previous production of hardcopies of its work papers had been insufficient under Fed. R. Civ. P. 34(b) because information "not produced as kept in the usual course of business"). For a rare modern case in which paper production was sufficient, see Northern Crossarm Co., Inc. v. Chemical Specialties, Inc., 2004 U.S. Dist. LEXIS 5381 (W.D. Wis. Mar. 3, 2004) (unique set of circumstances in which both production request and meet-and-confer correspondence failed to specify "electronic" and prior costly production of hardcopies of 65,000 e-mails had "mimic[ked] manner in which that information [wa]s stored electronically").
- See generally Kristin M. Nimsger & Michele C.S. Lange, "E-Document Conversion & Native Document Review" (LJN Legal Tech. News Dec. 2003) ("Nimsger"); "E-Evidence Thought Leadership Luncheon: Rowe v. Zubulake: A Perspective From the Bench" (Kroll Ontrack Sep. 23, 2003) (hereafter "Judges"), at http://www.krollontrack.com/upcomingevents/documents/zubulake.pdf; Kenneth Shear, "Retaining Computer Data in Original Format v. Conversion of Data into Images" (Electronic Evidence Discovery 2003). See also Mary Mack, "Native File Review: Simplifying Electronic Discovery?" (LJN's Legal Tech. News May 1, 2005); Mark Reber, "Native File Review: What Problem Are We Solving?" Technolawyer (Mar. 8, 2005).
- Jinks-Umstead v. England, 2005 WL 775780 (D.D.C. Apr. 7, 2005) (in discrimination case, granting new trial to allow Plaintiff to present its case using new electronic evidence that Defendant had initially claimed it no longer possessed but which turned out to be retrievable from database); In re Plastics Additives Antitrust Litig., 2004 U.S. Dist. LEXIS 23989, 2004-2 Trade Cas. (CCH) ¶ 74,620 (E.D. Pa. Nov. 29, 2004) (ordering parties to provide all transactional data in electronic format, to extent reasonably feasible; not requiring Defendant to provide technical assistance to help plaintiffs understand and make use of electronic data), available at http://www.paed.uscourts.gov/documents/opinions/04D0537P.pdf.
- Among the many online definitions is the one found in Applied Discovery's Glossary. See also Brownstone, Collaborative Navigation of the Stormy e-Discovery Seas, 10 Rich. J.L. & Tech. 53, ¶¶ 2, 23 & nn. 5, 68-70 (2004), ¶¶ 3, 19, 31 & nn.5-7, 56, 95-96, at http://law.richmond.edu/jolt/v10i5/article53.pdf.
- See generally Workshare, "Dangers of Document Metadata" (2004).
- See E. Svenson, "Overstating the threat of metadata in PDF documents" http://www.planetpdf.com/enterprise/article.asp?ContentID=6877 (rebutting D. Payne & B.Lewis, "Metadata: Are You Protected?" (2004)).
- Metadata cleaning software includes PCG's Metadata Assistant and Workshare's Professional 4's "Hidden Data". See Benjamin Rosenbaum, "Evaluation of the Top 5 Metadata Removal Utilities" (TechnoLawyer post 1/28/05). Note: in eDiscovery, only scrub metadata if you are sure no Court Order or Stipulation forbids it.
- Nimsger, supra note 2, at 2.
- Id. at 1-2.
- Robert D. Brownstone, "Collaborative Navigation of the Stormy e-Discovery Seas", 10 Rich. J.L. & Tech. 53, ¶¶ 2, 23 & nn. 5, 68-70 (2004), available at http://law.richmond.edu/jolt/v10i5/article53.pdf.
- Case law addressing these issues is still developing. See, e.g., Medtronic Sofamor Danek, Inc. v. Michelson, 2003 WL 21468573 (W.D. Tenn. 2003) (in trade secrets and patents case as to spinal fusion medical technology, ordering non-privileged files produced to Defendant in their native electronic formats (rather than as image files); appointing special master - technology or computer expert - to oversee discovery and setting forth detailed protocol).
- S.D.N.Y. Magistrate Judge Francis has informally stated:
[T]he touchstone...is the purpose or...relevance of the particular document at issue. [Whether] the metadata or the embedded data is going to be highly relevant...dictates [the] form of production.... [I]n any large document case these days, it's probably irresponsible for the requesting party not to ask for it in searchable form in any event.... I think the days of producing large volumes of paper documents are pretty close to over but that doesn't solve the situation about what form of searchable data.
Judges, supra note 2, at 25.
- The pertinent Advisory Committee Note quotes the Manual for Complex Litigation (4th) section 11.446 to the effect that "production of word-processing files with all associated metadata...should be conditioned upon a showing of need or sharing expenses." Cf. D. Del. Default Standard for Discovery of Electronic Documents, providing that:
If, during...Rule 26(f) conference, the parties cannot agree to the format..., electronic documents shall be produced...as image files (e.g., PDF or TIFF).... [T]he producing party must preserve the integrity of the electronic document's contents, i.e., the original formatting of the document, its metadata and, where applicable, its revision history. After initial production in image file format is complete, a party must demonstrate particularized need for production of electronic documents in their native format.
Source: EDRM (edrm.net)