Author: Helen Geib

Know Your ESI, Categorically Speaking


There is a fast-growing trend among the states to explicitly recognize that a lawyer’s ethical responsibilities include technological competence. In 2012, the ABA amended the comments to Model Rule of Professional Conduct 1.1, which governs a lawyer’s duty of competence to his or her client, to specify that the duty includes keeping abreast of “the benefits and risks associated with relevant technology.” Over half the states have now adopted this rule change, and many have also issued technology-related ethics opinions in the areas of eDiscovery and cybersecurity. Florida’s recent adoption of a mandatory technology CLE requirement is likely to prove another bellwether.

Technology has a particularly prominent role in eDiscovery. A basic familiarity with the sources and types of Electronically Stored Information (ESI) is foundational technological knowledge for litigators. It’s necessary in conducting custodian interviews and creating a data map during the identification stage, in negotiating a discovery plan and in developing a defensible strategy for preservation and collection. Efficient and cost-effective review relies on matching the right advanced analytics tools to the different types of ESI included in the collection.

Litigators should have a working knowledge of the eDiscovery options and challenges associated with the seven main categories of ESI.

Email

Most companies use Microsoft Exchange for email, which may be hosted internally on the company’s servers or cloud-hosted in Office 365. Some companies use email archiving solutions such as Symantec and Barracuda instead of, or as an add-on to, Exchange. Personal email is ordinarily accessed through free webmail accounts (Gmail, Yahoo! Mail, etc.). Email in general is relatively simple to collect. Enterprise applications like Exchange and archivers have built-in export functionality, and webmail can be configured to download a copy of the mailbox to a PC for collection.

The challenge with email is high volume and low relevance. The indispensable analytics tool for email is email thread analysis, familiarly called threading. Email threading searches the entire dataset to locate and group together all messages in an email thread or string (i.e., original message, replies, forwards). Reviewing all related messages as a group is an enormous time and cost saver, and also significantly improves review consistency and accuracy. Domain-based filtering can also be used to cull email during the processing or review stages (for a full treatment of domain filtering, see my article “Leveraging Email Domains for Data Filtering”).
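To make the idea of threading more concrete, here is a minimal sketch of one common starting point: grouping messages by the Message-ID, In-Reply-To and References headers that email systems attach to replies and forwards. It assumes messages are available as raw RFC 5322 text, and the function names (thread_messages, _linked_ids) are hypothetical. Commercial threading tools in review platforms also compare message content and handle missing or altered headers; this sketch does not, so treat it as an illustration of the concept rather than a description of any product.

```python
from email import message_from_string


def _linked_ids(header_value):
    """Extract angle-bracketed Message-IDs from an In-Reply-To or References header."""
    if not header_value:
        return []
    return [tok for tok in str(header_value).split()
            if tok.startswith("<") and tok.endswith(">")]


def thread_messages(raw_messages):
    """Group raw messages into threads using header links only (an assumption;
    real tools also use content analysis). Returns a sorted list of threads,
    each a sorted list of Message-IDs from the collection."""
    parent = {}  # union-find forest keyed by Message-ID

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    collected_ids = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = (msg.get("Message-ID") or "").strip()
        if not mid:
            continue  # no header to link on; a real tool would fall back to content
        collected_ids.append(mid)
        find(mid)
        # Link this message to every message it replies to or references,
        # even if that earlier message is not in the collection.
        for ref in _linked_ids(msg.get("In-Reply-To")) + _linked_ids(msg.get("References")):
            union(mid, ref)

    # Report only messages actually in the collection, grouped by thread root.
    threads = {}
    for mid in collected_ids:
        threads.setdefault(find(mid), []).append(mid)
    return sorted((sorted(t) for t in threads.values()), key=lambda t: t[0])


if __name__ == "__main__":
    emails = [
        "Message-ID: <1@example.com>\nSubject: Contract\n\nOriginal",
        "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
        "References: <1@example.com>\nSubject: RE: Contract\n\nReply",
        "Message-ID: <3@example.com>\nSubject: Lunch\n\nUnrelated",
    ]
    print(thread_messages(emails))
```

Run on the three sample messages, the original and its reply land in one thread and the unrelated message in its own. Because links are followed through missing messages, two replies to an original that was never collected still end up grouped together, which mirrors why threading works across an entire dataset rather than one mailbox at a time.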

Mobile Device Data

Data commonly retrieved from phones and tablets includes messages, call logs, contacts, calendar entries, pictures, web browsing history, app data, voicemail messages and saved documents or other files. Messages may be relevant in any civil case. Other types of mobile device data mostly come up in employment, personal injury and data theft cases. In those matters, cell phones, smartphones, tablets, cameras and GPS devices contain a wealth of potentially relevant, and often critical, information. Ordinarily, a qualified forensic examiner or eDiscovery service provider should collect data from phones and tablets using specialized software.

Messaging

Messaging has become ubiquitous in personal communications. In business, it supplements or even replaces email for many people. There are various types of messaging services and message data is accordingly collected from a variety of sources. These include text messages (SMS/MMS) from cell phones, iMessages from iPhones, instant messaging (IM) from computers and corporate archiving solutions, proprietary messaging services from Facebook or other social media platforms and chat application data from smartphones. Preservation and collection are significant challenges because of the array of sources and the ephemeral nature of messaging.

For messages collected from mobile devices, the standard review and production formats are PDF or spreadsheet reports generated by the forensic collection software. QMobile by QDiscovery is an innovative alternative that allows mobile device data such as messages and call logs to be fully reviewed in Relativity; key features include conversation threading, filtering and redaction capabilities. More details on QMobile can be found in my colleague Gary Hunt’s article “From Spreadsheets to Review Platforms: The New Way to Review and Produce Mobile Data.”

Native Files

Native files, also called loose files, are user-created files other than email and messaging. The category includes Word documents, PDFs, spreadsheets, presentations and other text-based file types. It also includes photos, graphics files and files created by technical applications such as engineering drawings and software code.

Native files may be found on the C: drive of a desktop or laptop computer, in user drives and shared drives on the company server, or in cloud storage, the latter a broad category ranging from Office 365 to Dropbox. Native files may also be stored in a document management system (DMS) like SharePoint. They should always be collected using a forensically sound copying tool and methodology to prevent the metadata from being altered. In addition, sound chain of custody procedures must be followed to preserve the metadata during file transfer and processing.

Many powerful analytics tools are available for native file and email review. These include concept searching, keyword searching and add-on tools like keyword expander, relationship analysis, timeline analysis and clustering. Predictive coding is used to cull non-responsive files and to organize and prioritize ESI for more efficient review.

Structured Data (Databases)

Companies rely on databases, or structured data, to store, organize and retrieve information of all kinds. Accounting software, electronic medical records and customer and sales records, to name just a few, are prime sources of relevant information in many cases.

Database information is typically collected by generating a report using the program’s built-in capabilities. Similarly, native files stored in a database are typically collected using built-in export utilities. The export utility may also support creating a PDF version of a database record instead of, or in addition to, generating a report; this can be useful, for example, in response to a narrowly tailored request for production or where the database contains few relevant entries. Both the requesting and producing parties can benefit from discussing the content and form of database production during the meet-and-confer early in the case.

Digitized Paper

A paper (“hard copy” in industry terminology) document is digitized by scanning it to PDF or TIF electronic file format. Digitizing paper files for transmittal and storage is now standard practice for companies and individuals. Paper is routinely transformed into Electronically Stored Information both before and during discovery.

The main eDiscovery issue is ensuring scanned documents are text-searchable. This may be accomplished either at the time a document is scanned or retroactively using optical character recognition (OCR) software. It is helpful to also code digitized paper files. Coding is a manual process to capture author, date, title and similar information from the face of the document. Coding fields may then be searched just like the metadata of an electronic file.

Archives and Backups

In contrast to the other categories, archives and backups are classified by their intended purpose rather than by data type. They can contain any category of ESI and be located on a wide variety of systems or media, including backup tapes, external hard drives, DVDs, servers and the cloud. It can be a significant challenge to identify, catalog and search all archives and backups that may contain potentially relevant data. In addition, archives and backups frequently involve technical issues stemming from obsolete media, legacy (i.e., decommissioned) systems and third-party proprietary software. Finally, by their nature they tend to be large in size and undifferentiated in content.

A significant body of case law has developed around the specific issue of when a company’s disaster recovery backup system or other backup or archived data is “not reasonably accessible,” and thus should be excluded from discovery. If it is necessary to move forward with collecting and producing a backup or archive, for instance because it is the only source of critical documents, then the producing party may be able to seek cost-shifting or cost-sharing from the opposing party to lessen the financial burden. Once collected and processed, the usual analytics tools for native files and email are available for use in review.

Recognizing the limits of our knowledge is an important aspect of competence. Lawyers who have a sound grasp of the basics of ESI categories and associated eDiscovery solutions and challenges will be well positioned to seek help in appropriate circumstances from knowledgeable colleagues and trusted service providers.



Helen Geib is General Counsel and Practice Support Consultant for QDiscovery. Prior to joining QDiscovery, Helen practiced law in the intellectual property litigation department of Barnes and Thornburg’s Indianapolis office, where her responsibilities included managing large-scale discovery and motion practice. She brings that experience and perspective to her work as an eDiscovery consultant. She also provides trial consulting services in civil and criminal cases. Helen has published articles on topics in eDiscovery and trial technology. She is a member of the bar of the State of Indiana and the US District Court for the Southern District of Indiana and a registered patent attorney.

This post is for general informational and educational purposes only. It is not intended as legal advice or to substitute for legal counsel, and does not create an attorney-client privilege.
