Author: Helen Geib
Best Practices

Building Blocks of Effective Keyword Search


Keyword searching is an eDiscovery workhorse. Legal teams rely on keyword searches for responsiveness, privilege and issue-based document reviews. Familiarity can breed complacency, and many lawyers take searching for granted. An effective keyword search requires both an informed understanding of the substantive issues in the case and a firm grasp of search technique. A good keyword search uses the right keywords, follows proper search syntax and is carefully validated.

Many uses of keyword searches

Keyword searches are used in the analysis and review stages of eDiscovery to identify responsive, non-privileged documents for production. While standalone keyword searching is most typical, keywords may also be used in conjunction with advanced analytics tools. For example, one technique for validating predictive coding (a/k/a technology-assisted review, or TAR) is to run a search for known high-value keywords on the dataset that was predicted to be non-responsive. If that validation search returns a significant number of responsive documents, the search results are used to refine and re-run the predictive coding.
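The validation check described above can be sketched in a few lines of Python. This is a minimal illustration, not any review platform's API: the function names, the document IDs and the "Project Falcon" terms are hypothetical, and the sketch assumes the predicted non-responsive set is available as a mapping of document ID to extracted text. Keyword hits are only candidates; each still has to be reviewed to decide whether it is actually responsive, and what counts as a "significant" rate is a case-specific judgment the code does not make.

```python
import re

def keyword_hits(docs: dict[str, str], terms: list[str]) -> set[str]:
    """Return the IDs of documents containing any term (whole word, case-insensitive)."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    return {doc_id for doc_id, text in docs.items()
            if any(p.search(text) for p in patterns)}

def validate_null_set(predicted_nonresponsive: dict[str, str],
                      key_terms: list[str]) -> tuple[set[str], float]:
    """Search the predicted non-responsive set; return the hits and the hit rate."""
    hits = keyword_hits(predicted_nonresponsive, key_terms)
    rate = len(hits) / len(predicted_nonresponsive) if predicted_nonresponsive else 0.0
    return hits, rate

# Hypothetical predicted non-responsive documents and high-value key terms.
null_set = {
    "D1": "Lunch plans for Friday",
    "D2": "Re: Project Falcon launch delay",
    "D3": "Quarterly parking memo",
    "D4": "falcon test results attached",
}
hits, rate = validate_null_set(null_set, ["Falcon", "recall"])
print(sorted(hits), rate)  # ['D2', 'D4'] 0.5
```

The hit documents (here D2 and D4) are the ones to review first; if many turn out to be responsive, that is the signal to refine and re-run the predictive coding.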

Some other common applications of keyword searching include: 1) review of the opposing party’s production; 2) deposition preparation; and 3) identification of pertinent documents to share with experts. Additionally, targeted keyword searches are used to find supporting material for discovery motions and summary judgment motions.

Where do keywords come from?

The four main sources for substantive keyword terms and phrases are the pleadings, the client, opposing counsel and the ESI itself.

a) Pleadings – The factual allegations in the complaint and answer are a source of names and key terms.

b) Client – Generally speaking, the client is able to brainstorm a starter list of keywords. In addition, the client frequently has already identified a handful of critical documents at the outset of the case; given their high relevance, these documents are typically a prime source of keywords.

c) Opposing counsel – Whether to include opposing counsel in developing the keyword list is a strategic decision. There is a potential benefit in forestalling disputes over the search terms; however, keyword negotiations may be fruitless or even create unnecessary work if opposing counsel is unreasonable or technologically unsophisticated. Assuming the list compiled from the pleadings and the client is already reasonably comprehensive, opposing counsel is most likely to add terms related to legal issues such as knowledge and state of mind.

d) ESI – Finally, the ESI can be mined for keywords. The most efficient approach is to review a relatively small number of files that are known or likely to be responsive; for example, a project folder, a critical witness’s email messages from the relevant time frame or the hits on a key term that has already been identified.
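One simple way to mine a small set of known-responsive files is to count how many of them contain each word and read down the list for candidate keywords. The Python sketch below illustrates that idea under stated assumptions: the files are available as plain text, the stopword list is a deliberately short hypothetical one, and `candidate_terms` is an illustrative name rather than a feature of any eDiscovery tool. It produces suggestions for a person to vet, not a finished keyword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real one would be far longer.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "from"}

def candidate_terms(docs: list[str], top_n: int = 10, min_docs: int = 2) -> list[tuple[str, int]]:
    """Rank words by how many of the sample documents contain them."""
    doc_freq = Counter()
    for text in docs:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))  # count each word once per document
        doc_freq.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))  # most documents first, then A-Z
    return [(w, n) for w, n in ranked if n >= min_docs][:top_n]

# Hypothetical snippets from a key witness's email in the relevant time frame.
sample = [
    "Falcon launch slipped again",
    "Falcon battery overheating report",
    "Battery supplier call re Falcon",
]
print(candidate_terms(sample))  # [('falcon', 3), ('battery', 2)]
```

Counting documents rather than raw occurrences keeps one long, repetitive email from dominating the list.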

Keyword expansion analytics tools

Keyword expansion analytics tools such as Relativity’s Keyword Expander are designed to supplement keyword term lists. Keyword Expander is something like a case-specific thesaurus. The user must first input a list of keywords. Keyword Expander then analyzes the word index for the ESI collection (a word index is created as part of data processing) and suggests related keywords found in the database. Helpfully, Keyword Expander identifies keywords based on concept analysis. For example, in a product liability case, the name of the product is an obvious keyword to input; internal pre-launch product names from marketing files would be identified as related keywords.

Good search syntax is essential

The second step in constructing a good keyword search is making the most effective use of operators, wildcards and search parameters:

  • Boolean operators – AND, OR and AND NOT
  • Wildcards – * (multi-character expander) and ! (single-character expander) at the beginning or end of a term; the exact symbols vary by search platform
  • Nested searches – parentheses to “nest” search terms, e.g., “(this OR this) AND that”
  • Fuzziness – an instruction to the search engine that some of the characters can differ from the term as written; useful for variant spellings, technical terms and common misspellings
  • Proximity parameters – within sentence (w/s), within paragraph (w/p) or within a set number of words (e.g., w/10)
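To make the operators concrete, here is a toy Python evaluator for a single document. It is a sketch for illustration only, not how Relativity, dtSearch or any other engine works: the function names are hypothetical, "words" are simply runs of letters and digits, * and ! follow the meanings given in the list above, and Python's own `and`, `or`, `not` and parentheses stand in for AND, OR, AND NOT and nesting. Fuzziness and sentence or paragraph proximity are not covered.

```python
import re

def tokens(text: str) -> list[str]:
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_pattern(term: str) -> re.Pattern:
    """Translate '*' (any run of characters) and '!' (exactly one character) into a regex."""
    parts = []
    for ch in term.lower():
        if ch == "*":
            parts.append("[a-z0-9]*")
        elif ch == "!":
            parts.append("[a-z0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def positions(toks: list[str], term: str) -> list[int]:
    """Word positions where the whole token matches the (possibly wildcarded) term."""
    pattern = term_pattern(term)
    return [i for i, tok in enumerate(toks) if pattern.fullmatch(tok)]

def has(text: str, term: str) -> bool:
    return bool(positions(tokens(text), term))

def within(text: str, a: str, b: str, n: int) -> bool:
    """True if some match of a and some match of b are at most n words apart (w/n)."""
    toks = tokens(text)
    return any(abs(i - j) <= n for i in positions(toks, a) for j in positions(toks, b))

doc = "The Falcon recall was delayed because the defective battery overheated."
# (recall OR defect*) AND falcon AND NOT lunch
print((has(doc, "recall") or has(doc, "defect*")) and has(doc, "falcon") and not has(doc, "lunch"))  # True
print(within(doc, "falcon", "batter!", 10))  # True: "falcon" and "battery" are 7 words apart
print(within(doc, "falcon", "batter!", 5))   # False
```

Even this toy version shows why a misplaced parenthesis or an over-broad wildcard changes the result set: every operator is just a rule applied to word positions.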

Search strings run the gamut from simple Boolean searches to complex searches that draw on the full battery of options. While the basic syntax of a keyword search will be familiar to lawyers from searching in case law databases, working with an experienced project manager or eDiscovery consultant is recommended for complex searches. An eDiscovery professional will:

  1. Suggest keyword variants;
  2. Give advice on when and how to use operators and other options;
  3. Collate multiple keyword lists to eliminate duplicates (search term overlap is not always obvious, especially with complicated searches);
  4. Reconcile inconsistencies within the search string.
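Item 3 can be partly automated. The Python sketch below is a hypothetical illustration (the function names and sample lists are invented for the example): it merges lists after normalizing case and spacing, then flags single-word terms that a "root*" wildcard term on the same list would already capture. The redundancy only holds when the terms are joined with OR, so flagged terms are candidates for a person to review, not automatic deletions.

```python
import re

def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", term.strip().lower())

def collate(*keyword_lists: list[str]) -> list[str]:
    """Merge lists, dropping exact duplicates after normalization; keeps first-seen order."""
    return list(dict.fromkeys(normalize(t) for lst in keyword_lists for t in lst))

def covered_by_root(terms: list[str]) -> list[tuple[str, str]]:
    """Flag single-word terms already matched by a 'root*' term on the same (normalized) list."""
    roots = [t[:-1] for t in terms if t.endswith("*") and len(t) > 1 and " " not in t]
    flagged = []
    for t in terms:
        if t.endswith("*") or " " in t:
            continue  # only plain single-word terms are checked
        for root in roots:
            if t.startswith(root):
                flagged.append((t, root + "*"))
                break
    return flagged

# Hypothetical lists from the client and from opposing counsel.
client = ["Falcon", "battery  fire", "recall"]
opposing = ["falcon", "Battery fire", "recall*", "recalled"]
merged = collate(client, opposing)
print(merged)                   # ['falcon', 'battery fire', 'recall', 'recall*', 'recalled']
print(covered_by_root(merged))  # [('recall', 'recall*'), ('recalled', 'recall*')]
```

Overlap hidden inside proximity or nested searches is much harder to detect mechanically, which is one reason an experienced reviewer is still needed for complex strings.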

Validating the search

Just as no battle plan survives first contact with the enemy, no keyword search should survive first contact with the ESI unchanged. It’s important to review, analyze and refine the keyword search in light of the preliminary results. An experienced project manager or consultant will be able to provide invaluable assistance at this final, validation stage of keyword searching.

There are three basic steps to validate a keyword search. As a practical matter, the steps overlap and are iterative. The cycle will generally be repeated at least a few times before the search string is finalized, and should be repeated as many times as necessary. Validation should be an ongoing process until there is confidence that the keyword search is producing the optimal result.

1) Review a “hit count” report – The first validation step is reviewing a hit count report. The report gives the total number of documents returned by the search. It also breaks out the number of documents that are hits on each search term. It may be necessary to narrow search terms with high hit counts, such as by removing wildcards or adding a proximity search term. Search terms with unexpectedly low hit counts should also be scrutinized. Some terms may need to be broadened by removing search limitations or reducing the fuzziness percentage. Overall low hit counts may be a sign that the keyword list needs to be supplemented.
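A hit count report can be approximated from the per-term results of a search run. This Python sketch assumes the results are available as a mapping of search term to the set of document IDs it hit; the term names and IDs are hypothetical, and the layout is illustrative rather than any platform's report format. Alongside each term's count it shows the documents hit by that term alone, which indicates what the term actually contributes, and it reports the de-duplicated total, which is usually smaller than the sum of the per-term counts.

```python
def hit_report(hits: dict[str, set[str]]) -> tuple[list[tuple[str, int, int]], int]:
    """Per-term document counts, 'unique' counts, and the de-duplicated total."""
    total = set().union(*hits.values())
    rows = []
    for term, docs in hits.items():
        others = set().union(*(d for t, d in hits.items() if t != term))
        rows.append((term, len(docs), len(docs - others)))  # unique = hit by no other term
    rows.sort(key=lambda row: row[1], reverse=True)  # highest hit counts first
    return rows, len(total)

# Hypothetical per-term results from one search run.
hits = {
    "falcon": {"D1", "D2", "D3", "D4"},
    "recall*": {"D2", "D5"},
    "overheat*": set(),
}
rows, total = hit_report(hits)
for term, count, unique in rows:
    print(f"{term:<10} {count:>3} hits {unique:>3} unique")
print("total documents:", total)  # 5, not 6, because D2 is hit by two terms
print("zero-hit terms:", [t for t, c, _ in rows if c == 0])  # ['overheat*']
```

High counts at the top of the list are candidates for narrowing, and zero or unexpectedly low counts at the bottom are candidates for broadening or for a check of the term's spelling and syntax.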

2) Sample the search results – The second validation step is sampling the results, meaning reviewing a relatively small number of the “hit” documents. The sample can be drawn from the universe of documents returned by the search. However, it usually makes more sense to sample hits on specific terms that produced either unexpectedly high or unexpectedly low hit counts. In addition, for the search terms that had the most hits, it’s good practice to sample the results to determine if there are significant false positives.
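Sampling the hits on selected terms can be done with Python's standard `random` module. In this sketch the term names, document IDs and sample size are hypothetical, and the sketch does not address how large a sample should be; it only shows how to draw a repeatable random sample per term so that the same documents can be pulled again when the search is re-run or the sampling is questioned.

```python
import random

def sample_hits(hits: dict[str, set[str]], terms: list[str], k: int,
                seed: int = 0) -> dict[str, list[str]]:
    """Draw up to k document IDs per selected term; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    # sorted() gives rng.sample a stable input order, so the same seed yields the same sample
    return {t: rng.sample(sorted(hits[t]), min(k, len(hits[t]))) for t in terms}

# Hypothetical per-term results; sample the terms whose counts looked unexpected.
hits = {
    "falcon": {"D1", "D2", "D3", "D4", "D6", "D7"},
    "recall*": {"D2", "D5"},
}
review_set = sample_hits(hits, ["falcon", "recall*"], k=3, seed=42)
for term, doc_ids in review_set.items():
    print(term, doc_ids)
```

Recording the seed alongside the search string documents exactly which sample was reviewed.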

3) Modify the terms and test the new search string – The final validation step is to modify the search string based on steps one and two, run the search again and compare the results of the current and prior searches. Appropriately modifying the search term list based on the validation results is critical. To that end, litigants should be careful to include a workable validation process when negotiating a keyword search agreement with opposing counsel.
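Comparing the current and prior runs comes down to set arithmetic on the document IDs each run returned. The Python sketch below is a hypothetical illustration (the function name and IDs are invented). Documents that dropped out are worth sampling to confirm that narrowing the search did not lose responsive material, and newly added documents are worth sampling for false positives.

```python
def compare_runs(prior: set[str], current: set[str]) -> dict[str, set[str]]:
    """Summarize how a revised search string changed the set of hit documents."""
    return {
        "dropped": prior - current,   # hit before, not now: check for lost responsive documents
        "added": current - prior,     # newly captured: check for false positives
        "retained": prior & current,  # hit by both versions of the search
    }

# Hypothetical results of the prior and revised search strings.
prior_run = {"D1", "D2", "D3"}
current_run = {"D2", "D3", "D4", "D5"}
for label, docs in compare_runs(prior_run, current_run).items():
    print(label, sorted(docs))
# dropped ['D1']
# added ['D4', 'D5']
# retained ['D2', 'D3']
```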

Used in conjunction with other analytics tools, keyword searching remains an important part of overall review strategy. In addition, targeted keyword searches are useful both during and after discovery. Effective keyword searches are case-specific, technically sound and carefully validated.




Helen Geib is General Counsel and Practice Support Consultant for QDiscovery. Prior to joining QDiscovery, Helen practiced law in the intellectual property litigation department of Barnes and Thornburg’s Indianapolis office where her responsibilities included managing large scale discovery and motion practice. She brings that experience and perspective to her work as an eDiscovery consultant. She also provides trial consulting services in civil and criminal cases. Helen has published articles on topics in eDiscovery and trial technology. She is a member of the bar of the State of Indiana and the US District Court for the Southern District of Indiana and a registered patent attorney.


This post is for general informational and educational purposes only. It is not intended as legal advice or to substitute for legal counsel, and does not create an attorney-client privilege.


