Amended Federal Rule of Evidence 902, set to go into effect on December 1, creates a mechanism to authenticate digital evidence by means of a certification by a qualified person instead of by live testimony. My colleague Helen Geib discussed the legal ramifications of the amendments in her article Prepare Now for Upcoming Amended Rule 902. In this article, I discuss the technical requirements of the new rule.
The amendments will create substantial changes to collection and preservation of electronic documents. Rule 902(14) reads as follows:
Certified Data Copied from an Electronic Device, Storage Medium, or File. Data copied from an electronic device, storage medium, or file, if authenticated by a process of digital identification, as shown by a certification of a qualified person that complies with the certification requirements of Rule 902(11) or (12).
The Committee Note provides valuable guidance on what qualifies as “a process of digital identification.” The notes state in pertinent part:
Today, data copied from electronic devices, storage media, and electronic files are ordinarily authenticated by “hash value.” […] identical hash values for the original and copy reliably attest to the fact that they are exact duplicates.
At their core, these amendments mean that data must be collected by a qualified individual, and the data must have an associated hash value or other reliable means of validation. This will allow the data to be admitted by means of an affidavit, rather than by live testimony as the current rules require.
What is a “hash value”?
Hash values are the most widely utilized form of digital verification. A hash value is a unique alphanumeric value generated by running the binary data of a file through a mathematical algorithm, a process known as “hashing.” The resulting value is used as a unique identifier for the data, as well as a method of validation. Because the hash value is a unique identifier, it is used in forensic examination and eDiscovery collection to validate data integrity.
The data, whether the collection copy or any other copy made subsequently, can repeatedly be “hashed” and will return the same unique hash value every time assuming the contents have not been altered. Conversely, if the hash values do not match then it is apparent that the data has been altered.
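The repeatability described above can be illustrated with a minimal Python sketch using the standard hashlib module. The sample text and the helper name sha256_hex are made up for illustration; forensic tools compute the hash over the actual contents of a file or image.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the given bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Illustrative sample data (not from a real collection).
original = b"Quarterly sales report, final version"
copy = bytes(original)                           # an exact copy of the same bytes
altered = original.replace(b"final", b"draft")   # a single word changed

# Hashing the same content repeatedly always returns the same value.
assert sha256_hex(original) == sha256_hex(original)
# An exact copy has the same hash value as the original.
assert sha256_hex(copy) == sha256_hex(original)
# Any change to the contents produces a different hash value.
assert sha256_hex(altered) != sha256_hex(original)

print(sha256_hex(original))
print(sha256_hex(altered))
```

The two printed values bear no visible resemblance to each other, even though only one word of the input changed.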
When are hash values used?
Forensic data collections typically generate an evidence container (e.g., E01, L01, AD1, RAW) that contains all the collected data. The contents of the evidence container may be loose documents such as an assortment of PDFs, Word docs and Excel files; folders in a directory structure (e.g., My Documents or Desktop folders); or even an entire hard drive. The hash value is generated for the evidence container at the time of collection and may be embedded in the container file or saved in an associated audit log file.
The most common hash algorithms in the forensic community are MD5, SHA1, and SHA256. MD5 can generate up to 2^128 unique values, while SHA1 can generate up to 2^160 unique values. Since there is a finite number of unique hash values, there have been instances where different documents have produced the same hash value. However, the chance of this happening is astronomically low, and the only documented occurrences have been in a controlled test environment. SHA256 can generate up to 2^256 unique values, and at the time of this publication, there have been no instances discovered where different documents produced the same hash value.
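To give a sense of the scale of these numbers, the short Python sketch below reads the output length of each algorithm from the standard hashlib module and reports how many distinct values that length allows. The helper name output_bits is my own, and the sketch assumes an environment where MD5 is available in hashlib (some locked-down systems disable it).

```python
import hashlib


def output_bits(algorithm: str) -> int:
    """Return the length of the algorithm's hash value in bits."""
    return hashlib.new(algorithm).digest_size * 8


for name in ("md5", "sha1", "sha256"):
    bits = output_bits(name)
    # A hash of n bits can take 2 to the power n distinct values.
    digits = len(str(2 ** bits))
    print(f"{name}: {bits}-bit hash value, 2**{bits} possible values "
          f"(a {digits}-digit number)")
```

Each additional bit doubles the number of possible values, which is why the step from MD5 to SHA256 is far larger than the bit counts alone suggest.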
Whether the data at hand is a single file or an evidence container, it should be able to be hashed and provide consistent results at any point in the forensic investigation or eDiscovery matter. The data has been compromised in some manner if the resulting hash value differs from the original hash value created at the time of collection.
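The check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a forensic tool: the function names hash_file and verify_file, the file name evidence.bin and the sample contents are all hypothetical, and in practice the recorded value would come from the evidence container or its audit log.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path, algorithm: str = "sha256",
              chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large evidence containers need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, recorded_hash: str, algorithm: str = "sha256") -> bool:
    """Compare the file's current hash with the value recorded at collection."""
    return hash_file(path, algorithm) == recorded_hash.strip().lower()


with tempfile.TemporaryDirectory() as folder:
    evidence = Path(folder) / "evidence.bin"       # hypothetical stand-in file
    evidence.write_bytes(b"collected data" * 1000)
    recorded = hash_file(evidence)                 # value noted at collection time

    unchanged_ok = verify_file(evidence, recorded)

    # Simulate an alteration by appending a single byte.
    evidence.write_bytes(b"collected data" * 1000 + b"!")
    altered_ok = verify_file(evidence, recorded)

print(unchanged_ok, altered_ok)
```

The first check passes and the second fails, which is the signal that the data no longer matches what was collected.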
Practical examples of hash value verification
One of the most common types of investigation performed by QDiscovery’s forensic division is the departed employee investigation. These investigations typically start with making a full disk forensic image of the employee’s computer (i.e., a bit-by-bit copy of the entire hard drive made with specialized forensic hardware or software tools). A hash value is created for the full disk image at the time of collection that validates that the computer hard drive and its forensic copy are identical.
The contents of the disk image may be subsequently inspected and searched, while relevant folders and files may be exported for the client’s review. As long as the hash value of the disk image remains unchanged during the course of the forensic investigation, there can be full confidence that all evidence items contained within the image are true and accurate copies of the data on the employee’s computer.
Review platforms, such as Viewpoint and Relativity, utilize hash values behind the scenes to help make the review process easier. The most common use of hash values in the eDiscovery process is to identify and remove duplicate documents in the review environment, often reducing the reviewable document set by about 30%. These platforms can also generate hash values based upon certain email fields to not only act as a unique identifier, but to help de-duplicate emails and build thread views. For example, an investigation may have 20 custodians’ email collected, all of whom received an email with identical attachments. By way of hash value identification and de-duplication, the reviewer would only have to review the email and the attachments one time, rather than for all 20 instances.
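The idea behind hash-based de-duplication can be sketched in Python as follows. This is a simplified illustration, not a description of how Viewpoint or Relativity implement it; the function name deduplicate, the custodian paths and the document contents are hypothetical.

```python
import hashlib
from collections import defaultdict


def deduplicate(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Group document names by the SHA-256 hash of their contents."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, content in documents.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return dict(groups)


# Hypothetical collection: the same attachment received by three custodians.
collection = {
    "custodian01/budget.xlsx": b"budget figures",
    "custodian02/budget.xlsx": b"budget figures",
    "custodian03/budget.xlsx": b"budget figures",
    "custodian03/notes.docx": b"meeting notes",
}

groups = deduplicate(collection)

# Keep one representative per hash value; the rest are duplicates.
to_review = [names[0] for names in groups.values()]
print(to_review)
```

Four collected files reduce to two documents for review, while the grouping still records which custodians held each copy.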
Validation using other means
Not every data collection is validated using hash values. For example, a common form of data used in litigation matters is a PST. “PST” is a Microsoft file type that functions as a container file for email. These may be created in the normal course of email usage and saved on a computer or file server. They may also be exported from Microsoft Exchange environments; for example, to archive a former employee’s mailbox or for the specific purpose of making a data collection.
An existing PST can be collected into an evidence container and validated through traditional hash comparison like any other data type. However, when a PST is exported from an Exchange server as part of the process of data collection, it is being generated “on the fly” and must be validated using a different method.
In this situation, it is absolutely critical that detailed notes are kept documenting the collection. This includes recording the date and time of the export, the commands or tools used to run the export, and any available logs from the export session. These items together will typically suffice and the collection will be recognized as forensically sound.
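One way to keep such notes consistent is to capture them in a structured record at the time of the export. The Python sketch below is only an illustration: the ExportRecord class, its field names and the sample values are my own assumptions and not a required or standard format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ExportRecord:
    """Hypothetical record of a single mailbox export."""
    mailbox: str
    tool_or_command: str        # the tool or command exactly as it was run
    log_files: list[str]        # logs saved from the export session
    examiner: str = ""
    exported_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Illustrative values only.
record = ExportRecord(
    mailbox="former.employee@example.com",
    tool_or_command="(export tool or command as actually run)",
    log_files=["export_session.log"],
    examiner="J. Examiner",
)

# Serialize the record so it can be stored alongside the exported PST.
note = json.dumps(asdict(record), indent=2)
print(note)
```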
Once the emails are exported from the Exchange environment, they should be preserved in a forensic container. At this point, hash value validation comes into play again as the data moves through the eDiscovery lifecycle of processing, review and production.
Validation by means other than hash values is explicitly approved in the Committee Note:
[…]The rule is flexible enough to allow certifications through processes other than comparison of hash value, including by other reliable means of identification provided by future technology.
This statement makes clear that the Committee recognizes that technology is always changing and advancing, and that hash values may not always be the gold standard of validation. More efficient methods or more advanced copying tools may someday be developed, perhaps even in the near future, that will relegate the current hashing standard to second best status. Forensic experts have an important responsibility to perform in-depth testing of new technologies as they appear on the market in order to understand how they work and to be aware of their advantages and risks.
The inclusion of future technology and not outlining any one specific method of validation is a very proactive step by the Committee. This will allow the eDiscovery industry to use new technology as it is released and vetted, rather than being forced to utilize antiquated tools and methods.
It is important to understand how the upcoming changes to Rule 902 may impact both your current and future investigations and eDiscovery data collections. Are your collections being conducted by a qualified individual? Do your collections have any sort of hash value or other reasonable means of validation? There will inevitably be service providers that do not review the amendments and that may perpetuate practices that would invalidate collections under the new authentication standards. At QDiscovery, we understand the Rule changes and what they mean for your investigation or litigation. All of our collections are conducted by qualified forensic examiners and meet the technical requirements of amended Rule 902. We are proud to continue to offer best-in-industry services to all of our valued clients.
Gary Hunt is a Senior Digital Forensic Examiner for QDiscovery. Gary holds the Certified Computer Examiner (CCE) certification, is an active member of the International Society of Forensic Computer Examiners (ISFCE) and High Technology Crime Investigation Association (HTCIA) organizations and is one of QDiscovery’s testifying experts. Prior to joining QDiscovery, Gary managed the Midwest presence for TransPerfect Legal Solutions’ Forensic Technology and Consulting division. His diverse background in technology, forensics and eDiscovery provides a unique perspective to many challenges faced in the eDiscovery industry.