TECHNOLOGY

Databases in Discovery

Tips on how to request the information you need.

By Craig Ball


The dreaded “discovery about discovery.” It’s a necessary precursor to devising query and production strategies when seeking information from databases in a suit. If you don’t know what the database holds or the ways in which relevant and responsive data can be extracted, you are at the mercy of opponents who will give you data in unusable forms or give you nothing at all.

Here is some language to consider when seeking information about databases and when serving notice of deposition for corporate designees (e.g., per Federal Rule of Civil Procedure 30(b)(6) or Texas Rule of Civil Procedure 199(b)(1)):

For each database or system that holds potentially responsive information, we seek the following information to prepare to question the designated person(s) who, with reasonable particularity, can testify on your behalf about information known to or reasonably available to you concerning:

1. The standard reporting capabilities of the database or system, including the nature, purpose, structure, appearance, format, and electronic searchability of the information conveyed within each standard report (or template) that can be generated by the database or system or by any overlay reporting application;

2. The enhanced reporting capabilities of the database or system, including the nature, purpose, structure, appearance, format, and electronic searchability of the information conveyed within each enhanced or custom report (or template) that can be generated by the database or system or by any overlay reporting application;

3. The flat file and structured export capabilities of each database or system, particularly the ability to export to fielded/delimited or structured formats in a manner that faithfully reflects the content, integrity, and functionality of the source data;

4. Other export and reporting capabilities of each database or system (including any overlay reporting application) and how they may or may not be employed to faithfully reflect the content, integrity, and functionality of the source data for use in this litigation;

5. The structure of the database or system to the extent necessary to identify data within potentially responsive fields, records, and entities, including field and table names, definitions, constraints, and relationships, as well as field codes and field code/value translation or lookup tables;

6. The query language, syntax, capabilities, and constraints of the database or system, including any overlay reporting application, as they may bear on the ability to identify, extract, and export potentially responsive data;

7. The user experience and interface, including datasets, functionality, and options available for use by persons involved with the (provide language specific to the basis of the suit);

8. The operational history of the database or system to the extent that it may bear on the content, integrity, accuracy, currency, or completeness of potentially responsive data;

9. The nature, location, and content of any training, user, or administrator manuals or guides that address the manner in which the database or system has been administered, queried, or its contents reviewed by persons involved with the (provide language specific to the basis of the suit);

10. The nature, location, and contents of any schema, schema documentation (such as an entity relationship diagram or data dictionary), or the like for any database or system that may reasonably be expected to contain information relating to the (provide language specific to the basis of the suit);

11. The capacity and use of any database or system to log reports or exports generated by, or queries run against, the database or system where such reports, exports, or queries may bear on the (provide language specific to the basis of the suit);

12. The identity and roles of current or former employees or contractors serving as database or system administrators for databases or systems that may reasonably be expected to contain (or have contained) information relating to the (provide language specific to the basis of the suit); and

13. The cost, burden, complexity, facility, and ease with which the information within databases and systems holding potentially responsive data relating to the (provide language specific to the basis of the suit) may be identified, preserved, searched, extracted, and produced in a manner that faithfully reflects the content, integrity, and functionality of the source data.
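To see why item 5 asks for field codes and lookup tables, consider a rough sketch using Python's built-in sqlite3 module. The tables, codes, and ticket numbers are hypothetical and stand in for whatever the real system uses. The point is only that a raw export of coded fields is unintelligible until it is joined to the table that translates the codes.

```python
import sqlite3

# Hypothetical tables: every name and value here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_codes (
        code    INTEGER PRIMARY KEY,
        meaning TEXT NOT NULL
    );
    CREATE TABLE tickets (
        ticket_id INTEGER PRIMARY KEY,
        status    INTEGER REFERENCES status_codes(code)
    );
    INSERT INTO status_codes VALUES (1, 'Open'), (2, 'Escalated'), (3, 'Closed');
    INSERT INTO tickets VALUES (501, 2), (502, 3);
""")

# What a bare export of the tickets table looks like: codes, not meanings.
raw = conn.execute(
    "SELECT ticket_id, status FROM tickets ORDER BY ticket_id"
).fetchall()

# The same records once the lookup table translates each code.
translated = conn.execute(
    "SELECT t.ticket_id, s.meaning "
    "FROM tickets AS t JOIN status_codes AS s ON s.code = t.status "
    "ORDER BY t.ticket_id"
).fetchall()

print(raw)
print(translated)
```

If the lookup table is left out of a production, the receiving party holds the first result and must guess at what a status of 2 or 3 means.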


If you borrow this language, please take the time to understand it, and particularly strive to know why you are asking for what you demand. If you don’t need the information or don’t know what you plan to do with it, don’t ask for it.

Databases are constructed to enforce specified field property requirements or “constraints,” which may include field size, data type, unique fields, group or member lists, validation rules, and required data.
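As a minimal sketch of what constraints look like in practice, the example below builds a small table in SQLite (via Python's standard sqlite3 module) and tries to insert rows that break its rules. The table, field names, and rules are assumptions made up for illustration; a real system will have its own, and other database products enforce data types more strictly than SQLite does.

```python
import sqlite3

# Hypothetical table; the names and rules are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE claims (
        claim_id   INTEGER PRIMARY KEY,                  -- unique field
        claimant   TEXT NOT NULL,                        -- required data
        status     TEXT NOT NULL
                   CHECK (status IN ('OPEN', 'PAID', 'DENIED')),  -- member list
        amount     REAL CHECK (amount >= 0),             -- validation rule
        state_code TEXT CHECK (length(state_code) = 2)   -- field size
    )
""")

def try_insert(row):
    """Return 'accepted' or the database's own rejection message."""
    try:
        conn.execute(
            "INSERT INTO claims (claim_id, claimant, status, amount, state_code) "
            "VALUES (?, ?, ?, ?, ?)",
            row,
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert((1, "A. Smith", "OPEN", 1200.0, "TX")),   # satisfies every rule
    try_insert((1, "B. Jones", "PAID", 50.0, "LA")),     # duplicate claim_id
    try_insert((2, None, "OPEN", 10.0, "TX")),           # missing claimant
    try_insert((3, "C. Lee", "PENDING", 10.0, "TX")),    # status not in the list
    try_insert((4, "D. Wu", "OPEN", -5.0, "TX")),        # negative amount
]
for line in results:
    print(line)
```

Knowing the constraints tells you what a field can and cannot contain, which in turn tells you whether a blank or odd value reflects the facts or a limitation of the system.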

Databases are queried using a “query language,” which often sits behind the “pre-programmed pushbutton queries” that end users see. Understanding the query language is key to fashioning a query that extracts what you need to know, both within the data and about the data.
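The difference between querying within the data and querying about the data can be sketched with SQL, one common query language, run here through Python's standard sqlite3 module. The work_orders table and its contents are hypothetical; the real system may use a different language or dialect altogether.

```python
import sqlite3

# Hypothetical table and records, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_orders (id INTEGER PRIMARY KEY, "
    "opened TEXT, site TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO work_orders (opened, site, description) VALUES (?, ?, ?)",
    [
        ("2019-03-02", "Houston", "Replaced brake line"),
        ("2020-07-15", "Dallas", "Inspected brake pads"),
        ("2021-01-09", "Houston", "Oil change"),
    ],
)

# A query WITHIN the data: records mentioning "brake" inside a date range.
within = conn.execute(
    "SELECT id, opened, site FROM work_orders "
    "WHERE description LIKE ? AND opened BETWEEN ? AND ? "
    "ORDER BY opened",
    ("%brake%", "2019-01-01", "2020-12-31"),
).fetchall()

# A query ABOUT the data: how many records exist and what period they span.
about = conn.execute(
    "SELECT COUNT(*), MIN(opened), MAX(opened) FROM work_orders"
).fetchone()

print(within)
print(about)
```

The second kind of query is often the more useful opening move, because record counts and date ranges reveal whether the database even covers the period at issue.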

As important as learning what the database can produce is understanding what the database does or does not display to end users. Screen shots may be worth a thousand words when it comes to understanding what the user saw or might have done to pursue further intelligence.

In the simplest terms, a database’s schema is how it works. It may be the system’s logical schema, detailing how the database is designed in terms of its table structures, attributes, fields, relationships, joins, and views. Or it could be its physical schema, setting out the hardware and software implementation of the database on machines, storage devices, and networks. The schema of a database is rarely a trade secret or proprietary data, although you may hear that objection raised to frustrate discovery. It is more like a database map, typically supplied as a table or diagram.
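A logical schema can usually be read straight out of the database itself. The sketch below does so for a small SQLite database using Python's standard sqlite3 module; the customers and orders tables are hypothetical, and describe_schema is a helper written for this example, not part of any product. Other systems expose the same information through their own catalogs or administrative tools.

```python
import sqlite3

# Hypothetical two-table database, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        placed_on   TEXT
    );
""")

def describe_schema(connection):
    """Map each table to its (field, type) pairs and its links to other tables."""
    schema = {}
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = [(row[1], row[2]) for row in
                   connection.execute(f"PRAGMA table_info({table})")]
        # PRAGMA foreign_key_list rows: id, seq, table, from, to, ...
        links = [(row[3], row[2], row[4]) for row in
                 connection.execute(f"PRAGMA foreign_key_list({table})")]
        schema[table] = {"columns": columns, "references": links}
    return schema

schema = describe_schema(conn)
for table, detail in schema.items():
    print(table, detail)
```

The output is, in effect, a bare-bones data dictionary: table names, field names and types, and which field in one table points to which field in another.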

One feature that sets databases apart from many other forms of electronically stored information is the critical importance of the fielding of data. Preserving the fielded character of data is essential to preserving its utility and searchability. Fielding data means that information is stored in locations dedicated to holding just that information, and it serves to separate and identify information so you can search, sort, and cull using just that information. It’s a capability we take for granted in databases but that is often crippled or eradicated when data is produced in e-discovery. Be sure that you consider the form of production and ensure that the fielded character of the data produced will not be lost, whether supplied as a standard report or as a delimited export.
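What is lost when fielding is stripped can be shown with a short sketch using Python's standard csv module. The invoice records are hypothetical. The same records are written once as a delimited export that keeps each value in its named field, and once as flattened text of the sort found in a printed or imaged report.

```python
import csv
import io

# Hypothetical records, invented for illustration.
records = [
    {"invoice": "1001", "customer": "Acme, Inc.", "state": "TX", "amount": "250.00"},
    {"invoice": "1002", "customer": "TX Supply Co.", "state": "AR", "amount": "75.50"},
]

# Fielded production: a delimited export with a header row naming each field.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["invoice", "customer", "state", "amount"])
writer.writeheader()
writer.writerows(records)
delimited = buffer.getvalue()

# Unfielded production: the same values run together as plain text.
flattened = "\n".join(" ".join(r.values()) for r in records)

# With fields intact, a search can be confined to the state field.
reloaded = list(csv.DictReader(io.StringIO(delimited)))
texas_by_field = [r["invoice"] for r in reloaded if r["state"] == "TX"]

# With fields gone, a search for "TX" also hits the customer's name.
texas_by_text = [line.split()[0] for line in flattened.splitlines() if "TX" in line]

print(texas_by_field)
print(texas_by_text)
```

The fielded search returns only the Texas invoice, while the text search sweeps in an Arkansas invoice because the letters "TX" appear in the customer's name. Note too that the delimited export preserves the comma in "Acme, Inc." by quoting the value, so the field boundaries survive the round trip.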

Seeking discovery from databases is a key capability in modern litigation, and it’s not easy for the technically challenged, although it’s probably a whole lot easier than your opponent claims. Getting the proper data in usable forms demands careful thought, tenacity, and more than a little homework. Still, anyone can do it, either alone with a modicum of effort or aided by a little expert assistance.


This article was originally published in Circuits. It has been edited and reprinted with permission.

CRAIG BALL is a board certified Texas trial lawyer living in New Orleans, Louisiana, who limits his practice to service as a court-appointed special master and consultant in computer forensics and electronic discovery. A founder of the Georgetown University Law Center eDiscovery Training Academy, Ball serves on the academy’s faculty and also teaches electronic discovery and digital evidence at the University of Texas School of Law. Ball has published and presented on forensic technology more than 1,700 times all over the world. For his articles on electronic discovery and computer forensics, go to craigball.com or ballinyourcourt.com.
