The Question for Litigation Support Databases

 

The available database solutions for the litigation support market have radically evolved in
the last decade. Many newer solutions use relational database technologies, specifically SQL
(Structured Query Language). This discussion provides some insight into database solutions for
litigation support and whether SQL should be the only choice. While there are many variations of
SQL, for brevity, this is limited to Microsoft SQL Server. I am and have been a proponent of SQL
Server for a variety of solutions, but as with any technology solution, it is only the right solution
if it is chosen for the right reason and environment. As full disclosure, IPRO Tech develops our
Enterprise Solutions, Allegro, eCapture and Eclipse, using SQL as the database engine. We also
offer a non-SQL database solution with our Eclipse SE product.

Historical Perspective

Ten years ago, most cases were stored in somewhat proprietary database products. Most
of those products began to stress under the increasing volume of records produced by electronic
discovery, compared to the relatively small number of paper/imaged documents. Modern
solutions migrated to SQL based platforms, some for the right reasons, others because it was
believed that large scale cases required it. These legacy databases of the past reached their
limit because they were originally designed a decade or two ago. Most were never redesigned
for modern use in order to avoid major conversions as well as to preserve backward
compatibility. That said, companies who invested in modernizing their flat file systems for
today’s disk storage, memory, and operating systems handle large volume cases with excellent
performance; as in all products, it is about design and implementation.

Advantages of SQL Server

SQL Server is a highly scalable and robust relational platform requiring sophisticated
configuration and maintenance to run properly. It is much more than a database server; it
also performs functions similar to an application server, a file server, and an operating system. A
server based platform such as SQL Server requires an experienced DBA, sophisticated
monitoring, maintenance, and backup; generally outside the control of the application software.
Conversely, a file based database usually can be maintained by litigation support personnel and
normal IT personnel (file backup), since most of the database internal operations are controlled
by the software itself.

If the database solution sought requires load balancing, sophisticated relational querying,
access to the data from outside the application, or needs to be browser based; then SQL is
definitely the right choice. SQL servers, web servers, and files servers can be scaled using
common standards and technology and the knowledge to do so is widely available in today’s
Microsoft based world. There are many trained personnel available and more are entering the
market each day. While file based database applications can certainly ‘build in’ remote access,
load balancing, querying, and access to data using scripting languages, this requires proprietary,
complex, specialized infrastructure handling, and additional training, which makes SQL a better
alternative.

Advantages of Flat-File Database

If the key requirement is performance and support for large volume cases, a welldeveloped
file based database can match, and in many cases, outperform SQL Server in
litigation document databases. Before all the SQL developers and DBA’s shout their objections,
there is a valid reason for this statement. Most litigation document databases perform far more
data reads than writes; closer to a “write-once-read-many” paradigm than to large scale
transaction based processes. Database writes occur when data is loaded during an import,
documents are issue tagged, notes are entered, lists are created, and images are annotated.
Those write operations pale in comparison to read operations like searching, review, and
production. SQL excels at large volume transactions and relational querying; but a properly
designed file system database will soar for review, even on large cases, because they are
optimized for reading data in non-transactional related methods. A database engine does not
have to be SQL to perform the same type of searches and filtering that most people associate
with relational platforms.

Comparing SQL and Flat-File

There are two fundamental engines at the core of SQL Server, the Relational Engine and
the Storage Engine, providing the foundation for data access and data storage. Ultimately, SQL
Server stores its data to disk (just like flat file systems) in an MDF file and the logs in an LDF
file. The disk storage algorithms are very similar across SQL and non-SQL database platforms
with performance, stability, and reliability resulting from the underlying architecture. One
interesting note: SQL Server uses indexes to improve the performance of querying and other
operations. These indexes are stored on disk using a B-Tree algorithm
http://msdn.microsoft.com/en-us/library/ms177443.aspx. That name should sound familiar to
many of us; the core algorithm was invented in 1972 http://en.wikipedia.org/wiki/B-tree .
While the methods have been improved over the years; at its core, this is the basis for most
high speed storage on disk. SQL server data storage uses a modified version of OLE DB
(although this will change in the near future). Paradigms that work rarely change; they are just
improved.

SQL Server resides between the client and the database file(s) and handles all the
querying, reads, writes, caching as well as a multitude of other operations. Anytime there is an
abstraction layer between the client and server data, there is additional overhead. The incorrect
tuning of a SQL Server can ultimately result in a solution that becomes unusable. A file based
system is designed so each client accesses shared data from the network storage, so the
performance is based upon disk and server access as well as the design of the database. Both
SQL and file based systems are affected by data storage speed and capability. SQL Server is a
fantastic and sometimes amazing solution; but if the requirements do not dictate the need, then
a quality file based database is a much better choice, eliminating the complexity and the need
for specialized hardware and personnel while performing equally as well. A decision should be
based on features, technology, and the company that created the application instead of just the
platform or data structure.

When to Choose a SQL Based Solution

1. You have a competent DBA
2. Most of your applications are already SQL based (which means you should already have a
competent DBA)
3. You need to scale to hundreds or thousands of users utilizing load balancing and/or
redundancy
4. Your application is hosted and/or browser based and accesses the data remotely
5. You absolutely need to query or access the data outside of the application,.e.g. custom
reports, data mining, integrated data migration, etc.

The right solution comes from a firm understanding of business requirements, infrastructure,
industry, and current technology. SQL or non-SQL solutions can be equally viable. The key
determining factors should be product features, environment and infrastructure match,
reliability, performance, and the reputation of the company that builds it.