Useful Constraints

How proper use of constraints in MSSQL/TSQL not only protects data but also boosts query performance. Hands-on examples, execution plans, and practical tips for database optimization.
Published on Wednesday 31 August 2016

Usually we use constraints to specify the allowable values for the columns in tables. And it works perfectly! But beyond that, constraints help query analyzer to generate more effective query plans. How does it work? Read below.

There is an opinion "Give MS SQL Server all information you know about data you store." What does it mean? First, obvious things: use the most suitable data type (use int, instead of bigint or float, for example), set the size of strings and binary data (nvarchar(10) instead of nvarchar(max)). And second - the constraints: if the value in a column may be in the range from 1 to 4, you should specify this as a constraint, SQL Server will thank you!

Example

Let's create a table:

CREATE TABLE [TestTable](
    [Id] [int] IDENTITY(1,1) NOT NULL, 
    [Value] [nvarchar](max) NOT NULL, 
    [Status] [int] NOT NULL
)

Add one million rows with data like this:

Id Value Status
1 value 1
2 value 2
3 value 3
4 value 1
5 value 2
6 value 3
7 value 1
8 value 2
9 value 3
10 value 1

And execute a query (with statistics output):

SET STATISTICS IO ON

SELECT tt.Status, COUNT(*)
FROM [TestTable] AS tt
WHERE tt.Status <> 1 
GROUP BY tt.Status

The query is quite real — we need a number of rows in all statuses excluding 1 ("Calc number of orders in all statuses excluding Open").

Query statistics:

Table 'TestTable'. Scan count 9, logical reads 4590, physical reads 0, read-ahead reads 4038, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Execution plan:

Plain Table Scan — server had to look through a table because it does not have indexes.

Add a new index:

CREATE NONCLUSTERED INDEX [IX_TestTable_Status] ON [TestTable]
(
    [Status] ASC
)

And run the query again:

Table 'TestTable'. Scan count 2, logical reads 1501, physical reads 0, read-ahead reads 10, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As expected, we see Index Seek with interesting predicate: Staus < 1 и Status > 1. If you remember index is implemented as a tree, and search operation on the tree is quite fast. SQL Server converted inequality into two comparisons that can be processed in the most effective way. Wherein, logical reads are also decreased.

And let's tell SQL Server that column Status can store only values 1, 2, 3 and 4:

ALTER TABLE [TestTable] WITH CHECK ADD CONSTRAINT [CK_TestTable]
CHECK (([Status]=(4) OR [Status]=(3) OR [Status]=(2) OR [Status]=(1)))

Run query:

Table 'TestTable'. Scan count 3, logical reads 1508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Surprisingly, there is nothing about status "1" in the actual execution plan. Instead of this analyzer replaced <> 1 with predicate IN (2, 3, 4).

But we don't get any advantage in logical reads.

Here is another example with a little bit different query:

SET STATISTICS IO ON

SELECT tt.Status, COUNT(*)
FROM [TestTable] AS tt
WHERE tt.Status NOT IN (1,3)
GROUP BY tt.Status

Remove another one status. Remove constraint and run query:

Table 'TestTable'. Scan count 2, logical reads 1501, physical reads 1, read-ahead reads 1959, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Analyzer converted NOT IN into a series of comparisons "less-more."

Add constraint and run the query:

Table 'TestTable'. Scan count 2, logical reads 757, physical reads 0, read-ahead reads 12, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Analyzer knows that column can store only values from set {1,2,3,4}, so predicates NOT IN (1, 3) and IN (2, 4) are equivalent. But logical read is decreased by 2 times!

Result by logical reads

Predicate No index, no constraint With index With index and constraint
<> 1 4590 1501 1508
NOT IN (1,3) 4590 1501 757
  1. SET STATISTICS IO
  2. Create Check Constraint