Can you rewrite WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL)
so that each column name is only used once without using the substitution method (that is, without doing something like WHERE ISNULL(Col1, '-2,147,483,648') = ISNULL(Col2, '-2,147,483,648')
)?
WHERE EXISTS(SELECT Col1 INTERSECT SELECT Col2)
What…what?! That can’t be possible! I thought INTERSECT
compares two result sets, not two columns from the same row of data.
True, but a correlated subquery has access to the outer query’s columns. In the subquery, we can turn a column from the outer query into a single column, single row result set by putting SELECT
in front of its name. Do this to two columns and INTERSECT
is happy to compare between them. If they’re identical, INTERSECT
returns a single row containing the (distinct) identical value; if not, INTERSECT
returns no rows.
Okay, you win that point. I see how the INTERSECT
subquery could compare between the two values but don’t you need to add IS NULL
checks since a comparison between two NULLs
does not evaluate to true?
That would be correct if we were using a comparison operator (for example, =
or <>
). When a comparison operator encounters a NULL
, the operator evaluates to unknown. If we were using one of these operators and wanted NULL
s to be treated as equal, explicit IS NULL
checks would be required.
However, INTERSECT
is a set operator, not a comparison operator. Out of the box, set operators treat NULL
s as equivalent, so IS NULL
checks are unnecessary.
Wow! That’s interesting.
Hum…let me make sure I understand the EXISTS
part of the WHERE
clause. INTERSECT
passes its output to EXISTS
, which is kind of playing the role of a true or false check, right?
Exactly! If INTERSECT
returns a row, EXISTS
evaluates to true which makes the WHERE
evaluate to true. If INTERSECT
returns no rows, EXISTS
returns false which makes the WHERE
false.
Okay, you win. Your query works. But what about performance? Your query looks a lot more complex than the simple ‘equality comparison plus IS NULL
checks’ we started with. Won’t that complexity slow the query down?
That’s a fair question. As far as execution performance goes, remember that the execution plan and the efficiency with which it is executed is what matters, not the apparent simplicity or complexity of the SQL statement.
Let’s look at the execution plan generated for our original WHERE
statement used in the context of a simple SELECT
query.
SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL);
Notice that the WHERE
clause’s three tests are processed using a single compare predicate on the table scan. The database server determined it could satisfy all three logical conditions in the WHERE
clause using a single predicate.
Wait a minute! It looks like the IS NULL
checks disappeared. The predicate appears to only be performing an equals comparison.
Indeed, that’s what it looks like. Interestingly, if you run the same query with the IS NULL
checks removed, the graphical execution plan is identical.
SELECT * FROM TestTable WHERE Col1 = Col2;
Both queries have the same graphical execution plan, even though they are not logically equivalent and produce different result sets.
Whoa! What’s going on? How can both queries use the same predicate but produce different result sets?
Ready for something fascinating, a kind of Microsoft SQL Server secret? The compare predicate you see on the table scan can run under multiple operation modes. However, the graphical execution plan doesn’t show the mode being used. Thankfully, this detail is revealed in the XML execution plan.
For WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL)
Notice that the only non-whitespace difference between the two predicates is the value of CompareOp
. The first query was executed using CompareOp="IS"
while the second query used CompareOp="EQ"
. IS
considers values equal if they are the same or if both are NULL
. EQ
considers values equal if they are the same but not when both are NULL
.
Wow! So even though the one WHERE
clause has three checks (Col1 = Col2
, Col1 IS NULL
and Col2 IS NULL
), SQL Server used a single predicate to evaluate all three. Then, if we run the same query without the IS NULL
checks, the same predicate is used—but it executes under a different mode. Okay, that’s cool!
But what about the performance of EXISTS+INTERSECT
?
Let’s take a look:
SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2);
From the graphical execution plan, we can tell that the WHERE
clause evaluated down to a single predicate on the table scan—and that predicate looks just the one for the other two queries.
Wait…let me guess…if we look at the execution plan, we’ll find that the comparison predicate is operating in CompareOp="IS"
mode?
Right on!
Both SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL);
and SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2);
were executed using the same execution plan.
Keep in mind that SQL Server looks at the entire query when building execution plans. In our case, the two WHERE
clauses produced identical execution plans. However, in other situations, switching between comparison operator + IS NULL
checks and EXISTS+INTERSECT
could result in vastly different execution plans which might differ significantly in performance (for better or for worse). With query writing—at least when performance is important—always test and compare!
This has been fascinating! However, EXISTS+INTERSECT
’s syntax feels a bit clunky and verbose. Are you suggesting it’s better to use than comparison operator + IS NULL
checks?
I agree with your assessment. In fact, I think both approaches are awkward. Both require a lot of syntax to do something that seems like it should be simple.
EXISTS+INTERSECT
avoids repeating column names multiple times (a violation of the DRY principle). However, the intention of its syntax may not be immediately clear. Even an experienced query author may need to take a moment to figure out what it’s doing.
Comparison operator + IS NULL
checks may be repetitious but has the advantage of being straightforward and so easier to understand.
Since neither approach is ideal, it’s nice to have options. You can choose which seems best in the context of a particular query.
Hey…wait…before you go: Can this technique be used for not equals comparisons, as well?
Absolutely! Just add NOT
in front of EXISTS
. For example, the WHERE
clauses Col1 <> Col2 OR (Col1 IS NULL AND Col2 IS NOT NULL) OR (Col1 IS NOT NULL AND Col2 IS NULL)
can be rewritten as NOT EXISTS (SELECT Col1 INTERSECT SELECT Col2)
.
Footnotes
- ANSI/ISO SQL bypasses the need to use either of these approaches with its
IS [NOT] DISTINCT FROM
clause. There’s a Connect item asking Microsoft to add support for this syntax to T-SQL. If you like this idea, please consider voting for the Connect item. - For additional reading on this technique, check out Paul White’s post Undocumented Query Plans: Equality Comparisons. Paul’s article introduced me to the
CompareOps
property. Thank you, Paul!