Equating NULLs with EXISTS + INTERSECT

, ,

Can you rewrite WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL) so that each column name is only used once without using the substitution method (that is, without doing something like WHERE ISNULL(Col1, '-2,147,483,648') = ISNULL(Col2, '-2,147,483,648'))?

WHERE EXISTS(SELECT Col1 INTERSECT SELECT Col2)

What…what?! That can’t be possible! I thought INTERSECT compares two result sets, not two columns from the same row of data.

True, but a correlated subquery has access to the outer query’s columns. In the subquery, we can turn a column from the outer query into a single column, single row result set by putting SELECT in front of its name. Do this to two columns and INTERSECT is happy to compare between them. If they’re identical, INTERSECT returns a single row containing the (distinct) identical value; if not, INTERSECT returns no rows.

Okay, you win that point. I see how the INTERSECT subquery could compare between the two values but don’t you need to add IS NULL checks since a comparison between two NULLs does not evaluate to true?

That would be correct if we were using a comparison operator (for example, = or <>). When a comparison operator encounters a NULL, the operator evaluates to unknown. If we were using one of these operators and wanted NULLs to be treated as equal, explicit IS NULL checks would be required.

However, INTERSECT is a set operator, not a comparison operator. Out of the box, set operators treat NULLs as equivalent, so IS NULL checks are unnecessary.

Wow! That’s interesting.

Hum…let me make sure I understand the EXISTS part of the WHERE clause. INTERSECT passes its output to EXISTS, which is kind of playing the role of a true or false check, right?

Exactly! If INTERSECT returns a row, EXISTS evaluates to true which makes the WHERE evaluate to true. If INTERSECT returns no rows, EXISTS returns false which makes the WHERE false.

Okay, you win. Your query works. But what about performance? Your query looks a lot more complex than the simple equality comparison plus IS NULL checks we started with. Won’t that complexity slow the query down?

That’s a fair question. As far as execution performance goes, remember that the execution plan and the efficiency with which it is executed is what matters, not the apparent simplicity or complexity of the SQL statement.

Let’s look at the execution plan generated for our original WHERE statement used in the context of a simple SELECT query.

SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL);

Execution Plan for 'SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL);'

Notice that the WHERE clause’s three tests are processed using a single compare predicate on the table scan. The database server determined it could satisfy all three logical conditions in the WHERE clause using a single predicate.

Wait a minute! It looks like the IS NULL checks disappeared. The predicate appears to only be performing an equals comparison.

Indeed, that’s what it looks like. Interestingly, if you run the same query with the IS NULL checks removed, the graphical execution plan is identical.

SELECT * FROM TestTable WHERE Col1 = Col2;

Execution Plan for 'SELECT * FROM TestTable WHERE Col1 = Col2;'

Both queries have the same graphical execution plan, even though they are not logically equivalent and produce different result sets.

Whoa! What’s going on? How can both queries use the same predicate but produce different result sets?

Ready for something fascinating, a kind of Microsoft SQL Server secret? The compare predicate you see on the table scan can run under multiple operation modes. However, the graphical execution plan doesn’t show the mode being used. Thankfully, this detail is revealed in the XML execution plan.

For WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL)XML Execution Plan Fragment - Triple Condition WHERE Clause Query

For WHERE Col1 = Col2XML Execution Plan Fragment - Single Condition WHERE Clause Query

Notice that the only non-whitespace difference between the two predicates is the value of CompareOp. The first query was executed using CompareOp="IS" while the second query used CompareOp="EQ". IS considers values equal if they are the same or if both are NULL. EQ considers values equal if they are the same but not when both are NULL.

Wow! So even though the one WHERE clause has three checks (Col1 = Col2, Col1 IS NULL and Col2 IS NULL), SQL Server used a single predicate to evaluate all three. Then, if we run the same query without the IS NULL checks, the same predicate is used—but it executes under a different mode. Okay, that’s cool!

But what about the performance of EXISTS+INTERSECT?

Let’s take a look:

SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2);

Execution Plan for 'SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2);'From the graphical execution plan, we can tell that the WHERE clause evaluated down to a single predicate on the table scan—and that predicate looks just the one for the other two queries.

Wait…let me guess…if we look at the execution plan, we’ll find that the comparison predicate is operating in CompareOp="IS" mode?

Right on!

XML Execution Plan Fragment - EXISTS+INTERSECT Query

Both SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL); and SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2); were executed using the same execution plan.

Keep in mind that SQL Server looks at the entire query when building execution plans. In our case, the two WHERE clauses produced identical execution plans. However, in other situations, switching between comparison operator + IS NULL checks and EXISTS+INTERSECT could result in vastly different execution plans which might differ significantly in performance (for better or for worse). With query writing—at least when performance is important—always test and compare!

This has been fascinating! However, EXISTS+INTERSECT’s syntax feels a bit clunky and verbose. Are you suggesting it’s better to use than comparison operator + IS NULL checks?

I agree with your assessment. In fact, I think both approaches are awkward. Both require a lot of syntax to do something that seems like it should be simple.

EXISTS+INTERSECT avoids repeating column names multiple times (a violation of the DRY principle). However, the intention of its syntax may not be immediately clear. Even an experienced query author may need to take a moment to figure out what it’s doing.

Comparison operator + IS NULL checks may be repetitious but has the advantage of being straightforward and so easier to understand.

Since neither approach is ideal, it’s nice to have options. You can choose which seems best in the context of a particular query.

Hey…wait…before you go: Can this technique be used for not equals comparisons, as well?

Absolutely! Just add NOT in front of EXISTS. For example, the WHERE clauses Col1 <> Col2 OR (Col1 IS NULL AND Col2 IS NOT NULL) OR (Col1 IS NOT NULL AND Col2 IS NULL) can be rewritten as NOT EXISTS (SELECT Col1 INTERSECT SELECT Col2).

Footnotes

  • ANSI/ISO SQL bypasses the need to use either of these approaches with its IS [NOT] DISTINCT FROM clause. There’s a Connect item asking Microsoft to add support for this syntax to T-SQL. If you like this idea, please consider voting for the Connect item.
  • For additional reading on this technique, check out Paul White’s post Undocumented Query Plans: Equality Comparisons. Paul’s article introduced me to the CompareOps property. Thank you, Paul!

Leave a Reply

Your email address will not be published. Required fields are marked *