Equals Is Not Always Equivalent: Power Query Joins vs. SQL Joins

, , , ,

Take the following M expression:

Table.Join(A, "ID", B, "ID", JoinKind.Left)

Does it behave like the below SQL (which is how a join between two tables on column ID would typically be coded in the database world)?

FROM A
  LEFT JOIN B ON A.ID = B.ID

Perhaps surprisingly, no—at least, not when the simple, innocent null is involved.

A Philosophical Difference

To the Power Query engine, two null values are considered equivalent. In the database world, comparing two null values evaluates to unknown (at least, when the database is following the relevant ISO standard), which has the net effect of the two nulls being considered not equal.

This is why the above M expression and the SQL below it are not equivalent. Both perform equality comparisons between the two tables’ rows, but each uses a different rule when determining whether two nulls are considered equivalent.

Both null-comparison philosophies are valid. You could argue that two nulls should be equal because they are both identical values. Or, considering that null indicates unknown/undefined, you could argue that two nulls should not be considered equal because whether two unknown or undefined values are equal is unknown. But I digress.

Automatic Accommodation

Power Query accommodates for the difference in how database servers compare nulls by writing compensating SQL. When necessary, extra SQL is added to the generated native query telling the database to handle nulls like M would natively. (At least, this is true for the connectors I tested [SQL Server and PostgreSQL]. Since native query generation is connector-specific, all connectors may not follow this practice.)

With the connectors I tested, the above M expression is query folded into SQL that’s equivalent to the following:

FROM A
  LEFT JOIN B ON A.ID = B.ID
    OR (A.ID IS NULL AND B.ID IS NULL)

Notice the addition of an “or” condition that makes the join predicate match when key columns are null. This added SQL aligns the database server’s behavior with M’s, as far as null comparisons go.

This is good—but it isn’t the SQL (or behavior) that a typical database developer likely would envision if you told them that two tables are joined on a particular column.

Achieving the Other Behavior

Power Query’s nulls-are-equal behavior may be exactly what you want folded to SQL. In that case, you get the behavior you want out of the box. But what if you want the behavior of how a join is typically implemented in SQL—that is, you want two null values to be considered unequal? You’ll need to manipulate the result set in M to achieve the desired effect.

For example, in the case of left, right and inner joins, simply filter out rows with nulls in the appropriate key column(s)* either before or after the join (*appropriate key columns: key columns from the right table for a left join, key columns from the left table for a right join, and key columns from both tables for an inner join).

let 
  …
  FilteredB = Table.SelectRows(B, each [ID] <> null),
  Result = Table.Join(A, "ID", FilteredB, "ID", JoinKind.Left)
in
  Result

While the SQL generated by query-folding the above may not be as succinct as a simple SQL JOIN statement (instead perhaps incorporating a JOIN that includes the extra OR (A.ID IS NULL AND B.ID IS NULL) clause and one or more WHERE conditions excluding the relevant nulls), the desired net effect is achieved: nulls are considered not equal when determining which rows to join.

Moral of the Lesson

A Power Query join works differently from the stereotypical SQL implementation of a join, as far as null handling goes.

The situation discussed in this article is different from Equals Is Not Always Equivalent: When Query Folding Does Not Produce Identical Results. That post describes how operators or functions folded into native queries may be processed differently by the remote system than they would be if they were evaluated directly by M. The native request’s code looks equivalent to the corresponding Power Query logic; the behavior difference comes from the fact that M and the external source “think” differently.

In contrast, this post deals with a scenario where the data connector generates accommodating native code to align the external source’s behavior with Power Query’s, as far as null handling goes (so behavior is the same, cross systems). The difference here is in the meaning of the term join: A join is being query folded so that it executes on the database server, but what’s being executed on that server is a Power Query join (where nulls are considered equal), not a stereotypical database join (where nulls typically are not considered equal).

Being aware of this subtle difference in the contextual meaning of “join” can help avoid communications confusion when interacting with those from a database background (or when reading articles written from that perspective).

Leave a Reply

Your email address will not be published. Required fields are marked *