Equals Is Not Always Equivalent: When Query Folding Does Not Produce Identical Results

, , , ,

Query folding is supposed to be transparent, as far as results go. Whether or not a Power Query expression is folded should have no effect on the data returned. You should receive back identical results either way. At least, that’s the theory.

Unfortunately, this is not always the case!

The fact that query folding sometimes changes the results that are returned can bite unexpectantly. You have an M expression that produces exactly what you want. Then you make what should be an innocuous edit, but behind the scenes the change affects whether or how the query is folded. The results you now receive back are no longer what you expect, and puzzlingly the divergence seems to have no obvious relation to your edit. Or, maybe you didn’t edit anything at all: instead, a Power Query update changed the foldability of your query without you touching it. You made no changes, yet the data returned is now different.

Say your expression contains the filter:

Table.SelectRows(Source, each [FirstName] = "Joe")

If the query is executed completely in M (no query folding), you receive back.

FirstName
---------
Joe

But if the query is folded, you might instead see:

FirstName
---------
joe
Joe

The pain from the mysterious change in results can be bad. Very bad. You’re left burning time trying to solve this quandary, while business is troubled at the unreliability of the associated report (maybe they’re questioning Microsoft Power BI’s or Microsoft Excel’s quality, or maybe they’re wondering about your competence).

Why the Difference?

Query folding involves the data connector translating your M logic into a native query/request, then the external data source executing that request. Transparent query folding relies on the combination of these two activities producing a result identical to what the mashup engine would produce if it executed your M logic directly (i.e. if no folding occurred).

In the case of our example, the SQL Server data connector translated the M filter Table.SelectRows(Source, each [FirstName] = "Joe") into the following SQL: where [_].[FirstName] = 'Joe'. Visually, the M expression and corresponding SQL appear equivalent; the difference in behavior stems from the rules the two systems use determine equality.

In the database world, the rule sets defining how textual comparisons are performed are known as collations. Often—at least, in the Microsoft SQL Server world—database columns have a case-insensitive collation, which means that the first names “joe” and “Joe” will both match the above SQL WHERE filter. In contrast, Power Query’s = operator performs a case-sensitive comparison between strings: [FirstName] = "Joe" will match a first name of “Joe”, but not of “joe”.

The difference between how M and the database determined equality led to the difference in results encountered in our example.

While our example features an equality comparison, query folding differences aren’t exclusive to checking whether two values are equal. These differences can occur with any operation or function that is folded.

Where’s the Bug?

At first glance, it may seem logical to conclude that something somewhere is broken. While it’s possible there’s a bug in the data connector or a configuration problem on the data source, it’s also possible–and likely–that the change in results isn’t due to a flaw, per se, but rather stems from the fact that two systems are involved which each “think” differently.

Imagine a web API that accepts search parameters as query string arguments, which it processes in a case-insensitive manner.

GET SomeEndPoint?FirstName='Joe'

If a Table.SelectRows like the one we saw previously is folded into a native request to this API, there’s no option to convey that the equality comparison should be case-sensitive. (This lack isn’t necessarily a shortcoming of the API. It may be that in its context a case-sensitive search isn’t normally relevant so isn’t offered as an option.)

The Power Query data connector developer for this API is faced with a choice: Don’t support folding equality comparisons to query parameters, or support folding these as best as possible, even though the net effect will deviate from how M would behave.

Choosing to not support folding means that processing a Table.SelectRows filter like the example’s necessitates pulling all results from the API endpoint then filtering them locally in Power Query. This can involve a vast quantity of otherwise unneeded data being fetched (e.g. say there are millions or billions of rows available from the endpoint–even if only a few match the Table.SelectRows filter, all still need to be retrieved locally by M). On the other hand, supporting folding means that the external source can send back a much smaller set of data (because it was filtered at the source); however, this data set may not match exactly what M would have come up with, had the filtering been done locally.

Ultimately, it’s a trade-off: purity vs. performance. In the Power Query world, leaning somewhat towards performance is the accepted practice for query folding. Perhaps surprisingly, this usually works out pretty well…except when it doesn’t.

Can the Pain Be Resolved?

Now that you have an understanding as to the cause of the problem, the question comes, “what can you do to fix it when it occurs?”

If you are bit by a results difference due to a switch from folded to not folded (so the generated native query did what you wanted, even though its behavior technically was different from the logic in your M expression), then the change has brought to your attention the fact that your Power Query logic does not capture the entirety of what you want done. It could be argued that your M code has a bug in it which was masked by query folding. With folding no longer occurring, the bug became visible.

The solution is simple: rewrite the relevant M expression so that it aligns with the behavior you want, which “happens” to be how the external source was behaving. In our example, this would involve changing Table.SelectRow‘s case-sensitive equality check to a case-insensitive check.

On the other hand, if the pain comes from something previously not folded now being folded (so your M expression does what you want, but the generated native query does not), things get trickier. The problem isn’t with your M code; it’s due to something going on behind the scenes–so there’s not a straightforward fix, just workarounds.

You could insert a step that interrupts folding prior to the M expression affected by the problem. This way, the affected expression will be processed in M, not by the external data source. However, this has the downside that you lose the benefits of offloading processing to the external source (which query folding gives you automatically). To preserve these, you could instead use a hand-crafted native query which you have Power Query execute (perhaps by using Value.NativeQuery). With this approach, exactly what you want is executed–with consistent, expected behavior and remotely.

Hand-Crafted Beats Auto-Generated (at least, sometimes)

That last solution raises an interesting question: In those cases where you can hand-craft a native query whose behavior aligns with M’s when the data connector’s auto-generated native request does not, doesn’t this indicate a flaw in the connector–that is, it could and should have generated a better query? Maybe. It could also be that you were able to hand-craft your query to the exact particulars of the situation but what you wrote doesn’t extrapolate out to a universally-applicable, general solution.

For example, changing the first example to use a hand-crafted native query clause of where [_].[FirstName] = 'Joe' collate Latin1_General_100_CS_AI produces results that align exactly with the behavior of Table.SelectRows(Source, each [FirstName] = "Joe").

Does this mean that simply adding “collate Latin1_General_100_CS_AS” to every equality comparison in a WHERE clause will align its behavior with M? No. That bit of SQL worked in the example’s situation, but isn’t guaranteed to make all equality comparisons align with M’s behavior (in fact, using it to compare a different set of strings might even cause a larger deviation from the desired M-equivalent behavior).

Unless a particular native query bit is generally applicable, a data connector developer may be inclined not to use it. In contrast, when you hand-write a query for a particular situation, you just need to make sure it works in your particular case. So, the ability to write a “transparent behavior” native query where the data connector couldn’t doesn’t necessarily mean the connector is flawed.

Conclusion

Query folding is powerful, but comes with complex factors to be aware of. One of them is its lack of guaranteed transparency. While we may not be able to change this reality, by shedding light on it, when it bits, hopefully you now have an understanding of what is going on (and why), as well as a starting place on how to resolve the unexpected variation in results.

Leave a Reply

Your email address will not be published. Required fields are marked *