Power Query M Primer (Part 23): Query Folding I

, , , , ,

Query folding, by now, is a concept you’re likely already familiar with. In short, Power Query’s query folding takes an M expression and translates part or all of it into the data source’s native query/request language (for example, translating M into T-SQL or into an OData request), then asks the source to execute that native request. You wrote M, but query folding offloaded some portion of your expression’s processing to the external data source by rewriting M logic into the source’s native language.

Query folding is a key concept in the Power Query ecosystem. It brings the potential for extremely significant performance benefits. Thanks to mechanisms like indexes and in-memory paging of data, the data source often can find the data of interest much more efficiently than when raw rows are streamed from the source to Power Query for local filtering. The source may also be able to perform other processing, such as aggregation and joins, again with much better performance than Power Query can locally. In addition to these benefits, offloading execution to the source usually reduces the quantity of data that needs to be sent across the wire to Power Query. For reasons such as these, query folding as much as possible is usually more efficient (and so quite desirable) in contrast to having Power Query internally handle all the processing itself.

Perhaps surprisingly, considering its importance, query folding is not part of the M language. You could write your own M mashup engine which is 100% compliant with the language specification without even knowing that query folding exists. How could such a key Power Query concept not be part—a prominent part!—of the language specification?

It doesn’t need to be.

Let’s take another look at query folding. This time, instead of focusing on what it is or why it is advantageous (topics we’ve touched on in the past—see parts 5 & 12), let’s explore how it works. We’ll do this by looking at the general concepts involved with how folding is implemented using Table.View.

The knowledge we gain should help make folding much less mysterious—which should help us write better queries and debug query folding problems. As a bonus, if we ever decide to try custom connector development or feel the need to override (or augment) an existing connector’s folding, what we learn here should serve as a useful starting place.

Let’s get going!

Continue reading

Custom Folding Joins: Tunneling State Between Table.Views

, , ,

“Handling joins doesn’t look too bad,” you think as you examine the handler’s signature. You’re building out a Table.View (perhaps as part of a Power Query custom data connector), expanding its query folding capabilities by implementing additional handlers. OnJoin is up next.

OnJoin = (joinSide, leftTable, rightTable, joinKeys, joinKind) => ...

Its arguments look straightforward, mostly a mirror of what’s passed to Table.Join. Only one parameter, joinSide, is unfamiliar, but is easy to figure out.

All is well, that is, until you start writing code. You’re stopped by a brick wall.

Continue reading

Privacy Level Descriptions Revised

, , ,

Last week, Microsoft Power BI’s privacy level descriptions were revised.

Previously, Microsoft documentation implied that the data protection firewall imposed unilateral, bidirectional isolation between data sources having incompatible privacy levels. In actuality, the direction of data flow is factored in when determining whether data folding can take place between sources. Privacy level documentation now reflects this.

Continue reading

Custom Folding Arbitrary Functions: OnInvoke & Table.ViewFunction

, , , ,

You are working happily away on a Power Query custom data connector (or maybe on a standalone Table.View). Implement OnTake and OnSkip handlers? Check. Implement OnSelectColumns? Check. And on your journey goes, adding functionality by coding up new handlers—that is, until you realize you want to handle folding for a function where there doesn’t seem to be a corresponding handler.

Maybe it’s a standard library function like Table.ApproximateRowCount. Your data source provides a shortcut way to fetch a close-to accurate count and you’d like users of your custom connector to be able to retrieve that count using the standard function that already exists for this purpose. The catch? There’s no publicly documented GetApproximateRowCount handler which you can handle.

Instead, maybe it’s a custom function. Perhaps your data source maintains snapshots of historic data. Your connector users would like to be able to fetch data as it existed at a user-selected point in time by doing something like MySource.AsOf(someTableFromMySource, someDateTime). For this to work efficiently, that method needs to be foldable, but there’s no OnMySourceAsOf handler provided by Microsoft. Are you stuck or is there a way to fold custom methods?

In either case, the challenge is that you want to fold something that doesn’t have a specific handler.

The solution?

Continue reading

Resilient Relative Column Reordering

, , ,
Screenshot showing the "Move > To Beginning" option in Query Editor

There’s this one column you’d always like to appear leftmost in the table. No problem! In Query Editor, you right-click on the column and choose Move > To Beginning, which generates a “Reordered Columns” step for you.

All is well, until down the road when you remove a different, seemingly unrelated column from the table. Your Power Query refreshes start failing, complaining that the removed column is not found.

Column not found error message

You dig into the problem and find that the reordered columns step that Query Editor generated included a hard-coded reference to the now-removed column. To get things working again, you must hand edit this step’s M expression, manually removing the problematic column reference.

Why? Why did you need to remove (by hand, nonetheless!) a reference to a column that you didn’t consciously put there—a reference to a column whose position you didn’t ask Query Editor to reorder?

Relative-less (how sad!)

As it turns out, Table.ReorderColumns—the function that powers Query Editor’s reorder columns feature—does not support relative reordering. Conceptually, you wanted a column reordered to be the table’s first column, but Table.ReorderColumns doesn’t provide a way to simply say “make this one column leftmost”. Instead, any column reordering performed in the UI generates a function call to that method where it’s passed a list of all columns in the table, each in their desired order.

#"Reordered Columns" = Table.ReorderColumns(Source,{"ID", "FirstName", "LastName"})

If one of these columns is later removed, Query Editor doesn’t automatically update the passed in column list, so your code breaks. Ouch! In contrast, adding a new column to the table doesn’t cause Table.ReorderColumns to fail, but this doesn’t mean the experience is painless: the presence of the new column may bump the column you wanted leftmost out of that position.

It would be nice to eliminate these pain points.

Make Do…

Table.ReorderColumns has an optional third argument which can be set to MissingField.Ignore. This suppresses the missing column name error, which keeps the function working even though the column is gone. While this works, it leaves the deleted column’s name in code (code clutter = undesirable). It also doesn’t guarantee that the column you want on the left will stay there when new columns are added to the table.

Surely there’s a better way to do relative reordering….

…Or, Do It Nice!

Let’s see. The pain point is the hard-coded column list that’s passed to Table.ReorderColumns. We’re M code writers. Why don’t we use code to dynamically compute that list and perform the reorder?! We could craft a function that takes a list of just those columns we want leftmost, which then dynamically fetches the table’s current column list and adjusts its order appropriately before passing the result to Table.ReorderColumns.

Something like the below (which includes the bonus feature of also supporting rightmost relative ordering):

let
  Function = 
    (data as table, columnsToOrderLeft as list, optional columnsToOrderRight as list) as table => 
    let
      CurrentOrder = Table.ColumnNames(data),
      ReorderLeft = columnsToOrderLeft,
      ReorderRight = columnsToOrderRight ?? {},
      OrderedColumnsRemoved = List.RemoveItems(CurrentOrder, ReorderLeft & ReorderRight),
      NewOrdering = ReorderLeft & OrderedColumnsRemoved & ReorderRight,
      Reordered = Table.ReorderColumns(data, NewOrdering)
    in
      Reordered,
  FunctionType = 
    type function 
      (
          data as table,
          columnsToOrderLeft as (type {text}), 
          optional columnsToOrderRight as (type {text})
      ) 
      as table
      meta [
        Documentation.Name = "TableRelativeReorderColumns", 
        Documentation.LongDescription = "Returns a table from the input <code>table</code>, with the columns in <code>columnsToOrderLeft</code> appearing leftmost in the order given and the columns in <code>columnsToOrderRight</code> appearing rightmost in the order given. Other columns will not be reordered."
      ],
  Ascribed = Value.ReplaceType(Function, FunctionType)
in
  Ascribed

No more need for a hardcoded list of all column names. No more code clutter when MissingField.Ignore is used and a column is removed. Columns stay in the expected relative order even when new columns are added.

let
  TableRelativeReorderColumns = (code from above),
  Source = ...,
  Reordered = TableRelativeReorderColumns(Source, {"ID"})
in
  Reordered

Hope this helps!

Relationship Columns and Their Names

, ,

Have you ever stopped to think about relationship columns: how they work, when they’re automatically added, and in particular how they’re named?

On that last point: Did you know there is a latent danger where seemingly unrelated changes can break existing M code?

What Is a Relationship Column?

In a nutshell, a relationship column is an automatically added nested join between the table you’re working with and a related table. In the relationship column, for each row, there’s a nested table containing the associated rows from the related table. Thanks to M’s laziness, if the nested join isn’t used, fetching the related table’s row data will be skipped—so the presence of a relationship column whose values are unneeded does not incur an appreciable cost.

Continue reading

Describing a Function’s Record Parameters

,

Did you know that Power Query provides a way to define field-by-field details for a function’s record parameters?

You may already be familiar with how metadata can be used to provide descriptive details about a function and its parameter list. When provided, Query Editor uses this information to enhance the function invocation forms it generates and the IntelliSense it displays.

However, for record parameters, what’s not so well known is how to go beyond simply describing the basic details of a parameter to detailing its expected shape—such as the list of fields the record may (or must) contain, a friendly name for each field and even the allowed values on a per-field basis.

Pulling this off is easy!

Continue reading

Dataverse Web API Tip #13: Templates for New Entities

When creating a related record in Microsoft Dynamics/Power Apps, ever notice how the “quick create” form for the new entity may be displayed with some fields prepopulated?

Screenshot of Power Apps contact quick create form

For example, on an account, go to related contacts and click “New Contact.” By default, the create contact form that pops up shows account, phone and address pre-filled in with data from the account. The user can leave these values untouched or edit them, at their preference, before clicking “Save” to actually create the new entity record.

If you’re working programmatically—say using the Dataverse Web API—and would like to prepopulate fields like this from another entity, how would you go about pulling this off?

You could maintain your own mappings, like “When creating a contact for an account, set fields x, y and z on the new contact to the corresponding values from the related account.” While technically valid, this requires you to own the responsibility of configuring and storing these mappings. What if, instead, you could simply use the exact same template that the Dynamics/Power Apps UI uses?

Continue reading

Always Identical Outputs? Switch Statement vs. Switch Expression

, ,

Are these two statements identical in what they return?

public object Switcher(bool flag) 
{
  switch (flag) {
    case true:
      return 1;
    case false:
     return 10.5;
  }
}
public object Switcher(bool flag) 
{
  return flag switch {
    true => 1,
    false => 10.5
  };
}

No.

TestResult –
Switch Statement
Result –
Switch Expression
Assert.Equal(1, x.Switcher(true));PassFail – Error Message:
Assert.Equal() Failure
Expected: 1 (System.Int32)
Actual: 1 (System.Double)
Assert.Equal(10.5, x.Switcher(false));PassPass

Scratching your head trying to find the difference? The logic in both looks identical. Arguably, it is.

Continue reading