Power Query M Primer (Part 23): Query Folding I

, , , , ,

Query folding, by now, is a concept you’re likely already familiar with. In short, Power Query’s query folding takes an M expression and translates part or all of it into the data source’s native query/request language (for example, translating M into T-SQL or into an OData request), then asks the source to execute that native request. You wrote M, but query folding offloaded some portion of your expression’s processing to the external data source by rewriting M logic into the source’s native language.

Query folding is a key concept in the Power Query ecosystem. It brings the potential for extremely significant performance benefits. Thanks to mechanisms like indexes and in-memory paging of data, the data source often can find the data of interest much more efficiently than when raw rows are streamed from the source to Power Query for local filtering. The source may also be able to perform other processing, such as aggregation and joins, again with much better performance than Power Query can locally. In addition to these benefits, offloading execution to the source usually reduces the quantity of data that needs to be sent across the wire to Power Query. For reasons such as these, query folding as much as possible is usually more efficient (and so quite desirable) in contrast to having Power Query internally handle all the processing itself.

Perhaps surprisingly, considering its importance, query folding is not part of the M language. You could write your own M mashup engine which is 100% compliant with the language specification without even knowing that query folding exists. How could such a key Power Query concept not be part—a prominent part!—of the language specification?

It doesn’t need to be.

Let’s take another look at query folding. This time, instead of focusing on what it is or why it is advantageous (topics we’ve touched on in the past—see parts 5 & 12), let’s explore how it works. We’ll do this by looking at the general concepts involved with how folding is implemented using Table.View.

The knowledge we gain should help make folding much less mysterious—which should help us write better queries and debug query folding problems. As a bonus, if we ever decide to try custom connector development or feel the need to override (or augment) an existing connector’s folding, what we learn here should serve as a useful starting place.

Let’s get going!

Series Index

Query Folding Recap

First, a recap. Say you ask for the following M code to be evaluated:

  Source = GetData(),
  Filtered = Table.SelectRows(Source, each [Code] = 50),
  First3 = Table.FirstN(Filtered, 3)

Suppose that the data source is Microsoft SQL Server. Query folding might rewrite the above to something effectively along the lines of the below, which is then handed to the mashup engine for execution.

  Source = Sql.Database("SomeServer", "SomeDB"),
  NativeResults = Value.NativeQuery(
    "SELECT TOP (3)
    FROM SomeTable
    WHERE Code = 50"

The logic you originally expressed as a pair of Table.* function calls was translated into a native T-SQL query. M logic was query folded to T-SQL!

Rewrite Away

Query folding involves an intelligent rewriting of your M expression. Power Query—not the mashup engine itself but another component of the larger Power Query system—takes the expression you asked Power Query to process and examines it. If it decides that a more efficient way to achieve the same result would be to execute something different—specifically something that’s folded to the data source—it adapts (or rewrites) your expression appropriately.

This optimization process occurs after you submit an expression for evaluation but before actual data production takes place.

The Language Is Agnostic

Whether or not query folding takes place, what’s executed by the mashup engine is M logic. This is true whether that logic is verbatim what you wrote or an adaptation produced by an intermediate Power Query process that intercepted and edited your logic. In either case, ultimately, the mashup engine been asked to evaluate M logic, and that’s what it will do its best to do.

In light of this thought, it makes sense that the M language is agnostic to query folding. No special M language constructs are required to make it possible for Power Query to query fold, so there’s no need for the language specification to contain query folding-specific details. (This is good separation of concerns. Why should the language be concerned with something it doesn’t need to be?)

What’s Foldable?

It’s a classic question: “Which methods query fold?” You may have heard others ask it; likely, you’ve wondered about it yourself. Why can’t someone just give you a simple list of the methods that fold?

Turns out, there isn’t one!

Why not? The answer to “which methods can fold?” depends on multiple factors, specifically the:

  • Power Query Version—The version being used defines the list of methods that possibly could fold, and inversely the list of methods that cannot fold. Different Power Query versions may support folding different sets of methods (e.g. perhaps the current version supports folding more than a version from a while back).
  • Data Connector/Data Source—A connector can only fold operations it was programmed to fold. A particular external source may not support every operation that Power Query possibly could fold.
  • Data Set (possibly)—Sometimes, the particulars of the data set, such as its schema, affect foldability.

Take Table.Group as an example. Current versions of Power Query can fold this operation, so it passes the “Power Query Version” test. However, only some data sources can perform a grouping, so the operation may or may not satisfy the “Data Connector/Data Source” criterion.

For example, Microsoft SQL Server has native capabilities to GROUP BY, so it makes sense for its data connector to support folding this operation. In contrast, a basic web API may not offer a way to request that it perform a grouping. If this is the case, having the associated data connector fold Table.Group makes no sense, as the operation can’t be offloaded to the source.

So, when determining what folds, Power Query can’t simply say, “Oh, here’s a Table.Group, I can fold it!” Instead, it needs to check whether the data connector that’s being used supports folding the operation.

When making this determination, the data connector may choose to factor in not just the kind of operation (e.g. it’s a “group by”) but also the specifics of the operation (that is, the parameters to the operation) and even possibly the specific data set being accessed (such as its schema).

Imagine another web API. For a given entity, say the API supports two methods of data access: paging through all records or looking up a single record by its primary key. In line with this, we could imagine that the corresponding data connector might support folding Table.SelectRows, but only if the filtering predicate is an equality test and only if the column being tested is the entity’s primary key. So, assuming the entity at hand is “Orders” with a primary key of “OrderID”, Table.SelectRows(source, each [OrderID] = 123) could fold but Table.SelectRows(source, each [ShipmentID] <= 100) could not fold. In this case, whether Table.SelectRows is foldable doesn’t have a static, fixed answer; rather, the connector dynamically determines that answer based on the specifics of the row filter predicate and the entity’s primary key.

From the preceding, you can see that Power Query must have some kind of mechanism for dynamically interacting with the data connector, as it needs a way to ask the connector whether it can fold specific operations.

To understand this better, let’s start by taking a step back.

Introducing Table Views

Where query folding is in play, our M code interacts with what look like a table, but which really isn’t a table. Instead, what we’re working with is something that can act like a table when it needs to (for example, it can produce table rows on demand) but which also has a mechanism for handling query folding requests.*

(*At least, this holds true in the context of Microsoft’s publicly documented mechanism for implementing custom query folding. How things work when data comes from a connector created using one of their internal-only mechanisms isn’t something I have details on.)

The name for this “table-like something” is a view. It’s often associated with custom data connectors, as it is foundational to implementing them. However, its use isn’t limited to this context, as views can also be created in normal M code.

Learning how views works—in particular, the basics of using a view to implement folding—will vastly enhance our understanding of query folding in general.

Defining a View

A view is built around a set of handlers. Power Query uses these handlers for two main purposes: to make requests, asking the view to configure the kind or shape of data it may later return, and to give commands, telling the view to produce a particular piece of information (like table rows).

The main effort involved with building a view is defining these handlers. However, before we get to handlers, let’s cover the other step involved with creating a view: choosing its mode.

Table.View, the standard library function for creating views, produces views that operate in one of in two modes: “override” or “from scratch” (my terms, not official Microsoft terms).

In “override” mode, the view created by Table.View is used to override, or augment, behaviors of an existing table. The new view defines handlers for the operations it wants to handle; operations not handled by it fall back to the underlying table (at least, that’s the general concept; the rules are a little more detailed on how this works, especially when multiple override-mode views are layered). Thanks to this fallback, there are no minimum required handlers that an “override” mode view must implement.

In contrast, a “from scratch” view is not based on top of an underlying table. Instead, it takes on the responsibility of defining all mandatory table behaviors, as well as any optional ones it wishes to implement.

A view’s mode is determined by Table.View‘s first argument:

  • If it’s a table (including a table that itself is defined by another view), the mode is “override”.
  • If it’s null, the mode is “from scratch.”

Creating a “from scratch” mode view will be our focus.

Minimum Handlers

A “from scratch” view must, at minimum, define handlers for the two fundamental table operations: GetType and GetRows. Without these, the view couldn’t act like a table, so they’re nonoptional in this context.

Handlers are defined using a record which is passed as Table.View‘s second argument. The record’s field names serve as the handlers’ names.

  MyDataSource = Table.View(
      GetType = () => type table [ID = number, Name = text],
      GetRows = () => #table({"ID", "Name"}, {{ 1, "Joe" }, { 2, "Paul" }})

When the above view is accessed (e.g. when MyDataSource in the “in” clause is evaluated), what’s rendered is a table with the type given by GetType() and the rows produced by calling GetRows().

Notice that it took two function calls for Power Query to obtain everything needed to have a complete table experience. The table’s rows are fetched using GetRows, but the type, including the column types, comes from GetType (columns must have the same names and be given in the same order between the two handlers). The beauty of this two-step process is that it allows Power Query to get type information without getting rows, something Power Query often likes to do (as we’ll eventually learn).

Normally, handler GetRows makes a call to an external data source, instead of returning a hard-coded table, as it does above for example’s sake. Also, a real life GetType may call the data source to read schema details dynamically, if they haven’t already been cached by other handlers in the view.

A real-world implementation might look more like:

  MyDataSource = Table.View(
      GetType = () => ComputeTypeByReferencingExternalSource(),
      GetRows = () => ReadRowsFromExternalSource()

Still, as things stand, this is underwhelming. We could have skipped using Table.View altogether and achieved the same effect by simply taking the output from ReadRowsFromExternalSource() and ascribing it with it the type returned by ComputeTypeByReferencingExternalSource().

  Data = ReadRowsFromExternalSource(),
  Type = ComputeTypeByReferencingExternalSource(),
  TypeAscribed = Value.ReplaceType(Data, Type)

The power of Table.View starts to show when we add other handlers to the mix.

Counting Rows

Going back to our last Table.View example, suppose someone wants to count the rows in the rendered table (e.g. do a Table.RowCount(MyDataSource)). As things currently stand, there’s no special handling for row counting, so the operation won’t be folded.

This doesn’t stop the operation from working. Rather, it will be handled internally by M instead of being offloaded to the data source. To compute the count, Power Query will call the view’s GetRows handler, count the rows it returns, then return the total count. This involves all rows being retrieved from the source and processed locally by Power Query.

It would be much more efficient if the source provided an option where it performs the count on the server-side, then simply returns the total.

Let’s imagine that it does offer this, say by changing the API call from https://somewhere/data to https://somewhere/data?count. How would we wire this into the Table.View so that that Table.RowCount operation can be folded instead of being processed locally by Power Query?

You guessed it! By adding another handler:

  MyDataSource = Table.View(
      GetType = () => ComputeTypeByReferencingExternalSource(),
      GetRows = () => ReadRowsFromExternalSource(),
      GetRowCount = () => GetRowCountFromExternalSource() // e.g. by calling https://somewhere/data?count

With this change, the above Table.RowCount invocation no longer leads to Power Query fetching all rows from GetRows and counting them locally. In place of this, handler GetRowCount is invoked. The number it outputs is returned as the count by Table.RowCount. Thanks to this handler, a bit of M logic (the Table.RowCount operation) was translated—a.k.a. query folded!—from M to a native request executed by the external data source. Yay!

To prove that folding really happened, let’s go back to our very first, not-exactly-real-world simple view example and update it to handle GetRowCount. However, here we’ll have GetRowCount return a hard-coded, incorrect value.

  MyDataSource = Table.View(
      GetType = () => type table [ID = number, Name = text],
      GetRows = () => #table({"ID", "Name"}, {{ 1, "Joe" }, { 2, "Paul" }}),
      GetRowCount = () => 1 // purposefully returning 1 even though the table actually contains 2 rows
  Table.RowCount(MyDataSource) // returns 1
  // Table.RowCount(Table.Buffer(MyDataSource)) // uses buffering, which blocks folding of subsequent steps, forcing Power Query to count the rows itself-- returns 2

Notice that Table.RowCount(MyDataSource) returns 1, which is the incorrect, hardcoded value output by handler GetRowCount, proving that this handler was used instead of Power Query actually counting the table’s rows.

If, instead, we force the row count to be performed by Power Query itself (say by preceding Table.RowCount with a call to Table.Buffer, which blocks folding of subsequent operations), a row count of 2 is returned, proving that Power Query didn’t use the view’s GetRowCount handler but instead counted the rows itself.

Configuring What’s Returned

Yay! We now have a glimmer of Table.View‘s potential. Its power grows as it is expanded to support folding more operations. How is this pulled off? You guessed it: by implementing more handlers!

However, unlike GetType, GetRows and GetRowCount, which output either data or details about data, most other handlers have a different purpose: configuring what will be returned (such as its shape), not actually returning data or data-related details.

Take operations Table.FirstN(data, 10) and Table.Skip(data, 5) as examples. Their corresponding handlers are OnTake and OnSkip, respectively.

  MyDataSource = Table.View(
      GetType = () => ComputeTypeByReferencingExternalSource(),
      GetRows = () => ReadRowsFromExternalSource(),
      GetRowCount = () => GetRowCountFromExternalSource(),
      OnTake = (count as number) => …,
      OnSkip = (count as number) => …
  Table.FirstN(Table.Skip(MyDataSource, 5), 10)

Notice the difference in naming? The new handler names start with “On”, in contrast to the “Get” handlers we met previously. These two-name prefixes delineate the two main categories of handlers:

  • Get* handlers return something (like data or details about data).
  • On* handlers configure how a view behaves (with one exception, which we’ll come to later).

While Get* handlers return data or related details, in general it doesn’t make sense for ordinary On* handler to do the same. To configure handling of all foldable operations, Power Query might need to call multiple On* handlers before it’s ready to retrieve the final data set. For example, above, both OnSkip and OnTake will be invoked by Power Query before it calls GetRows to fetch rows from the view.

Having each On* handler return actual data would be premature, resulting in unnecessary work being performed. What then should these handlers return?

A new Table.View that “remembers” what needs to be folded .

In essence, an On* handler needs to capture the details of the folding request, saving these instructions in a way that the new Table.View it returns can access them. This way, if Get* handlers are later called on that new view, those handlers can factor in the saved instructions, as appropriate, when producing data. If, instead, an On* handler is called on the new view, that handler can add to or modify those instructions, as appropriate, when it creates the even newer Table.View that it returns.

In a language where variables are mutable, On* handlers would probably skip returning new views and instead just update an “instructions” variable belonging to the current view. M’s immutability doesn’t allow this. Instead, to reflect new instructions, a new view must be returned that has access to the appropriate, revised instructions as computed by the current handler. How do we pull this off?

We might decide to save folding instructions using a state record. If we did, we could define a function that takes a state record as input and returns a Table.View. The code inside this “create view” function, including any handlers it defines for the view it creates, will have access to the arguments passed to the function. Thanks to closures (see part 22), this holds true even after the “create view” function finishes. In terms of our interests, this means that the handlers defined inside the “create view” function can access the passed-in state record when they are later invoked, even though the “create view” function may have long since finished running.

  GetView = (state as record) => Table.View(
	  GetType = () => ComputeTypeByReferencingExternalSource(state),
	  GetRows = () => ReadRowsFromExternalSource(state),
	  GetRowCount = () => GetRowCountFromExternalSource(state),
	  OnTake = (count as number) => @GetView(state & [Take = count]),
	  OnSkip = (count as number) => @GetView(state & [Skip = count])
  MyDataSource = GetView([])
  Table.FirstN(Table.Skip(MyDataSource, 5), 10)

To create the initial view, the “create view” function (in this case, GetView) is invoked, passing in an empty record as the state.

Inside this function, the Table.View‘s On* handlers are each defined to call the same “create view” function, passing it the current (i.e. existing) state merged with the appropriate new state details related to the handler’s invocation. For example, if OnTake(6) is invoked, a new view will be created and returned whose state is equal to the prior state with [Take = 6] merged onto it.

When, eventually, one or more Get* handlers are invoked, they have access to the latest state, thanks to a closure. So, when GetRows is invoked and calls ReadRowsFromExternalSource, that latter method is given a state record of [Take = 6]. Presumably, it factors this state in when fetching data from the source, limiting the number of rows returns to (at most) six.

To sum up query folding from a view’s perspective: On the view, Power Query interacts with the appropriate On* handlers to configure folding. Each successful handler invocation progressively builds up an internal set of instructions by returning a new view that has access to the appropriately updated details. Eventually one or more Get* handlers are called on the last returned view. These, in turn, access the instruction set that was computed by the handler that created that last view, factoring in those instructions, as appropriate, to produce data.

A bit different than how we typically write Power Query code. Perhaps a bit mind-stretching. Really, a fascinating concept. Manipulating state through a series of function calls without actually ever mutating (i.e. changing) a variable!

[Related: Kinda, Sorta Object-Like Behavior (from part 22)]

Conditional Handling

Recall that we said earlier that an operation’s foldability can depend not just on the type of operation at hand, but also on the parameters to that operation and even on the external data set. These latter factors imply that there must be some mechanism for Table.View to conditionally consider folding an operation.

If we don’t want a view to support folding an operation altogether, we simply don’t provide a handler for it. Here, the situation is different: The operation should sometimes be handled, so the handler needs to exist; instead, when it’s invoked, the handler needs to decide whether or not it will actually handle the folding request. If it decides to reject the request, it needs to communicate this back to Power Query. How is this pulled off?

Simply by having the handler raise an error if it chooses not to handle the request. An error is the way to signal Power Query that a folding request was turned down.

Say we want to support query folding of sort requests, like Table.Sort. OnSort is the relevant handler. When invoked, it’s passed a list of sort detail records, one record per column that should be sorted, with the records in the form of [ Name = "column name to sort", Order = integer] (where Order = Order.Ascending, a.k.a. 0, for ascending and Order.Descending, a.k.a. 1, for descending).

However, there’s a catch. The API of the remote system we’re interfacing with only supports sorting in ascending order. To align with this, our view needs to conditionally handle OnSort: it should reject the folding request if a sort detail record specifies an incompatible sort order.

    OnSort = (sortDetails as list) =>
      if List.MatchesAny(sortDetails, each [Order] <> Order.Ascending)
      then error "Only supports Order.Ascending sorting"
      else @GetView(state & [AscendingSortColumns = List.Transform(sortDetails, each [Column])]),

Above, OnSort first checks whether the sort details contain any records whose sort order is not ascending. If so, an error is raised. If not, a new view is returned with its state record set to include the list of “to sort by” columns.

To sum it up, Power Query’s invocation of a handler is a request asking the view to fold an operation. The view can choose to accept the request, which it indicates by having the handler return the expected output (e.g. a new view, for ordinary On* handlers). Alternately, the handler can reject the request, by raising an error, which means that that particular operation won’t be folded.

Ordinarily, a view’s rejection of a folding request doesn’t signal a stop to all folding; rather, it just denies the current folding request. Responsibility for the associated operation falls back to Power Query, for it to process locally. Thanks to M’s immutability, the previously returned view is still in the same shape as it was when it was created. It can still handle Get* handler calls, and if Power Query were to so choose, other On* handler invocations, as well.

In the case of our last example, say someone tries a sort descending:

Table.Sort(MyDataSource, {"Amount", Order.Descending})

The view’s OnSort handler will error (because it only supports ascending sorts), but this doesn’t stop the sort from happening or the view from being used. Rather, Power Query will fetch rows from the view and then sort them locally.

Direct Query

Direct Query is a special case: this “fall back to local processing” behavior does not occur with it. With Direct Query, both unhandled operations (i.e. when the view does not implement the associated handler) and rejected folding requests (when the associated handler raises an error) result in the entire Direct Query request erroring out. So, with Direct Query, the view is responsible for handling all foldable operations that will be used. Any that the view does not handle will be altogether unusable from Direct Query.

Thankfully, you don’t need to worry about this unless you’re building a Direct Query-enabled custom data connector because (to my knowledge) you can’t define a Table.View for Direct Query outside a connector’s source code.

The Rest of the Story

Our purpose in exploring Table.View is to gain a general sense for how view handlers are used to implement query folding, not to explore their every detail. For completeness, it’s worth noting that there are (as you might expect) a number of other On* handlers that can be implemented. With the exception of OnInvoke, all follow the pattern of returning new views.

The latter is a special case: OnInvoke is a sort of “catch all jack of all trades” used to enable folding support for certain methods that don’t map to specific On* handlers. As such, it can return a new view (like other On* handlers) or actual data values (unlike other On* handlers).

[Related: Custom Folding Arbitrary Functions: OnInvoke & Table.ViewFunction]

Moral of the Lesson #1

Hopefully you now understand the complexity involved with answering the question “which methods fold?” From the outside, it may seem like it should be an elementary question to answer, but that semblance of simplicity evaporates after exploring the dynamic interactions that occur between Power Query and data connectors.

To answer “which methods fold?” in a nutshell: There’s no fixed list of methods that always fold. The list of potentially eligible methods can change based on the Power Query version being used. A potentially foldable method, when not blocked by a preceding non-folded method, possibly can fold—but whether it does is up to the data connector. When making its decision, the data connector may choose to factor in the arguments to the particular operation as well as the associated data set.

Continued Next Time

This isn’t the only moral to learn from query folding, but our exploration is long enough for this installment. Next time, let’s pick back up, building more knowledge of how query folding works and gleaning more practically applicable insights that will help us work more effectively with Power Query.

For a glimpse of what’s to come: Query folding may result in the view communicating with the data source more than you might intuitively expect. Why? Stay tuned for next time….

Until then, happy data mashing up!

1 thought on “Power Query M Primer (Part 23): Query Folding I

  1. Alex Groberman

    Fantastic, thank you very much for another entry in this valuable blog series. Just had a practical use for Table.View today, where hard-coding GetType in Table.View drastically sped up query editing on a PBI Data Flow.

    Thanks again,



Leave a Reply

Your email address will not be published. Required fields are marked *