Exploring Power Query Buffering: How Table.Buffer and List.Buffer Work

, , ,

Table.Buffer and List.Buffer buffer data—but how do they work? How deeply do they buffer? How do they handle errors? And, for that matter, when do they populate?

In a nutshell: Table.Buffer creates a stable copy* of a table’s rows. These rows are fetched only once from the source, regardless of how many times they are read from the buffer. Each time the buffer is accessed, the same* rows are returned in the same order. With List.Buffer, the behavior is identical, except it is list items that are buffered instead of table rows. (*But this may not mean what you think, so keep reading.)

The details have some nuances to them. Let’s explore them, as well as what happens when errors are encountered.

Our focus today is imparting an understanding of how these two common buffer functions work. When to use them is a different question, one we’ll leave for other articles to explore.

Stable Rows/Stable Items

Today’s journey starts by zooming down to focus on how a single value is buffered by these functions….

Scalar Values

For simple values, like text, dates or a number, the idea of the same value being returned each time from the buffer is straightforward.

let
  ComputeValue = () => /* some complex logic returning: */ 25,
  List = { ComputeValue() },
  Buffered = List.Buffer(List) 
    // results in a buffer the equivalent of: 
    // { 25 }
in
  Buffered

Above, when the buffer is populated, the source list is enumerated. When its first item is encountered, that item’s expression is evaluated. ComputeValue() performs its complex logic and outputs 25. This value is saved into the list buffer. Then, for the lifespan of the buffer, that saved value will be returned as the buffered list’s first item, without CompleteValue() being called again.

Probably exactly how you expected the buffer to work.

In the case of a table row containing only scalar values, the behavior is similar: When the row is encountered during buffer population, its column values are computed and buffered. The original row and column expressions will not be evaluated again when the row and its values are read from the buffer.

let
  ComputeValue = () => /* some complex logic returning: */ 25,
  ComputeRow = () => /* some complex logic evaluating to: */ { ComputeValue() },
  Table = #table({ "ColA" }, { ComputeRow() }),
  Buffered = Table.Buffer(Table) 
    // results in a buffer the equivalent of: 
    // #table({ "ColA" }, {{ 25 }})
in
  Buffered

For a records—as well as for all other values—the behavior is exactly the same, and nothing more. That is, Table.Buffer and List.Buffer shallow buffer: they do not recursively descend into a “complex” value’s constituent components in order to buffer them.

Records

In M, a record’s fields are evaluated lazily. The first time a field is accessed, its value is computed. Table.Buffer and List.Buffer do not change this behavior. When the below record (the first and only item in the list) is buffered, the process of buffering does not cause the record’s field expressions to be evaluated.

let
  ComputeValue = () => /* some complex logic returning: */ 25,
  ComputeRecord = () => /* some complex logic evaluating to: */ [A = ComputeValue()],
  List = { ComputeRecord() },
  Buffered = List.Buffer(List) 
    // results in a buffer the equivalent of: 
    // { [A = ComputeValue()] }
in
  Buffered

When a table column contains a record, Table.Buffer behaves the same way: The record itself will be buffered but its field expressions left untouched.

In both cases, the buffered record behaves exactly like it would if it hadn’t been buffered. If and when a field value is needed for the first time, that value will be computed. Buffering does not change the record’s lazy behavior.

Streamed Values

In M, table, list, and binary values are streamed. As such, these values don’t necessarily contain materialized rows, items or bytes (respectively); rather, they only need to know how to produce rows, items or bytes on demand when they are requested to do so. (For more on streaming, see Power Query M Primer (Part 12): Tables—Table Think I.)

Since Table.Buffer and List.Buffer shallow buffer, they do not recursively buffer the contents of nested streamed values.

Below, the table value contained in column NestedTableColumn is buffered. However, this nested table value does not contain materialized rows; rather, it only represents a way to request that those rows be produced on demand. Since Table.Buffer performs a shallow buffer, it does not request the nested table’s rows, so they are not buffered. Even though the table value is buffered, its rows will still be produced on demand each time they are requested because they are unbuffered.

let
  ComputeValue = () => /* some complex logic returning: */ 25,
  Table = #table({"ScalarColumn", "NestedTableColumn"}, {{ ComputeValue(), Table.SelectRows(DbSource, each [Amount] < 25) }}),
  Buffered = Table.Buffer(Table) 
    // results in a buffer the equivalent of: 
    // #table({ "ScalarColumn", "NestedTableColumn" }, {{ 25, Table.SelectRows(DB, each [Amount] < 25) }})
in
  Buffered

The same behavior occurs if a nested column contained a list value or a binary value (the other two type of values that are streamed): the value itself is buffered, but buffering does not enumerate the value’s list items or bytes, respectively, in order to buffer them.

List.Buffer behaves similarly. When this function encounters a list item that is a streamed value (a table, list or binary value), the value is buffered but its contents are left untouched.

Power Query’s Table.Buffer and List.Buffer shallow buffer: they buffer all values, but do not descend into those values to buffer their constituent components (table rows/columns, list items and record field values).

Errors

Not every expression defining a list item, table row or column evaluates to a value. Instead, sometimes those expressions raise an error. Table.Buffer and List.Buffer buffer these errors.

Encountering an item, row or column error doesn’t cause the process of populating the buffer to die. Rather, the error is caught and saved into the buffer:

let
  ComputeValue = () => /* some complex logic evaluating to: */ error "big problem",
  List = { ComputeValue() },
  Buffered = List.Buffer(List) 
    // results in a buffer the equivalent of: 
    // { error "big problem" }
in
  Buffered
let
  ComputeRow = () => /* some complex logic evaluating to: */ error "big problem",
  Table = #table({ "ColA" }, { ComputeRow() }),
  Buffered = Table.Buffer(Table) 
    // results in a buffer the equivalent of: 
    // #table({ "ColA" }, { error "big problem" })
in
  Buffered
let
  ComputeValue = () => /* some complex logic evaluating to: */ error "big problem",
  Table = #table({ "ColA" }, {{ ComputeValue() }}),
  Buffered = Table.Buffer(Table) 
    // results in a buffer the equivalent of: 
    // #table({ "ColA"} , {{ error "big problem" }})
in
  Buffered

When a buffer is accessed, any buffered errors will then be reraised at the exact same point they would have been raised if the original non-buffered list or table were accessed. Buffering does not change the outwardly observable effect of where in the list or table these errors are raised from.

But not every error pertaining to streamed values is an item, row or column error. There is a classification of errors where buffering does change their behavior….

Enumeration Errors

Let’s zoom out from examining how individual list items and table rows are buffered so that we can review the big picture: List.Buffer enumerates all items from its source and saves (shallow buffers) them. Table.Buffer works similarly, enumerating all rows from its source, saving (shallow buffering) them along with their column values.

Seems simple enough…but populating the buffer does have an interesting effect related to a set of errors we don’t often talk about. In fact, the language spec doesn’t even define an official name for these errors. We’ll call them enumeration errors.

Imagine you are standing at the end of a conveyor belt. Next to you is a button which you press whenever you want another box sent down the conveyor to you. Each press of the button represents a request that another list item (for lists) or table row (for tables) be streamed to you.

Inside the box that comes down the conveyor is the expression defining the item or row. Opening the box triggers evaluation of that expression. In turn, this either produces a value or raises an error. These values and these errors are what are buffered by List.Buffer and Table.Buffer.

But errors can also be encountered when you press the button to request the next box. You didn’t open the box and find an error inside; instead, you encountered the error when you were trying to get the next box (which you never received).

Consider the below list. Attempting to “press the button” to request that the third item be streamed dies with an error.

List.Generate(
  ()=> 0, 
  (i) => if i < 2 then true else (error "problem"),
  (i) => i + 1
)

The box never makes it onto the belt so it can’t be streamed to you. You never get to touch or open it. Its item expression never runs. Instead, the attempt at enumerating to it dies with an error.

For an unbuffered list, enumeration errors are raised at the point in the enumeration process where they are encountered. Buffering changes this: Enumeration errors are not buffered, and so not raised at the position where they were encountered. Instead, enumeration errors cause the buffer function itself to die.

Notice the difference below. With the unbuffered list, you can successfully read its first two items, as the enumeration error is not encountered occur until an attempt is made to retrieve the third item. However, when List.Buffer is applied to the list, that function dies. It’s not possible to read just the first two items from the buffer, because no buffer was ever successfully populated.

let
  List = List.Generate(
      ()=> 0, 
      (i) => if i < 2 then true else (error "problem"),
      (i) => i + 1
    )
in
  List.FirstN(List, 2) // returns { 0, 1 }
let
  List = List.Generate(
      ()=> 0, 
      (i) => if i < 2 then true else (error "problem"),
      (i) => i + 1
    ),
  Buffered = List.Buffer(List), // dies here complaining of "problem"
  Result = List.FirstN(Buffered, 2) 
    // List.FirstN is never invoked because computing the value of Buffered raised an error
in
  Result

The equivalent holds true with Table.Buffer: An error encountered when enumerating rows will cause the invocation of Table.Buffer to terminate with that error.

Turns out, the fact that these errors cause the buffer function to die can come in handy in certain nuanced error handling situations. Some enumeration errors are very hard (may even seem impossible) to catch in normal streaming circumstances. By using Table.Buffer or List.Buffer to transition the behavior of these errors so that they cause the buffer function to terminate, they become something that can be handled with straightforward try catch logic.

Triggering Buffer Population

You now know quite a bit about how Power Query’s main buffer functions work, but one last question: When do they populate their buffers?

List.Buffer and, by default, Table.Buffer, eagerly buffer: The buffer is populated when the function is invoked, regardless of whether the buffer’s contents are ever accessed.

Notice that the items in the below list buffer are never accessed; instead, just the type is read off of the buffer. Regardless, the list items are still buffered. The delay encountered before evaluation completes gives evidence that the list is still being enumerated by List.Buffer.

let
  List = List.Generate(
      ()=> 0, 
      (i) => i < 3, 
      (i) => Function.InvokeAfter(() => i + 1, #duration(0, 0, 0, 5))
    ),
  Buffered = List.Buffer(List)
in
  Value.Type(Buffered)

From Power Query’s perspective, the fact that buffering this list produces no benefit is irrelevant. The person who authored the expression asked for the buffering to occur. Power Query obeyed that directive.

The same behavior is the default with Table.Buffer. However, Table.Buffer offers an option to instead only populate if and when the buffered table’s first row is requested for the first time. This can be useful in cases where a base query, which invokes Table.Buffer, is referenced by multiple other queries which don’t all touch the buffer’s contents. When BufferMode.Delayed is used, the buffer will be populated only if needed.

let
  Table = #table(
      { "ColA" }, 
      List.Generate(
        ()=> {0}, 
        (i) => i{0} < 3, 
        (i) => Function.InvokeAfter(() => {i{0} + 1}, #duration(0, 0, 0, 5))
      )
    ),
  Buffer = Table.Buffer(Table, [BufferMode = BufferMode.Delayed])
in 
  Value.Type(Buffer) // buffer's contents not needed so never populated

Note that BufferMode.Delayed is currently only available for Table.Buffer. List.Buffer always eagerly buffers.


Thanks to Curt H. for kindly taking the time to discuss the buffering of errors with me.

3 thoughts on “Exploring Power Query Buffering: How Table.Buffer and List.Buffer Work

  1. Nick Corbin

    Thanks for the insight into what I think are probably lesser known Power Query functions. For a long time I had a seemingly intractable problem with merging data on an index field where the sort order within a subgroup would change in one of the queries during the merge meaning I had incorrect results returned. After much investigation I figured out that using Table.Buffer as the last step in a query prior to merging it solved the problem as the sort order was preserved.

    I would suspect that many users have been in a similar situation and not known how to fix it, or worse assumed everything merged as expected (well you would, wouldn’t you?) and then discovered to their horror issues further down the line.

    Reply
    1. Ben Gribaudo Post author

      Hi Sam, I can’t speak officially for Microsoft, but a couple ideas that come to mind: to prevent buffering more than the user expects and to prevent recursive buffering.

      Imagine you’ve filtered a table of customers down to five rows, then buffer it. Seems like a very small set of data to buffer. However, the customers table comes from a database and so includes a relationship column containing a table of the customer’s orders from the orders table. If `Table.Buffer` deeply buffered, it would “descend” into the orders relationship column and buffer all orders for each of the five customers. Possibly, the user wasn’t expecting to pay the cost of buffering these orders.

      Say that the orders table contains a relationship column pointing to the ordered item’s details in the item master table. Now, buffering the five customers also includes buffering any related item master data. The item master table, in turn, may feature a relationship column containing all orders for the given item, so buffering item master would include all orders across all customers for each of the items that the original five customers ordered.

      Presuming that orders contains a relationship column holding the customers who ordered the respective item, buffering all those orders would then buffer all customers that had ever ordered one of those items. Then, as part of buffering those nested customer tables, all orders any of those customers ever made (for any item) would be buffered.

      Ultimately, the buffering process would never end, because customers has a nested column containing orders, and following that nested column eventually leads to a nested column for customers…and the loop repeats indefinitely.

      Shallow buffering avoids these complexities. 🙂

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *