Tag Archives: Microsoft Excel

Zero Rows Can Bite (part 2): The Mysterious All-Null Row

, , ,

The table you fetch from a web API mysteriously contains an unexpected row with a null value in each column. You manually try the API using a tool like Postman or Insomnia and don’t find any all-null objects in the raw response. Where is this null table row coming from?

Zero rows (again!).

Previously, we dug into refreshes mysteriously dying with the complaint that “column ‘Column1’ of the table wasn’t found” even though no M code or data source schema changes had occurred. Upon investigation, we learned that an insufficiently in some fetch data M code results in it outputting zero columns when the web API returns no rows, which breaks later code that expects the presence of specific columns.

This time, zero rows is again the trigger condition, though it’s not zero rows altogether. Instead, it’s when a web API that returns paged responses returns a page containing no rows. Receiving back an empty page is a real-world possibility. For example, the last page of a response might contain zero rows because the rows that were to have been in it were deleted just moments ago, after the preceding page was fetched.

A common pattern for processing paged responses is to read the various pages into a list, then turn that list into a table, which is then expanded out into the appropriate rows and columns. However, the implementation of this flow sometimes leaves a corner case unaccounted for which leads to the all-null row being present. Unfortunately, such an oversight is present in Table.GenerateByPage (a function commonly used by custom connectors).

Continue reading

New M Feature: Structured Error Messages

, , ,

Why Structured Error Messages?

In the real world, errors are a part of life. If you access and read data from real, in-production systems, sooner or later you will almost certainly encounter errors. While you may be unable to escape their unfortunate reality, at least in the Power Query world, they’re rendered out in an easy-to-read format:

Expression.Error: Bad code 'ABC', problem 'too short'

Easy to read, that is, if you are a human, reading just one error all by itself.

But what if you’re trying to analyze a collection of error messages? Imagine a set of errors like the above, but which are for a variety of different codes and problems (e.g. bad code ‘A235’, problem ‘must contain at least 2 letters’, bad code ’15WA’, problem ‘cannot start with a number’, etc.).

Let’s say you want to summarize these errors, reporting out the count of errors per problem, per bad code. Manually reading errors one at a time no longer cuts it. Instead, you could write code that parses each error message, extracting the text between the phrase bad code ‘ and the following quote character, and between problem ‘ and the following quote character. With the code and problem statement now separately captured, you can use their values to group by or otherwise compute the desired summaries.

Parsing log messages like this this involves coding work. Not only does it take effort on your part, but it is also tricky to get right. For example, the logic described above finds the end of each string it matches by looking for the next quote character. What if a bad code or problem description includes a quote character? The logic we’ve been considering won’t match the entire value. Say, the message starts with Bad code ‘ABC’DEF’. The above logic will miss the second portion of the code (only capturing ABC, not the full ABC’DEF) because it incorrectly assumes that a bad code will never contain a quote. You could address this by writing more robust parsing code, but that’s more work—and this is only one example of the corner cases you may need to handle to accurately parse a family of log messages.

On the other hand, maybe your interest is not analyzing log message parameters, but rather removing them altogether. For data privacy or security reasons, you want sanitized log messages, where parameter values have been stripped out and replaced with generic placeholders. This way, “clean” log messages can be aggregated or retained long-term without the complications that accompany storing PII or other confidential information that may have found its way into error message parameters. While this may be the opposite of our first scenario (extracting message parameters for analysis purposes), implementing it still requires a technical means to differentiate between the base log message pattern (or template) and the parameters that have been filled into it. If you’re implementing this yourself, you’re looking at some form of log message parsing.

In either case, if only there was a way to avoid the effort and complexity associated with writing log message parsing code….

Introducing M’s Structured Error Messages

Meet M’s new structured error message capabilities!

M’s error functionality has recently been expanded to offer a new way of defining error messages, splitting message definition between a template and a list of parameter values. These components are preserved with first class representation in the error after it is raised, enabling error handling code (and, potentially by extension, external logging mechanisms and log analytics tools) to separately work with these components without the need for custom text parsing. This style of error message is known as a structured error message and is key to making structured logging possible.

Continue reading

The Elusive, Uncatchable Error?

, , , ,

The error seems to escape catching. When the expression is evaluated, Query Editor displays the error. Try loading the query’s output into Microsoft Power BI or Excel and the operation dies with an error. Clearly, there’s an error—but if you wrap the expression with a try, the error isn’t caught! To the contrary, the record output by try reports HasError = false, even though if you access that record’s Value field, Query Editor again shows the error.

What’s going on?! Have you discovered an uncatchable error? Is this a Power Query bug?

Continue reading

Zero Rows Can Bite (part 1): The Mysterious Missing Column

, , , , ,

Your Power Query expression is happily skipping along, fetching data from a web API. Happily, that is, until one day its refreshes mysteriously die with the complaint that “column 'Column1' of the table wasn't found“. You haven’t changed any M code. You’ve verified that no schema changes occurred on the external web service. How, then, could a column come up missing?

Power Query error message - Expression.Error: The column 'Column1' of the table wasn't found. Details: Column1

Or, maybe it’s not a missing column, but rather your fetch data code starts outputting an unexpected table row with a null value in each column. You manually try the web API using a tool like Postman or Insomnia and don’t find any all-null objects in the API’s response. Where is this all-null table row coming from?

Both of these unexpected occurrences potentially stem from the same underlying cause. As common M code patterns tend to not properly handle this situation, it is possible (even probable!) that M code you use may leave you susceptible to being bitten by one or the other of these bugs.

Continue reading

Combining Query String Parameters—a.ka. Inclusive Record Field Combining

, ,

Your expression is building a query string for use in a Web.Contents call. Different parts of your code each separately compute parameters that should be included in the final query string. These are provided as “query fragment” records, which all need to be combined into a single, consolidated record that’s then passed to Web.Contents:

ProductCodeFragment = GetProductCodeQueryFragment(), // might return [productId = 123]
LimitFragment = GetLimitQueryFragment(), // might return [limit = 100]
FinalQueryParams = (single, consolidated record containing query parameters from all fragment records, such as ProductCodeFragment and LimitFragment),
Result = Web.Contents("some-url", [Query = FinalQueryParams]

Using Power Query’s built-in functionality, combining the fragment records into a consolidated record is easy, so long as the fragment records each have different fields. If they do, a record merge can be performed using the combination operator (&) or the records can be combined using Record.Combine, both of which produce the same output.

[productId = 123] & [limit = 100] // outputs [productId = 123, limit = 100]]
Record.Combine({[productId = 123], [limit = 100]}) // outputs [productId = 123, limit = 100]

The challenge comes if multiple records contain the same fieldsay, one fragment record contains [productId = 123] while another contains [productId = 456]. Record.Combine and & are exclusive in how they compute the field values they output. When the same field name is present in multiple input records (e.g. both input records contain field productId), the value for that field that’s output will be the value from the last/right-most input record (in this case, productId = 456). The other input record value(s) for that field will be ignored (so in this case, productId = 123 is ignored).

// notice that productId 123 is *not* included in the outputs
[productId = 123] & [productId = 456] // outputs [productId = 456]
Record.Combine({[productId = 123], [productId = 456]}) // outputs [productId = 456]
Continue reading

Power Query M Primer (Part 24): Query Folding II

, , , , ,

Last time, we began a deep dive into the inner workings of query folding. We examined how you can implement foldability using Table.View, ending with a firm grasp on why answering the question “what functions fold?” isn’t simple, but rather depends on the Power Query version, the data connector and possibly even some combination of the particular operation’s parameters and the data set being accessed.

But this isn’t the only “moral of the lesson” to be gleaned from our query folding deep dive….

As part of processing an expression, do you think Power Query communicates just once with each external source? For that matter, does Power Query process your expression verbatim and exactly one time? On query folding: Is it guaranteed to be transparent, producing identical results regardless of whether an expression is processed locally by Power Query, partly folded to source or fully folded to source?

These questions, and their answers, will lead us to more morals to be learned from this continuation of our lesson on query folding!

Continue reading

Power Query M Primer (Part 23): Query Folding I

, , , , ,

Query folding, by now, is a concept you’re likely already familiar with. In short, Power Query’s query folding takes an M expression and translates part or all of it into the data source’s native query/request language (for example, translating M into T-SQL or into an OData request), then asks the source to execute that native request. You wrote M, but query folding offloaded some portion of your expression’s processing to the external data source by rewriting M logic into the source’s native language.

Query folding is a key concept in the Power Query ecosystem. It brings the potential for extremely significant performance benefits. Thanks to mechanisms like indexes and in-memory paging of data, the data source often can find the data of interest much more efficiently than when raw rows are streamed from the source to Power Query for local filtering. The source may also be able to perform other processing, such as aggregation and joins, again with much better performance than Power Query can locally. In addition to these benefits, offloading execution to the source usually reduces the quantity of data that needs to be sent across the wire to Power Query. For reasons such as these, query folding as much as possible is usually more efficient (and so quite desirable) in contrast to having Power Query internally handle all the processing itself.

Perhaps surprisingly, considering its importance, query folding is not part of the M language. You could write your own M mashup engine which is 100% compliant with the language specification without even knowing that query folding exists. How could such a key Power Query concept not be part—a prominent part!—of the language specification?

It doesn’t need to be.

Let’s take another look at query folding. This time, instead of focusing on what it is or why it is advantageous (topics we’ve touched on in the past—see parts 5 & 12), let’s explore how it works. We’ll do this by looking at the general concepts involved with how folding is implemented using Table.View.

The knowledge we gain should help make folding much less mysterious—which should help us write better queries and debug query folding problems. As a bonus, if we ever decide to try custom connector development or feel the need to override (or augment) an existing connector’s folding, what we learn here should serve as a useful starting place.

Let’s get going!

Continue reading

Custom Folding Arbitrary Functions: OnInvoke & Table.ViewFunction

, , , ,

You are working happily away on a Power Query custom data connector (or maybe on a standalone Table.View). Implement OnTake and OnSkip handlers? Check. Implement OnSelectColumns? Check. And on your journey goes, adding functionality by coding up new handlers—that is, until you realize you want to handle folding for a function where there doesn’t seem to be a corresponding handler.

Maybe it’s a standard library function like Table.ApproximateRowCount. Your data source provides a shortcut way to fetch a close-to accurate count and you’d like users of your custom connector to be able to retrieve that count using the standard function that already exists for this purpose. The catch? There’s no publicly documented GetApproximateRowCount handler which you can handle.

Instead, maybe it’s a custom function. Perhaps your data source maintains snapshots of historic data. Your connector users would like to be able to fetch data as it existed at a user-selected point in time by doing something like MySource.AsOf(someTableFromMySource, someDateTime). For this to work efficiently, that method needs to be foldable, but there’s no OnMySourceAsOf handler provided by Microsoft. Are you stuck or is there a way to fold custom methods?

In either case, the challenge is that you want to fold something that doesn’t have a specific handler.

The solution?

Continue reading

Resilient Relative Column Reordering

, , ,
Screenshot showing the "Move > To Beginning" option in Query Editor

There’s this one column you’d always like to appear leftmost in the table. No problem! In Query Editor, you right-click on the column and choose Move > To Beginning, which generates a “Reordered Columns” step for you.

All is well, until down the road when you remove a different, seemingly unrelated column from the table. Your Power Query refreshes start failing, complaining that the removed column is not found.

Column not found error message

You dig into the problem and find that the reordered columns step that Query Editor generated included a hard-coded reference to the now-removed column. To get things working again, you must hand edit this step’s M expression, manually removing the problematic column reference.

Why? Why did you need to remove (by hand, nonetheless!) a reference to a column that you didn’t consciously put there—a reference to a column whose position you didn’t ask Query Editor to reorder?

Relative-less (how sad!)

As it turns out, Table.ReorderColumns—the function that powers Query Editor’s reorder columns feature—does not support relative reordering. Conceptually, you wanted a column reordered to be the table’s first column, but Table.ReorderColumns doesn’t provide a way to simply say “make this one column leftmost”. Instead, any column reordering performed in the UI generates a function call to that method where it’s passed a list of all columns in the table, each in their desired order.

#"Reordered Columns" = Table.ReorderColumns(Source,{"ID", "FirstName", "LastName"})

If one of these columns is later removed, Query Editor doesn’t automatically update the passed in column list, so your code breaks. Ouch! In contrast, adding a new column to the table doesn’t cause Table.ReorderColumns to fail, but this doesn’t mean the experience is painless: the presence of the new column may bump the column you wanted leftmost out of that position.

It would be nice to eliminate these pain points.

Make Do…

Table.ReorderColumns has an optional third argument which can be set to MissingField.Ignore. This suppresses the missing column name error, which keeps the function working even though the column is gone. While this works, it leaves the deleted column’s name in code (code clutter = undesirable). It also doesn’t guarantee that the column you want on the left will stay there when new columns are added to the table.

Surely there’s a better way to do relative reordering….

…Or, Do It Nice!

Let’s see. The pain point is the hard-coded column list that’s passed to Table.ReorderColumns. We’re M code writers. Why don’t we use code to dynamically compute that list and perform the reorder?! We could craft a function that takes a list of just those columns we want leftmost, which then dynamically fetches the table’s current column list and adjusts its order appropriately before passing the result to Table.ReorderColumns.

Something like the below (which includes the bonus feature of also supporting rightmost relative ordering):

let
  Function = 
    (data as table, columnsToOrderLeft as list, optional columnsToOrderRight as list) as table => 
    let
      CurrentOrder = Table.ColumnNames(data),
      ReorderLeft = columnsToOrderLeft,
      ReorderRight = columnsToOrderRight ?? {},
      OrderedColumnsRemoved = List.RemoveItems(CurrentOrder, ReorderLeft & ReorderRight),
      NewOrdering = ReorderLeft & OrderedColumnsRemoved & ReorderRight,
      Reordered = Table.ReorderColumns(data, NewOrdering)
    in
      Reordered,
  FunctionType = 
    type function 
      (
          data as table,
          columnsToOrderLeft as (type {text}), 
          optional columnsToOrderRight as (type {text})
      ) 
      as table
      meta [
        Documentation.Name = "TableRelativeReorderColumns", 
        Documentation.LongDescription = "Returns a table from the input <code>table</code>, with the columns in <code>columnsToOrderLeft</code> appearing leftmost in the order given and the columns in <code>columnsToOrderRight</code> appearing rightmost in the order given. Other columns will not be reordered."
      ],
  Ascribed = Value.ReplaceType(Function, FunctionType)
in
  Ascribed

No more need for a hardcoded list of all column names. No more code clutter when MissingField.Ignore is used and a column is removed. Columns stay in the expected relative order even when new columns are added.

let
  TableRelativeReorderColumns = (code from above),
  Source = ...,
  Reordered = TableRelativeReorderColumns(Source, {"ID"})
in
  Reordered

Hope this helps!