Zero Rows Can Bite (part 2): The Mysterious All-Null Row

, , ,

The table you fetch from a web API mysteriously contains an unexpected row with a null value in each column. You manually try the API using a tool like Postman or Insomnia and don’t find any all-null objects in the raw response. Where is this null table row coming from?

Zero rows (again!).

Previously, we dug into refreshes mysteriously dying with the complaint that “column ‘Column1’ of the table wasn’t found” even though no M code or data source schema changes had occurred. Upon investigation, we learned that an insufficiently in some fetch data M code results in it outputting zero columns when the web API returns no rows, which breaks later code that expects the presence of specific columns.

This time, zero rows is again the trigger condition, though it’s not zero rows altogether. Instead, it’s when a web API that returns paged responses returns a page containing no rows. Receiving back an empty page is a real-world possibility. For example, the last page of a response might contain zero rows because the rows that were to have been in it were deleted just moments ago, after the preceding page was fetched.

A common pattern for processing paged responses is to read the various pages into a list, then turn that list into a table, which is then expanded out into the appropriate rows and columns. However, the implementation of this flow sometimes leaves a corner case unaccounted for which leads to the all-null row being present. Unfortunately, such an oversight is present in Table.GenerateByPage (a function commonly used by custom connectors).

Problem Demo

Let’s demo this. We’ll use the common Table.GenerateByPage pattern. For simplicity’s sake, instead of actually making API calls, we’ll use a hard-coded PageFunction to simulate a two-page response sequence, where the first page contains a couple rows and the second is empty.

let
  Table.GenerateByPage =  { paste contents of https://github.com/microsoft/DataConnectors/blob/5104494d70e31633235e207692c46507647ca427/samples/TripPin/9-TestConnection/Table.GenerateByPage.pqm here } ,
  PageFunction = (previous) => 
    let
      Response1 = #table({"FirstName", "LastName"},{{"Joe", "Jones"}, {"Bob", "Brown"}}),
      Response2 = #table({"FirstName", "LastName"},{})
    in
      if previous = null then Response1 
      else if previous = Response1 then Response2
      else null,
    Result = Table.GenerateByPage(PageFunction)
in
  Result

The faked page web API returned a total of two rows, so two rows is what should be output. But it isn’t! Instead, three rows are produced, with the last row containing null in each column.

Table containing an unexpected all-null row

We see the mysterious all-null row—but where did it come from?! The no-rows page is the cause, but it shouldn’t be.

Internally, Table.GenerateByPage works by recursively calling the PageFunction function to retrieve the various pages from the web API. In response to each invocation, PageFunction outputs the current page’s rows as a separate table. These per-page tables are collected into a list, which is then turned into a table. This table contains one row per response page, with the single column for that row holding the response’s per-page table (i.e. a table containing the rows from that page). When Table.ExpandTableColumn is applied to the single column, the individual per-page tables it contains are combined into a consolidated table containing the appropriate number of rows and columns.

At least, that’s what takes place when each per-page table contains at least one row. Instead, if a page’s table contains zero rows, this will be reflected in the table containing the per-page tables as a row whose single column holds an empty table. When Table.ExpandTableColumn encounters this empty table, it won’t expand out its rows, as there are no rows to expand. However, Table.ExpandTableColumn also won’t remove the row holding the empty table. Instead, this row will carry over into the new table that’s output by Table.ExpandTableColumn—where it will appear with all column values set to null.

(The situation is more complex if the first response page contains zero rows, as Table.GenerateByPage detects this and shortcuts to returning an empty table. This results in any rows from subsequent pages being skipped. The fix we’re about to discuss addresses this case, as well.)

Solution 

Now you know how the all-null row got there.

The fix? One option is simply to have Table.GenerateByPage filter empty per-page tables out of its list before that list is turned into a table.

  filteredListOfPages = List.Select(listOfPages, each not Table.IsEmpty(_)),
  tableOfPages = Table.FromList(filteredListOfPages, Splitter.SplitByNothing(), {"Column1"}),

There is a potential catch, however. By applying this fix, you’re potentially adding an extra read of each per-page table (to check to see if it is empty), which could result in extra API calls being made back to the data source. Hopefully, the persistent query cache will help you here, but that’s not guaranteed. Ah—trade offs! Do you want all rows, with no extra all-null rows, when the corner case of a zero row page is encountered, or the guarantee of less API calls?

A version of Table.GenerateByPage updated with this modification can be found in a pull request to the microsoft/DataConnectors repository from several months back. (The referenced PR also incorporates a performance improvement option provided by Microsoft, so brings two benefits with it.)

Leave a Reply

Your email address will not be published.