Custom Connectors: Populating Table.Schema

Table.Schema exposes a variety of details about a table’s columns. Your custom connector can leverage this functionality to give users easy access to column-specific details from the external data source for the tables they are fetching.

Table.Schema output

Say the external source allows users to create new columns and give them descriptions. Why not make these descriptions available from within Power Query by having them appear in Table.Schema’s Description column? Perhaps the type system on the source is a bit different from Power Query’s. Table.Schema columns like NativeTypeName, NumericScale and NumericPrecision can be used to communicate relevant details to your users for reference purposes, enhancing their understanding of the data they’re pulling.

How?

The idea of exposing informative details about a source’s columns is great—but how do you provide this information in a form that Table.Schema will read? The answer depends on which Table.Schema column you’re trying to populate.

While these techniques may primarily be of interest to custom connector developers, they’re valid to use from any M code—no connector required.
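
To give a concrete feel for the general idea, here’s a minimal sketch: ascribe a table type whose field types carry metadata, then let Table.Schema read that metadata back out. Treat the metadata field name Documentation.FieldDescription as my assumption for what feeds the Description column—verify the exact metadata field names for the Table.Schema columns you care about before relying on this.

let
    Source = #table({"Code", "Amount"}, {{"ABC", 100.5}}),

    // Build a row type whose Code field type carries description metadata.
    // Documentation.FieldDescription is assumed to be what Table.Schema surfaces
    // in its Description column -- confirm against your environment.
    RowType = Type.ForRecord(
        [
            Code = [Type = (type text) meta [Documentation.FieldDescription = "Product code, as defined in the source system"], Optional = false],
            Amount = [Type = type number, Optional = false]
        ],
        false
    ),

    // Ascribe a table type built from that row type onto the table.
    Typed = Value.ReplaceType(Source, type table (RowType))
in
    Table.Schema(Typed)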

Continue reading

Zero Rows Can Bite (part 2): The Mysterious All-Null Row

The table you fetch from a web API mysteriously contains an unexpected row with a null value in each column. You manually try the API using a tool like Postman or Insomnia and don’t find any all-null objects in the raw response. Where is this null table row coming from?

Zero rows (again!).

Previously, we dug into refreshes mysteriously dying with the complaint that “column ‘Column1’ of the table wasn’t found” even though no M code or data source schema changes had occurred. Upon investigation, we learned that a shortcoming in some data-fetching M code results in it outputting zero columns when the web API returns no rows, which breaks later code that expects the presence of specific columns.

This time, zero rows is again the trigger condition, though it’s not zero rows altogether. Instead, it’s when a web API that returns paged responses returns a page containing no rows. Receiving back an empty page is a real-world possibility. For example, the last page of a response might contain zero rows because the rows that were to have been in it were deleted just moments ago, after the preceding page was fetched.

A common pattern for processing paged responses is to read the various pages into a list, then turn that list into a table, which is then expanded out into the appropriate rows and columns. However, the implementation of this flow sometimes leaves a corner case unaccounted for, which leads to the all-null row being present. Unfortunately, such an oversight is present in Table.GenerateByPage (a function commonly used by custom connectors).
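
To make the corner case concrete, here’s a minimal sketch (my own illustration, not Table.GenerateByPage’s exact code) of the common list-of-pages pattern meeting an empty page, assuming Table.ExpandTableColumn behaves as I expect when a nested table has no rows:

let
    // Simulate pages fetched from a paged API -- the last page came back with zero rows.
    Pages = {
        Table.FromRecords({[id = 1, name = "A"], [id = 2, name = "B"]}),
        #table({"id", "name"}, {})
    },
    TableOfPages = Table.FromList(Pages, Splitter.SplitByNothing(), {"Column1"}),
    Expanded = Table.ExpandTableColumn(TableOfPages, "Column1", {"id", "name"})
in
    Expanded
    // Rows 1 and 2 expand as expected, but the zero-row page becomes a row
    // where id and name are both null -- the mysterious all-null row.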

Continue reading

New M Feature: Structured Error Messages

Why Structured Error Messages?

In the real world, errors are a part of life. If you access and read data from real, in-production systems, sooner or later you will almost certainly encounter errors. While you may be unable to escape their unfortunate reality, at least in the Power Query world, they’re rendered out in an easy-to-read format:

Expression.Error: Bad code 'ABC', problem 'too short'

Easy to read, that is, if you are a human, reading just one error all by itself.

But what if you’re trying to analyze a collection of error messages? Imagine a set of errors like the above, but which are for a variety of different codes and problems (e.g. bad code ‘A235’, problem ‘must contain at least 2 letters’, bad code ’15WA’, problem ‘cannot start with a number’, etc.).

Let’s say you want to summarize these errors, reporting out the count of errors per problem, per bad code. Manually reading errors one at a time no longer cuts it. Instead, you could write code that parses each error message, extracting the text between the phrase bad code ‘ and the following quote character, and between problem ‘ and the following quote character. With the code and problem statement now separately captured, you can use their values to group by or otherwise compute the desired summaries.

Parsing log messages like this involves coding work. Not only does it take effort on your part, but it is also tricky to get right. For example, the logic described above finds the end of each string it matches by looking for the next quote character. What if a bad code or problem description includes a quote character? The logic we’ve been considering won’t match the entire value. Say the message starts with Bad code ‘ABC’DEF’. The above logic will miss the second portion of the code (only capturing ABC, not the full ABC’DEF) because it incorrectly assumes that a bad code will never contain a quote. You could address this by writing more robust parsing code, but that’s more work—and this is only one example of the corner cases you may need to handle to accurately parse a family of log messages.

On the other hand, maybe your interest is not analyzing log message parameters, but rather removing them altogether. For data privacy or security reasons, you want sanitized log messages, where parameter values have been stripped out and replaced with generic placeholders. This way, “clean” log messages can be aggregated or retained long-term without the complications that accompany storing PII or other confidential information that may have found its way into error message parameters. While this may be the opposite of our first scenario (extracting message parameters for analysis purposes), implementing it still requires a technical means to differentiate between the base log message pattern (or template) and the parameters that have been filled into it. If you’re implementing this yourself, you’re looking at some form of log message parsing.

In either case, if only there were a way to avoid the effort and complexity associated with writing log message parsing code….

Introducing M’s Structured Error Messages

Meet M’s new structured error message capabilities!

M’s error functionality has recently been expanded to offer a new way of defining error messages, splitting message definition between a template and a list of parameter values. These components are preserved with first class representation in the error after it is raised, enabling error handling code (and, potentially by extension, external logging mechanisms and log analytics tools) to separately work with these components without the need for custom text parsing. This style of error message is known as a structured error message and is key to making structured logging possible.
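
As a rough sketch of the concept—note that the exact signature and field names below are from memory (in particular, Error.Record’s optional parameters argument and the Message.Format/Message.Parameters fields), so double-check them against the current documentation before relying on them:

let
    // Raise an error defined as a template plus a list of parameter values
    // (fourth Error.Record argument assumed; see the note above).
    Raise = () => error Error.Record(
        "Expression.Error",
        "Bad code '#{0}', problem '#{1}'",  // message template
        null,
        {"ABC", "too short"}                // parameter values filled into the template
    ),
    Caught = try Raise()
in
    Caught[Error]
    // Besides the rendered Message, the error record should carry the template and the
    // parameter list as separate fields (e.g. Message.Format and Message.Parameters),
    // so downstream error-handling or logging code needn't parse message text.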

Continue reading

The Elusive, Uncatchable Error?

The error seems to escape catching. When the expression is evaluated, Query Editor displays the error. Try loading the query’s output into Microsoft Power BI or Excel and the operation dies with an error. Clearly, there’s an error—but if you wrap the expression with a try, the error isn’t caught! To the contrary, the record output by try reports HasError = false, even though if you access that record’s Value field, Query Editor again shows the error.
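
One way to reproduce the symptom (an illustrative guess on my part, not necessarily the exact scenario behind this post) is an error tucked inside a structured value:

let
    Output = Table.FromRecords({[Amount = error "bad value"]}),  // the error lives inside a cell
    Attempt = try Output
in
    Attempt
    // HasError = false: constructing the table succeeds, so try finds nothing to catch.
    // The error only surfaces later, when the cell in Attempt[Value] is actually evaluated
    // (e.g. when Query Editor displays it or Power BI/Excel loads it).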

What’s going on?! Have you discovered an uncatchable error? Is this a Power Query bug?

Continue reading

HubSpot -> Power BI Connector

I’m excited to formally announce a project I’ve been working on over the past several months: a HubSpot connector for Microsoft Power BI!

This connector differentiates itself from a number of the existing HubSpot -> Power BI integration solutions in several ways:

  • Privacy Conscious
    • A common pattern for Power BI integrations is for the vendor to stage your HubSpot data in an intermediate database they operate and control. While there are pros and cons to this at the technical level, at the data protection level this introduces an extra point of exposure: a copy of your data is processed and stored by a third-party.
    • In contrast, this new connector keeps your HubSpot data between HubSpot, Microsoft and systems you control. No intermediate database is used. There’s no need for the connector vendor (my company) to touch your data.
  • Real Time
    • Intermediate databases, as are used by some other integration solutions, introduce a data delay (data staleness). When a staging database is used, getting new data into your Power BI dataset involves first refreshing the staging database (typically on a scheduled basis) then a refresh in Power BI. The net effect is that your Power BI dataset is never refreshed with truly live data.
    • Since no intermediate database is used with this new connector, when a Power BI/Power Query refresh runs, live data is pulled directly from HubSpot. There’s no delay introduced by needing to wait for a HubSpot -> staging database sync to run before your Power BI solution can see the latest data.
  • No Arbitrary Limits
    • Existing integration solutions may be priced based on refresh frequency or entity row counts. For example, you may have to pay a bit more for the ability to refresh hourly vs. a once-per-day refresh.
    • This connector imposes no arbitrary limits. You can refresh as often as you want, moving as many entity records as you want, without connector licensing limiting you or imposing extra licensing fees. (Of course, the API call limits imposed by HubSpot and the resource limits imposed by Microsoft still apply.)
  • Perpetually Licensed (No Mandatory Monthly/Annual Fees)
    • Many integration offerings are services billed by the month or year.
    • This connector is perpetually licensed. Pay for it up front, and you can use it perpetually. This does mean that if the connector is later revised to add features, and you want to use those new capabilities, there will probably be an upgrade fee involved—but you can make that decision at a time of your choosing and only if you deem the benefits worth the cost.
Continue reading

Zero Rows Can Bite (part 1): The Mysterious Missing Column

Your Power Query expression is happily skipping along, fetching data from a web API. Happily, that is, until one day its refreshes mysteriously die with the complaint that “column 'Column1' of the table wasn't found”. You haven’t changed any M code. You’ve verified that no schema changes occurred on the external web service. How, then, could a column come up missing?

Power Query error message - Expression.Error: The column 'Column1' of the table wasn't found. Details: Column1

Or, maybe it’s not a missing column, but rather your fetch data code starts outputting an unexpected table row with a null value in each column. You manually try the web API using a tool like Postman or Insomnia and don’t find any all-null objects in the API’s response. Where is this all-null table row coming from?

Both of these unexpected occurrences potentially stem from the same underlying cause. Since common M code patterns tend not to handle this situation properly, it is possible (even probable!) that M code you use may leave you susceptible to being bitten by one or the other of these bugs.
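
Here’s a minimal sketch of the kind of fetch code that can bite this way, assuming (as I understand the behavior) that Table.FromList given an empty list and no explicit column list produces a table with zero columns:

let
    RawJson = "[]",  // imagine the web API returned no rows this time
    Parsed = Json.Document(RawJson),
    AsTable = Table.FromList(Parsed, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(AsTable, "Column1", {"id", "name"})
in
    Expanded
    // With a non-empty response, AsTable holds each record in a column named Column1.
    // With the empty response above, AsTable has zero rows and zero columns, so the expand
    // step fails: "The column 'Column1' of the table wasn't found."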

Continue reading

Combining Query String Parameters—a.k.a. Inclusive Record Field Combining

Your expression is building a query string for use in a Web.Contents call. Different parts of your code each separately compute parameters that should be included in the final query string. These are provided as “query fragment” records, which all need to be combined into a single, consolidated record that’s then passed to Web.Contents:

ProductCodeFragment = GetProductCodeQueryFragment(), // might return [productId = 123]
LimitFragment = GetLimitQueryFragment(), // might return [limit = 100]
FinalQueryParams = ..., // single, consolidated record containing query parameters from all fragment records, such as ProductCodeFragment and LimitFragment
Result = Web.Contents("some-url", [Query = FinalQueryParams])

Using Power Query’s built-in functionality, combining the fragment records into a consolidated record is easy, so long as the fragment records each have different fields. If they do, a record merge can be performed using the combination operator (&) or the records can be combined using Record.Combine, both of which produce the same output.

[productId = 123] & [limit = 100] // outputs [productId = 123, limit = 100]
Record.Combine({[productId = 123], [limit = 100]}) // outputs [productId = 123, limit = 100]

The challenge comes if multiple records contain the same field: say, one fragment record contains [productId = 123] while another contains [productId = 456]. Record.Combine and & are exclusive in how they compute the field values they output. When the same field name is present in multiple input records (e.g. both input records contain field productId), the value for that field that’s output will be the value from the last/right-most input record (in this case, productId = 456). The other input record value(s) for that field will be ignored (so in this case, productId = 123 is ignored).

// notice that productId 123 is *not* included in the outputs
[productId = 123] & [productId = 456] // outputs [productId = 456]
Record.Combine({[productId = 123], [productId = 456]}) // outputs [productId = 456]
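
For contrast, here’s a rough sketch of one possible “inclusive” combine, where values for a repeated field are kept and joined rather than discarded. Joining with a comma is purely an illustrative policy choice on my part, not necessarily what the full post proposes or what your API expects.

let
    CombineInclusive = (fragments as list) as record =>
        List.Accumulate(
            fragments,
            [],
            (state, fragment) =>
                List.Accumulate(
                    Record.FieldNames(fragment),
                    state,
                    (acc, name) =>
                        let
                            newValue = Record.Field(fragment, name)
                        in
                            // If the field already exists, append the new value; otherwise add it.
                            if Record.HasFields(acc, name) then
                                Record.TransformFields(acc, {{name, (existing) => Text.Combine({existing, newValue}, ",")}})
                            else
                                Record.AddField(acc, name, newValue)
                )
        )
in
    CombineInclusive({[productId = "123"], [productId = "456"], [limit = "100"]})
    // outputs [productId = "123,456", limit = "100"]
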
Continue reading

Power Query M Primer (Part 24): Query Folding II

Last time, we began a deep dive into the inner workings of query folding. We examined how you can implement foldability using Table.View, ending with a firm grasp on why answering the question “what functions fold?” isn’t simple, but rather depends on the Power Query version, the data connector and possibly even some combination of the particular operation’s parameters and the data set being accessed.

But this isn’t the only “moral of the lesson” to be gleaned from our query folding deep dive….

As part of processing an expression, do you think Power Query communicates just once with each external source? For that matter, does Power Query process your expression verbatim and exactly one time? On query folding: Is it guaranteed to be transparent, producing identical results regardless of whether an expression is processed locally by Power Query, partly folded to source or fully folded to source?

These questions, and their answers, will lead us to more morals to be learned from this continuation of our lesson on query folding!

Continue reading

Custom Data Connectors and You

Curious about what a Power Query custom data connector can do for you—or for your customers?

Maybe you offer a product or service that exposes data. Would a custom connector help users more easily access your app’s data? Perhaps it’s not your own external offering, but rather a tool from someone else that your team uses internally. You’re wondering if a custom connector could reduce the pain currently involved when interfacing with it. Or, maybe you’re just curious: You’d like to understand what a custom connector can offer, so that, down the road, you’ll be better positioned to evaluate whether one of these connectors would be an appropriate component to bring into the picture.

So what exactly does a custom connector offer you?

The ability to make interacting with your data source more efficient, convenient and faster to set up from Power Query in Microsoft Power BI.

Continue reading

Recursive Polymorphic Deserialization with System.Text.Json

Goal: Deserialize a nested JSON structure, where the objects instantiated are not instances of the given abstract base type but rather are of the appropriate derived types.

In other words, if the specified destination type is AbstractBaseType, what should be returned is not an instance of that class but rather an instance of ChildTypeA or ChildTypeB or GrandchildType1, etc., as appropriate. This polymorphic deserialization behavior should hold true at the root level (that is, for the type directly returned by JsonSerializer.Deserialize<AbstractBaseType>(…)), as well as recursively for nested properties (that is, anytime the destination property is of type AbstractBaseType, the object instantiated for the property should be of the appropriate derived type, like ChildTypeA, etc.).

Hopefully, one day support for something along these lines will be built into System.Text.Json. However, for now, when we need it, we have to craft our own solution (or borrow one from someone else).

In my case, the other day I needed a simple solution for polymorphic deserialization. Performance wasn’t critical. Just something simple that worked.

Continue reading