Power Query M Primer (Part 19): Type System IV – Ascription, Conformance and Equality’s “Strange” Behaviors

, , , ,

At first glance, it seems logical to expect the mashup engine to error if you attempt to ascribe a seemingly incompatible type to a value. Surprisingly, this doesn’t always occur.

During ascription, the engine validates that the type’s base type matches the value’s base type (so, for example, you can’t ascribe type logical onto a date value because the two base types are different). However, for types with children, those child types (list’s item type, record’s field types, table’s column types and function’s argument and return type assertions) have no effect on mashup engine behavior and are not validated for compatibility. What?!

For example, you can ascribe an item type of text onto a list of numbers, and the mashup engine won’t complain. Ascribe a column type of logical onto a column containing dates and M will comply. Similar holds true for records: a field type of duration can be ascribed onto a field containing a function and no error is raised.

Value.ReplaceType({1, 2, 3 }, type { text })
Value.ReplaceType(#table({"Col1"}, {{ #date(2020, 6, 24)}}), type table [Col1 = logical])
Value.ReplaceType([FieldA = () => ...], type [FieldA = duration])

For functions, the newly ascribed type’s argument and return type assertions aren’t validated or enforced; instead, the assertions specified when the function was originally defined continue to be what the engine follows. Take a function argument originally defined as number and ascribe as text to it. Despite the ascription, the mashup engine will expect the argument’s value to be compatible with number, not text, when the function is invoked.

let
  Func = (input as number) as number => input,
  NewType = type function (input as text) as text,
  Ascribed = Value.ReplaceType(Func, NewType)
in
  Ascribed("hi") // errors
  // Ascribed(1) // works fine  

Speaking of things not validated during ascription: Don’t forget what we learned last time about how child component names (record field names, table column names and function parameter names) are not checked, even though bad things can happen when the names on the new type don’t align with the names on the value. M will allow you to, say, ascribe a table type whose first column is named Amount onto a table whose first column is named Amt, even though code that later works with that table may misbehave because of the name mismatch.

These behaviors seem strange—and they aren’t the only strangeness related to Power Query’s types. Comparing type values may also not work the way you expect. Think TypeValueA = TypeValueB will return true if the two types are identical? Maybe. Maybe not!

Fasten your seat belt. We’ll try to define and then clear up a bit of this confusion. It will be a journey! Here we go….

Series Index

Type Equality

If you compare two type values using the equality operator (=), what result should you receive? Why, true if the two values describe the same type and false if they do not, right?

let
  TypeValueA = type ..., // some type value
  TypeValueB = type ... // some type value
in
  TypeValueA = TypeValueB

Turns out, the equality operator is not very useful for type values. It’s usually best avoided when type values are involved.

Let’s think through type equality.

What do you expect the below to output? Why true, of course! type number obviously should be equal to type number, so it seems logical for this expression to evaluate to true.

type number = type number

A challenge with type equality is that the correct result isn’t always unambiguously obvious. Or, to put it in other words, what you’d expect isn’t always what another person would expect. To illustrate, let’s make the scenario a little more complex: Should these two type values be considered equal? Why or why not?

type number = Currency.Type

Some of you are probably saying, “Yes—because Currency.Type is type number with a type claim facet. Since a facet has no effect on mashup engine behavior, it should be ignored when comparing the values. If it’s ignored, the comparison becomes the equivalent of type number = type number and so should return true.” Other readers are probably thinking, “But wait—the two type values are different! Even though the Currency.Type facet on the one does not affect the mashup engine, a type value with that facet is still a different value than plain type number and so comparing the two should evaluate to false.”

Since we might differ in opinion as to the expected behavior, to make equality comparisons practically useful, precise rules need to be authoritatively defined that specify the expected outcomes. The language specification could have done this—but it doesn’t.

Rules (or the lack thereof) 

Instead, basically it says that two type references which point to the exact same type value in memory should be considered equal. Otherwise, the behavior of comparing type values is undefined except that it should be consistent (so if comparing two type values results in true using your version of the mashup engine, comparing those two values will consistently produce true when the same mashup engine is used; if instead your version of the engine reports false when asked to compare the two values, repeating the comparison will consistently produce false). Also, the specification promises that if two type values are considered equal, any value that conforms to one also conforms to the other (so if TypeA can be used to describe value 123 [which is synonymous for saying that this value conforms to that type] and TypeA = TypeB, then TypeB can also be used to describe value 123).

Confusing? Let’s try to unpack those first two statements.

First, comparing a value to itself should evaluate to true.

let
  A = type table [Col1],
  B = A
in
  A = B // true

Variables A and B both reference the same value because both point to the exact same location in memory. You could imagine that when A‘s expression is evaluated, the value defined by the expression type table [Col1] is stored somewhere in memory then that memory address is saved into variable A. When B is initialized, it is set to point to the exact same memory address as is saved in A. Since A and B point to the same memory location, they both are referencing the same value and so are equal.

It’s possible (and common) for two values you might consider equal to be stored in separate memory locations. In this next example, you might think that variables A and B also refer to the same value. Conceptually, this is correct—and usually when we say that values are the same, we’re making a conceptual reference. However, technically (behind the scenes in the mashup engine), the two values are different. They may be defined identically, but in memory they are each stored in a separate location because they were defined as separate values.

let
  A = type table [Col1],
  B = type table [Col1]
in
  A = B // ?

What happens when the equality operation is evaluated here? Something like this: The mashup engine first checks to see whether the two variables point to the same location in memory. If they did, the engine would return true but they don’t so the engine falls back to a second test—equality rules. These are rules the engine uses to determine whether two different values of the given types should be considered equal.

You might encounter a similar situation with an expression like the following:

let
  A = "hi",
  B = "hi"
in
  A = B // true

This expression evaluates to true, as you would expect. Why? Even though A and B are different values in memory (at least, as far as we can see, looking in from the outside), the language specification defines equality rules which say, in essence, “if two values are of type text and are both set to equivalent strings, consider them equal.”

Since for type values the language specification does not define equality rules, it is up to each mashup engine implementation to decide what to return when comparing two different type values. Taking the example from a moment ago where A and B were being compared, with both variables set to type table [Col1]: the result might be true or it might be false. The equality operator has to return true or false but the specification doesn’t say which it should return here, so the result produced is up to the particular mashup engine you’re using—and so could be different between engine versions.

Maybe your engine uses a simplistic strategy which always returns false if the two type values are in different memory locations. Maybe it uses a more advanced approach which considers the details of the two type values (or some portion thereof). Whatever it does, the behavior you see could change the next time your mashup engine is upgraded or if you try to run your M code in a different environment.

Since the Power Query language specification does not define equality rules for type values, the equality operator is generally useless, as far as comparing type values goes.

Do It Yourself

As we saw previously, opinions may differ on whether two type values should be considered equal. For better or worse, instead of defining rules that pick one opinion as the winning way when comparing type values, the Power Query language stays out of the argument. It leaves it to you to compare the aspects of the type values you consider important in your context when checking for equality. This doesn’t mean it’s impossible to compare type values, just you’ll need to do the comparison using standard library functions instead of using the equality operator.

To determine whether a type value is of a certain base type, use Type.Is.

Type.Is(type { number}, type list) // true

While the first argument can be any type value, the second argument must be a nullable primitive type (so not a custom types). Interestingly, the function won’t necessarily error if a non-nullable primitive type is used—but the results it returns in this case should not be considered reliable.

Type.Is(type { number}, type { number }) // don't do this

If all that is important to you is determining whether the type value is of a certain base type, you’re set!

What about complex type details and facets—what if you want to factor some or all of these into your comparison? You’ll need to use the appropriate standard library functions to extract the details of interest and then compare each of them. Unfortunately, the code involved can be a bit complex, in part because there is not always a succinct way to fetch and compare the attributes of interest.

Here’s an example of comparing two types, checking whether they both have a base type of number and the same type claim facet.

(ValueA as type, ValueB as type) as logical =>
let
  TypeClaimFacet = (input as type) as text => Table.SingleRow(Table.Schema(#table(type table [Col1 = input], {})))[TypeName]
in
  Type.Is(ValueA, type number) = true and Type.Is(ValueB, type number) = true and TypeClaimFacet(ValueA) = TypeClaimFacet(ValueB)

As you can see, it took a little bit of code to pull this comparison off. Maybe one day, the standard library will be expanded to include functions that would make comparisons like this simpler to write up.

Nullable Primitive Singletons 

After saying all this, it may come as a surprise to learn that the equality operator will return true when comparing two non-faceted nullable primitive type values which are identically defined. I mention this because you’ll see code on the Internet that attempts to rely on this behavior. Be warned: unless you know exactly what you are doing, dangerous!

type number = type number // true
let
  A = type text,
  B = type text
in
  A = B // true

I can hear you saying, “You just said that equality rules aren’t defined for type values. Why then do these comparisons consistently return true?” Turns out, for nullable primitive types, M currently has an internal optimization that reuses the same in-memory value anytime the same nullable primitive type is defined. (If you come from an object-oriented programming background, you could think of plain nullable primitive type values as singleton object instances.)

Above, this special optimization results in the mashup engine setting A and B to point to the same location in memory. Since A and B reference the same memory address, A and B equal each other.

So is the moral of the lesson that the equality operator is safe to use for nullable primitive types? Unfortunately, not generally.

The above behavior only promises that two plain nullable primitive types will equal when compared if they were defined identically, and not equal if they were not. It makes no guarantee whether comparisons when facets are involved, like type number = Currency.Type, will return true or false. Without being able to rely on a consistent behavior here, the equality operator for nullable primitive types is only potentially helpful when you know facets won’t be present. This eliminates that operator’s usefulness when writing general purpose functions because those functions can encounter faceted types.

Also, this behavior can only be relied on so long as the engine continues to include this internal optimization, which isn’t guaranteed.

Now at least you know why those examples online might seem to work at first glance but actually may be quite dangerous. In general, don’t use the equality operator to compare type values.

(Note: Based on testing I’ve done, it appears there could be some additional type equality rules that are not mentioned in the language specification. If this is true, there may be more cases where using the equality operator between type values could produce practically useful results. I’ve opened an issue asking for Microsoft’s clarification. If clarification would be of interest to you, as well, I’d be most appreciative if you’d add a comment or reaction on that issue—something to let Microsoft know that others are also interested in this being clarified. Thanks!)

Whew! That was a lot…and that was just about type values. Next up is child type checking for values. Hold on! Make sure you seat belt is fastened!

Values

In this post’s introduction, we started by encountering what likely seemed strange behavior: When a type is ascribed to a value, the mashup engine ensures that the type’s base type is the same as the value’s base type, and for values with structure, that the type’s structure matches the value’s—but no checking is done on the type’s child component types or names!

Let’s break this down.

First, the base type of the type being ascribed must be the same as the value’s base type. For example, you can ascribe type number or Decimal.Type (both of which are of base type number) onto a numeric value. However, you can’t ascribe that value with type text because it is of a different base type. Similarly, you can’t ascribe a list type onto a table value or a record type onto a date, etc. (This is probably intuitively obvious.)

Value.ReplaceType(1, type number) // ok
Value.ReplaceType(1, Decimal.Type) // ok
Value.ReplaceType(1, type text) // errors

Then, for types that can have multiple child components (records, tables and functions), the mashup engine preforms an addition check when type ascription is attempted: validating that the structure of the type being ascribed matches the structure of the value it is being ascribed to.

For records, this means that the type being ascribed has the same number of fields as the value. For tables, the type and value must have the same number of columns. For functions, the two must both have the same number of required arguments and of optional arguments. So, for example, if a record value has three fields, attempting to ascribe it with a two field (or four field) record type will fail, as will ascribing a one-argument function with a ten-argument function type.

Value.ReplaceType([A=1, B=2, C=3], type [A = number, B = number, C = number]) // no error
Value.ReplaceType([A=1, B=2, C=3], type [A = number, B = number]) // errors
Value.ReplaceType([A=1, B=2, C=3], type [A = number, B = number, C = number, D = number]) // errors

However, in no case does ascription result in the mashup engine checking child component names or validating whether the value’s child values are compatible with the new type’s child component types.

You can ascribe type { number } to a list of text values and M won’t complain. Ascribe a table containing a single numeric column with a table type whose column is type logical and M will accept the ascription. Similarly, the engine has no problem ascribing a record with a type whose field types are incompatible with the field’s values.

Value.ReplaceType({ "hi", "bye" }, type { number })
Value.ReplaceType(#table({"Col1"}, {{ 1 }}), type table [Col1 = logical])
Value.ReplaceType([FieldA = #date(2020, 6, 30)], type [FieldA = duration])

All of the above are accepted just fine by the mashup engine.

For functions, not only are the new type’s argument and return value assertions not checked for compatibility, but they also have no effect on the function’s behavior. Below, when the function is invoked, its argument will still be required to be compatible with type number even though type text was ascribed onto it. The argument’s type, as it was originally defined, is all that matters to the engine.

Value.ReplaceType((input as number) => ..., type function (input as text) as any)

As to child component names, to recap what we covered on this in part 18 (Type System III – Custom Types): The table column names, record field names or function parameter names on the type being ascribed should positionally match those on the value (for example, for a table, the first column name on the new type should be identical to the value’s first column name, the type’s second column name should match the value’s second column name, etc.). However, the mashup engine doesn’t enforce this, even though name mismatches can lead to bad things happening.

For example, the attempt below to use ascription to “rename” a table column succeeds but results in inconsistent behavior later on.

let
  MyTable = #table({"Col1"}, {{"Joe"}}),
  TypeChanged = Value.ReplaceType(MyTable, type table [Name = any]) // returns a table which appears to have a column named Name
in
  Table.SelectRows(TypeChanged, each [Name] = "Joe") // uses column's new name -- errors: "Expression.Error: The field 'Name' of the record wasn't found."
  // Table.SelectRows(TypeChanged, each [Col1] = "Joe") // uses column's old name - works
   // TypeChanged[Name] // uses column's new name -- works
  // TypeChanged[Col1] // uses column's old name -- errors: "Expression.Error: The column 'Col1' of the table wasn't found."

Why not?

Why doesn’t the mashup engine fully check ascribed types? Why doesn’t it enforce them? What’s the point of ascribing them if the child types they specify are not checked or enforced?

As to why child names are not validated by the mashup engine: I’m not sure. It seems like it could check them (after all, it validates structure, so checking names at the same time seems doable), but it doesn’t. Maybe this is an area where M’s type-related behaviors could be improved in the future.

As to child types, to start sorting this out, let’s go back to something discussed in Part 16 (Types I – Basics):

Power Query is dynamically typed. All checks on whether a value is compatible with a given type are done by looking at the value’s type at runtime. There’s no concept of a variable having a type nor is type casting relevant—the actual value’s type is what matters.

Consider the following example.

let
  Inner = () as any => ...,
  Outer = (input as anynonnull) => ...
in
  Outer (Inner())

In a statically typed language (C#, Java, etc.), Outer(Inner()) would be invalid because Inner would return values of type any while Outer expects values of type anynonnull. Since any is incompatible with anynonnull, passing Inner‘s output into Outer would be illegal.

With PowerQuery, Inner‘s as any does not change the type of what’s returned to type any. Instead, it simply ensures that the type of whatever value is returned is compatible with any. Similarly, Outer‘s as anynonnull does not limit the inputs that function accepts to only values of type anynonnull but rather to only those values which are compatible with type anynonnull.

Say, Inner returns a value of type number. This value passes Inner‘s return assertion of as any because type number is compatible with type any. It also passes Outer‘s argument assertion of as anynonnull because type number is compatible with anynonnull. On the other hand, if Inner returned a value of type null, it would pass Inner‘s as any assertion but the attempt to pass it into Outer would fail because type null is not compatible with anynonnull.

To repeat:

Power Query is dynamically typed. All checks on whether a value is compatible with a given type are done by looking at the value’s type at runtime. There’s no concept of a variable having a type nor is type casting relevant—the actual value’s type is what matters.

With this in mind, let’s jump back to the question about why M does not validate child types during ascription. Since, in M, type compatibility (conformance) is only checked on a per individual value basis, in order to check for type compatibility, it is necessary to compute the value that needs checking.

Take a list. Say you ascribe it with an item type of number: type list { number }. If M were to ensure that the type of each item in the list is compatible with item type number, each list item’s value would need to be computed. Below, this would involve making three function calls: CallServiceA(), CallServiceB() and CallServiceC().

let
  Value = { CallServiceA(), CallServiceB(), CallServiceC() },
  Ascribed = Value.ReplaceType(Value, list { number }),
  Result = List.Count(Ascribed) 
in
  Result// returns 3

However, M is lazy. When the above expression is executed, M doesn’t bother to call the service functions to get values for the list items. It doesn’t need to. Instead, it skips that unnecessary work because it can determine the count of list items (all it was asked to do) without computing the items’ values.

Instead, say the ask was to sum values of all items except for the first. M’s laziness would lead to it skipping over the first item without evaluating its expression. Service A would not be called because the value it would provide isn’t needed; only services B and C would be queried.

Result = List.Sum(List.Skip(Ascribed, 1)) // value for the first item is not computed because it is not needed

If M were to validate the item type when ascription occurs, the ascription process would need to compute each list item’s value then check whether that value is compatible with the specified item type. All values would need to be computed, even those not otherwise needed to produce the requested output. If Power Query switched to this behavior, it couldn’t be so lazy—which would have performance ramifications.

How about an expression that only returns some rows from a table:

let
  Source = GetData(),
  Ascribed = Value.ReplaceType(Source, type table [Category=text, Code=number, Amount=Currency.Type]),
  Filtered = Table.SelectRows(Ascribed, each [Category] = "Misc")
in
  Filtered 

Say that GetData() is a SQL data source. M will probably query fold this entire expression into something like SELECT … FROM SomeTable WHERE Category = 'Misc';.

If M were to validate the ascribed table column types, it would either need to send an additional query to the database server that somehow asked the server to report whether all values in the table are compatible with the specified types (if the source even supports that kind of a query) or, like the previous example, would have to pull all rows locally and then check their values for type compatibility (in this case, even those rows excluded by the [Category] = "Misc" filter, since the ascription applied to all rows). While technically possible, doing things this way could (potentially significantly!) diminish the benefits offered by M’s query folding.

Power Query’s paradigm doesn’t align well with validating child type compatibility. M’s focus is efficient processing of data received from a plethora of potential data sources that may or may not be super strong in how they type their data. One of the tradeoffs with this approach is that it is impractical for the mashup engine to ensure child type compatibility with ascribed child types.

In some ways, child types are similar to type facets: Neither affect the behavior of the mashup engine. Standard library (and custom) functions can make use of the information they convey and the host application may enforce that whatever data it receives is compatible with thembut mashup engine behavior isn’t affected by them.

Type Assertion and Conformance Checks

All of Power Query’s type assertion and value compatibility (conformance) checking functionality—as, is, Value.As and Value.Is, as well as function argument and return type assertions—only work at a nullable primitive type level. If complex type assertions were allowed and enforced, we’d have similar downsides to those that would be present if child types were validated during ascription. Instead, the language limits its assertion and compatibility checks to nullable primitive types, which it can validate without needing to evaluate list item, record field, table row or table column value expressions.

If you want to compare a value’s type’s facets, structure, child names or child types, or validate child values against those child types, the standard library provides functions you can use to implement a solution that meets the specifics of your requirements, with you balancing the precision desired to resource usage, factoring in the particular characteristics and stability of your data source. M forces you to make conscious choices if comparing or validating these aspects is important to you.

Recap 

To summarize:

  • Ascription: M ensures that the new type’s base type matches the nullable primitive base type of the value and that the new type’s and value’s structures are the same. However, child component names and types are not validated. While not enforced, the new type’s child component names should match those of the value (only applies to types with child components, obviously!).
  • Assertion/Conformance: M only checks whether the value is compatible with the given nullable primitive base type. Structure, child component names and child type(s) are not validated.

What’s the point?

Why ascribe specific child types? Since Power Query allows you to specify these types, there presumably is a reason for their existence, though as we’ve learned, that reason is not to ask M to perform validation. What then are the advantages of specifying them?

For one, how about the host application (Microsoft Power BI, Microsoft Excel, etc.)? M may only be concerned about the types on individual values, but host applications may like to know type information on a per-column basis. A tool like Power BI may decide how to store the values it receives for a particular column based on that column’s data type (and then reject any values that are incompatible with that type). Child types give you a way to communicate relevant information to the host.

Also, while the engine itself doesn’t care about child types, library functions and functions you write can factor them in. For example, below the Total column holding the sum of each group has its column type automatically set to match the type of the column it is summing (Amount). That type is read off the source column (Amount) and ascribed as the return type assertion on the aggregate function (each List.Sum([Amount])). Table.Group reads that assertion and uses it to type the column it creates to hold the aggregation’s results. Try changing Amount‘s type and notice how the aggregate column’s type automatically follows this change.

let
  Source = #table(type table [Category=text, Amount=Currency.Type], {{"Misc", 50}, {"Misc", 25}, {"Part", 2.50}}),
  Group = Table.Group(
    Source, 
    "Category",
    { "Total", Value.ReplaceType(each List.Sum([Amount]), type function (input as any) as Type.TableColumn(Value.Type(Source), "Amount")) }
  )
in
  Group

The previous example shows one case of why you might want to ascribe a new type onto a function. Another is documentation: The descriptions and examples displayed for a function in Query Editor come from metadata attached to its type and child types—and attaching metadata (something we’ll cover in a later post) is done through type ascription.

There’s a few reasons, for starts. Keep in mind that you don’t need to ascribe child types if they’re not helpful in your context. If ascribing them doesn’t give you any benefits, then no need to ascribe.

Conclusion

I hope this journey has made M’s “strange” type behaviors less confusing. Whether or not they are the behaviors you’d like, at least now you know what they are and (hopefully) have an understanding as to some of the reasons why they are what they are.

A key concept here is that child types are just claims. M hasn’t validated them and doesn’t enforce them at the mashup engine level. Even though they are unverified, other code and the host application has the option of choosing to trust them and derive value from them.

While it may not seem robust, M’s approach to type enforcement works rather well for Power Query’s purposes. In fact, increasing its robustness significantly would likely become a hindrance to M’s efficiently working with data. Mostly, the complexities around custom types stay out of the way unless and until you want to involve them and then they give you options which you must wield responsibly.

Whew…we’re done with the type system! Congratulations! Don’t worry, though, we’re not done exploring M just yet. More awaits to delve into!

Next up? Hmm…maybe metadata or name organization and scope.

2 thoughts on “Power Query M Primer (Part 19): Type System IV – Ascription, Conformance and Equality’s “Strange” Behaviors

  1. Joe Edwards

    Hi Ben,

    Would the following be the most efficient way of comparing all the types and facets of all columns in two tables in a single operation:

    Value.Equals(Table.Schema(Table1), Table.Schema(Table2))

    Chris Webb highlighted many years ago that you can compare the total data content of two tables using Value.Equals (https://blog.crossjoin.co.uk/2014/01/11/comparing-tables-in-power-query/), however I am unsure whether this returns a deterministic answer, or whether it would be down to the engine version to interpret how this is evaluated?

    Thanks,

    Joe

    Reply
    1. Ben Gribaudo Post author

      Hi Joe,

      Thank you for sharing. Very interesting idea!

      Per the language specification, the equality operator (=) will consider two tables equal if: they contain the same number of columns, the columns have the same names, the tables contain the same number of rows and “each row has equal values in corresponding cells.” Based on this, it sounds like Table.Schema(Table1) = Table.Schema(Table2) should work!

      Keep in mind that Table.Schema’s IsWritable and Description columns are from metadata, not type names or type facets. If desired, these two columns can be excluded from the comparison.

      Ben

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *