Why should you concern yourself with how Power Query “thinks” about tables? After all, you write an expression that outputs the table you want, the mashup engine executes it and everyone is happy without you having to think about how the engine does its thing…right? Yes—at least until you encounter performance problems, values change during processing or a firewall error bites—then what do you do?
Understanding how M processes tables is an important asset in developing efficient mashups, avoiding unexpected data variability and keeping the data privacy layer happy. Streaming, query folding, buffering, table keys, native query caching and the firewall—all of these relate to how the interpreter thinks about/processes/handles tables.
There’s so much to cover, we’ll split the list in two. Let’s tackle the first half (streaming, query folding and buffering) in this post and save the remainder (table keys, native query caching and the firewall) for next time.
Let’s get going!
At long last, tables
If you’re like most users, tables are the main reason you use Power Query. You ask Power Query to pull data from one or more sources, mash it up and then return the results in a table which is then handed off to the host application (Microsoft Excel, Power BI, SSAS, SSIS, etc.).
As you might expect, with tables being of such fundamental importance, there’s plenty we could explore. A whole series could be written about the standard library’s table functions. While that would be fun to do, the focus of this series is the M language, not the library, so things related to the language are what we’ll delve into here.
The previous post in this series concluded with the thought that tables are similar in behavior to lists and records but go beyond what we’d get if we tried to build our own simulated tables using those two types. Let’s start by looking at a way in which table and list are similar….
You might be inclined to skip learning about list and record and jump directly to table. After all, as far as types that hold other values go, you mostly work with table. A table’s data is made up of simple values, so it makes sense that you learned about them (parts 6, 7, 8 and 9). However, list and record might feel like types you won’t normally use, so why not skip them?
Turns out, a table behaves in ways like both a list and a record. If you want to leverage all that table offers, learning about these types is foundational. Besides, as you work with M, you just might find that list and record are more useful in their own right than you first thought.
Oh, did I mention that there’s a surprising relationship between let expressions and type record? All the more reason not to skip over it!
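To make the "table behaves like both" idea concrete, here is a minimal sketch. The table People and its values are made up for illustration; the access syntax itself is standard M.

```m
let
    // Hypothetical sample table, built inline so the example is self-contained
    People = #table({"Name", "Age"}, {{"Ann", 34}, {"Bob", 27}}),

    // List-like: positional access with {index} returns one row, as a record
    FirstRow = People{0},     // [Name = "Ann", Age = 34]

    // Record-like: field access with [name] returns one column, as a list
    Names = People[Name]      // {"Ann", "Bob"}
in
    [FirstRow = FirstRow, Names = Names]
```

The same two operators work on a plain list ({index}) and a plain record ([name]), which is why those types are worth knowing before table.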
Time to finish up our exploration of primitive, single-value holding types! We’ll learn about handling true and false, explore the oxymoron of null (a value that represents the absence of a value) and touch on
Ran into this xUnit test failure the other day:
Expected: Boolean [True, False, False]
Actual: SelectListIterator<Process, Boolean> [True, False, False]
Why the failure?
The values displayed for expected and actual match. The only difference visible in the error message is the collection types…but that can’t be the source of failure because xUnit’s Assert.Equal<T>(IEnumerable<T> expected, IEnumerable<T> actual) doesn’t compare the collection types of its arguments, just elements and their positions.
So why the failure?
Temporal means related to time (not to be confused with the word temporary). M offers several temporal types: date, time, datetime, datetimezone and duration.
This group of types has a lot in common, so we’ll explore it a little differently than the types we’ve encountered previously. First, we’ll introduce each type and look at its unique facets. Then, we’ll examine how the various types in this family play together (like common functionality they share).
In our exploration of Power Query’s types, number is next!
You might think that working with numbers would be so simple we’d hardly need to talk about them. However, there’s a gotcha that can bite: if you’re not careful, you can end up with arithmetic not producing the results you expect! After we go over M’s syntax for numeric literals, we’ll talk about this potential pain-point and how to not let it cause unexpected complications.
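As a hedged illustration, and assuming the surprise in question is the usual binary floating-point rounding (the teaser does not say so explicitly), a sketch like this shows how it can surface:

```m
let
    // By default M does arithmetic in double precision,
    // so 0.1 + 0.2 is not exactly 0.3
    Sum = 0.1 + 0.2,
    LooksEqual = (Sum = 0.3),   // false

    // One possible mitigation: ask for decimal precision explicitly
    DecimalSum = Value.Add(0.1, 0.2, Precision.Decimal)
in
    [LooksEqual = LooksEqual, DecimalSum = DecimalSum]
```

Whether decimal precision is the right remedy depends on the data. Rounding before comparing is another option.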
Also, in M, columns can be tagged to identify the specific kind of numbers they contain. Properly setting this subtype can improve performance and storage as well as enhance the default formatting used for the column’s values. We’ll learn how to do this (it’s easy!).
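A minimal sketch of tagging columns with numeric subtypes might look like the following. The table and column names are invented for the example; Int64.Type and Currency.Type are standard library type values.

```m
let
    // Hypothetical sample data
    Source = #table({"Quantity", "Price"}, {{1, 9.99}, {3, 24.5}}),

    // Tag Quantity as a whole number and Price as fixed-decimal currency
    Typed = Table.TransformColumnTypes(
        Source,
        {{"Quantity", Int64.Type}, {"Price", Currency.Type}}
    )
in
    Typed
```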
Examine an executing Power Query mashup under a microscope and what do you see? Data pulsing down pathways between expressions. Increase your microscope’s magnifying power. As you zoom in on one of the flows, what you’re viewing transforms from a blurred stream of data into the individual data items that make up that flow.
As you study what you see, you notice that the data items flowing by fit into groupings based on the kind of value they contain: some hold text, others are made up of a date, a time or a datetime; yet others are true/false values, then there are numbers…and it looks like there are even more categories beyond these!
In this post, we’ll begin exploring these categories—we’ll begin exploring the kinds of values supported by the M language. In programming parlance, these “kinds of values” are called types.
Types can have special behaviors associated with them. For example, date and time have a special rule for being combined: join a date and a time with the & operator and you get back a datetime! We’ll investigate these special rules.
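Here is a small sketch of that date-plus-time rule. Note that in M the combination is written with the & operator rather than +. The specific date and time values are arbitrary.

```m
let
    d = #date(2024, 3, 15),
    t = #time(9, 30, 0),

    // Combining a date with a time yields a datetime
    combined = d & t
in
    combined   // #datetime(2024, 3, 15, 9, 30, 0)
```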
Also, we’ll discuss the literal syntax associated with each type. In this context, “literal” implies that you literally want to write out a value in code. You might, for example, literally want to set variable rate to the hand-selected value of 100. The syntax you use to code up the expression that produces the literal value you want is called literal syntax. Different types have different literal syntax rules. We’ll talk about these rules, as well.
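For a feel of what literal syntax looks like, here is a brief sketch. The variable rate comes from the paragraph above; the other names and values are illustrative only.

```m
let
    rate = 100,                     // number literal
    greeting = "Hello, world",      // text literal, wrapped in double quotes
    quoted = "She said ""hi""",     // a double quote inside text is written twice
    tabbed = "left#(tab)right"      // #( ) escape for a control character
in
    [Rate = rate, Greeting = greeting, Quoted = quoted, Tabbed = tabbed]
```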
There’s a lot to cover. In this post, we’ll explore the specifics of text (strings). If we explored all of the types in this post, it would get mighty long, so we’ll save the others for later.
Ever rename a table’s column names to transform their style from what makes sense in the database world to what looks attractive in a reporting environment?
Perhaps a data source returns field names like:
user_id, username, first_name, last_name, is_active_user
In your report, you’d like these columns to show up as:
User ID, Username, First Name, Last Name, Is Active User
The obvious way to achieve this is to use the query editor to manually rename each column. For five columns in one table (such as the above example), hand-transforming their names isn’t too tedious. However, imagine a report that uses ten or twenty tables, each with twenty or thirty columns—and it’s a different story.
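One way to automate this, sketched here under assumptions, is to hand Table.TransformColumnNames a renaming function. The helper ToReportStyle and the sample row are hypothetical. The rule it applies (split on underscores, capitalize each word, special-case "id") is just one reasonable guess at the desired style.

```m
let
    // Hypothetical source table using the column names from the example above
    Source = #table(
        {"user_id", "username", "first_name", "last_name", "is_active_user"},
        {{1, "jdoe", "Jane", "Doe", true}}
    ),

    // Hypothetical helper: "first_name" -> "First Name", "user_id" -> "User ID"
    ToReportStyle = (name as text) as text =>
        let
            Words = Text.Split(name, "_"),
            Styled = List.Transform(
                Words,
                each if _ = "id" then "ID" else Text.Proper(_)
            )
        in
            Text.Combine(Styled, " "),

    // Apply the helper to every column name in one step
    Renamed = Table.TransformColumnNames(Source, ToReportStyle)
in
    Table.ColumnNames(Renamed)
    // {"User ID", "Username", "First Name", "Last Name", "Is Active User"}
```

Because the function is applied to whatever columns exist, the same step can be reused across many tables instead of renaming each column by hand.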
You’re a developer. You know how to write code that interacts with databases. Using tools like Entity Framework and LINQ, you can push and pull data to and from data sources. But how comfortable are you directly talking to a relational database in its native dialect, SQL? Maybe you can cobble together a basic SELECT statement, but when you do, do you feel like a traveler in a strange land where you barely know the language?