Tag Archives: Data Transformation

Power Query M Primer (Part 21): Identifier Scope & Sections

July 12, 2021

The same identifier name (think: variable name, field name, etc.) can be defined more than once in the same set of Power Query expressions. If you reference an identifier name that’s been defined in multiple places, which of those definitions will your reference point to?

In this post, let’s learn how M sorts this out. We’ll also explore sections—the usually hidden “frame” at the core of organizing the different expressions that make up a Power Query program.
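As a small taste of the question, here is a minimal sketch (the identifier names are made up for illustration) in which the same name is defined in an outer let and again in a nested one. It only shows the common nested-let case, not sections.

```powerquery
let
    // Outer definition of the identifier X
    X = 1,
    // The nested let defines X again; inside it, the nearer (inner) definition is used
    Inner =
        let
            X = 2
        in
            X * 10,
    // Back outside the nested let, X refers to the outer definition
    Outer = X * 10
in
    [Inner = Inner, Outer = Outer] // [Inner = 20, Outer = 10]
```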

Let’s get to it…and have fun while we’re at it!

Continue reading →

Migration Thoughts: Raw Data vs. Real Data

June 24, 2021

A challenge when migrating data to a new system is that the legacy database may only contain a representation of the old system’s data. Arguably, the actual to-be-migrated data is what a user of the old system sees when using its user interface, reports or APIs. This data isn’t always exactly what is present in the database. Whether purposefully or inadvertently (think: bugs), the legacy system may apply transforms, filtering and such to the database’s raw data before displaying or outputting it. In order to create an accurate export for migration, these data manipulations need to be factored in.

From one migration, I remember an email address table along the lines of the following. This table’s design allows a customer to have multiple email addresses, including multiple primary email addresses (as the example below shows).

EmailId  CustomerId  EmailAddress       IsPrimary
-------  ----------  -----------------  ---------
1        15835       joe@example.com    1
2        15835       jsmith@work.com    1
3        15835       jsmith@home.com    0
4        321163      bob@elsewhere.com  1

If this system’s database were the exhaustive, authoritative source for its data, it would seem reasonable to conclude that all of the table’s email addresses should be included in the dataset to be migrated.

However, in the old system’s user interface, a customer could only have two email addresses: one primary and one secondary. Due to a bug, some email address edits resulted in new address rows being created instead of existing rows being updated. The system, perhaps inadvertently, was smart enough to ignore these old rows, so their presence didn’t cause it problems. Even though these rows were physically present in the database, they weren’t a part of the system’s authoritative data because from the user’s perspective they did not exist; instead, they were data corruptions which needed to be ignored during the export extraction process.
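To make the idea concrete, here is a rough sketch of what such an extraction filter might look like in Power Query. The excerpt does not say how the old system decided which rows to ignore, so the rule used here is purely an assumption for illustration: for each customer and primary/secondary flag, the row with the highest EmailId is treated as the live one.

```powerquery
let
    // Sample rows copied from the table above
    Emails = #table(
        type table [EmailId = number, CustomerId = number, EmailAddress = text, IsPrimary = number],
        {
            {1, 15835, "joe@example.com", 1},
            {2, 15835, "jsmith@work.com", 1},
            {3, 15835, "jsmith@home.com", 0},
            {4, 321163, "bob@elsewhere.com", 1}
        }
    ),
    // ASSUMPTION: the newest row (highest EmailId) per customer and flag is the one the UI shows
    Grouped = Table.Group(
        Emails,
        {"CustomerId", "IsPrimary"},
        {{"Latest", each Table.Max(_, "EmailId"), type record}}
    ),
    // Turn the surviving records back into a table
    Result = Table.FromRecords(Grouped[Latest])
in
    Result
```

Under that assumed rule, the row for joe@example.com is left out of the export while the other three rows survive. The real rule would need to be confirmed against the legacy system's actual behavior.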

Continue reading →

Power Query M Primer (Part 20): Metadata

March 17, 2021

Power Query allows information about a value to be attached to that value.

Ordinarily, these descriptive tidbits are invisible. Their presence does not change the mashup engine’s behavior. However, code (yours or others’, such as code in the standard library or in the host application) can intentionally choose to read and set these annotations, known as metadata, using them as a way to communicate extra information relevant to a particular value. These details can simply be for informational or diagnostic purposes; they can also be factored in by expressions as they determine how to behave.
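A minimal sketch of setting and reading metadata follows. The field names Source and Verified are invented for this example; any record can be attached.

```powerquery
let
    // Attach a metadata record to a value with the meta operator
    Amount = 100 meta [Source = "manual entry", Verified = false],
    // Metadata does not take part in comparison: the value still equals plain 100
    StillEqual = (Amount = 100),
    // Read the annotations back as a record
    Annotations = Value.Metadata(Amount),
    // Strip the annotations off again
    Plain = Value.RemoveMetadata(Amount)
in
    [
        StillEqual = StillEqual,                 // true
        Annotations = Annotations,               // [Source = "manual entry", Verified = false]
        PlainMetadata = Value.Metadata(Plain)    // [] (an empty record)
    ]
```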

Take parameters as an example. Parameters get special treatment in Microsoft Power BI’s user interface, yet in Power Query itself, they are stored as ordinary values.

This is good. You shouldn’t need to do anything special to consume a parameter; it should just be a value to your code. Yet parameters are special, so where is that “specialness” (like their configuration settings) hiding? From Power Query’s perspective, the answer is (yes, you guessed it) in metadata.
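As an illustration, a text parameter can look roughly like the sketch below. The metadata field names are the ones I have seen Power BI Desktop generate; treat them as an assumption and compare against what your own file contains. The parameter value itself is made up.

```powerquery
let
    // An ordinary text value; the parameter settings ride along as metadata.
    // ASSUMPTION: field names as typically generated by Power BI Desktop.
    ServerName = "localhost" meta [
        IsParameterQuery = true,
        Type = "Text",
        IsParameterQueryRequired = true
    ],
    // Consuming code just sees a text value
    Greeting = "Connecting to " & ServerName,
    // Code that cares about the configuration can read it from the metadata
    Settings = Value.Metadata(ServerName)
in
    [Greeting = Greeting, Settings = Settings]
```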

Continue reading →

Adding an Error Details Column

December 10, 2020

Errors here and there, scattered throughout a table—unfortunately, sometimes this is the reality, especially when source data quality is low. To help monitor issues, you decide you want an error details column. At minimum, it should show the name of each column in the row with an error. This will allow you to, say, filter to all rows where column “Item ID” is in an error state. Taking it a step further, it would be really nice if the error column also contained the error message associated with each erroneous column. With this, you could compute statistics like “how many times did we struggle to convert strings into dates”—information which could help identify the most frequently occurring data quality issues and so guide you to which you might want to consider addressing first.

Screenshot of table with error details column added
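One way such a column could be built is sketched below. This is a sketch under stated assumptions, not necessarily the approach the full post takes: the sample data and column names are hypothetical, and the errors are produced by converting non-numeric text to numbers.

```powerquery
let
    // Hypothetical sample data: text that does not always convert cleanly
    Source = #table(
        {"Item ID", "Quantity"},
        {{"1", "5"}, {"x", "7"}, {"3", "n/a"}}
    ),
    // Converting to numbers leaves cell-level errors where the text is not numeric
    Typed = Table.TransformColumns(
        Source,
        {{"Item ID", Number.FromText}, {"Quantity", Number.FromText}}
    ),
    ColumnNames = Table.ColumnNames(Typed),
    // For each row, try every field and keep the name and message of those that fail
    WithDetails = Table.AddColumn(
        Typed,
        "Error Details",
        (row) =>
            let
                Attempts = List.Transform(
                    ColumnNames,
                    (name) => [Column = name, Attempt = try Record.Field(row, name)]
                ),
                Failed = List.Select(Attempts, (a) => a[Attempt][HasError])
            in
                List.Transform(
                    Failed,
                    (a) => [Column = a[Column], Message = a[Attempt][Error][Message]]
                )
    )
in
    WithDetails
```

Each row's Error Details cell holds a list of records, one per erroring column (an empty list when the row is clean). Because the errors are read through try, adding the column does not itself fail on rows that contain them.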

Continue reading →

Power Query M Primer (Part 19): Type System IV – Ascription, Conformance and Equality’s “Strange” Behaviors

September 3, 2020

At first glance, it seems logical to expect the mashup engine to error if you attempt to ascribe a seemingly incompatible type to a value. Surprisingly, this doesn’t always occur.

During ascription, the engine validates that the type’s base type matches the value’s base type (so, for example, you can’t ascribe type logical onto a date value because the two base types are different). However, for types with children, those child types (list’s item type, record’s field types, table’s column types and function’s argument and return type assertions) have no effect on mashup engine behavior and are not validated for compatibility. What?!

For example, you can ascribe an item type of text onto a list of numbers, and the mashup engine won’t complain. Ascribe a column type of logical onto a column containing dates and M will comply. The same holds true for records: a field type of duration can be ascribed onto a field containing a function and no error is raised.

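// Item type text ascribed onto a list of numbers: no error
Value.ReplaceType({1, 2, 3}, type {text})
// Column type logical ascribed onto a column containing a date: no error
Value.ReplaceType(#table({"Col1"}, {{#date(2020, 6, 24)}}), type table [Col1 = logical])
// Field type duration ascribed onto a field containing a function: no error
Value.ReplaceType([FieldA = () => ...], type [FieldA = duration])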

For functions, the newly ascribed type’s argument and return type assertions aren’t validated or enforced; instead, the assertions specified when the function was originally defined continue to be what the engine follows. Take a function whose argument was originally defined as number and ascribe a type to it that declares the argument as text. Despite the ascription, the mashup engine will expect the argument’s value to be compatible with number, not text, when the function is invoked.

let
  Func = (input as number) as number => input,
  NewType = type function (input as text) as text,
  Ascribed = Value.ReplaceType(Func, NewType)
in
  Ascribed("hi") // errors
  // Ascribed(1) // works fine

These behaviors seem strange—and they aren’t the only strangeness related to Power Query’s types. Comparing type values may also not work the way you expect. Think TypeValueA = TypeValueB will return true if the two types are identical? Maybe. Maybe not!
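A small sketch of the kind of comparison in question is below. Only results I am certain of are annotated; the outcome of comparing two separately written types is deliberately left unannotated, because that is exactly the "maybe, maybe not" case.

```powerquery
let
    ListOfText = type {text},
    // A type value is equal to itself
    SameValue = (ListOfText = ListOfText),                            // true
    // Two separately constructed, identically written types:
    // not guaranteed to compare as equal
    SeparatelyBuilt = (type {text} = type {text}),
    // Type.Is asks a compatibility question rather than an equality question
    NumberFitsNullable = Type.Is(type number, type nullable number)   // true
in
    [
        SameValue = SameValue,
        SeparatelyBuilt = SeparatelyBuilt,
        NumberFitsNullable = NumberFitsNullable
    ]
```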

Fasten your seat belt. We’ll try to define and then clear up a bit of this confusion. It will be a journey! Here we go….

Continue reading →

How Power Query Thinks: Taking the Mystery Out of Streaming and Query Folding (Video)

August 26, 2020

Continue reading →

Editing Report/Spreadsheet Mashups in VSCode

July 16, 2020

Data Mashup Cmdlets import and export Power Query from most Microsoft Power BI and Microsoft Excel files; however, they don’t include a GUI editor. Microsoft’s new Power Query / M Language extension for Visual Studio Code provides a graphical editing experience for mashups, with syntax highlighting, validation, and IntelliSense, but doesn’t read or write PBIX, PBIT or XLSX files.

What if the two could be married? Imagine…if this were possible, you could use VSCode to externally edit the Power Query code saved in your Power BI and Excel files….

Now, stop imagining. Try Edit-DataMashup!

Screenshot showing VSCode being used to edit Power Query from a report file

Continue reading →

Power Query Privacy Levels: Context Is Key

June 24, 2020

What is the context, or scope, of this dialog’s settings?

Since there’s more than one category of context in play here, there’s more than one answer. The most obvious contextual category is probably data sets. Looking at settings from the data set scope perspective, “.\sql2017express” is scoped to that entire database server while data set “c:\” covers the entire C: drive.

Data set isn’t the only classification of context that applies to this dialog. Another is at the environment level. How broadly does each of the settings in the above dialog apply? If you open another report, will the settings you set here apply there? What if another user opens this report: how many (if any) of these settings will carry over to them?

Data set and environment are both groupings of context at the technical level. However, technical scopes aren’t the only kind of context present. There’s also the context the user assumes when they encounter the dialog, which can very easily be incorrect (not necessarily because of a fault with the user).

The ramifications of getting privacy levels wrong are significant, so it’s very important to configure these levels correctly. Understanding the various contexts applicable to each setting is one key to getting these settings right. Let’s take a few minutes to articulate these contextual groupings so we have a better understanding of how these settings apply.

While we’re at it, we’ll bump into a data source that never shows up in this list—but which we should be cognizant of when thinking about privacy levels.

Continue reading →

Shedding Light on the Mysterious Embedded.Value

June 4, 2020

“Accesses a value by name in an embedded mashup”—documentation’s terse description of this function raises almost as many questions as it answers. Let’s try to clear up at least some of the puzzle around this function. While we’re at it, we’ll uncover some surprises—did you know this function breaks an M language rule and that it exposes something never otherwise seen (as far as I’m aware) by end-user mashups?

Continue reading →

Power Query M Primer (Part 18): Type System III – Custom Types

June 2, 2020

After gaining a solid grasp on type system basics and exploring type facets, it’s now time to learn about complex types (also known as custom types or derived types). Thankfully, constructing these types isn’t very complex. Don’t worry, though, we’ll make up for that simplicity when we talk about how M handles them!

Let’s dive right in. To keep things simple, the focus for this post is syntax and conformance rules. We’ll save most of the discussion about how M works with these types for the next post in this series.

“Conformance—that sounds complicated!” you might be thinking. Actually, no. Just the name makes it sound that way. Saying that a value conforms to a type means that the value can be described by the given type, or to put it in other words, the value is compatible with the type. So, the numeric value 1 conforms to types number, nullable number, anynonnull and any because each of those types can be used to describe that value. Conformance rules, simply put, are the rules used to determine whether a value conforms to—is described by—a type.
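The conformance claims about the value 1 can be checked directly with the is operator. This is a minimal sketch; the name Sample and the extra text check are added here for contrast.

```powerquery
let
    Sample = 1,
    Checks = [
        IsNumber = Sample is number,                   // true
        IsNullableNumber = Sample is nullable number,  // true
        IsAnyNonNull = Sample is anynonnull,           // true
        IsAny = Sample is any,                         // true
        // For contrast: text cannot describe the value 1
        IsText = Sample is text                        // false
    ]
in
    Checks
```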

Now, on to the custom types!

Continue reading →