Have you ever stopped to think about relationship columns: how they work, when they’re automatically added, and in particular how they’re named?
On that last point: Did you know there is a latent danger where seemingly unrelated changes can break existing M code?
What Is a Relationship Column?
In a nutshell, a relationship column is an automatically added nested join between the table you’re working with and a related table. In the relationship column, for each row, there’s a nested table containing the associated rows from the related table. Thanks to M’s laziness, if the nested join isn’t used, fetching the related table’s row data will be skipped—so the presence of a relationship column whose values are unneeded does not incur an appreciable cost.
Prerequisites
Several prerequisites must be met in order for relationship columns to be added:
- Connector Support: The data connector being used needs to support generating these columns. This feature is connector-specific, and not every connector offers it.
- Option Enabled: If the connector has an option for turning this feature on or off, the feature must be enabled (obviously!).
- Discernable Relationships: The data connector needs to be able to discern the relationships for the table you’re working with. You may know that tables Child and Parent are related via Child‘s column fParentID and Parent‘s column ParentID, but unless the data connector has a way to know this, it won’t be able to auto-generate relationship columns reflecting it. Typically, in order for a relationship to be discernable by the data connector, it must be defined in the data source’s schema/metadata. For example, in the case of a Microsoft SQL Server, this is done by defining a foreign key relationship between the relevant tables on the server.
Columns Generated
When the prerequisites are met, relationship columns will be generated.
Imagine you have SQL Server database table, dbo.Person, with a foreign key pointing to table dbo.Account (so a Person -> Account
relationship). In Power Query, if you fetch table Person, you’ll notice that it includes a relationship column for Account.
Relationship column generation also occurs* for the opposite side of the defined relationship. The presence of the foreign key on Person pointing to Account is used to infer the reverse relationship, as well (that is, an Account -> Person
relationship). If you fetch table Account, a relationship column to Person will be included.
(*At least, this is true for the Sql.Database
and PostgreSql.Database
connectors, which are the two I tested with. As the relationship columns feature is connector-specific, it’s possible that this behavior, or other behaviors mentioned in this post, could vary for other connectors.)
Column Naming
Typically, the relationship column’s name is simply that of the related table, possibly prefixed by that table’s schema. The preceding Person -> Account
relationship will be represented in table Person by an auto-generated relationship column named Account.
Notice that no schema prefix was included in this column name. This is because the Sql.Database
data connector doesn’t prefix the names it generates for objects in SQL Server’s default schema of dbo. (In contrast, apparently, the PostgreSql.Data
base connector always schema-prefixes the relationship column names it generates, even for tables in PostreSQL’s default schema of public.)
Multiple Relationships -> Same Table
It’s possible for more than one relationship to exist between two tables. Say that Person contains two relationships to Account, one to associate the person with a company (Person fAccountID_Company -> Account AccountID
) and the other to relate them to the school they attended (Person fAccountID_School -> Account AccountID
).
With this, the data connector needs to generate two relationship columns for each of the tables. Simply naming these after the related table doesn’t work, as a Power Query table cannot contain multiple columns with the same name (e.g. two relationship columns named Account can’t both be added to table Person). To resolve this quandary, a different naming format is used. It starts out as “related table name(comma-separated list of relationship column names from the local table)
“. If this would result in multiple columns being added with the same name, an incrementing numeric suffix is added to all but the first name that’s generated.
In the case of our last example, this results in column names Account(fAccountID_School) and Account(fAccountID_Company) being added to table Person
. Going the other way around, table Account gets columns Person(AccountID) and Person(AccountID) 2 added. (Notice the numeric suffix on that last name. This is necessary to differentiate between the two relationships, which otherwise would have the same column name, as both relationships point to the other table using the local table’s AccountID column.)
Column Name Already Present
It’s possible that the name that normally would be auto-generated for a given relationship is already in use in the table. That is, the data source could already contain a column with this name. When this is true, relationship column name generation falls back to the “related table name(comma-separated list of relationship column names from the local table) possibly followed by a numeric suffix
” format (described above), with the numeric suffix incremented, if necessary, until an unused column name is found.
Going back to the simple, single Person -> Account
relationship we started with: As we already know, normally, this relationship will be represented in table Person with an auto-generated column named Account. However, say Person already has an Account column. In that case, the name Account(AccountID) will be used for the relationship column, unless that column name already exists in the table, in which case, Account(AccountID) 2 will be used—and if that column exists, then the numeric suffix of 2 will be changed to 3, and so forth, until a usable column name is found.
Custom Names
If you’d rather define your own naming formula for these columns, check whether the data connector you’re using allows you to provide a NavigationPropertyNameGenerator function.
This callback allows you to take full control of relationship column name generation. It’s invoked when relationship column names need to be generated. Two arguments are passed to it: a list of columns already in the table and a list of records where each describes a relationship that needs a column name. These record are in the shape of [SourceTableName = text, SourceKeys = {text}, TargetTableName = text, TargetKeys = {text}, SingleTarget = logical]
. The function’s expected return value is a list of column names, one per relationship.
A Hidden, Lurking Potential Ouch
Did you catch the latent risk stemming from the default relationship column naming rules? Adding or removing a seemingly unrelated column on the data source could break existing M code by causing relationship column names to change. The same holds true when a relationship definition (like a foreign key) is added or removed from the data source.
Imagine table Order table with a single relationship to table User. Power Query represents this in Order as a relationship column named User. You write M code that refers to this column by that name. So far, so good.
Down the road, on the server, the database administrator adds a column to Order named User (say to capture the username of the staff member who last modified the row). No column by that name exists in the physical table on the database, so—from the DBA’s perspective—adding it seems like a safe change to make.
However, due to this addition, the auto-generated relationship column name in Power Query will change from User to something like User(UserID) to prevent it from conflicting with the new physical database column. M code you previously wrote which referenced column User still references the column with that name, but the column by that name is now the newly added User column, not the expected relationship column.
The seemingly safe addition of a new database column led to your existing code referencing the wrong column, which causes it to behave in unexpected ways—in essence, the addition broke existing M code.
Say you address this by revising your M expressions to reflect the relationship column’s new name. At a later point in time, the decision is made to drop the idea of the physical User column. When it’s removed from table Order, the Power Query-generated relationship column name pointing to table User switches from User(UserID) back to just plain User. In turn, this breaks any M code that references the parenthesized name, as that name is no longer being used.
Similar effects are possible if a foreign key relationship is added or removed.
In all of these cases, when properly thought through, the necessary M code modifications can be anticipated and done in sync with the database changes—but this requires proper planning, coordination and code changes.
Alternately, most of the risk here can be avoided by employing a custom NavigationPropertyNameGenerator which generates long-form names that include the other table’s name and the keys from both sides of the relationship (for example, in table Orders, imagine a relationship column named User(fUserID->UserID). With this style of naming, adding or removing other columns or relationships between the two tables is very unlikely to result in the generated name needing to change.