Power Query M Primer (Part 21): Identifier Scope & Sections

, , , ,

The same identifier name (think: variable name, field name, etc.) can be defined more than once in the same set of Power Query expressions. If you reference an identifier name that’s been defined in multiple places, which of those definitions will your reference point to?

In this post, let’s learn how M sorts this out. We’ll also explore sections—the usually hidden “frame” at the core of organizing the different expressions that make up a Power Query program.

Let’s get to it…and have fun while we’re at it!

Series Index

Identifier Scope

Let’s start really, really simple. For the moment, assume that the below is all that exists in your Power Query world (so there’s no standard library, no other queries and no global environment of any kind present as the mashup engine evaluates the expression).

In this record, what can the expression defining field a “see”? To put it in other words, what variables is the code that defines a able to reference?

[
  a = 1,
  b = 2,
  c = 3
]

a‘s expression can “see” (i.e. is allowed to reference) variables b (whose value is 2) and c (with a value of 3). However, a reference of “a” is not allowed.

In technical terms, an identifier reference in the style of “a” is called an exclusive identifier reference. “Exclusive” implies that the list of identifiers visible to the reference excludes the identifier for the let expression variable or record field currently being defined. Above, the expression defining a cannot contain the reference “a” because a is not in the list of identifiers visible to the reference—because a is the identifier currently being defined.

What happens if you attempt a self-reference? M stops you, raising an error because it can’t see the name you’re trying to use.

[
  a = a * 10 // raise "Expression.Error: The name 'a' wasn't recognized.  Make sure it's spelled correctly."
]

This limitation against self-references is a good thing. Something like the above doesn’t make sense. It would result in infinite recursion.

Inclusive Identifier References

While normally self-references are bad, there is a case where they are not only acceptable but also needful: recursive functions. How do you self-reference here? Use what’s known as an inclusive identifier reference: prepend the variable’s name with an “@”.

An inclusive identifier reference works exactly the same as an exclusive identifier reference, except that the exclusive reference’s “exclude the current let variable or record field” rule does not apply. So, the set of identifiers visible to an inclusive reference includes the currently being defined variable or field.

Below, SumConsecutive uses an inclusive identifier reference (notice the prepended “@”) to reference itself.

let
  SumConsecutive = (x) => if x <= 0 then 0 else x + @SumConsecutive(x - 1),
  Result = SumConsecutive(4)
in
  Result // returns 10

There’s no technical prohibition against using inclusive references where they are not needed (as the below example shows), but doing so doesn’t serve much of a point.

let
  a = 10,
  b = 20
in
  @a * @b // while valid, neither @ is needed here -- a * b works fine and is preferable

Unnecessarily using this style of reference might even cause confusion as the next developer reading your code might see the “@” and then spend time puzzling over why you used it. Recommendation: Only use inclusive references when necessary.

If the nefarious idea comes that maybe you could use a inclusive identifier reference to self-reference outside the context of a recursive function—say, to code up an expression like a = @a * 10—you’re treading on dangerous ice. Even so, M tries to keep you safe here: If you ask the engine to evaluate a cyclic reference that it determines will never terminate, it will raise Expression.Error: A cyclic reference was encountered during evaluation. (Note, though, this error willy only be raised if the engine detects the cyclic reference. If it doesn’t, M will try to evaluate the expression and then may run out of resources.)

Paternal Merge

With these basics out of the way, let’s make things more complex. Jumping back to the initial example, let’s wrap it in an outer record. Below, what can inner‘s a see?

[
  a = 10,
  inner = [
    a = 1, // can see a = 10 (from the outer record), b = 2, c = 3
    b = 2, // can see a = 1 (from the inner record), c = 3
    c = 3 // can see a = 1 (from the inner record), b = 2
  ]
]

inner‘s a can now see an a—but this a is not itself (not a self-reference); rather, it’s the a from the parent record (which has a value of 10). In contrast, the reference to a in the expressions for both inner b and c points to inner‘s a (value = 1). These two expressions “see” inner‘s a, while inner a‘s exclusive reference cannot; instead, that reference is resolved by falling back to a from the parent record.

What’s going on? Identifier references are resolved by first looking locally to check for a correspondingly-named identifier. If a match isn’t found there, the parent is then checked for the identifier.

As an expression can contain child expressions which themselves define identifiers, and those identifiers’ expressions can themselves define children which define identifiers (and so forth), this checking of the parent for the corresponding identifier happens recursively. The check starts locally then works up through each successive paternal layer (parent, grand-parent, great-grandparent, etc.) all the way to the global environment until a matching identifier is found.

The preceding example illustrates this behavior using a record nested inside another record, but this behavior is not exclusive to records—it is universally applicable to all identifiers in M (e.g. applies to let expression variables, references between queries, etc.).

Cross-Query References

Speaking of references between queries, consider a file with two queries in it:

// Data
10

// Query1
let
  Data = 100,
  Result = Data // sees the Data from the let expression (value = 100), not the value from the query named Data (value = 10)
in
  Result // returns 100

Query1‘s Result references identifier Data. There are two identifiers named Data in the file. Which one does Result‘s reference point to? The Data defined in the let expression (value = 100).

Why? Jump back to what we just learned: when two identifiers have the same name, local wins out over paternal. The “winning” Data is the Data that is the sibling to Result in the let expression (so is local), not the other Data (which is defined in a paternal scope).

Sections

The concept of references between queries leads to an interesting thought. In Query Editor, each query appears as a separate entity in the UI. Yet, as the last example illustrates, these visually separated queries can reference each other. To make this possible, there must be some kind of paternal scope—like some kind of hidden super-let expression or record that holds all of a file’s queries.

There is: The parent “container” holding the queries is called a section. It’s defined inside a section document.

For better or worse, section documents are not currently visible as raw code in Microsoft’s consumer-facing query editing tools (e.g. Query Editor). This doesn’t mean they’re not there—just you don’t (usually) get to see them. Instead, behind the scenes, the consumer-facing tools automatically mange them for you (e.g. when you create a query, the tool adds it to the appropriate section; when you update a query, it updates the appropriate place in the section; and so forth).

Nonetheless, sections (even though they’re hidden) affect how M works, so learning about them will help in your understanding of this language. Besides, who knows, maybe one day you’ll be in one of those advanced situations where you need to directly edit a section document (like when building a custom data connector using the Power Query SDK).

Below is what the section document for the previous example looks like:

section Section1;

shared Data = 10;

shared Query1 = let
  Data = 100,
  Result = Data // sees the Data from the let expression (value = 100), not the value from the query named Data (value = 10)
in
  Result // returns 100
;

(You can see this for yourself by creating the two queries in Microsoft Excel. After you’ve saved and closed the file, open it in Data Mashup Explorer, where you’ll be able to view the section document under Package Parts > Formulas > Section1.m.)

A section starts with the keyword section followed by a name for the section, followed by a semicolon. Each section name must be unique in the global environment.

Section Members (a.k.a. Queries)

Query Editor’s UI calls the top-level expressions you define “queries.” However, this term is not always an accurate descriptor, as what you can define isn’t limited to queries: you can also create parameters, functions, etc. Thankfully, sections use a more broadly applicable term for the equivalent concept.

In the section, each “query” from the UI equates to a section member. Looking at the above example, notice that it contains two section members. Each is identified by name, then there’s an equals sign, then the expression for the section member, then a semi-colon.

(You may have noticed that shared precedes both section member’s names. We’ll talk about its meaning shortly.)

Identifier Reference Resolution

As far as identifier reference resolution goes, a section is the immediate parent scope to its section members’ expressions.

Say one of those members is defined as a let expression which includes a reference to Query1. To resolve this reference, M will first look for an identifier with that name in the let expression. If it doesn’t find a match, it will checks the section containing the let expression (because it’s the let expression’s parent), for an identifier named Query1.

section Section1;
	
shared Query1 = …;
	
shared Query2 = let 
  Result = Table.FirstN(Query1, 10)
in 
  Result // since there's no Query1 in this expression, the Query1 in the section is what is referenced
;

The Ubiquitous Section1

The section documents we’ve been examining each contain a single section, named “Section1”. Section1 is the section Microsoft’s consumer-focused Power Query tools use to store the queries you write.

Language-wise, there’s no special significance to this name; it just happens to be what Microsoft decided to use. (You could argue that a more exciting name, say MyReallyCoolPowerQueryCodeThatSolvesTheWorldsProblems, might have been nice, but then, that’s not as professional as the generic, benign Section1, so it’s probably a good thing they went with what they did.)

Storing Invalid Code in a Validity-Required World

A section member’s expression must be syntactically valid and must be, well, an expression.

Since a section document is not an expression, a section member cannot contain a nested section document.

Speaking of valid syntax, in Query Editor, you can set a query’s expression to something that contains invalid syntax (imagine that you’re in the process of coding then need to quit for the day so you save your incomplete expression). How is this erroneous code stored if section member expressions must only contain accurate syntax?

Behind the scenes, your Query Editor escapes any double-quotes in the malformed code then wraps it in a #!"…" literal.

Here’s an incomplete expression:

let "hi!"

In the section document, this translates to the following section member expression:

#!"let ""hi!""";

Literal Attributes

Both sections as well as section members can be decorated with what are called literal attributes. These are a set of literal (hard-coded, not computed) values held in a record-like construct that can be used to attach an extra level of metadata-like data to the section as a whole or to individual section members.

Below, section member MyCustomDataSource.Feed is decorated with attributes DataSource.Kind and Publish.

section MyCustomDataSource;

[DataSource.Kind="MyCustomDataSource", Publish="MyCustomDataSource.Publish"]
shared MyCustomDataSource.Feed = …;

From what I’ve seen, these attributes are only used in a few places, like when Query Editor’s “Custom Function” feature is used (to link the sample expression and auto-updated function together), as well as during custom data connector development (which is where the previous example comes from). At any rate, they currently don’t appear to be applicable outside of limited special cases.

A key differentiator between literal attributes and metadata is the former’s literalness. Literal attributes can be used in cases where dynamically computed values would be costly or inappropriate. Reading literal attributes is cheap, safe and predictable, without the potential for recursive references, calls to external systems, etc., all of which can be encountered with metadata (i.e. because expressions can be used when defining the latter, like e.g. value meta [ IsImportant = CheckWhetherShouldBeImportantToday()]).

As far as syntax goes, think of a record whose field values are constrained to only literal strings, numbers, nulls and logical values, hard-coded lists made up of that same set of literal values, and nested records whose fields are similarly constrained.

Organizational, Not Executable

Prior to our encountering sections, all of the M code we looked at were expressions. Expressions can be executed and either return results or raise errors.

That’s not the case for sections. A section is an organizational structure which helps define the M program. Sections are not themselves executable (and so do not produce values); rather, they contain named expressions which can be executed, if desired. You can’t execute a section, but you (or the tool you’re using) can ask the mashup engine to execute a section member that’s contained in a particular section.

#sections

Now that you know what sections are, what if you want to see the sections that are currently present in your environment?

M defines a special keyword, #sections, that allows you to see all the sections currently in existence. Except in certain advanced scenarios (like when Embedded.Value is used or in the realm of custom connector development), only one section will be present: Section1. (How anticlimactic!)

#sections
Screenshot showing #sections having returned a record containing a single field: Section1

#sections returns a record with one field per section, with the field named after the section. Eachfield value is itself a record which contains a field for each of that section’s members. These fields can be used to access those members’ values.

#sections[Section1]
Screenshot showing the result of #sections[Section1]: a record with one field per Section1 section member

Remember the below example from earlier? Using what we just learned about #sections, if you want to make Query1‘s reference to Data point to the query (a.k.a. section member) named Data instead of to the let variable with the same name, you could change the reference from Data to #sections[Section1][Data].

// Data
10

// Query1
let
  Data = 100,
  // Previously: 
  // Result = Data// sees the Data from the let expression (value = 100), not the value from the query named Data (value = 10)
  // Now:
  Result = #sections[Section1][Data] // sees the query named Data (value = 10)
in
  Result //returns 10 (previously returned 100)

This works, though there is a more preferable syntax option that achieves the same effect….

Section Access Expressions

Instead of #sections[Section1][Data], a section access expression can be used as a shorter way to specify the same reference:

Section1!Query1

Yes, that’s section’s identifier + “!” + section member’s identifier. It works just like #section[Section1][Data], just uses more concise syntax.

Shared Section Members

Above, when sections were introduced, you may have noticed that each section member in the introductory example had a “shared” before its name. What’s this for?

Let’s imagine that one day the Power Query ecosystem allows normal users to create multiple sections. You define one section for your personal “standard library” of handy-dandy helper functions which you copy-and-paste into all your Power Query-powered report files. Then, in those files, you use the normal Query Editor to define various queries (section members), which the UI, per its default, saves into Section1.

Let’s say that in one of those files, the resulting section document looks something like this:

section Section1; // managed by the GUI Query Editor
shared CustomerData = …;
shared OrderData = …;
	
section MyHelpers; // from your copy-and-paste
shared CleanupColumnNames = (input) => …;
shared ConvertFromUnixTimestamp = (input) => …;

How would one of your queries (section members) in Section1 reference a helper in MyHelpers? Based on what we’ve covered so far, we know that a section access expression will work.

For example, let’s say CustomerData (in Section1) wants to use MyHelper‘s CleanupColumnNames. You could achieve this by coding CustomerData to use a section access expression:

let
  Source = Sql.Database(…),
  CleanColumns = MyHelpers!CleanupColumnNames(Source),
  …
in
  Result

This works; however, the little bit of extra typing required to identify the other section (the MyHelpers! part) can be avoided by sharing the referenced section member. Sharing a section member adds it to the global environment, which is the final place checked for identifier resolution.

In our example, MyHelper‘s CleanupColumnNames is already marked as shared, so nothing needs to be changed there. We can simply shorten Section1‘s CustomerData reference to:

let
  Source = Sql.Database(…),
  CleanColumns = CleanupColumnNames(Source),
  …
in
  Result

How does identifier resolution work for the above reference to CleanupColumnNames? M first checks locally for an identifier with the specified name. Finding none, it checks the section that contains CustomerData (that would be Section1). Since a “CleanupColumnNames” isn’t defined there, either, M moves on to check the global environment. Here, at last, it finds an identifier named “CleanupColumnNames”—which is there because it was shared with the global environment by MyHelper.

If this is so convenient, why is it even possible to not share section members? Why not force them to all be always shared?

Not sharing a section member gives you a way to indicate that a value isn’t intended for general consumption. Not sharing also avoids unnecessarily polluting the global environment and the risk of causing a global name conflict.

Suppose in section MyHelpers, you have a member named ToUpper that is only intended for internal use by other code in MyHelpers. Not sharing this method helps convey this intention. While the identifier can still be accessed by code in other sections (i.e. if they were to use #sections or a section member access expression), hopefully the extra work involved will discourage potential consumers by cluing them into the fact that you didn’t intend for them to use the non-shared member.

As to global name conflicts: Suppose two sections both share a member named DoSomething. An expression is being evaluated that references DoSomething which isn’t satisfied by any child scope. When M checks the global scope, it finds two DoSomethings (from the two shares). Which should it use to satisfy the reference? It doesn’t know, so it won’t use either. Instead, it will raise an error. This ambiguity and the resulting error won’t occur if only one of the two DoSomethings was shared.

Don’t shy away from sharing members, but at the same time, don’t needlessly share members that really are intended for in-this-section-only use.

So how do you not share a section member? Simply leave off the “shared” before its name.

section MyHelpers; 

// not shared
ToUpper = (input) => …; 

// shared
shared CleanupColumnNames = (input) => …; 

Global Environment

Several times in the preceding, the term “global environment” has been used. Let’s define this phrase.

The global environment is the last place (the top of the hierarchy) where M looks to resolve identifier references. Normally, it consists of all shared section members from all sections, as well as any identifiers directly injected into the global scope.

Directly injecting things into the global scope…can you do this? Not normally—but Microsoft can and does. Specifically, the standard library gets added into the global environment.

Putting these pieces together, the normal global environment consists of all shared members + the standard library.

#shared

To view the identifiers in the global environment, try #shared. It returns a record with a field for each global identifier. This field’s values can be used to access the values associated with those identifiers.

Result of evaluating #shared

There are some interesting uses for #shared. You could, for example, use it to see which standard library functions your particular install of Power Query includes.

let
  SharedAsTable = Record.ToTable(#shared),
  FilteredToStandardLibraryFunctions = Table.SelectRows(SharedAsTable, each Text.Contains([Name], ".") and Type.Is(Value.Type([Value]), type function))
in
  FilteredToStandardLibraryFunctions 

You could even take the above further and pull the documentation for those functions from metadata into a table….

let
  SharedAsTable = Record.ToTable(#shared),
  FilteredToStandardLibraryFunctions = Table.SelectRows(SharedAsTable, each Text.Contains([Name], ".") and Type.Is(Value.Type([Value]), type function)),
  ExtractTypeMetadata = Table.AddColumn(FilteredToStandardLibraryFunctions, "Docs", each Value.Metadata(Value.Type([Value]))),
  DocumentationToClumns = Table.ExpandRecordColumn(ExtractTypeMetadata, "Docs", {"Documentation.Name", "Documentation.Description", "Documentation.LongDescription", "Documentation.Category"}),
  RemoveValueColumn = Table.RemoveColumns(DocumentationToClumns,{"Value"})
in
  RemoveValueColumn

Standard Library Special Names

Could you ever have a name conflict between a shared section member (e.g. a query you define) and something from the standard library.

No, not normally.

Why? No rule in the language specification prevents this. Rather, what guards against it is a convention enforced by Microsoft’s consumer query editor tools. Those tools don’t allow you to use the period character in a query’s name (i.e. in section member’s name). In contrast, all identifiers defined by the standard library have a period in their name (e.g. Table.SelectRows, List.Count, etc.). Since the UI won’t let you use a period in a section member name, you can’t use the UI to define a section member whose name conflicts with the standard library.

Keep in mind that the period-containing identifier names used by standard library are just that: names. If you come from an object-oriented programming background, this style of naming might lead you to wonder if the name before the period identifies a class and the name after the period identifies particular method on that class. For example, you might be included to wonder if Text.Upper is a reference to the “Upper” method of class “Text”.

Nope! There are no classes or objects in M. Text.Upper is simply a name with a period in it. The practice of the standard library using  periods in its names is a convention solely for the benefit of human developers.

Next Time

Above, we learned that “the normal global environment consists of all shared members + the standard library” (emphasis added) and that we normally can’t inject things into the global scope. Next time, we’ll learn about a case when these normalities don’t hold true—where we have full control over the global environment.

Also, there’s one identifier resolution situation we haven’t looked into—which offers the possibility to code things up that are kind-of, sort-of, maybe like objects.

Up next time: Expression.Evaluate and closures. Until then, happy coding!

Revision History

  • 2021-07-13: Revised section on Literal Attributes to describe their syntax, as well as one use for them.
  • 2021-07-27: Revised Literal Attributes, adding commentary and an additional use case.

4 thoughts on “Power Query M Primer (Part 21): Identifier Scope & Sections

  1. Alex Groberman

    Great post, thanks! In regards to “Paternal Merge”, in the example you gave is there a way to access the outer “a” from “b”? Or would you need to use a let expression or some such to give a different name to that value so that there’s no naming conflict?

    The main use case I’m thinking of is nested “each” expressions, currently if values are needed from both the inner and outer “each”, I long-hand one of the “each”s to “(…) =>” to avoid the conflict.

    Thanks again!

    Reply
    1. Ben Gribaudo Post author

      Great question! There isn’t a way to say “give me the value of a from the previous scope.” Instead, something like you describe (giving the outer value a different name) is a very good approach. You also might be able to access the value by navigating to it from a section member expression (e.g.: Section1!MyQuery[outer][a]).

      Reply
  2. Thomas Edmunds

    Hi – really great series of posts. I am learning a lot!

    Just a quick note on the Literal Attributes of section members. I noticed if you create a linked function via the UI (right click a query and choose “Create Function…”) there is no obvious way to tell that that function is linked to a query. The only way in the UI that I am aware of is to right click the function and choose “Advanced Editor” and it will give you a warning about the function being linked.

    However looking at the underlying code in the Data Mashup Explorer shows there is a Literal Attribute above the function along the lines of [ FunctionQueryBinding = "{""exemplarFormulaName"":""LinkedQuery""}" ]. So at least we can see what goes on behind the scenes!

    Reply
    1. Ben Gribaudo Post author

      Thank you, Thomas. When I initially wrote this post, off the top of my head, I was remembering that linkage as being set up using metadata—but it isn’t; literal attributes are used. Just updated the post to reflect this. Thank you, again, for pointing this out!

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *