What’s With WriteToDataDestination?

, , , ,

If you examine the refresh details for a Power Query gen2 dataflow, you’ll notice that the activity names which are displayed are identical to the query names in your dataflow except that each has a “_WriteToDataDestination” suffix appended to it. What’s going on with this?

Let’s see what we can uncover! Time for some sleuthing….

In your Fabric/Power BI workspace, right-click on the dataflow of interest and choose Export .json (specifically choose this export option, not the Export template button found on the ribbon inside the dataflow editor, because the latter produces different output). Open the JSON document. Find property pbi:mashup.document. Copy and paste its value somewhere, then clean up the formatting (something I’ve done to the code extracts below to make reading easier).

pbi:mashup.document contains a section document. No surprise. As you may recall, what a Power Query tool’s UI calls queries are, behind the scenes, actually section members inside a section document that is normally named Section1.

Looking through this section document, we find the query (a.k.a. section member) we defined in the UI for our dataflow:

shared SomeAmazingQuery =
    let
        SomeAmazingQuery = #table(type table [ID = Int64.Type, Value = text], {{1, "abc"}})
    in
        SomeAmazingQuery;

So far so good—but this expression is not what we’re interested in. Rather, we’re trying to figure out where the “WriteToDataDestination” activity for our query comes from.

Continuing our exploration, we notice that the exported section document contains a number of section members that do not correspond with queries defined in the UI. These additional members are auto-generated by Power Query Online to help facilitate refreshing our query.

The Activity

Among these auto-generated section members is the definition of the WriteToDataDestination activity for our query:

[Staging = "FastCopyStaging"]
shared SomeAmazingQuery_WriteToDataDestination =
    let
        SomeAmazingQuery_WriteToDataDestination = Pipeline.ExecuteAction(
            ValueAction.WithTransaction(
                [Target = SomeAmazingQuery_DataDestination],
                (txn) =>
                    {
                        TableAction.DeleteRows(txn[Target]),
                        () =>
                            TableAction.InsertRows(txn[Target], SomeAmazingQuery_TransformForWriteToDataDestination),
                        Action.DoNothing
                    }
            )
        )
    in
        SomeAmazingQuery_WriteToDataDestination;

Yay! We found where this activity is defined…but what does it do?

In this expression, a pipeline is set up (using Pipeline.ExecuteAction) that contains a transaction (as specified by ValueAction.WithTransaction) which deletes (TableAction.DeleteRows) then inserts (TableAction.InsertRows) rows into the target. (As you might guess, if we reconfigure the dataflow to append instead of replace data, the delete step would not be present.)

Column Mappings

The data being inserted by TableAction.InsertRows is specified by the second argument to that function.

Interestingly, a direct reference to our query’s section member isn’t used, but rather a reference to a “_TransformForWriteToDataDestination” section member:

shared SomeAmazingQuery_TransformForWriteToDataDestination =
    let
        SourceTable = Table.SelectColumns(SomeAmazingQuery, {"ID", "Value"})
    in
        SourceTable;

This expression defines the column mapping between our query (which is given as the first argument to Table.SelectColumns) and the data destination’s target.

If a subset of our query’s columns are configured to be output to the target, only those column names will be included in this Table.SelectColumns.

If the target is an existing table (vs. a new table), there’s the possibility that the column names output by our query will not match the names of the applicable destination columns. If this is the case, the preceding expression will also incorporate a column rename that maps between the two sets of names. (For example, if we mapped our column ID to destination column TransactionID, a rename would be found here for ID -> TransactionID.)

Target Destination

Going back to the _WriteToDataDestination section member, the data destination’s target is defined as SomeAmazingQuery_DataDestination.

This expression returns the table that the output data should be written to. Specifically, in our example (which is configured to output to a lakehouse destination), it navigates to the appropriate lakehouse table and specifies that the table is to be created on demand if it does not exist.

shared SomeAmazingQuery_DataDestination =
    let
        Pattern = Lakehouse.Contents([CreateNavigationProperties = false]),
        Navigation_1 = Pattern{[workspaceId = "1e41b6ee-bba8-4e5e-acda-3f166fb92fcd"]}[Data],
        Navigation_2 = Navigation_1{[lakehouseId = "d258c7e7-e91b-4acf-8691-5c4e6185ccc5"]}[Data],
        TableNavigation = Navigation_2{[Id = "Data", ItemKind = "Table"]} ?[Data]?,
        Table = NavigationTable.CreateTableOnDemand(
            TableNavigation,
            #table(type table [ID = nullable Int64.Type, Value = nullable Text.Type], {}),
            [DeleteWhenEmpty = true]
        )
    in
        Table;

If we’d mapped the destination to an existing table (vs. a new table), the create on demand logic wouldn’t be there, as the table is already supposed to exist. Instead, the above let expression would end with step Table.

I’m not sure exactly what the DeleteWhenEmpty = true bit does, but it doesn’t seem to be present when the output target is an existing table.

Staging

Directing our attention back to _WriteToDataDestination, notice the literal attribute on the section member:

[Staging = "FastCopyStaging"]

While literal attribute Staging‘s value is just a string, somewhere in gen2 dataflow’s refresh subsystem, it appears there is logic that, in a case like this, looks for a section member whose name matches this value.

We can look in the section document and do the same. When we do, we find FastCopyStaging, which references DefaultStaging:

shared FastCopyStaging =
    let
        FastCopyStaging = Pipeline.FastCopyModelStorage(DefaultStaging, [Required = false])
    in
        FastCopyStaging;

shared DefaultStaging = 
    let 
         DefaultStaging = Pipeline.DefaultModelStorage() 
    in 
         DefaultStaging;

This pair of expressions appears to be some set up code related to staging.

Summary

Putting the pieces together, for every user-defined query (section member) in the dataflow that is mapped to a data destination (at least, when that destination is a Fabric Lakehouse), it appears that three auto-generated section members are added behind the scenes:

  • QueryName_TransformForWriteToDataDestination—Restructures the columns produced by the user-defined query to match those that should be output to the destination.
  • QueryName_DataDestination—Contains configuration defining where the destination table is located (i.e. how to find it), and possibly how to create it if it does not exist.
  • QueryName_WriteToDataDestination—The activity executed during the refresh to make the data load happen. Defines the pipeline step(s) needed to make the requested load happen.

If needed, these are supported by a couple additional section members related to staging:

  • FastCopyStaging
  • DefaultStaging

Now you know where the “WriteToDataDestination” activities that appear in the refresh history come from!

Manipulating Data?!

“But wait a minute!” you might be thinking, “I thought Power Query isn’t suitable for performing data modification activities.” True indeed—at least for those of us on the outside (those of us external to Microsoft).

So how does the above work?! Look back at the WriteToDataDestination section member. Notice how the word “Action” appears in several of the identifier names (like TableAction.InsertRows). Type action is a Microsoft-internal expansion of the M language that extends it to be usable for updating data sources. What we’re seeing here is code that you and I couldn’t write, that does something you and I couldn’t get M to (safely) do. This is one of the few cases where those of us on the outside can set type action used in working code.

(For more on type action, see M Mysteries: The Mysterious Type Action—An M-Internal Means to Write Data Modifications to External Systems.)

Internal Details, Publicly Exposed

The details we’ve walked through may be interesting, but don’t get too tied to their specifics. The various auto-generated expressions are internal details which, while Microsoft is letting us see them, can change without notice. Knowing a little about them may help us understand our refresh history log better and satisfy some curiosity, but we shouldn’t read much more into them beyond that. They are subject to change at any time—and, in fact, could have changed between the time I wrote this and when you read it.

Final Notes

Also, note that this exploration was based on a dataflow whose output targeted a Fabric Lakehouse. From what I’ve seen, the code generated when a Fabric Warehouse is targeted is very similar. The section document might be a bit different if the target were another type of destination.

In full, here’s the section document (reformatted for readability):

section Section1;

shared SomeAmazingQuery =
    let
        SomeAmazingQuery = #table(type table [ID = Int64.Type, Value = text], {{1, "abc"}})
    in
        SomeAmazingQuery;

shared SomeAmazingQuery_DataDestination =
    let
        Pattern = Lakehouse.Contents([CreateNavigationProperties = false]),
        Navigation_1 = Pattern{[workspaceId = "1e41b6ee-bba8-4e5e-acda-3f166fb92fcd"]}[Data],
        Navigation_2 = Navigation_1{[lakehouseId = "d258c7e7-e91b-4acf-8691-5c4e6185ccc5"]}[Data],
        TableNavigation = Navigation_2{[Id = "Data", ItemKind = "Table"]} ?[Data]?,
        Table = NavigationTable.CreateTableOnDemand(
            TableNavigation,
            #table(type table [ID = nullable Int64.Type, Value = nullable Text.Type], {}),
            [DeleteWhenEmpty = true]
        )
    in
        Table;

[Staging = "FastCopyStaging"]
shared SomeAmazingQuery_WriteToDataDestination =
    let
        SomeAmazingQuery_WriteToDataDestination = Pipeline.ExecuteAction(
            ValueAction.WithTransaction(
                [Target = SomeAmazingQuery_DataDestination],
                (txn) =>
                    {
                        TableAction.DeleteRows(txn[Target]),
                        () =>
                            TableAction.InsertRows(txn[Target], SomeAmazingQuery_TransformForWriteToDataDestination),
                        Action.DoNothing
                    }
            )
        )
    in
        SomeAmazingQuery_WriteToDataDestination;

shared SomeAmazingQuery_TransformForWriteToDataDestination =
    let
        SourceTable = Table.SelectColumns(SomeAmazingQuery, {"ID", "Value"})
    in
        SourceTable;

shared FastCopyStaging =
    let
        FastCopyStaging = Pipeline.FastCopyModelStorage(DefaultStaging, [Required = false])
    in
        FastCopyStaging;

shared DefaultStaging = 
    let 
        DefaultStaging = Pipeline.DefaultModelStorage() 
    in 
        DefaultStaging;

Leave a Reply

Your email address will not be published. Required fields are marked *