Privacy Levels in Dataflows: Impacted by Staging?


Staging data is common in the world of Microsoft Power BI dataflows. Computed entities, linked entities and Microsoft Fabric’s “enable staging” options all result in intermediate output being staged to disk and/or a database.

Does this staging affect privacy levels? Yes! It can change which privacy level is being applied.

Straightforward Combining

Let’s start with a baseline example:

Query dependency diagram showing three queries: 'From A' (load disabled), 'From B' (load disabled), Output (loaded)

Imagine you have two queries, From A and From B, which pull from data sources A and B, respectively. Neither query has its output staged (that is, neither is enabled for loading [gen1 dataflows] or enabled for staging [gen2 dataflows]). Another query, Output, references both of these queries and combines their data.

Say the privacy level for both data sources A and B is private. When Output is evaluated, no cross-source folding will take place. Instead, data from the two sources will be combined locally within the realm of Power Query.

Why? The private privacy level prevents a source’s data from being folded into any other source, including other private sources. So far, so good.
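To make the rule concrete, here is a minimal Python sketch of the folding permissions. It is a simplified model based on Microsoft's published descriptions of the three privacy levels, not the firewall's actual implementation; the names `Privacy` and `may_fold_into` are made up for illustration.

```python
from enum import Enum


class Privacy(Enum):
    PRIVATE = "private"
    ORGANIZATIONAL = "organizational"
    PUBLIC = "public"


def may_fold_into(data_level: Privacy, target_level: Privacy) -> bool:
    """Return True if data at data_level may be folded into a source at target_level.

    Simplified model (an assumption of this sketch, not the firewall's code).
    """
    if data_level is Privacy.PRIVATE:
        # Private data is never sent to another source, even a private one.
        return False
    if data_level is Privacy.ORGANIZATIONAL:
        # Organizational data may go to private or organizational sources only.
        return target_level in (Privacy.PRIVATE, Privacy.ORGANIZATIONAL)
    # Public data may be folded into any source.
    return True


# Baseline from the text: A and B are both private, so no cross-source folding.
print(may_fold_into(Privacy.PRIVATE, Privacy.PRIVATE))  # False
```

In this model, the baseline scenario always falls into the first branch, which is why the combining happens locally in Power Query.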

Now, Stage the Data!

Query dependency diagram showing three queries: 'From A' (loaded), 'From B' (load disabled), Output (loaded)

Let’s make a change: not to any M code, but simply to set one of the source queries so that it is staged. Specifically, let’s do this to From A (in dataflows gen1, enable loading; in dataflows gen2, enable staging).

With this adjustment, when the dataflow is refreshed, From A will be evaluated and its output staged before Output is run. Moments later, when Output runs, its evaluation won’t result in From A being evaluated or data source A being accessed, as the results from From A have already been staged.

Instead, before Output is run, a temporary behind-the-scenes rewrite takes place. The definition of From A is replaced with logic that simply reads the previously staged data. (Literally, for the duration of Output’s evaluation, the M code you wrote in From A is temporarily overwritten with M logic that reads from the staging location, e.g. by calling a data source function such as PowerPlatform.Dataflows or Sql.Database.)

When Output runs, your original From A is not run, so data source A is not accessed; instead, previously staged data is read. In contrast, since From B was not staged, Output’s evaluation still results in data source B being accessed. The net effect is that Output combines previously staged data from source A with live data fetched from source B.
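The substitution described above can be sketched in a few lines of Python. This is only a toy model of the effect, not of how the dataflow engine performs the rewrite; `Query`, `sources_accessed` and the `"staging"` label are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str
    source: str   # data source the query's own M code reads from
    staged: bool  # gen1: load enabled; gen2: staging enabled


STAGING_SOURCE = "staging"  # hypothetical label for the staging location


def sources_accessed(upstream: list[Query]) -> set[str]:
    """Sources touched while a downstream query that references `upstream` is evaluated."""
    # A staged query is read back from staging; an unstaged one hits its own source.
    return {STAGING_SOURCE if q.staged else q.source for q in upstream}


from_a = Query("From A", source="A", staged=True)
from_b = Query("From B", source="B", staged=False)

print(sorted(sources_accessed([from_a, from_b])))  # ['B', 'staging']
```

Note that source A does not appear in the result at all: from Output's point of view, it has been replaced by the staging source.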

But Which Privacy Levels?

Privacy levels are configured on a per-source basis. When staging is enabled or disabled, the source being used changes from the perspective of downstream queries.

We’ve already said both data sources A and B are configured as private, and nothing has changed that. However, when Output is evaluated, the data sources being accessed are not sources A and B, but rather the staging source and source B. So, during Output’s evaluation, data source A’s privacy level is irrelevant to the firewall, as that source is not being touched.

After data is staged, it is no longer associated with the privacy level of its original data source. Instead, it is “seen” as having the privacy level of the staging data source. Unless the staging source’s privacy level is set to the same as the original data source, expect to see different cross-source folding behaviors depending on whether or not staging is enabled.

For example, in the case of Output, if data source A is private but the staging source is organizational:

  • Staging Off: Data from source A (private) is prevented from folding to source B (private) because a private-level source cannot be folded into any other source, even another private source.
  • Staging On: Data from source A (private) is staged. Then, when the downstream query (Output) reads from the staging source (organizational), staged data from A is eligible to be folded to source B (private) because organizational can fold into private.
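The two cases can be worked through with a short Python sketch. It is a simplified model under stated assumptions: the folding table covers only the two levels used in this example, and `"staging"`, `FOLDABLE_INTO`, `LEVELS` and `a_data_may_fold_into_b` are illustrative names, not anything exposed by Power Query.

```python
# Levels that data at a given level may be folded into (simplified model,
# limited to the two levels that appear in this example).
FOLDABLE_INTO = {
    "private": set(),
    "organizational": {"private", "organizational"},
}

# Privacy levels from the example; "staging" is a hypothetical name
# for the staging source.
LEVELS = {"A": "private", "B": "private", "staging": "organizational"}


def a_data_may_fold_into_b(from_a_staged: bool) -> bool:
    # With staging on, Output sees From A's data as coming from the staging source.
    seen_source = "staging" if from_a_staged else "A"
    return LEVELS["B"] in FOLDABLE_INTO[LEVELS[seen_source]]


print(a_data_may_fold_into_b(False))  # False: staging off, private data stays put
print(a_data_may_fold_into_b(True))   # True: staging on, organizational level applies
```

Nothing about source A's configuration differs between the two calls; only the source that Output reads from changes.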

Moral of the Lesson

Understanding which privacy levels are being applied at a given point in time is a prerequisite to ensuring that they are configured appropriately. The impact on privacy levels of the choice to stage or not stage data may not be inherently obvious. At minimum, then, this post serves to raise awareness of a scenario whose implications may not immediately meet the eye.

Of note, this concern only applies to dataflows when staging is involved (e.g. computed entities, linked entities, “enable staging” entities).

It also only applies when you deviate from the best practice of staging the data from each source separately before combining. If that practice is followed, all data is staged before any cross-source combining takes place. When that combining occurs, only the staging source(s) are accessed, so there is no possibility of cross-source folding back to any of the original data sources.
