Power Query Security: Evidence Left Behind

, , , ,

A notification pops up on your system: you have a new, red exclamation point, high-priority email. Before you can even read it, your phone rings. It’s your boss. “I just emailed you. We just discovered that column X in our database contains restricted PII that we can’t be storing. The DBAs are deleting the column as we speak. Please be sure it is gone from reports and your system immediately!” You open Microsoft Power BI Desktop, hit refresh, and the column disappears from the dataset because has been removed from the server; then you save the file. But is column really gone, or could there be remnants of its data left on your system?

A prospective client asks you to look at a PBIX file. When you’re done, to minimize liability you don’t want someone else’s data or credentials left sitting on your system, so you delete the file. Are its contents and associated credentials really all gone?

A coworker needs advice on a report, handing it to you on a thumb drive, with the file marked read-only. You open the PBIX file directly from the external drive, give your thoughts then close the file without saving it. Since the drive was external and the file read only, the file was contained exclusively on the flash drive—or was it?

Whether for security purposes, regulatory compliance’s sake, good business practices or liability management, having a sense for whether and, if so, how widely, Power BI spreads report data on your system is a good thing.

Important: This post’s goal is to give you a sense for where data ends up. As I’m not Microsoft, I don’t know all of Power BI’s inner workings, so cannot promise that the below is 100% accurate and complete. If you absolutely must be sure that a piece of data or metadata is eradicated, please open a ticket with Microsoft asking for guidance.

So where might data related to a report end up?

  • Connection Configuration—Credentials, privacy level settings for external data sources and approved native query metadata are all stored on a per-user basis. This is great when you have several reports accessing the same data sources, but also means that deleting those reports doesn’t cause your system to “forget” the names of, or how to access, those sources. You’ll need to manually go into Data source settings and clear them when you want them to go good bye.
  • Caches
    • Query Previews Data Cache—Query Editor’s step result previews are stored in per-user data cache. Deleting the relevant report files doesn’t clear this cache. To manually clear, in Options go to Global > Data Load > Data Cache Management Options.
    • Natural Language Q&A Cache—Power BI’s natural language Q&A feature also uses a per-user (vs. per-report) cache. To clear this cache, in Options go to Global > Data Load > Q&A Cache Options. (Possibly irrelevant if Q&A is disabled for a given report.)
    • Geocoding Cache—Apparently, when Power BI is asked to geocode (e.g. when using a map visual), relevant data may be cached. There is an option to bypass this cache in Options under Global > Diagnostics > Diagnostic Options; however, this option does not claim to clear the cache. Keep in mind that the geocoding process uses an Internet-based service, so figuring out a way to clear the local cache still won’t clear whatever the geocoding provider may have logged when it processed the coding requests. (Related: Details about how Power BI geocodes.)
  • Report Copies
    • Auto Recovery—If enabled, Power BI’s auto-recovery feature automatically saves copies of a report as you work with it. While these copies can be a lifesaver in case of an inadvertent no-save exit or crash, they also can be left behind after the “official” report file is gone. To check, in Options go to Global > Auto recovery, then choose Open auto recovery folder and inspect its contents.
    • Temporary Saves—Power BI also has a temporary save location. Usually, the application automatically cleans up report copies in this folder when it closes. However, a crash can result in temp saves being left behind. There’s no UI option to access this folder; instead, to inspect it, go to %UserProfile%\AppData\Local\Microsoft\Power BI Desktop\TempSaves.
  • Python & R Interchange Data—If you use Python or R to generate visuals or from Power Query, be aware that the data passed to and received back from scripts is staged in CSV files written to temporary folders which are not always automatically deleted. To check for these folders, first identify the temporary storage locations used by the language integrations (each can use a separate directory—in Options go to both Global > R scripting and to Global > Python scripting, and under both choose Change temporary storage location). Then, look in those folders for directories whose names start with RScriptWrapper_ (for R) or PythonScriptWrapper_ (for Python).
  • Analysis Services Workspaces—When a report is open, Power BI sets up a temporary directory as a workplace for the Analysis Services instance it spins up in the background to service the report. Usually, Power BI cleans these workspaces up when it closes; however, on occasion (such as after a crash), workspace directories can be left behind. To check, inspect %UserProfile%\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces.
  • Logs
    • General Logs/Traces—The many details logged can include data source names (e.g. database server & database name) and text of queries executed against data sources, as well as details about the mashups in the report (e.g. query step names). Like the proceeding, logs are stored on a per-user basis. To clean up, in Options go to Global > Diagnostics > Crash Dump Collection, then choose Clear Traces folder or choose Open crash dumps/traces folder and clean up its contents manually.
    • Query Editor’s Diagnostic Traces—Contained detailed information about the activities of loading a report’s data (e.g. queries executed against external sources, query step names, etc.). Even though each diagnostics file is specific to a particular report, they are not kept in or with the report but instead are stored on a per-user basis. To clean up, in Options go to Global > Diagnostics > Crash Dump Collection, then choose Open crash dumps/traces folder and clear the contents of subfolder Diagnostics. (Note: The Clear Traces folder option mentioned previously does not clean up these diagnostic traces.)
    • Crash Dumps—When enabled, crash dumps will be collected if the mashup engine crashes. I have not inspected the particulars included in these dumps, but in general a crash dump can contain a copy of what was in a process’s memory when it died, such as the data and configuration (e.g. potentially credentials) that the process was then processing. To clean up, in Options go to Global > Diagnostics > Crash Dump Collection and choose Clear Traces folder then choose Open crash dumps/traces folder and inspect its contents to verify. (Enabling/disabling dump collection is also configured in Options under Global > Diagnostics > Crash Dump Collection.)
  • Other—There are a few file paths I’d like to know more about in order to evaluate their relevance to the topic at hand. Perhaps they’re benign as far as this topic goes, but I can’t say for sure.
    • CEF—Power BI uses Chromium Embedded Framework (CEF), storing its cache data on a per-user basis under %UserProfile%\AppData\Local\Microsoft\Power BI Desktop\CEF. Unsure what specifics end up stored in this cache.
    • Crashpad—Power BI also appears to use Crashpad. Unsure whether only basic statistic-like data is all that is ever stored in these files or if they could on occasion include memory dumps. These files are stored outside the main Power BI per-user directory. Location: %UserProfile%\AppData\Local\Power BI Desktop\User Data
    • LuciaIndex—Appears that some kind of data is indexed here (almost 2 GB worth when I tested). Neither of the two clear cache options (mentioned previously) gets rid of this data. Location: %UserProfile%\AppData\Local\Microsoft\Power BI Desktop\LuciaIndex

Part of Power Query’s ease of use and efficiency comes from storing various items on a per user basis. This isn’t a bad thing per se—but it does mean that getting rid of data or metadata that needs to be completely eradicated involves more work than simply deleting the relevant report file. Hopefully, you now have a sense for where report data and metadata may end up on your local system and a starting point for how to perform cleanup, when necessary.

On the other hand, if you know ahead of time that a particular report file’s contents will eventually need to be purged from your system, it may be advantageous to create a special user account just for working with that report, then delete the account and its associated user profile directory when done.

Microsoft Excel

Portions of the above apply to Microsoft Excel, as well. While I am not as familiar with the specifics, here’s a starting point:

  • Since Excel’s Power Query environment doesn’t currently support natural language Q&A, R/Python integration or query diagnostic traces, those items don’t apply.
  • General cache, trace logs and crash dumps can be cleared by going into Excel’s Power Query Query Options to the same settings locations mentioned above.
  • Behind the scenes, the main per-user directory used by Excel’s Power Query seems to be: %UserProfile%\AppData\Local\Microsoft\Office\16.0\PowerQuery.
  • Auto-save copies are kept in %UserProfile%\AppData\Local\Microsoft\Office\UnsavedFiles and in Excel’s UI may be accessed by going to Open > Recover Unsaved Workbooks.

Your Input Requested!

If you can shed more light on any of the above or know of locations I missed, please leave a comment or drop me an email. Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *