Did you know that Power Query can convert (some) SQL into M code?!
No, I didn’t mean that the other way around. It’s true that Power Query can query fold M into SQL (M -> SQL), but that’s not what I’m referring to here. M also has some capability to take a SQL statement and translate it to Power Query code (SQL -> M).
(Disclaimer: As fun as SQL -> M may look, this technique is best not relied upon in production, for reasons that will be explained. However, exploring this functionality—especially the possible why behind its presence—may have educational benefits.)
Without further ado, let’s convert SQL to M:
let
ReferenceTables = [Department = Department, Company = Company],
SqlQuery = "
SELECT d.dept, c.name
FROM Department d
LEFT JOIN Company c ON d.id = c.id
",
TranslatedToM = SqlExpression.ToExpression(SqlQuery, ReferenceTables)
in
TranslatedToM
Outputs:
(environment) => Table.RenameColumns(let
t56319 = let
t56316 = Table.PrefixColumns(environment[Department], "d"),
t56317 = Table.PrefixColumns(environment[Company], "c"),
t56318 = Table.Join(t56316, {"d.id"}, t56317, {"c.id"}, JoinKind.LeftOuter, null, {Value.NullableEquals})
in
t56318,
t56322 = Table.SelectColumns(t56319, {"d.dept", "c.name"})
in
t56322, {{"d.dept", "dept"}, {"c.name", "name"}})
(Note the numbers in the “tXXXXX” names appear to be randomly generated, so may change each time the above is executed.)
The enabler of this feat is the standard library’s SqlExpression.ToExpression
. Pass it a string containing a SQL statement as well as a record with a field for each table referenced in the SQL (field name = table’s name in the SQL statement; field value = actual M table containing the corresponding data). In return—assuming the method understands your SQL—you’ll receive back a string of M code defining a function that contains the equivalent logic.
SQL -> M -> SQL
Taking this a step further: You can execute the generated code! Pass it to Expression.Evaluate
. You’ll receive back a function value. Invoke this and you’ll receive back the data of interest. You’ll need to pass Expression.Evaluate
a reference to #shared
, or at least to those standard library methods that the generated function uses. Also, when invoking the returned function, you’re expected to pass it a record that references the tables used (similar to what you passed to SqlExpression.ToExpression
). All of this is covered by the single line of code below.
Expression.Evaluate(TranslatedToM, #shared)(ReferenceTables) // returns a table that's the result of evaluating the M expression which was generated from SQL
The SQL we provided was converted to M then executed. Wow! If the data source used supports folding, it’s possible that the executed expression was folded back to the source using its native query language. Wow (again!!)!
We wrote SQL, which was translated to M code, which (potentially) was then translated to the source’s particular local flavor of SQL (or whatever native language it speaks). SQL -> M -> SQL?!?!?!?! This is pretty wild!
Why?
Does this mean we stop writing M code and just write everything in SQL then let Power Query do the translation? No, no, nothing crazy like that.
SqlExpression.ToExpression
is one of those mysterious methods which is listed in documentation with no details given about its purpose or describing how to use it. This clues us that it’s probably intended for Microsoft’s internal consumption—which means that it’s behavior could change at any time without notice and there’s no guarantee that what we currently think we know about how it works is complete or accurate. So, while it’s technically possible to use in production, doing so is not advised and is at your own risk.
Why, then, does Power Query have this SqlExpression.ToExpression
? What internal use is make of it? I can’t say authoritatively, but I have a guess:
The way I understand it,* behind the scenes Microsoft Power BI’s Direct Query uses Microsoft Analysis Service to generate data fetch queries. For some data sources (like Azure SQL), the queries are generated in the source’s native query language and so can be used directly with the source. For other sources, SQL queries are generated. These are then translated to M code which is executed to fetch the desired data. My guess is that SqlExpression.ToExpression
exposes (at least a portion of) the SQL -> M functionality that is used (at least sometimes) for this translation.
(*While this is my current understanding, I can’t guarantee that it’s 100% accurate; it’s possible that it could be way off. Even if it is spot-on, it could be out of date by the time you read this, as the internal interworking of DQ are subject to change at any time. So, take it with a grain of salt.)
Example
You can watch this in action by pulling the Microsoft Power BI performance logs for a Direct Query interaction with a source whose native language isn’t directly spoken by Analysis Service’s DQ engine. In the below example, a Postgres source is used.
Notice that:
- Event
SqlTranslator/SqlParser/Parse
shows the SQL from Analysis Service’s DQ engine. The SQL here appears to line up exactly with the SQL returned from tracing Analysis Service (e.g. the SQL shown in DAX Studio under Server Timings).
SqlTranslator/SqlParser/Parse
{
"Start": "2021-07-13T17:35:37.1733796Z",
"Action": "SqlTranslator/SqlParser/Parse",
"HostProcessId": "9220",
"PartitionKey": "Expression/Expression",
"SQL": "
SELECT
TOP (1000001) [t0].[dept],[t1].[name],
COUNT_BIG(*)
AS [a0]
FROM
([Department] AS [t0]
LEFT OUTER JOIN
[Company] AS [t1] on
(
[t0].[id] = [t1].[id]
)
)
GROUP BY [t0].[dept],[t1].[name] ",
"ProductVersion": "2.94.921.0 (21.06)",
"ActivityId": "170d87c3-29e1-4398-af55-17abe11d8eb3",
"Process": "Microsoft.Mashup.Container.NetFX45",
"Pid": 8752,
"Tid": 1,
"Duration": "00:00:00.1241158"
}
SqlExpressionTranslator/Translate
shows the SQL’s M code translation. Passing the preceding SQL toSqlExpression.ToExpression
produces equivalent M code (identical, except for the random numbers used in the “tXXXXX” names).
SqlExpressionTranslator/Translate
{
"Start": "2021-07-13T17:35:37.2978885Z",
"Action": "SqlExpressionTranslator/Translate",
"HostProcessId": "9220",
"PartitionKey": "Expression/Expression",
"IsRecognized": "True",
"Result": "(environment) => Table.RenameColumns(let
t772 = let
t769 = Table.PrefixColumns(environment[Department], \"t0\"),
t770 = Table.PrefixColumns(environment[Company], \"t1\"),
t771 = Table.Join(t769, {\"t0.id\"}, t770, {\"t1.id\"}, JoinKind.LeftOuter, null, {Value.NullableEquals})
in
t771,
t778 = Table.Group(t772, {\"t0.dept\", \"t1.name\"}, {{\"a0\", (t773) => Table.RowCount(t773)}}),
t779 = Table.SelectColumns(t778, {\"t0.dept\", \"t1.name\", \"a0\"}),
t781 = Table.FirstN(t779, 1000001)
in
t781, {{\"t0.dept\", \"dept\"}, {\"t1.name\", \"name\"}, {\"a0\", \"a0\"}})",
"ProductVersion": "2.94.921.0 (21.06)",
"ActivityId": "170d87c3-29e1-4398-af55-17abe11d8eb3",
"Process": "Microsoft.Mashup.Container.NetFX45",
"Pid": 8752,
"Tid": 1,
"Duration": "00:00:00.0947288"
}
- Lastly, the appropriate data source execute event (in this case,
Engine/IO/Db/PostgreSQL/Command/ExecuteDbDataReader
) shows the native query request produced by executing the generated M code. Query folding is evident in the PostgreSQL.
Engine/IO/Db/PostgreSQL/Command/ExecuteDbDataReader
{
"Start": "2021-07-13T17:35:37.5923585Z",
"Action": "Engine/IO/Db/PostgreSQL/Command/ExecuteDbDataReader",
"ResourceKind": "PostgreSQL",
"ResourcePath": "localhost;postgres",
"HostProcessId": "9220",
"PartitionKey": "Expression/Expression",
"CommandText": "select \"rows\".\"t0.dept\" as \"dept\",
\"rows\".\"t1.name\" as \"name\",
count(1) as \"a0\"
from
(
select \"$Outer\".\"t0.dept\",
\"$Inner\".\"t1.name\"
from
(
select \"_\".\"id\" as \"t0.id\",
\"_\".\"dept\" as \"t0.dept\",
\"_\".\"emp_id\" as \"t0.emp_id\"
from \"public\".\"department\" \"_\"
) \"$Outer\"
left outer join
(
select \"_\".\"id\" as \"t1.id\",
\"_\".\"name\" as \"t1.name\",
\"_\".\"age\" as \"t1.age\",
\"_\".\"address\" as \"t1.address\",
\"_\".\"salary\" as \"t1.salary\",
\"_\".\"join_date\" as \"t1.join_date\"
from \"public\".\"company\" \"_\"
where \"_\".\"age\" < 50
) \"$Inner\" on (\"$Outer\".\"t0.id\" = \"$Inner\".\"t1.id\")
) \"rows\"
group by \"t0.dept\",
\"t1.name\"
limit 1000001",",
"CommandTimeout": "600",
"Behavior": "Default",
"ProductVersion": "2.94.921.0 (21.06)",
"ActivityId": "170d87c3-29e1-4398-af55-17abe11d8eb3",
"Process": "Microsoft.Mashup.Container.NetFX45",
"Pid": 8752,
"Tid": 1,
"Duration": "00:00:00.0020264"
}
Conclusion
All this said, SqlExpression.ToExpression
probably isn’t something you should use in production code–but now, at least hopefully, it is a less mysterious entity and you have a better understanding of how Direct Query works.
If you’re adventurous, next time you need to port a SQL statement to M, you could try using SqlExpression.ToExpression
to generate a starting code draft for you to polish and refine. How well this works (or if it works at all) will likely depend on how closely your SQL aligns with how Analysis Service generates its SQL. Might be a fun experiment to try!
Credits
Thanks goes to a comments exchange on Lars Schreiber’s The Self-Service BI Blog for the basic details of SqlExpression.ToExpression
‘s arguments.
Hi Ben,
Thanks for the great write-up. I have a use case of joining two tables in power query based on multiple column including one column with between clause of another column. Does that mean I can achieve that using complex SQL queries inside M editor in Power Query.
Let me know if you think we should achieve it in any other better way.
Best,
Triparna
In Power Query, you could try a
Table.Join
orTable.NestedJoin
that joins all the appropriate columns except for the one where you need the between-like behavior, then write the between logic into aTable.SelectRows
filter. My guess is that the between-written-as-a-SelectRows filter, if it folds, will be written into the generated native query as aWHERE
clause—but can still be used to achieve the same effect as aBETWEEN
predicate.Alternately, you could use
Value.NativeQuery
to explicitly specify a raw SQL statement that is written exactly the way you’d like it to be.Ben, I followed this but got stuck at the last step. Do you mind taking a look please?
The M code output by
SqlExpression.ToExpression
(below) specifies that the join should be done usingValue.NullableEquals
as the equality comparer.Apparently, M doesn’t support this when executing the join locally (i.e. it can currently only be used when folding occurs).
If you edit the generated M so that it doesn’t set the comparer (that is, remove “
, {Value.NullableEquals}
“), it runs fine locally.Many Thanks Ben !!! Do you know which SQL can we use in PQWRY? Is it SQLite..It will certainly make my life easy
Likely, it’s a subset of T-SQL.
Ben, I have been experimenting with this since I discovered this. But can you please confirm it works in dataflow. My test says it does not work. If possible, can you please test it out and suggest if there is a workaround.
I tried the following
/*Query 1*/
which is same as
/*Query 2*/
Query 2 works in dataflow but Query 1 does not. IT generates the following error
Can’t save dataflow
One or more tables references a dynamic data source
Many thanks in advance.
I don’t know if there is a work around. SqlExpression.ToExpression isn’t meant for general production use. 🙁