M Mysteries: SQL -> M (SqlExpression.ToExpression)

, , ,

Did you know that Power Query can convert (some) SQL into M code?!

No, I didn’t mean that the other way around. It’s true that Power Query can query fold M into SQL (M -> SQL), but that’s not what I’m referring to here. M also has some capability to take a SQL statement and translate it to Power Query code (SQL -> M).

(Disclaimer: As fun as SQL -> M may look, this technique is best not relied upon in production, for reasons that will be explained. However, exploring this functionality—especially the possible why behind its presence—may have educational benefits.)

Without further ado, let’s convert SQL to M:

let
  ReferenceTables = [Department = Department, Company = Company],
  SqlQuery = "
	SELECT d.dept, c.name
	FROM Department d
	  LEFT JOIN Company c ON d.id = c.id
	",
  TranslatedToM = SqlExpression.ToExpression(SqlQuery, ReferenceTables)
in
  TranslatedToM

Outputs:

(environment) => Table.RenameColumns(let
    t56319 = let
        t56316 = Table.PrefixColumns(environment[Department], "d"),
        t56317 = Table.PrefixColumns(environment[Company], "c"),
        t56318 = Table.Join(t56316, {"d.id"}, t56317, {"c.id"}, JoinKind.LeftOuter, null, {Value.NullableEquals})
    in
        t56318,
    t56322 = Table.SelectColumns(t56319, {"d.dept", "c.name"})
in
    t56322, {{"d.dept", "dept"}, {"c.name", "name"}})

(Note the numbers in the “tXXXXX” names appear to be randomly generated, so may change each time the above is executed.)

The enabler of this feat is the standard library’s SqlExpression.ToExpression. Pass it a string containing a SQL statement as well as a record with a field for each table referenced in the SQL  (field name = table’s name in the SQL statement; field value = actual M table containing the corresponding data). In return—assuming the method understands your SQL—you’ll receive back a string of M code defining a function that contains the equivalent logic.

SQL -> M -> SQL

Taking this a step further: You can execute the generated code! Pass it to Expression.Evaluate. You’ll receive back a function value. Invoke this and you’ll receive back the data of interest. You’ll need to pass Expression.Evaluate a reference to #shared, or at least to those standard library methods that the generated function uses. Also, when invoking the returned function, you’re expected to pass it a record that references the tables used (similar to what you passed to SqlExpression.ToExpression). All of this is covered by the single line of code below.

Expression.Evaluate(TranslatedToM, #shared)(ReferenceTables) // returns a table that's the result of evaluating the M expression which was generated from SQL

The SQL we provided was converted to M then executed. Wow! If the data source used supports folding, it’s possible that the executed expression was folded back to the source using its native query language. Wow (again!!)!

We wrote SQL, which was translated to M code, which (potentially) was then translated to the source’s particular local flavor of SQL (or whatever native language it speaks). SQL -> M -> SQL?!?!?!?! This is pretty wild!

Why?

Does this mean we stop writing M code and just write everything in SQL then let Power Query do the translation? No, no, nothing crazy like that.

SqlExpression.ToExpression is one of those mysterious methods which is listed in documentation with no details given about its purpose or describing how to use it. This clues us that it’s probably intended for Microsoft’s internal consumption—which means that it’s behavior could change at any time without notice and there’s no guarantee that what we currently think we know about how it works is complete or accurate. So, while it’s technically possible to use in production, doing so is not advised and is at your own risk.

Why, then, does Power Query have this SqlExpression.ToExpression? What internal use is make of it? I can’t say authoritatively, but I have a guess:

The way I understand it,* behind the scenes Microsoft Power BI’s Direct Query uses Microsoft Analysis Service to generate data fetch queries. For some data sources (like Azure SQL), the queries are generated in the source’s native query language and so can be used directly with the source. For other sources, SQL queries are generated. These are then translated to M code which is executed to fetch the desired data. My guess is that SqlExpression.ToExpression exposes (at least a portion of) the SQL -> M functionality that is used (at least sometimes) for this translation.

(*While this is my current understanding, I can’t guarantee that it’s 100% accurate; it’s possible that it could be way off. Even if it is spot-on, it could be out of date by the time you read this, as the internal interworking of DQ are subject to change at any time. So, take it with a grain of salt.)

Example

You can watch this in action by pulling the Microsoft Power BI performance logs for a Direct Query interaction with a source whose native language isn’t directly spoken by Analysis Service’s DQ engine. In the below example, a Postgres source is used.

Notice that:

  1. Event SqlTranslator/SqlParser/Parse shows the SQL from Analysis Service’s DQ engine. The SQL here appears to line up exactly with the SQL returned from tracing Analysis Service (e.g. the SQL shown in DAX Studio under Server Timings).
SqlTranslator/SqlParser/Parse
{
  "Start": "2021-07-13T17:35:37.1733796Z",
  "Action": "SqlTranslator/SqlParser/Parse",
  "HostProcessId": "9220",
  "PartitionKey": "Expression/Expression",
  "SQL": "
SELECT 
TOP (1000001) [t0].[dept],[t1].[name],
COUNT_BIG(*)
 AS [a0]
FROM 
([Department] AS [t0]

 LEFT OUTER JOIN 

[Company] AS [t1] on 
(
[t0].[id] = [t1].[id]
)
)

GROUP BY [t0].[dept],[t1].[name] ",
  "ProductVersion": "2.94.921.0 (21.06)",
  "ActivityId": "170d87c3-29e1-4398-af55-17abe11d8eb3",
  "Process": "Microsoft.Mashup.Container.NetFX45",
  "Pid": 8752,
  "Tid": 1,
  "Duration": "00:00:00.1241158"
}
  1. SqlExpressionTranslator/Translate shows the SQL’s M code translation. Passing the preceding SQL to SqlExpression.ToExpression produces equivalent M code (identical, except for the random numbers used in the “tXXXXX” names).
SqlExpressionTranslator/Translate
{
  "Start": "2021-07-13T17:35:37.2978885Z",
  "Action": "SqlExpressionTranslator/Translate",
  "HostProcessId": "9220",
  "PartitionKey": "Expression/Expression",
  "IsRecognized": "True",
  "Result": "(environment) => Table.RenameColumns(let
    t772 = let
        t769 = Table.PrefixColumns(environment[Department], \"t0\"),
        t770 = Table.PrefixColumns(environment[Company], \"t1\"),
        t771 = Table.Join(t769, {\"t0.id\"}, t770, {\"t1.id\"}, JoinKind.LeftOuter, null, {Value.NullableEquals})
    in
        t771,
    t778 = Table.Group(t772, {\"t0.dept\", \"t1.name\"}, {{\"a0\", (t773) => Table.RowCount(t773)}}),
    t779 = Table.SelectColumns(t778, {\"t0.dept\", \"t1.name\", \"a0\"}),
    t781 = Table.FirstN(t779, 1000001)
in
    t781, {{\"t0.dept\", \"dept\"}, {\"t1.name\", \"name\"}, {\"a0\", \"a0\"}})",
  "ProductVersion": "2.94.921.0 (21.06)",
  "ActivityId": "170d87c3-29e1-4398-af55-17abe11d8eb3",
  "Process": "Microsoft.Mashup.Container.NetFX45",
  "Pid": 8752,
  "Tid": 1,
  "Duration": "00:00:00.0947288"
}
  1. Lastly, the appropriate data source execute event (in this case, Engine/IO/Db/PostgreSQL/Command/ExecuteDbDataReader) shows the native query request produced by executing the generated M code. Query folding is evident in the PostgreSQL.
Engine/IO/Db/PostgreSQL/Command/ExecuteDbDataReader
{
  "Start": "2021-07-13T17:35:37.5923585Z",
  "Action": "Engine/IO/Db/PostgreSQL/Command/ExecuteDbDataReader",
  "ResourceKind": "PostgreSQL",
  "ResourcePath": "localhost;postgres",
  "HostProcessId": "9220",
  "PartitionKey": "Expression/Expression",
  "CommandText": "select \"rows\".\"t0.dept\" as \"dept\",
    \"rows\".\"t1.name\" as \"name\",
    count(1) as \"a0\"
from 
(
    select \"$Outer\".\"t0.dept\",
        \"$Inner\".\"t1.name\"
    from 
    (
        select \"_\".\"id\" as \"t0.id\",
            \"_\".\"dept\" as \"t0.dept\",
            \"_\".\"emp_id\" as \"t0.emp_id\"
        from \"public\".\"department\" \"_\"
    ) \"$Outer\"
    left outer join 
    (
        select \"_\".\"id\" as \"t1.id\",
            \"_\".\"name\" as \"t1.name\",
            \"_\".\"age\" as \"t1.age\",
            \"_\".\"address\" as \"t1.address\",
            \"_\".\"salary\" as \"t1.salary\",
            \"_\".\"join_date\" as \"t1.join_date\"
        from \"public\".\"company\" \"_\"
        where \"_\".\"age\" < 50
    ) \"$Inner\" on (\"$Outer\".\"t0.id\" = \"$Inner\".\"t1.id\")
) \"rows\"
group by \"t0.dept\",
    \"t1.name\"
limit 1000001",",
  "CommandTimeout": "600",
  "Behavior": "Default",
  "ProductVersion": "2.94.921.0 (21.06)",
  "ActivityId": "170d87c3-29e1-4398-af55-17abe11d8eb3",
  "Process": "Microsoft.Mashup.Container.NetFX45",
  "Pid": 8752,
  "Tid": 1,
  "Duration": "00:00:00.0020264"
}

Conclusion

All this said, SqlExpression.ToExpression probably isn’t something you should use in production code–but now, at least hopefully, it is a less mysterious entity and you have a better understanding of how Direct Query works.

If you’re adventurous, next time you need to port a SQL statement to M, you could try using SqlExpression.ToExpression to generate a starting code draft for you to polish and refine. How well this works (or if it works at all) will likely depend on how closely your SQL aligns with how Analysis Service generates its SQL. Might be a fun experiment to try!

Credits

Thanks goes to a comments exchange on Lars Schreiber’s The Self-Service BI Blog for the basic details of SqlExpression.ToExpression‘s arguments.

8 thoughts on “M Mysteries: SQL -> M (SqlExpression.ToExpression)

  1. Triparna Ray

    Hi Ben,

    Thanks for the great write-up. I have a use case of joining two tables in power query based on multiple column including one column with between clause of another column. Does that mean I can achieve that using complex SQL queries inside M editor in Power Query.

    Let me know if you think we should achieve it in any other better way.

    Best,
    Triparna

    Reply
    1. Ben Gribaudo Post author

      In Power Query, you could try a Table.Join or Table.NestedJoin that joins all the appropriate columns except for the one where you need the between-like behavior, then write the between logic into a Table.SelectRows filter. My guess is that the between-written-as-a-SelectRows filter, if it folds, will be written into the generated native query as a WHERE clause—but can still be used to achieve the same effect as a BETWEEN predicate.

      Alternately, you could use Value.NativeQuery to explicitly specify a raw SQL statement that is written exactly the way you’d like it to be.

      Reply
  2. smpa01

    Ben, I followed this but got stuck at the last step. Do you mind taking a look please?

    let
      rt1 = 
        let
          Source = Table.FromRows(
            Json.Document(
              Binary.Decompress(
                Binary.FromText("i45WMlTSUUpUitWJVjICspKUYmMB", BinaryEncoding.Base64), 
                Compression.Deflate
              )
            ), 
            let
              _t = ((type nullable text) meta [Serialized.Text = true])
            in
              type table [id = _t, index = _t]
          ), 
          #"Changed Type" = Table.TransformColumnTypes(
            Source, 
            {{"index", type text}, {"id", Int64.Type}}
          ), 
          #"Renamed Columns" = Table.RenameColumns(#"Changed Type", {{"index", "sn"}})
        in
          #"Renamed Columns", 
      lft1 = 
        let
          Source = Table.FromRows(
            Json.Document(
              Binary.Decompress(
                Binary.FromText("i45WMlTSUUpUitWJVjICspKUYmMB", BinaryEncoding.Base64), 
                Compression.Deflate
              )
            ), 
            let
              _t = ((type nullable text) meta [Serialized.Text = true])
            in
              type table [id = _t, index = _t]
          ), 
          #"Changed Type" = Table.TransformColumnTypes(
            Source, 
            {{"index", type text}, {"id", Int64.Type}}
          ), 
          #"Renamed Columns" = Table.RenameColumns(#"Changed Type", {{"index", "sn"}})
        in
          #"Renamed Columns", 
      Ref = [Department = rt1, Company = lft1], 
      SqlQuery = "select r.id, l.sn from Department r left outer join Company l on r.id=l.id", 
      M = SqlExpression.ToExpression(SqlQuery, Ref), 
      Custom1 = Expression.Evaluate(M, #shared)(Ref)
    in
      Custom1
    Reply
    1. Ben Gribaudo Post author

      The M code output by SqlExpression.ToExpression (below) specifies that the join should be done using Value.NullableEquals as the equality comparer.

      (environment) => Table.RenameColumns(let
          t904 = let
              t901 = Table.PrefixColumns(environment[Department], "r"),
              t902 = Table.PrefixColumns(environment[Company], "l"),
              t903 = Table.Join(t901, {"r.id"}, t902, {"l.id"}, JoinKind.LeftOuter, null, {Value.NullableEquals})
          in
              t903,
          t907 = Table.SelectColumns(t904, {"r.id", "l.sn"})
      in
          t907, {{"r.id", "id"}, {"l.sn", "sn"}})

      Apparently, M doesn’t support this when executing the join locally (i.e. it can currently only be used when folding occurs).

      If you edit the generated M so that it doesn’t set the comparer (that is, remove “, {Value.NullableEquals}“), it runs fine locally.

      Reply
      1. smpa01

        Many Thanks Ben !!! Do you know which SQL can we use in PQWRY? Is it SQLite..It will certainly make my life easy

        Reply
  3. Samik Pal

    Ben, I have been experimenting with this since I discovered this. But can you please confirm it works in dataflow. My test says it does not work. If possible, can you please test it out and suggest if there is a workaround.

    I tried the following
    /*Query 1*/

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WykvMTTVU0lEyNDBQitVB8I1A/FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [name = _t, value = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"name", type text}, {"value", Int64.Type}}),
        ref=[tbl=#"Changed Type"],
        sql= "select [name] as [name], sum(value) as value from tbl group by [name]",
        tran = SqlExpression.ToExpression(sql,ref),
        back= Expression.Evaluate(tran, #shared)(ref)
    in
        back
    

    which is same as
    /*Query 2*/

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WykvMTTVU0lEyNDBQitVB8I1A/FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [name = _t, value = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"name", type text}, {"value", Int64.Type}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"name"}, {{"sum", each List.Sum([value]), type nullable number}})
    in
        #"Grouped Rows"
    

    Query 2 works in dataflow but Query 1 does not. IT generates the following error
    Can’t save dataflow
    One or more tables references a dynamic data source

    Many thanks in advance.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *