M Language Proposal: Cleaning Up Function Chains with the Pipeline Operator


Sometimes, a chain of M function calls reads as a dense blob of code, yet refactoring to the clearer structure of a let expression is overkill. Let’s look at an alternative: a new operator to consider for inclusion in the M language.

The Problem

You’d like to add a column to your customers table that holds the average amount of the given customer’s three largest completed orders. The needed data is already available in the table, thanks to a nested orders table. All that’s needed is for you to define logic that uses this data to calculate the desired average.

To pull this off, your new column’s logic needs to:

  1. Filter the nested orders table to Status = "Completed".
  2. Sort by Total, descending.
  3. Take the top 3 results.
  4. Average their Totals.

Not too hard to pull off:

List.Average(
  Table.FirstN(
    Table.Sort(
      Table.SelectRows(_[Orders], each [Status] = "Completed"),
      { "Total", Order.Descending }
    ),
    3
  )[Total]
)

From a technical perspective, writing this logic as a chain of function calls works just fine. However, the resulting code is dense. Making sense of it takes careful reading: the reader first needs to find the innermost function call, which is where the action starts, then work outward one call at a time, mentally pairing each argument with the correct invocation. The flow can be hard to follow, and the argument-to-call pairings are easy to get wrong.

Refactoring the above to use a let expression clarifies how the logic reads:

let
  CompletedOrders = Table.SelectRows(_[Orders], each [Status] = "Completed"),
  SortedByTotal = Table.Sort(CompletedOrders, { "Total", Order.Descending }),
  Top3Largest = Table.FirstN(SortedByTotal, 3)[Total],
  AverageOf3Largest = List.Average(Top3Largest)
in
  AverageOf3Largest

But is a let expression really necessary here? The variables it introduces aren’t needed for logic reuse or for immutability’s sake. In fact, each one has to be given a name just so the next step can refer to it, which in a scenario like this arguably introduces its own sort of clutter.

let expressions and the variable definitions they allow are great in many circumstances. In no way am I suggesting we should make it a general practice to avoid using them. However, in simple function chaining scenarios, they can sometimes be overkill.

An Alternative

As an alternative to the preceding examples, what do you think of the following?


_[Orders]
|> Table.SelectRows(each [Status] = "Completed")
|> Table.Sort({ "Total", Order.Descending })
|> Table.FirstN(3)[Total]
|> List.Average()

In this not-currently-valid M code, we “borrow” the idea of reverse function application, specifically F#’s pipeline operator, “|>” (which is approximately equivalent to Haskell’s & operator from Data.Function).

In short, |> takes the output from what comes before it and passes it in as the first argument to the function that comes after it.

So

_[Orders]
|> Table.SelectRows(each [Status] = "Completed")

is equivalent to:

Table.SelectRows(_[Orders], each [Status] = "Completed")

The difference is that, thanks to |>, we can write our chain of function calls in linear, first-to-last order, instead of as a nested chain of invocations or using a let expression!
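Until something like |> exists, a similar first-to-last reading order can be approximated in today's M. The following is a minimal sketch, not a suggestion for how the operator itself would be implemented. Pipe is a hypothetical helper (it is not part of the standard library), and the sample Orders table is made up for illustration. The helper folds over a list of single-argument functions with List.Accumulate, feeding each step's output into the next step.

```powerquery
let
    // Hypothetical helper: applies each function in "steps" to the
    // running value, starting from "initial".
    Pipe = (initial as any, steps as list) as any =>
        List.Accumulate(steps, initial, (state, step) => step(state)),

    // Made-up sample data standing in for a customer's nested orders table.
    Orders = #table(
        { "Status", "Total" },
        {
            { "Completed", 100 },
            { "Completed", 40 },
            { "Pending", 500 },
            { "Completed", 70 },
            { "Completed", 10 }
        }
    ),

    // The steps read top to bottom, in the order they execute.
    Result = Pipe(Orders, {
        (t) => Table.SelectRows(t, each [Status] = "Completed"),
        (t) => Table.Sort(t, { "Total", Order.Descending }),
        (t) => Table.FirstN(t, 3)[Total],
        (totals) => List.Average(totals)
    })
in
    Result
```

With these sample rows, the three largest completed totals are 100, 70 and 40, so Result evaluates to 70. The trade-off compared with a real operator is that every step has to be wrapped in a function with an explicitly named parameter.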

Variation

Instead of the proposed M pipeline operator passing whatever is on its left as the first argument to the function on its right, it could instead be defined to take whatever is on its left and assign it to a special variable which the expression on the right can reference, if and where it chooses.

If we used “!” as that variable (strictly for illustrative purposes; I’m not married to it being the variable of choice), this would look like:

_[Orders]
|> Table.SelectRows(!, each [Status] = "Completed")
|> Table.Sort(!, { "Total", Order.Descending })
|> Table.FirstN(!, 3)[Total]
|> List.Average(!)
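One way to read this variation is as shorthand for applying a one-parameter function to the value on the left. The sketch below is valid in today's M and shows that reading for a single step. The parameter name x stands in for the illustrative “!” placeholder, and the sample table is invented for the example. How an actual operator would be specified remains an open question.

```powerquery
let
    // Made-up sample data for illustration.
    Orders = #table(
        { "Status", "Total" },
        { { "Completed", 100 }, { "Pending", 500 } }
    ),

    // Proposed (not valid M today):
    //   Orders |> Table.SelectRows(!, each [Status] = "Completed")
    // One possible reading: an inline function, applied immediately,
    // whose parameter x plays the role of the placeholder.
    ViaInlineFunction =
        ((x) => Table.SelectRows(x, each [Status] = "Completed"))(Orders),

    // The direct call that the pipeline step is meant to be equivalent to.
    Direct = Table.SelectRows(Orders, each [Status] = "Completed"),

    // Compare the two tables.
    SameResult = (ViaInlineFunction = Direct)
in
    SameResult
```

Both expressions select the same single completed row, so SameResult should evaluate to true. Seen this way, the placeholder form is the more flexible of the two variations, since the piped value can land in any argument position rather than only the first.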

A Penny for Your Thoughts

What do you think? Would you like the option to use the pipeline operator in M? Again, I’m not suggesting that its use should replace let expressions in general; rather, in the case of simple function chains, it could be a useful construct to have available for crafting easy-to-read code.

8 thoughts on “M Language Proposal: Cleaning Up Function Chains with the Pipeline Operator”

  1. Ignacio

    It would be amazing to have the pipe operator. I love both ideas, with or without “!”. The code is so much cleaner and readable. I wish Microsoft check this post out to think about it 😛
    Thanks for sharing.

  2. David

    Compared to other languages, M is very difficult to read. This would help immensely in making it easier to parse. The only thing missing then would be regex support!

  3. Kris

    Good call!
    I like the elegance of the R Pipes; %>% in R with magrittr. Changes like this will make me less stubborn to learn another language like M.

  4. Mike Crnkovich

    Agreed, would love a pipeline function. I wrote a little custom pipeline function, that makes the code easier to read. I also wish functions were automatically curried 🙂
    Where fns is a list of functions

    let
        always = (fn) => fn,
        pipe2 = (fn2, fn1) => (data) => fn1(fn2(data)),
        pipe = (fns) => List.Accumulate(fns, always, pipe2)
    in
        pipe
  5. Lutz

    What is the actual cost of the let expression “variables”? * Will the query plan be any different when using variables versus using chaining?

    I agree with the clutter concern, but is there more to it than potentially using up names that you might want to use later?

    *) I don’t think these should be called variables anyway. These are constants. (The same is true for DAX by the way).

    1. Ben Gribaudo Post author

      I’m not aware of any (non-trivial) cost being associated with the use of variables.

      For me, the benefit here would be an option to minimize code clutter, for where it makes sense.

