Getting Rid of Columns: “Remove” or “Remove Other”?

, , ,

Your Power Query contains more columns than you need. How do you get rid of the ones you don’t want? The mechanics of making extraneous columns go away is easy: in Query Editor, right click on a column and choose either Remove Columns or Remove Other Columns—but what differentiates these two options?

Column context menu screenshot with Remove Columns + Remove Other Columns menu options highlighted

The Options

Remove Columns

  • What It Does: Removes the column(s) you selected, leaving all other columns present.
  • How It Works: Generates a call to Table.RemoveColumns(source, { list of columns to remove }).

Remove Other Columns

  • What It Does: Keeps the column(s) you selected, resulting in those you didn’t select being removed.
  • How It Works: Generates a call to Table.SelectColumns(source, { list of columns to keep }).

The Contrast

The difference between the two stems from the dependencies taken by the code Query Editor generates. Remove Columns codes up a list of the column names you don’t want (so takes a dependency on those names). Remove Other Columns does the opposite, creating a dependency on the list of columns you do want to keep.

What’s the the difference in effect between the two dependencies?

Imagine you’re pulling data from a database table and have used one or the other of these approaches to limit output to just the columns of interest…

What happens if a new column is added to the table?

  • Remove Columns: The new column shows up in your output. It’s not one of the columns you said to remove, so it is automatically included.
  • Remove Other Columns: The output doesn’t change; the newly added column doesn’t flow through. You specified the columns you wanted included. The new column isn’t one of them, so isn’t included.

What happens if a column you’re not interested in is removed?

Load failed due to someUnnecessaryColumn not being found
  • Remove Columns: Query evaluation dies with a missing column error. Remove Columns‘s code depends on the columns you don’t want. One of them is no longer present in the source table, so the dependency is violated, which triggers an error.
  • Remove Other Columns: Everything works fine. The dependency is on the columns of interest. They are all still present in the upstream table, so nothing changes (no error is raised).

What happens if a column you are interested in is removed?

  • Remove Columns: The column quietly disappears from the output. No error is raised.
  • Remove Other Columns: Query evaluation dies with a missing column error.

(All this is given in context of Power Query. The host application which receives Power Query’s output may have other behaviors, such as dying if the output handed to it is missing a column it expects; or ignoring extra columns that appear until a schema sync occurs. The host’s behaviors are outside the scope of this article.)

PII Rule: Phones Must Be Excluded

To make sure we’re grasping the ramifications, let’s consider a more concrete example:

You’re using Power Query to pull customer info. For privacy reasons, phone numbers must not be included in the output that’s returned. Currently, three phone columns exists: phone1, phone2 and phone3. When authoring your mashup, you can use either Remove Columns or Remove Other Columns to achieve the net effect of excluding these columns.

But what happens if a schema change is later made by the database team to the upstream source table without your Power Query expression being revised?

Say that a phone4 column is added to the table—with Remove Columns, it will flow through your Power Query mashup and be output, violating the rule that phones cannot be included. On the other hand, instead of a column being added, say that the existing phone3 column is removed from upstream—with Remove Columns, your query will die with an error because a column the mashup expected to remove is gone. In contrast, with Remove Other Columns, neither the column add or drop changes anything.

What if, instead, a column you do want to output (say firstName) is removed? With the Remove Columns approach, the column will silently disappear, while with Remove Other Columns the mashup will die because an expected column has gone absent without leave.

Column missing error

Conclusion

I’m inclined to prefer Remove Other Columns over Remove Columns by default, particularly when the upstream data source may change separately from my Power Query project. With Remove Other Columns, the dependency aligns with the end goal: dependency is on columns to keep, end goal is to keep those columns. With this approach, adds or removes of columns outside of that goal do not affect my query, while drops of columns I am interested in do affect the query.

In contrast, with Remove Columns, there is a difference between dependency and goal: goal is to keep certain columns, but the dependency is not on those columns. This mismatch can lead to interesting behaviors: Errors happen if columns I’m not interested in go missing, while columns of interest that disappear from upstream don’t cause errors—and newly added columns automatically flow through even though I may not be interested in them.

2 thoughts on “Getting Rid of Columns: “Remove” or “Remove Other”?

  1. Sam

    @Ben – Both have a Missing Field Parameter to handle the missing columns, so with Table.SelectColumns and setting the MissigField to use Null the problem of a column you want getting removed upstream wont disturb your downstream data model much and the query will still run

    Cheers
    Sam

    Reply
    1. Ben Gribaudo Post author

      Good point. Maybe I should work a mention of this in above. The article is primarily focused on what the UI exposes, which doesn’t include this option–but maybe a technical note about its availability would be good to add.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *