You’re authoring in Power Query. You decide that rows with 0001-01-01 in a certain column should be removed, so you filter on the column, excluding 0001-01-01 values. After you apply your filter, nothing changes: the 0001-01-01 rows are still present. What is going on?
This issue bit me recently. Turns out, it’s due to a translation loss when bridging between the worlds of MySQL and Power Query.
Background
MySQL can be set to allow invalid date
and datetime
values with a year of 0000. This includes dates made up of all zeros (0000-00-00, perhaps used as an alternative to null). Dates with month and/or day given but a 0 year are also supported (e.g. 0000-05-30, perhaps representing a birth month and day when the year is unknown). The day given can even be invalid for the month (like 0000-09-31, even though September only has 30 days).
In contrast, Power Query does not support invalid dates. In M, a date
, datetime
or datetimezone
value’s year must be between 1 and 9999, its month between 1 and 12, and its day must be valid for the specified month.
Since MySQL invalid dates with all-zero years can’t be directly represented in Power Query, the MySQL connector for Power Query translates them to Power Query’s minimum date
(or datetime
, as appropriate) of 0001-01-01.
(Outside of Power Query, in the .Net world, you can achieve a similar effect by setting MySQL Connector/Net‘s connection string property ConvertZeroDateTime to true.)
Problem
Back to the problem scenario: A MySql invalid 0000-xx-xx date flows into Power Query as 0001-01-01. You see this date in Query Editor and decide to filter out the row that contains it. You (or Query Editor) writes code like the following in an attempt to make this happen:
Table.SelectRows(
Data,
each ([SomeDateTimeColumn] <> #datetime(1, 1, 1, 0, 0, 0))
)
If this Table.SelectRows
is processed locally by Power Query, the filtering works as expected—but not when the SelectRows
is query folded.
Why? The above expression’s filter says to exclude 0001-01-01, so that value is written into the SQL generated by query folding:
select `_`.`SomeDateTimeColumn`
from `dbo`.`FoldingProblemDemo` `_`
where `_`.`SomeDateTimeColumn` <> '0001-01-01 00:00:00'
or `_`.`SomeDateTimeColumn` is null
However, while 0001-01-01 is M’s representation of a 0000-xx-xx invalid date, it is not the actual invalid date value that is stored in the database. MySQL doesn’t know anything about how Power Query represents invalid dates, but it does know that 0001-01-01 is not the same as a 0000-xx-xx date, so the above SQL does not exclude any 0000-xx-xx dates.
When this query is run, rows containing 0000-xx-xx are still returned, Power Query dutifully displays these as 0001-01-01, and you end up puzzled over why your filter had no apparent effect.
But now you know what is going on. Are their workarounds?
Workaround
Thankfully, there’s easy, foldable workaround that in many cases is acceptable:
In the Table.SelectRows
call, change the operator used in the filter predicate from <>
to >
//From:
Table.SelectRows(
Data,
each ([SomeDateTimeColumn] <> #datetime(1, 1, 1, 0, 0, 0))
)
//To:
Table.SelectRows(
Data,
each ([SomeDateTimeColumn] > #datetime(1, 1, 1, 0, 0, 0))
)
This revision folds nicely, resulting in SQL along the lines of “where `_`.`SomeDateTimeColumn` > '0001-01-01 00:00:00'
“. Since all 0000-xx-xx invalid dates come before 0001-01-01, this WHERE
clause successfully excludes those values.
The above filter predicate also works when the SelectRows
is processed locally by Power Query, instead of being folded.
However, there is a catch: If the column you’re filtering on contains actual, valid 0001-01-01 dates, these values will also be excluded. In many cases, year 0001 is outside the domain’s range of allowed values, so this >
filter doesn’t pose a problem. If it does, then likely you’ll need to write your own SQL to perform the correct filtering (e.g. using Value.NativeQuery
or by creating a MySQL view).
Other Invalid Dates
Our focus in this post has been MySQL invalid dates in the form of 0000-xx-xx. MySQL also supports a couple variants of invalid dates with non-0000 years:
- Non-0000 year dates with month and/or day set to zero (e.g. 2022-00-00). These come into Power Query set to its minimum date (0001-01-01).
- Non-0000 year dates with a valid month, with the day between 1 and 31 where the day is invalid for the given month (like 2022-09-31, even though September has only 30 days). These come into Power Query as errors (
DataSource.Error: An error happened while reading data from the provider: 'Year, Month, and Day parameters describe an un-representable DateTime.'
)
The previously described > #datetime(1, 1, 1, 0, 0, 0)
includes non-zero dates greater than 1/1/1 so, in general, won’t filter out these latter scenarios in a way that works both when folded and when processed locally. Instead, you’ll probably need to write your own filtering SQL (e.g. use Value.NativeQuery
or create a MySQL view).