MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
When ONLY_FULL_GROUP_BY Won't See the Query Is Deterministic...

This is a follow-up post to my recent announcement of only_full_group_by improvements in 5.7, where I’d like to address some tricky corner cases where GROUP BY contains non-column expressions (functions).

In the SQL standard, GROUP BY should contain only table columns and never expressions. So this example query where we want to count how many people have the same name is illegal in the SQL2011 standard because the GROUP BY clause contains a function:

But MySQL does allow expressions in a GROUP BY clause, and it’s a very convenient addition to the standard. The logic of the only_full_group_by SQL mode notices that the selected expression is equal to the grouping expression, so it concludes that the result is not random and thus accepts the query.

However, let’s try this:

As you’ll see, it returns:
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.people.first_name' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by"
Indeed, the selected expression is not equal to the grouping one anymore. A human brain instantly recognizes that it’s still deterministic, as it’s simply the UPPER() value of the grouping expression, but for MySQL to recognize it too it should analyze the selected expression’s parse tree, trying to spot the grouping expression at all possible places in that tree. As you can imagine, the selected expression could be rather complex. For example:

MySQL would have to:

  • Start with UPPER:
    • See if the UPPER value is the CONCAT of GROUP BY — no it’s not
    • So go down to arguments of UPPER
    • There it finds CONCAT, which is equal to the GROUP BY one — OK!
  • Now check SUBSTR:
    • It’s not CONCAT
    • So go down to the arguments
    • CONCAT is there in the first argument — fine
    • The second argument is a multiplication, which is not CONCAT
      • So go down to its arguments
      • We have 0.6, which is not CONCAT but rather a constant and deterministic
      • Then LENGTH, which is not CONCAT
      • So let’s check its arguments
        • CONCAT is there, good. Finally done!

All these comparisons of functions have a runtime cost and an implementation cost, so we decided not to go down this path. After all, only_full_group_by had always rejected such a query—which is a non-Standard one—so we chose not to address these specific cases.

Thus, MySQL simply observes that the selected expression is not the grouping one and then it collects columns referenced by the selected expression to see if they’re equal to, or functionally dependent on, some grouping columns. But if there’s no grouping column — bail out!

One could take a step back and realize that grouping on first_name and last_name (the columns) is more natural than grouping on CONCAT, and it also yields a Standard-compliant query:

But what if you do have a query which really must group on an expression (likely something less trivial than my examples)? We saw that UPPER and GROUP BY CONCAT don’t fit together. A first idea is thus to group on a column; for that, the Standard-compliant solution is to use a derived table:

It makes CONCAT’s output a column of the derived table and thus the grouping happens on this column and finally UPPER uses this column value.

However, the derived table will have as many rows as people; as it will be materialized internally it will form a big temporary table, which creates space and speed issues. It would be better if the derived table contained only the groups. So we’re set for grouping on CONCAT, keeping UPPER separated from the grouping query:

Another potential solution, albeit a less safe one, is to add ANY_VALUE around UPPER() (or even turn only_full_group_by off!):

One case I have seen recently, brought up by Roland Bouman, is when one wants to sort the output in a particular order, for example one wants NULLs first, so adds ISNULL in ORDER BY:

This time, the selected expression is equal to that of GROUP BY, so it has no problem; but the ISNULL expression is the new problem! To ensure a deterministic order of results, only_full_group_by has to validate this ISNULL, and, just like for the previous UPPER call, it will not see that the argument of the ISNULL call is the same as the grouping expression. Here, because the problem is in the ORDER BY clause, and because ORDER BY clauses are allowed to reference expressions of the select list by aliases in MySQL, the simplest solution is to simply use an alias:

MySQL will see that ISNULL’s argument is an alias (CONC) and that it matches the same alias in the GROUP BY clause — the alias is then treated as a “grouping pseudo-column”. Matching aliases is much simpler than matching expressions, so we have been able to implement that in the only_full_group_by logic. Thus the above query will pass the only_full_group_by restrictions. It’s also much more readable with aliases than without, in my humble opinion; it might even execute slightly faster, as the value of CONC, computed in GROUP BY, is now reused by ISNULL, leading to less calls to the CONCAT function. The only potential drawback is that using aliases in GROUP BY clauses is not Standard-compliant, so it may cause issues if you need to support various DBMS options with your application.

I hope that this blog post has given you enough safe and usable tips for those types of queries. If you know of other tricks, they are warmly welcome, and you might just drop them in a comment here (thanks in advance!).

As always, THANK YOU for using MySQL!