WL#8693: Remove the syntax for GROUP BY ASC and DESC

Affects: Server-8.0   —   Status: Complete

In MySQL, historically GROUP BY has been used for sorting. If a query specifies GROUP BY, output rows were sorted according to the GROUP BY columns as if the query had an ORDER BY for the same columns. This was in MySQL version 5.7 and earlier. However it changed in MySQL 8.0 when implicit sorting for GROUP BY was removed. But what still remains is the non-standard extension that MySQL provides with GROUP BY clause to specify the order for GROUP BY columns - GROUP BY column ASC/DESC.

The aim of this worklog is to remove the syntax for GROUP BY ASC and DESC. With this and the earlier removal of implicit sorting for GROUP BY, we can also remove some of the code which currently presumes that GROUP BY always sorts. For Ex: If a GROUP BY can be removed because the query generates only one row per group, optimizer currently re-writes GROUP BY to ORDER BY if possible. This is done as users expected results to be sorted when specifying GROUP BY.

However, what is not removed is something like the following:

MySQL optimizer currently checks if ORDER BY columns are compatible with GROUP BY. If found compatible, optimizer removes ORDER BY thinking that GROUP BY provides the required sorting. This is a good optimization if it is later decided that GROUP BY needs to sort to produce grouped results.

However, it might become costly if GROUP BY need not have to sort the data to give grouped results like in the case when temporary table is used for grouping. In this case, server updates one row per group in temporary table which results in smaller number of rows which can be sorted later if order by is present in the query. This is better in comparison with having to sort all the rows before grouping the data.

Since MySQL optimizer currently lacks a cost model for comparing the above operations, the code to move ORDER BY to GROUP BY if found compatible remains. Note that the above change is not introduced by this worklog.

So, with this worklog, we will be removing the non-standard syntax of GROUP BY ASC and GROUP BY DESC and will follow that with re-factoring of the code as mentioned above.

Deprecation of the syntax GROUP BY ASC/DESC in MySQL 5.7.

Note: Until 8.0.12, Mysql did not allow ORDER BY with ROLLUP. Reason being, ROLLUP produces NULLS in the row and Mysql treats NULLs as minimum value while sorting which leads to non-desirable results. So this made it difficult to get ordered results for queries using ROLLUP. Hence GROUP BY ASC/DESC had to be present to give sorted results in this case. However, with the support for ORDER BY with ROLLUP and GROUPING() function, one should be able to use GROUPING() in order by clause to get the desired results as noted in the following article http://mangalpardeshi.blogspot.co.uk/2009/08/rollup-and-order-by.html

Functional Requirements:
------------------------

F1: For queries using syntax of the form GROUP BY col1 ASC or GROUP BY col1 DESC, an error is thrown.

F2: For queries having GROUP BY ASC/DESC, if a query plan changes because of this feature, there must be a way to re-write the query and get the old plan.

Non-functional Requirements:

NF1: The performance of queries having GROUP BY without explicit use of ASC/DESC will be as good as before.

NF2: The performance of queries having GROUP BY col1 ASC/DESC which are now forced to be re-written into "GROUP BY col1 ORDER BY col1 ASC/DESC" will be same as "GROUP BY col1 ASC/DESC" in previous releases. See F2 above.

Parser:
------
Parser does not accept ASC or DESC keyword after column specification for a GROUP BY clause. So a syntax error is thrown if a GROUP BY column is followed by ASC or DESC keyword.

SELECT i, SUM(j) FROM t1 GROUP BY i ASC; ERROR 42000: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ASC' at line 1 SELECT i, SUM(j) FROM t1 GROUP BY i DESC; ERROR 42000: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DESC' at line 1

Resolver:

"is_explicit" - member of ORDER BY structure (st_order) was used to differentiate between explicit presence of ASC/DESC keyword for GROUP BY or ORDER BY clause with an implicit assumption of ascending order by default. This member has been removed now. It was added when implicit sorting for GROUP BY was removed. Since we have now removed explicit sorting too, this variable's usage is no more needed.

Optimizer:

If a GROUP BY can be removed because a query generates only one row per group, optimizer re-writes GROUP BY to ORDER BY if possible. This was done as users expected results to be sorted when specifying GROUP BY. This has been removed now, as both implicit and explicit sorting for GROUP BY is removed.

"no_order" - Member of JOIN structure is set to false when DISTINCT is re-written to group by and this group by is a prefix of order by if present in the query. However we can also use "direction :: ORDER_NOT_RELEVANT" for the same. So "no_order" is removed.

"ignore_order" - Member of ORDER_with_src, is set to false when GROUP BY is expected to provide order (When ORDER BY is moved to GROUP BY if found compatible). However we can use "direction :: ORDER_NOT_RELEVANT" for the same. So this variable is also removed.

We now have one variable related to ordering of ORDER BY.

"skip_sort_order" - Used w.r.t ORDER BY. By default false, but set to true when ORDER BY does not have to provide order (If ORDER BY is on a Constant or ORDER BY null is present etc)

MTR test changes:

All the queries which have syntax GROUP BY column ASC/DESC have been changed to GROUP BY column ORDER BY column ASC/DESC.