WL#1979: GROUPING() function

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

GROUPING FUNCTION
=================

Syntax:

GROUPING ( expr1, expr2,....)

SQL_FEATURE - T433

Purpose: (As defined by Oracle)
The GROUPING function distinguishes super-aggregate rows from regular grouped
rows.GROUP BY extensions such as ROLLUP and CUBE produce super-aggregate rows
where the set of all values is represented by null. Using the GROUPING function,
you can distinguish a null representing the set of all values in a
super-aggregate row from a NULL in a regular row.

The SQL standard specifies the following syntax for GROUPING():
Format
<set function specification> ::=
<aggregate function>
| <grouping operation>
<grouping operation> ::=
GROUPING <left paren> <column reference>
[ { <comma> <column reference> }... ] <right paren>

However, as MySQL allows expressions in GROUP BY clause, we have extended the
syntax for GROUPING to allow expressions as arguments.

The "expr" in the GROUPING function must match one of the expressions in the
GROUP BY clause. The function returns a value of 1 if the value of expr in the
row is a null representing the set of all values. Otherwise, it returns zero. 

As per SQL-2011, a GROUPING function can accept more than one argument. The
result of the GROUPING function would be the integer bit mask having 1's for the
arguments which have GROUPING() as 1. SQL standard -2011 describes this as follows:

If more than one <column reference> is specified, then let N be the number of
<column reference>s
and let CR i , 1 (one) ≤ i ≤ N, be the i-th <column reference>.
GROUPING ( CR 1 , ..., CR N-1 , CR N )
is equivalent to:
CAST ( ( 2 * GROUPING ( CR 1 , ..., CR N-1 ) + GROUPING ( CR N ) ) AS IDT )
where IDT is the implementation-defined declared type of the result.

For ex:
If GROUPING (a,b,c) returns 3, it means that NULL in column B and NULL in column
c is the result of ROLLUP/CUBE operation for that row.
If result is 1 then, NULL in column c alone is result of a ROLLUP/CUBE operation
for that row.

Multi-argument GROUPING function is SQL feature T433. PostgreSQL supports
multi-argument GROUPING().

Usage:
-----
As specified above the GROUPING function is used to understand the NULL's
generated as a result of ROLLUP (CUBE is not supported in MySQL). Below are some
of the examples using GROUPING().

#Check if a NULL present in output is a result of super aggregation
SELECT Department, Employee, SUM(salary) as SUM, GROUPING(Department) as
GP_Dept,GROUPING(Employee) as GP_Empl FROM t1 GROUP BY Department, Employee WITH
ROLLUP;

Department  Employees     SUM  GP_Dept  GP_Empl
      111         11      11        0       0
      111         12      25        0       0
      111       NULL      36        0       1
      222         22      22        0       0
      222         23      47        0       0
      222       NULL      69        0       1
      333         33      33        0       0
      333         34      69        0       0
      333       NULL     102        0       1
      NULL      NULL     207        1       1

#Check for the rows having only super-aggregates
SELECT Department, Employee, SUM(salary) as SUM, GROUPING(Department) as
GP_Dept,GROUPING(Employee) as GP_Empl FROM t1 GROUP BY Department, Employee
WITH ROLLUP HAVING GP_Dept=1 OR GP_Empl=1;

Department  Employees     SUM  GP_Dept  GP_Empl
      111       NULL      36        0       1
      222       NULL      69        0       1
      333       NULL     102        0       1
      NULL      NULL     207        1       1


#Differentiate Super-Aggregates and Aggregates
SELECT IF(GROUPING(Department)=1,'All Departments', Department) as Department,
IF(GROUPING(Employees)=1, 'All Employees', Employees) as Employees,
SUM(salary) as SUM
FROM t1 GROUP BY Department,Employees WITH ROLLUP;

    Department           Employees     SUM
            111                  11      11
            111                  12      25
            111       All Employees      36
            222                  22      22
            222                  23      47
            222       All Employees      69
            333                  33      33
            333                  34      69
            333       All Employees     102
All Departments       All Employees     207

GROUPING function can also be used to ORDER results with ROLLUP. In MySQL, GROUP
BY ASC/DESC sorts the result after ROLLUP operation.

An equivalent standard SQL query to get the above results:
SELECT IF(GROUPING(Department)=1,'All Departments', Department) as Department,
IF(GROUPING(Employees)=1, 'All Employees', Employees) as Employees,
SUM(salary) as SUM
FROM t1 GROUP BY Department,Employees WITH ROLLUP
ORDER BY GROUPING(Department), Department, GROUPING(Employees), Employees;

http://mangalpardeshi.blogspot.co.uk/2009/08/rollup-and-order-by.html

However MySQL does not allow ROLLUP with ORDER BY as of now. With the current
worklog implemented, we can remove the syntax for GROUP BY ASC/DESC and start
allowing ROLLUP with ORDER BY.

Feature requests:
BUG#3156 GROUPING() Feature request
BUG#46053 Adding grouping() function to support rollup identification for subtotals.
There is a contribution in the above bugpage. Part of the contribution is taken
to implement this feature.

User Documentation
==================

* https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html
*
https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_grouping
* https://dev.mysql.com/doc/refman/8.0/en/group-by-modifiers.html
F-1 - GROUPING() is supported as per SQL standard-2011. However GROUPING() is,
not allowed to be present in ORDER BY for now as ORDER BY is not allowed with
ROLLUP in MySQL as mentioned in worklog text.

NF-1 - Performance of all the queries which do not use GROUPING() should not change.
Parser:
sum_expr has been changed to set_function and set_function becomes
set_function_specification:
                           sum_expr
                           | grouping_operation
                          ;

GROUPING is a new reserved word.

Resolver:
GROUPING() can be present only SELECT_LIST or in HAVING condition. If not, an
error will be thrown.
Every expression found in GROUPING() has to match exactly an expression found
in GROUP BY. If not, an error will be thrown. This check will happen after GROUP
BY expressions get resolved and before resolving rollup. After resolving rollup,
the same check cannot be made as all the group by expression found in the query
will be replaced with a reference object to be filled in/out by ROLLUP later
with a Item_null_result object.
Limitations: 
1. Ex: SELECT GROUPING( (SELECT MAX(b) FROM t1)) FROM t1 GROUP BY (SELECT MAX(b)
FROM t1);
In this case, GROUPING function throws error that Argument #1 is not present in
GROUP BY clause as we cannot equate two subqueries to be same.
2. GROUPING on const GROUP expressions do not work as expected if present in
HAVING clause. It gets evaluated during optimization itself.
Ex: SELECT a as f1, 'w' as f2 FROM t GROUP BY f1,f2 WITH ROLLUP HAVING
GROUPING(f2)=1;

Optimizer:
No changes

Executor:
Evaluation of the GROUPING function with single arguments, will check for the
type of the item that is referenced by the GROUPING function argument. If it is
Item_null_result, then it is a NULL produced by a ROLLUP operation. To make it
easier to check, a new type NULL_RESULT_ITEM is introduced. Objects of type
Item_null_result return their type as NULL_RESULT_ITEM. 
If the argument to GROUPING function is of any other type, then it is not a NULL
or it is a NULL but not a result of ROLLUP operation. GROUPING function return 0
in this case.

For a multi-argument GROUPING function, we check the same for every argument of
the function. If the check for NULL_RESULT_ITEM succeeds, then

result = result + (1 << (number of arguments - position from left zero based - 1)