WL#3220: Loose index scan for aggregate functions

Affects: Server-5.5   —   Status: Complete   —   Priority: Low

Currently MySQL computes COUNT(DISTINCT expr1, ..., exprN) by performing
a table or index scan that fills a temporary data structure (table or
tree) with all distinct values, and then returning the count of the
distinct values in that temporary data structure.

In cases when there are applicable indexes it is possible to execute
such queries much faster via the "loose index scan" technique employed
for execution of GROUP BY and DISTINCT queries, as implemented by
WL#1724.

This task should extend the loose index scan access method implemented
for GROUP BY/DISTINCT queries (WL#1724), so that it can be used to
optimize and execute queries with COUNT(DISTINCT ...).

If time permits, the task should also apply the same optimization to
queries with AVG(DISTINCT ...) and SUM(DISTINCT ...). Finally, it
might be also quite easy to extend the task to compute
GROUP_CONCAT([DISTINCT] expr, ...) expressions via the same technique,
depending on how different is its implementation from that of COUNT,
AVG and SUM.
=========================================================
HLS: WL#3220 Loose index scan for COUNT DISTINCT
=========================================================

Contents
--------
1. Background
2. Loose index scan for AGG_FUNC(DISTINCT ...)
3. Applicability
4. Optimization
5. Query execution and execution cost
6. Limitations and future work


1. Background
-------------


Let AGG_FUNC be one of the aggregate functions {COUNT, AVG, SUM}.

Currently, queries with one of these functions such as

  SELECT COUNT(DISTINCT a), AVG(DISTINCT a) FROM T1 WHERE b = 5;

with AGG_FUNC(DISTINCT ...) functions in the SELECT clause are
processed as follows.

- The parser decides that the values of the operands of the
  aggregate function will be materialized in some data structure
  (temporary table or tree) for the purpose of duplicate removal.

- The optimizer has the freedom of choice to select the access
  method(s) to the tables referenced by the operands of the aggregate
  function.

- At query execution time, the chosen access method(s) for each table
  are used to retrieve rows that satisfy all query conditions. Once
  join execution produces a complete result row, the aggregate function
  computes the values of its operands from the complete join tuple, and
  stores each unique tuple of such values in its temporary data
  structure.

- Once query execution computes the last join result tuple, the
  result of the aggregate function is computed from the temporary
  data structure that stores the unique values of the operands of
  the aggregate function.
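As an illustration, the materialize-then-aggregate strategy above can be
sketched in Python (a simplified, hypothetical model: 'rows' stands for
the join result and 'key' for a single operand column):

```python
# Sketch of the current strategy: collect the distinct operand values in
# a temporary structure (a set here, standing in for the temp table/tree),
# then compute the aggregates from that structure.
def count_sum_avg_distinct(rows, key):
    """rows: iterable of dicts; key: the column under DISTINCT."""
    distinct_values = set()
    for row in rows:
        distinct_values.add(row[key])      # duplicate removal on insert
    n = len(distinct_values)
    s = sum(distinct_values)
    return n, s, s / n                     # COUNT, SUM, AVG over distinct values

rows = [{"a": 1, "b": 5}, {"a": 2, "b": 5}, {"a": 1, "b": 5}]
print(count_sum_avg_distinct(rows, "a"))   # (2, 3, 1.5)
```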

Pros:

- This method is general enough to compute an aggregate function over
  arbitrary expressions of columns from an arbitrary number of joined
  tables.

Cons:

- The decision how to compute AGG_FUNC(DISTINCT ...) expressions is
  made already at parse time, which prevents some subsequent
  optimizations based on more efficient index access.

- In particular this approach doesn't allow the optimizer to make use
  of the loose index scan access method (described below).


2. Loose index scan for AGG_FUNC(DISTINCT ...)
----------------------------------------------

For certain types of queries with AGG_FUNC(DISTINCT ...) in the SELECT
list it is possible to use more efficient methods to compute the
distinct values of the arguments of AGG_FUNC.

One such method is to reuse the existing loose index scan access method
for DISTINCT/GROUP BY queries. This method is generally more efficient
than regular index scan, and in addition it has the property that it
returns unique tuples from the scanned index. Thus if we retrieve the
tuples that provide arguments for AGG_FUNC via loose index scan for
DISTINCT, there is no need for the aggregate function to remove
duplicates. This also leads to less memory used at query execution time
because there is no need for the intermediate data structures used for
duplicate removal.
Note that, because each jump to the next distinct value requires a new
traversal of the index tree from its root, loose index scan can be more
expensive than a temporary data structure combined with an index scan.
We therefore modify loose index scan so that, when its estimated cost
is worse than that of a plain index scan, it advances to the next
distinct value by sequential index reads instead of random disk
accesses. Even then we prefer the modified loose index scan for
AGG_FUNC(DISTINCT) because it eliminates the temporary structure for
duplicate elimination and thus speeds up the computation.
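The "jump to the next distinct value" behavior can be illustrated with a
small Python model, where a sorted list of key tuples stands in for the
index and each bisect call models a fresh descent of the B-tree (all
names are illustrative):

```python
import bisect

def loose_scan_count_distinct(index_keys, prefix_len):
    """Count distinct prefixes of length prefix_len by jumping over the
    sorted key list instead of reading every entry."""
    count, pos = 0, 0
    while pos < len(index_keys):
        prefix = index_keys[pos][:prefix_len]
        count += 1
        # jump past every key sharing the current prefix; in the server
        # this is a new index lookup starting from the tree root
        pos = bisect.bisect_right(index_keys, prefix + (float("inf"),))
    return count

# index on (a, b): four entries, three distinct values of a
keys = [(1, 1), (1, 2), (2, 1), (3, 9)]
print(loose_scan_count_distinct(keys, 1))  # 3
```

Only one entry per distinct prefix is visited, which is where the
saving over a full index scan comes from.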

If we choose to use an access method that guarantees the uniqueness of
AGG_FUNC operands, we can rewrite any instance of
  AGG_FUNC(DISTINCT col1, col2, ...)
into just
  AGG_FUNC(col1, col2, ...).

At a logical level, the suggested transformation can be illustrated by
the following examples. 

Suppose we have table T1(b, a, d, c, g, f, e) with an index I1 on
columns (a, b, c, d, e). Below we consider several queries where I1
is a covering index for all of them.

The query:

Q2.1:
SELECT COUNT(DISTINCT a, b, c) FROM T1;

is equivalent to:

Q2.1a:
SELECT COUNT(*) FROM (SELECT DISTINCT a, b, c FROM T1) AS tmp;

Given the index I1, the derived table can be computed via the loose
index scan for DISTINCT/GROUP BY because (a,b,c) forms a prefix of the
index (for details see WL#1724, and the comment to
opt_range.cc:get_best_group_min_max()).

This logical transformation is equivalent to that of "pushing" the
"loose index scan" operation before the "duplicate removal" operation,
and using the property of loose index scan that it returns unique keys
to remove the redundant duplicate removal operation. When this is the
case, we say that loose index scan "absorbs" DISTINCT. Notice that the
logical transformation above is only for illustrative purposes and we
do not transform Q2.1 explicitly into Q2.1a in order to make use of
loose index scan. However, since the subquery is logically equivalent
to a loose index scan for DISTINCT over I1, we use Q2.1a to illustrate
the transformation.

The "loose index scan" optimization can be applied in a number of more
general cases. Below we consider several examples divided into two
cases: complete and partial DISTINCT absorption.

We say that a loose index scan access method is a "complete distinct
absorber" if it guarantees that all tuples under all DISTINCT modifiers
of all AGG_FUNC(DISTINCT ....) and the optional GROUP BY clause are
distinct for any database instance.

A loose index scan method is a "partial distinct absorber" if for some
or all DISTINCT operands or the optional GROUP BY clause, their
uniqueness is not guaranteed. Partial distinct absorption is described
in Sect. 6 as future work.

Similar to WL#1724, this task will consider only loose index scans that
provide complete DISTINCT absorption. In this class of queries, it is
possible to remove all DISTINCT modifiers of all aggregate
functions. If there is a GROUP BY clause it is also possible to
directly determine all groups from the index.

2.1 Simple queries without GROUP BY
-------------------------------------

The query:

Q2.1.1:
SELECT COUNT(DISTINCT a), SUM(DISTINCT a), AVG(DISTINCT a) FROM T1;

can be fully transformed into a loose index scan query:

Q2.1.1a:
SELECT COUNT(a), SUM(a), AVG(a) FROM (SELECT DISTINCT a FROM T1) AS tmp;

Similarly:

Q2.1.2:
SELECT COUNT(DISTINCT a, b), COUNT(DISTINCT b, a) FROM T1;

can be fully transformed into a loose index scan query:

Q2.1.2a:
SELECT COUNT(a, b), COUNT(b, a)
FROM (SELECT DISTINCT a, b FROM T1) AS tmp;

The common property of Q2.1, Q2.1.1, Q2.1.2 is that all operands of all
AGG_FUNCs have a permutation that forms a prefix of the index I1 (and
by transitivity all operands consist of the same set of columns). One
consequence from this is that if there is any other AGG_FUNC but COUNT,
all AGG_FUNCs must reference one and the same column.
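A minimal sketch of this applicability check, assuming operands and
index columns are given as name lists (all names are illustrative):

```python
def args_form_index_prefix(index_columns, distinct_args_per_func):
    """True if, for every AGG_FUNC, some permutation of its DISTINCT
    operands forms a prefix of the index."""
    for args in distinct_args_per_func:
        # a column set is a permutation of an index prefix exactly when
        # it equals the set of the first len(args) index columns
        if set(args) != set(index_columns[:len(args)]):
            return False
    return True

index_i1 = ["a", "b", "c", "d", "e"]
print(args_form_index_prefix(index_i1, [["a", "b"], ["b", "a"]]))  # True
print(args_form_index_prefix(index_i1, [["b", "c"]]))              # False
```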

2.2 Generalization with index prefixes and suffixes
---------------------------------------------------

Generally, the principle behind all these examples is that the
arguments under the DISTINCT modifier form a prefix of some ordered
index, which allows us to "jump" from one distinct prefix to another.

This idea can be extended to cases where the aggregate function
arguments under DISTINCT form an infix of some index, given that other
parts of the query may supply the missing prefix fields, such that by
the query semantics the combination of all these fields is known to be
unique.

In addition, in certain cases, a query may refer to a suffix of an
index given that the conditions on the suffix allow us to form an index
search key.

The most trivial case is when there are constant equality conditions
for all key parts before the ones under the distinct modifier(s) in the
query. We leave this case as future work (described in Sec. 6) as it is
not very common.

Another (more common) case is when there is a GROUP BY clause that can
serve as an index prefix. When a query contains GROUP BY we observe the
following:

- In the presence of GROUP BY, AGG_FUNC(DISTINCT ...) are applied to
  each group.
- By the definition of GROUP BY, all group columns in a single group
  have the same values.

There are two cases with GROUP BY index prefixes that can be easily
handled by using the current loose index scan for DISTINCT. In the
first case (Sect. 2.2.1) the GROUP BY clause coincides with the fields
under the DISTINCT modifier. In the second case (2.2.2) the fields
under the DISTINCT modifier follow immediately after the GROUP BY
fields.

Finally we consider the case with constant suffixes used as in loose
index scan for MIN/MAX with GROUP BY (Sect. 2.2.3).


2.2.1 GROUP BY prefix coinciding with AGG_FUNC(DISTINCT ...) fields
-------------------------------------------------------------------

Depending on AGG_FUNC, the result of AGG_FUNC(DISTINCT ...) is:
- for COUNT(DISTINCT <group columns>), the result is always "1"
- for both
  SUM(DISTINCT <group column>) and AVG(DISTINCT <group column>)
  the result is the value of <group column> for each group.

A few examples follow.

Q2.2.1.1:
SELECT a, c, COUNT(DISTINCT c, a, b) FROM T1 GROUP BY a, b, c;

is equivalent to:

Q2.2.1.1a:
SELECT a, c, 1 FROM (SELECT DISTINCT a,b,c FROM T1) AS tmp;

Since the current loose index scan for DISTINCT allows it, there also
can be range conditions on the index prefix:

Q2.2.1.2:
SELECT COUNT(DISTINCT c, a, b) FROM T1
WHERE a > 5 AND b BETWEEN 10 AND 20
GROUP BY a, b, c;

can be executed as:

Q2.2.1.2a:
SELECT 1
FROM (SELECT DISTINCT a,b,c FROM T1
      WHERE a > 5 AND b BETWEEN 10 AND 20) AS tmp;

and the mixed case with a constant prefix (see Sect. 6.3) that this
WL will *not* handle:

Q2.2.1.3:
SELECT COUNT(DISTINCT b), SUM(DISTINCT b) FROM T1
WHERE a = 5
GROUP BY b;

that can be computed as:

Q2.2.1.3a:
SELECT 1, b
FROM (SELECT b FROM T1 WHERE a = 5 GROUP BY b) tmp;


2.2.2 The fields under DISTINCT follow after the GROUP BY fields
----------------------------------------------------------------

In this case the loose index scan prefix must consist of the
concatenation of the GROUP BY and DISTINCT fields. Then it is
guaranteed that the values of all DISTINCT fields will be unique within
each group. However, the GROUP BY fields will normally contain
duplicate values, which need to be removed after retrieval. Notice that
with this approach there will generally be fewer rows to group than if
no loose index scan were used at all. Consider the examples below.

Q2.2.2.1:
SELECT a, COUNT(DISTINCT b), SUM(DISTINCT b) FROM T1 GROUP BY a;

can be computed as:

Q2.2.2.1a:
SELECT a, COUNT(b), SUM(b)
FROM (SELECT a, b FROM T1 GROUP BY a, b) AS tmp
GROUP BY a;

If the group fields are not selected at all (as in Q2.2.2.2 below),
then loose scan is sufficient and there is no need to perform the
grouping operation at all.

Q2.2.2.2:
SELECT COUNT(DISTINCT b), SUM(DISTINCT b) FROM T1 GROUP BY a;

can be computed as:

Q2.2.2.2a:
SELECT COUNT(b), SUM(b)
FROM (SELECT a, b FROM T1 GROUP BY a, b) AS tmp;

In this case the values of 'b' will be unique within each group.
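The remaining grouping step of this case can be sketched as follows,
assuming the loose scan has already delivered the sorted distinct
(a, b) pairs (illustrative Python model):

```python
from itertools import groupby

def group_agg_over_distinct_pairs(distinct_pairs):
    """distinct_pairs: unique (a, b) tuples in index order, as a loose
    index scan over index (a, b) would return them. Groups by 'a' and
    computes COUNT(DISTINCT b) and SUM(DISTINCT b) per group without
    any further duplicate removal."""
    result = []
    for a, group in groupby(distinct_pairs, key=lambda p: p[0]):
        bs = [b for _, b in group]           # unique within the group
        result.append((a, len(bs), sum(bs)))
    return result

pairs = [(1, 10), (1, 20), (2, 10)]
print(group_agg_over_distinct_pairs(pairs))  # [(1, 2, 30), (2, 1, 10)]
```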



2.2.3 Constant suffixes
-----------------------

As in loose index scan for DISTINCT, it is possible to add constant
suffixes after the DISTINCT fields:


Q2.2.3.1:
SELECT COUNT(DISTINCT a, b) FROM T1 WHERE c = 13 AND d = 42;

can be transformed as:

Q2.2.3.1a:
SELECT COUNT(a, b)
FROM (SELECT DISTINCT a, b FROM T1 WHERE c = 13 AND d = 42) tmp;

and there can also be a GROUP BY clause:

Q2.2.3.2:
SELECT a, COUNT(DISTINCT a), SUM(DISTINCT a) FROM T1
WHERE b = 13 AND c = 42
GROUP BY a;

can be transformed as:

Q2.2.3.2a:
SELECT a, COUNT(a), SUM(a)
FROM (SELECT a FROM T1 WHERE b = 13 AND c = 42 GROUP BY a) tmp;


Finally, a constant suffix may allow queries with different arguments
under AGG_FUNC(DISTINCT ...) (and/or fields under GROUP BY) to be
executed via loose index scan. For this it is necessary that all
differing fields are "covered" by constant conditions, as in:

Q2.2.3.3:
SELECT COUNT(DISTINCT a, b), SUM(DISTINCT a) FROM T1 WHERE b = 42;

which can be transformed as:

Q2.2.3.3a:
SELECT COUNT(a, b), SUM(a)
FROM (SELECT DISTINCT a, b FROM T1 WHERE b = 42) AS tmp;


2.3 MIN/MAX functions
-----------------------

It is also possible to apply loose index scan when, together with
AGG_FUNC(DISTINCT ...), the SELECT list contains MIN/MAX functions.

Consider the example:

Q2.3.1:
SELECT SUM(DISTINCT a), MAX(b) FROM T1 GROUP BY a;

Q2.3.1a:
SELECT a, b_max
FROM (SELECT a, MAX(b) AS b_max FROM T1 GROUP BY a) AS tmp;

Based on our observations in Sect. 2.2.1, the values of the
AGG_FUNC(DISTINCT ...) functions are the same as described there. The
values of the MIN/MAX functions are the same as those computed by loose
index scan for GROUP BY, which picks the actual MIN/MAX value for each
group directly from the index.


2.4 Expressions of aggregate functions
--------------------------------------

All cases above can be extended to expressions over aggregate
functions in the SELECT list.

In order for the optimization to be applicable in the case of
expressions, it is sufficient that every expression in the SELECT list
references only AGG_FUNC(DISTINCT ...) functions, and optionally
aggregate fields and MIN/MAX functions, that satisfy the same
conditions as if they appeared directly in the SELECT list.

The general idea is that it does not matter how the query combines
the results from all clauses prior to SELECT to compute the final
result. What is important are the properties of the selected aggregate
fields and functions that allow us to skip duplicate removal, since we
"know" there will be no duplicates in the input stream.
Consider the examples below:

The variant of Q2.2.1.1 above:

Q2.4.1.1:
SELECT 42 * (a + c + COUNT(DISTINCT c, a, b)) FROM T1 GROUP BY a, b, c;

is equivalent to:

Q2.4.1.1a:
SELECT 42 * (a + c + 1) FROM (SELECT DISTINCT a,b,c FROM T1) AS tmp;

The variant of Q2.3.1 above:

Q2.4.2:
SELECT (SUM(DISTINCT a) + MAX(b)) FROM T1 GROUP BY a;

can be computed as:

Q2.4.2a:
SELECT (a + b_max)
FROM (SELECT a, MAX(b) AS b_max FROM T1 GROUP BY a) AS tmp;


In the following sections we describe the conditions under which this
transformation is applicable, and the required changes to optimizer and
execution engine.


3. Applicability
----------------

The optimization is applicable for queries with the following
properties:

3.1 The query fulfills all conditions (and limitations) that apply
    to loose index scan for DISTINCT (WL#1724).
    The most important such conditions are:
    - one table per query (limitation)
    - no subqueries (limitation)
      TODO: this looks like a BUG, see get_best_group_min_max() -
              JOIN *join= thd->lex->select_lex.join;
            for subqueries join == NULL.
    - all operands of AGG_FUNC(DISTINCT ...) must be direct column
      references,
    - there must be a permutation of the DISTINCT's operands that
      forms a prefix of some index over the single query table
    - this index must be a covering index for the query
    - the WHERE conditions must allow forming an infix of the search
      key (more detail in WL#1724).
    - see other conditions defined in WL#1724.

    Ideally the current task should not re-check these conditions.

3.2 All AGG_FUNC() in the SELECT list contain a DISTINCT modifier.

3.3 Either all arguments of all AGG_FUNC(DISTINCT ...) are the same,
    or the differing arguments form an index suffix, and there are
    constant equality predicates that "cover" the differing arguments.

3.4 If there is a GROUP BY clause the optimization is applicable in
    one of these two cases:
    - the GROUP BY clause consists of the same columns as all columns
      under the DISTINCT modifier of AGG_FUNC().
    - the GROUP BY clause is a prefix of the columns under DISTINCT,
      and 3.3 is true for all columns under DISTINCT.

3.5 If 3.4 is the case, then there can be MIN/MAX functions as
    specified in WL#1724. In all other cases the SELECT list may
    contain only AGG_FUNC functions.

3.6 Given that conditions 3.1 - 3.5 are met, both AGG_FUNC and the
    optional MIN/MAX and group columns may occur anywhere in expressions
    in the SELECT list, since it does not matter how the selected data
    is combined to form a result; what matters is the way this data is
    retrieved. In addition to AGG_FUNC, MIN/MAX and aggregate columns,
    SELECT expressions may also reference constants.

Thus the general form of the queries subject to the optimization is:

SELECT general_expression(
       [C_1, ..., C_n,]
       AGG_FUNC_1(DISTINCT C_1, ..., C_n), AGG_FUNC_p1(DISTINCT C_1, ..., C_n),
       [MIN(C_n+1)], [MAX(C_n+1)], const_1, ..., const_k)
FROM T1
WHERE where_condition
[GROUP BY C_1, ..., C_n]
[HAVING having_condition]

where "general_expression" is either a simple list of its arguments (in
this case it is a SELECT list with direct references to the arguments), or
it is an arbitrary expression or list of expressions that combine the
arguments. 
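Conditions 3.2 and 3.3 can be sketched as a small Python check (a
hypothetical representation: each aggregate is a (has_distinct, args)
pair, and const_cols are the columns bound by constant equality
predicates in WHERE):

```python
def distinct_args_compatible(aggregates, const_cols):
    """Check condition 3.2 (every AGG_FUNC has DISTINCT) and 3.3
    (argument lists are equal, or the differing columns are bound by
    constant equality predicates)."""
    if not all(has_distinct for has_distinct, _ in aggregates):
        return False
    arg_sets = [set(args) for _, args in aggregates]
    union = set().union(*arg_sets)
    # any column missing from some argument list must be covered by a
    # constant equality predicate
    return all((union - s).issubset(const_cols) for s in arg_sets)

# Q2.2.3.3: COUNT(DISTINCT a, b), SUM(DISTINCT a) ... WHERE b = 42
aggs = [(True, ["a", "b"]), (True, ["a"])]
print(distinct_args_compatible(aggs, ["b"]))  # True
print(distinct_args_compatible(aggs, []))     # False
```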


4. Optimization
---------------

The optimization of the queries specified in Sect. 3 via loose index
scan for DISTINCT is performed as follows.

- Change the parser to introduce an instance of Item_sum_<agg_func>
  for each AGG_FUNC(DISTINCT ...) function, and mark in that object
  that it contains a DISTINCT modifier.

- Change the phase that analyzes the potential keys for range index
  scan, so that it analyzes the argument columns of
  AGG_FUNC(DISTINCT... ). This would allow the optimizer to consider
  the family of range index scan methods, including loose index scan
  for DISTINCT and GROUP BY.

- Extend the conditions that check the applicability of loose index
  scan for DISTINCT and GROUP BY, to check the additional conditions
  from Sect. 3.

- Pass to the range optimizer the information that there is a
  duplicate removal operation for the operands of the aggregate
  function.

- Let the range optimizer (loose scan for DISTINCT analysis)
  determine whether it is possible to use loose scan for the chosen
  fields, and if so create the corresponding access method. If this
  method is the cheapest to retrieve data, it will be used to supply
  input rows for the aggregate functions (and optional GROUP BY).

- Inside each AGG_FUNC, abstract away the method by which data is
  retrieved and duplicates are removed. Depending on the properties
  of the input stream, choose the appropriate implementation that
  supplies data for the aggregate function; e.g. if loose index
  scan was chosen, there is no need to remove duplicates.

- In the case of Sect. 2.2.2 take into account that the GROUP BY
  columns still need to be grouped, thus leave the grouping operation
  in the query plan.

- In all other cases there is no need to perform grouping or duplicate
  removal.
  
- If there is a GROUP BY clause, and the query passed the tests in
  Sect. 3, change the implementation of AGG_FUNC(DISTINCT ...)
  depending on the function as follows:
  - COUNT(DISTINCT ...) ==> return the constant 1
  - SUM(DISTINCT group_column), AVG(DISTINCT ...) ==>
    return the current value of group_column.


5. Query execution and execution cost
-------------------------------------

Given that AGG_FUNC(DISTINCT ....) and the corresponding AGG_FUNC(...) 
are agnostic to the method by which their operands are retrieved, this
task doesn't require changes to the execution engine.

The only added cost when computing AGG_FUNC(DISTINCT ...) is the CPU
cost to update the result of the function for each new tuple, which is
negligible for the three functions being considered.

Therefore the execution cost of loose index scan for AGG_FUNC is the
same as of loose index scan for DISTINCT.


6. Limitations and future work
------------------------------

6.1 Limitations following from the implementation of loose index scan
    for DISTINCT (WL#1724).

The implementation of loose index scan for DISTINCT and GROUP BY
(WL#1724) does not support the following cases, and if extended, it
will allow us to apply the current optimization in those cases as well:

- multi-table join queries,
- subqueries,
- constant prefixes that may also cover differing GROUP/DISTINCT columns
  (see 6.3 below),
- the ability to combine constant prefixes and infixes to cover wider
  class of queries.


6.2 Use Primary Key indexes as source of distinct rows

  The current optimization can be extended to detect DISTINCT over a
  primary key and rewrite AGG_FUNC(DISTINCT ...) into AGG_FUNC(...).
  In this case the optimization will not use loose index scan. It will
  be sufficient to use direct index scan.


6.3 Use constant prefixes for "index jumping"

When there are constant equality conditions for all key parts before the
ones under the distinct modifier(s), the constants from these conditions
can be used to form an index prefix. Consider the queries below.

Q6.3.1:
SELECT SUM(DISTINCT c) FROM T1 WHERE a = 5 AND b = 7;

Here we can retrieve all unique keys that begin with <5,7,c>, as in
the example table below:

<a,b,c,d, ...>
........
<5,7,3,10, ...>  <== jump here
<5,7,3,11, ...>
<5,7,3,16, ...>
<5,7,8,10, ...>  <== jump here
<5,7,8,11, ...>
.......

This query could be computed as:

Q6.3.1a:
SELECT SUM(c)
FROM (SELECT DISTINCT c FROM T1 WHERE a = 5 AND b = 7) tmp;
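A Python model of this jump pattern, with the sorted key list standing
in for the index and each bisect call standing in for an index lookup
(illustrative):

```python
import bisect

def sum_distinct_with_const_prefix(index_keys, const_prefix):
    """SUM over the distinct values of the key part right after the
    constant prefix, visiting one index entry per distinct value."""
    total = 0
    pos = bisect.bisect_left(index_keys, const_prefix)
    n = len(const_prefix)
    while pos < len(index_keys) and index_keys[pos][:n] == const_prefix:
        c = index_keys[pos][n]
        total += c
        # jump past all keys sharing this (prefix, c) combination
        pos = bisect.bisect_right(index_keys, const_prefix + (c, float("inf")))
    return total

# keys are (a, b, c, d); the example range from the table above
keys = [(4, 9, 1, 1),
        (5, 7, 3, 10), (5, 7, 3, 11), (5, 7, 3, 16),
        (5, 7, 8, 10), (5, 7, 8, 11),
        (6, 0, 0, 0)]
print(sum_distinct_with_const_prefix(keys, (5, 7)))  # 11 (= 3 + 8)
```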

This idea can be extended to use equality prefixes to "cover" the
argument fields that differ between all AGG_FUNC(DISTINCT ...) in a
query, if all differing fields form an index prefix. For example:

Q6.3.2:
SELECT COUNT(DISTINCT a, b), SUM(DISTINCT b)
FROM T1
WHERE a = 42;

could be transformed as:

Q6.3.2a:
SELECT COUNT(a, b), SUM(b)
FROM (SELECT DISTINCT a, b FROM T1 WHERE a = 42) AS tmp;

Since this case is not handled by the current loose index scan
implementation, we will *not* consider it in this WL. This case can
(and should) be handled as an extension of loose index scan for
DISTINCT/GROUP BY queries (WL#1724).


6.4 Implement partial distinct absorption

In a number of cases it is possible to use loose index scan to
pre-filter the input of the aggregate functions, and eventually fully
remove duplicates for some (not all) AGG_FUNC(DISTINCT ...) instances
and/or for the GROUP BY clause.

Consider query:

Q6.4.1:
SELECT COUNT(DISTINCT a, b), SUM(DISTINCT a) FROM T1;

This query cannot be fully transformed to a loose index scan because,
if we select all unique pairs (a, b), we are not guaranteed that the
'a' component is unique, so we cannot remove the DISTINCT modifier for
SUM. However, in such a case we can absorb one of the DISTINCTs:

Q6.4.1a:
SELECT COUNT(*), SUM(DISTINCT a)
FROM (SELECT DISTINCT a, b FROM T1) AS tmp;

where SUM(DISTINCT a) will compute the distinct values of 'a' over a
smaller input compared to all rows in T1.
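A sketch of this partial absorption in Python (hypothetical:
distinct_ab_pairs is the output of the loose scan):

```python
def partial_absorption(distinct_ab_pairs):
    """COUNT(DISTINCT a, b) needs no further dedup, while SUM(DISTINCT a)
    still removes duplicates, but only over the pre-filtered pairs."""
    count_ab = len(distinct_ab_pairs)
    sum_distinct_a = sum({a for a, _ in distinct_ab_pairs})
    return count_ab, sum_distinct_a

pairs = [(1, 1), (1, 2), (2, 1)]      # distinct (a, b) from the loose scan
print(partial_absorption(pairs))       # (3, 3)
```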


Similarly the query:

Q6.4.2:
SELECT COUNT(DISTINCT a, b), COUNT(DISTINCT b, c) FROM T1;

can be transformed into:

Q6.4.2a:
SELECT COUNT(DISTINCT a, b), COUNT(DISTINCT b, c)
FROM (SELECT DISTINCT a, b, c FROM T1) AS tmp;

Even query Q6.4.3 may benefit from loose index scan pre-filtering:

Q6.4.3:
SELECT COUNT(DISTINCT b,c), SUM(DISTINCT d) FROM T1;

if executed as:

Q6.4.3a:
SELECT COUNT(DISTINCT b, c), SUM(DISTINCT d)
FROM (SELECT DISTINCT a, b, c, d FROM T1) AS tmp;


Finally if the GROUP BY fields are immediately followed by
AGG_FUNC(DISTINCT...) arguments:

Q6.4.4:
SELECT a, b, COUNT(DISTINCT c), SUM(DISTINCT c) FROM T1
WHERE a > 13
GROUP BY a, b;

we could use loose index scan for the GROUP BY, which will reduce
the number of input tuples to the DISTINCT operation:

Q6.4.4a:
SELECT a, b, COUNT(DISTINCT c), SUM(DISTINCT c)
FROM (SELECT a, b, c FROM T1 WHERE a > 13 GROUP BY a, b, c) AS tmp
GROUP BY a, b;


6.5 Make Aggregator_distinct's choice between RB-tree and temp table cost based

Aggregator_distinct must choose between the temporary table and the
RB-tree implementations by comparing the costs of the two. This will be
accomplished in a separate follow-up task.

Low-Level Architecture for WL#3220
===================================

Contents
=========
1. Key design decision
2. Sketch of the classes definitions
3. Changes to the parser
4. Changes to add_group_and_distinct_keys ()
5. Changes to get_best_group_min_max ()
6. Actions near the end of JOIN::optimize (at the place where the GROUP BY 
   is turned off)
7. Changes to setup_sum_funcs()/init_sum_functions()/update_sum_func()
8. Schedule  

1. Key design decision
----------------------
Currently the AGG_FUNC() and AGG_FUNC(DISTINCT) are implemented in
totally unrelated classes, and the decision which class to instantiate
is taken at parse time.
It is beneficial to convert AGG_FUNC(DISTINCT) to AGG_FUNC()
if it is known that the incoming data are already distinct (for example
if they are the output of a loose index scan). To make such a switch
possible we need to redesign the current class hierarchy for the
aggregate functions so as to allow deferring the decision whether they
will do a DISTINCT projection internally or not, and to put the actual
functionality of these aggregate functions in one place (instead of the
two places it is found in currently: the non-distinct and the distinct
class).

This is achieved by adding functionality to Item_sum, the super-class
of all AGG_FUNC(DISTINCT) classes. This class will record at parse time
whether it carries a DISTINCT modifier. Then, at a later stage of query
compilation (when the characteristics of the incoming data stream are
known), the relevant aggregator class Aggregator_* (see section 2 for
definitions) will be allocated. When the incoming data are not
guaranteed to be unique, the special class Aggregator_distinct will
implement the applicable aggregation.
The calls to the setup/add/clear methods of Item_sum throughout the
code will be replaced with calls to the aggregator_setup()/
aggregator_add()/aggregator_clear() methods (which internally call the
Aggregator class).
When all the data have been collected into the Aggregator, Item_sum
will call the finalize() method right at the start of its val_*
methods. This causes the Aggregator_* instance to emit the data it
collected back into the Item_sum descendant's set of methods (if
needed). Once all the data have been emitted back to the normal set of
functions, the Item_sum descendant will be in the same state as if the
incoming data had been distinct and had passed through it directly.
This state allows extraction of the final result in the val_*()
functions.
As an optimization (if the amount of data allows uniqueness to be
determined at the time of insertion into the Unique class), the
Aggregator_distinct may pump the unique data straight to the Item_sum
descendant's methods, avoiding the need for any action in finalize().
If the incoming data are guaranteed to be distinct (as is the case for
loose index scan), we will allocate an instance of
Aggregator_pass_through: it just passes the calls through to the
Item_sum descendant's actual code (the clear()/add() functions) and
does not do any processing.
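The delegation scheme described above can be sketched with a loose
Python analogue (the class names only mirror the C++ sketch in section
2; all details are illustrative):

```python
class SumFunc:
    """Stands in for an Item_sum descendant implementing SUM(DISTINCT)."""
    def __init__(self, distinct_input_guaranteed):
        self.value = 0
        self.current = None            # value supplied for the current row
        # the aggregator is chosen once the input properties are known
        self.aggregator = (PassThroughAggregator(self)
                           if distinct_input_guaranteed
                           else DistinctAggregator(self))
    def add(self):                     # the "real" aggregation step
        self.value += self.current
    def feed(self, v):                 # called once per input row
        self.current = v
        self.aggregator.aggregator_add()
    def result(self):                  # finalize, then read the value
        self.aggregator.finalize()
        return self.value

class PassThroughAggregator:
    """Input already distinct (e.g. loose index scan): no dedup needed."""
    def __init__(self, item): self.item = item
    def aggregator_add(self): self.item.add()
    def finalize(self): pass

class DistinctAggregator:
    """Dedup via an in-memory set, standing in for the Unique tree."""
    def __init__(self, item):
        self.item, self.seen = item, set()
    def aggregator_add(self):
        if self.item.current not in self.seen:
            self.seen.add(self.item.current)
            self.item.add()
    def finalize(self): pass

s = SumFunc(distinct_input_guaranteed=False)
for v in [1, 2, 1]:
    s.feed(v)
print(s.result())   # 3: the duplicate 1 was filtered out
```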

We also need two variants of the loose index scan implementation
(QUICK_GROUP_MIN_MAX_SELECT):
  - QUICK_GROUP_MIN_MAX_SELECT is the "traditional" implementation that
    makes a random index read to find the next distinct value;
  - QUICK_GROUP_MIN_MAX_SCAN_SELECT (a subclass of
    QUICK_GROUP_MIN_MAX_SELECT) implements an overridden
    next_different_prefix() method that calls index_next() until it
    reaches the next different value.
We instantiate QUICK_GROUP_MIN_MAX_SCAN_SELECT in
TRP_GROUP_MIN_MAX::make_quick() if we are resolving AGG_FUNC(DISTINCT)
and it qualifies for loose index scan but the cost of loose index scan
is too high. This condition is stored in the boolean flag
TRP_GROUP_MIN_MAX_SELECT::is_index_scan (private, defaults to false,
set to true by TRP_GROUP_MIN_MAX_SELECT::use_index_scan()) and is used
in TRP_GROUP_MIN_MAX::make_quick(). The condition is set near the end
of get_best_group_min_max(), when the costs are known.
We also need to signal the sub-type of loose index scan used in
EXPLAIN, so we add QUICK_GROUP_MIN_MAX_SELECT::append_loose_index_scan()
and override it in QUICK_GROUP_MIN_MAX_SCAN_SELECT.
   
2. Sketch of the class definitions
----------------------------------

2.1. Item_sum hierarchy

/*
   The abstract base class for the Aggregator_* classes.
   It implements the data collection functions (setup/add/clear)
   either as pass-through to the real functionality or
   as collection into a Unique (for DISTINCT) structure.

   Note that update_field/reset_field are not in this
   class, because they are simply not called when
   GROUP BY/DISTINCT can be handled with the help of an index on the
   grouped fields (quick_group = 0).
 */
class Aggregator
{
/* 
   all members are protected as this class is not usable
   outside of an Item_sum descendant
*/
protected:
  friend class Item_sum;

  /* the aggregate function class to act on */
  Item_sum *item_sum;

  Aggregator (Item_sum *arg): item_sum(arg) {}

  enum Aggregator_type { PASS_THROUGH_AGGREGATOR, DISTINCT_AGGREGATOR };
  virtual Aggregator_type Aggrtype() = 0;
  /* 
    called before feeding the first row. Used to allocate/setup
    the internal structures used for aggregation (like Unique
    for distinct for example).
  */
  virtual bool setup(THD *) = 0;

  /* 
    called to clean up the internal structures and reset them to their
    initial state
  */  
  virtual void clear() = 0;

  /*
    called when a value is to be processed through the aggregation
  */  
  virtual bool add() = 0;

  /* 
     called when there are no more data and the final value is to 
     be retrieved 
  */
  virtual void finalize() = 0;
};


/*
   The distinct aggregator. Implements AGG_FUNC(DISTINCT ...).
   Collects all the data into a Unique (similarly to what
   Item_sum_distinct does currently) and then (if applicable) iterates
   over the list of unique values and pumps them back into its Item_sum
   object.
*/
class Aggregator_distinct : public Aggregator
{
protected:
  /* storage for the unique elements */
  Unique *unique;
  /* flag for whether we will need to pump the values back into the
     Item_sum descendant class.
     Unique can check whether a value is distinct on the fly only
     while it operates on a single in-memory tree.
     If the tree overflows the available memory, it is flushed to disk
     and the check for uniqueness can no longer be done on the fly.
  */
  bool need_postprocessing;

  Aggregator_distinct (Item_sum *sum) :
    Aggregator(sum), unique(NULL), need_postprocessing(FALSE) {}
  Aggregator_type Aggrtype() { return DISTINCT_AGGREGATOR; }

  bool setup(THD *) { return unique->setup(); }
  void clear() 
  { 
    unique->clear();  
    item_sum->clear(); 
    need_postprocessing= FALSE;
  }
  bool add()
  {
    /* pseudocode */
    bool no_duplicate= unique->add();
    if (!need_postprocessing)
    {
      if (no_duplicate && unique->elements == 0)
      {
        /* 
           Inline check and pass-through of the values to the object's
           add() method. Works as long as Unique has no trees on disk
           (unique->elements counts the values flushed to disk).
           Note: COUNT(DISTINCT) is handled specially because there
           unique->elements can be used instead of calling
           Item_sum->add() for each qualifying element.
        */
        if (item_sum->sum_func() == COUNT_DISTINCT_FUNC)
          return FALSE;
        else
          return item_sum->add();
      }
      else if (unique->elements != 0)
      {
        /* 
           The tree has gone to disk. In that mode Unique needs to have
           all of the data in it, then mark the end of inserting data,
           and only then make the list of unique values accessible.
        */   
        item_sum->clear();
        need_postprocessing= TRUE;
      }
    }
    return FALSE;
  }

  void finalize() 
  {
    /* pseudocode */
    if (need_postprocessing)
    {
      unique->walk(sum_distinct_walk, (void *) item_sum);
    }
    else if (item_sum->sum_func() == COUNT_DISTINCT_FUNC)
    {
      /* use unique->elements as the result of COUNT(DISTINCT) */
    }
  }
};

int sum_distinct_walk (void *elem, element_count count, void *arg)
{
  /* pseudocode: feed the unique value in elem back into the item */
  return ((Item_sum *) arg)->add();
}

/*
   The pass-through aggregator. Implements AGGFN(DISTINCT ..) knowing
   that its input is already distinct, so it just forwards the values
   to the Item_sum descendant class.
*/
class Aggregator_pass_through : public Aggregator
{
protected:

  Aggregator_pass_through (Item_sum *sum) :
    Aggregator(sum) {}
  Aggregator_type Aggrtype() { return PASS_THROUGH_AGGREGATOR; }

  bool setup(THD *) { return FALSE; }
  void clear() { item_sum->clear(); }
  bool add() { return item_sum->add(); }
  void finalize() {};
};
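The division of labour between the two aggregators can be shown with a toy sketch (hypothetical names, std::set standing in for Unique): the distinct aggregator deduplicates the raw rows itself, while the pass-through aggregator relies on its producer, such as a loose index scan, to deliver each distinct value exactly once:

```cpp
#include <set>
#include <vector>

// Toy accumulator standing in for an Item_sum descendant.
struct Summer {
  long long total = 0;
  void add(int v) { total += v; }
};

// AGGFN(DISTINCT ..): the aggregator deduplicates the input itself.
long long sum_with_distinct_aggregator(const std::vector<int> &rows) {
  Summer s;
  std::set<int> seen;                   // stands in for Unique
  for (int v : rows)
    if (seen.insert(v).second)          // only distinct values reach add()
      s.add(v);
  return s.total;
}

// Pass-through: trust the producer (e.g. a loose index scan) to
// deliver each distinct value exactly once, and just forward them.
long long sum_with_pass_through(const std::vector<int> &distinct_rows) {
  Summer s;
  for (int v : distinct_rows) s.add(v);
  return s.total;
}
```

Given the raw rows (3, 1, 3, 2, 1) the first function and the second function applied to the deduplicated stream (1, 2, 3) return the same sum, which is exactly why the optimizer may swap one aggregator for the other.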

/*
   This is the list of all new members/methods we add to Item_sum.
   aggregator_setup/aggregator_clear()/aggregator_add() must be 
   called instead of setup()/add()/clear() from the relevant places
*/
class Item_sum : public Item_result_field
{
  Aggregator *aggr;
  bool with_distinct;

  friend class Aggregator;
  friend class Aggregator_distinct;
  friend class Aggregator_pass_through;

public:
  void init_aggregator()
  {
    aggr= NULL;
    with_distinct= FALSE;
  }
  Item_sum(THD *thd, Item_sum *item) :
    Item_result_field(thd, item), ...
  {
    init_aggregator();
    set_distinct(item->with_distinct);
    if (item->aggr)
      set_aggregator(item->aggr->Aggrtype());
  }

  /* 
    calls Aggregator code : to be called where add/setup/clear 
    are currently called
  */
  inline bool aggregator_setup(THD *thd) { return aggr->setup(thd); }
  inline void aggregator_clear() { aggr->clear(); }
  inline bool aggregator_add() { return aggr->add(); }

  /*
    stores the declared DISTINCT flag (from the parser)
  */
  void set_distinct(bool distinct)
  {
    with_distinct= distinct;
    /* DISTINCT cannot use the one-pass group optimization */
    quick_group= with_distinct ? 0 : 1;
  }

  /* 
    called when the final determination is done about the aggregation 
    type and the object is about to be used
  */  
  void set_aggregator(Aggregator_type aggregator)
  {
    switch (aggregator)
    {
    case DISTINCT_AGGREGATOR:
      aggr= new Aggregator_distinct(this);
      break;

    case PASS_THROUGH_AGGREGATOR:
      aggr= new Aggregator_pass_through(this);
      break;
    }
  }
};


/* Sketch of the new Item_sum_count */
class Item_sum_count : public Item_sum_num
{
  ulonglong count;
  ...
public:
  ...
  enum Sumfunctype sum_func()
  {
    return with_distinct ? COUNT_DISTINCT_FUNC : COUNT_FUNC;
  }
  const char *func_name() const
  {
    return with_distinct ? "count(distinct " : "count(";
  }
  ...

protected:
  void clear ()
  {
    count= 0;
  }
  bool add ()
  {
    /* pseudocode */
    if (arg is not null)
      count++;
    return FALSE;
  }
};

/* Sketch of the new Item_sum_sum */
class Item_sum_sum : public Item_sum_num
{
  double sum;
  /* flag for null result */
  bool null_value;
  ...
public:
  ...
  enum Sumfunctype sum_func()
  {
    return with_distinct ? SUM_DISTINCT_FUNC : SUM_FUNC;
  }
  const char *func_name() const
  {
    return with_distinct ? "sum(distinct " : "sum(";
  }
  ...

protected:
  void clear ()
  {
    null_value= 1;
    sum= 0;
  }
  bool add ()
  {
    /* pseudocode */
    if (arg is not null)
    {
      sum+= arg;
      null_value= 0;
    }
    return FALSE;
  }
};

/* 
  Sketch of the new Item_sum_avg 
  Note how it inherits from Item_sum_sum and thus reuses
  the SUM functionality from it.
*/
class Item_sum_avg : public Item_sum_sum
{
  ulonglong count;
  ...
public:
  ...
  enum Sumfunctype sum_func()
  {
    return with_distinct ? AVG_DISTINCT_FUNC : AVG_FUNC;
  }
  const char *func_name() const
  {
    return with_distinct ? "avg(distinct " : "avg(";
  }
  ...

protected:
  void clear ()
  {
    Item_sum_sum::clear();
    count= 0;
  }
  bool add ()
  {
    if (Item_sum_sum::add())
      return TRUE;
    if (!args[0]->null_value)
      count++;
    return FALSE;
  }
};

2.2. QUICK_GROUP_MIN_MAX_SCAN_SELECT

/*
  Index scan for AGGR_FN(DISTINCT) queries without MIN/MAX aggregate functions.

  This class provides a specialized loose index access method for queries 
  containing aggregate functions with DISTINCT, of the form:
    SELECT [SUM|COUNT|AVG](DISTINCT a,...) FROM t
  It replaces the index scan + Unique (distinct selection) combination
  with a loose index scan variant that visits all the rows of a covering
  index instead of jumping to the beginning of each group.
*/
class QUICK_GROUP_MIN_MAX_SCAN_SELECT : public QUICK_GROUP_MIN_MAX_SELECT
{
protected:
  int next_different_prefix();
public:
  void append_loose_scan_type(String *str) 
  { 
    str->append(STRING_WITH_LEN(" (scanning)"));
  }
};
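The idea behind next_different_prefix() can be sketched outside the server: modelling a covering index as a sorted array, each group of equal prefixes costs one key read plus one jump past the group (a minimal illustration only; in the server the prefix is the first N key parts and the jump is an index positioning call):

```cpp
#include <algorithm>
#include <vector>

// Sketch of the loose-scan idea behind next_different_prefix():
// walk a sorted "index" and, after reading the first key of a group,
// skip directly past all keys with the same prefix.  Here the prefix
// is the whole value; in the server it is the first N key parts.
size_t count_distinct_loose(const std::vector<int> &index) {
  size_t groups = 0;
  auto it = index.begin();
  while (it != index.end()) {
    ++groups;                                    // first key of a new group
    it = std::upper_bound(it, index.end(), *it); // jump past the equal keys
  }
  return groups;
}
```

For an index with many duplicates per prefix this touches one key per group instead of every row, which is the whole point of preferring this access method over index scan + Unique.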


3. Changes to the parser
------------------------
We instantiate Item_sum_<AGGFN> regardless of the DISTINCT flag and 
then call set_distinct() if DISTINCT was present.

4. Actions in the add_group_and_distinct_keys
---------------------------------------------
We will call is_indexed_agg_distinct() and pass a pointer to the 
indexed_fields local variable. This fills up const_keys the same way it
is already done for GROUP BY/DISTINCT. 
There we must also set the JOIN::sort_and_group flag to 1 (to indicate
that there are fields to group on that are not visible as GROUP
BY/DISTINCT clauses).

5. Changes in the get_best_group_min_max ()
-------------------------------------------
The detection phase in get_best_group_min_max() must be split into two:
 5.1. Checks done before GA1/GA2: they are applicable only to GROUP
      BY/DISTINCT and need to be skipped if is_indexed_agg_distinct()
      returns true.
 5.2. Checks that are common to GROUP BY/DISTINCT and AGG_FUNC(DISTINCT).

The 5.2 tests must be extended to cater for the data structures for the 
AGG_FUNC(DISTINCT).  This is best done by extending the GA2 check to use 
the AGG_FUNC(DISTINCT) arguments instead of the list of the distinct fields.

We also add a flag to TRP_GROUP_MIN_MAX (bool have_agg_distinct) so the
corresponding flag in QUICK_GROUP_MIN_MAX_SELECT (bool have_agg_distinct)
can be set accordingly. This flag is needed to trigger the rewrite of
AGG_FUNC(DISTINCT) into AGG_FUNC().
If the cost of the loose index scan is higher than the best cost so far
(passed as a parameter to get_best_group_min_max()) and we are processing
a qualifying AGG_FUNC(DISTINCT), we set the flag
TRP_GROUP_MIN_MAX::is_index_scan so that we can use a loose index scan
without random index reads.

6. Actions near the end of JOIN::optimize ()
--------------------------------------------
When we are sure that the optimizer has selected the loose index scan
we must do two things:
  6.1. Set the aggregator type: the actual optimization.

  6.2. Make sure the add() methods of the aggregate functions are called 
    and actually aggregate the output of the loose index scan (by default
    they are turned off by the loose index scan optimization, which calls
    them inside its own runtime).

To achieve 6.2 we do the following:

The place to add the code is where
JOIN::tmp_table_param.precomputed_group_by is turned on (effectively
disabling the per-row calculation of the aggregate functions) under the
following condition:

if (join_tab->is_using_loose_index_scan())

In the same branch we check a new function:

inline bool is_using_agg_loose_index_scan ()

added to JOIN_TAB (next to is_using_loose_index_scan()). If it returns
true we set JOIN::tmp_table_param.precomputed_group_by back to FALSE,
because otherwise the loose index scan code will not call the
Item_sum_...::aggregator_add() aggregate functions for us, as it does
for MIN() and MAX().

To achieve 6.1 we iterate over JOIN::sum_funcs and call the
set_aggregator() function on each element.
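Taken together, 6.1 and 6.2 amount to logic of the following shape; the struct and booleans below are hypothetical stand-ins for the JOIN_TAB flags discussed above, not the actual JOIN::optimize() code:

```cpp
// Hypothetical stand-ins for the JOIN_TAB state discussed in section 6.
struct JoinTabStub {
  bool loose_index_scan;  // optimizer chose a loose index scan
  bool agg_distinct;      // a qualifying AGG_FUNC(DISTINCT) is present
  bool is_using_loose_index_scan() const { return loose_index_scan; }
  bool is_using_agg_loose_index_scan() const {
    return loose_index_scan && agg_distinct;
  }
};

// Decide whether the generic executor may skip calling the aggregate
// functions per row (step 6.2): loose index scan precomputes MIN()/MAX()
// inside its own runtime, but it does NOT call aggregator_add() for
// AGG_FUNC(DISTINCT), so in that case per-row aggregation must stay on.
bool precomputed_group_by(const JoinTabStub &tab) {
  if (!tab.is_using_loose_index_scan())
    return false;
  return !tab.is_using_agg_loose_index_scan();
}
```

This mirrors the flag dance described above: precomputed_group_by is set for a plain loose index scan and reset for the AGG_FUNC(DISTINCT) variant.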

7. Changes to setup_sum_funcs()/init_sum_functions()/update_sum_func()
--------------------------------------------
These functions are the place where the Item_sum's functions are called.
Here they need to be replaced by their aggregator_* counterparts 
(e.g. setup() with aggregator_setup(), etc.).

8. Schedule
--------------------------------------------
TODO:
- These are *very* rough estimates.
- Refine while refining the whole document.

Goal: get loose index scan working for AGG_FUNC(DISTINCT) 
1. Implement the Aggregator classes (from point 2 of this LLD)
   Time : 1 day
2. Implement points 3,4,5,6,7 of this LLD
   Time : 1 day
3. Test everything together and make test cases
   Time : 4 days
4. Implement changes from the code review (if any)
   and retest
   Time : 1 day
5. Merge and retest    
   Time : 1 day