#include <sys/types.h>
#include "my_base.h"
#include "my_inttypes.h"
#include "sql/range_optimizer/range_optimizer.h"
#include "sql/sql_const.h"

Classes
struct	GroupIndexSkipScanParameters

struct	GroupIndexSkipScanInfo

Functions
Mem_root_array< AccessPath * >	get_all_group_skip_scans (THD thd, RANGE_OPT_PARAM param, SEL_TREE *tree, enum_order order_direction, bool skip_records_in_range, double cost_est)
	Test if group index skip scan is applicable and if so, construct a new AccessPath for every candidate group index skip scan. More...

AccessPath *	get_best_group_skip_scan (THD thd, RANGE_OPT_PARAM param, SEL_TREE *tree, enum_order order_direction, bool skip_records_in_range, double cost_est)
	Test if this access method is applicable to a GROUP query with MIN/MAX functions, and if so, construct a new AccessPath. More...

void	trace_basic_info_group_index_skip_scan (THD thd, const AccessPath path, const RANGE_OPT_PARAM , Opt_trace_object trace_object)

void	dbug_dump_group_index_skip_scan (int indent, bool verbose, const AccessPath *path)

Function Documentation

◆ dbug_dump_group_index_skip_scan()

void dbug_dump_group_index_skip_scan	(	int	indent,
		bool	verbose,
		const AccessPath *	path
	)

◆ get_all_group_skip_scans()

Mem_root_array< AccessPath * > get_all_group_skip_scans	(	THD *	thd,
		RANGE_OPT_PARAM *	param,
		SEL_TREE *	tree,
		enum_order	order_direction,
		bool	skip_records_in_range,
		double	cost_est
	)

Test if group index skip scan is applicable and if so, construct a new AccessPath for every candidate group index skip scan.

Parameters

thd	Thread info
param	Range optimizer parameter
tree	Range tree generated by get_mm_tree
order_direction	The sort order the range access method must be able to provide. Three-value logic: asc/desc/don't care
skip_records_in_range	Same value as JOIN_TAB::skip_records_in_range()
cost_est	Best cost so far (=table/index scan time)

Return values

Mem_root_array of candidate GROUP_INDEX_SKIP_SCAN AccessPaths.

◆ get_best_group_skip_scan()

AccessPath * get_best_group_skip_scan	(	THD *	thd,
		RANGE_OPT_PARAM *	param,
		SEL_TREE *	tree,
		enum_order	order_direction,
		bool	skip_records_in_range,
		double	cost_est
	)

Test if this access method is applicable to a GROUP query with MIN/MAX functions, and if so, construct a new AccessPath.

DESCRIPTION Test whether a query can be computed via a GroupIndexSkipScanIterator. Queries computable via a GroupIndexSkipScanIterator must satisfy the following conditions: A) Table T has at least one compound index I of the form: I = <A_1, ...,A_k, [B_1,..., B_m], C, [D_1,...,D_n]> B) Query conditions: B0. Q is over a single table T. For a single table query which is internally transformed into a multi-table query E.g. due to semijoin transformations, group skip scan can still be used for the original table specified in the query for duplicate removal. E.g. SELECT DISTINCT f1 FROM t1 IN (SELECT f1 FROM t2); In this case, with a semi-join transformation, query would look like SELECT DISTINCT f1 FROM t1 semi-join t2 ON t1.f1=t2.f1; Group skip scan can still be used for table "t1" for duplicate removal even though the query has a JOIN now. B1. The attributes referenced by Q are a subset of the attributes of I. B2. All attributes QA in Q can be divided into 3 overlapping groups:

SA = {S_1, ..., S_l, [C]} - from the SELECT clause, where C is referenced by any number of MIN and/or MAX functions if present.
WA = {W_1, ..., W_p} - from the WHERE clause
GA = <G_1, ..., G_k> - from the GROUP BY clause (if any) = SA - if Q is a DISTINCT query (based on the equivalence of DISTINCT and GROUP queries.
NGA = QA - (GA union C) = {NG_1, ..., NG_m} - the ones not in GROUP BY and not referenced by MIN/MAX functions. with the following properties specified below. B3. If Q has a GROUP BY WITH ROLLUP clause the access method is not applicable.

SA1. There is at most one attribute in SA referenced by any number of MIN and/or MAX functions which, which if present, is denoted as C. SA2. The position of the C attribute in the index is after the last A_k. SA3. The attribute C can be referenced in the WHERE clause only in predicates of the forms:

(C {< | <= | > | >= | =} const)
(const {< | <= | > | >= | =} C)
(C between const_i and const_j)
C IS NULL
C IS NOT NULL
C != const SA4. If Q has a GROUP BY clause, there are no other aggregate functions except MIN and MAX. For queries with DISTINCT, aggregate functions are allowed. SA5. The select list in DISTINCT queries should not contain expressions. SA6. Clustered index can not be used by GROUP_MIN_MAX quick select for AGG_FUNC(DISTINCT ...) optimization because cursor position is never stored after a unique key lookup in the clustered index and further index_next/prev calls can not be used. So loose index scan optimization can not be used in this case. SA7. If Q has both AGG_FUNC(DISTINCT ...) and MIN/MAX() functions then this access method is not used. For above queries MIN/MAX() aggregation has to be done at nested_loops_join (end_send_group). But with current design MIN/MAX() is always set as part of loose index scan. Because of this mismatch MIN() and MAX() values will be set incorrectly. For such queries to work we need a new interface for loose index scan. This new interface should only fetch records with min and max values and let end_send_group to do aggregation. Until then do not use loose_index_scan. GA1. If Q has a GROUP BY clause, then GA is a prefix of I. That is, if G_i = A_j => i = j. GA2. If Q has a DISTINCT clause, then there is a permutation of SA that forms a prefix of I. This permutation is used as the GROUP clause when the DISTINCT query is converted to a GROUP query. GA3. The attributes in GA may participate in arbitrary predicates, divided into two groups:
RNG(G_1,...,G_q ; where q <= k) is a range condition over the attributes of a prefix of GA
PA(G_i1,...G_iq) is an arbitrary predicate over an arbitrary subset of GA. Since P is applied to only GROUP attributes it filters some groups, and thus can be applied after the grouping. GA4. There are no expressions among G_i, just direct column references. NGA1.If in the index I there is a gap between the last GROUP attribute G_k, and the MIN/MAX attribute C, then NGA must consist of exactly the index attributes that constitute the gap. As a result there is a permutation of NGA, BA=<B_1,...,B_m>, that coincides with the gap in the index. NGA2.If BA <> {}, then the WHERE clause must contain a conjunction EQ of equality conditions for all NG_i of the form (NG_i = const) or (const = NG_i), such that each NG_i is referenced in exactly one conjunct. Informally, the predicates provide constants to fill the gap in the index. WA1. There are no other attributes in the WHERE clause except the ones referenced in predicates RNG, PA, PC, EQ defined above. Therefore WA is subset of (GA union NGA union C) for GA,NGA,C that pass the above tests. By transitivity then it also follows that each WA_i participates in the index I (if this was already tested for GA, NGA and C). WA2. If there is a predicate on C, then it must be in conjunction to all predicates on all earlier keyparts in I.

C) Overall query form: SELECT EXPR([A_1,...,A_k], [B_1,...,B_m], [MIN(C)], [MAX(C)]) FROM T WHERE [RNG(A_1,...,A_p ; where p <= k)] [AND EQ(B_1,...,B_m)] [AND PC(C)] [AND PA(A_i1,...,A_iq)] GROUP BY A_1,...,A_k [HAVING PH(A_1, ..., B_1,..., C)] where EXPR(...) is an arbitrary expression over some or all SELECT fields, or: SELECT DISTINCT A_i1,...,A_ik FROM T WHERE [RNG(A_1,...,A_p ; where p <= k)] [AND PA(A_i1,...,A_iq)];

NOTES If the current query satisfies the conditions above, and if (mem_root! = NULL), then the function constructs and returns a new AccessPath. object, that is later used to construct a new GroupIndexSkipScanIterator. If (mem_root == nullptr), then the function only tests whether the current query satisfies the conditions above, and, if so, sets is_applicable = true.

Queries with DISTINCT for which index access can be used are transformed into equivalent group-by queries of the form:

SELECT A_1,...,A_k FROM T WHERE [RNG(A_1,...,A_p ; where p <= k)] [AND PA(A_i1,...,A_iq)] GROUP BY A_1,...,A_k;

The group-by list is a permutation of the select attributes, according to their order in the index.

TODO

What happens if the query groups by the MIN/MAX field, and there is no other field as in: "select min(a) from t1 group by a" ?
We assume that the general correctness of the GROUP-BY query was checked before this point. Is this correct, or do we have to check it completely?
Lift the limitation in condition (B3), that is, make this access method applicable to ROLLUP queries.