MySQL 8.4.0
Source Code Documentation
opt_sum.cc File Reference

Optimising of MIN(), MAX() and COUNT(*) queries without 'group by' clause by replacing the aggregate expression with a constant. More...

#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include "ft_global.h"
#include "my_base.h"
#include "my_bitmap.h"
#include "my_dbug.h"
#include "my_inttypes.h"
#include "my_sys.h"
#include "my_table_map.h"
#include "mysql_com.h"
#include "sql/field.h"
#include "sql/handler.h"
#include "sql/item.h"
#include "sql/item_cmpfunc.h"
#include "sql/item_func.h"
#include "sql/item_sum.h"
#include "sql/key.h"
#include "sql/range_optimizer/range_optimizer.h"
#include "sql/sql_bitmap.h"
#include "sql/sql_class.h"
#include "sql/sql_const.h"
#include "sql/sql_error.h"
#include "sql/sql_lex.h"
#include "sql/sql_list.h"
#include "sql/sql_opt_exec_shared.h"
#include "sql/sql_select.h"
#include "sql/table.h"

Functions

static bool find_key_for_maxmin (bool max_fl, Index_lookup *ref, Item_field *item_field, Item *cond, uint *range_fl, uint *prefix_len)
 Check whether we can get value for {max|min}(field) by using a key. More...
 
static bool reckey_in_range (bool max_fl, Index_lookup *ref, Item_field *item_field, Item *cond, uint range_fl, uint prefix_len)
 Check whether found key is in range specified by conditions. More...
 
static bool maxmin_in_range (bool max_fl, Item_field *item_field, Item *cond)
 Check whether {MAX|MIN}(field) is in range specified by conditions. More...
 
static ulonglong get_exact_record_count (Table_ref *tables)
 Get exact count of rows in all tables. More...
 
static int get_index_min_value (TABLE *table, Index_lookup *ref, Item_field *item_field, uint range_fl, uint prefix_len)
 Use index to read MIN(field) value. More...
 
static int get_index_max_value (TABLE *table, Index_lookup *ref, uint range_fl)
 Use index to read MAX(field) value. More...
 
bool optimize_aggregated_query (THD *thd, Query_block *select, const mem_root_deque< Item * > &fields, Item *conds, aggregate_evaluated *decision)
 Substitute constants for some COUNT(), MIN() and MAX() functions in an aggregated (implicitly grouped) query. More...
 
bool is_simple_predicate (Item_func *func_item, Item **args, bool *inv_order)
 Test if the predicate compares a field with constants. More...
 
static bool move_endpoint (bool is_max, Item *cond)
 Check if a predicate should make the endpoint for the MIN/MAX optimization move towards -Inf for MAX or towards +Inf for MIN. More...
 
static bool matching_cond (bool max_fl, Index_lookup *ref, KEY *keyinfo, KEY_PART_INFO *field_part, Item *cond, table_map map, key_part_map *key_part_used, uint *range_fl, uint *prefix_len)
 Check whether a condition matches a key to get {MAX|MIN}(field):. More...
 

Detailed Description

Optimising of MIN(), MAX() and COUNT(*) queries without 'group by' clause by replacing the aggregate expression with a constant.

Given a table with a compound key on columns (a,b,c), the following types of queries are optimised (assuming the table handler supports the required methods)

SELECT COUNT(*) FROM t1[,t2,t3,...]
SELECT MIN(b) FROM t1 WHERE a=const
SELECT MAX(c) FROM t1 WHERE a=const AND b=const
SELECT MAX(b) FROM t1 WHERE a=const AND b<const
SELECT MIN(b) FROM t1 WHERE a=const AND b>const
SELECT MIN(b) FROM t1 WHERE a=const AND b BETWEEN const AND const
SELECT MAX(b) FROM t1 WHERE a=const AND b BETWEEN const AND const

Instead of '<' one can use '<=', '>', '>=' and '=' as well. Instead of 'a=const' the condition 'a IS NULL' can be used.

If all selected fields are replaced then we will also remove all involved tables and return the answer without any join. Thus, the following query will be replaced with a row of two constants:

SELECT MAX(b), MIN(d) FROM t1,t2
  WHERE a=const AND b<const AND d>const

(assuming a index for column d of table t2 is defined)

Function Documentation

◆ find_key_for_maxmin()

static bool find_key_for_maxmin ( bool  max_fl,
Index_lookup ref,
Item_field item_field,
Item cond,
uint *  range_fl,
uint *  prefix_len 
)
static

Check whether we can get value for {max|min}(field) by using a key.

If where-condition is not a conjunction of 0 or more conjuct the function returns false, otherwise it checks whether there is an index including field as its k-th component/part such that:

  1. for each previous component f_i there is one and only one conjunct of the form: f_i= const_i or const_i= f_i or f_i is null
  2. references to field occur only in conjucts of the form: field {<|<=|>=|>|=} const or const {<|<=|>=|>|=} field or field BETWEEN const1 AND const2
  3. all references to the columns from the same table as column field occur only in conjucts mentioned above.
  4. each of k first components the index is not partial, i.e. is not defined on a fixed length proper prefix of the field.

If such an index exists the function through the ref parameter returns the key value to find max/min for the field using the index, the length of first (k-1) components of the key and flags saying how to apply the key for the search max/min value. (if we have a condition field = const, prefix_len contains the length of the whole search key)

Parameters
[in]max_fl0 for MIN(field) / 1 for MAX(field)
[in,out]refReference to the structure we store the key value
[in]item_fieldField used inside MIN() / MAX()
[in]condWHERE condition
[out]range_flBit flags for how to search if key is ok
[out]prefix_lenLength of prefix for the search range
Note
This function may set field->table->key_read to true, which must be reset after index is used! (This can only happen when function returns 1)
Returns
true if index can be used to optimize MIN()/MAX(), false otherwise. If true, ref, range_fl and prefix_len are updated

◆ get_exact_record_count()

static ulonglong get_exact_record_count ( Table_ref tables)
static

Get exact count of rows in all tables.

This is called, when at least one of the table handlers support HA_COUNT_ROWS_INSTANT, but not HA_STATS_RECORDS_IS_EXACT (NDB is one such storage engine).

Parameters
tablesList of tables
Return values
Productof number of rows in all tables. ULLONG_MAX for error.

◆ get_index_max_value()

static int get_index_max_value ( TABLE table,
Index_lookup ref,
uint  range_fl 
)
static

Use index to read MAX(field) value.

Parameters
tableTable object
refReference to the structure where we store the key value
range_flWhether range endpoint is strict greater than
Return values
0No errors HA_ERR_... Otherwise

◆ get_index_min_value()

static int get_index_min_value ( TABLE table,
Index_lookup ref,
Item_field item_field,
uint  range_fl,
uint  prefix_len 
)
static

Use index to read MIN(field) value.

Parameters
tableTable object
refReference to the structure where we store the key value
item_fieldField used in MIN()
range_flWhether range endpoint is strict less than
prefix_lenLength of common key part for the range
Return values
0No errors HA_ERR_... Otherwise

◆ is_simple_predicate()

bool is_simple_predicate ( Item_func func_item,
Item **  args,
bool *  inv_order 
)

Test if the predicate compares a field with constants.

Parameters
func_itemPredicate item
[out]argsHere we store the field followed by constants
[out]inv_orderIs set to true if the predicate is of the form 'const op field'
Returns
true if function is a simple predicate, false otherwise.

◆ matching_cond()

static bool matching_cond ( bool  max_fl,
Index_lookup ref,
KEY keyinfo,
KEY_PART_INFO field_part,
Item cond,
table_map  map,
key_part_map key_part_used,
uint *  range_fl,
uint *  prefix_len 
)
static

Check whether a condition matches a key to get {MAX|MIN}(field):.

For the index specified by the keyinfo parameter and an index that contains the field as its component (field_part), the function checks whether

  • the condition cond is a conjunction,
  • all of its conjuncts refer to columns of the same table, and
  • each conjunct is on one of the following forms:
    • f_i = const_i or const_i = f_i or f_i IS NULL, where f_i is part of the index
    • field {<|<=|>=|>|=} const
    • const {<|<=|>=|>|=} field
    • field BETWEEN const_1 AND const_2

As a side-effect, the key value to be used for looking up the MIN/MAX value is actually stored inside the Field object. An interesting feature is that the function will find the most restrictive endpoint by over-eager evaluation of the WHERE condition. It continually stores the current endpoint inside the Field object. For a query such as

SELECT MIN(a) FROM t1 WHERE a > 3 AND a > 5;
#define MIN(a, b)
Definition: crypt_genhash_impl.cc:89
const std::string SELECT("SELECT")
Name of the static privileges.
@ WHERE
Definition: sql_yacc.h:687
@ FROM
Definition: sql_yacc.h:250

the algorithm will recurse over the conjunction, storing first a 3 in the field. In the next recursive invocation the expression a > 5 is evaluated as 3 > 5 (Due to the dual nature of Field objects as value carriers and field identifiers), which will obviously fail, leading to 5 being stored in the Field object.

Parameters
[in]max_flSet to true if we are optimizing MAX(), false means we are optimizing MIN()
[in,out]refReference to the structure where the function stores the key value
[in]keyinfoReference to the key info
[in]field_partPointer to the key part for the field
[in]condWHERE condition
[in]mapTable map for the key
[in,out]key_part_usedMap of matchings parts. The function will output the set of key parts actually being matched in this set, yet it relies on the caller to initialize the value to zero. This is due to the fact that this value is passed recursively.
[in,out]range_flSays whether endpoints use strict greater/less than.
[out]prefix_lenLength of common key part for the range where MAX/MIN is searched for
Return values
falseIndex can't be used.
trueWe can use the index to get MIN/MAX value

◆ maxmin_in_range()

static bool maxmin_in_range ( bool  max_fl,
Item_field item_field,
Item cond 
)
static

Check whether {MAX|MIN}(field) is in range specified by conditions.

Parameters
[in]max_flfalse for MIN(field) / true for MAX(field)
[in]item_fieldItem representing field used in MIN/MAX expression
[in]condCondition to check
Returns
true if condition is not true for the found row, otherwise false.

◆ move_endpoint()

static bool move_endpoint ( bool  is_max,
Item cond 
)
static

Check if a predicate should make the endpoint for the MIN/MAX optimization move towards -Inf for MAX or towards +Inf for MIN.

It is assumed that the current endpoint is stored in the field accessed by "cond", and that "cond" represents a comparison of a field and constant values using the <, >, <=, >= or BETWEEN operator.

◆ optimize_aggregated_query()

bool optimize_aggregated_query ( THD thd,
Query_block select,
const mem_root_deque< Item * > &  fields,
Item conds,
aggregate_evaluated decision 
)

Substitute constants for some COUNT(), MIN() and MAX() functions in an aggregated (implicitly grouped) query.

Parameters
[in]thdthread handler
[in]selectquery block
[in]fieldsAll fields to be returned
[in]condsWHERE clause
[out]decisionoutcome for successful execution = AGGR_REGULAR regular execution required = AGGR_COMPLETE values available = AGGR_DELAYED execution with ha_records() required = AGGR_EMPTY source tables empty, aggregates are NULL or zero (for COUNT)
Returns
false if success, true if error

This function is called for queries with aggregate functions and no GROUP BY, thus the result set will contain a single row only.

First, the function will analyze the source tables and WHERE clause to see whether the query qualifies for optimization. If not, the decision AGGR_REGULAR is returned.

Second, the function walks over all expressions in the SELECT list. If the expression can be optimized with a storage engine operation that is O(1) (MIN or MAX) or O(0) (instant COUNT), the value is looked up and inserted in the value buffer, and the corresponding Item is marked as being const. If the expression is a COUNT operation that can be evaluated efficiently by the storage manager (but still O(N)), indicated with HA_COUNT_ROWS_INSTANT, it will be marked as such.

When all SELECT list expressions have been processed, there are four possible outcomes:

  • An empty result from the source tables is indicated, and the output state is AGGR_EMPTY. Notice that the result of aggregation is still one row, filled with zero for COUNT operations and NULL values for all other expressions.
  • All expressions have been given values, indicated with output state AGGR_COMPLETE.
  • All expressions have been given values, except for one or more COUNT operations that will be evaluated in execution. This is indicated with AGGR_DELAYED.
  • Some expressions must be evaluated as part of a regular execution, indicated with AGGR_REGULAR. Notice that some of the expressions may have been given values and are marked as const, but no expressions will be candidates for delayed execution.

◆ reckey_in_range()

static bool reckey_in_range ( bool  max_fl,
Index_lookup ref,
Item_field item_field,
Item cond,
uint  range_fl,
uint  prefix_len 
)
static

Check whether found key is in range specified by conditions.

Parameters
[in]max_flfalse for MIN(field) / true for MAX(field)
[in]refReference to the key value and info
[in]item_fieldItem representing field used in MIN/MAX expression
[in]condCondition to check nullptr means that all keys are within the range.
[in]range_flSays whether there is a condition to to be checked
[in]prefix_lenLength of the constant part of the key
Returns
true if the condition is not true for the found row, false otherwise.