WL#11322: SUPPORT LOOSE INDEX RANGE SCANS FOR LOW CARDINALITY
Affects: Server-8.0
—
Status: Complete
Based on the patch provided by Facebook. This WL improves the optimizer range scans for certain queries. Suppose we have the following: CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2)); INSERT INTO t1 VALUES (1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5); INSERT INTO t1 SELECT f1, f2 + 5 FROM t1; INSERT INTO t1 SELECT f1, f2 + 10 FROM t1; INSERT INTO t1 SELECT f1, f2 + 20 FROM t1; INSERT INTO t1 SELECT f1, f2 + 40 FROM t1; ANALYZE TABLE t1; EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40; MySQL currently chooses index scan to fetch all the rows and later filters the rows in server using predicate f2 > 40. A range scan cannot be used for this case because there is no predicate on the first key-part (f1) of the index. However optimizer can be smarter than this. It can use the current loose index scan to scan for distinct values of the first key_part (f1) and do a sub-range scan on this distinct prefix for f2 > 40." Simplified algorithm for the case above: 1. Get distinct value of the first key part(f1 = 1). 2. Construct the range based on the first and second key part(f1 = 1 AND f2 > 40) 3. Do range scan. 4. Get next distinct value of the first key part(f1 = 2). 5. Construct the range based on the first and second key part(f1 = 2 AND f2 > 40) 4. Do range scan. In this case number of accessed rows are decreased as MySQL skips the records which do not qualify for the constructed range. Example: Without skip scan algorithm: --- set optimizer_switch='skip_scan=off'; EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40; id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t1 NULL index NULL PRIMARY 8 NULL 160 33.33 Using where; Using index Warnings: Note 1003 /* select#1 */ select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`f2` AS `f2` from `test`.`t1` where (`test`.`t1`.`f2` > 40) SHOW STATUS LIKE 'handler_read%'; Variable_name Value Handler_read_first 1 Handler_read_key 1 Handler_read_last 0 Handler_read_next 160 Handler_read_prev 0 Handler_read_rnd 0 Handler_read_rnd_next 0 --- With skip scan algorithm: --- set optimizer_switch='skip_scan=on'; EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40; id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t1 NULL range PRIMARY PRIMARY 8 NULL 53 100.00 Using where; Using index for skip scan Warnings: Note 1003 /* select#1 */ select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`f2` AS `f2` from `test`.`t1` where (`test`.`t1`.`f2` > 40) SHOW STATUS LIKE 'handler_read%'; Variable_name Value Handler_read_first 1 Handler_read_key 5 Handler_read_last 0 Handler_read_next 80 Handler_read_prev 0 Handler_read_rnd 0 Handler_read_rnd_next 0 --- See also See BUG#88103.
Functional requirements: F-1: Skip scan will be considered and is picked if it has better cost. F-2: Optimizer switch 'skip_scan' switch enables/disables the use of 'skip scan' access method. F-3: If loose index 'skip scan' access method is used, 'Using index for skip scan' is printed in the 'Extra' field of the EXPLAIN output. F-4: If the index can be used for skip scan, it should be visible in 'possible_keys' column of the EXPLAIN output. Functional hint requirements: F-5: Hints must be enclosed into /*+ */ comment. F-6: Hints must be specified after SELECT|INSERT|REPLACE|UPDATE|DELETE key words. F-7: EXPLAIN must understand hints. F-8: Active hints must be printed in EXPLAIN warning. F-9: Subsequent conflicting/duplicating hints are ignored with warning. F-10: Unresolved hints cause warning. Non-Functional requirements: NF-1: The performance shall be as good or better for queries where skip scan can be applied for 90% of the test queries. NF-2: A higher performance degradation clearly caused by skewed data is acceptable as long as NF1 is satisfied.
1. Skip scan algorithm can be used for certain type of queries. The overall query form should look like this: SELECT A_1,...,A_k, B_1,...,B_m, C FROM T WHERE EQ(A_1,...,A_k) AND RNG(C); Queries computable via skip scan must satisfy the following conditions: A) Table T has at least one compound index I of the form: I = <A_1,...,A_k, B_1,..., B_m, C ,[D_1,...,D_n]> Key parts A and D may be empty, but B and C must be non-empty. B) Only one table referenced. C) Cannot have group by/select distinct D) Query must reference fields in the index only. E) The predicates on A_1...A_k must be equality predicates and they need to be constants. This includes the 'IN' operator. F) The query must be a conjunctive query. In other words, it is a AND of ORs: (COND1(kp1) OR COND2(kp1)) AND (COND1(kp2) OR ...) AND ... G) There must be a range condition on C. H) Conditions on D columns are allowed. Conditions on D must be in conjunction with range condition on C. 2. Cost model. The cost of doing skip scan is calculated in cost_skip_scan(). To estimate the size of the groups to read, index statistics for distinct groups from rec_per_key is used. For the cost calculation we need the number of distinct groups(num_groups) and the number of processed records(num_processed_records). num_groups can be obtained using following formula: // distinct_key_parts is the number of key parts used to create distinct groups. keys_per_group = index_info->records_per_key(distinct_key_parts); num_groups= table_records / keys_per_group; If there is equality ranges on columns A_1,...,A_k, table_records is the number of records returned by A_1,...,A_k ranges. It's calculated using check_quick_select() function. Number of processed records is the rows which are finally qualified by the range condition on "C for all the distinct groups in the table and it's calculated using following formula: num_processed_records= table_records * filtering_effect; where filtering_effect is the filtering effect for the range 'C'. Note that calculation of the filtering effect is based in the Item_field::get_cond_filter_default_probability(). It uses 'max distinct values' for the effect calculation. So the number of records passed as an argument to the ::get_filtering_effect() should be calculated as keys_per_group / keys_per_range, where keys_per_range is number of records per key for the range 'C'. IO_cost is calculated using method io_cost = HANDLER::index_scan_cost(key, num_groups, num_processed_records); CPU cost is calculated using following formula: cpu_cost = // Multiply by 2 since there are 2 re-positions per group tree_traversal_cost * num_groups * 2 + cost_model->row_evaluate_cost(num_processed_records); where tree_traversal_cost is calculated using formula tree_height = table_records == 0 ? 1.0 : ceil(log(double(table_records))); tree_traversal_cost = cost_model->key_compare_cost(tree_height); where table_records is the number of records present in the table. 3. New optimize switches. New optimizer switch 'skip_scan' is introduced. 'skip_scan' switch enables/disables the use of 'skip scan' algorithm. The switch is 'on' by default. 4. EXPLAIN output changes. If loose index 'skip scan' access method is used, 'Using index for skip scan' is printed in the 'Extra' field of the EXPLAIN output. Example: --- EXPLAIN SELECT f1, f3 FROM t1 WHERE f1 = 2 AND f2 = 3 AND f4 > 3; id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t1 NULL range PRIMARY PRIMARY 16 NULL 1 100.00 Using where; Using index for skip scan --- 5. Optimizer trace changes. Information about skip scan is printed in the following form: "skip scan" { "type": "skip_scan", "index": "index_used_for_skip_scan", "key_parts_used_for_access": ["key_parts_used_for_access"], "equality ranges": ["list_of_equality_ranges"], "non equality ranges": ["list_of_non_equality_ranges"] } For example, for the query EXPLAIN SELECT f1, f3 FROM t1 WHERE f1 = 2 AND f2 = 3 AND f4 > 3 Opt_trace output looks like: skip_scan": { "type": "skip_scan", "index": "PRIMARY", "key_parts_used_for_access": [ "f1", "f2", "f3", "f4" ] /* key_parts_used_for_access */, "equality ranges": [ "2 <= f1 <= 2 AND 3 <= f2 <= 3" ] /* equality ranges */, "non equality ranges": [ "3 < f4" ] /* non equality ranges */, "chosen": true } /* skip_scan */ 6. New optimizer hint SKIP_SCAN/NO_SKIP_SCAN. SKIP_SCAN specification: SKIP_SCAN([@QB_NAME] table [index[, index]...]) or SKIP_SCAN(table@QB_NAME [index[, index]...]) The hint forces the optimizer to use skip scan for the specified table using the specified set of indexes. If no index is specified, all possible indexes are considered by the optimizer and cheapest one is selected. Note that the hint may be ignored if the indexes is not applicable to the given query. NO_SKIP_SCAN specification: NO_SKIP_SCAN([@QB_NAME] table [index[, index]...]) or NO_SKIP_SCAN(table@QB_NAME [index[, index]...]) The hint disables skip scan for specified indexes. If no index is specified, skip scan is not allowed for the table. Resolving conflicting hints. First specified hint is applied and next conflicting hints are ignored with warning. If it's impossible to apply a valid hint, it is silently ignored without any warning. Note that if SKIP_SCAN and NO_SKIP_SCAN are used at the same time, second specified hints is treated as conflicting and ignored with warning. Examples: /*+ SKIP_SCAN(t1 idx1, idx2) NO_SKIP_SCAN(t1 idx1) */ SKIP_SCAN(t1 idx1, idx2) is applicable and NO_SKIP_SCAN is ignored. /*+ NO_SKIP_SCAN(t1 idx1, idx2) SKIP_SCAN(t1 idx2) */ SKIP_SCAN is ignored because there is a preceding hint for the same table. Interaction with optimizer switch. Hint has a higher priority, i.e. if optimizer_switch 'skip_scan' is OFF and hint SKIP_SCAN is specified then the hint is applicable. Interaction with old style hints USE|FORCE|IGNORE INDEX. Old hints have higher priority towards to new style hints. Examples: /*+ SKIP_SCAN(t1 idx1, idx2, idx3) */ ... IGNORE INDEX idx1 There is conflict between old and new style hints. In this case index 'idx1' will be excluded from the possible ranges for skip scan. /*+ NO_SKIP_SCAN(t1 idx1, idx2) */ ... FORCE INDEX idx1, idx2 Skip scan is disallowed for idx1, idx2 but optimizer is forced to use either idx1 or idx2 for range or ref access. Here is no conflicts, both hint are applicable. If IGNORE INDEX hint contains multiple indexes, indexes specified in the old hint are unavailable for skip scan. FORCE/USE INDEX hints make only specified indexes to be available for skip scan. Examples: EXPLAIN SELECT /*+ SKIP_SCAN(t1 f2, f3, f4) */ f2 FROM t1 FORCE INDEX (f2, f3) WHERE f4 = 'h' AND f2 = 2 AND f3 = 'b'; Skip scan will be used only for f2, f3 indexes. EXPLAIN SELECT /*+ SKIP_SCAN(t1 f2, f3, f4) */ f2 FROM t1 USE INDEX (f2, f3) WHERE f4 = 'h' AND f2 = 2 AND f3 = 'b'; Skip scan will be used only for f2, f3 indexes.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.