Index dives are performed during the optimization phase to help decide which index to use. Currently when user specifies FORCE INDEX, optimizer still always calculates cost using index dives. Under some circumstances it is possible to avoid the index dives and this could speed up execution.
With WL#6526 (present in mysql-8.0.3) optimizer skips index dives when the query has a FORCE INDEX.
What is an index dive?
As mentioned in Jorgen’s blog,
For as long as there have been a range access method in MySQL, the number of rows in a range has been estimated by divingĀ down the index to find the start and end of the range and use these to count the number of rows between them. This technique is accurate, and is therefore a good basis to make the best possible execution plan.
For the below queries, two index dives will be performed for each range (one each for min and max values in the range).
SELECT * FROM t1 WHERE (c1 > 1 AND c1 < 10) OR (c1 > 10 AND c1 < 20) ;
SELECT * FROM t1 WHERE c1 IN (11, 22) ;
Index dives can be skipped when there is a FORCE INDEX because the choice of index is pre-decided and access method depends on the WHERE clause (ref-access is used only when a single equality condition is present, in all other cases range access is chosen).
Optimizer skips index dives when:
- Only a single table is accessed in the query.
- FORCE INDEX applies to a single index.
- No subquery is present.
- Fulltext Index is not involved.
- No GROUP-BY or DISTINCT clause.
- No ORDER-BY clause.
This optimization currently does not apply to multi-table queries.
Identify whether a query is using this optimization:
- EXPLAIN:
- EXPLAIN FORMAT=TRADITIONAL FOR CONNECTION will see the following changes:
- Number of “rows” will change to NULL for the table where index dive is skipped.
- The value for “filtered” will be NULL.
- EXPLAIN FORMAT=JSON FOR CONNECTION will see the following changes:
- “rows_examined_per_scan” and “rows_produced_per_join” will not be
mentioned when index dive is skipped. - Number of “rows” will change to NULL for the table where index dive is
skipped. - The value for “filtered” will be NULL.
- “rows_examined_per_scan” and “rows_produced_per_join” will not be
- No change in output of EXPLAIN FORMAT=TRADITIONAL and EXPLAIN FORMAT=JSON
- EXPLAIN FORMAT=TRADITIONAL FOR CONNECTION will see the following changes:
- Optimizer trace:
- Optimizer trace contains the tag “skipped_due_to_force_index”.
12345678910111213141516"range_scan_alternatives": [{"index": "c1_idx","ranges": ["1 < c1 < 10","10 < c1 < 20"],"index_dives_for_range_access": "skipped_due_to_force_index","rowid_ordered": false,"using_mrr": false,"index_only": true,"rows": "not applicable","cost": "not applicable","chosen": true}],
- Optimizer trace contains the tag “skipped_due_to_force_index”.
So what kind of queries could benefit from this optimization?
- Queries that contain large number of values in IN clause OR
- Queries with a large number of range conditions. (see queries below).
In the example below the first query has no FORCE INDEX and hence this optimization doesn’t apply. The second query has FORCE INDEX and takes significantly less time for the “statistics” phase of the query execution.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
mysql> TRUNCATE TABLE performance_schema.events_stages_history; Query OK, 0 rows affected (0.00 sec) mysql> TRUNCATE TABLE performance_schema.events_statements_history ; Query OK, 0 rows affected (0.01 sec) mysql> SELECT c1 FROM t1 WHERE ( c1 > 10 AND c1 < 11) OR ... <200 clauses>; +------+ | c1 | +------+ | 25 | <more rows> +------+ 832 rows in set (0.02 sec) mysql> mysql> SELECT c1 FROM t1 FORCE INDEX(c1_idx) WHERE ( c1 > 10 AND c1 < 11) OR ... <200 clauses>; +------+ | c1 | +------+ | 25 | <more rows> +------+ 832 rows in set (0.01 sec) mysql> mysql> select SQL_TEXT, stages.EVENT_NAME, stages.TIMER_WAIT/1000000000 "Time (ms)" -> from performance_schema.events_stages_history AS stages JOIN performance_schema.events_statements_history AS statements -> ON (stages.EVENT_ID > statements.EVENT_ID AND stages.EVENT_ID <= statements.END_EVENT_ID) -> WHERE stages.EVENT_NAME LIKE '%statistics%' AND SQL_TEXT LIKE '%SELECT c1 FROM t1%' ; +-------------------------------------------------------------------------+----------------------+------------+ | SQL_TEXT | EVENT_NAME | Time (ms) | +-------------------------------------------------------------------------+----------------------+------------+ | SELECT c1 FROM t1 WHERE ( c1 > 10 AND c1 < 11) ... | stage/sql/statistics | 29.5890 | | SELECT c1 FROM t1 FORCE INDEX(c1_idx) WHERE ( c1 > 10 AND c1 < 11) ... | stage/sql/statistics | 2.3026 | +-------------------------------------------------------------------------+----------------------+------------+ 2 rows in set (0.00 sec) mysql> |
This is a feature request from Facebook.
The optimization applies by default starting from MySQL 8.0.3. You do not need to enable any settings or run any commands to turn it on, Please try it out, and let us know your results.
Thank you for using MySQL!