WL#10398: Improve SELECT COUNT(*) performance by using handler::records_from_index(*num_rows, index) in execution phase
Affects: Server-8.0
—
Status: Complete
As part of WL6742,counting the number of records for count(*) was always done using the primary index, this was done because when there are concurrent changes in secondary indexes with multiple users using the table ,the time taken by secondary index to count all the records is slower than primary index scan. But the downside is that always using primary index which can be very big (consisting of many leaf pages) can be slower when compared to a small secondary index in case of single user. See also BUG#80580, BUG#81854, and BUG#82313. There is also a question on StackExchange about count performance: http://dba.stackexchange.com/questions/130780/mysql-count-performance
F1) EXPLAIN output for a "SELECT COUNT(*) ..." query for any table shouldn't say "Select tables optimized away" if the storage engine doesn't include HA_COUNT_ROWS_INSTANT.
NF1) Should improve performance of COUNT(*) queries on INNODB tables (with no conditions / group by etc) since optimizer will be allowed to choose the appropriate index to count the number of rows. This improvement will be seen only for single-threaded workloads. There might be some degradation if the same COUNT(*) query is run from multiple sessions.
1) Replace HA_HAS_RECORDS with HA_COUNT_ROWS_INSTANT to indicate that storage engine returns the number of records instatntly 2) If storage engine doesn't return the number of records instantly then HA_COUNT_ROWS_INSTANT shouldn't be included. So INNODB shouldn't have this flag despite implementing a records() function, since it won't be instant. MYISAM and HEAP, for example, will include this flag because they return the number of records instantly. 3) Implement new function handler::records_from_index(). It will perform an index scan to count number of records, if HA_COUNT_ROWS_INSTANT is not set. 4) Replace existing handler::records(). It will perform an table scan to count number of records, if HA_COUNT_ROWS_INSTANT is not set. Code Changes: 1) Changes to handler.h: a) New handler flag: i) Replace HA_HAS_RECORDS with HA_COUNT_ROWS_INSTANT to indicate that storage engine returns the number of records instatntly. b) New handler function: virtual int records_from_index(ha_rows *num_rows, uint index))). This can be over-ridden by the SE. It will perform an index scan to count number of records, if HA_COUNT_ROWS_INSTANT is not set. c) Replace existing handler::records(). It will perform an table scan to count number of records, if HA_COUNT_ROWS_INSTANT is not set. 2) Changes to sql_optimizer.h: a) New flag JOIN->select_count has been added. When join->select_count is set, tables will not be optimized away. The call to records() will be delayed until the execution phase and the counting will be done on an index of Optimizer's choice. This flag will be set in opt_sum_query. The index will be decided in find_shortest_key(). 3) Changes to opt_sum.cc: a) opt_sum_query() takes HA_COUNT_ROWS_INSTANT into account and ha_records() will be called in the execution phase. The JOIN->select_count flag is set to true. b) In case the query involves both INNODB ( HA_COUNT_ROWS_INSTANT is not set) and MYISAM ( HA_COUNT_ROWS_INSTANT is set), then the call to ha_records() will be delayed till execution. 4) Changes to sql_executor.cc: a) if JOIN->select_count is set, end_send_count() is called and this in turn will call ha_records(). b) records() will be called when optimizer chooses a table scan or primary key is chosen c) records_from_index() will be called otherwise.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.