The MySQL query optimizer uses estimated
statistics about key
distributions to choose the indexes for an execution plan, based
on the relative
selectivity of the index.
Certain operations cause InnoDB to sample random pages from each
index on a table to estimate the
cardinality of the index.
(This technique is known as
random dives.) These
operations include the ANALYZE
TABLE
statement, the SHOW TABLE
STATUS
statement, and accessing the table for the first
time after a restart.
To give you control over the quality of the statistics estimate
(and thus better information for the query optimizer), you can now
change the number of sampled pages using the parameter
innodb_stats_sample_pages
.
Previously, the number of sampled pages was always 8, which could
be insufficient to produce an accurate estimate, leading to poor
index choices by the query optimizer. This technique is especially
important for large tables and tables used in
joins. Unnecessary
full table scans for
such tables can be a substantial performance issue.
You can set the global parameter
innodb_stats_sample_pages
, at
runtime. The default value for this parameter is 8, preserving the
same behavior as in past releases.
The value of
innodb_stats_sample_pages
affects the index sampling for all tables
and indexes. There are the following potentially significant
impacts when you change the index sample size:
Small values like 1 or 2 can result in very inaccurate estimates of cardinality.
Increasing the
innodb_stats_sample_pages
value might require more disk reads. Values much larger than 8 (say, 100), can cause a big slowdown in the time it takes to open a table or executeSHOW TABLE STATUS
.The optimizer might choose very different query plans based on different estimates of index selectivity.
To disable the cardinality estimation for metadata statements such
as SHOW TABLE STATUS
or
SHOW INDEX
, or when accessing the
INFORMATION_SCHEMA.TABLES
or
INFORMATION_SCHEMA.STATISTICS
tables,
execute the statement SET GLOBAL
innodb_stats_on_metadata=OFF
. The ability to set this
option dynamically is also relatively new.
All InnoDB
tables are opened, and the
statistics are re-estimated for all associated indexes, when the
mysql client starts with the
--auto-rehash
setting on (the
default). To improve the start up time of the
mysql client, you can turn auto-rehash off
using the
--disable-auto-rehash
option. The auto-rehash
feature enables
automatic name completion of database, table, and column names for
interactive users.
Whatever value of
innodb_stats_sample_pages
works
best for a system, set the option and leave it at that value.
Choose a value that results in reasonably accurate estimates for
all tables in your database without requiring excessive I/O.
Because the statistics are automatically recalculated at various
times other than on execution of ANALYZE
TABLE
, it does not make sense to increase the index
sample size, run ANALYZE TABLE
,
then decrease sample size again. The more accurate statistics
calculated by ANALYZE
running with a high value
of innodb_stats_sample_pages
can
be wiped away later.
Although it is not possible to specify the sample size on a
per-table basis, smaller tables generally require fewer index
samples than larger tables do. If your database has many large
tables, consider using a higher value for
innodb_stats_sample_pages
than if
you have mostly smaller tables.