The MySQL query optimizer uses estimated statistics about key
distributions to choose the indexes for an execution plan, based
on the relative
selectivity of the index.
Certain operations cause InnoDB to sample random pages from each
index on a table to estimate the
cardinality of the index.
(This technique is known as
random dives.) These
operations include the ANALYZE
TABLE statement, the SHOW TABLE
STATUS statement, and accessing the table for the first
time after a restart.
To give you control over the quality of the statistics estimate
(and thus better information for the query optimizer), you can now
change the number of sampled pages using the parameter
innodb_stats_sample_pages.
Previously, the number of sampled pages was always 8, which could
be insufficient to produce an accurate estimate, leading to poor
index choices by the query optimizer. This technique is especially
important for large tables and tables used in
joins. Unnecessary
full table scans for
such tables can be a substantial performance issue.
You can set the global parameter
innodb_stats_sample_pages, at
runtime. The default value for this parameter is 8, preserving the
same behavior as in past releases.
The value of
innodb_stats_sample_pages
affects the index sampling for all tables
and indexes. There are the following potentially significant
impacts when you change the index sample size:
Small values like 1 or 2 can result in very inaccurate estimates of cardinality.
Increasing the
innodb_stats_sample_pages
value might require more disk reads. Values much larger
than 8 (say, 100), can cause a big slowdown in the time it
takes to open a table or execute SHOW TABLE
STATUS.
The optimizer might choose very different query plans based on different estimates of index selectivity.
To disable the cardinality estimation for metadata statements such
as SHOW TABLE STATUS, execute the statement
SET GLOBAL innodb_stats_on_metadata=OFF (or
0). The ability to set this option dynamically
is also relatively new.
All InnoDB tables are opened, and the statistics are re-estimated
for all associated indexes, when the mysql
client starts if the auto-rehash setting is set on (the default).
To improve the start up time of the mysql
client, you can turn auto-rehash off. The auto-rehash feature
enables automatic name completion of database, table, and column
names for interactive users.
Whatever value of
innodb_stats_sample_pages works
best for a system, set the option and leave it at that value.
Choose a value that results in reasonably accurate estimates for
all tables in your database without requiring excessive I/O.
Because the statistics are automatically recalculated at various
times other than on execution of ANALYZE
TABLE, it does not make sense to increase the index
sample size, run ANALYZE TABLE,
then decrease sample size again. The more accurate statistics
calculated by ANALYZE running with a high value
of innodb_stats_sample_pages can
be wiped away later.
Although it is not possible to specify the sample size on a
per-table basis, smaller tables generally require fewer index
samples than larger tables do. If your database has many large
tables, consider using a higher value for
innodb_stats_sample_pages than if
you have mostly smaller tables.

User Comments
Add your own comment.