Overview
MySQL 8.0 comes with the new design of INFORMATION_SCHEMA subsystem. The blog MySQL 8.0: Improvements to Information_schema provides an overview of the improvements we made. This blog focuses mainly to demonstrate performance of the INFORMATION_SCHEMA in MySQL 8.0, giving us an idea on the kind of performance gain that one can expect.
Tests
Let us look at three different cases focusing of A) Area of major performance gain, B) the performance gain when reading the static table meta-data and C) the performance of a query reading dynamic table meta-data. The configuration details of MySQL server instances and the machine details are provided at the end of the blog.
A) Read column names of all InnoDB tables:
One of the major performance bottlenecks of INFORMATION_SCHEMA queries in MySQL 5.7 is seen a query using more than one INFORMATION_SCHEMA table. Let us consider one such query for our study. For example, the following query retrieves column names for all the InnoDB tables,
1 2 3 4 5 6 |
SELECT t.table_schema, t.table_name, c.column_name FROM information_schema.tables t, information_schema.columns c WHERE t.table_schema = c.table_schema AND t.table_name = c.table_name AND t.engine='InnoDB'; |
The following chart shows us the huge performance gain we get in MySQL 8.0. We see that the query executes several hundred times faster than MySQL 5.7. It is ~140 times faster on the Server instance 1 and ~1000 times faster on Server instance 4.
On Server Instance 5, the query takes an average of 40 seconds on 8.0 and we estimate that it would approximately take 20 hours on 5.7. On Server Instance 6, the query takes an average of 80 seconds on 8.0 and we estimate that it would approximately take 40 hours on 5.7. We also need to repeat the test several times for statistical significance, so for practical reasons, we have skipped this.
B) Read static table meta-data:
This sections looks at performance of INFORMATION_SCHEMA query which retrieves static table meta-data. Let us consider following query which reads some of static table meta-data from all the tables in every schema name like ‘db%’.
1 2 3 |
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, ENGINE, ROW_FORMAT FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'db%'; |
The below chart shows that this query executes about ~30 times faster in MySQL 8.0 compared to MySQL 5.7.
C) Read dynamic table meta-data:
INFORMATION_SCHEMA queries also get us dynamic table meta-data. Dynamic meta-data frequently changes (for example: the auto_increment value will advance after each insert). In many cases the dynamic meta-data will also incur some cost to accurately calculate on demand, and accuracy may not be beneficial for the typical query. Consider the case of the DATA_FREE statistic which shows the number of free bytes in a table – a cached value is usually sufficient. In MySQL 8.0, the dynamic table meta-data will default to being cached. This is configurable via the setting information_schema_stats (default cached), and can be changed to information_schema_stats=latest in order to always retrieve the dynamic information directly from the storage engine (at the cost of slightly higher query execution). See manual section Information schema optimization for more details.
For our study, let us consider the following query to read TABLE_ROWS dynamic table meta-data from all the tables in schema name like ‘db%’. Let us execute this query in both information_schema_stats=cached and the information_schema_stats=latest configuration setting.
1 2 3 |
SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'db%'; |
C.1) When we execute the above query in setting information_schema_stats=cached (the default) setting, we see that this query also executes about ~30 times faster in MySQL 8.0 compared to MySQL 5.7, approximately it takes the same amount of time as in the case of INFORMATION_SCHEMA query that reads static table meta-data.
C.2) When we execute the same above query in information_schema_stats=latest setting, we see that the performance of MySQL 8.0 is slightly better than MySQL 5.7. This configuration enables queries to get latest dynamic table statics from Storage Engines by opening the underlying tables, and hence is slower when compared to other configurations above.
Observations
- Reading static table meta-data in a query involving multiple INFORMATION_SCHEMA tables joined, give us a huge performance gain in the order of ~100-1000 times faster query execution in MySQL 8.0.
- Reading static table meta-data in a query using a single INFORMATION_SCHEMA table, would give us the performance gain in order of ~30 times faster query execution in MySQL 8.0.
- Reading dynamic table meta-data from INFORMATION_SCHEMA with setting information_schema_stats=cached also gives us gain in order of ~30 times faster query execution in MySQL 8.0.
- Reading dynamic table meta-data from INFORMATION_SCHEMA with setting information_schema_stats=latest, performs about 10%-60% faster in MySQL 8.0. The query in this configuration does fetch latest statistics from Storage Engine during the query execution and hence takes considerably more time when compared to other configurations discussed above. We believe that the design of INFORMATION_SCHEMA in MySQL 8.0 subsystem opens-up new opportunities for us to further improve performance of queries retrieving dynamic table statistics.
From above, we see that the INFORMATION_SCHEMA design in MySQL 8.0 brings in huge performance improvement. We now scale in a linear fashion up to 1 Million tables. And it also enable users to configure the way INFORMATION_SCHEMA queries should read dynamic table meta-data, which was not possible before MySQL 8.0. We believe that this enhancement would enable several MySQL applications to run much faster go forward.
Configuration
A server running Ubuntu 16.04, 64bit 2.6GHz CPU was used to run the tests. And following MySQL server instances (SI) are used to execute the queries above,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SI 1 - 5,000 tables ( 100 databases with 50 tables each) SI 2 - 10,000 tables ( 100 databases with 100 tables each) SI 3 - 50,000 tables ( 100 databases with 500 tables each) SI 4 - 100,000 tables (1000 databases with 100 tables each) SI 5 - 500,000 tables (1000 databases with 500 tables each) SI 6 - 1,000,000 tables (1000 databases with 1000 tables each) Each table definition is as below, with two rows in it, CREATE TABLE t1 ( f1 int, f2 int, f3 int, f4 int, f5 int, f6 int, f7 int, f8 int, f9 int, f10 int, f11 int, f12 int, f13 int, f14 int, f15 int, f16 int, f17 int, f18 int, f19 int, f20 int, PRIMARY KEY (f1)); INSERT INTO t1 (f1,f2,f3,f6,f10,f19) VALUES (1, 1, 1, 1, 1, 1); INSERT INTO t1 (f1,f2,f3,f6,f10,f19) VALUES (2, 2, 2, 2, 2, 2); |
We also run ‘mysqlcheck -Aa’ on each instance, so that the ‘ANALYZE TABLE’ is run on all the tables. Which will cache the dynamic table statistics for all the tables in MySQL data dictionary tables.
Thanks for using MySQL!