MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Better Performance for JOINs Not Using Indexes

In some cases it is not possible to use an index to optimize a JOIN. This may for example happen when you query the Performance Schema. As a result these kind of queries can be very slow; however in MySQL 5.6 and later you can use a trick to improve the performance considerably.

As a working example in this post, I will use the schema_table_statistics view in the sys schema. Since the view involves the schema, I will create a reasonable large number of databases and tables for the test:

The current view definition for sys.schema_table_statistics is:

The issue is the LEFT JOIN on performance_schema.file_summary_by_instance. The Performance Schema tables do not have any indexes, so the query will need to examine the cross product:

To ensure that some activity has happened on all tables, a mysqldump backup is made:
shell$ mysqldump --all-databases > /dev/null

The original query takes 14 minutes and 24 seconds.

The trick to improve the performance is to rewrite the query to take advantage of the materialized temporary tables that MySQL 5.6 supports. To do this, change the LEFT JOIN on performance_schema.file_summary_by_instance to a LEFT JOIN on a subquery:

Now the output of EXPLAIN looks like:

The key here is the <auto_key0> index used for the LEFT JOIN between the pst table and the <derived2> table. This means the optimizer adds an ad-hoc index.

The performance difference is clear: the query now completes in around 20 seconds.

However the original query is a view definition and subqueries are not allowed in view definitions, so how to improve the view itself?

This can be done by creating a “helper view” instead of the subquery. The important thing to be aware of is that MySQL has two algorithms for creating a view:

  1. MERGE
  2. TEMPTABLE

MERGE will merge the corresponding parts of the view into the statement that refers to the view. So a view using the MERGE algorithm will not work as that will effectively take us back to the original view. However TEMPTABLE will materialize the view before using it with the rest of the statement, so that will do what we need.

So the view definition we need in this case is:

which then can be used in the view definition for schema_table_statistics:

Note: to take fully advantage of this rewrite, the extract_schema_from_file_name() and extract_table_from_file_name() functions needs to be changed from their current definitions (this has been done throughout the examples above). The current definition in the sys schema is that they return a varchar(512) – the same as the argument they take. However as the widest allowed schema and table name is a varchar(64), it is safe to change the return value to a varchar(64). The reason this is important is that the maximum width supported for a MyISAM index is 1000 bytes, so two columns each a UTF8 vachar(512) is too wide to allow for an auto key to be created.

Thanks a lot to Roy Lyseng and Mark Leith who is behind the ideas for this rewrite.

If you are interested to hear more about the Performance Schema, the MySQL sys schema, and performance tuning, then you should consider the following sessions at MySQL Central at Oracle OpenWorld: