Documentation Home
MySQL 8.0 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 46.8Mb
PDF (A4) - 46.9Mb
PDF (RPM) - 42.3Mb
HTML Download (TGZ) - 10.8Mb
HTML Download (Zip) - 10.8Mb
HTML Download (RPM) - 9.3Mb
Man Pages (TGZ) - 226.1Kb
Man Pages (Zip) - 331.6Kb
Info (Gzip) - 4.2Mb
Info (Zip) - 4.2Mb
Excerpts from this Manual

8.2.1.4 Hash Join Optimization

Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which each join has an equi-join condition and uses no indexes, such as this one:

SELECT * 
    FROM t1 
    JOIN t2 
        ON t1.c1=t2.c1;

A hash join is usually faster than and is intended to be used in such cases instead of the block nested loop algorithm (see Block Nested-Loop Join Algorithm) employed in previous versions of MySQL.

In the example just shown and the remaining examples in this section, we assume that the three tables t1, t2, and t3 have been created using the following statements:

CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);

You can see that a hash join is being employed by using EXPLAIN FORMAT=TREE, like this:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT * 
    ->     FROM t1 
    ->     JOIN t2 
    ->         ON t1.c1=t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)

To see whether hash joins are being used for a given join, you must use EXPLAIN with the FORMAT=TREE option. EXPLAIN ANALYZE also displays information about hash joins used.

The hash join is used for queries involving multiple joins as well, as long as at least one join condition for each pair of tables is an equi-join, like the query shown here:

SELECT * 
    FROM t1
    JOIN t2 
        ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    JOIN t3 
        ON (t2.c1 = t3.c1);

In cases like the one just shown, any extra conditions which are not equi-joins are applied as filters after the join is executed. This can be seen here in the output of EXPLAIN FORMAT=TREE:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT * 
    ->     FROM t1
    ->     JOIN t2 
    ->         ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    ->     JOIN t3 
    ->         ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t3.c1 = t1.c1)  (cost=1.05 rows=1)
    -> Table scan on t3  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 < t2.c2)  (cost=0.70 rows=1)
            -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
                -> Table scan on t2  (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1  (cost=0.35 rows=1)

As also can be seen from the output just shown, multiple hash joins can be (and are) used for joins having multiple equi-join conditions.

A hash join cannot be used if any pair of joined tables does not have at least one equi-join condition, as can be seen here:

mysql> EXPLAIN FORMAT=TREE
    ->     SELECT * 
    ->         FROM t1
    ->         JOIN t2 
    ->             ON (t1.c1 = t2.c1)
    ->         JOIN t3 
    ->             ON (t2.c1 < t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

In this case, the slower block nested loop algorithm is employed, as in versions previous to MySQL 8.0.18 where no indexes are available:

mysql> EXPLAIN
    ->     SELECT * 
    ->         FROM t1
    ->         JOIN t2 
    ->             ON (t1.c1 = t2.c1)
    ->         JOIN t3 
    ->             ON (t2.c1 < t3.c1)\G             
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t3
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)

A hash join is also applied for a Cartesian product—that is, when no join condition is specified, as shown here:

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->     WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 > 50)  (cost=0.35 rows=1)
            -> Table scan on t1  (cost=0.35 rows=1)

By default, MySQL employs hash joins whenever possible. It is possible to control whether hash joins are used in either of two ways:

  • On the global or session level by using hash_join=on or hash_join=off as part of the setting for the optimizer_switch server system variable. The default is hash_join=on.

  • On a case-by-case basis, by using one of the optimizer hints HASH_JOIN or NO_HASH_JOIN as part of a given join.

Memory usage by hash joins can be controlled using the join_buffer_size system variable; a hash join cannot use more memory than this amount. When the memory required for a hash join exceeds the amount available, MySQL handles this by using files on disk. If this happens, you should be aware that the join may not succeed if a hash join cannot fit into memory and it creates more files than set for open_files_limit. To avoid such problems, make either of the following changes:

  • Increase join_buffer_size so that the hash join does not spill over to disk.

  • Increase open_files_limit.