By default, MySQL (8.0.18 and later) employs hash joins
whenever possible. It is possible to control whether hash
joins are employed using one of the
BNL
and
NO_BNL
optimizer hints, or
by setting
block_nested_loop=on
or
block_nested_loop=off
as part of the
setting for the optimizer_switch server system variable.
MySQL 8.0.18 supported setting a
hash_join
flag in
optimizer_switch
, as well
as the optimizer hints
HASH_JOIN
and
NO_HASH_JOIN
. In MySQL 8.0.19 and later,
none of these have any effect any longer.
Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which each join has an equi-join condition, and in which there are no indexes that can be applied to any join conditions, such as this one:
SELECT *
FROM t1
JOIN t2
ON t1.c1=t2.c1;
A hash join can also be used when there are one or more indexes that can be used for single-table predicates.
A hash join is usually faster than and is intended to be used in such cases instead of the block nested loop algorithm (see Block Nested-Loop Join Algorithm) employed in previous versions of MySQL. Beginning with MySQL 8.0.20, support for block nested loop is removed, and the server employs a hash join wherever a block nested loop would have been used previously.
In the example just shown and the remaining examples in this
section, we assume that the three tables
t1
, t2
, and
t3
have been created using the following
statements:
CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);
You can see that a hash join is being employed by using
EXPLAIN
,
like this:
mysql> EXPLAIN
-> SELECT * FROM t1
-> JOIN t2 ON t1.c1=t2.c1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using join buffer (hash join)
(Prior to MySQL 8.0.20, it was necessary to include the
FORMAT=TREE
option to see whether hash
joins were being used for a given join.)
EXPLAIN ANALYZE
also displays
information about hash joins used.
The hash join is used for queries involving multiple joins as well, as long as at least one join condition for each pair of tables is an equi-join, like the query shown here:
SELECT * FROM t1
JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
JOIN t3 ON (t2.c1 = t3.c1);
In cases like the one just shown, which makes use of an inner
join, any extra conditions which are not equi-joins are
applied as filters after the join is executed. (For outer
joins, such as left joins, semijoins, and antijoins, they are
printed as part of the join.) This can be seen here in the
output of EXPLAIN
:
mysql> EXPLAIN FORMAT=TREE
-> SELECT *
-> FROM t1
-> JOIN t2
-> ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
-> JOIN t3
-> ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1)
-> Table scan on t3 (cost=0.35 rows=1)
-> Hash
-> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1)
-> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
-> Table scan on t2 (cost=0.35 rows=1)
-> Hash
-> Table scan on t1 (cost=0.35 rows=1)
As also can be seen from the output just shown, multiple hash joins can be (and are) used for joins having multiple equi-join conditions.
Prior to MySQL 8.0.20, a hash join could not be used if any pair of joined tables did not have at least one equi-join condition, and the slower block nested loop algorithm was employed. In MySQL 8.0.20 and later, the hash join is used in such cases, as shown here:
mysql> EXPLAIN FORMAT=TREE
-> SELECT * FROM t1
-> JOIN t2 ON (t1.c1 = t2.c1)
-> JOIN t3 ON (t2.c1 < t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.c1 < t3.c1) (cost=1.05 rows=1)
-> Inner hash join (no condition) (cost=1.05 rows=1)
-> Table scan on t3 (cost=0.35 rows=1)
-> Hash
-> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
-> Table scan on t2 (cost=0.35 rows=1)
-> Hash
-> Table scan on t1 (cost=0.35 rows=1)
(Additional examples are provided later in this section.)
A hash join is also applied for a Cartesian product—that is, when no join condition is specified, as shown here:
mysql> EXPLAIN FORMAT=TREE
-> SELECT *
-> FROM t1
-> JOIN t2
-> WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (cost=0.70 rows=1)
-> Table scan on t2 (cost=0.35 rows=1)
-> Hash
-> Filter: (t1.c2 > 50) (cost=0.35 rows=1)
-> Table scan on t1 (cost=0.35 rows=1)
In MySQL 8.0.20 and later, it is no longer necessary for the join to contain at least one equi-join condition in order for a hash join to be used. This means that the types of queries which can be optimized using hash joins include those in the following list (with examples):
Inner non-equi-join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Filter: (t1.c1 < t2.c1) (cost=4.70 rows=12) -> Inner hash join (no condition) (cost=4.70 rows=12) -> Table scan on t2 (cost=0.08 rows=6) -> Hash -> Table scan on t1 (cost=0.85 rows=6)
Semijoin:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 -> WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G *************************** 1. row *************************** EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1) (cost=0.70 rows=1) -> Table scan on t1 (cost=0.35 rows=1) -> Hash -> Table scan on t2 (cost=0.35 rows=1)
Antijoin:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2 -> WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.c1 = t2.c1)\G *************************** 1. row *************************** EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1) 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS\G *************************** 1. row *************************** Level: Note Code: 1276 Message: Field or reference 't3.t2.c1' of SELECT #2 was resolved in SELECT #1
Left outer join:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Left hash join (t2.c1 = t1.c1) (cost=0.70 rows=1) -> Table scan on t1 (cost=0.35 rows=1) -> Hash -> Table scan on t2 (cost=0.35 rows=1)
Right outer join (observe that MySQL rewrites all right outer joins as left outer joins):
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Left hash join (t1.c1 = t2.c1) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1)
By default, MySQL 8.0.18 and later employs hash joins whenever
possible. It is possible to control whether hash joins are
employed using one of the
BNL
and
NO_BNL
optimizer hints.
(MySQL 8.0.18 supported
hash_join=on
or
hash_join=off
as part of the
setting for the
optimizer_switch
server
system variable as well as the optimizer hints
HASH_JOIN
or
NO_HASH_JOIN
. In MySQL
8.0.19 and later, these no longer have any effect.)
Memory usage by hash joins can be controlled using the
join_buffer_size
system
variable; a hash join cannot use more memory than this amount.
When the memory required for a hash join exceeds the amount
available, MySQL handles this by using files on disk. If this
happens, you should be aware that the join may not succeed if
a hash join cannot fit into memory and it creates more files
than set for
open_files_limit
. To avoid
such problems, make either of the following changes:
Increase
join_buffer_size
so that the hash join does not spill over to disk.Increase
open_files_limit
.
Beginning with MySQL 8.0.18, join buffers for hash joins are
allocated incrementally; thus, you can set
join_buffer_size
higher
without small queries allocating very large amounts of RAM,
but outer joins allocate the entire buffer. In MySQL 8.0.20
and later, hash joins are used for outer joins (including
antijoins and semijoins) as well, so this is no longer an
issue.