As of MySQL 5.6.5, the optimizer uses semi-join strategies to improve subquery execution, as described in this section.
For an inner join between two tables, the join returns a row
from one table as many times as there are matches in the other
table. But for some questions, the only information that
matters is whether there is a match, not the number of
matches. Suppose that there are tables named
list classes in a course curriculum and class rosters
(students enrolled in each class), respectively. To list the
classes that actually have students enrolled, you could use
SELECT class.class_num, class.class_name FROM class INNER JOIN roster WHERE class.class_num = roster.class_num;
However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.
class_num is a primary key in
class table, duplicate suppression
could be achieved by using
DISTINCT, but it is inefficient to generate all
matching rows first only to eliminate duplicates later.
The same duplicate-free result can be obtained by using a subquery:
SELECT class_num, class_name FROM class WHERE class_num IN (SELECT class_num FROM roster);
Here, the optimizer can recognize that the
IN clause requires the subquery to return
only one instance of each class number from the
roster table. In this case, the query can
be executed as a semi-join—that
is, an operation that returns only one instance of each row in
class that is matched by rows in
Before MySQL 5.6.6, the outer query specification was limited to simple table scans or inner joins using comma syntax, and view references were not possible. As of 5.6.6, outer join and inner join syntax is permitted in the outer query specification, and the restriction that table references must be base tables has been lifted.
In MySQL, a subquery must satisfy these criteria to be handled as a semi-join:
It must be an
=ANY) subquery that appears at the top
level of the
ON clause, possibly as a term in an
AND expression. For example:
SELECT ... FROM ot1, ... WHERE (oe1, ...) IN (SELECT ie1, ... FROM it1, ... WHERE ...);
represent tables in the outer and inner parts of the
represent expressions that refer to columns in the outer
and inner tables.
It must not contain a
GROUP BY or
HAVING clause or aggregate functions.
It must not have
ORDER BY with
The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
The subquery may be correlated or uncorrelated.
DISTINCT is permitted, as is
ORDER BY is
If a subquery meets the preceding criteria, MySQL converts it to a semi-join and makes a cost-based choice from these strategies:
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned.
Each of these strategies except Duplicate Weedout can be
enabled or disabled using the
semijoin flag controls
whether semi-joins are used. If it is set to
materialization flags enable finer control
over the permitted semi-join strategies. These flags are
on by default. See
Section 188.8.131.52, “Controlling Switchable Optimizations”.
The use of semi-join strategies is indicated in
EXPLAIN output as follows:
Semi-joined tables show up in the outer select.
EXPLAIN EXTENDED plus
SHOW WARNINGS shows the
rewritten query, which displays the semi-join structure.
From this you can get an idea about which tables were
pulled out of the semi-join. If a subquery was converted
to a semi-join, you will see that the subquery predicate
is gone and its tables and
were merged into the outer query join list and
Temporary table use for Duplicate Weedout is indicated by
Start temporary and
temporary in the
column. Tables that were not pulled out and are in the
rows covered by
Start temporary and
End temporary will have their
rowid in the temporary table.
Extra column indicates join
Extra column indicates use of
the LooseScan strategy.
n are key part numbers.
As of MySQL 5.6.7, temporary table use for materialization
is indicated by rows with a
MATERIALIZED and rows with a
table value of
Before MySQL 5.6.7, temporary table use for
materialization is indicated in the
Extra column by
Materialize if a single table is used,
Start materialize and
End materialize if multiple tables are
Scan is present, no temporary
table index is used for table reads. Otherwise, an index
lookup is used.