A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. The optimizer uses semijoin strategies to improve subquery execution, as described in this section.
          For an inner join between two tables, the join returns a row
          from one table as many times as there are matches in the other
          table. But for some questions, the only information that
          matters is whether there is a match, not the number of
          matches. Suppose that there are tables named
          class and roster that
          list classes in a course curriculum and class rosters
          (students enrolled in each class), respectively. To list the
          classes that actually have students enrolled, you could use
          this join:
        
SELECT class.class_num, class.class_name
    FROM class
    INNER JOIN roster
    WHERE class.class_num = roster.class_num;However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.
          Assuming that class_num is a primary key in
          the class table, duplicate suppression is
          possible by using
          SELECT
          DISTINCT, but it is inefficient to generate all
          matching rows first only to eliminate duplicates later.
        
The same duplicate-free result can be obtained by using a subquery:
SELECT class_num, class_name
    FROM class
    WHERE class_num IN
        (SELECT class_num FROM roster);
          Here, the optimizer can recognize that the
          IN clause requires the subquery
          to return only one instance of each class number from the
          roster table. In this case, the query can
          use a semijoin; that is,
          an operation that returns only one instance of each row in
          class that is matched by rows in
          roster.
        
          The following statement, which contains an
          EXISTS subquery predicate, is
          equivalent to the previous statement containing an
          IN subquery predicate:
        
SELECT class_num, class_name
    FROM class
    WHERE EXISTS
        (SELECT * FROM roster WHERE class.class_num = roster.class_num);
          Any statement with an EXISTS subquery
          predicate is subject to the same semijoin transforms as a
          statement with an equivalent IN subquery
          predicate.
        
The following subqueries are transformed into antijoins:
- NOT IN (SELECT ... FROM ...)
- NOT EXISTS (SELECT ... FROM ...).
- IN (SELECT ... FROM ...) IS NOT TRUE
- EXISTS (SELECT ... FROM ...) IS NOT TRUE.
- IN (SELECT ... FROM ...) IS FALSE
- EXISTS (SELECT ... FROM ...) IS FALSE.
          In short, any negation of a subquery of the form IN
          (SELECT ... FROM ...) or EXISTS (SELECT ...
          FROM ...) is transformed into an antijoin.
        
An antijoin is an operation that returns only rows for which there is no match. Consider the query shown here:
SELECT class_num, class_name
    FROM class
    WHERE class_num NOT IN
        (SELECT class_num FROM roster);
          This query is rewritten internally as the antijoin
          SELECT class_num, class_name FROM class ANTIJOIN
          roster ON class_num, which returns one instance of
          each row in class that is
          not matched by any rows in
          roster. This means that, for each row in
          class, as soon as a match is found in
          roster, the row in class
          can be discarded.
        
          Antijoin transformations cannot in most cases be applied if
          the expressions being compared are nullable. An exception to
          this rule is that (... NOT IN (SELECT ...)) IS NOT
          FALSE and its equivalent (... IN (SELECT
          ...)) IS NOT TRUE can be transformed into antijoins.
        
Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, view references, or common table expressions.
          In MySQL, a subquery must satisfy these criteria to be handled
          as a semijoin (or an antijoin, if NOT
          modifies the subquery):
- It must be part of an - IN,- = ANY, or- EXISTSpredicate that appears at the top level of the- WHEREor- ONclause, possibly as a term in an- ANDexpression. For example:- SELECT ... FROM ot1, ... WHERE (oe1, ...) IN (SELECT ie1, ... FROM it1, ... WHERE ...);- Here, - ot_and- i- it_represent tables in the outer and inner parts of the query, and- i- oe_and- i- ie_represent expressions that refer to columns in the outer and inner tables.- i- The subquery can also be the argument to an expression modified by - NOT,- IS [NOT] TRUE, or- IS [NOT] FALSE.
- It must not contain a - HAVINGclause.
- It must not contain any aggregate functions (whether it is explicitly or implicitly grouped). 
- It must not have a - LIMITclause.
- The statement must not use the - STRAIGHT_JOINjoin type in the outer query.
- The - STRAIGHT_JOINmodifier must not be present.
- The number of outer and inner tables together must be less than the maximum number of tables permitted in a join. 
- The subquery may be correlated or uncorrelated. Decorrelation looks at trivially correlated predicates in the - WHEREclause of a subquery used as the argument to- EXISTS, and makes it possible to optimize it as if it was used within- IN (SELECT b FROM ...). The term trivially correlated means that the predicate is an equality predicate, that it is the sole predicate in the- WHEREclause (or is combined with- AND), and that one operand is from a table referenced in the subquery and the other operand is from the outer query block.
- The - DISTINCTkeyword is permitted but ignored. Semijoin strategies automatically handle duplicate removal.
- A - GROUP BYclause is permitted but ignored, unless the subquery also contains one or more aggregate functions.
- An - ORDER BYclause is permitted but ignored, since ordering is irrelevant to the evaluation of semijoin strategies.
If a subquery meets the preceding criteria, MySQL converts it to a semijoin (or to an antijoin if applicable) and makes a cost-based choice from these strategies:
- Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query. 
- Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table. 
- FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows. 
- LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group. 
- Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned. For more information about materialization, see Section 10.2.2.2, “Optimizing Subqueries with Materialization”. 
          Each of these strategies can be enabled or disabled using the
          following optimizer_switch
          system variable flags:
- The - semijoinflag controls whether semijoins and antijoins are used.
- If - semijoinis enabled, the- firstmatch,- loosescan,- duplicateweedout, and- materializationflags enable finer control over the permitted semijoin strategies.
- If the - duplicateweedoutsemijoin strategy is disabled, it is not used unless all other applicable strategies are also disabled.
- If - duplicateweedoutis disabled, on occasion the optimizer may generate a query plan that is far from optimal. This occurs due to heuristic pruning during greedy search, which can be avoided by setting- optimizer_prune_level=0.
These flags are enabled by default. See Section 10.9.2, “Switchable Optimizations”.
          The optimizer minimizes differences in handling of views and
          derived tables. This affects queries that use the
          STRAIGHT_JOIN modifier and a view with an
          IN subquery that can be converted to a
          semijoin. The following query illustrates this because the
          change in processing causes a change in transformation, and
          thus a different execution strategy:
        
CREATE VIEW v AS
SELECT *
FROM t1
WHERE a IN (SELECT b
           FROM t2);
SELECT STRAIGHT_JOIN *
FROM t3 JOIN v ON t3.x = v.a;
          The optimizer first looks at the view and converts the
          IN subquery to a semijoin, then checks
          whether it is possible to merge the view into the outer query.
          Because the STRAIGHT_JOIN modifier in the
          outer query prevents semijoin, the optimizer refuses the
          merge, causing derived table evaluation using a materialized
          table.
        
          EXPLAIN output indicates the
          use of semijoin strategies as follows:
- For extended - EXPLAINoutput, the text displayed by a following- SHOW WARNINGSshows the rewritten query, which displays the semijoin structure. (See Section 10.8.3, “Extended EXPLAIN Output Format”.) From this you can get an idea about which tables were pulled out of the semijoin. If a subquery was converted to a semijoin, you should see that the subquery predicate is gone and its tables and- WHEREclause were merged into the outer query join list and- WHEREclause.
- Temporary table use for Duplicate Weedout is indicated by - Start temporaryand- End temporaryin the- Extracolumn. Tables that were not pulled out and are in the range of- EXPLAINoutput rows covered by- Start temporaryand- End temporaryhave their- rowidin the temporary table.
- FirstMatch(in the- tbl_name)- Extracolumn indicates join shortcutting.
- LooseScan(in the- m..- n)- Extracolumn indicates use of the LooseScan strategy.- mand- nare key part numbers.
- Temporary table use for materialization is indicated by rows with a - select_typevalue of- MATERIALIZEDand rows with a- tablevalue of- <subquery.- N>
          A semijoin transformation can also be applied to a
          single-table UPDATE or
          DELETE statement that uses a
          [NOT] IN or [NOT] EXISTS
          subquery predicate, provided that the statement does not use
          ORDER BY or LIMIT, and
          that semijoin transformations are allowed by an optimizer hint
          or by the optimizer_switch
          setting.