Certain optimizations are applicable to comparisons that use
          the IN (or =ANY)
          operator to test subquery results. This section discusses
          these optimizations, particularly with regard to the
          challenges that NULL values present. The
          last part of the discussion suggests how you can help the
          optimizer.
        
Consider the following subquery comparison:
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
          MySQL evaluates queries “from outside to inside.”
          That is, it first obtains the value of the outer expression
          outer_expr, and then runs the
          subquery and captures the rows that it produces.
        
          A very useful optimization is to “inform” the
          subquery that the only rows of interest are those where the
          inner expression inner_expr is
          equal to outer_expr. This is done
          by pushing down an appropriate equality into the
          subquery's WHERE clause to make it
          more restrictive. The converted comparison looks like this:
        
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.
          More generally, a comparison of N
          values to a subquery that returns
          N-value rows is subject to the same
          conversion. If oe_i and
          ie_i represent corresponding outer
          and inner expression values, this subquery comparison:
        
(oe_1, ..., oe_N) IN
  (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)Becomes:
EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND oe_1 = ie_1
                          AND ...
                          AND oe_N = ie_N)For simplicity, the following discussion assumes a single pair of outer and inner expression values.
The “pushdown” strategy just described works if either of these conditions is true:
outer_exprandinner_exprcannot beNULL.You need not distinguish
NULLfromFALSEsubquery results. If the subquery is a part of anORorANDexpression in theWHEREclause, MySQL assumes that you do not care. Another instance where the optimizer notices thatNULLandFALSEsubquery results need not be distinguished is this construct:... WHERE outer_expr IN (subquery)In this case, the
WHEREclause rejects the row whetherIN (returnssubquery)NULLorFALSE.
          Suppose that outer_expr is known to
          be a non-NULL value but the subquery does
          not produce a row such that
          outer_expr =
          inner_expr. Then
           evaluates as follows:
outer_expr IN (SELECT
          ...)
          In this situation, the approach of looking for rows with
           is no longer
          valid. It is necessary to look for such rows, but if none are
          found, also look for rows where
          outer_expr =
          inner_exprinner_expr is
          NULL. Roughly speaking, the subquery can be
          converted to something like this:
        
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND
        (outer_expr=inner_expr OR inner_expr IS NULL))
          The need to evaluate the extra IS
          NULL condition is why MySQL has the
          ref_or_null access method:
        
mysql> EXPLAIN
       SELECT outer_expr IN (SELECT t2.maybe_null_key
                             FROM t2, t3 WHERE ...)
       FROM t1;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: ref_or_null
possible_keys: maybe_null_key
          key: maybe_null_key
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Using index
...
          The unique_subquery and
          index_subquery
          subquery-specific access methods also have “or
          NULL” variants.
        
          The additional OR ... IS NULL condition
          makes query execution slightly more complicated (and some
          optimizations within the subquery become inapplicable), but
          generally this is tolerable.
        
          The situation is much worse when
          outer_expr can be
          NULL. According to the SQL interpretation
          of NULL as “unknown value,”
          NULL IN (SELECT  should evaluate to:
inner_expr
          ...)
          For proper evaluation, it is necessary to be able to check
          whether the SELECT has produced
          any rows at all, so
           cannot be
          pushed down into the subquery. This is a problem because many
          real world subqueries become very slow unless the equality can
          be pushed down.
        outer_expr =
          inner_expr
          Essentially, there must be different ways to execute the
          subquery depending on the value of
          outer_expr.
        
          The optimizer chooses SQL compliance over speed, so it
          accounts for the possibility that
          outer_expr might be
          NULL:
If
outer_exprisNULL, to evaluate the following expression, it is necessary to execute theSELECTto determine whether it produces any rows:NULL IN (SELECT inner_expr FROM ... WHERE subquery_where)It is necessary to execute the original
SELECThere, without any pushed-down equalities of the kind mentioned previously.On the other hand, when
outer_expris notNULL, it is absolutely essential that this comparison:outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)Be converted to this expression that uses a pushed-down condition:
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)Without this conversion, subqueries are slow.
To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within “trigger” functions. Thus, an expression of the following form:
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)Is converted into:
EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(outer_expr=inner_expr))More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:
(oe_1, ..., oe_N) IN (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)And converts it to this expression:
EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(oe_1=ie_1)
                          AND ...
                          AND trigcond(oe_N=ie_N)
       )
          Each trigcond(
          is a special function that evaluates to the following values:
X)
Xwhen the “linked” outer expressionoe_iis notNULLTRUEwhen the “linked” outer expressionoe_iisNULL
            Trigger functions are not triggers of
            the kind that you create with CREATE
            TRIGGER.
          Equalities that are wrapped within
          trigcond() functions are not first class
          predicates for the query optimizer. Most optimizations cannot
          deal with predicates that may be turned on and off at query
          execution time, so they assume any
          trigcond( to
          be an unknown function and ignore it. Triggered equalities can
          be used by those optimizations:
X)
Reference optimizations:
trigcond(can be used to constructX=Y[ORYIS NULL])ref,eq_ref, orref_or_nulltable accesses.Index lookup-based subquery execution engines:
trigcond(can be used to constructX=Y)unique_subqueryorindex_subqueryaccesses.Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.
          When the optimizer uses a triggered condition to create some
          kind of index lookup-based access (as for the first two items
          of the preceding list), it must have a fallback strategy for
          the case when the condition is turned off. This fallback
          strategy is always the same: Do a full table scan. In
          EXPLAIN output, the fallback
          shows up as Full scan on NULL key in the
          Extra column:
        
mysql> EXPLAIN SELECT t1.col1,
       t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
        ...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: index_subquery
possible_keys: key1
          key: key1
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Full scan on NULL key
          If you run EXPLAIN followed by
          SHOW WARNINGS, you can see the
          triggered condition:
        
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`t1`.`col1` AS `col1`,
         <in_optimizer>(`test`.`t1`.`col1`,
         <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
         on key1 checking NULL
         where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
         trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
         `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
         from `test`.`t1`
          The use of triggered conditions has some performance
          implications. A NULL IN (SELECT ...)
          expression now may cause a full table scan (which is slow)
          when it previously did not. This is the price paid for correct
          results (the goal of the trigger-condition strategy is to
          improve compliance, not speed).
        
          For multiple-table subqueries, execution of NULL IN
          (SELECT ...) is particularly slow because the join
          optimizer does not optimize for the case where the outer
          expression is NULL. It assumes that
          subquery evaluations with NULL on the left
          side are very rare, even if there are statistics that indicate
          otherwise. On the other hand, if the outer expression might be
          NULL but never actually is, there is no
          performance penalty.
        
To help the query optimizer better execute your queries, use these suggestions:
Declare a column as
NOT NULLif it really is. This also helps other aspects of the optimizer by simplifying condition testing for the column.If you need not distinguish a
NULLfromFALSEsubquery result, you can easily avoid the slow execution path. Replace a comparison that looks like this:outer_expr [NOT] IN (SELECT inner_expr FROM ...)with this expression:
(outer_expr IS NOT NULL) AND (outer_expr [NOT] IN (SELECT inner_expr FROM ...))Then
NULL IN (SELECT ...)is never evaluated because MySQL stops evaluatingANDparts as soon as the expression result is clear.Another possible rewrite:
[NOT] EXISTS (SELECT inner_expr FROM ... WHERE inner_expr=outer_expr)
          The
          subquery_materialization_cost_based
          flag of the optimizer_switch
          system variable enables control over the choice between
          subquery materialization and
          IN-to-EXISTS subquery
          transformation. See
          Section 10.9.2, “Switchable Optimizations”.