13.1.1 Scalar IN Subquery

To rewrite a scalar IN subquery, the Item_in_subselect::single_value_transformer method is used. The scalar IN subquery will be replaced with an Item_in_optimizer item.

An Item_in_optimizer item is a special boolean function. On a value request (one of val, val_int, or val_str methods) it evaluates the left expression of the IN by storing its value in a cache item (one of Item_cache* items), then it tests the cache to see whether it is NULL. If left expression (cache) is NULL, then Item_in_optimizer returns NULL, else it evaluates Item_in_subselect.

Example queries.

a) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2);
b) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2 GROUP BY t2.a);
  • Item_in_subselect inherits the mechanism for getting a value from Item_exists_subselect.

  • Select_transformer stores a reference to the left expression in its conditions:

(in WHERE and HAVING in case 'a' and in HAVING in case 'b')
  • Item from item list of this select (t2.a) can be referenced with a special reference (Item_ref_null_helper or Item_null_helper). This reference informs Item_in_optimizer whether item (t2.a) is NULL by setting the was_null flag.

  • The return value from Item_in_subselect will be evaluated as follows:

    • If TRUE, return true

    • If NULL, return null (that is, unknown)

    • If FALSE, and was_null is set, return null

    • Return FALSE

<left_expression> IN (SELECT <item> ...) will be represented as follows:

           |                                  |
+-----------------------+             +-----------------+
|   <left_expression>   |             |Item_in_subselect|
|                       |             +-----------------+
+-----------------------+                      |
|<left_expression cache>|          +-----------+-----------+
|                       |          |                       |
+-----------------------+          |                       |
           ^                 +----------+        +--------------------+
           +<<<<<<<<<<<<<<<<<| Item_ref | +<<<|Item_ref_null_helper|
                             +----------+    V   +--------------------+
                                             V   +--------------------+
                                             +>>>| <item>       |

where <<<<<<<<< is reference in meaning of Item_ref.

Item_ref is used to point to <left_expression cache>, because at the time of transformation we know only the address of the variable where the cache pointer will be stored.

If the select statement has an ORDER BY clause, it will be wiped out, because there is no sense in ORDER BY without LIMIT here.

If IN subquery union, the condition of every select in the UNION will be changed individually.

If a condition needs to be added to the WHERE clause, it will be presented as (item OR item IS NULL) and Item_is_not_null_test(item) will be added to the HAVING clause. Item_is_not_null_test registers a NULL value the way Item_ref_null_helper does it, and returns FALSE if the argument is NULL. With the above trick, we will register NULL value of Item even for the case of index optimization of a WHERE clause (case 'a' in the following example).

The following are examples of IN transformations:

  • Example 1:

<left_expression> IN (SELECT <item> FROM t WHERE <where_exp>)

If returning NULL correctly would make sense, the above will become:

    <where_exp> and
    (Item_ref(<cached_left_expression>)=<item> or      <Item> is null)
  HAVING Item_is_not_null_test(<item>))

When a subquery is marked as the top item of the WHERE clause, it will become:

     <where_exp> and

Example 2:

<left_expression> IN (SELECT <item> FROM t
                           HAVING <having_expr>

will be represented as

(SELECT <item> as ref_null_helper FROM t
   HAVING <having_exp> AND
     Item_ref(<cached_left_expression>) = Item_ref_null_helper(item))
  • Example 3:

<left_expression> IN (SELECT <item> UNION ...)

will become

   HAVING Item_ref(<cached_left_expression>)=           <Item_null_helper(<Item>)>
 UNION ...)

(HAVING without FROM is a syntax error, but a HAVING condition is checked even for subquery without FROM)

  • Example 4:

<left_expression> IN (select <item>)

will be completely replaced with <left_expression> = <item>

Now conditions (WHERE (a) or HAVING (b)) will be changed, depending on the select, in the following way:

If subquery contains a HAVING clause, SUM() function or GROUP BY (example 1), then the item list will be unchanged and an Item_ref_null_helper reference will be created on item list element. A condition will be added to the HAVING.

If the subquery does not contain HAVING, SUM() function, or GROUP BY (example 2), then:

  • item list will be replaced with 1.

  • left_expression cache> = <item> or is null <item> will be added to the WHERE clause and a special is_not_null(item) will be added to the HAVING, so null values will be registered. If returning NULL wouldn't make correct sense, then only left_expression cache> = <item> will be added to the WHERE clause. If this subquery does not contain a FROM clause or if the subquery contains UNION (example 3), then left_expression cache> = Item_null_helper(<item>) will be added to the HAVING clause.

A single select without a FROM clause will be reduced to just <left_expression> = <item> without use of Item_in_optimizer.