MySQL 5.5 Reference Manual  /  Restrictions and Limits  /  Restrictions on Subqueries

C.4 Restrictions on Subqueries

  • Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.

    A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

    The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:

    SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);

    The optimizer rewrites the statement to a correlated subquery:

    SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

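    If the inner and outer queries return M and N rows, respectively, the correlated subquery is executed once for each of the N outer rows, and each execution can examine up to M inner rows. The execution time is therefore O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery that is evaluated only once.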

    An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.
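
    As an illustration, the following sketch uses two small hypothetical tables, t1_demo and t2_demo. No NULL values are involved, so the IN subquery, the EXISTS form produced by the rewrite, and the equivalent IN(value_list) all return the same rows (2 and 4). They differ only in how MySQL 5.5 evaluates them. The EXPLAIN comment describes typical 5.5 output and is not guaranteed for every table definition.

```sql
-- Hypothetical demo tables, for illustration only.
DROP TABLE IF EXISTS t1_demo;
DROP TABLE IF EXISTS t2_demo;
CREATE TABLE t1_demo (a INT);
CREATE TABLE t2_demo (b INT);
INSERT INTO t1_demo VALUES (1), (2), (3), (4), (5);
INSERT INTO t2_demo VALUES (2), (4);

-- 1. As written: an uncorrelated IN subquery.
SELECT a FROM t1_demo WHERE a IN (SELECT b FROM t2_demo);

-- 2. The correlated form the optimizer rewrites it to. The inner
--    SELECT refers to t1_demo.a, so it runs once per outer row.
SELECT a FROM t1_demo
  WHERE EXISTS (SELECT 1 FROM t2_demo WHERE t2_demo.b = t1_demo.a);

-- 3. The same values written as an IN(value_list).
SELECT a FROM t1_demo WHERE a IN (2, 4);

-- In MySQL 5.5, EXPLAIN typically labels the subquery of query 1
-- as DEPENDENT SUBQUERY, which reflects the correlated rewrite.
EXPLAIN SELECT a FROM t1_demo WHERE a IN (SELECT b FROM t2_demo);
```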

  • In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:

    DELETE FROM t WHERE ... (SELECT ... FROM t ...);
    UPDATE t ... WHERE col = (SELECT ... FROM t ...);

    Exception: The preceding prohibition does not apply if the subquery reads the modified table through a derived table (a subquery in the FROM clause) and that derived table is materialized rather than merged into the outer query. Example:

    UPDATE t ... WHERE col = (SELECT * FROM (SELECT ... FROM t ...) AS _t ...);

    Here the result from the derived table is materialized as a temporary table, so the relevant rows in t have already been selected by the time the update to t takes place.
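
    A minimal sketch of the rule and its exception, using a hypothetical table named scores. The commented-out UPDATE reads its own target table in a subquery and is rejected. In MySQL 5.5 this is error 1093, "You can't specify target table ... for update in FROM clause". The second UPDATE wraps the read in a derived table, following the pattern of the preceding example, and is accepted.

```sql
-- Hypothetical table, for illustration only.
DROP TABLE IF EXISTS scores;
CREATE TABLE scores (id INT PRIMARY KEY, points INT, top_flag INT);
INSERT INTO scores VALUES (1, 10, 0), (2, 30, 0), (3, 30, 0), (4, 20, 0);

-- Rejected: the subquery selects from the table being updated.
-- UPDATE scores SET top_flag = 1
--   WHERE points = (SELECT MAX(points) FROM scores);

-- Accepted: the derived table _t is materialized into a temporary
-- table before any row of scores is changed.
UPDATE scores SET top_flag = 1
  WHERE points = (SELECT max_points
                    FROM (SELECT MAX(points) AS max_points FROM scores) AS _t);

-- Rows 2 and 3 (both with the maximum of 30 points) are flagged.
SELECT id FROM scores WHERE top_flag = 1 ORDER BY id;
```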

  • Row comparison operations are only partially supported:

    • For expr [NOT] IN subquery, expr can be an n-tuple (specified using row constructor syntax) and the subquery can return rows of n-tuples. The permitted syntax is therefore more specifically expressed as row_constructor [NOT] IN table_subquery.

    • For expr op {ALL|ANY|SOME} subquery, expr must be a scalar value and the subquery must be a column subquery; it cannot return multiple-column rows.

    In other words, for a subquery that returns rows of n-tuples, this is supported:

    (expr_1, ..., expr_n) [NOT] IN table_subquery

    But this is not supported:

    (expr_1, ..., expr_n) op {ALL|ANY|SOME} subquery

    The reason for supporting row comparisons for IN but not for the others is that IN is implemented by rewriting it as a sequence of = comparisons and AND operations. This approach cannot be used for ALL, ANY, or SOME.
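
    The sketch below, with hypothetical tables orders_demo and shipped_demo, shows the supported row-constructor forms. When used with a subquery, IN is an alias for = ANY and NOT IN is an alias for <> ALL. As a result, those two particular row comparisons can be expressed with IN and NOT IN even though the ANY/ALL spelling is rejected. Other operators, such as < ANY, have no such direct substitute.

```sql
-- Hypothetical demo tables, for illustration only.
DROP TABLE IF EXISTS orders_demo;
DROP TABLE IF EXISTS shipped_demo;
CREATE TABLE orders_demo (cust INT, item INT);
CREATE TABLE shipped_demo (cust INT, item INT);
INSERT INTO orders_demo VALUES (1, 10), (1, 11), (2, 10);
INSERT INTO shipped_demo VALUES (1, 10), (2, 10);

-- Supported: row constructor with IN and a table subquery.
-- Returns (1, 10) and (2, 10).
SELECT cust, item FROM orders_demo
  WHERE (cust, item) IN (SELECT cust, item FROM shipped_demo);

-- Supported: row constructor with NOT IN. Returns (1, 11).
SELECT cust, item FROM orders_demo
  WHERE (cust, item) NOT IN (SELECT cust, item FROM shipped_demo);

-- Not supported: a row constructor with op {ALL|ANY|SOME}, e.g.
--   WHERE (cust, item) = ANY (SELECT cust, item FROM shipped_demo)
-- The = ANY case is equivalent to the IN query above.
```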

  • Subqueries in the FROM clause cannot be correlated subqueries. They are materialized in whole (evaluated to produce a result set) before evaluating the outer query, so they cannot be evaluated per row of the outer query.
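
    When a per-row value from the FROM clause is needed, one common workaround is to compute the value for every key in an uncorrelated derived table and then join it back on that key. The sketch below illustrates this with hypothetical tables cust_demo and pay_demo.

```sql
-- Hypothetical demo tables, for illustration only.
DROP TABLE IF EXISTS cust_demo;
DROP TABLE IF EXISTS pay_demo;
CREATE TABLE cust_demo (id INT PRIMARY KEY, name VARCHAR(20));
CREATE TABLE pay_demo (cust_id INT, amount INT);
INSERT INTO cust_demo VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO pay_demo VALUES (1, 5), (1, 7), (2, 4);

-- Not allowed: the derived table refers to c.id from the outer query.
-- SELECT c.name, p.total
--   FROM cust_demo AS c,
--        (SELECT SUM(amount) AS total FROM pay_demo
--          WHERE pay_demo.cust_id = c.id) AS p;

-- Uncorrelated alternative: aggregate once per cust_id in the derived
-- table, then join on that column. Customers without payments get 0.
SELECT c.name, COALESCE(p.total, 0) AS total
  FROM cust_demo AS c
  LEFT JOIN (SELECT cust_id, SUM(amount) AS total
               FROM pay_demo GROUP BY cust_id) AS p
    ON p.cust_id = c.id
  ORDER BY c.id;
```

    A correlated scalar subquery in the select list is another option, such as (SELECT SUM(amount) FROM pay_demo WHERE pay_demo.cust_id = c.id). This works because the restriction applies only to subqueries in the FROM clause.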

  • MySQL does not support LIMIT in subqueries used with the IN, ALL, ANY, or SOME operators:

    mysql> SELECT * FROM t1
        ->   WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);
    ERROR 1235 (42000): This version of MySQL doesn't yet support
     'LIMIT & IN/ALL/ANY/SOME subquery'
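
    A widely used workaround is to move the LIMIT into a derived table, which this restriction does not cover, and to select from that derived table in the IN subquery. The sketch below uses hypothetical tables lim_t1 and lim_t2. It orders by the inner column s2 so that the rows kept by LIMIT are well defined.

```sql
-- Hypothetical demo tables, for illustration only.
DROP TABLE IF EXISTS lim_t1;
DROP TABLE IF EXISTS lim_t2;
CREATE TABLE lim_t1 (s1 INT);
CREATE TABLE lim_t2 (s2 INT);
INSERT INTO lim_t1 VALUES (1), (2), (3);
INSERT INTO lim_t2 VALUES (3), (1), (2);

-- Rejected with error 1235:
-- SELECT * FROM lim_t1
--   WHERE s1 IN (SELECT s2 FROM lim_t2 ORDER BY s2 LIMIT 2);

-- Accepted: LIMIT is applied inside the derived table lim, and the
-- IN subquery itself has no LIMIT. Returns s1 = 1 and s1 = 2.
SELECT * FROM lim_t1
  WHERE s1 IN (SELECT s2
                 FROM (SELECT s2 FROM lim_t2 ORDER BY s2 LIMIT 2) AS lim);
```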

  • The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.

    An exception occurs for the case where an IN subquery can be rewritten as a SELECT DISTINCT join. Example:

    SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);

    That statement can be rewritten as follows:

    SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col2 AND condition;

    But in this case, the join requires an extra DISTINCT operation and is not more efficient than the subquery.
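
    The following sketch shows why the join needs DISTINCT. It uses hypothetical tables j_t1 and j_t2, with flag = 1 standing in for condition. The value 1 occurs twice in j_t2, so the plain join produces 'a' twice, whereas the IN subquery returns each j_t1 row at most once.

```sql
-- Hypothetical demo tables, for illustration only.
DROP TABLE IF EXISTS j_t1;
DROP TABLE IF EXISTS j_t2;
CREATE TABLE j_t1 (id_col INT, col VARCHAR(10));
CREATE TABLE j_t2 (id_col2 INT, flag INT);
INSERT INTO j_t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- id_col2 = 1 appears twice, so a plain join would repeat 'a'.
INSERT INTO j_t2 VALUES (1, 1), (1, 1), (2, 0), (3, 1);

-- Subquery form: returns 'a' and 'c'.
SELECT col FROM j_t1
  WHERE id_col IN (SELECT id_col2 FROM j_t2 WHERE flag = 1);

-- Join form: without DISTINCT this returns 'a', 'a', 'c';
-- DISTINCT reduces it to 'a' and 'c'.
SELECT DISTINCT col FROM j_t1, j_t2
  WHERE j_t1.id_col = j_t2.id_col2 AND flag = 1;
```

    DISTINCT also merges t1 rows that happen to share the same col value, which the IN form would return separately. As a result, the join rewrite matches the subquery exactly only when such duplicates cannot occur or do not matter.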

  • MySQL permits a subquery to refer to a stored function that has data-modifying side effects such as inserting rows into a table. For example, if f() inserts rows, the following query can modify data:

    SELECT ... WHERE x IN (SELECT f() ...);

    This behavior is an extension to the SQL standard. In MySQL, it can produce indeterminate results because f() might be executed a different number of times for different executions of a given query depending on how the optimizer chooses to handle it.

    For statement-based or mixed-format replication, one implication of this indeterminism is that such a query can produce different results on the master and its slaves.
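
    The sketch below makes the side effect visible. It uses a hypothetical function f_demo() that records each call in a hypothetical table f_calls before returning 1. The script is written for the mysql command-line client (hence DELIMITER) and requires the CREATE ROUTINE privilege. If binary logging is enabled, CREATE FUNCTION may also fail with error 1418 unless log_bin_trust_function_creators is set. This happens because the function is not declared DETERMINISTIC, NO SQL, or READS SQL DATA.

```sql
-- Hypothetical tables and function, for illustration only.
DROP TABLE IF EXISTS f_calls;
DROP TABLE IF EXISTS g_outer;
DROP TABLE IF EXISTS g_inner;
DROP FUNCTION IF EXISTS f_demo;
CREATE TABLE f_calls (id INT AUTO_INCREMENT PRIMARY KEY);
CREATE TABLE g_outer (x INT);
CREATE TABLE g_inner (y INT);
INSERT INTO g_outer VALUES (1), (2), (3);
INSERT INTO g_inner VALUES (10), (20);

DELIMITER //
-- Each call inserts one row into f_calls, then returns 1.
CREATE FUNCTION f_demo() RETURNS INT
  MODIFIES SQL DATA
BEGIN
  INSERT INTO f_calls () VALUES ();
  RETURN 1;
END //
DELIMITER ;

-- A query that reads data but also inserts rows through f_demo().
SELECT x FROM g_outer WHERE x IN (SELECT f_demo() FROM g_inner);

-- The number of calls depends on how the optimizer executes the
-- query, so this count should not be relied on.
SELECT COUNT(*) AS calls FROM f_calls;
```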

