WL#8083: Introduce <query expression> parser rule

Affects: Server-8.0   —   Status: Complete

This worklog will introduce the SQL standard's syntactic rule <query expression> and its sub-rules for production of query syntax. These will replace our current bison rules for representing select/union syntax. Note that currently this syntax is duplicated for subqueries. This duplication will be removed, as the <query expression> et.al. will replace both varieties.

The worklog will also add proper handling of the

<table reference>
syntax rule, including the <joined table> rule.

This is a refactoring worklog, and most changes are not visible to clients.

  • F-1: Any number of parentheses must be allowed around any query expressions, also when used as a SELECT statement.
  • F-2: It must be possible to specify right-recursive joins as in the standard, e.g. ... JOIN ... JOIN ... ON ... ON ... .
  • F-3: The syntax NATURAL INNER JOIN is now allowed.
  • F-3: Query expressions used in CREATE VIEW, INSERT and CREATE TABLE statements are not affected by this worklog.
  • NF-1: Apart from the changes mentioned in the functional requirements and "Differences in Syntax" in the HLS, there shall be no changes visible to users.
  • NF-2: No measurable performance regression is expected.

Contents


Part I. The <query expression> rule

Select Statements

This part of the work will introduce the new parser rule <query expression> and sub-rules. Currently, a top-level select is defined as

  select:
    select_init

The <select_init> rule is problematic because it will not properly handle UNION and parentheses. It is also very far removed from the SQL standard. The UNION syntax is not integrated into the query expression syntax, leading to correct syntax being rejected.

Derived Tables

There are several problems with how derived tables are handled:

  • The grammar has huge redundancy, causing different syntax to be allowed on the top level vs. any nested level. Syntax errors also look slightly different depending on whether they happened on the top level or on a sub-level.
  • The grammar is laden with shift-reduce conflicts.
  • In an attempt to include all valid syntax, the derived-table sub-grammar accepts almost everything, and tries to manually throw syntax errors afterwards. For instance, the grammar parses
    SELECT ... FROM ( mytable UNION SELECT ... )
    but throws an error manually when it discovers that it parsed something illegal.
  • The parse trees get very deep for derived tables, which is wasteful of space and time, and is also confusing.
  • The parser maintains a stack of its own to keep track of the current nesting level. This is error-prone in itself, but is something we can live with. However, at the current state the parser cannot tell whether an expression is nested or just in parentheses. A convoluted set of rules decides when to push or pop the context, and they frequently fail, causing bugs that are very time consuming to fix. In the new state, the nesting is specified in the grammar itself. This way the pushing and popping happen unconditionally when entering and leaving the construct where the nesting happens, without any if() statements.

The new rules will do away with all of the above problems. Manual syntax errors will only be thrown when user-friendliness benefits. E.g. omitting a table alias for a derived table will still yield the error message "Every derived table must have its own alias", rather than just "You have an error in your SQL syntax etc...".

Part II. The <joined table> rule

The changes to the <joined table> rule will fix the following problems:

  • NATURAL INNER JOIN is not allowed.

Differences in syntax

No Locking and Procedure Clauses in Union Queries

Procedure clauses were never allowed in queries with unions, even though the error messages are little bit off:

mysql> select 1 from dual procedure analyse() union select 1;;
ERROR 1221 (HY000): Incorrect usage of UNION and SELECT ... PROCEDURE ANALYSE()
ERROR: 
No query specified

mysql> select 1 union select 2 from dual procedure analyse();
ERROR 1221 (HY000): Incorrect usage of PROCEDURE and subquery

For some reason, likely by mistake, a locking clause is allowed in conjunction with union, but this syntax is not in the manual.

The new rule is that both procedure and locking clauses are only allowed on in non-union queries. The error type is always ER_PARSE_ERROR for these queries.

Parentheses and Unions

The parser now accepts parentheses around query expressions. I.e. the syntax (SELECT .. UNION SELECT) is now understood.

NATURAL INNER JOIN

NATURAL INNER JOIN is now allowed syntax. It is equivalent to specifying NATURAL JOIN

Proper Nesting of Joins

In standard SQL, joins can be both right- and left-associative. They are normally left-associative, e.g.

  SELECT ..
  FROM table JOIN table ON ...
             JOIN table ON ...
             JOIN table ON ...
    ...
  WHERE ...

They can also be right-associative as in:

  SELECT ..
  FROM table JOIN table JOIN table JOIN table ON .. ON .. ON ...
    ...
  WHERE ...

In order to support ON ... ON earlier, parentheses where required around the second join.

After this worklog, full standard compliance is achieved. However, the following MySQL extension is preserved: Whereas in standard SQL, there is a difference between inner joins and cross joins, there is no syntactical difference between them in MySQL. All of these join syntaxes are valid:

  table JOIN table          -- Meaning CROSS JOIN, non-standard.
  table JOIN table ON/USING -- Meaning INNER JOIN, standard compliant.
  table CROSS JOIN table          -- Meaning CROSS JOIN, standard compliant.
  table CROSS JOIN table ON/USING -- Meaning INNER JOIN, non-standard.

Worthy of note is that in e.g. the query

SELECT * FROM t1 JOIN t2 CROSS JOIN t3 ON t2.a=t3.a

the join condition is attached to the latter cross join rather than the former join.

Different errors for SQL_CACHE/SQL_NO_CACHE

According to the manual, "...these options are not permitted in subqueries (including subqueries in the FROM clause), and SELECT statements in unions other than the first SELECT." The truth is quite different. In current trunk we see, at the time of writing, the following:

Example 1

mysql> CREATE TABLE t1 ( SQL_NO_CACHE int ); Query OK, 0 rows affected (0,02 sec)

The above is allowed.

Example 2

mysql> SELECT SQL_NO_CACHE FROM t1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM t1' at line 1

The above is not allowed, against the manual.

Example 3

mysql> SELECT (SELECT SQL_NO_CACHE FROM t1);
+-------------------------------+
| (SELECT SQL_NO_CACHE FROM t1) |
+-------------------------------+
|                          NULL |
+-------------------------------+
1 row in set (0,00 sec)

The above is allowed, against the manual.

Example 4

mysql> SELECT 1 FROM (SELECT SQL_NO_CACHE 1) alias;
ERROR 1234 (42000): Incorrect usage/placement of 'SQL_NO_CACHE'

The above is allowed, against the manual.

Example 5

mysql> SELECT 1 FROM t1 WHERE 1 IN (SELECT SQL_NO_CACHE FROM t1);
Empty set (0,01 sec)

The above is allowed, against the manual.

Example 6

mysql> SELECT SQL_NO_CACHE FROM t1 UNION SELECT 1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM t1 UNION SELECT 1' at line 1

The above follows the manual.

Example 7

mysql> SELECT 1 UNION SELECT SQL_NO_CACHE FROM t1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1

The above follows the manual.

There are eight (!) shift/reduce conflicts in trunk around SQL_CACHE/SQL_NO_CACHE alone. No doubt the above behaviour stems from how Bison resolves these conflicts. For this reason it's inevitable that any change to the select syntax rules will change how the conflicts are resolved. After this work, the count goes down to six, but the examples 3 and 5 change:

Example 3 After This Work

mysql> SELECT (SELECT SQL_NO_CACHE FROM t1);                                    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM t1)' at line 1

This is now closer to the manual, even though the error message is not obvious.

Example 5 After This Work

mysql> SELECT 1 FROM t1 WHERE 1 IN (SELECT SQL_NO_CACHE FROM t1);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM t1)' at line 1

This is now closer to the manual, even though the error message is not obvious.

Summary

The following cases were allowed before the worklog, but not after:

  • SELECT (SELECT SQL_NO_CACHE FROM t1)
  • SELECT 1 FROM t1 WHERE 1 IN (SELECT SQL_NO_CACHE FROM t1)

They were never allowed according to the manual.