WL#8067: True bottom-up server parser: refactoring of the CREATE TABLE statement

Affects: Server-8.0   —   Status: Complete

This WL is intended to refactor CREATE TABLE statement-related parser grammar rules in a pure bottom-up style to make it (grammar) context-independent for the better maintainability and extendability at the first place.

Functional requirements

  • F1-1: See minor changes in the "Changes visible to users" paragraph in the HLS.

Non-functional requirements

  • NF-1: Apart from the changes mentioned in the "Changes visible to users" paragraph in the HLS, there shall be no changes visible to users.
  • NF-2: No performance regressions.

Contents


Changes to the documentation

  • parse_gcol_expr and its upper-cased variations are not keywords any more (neither reserved nor non-reserved).

Changes visible to users

Bugfixes

B-1 (gcol: NOT NULL NULL)

Generated column definition in CREATE/ALTER TABLE ignored the NULL attribute. Since the default value of the "nullness" attribute is true, everything worked as expected, if a column definition had none or only one NULL/NOT NULL attribute.

However the result of the "NOT NULL NULL" sequence was "NOT NULL".

The WL fixes this bug harmonizing NULL/NOT NULL attribute processing for both regular and generated column types.

 mysql> CREATE TABLE t2 (i INT, v INT AS (1) NOT NULL NULL);
 Query OK, 0 rows affected (0,03 sec)

Before

Note the "VIRTUAL NOT NULL" text:

 mysql> SHOW CREATE TABLE t2\G
 *************************** 1. row ***************************
        Table: t2
 Create Table: CREATE TABLE `t2` (
   `i` int(11) DEFAULT NULL,
   `v` int(11) GENERATED ALWAYS AS (1) VIRTUAL NOT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0,00 sec)

After

 mysql> SHOW CREATE TABLE t2\G
 *************************** 1. row ***************************
        Table: t2
 Create Table: CREATE TABLE `t2` (
   `i` int(11) DEFAULT NULL,
   `v` int(11) GENERATED ALWAYS AS (1) VIRTUAL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0,01 sec)

Differences in syntax

S-1 (parse_gcol_expr as a label)

Before the current WL the parse_gcol_expr was a non-reserved keyword with a limitation: it couldn't be used as a SP/SF label:

 mysql>   DELIMITER |
 mysql>   CREATE PROCEDURE p1()
     ->   BEGIN
     ->   parse_gcol_expr: LOOP
     ->     SELECT 1;
     ->   END LOOP parse_gcol_expr;
     ->   END|

Before

 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ': LOOP
   SELECT 1;
 END LOOP parse_gcol_expr;
 END' at line 3

After

No syntax error: parse_gcol_expr is a usual identifier, i.e. it can be used everywhere without limitations including SP/SF labels:

 Query OK, 0 rows affected (0,01 sec)

S-2 (CREATE TEMPORARY TEMPORARY...)

Removal of the undocumented nonsensical syntax: CREATE TEMPORARY TEMPORARY...

Before

 mysql> CREATE TEMPORARY TEMPORARY TEMPORARY TABLE very_temporary_table (i INT);
 Query OK, 0 rows affected (0,01 sec)
 mysql> SHOW CREATE TABLE very_temporary_table\G
 *************************** 1. row ***************************
        Table: very_temporary_table
 Create Table: CREATE TEMPORARY TABLE `very_temporary_table` (
   `i` int(11) DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0,00 sec)

After

 mysql> CREATE TEMPORARY TEMPORARY very_temporary_table (i INT);
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'TEMPORARY very_temporary_table (i INT)' at line 1

Differences in stored DD data/I_S output

I-1

 mysql> CREATE TABLE t1 (purchased DATE)
     -> PARTITION BY RANGE( YEAR(purchased) )
     ->   SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
     ->     PARTITION p0 VALUES LESS THAN (1990),
     ->     PARTITION p2 VALUES LESS THAN MAXVALUE
     ->   );
 Query OK, 0 rows affected (0,06 sec)

Before

Note: there were redundant space character before hash and range expressions (we borrowed them from the original CREATE statement without trimming them out):

 mysql> SHOW CREATE TABLE t1\G
 *************************** 1. row ***************************
        Table: t1
 Create Table: CREATE TABLE `t1` (
   `purchased` date DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 /*!50100 PARTITION BY RANGE ( YEAR(purchased))
 SUBPARTITION BY HASH ( TO_DAYS(purchased))
 (PARTITION p0 VALUES LESS THAN (1990) ENGINE = InnoDB,
  PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
 1 row in set (0,00 sec)

After

Note: no more redundant space characters:

 mysql> SHOW CREATE TABLE t1\G
 *************************** 1. row ***************************
        Table: t1
 Create Table: CREATE TABLE `t1` (
   `purchased` date DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 /*!50100 PARTITION BY RANGE (YEAR(purchased))
 SUBPARTITION BY HASH (TO_DAYS(purchased))
 (PARTITION p0 VALUES LESS THAN (1990) ENGINE = InnoDB,
  PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
 1 row in set (0,00 sec)

Differences in error messages

E-1

 mysql> PARTITION foo bar baz;

Before

 ERROR 1064 (42000): Partitioning can not be used stand-alone in query near 'PARTITION foo bar baz' at line 1

Note: there was a special grammar branch to parse internal DD expressions with the "PARTITION" prefix word. The error above guarded a regular SQL from accepting such a quasi-statement.

After

 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PARTITION foo bar baz' at line 1

Note: from now we process the "PARTITION" quasi-statement in a separate grammar level, so currently this is a regular syntax error from the regular SQL parser's point of view.

E-2

 ERROR 1064 (42000): Partitioning can not be used stand-alone in query near 'PARTITION foo bar baz' at line 1
 mysql>   CREATE TABLE t1 (a int)
     ->     PARTITION BY RANGE (a)
     ->     (
     ->       PARTITION p0 VALUES LESS THAN (1),
     ->       PARTITION p1 VALU ES LESS THAN (2)
     ->     );

Before

Note: there is an obvious syntax error (mistyped keyword): "VALU ES", but we defer its reporting for a while and report a confusing semantic error message instead first:

 ERROR 1479 (HY000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition

After

Note: from now a mistyped keyword here is just a regular syntax error:

 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'VALU ES LESS THAN (2)
     )' at line 5

E-3

 mysql> CREATE TABLE t1 (a int)
     -> partition by key (a)
     -> partitions 3
     -> (partition x1, partition x2);

Before

Note: error message reports a position in the query at the closing parentheses of the wrong clause:

 ERROR 1064 (42000): Wrong number of partitions defined, mismatch with previous setting near ')' at line 4

After

Note: error message reports a position in the query at the opening partenthes of the wrong clause:

 ERROR 1064 (42000): Wrong number of partitions defined, mismatch with previous setting near '(partition x1, partition x2)' at line 4

E-4

 mysql> CREATE TABLE t1 (a int)
     -> partition by range (rand(a))
     -> partitions 2
     -> (partition x1 values less than (0), partition x2 values less than (2));

Before

Note: error message reports a position in the query right after a not allowed expression:

 ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near ')
 partitions 2
 (partition x1 values less than (0), partition x2 values less than' at line 2

After

Note: error message reports a position in the query at the wrong expression:

 ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near 'rand(a))
 partitions 2
 (partition x1 values less than (0), partition x2 values le' at line 2

E-5

 mysql> CREATE TABLE t1 (a int)
     -> partition by range (a)
     -> partitions 2
     -> (partition x1 values less than (4),
     -> partition x2);

Before

Note: error message doesn't point to a wrong clause in the query:

 ERROR 1479 (HY000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition

After

Note: error message points to a wrong clause:

 ERROR 1064 (42000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition near '
 partition x2)' at line 4

Similar cases

 mysql> CREATE TABLE t1 (a int)
     -> partition by list (a)
     -> partitions 2
     -> (partition x1 values in (4),
     -> partition x2);

Before:

 ERROR 1479 (HY000): Syntax error: LIST PARTITIONING requires definition of VALUES IN for each partition

After:

 ERROR 1064 (42000): Syntax error: LIST PARTITIONING requires definition of VALUES IN for each partition near '
 partition x2)' at line 4

Contents


Changes in AST

LEX

  • LEX::create_info has been converted to a pointer for a dynamic allocation.

Removal of intermediate/redundant values

  • LEX::length
  • LEX::dec
  • LEX::change
  • LEX::default_value
  • LEX::on_update_value
  • LEX::charset
  • LEX::ref_list
  • LEX::interval_list
  • LEX::parse_gcol_expr
  • LEX::uint_geom_type
  • LEX::fk_match_option
  • LEX::fk_update_opt
  • LEX::fk_delete_opt
  • LEX::comment (*)

(*) Note: BINLOG only statement specific string has been introduced instead: LEX::binlog_base64_str

Item tree

Item_null

Before the current WL the Item_null class did the trick to force the EXPLICIT_NULL_FLAG in LEX::type unconditionally.

The current WL removes that trick as well as the Item_null::itemize() function.

PTI_num_literal_num, Item_int

The PTI_num_literal_num node class has been replaced with a more natural Item_int class (by introducing a new Item_int constructor) and removed.

Changes in the Parse tree (also see "New files" below)

The Parse_tree_node_tmpl template has been introduced to allow custom parse context objects in some parts of the parse tree.

For example, partitioning-specific parser does a lot of parse-time reordering of internal partitioning structures in place of the preallocated LEX::part_info object.

A custom parse context allows to do all that intermediate allocation and reordering work in the context object, then we assign the result to LEX::part_info for use in the resolver & executor.

New parse tree node classes

Field identifiers

PT_field_ident Trivial 1D field identifier
PT_field_ident_3d 2D or 3D field identifier (can include DB and table parts)

Table-wide options in CREATE/ALTER TABLE statements

Class name Table option
PT_create_max_rows_option MAX_ROWS
PT_create_min_rows_option MIN_ROWS
PT_create_avg_row_length_option AVG_ROW_LENGTH
PT_create_password_option PASSWORD
PT_create_commen_option COMMENT
PT_create_compress_option COMPRESSION
PT_create_encryption_option ENGRYPTION
PT_create_auto_increment_option AUTO_INCREMENT
PT_create_row_format_option ROW_FORMAT
PT_create_insert_method_option INSERT_METHOD
PT_create_data_directory_option DATA DIRECTORY
PT_create_index_directory_option INDEX DIRECTORY
PT_create_tablespace_option TABLESPACE
PT_create_connection_option CONNECTION
PT_create_key_block_size_option KEY_BLOCK_SIZE
PT_create_pack_keys_option PACK_KEYS
PT_create_stats_persistent_option STATS_PERSISTENT
PT_create_checksum_option CHECKSUM, TABLE_CHECKSUM
PT_create_delay_key_write_option DELAY_KEY_WRITE
PT_create_table_engine_option ENGINE
PT_create_stats_auto_recalc_option STATS_AUTO_RECALC
PT_create_stats_stable_pages STATS_SAMPLE_PAGES
PT_create_union_option UNION
PT_create_storage_option STORAGE
PT_create_table_default_charset [DEFAULT] CHARSET
PT_create_table_default_collation [DEFAULT] COLLATE

Clause nodes and other parts in CREATE/ALTER TABLE

PT_create_table_stmt a whole CREATE TABLE node
PT_like_clause the LIKE clause
PT_check_constraint the CHECK clause
PT_column_def a column definition
PT_table_element_list a column/constraint/index definition list

New files

check_stack.{h,cc}

The check_stack_overrun() function has been moved to these files to break cyclic dependencies in include files.

parse_error.{h,cc}

These files contains new and existent parse-time error messaging functions:

  • my_syntax_error()
  • parse_error_at()
  • vparse_error_at()

parse_tree_column_attrs.{h,cc}

This file contain definitions for column attribute and field definition nodes (this stuff is related to CREATE/ALTER table column definitions):

CREATE/ALTER TABLE clause nodes

Class name Clause
PT_field_def regular (non-generated) column definition
PT_generated_field_def generated column definition

Column Types

Class name Column attribute
PT_int_type INT, TINYINT, SMALLINT, MEDIUMINT and BIGINT
PT_numeric_type NUMERIC, REAL, DOUBLE, DECIMAL and FIXED
PT_bit_type BIT
PT_boolean_type BOOL/BOOLEAN
PT_char_type CHAR, NCHAR and BINARY
PT_varchar_type VARCHAR, NVARCHAR and VARBINARY
PT_tinyblob_type TINYBLOB and TINYTEXT
PT_mediumblob_type MEDIUMBLOB, MEDIUMTEXT, LONG, LONG VARCHAR and LONG VARBINARY
PT_blob_type BLOB
PT_year_type YEAR
PT_date_type DATE
PT_time_type TIME
PT_datetime_type TIMESTAMP and DATETIME
PT_timestamp_type TIMESTAMP
PT_geometry_type GEOMETRY
PT_geometrycollection_type GEOMETRYCOLLECTION
PT_point_type POINT
PT_multipoint MULTIPOINT
PT_linestring LINESTRING
PT_multilinestring MULTILINESTRING
PT_polygon POLYGON
PT_multipolygon MULTIPOLYGON
PT_enum_type ENUM
PT_set_type SET
PT_json_type JSON

Column attributes

Common for all columns

Class name Column attribute
PT_null_column_attr NULL
PT_not_null_column_attr NOT NULL
PT_unique_key_column_attr UNIQUE [KEY]
PT_primary_key_column_attr PRIMARY [KEY]
PT_comment_column_attr COMMENT ...
PT_collate_column_attr COLLATE ...

Available for regular (not generated) columns only

Class name Column attribute
PT_default_column_attr DEFAULT ...
PT_on_update_column_attr ON UPDATE ...
PT_auto_increment_column_attr AUTO_INCREMENT
PT_serial_default_value_column_attr SERIAL DEFAULT VALUE
PT_column_format_column_attr COLUMN_FORMAT ...
PT_storage_media_column_attr STORAGE ...

parse_tree_partitions.{h,cc}

Partition options

Class name Option
PT_partition_comment COMMENT
PT_partition_index_directory INDEX DIRECTORY
PT_partition_data_directory DATA DIRECTORY
PT_partition_min_rows MIN_ROWS
PT_partition_max_rows MAX_ROWS
PT_partition_nodegroup NODEGROUP
PT_partition_engine [STORAGE] ENGINE
PT_partition_tablespace TABLESPACE

Partition declarations

Class name Clause
PT_partition PARTITION
PT_part_type_def_key ... BY [LINEAR] KEY
PT_part_type_def_hash ... BY [LINEAR] HASH
PT_part_type_def_range_expr ... BY RANGE
PT_part_type_def_range_columns ... BY RANGE COLUMNS
PT_part_type_def_list_expr ... BY LIST
PT_part_type_def_list_columns ... BY LIST COLUMNS

Subpartition declarations

PT_sub_partition_by_hash SUBPARTITION BY HASH
PT_sub_partition_by_key SUBPARTITION BY KEY

Individual partition options

Class name Clause
PT_part_definition PARTITION
PT_subpartition SUBPARTITION

Partitioning values, expressions and lists

Class name In clause(-s)
PT_part_value_item_max VALUES LESS THAN .../VALUES IN ... PT_part_value_item_expr PT_part_value_item_list_paren
PT_part_values_in_item VALUES IN ... PT_part_values_in_list

ALTER TABLE ADD PARTITION stuff

PT_add_partition ALTER TABLE ADD PARTITION
PT_add_partition_def_list ALTER TABLE ADD PARTITION (partition list)
PT_add_partition_num ALTER TABLE ADD PARTITION PARTITIONS (n)

sql_cmd_ddl_table.{h,cc}

Implementation on Sql_cmd_create_table for the CREATE TABLE statement.

Changes in the lexical scanner

Renamed tokes

Currently Bison generates token definitions in a form of #defined numeric constants, and their names have a global scope. Many of them have very common names like "DEFAULT", so if you have to introduce a C++ identifier with such a name (e.g. DEFAULT), you are in a trouble, since the preprocessor substitutes it with a numeric constant once you include lex.h or its friends.

(There is a Bison option to generate token constants in a form of C enum, but it that case we're still in a trouble (slightly less destructive), since that enum also has a global scope. Wrapping of that enum with a C++ namespace doesn't help much with the current parser.)

Thus, the simplest workaround is to add the _SYM suffix to conflicting token names.

The current WL renames these tokens:

  • BIGINT
  • BINARY
  • DATETIME
  • DEFAULT
  • ENUM
  • GEOMETRYCOLLECTION
  • INSERT
  • LINESTRING
  • LONGBLOB
  • LONGTEXT
  • MEDIUMBLOB
  • MEDIUMINT
  • MEDIUMTEXT
  • MULTILINESTRING
  • MULTIPOINT
  • MULTIPOLYGON
  • ON
  • POLYGON
  • REAL
  • REPLACE
  • SET
  • SMALLINT
  • TIMESTAMP
  • TINYBLOB
  • TINYINT
  • TINYTEXT
  • UNSIGNED
  • VARBINARY
  • ZEROFILL

Removed token

We don't need PARSE_GCOL_EXPR to be a non-reserved keyword -- see "DD quasi-statements" paragraph below.

DD synthetic statements

Grammar tricks those recognized quasi-statements from DD (gcol and partitioning expressions) have been moved from the grammar to the lexer.

From the current WL the lexical scanner can output fake tokens at the start to the parser to go into a separate, quasi-statement specific area -- this way we emulate multiple starting rules in the grammar:

start_entry:

         query
       | GRAMMAR_SELECTOR_EXPR bit_expr END_OF_INPUT
       | GRAMMAR_SELECTOR_PART partition END_OF_INPUT

...

Before that we had to introduce unique keywords every time we introduce a new non-SQL DD-related expression and parse that expression as very special SQL statements. Also we needed guardian code to protect that quasi-SQL from executing in the regular SQL input.

  • GRAMMAR_SELECTOR_PART fake token (not a keyword! invisible outside the lexer
 parser!) forces the parser to parse the partitioning expression from DD
 metadata.
  • GRAMMAR_SELECTOR_GCOL does the same for generated column expressions from
 DD.
  • GRAMMAR_SELECTOR_EXPR is reserved for the future use with a parser of
 generic expressions outside SQL.

"Classification" of enums

Our Runtime team does an incremental work on converting #defined numeric constants and C-style enums to C++11 "enum classes". Such a refactoring helps to remove cyclic dependencies in header files: "enum classes" can be referenced by forward references like usual classes. The current WL also needed to resolve some issues related to cyclic references, so it also converts some C enums into "enum classes".

  • enum partition_info::enum_key_algorithm --> global enum class
 enum_key_algorithm.
  • partition_type:
Before After
enum partition_type {
 NOT_A_PARTITION= 0,
 RANGE_PARTITION,
 HASH_PARTITION,
 LIST_PARTITION
};
enum class partition_type {
 NONE= 0,
 RANGE,
 HASH,
 LIST
};

Mem_root_array changes

An additional template parameter has been added to use custom parent classes (e.g. Sql_alloc).

Partitioning code changes

Parser-time fields have been moved to the new parser_partition_info class.

SP code changes

Removal of redundant fields and functions:

  • sp_parser_data::m_expr_start_ptr
  • sp_parser_data::pop_expr_start_ptr()
  • sp_parser_data::push_expr_start_ptr()