WL#8067: True bottom-up server parser: refactoring of the CREATE TABLE statement

Affects: Server-8.0 — Status: Complete

Description
Requirements
High Level Architecture
Low Level Design

This WL is intended to refactor CREATE TABLE statement-related parser grammar rules in a pure bottom-up style to make it (grammar) context-independent for the better maintainability and extendability at the first place.

Functional requirements

F1-1: See minor changes in the "Changes visible to users" paragraph in the HLS.

Non-functional requirements

NF-1: Apart from the changes mentioned in the "Changes visible to users" paragraph in the HLS, there shall be no changes visible to users.

NF-2: No performance regressions.

Changes to the documentation

parse_gcol_expr and its upper-cased variations are not keywords any more (neither reserved nor non-reserved).

Changes visible to users

Bugfixes

B-1 (gcol: NOT NULL NULL)

Generated column definition in CREATE/ALTER TABLE ignored the NULL attribute. Since the default value of the "nullness" attribute is true, everything worked as expected, if a column definition had none or only one NULL/NOT NULL attribute.

However the result of the "NOT NULL NULL" sequence was "NOT NULL".

The WL fixes this bug harmonizing NULL/NOT NULL attribute processing for both regular and generated column types.

 mysql> CREATE TABLE t2 (i INT, v INT AS (1) NOT NULL NULL);
 Query OK, 0 rows affected (0,03 sec)

Before

Note the "VIRTUAL NOT NULL" text:

 mysql> SHOW CREATE TABLE t2\G
 *************************** 1. row ***************************
        Table: t2
 Create Table: CREATE TABLE `t2` (
   `i` int(11) DEFAULT NULL,
   `v` int(11) GENERATED ALWAYS AS (1) VIRTUAL NOT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0,00 sec)

After

 mysql> SHOW CREATE TABLE t2\G
 *************************** 1. row ***************************
        Table: t2
 Create Table: CREATE TABLE `t2` (
   `i` int(11) DEFAULT NULL,
   `v` int(11) GENERATED ALWAYS AS (1) VIRTUAL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0,01 sec)

Differences in syntax

S-1 (parse_gcol_expr as a label)

Before the current WL the parse_gcol_expr was a non-reserved keyword with a limitation: it couldn't be used as a SP/SF label:

 mysql>   DELIMITER |
 mysql>   CREATE PROCEDURE p1()
     ->   BEGIN
     ->   parse_gcol_expr: LOOP
     ->     SELECT 1;
     ->   END LOOP parse_gcol_expr;
     ->   END|

Before

 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ': LOOP
   SELECT 1;
 END LOOP parse_gcol_expr;
 END' at line 3

After

No syntax error: parse_gcol_expr is a usual identifier, i.e. it can be used everywhere without limitations including SP/SF labels:

 Query OK, 0 rows affected (0,01 sec)

S-2 (CREATE TEMPORARY TEMPORARY...)

Removal of the undocumented nonsensical syntax: CREATE TEMPORARY TEMPORARY...

Before

 mysql> CREATE TEMPORARY TEMPORARY TEMPORARY TABLE very_temporary_table (i INT);
 Query OK, 0 rows affected (0,01 sec)

 mysql> SHOW CREATE TABLE very_temporary_table\G
 *************************** 1. row ***************************
        Table: very_temporary_table
 Create Table: CREATE TEMPORARY TABLE `very_temporary_table` (
   `i` int(11) DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0,00 sec)

After

 mysql> CREATE TEMPORARY TEMPORARY very_temporary_table (i INT);
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'TEMPORARY very_temporary_table (i INT)' at line 1

Differences in stored DD data/I_S output

I-1

 mysql> CREATE TABLE t1 (purchased DATE)
     -> PARTITION BY RANGE( YEAR(purchased) )
     ->   SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
     ->     PARTITION p0 VALUES LESS THAN (1990),
     ->     PARTITION p2 VALUES LESS THAN MAXVALUE
     ->   );
 Query OK, 0 rows affected (0,06 sec)

Before

Note: there were redundant space character before hash and range expressions (we borrowed them from the original CREATE statement without trimming them out):

 mysql> SHOW CREATE TABLE t1\G
 *************************** 1. row ***************************
        Table: t1
 Create Table: CREATE TABLE `t1` (
   `purchased` date DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 /*!50100 PARTITION BY RANGE ( YEAR(purchased))
 SUBPARTITION BY HASH ( TO_DAYS(purchased))
 (PARTITION p0 VALUES LESS THAN (1990) ENGINE = InnoDB,
  PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
 1 row in set (0,00 sec)

After

Note: no more redundant space characters:

 mysql> SHOW CREATE TABLE t1\G
 *************************** 1. row ***************************
        Table: t1
 Create Table: CREATE TABLE `t1` (
   `purchased` date DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 /*!50100 PARTITION BY RANGE (YEAR(purchased))
 SUBPARTITION BY HASH (TO_DAYS(purchased))
 (PARTITION p0 VALUES LESS THAN (1990) ENGINE = InnoDB,
  PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
 1 row in set (0,00 sec)

Differences in error messages

E-1

 mysql> PARTITION foo bar baz;

Before

 ERROR 1064 (42000): Partitioning can not be used stand-alone in query near 'PARTITION foo bar baz' at line 1

Note: there was a special grammar branch to parse internal DD expressions with the "PARTITION" prefix word. The error above guarded a regular SQL from accepting such a quasi-statement.

After

 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PARTITION foo bar baz' at line 1

Note: from now we process the "PARTITION" quasi-statement in a separate grammar level, so currently this is a regular syntax error from the regular SQL parser's point of view.

E-2

 ERROR 1064 (42000): Partitioning can not be used stand-alone in query near 'PARTITION foo bar baz' at line 1
 mysql>   CREATE TABLE t1 (a int)
     ->     PARTITION BY RANGE (a)
     ->     (
     ->       PARTITION p0 VALUES LESS THAN (1),
     ->       PARTITION p1 VALU ES LESS THAN (2)
     ->     );

Before

Note: there is an obvious syntax error (mistyped keyword): "VALU ES", but we defer its reporting for a while and report a confusing semantic error message instead first:

 ERROR 1479 (HY000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition

After

Note: from now a mistyped keyword here is just a regular syntax error:

 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'VALU ES LESS THAN (2)
     )' at line 5

E-3

 mysql> CREATE TABLE t1 (a int)
     -> partition by key (a)
     -> partitions 3
     -> (partition x1, partition x2);

Before

Note: error message reports a position in the query at the closing parentheses of the wrong clause:

 ERROR 1064 (42000): Wrong number of partitions defined, mismatch with previous setting near ')' at line 4

After

Note: error message reports a position in the query at the opening partenthes of the wrong clause:

 ERROR 1064 (42000): Wrong number of partitions defined, mismatch with previous setting near '(partition x1, partition x2)' at line 4

E-4

 mysql> CREATE TABLE t1 (a int)
     -> partition by range (rand(a))
     -> partitions 2
     -> (partition x1 values less than (0), partition x2 values less than (2));

Before

Note: error message reports a position in the query right after a not allowed expression:

 ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near ')
 partitions 2
 (partition x1 values less than (0), partition x2 values less than' at line 2

After

Note: error message reports a position in the query at the wrong expression:

 ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near 'rand(a))
 partitions 2
 (partition x1 values less than (0), partition x2 values le' at line 2

E-5

 mysql> CREATE TABLE t1 (a int)
     -> partition by range (a)
     -> partitions 2
     -> (partition x1 values less than (4),
     -> partition x2);

Before

Note: error message doesn't point to a wrong clause in the query:

 ERROR 1479 (HY000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition

After

Note: error message points to a wrong clause:

 ERROR 1064 (42000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition near '
 partition x2)' at line 4

Similar cases

 mysql> CREATE TABLE t1 (a int)
     -> partition by list (a)
     -> partitions 2
     -> (partition x1 values in (4),
     -> partition x2);

Before:

 ERROR 1479 (HY000): Syntax error: LIST PARTITIONING requires definition of VALUES IN for each partition

After:

 ERROR 1064 (42000): Syntax error: LIST PARTITIONING requires definition of VALUES IN for each partition near '
 partition x2)' at line 4

Changes in AST
- LEX
  - Removal of intermediate/redundant values
- Item tree
  - Item_null
  - PTI_num_literal_num, Item_int
Changes in the Parse tree (also see "New files" below)
- New parse tree node classes
  - Field identifiers
  - Table-wide options in CREATE/ALTER TABLE statements
- Clause nodes and other parts in CREATE/ALTER TABLE
New files
Changes in the lexical scanner
"Classification" of enums
Mem_root_array changes
Partitioning code changes
SP code changes

Changes in AST

LEX

LEX::create_info has been converted to a pointer for a dynamic allocation.

Removal of intermediate/redundant values

LEX::length
LEX::dec
LEX::change
LEX::default_value
LEX::on_update_value
LEX::charset
LEX::ref_list
LEX::interval_list
LEX::parse_gcol_expr
LEX::uint_geom_type
LEX::fk_match_option
LEX::fk_update_opt
LEX::fk_delete_opt
LEX::comment (*)

(*) Note: BINLOG only statement specific string has been introduced instead: LEX::binlog_base64_str

Item tree

Item_null

Before the current WL the Item_null class did the trick to force the EXPLICIT_NULL_FLAG in LEX::type unconditionally.

The current WL removes that trick as well as the Item_null::itemize() function.

PTI_num_literal_num, Item_int

The PTI_num_literal_num node class has been replaced with a more natural Item_int class (by introducing a new Item_int constructor) and removed.

Changes in the Parse tree (also see "New files" below)

The Parse_tree_node_tmpl template has been introduced to allow custom parse context objects in some parts of the parse tree.

For example, partitioning-specific parser does a lot of parse-time reordering of internal partitioning structures in place of the preallocated LEX::part_info object.

A custom parse context allows to do all that intermediate allocation and reordering work in the context object, then we assign the result to LEX::part_info for use in the resolver & executor.

New parse tree node classes

Field identifiers

PT_field_ident	Trivial 1D field identifier
PT_field_ident_3d	2D or 3D field identifier (can include DB and table parts)

Table-wide options in CREATE/ALTER TABLE statements

Class name	Table option
PT_create_max_rows_option	MAX_ROWS
PT_create_min_rows_option	MIN_ROWS
PT_create_avg_row_length_option	AVG_ROW_LENGTH
PT_create_password_option	PASSWORD
PT_create_commen_option	COMMENT
PT_create_compress_option	COMPRESSION
PT_create_encryption_option	ENGRYPTION
PT_create_auto_increment_option	AUTO_INCREMENT
PT_create_row_format_option	ROW_FORMAT
PT_create_insert_method_option	INSERT_METHOD
PT_create_data_directory_option	DATA DIRECTORY
PT_create_index_directory_option	INDEX DIRECTORY
PT_create_tablespace_option	TABLESPACE
PT_create_connection_option	CONNECTION
PT_create_key_block_size_option	KEY_BLOCK_SIZE
PT_create_pack_keys_option	PACK_KEYS
PT_create_stats_persistent_option	STATS_PERSISTENT
PT_create_checksum_option	CHECKSUM, TABLE_CHECKSUM
PT_create_delay_key_write_option	DELAY_KEY_WRITE
PT_create_table_engine_option	ENGINE
PT_create_stats_auto_recalc_option	STATS_AUTO_RECALC
PT_create_stats_stable_pages	STATS_SAMPLE_PAGES
PT_create_union_option	UNION
PT_create_storage_option	STORAGE
PT_create_table_default_charset	[DEFAULT] CHARSET
PT_create_table_default_collation	[DEFAULT] COLLATE

Clause nodes and other parts in CREATE/ALTER TABLE

PT_create_table_stmt	a whole CREATE TABLE node
PT_like_clause	the LIKE clause
PT_check_constraint	the CHECK clause
PT_column_def	a column definition
PT_table_element_list	a column/constraint/index definition list

New files

check_stack.{h,cc}

The check_stack_overrun() function has been moved to these files to break cyclic dependencies in include files.

parse_error.{h,cc}

These files contains new and existent parse-time error messaging functions:

my_syntax_error()
parse_error_at()
vparse_error_at()

parse_tree_column_attrs.{h,cc}

This file contain definitions for column attribute and field definition nodes (this stuff is related to CREATE/ALTER table column definitions):

CREATE/ALTER TABLE clause nodes

Class name	Clause
PT_field_def	regular (non-generated) column definition
PT_generated_field_def	generated column definition

Column Types

Class name	Column attribute
PT_int_type	INT, TINYINT, SMALLINT, MEDIUMINT and BIGINT
PT_numeric_type	NUMERIC, REAL, DOUBLE, DECIMAL and FIXED
PT_bit_type	BIT
PT_boolean_type	BOOL/BOOLEAN
PT_char_type	CHAR, NCHAR and BINARY
PT_varchar_type	VARCHAR, NVARCHAR and VARBINARY
PT_tinyblob_type	TINYBLOB and TINYTEXT
PT_mediumblob_type	MEDIUMBLOB, MEDIUMTEXT, LONG, LONG VARCHAR and LONG VARBINARY
PT_blob_type	BLOB
PT_year_type	YEAR
PT_date_type	DATE
PT_time_type	TIME
PT_datetime_type	TIMESTAMP and DATETIME
PT_timestamp_type	TIMESTAMP
PT_geometry_type	GEOMETRY
PT_geometrycollection_type	GEOMETRYCOLLECTION
PT_point_type	POINT
PT_multipoint	MULTIPOINT
PT_linestring	LINESTRING
PT_multilinestring	MULTILINESTRING
PT_polygon	POLYGON
PT_multipolygon	MULTIPOLYGON
PT_enum_type	ENUM
PT_set_type	SET
PT_json_type	JSON

Column attributes

Common for all columns

Class name	Column attribute
PT_null_column_attr	NULL
PT_not_null_column_attr	NOT NULL
PT_unique_key_column_attr	UNIQUE [KEY]
PT_primary_key_column_attr	PRIMARY [KEY]
PT_comment_column_attr	COMMENT ...
PT_collate_column_attr	COLLATE ...

Available for regular (not generated) columns only

Class name	Column attribute
PT_default_column_attr	DEFAULT ...
PT_on_update_column_attr	ON UPDATE ...
PT_auto_increment_column_attr	AUTO_INCREMENT
PT_serial_default_value_column_attr	SERIAL DEFAULT VALUE
PT_column_format_column_attr	COLUMN_FORMAT ...
PT_storage_media_column_attr	STORAGE ...

parse_tree_partitions.{h,cc}

Partition options

Class name	Option
PT_partition_comment	COMMENT
PT_partition_index_directory	INDEX DIRECTORY
PT_partition_data_directory	DATA DIRECTORY
PT_partition_min_rows	MIN_ROWS
PT_partition_max_rows	MAX_ROWS
PT_partition_nodegroup	NODEGROUP
PT_partition_engine	[STORAGE] ENGINE
PT_partition_tablespace	TABLESPACE

Partition declarations

Class name	Clause
PT_partition	PARTITION
PT_part_type_def_key	... BY [LINEAR] KEY
PT_part_type_def_hash	... BY [LINEAR] HASH
PT_part_type_def_range_expr	... BY RANGE
PT_part_type_def_range_columns	... BY RANGE COLUMNS
PT_part_type_def_list_expr	... BY LIST
PT_part_type_def_list_columns	... BY LIST COLUMNS

Subpartition declarations

PT_sub_partition_by_hash	SUBPARTITION BY HASH
PT_sub_partition_by_key	SUBPARTITION BY KEY

Individual partition options

Class name	Clause
PT_part_definition	PARTITION
PT_subpartition	SUBPARTITION

Partitioning values, expressions and lists

Class name	In clause(-s)
PT_part_value_item_max	VALUES LESS THAN .../VALUES IN ...	PT_part_value_item_expr		PT_part_value_item_list_paren
PT_part_values_in_item	VALUES IN ...	PT_part_values_in_list

ALTER TABLE ADD PARTITION stuff

PT_add_partition	ALTER TABLE ADD PARTITION
PT_add_partition_def_list	ALTER TABLE ADD PARTITION (partition list)
PT_add_partition_num	ALTER TABLE ADD PARTITION PARTITIONS (n)

sql_cmd_ddl_table.{h,cc}

Implementation on Sql_cmd_create_table for the CREATE TABLE statement.

Changes in the lexical scanner

Renamed tokes

Currently Bison generates token definitions in a form of #defined numeric constants, and their names have a global scope. Many of them have very common names like "DEFAULT", so if you have to introduce a C++ identifier with such a name (e.g. DEFAULT), you are in a trouble, since the preprocessor substitutes it with a numeric constant once you include lex.h or its friends.

(There is a Bison option to generate token constants in a form of C enum, but it that case we're still in a trouble (slightly less destructive), since that enum also has a global scope. Wrapping of that enum with a C++ namespace doesn't help much with the current parser.)

Thus, the simplest workaround is to add the _SYM suffix to conflicting token names.

The current WL renames these tokens:

BIGINT
BINARY
DATETIME
DEFAULT
ENUM
GEOMETRYCOLLECTION
INSERT
LINESTRING
LONGBLOB
LONGTEXT
MEDIUMBLOB
MEDIUMINT
MEDIUMTEXT
MULTILINESTRING
MULTIPOINT
MULTIPOLYGON
ON
POLYGON
REAL
REPLACE
SET
SMALLINT
TIMESTAMP
TINYBLOB
TINYINT
TINYTEXT
UNSIGNED
VARBINARY
ZEROFILL

Removed token

We don't need PARSE_GCOL_EXPR to be a non-reserved keyword -- see "DD quasi-statements" paragraph below.

DD synthetic statements

Grammar tricks those recognized quasi-statements from DD (gcol and partitioning expressions) have been moved from the grammar to the lexer.

From the current WL the lexical scanner can output fake tokens at the start to the parser to go into a separate, quasi-statement specific area -- this way we emulate multiple starting rules in the grammar:

start_entry:

         query
       | GRAMMAR_SELECTOR_EXPR bit_expr END_OF_INPUT
       | GRAMMAR_SELECTOR_PART partition END_OF_INPUT

...

Before that we had to introduce unique keywords every time we introduce a new non-SQL DD-related expression and parse that expression as very special SQL statements. Also we needed guardian code to protect that quasi-SQL from executing in the regular SQL input.

GRAMMAR_SELECTOR_PART fake token (not a keyword! invisible outside the lexer

 parser!) forces the parser to parse the partitioning expression from DD
 metadata.

GRAMMAR_SELECTOR_GCOL does the same for generated column expressions from

DD.

GRAMMAR_SELECTOR_EXPR is reserved for the future use with a parser of

 generic expressions outside SQL.

"Classification" of enums

Our Runtime team does an incremental work on converting #defined numeric constants and C-style enums to C++11 "enum classes". Such a refactoring helps to remove cyclic dependencies in header files: "enum classes" can be referenced by forward references like usual classes. The current WL also needed to resolve some issues related to cyclic references, so it also converts some C enums into "enum classes".

enum partition_info::enum_key_algorithm --> global enum class

 enum_key_algorithm.

partition_type:

Before	After
`enum partition_type {` NOT_A_PARTITION= 0, RANGE_PARTITION, HASH_PARTITION, LIST_PARTITION };	`enum class partition_type {` NONE= 0, RANGE, HASH, LIST };

Mem_root_array changes

An additional template parameter has been added to use custom parent classes (e.g. Sql_alloc).

Partitioning code changes

Parser-time fields have been moved to the new parser_partition_info class.

SP code changes

Removal of redundant fields and functions:

sp_parser_data::m_expr_start_ptr
sp_parser_data::pop_expr_start_ptr()
sp_parser_data::push_expr_start_ptr()

WL#8067: True bottom-up server parser: refactoring of the CREATE TABLE statement

Functional requirements

Non-functional requirements

Contents

Changes to the documentation

Changes visible to users

Bugfixes

B-1 (gcol: NOT NULL NULL)

Before

After

Differences in syntax

S-1 (parse_gcol_expr as a label)

Before

After

S-2 (CREATE TEMPORARY TEMPORARY...)

Before

After

Differences in stored DD data/I_S output

I-1

Before

After

Differences in error messages

E-1

Before

After

E-2

Before

After

E-3

Before

After

E-4

Before

After

E-5

Before

After

Similar cases

Contents

Changes in AST

LEX

Removal of intermediate/redundant values

Item tree

Item_null

PTI_num_literal_num, Item_int

Changes in the Parse tree (also see "New files" below)

New parse tree node classes

Field identifiers

Table-wide options in CREATE/ALTER TABLE statements

Clause nodes and other parts in CREATE/ALTER TABLE

New files

check_stack.{h,cc}

parse_error.{h,cc}

parse_tree_column_attrs.{h,cc}

CREATE/ALTER TABLE clause nodes

Column Types

Column attributes

Common for all columns

Available for regular (not generated) columns only

parse_tree_partitions.{h,cc}

Partition options

Partition declarations

Subpartition declarations

Individual partition options

Partitioning values, expressions and lists

ALTER TABLE ADD PARTITION stuff

sql_cmd_ddl_table.{h,cc}

Changes in the lexical scanner

Renamed tokes

Removed token

DD synthetic statements

"Classification" of enums

Mem_root_array changes

Partitioning code changes

SP code changes