WL#8067: True bottom-up server parser: refactoring of the CREATE TABLE statement
This WL is intended to refactor CREATE TABLE statement-related parser grammar rules in a pure bottom-up style to make it (grammar) context-independent for the better maintainability and extendability at the first place.
Functional requirements
- F1-1: See minor changes in the "Changes visible to users" paragraph in the HLS.
Non-functional requirements
- NF-1: Apart from the changes mentioned in the "Changes visible to users" paragraph in the HLS, there shall be no changes visible to users.
- NF-2: No performance regressions.
Contents |
Changes to the documentation
- parse_gcol_expr and its upper-cased variations are not keywords any more (neither reserved nor non-reserved).
Changes visible to users
Bugfixes
B-1 (gcol: NOT NULL NULL)
Generated column definition in CREATE/ALTER TABLE ignored the NULL attribute. Since the default value of the "nullness" attribute is true, everything worked as expected, if a column definition had none or only one NULL/NOT NULL attribute.
However the result of the "NOT NULL NULL" sequence was "NOT NULL".
The WL fixes this bug harmonizing NULL/NOT NULL attribute processing for both regular and generated column types.
mysql> CREATE TABLE t2 (i INT, v INT AS (1) NOT NULL NULL); Query OK, 0 rows affected (0,03 sec)
Before
Note the "VIRTUAL NOT NULL" text:
mysql> SHOW CREATE TABLE t2\G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( `i` int(11) DEFAULT NULL, `v` int(11) GENERATED ALWAYS AS (1) VIRTUAL NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0,00 sec)
After
mysql> SHOW CREATE TABLE t2\G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( `i` int(11) DEFAULT NULL, `v` int(11) GENERATED ALWAYS AS (1) VIRTUAL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0,01 sec)
Differences in syntax
S-1 (parse_gcol_expr as a label)
Before the current WL the parse_gcol_expr was a non-reserved keyword with a limitation: it couldn't be used as a SP/SF label:
mysql> DELIMITER | mysql> CREATE PROCEDURE p1() -> BEGIN -> parse_gcol_expr: LOOP -> SELECT 1; -> END LOOP parse_gcol_expr; -> END|
Before
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ': LOOP SELECT 1; END LOOP parse_gcol_expr; END' at line 3
After
No syntax error: parse_gcol_expr is a usual identifier, i.e. it can be used everywhere without limitations including SP/SF labels:
Query OK, 0 rows affected (0,01 sec)
S-2 (CREATE TEMPORARY TEMPORARY...)
Removal of the undocumented nonsensical syntax: CREATE TEMPORARY TEMPORARY...
Before
mysql> CREATE TEMPORARY TEMPORARY TEMPORARY TABLE very_temporary_table (i INT); Query OK, 0 rows affected (0,01 sec)
mysql> SHOW CREATE TABLE very_temporary_table\G *************************** 1. row *************************** Table: very_temporary_table Create Table: CREATE TEMPORARY TABLE `very_temporary_table` ( `i` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0,00 sec)
After
mysql> CREATE TEMPORARY TEMPORARY very_temporary_table (i INT); ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'TEMPORARY very_temporary_table (i INT)' at line 1
Differences in stored DD data/I_S output
I-1
mysql> CREATE TABLE t1 (purchased DATE) -> PARTITION BY RANGE( YEAR(purchased) ) -> SUBPARTITION BY HASH( TO_DAYS(purchased) ) ( -> PARTITION p0 VALUES LESS THAN (1990), -> PARTITION p2 VALUES LESS THAN MAXVALUE -> ); Query OK, 0 rows affected (0,06 sec)
Before
Note: there were redundant space character before hash and range expressions (we borrowed them from the original CREATE statement without trimming them out):
mysql> SHOW CREATE TABLE t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `purchased` date DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE ( YEAR(purchased)) SUBPARTITION BY HASH ( TO_DAYS(purchased)) (PARTITION p0 VALUES LESS THAN (1990) ENGINE = InnoDB, PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0,00 sec)
After
Note: no more redundant space characters:
mysql> SHOW CREATE TABLE t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `purchased` date DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (YEAR(purchased)) SUBPARTITION BY HASH (TO_DAYS(purchased)) (PARTITION p0 VALUES LESS THAN (1990) ENGINE = InnoDB, PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ 1 row in set (0,00 sec)
Differences in error messages
E-1
mysql> PARTITION foo bar baz;
Before
ERROR 1064 (42000): Partitioning can not be used stand-alone in query near 'PARTITION foo bar baz' at line 1
Note: there was a special grammar branch to parse internal DD expressions with the "PARTITION" prefix word. The error above guarded a regular SQL from accepting such a quasi-statement.
After
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PARTITION foo bar baz' at line 1
Note: from now we process the "PARTITION" quasi-statement in a separate grammar level, so currently this is a regular syntax error from the regular SQL parser's point of view.
E-2
ERROR 1064 (42000): Partitioning can not be used stand-alone in query near 'PARTITION foo bar baz' at line 1 mysql> CREATE TABLE t1 (a int) -> PARTITION BY RANGE (a) -> ( -> PARTITION p0 VALUES LESS THAN (1), -> PARTITION p1 VALU ES LESS THAN (2) -> );
Before
Note: there is an obvious syntax error (mistyped keyword): "VALU ES", but we defer its reporting for a while and report a confusing semantic error message instead first:
ERROR 1479 (HY000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition
After
Note: from now a mistyped keyword here is just a regular syntax error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'VALU ES LESS THAN (2) )' at line 5
E-3
mysql> CREATE TABLE t1 (a int) -> partition by key (a) -> partitions 3 -> (partition x1, partition x2);
Before
Note: error message reports a position in the query at the closing parentheses of the wrong clause:
ERROR 1064 (42000): Wrong number of partitions defined, mismatch with previous setting near ')' at line 4
After
Note: error message reports a position in the query at the opening partenthes of the wrong clause:
ERROR 1064 (42000): Wrong number of partitions defined, mismatch with previous setting near '(partition x1, partition x2)' at line 4
E-4
mysql> CREATE TABLE t1 (a int) -> partition by range (rand(a)) -> partitions 2 -> (partition x1 values less than (0), partition x2 values less than (2));
Before
Note: error message reports a position in the query right after a not allowed expression:
ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near ') partitions 2 (partition x1 values less than (0), partition x2 values less than' at line 2
After
Note: error message reports a position in the query at the wrong expression:
ERROR 1064 (42000): Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed near 'rand(a)) partitions 2 (partition x1 values less than (0), partition x2 values le' at line 2
E-5
mysql> CREATE TABLE t1 (a int) -> partition by range (a) -> partitions 2 -> (partition x1 values less than (4), -> partition x2);
Before
Note: error message doesn't point to a wrong clause in the query:
ERROR 1479 (HY000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition
After
Note: error message points to a wrong clause:
ERROR 1064 (42000): Syntax error: RANGE PARTITIONING requires definition of VALUES LESS THAN for each partition near ' partition x2)' at line 4
Similar cases
mysql> CREATE TABLE t1 (a int) -> partition by list (a) -> partitions 2 -> (partition x1 values in (4), -> partition x2);
Before:
ERROR 1479 (HY000): Syntax error: LIST PARTITIONING requires definition of VALUES IN for each partition
After:
ERROR 1064 (42000): Syntax error: LIST PARTITIONING requires definition of VALUES IN for each partition near ' partition x2)' at line 4
Changes in AST
LEX
- LEX::create_info has been converted to a pointer for a dynamic allocation.
Removal of intermediate/redundant values
- LEX::length
- LEX::dec
- LEX::change
- LEX::default_value
- LEX::on_update_value
- LEX::charset
- LEX::ref_list
- LEX::interval_list
- LEX::parse_gcol_expr
- LEX::uint_geom_type
- LEX::fk_match_option
- LEX::fk_update_opt
- LEX::fk_delete_opt
- LEX::comment (*)
(*) Note: BINLOG only statement specific string has been introduced instead: LEX::binlog_base64_str
Item tree
Item_null
Before the current WL the Item_null class did the trick to force the EXPLICIT_NULL_FLAG in LEX::type unconditionally.
The current WL removes that trick as well as the Item_null::itemize() function.
PTI_num_literal_num, Item_int
The PTI_num_literal_num node class has been replaced with a more natural Item_int class (by introducing a new Item_int constructor) and removed.
Changes in the Parse tree (also see "New files" below)
The Parse_tree_node_tmpl template has been introduced to allow custom parse context objects in some parts of the parse tree.
For example, partitioning-specific parser does a lot of parse-time reordering of internal partitioning structures in place of the preallocated LEX::part_info object.
A custom parse context allows to do all that intermediate allocation and reordering work in the context object, then we assign the result to LEX::part_info for use in the resolver & executor.
New parse tree node classes
Field identifiers
PT_field_ident | Trivial 1D field identifier |
PT_field_ident_3d | 2D or 3D field identifier (can include DB and table parts) |
Table-wide options in CREATE/ALTER TABLE statements
Class name | Table option |
---|---|
PT_create_max_rows_option | MAX_ROWS |
PT_create_min_rows_option | MIN_ROWS |
PT_create_avg_row_length_option | AVG_ROW_LENGTH |
PT_create_password_option | PASSWORD |
PT_create_commen_option | COMMENT |
PT_create_compress_option | COMPRESSION |
PT_create_encryption_option | ENGRYPTION |
PT_create_auto_increment_option | AUTO_INCREMENT |
PT_create_row_format_option | ROW_FORMAT |
PT_create_insert_method_option | INSERT_METHOD |
PT_create_data_directory_option | DATA DIRECTORY |
PT_create_index_directory_option | INDEX DIRECTORY |
PT_create_tablespace_option | TABLESPACE |
PT_create_connection_option | CONNECTION |
PT_create_key_block_size_option | KEY_BLOCK_SIZE |
PT_create_pack_keys_option | PACK_KEYS |
PT_create_stats_persistent_option | STATS_PERSISTENT |
PT_create_checksum_option | CHECKSUM, TABLE_CHECKSUM |
PT_create_delay_key_write_option | DELAY_KEY_WRITE |
PT_create_table_engine_option | ENGINE |
PT_create_stats_auto_recalc_option | STATS_AUTO_RECALC |
PT_create_stats_stable_pages | STATS_SAMPLE_PAGES |
PT_create_union_option | UNION |
PT_create_storage_option | STORAGE |
PT_create_table_default_charset | [DEFAULT] CHARSET |
PT_create_table_default_collation | [DEFAULT] COLLATE |
Clause nodes and other parts in CREATE/ALTER TABLE
PT_create_table_stmt | a whole CREATE TABLE node |
PT_like_clause | the LIKE clause |
PT_check_constraint | the CHECK clause |
PT_column_def | a column definition |
PT_table_element_list | a column/constraint/index definition list |
New files
check_stack.{h,cc}
The check_stack_overrun() function has been moved to these files to break cyclic dependencies in include files.
parse_error.{h,cc}
These files contains new and existent parse-time error messaging functions:
- my_syntax_error()
- parse_error_at()
- vparse_error_at()
parse_tree_column_attrs.{h,cc}
This file contain definitions for column attribute and field definition nodes (this stuff is related to CREATE/ALTER table column definitions):
CREATE/ALTER TABLE clause nodes
Class name | Clause |
---|---|
PT_field_def | regular (non-generated) column definition |
PT_generated_field_def | generated column definition |
Column Types
Class name | Column attribute |
---|---|
PT_int_type | INT, TINYINT, SMALLINT, MEDIUMINT and BIGINT |
PT_numeric_type | NUMERIC, REAL, DOUBLE, DECIMAL and FIXED |
PT_bit_type | BIT |
PT_boolean_type | BOOL/BOOLEAN |
PT_char_type | CHAR, NCHAR and BINARY |
PT_varchar_type | VARCHAR, NVARCHAR and VARBINARY |
PT_tinyblob_type | TINYBLOB and TINYTEXT |
PT_mediumblob_type | MEDIUMBLOB, MEDIUMTEXT, LONG, LONG VARCHAR and LONG VARBINARY |
PT_blob_type | BLOB |
PT_year_type | YEAR |
PT_date_type | DATE |
PT_time_type | TIME |
PT_datetime_type | TIMESTAMP and DATETIME |
PT_timestamp_type | TIMESTAMP |
PT_geometry_type | GEOMETRY |
PT_geometrycollection_type | GEOMETRYCOLLECTION |
PT_point_type | POINT |
PT_multipoint | MULTIPOINT |
PT_linestring | LINESTRING |
PT_multilinestring | MULTILINESTRING |
PT_polygon | POLYGON |
PT_multipolygon | MULTIPOLYGON |
PT_enum_type | ENUM |
PT_set_type | SET |
PT_json_type | JSON |
Column attributes
Common for all columns
Class name | Column attribute |
---|---|
PT_null_column_attr | NULL |
PT_not_null_column_attr | NOT NULL |
PT_unique_key_column_attr | UNIQUE [KEY] |
PT_primary_key_column_attr | PRIMARY [KEY] |
PT_comment_column_attr | COMMENT ... |
PT_collate_column_attr | COLLATE ... |
Available for regular (not generated) columns only
Class name | Column attribute |
---|---|
PT_default_column_attr | DEFAULT ... |
PT_on_update_column_attr | ON UPDATE ... |
PT_auto_increment_column_attr | AUTO_INCREMENT |
PT_serial_default_value_column_attr | SERIAL DEFAULT VALUE |
PT_column_format_column_attr | COLUMN_FORMAT ... |
PT_storage_media_column_attr | STORAGE ... |
parse_tree_partitions.{h,cc}
Partition options
Class name | Option |
---|---|
PT_partition_comment | COMMENT |
PT_partition_index_directory | INDEX DIRECTORY |
PT_partition_data_directory | DATA DIRECTORY |
PT_partition_min_rows | MIN_ROWS |
PT_partition_max_rows | MAX_ROWS |
PT_partition_nodegroup | NODEGROUP |
PT_partition_engine | [STORAGE] ENGINE |
PT_partition_tablespace | TABLESPACE |
Partition declarations
Class name | Clause |
---|---|
PT_partition | PARTITION |
PT_part_type_def_key | ... BY [LINEAR] KEY |
PT_part_type_def_hash | ... BY [LINEAR] HASH |
PT_part_type_def_range_expr | ... BY RANGE |
PT_part_type_def_range_columns | ... BY RANGE COLUMNS |
PT_part_type_def_list_expr | ... BY LIST |
PT_part_type_def_list_columns | ... BY LIST COLUMNS |
Subpartition declarations
PT_sub_partition_by_hash | SUBPARTITION BY HASH |
PT_sub_partition_by_key | SUBPARTITION BY KEY |
Individual partition options
Class name | Clause |
---|---|
PT_part_definition | PARTITION |
PT_subpartition | SUBPARTITION |
Partitioning values, expressions and lists
Class name | In clause(-s) | ||||
---|---|---|---|---|---|
PT_part_value_item_max | VALUES LESS THAN .../VALUES IN ... | PT_part_value_item_expr | PT_part_value_item_list_paren | ||
PT_part_values_in_item | VALUES IN ... | PT_part_values_in_list |
ALTER TABLE ADD PARTITION stuff
PT_add_partition | ALTER TABLE ADD PARTITION |
PT_add_partition_def_list | ALTER TABLE ADD PARTITION (partition list) |
PT_add_partition_num | ALTER TABLE ADD PARTITION PARTITIONS (n) |
sql_cmd_ddl_table.{h,cc}
Implementation on Sql_cmd_create_table for the CREATE TABLE statement.
Changes in the lexical scanner
Renamed tokes
Currently Bison generates token definitions in a form of #defined numeric constants, and their names have a global scope. Many of them have very common names like "DEFAULT", so if you have to introduce a C++ identifier with such a name (e.g. DEFAULT), you are in a trouble, since the preprocessor substitutes it with a numeric constant once you include lex.h or its friends.
(There is a Bison option to generate token constants in a form of C enum, but it that case we're still in a trouble (slightly less destructive), since that enum also has a global scope. Wrapping of that enum with a C++ namespace doesn't help much with the current parser.)
Thus, the simplest workaround is to add the _SYM suffix to conflicting token names.
The current WL renames these tokens:
- BIGINT
- BINARY
- DATETIME
- DEFAULT
- ENUM
- GEOMETRYCOLLECTION
- INSERT
- LINESTRING
- LONGBLOB
- LONGTEXT
- MEDIUMBLOB
- MEDIUMINT
- MEDIUMTEXT
- MULTILINESTRING
- MULTIPOINT
- MULTIPOLYGON
- ON
- POLYGON
- REAL
- REPLACE
- SET
- SMALLINT
- TIMESTAMP
- TINYBLOB
- TINYINT
- TINYTEXT
- UNSIGNED
- VARBINARY
- ZEROFILL
Removed token
We don't need PARSE_GCOL_EXPR to be a non-reserved keyword -- see "DD quasi-statements" paragraph below.
DD synthetic statements
Grammar tricks those recognized quasi-statements from DD (gcol and partitioning expressions) have been moved from the grammar to the lexer.
From the current WL the lexical scanner can output fake tokens at the start to the parser to go into a separate, quasi-statement specific area -- this way we emulate multiple starting rules in the grammar:
start_entry:
query | GRAMMAR_SELECTOR_EXPR bit_expr END_OF_INPUT | GRAMMAR_SELECTOR_PART partition END_OF_INPUT
...
Before that we had to introduce unique keywords every time we introduce a new non-SQL DD-related expression and parse that expression as very special SQL statements. Also we needed guardian code to protect that quasi-SQL from executing in the regular SQL input.
- GRAMMAR_SELECTOR_PART fake token (not a keyword! invisible outside the lexer
parser!) forces the parser to parse the partitioning expression from DD metadata.
- GRAMMAR_SELECTOR_GCOL does the same for generated column expressions from
DD.
- GRAMMAR_SELECTOR_EXPR is reserved for the future use with a parser of
generic expressions outside SQL.
"Classification" of enums
Our Runtime team does an incremental work on converting #defined numeric constants and C-style enums to C++11 "enum classes". Such a refactoring helps to remove cyclic dependencies in header files: "enum classes" can be referenced by forward references like usual classes. The current WL also needed to resolve some issues related to cyclic references, so it also converts some C enums into "enum classes".
- enum partition_info::enum_key_algorithm --> global enum class
enum_key_algorithm.
- partition_type:
Before | After |
---|---|
enum partition_type {
NOT_A_PARTITION= 0, RANGE_PARTITION, HASH_PARTITION, LIST_PARTITION}; |
enum class partition_type {
NONE= 0, RANGE, HASH, LIST}; |
Mem_root_array changes
An additional template parameter has been added to use custom parent classes (e.g. Sql_alloc).
Partitioning code changes
Parser-time fields have been moved to the new parser_partition_info class.
SP code changes
Removal of redundant fields and functions:
- sp_parser_data::m_expr_start_ptr
- sp_parser_data::pop_expr_start_ptr()
- sp_parser_data::push_expr_start_ptr()