WL#6379: Schema definitions for new DD

Affects: Server-Prototype Only   —   Status: Complete   —   Priority: Medium

The goal of this WL is to provide schema definitions for new data-dictionary.
Below you can find draft of proposed schema for new data-dictionary.

Open questions are marked as Q1, Q2, ... Notes are marked N1, N2, ...

Let us call object attributes which must be always present for
object and for which it is obvious that it makes perfect sense
to store them in a separate column in a row describing object
"core" attributes (good examples are "schema_id", "name" for
table). There are also other attributes for which the situation
is not that clear, let us call these attributes "extended".

We propose to use the following two approaches to handling
extended attributes in new DD:

1) Use separate columns for:
   a) user-visible attributes which need to be mapped to standalone
      columns in I_S (both non-SE-specific and SE-specific)
      (for example for table's COMMENT).
   b) attributes which need to be indexed (both non-SE-specific/
      SE-specific and user-visible/private).
      (for example for "tables.schema_id" or internal IDs to be used
       by InnoDB for tables and indexes, see more below),

2) Use grouped I_S.TABLES.CREATE_OPTIONS-style columns for:
   a) for user-visible attributes which are mapped to grouped
      columns in I_S (both non-SE-specific and SE-specific).
      (good example is MAX_ROWS attribute).
   b) for SE-specific private attributes (i.e. not mapped to I_S).

*) For non-SE-specific private attributes which don't need to be
   indexed we can choose either of approaches.

*) We can move some of attributes to separate tables, e.g. to
   keep attributes which need to be updated not only during
   DDL statements apart from other more stable attributes.
   (E.g. InnoDB team would like to store snapshots of 'update_time'
   and 'count(*)' in DD. Another possible example is "corrupted"
   attribute which they would like to have).


We assume that default charset in the below table definitions is UTF8,
more specifically 3-byte version of UTF8. The main reasons for choosing
utf8mb3 over utf8mb4 are a) utf8mb3 is already used for name of objects
in MySQL, switching to utf8mb4 means that we have to update our code to
use new sizes of buffers and see if there any other assumptions b) using
utf8mb4 means that we will need 30% more space for various buffers of
static sizes (including record buffer) which might affect performance,
c) IMO it is not likely that characters which are exclusive to utf8mb4
are going to be used in names of system objects,

Also we use BIGINT UNSIGNED for all ID/reference columns for consistency.


CATALOGS
========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(64) NOT NULL,
created TIMESTAMP NOT NULL,
last_altered TIMESTAMP NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (name)


SCHEMATA
========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
catalog_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
default_collation_id BIGINT UNSIGNED NOT NULL,
created TIMESTAMP NOT NULL,
last_altered TIMESTAMP NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (catalog_id, name),
FOREIGN KEY (catalog_id) REFERENCES catalogs(id),
FOREIGN KEY (default_collation_id) REFERENCES collations(id)

N1) The plan is to be backward compatible regarding case-sensivity/uniqueness
    and support current behavior controlled by --lower-case-table-names
    option/dependent on platform for objects that currently rely on file-system
    to enforce uniqueness.
    This means that collation used for "name" column in this and other
    similar tables depends on --lower-case-table-names value which we are
    trying to emulate. (utf8_bin for l-c-t-n=0, utf8_general_ci for l-c-t-n=2,
    either of them for l-c-t-n=1).

N2) default_collation_id fully identifies values for both fields
    "default_character_set_name" and "default_collation_name" in
    I_S.SCHEMATA table.
    RoyL suggests to store character set name here and in other
    similar places as well since in the standard character set 
    and collation are orthogonal in principle. Since there is
    no immediate plans to support this in MySQL it is probably
    a good idea to postpone this idea for now.

N3) We can add fields for other default schemata attributes later (e.g.
    DEFAULT_ROW_FORMAT=...).

N4) To implement I_S.SCHEMATA as a view joining this and other table we
    will also need UDF which will determine if user accessing I_S table
    has enough privileges to see specific schema. In theory it should be
    possible to use subqueries on privilege tables to find this out, but
    due to the way our privileges work it is likely to be complicated and
    more expensive.


TABLES
======
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
schema_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
type ENUM('BASE TABLE', 'VIEW', 'SYSTEM VIEW') NOT NULL,
engine VARCHAR(64),
version INT UNSIGNED,
collation_id BIGINT UNSIGNED,
comment VARCHAR(2048) NOT NULL,
hidden BOOL NOT NULL,
options MEDIUMTEXT,
se_private_data MEDIUMTEXT,
se_private_id INT UNSIGNED,
tablespace_id BIGINT UNSIGNED,
partition_type ENUM('AUTO', 'RANGE','LIST','RANGE_COLUMNS','LIST_COLUMNS',
                    'HASH','KEY_51','KEY_55','LINEAR_HASH','LINEAR_KEY_51',
                    'LINEAR_KEY_55'),
default_partitioning ENUM('NO', 'YES', 'NUMBER'); 
partition_expression VARCHAR(2048),
subpartition_type ENUM('HASH','KEY_51','KEY_55','LINEAR_HASH',
                       'LINEAR_KEY_51','LINEAR_KEY_55'),
subpartition_expression VARCHAR(2048),
default_subpartitioning ENUM('NO', 'NAMES', 'NUMBER'); 
created TIMESTAMP NOT NULL,
last_altered TIMESTAMP NOT NULL,
view_definition LONGBLOB,
view_definition_utf8 LONGTEXT,
view_check_option ENUM('NONE', 'LOCAL', 'CASCADED'),
view_is_updatable ENUM('NO', 'YES'),
view_algorithm ENUM('UNDEFINED','TEMPTABLE','MERGE'),
view_security_type ENUM('INVOKER','DEFINER'),
view_definer VARCHAR(77),
view_client_collation_id BIGINT UNSIGNED,
view_connection_collation_id BIGINT UNSIGNED,
PRIMARY KEY (id),
UNIQUE KEY (schema_id, name),
UNIQUE KEY (engine, se_private_id),
KEY(engine),
KEY(type),
FOREIGN KEY (schema_id) REFERENCES schemata(id),
FOREIGN KEY (collation_id) REFERENCES collations(id),
FOREIGN KEY (tablespace_id) REFERENCES tablespaces(id),
FOREIGN KEY (view_client_collation_id) REFERENCES collations(id),
FOREIGN KEY (view_connection_collation_id) REFERENCES collations(id)

N5) We can denormalize the schema a bit by adding catalog and schema
    name to this TABLE. Benchmarks has shown that such denormalization
    speeds up table lookup by catalog/schema/table name by 10-30%
    depending on DD having small-medium size, with huge DD situation
    is reversed. The drawback of such denormalization is that we will
    need to do cascading updates on renames, thus code complexity
    rises. Therefore suggestion is to use normalized schema and
    perform denormalization if there will be push for it from QA/users.
    
N6) We store SE name in text form instead of having id referencing to
    DD.ENGINES table in order to avoid problems with keeping latter
    validated/up to date (think of cases when engine gets new
    capabilities or description). I.e. the main goal of this
    denormalization is to simplify SE-handling code rather than bring
    performance benefit from reduced number of joins. We still have
    I_S.ENGINES which is filled on-the-fly like now.

Q1) Formerly, "version" field was used for version of .FRM format used for
    the table. We can either repurpose this field for storing version of
    table/row format or start using it for showing TABLE_SHARE::mysql_version
    instead. In the former case we probably need separate field for
    mysql_version.

N7) "options", "se_private_data" are lists of key=value pairs separated by 
    spaces. Indeed this means that string values will require proper
    quoting and escaping.

N8) The proposal is to change current approach to storing table and other
    options and store only those options which are supported by SE.
    If we are to keep old approach and to store non-supported options
    as well probably need to have two separate columns  - "options" for
    for storing generic user-visible table options (even those which are
    not supported by SE) and "se_options" for storing SE-specific
    user-visible options.

N9) Extended benchmarks has shown that usage of BLOB/TEXT type for "options"
    and "se_private_data" columns rather than VARCHAR doesn't bring noticiable
    performance penalty for reading their values neither gives much benefit
    (difference jumps around 5-15% on both sides).
    According to Marko there should be no difference from VARCHAR in
    storage of short BLOB/TEXT values.
    The same is true for NDB, i.e. small blobs are inlined, but
    constant for inlining are different. Also NDB has much stricter
    limit for record size (around 14K).
    OTOH Sunny has pointed out that there were quite a few bugs
    related to BLOB-handling in the past.
    Still it is probably an OK idea to use BLOBs to make things easier
    for Cluster and for future extensibility (2048 chars should be enough
    for current options, but what about the future?).

N10) 'se_private_id' is necessary because InnoDB would like to change
     table ID on TRUNCATE TABLE, while data-dictionary needs table ID
     to stay the same to avoid costly/cumbersome updates of all 
     references to it. The same applies to index ID and table-rebuilding
     ALTER TABLE.

N11) I_S.TABLES.row_format field is currently dynamically retrieved from SE,
     so needs to go to non-DD table. This needs to stay because of MyISAM
     compressed tables which are compressed by external tool without touching
     DD/.FRM. Indeed by default SE can simply return "row_format" value stored
     for it as one of table options in DD.

N12) The following I_S.TABLES columns "table_rows", "avg_row_length",
     "data_length", "max_data_length", "index_length", "data_free",
     "auto_increment", "check_time", "update_time" and "checksum" are
     dynamic values and should be retrieved from SE.
     At most we can keep some snapshots of these values in DD, but
     even then IMO it makes sense to keep them in separate table.

N13) MattiasJ proposes to keep partition-related columns in a separate
     table called table_partitioning. In this case we will have to add
     boolean field to DD.TABLES marking partitioned tables, to be able
     correctly show table as partitioned in I_S.TABLES without joining
     table_partitioning.

N14) To implement I_S.TABLES as a view joining this and other table we
     will also need UDF which will determine if user accessing I_S table
     has enough privileges to see specific table.

N15) The idea behind the "hidden" attributes is to allow to hide tables or
     columns from I_S (e.g. DD tables, or FTS-related tables and columns
     in InnoDB).

N16) Since we won't store list of values or ranges in partition_expression
     and subpartition_expression columns, 2048 characters should be enough
     for them (64 characters in column name * 32 columns). OKayed by
     MattiasJ.

N17)  The 'comment' column is non-nullable since we don't differentiate
      empty comment (i.e. '') and non-existing comment currently.

N18) Views are represented by records in one table - TABLES. That means
     that base-table-specific columns in TABLES record for view object have
     to be NULL, and view-specific columns in TABLES record for table
     object have to be NULL.

N..) Note, views were previously represented by records in two tables -
     TABLES (as each view is a table) and VIEWS, but that was changed to
     simplify DD API implementation.

N..) In the future we might want to add a separate tables for attributes,
     which are often changed (for instance: update_time, auto_increment).

     Update from InnoDB: Marko said, InnoDB plans to keep frequently updated
     attributes in memory and put them to table (persistent storage) only
     on shutdown or when a table is evicted from the DD cache.

N19) We use connection charset to store view definition for view execution
     (hence VARBINARY) and version of view definition in UTF8 for showing
     it in I_S.
     We can't store view definition in UTF8 as original view definition
     might contain characters which are not Unicode roundtrip-safe
     (see some examples of such characters from SJIS/CP932 at
     http://dev.mysql.com/doc/refman/5.6/en/charset-asian-sets.html).
     We probably don't want to convert view definition from original
     charset to UTF8 for I_S on the fly as currently this procedure
     involves parsing of view definition and reprinting from Item-tree
     in special mode.

N20) Note that value for "view_is_updatable" column is computed at view
     creation time and recalculated each time view or any of underlying
     views change. This means that I_S.VIEWS can be efficiently
     implemented as a simple view on top of DD.TABLES, but OTOH results in
     more complex locking/ implementation of DDL. To track dependencies
     between view and its underlying views (tables) we use VIEW_TABLE_USAGE
     table.

N..) An index on the "type" column is required to make I_S.VIEWS efficient
     (I_S.VIEWS needs to have all records from the "TABLES" table where
     the "type" column is "VIEW").


COLUMNS
=======
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
table_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
ordinal_position INT UNSIGNED NOT NULL,
type ENUM(...) NOT NULL,
is_nullable BOOL NOT NULL,
is_zerofill BOOL,
is_unsigned BOOL,
char_length INT UNSIGNED,
numeric_precision INT UNSIGNED,
numeric_scale INT UNSIGNED,
datetime_precision INT UNSIGNED,
collation_id BIGINT UNSIGNED,
default_value BLOB,
default_option VARCHAR(32),
update_option VARCHAR(32),
is_auto_increment BOOL,
comment VARCHAR(2048) NOT NULL,
hidden BOOL NOT NULL,
options MEDIUMTEXT,
se_private_data MEDIUMTEXT,
PRIMARY KEY(id),
UNIQUE KEY(table_id, name),
UNIQUE KEY(table_id, ordinal_position),
FOREIGN KEY (table_id) REFERENCES tables(id),
FOREIGN KEY (collation_id) REFERENCES collations(id)

N21) We might need extra fields in order to cache data type and default value in
     I_S-compatible format to simplify mapping to I_S.

N22) To implement I_S.COLUMNS as a view joining this and other tables we
     will also need UDF which will determine if user accessing I_S table
     has enough privileges to see specific column. We will also need
     separate UDF for producing contents of I_S.COLUMNS.PRIVILEGES
     column.

N23) Also I_S.COLUMNS will require joining of this table with INDEX-related
     tables to provide value for I_S.COLUMNS.COLUMN_KEY column.

N24) "is_unsigned", "char_length" and other similar attributes can be NULL
     in cases when specific type of attribute is irrelevant for the
     column type (e.g. is_unsigned is irrelevant for CHAR columns).

N25) We will have to replace "is_auto_increment" field with "sequence_id"
     eventually.

Q2) 'default_option' and 'update_option' are to be used to store
     non-literal default/on update values. The longest value allowed by
     standard is CURRENT_TIMESTAMP(N) so 32 characters should be
     sufficient for standard values. Does it makes sense to make these
     columns longer for future extensibility?

N26) We can't use UTF8 for storing default value since some charsets
     can have characters that are not Unicode-roundtrip-safe.

Q3) Should we combine 'default_option' and 'default_value' in one
    column and store non-expression default values as a literals?

N27) Similarly to TABLES.VIEW_IS_UPDATABLE values of rows representing
     columns belonging to views are calculated at view creation time and
     re-evaluated each time one of underlying tables changes. Thanks to
     this I_S.COLUMNS can work efficiently for both base tables and views,
     while no new restriction on changes to underlying tables are added. Of
     course one has to pay the price for it - DDL implementation/locking
     becomes more complex. To track dependencies between view and its
     underlying tables we use VIEW_TABLE_USAGE table.

N28) The size of maximum size of practical default value for varchar is 65533
     chars. This gives 65533+2 bytes needed to store this value. But this
     obviously means that DD.TABLES will break limit of 64K for one table
     record unless BLOB is used. In most cases this BLOB will be inlined into
     normal row.


INDEXES
=======
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
table_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
type ENUM(PRIMARY, UNIQUE, MULTIPLE, FULLTEXT, SPATIAL) NOT NULL,
algorithm ENUM(UNDEF, BTREE, RTREE, HASH, FULLTEXT) NOT NULL,
is_generated BOOL NOT NULL,
hidden BOOL NOT NULL,
ordinal_position INT UNSIGNED NOT NULL,
comment VARCHAR(2048) NOT NULL,
options MEDIUMTEXT,
se_private_data MEDIUMTEXT,
se_private_id INT UNSIGNED,
tablespace_id BIGINT UNSIGNED,
engine VARCHAR(64),
PRIMARY KEY(id),
UNIQUE KEY(table_id, name),
UNIQUE KEY(engine, se_private_id),
FOREIGN KEY (table_id) REFERENCES tables(id),
FOREIGN KEY (tablespace_id) REFERENCES tablespaces(id)

N..) The UNDEF enum value of the 'algorithm' column reflects the current
     behavior, when we store UNDEF in the FRM files. UNDEF is then resolved
     to a particular value using the table storage engine during opening
     of the table. The current behavior has the following user-visible
     effects:

       - SHOW CREATE TABLE does not explicitly specify index algorithm
         if it's UNDEF;

       - The actual index algorithm depends on the actual storage engine
         at the open-table time. That means, that if the storage engine
         has been changed for a table, the actual index algorithm might
         change.

         In other words, different storage engines support different
         index algorithms and thus have different default values for
         undefined index algorithm. So, a table might change its storage
         engine, and its index algorithms will be adjusted automatically.

     However, the current behavior is considered outdated and there is a
     wish to change it. Changing storage engines nowadays is a serious
     task, and if one wants to change storage engine, one should be ready
     to update index definitions as well.

     Thus, the proposed (new) behavior is to resolve undefined index
     algorithm at CREATE TABLE time (i.e. in the parser) using
     corresponding storage engine and store the actual value in the DD.
     So, the UNDEF enum value is expected to be removed once that new
     behavior is implemented.

N..) MyISAM differentiates indexes by their ordinal position and not name,
     The field 'ordinal_position' is added for the same.

     The field 'ordinal_position' also helps in scenario when
     InnoDB adds a implicit PK index, which is required to appear as the
     first index among the indexes available on a table. 


INDEX_COLUMN_USAGE
==================
index_id BIGINT UNSIGNED NOT NULL,
ordinal_position INT UNSIGNED NOT NULL,
column_id BIGINT UNSIGNED NOT NULL,
length INT UNSIGNED,
order ENUM('ASC', 'DESC') NOT NULL,
hidden BOOL NOT NULL,
UNIQUE KEY (index_id, ordinal_position),
UNIQUE KEY (index_id, column_id, hidden),
FOREIGN KEY (index_id) REFERENCES indexes(id),
FOREIGN KEY (column_id) REFERENCES columns(id)

N30) KEY_PART structure also has key_type, key_part_flag, offset fields,
     but they can be derived from column attributes.
N31) The order attribute is neccessary to support future descending indexes

N..) There are possibilities of a index having two elements using
     same column, e.g: In case we have,
        =>  CREATE TABLE t1 (c CHAR(10) PRIMARY KEY, KEY(c(5)));

     InnoDB would add index element c(10) to secondary key,
     which would make it look like KEY(c(5), c(10)). So we have,
       * A non-hidden index element 'c(5)' explicitly provided by user.
       * A hidden index element 'c(10)', implicitly added
         by InnoDB from a PK.
     This creates need of 'hidden' in index,
     UNIQUE KEY (index_id, column_id, hidden),



VIEW_TABLE_USAGE
================
view_id BIGINT UNSIGNED NOT NULL,
table_catalog VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
table_schema VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
table_name VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
PRIMARY KEY (view_id, table_catalog, table_schema, table_name),
FOREIGN KEY (view_id) REFERENCES tables(id),
KEY (table_catalog, table_schema, table_name)

N32) We need this table to track dependencies between views and their
     underlying tables. At some point we might add columns for tracking
     parts of query in which table is used if necessary.


COLUMN_TYPE_ELEMENTS
====================
column_id BIGINT UNSIGNED NOT NULL,
element_index INT UNSIGNED NOT NULL,
name VARBINARY(255) NOT NULL,
PRIMARY KEY(column_id, element_index),
UNIQUE KEY(column_id, name),
FOREIGN KEY (column_id) REFERENCES columns(id)

N33) While now it is techically possible to create table with enum column
     with a single element which name has length of 65175 bytes and length
     of 65176 bytes is disallowed (don't ask!), supporting such limit is
     hardly practical. The proposed limited is 255 characters. We will
     need to add deprecation warning for too long names to 5.6.
     The same applies to SET type.
N34) We use INT for "index" field for ENUM and SET for future extensibility.

N35) Since ENUM/SET names can be in any charset we use VARBINARY to store
     them.

N..) the 'element_index' name is used since 'index' is a reserved word.
     This field has semantics, thus it's not named 'ordinal_position'
     as other surrogate fields.

N..) this table contains elements both for ENUM and SET column data types.


FOREIGN_KEYS
============
id BIGINT NOT NULL AUTO_INCREMENT,
schema_id BIGINT UNSIGNED NOT NULL,
table_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
unique_constraint_id BIGINT UNSIGNED NOT NULL,
match_option ENUM('NONE', 'PARTIAL', 'FULL') NOT NULL,
update_rule ENUM('NO ACTION', 'RESTRICT', 'CASCADE', 'SET NULL', 'SET DEFAULT')
NOT NULL,
delete_rule ENUM('NO ACTION', 'RESTRICT', 'CASCADE', 'SET NULL', 'SET DEFAULT')
NOT NULL,
referenced_table_catalog VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
referenced_table_schema VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
referenced_table_name VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
PRIMARY KEY (id),
UNIQUE KEY (schema_id, name),
UNIQUE KEY (table_id, name),
FOREIGN KEY (schema_id) REFERENCES schemata(id),
FOREIGN KEY (unique_constraint_id) REFERENCING indexes(id) 

N36) Specifying fixed "unique_constraint_id" simplifies mapping to
     I_S.REFERENTIAL_CONSTRAINTS but increases complexity. Its value
     can be evaluated at FK creation time but might have to be updated
     each time when we re-shuffle UNIQUE/PRIMARY keys in parent table.

Q4) We need to check collation used for "name" column in InnoDB sources.

N37) InnoDB team suggests to use index_id to identify child columns in FK,
     but this idea has problems with storing ordinal position of columns in
     FK.

N38) Note that to preserve backward compatibility we need to be able to
     create/store FKs referencing tables which are not yet created. Thus
     We can't reference parent catalog/schema/table by ID and have to store
     its name instead. The same applies to 'referenced_column_name' field in
     FOREIGN_KEY_COLUMN_USAGE table.
     Of course this means that renaming of parent table and columns in parent
     table becomes more complex.

N..) Note that according to The Standard, table_id is not required in this
     table, but having it simplifies our implementation.

FOREIGN_KEY_COLUMN_USAGE
========================
foreign_key_id BIGINT UNSIGNED NOT NULL,
ordinal_position INT UNSIGNED NOT NULL,
column_id BIGINT UNSIGNED NOT NULL,
referenced_column_name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
UNIQUE KEY(foreign_key_id, column_id, referenced_column_name),
UNIQUE KEY(foreign_key_id, ordinal_position),
FOREIGN KEY (foreign_key_id) REFERENCES foreign_keys(id),
FOREIGN KEY (column_id) REFERENCES columns(id)


TABLE_PARTITIONS
================
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
table_id BIGINT UNSIGNED NOT NULL,
parent_partition_id UNSIGNED,
number SMALLINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
comment VARCHAR(2048) NOT NULL,
options MEDIUMTEXT,
se_private_data MEDIUMTEXT,
se_private_id INT UNSIGNED,
tablespace_id BIGINT UNSIGNED,
PRIMARY KEY(id),
UNIQUE KEY(table_id, name),
UNIQUE KEY(table_id, level, number),
UNIQUE KEY (se_private_id),
FOREIGN KEY (table_id) REFERENCES tables(id),
FOREIGN KEY (tablespace_id) REFERENCES tablespaces(id),
FOREIGN KEY (parent_partition_id) REFERENCES table_partitions(id)


TABLE_PARTITION_VALUES
======================
partition_id BIGINT UNSIGNED NOT NULL,
list_num TINYINT UNSIGNED NOT NULL,
column_num TINYINT UNSIGNED NOT NULL,
value BLOB NOT NULL,
max_value BOOL,
PRIMARY KEY (partition_id, list_num, column_num),
FOREIGN KEY (partition_id) REFERENCES table_partitions(id)


N39) We can't join list_num and column_num and use ordinal_position instead
     as in some cases they will be used at the same time.


INDEX_PARTITIONS
================
partition_id BIGINT UNSIGNED NOT NULL,
index_id BIGINT UNSIGNED NOT NULL,
options MEDIUMTEXT,
se_private_data MEDIUMTEXT,
se_private_id INT UNSIGNED,
tablespace_id BIGINT UNSIGNED,
PRIMARY KEY (partition_id, index_id),
UNIQUE KEY (se_private_id),
FOREIGN KEY (partition_id) REFERENCES table_partitions(id),
FOREIGN KEY (index_id) REFERENCES indexes(id),
FOREIGN KEY (tablespace_id) REFERENCES tablespaces(id)


TRIGGERS
========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
schema_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_bin/utf8_general_ci,
event_type ENUM('INSERT','UPDATE','DELETE') NOT NULL,
table_id BIGINT UNSIGNED NOT NULL,
action_timing ENUM('BEFORE','AFTER') NOT NULL,
action_order INT UNSIGNED NOT NULL,
action_statement LONGBLOB NOT NULL,
action_statement_utf8 LONGTEXT NOT NULL,
created TIMESTAMP NOT NULL,
last_altered TIMESTAMP NOT NULL,
sql_mode SET(...) NOT NULL,
definer VARCHAR(77) NOT NULL,
client_collation_id BIGINT UNSIGNED NOT NULL,
connection_collation_id BIGINT UNSIGNED NOT NULL,
schema_collation_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (schema_id, name),
UNIQUE KEY (table_id, event_type, action_timing, action_order),
FOREIGN KEY (schema_id) REFERENCES schemata(id),
FOREIGN KEY (table_id) REFERENCES tables(id),
FOREIGN KEY (client_collation_id) REFERENCES collations(id),
FOREIGN KEY (connection_collation_id) REFERENCES collations(id),
FOREIGN KEY (schema_collation_id) REFERENCES collations(id),

N40) 'action_statement' length is aligned with maximum length of body
     for stored routines. We need both 'action_statement' and
     'action_statement_utf8' since we want to both correctly parse
     trigger body and show readable version in I_S (because trigger
     definition can contain charset introducers and an characters
     that are Unicode-roundtrip-unsafe).


ROUTINES
========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
schema_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
type ENUM('FUNCTION', 'PROCEDURE') NOT NULL,
result_data_type ENUM(...),
result_is_zerofill BOOL,
result_is_unsigned BOOL,
result_char_length INT UNSIGNED,
result_numeric_precision INT UNSIGNED,
result_numeric_scale INT UNSIGNED,
result_datetime_precision INT UNSIGNED,
result_collation_id BIGINT UNSIGNED,
definition LONGBLOB,
definition_utf8 LONGTEXT,
is_deterministic BOOL NOT NULL,
sql_data_access ENUM('CONTAINS_SQL', 'NO_SQL', 'READS_SQL_DATA',
'MODIFIES_SQL_DATA') NOT NULL,
security_type ENUM('INVOKER', 'DEFINER') NOT NULL,
definer VARCHAR(77) NOT NULL,
sql_mode SET(...) NOT NULL,
client_collation_id BIGINT UNSIGNED NOT NULL,
connection_collation_id BIGINT UNSIGNED NOT NULL,
schema_collation_id BIGINT UNSIGNED NOT NULL,
created TIMESTAMP NOT NULL,
last_altered TIMESTAMP NOT NULL,
comment VARCHAR(2048) NOT NULL,
PRIMARY KEY(id),
UNIQUE KEY (schema_id, name, type),
FOREIGN KEY (schema_id) REFERENCES schemata(id),
FOREIGN KEY (result_collation_id) REFERENCES collations(id),
FOREIGN KEY (client_collation_id) REFERENCES collations(id),
FOREIGN KEY (connection_collation_id) REFERENCES collations(id),
FOREIGN KEY (schema_collation_id) REFERENCES collations(id)

N41) Since NOT NULL clause represents constraint and not a part of data type,
     it is always possible to return NULL from stored function. Hence
     "result_is_nullable" is irrelevant.

N42) 'definition' and 'definition_utf8' will contain routine body
     and not the whole definition (this aligns well with standard).

N43) This table is the only case when 'comment' length is decreased
     as compared to current situation.


PARAMETERS
==========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
routine_id BIGINT UNSIGNED NOT NULL,
ordinal_position INT UNSIGNED NOT NULL,
mode ENUM('IN','OUT','INOUT') NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
data_type ENUM(...) NOT NULL,
is_zerofill BOOL,
is_unsigned BOOL,
char_length INT UNSIGNED,
numeric_precision INT UNSIGNED,
numeric_scale INT UNSIGNED,
datetime_precision INT UNSIGNED,
collation_id BIGINT UNSIGNED,
PRIMARY KEY(id),
UNIQUE KEY(routine_id, ordinal_position),
FOREIGN KEY (routine_id) REFERENCES routines(id),
FOREIGN KEY (collation_id) REFERENCES collations(id)

N44) The 'id' column for this table is not strictly necessary and was
     added mostly for uniformity.


EVENTS
======
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
schema_id BIGINT UNSIGNED NOT NULL,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
definer VARCHAR(77) NOT NULL,
time_zone VARCHAR(64) NOT NULL,
definition LONGBLOB NOT NULL,
definition_utf8 LONGTEXT NOT NULL,
execute_at DATETIME,
interval_value INT,
interval_field
ENUM('YEAR','QUARTER','MONTH','DAY','HOUR','MINUTE','WEEK','SECOND','MICROSECOND
','YEAR_MONTH','DAY_HOUR','DAY_MINUTE','DAY_SECOND','HOUR_MINUTE','HOUR_SECOND',
'MINUTE_SECOND','DAY_MICROSECOND','HOUR_MICROSECOND','MINUTE_MICROSECOND','SECON
D_MICROSECOND'),
sql_mode SET(...) NOT NULL,
starts DATETIME,
ends DATETIME,
status ENUM('ENABLED','DISABLED','SLAVESIDE_DISABLED') NOT NULL,
on_completion ENUM('DROP','PRESERVE') NOT NULL,
created TIMESTAMP NOT NULL,
last_altered TIMESTAMP NOT NULL,
last_executed DATETIME,
comment VARCHAR(2048) NOT NULL,
originator INT UNSIGNED NOT NULL,
client_collation_id BIGINT UNSIGNED NOT NULL,
connection_collation_id BIGINT UNSIGNED NOT NULL,
schema_collation_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY(id),
UNIQUE KEY(schema_id, name),
FOREIGN KEY (schema_id) REFERENCES schemata(id),
FOREIGN KEY (client_collation_id) REFERENCES collations(id),
FOREIGN KEY (connection_collation_id) REFERENCES collations(id),
FOREIGN KEY (schema_collation_id) REFERENCES collations(id)

N45) time_zone can contain:
     - name of time zone from TIME_ZONE_NAMES table (e.g. 'Europe/Moscow')
     - time zone description in the form of offset ([+|-]HH:MM)
     - name of system time zone ('SYSTEM').

Q5) Last executed field can be updated fairly often, does it makes
    sense to move it into separate table?


COLLATIONS
==========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
character_set_id BIGINT UNSIGNED NOT NULL,
is_compiled BOOL NOT NULL,
sort_length INT UNSIGNED NOT NULL,
PRIMARY KEY(id),
UNIQUE KEY(name),
FOREIGN KEY (character_set_id) REFERENCES character_sets(id)


CHARACTER_SETS
==============
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(64) NOT NULL COLLATE utf8_general_ci,
default_collation_id BIGINT UNSIGNED NOT NULL,
comment VARCHAR(2048) NOT NULL,
mb_max_length UNSIGNED INT NOT NULL,
PRIMARY KEY(id),
UNIQUE KEY(name),
FOREIGN KEY (default_collation_id) REFERENCES collations(id)

N46) The above we assume that legacy collation IDs are directly mapped to
     collation IDs.

N47) Note that a) collations are now described by share/charsets/Index.xml
     file b) they can added to server by editing this/other .xml files
     without recompilation c) adding some collations requires adding .c
     files to MySQL source code. The proposal is to get rid of file-based
     descriptions and to store all information about character sets/collations
     in tables. This means that some additional columns will be necessary.


TABLESPACES
===========
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(512) NOT NULL COLLATE utf8_bin/utf8_general_ci,
options MEDIUMTEXT,
se_private_data MEDIUMTEXT,
se_private_id INT UNSIGNED,
comment VARCHAR(2048) NOT NULL,
engine VARCHAR(64),
PRIMARY KEY(id),
UNIQUE KEY (engine, se_private_id),
UNIQUE KEY(name)

N48) "name" can be automatically generated for example by combining schema,
     table and index names or using its filename.

N49) se_private_id could hold space_id generated by InnoDB.
     se_private_id cannot be stored within se_private_data as InnoDB might
     need to do index lookup based on space_id.
     se_private_id can be NULL, when there a different engine
     other then InnoDB creates a tablespace. E.g., NDB could
     create a tablespace and store tablespace id in se_private_data.
     TODO: What if NDB also needs a index on tablespace id's that
           is generated by NDB ?

N..) the tentative decision is to have tablespaces in the catalog scope,
     i.e. one catalog may contain several tablespaces, one tablespace belongs
     to the one catalog. If that decision becomes final (post 5.7), a new
     column will need to be added to TABLESPACES table: catalog_id. The unique
     key on name will also nees to include catalog_id column.

TABLESPACE_FILES
================
tablespace_id BIGINT UNSIGNED NOT NULL,
ordinal_position INT UNSIGNED NOT NULL,
file_name VARCHAR(512) NOT NULL,
se_private_data MEDIUMTEXT,
UNIQUE KEY (tablespace_id, file_number),
UNIQUE KEY (file_name),
FOREIGN KEY (tablespace_id) REFERENCES tablespaces(id)

N50) Definitions of tablespace-related tables might have to
     be adjusted once feedback from Cluster team becomes available.
Marko thinks that tablespace files should be named space_id.ibd by default,
e.g., 0000c001.ibd. This probably means that the schema should expose the
internal identifier numbers (tablespace, table, index) to the DBA.

If we use internal identifiers in file names, we would get rid of the
filename-safe encoding that was introduced in MySQL 5.1. For example, # gets
translated into @0023, and ‘old-style’ names get the #mysql50# prefix. We would
also be able to allow full Unicode (4-byte UTF-8) in all identifier names.

There could be an option for user-specified tablespace names. Then it could be
the user’s responsibility to assign names that work on the file system.