WL#6391: Protect Data Dictionary tables

Affects: Server-8.0   —   Status: Complete

This WL restricts the availability of DD information to the users. We do not
want to expose DD tables directly to users, so access must happen through I_S.
Additionally, access to the system schema and tablespace is limited.


Motivation
==========
There are three main aspects motivating restriction of access to the data
dictionary tables, elaborated below.

Maintain predictable DD table definitions: Reject altering DD tables
--------------------------------------------------------------------
Allowing users to execute DDL operations on DD tables must be avoided. The
definitions of the DD tables need to be what the server expects, since these
expectations are reflected in the way the server reads and writes the meta data
to and from the DD tables. Freely altering the DD tables is likely to cause a
malfunctioning server.

Maintain meta data correctness: Reject updating DD tables
---------------------------------------------------------
Allowing users to execute updates on the DD table contents (i.e., the meta data
reflecting e.g. user tables) must also be avoided. Although there are
constraints defined to enforce foreign key relationships, key uniqueness etc.,
there are also numerous constraints that may not be enforced by the data model,
but which are verified at the application level as part of SQL statement
execution. Validation of names and paths are examples of this. Bypassing this
validation may introduce logical corruption of the data model breaking the
expected invariants. An additional complication is that a change in the meta
data normally done by means of a DDL statement also involves proper
synchronization with various meta data caches, and with the relevant storage
engine. Bypassing this synchronization by allowing updates of the DD table
contents directly is therefore potentially disastrous.

Separate interface from implementation: Reject querying DD tables
-----------------------------------------------------------------
Allowing users to execute queries directly against the DD tables must also be
avoided. If we allow queries directly against the DD tables, this will become a
de facto interface that the users will expect to be stable. Hence, this will
severely limit the changes we may do to the DD table definitions. On the
opposite, if we support a stable interface in terms of information schema views
built upon the DD tables, we will both provide the users with a stable interface
and infrastructure for accessing the appropriate information, and at the same
time have the freedom to change the underlying table definitions to meet new
requirements.


Solution
========
Based on the motivation above, this worklog prohibits access for users to
execute both DDL and DML against the DD tables. User access to meta data is
supported by the information schema. Access to executing DDL and DML statements
is still supported for statements originating from the server itself, e.g.
during initial start and upgrade.

The next section of this worklog elaborates on the different categories of
DD tables and how their visibility is restricted.
This section categorizes the different database entities (mainly tables),
defines relevant levels of visibility restrictions, classifies the DD tables
according to the categorization, and describes how the visibility of the
different categories is restricted. At the end of this section, functional
requirements are listed.

Entity classification
=====================
The DD tables, schema and tablespace (commonly referred to as entities) are
classified as follows, based on the purpose of the entity:

Inert entities
--------------
This is only the version table, which can never change.

Core entities
-------------
The entities that are needed to read an arbitrary table definition from disk,
i.e., the tables that store table definitions, in addition to the schema and
tablespace where these tables belong. A core entity must always be present in
the DD cache, because we cannot handle a cache miss without all core entities
being present.

Second order entities
---------------------
The second order entities are those that are needed to read arbitrary server
related meta data. E.g. the 'events' table is a second order entity, because the
table definition of the 'events' table can be read from disk by using the core
entities.

Dynamic plugin entities
-----------------------
These are entities needed by dynamic plugins and storage engines. The
entities are not needed if running without the plugin.

Support entities
----------------
The support entities are e.g. the help tables, the time zone information, etc.,
and the tables needed by the mandatory plugins.


Entity creation
===============
In addition to the classification above, it is relevant to categorize the
entities based on the way they are treated by the server. With the introduction
of the global DD, we have two main categories, orthogonal to the classification
above.

Dictionary internal
-------------------
The internal entities are created by the server, during DD initialization, based
on table definitions that are hard coded in the source code. Upgrade is managed
by the server without using an external client (TBD). This is the way we manage
the tables of the global DD.

Dictionary external
-------------------
The external entities are created by the server, during server initialization,
based on table definitions that are hard coded in the source code. Upgrade is
managed by an external client. This is the way system tables used to be managed.
As tables are moved into the global DD, they are also moved from being
dictionary external to being dictionary internal.


Visibility restrictions
=======================
We classify different types of visibility in terms of whether the entity is
accessible in an SQL statement. The SQL statements can be 'internal', meaning
that the execution is requested by the server itself, or they can be 'external',
meaning that their execution is requested by a user, or a client. Additionally,
we distinguish between DML and DDL statements.

Based on this, visibility is classified as follows:

Internal DDL: The entity definition can be modified by the server itself
              executing DDL.
External DDL: The entity definition can be modified by executing DDL 
              statements received from a client.
Internal DML: Entries can be added/updated/read (this applies only to 
              tables) by the server itself, by executing DML statements.
External DML: Entries can be added/updated/read (this applies only to 
              tables) by executing DML statements receieved from a client.

Regarding P_S and I_S; we hide the non-accessible table entities from the I_S.
This means that users are not able to see the existence of the DD tables in SHOW
statements or SELECT from I_S views. The existence of the DD tables is visible
in the P_S, not in terms of individual tables, but as a subsystem.


System schema and tablespace
============================
This worklog implements the following handling of the system schema and tablespace:

- All system tables, except some of the plugin tables, are located in the 
  'mysql' schema.
- All system tables are located in the 'mysql' tablespace.

The motivation for this is:

- Introducing a new schema may conflict with an existing user defined schema.
- There may be a conflict with an existing user defined 'mysql' tablespace, 
  this must be detected and handled by the upgrade tool when upgrading from
  5.7 to 5.8.


Entity classification
=====================
The table below classifies the different dictionary internal system tables based
on the categorization described above.

+----------+------------------------------+
|          | Dictionary internal          |
+----------+------------------------------+
| Inert    | version                      |
| Core     | catalogs                     |
|          | character_sets               |
|          | collations                   |
|          | columns                      |
|          | column_type_elements         |
|          | foreign_key_column_usage     |
|          | foreign_keys                 |
|          | index_column_usage           |
|          | indexes                      |
|          | index_partitions             |
|          | schemata                     |
|          | table_partitions             |
|          | table_partition_values       |
|          | tables                       |
|          | tablespace_files             |
|          | tablespaces                  |
|          | view_table_usage             |
| Second   | events                       |
|          | parameters                   |
|          | parameter_type_elements      |
|          | routines                     |
|          | st_spatial_reference_systems |
|          | triggers                     |
| Support  | innodb_index_stats           |
|          | innodb_table_stats           |
+----------+------------------------------+

The dictionary external tables are shown in the table below. These are not
affected by this worklog, but are included here for completeness.

+---------------------------+
|    Dictionary external    |
+---------------------------+
| column_stats              |
| columns_priv              |
| component                 |
| db                        |
| engine_cost               |
| func                      |
| general_log               |
| gtid_executed             |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| plugin                    |
| procs_priv                |
| proxies_priv              |
| server_cost               |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| slow_log                  |
| tables_priv               |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
| user                      |
+---------------------------+


Entity visibility
=================
The resulting accessibility in SQL statements (internal and external DDL/DML) is
shown below (we only list where the entity is visible, default is that it is
hidden). The tables in the 'support' category, being accessible in DML
statements, are also visible in I_S queries, unlike the other dictionary
internal tables:

---------+---------------------+
         | Dictionary internal |
---------+----------+----------+
         |   DDL    |   DML    |
---------+-----+----+-----+----+
         | IN  | EX | IN  | EX |
---------+-----+----+-----+----+
Inert    |  X          X       |
Core     |  X          X       |
Second   |  X          X       |
Support  |  X          X    X  |
---------+---------------------+

The long term goal is to gradually move all system tables to become dictionary
internal, and to reject external access as far as possible.


Error messages
==============
When trying to access an entity through an interface where it is not accessible,
we raise a new error message:

DROP TABLE mysql.tables;
ERROR ...: Access to data dictionary entity 'mysql.tables' not permitted.


Possible extensions
===================
Some of the meta data in the dictionary internal tables is not available through
the current I_S implementation. If such meta data is urgently required, the
following alternatives may be considered:

1. Use the serialized dictionary information (WL#7069), which can be retrieved 
   from a separate file or from a tablespace file (after WL#7141 is pushed, by 
   using the tool from WL#7066).
2. It might also be possible to implement additional I_S views providing access 
   to the data.

None of these alternatives will be implemented or specifically supported by this WL.


Functional requirements
=======================
F-1. A new error message "Access to data dictionary entity '%s' not permitted."
     shall be introduced.

F-2. An attempt to access dictionary internal tables in a server external DDL 
     statement shall be prohibited, and fail with the error message in F-1.

F-3. The dictionary internal tables in the 'support' category shall be 
     reflected in the information schema views or tables. The other categories
     shall not be reflected.

F-4. An attempt to access dictionary internal tables in a server external

     a) SHOW statement
     b) DML statement
     c) VIEW definition
     d) STORED PROGRAM definition
     e) PREPARED STATEMENT definition
     f) TRIGGER definition
     g) FOREIGN KEY definition

     shall be accepted for tables in the 'support' category, but shall be 
     prohibited, and fail with the error message in F-1 for tables in any other 
     categories.

F-5. An attempt to access the system schema 'mysql' in a server external DDL 
     statement shall be prohibited, and fail with the error message in F-1.
F-6. An attempt to access the system tablespace 'mysql' in a server external 
     DDL statement shall be prohibited, and fail with the error message in F-1.
Interfaces changed
==================
Referring to the previous section regarding interfaces affected, the changes in 
more detail are the following:

- SQL semantics: Some table names previously allowed are now rejected.
- Errors and warnings: One new error message if illegally accessing a DD table.
- Install/upgrade: The DD tables can not be upgraded by means of SQL statements
  executed by an external client. They must be upgraded by the server itself.
- MySQL utilities: External clients can not run SQL queries against DD tables. 
  The DD tables not in the 'support' category will not be included in the output
  from 'mysqldump', which uses information schema queries to decide which tables
  to include.

User-visible consequences
=========================
Users need to take into account the changes related to the InnoDB specific
tables, which have DDL access prohibited. The tables 'innodb_index_stats'
and 'innodb_table_stats' may be referenced in SQL scripts and dump files from
'mysqldump'. These DDL statements must be removed from the dump files,
otherwise, restoring the dump will fail. Other DD tables, not in the 'support'
category, are not reflected in the dump file.

Cross-version replication
=========================
Access to the dictionary internal tables must also be prohibited in a cross
version replication setting. DDL operations are prohibited implicitly since they
are replicated as statements, but row based replication of DML from an older
mysqld version may gain access.

Suppose e.g. that the older version creates a table called e.g. 'mysql.version',
with the same definition as on the newer server version, inserts a record with
the same contents as on the newer version, then sets up replication to the newer
server version, starting at the latest binlog position (i.e., not replication
the CREATE TABLE and INSERT statements). Then, deleting the contents of the
version table on the old server version will be replicated and executed on the
newer server version. Thus, we need to prohibit this situation.
Remove obsolete structures
==========================
The System_tables registry is extended to cover the necessary classification of
the various entities. The structures 'mysql_system_tables' and
'known_system_databases' are abandoned.


Thread type based access check
==============================
Restriction of access to DD tables is done by checking the thread type. To
facilitate this, we add a THD method 'bool is_initialize_system_thread()' to
check the thread type. This way, the thread that runs the initialization of the
DD is allowed to access tables, while other threads are not.

In addition to this, we need to check whether a table is a DD table, and whether
it is potentially accessible (i.e., the tables in the 'support' category). For
these purposes, we have the two Dictionary methods 'bool is_dd_table_name()' and
'bool is_accessible_dd_table_name()'. 


Implement access check for various statements
=============================================
We implement access check as described above in the method
'SELECT_LEX::add_table_to_list()'. This check will catch most use cases
described in the QA notes, including LOAD and HANDLER statements. Additionally,
there are two situations that
require special handling: 

- Prohibit access from stored programs: Extend 'sp_head::merge_table_list()'.
- Prohibit access in foreign keys: Extend 'check_fk_parent_table_access()'.

Note that for prepared statements, the '?' placeholders cannot be used for meta
data such as schema, table or column names, so the dynamic binding of
placeholders does not represent a problem here.


Implement access check for replication
======================================
We implement access check as described above also in
'table_def::compatible_with()' in the replication subsystem.