WL#6391: Protect Data Dictionary tables
Affects: Server-8.0 — Status: Complete — Priority: Medium
This WL restricts the availability of DD information to the users. We do not want to expose DD tables directly to users, so access must happen through I_S. Additionally, access to the system schema and tablespace is limited. Motivation ========== There are three main aspects motivating restriction of access to the data dictionary tables, elaborated below. Maintain predictable DD table definitions: Reject altering DD tables -------------------------------------------------------------------- Allowing users to execute DDL operations on DD tables must be avoided. The definitions of the DD tables need to be what the server expects, since these expectations are reflected in the way the server reads and writes the meta data to and from the DD tables. Freely altering the DD tables is likely to cause a malfunctioning server. Maintain meta data correctness: Reject updating DD tables --------------------------------------------------------- Allowing users to execute updates on the DD table contents (i.e., the meta data reflecting e.g. user tables) must also be avoided. Although there are constraints defined to enforce foreign key relationships, key uniqueness etc., there are also numerous constraints that may not be enforced by the data model, but which are verified at the application level as part of SQL statement execution. Validation of names and paths are examples of this. Bypassing this validation may introduce logical corruption of the data model breaking the expected invariants. An additional complication is that a change in the meta data normally done by means of a DDL statement also involves proper synchronization with various meta data caches, and with the relevant storage engine. Bypassing this synchronization by allowing updates of the DD table contents directly is therefore potentially disastrous. Separate interface from implementation: Reject querying DD tables ----------------------------------------------------------------- Allowing users to execute queries directly against the DD tables must also be avoided. If we allow queries directly against the DD tables, this will become a de facto interface that the users will expect to be stable. Hence, this will severely limit the changes we may do to the DD table definitions. On the opposite, if we support a stable interface in terms of information schema views built upon the DD tables, we will both provide the users with a stable interface and infrastructure for accessing the appropriate information, and at the same time have the freedom to change the underlying table definitions to meet new requirements. Solution ======== Based on the motivation above, this worklog prohibits access for users to execute both DDL and DML against the DD tables. User access to meta data is supported by the information schema. Access to executing DDL and DML statements is still supported for statements originating from the server itself, e.g. during initial start and upgrade. The next section of this worklog elaborates on the different categories of DD tables and how their visibility is restricted.
This section categorizes the different database entities (mainly tables), defines relevant levels of visibility restrictions, classifies the DD tables according to the categorization, and describes how the visibility of the different categories is restricted. At the end of this section, functional requirements are listed. Entity classification ===================== The DD tables, schema and tablespace (commonly referred to as entities) are classified as follows, based on the purpose of the entity: Inert entities -------------- This is only the version table, which can never change. Core entities ------------- The entities that are needed to read an arbitrary table definition from disk, i.e., the tables that store table definitions, in addition to the schema and tablespace where these tables belong. A core entity must always be present in the DD cache, because we cannot handle a cache miss without all core entities being present. Second order entities --------------------- The second order entities are those that are needed to read arbitrary server related meta data. E.g. the 'events' table is a second order entity, because the table definition of the 'events' table can be read from disk by using the core entities. Dynamic plugin entities ----------------------- These are entities needed by dynamic plugins and storage engines. The entities are not needed if running without the plugin. Support entities ---------------- The support entities are e.g. the help tables, the time zone information, etc., and the tables needed by the mandatory plugins. Entity creation =============== In addition to the classification above, it is relevant to categorize the entities based on the way they are treated by the server. With the introduction of the global DD, we have two main categories, orthogonal to the classification above. Dictionary internal ------------------- The internal entities are created by the server, during DD initialization, based on table definitions that are hard coded in the source code. Upgrade is managed by the server without using an external client (TBD). This is the way we manage the tables of the global DD. Dictionary external ------------------- The external entities are created by the server, during server initialization, based on table definitions that are hard coded in the source code. Upgrade is managed by an external client. This is the way system tables used to be managed. As tables are moved into the global DD, they are also moved from being dictionary external to being dictionary internal. Visibility restrictions ======================= We classify different types of visibility in terms of whether the entity is accessible in an SQL statement. The SQL statements can be 'internal', meaning that the execution is requested by the server itself, or they can be 'external', meaning that their execution is requested by a user, or a client. Additionally, we distinguish between DML and DDL statements. Based on this, visibility is classified as follows: Internal DDL: The entity definition can be modified by the server itself executing DDL. External DDL: The entity definition can be modified by executing DDL statements received from a client. Internal DML: Entries can be added/updated/read (this applies only to tables) by the server itself, by executing DML statements. External DML: Entries can be added/updated/read (this applies only to tables) by executing DML statements receieved from a client. Regarding P_S and I_S; we hide the non-accessible table entities from the I_S. This means that users are not able to see the existence of the DD tables in SHOW statements or SELECT from I_S views. The existence of the DD tables is visible in the P_S, not in terms of individual tables, but as a subsystem. System schema and tablespace ============================ This worklog implements the following handling of the system schema and tablespace: - All system tables, except some of the plugin tables, are located in the 'mysql' schema. - All system tables are located in the 'mysql' tablespace. The motivation for this is: - Introducing a new schema may conflict with an existing user defined schema. - There may be a conflict with an existing user defined 'mysql' tablespace, this must be detected and handled by the upgrade tool when upgrading from 5.7 to 5.8. Entity classification ===================== The table below classifies the different dictionary internal system tables based on the categorization described above. +----------+------------------------------+ | | Dictionary internal | +----------+------------------------------+ | Inert | version | | Core | catalogs | | | character_sets | | | collations | | | columns | | | column_type_elements | | | foreign_key_column_usage | | | foreign_keys | | | index_column_usage | | | indexes | | | index_partitions | | | schemata | | | table_partitions | | | table_partition_values | | | tables | | | tablespace_files | | | tablespaces | | | view_table_usage | | Second | events | | | parameters | | | parameter_type_elements | | | routines | | | st_spatial_reference_systems | | | triggers | | Support | innodb_index_stats | | | innodb_table_stats | +----------+------------------------------+ The dictionary external tables are shown in the table below. These are not affected by this worklog, but are included here for completeness. +---------------------------+ | Dictionary external | +---------------------------+ | column_stats | | columns_priv | | component | | db | | engine_cost | | func | | general_log | | gtid_executed | | help_category | | help_keyword | | help_relation | | help_topic | | plugin | | procs_priv | | proxies_priv | | server_cost | | servers | | slave_master_info | | slave_relay_log_info | | slave_worker_info | | slow_log | | tables_priv | | time_zone | | time_zone_leap_second | | time_zone_name | | time_zone_transition | | time_zone_transition_type | | user | +---------------------------+ Entity visibility ================= The resulting accessibility in SQL statements (internal and external DDL/DML) is shown below (we only list where the entity is visible, default is that it is hidden). The tables in the 'support' category, being accessible in DML statements, are also visible in I_S queries, unlike the other dictionary internal tables: ---------+---------------------+ | Dictionary internal | ---------+----------+----------+ | DDL | DML | ---------+-----+----+-----+----+ | IN | EX | IN | EX | ---------+-----+----+-----+----+ Inert | X X | Core | X X | Second | X X | Support | X X X | ---------+---------------------+ The long term goal is to gradually move all system tables to become dictionary internal, and to reject external access as far as possible. Error messages ============== When trying to access an entity through an interface where it is not accessible, we raise a new error message: DROP TABLE mysql.tables; ERROR ...: Access to data dictionary entity 'mysql.tables' not permitted. Possible extensions =================== Some of the meta data in the dictionary internal tables is not available through the current I_S implementation. If such meta data is urgently required, the following alternatives may be considered: 1. Use the serialized dictionary information (WL#7069), which can be retrieved from a separate file or from a tablespace file (after WL#7141 is pushed, by using the tool from WL#7066). 2. It might also be possible to implement additional I_S views providing access to the data. None of these alternatives will be implemented or specifically supported by this WL. Functional requirements ======================= F-1. A new error message "Access to data dictionary entity '%s' not permitted." shall be introduced. F-2. An attempt to access dictionary internal tables in a server external DDL statement shall be prohibited, and fail with the error message in F-1. F-3. The dictionary internal tables in the 'support' category shall be reflected in the information schema views or tables. The other categories shall not be reflected. F-4. An attempt to access dictionary internal tables in a server external a) SHOW statement b) DML statement c) VIEW definition d) STORED PROGRAM definition e) PREPARED STATEMENT definition f) TRIGGER definition g) FOREIGN KEY definition shall be accepted for tables in the 'support' category, but shall be prohibited, and fail with the error message in F-1 for tables in any other categories. F-5. An attempt to access the system schema 'mysql' in a server external DDL statement shall be prohibited, and fail with the error message in F-1. F-6. An attempt to access the system tablespace 'mysql' in a server external DDL statement shall be prohibited, and fail with the error message in F-1.
Interfaces changed ================== Referring to the previous section regarding interfaces affected, the changes in more detail are the following: - SQL semantics: Some table names previously allowed are now rejected. - Errors and warnings: One new error message if illegally accessing a DD table. - Install/upgrade: The DD tables can not be upgraded by means of SQL statements executed by an external client. They must be upgraded by the server itself. - MySQL utilities: External clients can not run SQL queries against DD tables. The DD tables not in the 'support' category will not be included in the output from 'mysqldump', which uses information schema queries to decide which tables to include. User-visible consequences ========================= Users need to take into account the changes related to the InnoDB specific tables, which have DDL access prohibited. The tables 'innodb_index_stats' and 'innodb_table_stats' may be referenced in SQL scripts and dump files from 'mysqldump'. These DDL statements must be removed from the dump files, otherwise, restoring the dump will fail. Other DD tables, not in the 'support' category, are not reflected in the dump file. Cross-version replication ========================= Access to the dictionary internal tables must also be prohibited in a cross version replication setting. DDL operations are prohibited implicitly since they are replicated as statements, but row based replication of DML from an older mysqld version may gain access. Suppose e.g. that the older version creates a table called e.g. 'mysql.version', with the same definition as on the newer server version, inserts a record with the same contents as on the newer version, then sets up replication to the newer server version, starting at the latest binlog position (i.e., not replication the CREATE TABLE and INSERT statements). Then, deleting the contents of the version table on the old server version will be replicated and executed on the newer server version. Thus, we need to prohibit this situation.
Remove obsolete structures ========================== The System_tables registry is extended to cover the necessary classification of the various entities. The structures 'mysql_system_tables' and 'known_system_databases' are abandoned. Thread type based access check ============================== Restriction of access to DD tables is done by checking the thread type. To facilitate this, we add a THD method 'bool is_initialize_system_thread()' to check the thread type. This way, the thread that runs the initialization of the DD is allowed to access tables, while other threads are not. In addition to this, we need to check whether a table is a DD table, and whether it is potentially accessible (i.e., the tables in the 'support' category). For these purposes, we have the two Dictionary methods 'bool is_dd_table_name()' and 'bool is_accessible_dd_table_name()'. Implement access check for various statements ============================================= We implement access check as described above in the method 'SELECT_LEX::add_table_to_list()'. This check will catch most use cases described in the QA notes, including LOAD and HANDLER statements. Additionally, there are two situations that require special handling: - Prohibit access from stored programs: Extend 'sp_head::merge_table_list()'. - Prohibit access in foreign keys: Extend 'check_fk_parent_table_access()'. Note that for prepared statements, the '?' placeholders cannot be used for meta data such as schema, table or column names, so the dynamic binding of placeholders does not represent a problem here. Implement access check for replication ====================================== We implement access check as described above also in 'table_def::compatible_with()' in the replication subsystem.
Copyright (c) 2000, 2018, Oracle Corporation and/or its affiliates. All rights reserved.