WL#4165: Prepared statements: validation
                    Affects: Server-5.1
                                              —  
                        Status: Complete
                
    When executing a prepared statement, ensure that its parsed tree is valid by checking that metadata of all used objects (tables, views, etc) did not change. If there is a change, attempt to reprepare the prepared statement (WL#4166) and execute it again. Necessary to ensure continuous operation of prepared statements in presense of DDL. A pre-requisite for a fix for BUG#27430. See related WL#4166.
Validation of prepared statement metadata does not introduce functionality that is visible to user, since it's performed in conjunction with automatic statement reprepare (WL#4166). See the LLD section for the architecture description. A new error is added (it never reaches the user, for details see WL#4166): ER_NEED_REPREPARE eng "Prepared statement needs to be re-prepared"
Validation of prepared statements --------------------------------- When opening and locking tables for execution of a prepared statement, validate metadata of used tables: ensure that it did not change since statement prepare. For that: At prepare, remember the version and type of each used base, temporary, information_schema table or a view. At execute, check that the version and type did not change. In case the version/type did change, report an error. Pseudocode ---------- Both prepare and execute already perform the following functionality in procedure open_tables(): for each (table_list in) { let table= find or insert a table definition cache element where element.name == table_list.name; if (table is NULL) ERROR; let table_list->table= table; } This loop is extended to validate table metadata: for each (table_list in
) { let table= find or insert a table definition cache element where element.name == table_list.name; if (table is NULL) ERROR; + if (check_and_update_table_ref_version(table_list, table->table_share) + ERROR; let table_list->table= table; } (word table is meant in general sense). At prepared statement prepare, check_and_update_table_ref_version() records the current version value in the table_list. At prepared statement execute, check_and_update_table_ref_version() compares the version in the new share with the recorded value and yields an error in case of mismatch. Note, that an error must be and is reported only for DML statements: DDL statements, such as DROP TABLE, naturally change the metadata version of its subject table, and thus expectedly fail the validation. At the same time execution of DDL statements is not frail to break after a metadata change, and thus does not need validation. check_and_update_table_ref_version() pseudocode: { if (prepared statement prepare) table_list->set_table_ref_version(table_share); else if (prepared statement execute) { if (! table_list->is_table_ref_version_equal(table_share) && is_dml(thd->lex->sql_command)) ERROR; } } Pseudocode of TABLE_LIST::set_table_ref_version() and TABLE_LIST::is_table_ref_version_equal() will be provided after the description of the versioning mechanims that follows below. Choice of the versioning mechanism ---------------------------------- The current choice of the versioning mechanism is a large compromise. The version consists of two parts: metadata type and version id. Allowed metadata types are: enum enum_table_ref_type { /** Initial value set by the parser */ TABLE_REF_NULL= 0, TABLE_REF_VIEW, TABLE_REF_BASE_TABLE, TABLE_REF_I_S_TABLE, TABLE_REF_TMP_TABLE }; TABLE_LIST class has a member of this enum. At prepared statement prepare, this member is assigned a value as of the current state of the database. Before (re-)execution of a prepared statement, we check that the value recorded at prepare matches the type of the object we obtained from the table definition cache (in check_and_update_table_ref_version()). Version id is a numeric value assigned according to the following rules: * for base tables, it is table_map_id. It is assigned from a global counter incremented for each new table loaded into the table definition cache (TDC). * for temporary tables it's table_map_id again. But for temporary tables table_map_id is assigned from thd->query_id. The latter is assigned from a thread local counter incremented for every new SQL statement. Since temporary tables are thread-local, each temporary table gets a unique id. * for everything else (views, information schema tables), the version id is zero. This choice of version id is a large compromise to have a working prepared statement validation in 5.1. In future version ids will be persistent, as described in WL#4180 "Persistent object versions". Let's try to explain why and how this limited solution allows to validate prepared statements. Firstly, sets (in mathematical sense) of version numbers never intersect for different metadata types. Therefore, version id of a temporary table is never compared with a version id of a view or a temporary table, and vice versa. Secondly, for base tables, we know that each DDL flushes the respective share from the TDC. This ensures that whenever a table is altered or dropped and recreated, it gets a new version id. The list of SQL statements that remove its subject table from the table definition cache includes all DDL statements (ALTER, RENAME, DROP, CREATE, TRUNCATE, FLUSH), OPTIMISE, ANALYZE and REPAIR. DML statements, such as INSERT, UPDATE, DELETE, SELECT, REPLACE, etc, do not expel the table definition from the cache. Unfortunately, since elements of the TDC are also flushed on LRU basis, this choice of version ids leads to false positives. E.g. when the TDC size is too small, we may have a SELECT * FROM INFORMATION_SCHEMA.TABLES flush all its elements, which in turn will lead to a validation error and a subsequent reprepare of all prepared statements. This is considered acceptable, since as long as prepared statements are automatically reprepared, spurious invalidation is only a performance hit. Besides, no better simple solution exists. For temporary tables, using thd->query_id ensures that if a temporary table was altered or recreated, a new version id is assigned. This suits validation needs very well and will perhaps never change. Metadata of information schema tables never changes. Thus we can safely assume 0 for a good enough version id. Views are a special and tricky case. A view is always inlined into the parse tree of a prepared statement at prepare. This happens regardless of whether the view uses a temporary table or a nested join to execute. Thus, when we execute a prepared statement, the parse tree will not get modified even if the view is replaced with another view. Therefore, we can safely choose 0 for version id of views and effectively never invalidate a prepared statement when a view definition is altered. Note, that this leads to wrong binary log in statement-based replication, since we log prepared statement execution in form of Query_log_events containing conventional statements. The second effect of this approach is that the prepared statement may return results that are no longer correct after a view DDL (BUG#36002). But since there is no metadata locking for views, the very same problem exists for conventional statements alone, and is reported in BUG#25144. The only difference between prepared and conventional execution is, effectively, that for prepared statements the temporal window in which a DDL may happen is much wider. In 6.0 we plan to support view metadata locking (WL#3726) and extend table definition cache to cache views (WL#4298). When this is done, views will be handled in the same fashion as the base tables. Finally, by taking into account metadata type, we can ensure that if a view is replaced with a base table, or a base table is replaced with a temporary table, or vice versa, new and old metadata versions differ. Validation of used stored routines and triggers ----------------------------------------------- A prepared statement may refer (use, invoke) stored functions, procedures or triggers. Validation of used stored routines and triggers is not implemented. A stored routine or trigger does not affect the parsed tree of a prepared statement except in only one way: all tables used in stored routines and triggers are merged into the pre-locking list of the prepared statement at prepare. Thus, if these tables change, the general validation mechanism described above will trigger a validation error. If the actual list of tables changes, in worst case we may prelock extra tables or not lock some used tables, and thus get a "table was not locked" error. Note, that there is a separate problem of validation of stored routine/trigger statements. Currently, when opening tables inside a substatement (a stored function or trigger statement), no metadata validation is performed. This is done to suppress errors when a substatement uses temporary tables. If a temporary table does not exist at start of the main statement, it's not prelocked and thus is not validated with other prelocked tables. Later on, when the temporary table is opened, metadata versions mismatch, expectedly. In order to suppress errors, part of validation functionality, namely, reporting an error in case of version mismatch, is captured behind an interface -- class Reprepare_observer. An instance of the class can be accessed by a THD member, m_reprepare_observer. Inside stored routines and triggers the observer is not installed, and thus a version mismatch never yields an error. The proper solution for the problem is to re-validate tables of substatements, WL#4179 "Stored programs: validation of stored program statements", but it's not implemented yet. Having provided the constraints of the implementation, let's formulate pseudocode for TABLE_LIST::is_table_ref_version_equal() and TABLE_LIST::get_table_ref_version(): enum enum_table_ref_type TABLE_SHARE::get_table_ref_type() { if (is_view) return TABLE_REF_VIEW; switch (tmp_table) { case NO_TMP_TABLE: return TABLE_REF_BASE_TABLE; case SYSTEM_TMP_TABLE: return TABLE_REF_I_S_TABLE; default: return TABLE_REF_TMP_TABLE; } } ulong TABLE_SHARE::get_table_ref_version() { return tmp_table == SYSTEM_TMP_TABLE || is_view ? 0 : table_map_id; } bool TABLE_LIST::is_table_ref_version_equal(TABLE_SHARE *s) { return (m_table_ref_type == s->get_table_ref_type() && m_table_ref_version == s->get_table_ref_version()); } void TABLE_LIST::set_table_ref_version(TABLE_SHARE *s) { m_table_ref_type= s->get_table_ref_type(); m_table_ref_version= s->get_table_ref_version(); } If validation of a prepared statement fails according to the rules described above, ER_NEED_REPREPARE error is reported and execution of the prepared statement is aborted. Later on the prepared statement framework may choose to re-prepare and re-execute such failed prepared statement, as described in WL#4166. Limitations of the versioning mechanism --------------------------------------- Use of table definition cache identifiers for metadata versions introduces a limitation: the table definition cache size must always exceed the number of tables used in a statement. Otherwise some of cache elements are flushed between prepare and execute, and thus for some tables the version is never the same. By default, the server can use no more than 64 tables per statement, while the minimal table definition cache size has been 64 and increased to 256 in this patch. However, with MERGE tables, the server can use up to tens of thousands of the table definition cache elements, while reading from just one merge parent. This case has no provision for in the patch, except one: increase of the size of the cache. This limitation will be addressed by WL#4180 "Persistent object versions". Other limitations are reported as bugs: 1) BUG#32868 Stored routines do not detect changes in meta-data. BUG#33082 Stored Procedure: crash if table replaced with a view in a loop BUG#33083 Stored Function: error if a temporary table is dropped in a loop BUG#33000 Triggers do not detect changes in meta-data. BUG#27011 Stored Procedures: bad data if view used inside a SP is dropped or modified BUG#33289 Stored Procedure: bad data if view is replaced within a iteration These limitations will be addressed by WL#4179 Stored programs: validation of stored program statements 2) BUG#36002 Prepared statements: if a view used in a statement is replaced, bad data. The solution for this bug is to implement table definition cache support for views, WL#4298 "Metadata cache for all objects".
        Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.