WL#7158: Move privilege system tables from MyISAM to transactional storage
Affects: Server-8.0 — Status: Complete
Up until MySQL 5.6 privilege system tables are stored in MyISAM. With the new data dictionary stored in transactional tables, we need to move these tables. The intention is to do a minimal effort for this move, so refactoring can happen afterwards.
Func-Req 1 Modify InnoDB to accept the privilege tables. The privilege tables are columns_priv, db, procs_priv, proxies_priv, tables_priv, user. Func-Req 2 Modify install/bootstrap to create the privilege tables in InnoDB, not MyISAM. Func-Req 3 Modify mysql_upgrade to move existing privilege tables in MyISAM to InnoDB. Func-Req 4 Tweak the tests where needed, preserving intended behavior wherever possible. Func-Req 5 Check that the privilege tables are stored in InnoDB while loading ACL data. Warn if some privilege table is not in InnoDB. We still should be able to start in normal mode (without requiring --skip-grant-tables) in this case for upgrade process to work. Func-Req 6 Deny ACL statements if storage engine of privilege tables is not InnoDB. Func-Req 7 Allow CREATE TABLE to create privilege tables in MyISAM. This is necessary to be able to upgrade installations via mysqldump. Allow ALTER TABLE to change the storage engine of privilege tables to MyISAM to enable downgrade. Func-Req 8 The ACL statement execution must be resistant to parallel SQL transactions on the privilege tables. Func-Req 9 The multi-user-account ACL statements must preserve the current well-adopted (but undocumented) behavior, which is skip the failing user account and proceed with the following user accounts. Func-Req 10 The ACL statements must be resistant to potential InnoDB errors which might happen due to some hardware/environment problem. An error might occur in any InnoDB handler-call. Func-Req 11 The ACL statements must use SERIALIZABLE isolation level. Func-Req 12 The ACL statements are DDL statements (implicit commit before and after statement execution).
Transactional and non-transactional SE ====================================== This WL is about moving MyISAM system tables into InnoDB. This WL however tends to use "non-transactional" instead of MyISAM and "transactional" instead of InnoDB. The reasons for that are the following: - not all system tables are MyISAM. There are (or might be) CSV tables for instance. This WL is about only privileges tables. This is a step forward to move all Data Dictionary tables to InnoDB; - at the moment, we are talking about the Data Dictionary in InnoDB only, but in the future, we might want to the DD tables in other SEs. Thus, it's better to talk in general terms from the start and highlight the current dependencies with / requirements from InnoDB. Essential changes ================= In the scope of this WL the following system tables will be migrated to the InnoDB storage engine: - columns_priv - db - procs_priv - proxies_priv - tables_priv - user These tables will be further referenced as "privilege tables" or "ACL tables". The privilege tables are accessed read-write. ACL statements ============== There are the following SQL statements to operate on the ACL tables: - CREATE USER - DROP USER - ALTER USER - RENAME USER - GRANT - REVOKE - REVOKE ALL - SET PASSWORD - FLUSH PRIVILEGES Even though that is not mentioned in the Manual, these statements are DDL statements, which means they force implicit commit before and after statement execution. This was not so important when the ACL tables were MyISAM (no transaction support anyway), but once ACL tables are stored in InnoDB, that must be noted in the documentation. Especially, that is relevant for the 'SET PASSWORD' statement, which looks similar to the regular SET statement for changing system variables. 'SET PASSWORD' however has nothing to do with that regular 'SET' statement and is a DDL statement. Differences between MyISAM and InnoDB ===================================== There are the following main differences between MyISAM and InnoDB which are relevant for this WL: - MyISAM locks the whole table for write (MyISAM has table-level locks). InnoDB locks specific rows for write, allowing concurrent access to other rows (InnoDB uses row-level locks). There are the following practical outcomes : 1. Error handling The ACL code should get ready to handle the following errors from the storage engine: - lock wait timeout -- another session might lock the row being updated by the ACL statement using regular SQL statement (SELECT ... FOR UPDATE); - duplicate key -- another session might insert a row being added by the ACL statement using regular SQL statement (INSERT INTO ...) - deadlock See more in the "Handling errors" section below. 2. Visibility problem Once a MyISAM table is locked, concurrent access is blocked, so the reader will always read the latest data. On the contrary, InnoDB employs MVCC, which means concurrent transactions might read different versions of the same row. In other words, FLUSH PRIVILEGES might read an outdated data. This problem is solved as follows: - We emulate MyISAM table-level locks by using exclusive MDL locks on privilege tables. That ensures that once privilege tables are opened and locked, there are no other concurrent reader/writers. See "Preventing lock-wait-timeout and deadlock (table-level locks)" below for more information. - The ACL statements use the SERIALIZABLE isolation level. See "InnoDB isolation level" below for more information. 3. Interrelation between InnoDB locks and server (ACL) locks. The problem of the existing code design is that it does I/O operations (InnoDB calls) while holding the server locks (mutex / rw-lock). Under some circumstances, that might lead to a deadlock between the server and InnoDB layers. This problem is also solved by emulating MyISAM table-level locks. See "Preventing lock-wait-timeout and deadlock (table-level locks)" below for more information. - Commit can not fail for MyISAM, but can fail for InnoDB. This is important because the ACL code employs the following pattern: 1. Acquire LOCK_grant; 2. Write data to the SE tables; 3. Update in-memory ACL-cache; 4. Release LOCK_grant; 5. Commit; 6. If commit on the previous step failed, do acl_reload(). So, the commit on step (5) happens outside of the critical section (without holding the LOCK_grant lock). If the commit fails, that means, the rows in SE tables were not updated, but the in-memory ACL cache is updated and unlocked. That in turn means, there is a gap between (4) and (6) when the ACL cache and the data in SE tables are not consistent: the ACL in-memory cache is newer than the persistent data. If the new information is retrieved from the cache within that gap, it might break replication: - let's say a new user was added (or a new privilege was granted). That user is in the ACL in-memory cache, but not in the persistent InnoDB tables. - a query comes under the authorization of that user; - the query is processed and written to the binlog; - acl_reload() happens, the new user disappears; - the slave will be unable to execute the query as the slave doesn't know anything about that new user. In order to fix that problem, the steps (5) and (6) (commit and acl_reload()) could be done while holding the LOCK_grant, but that's practically impossible as it requires much refactoring of the code because it's generally assumed that no commit / open_tables operation is done while holding the LOCK_grant. In other words, moving the commit/acl_reload() within the critical section changes the order in which locks are acquired. Thus, it's a risky change. NOTE: we need to do the full ACL cache reload (call acl_reload()) because the cache is not transactional, i.e. we can not rollback the changes done to it if we failed to commit them on disk. The only way to make it consistent with the data on disk is to throw the cache away and reload it from SE. Handling errors =============== Overview -------- The thing is that most ACL statements accept multiple user accounts (components) within one statement. For example: CREATE USER user1@host1, user2@host2, user3@host3; DROP USER user1@host1, user2@host2, user3@host3; The list of such statements: - CREATE USER - DROP USER - ALTER USER - RENAME USER - GRANT - REVOKE The question is what to do when we failed to process some component in the middle of the statement. For instance, what should be done if we're handling the DROP USER statement above and user2@host2 does not exist? The current behavior (which is undocumented, but nevertheless widely adopted) is to ignore the failed component and continue the statement execution. In the end, the statement will report a list of failed components. Example: > CREATE USER u1@h1, root@localhost, u2@h2, firstname.lastname@example.org; ERROR 1396 (HY000): Operation CREATE USER failed for 'root'@'localhost','root'@'127.0.0.1' > DROP USER u1@h1, u2@h2, u3@h3, u4@h4, u5@h5; ERROR 1396 (HY000): Operation DROP USER failed for 'u3'@'h3','u4'@'h4','u5'@'h5' The decision is to preserve the current behavior as much as possible. This must be reflected in the Manual. NOTE: - SET PASSWORD takes only one argument. - The GRANT behavior is controlled by the NO_AUTO_CREATE_USER, which prevents the GRANT statement from automatically creating new users if it would otherwise do so. If NO_AUTO_CREATE_USER is set, the GRANT behavior is similar to the behavior of other ACL statements: skip the failed user account, continue the statement execution. If NO_AUTO_CREATE_USER is unset, the GRANT can still fail due to invalid or non-existing privilege specification (e.g. specified column doesn't exist). In that case the whole GRANT statement is failed and no change is applied. Logic & Physical errors ----------------------- The errors in general can be divided into 2 large groups: * Logic errors A logic error is an error made by DBA, where she specified wrong or inappropriate input for the ACL statement. For example, a non-existing user for the DROP USER statement, or existing user for the CREATE USER statement, etc. Logic errors must be expected and can be frequent, which means it's inappropriate to do the ACL cache reload when we encounter a logic error. The current behavior must be preserved for logic errors, which is: - ignore the error; - continue the statement processing; - report errors at the end of of statement execution; - avoid acl_reload(). In order to preserve the current behavior in case of logic errors, the following actions are done: - failed ACL statement is written into the binary log along with the error code; - the ACL transaction is committed to persist valid changes. * Physical errors A so called physical error happens in InnoDB and is raised to the server layer. Those errors are results of either a user activity or internal InnoDB error. When a physical error has occurred, we rollback the whole statement and do acl_reload() to reload the ACL cache. More precisely, there are two major sources of the physical errors in general: 1. Concurrent access using SQL statements. A user/DBA can concurrently change or lock ACL tables using regular SQL statements like INSERT, UDPATE, DELETE, SELECT. Changing of ACL tables by modifying the raw data in the ACL tables can be seen as another interface to the ACL system. The ACL tables are documented: http://dev.mysql.com/doc/refman/5.6/en/grant-table-structure.html The user activity can result in the following errors: - Lock wait timeout A user locked a specific row in one session by 'SELECT ... FOR UPDATE' and is executing ACL-statement in another session. See more in the sub-section more. - Deadlock An ACL statement locks ACL records / tables in a particular order. It is theoretically possible to have a user transaction (with regular SQL statements) which locks the same records / tables in the opposite order, resulting in a deadlock. See more in the sub-section more. - Duplicate key error The 'duplicate key' error could also happen theoretically. Practically, however, it's not possible to get this error. See more in the sub-section below. NOTE: The lock wait timeout and deadlock errors might result in rollback of the entire InnoDB transaction: - there is an system variable (innodb_rollback_on_timeout) that specifies if the entire transaction should be rolled back on timeout; - in case of deadlock, InnoDB chooses the transaction to rollback according to its internal rules, which are not controllable to the server. In short, these errors (lock-wait-timeout, deadlock & duplicate-key error) will be handled by preventing them from happening. 2. Hardware & environmental failures. Internal InnoDB error or hardware failure, or errors like "out of space". We don't distinguish between errors in this category. This type of errors is expected to be rare. We do acl_reload() as the last resort when we encounter these errors. The user will get a new error code/message saying that the entire statement failed due to InnoDB error. Deadlocks --------- Even though ACL statement is a DDL statement (it commits before and after execution), one ACL statement can access several ACL records / tables. It locks them in some order. It is theoretically possible to have a user transaction (with regular SQL statements) which locks the same tables in the opposite order, resulting in a deadlock. It's a matter of race condition to expose such a deadlock. The thing is that ACL statements are executed under the ACL lock acquired. Recent refactorings changed the ACL logic so that every SQL statement processing starts with acquiring the ACL lock. Thus, the following sequence of operations might happen in order to get a deadlock: 1. Let's consider some ACL statement, which accesses two ACL tables (t1 and t2) in this particular order. 2. connection A: lock t2: SELECT * FROM t2 FOR UPDATE; 3. connection A: start an SQL statement, which will lock t1: SELECT * FROM t1 FOR UPDATE; The execution of this statement must go through the privilege check (the ACL lock is acquired, privileges are checked, the ACL lock is released) and immediately pause after that. 4. connection B: start executing the ACL statement. It will acquire the ACL lock, lock t1 and hang on attempt to lock t2 (t2 is already locked by connection A). 5. connection A: continue the SQL statement so that it will try to lock t1 (now locked by connection B). 6. This is a deadlock. Depending on weights, transaction A or B will be rolled back. Preventing lock-wait-timeout and deadlock (table-level locks) ------------------------------------------------------------- The current design of the ACL code doesn't allow to cope with the lock-wait-timeouts and deadlocks. There are two main design deficiencies: 1. I/O operations (InnoDB/SE operations) are performed under acquired LOCK_grant RW-lock. That means that once InnoDB starts any kind of wait, the whole server is put to a hold -- any other connection request or SQL statement execution need to acquire LOCK_grant in order to authorize the request. 2. I/O operations (InnoDB/SE operations) are interfered with the ACL cache updates. That means that when some InnoDB operation failed, the ACL cache might have already been updated. An example of scenarios which make the server practically unresponsive is the following: - connection A: lock a row in mysql.user INSERT INTO mysql.user(user, host, ...) VALUES ('u2', 'h', ...); - connection B: issue an ACL statement that need the locked row CREATE USER u1@h, u2@h; CREATE USER successfully adds an InnoDB record for u1@h, inserts the corresponding entry to the ACL cache and then hangs on inserting an InnoDB record for u2@h. The wait is done with the LOCK_grant acquired. So, that it's impossible for any other connection to issue any SQL statement which requires authorization. Later, when the InnoDB operation fails with the lock-wait-timeout, there is no way to restore the ACL cache -- it's impossible to do the acl_reload() as it would also hang while reading from mysql.user (also with the acquired LOCK_grant). The only way to migrate to InnoDB without massive re-engineering to fix these design deficiencies is to simulate the MyISAM behavior with regard to the locking principles. That is to use table-level locks on the ACL tables. In fact, we don't need row-level locking scheme for the ACL tables anyway, so this is not a bad trade-off. The simplest way to switch to the table-level locking scheme is to use the MDL layer. The thing is that the ACL tables must be locked with the MDL_SHARED_NO_READ_WRITE lock. That will guarantee that once the ACL tables are opened and locked, the ACL statement has an exclusive access to them and no other SQL statement could block it. Since we rely on the MDL layer, it is hypothetically possible to get a conflict (lock-wait-timeout) if another connection used NDB API / other no-SQL API to access the ACL tables. This WL will not address that issue. We will add DBUG_ASSERT() after all handler calls, where it's hypothetically possible to get lock-wait-timeout or a deadlock error, making sure the calls never return these errors. Duplicate key error ------------------- Looking at the ACL code, it looks like it's possible to get the duplicate key error by executing a scenario like the following: - Issue ACL statement which inserts a new record(s) in one or more tables. For example: CREATE USER u1@h1; - The ACL statement is executed as follows: - The server tries to find the matching record by issuing ha_index_read_idx_map(); - If the call returned the key-no-found error, the server believes the record does not exist and proceeds with inserting a new record; - Thus, there is a gap between checking that the record doesn't exist and inserting a new record. If another connection was able to insert the matching record, the ha_write_row() in the first connection would return the duplicate-key error. By closer investigation, it turned out, that it's not possible for another connection to come in the middle between ha_index_read_idx_map() and ha_write_row(). There are two reasons for that: 1. SERIALIZABLE isolation level. The ACL statement is executed under the SERIALIZABLE isolation level (see the next section), which means the first read starts a new transaction and other sessions/transactions will wait till the ACL transaction finishes (or return lock-wait-timeout). The logic that ensures the SERIALIZABLE isolation level is a bit intricate. Two InnoDB functions controls that: - ha_innobase::external_lock() - ha_innobase::store_lock() 2. Table-level locks at the MDL level. In order to protect from the future InnoDB changes which might lead to the change of the isolation mode being used for ACL statements, the following actions will be taken in scope of this WL: - we will add a test using a debug sync point. That test will simulate the situation described above (another connection inserts a record in the middle of ha_index_read_idx_map() and ha_write_row()). - we will add DBUG_ASSERT() after all handler calls, which write/update records, making sure the calls never return the duplicate-key-error. Handling "unknown error" ------------------------ Currently the ACL code sometimes assumes that if the handler call failed, it's because only one reason (the record not found, for instance). Since the caller's logic ensures that the handler call must not happen, the code reports the "unknown error" to the user. This is not appropriate any longer as InnoDB call might fail for the number of reasons. The actual (relevant) error must be reported to the user. InnoDB isolation level ====================== It's expected that all ACL statements are executed under the SERIALIZABLE isolation level (irrespective of the current isolation level set by the user). The SERIALIZABLE level is required as otherwise it might lead to wrong order in binlog (especially when ACL statements are mixed with the DML statements on the ACL tables). Also, the SERIALIZABLE level is expected for all DDL statements in general. There is no change required to ensure the SERIALIZABLE level for the ACL statements as it's already done in InnoDB: the lock type is determined in the following 2 InnoDB operations: - ha_innobase::external_lock() - ha_innobase::store_lock() They take LOCK_NONE only for pre-defined set of SQL-commands, that doesn't include ACL statements. See also "Preventing lock-wait-timeout and deadlock (table-level locks)" as the table-level locks described there also helps with getting the serializable semantics. FLUSH PRIVILEGES ---------------- Due to the transactional nature of InnoDB, the following scenario could be hypothetically possible: 1. connection A: received ACL statement (e.g. DROP USER); 2. connection A: start executing the ACL statement. The ACL cache and data on disk were updated, but the transaction hasn't been committed yet. 3. connection B: received FLUSH PRIVILEGES 4. connection B: start executing FLUSH PRIVILEGES, read the stale data from InnoDB; 5. connection A: committed the transaction; 6. connection B: updated the ACL cache using stale data. This scenario is not practically possible because of a) table-level locks emulated by the MDL-locks b) the SERIALIZABLE isolation level. FLUSH PRIVILEGES will wait for connection A to complete its transaction before proceed with the read. A test case should be added to the test coverage. FLUSH PRIVILEGES (non)atomicity problem --------------------------------------- The execution of the FLUSH PRIVILEGES statement is not atomic. This is an old known problem, which has nothing to do with this WL. Replication aspect (SBR) ======================== Once ACL tables are migrated to InnoDB, the ACL statements will be written to the binlog as transactional statements (technically, they will be written to the transactional cache, which will be put into the binary log on commit). Verifying error handling ======================== As it's been already mentioned, the InnoDB handler calls can return errors in cases which were impossible for MyISAM (like lock-wait-timeout, deadlock and so on). The existing code is not ready for handling such errors (which were simply not possible with MyISAM). Thus, the question is how to identify all the places where such errors can occur and make sure they are handled properly. It's proposed to use two ways to resolve this: 1. Code review 2. Adding DBUG_EXECUTE_IF() macros after the handler calls and make them return "new" error. Then, MTR test can make sure, all places are covered and any InnoDB failure will not result in an unhandled error. Changes to InnoDB ================= In order to allow the listed system tables to be created with InnoDB storage engine the function pointer handler::is_supported_system_table will be initialized by the function innobase_is_supported_system_table() that returns true in case when it is called for one of the system tables listed above. FLUSH PRIVILEGES and InnoDB table export ---------------------------------------- InnoDB has a special handling of the FLUSH TABLES WITH READ LOCK: InnoDB does a so-called "table export". The problem is that all FLUSH statements use the same SQL-command identifier. It must be ensured that the FLUSH PRIVILEGES statement does not lead to table export. Changes to SQL-scripts ====================== The sql-scripts that are used to deploy new database will be modified in order to set the InnoDB as a default storage for the system tables listed above. Similar changes will be done in sql-scripts used to make server upgrade. Changes in behavior on fetching results from ACL tables ======================================================= MyISAM fetches rows from a table in the order that they were inserted. InnoDB fetches rows in the order of the primary key. Thus, there will be behavioral change when the rows are fetched in SE specific order. In particular, this is relevant to the mysql.user table, which is fetched by the server code with no specific order: the ACL in-memory cache is populated from mysql.user in the order of some internal "weights" calculated for every row/account. However, when the weights are equal, the cache entries are inserted in the row (SE-specific) order. That leads to the following change in behaviour: - Let's assume there were the following user accounts created: - u1@localhost - email@example.com - u1@::1 - They are all essentially the same, but they are different in the MySQL terms. The internal weights for them are equal, so they are the ACL entries for them are inserted in the row order, which is: - MyISAM (the insertion order): - u1@localhost - firstname.lastname@example.org - u1@::1 - InnoDB (the primary key order): - email@example.com - u1@localhost - u1@::1 - Thus, an authentication request for the user 'u1' from the localhost used to be resolved as 'u1@localhost' (1st entry matched), but with WL#7158 it will be 'firstname.lastname@example.org'. It's a good way to shoot yourself in a foot to have different set of privileges for 'u1@localhost' and 'email@example.com', but still, technically from the MySQL view point, these are completely different accounts. It was discussed with the Server General team whether we should preserve the original behaviour, and the decision was that the result is not worth the hassle. Changes in ordering of fetched records led to many mtr test failures. There were two alternatives to fix arisen issues: * The first one was to fix every of failed tests to take into account changes in record ordering. * The second one was to remove records for root@::1 and firstname.lastname@example.org from the script mtr_system_tables_data.sql that is used to create mandatory accounts before running tests. Doing so we could retained original behavior for tests. Of course such modification required changes for those tests that relied on presence of accounts email@example.com and root@::1 but there were not so many such tests. Actually we only had to worry about those test cases that call explicit connect() on behalf of either root@::1 or firstname.lastname@example.org and where the option skip-name-resolve is set, because only in this case an ip part of a user account name isn’t converted to localhost. There were’t so many tests where this condition were satisfied. So in result it was chosen the second approach as least expensive. Also it makes sense to add that the the following approaches were considered in order to preserve the original behaviour: - Adding an additional AUTO_INCREMENT column that can be used to reorder result set based on insertion order; - Adding an additional TIMESTAMP column. In this case it could lead to possible issues with presence of several rows that have the same TIMESTAMP value (since rows were inserted at almost the same time). Neither of these approach were selected as be worthy of notice. Prerequisites ============= WL#7158 depends on WL#7976 (removal of skip-innodb option) that has already been pushed. MyISAM and InnoDB interoperability ================================== In place upgrade ---------------- Current procedure for in place upgrade assumes that newer server should be able to start on old privilege tables in regular mode (without any special start-up parameter). After that mysql_upgrade is invoked to upgrade privilege table definitions. To support this procedure server will be able to load privilege information from ACL tables in MyISAM at start-up/FLUSH PRIVILEGES time. However we will report a warning to error log about privilege tables needing upgrade. Also we will prohibit any user-manipulation statements if privilege tables are not using InnoDB, thus urging users to upgrade ACL tables definitions. Upgrade via mysqldump --------------------- Current procedure for upgrade using mysqldump involves restoring dump from old server to a newer server started in regular mode and then running mysql_upgrade to update privilege tables definition if necessary. Since dump from old server will include definitions of ACL tables in MyISAM storage engine in the form of CREATE TABLE statements, this means that we need to allow creation of privilege tables in MyISAM on newer server. However we don't need to allow user-manipulation statements until tables are upgraded to InnoDB, basic authentication/authorization functionality should be sufficient. Support for server downgrade ---------------------------- At the moment we have mysql_system_tables_fix_for_downgrade.sql script which is supposed to be executed against newer server to prepare instance for downgrade. Particularly this script needs to change privilege tables definitions to MyISAM. This means that we should allow ALTERing ACL tables to MyISAM on newer server in regular mode. However again we are not supposed to be able to run any ACL statements after that.
Copyright (c) 2000, 2022, Oracle Corporation and/or its affiliates. All rights reserved.