WL#14084: Lock improvements for ACL DDLs
At present ACL DDLs and FLUSH operation use very strong locks for ACL tables. These locks make sure that no other connection can interfere with DDL/Flush operation. However, depending on isolation level and autocommit status, these operations may be blocked by other connections holding lower but incompatible locks on these tables.
The later is more problematic because it allows a user with only SELECT privilege on mysql database OR ACL tables to completely block all user management DDLs. Such a SELECT privilege is necessary for taking backup(logical or otherwise).
This worklog will modify current locking mechanism to improve user management DDLs and FLUSH operation in a situation where other connections are holding locks on ACL tables.
- ACL tables: Tables that store authentication and authorization information of various authorization IDs. List of tables:
- ACL DDLs: DDL operations to add/update/remove authentication and/or authorization information for various authorization IDs. List of operations:
- CREATE USER
- ALTER USER
- RENAME USER
- DROP USER
- CREATE ROLE
- DROP ROLE
- SET PASSWORD
- SET DEFAULT ROLE
- ACL reload operations: Actions/Operations to read data from ACL tables in order to update in-memory structure. List of operations:
- FLUSH PRIVILEGES
- Blocking read operation: An operation which obtains explicit read lock on a table in order to read from it. Example: FLUSH TABLES WITH READ LOCK.
- Non-blocking read operation: A read operation on a table without explicitly locking the table. E.g. SELECT ... SHARE and INSERT INTO ... SELECT
- Write operation: An operation that modifies data and or metadata of a table. E.g. INSERT/UPDATE/DELETE operations on table data. CREATE/ALTER/DROP operations on table.
FR1: ACL DDLs should block concurrent ACL reload operations and vice versa.
FR2: An ACL DDL should block another ACL DDL running concurrently.
FR3: A non-blocking read operation on ACL tables should not block a concurrent ACL DDL.
FR4: A non-blocking read operation on ACL tables should not block concurrent ACL reload operations.
FR5: A blocking read operation at global level or involving ACL tables should block concurrent ACL DDLs and vice versa.
FR6: A block read operation at global level or involving ACL tables should not block concurrent ACL reload operations and vice versa.
FR7: A write operation involving ACL tables should block concurrent ACL DDLs and vice versa.
FR8: A write operation involving ACL tables should block concurrent ACL reload operations and vice versa.
NFR1: There should not be any impact on replication.
Definition: Please refer to Functional and Non-functional Requirement section.
HLS 1: ACL Operations that do not modify data within ACL tables should obtain MDL_SHARED_READ_ONLY lock on ACL tables. Complete list of operations is in HLS 1.2.
HLS 1.1: We continue to rely on open_and_lock_tables() for such operations as it will internally acquire other locks if needed.
HLS 1.2: Following operation falls into this category.
- Initialization of ACL caches during bootstrap
- ACL reload operations
- ACL table integrity checks related to storage engine and table definition.
HLS 2:ACL operations that modify data within ACL tables (See complete list in HLS 2.2) should obtain following locks:
- MDL_INTENTION_EXCLUSIVE on MDL_key::GLOBAL and MDL_key::BACKUP_LOCK: To protect against blocking read operations.
- For each ACL table (i.e. on MDL_key::TABLE level):
- MDL_SHARED_READ_ONLY: To protect against concurrent ACL DDL because MDL_SHARED_WRITE conflicts with MDL_SHARED_READ_ONLY. This MDL also protects against concurrent operation modifying ACL tables.
- MDL_SHARED_WRITE: To protect against concurrent ACL reload operation because MDL_SHARED_READ_ONLY conflicts with MDL_SHARED_WRITE. Any operation that modifies table require MDL_SHARED_WRITE lock - Unless Exclusive MDL is already obtained for the table. MDL_SHARED_READ_ONLY/MDL_SHARED_NO_WRITE alone are not sufficient for such modifications. Thus, MDL_SHARED_WRITE also protects against such operations.
HLS 2.1: While open_and_lock_tables() take care of MDL_INTENTION_EXCLUSIVE locks, there is no way for caller to specify multiple MDLs at MDL_key::TABLE level. Thus, we will follow following approach
- All required MDL locks are taken by calling MDL_context::acquire_locks(). This is an atomic operation and guarantees cleanup in case of error.
- ACL tables are then opened with appropriate flags to signal that required MDLs have been acquired.
- As a part of ACL DDL executions, transient changes made in ACL tables and in-memory caches may need to be rolled back because of certain error conditions. We rely on storage engine to rollback changes in ACL tables. In-memory changes are rolled back by essentially throwing current copy of in-memory caches and populating them again from ACL tables. Reload caches require closing and reopening ACL tables so that rollback takes effect. MDLs obtained at the beginning of ACL DDL operations will be retained to block other connections from locking these tables.
- All MDL locks are released at the end of the ACL DDL.
HLS 2.2: Following operations fall into this category:
- CREATE/ALTER/RENAME/DROP USER
- CREATE/DROP ROLE
- SET PASSWORD
- SET DEFAULT ROLE
HLS 2.3: Due to change in MDLs taken for ACL DDLs operation, it is now possible that two or more competing ACL DDLs may try to acquire same set of MDLs at the same time. Such threads will be able to acquire MDLs of type MDL_INTENTION_EXCLUSIVE and MDL_SHARED_READ_ONLY. However, there will be a conflict when they try to acquire MDL_SHARED_WRITE because it conflicts with other thread's MDL_SHARED_READ_ONLY. In this situation MDL's deadlock detection algorithm will kick in and identify a thread that can proceed. Other threads will be marked as VICTIMs and will get ER_DEADLOCK_ERROR. This is a new behavior because prior to this worklog, ACL DDLs used to acquire only MDL_SHARED_NO_READ_WRITE lock and in case of multiple threads trying to acquire same lock, all other threads would simply wait for their turn. To resolve this issue, following will be done:
- A custom error handler will be defined to suppress ER_DEADLOCK_ERROR and applied to the thread before acquiring MDLs. Once MDLs are acquired, error handler will be removed from the thread.
- If MDL_context::acquire_locks() fail due to ER_DEADLOCK_ERROR, the error will be suppressed and thread will try again to take required MDLs.
This will effectively simulate pre-worklog behavior for ACL DDLs.
- After this WL is implemented ACL DDL will allow concurrent read operations which still can take InnoDB-level row locks (e.g. SELECT ... FOR SHARE or INSERT ... SELECT * FROM acl_table in some of binlogging modes). This means that situations in which InnoDB deadlocks between such operations and ACL DDL is possible in theory. ACL DDL code is probably not ready to correctly handle such deadlocks yet. This issue will be addressed by follow-up WL#14087 which will ensure that we don't acquire InnoDB row-locks in these cases