WL#6671: Improve scalability by not using thr_lock.c locks for InnoDB tables
Affects: Server-Prototype Only — Status: Complete — Priority: Medium
Improve scalability by not using thr_lock.c locks for InnoDB tables (this task is mostly about avoiding calls to thr_lock.c code/locking THR_LOCK::mutex with minimal changes to SE API and locking code on SQL-layer). User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html (mentions new locking behavior re: LOCK TABLES ... READ)
Non-functional requirements: NF1) This patch should show performance/scalability improvement in 1-table Sysbench OLTP_RO/POINT_SELECT tests for InnoDB on multi-core machines. NF2) There should be no significant regressions in other benchmarks for InnoDB tables.
User visible changes caused by this task ======================================== THR_LOCK will be no longer used for InnoDB tables ------------------------------------------------- Besides the performance improvements aspect this means that user will never see waits for THR_LOCK locks for InnoDB tables in P_S schema or I_S.PROCESSLIST output. Instead in cases when user previously has seen wait on THR_LOCK lock for InnoDB table it will see wait on MDL. Particularly this means that for DML like INSERT INTO t1 VALUES(1) which is blocked due to LOCK TABLES t1 READ active in another connection we will see in I_S.PROCESSLIST.STATE column "Waiting for table metadata lock" state instead of "Waiting for table level lock". There won't be THR_LOCK acquisitions/waits registered in this case in P_S.TABLE_HANDLES table and for "wait/lock/table/sql/handler" event (calls to handler::external_lock() will still be registerd). Instead waits for/acquisitions of metadata lock will be visible in P_S.METADATA_LOCKS table and for "wait/lock/metadata/sql/mdl" event. LOCK TABLES READ and transactions --------------------------------- LOCK TABLES READ will block and will be blocked by concurrent transactions changing the table, for all storage engines, similarly to how LOCK TABLES WRITE work now. Particularly this means that: - Explicitly or implicitly started transactions updating non-transactional (e.g. MyISAM) table will start to block and get blocked by LOCK TABLES READ on this table (unlike before). - LOCK TABLES READ on InnoDB table in @@autocommit=1 mode or with @@innodb_table_locks=0 will start block and blocked by concurrent transactions changing this table. This means that @@innodb_table_locks will only have in a few scenarios involving cascading foreign keys and updates through no-SQL API. - Storage engines which prefer to ignore LOCK TABLES READ by downgrading THR_LOCK lock acquired for it won't be able to do so. Tables which are implictly locked by LOCK TABLES ------------------------------------------------ Tables which are implicitly used by LOCK TABLES (e.g. through view or trigger) will be locked using metadata locks in addition to (all SEs except InnoDB) or instead of (InnoDB) THR_LOCK locks. These means that changes described in the above two points will also apply to such tables. Multi-update and concurrent LOCK TABLES READ -------------------------------------------- Multi-update will be blocking/start to be blocked by concurrent LOCK TABLES READ on any table from its join, even though such table will be only used for reading and won't be updated. Such blocking will be visible as a wait on metadata lock. HANDLER READ and concurrent LOCK TABLES WRITE --------------------------------------------- HANDLER READ statement (for table using any SE) will wait for concurrent LOCK TABLES WRITE to go away on metadata lock and not THR_LOCK lock. Vice versa LOCK TABLES WRITE will wait for HANDLER READ to complete on metadata lock instead of THR_LOCK lock. This will be visible as a different STATE for waiter in I_S.PROCESSLIST and in statistics gathered by P_S. ALTER TABLE IMPORT/DISCARD TABLESPACE under LOCK TABLES ------------------------------------------------------- ALTER TABLE DISCARD/IMPORT under LOCK TABLES will no longer allow concurrent access to table metadata. This means that concurrent SHOW CREATE TABLE on the table will be blocked during such statement. Also information about table won't be accessible in I_S during this time. Bugs fixed or alleviated ------------------------ Also this WL is likely to completely fix or at least alleviate the issues described in the following bug reports: Bug #11751331 "CONCURRENT DML AND LOCK TABLE ... READ FOR INNODB TABLE CAUSE WARNINGS IN" Bug #11764618 "DEADLOCK WHEN DDL UNDER LOCK TABLES WRITE, READ + PREPARE" Bug #18913551 "LOCK TABLES USES INCORRECT LOCK FOR IMPLICITLY USED TABLES"
Background: why it is possible to get rid of thr_lock.c locks for InnoDB ======================================================================== Currently InnoDB: *) Downgrades all types of write locks to TL_WRITE_ALLOW_WRITE lock unless: -) This is ALTER TABLE IMPORT/DISCARD TABLESPACE -) This is TRUNCATE TABLE -) This is OPTIMIZE TABLE -) This is CREATE TABLE -) This is LOCK TABLES statement *) Strong, TL_READ_NO_INSERT, read locks are downgraded to TL_READ locks unless this is LOCK TABLES. In case of LOCK TABLES TL_READ_NO_INSERT locks are not downgraded and TL_READ locks are upgraded to TL_READ_NO_INSERT locks. This means that in most cases thr_lock.c locks for InnoDB tables are compatible with each other and don't play any significant role. Let us take a more detailed look at the above list of exceptions: 1) ALTER TABLE IMPORT/DISCARD TABLESPACE. Since these operations acquire exclusive (**) MDL lock on the table being discarded or imported all concurrent DML and DDL would be blocked for them anyway, so thr_lock.c lock is not really required in this case. (**) The fact that under LOCK TABLES we acquire only SNRW lock and never upgrade it to X lock can be treated as a bug according to Marko/Sunny and should be fixed (and if it is not a bug - the situation boils down to the same as for LOCK TABLES WRITE). 2) TRUNCATE TABLE We always acquire exclusive MDL lock on the table being truncated, so all concurrent DDL and DML will be blocked during this operation, without any participation of thr_lock.c locks. 3) OPTIMIZE TABLE Have to be split in two parts: a) Normal OPTIMIZE TABLE for InnoDB is implemented as ALTER TABLE FORCE thus it doesn't acquire write lock and relies on MDL/row-locking to isolate from concurrent DML and DDL. b) OPTIMIZE for FTS indexes is implemented in ha_innobase::optimize(), which is called under protection of MDL SNRW lock, which means that it is isolated from all DDL and DML from which thr_lock.c can isolate except open HANDLER cursors. But this situation is similar to one for LOCK TABLES WRITE and will be discussed later. 4) CREATE TABLE There are two cases when we acquire write locks in CREATE TABLE: a) Write lock is acquired on target table in CREATE TABLE SELECT. But at the point when it is acquired we will have exclusive MDL lock on it, so all concurrent DDL/DML on this table is going to be blocked anyway and strong thr_lock.c lock is not required. b) Write locks are acquired on source tables in CREATE TABLE SELECT if FOR UPDATE clause is used. AFAIU the fact that we acquire strong write lock TL_WRITE in this case is artefact from a) and is not really required (LOCK_X row-locks should be sufficient). 5) LOCK TABLES Again we have several cases for this statement: a) Tables explicitly locked for write (i.e. explicitly present in statement table list with with WRITE clause). We acquire strong SNRW metadata lock on such tables which blocks concurrent DDL and DML access to this table leaving only I_S, access from prepare phase of PS and open HANDLER cursors. Out of these three cases only the last one uses thr_lock.c locks. So the only scenario in which thr_lock.c plays some role in this case is preventing concurrent reads from HANDLER statements during LOCK TABLES WRITE duration. This problem can be solved by ensuring that HANDLER READ statement upgrade MDL from current S lock to SR lock for the duration of read (which will also remove nasty exception from assert ensuring that all operations which read or modify data acquire SR or SW or stronger locks on tables). Note that we also acquire table-level InnoDB locks in this case but they I) don't block non-locking reads which used by HANDLER cursor and II) are not acquired in case of @@autocommit=1 mode. b) Tables explicitly locked for read (i.e. explicitly present in statement table list with READ clause). For these tables we only acquire weak SR metadata lock and InnoDB table-level LOCK_S lock if @@autocommit=0. This means that strong read thr_lock.c lock which is acquired in this case plays significant role in this case. Possible options to replace their usage are: α) Change code to acquire table-level LOCK_S InnoDB locks and turn off @@autocommit for the duration of LOCK TABLES. This sounds like a rather radical change in behavior to me and has some other drawbacks (e.g. according to InnoDB documentation this significantly increases chances of deadlocks, introduces new intersubsystem deadlocks if HANDLER statements are involved, et cetera). β) Introduce new type of MDL lock - let us call it SHARED_READ_ONLY (SRO) type of lock which will block all concurrent updates to the table but will be compatible with itself and use it for LOCK TABLES READ. I.e. basically replace thr_lock.c lock with a similar MDL lock. There are some complications on this path (e.g. need to sort out problems with priorities if we are to preserve full compatibility) but overall it is doable. c) Tables implicitly locked for read and write by LOCK TABLES (i.e. underlying tables of views locked, tables used in triggers on tables locked, stored programs called from these views and triggers). We acquire weak SR and SW metadata locks on these tables. We also LOCK_X and LOCK_S table-level InnoDB locks but only in @@autocommit=0 mode. This means that strong thr_lock.c locks acquired in these cases play significant role in this case as well. Possible options to replace their usage are: α) Do not replace thr_lock.c with anything. This means that access to implicitly locked tables from under LOCK TABLES would block or even might deadlock in some cases. Which seems to be too radical change in behavior (e.g. think of SELECT from explicitly locked view being blocked due to problems with underlying table). β) Change code to acquire LOCK_S and LOCK_X table-level InnoDB locks and turn off @@autocommit mode until UNLOCK TABLES happens. Similarly to solution b.α) this is probably too big behavior change which is not free from other issues (more probable deadlocks, new deadlocks when there are open HANDLER cursors). γ) Acquire strong SNRW and newly introduced SRO metadata locks on tables implicitly locked for write and read correspondingly and thus replace thr_lock.c locks with MDL locks. Indeed, this approach is not without its own issues: *) We will have to sort out issues with lock priorities somehow. *) Thanks to the fact that we can't predict set of implicitly locked tables in advance probability of deadlocks is increasing so lock acquisition in LOCK TABLES becomes less efficient on average + some code acquiring locks for DDL statements needs to be adjusted to be able to handle ER_LOCK_DEADLOCK errors (e.g. by trying to restart acquisition). As a bonus, we might be able to fix some bugs involving intersubsystem deadlocks involving LOCK TABLES in this case. Summary of steps required to get rid of thr_lock.c for InnoDB ============================================================= Summing up the above it seems that we can totally get rid of thr_lock.c locks for InnoDB tables provided that we will: 1) Change HANDLER code to temporarily upgrade S metadata lock to SR lock for the duration of read from handler. 2) Fix ALTER TABLE IMPORT/EXPORT tablespace code to upgrade metadata lock to X lock when it is executed under LOCK TABLES. 3) Introduce new type of metadata lock - SRO-lock to replace TL_READ_NO_INSERT lock in LOCK TABLES. 4) Change code to acquire SNRW and SRO metadata locks on tables which are implicitly locked for LOCK TABLES. Let us discuss steps 3) and 4) and other issues which can arise in more details. Details: New type of metadata lock ================================== New type of lock should be compatible with S, SH and SR type of metadata locks and itself. Also for backward compatibility it should be compatible with SU and SNW types of locks (QQ: does this mean that we need to update descriptions of SU lock - currently it says that thread holding such lock can update data?). It should be incompatible with SW, SNRW and X types of locks. This new type of lock can be named SHARED_READ_ONLY (SRO). Renaming existing SNW type to SHARED_UPGRADABLE_NO_WRITE and reusing SHARED_NO_WRITE for new type of lock is another option, but such approach will complicate merges older code. Adding such type of lock would have been fairly straightforward if not issue with priorities. The problem is that in thr_lock.c TL_READ_NO_INSERT lock has lower priority than normal write lock (and higher than of low-prio write lock). While it is possible to make priority of new SRO type lower than of SW type, it means that LOCK TABLES READ can be easily starved by concurrent stream of DML. thr_lock.c solution to this problem is two-pronged a) it is possible to use DML with low-priority write locks by using LOW_PRIORITY clause and corresponding TL_WRITE_LOW_PRIO lock b) it is also possible to limit number of write locks acquired in a row without letting read locks in by using max_write_lock_count parameter. Possible solutions of priority problem for new metadata lock type: a) Break compatibility and declare that LOCK TABLES READ has higher priority than DML, i.e. that SRO type has higher priority than SW type (but not SNRW type used for LOCK TABLES WRITE). It is likely that we will be immediately requested to add some way to avoid LOCK TABLES READ starving out DML. This means new parameter similar to max_write_lock_count will have to be added. E.g. we can call it max_read_lock_count. b) Mimic thr_lock.c behavior fully by assigning SRO priority lower than of SW, adding low-priority SW locks and obeying max_write_lock_count parameter when granting SW locks. c) Break compatibility with both old thr_lock.c behavior and current MDL behavior implement fair scheduling, at least for locks used for DML and LOCK TABLES READ/WRITE. This will require fairly significant changes to MDL subsystem (specifically to the way we discover conflicting locks in waiters graph). And likely even more changes to test suite. d) Mimic thr_lock.c behavior partially by assigning SRO priority lower than of SW and obeying max_write_lock_count parameter when granting SW locks, without introducing low-priority SW locks. Note that with any choice of priority between SRO and SW types it probably makes sense to make pending SRO not to block acquisition of SNRW and X locks as otherwise we can easily get much more deadlock errors during upgrade of SU lock in ALTER TABLE. Since approach b) is the safest from backward compatibility point of view and also only a bit more complex from implementation perspective than approaches a) and d) we will stick to it. Details: Acquiring SNRW and SRO on implicitly locked tables =========================================================== Changing code responsible for opening views and adding tables used by stored programs to prelocking list to acquire SNRW and SRO metadata locks in case when this is done for LOCK TABLES statement should be fairly straightforward. The new problem arises thanks to the fact that after such a step we we won't be able to predict order in which strong MDL locks will be acquired by LOCK TABLES statements. Due to this we no longer will be able to rely on that in almost all cases (MERGE tables being exception) strong locks are acquired in certain order to avoid deadlock errors in DDL. Possible solutions to this problem: a) Change function which chooses deadlock resolution victim to prefer waits for strong locks from LOCK TABLES over locks from other DDL. Ensure that attempt to acquire locks for LOCK TABLES is restarted when we get ER_LOCK_DEADLOCK error. b) Simply change lock_table_names() and other similar places in our code where we acquire several strong locks to react to ER_LOCK_DEADLOCK error by releasing locks and restarting lock acquisition if possible (i.e. if there were no locks acquired before this step which we can't easily release and re-acquire). Special analysis needs to be performed to figure out if ER_LOCK_DEADLOCK error is possible after such change at the points where we do upgrade of MDL locks. c) Drastically change approach to the acquisition of strong locks for LOCK TABLES, acquire weak S locks on all tables first, then upgrade them to strong locks at once using order. The drawbacks of this approach are that a) we will have to handle deadlocks during lock upgrade somehow (plus possibly in other places where we acquire strong locks) b) in fact LOCK TABLES will acquire MDL locks twice meaning additional performance overhead as compared to other variants. The current plan is to follow approach a) as it is least intrusive and its implementation is less likely to break anything in existing code. Additional issue: LOCK TABLE READ LOCAL ======================================= LOCK TABLE READ LOCAL is special syntax which allows concurrent inserts into the locked table for MyISAM tables. It is not supported for InnoDB tables. Still since we don't know table's storage engine before opening it we will have to change our approach to metadata lock acquisition for this statement. Possible options are: a) Ignore LOCAL clause for LOCK TABLES. This is probably too big behavior change. (LOCK TABLES LOCAL is used by mysqldump so by doing this step we are likely to reduce concurrency of such backups). OTOH this option becomes much more attractive once MyISAM is deprecated. b) Acquire SRO locks first and then downgrade them to SR locks if SE supports LOCAL clause (e.g. for MyISAM). This approach reduces a bit concurrency between LOCK TABLE READ LOCAL (e.g. mysqldump) and concurrent inserts. c) Acquire SR locks if LOCAL clause is present, upgrade them to SRO if SE doesn't support this clause. This approach introduces possibility of MDL deadlocks for SE doing MDL upgrade. Such deadlocks will be properly detected and should be handled somehow (e.g. by restarting lock acquisition). Since it is unlikely that anybody will use LOCK TABLE READ LOCAL for InnoDB tables, performance impact of these deadlocks probably can be ignored. The suggestion is to go with approach c). It allows to fully preserve backward compatibility and initial code investigation shows that its implementation should be fairly straightforward. Additional issue: Multi-UPDATE ============================== Multi-update doesn't acquire thr_lock.c write locks on table participating in its join which are not updated. As result such statement can run even if some other thread locks tables which are not updated but participate in its main join using LOCK TABLE READ. Since we acquire SW metadata lock for all tables participating in Multi-UPDATE join the situation is going to change once LOCK TABLE READ starts to acquire SRO lock - concurrent LOCK TABLE READ on any table in join will block multi-update (AFAIU the multi-update is the only place where we acquire metadata lock which is stronger than thr_lock.c lock, so other places should not be affected). Possible solutions for this issue are: a) Accept change in behavior. The upside is that code stays simple, the downside is that some users may encounter problems + new behavior might be hard to explain without referring to implementation. Support Team was asked about this alternative and majority of people who answered seem to be OK with it (there is one person who objects). Also, to reduce the impact of this change in behavior, we can downgrade SW locks for tables which are read only to SR locks. b) Keep behavior backward-compatible. Change code to acquire S locks initially and then upgrade them to SR or SW locks after figuring out if specific table in join going to be only read or read and updated. Such an upgrade could lead to deadlocks which will be detected by MDL subsystem and need to be handled. Code which clean-ups after resolving columns from SET clause and allows to restart locking is present in 5.1. The awkwardness of this code (and the fact that it has caused some problems in the past) is the main downside of this approach. The upside - absence of change in behavior. Performance impact of lock upgrade and deadlock handling is negligible since upgrade takes "fast path" and deadlocks can occur only due to concurrent multi-update and DDL. The suggestion is to go with option a) since it is more straighforward and there were not strong opposition to it from Support. We still will be able to implement option b) later if there will be loud complaints after MRU/RC with this task is out. Details: Not acquiring thr_lock.c locks for InnoDB ================================================== Changing InnoDB to not acquire thr_lock.c locks itself is fairly straightforward: I) We need to change ha_innobase::lock_count() to return 0, so no (THR_LOCK*) elements are allocated in MYSQL_LOCK::locks array for InnoDB tables. II) We need to ensure that ha_innobase::store_lock() doesn't try to store type of thr_lock.c lock in MYSQL_LOCK::locks array. Note that this method still needs to be called as it plays important role in determining type of row-locks used by InnoDB (as well as ha_innobase::externa_lock() and ha_innobase::start_stmt()). Refactoring this part of SE API is subject of different WL. III) We need to double check that code in sql/lock.cc dealing with MYSQL_LOCK structure and thr_lock.c locks correctly handles all situations for SEs which don't use any thr_lock.c locks and return 0 as handler::lock_count().
Copyright (c) 2000, 2018, Oracle Corporation and/or its affiliates. All rights reserved.