WL#6671: Improve scalability by not using thr_lock.c locks for InnoDB tables
Affects: Server-Prototype Only
—
Status: Complete
Improve scalability by not using thr_lock.c locks for InnoDB tables (this task is mostly about avoiding calls to thr_lock.c code/locking THR_LOCK::mutex with minimal changes to SE API and locking code on SQL-layer). User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html (mentions new locking behavior re: LOCK TABLES ... READ)
Non-functional requirements:
NF1) This patch should show performance/scalability improvement in
1-table Sysbench OLTP_RO/POINT_SELECT tests for InnoDB on multi-core
machines.
NF2) There should be no significant regressions in other benchmarks for
InnoDB tables.
User visible changes caused by this task ======================================== THR_LOCK will be no longer used for InnoDB tables ------------------------------------------------- Besides the performance improvements aspect this means that user will never see waits for THR_LOCK locks for InnoDB tables in P_S schema or I_S.PROCESSLIST output. Instead in cases when user previously has seen wait on THR_LOCK lock for InnoDB table it will see wait on MDL. Particularly this means that for DML like INSERT INTO t1 VALUES(1) which is blocked due to LOCK TABLES t1 READ active in another connection we will see in I_S.PROCESSLIST.STATE column "Waiting for table metadata lock" state instead of "Waiting for table level lock". There won't be THR_LOCK acquisitions/waits registered in this case in P_S.TABLE_HANDLES table and for "wait/lock/table/sql/handler" event (calls to handler::external_lock() will still be registerd). Instead waits for/acquisitions of metadata lock will be visible in P_S.METADATA_LOCKS table and for "wait/lock/metadata/sql/mdl" event. LOCK TABLES READ and transactions --------------------------------- LOCK TABLES READ will block and will be blocked by concurrent transactions changing the table, for all storage engines, similarly to how LOCK TABLES WRITE work now. Particularly this means that: - Explicitly or implicitly started transactions updating non-transactional (e.g. MyISAM) table will start to block and get blocked by LOCK TABLES READ on this table (unlike before). - LOCK TABLES READ on InnoDB table in @@autocommit=1 mode or with @@innodb_table_locks=0 will start block and blocked by concurrent transactions changing this table. This means that @@innodb_table_locks will only have in a few scenarios involving cascading foreign keys and updates through no-SQL API. - Storage engines which prefer to ignore LOCK TABLES READ by downgrading THR_LOCK lock acquired for it won't be able to do so. Tables which are implictly locked by LOCK TABLES ------------------------------------------------ Tables which are implicitly used by LOCK TABLES (e.g. through view or trigger) will be locked using metadata locks in addition to (all SEs except InnoDB) or instead of (InnoDB) THR_LOCK locks. These means that changes described in the above two points will also apply to such tables. Multi-update and concurrent LOCK TABLES READ -------------------------------------------- Multi-update will be blocking/start to be blocked by concurrent LOCK TABLES READ on any table from its join, even though such table will be only used for reading and won't be updated. Such blocking will be visible as a wait on metadata lock. HANDLER READ and concurrent LOCK TABLES WRITE --------------------------------------------- HANDLER READ statement (for table using any SE) will wait for concurrent LOCK TABLES WRITE to go away on metadata lock and not THR_LOCK lock. Vice versa LOCK TABLES WRITE will wait for HANDLER READ to complete on metadata lock instead of THR_LOCK lock. This will be visible as a different STATE for waiter in I_S.PROCESSLIST and in statistics gathered by P_S. ALTER TABLE IMPORT/DISCARD TABLESPACE under LOCK TABLES ------------------------------------------------------- ALTER TABLE DISCARD/IMPORT under LOCK TABLES will no longer allow concurrent access to table metadata. This means that concurrent SHOW CREATE TABLE on the table will be blocked during such statement. Also information about table won't be accessible in I_S during this time. Bugs fixed or alleviated ------------------------ Also this WL is likely to completely fix or at least alleviate the issues described in the following bug reports: Bug #11751331 "CONCURRENT DML AND LOCK TABLE ... READ FOR INNODB TABLE CAUSE WARNINGS IN" Bug #11764618 "DEADLOCK WHEN DDL UNDER LOCK TABLES WRITE, READ + PREPARE" Bug #18913551 "LOCK TABLES USES INCORRECT LOCK FOR IMPLICITLY USED TABLES"
Background: why it is possible to get rid of thr_lock.c locks for InnoDB
========================================================================
Currently InnoDB:
*) Downgrades all types of write locks to TL_WRITE_ALLOW_WRITE lock
unless:
-) This is ALTER TABLE IMPORT/DISCARD TABLESPACE
-) This is TRUNCATE TABLE
-) This is OPTIMIZE TABLE
-) This is CREATE TABLE
-) This is LOCK TABLES statement
*) Strong, TL_READ_NO_INSERT, read locks are downgraded to TL_READ locks
unless this is LOCK TABLES. In case of LOCK TABLES TL_READ_NO_INSERT
locks are not downgraded and TL_READ locks are upgraded to
TL_READ_NO_INSERT locks.
This means that in most cases thr_lock.c locks for InnoDB tables are
compatible with each other and don't play any significant role.
Let us take a more detailed look at the above list of exceptions:
1) ALTER TABLE IMPORT/DISCARD TABLESPACE.
Since these operations acquire exclusive (**) MDL lock on the table
being discarded or imported all concurrent DML and DDL would be
blocked for them anyway, so thr_lock.c lock is not really required
in this case.
(**) The fact that under LOCK TABLES we acquire only SNRW lock and
never upgrade it to X lock can be treated as a bug according to
Marko/Sunny and should be fixed (and if it is not a bug - the
situation boils down to the same as for LOCK TABLES WRITE).
2) TRUNCATE TABLE
We always acquire exclusive MDL lock on the table being truncated,
so all concurrent DDL and DML will be blocked during this operation,
without any participation of thr_lock.c locks.
3) OPTIMIZE TABLE
Have to be split in two parts:
a) Normal OPTIMIZE TABLE for InnoDB is implemented as ALTER TABLE FORCE
thus it doesn't acquire write lock and relies on MDL/row-locking to
isolate from concurrent DML and DDL.
b) OPTIMIZE for FTS indexes is implemented in ha_innobase::optimize(),
which is called under protection of MDL SNRW lock, which means that
it is isolated from all DDL and DML from which thr_lock.c can isolate
except open HANDLER cursors. But this situation is similar to one for
LOCK TABLES WRITE and will be discussed later.
4) CREATE TABLE
There are two cases when we acquire write locks in CREATE TABLE:
a) Write lock is acquired on target table in CREATE TABLE SELECT.
But at the point when it is acquired we will have exclusive
MDL lock on it, so all concurrent DDL/DML on this table is
going to be blocked anyway and strong thr_lock.c lock is
not required.
b) Write locks are acquired on source tables in CREATE TABLE SELECT
if FOR UPDATE clause is used. AFAIU the fact that we acquire strong
write lock TL_WRITE in this case is artefact from a) and is not
really required (LOCK_X row-locks should be sufficient).
5) LOCK TABLES
Again we have several cases for this statement:
a) Tables explicitly locked for write (i.e. explicitly present in
statement table list with with WRITE clause). We acquire strong
SNRW metadata lock on such tables which blocks concurrent DDL
and DML access to this table leaving only I_S, access from prepare
phase of PS and open HANDLER cursors. Out of these three cases only
the last one uses thr_lock.c locks. So the only scenario in which
thr_lock.c plays some role in this case is preventing concurrent
reads from HANDLER statements during LOCK TABLES WRITE duration.
This problem can be solved by ensuring that HANDLER READ statement
upgrade MDL from current S lock to SR lock for the duration of read
(which will also remove nasty exception from assert ensuring that
all operations which read or modify data acquire SR or SW or stronger
locks on tables).
Note that we also acquire table-level InnoDB locks in this case
but they I) don't block non-locking reads which used by HANDLER
cursor and II) are not acquired in case of @@autocommit=1 mode.
b) Tables explicitly locked for read (i.e. explicitly present in
statement table list with READ clause). For these tables we only
acquire weak SR metadata lock and InnoDB table-level LOCK_S
lock if @@autocommit=0. This means that strong read thr_lock.c
lock which is acquired in this case plays significant role in
this case.
Possible options to replace their usage are:
α) Change code to acquire table-level LOCK_S InnoDB locks and
turn off @@autocommit for the duration of LOCK TABLES.
This sounds like a rather radical change in behavior to
me and has some other drawbacks (e.g. according to InnoDB
documentation this significantly increases chances of
deadlocks, introduces new intersubsystem deadlocks if
HANDLER statements are involved, et cetera).
β) Introduce new type of MDL lock - let us call it
SHARED_READ_ONLY (SRO) type of lock which will block all
concurrent updates to the table but will be compatible with
itself and use it for LOCK TABLES READ. I.e. basically replace
thr_lock.c lock with a similar MDL lock. There are some
complications on this path (e.g. need to sort out problems
with priorities if we are to preserve full compatibility)
but overall it is doable.
c) Tables implicitly locked for read and write by LOCK TABLES
(i.e. underlying tables of views locked, tables used in
triggers on tables locked, stored programs called from these
views and triggers). We acquire weak SR and SW metadata locks
on these tables. We also LOCK_X and LOCK_S table-level InnoDB
locks but only in @@autocommit=0 mode. This means that strong
thr_lock.c locks acquired in these cases play significant
role in this case as well.
Possible options to replace their usage are:
α) Do not replace thr_lock.c with anything. This means that
access to implicitly locked tables from under LOCK TABLES
would block or even might deadlock in some cases. Which
seems to be too radical change in behavior (e.g. think
of SELECT from explicitly locked view being blocked due
to problems with underlying table).
β) Change code to acquire LOCK_S and LOCK_X table-level InnoDB
locks and turn off @@autocommit mode until UNLOCK TABLES
happens. Similarly to solution b.α) this is probably too
big behavior change which is not free from other issues
(more probable deadlocks, new deadlocks when there are
open HANDLER cursors).
γ) Acquire strong SNRW and newly introduced SRO metadata locks
on tables implicitly locked for write and read correspondingly
and thus replace thr_lock.c locks with MDL locks.
Indeed, this approach is not without its own issues:
*) We will have to sort out issues with lock priorities somehow.
*) Thanks to the fact that we can't predict set of implicitly
locked tables in advance probability of deadlocks is increasing
so lock acquisition in LOCK TABLES becomes less efficient
on average + some code acquiring locks for DDL statements
needs to be adjusted to be able to handle ER_LOCK_DEADLOCK
errors (e.g. by trying to restart acquisition).
As a bonus, we might be able to fix some bugs involving
intersubsystem deadlocks involving LOCK TABLES in this case.
Summary of steps required to get rid of thr_lock.c for InnoDB
=============================================================
Summing up the above it seems that we can totally get rid of thr_lock.c
locks for InnoDB tables provided that we will:
1) Change HANDLER code to temporarily upgrade S metadata
lock to SR lock for the duration of read from handler.
2) Fix ALTER TABLE IMPORT/EXPORT tablespace code to upgrade
metadata lock to X lock when it is executed under LOCK
TABLES.
3) Introduce new type of metadata lock - SRO-lock to replace
TL_READ_NO_INSERT lock in LOCK TABLES.
4) Change code to acquire SNRW and SRO metadata locks on tables
which are implicitly locked for LOCK TABLES.
Let us discuss steps 3) and 4) and other issues which can arise
in more details.
Details: New type of metadata lock
==================================
New type of lock should be compatible with S, SH and SR type of metadata
locks and itself. Also for backward compatibility it should be compatible
with SU and SNW types of locks (QQ: does this mean that we need to update
descriptions of SU lock - currently it says that thread holding such lock
can update data?). It should be incompatible with SW, SNRW and X types of
locks. This new type of lock can be named SHARED_READ_ONLY (SRO). Renaming
existing SNW type to SHARED_UPGRADABLE_NO_WRITE and reusing SHARED_NO_WRITE
for new type of lock is another option, but such approach will complicate
merges older code.
Adding such type of lock would have been fairly straightforward if not
issue with priorities. The problem is that in thr_lock.c TL_READ_NO_INSERT
lock has lower priority than normal write lock (and higher than of low-prio
write lock). While it is possible to make priority of new SRO type lower
than of SW type, it means that LOCK TABLES READ can be easily starved by
concurrent stream of DML. thr_lock.c solution to this problem is two-pronged
a) it is possible to use DML with low-priority write locks by using
LOW_PRIORITY clause and corresponding TL_WRITE_LOW_PRIO lock b) it is also
possible to limit number of write locks acquired in a row without letting
read locks in by using max_write_lock_count parameter.
Possible solutions of priority problem for new metadata lock type:
a) Break compatibility and declare that LOCK TABLES READ has higher
priority than DML, i.e. that SRO type has higher priority than SW
type (but not SNRW type used for LOCK TABLES WRITE). It is likely
that we will be immediately requested to add some way to avoid
LOCK TABLES READ starving out DML. This means new parameter
similar to max_write_lock_count will have to be added. E.g.
we can call it max_read_lock_count.
b) Mimic thr_lock.c behavior fully by assigning SRO priority lower
than of SW, adding low-priority SW locks and obeying
max_write_lock_count parameter when granting SW locks.
c) Break compatibility with both old thr_lock.c behavior and
current MDL behavior implement fair scheduling, at least
for locks used for DML and LOCK TABLES READ/WRITE. This
will require fairly significant changes to MDL subsystem
(specifically to the way we discover conflicting locks in
waiters graph). And likely even more changes to test suite.
d) Mimic thr_lock.c behavior partially by assigning SRO priority
lower than of SW and obeying max_write_lock_count parameter
when granting SW locks, without introducing low-priority SW
locks.
Note that with any choice of priority between SRO and SW types it
probably makes sense to make pending SRO not to block acquisition
of SNRW and X locks as otherwise we can easily get much more
deadlock errors during upgrade of SU lock in ALTER TABLE.
Since approach b) is the safest from backward compatibility point
of view and also only a bit more complex from implementation perspective
than approaches a) and d) we will stick to it.
Details: Acquiring SNRW and SRO on implicitly locked tables
===========================================================
Changing code responsible for opening views and adding tables used by
stored programs to prelocking list to acquire SNRW and SRO metadata
locks in case when this is done for LOCK TABLES statement should be
fairly straightforward.
The new problem arises thanks to the fact that after such a step we
we won't be able to predict order in which strong MDL locks will be
acquired by LOCK TABLES statements. Due to this we no longer will be
able to rely on that in almost all cases (MERGE tables being exception)
strong locks are acquired in certain order to avoid deadlock errors
in DDL.
Possible solutions to this problem:
a) Change function which chooses deadlock resolution victim to prefer
waits for strong locks from LOCK TABLES over locks from other DDL.
Ensure that attempt to acquire locks for LOCK TABLES is restarted
when we get ER_LOCK_DEADLOCK error.
b) Simply change lock_table_names() and other similar places in
our code where we acquire several strong locks to react to
ER_LOCK_DEADLOCK error by releasing locks and restarting lock
acquisition if possible (i.e. if there were no locks acquired
before this step which we can't easily release and re-acquire).
Special analysis needs to be performed to figure out if
ER_LOCK_DEADLOCK error is possible after such change at
the points where we do upgrade of MDL locks.
c) Drastically change approach to the acquisition of strong locks
for LOCK TABLES, acquire weak S locks on all tables first, then
upgrade them to strong locks at once using order. The drawbacks
of this approach are that a) we will have to handle deadlocks
during lock upgrade somehow (plus possibly in other places where
we acquire strong locks) b) in fact LOCK TABLES will acquire MDL
locks twice meaning additional performance overhead as compared
to other variants.
The current plan is to follow approach a) as it is least intrusive and
its implementation is less likely to break anything in existing code.
Additional issue: LOCK TABLE READ LOCAL
=======================================
LOCK TABLE READ LOCAL is special syntax which allows concurrent inserts
into the locked table for MyISAM tables. It is not supported for InnoDB
tables. Still since we don't know table's storage engine before opening
it we will have to change our approach to metadata lock acquisition for
this statement. Possible options are:
a) Ignore LOCAL clause for LOCK TABLES. This is probably too big behavior
change. (LOCK TABLES LOCAL is used by mysqldump so by doing this step
we are likely to reduce concurrency of such backups). OTOH this option
becomes much more attractive once MyISAM is deprecated.
b) Acquire SRO locks first and then downgrade them to SR locks if SE
supports LOCAL clause (e.g. for MyISAM). This approach reduces a bit
concurrency between LOCK TABLE READ LOCAL (e.g. mysqldump) and
concurrent inserts.
c) Acquire SR locks if LOCAL clause is present, upgrade them to SRO if
SE doesn't support this clause. This approach introduces possibility
of MDL deadlocks for SE doing MDL upgrade. Such deadlocks will be
properly detected and should be handled somehow (e.g. by restarting
lock acquisition). Since it is unlikely that anybody will use LOCK
TABLE READ LOCAL for InnoDB tables, performance impact of these
deadlocks probably can be ignored.
The suggestion is to go with approach c). It allows to fully preserve
backward compatibility and initial code investigation shows that its
implementation should be fairly straightforward.
Additional issue: Multi-UPDATE
==============================
Multi-update doesn't acquire thr_lock.c write locks on table participating
in its join which are not updated. As result such statement can run even
if some other thread locks tables which are not updated but participate
in its main join using LOCK TABLE READ. Since we acquire SW metadata lock
for all tables participating in Multi-UPDATE join the situation is going
to change once LOCK TABLE READ starts to acquire SRO lock - concurrent
LOCK TABLE READ on any table in join will block multi-update (AFAIU the
multi-update is the only place where we acquire metadata lock which is
stronger than thr_lock.c lock, so other places should not be affected).
Possible solutions for this issue are:
a) Accept change in behavior. The upside is that code stays simple, the
downside is that some users may encounter problems + new behavior might
be hard to explain without referring to implementation. Support Team
was asked about this alternative and majority of people who answered
seem to be OK with it (there is one person who objects).
Also, to reduce the impact of this change in behavior, we can downgrade
SW locks for tables which are read only to SR locks.
b) Keep behavior backward-compatible. Change code to acquire S locks
initially and then upgrade them to SR or SW locks after figuring out if
specific table in join going to be only read or read and updated.
Such an upgrade could lead to deadlocks which will be detected by MDL
subsystem and need to be handled. Code which clean-ups after resolving
columns from SET clause and allows to restart locking is present in 5.1.
The awkwardness of this code (and the fact that it has caused some
problems in the past) is the main downside of this approach. The upside -
absence of change in behavior. Performance impact of lock upgrade and
deadlock handling is negligible since upgrade takes "fast path" and
deadlocks can occur only due to concurrent multi-update and DDL.
The suggestion is to go with option a) since it is more straighforward and
there were not strong opposition to it from Support. We still will be able
to implement option b) later if there will be loud complaints after MRU/RC
with this task is out.
Details: Not acquiring thr_lock.c locks for InnoDB
==================================================
Changing InnoDB to not acquire thr_lock.c locks itself is fairly
straightforward:
I) We need to change ha_innobase::lock_count() to return 0, so no
(THR_LOCK*) elements are allocated in MYSQL_LOCK::locks[] array for
InnoDB tables.
II) We need to ensure that ha_innobase::store_lock() doesn't try to store
type of thr_lock.c lock in MYSQL_LOCK::locks[] array. Note that this
method still needs to be called as it plays important role in
determining type of row-locks used by InnoDB (as well as
ha_innobase::externa_lock() and ha_innobase::start_stmt()).
Refactoring this part of SE API is subject of different WL.
III) We need to double check that code in sql/lock.cc dealing with
MYSQL_LOCK structure and thr_lock.c locks correctly handles all
situations for SEs which don't use any thr_lock.c locks and
return 0 as handler::lock_count().
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.