MySQL LOCK ORDER
LOCK ORDER is a debugging tool used to enforce that no deadlocks can happen in the server.
The general idea is as follows.
Every mutex is instrumented by the performance schema, and is assigned a name already.
There is a global, unique, oriented graph, that describes the lock sequences allowed during runtime.
Instrumented mutexes are nodes in the graph.
When some code locks mutex A then B, there is an explicit A -> B edge in the graph.
During runtime execution, the LOCK ORDER tool checks whether the actual locks taken comply with the graph.
Should any test not comply, the violation is detected and reported in the tool's log file.
Once a state is achieved where:
- the dependency graph is complete (tests report no missing arcs),
- the dependency graph is acyclic,
- the entire test suite complies with the graph,
then the server is guaranteed to have no deadlocks.
The beauty of this approach is that there is no need to run tests concurrently to expose deadlocks.
When statements are executed, even alone in a single session, any statement that causes an execution path to be not compliant will be detected, and reported.
LOCK ORDER is a debugging tool only. It should not be used in production.
CMAKE
The CMake configuration contains a build option, WITH_LOCK_ORDER, to build the server with the tool.
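For example, to configure a build with the tool enabled (a typical out-of-source invocation; the build directory layout is an assumption):

cmake .. -DWITH_LOCK_ORDER=ON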
To run a test without LOCK ORDER, use any of the following:
./mtr --lock-order=0 main.parser

export MTR_LOCK_ORDER=0
./mtr main.parser
To run a test with LOCK ORDER, use any of the following:
./mtr --lock-order=1 main.parser

export MTR_LOCK_ORDER=1
./mtr main.parser
By default, LOCK ORDER is disabled in mysql-test-run.
Executing a test with LOCK ORDER enabled has the following effects:
- lock_order_dependencies.txt is read by the tool, to load the graph to enforce.
- lock_order-(timestamp)-(pid).log is written by the tool, and contains various messages.
- lock_order.txt is written, and contains a textual representation of the graph, with some analysis.

The default dependencies file is (root)/mysql-test/lock_order_dependencies.txt.
To use a different dependencies file, use the lock_order_dependencies system variable.
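For example, the file can be set at server startup (the path here is hypothetical):

mysqld --lock-order-dependencies=/path/to/my_dependencies.txt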
The file format is text, with one line per declaration.
Every node is named from the performance schema instrumentation, with a shorter prefix.
For example, the mutex mysql_mutex_t LOCK_open is named "wait/synch/mutex/sql/LOCK_open" in the performance schema, simplified as "mutex/sql/LOCK_open" in LOCK ORDER.
When the code locks A then B, this arc is declared in the graph as:
ARC FROM "A" TO "B"
For example:
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data"
Note that arcs are not transitive in the graph.
For example, if the graph has the following arcs:
ARC FROM "A" TO "B" ARC FROM "B" TO "C"
Then the following code will comply:
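(a minimal sketch; A, B, and C stand for instrumented mysql_mutex_t mutexes)

mysql_mutex_lock(A);
mysql_mutex_lock(B);
mysql_mutex_unlock(A); /* A is released before taking C */
mysql_mutex_lock(C);   /* only B is held: covered by the B -> C arc */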
But the following code will not comply:
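(same assumptions as the sketch above)

mysql_mutex_lock(A);
mysql_mutex_lock(B);
mysql_mutex_lock(C);   /* A is still held, and no A -> C arc is declared */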
This happens because the "A" -> "C" transition is not declared.
This is a desired feature: to understand contention in the server, the real dependencies (the paths actually taken) must be documented explicitly. The tool could, but does not, infer more arcs by transitivity.
Additional metadata can be associated with an ARC, by adding flags to the arc declaration.
The format is
ARC FROM "A" TO "B" FLAGS <flag1> <flag2> ... <flagN>
Individual flags are separated by spaces.
Supported arc flags are TRACE, DEBUG, and LOOP, each described later in this page.
Read write locks are named from the performance schema, with a shorter prefix.
There are three kinds of read write locks instrumented:
- read write locks (mysql_rwlock_t)
- priority read write locks (mysql_prlock_t)
- shared exclusive locks (rw_lock_t in innodb)

The lock mysql_rwlock_t LOCK_system_variables_hash, which is a read write lock, is named "wait/synch/rwlock/sql/LOCK_system_variables_hash" in the performance schema, simplified as "rwlock/sql/LOCK_system_variables_hash" in LOCK ORDER.
Read write locks are recursive, but only on read operations. Due to the scheduling behavior in the underlying implementation, a reader might indirectly block another reader, if a write request is present.
When a lock is held on a read write lock, it can be in two states:
- READ ("R")
- WRITE ("W")

These states are exclusive.
When a thread holds a read write lock on "L" and locks "B", arcs are noted as follows:
ARC FROM "rwlock/sql/L" STATE "R" TO "B" ... ARC FROM "rwlock/sql/L" STATE "W" TO "B" ...
Operations on read write locks are:
- read lock ("R")
- try read lock ("TRY R")
- write lock ("W")
- try write lock ("TRY W")
When a thread holds a lock on "A" and then locks a read write lock "L", arcs are noted as follows:
ARC FROM "A" ... TO "rwlock/sql/L" OP "R" ARC FROM "A" ... TO "rwlock/sql/L" OP "TRY R" ARC FROM "A" ... TO "rwlock/sql/L" OP "W" ARC FROM "A" ... TO "rwlock/sql/L" OP "TRY W"
Recursive locks are noted as follows:
ARC FROM "rwlock/sql/L" STATE "..." TO "rwlock/sql/L" RECURSIVE OP "R" ARC FROM "rwlock/sql/L" STATE "..." TO "rwlock/sql/L" RECURSIVE OP "TRY R"
The lock mysql_prlock_t MDL_context::m_LOCK_waiting_for, which is a priority read write lock, is named "wait/synch/prlock/sql/MDL_context::LOCK_waiting_for" in the performance schema, simplified as "prlock/sql/MDL_context::LOCK_waiting_for" in LOCK ORDER.
Priority locks are recursive. A reader will never block another reader, even if a write request is present.
When a lock is held on a priority lock, it can be in two states:
- READ ("R")
- WRITE ("W")

These states are exclusive.
When a thread holds a priority lock on "L" and locks "B", arcs are noted as follows:
ARC FROM "prlock/sql/L" STATE "R" TO "B" ... ARC FROM "prlock/sql/L" STATE "W" TO "B" ...
Operations on priority locks are:
- read lock ("R")
- write lock ("W")

Note that the READ state can be acquired recursively on the same priority lock.
When a thread holds a lock on "A" and then locks a priority lock "L", arcs are noted as follows:
ARC FROM "A" ... TO "prlock/sql/L" OP "R" ARC FROM "A" ... TO "prlock/sql/L" OP "W"
Recursive locks are noted as follows:
ARC FROM "prlock/sql/L" STATE "..." TO "prlock/sql/L" RECURSIVE OP "R"
The lock rw_lock_t dict_operation_lock, which is a shared exclusive lock, is named "wait/synch/sxlock/innodb/dict_operation_lock" in the performance schema, simplified as "sxlock/innodb/dict_operation_lock" in LOCK ORDER.
Shared exclusive locks are recursive. Shared exclusive locks are implemented as spin locks, with fallback on condition variables.
When a lock is held on a shared exclusive lock, it can be in three states:
- SHARED ("S")
- SHARED EXCLUSIVE ("SX")
- EXCLUSIVE ("X")

Because the same lock can be acquired recursively, when multiple locks are taken by the same thread on the same object, the overall equivalent state is computed (for example, SX + X counts as X).
When a thread holds a shared exclusive lock on "L" and locks "B", arcs are noted as follows:
ARC FROM "sxlock/innodb/L" STATE "S" TO "B" ... ARC FROM "sxlock/innodb/L" STATE "SX" TO "B" ... ARC FROM "sxlock/innodb/L" STATE "X" TO "B" ...
Operations on shared exclusive locks are:
- shared lock ("S")
- try shared lock ("TRY S")
- shared exclusive lock ("SX")
- try shared exclusive lock ("TRY SX")
- exclusive lock ("X")
- try exclusive lock ("TRY X")

Note that some states can be acquired recursively on the same shared exclusive lock.
When a thread holds a lock on "A" and then locks a shared exclusive lock "L", arcs are noted as follows:
ARC FROM "A" ... TO "sxlock/innodb/L" OP "S" ARC FROM "A" ... TO "sxlock/innodb/L" OP "TRY S" ARC FROM "A" ... TO "sxlock/innodb/L" OP "SX" ARC FROM "A" ... TO "sxlock/innodb/L" OP "TRY SX" ARC FROM "A" ... TO "sxlock/innodb/L" OP "X" ARC FROM "A" ... TO "sxlock/innodb/L" OP "TRY X"
Recursive locks are noted as follows:
ARC FROM "sxlock/innodb/L" STATE "..." TO "sxlock/innodb/L" RECURSIVE OP "S" ARC FROM "sxlock/innodb/L" STATE "..." TO "sxlock/innodb/L" RECURSIVE OP "SX" ARC FROM "sxlock/innodb/L" STATE "..." TO "sxlock/innodb/L" RECURSIVE OP "X"
Conditions are named from the performance schema, with a shorter prefix.
For example, the condition mysql_cond_t COND_open is named "wait/synch/cond/sql/COND_open" in the performance schema, simplified as "cond/sql/COND_open" in LOCK ORDER.
Explicit arcs are declared between mutexes and associated conditions, so that the tool also enforces that the same mutex is consistently used for the same condition, to comply with the POSIX APIs.
When a condition C is associated with a mutex M, this arc is declared in the graph as:
BIND "C" TO "M"
For example:
BIND "cond/sql/COND_open" TO "mutex/sql/LOCK_open"
In the following sequence of code:
mysql_mutex_lock(M);
mysql_cond_signal(C);
mysql_mutex_unlock(M);
The tool will verify, when mysql_cond_signal is called, that condition C is bound to mutex M, and that M is locked.
Note that holding a mutex when using signal or broadcast is recommended, but not mandatory.
To allow the following code:
mysql_cond_signal(C); // mutex M not locked
The condition must be flagged explicitly as using 'UNFAIR' scheduling, as in:
BIND "C" TO "M" FLAGS UNFAIR
For example:
BIND "cond/sql/Source_info::start_cond" TO "mutex/sql/Source_info::run_lock" FLAGS UNFAIR
Files are named from the performance schema, with a shorter prefix.
For example, the relay log file is named "wait/io/file/sql/relaylog" in the performance schema, simplified as "file/sql/relaylog" in LOCK ORDER.
File I/O operations (read, write, etc.) on a file are not documented as such. When any file I/O is performed while holding a lock, the dependency is documented, for example:
ARC FROM "rwlock/sql/channel_lock" STATE "W" TO "file/sql/relaylog" ARC FROM "sxlock/innodb/dict_operation_lock" STATE "X" TO "file/innodb/innodb_data_file"
During execution, the server writes to a log file located under the build directory, in a subdirectory named lock-order. The file is named lock_order-(timestamp)-(pid).log, where (pid) is the process id of mysqld.
The log file contains various messages printed by LOCK ORDER.
This file is an optional output.
To print the lock_order.txt file, two actions are required:
- start the server with the lock_order_print_txt system variable enabled,
- send a COM_DEBUG command to the server (for example, with mysqladmin debug).

The COM_DEBUG command causes the file to be printed. It is desirable to load dynamic plugins and components before dumping this report, so that lock names instrumented inside the loaded code are checked against the dependencies graph.
A helper test, lock_order.cycle, performs all the steps required to dump the lock_order.txt report.
This command:

export MTR_LOCK_ORDER=1
./mtr lock_order.cycle
will generate the graph dump as a side effect.
The generated file contains the following sections:
- DEPENDENCY GRAPH
- SCC ANALYSIS (full graph)
- IGNORED NODES
- LOOP ARC
- SCC ANALYSIS (revised graph)
- UNRESOLVED ARCS
The section "DEPENDENCY GRAPH" is a textual representation of the graph, to facilitate investigations.
Each node is dumped with incoming and outgoing arcs, for example:
NODE: mutex/sql/LOCK_open
  16 incoming arcs:
    FROM: mutex/sql/tz_LOCK
    FROM: mutex/p_dyn_loader/key_component_id_by_urn_mutex
    FROM: mutex/sql/LOCK_plugin_install
    FROM: mutex/sql/LOCK_table_cache
    FROM: mutex/sql/key_mta_temp_table_LOCK
    FROM: mutex/sql/LOCK_event_queue
    FROM: mutex/sql/LOCK_global_system_variables
    FROM: mutex/sql/LOCK_reset_gtid_table
    FROM: mutex/sql/Source_info::data_lock
    FROM: mutex/sql/Source_info::run_lock
    FROM: mutex/sql/MYSQL_BIN_LOG::LOCK_index
    FROM: mutex/sql/MYSQL_BIN_LOG::LOCK_log
    FROM: mutex/sql/MYSQL_RELAY_LOG::LOCK_log
    FROM: mutex/sql/Relay_log_info::data_lock
    FROM: mutex/sql/Relay_log_info::run_lock
    FROM: mutex/sql/TABLE_SHARE::LOCK_ha_data -- LOOP FLAG
  22 outgoing arcs:
    TO: mutex/blackhole/blackhole
    TO: mutex/archive/Archive_share::mutex
    TO: mutex/myisam/MYISAM_SHARE::intern_lock
    TO: mutex/innodb/sync_array_mutex
    TO: mutex/innodb/rtr_active_mutex
    TO: mutex/innodb/srv_sys_mutex
    TO: mutex/innodb/rw_lock_list_mutex
    TO: mutex/innodb/rw_lock_debug_mutex
    TO: mutex/innodb/dict_table_mutex
    TO: mutex/innodb/dict_sys_mutex
    TO: mutex/innodb/innobase_share_mutex
    TO: mutex/csv/tina
    TO: mutex/sql/LOCK_plugin
    TO: cond/sql/COND_open
    TO: mutex/mysys/KEY_CACHE::cache_lock
    TO: mutex/mysys/THR_LOCK_heap
    TO: mutex/mysys/THR_LOCK_myisam
    TO: mutex/mysys/THR_LOCK_open
    TO: mutex/sql/DEBUG_SYNC::mutex
    TO: mutex/sql/MDL_wait::LOCK_wait_status
    TO: mutex/sql/TABLE_SHARE::LOCK_ha_data
    TO: mutex/sql/THD::LOCK_current_cond -- LOOP FLAG
The section "SCC ANALYSIS (full graph)" is a "Strongly Connected Component" (SCC) analysis of the entire graph.
See https://en.wikipedia.org/wiki/Strongly_connected_component
For each SCC, the report prints the nodes that are part of the SCC:
Found SCC number 1 of size 26:
mutex/innodb/dict_sys_mutex
...
mutex/sql/LOCK_offline_mode
Found SCC number 2 of size 2:
mutex/sql/Relay_log_info::run_lock
mutex/sql/Source_info::run_lock
Number of SCC found: 2
Then the arcs internal to each SCC are printed.
Dumping arcs for SCC 1:
SCC ARC FROM "mutex/myisam/MYISAM_SHARE::intern_lock" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/innodb/parser_mutex" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/innodb/dict_sys_mutex" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/LOCK_global_system_variables"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/THD::LOCK_current_cond"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/myisam/MYISAM_SHARE::intern_lock"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_plugin"
...
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/THD::LOCK_current_cond"
Dumping arcs for SCC 2:
SCC ARC FROM "mutex/sql/Relay_log_info::run_lock" TO "mutex/sql/Source_info::run_lock"
SCC ARC FROM "mutex/sql/Source_info::run_lock" TO "mutex/sql/Relay_log_info::run_lock"
The section "IGNORED NODES" prints the nodes flagged as IGNORED in the dependencies file:
IGNORED NODE: mutex/sql/LOCK_thd_list
IGNORED NODE: mutex/sql/LOCK_event_queue
IGNORED NODE: mutex/sql/LOCK_offline_mode
IGNORED NODE: mutex/sql/LOCK_global_system_variables
The section "LOOP ARC" prints the arcs flagged as LOOP in the dependencies file:
LOOP ARC FROM mutex/myisam/MYISAM_SHARE::intern_lock TO mutex/sql/LOCK_plugin
LOOP ARC FROM mutex/innodb/dict_sys_mutex TO mutex/sql/LOCK_plugin
LOOP ARC FROM mutex/mysys/THR_LOCK_myisam TO mutex/sql/LOCK_plugin
LOOP ARC FROM mutex/sql/LOCK_plugin TO mutex/sql/LOCK_global_system_variables
LOOP ARC FROM mutex/sql/LOCK_plugin TO mutex/sql/THD::LOCK_current_cond
LOOP ARC FROM mutex/sql/TABLE_SHARE::LOCK_ha_data TO mutex/sql/LOCK_table_cache
LOOP ARC FROM mutex/sql/LOCK_open TO mutex/sql/THD::LOCK_current_cond
LOOP ARC FROM mutex/sql/TABLE_SHARE::LOCK_ha_data TO mutex/sql/LOCK_open
LOOP ARC FROM mutex/sql/LOCK_global_system_variables TO mutex/sql/LOCK_thd_list
LOOP ARC FROM mutex/sql/TABLE_SHARE::LOCK_ha_data TO mutex/sql/THD::LOCK_thd_data
The section "SCC ANALYSIS (revised graph)" prints the strongly connected components of a sub graph, obtained by:
Found SCC number 1 of size 14:
mutex/sql/THD::LOCK_current_cond
mutex/sql/MYSQL_RELAY_LOG::LOCK_log_end_pos
mutex/sql/Relay_log_info::log_space_lock
mutex/sql/MYSQL_RELAY_LOG::LOCK_index
mutex/sql/THD::LOCK_thd_data
mutex/sql/MYSQL_BIN_LOG::LOCK_log
mutex/sql/LOCK_reset_gtid_table
mutex/sql/MYSQL_BIN_LOG::LOCK_sync
mutex/sql/MYSQL_BIN_LOG::LOCK_commit
mutex/sql/MYSQL_BIN_LOG::LOCK_index
mutex/sql/MYSQL_RELAY_LOG::LOCK_log
mutex/sql/Relay_log_info::data_lock
mutex/sql/key_mta_temp_table_LOCK
mutex/sql/Source_info::data_lock
Found SCC number 2 of size 2:
mutex/sql/Relay_log_info::run_lock
mutex/sql/Source_info::run_lock
Number of SCC found: 2
Dumping arcs for SCC 1:
...
Dumping arcs for SCC 2:
...
Finally, the section "UNRESOLVED ARCS" lists arcs found in the dependency graph that could not be matched with actual nodes from the code.
UNRESOLVED ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_log" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/LOCK_plugin_install" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/LOCK_plugin_install" TO "mutex/semisync/Ack_receiver::m_mutex"
UNRESOLVED ARC FROM "mutex/semisync/LOCK_binlog_" TO "cond/semisync/COND_binlog_send_"
UNRESOLVED ARC FROM "mutex/semisync/Ack_receiver::m_mutex" TO "cond/semisync/Ack_receiver::m_cond"
Arcs can be unresolved for two reasons:
- the arc refers to code that is not loaded in the server (for example, a plugin or component that was not installed), so the corresponding nodes were never instrumented;
- the arc is stale, referring to locks that no longer exist in the code.
First, the graph defining valid lock sequences should be unique for the server. Any attempt to use different graphs for different tests is fundamentally flawed.
Second, documenting the dependencies graph helps to document the design, as the order of locks does not happen by accident in the code, but as a result of design decisions made during development.
In an ideal world, the dependency graph should be documented up front when writing the code, and testing should only verify that the code complies with the graph.
In practice, such a graph is not documented up front; only fragments of it are "common knowledge".
As a result, the complete graph must be discovered from the code, by performing reverse engineering, to be documented.
The LOCK ORDER tool supports both:
- enforcing a documented graph, reporting any lock sequence that does not comply,
- discovering from the code the arcs missing in the graph.
Start with an empty lock_order_dependencies.txt file, and run a test.
For example:

./mtr --lock-order=1 main.parser
The resulting lock_order-(timestamp)-(pid).log file will contain numerous messages, like:
MISSING: ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_open"
Trace: using missing arc mutex/sql/LOCK_table_cache (/home/malff/GIT_LOCK_ORDER/sql/table_cache.h:140) -> mutex/sql/LOCK_open (/home/malff/GIT_LOCK_ORDER/sql/table_cache.h:319)
Here, the tool detected a lock sequence (LOCK_table_cache then LOCK_open) that is not declared in the graph. The "MISSING:" line prints the arc definition that should be added to the lock_order_dependencies.txt file, for easier copy and paste. The "Trace:" message gives details about where each lock was taken in the source code.
An efficient way to add at once all the missing arcs found while running tests is as follows:
cat lock_order*.log | grep MISSING | sort -u
Run this command, remove the "MISSING: " prefix from each line, and add the result to the dependency graph. Then run the tests again, with the new graph, and repeat until no more missing arcs are found.
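For example, the whole step can be done in one pass (a sketch; it assumes the log files are in the current directory and appends directly to the dependencies file):

cat lock_order*.log | grep 'MISSING: ' | sed -e 's/.*MISSING: //' | sort -u >> lock_order_dependencies.txt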
The tool processes the dependency graph to detect "Strongly Connected Components". See https://en.wikipedia.org/wiki/Strongly_connected_component
Strictly speaking, a Strongly Connected Component can be a single node. In the context of LOCK ORDER, "SCC" refers to Strongly Connected Components of size greater than or equal to 2, that is, an actual cluster of several nodes.
This computation is done when dumping the lock_order.txt file.
A dedicated mtr test is written as a helper:
./mtr lock_order.cycle
Then read the section named "SCC ANALYSIS (full graph)" in the file lock_order.txt.
At the time of writing, it reads as:
SCC ANALYSIS (full graph):
==========================
Found SCC number 1 of size 26:
mutex/innodb/dict_sys_mutex
mutex/sql/LOCK_plugin
mutex/sql/LOCK_global_system_variables
mutex/innodb/parser_mutex
mutex/sql/LOCK_table_cache
mutex/myisam/MYISAM_SHARE::intern_lock
mutex/mysys/THR_LOCK_myisam
mutex/sql/LOCK_open
mutex/sql/TABLE_SHARE::LOCK_ha_data
mutex/sql/THD::LOCK_thd_data
mutex/sql/LOCK_event_queue
mutex/sql/MYSQL_BIN_LOG::LOCK_commit
mutex/sql/THD::LOCK_current_cond
mutex/sql/MYSQL_RELAY_LOG::LOCK_log_end_pos
mutex/sql/Relay_log_info::log_space_lock
mutex/sql/LOCK_thd_list
mutex/sql/MYSQL_RELAY_LOG::LOCK_index
mutex/sql/MYSQL_RELAY_LOG::LOCK_log
mutex/sql/Relay_log_info::data_lock
mutex/sql/key_mta_temp_table_LOCK
mutex/sql/Source_info::data_lock
mutex/sql/MYSQL_BIN_LOG::LOCK_log
mutex/sql/LOCK_reset_gtid_table
mutex/sql/MYSQL_BIN_LOG::LOCK_sync
mutex/sql/MYSQL_BIN_LOG::LOCK_index
mutex/sql/LOCK_offline_mode
Found SCC number 2 of size 2:
mutex/sql/Relay_log_info::run_lock
mutex/sql/Source_info::run_lock
Number of SCC found: 2
The tool found two Strongly Connected Components (SCC).
The details about each arc are printed:
Dumping arcs for SCC 1:
SCC ARC FROM "mutex/myisam/MYISAM_SHARE::intern_lock" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/innodb/parser_mutex" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/innodb/dict_sys_mutex" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/LOCK_global_system_variables"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/THD::LOCK_current_cond"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/myisam/MYISAM_SHARE::intern_lock"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_plugin"
...
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/THD::LOCK_current_cond"
Dumping arcs for SCC 2:
SCC ARC FROM "mutex/sql/Relay_log_info::run_lock" TO "mutex/sql/Source_info::run_lock"
SCC ARC FROM "mutex/sql/Source_info::run_lock" TO "mutex/sql/Relay_log_info::run_lock"
Note that only arcs within the same SCC are printed here, as they are the ones that need closer investigation. Arcs coming in or out of this SCC are omitted, to reduce noise.
By definition, an SCC of size greater than one is a cluster of nodes where any node is reachable, by taking locks, from any other node. Hence, SCCs should not exist, and the code must be refactored to avoid them.
In the list of SCC arcs printed, some (human) analysis is required, to decide which arcs are part of the legitimate lock order, and which arcs are the loops to be removed by fixing the code.
A bug fix is required to change the server code affected, to avoid locks, or take locks in the proper order.
Now, this is a sensitive task, because changing the order of locks in the server code can be an intrusive change, and the best place to break a cycle is not always obvious.
For this reason, the tool supports ways to simulate "what-if" scenarios, to see in practice what the overall graph would look like if a given arc were removed.
First, some nodes can (temporarily) be ignored entirely, to simplify the graph analysis, and identify smaller sub graphs in a big SCC.
By ignoring some nodes, a big SCC can be broken down into smaller, independent sub graphs, which helps to investigate, identify, and resolve several deadlock root causes in isolation.
To ignore a node "A", use the following syntax in the dependency graph:
NODE "A" IGNORED
For example, using
NODE "mutex/sql/LOCK_event_queue" IGNORED NODE "mutex/sql/LOCK_global_system_variables" IGNORED NODE "mutex/sql/LOCK_offline_mode" IGNORED NODE "mutex/sql/LOCK_thd_list" IGNORED
will produce a graph without these nodes, also ignoring arcs from and to these nodes.
Second, arcs that are considered loops, to be fixed in the code, can be marked explicitly, like this:
ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/LOCK_open" FLAGS LOOP ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/LOCK_table_cache" FLAGS LOOP ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/THD::LOCK_thd_data" FLAGS LOOP
After tagging some IGNORED nodes or LOOP arcs, generate the lock_order.txt report again, and perform the analysis again.
The section "SCC ANALYSIS (full graph)" will be identical, as the real graph did not change.
The section "SCC ANALYSIS (revised graph)" will show what the graph would look like, with the loops fixed.
The goal is to iteratively tweak the LOOP flags in the graph, and perform analysis on the revised graph, until the revised graph contains no more SCCs; the LOOP arcs then identify exactly the dependencies to remove from the code.
Once a viable set of LOOP arcs to remove is identified, file a bug report to address the issue found.
Each time a dependency is flagged as a LOOP, a matching bug report should be filed, and that bug should eventually be resolved.
Marking nodes as IGNORED, or arcs as LOOP, is equivalent to using suppressions in Valgrind to avoid error messages.
This can be convenient to investigate other areas further, but it is by no means a satisfactory resolution in itself.
To achieve a state where the server can be declared as deadlock free with reasonable confidence, all the following conditions are required:
- the dependency graph is complete: running the entire test suite reports no missing arcs;
- the dependency graph contains no SCCs, with no nodes marked IGNORED and no arcs flagged LOOP;
- the entire test suite passes with LOCK ORDER enabled.
To facilitate investigations and debugging, the following features are available:
- tracing an arc (TRACE flag),
- debugging an arc (DEBUG flag),
- declaring known loops (LOOP flag),
- ignoring nodes (IGNORED flag).
When an arc from A to B exists in the graph, it might sometimes be necessary to understand where in the source code the A -> B dependency is actually taken.
By declaring the arc with the TRACE flag, as in
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data" FLAGS TRACE
the tool will print in the log file, each time the arc is taken:
- the source location where each lock was taken,
- the SQL statement being executed when the first lock was acquired,
- the stack traces when the first and the second locks were acquired.
An example of trace:
Trace: using arc mutex/sql/LOCK_open (/home/malff/GIT_LOCK_ORDER/sql/sql_base.cc:1704) -> mutex/sql/TABLE_SHARE::LOCK_ha_data (/home/malff/GIT_LOCK_ORDER/sql/handler.cc:7764)
statement when first lock was acquired:
ALTER TABLE t1 ADD PARTITION
stack when the first lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN14LO_stack_traceC2Ev+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN7LO_lock18record_stack_traceEv+0x24) [0x2e9fe52]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN9LO_thread14add_mutex_lockEP8LO_mutexPKci+0x18c) [0x2e9aeb4]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN15LO_mutex_locker3endEv+0x3e) [0x2ea02dc]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b30112]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f39]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z18close_thread_tableP3THDPP5TABLE+0x20e) [0x2b34196]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z19close_thread_tablesP3THD+0x3fd) [0x2b33e83]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z21mysql_execute_commandP3THDb+0x5dca) [0x2be875c]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z11mysql_parseP3THDP12Parser_stateb+0x672) [0x2bea1bb]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x1496) [0x2be0291]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z10do_commandP3THD+0x448) [0x2bde867]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[17] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[18] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]
stack when the second lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN14LO_stack_traceC2Ev+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN8LO_graph5checkEPK7LO_lockS2_+0x447) [0x2e99d25]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN9LO_thread11check_locksEPK7LO_lock+0xf1) [0x2e9aa99]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN9LO_thread14add_mutex_lockEP8LO_mutexPKci+0x1c9) [0x2e9aef1]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN15LO_mutex_locker3endEv+0x3e) [0x2ea02dc]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2f04334]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN7handler19lock_shared_ha_dataEv+0x6c) [0x2f18e10]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN11ha_innopart5closeEv+0xec) [0x4125be8]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN7handler8ha_closeEv+0x173) [0x2f0aa1d]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z8closefrmP5TABLEb+0x8c) [0x2d1e118]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z18intern_close_tableP5TABLE+0xe5) [0x2b32d42]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f45]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z18close_thread_tableP3THDPP5TABLE+0x20e) [0x2b34196]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z19close_thread_tablesP3THD+0x3fd) [0x2b33e83]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z21mysql_execute_commandP3THDb+0x5dca) [0x2be875c]
[17] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z11mysql_parseP3THDP12Parser_stateb+0x672) [0x2bea1bb]
[18] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x1496) [0x2be0291]
[19] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z10do_commandP3THD+0x448) [0x2bde867]
[20] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[21] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[22] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[23] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[24] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]
Note that C++ symbol names are mangled. Using c++filt on the log produces the following output:
Trace: using arc mutex/sql/LOCK_open (/home/malff/GIT_LOCK_ORDER/sql/sql_base.cc:1704) -> mutex/sql/TABLE_SHARE::LOCK_ha_data (/home/malff/GIT_LOCK_ORDER/sql/handler.cc:7764)
statement when first lock was acquired:
ALTER TABLE t1 ADD PARTITION
stack when the first lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_stack_trace::LO_stack_trace()+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_lock::record_stack_trace()+0x24) [0x2e9fe52]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_thread::add_mutex_lock(LO_mutex*, char const*, int)+0x18c) [0x2e9aeb4]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_mutex_locker::end()+0x3e) [0x2ea02dc]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b30112]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f39]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_table(THD*, TABLE**)+0x20e) [0x2b34196]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_tables(THD*)+0x3fd) [0x2b33e83]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x5dca) [0x2be875c]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x672) [0x2bea1bb]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1496) [0x2be0291]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(do_command(THD*)+0x448) [0x2bde867]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[17] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[18] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]
stack when the second lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_stack_trace::LO_stack_trace()+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_graph::check(LO_lock const*, LO_lock const*)+0x447) [0x2e99d25]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_thread::check_locks(LO_lock const*)+0xf1) [0x2e9aa99]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_thread::add_mutex_lock(LO_mutex*, char const*, int)+0x1c9) [0x2e9aef1]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_mutex_locker::end()+0x3e) [0x2ea02dc]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2f04334]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(handler::lock_shared_ha_data()+0x6c) [0x2f18e10]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(ha_innopart::close()+0xec) [0x4125be8]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(handler::ha_close()+0x173) [0x2f0aa1d]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(closefrm(TABLE*, bool)+0x8c) [0x2d1e118]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(intern_close_table(TABLE*)+0xe5) [0x2b32d42]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f45]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_table(THD*, TABLE**)+0x20e) [0x2b34196]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_tables(THD*)+0x3fd) [0x2b33e83]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x5dca) [0x2be875c]
[17] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x672) [0x2bea1bb]
[18] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1496) [0x2be0291]
[19] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(do_command(THD*)+0x448) [0x2bde867]
[20] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[21] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[22] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[23] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[24] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]
When tracing is not enough, the next step is to break the server execution in a debugger, to understand the context and investigate.
The tool makes it possible to put breakpoints on dependencies (not just functions), using the DEBUG flag on an arc, as in:
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data" FLAGS DEBUG
When this arc is found at runtime, an assert will fail, which can be caught in a debugger.
To help diagnostics, the tool constructs a string that details the reason for the failed assert.
In a debugger, a good place to put breakpoints to debug specific arcs is the function debug_lock_order_break_here().
Loops can be declared explicitly in the dependency graph, using the LOOP flag:
ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/LOCK_open" FLAGS LOOP
To facilitate investigation, nodes can be ignored to produce a smaller graph:
NODE "mutex/sql/LOCK_thd_list" IGNORED
This is a mini tutorial with examples, to better understand the notations used in file lock_order.txt.
Assume some server code as follows:
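(a minimal sketch, reconstructed from the dependencies listed below; A, B, and C stand for mysql_mutex_t mutexes instrumented as mutex/sql/A, mutex/sql/B, and mutex/sql/C)

void func_red() {
  mysql_mutex_lock(A);
  mysql_mutex_lock(B);   /* B taken while holding A */
  /* ... */
  mysql_mutex_unlock(B);
  mysql_mutex_unlock(A);
}

void func_green() {
  mysql_mutex_lock(B);
  mysql_mutex_lock(C);   /* C taken while holding B */
  /* ... */
  mysql_mutex_unlock(C);
  mysql_mutex_unlock(B);
}

void func_blue() {
  mysql_mutex_lock(C);
  mysql_mutex_lock(A);   /* A taken while holding C */
  /* ... */
  mysql_mutex_unlock(A);
  mysql_mutex_unlock(C);
}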
Then, assume three different threads (red, green and blue) execute the corresponding functions in the server.
When running tests, the lock_order tool will find the following dependencies, to add in the lock_order_dependencies.txt file:
ARC FROM "mutex/sql/A" TO "mutex/sql/B" ARC FROM "mutex/sql/B" TO "mutex/sql/C" ARC FROM "mutex/sql/C" TO "mutex/sql/A"
Executing the lock_order.cycle test, to get the lock_order.txt report, will show the following SCC:
Dumping classes for SCC 1:
SCC Class mutex/sql/A
SCC Class mutex/sql/B
SCC Class mutex/sql/C
Dumping nodes for SCC 1:
SCC Node mutex/sql/A Girth 3
SCC Node mutex/sql/B Girth 3
SCC Node mutex/sql/C Girth 3
Dumping arcs for SCC 1:
SCC ARC FROM "mutex/sql/A" TO "mutex/sql/B"
SCC ARC FROM "mutex/sql/B" TO "mutex/sql/C"
SCC ARC FROM "mutex/sql/C" TO "mutex/sql/A"
SCC 1/1 summary:
- Class Size 3
- Node Size 3
- Arc Count 3
- Girth 3
- Circumference 3
In other words, the tool found a cycle in the code, illustrated below.
This cycle involves three nodes, so the cycle girth is 3.
A possible fix is to change the implementation of func_blue() to take locks on A then C, in that order.
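(a sketch of the fixed function, under the same assumptions as the listing above)

void func_blue() {
  mysql_mutex_lock(A);   /* A is now taken first, matching the declared A -> C order */
  mysql_mutex_lock(C);
  /* ... */
  mysql_mutex_unlock(C);
  mysql_mutex_unlock(A);
}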
Fixing the code is not enough, as the lock_order_dependencies.txt file now contains an arc (C -> A) that is never taken in the code.
Once both:
- the code is fixed, so that func_blue() takes locks on A then C,
- the dependencies file is updated, replacing the C -> A arc with an A -> C arc,
the tool will then detect no more deadlocks involving these nodes.
Building on the previous example, let's now change B and C to read write locks.
The dependencies found, to add in the lock_order_dependencies.txt file, are:
ARC FROM "mutex/sql/A" TO "rwlock/sql/B" OP "R" ARC FROM "rwlock/sql/B" STATE "R" TO "rwlock/sql/C" OP "W" ARC FROM "rwlock/sql/C" STATE "R" TO "mutex/sql/A"
File lock_order.txt indicates a SCC, so there is a cycle.
Dumping classes for SCC 1:
SCC Class mutex/sql/A
SCC Class rwlock/sql/B
SCC Class rwlock/sql/C
Dumping nodes for SCC 1:
SCC Node mutex/sql/A
SCC Node rwlock/sql/B:-R
SCC Node rwlock/sql/B:+R
SCC Node rwlock/sql/C:-R
SCC Node rwlock/sql/C:+W
-- Note that these nodes exist, but are not part of the SCC: --
Node rwlock/sql/B:+W
Node rwlock/sql/B:-W
Node rwlock/sql/C:+R
Node rwlock/sql/C:-W
Dumping arcs for SCC 1:
SCC ARC FROM "mutex/sql/A" TO "rwlock/sql/B:+R"
SCC ARC FROM "rwlock/sql/B:+R" TO "rwlock/sql/B:-R" -- MICRO ARC
SCC ARC FROM "rwlock/sql/B:-R" TO "rwlock/sql/C:+W"
SCC ARC FROM "rwlock/sql/C:+W" TO "rwlock/sql/C:-R" -- MICRO ARC
SCC ARC FROM "rwlock/sql/C:-R" TO "mutex/sql/A"
SCC 1/1 summary:
- Class Size 3
- Node Size 5
- Arc Count 5
- Girth 5
- Circumference 5
First, notice how the read write lock class "rwlock/sql/B" is represented by four nodes, named:
- "rwlock/sql/B:+R", an incoming read lock request,
- "rwlock/sql/B:-R", a read lock already given,
- "rwlock/sql/B:+W", an incoming write lock request,
- "rwlock/sql/B:-W", a write lock already given.
A practical way to represent this graphically is to draw a box for the lock, that contains four ports. Internal connections between ports represent the logic table for the lock: an arc from "+R" to "-W" means that an incoming request for a read lock ("+R") is blocked by a write lock request already given ("-W").
Micro arcs represent the wired logic of the lock itself; these cannot be changed.
The complete SCC report can be represented graphically as follows:
Indeed, in this graph, there is a cycle of girth 5, involving the following 3 objects:
- mutex/sql/A
- rwlock/sql/B
- rwlock/sql/C
The lock order tool indicates that this is a deadlock. Let's verify this with a test scenario:
Now, at t21, rwlock_B is locked in read mode by thread green, so a request for a read lock should be granted, right?
Lock order claims there is a deadlock, because a read can indirectly block a read:
Consider the following code:
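(a minimal sketch, assuming a fourth thread, grey, and the same rwlock_B as above)

void func_grey() {
  mysql_rwlock_wrlock(rwlock_B);   /* queues a write lock request on rwlock_B */
  /* ... */
  mysql_rwlock_unlock(rwlock_B);
}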
Executing func_grey() in thread grey any time after t8 will cause thread grey to block, with a write lock request in the queue in rwlock_B.
Such a request will in turn block thread red at t21.
The system is in deadlock:
- thread red holds mutex A, and waits for a read lock on rwlock_B,
- thread grey waits for a write lock on rwlock_B, blocking thread red,
- thread green holds a read lock on rwlock_B (blocking thread grey), and waits for a write lock on rwlock_C,
- thread blue holds a read lock on rwlock_C (blocking thread green), and waits for mutex A, held by thread red.
A possible fix is to replace rwlock_B by a priority lock, prlock_B. Priority locks differ from read write locks precisely on this scheduling policy, as a read lock will never block another read lock, even when write lock requests are present.
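A sketch of the change, shown on func_green() and assuming the mysql_prlock_* wrappers:

void func_green() {
  mysql_prlock_rdlock(prlock_B);   /* a reader is never blocked by a queued writer */
  mysql_rwlock_wrlock(rwlock_C);
  /* ... */
  mysql_rwlock_unlock(rwlock_C);
  mysql_prlock_unlock(prlock_B);
}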
The internal representation of a priority lock is as follows; note how the "+R" -> "-R" micro arc is missing, allowing parallel processing for reads.
Let's now look at the innodb rw_lock_t, which is a shared exclusive read write lock.
Consider the following code:
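(a minimal sketch, reconstructed from the arcs listed below; mutex_A, latch_B, and latch_C are instrumented as mutex/tutorial/mutex_A, sxlock/tutorial/latch_B, and sxlock/tutorial/latch_C, and the rw_lock_s_lock / rw_lock_sx_lock / rw_lock_x_lock calls are innodb latch primitives)

void func_red() {
  mutex_enter(mutex_A);
  rw_lock_s_lock(latch_B);      /* S lock taken while holding mutex_A */
  /* ... */
  rw_lock_s_unlock(latch_B);
  mutex_exit(mutex_A);
}

void func_green() {
  rw_lock_sx_lock(latch_B);
  rw_lock_x_lock(latch_B);      /* recursive: SX + X counts as X */
  rw_lock_x_lock(latch_C);      /* X lock taken while latch_B is held in X */
  /* ... */
}

void func_blue() {
  rw_lock_sx_lock(latch_C);
  mutex_enter(mutex_A);         /* mutex taken while latch_C is held in SX */
  /* ... */
}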
When executing func_red(), the tool detects the following dependencies:
ARC FROM "mutex/tutorial/mutex_A" TO "sxlock/tutorial/latch_B" OP "S"
Likewise for func_green(), the dependencies found are:
ARC FROM "sxlock/tutorial/latch_B" STATE "SX" TO "sxlock/tutorial/latch_B" RECURSIVE OP "X" ARC FROM "sxlock/tutorial/latch_B" STATE "X" TO "sxlock/tutorial/latch_C" OP "X"
Notice how the tool:
- detected that the X lock on latch_B was taken recursively (noted RECURSIVE), while the latch was already held in SX state,
- computed the overall equivalent state (SX + X counts as X) for the arc to latch_C.
And finally for func_blue(), the tool detects:
ARC FROM "sxlock/tutorial/latch_C" STATE "SX" TO "mutex/tutorial/mutex_A"
Dumping classes for SCC 1:
SCC Class sxlock/tutorial/latch_C
SCC Class sxlock/tutorial/latch_B
SCC Class mutex/tutorial/mutex_A
Dumping nodes for SCC 1:
SCC Node sxlock/tutorial/latch_C:+X Girth 5
SCC Node sxlock/tutorial/latch_C:-SX Girth 5
SCC Node sxlock/tutorial/latch_B:-X Girth 5
SCC Node sxlock/tutorial/latch_B:+S Girth 5
SCC Node mutex/tutorial/mutex_A Girth 5
-- Note that these nodes exist, but are not part of the SCC: --
Node sxlock/tutorial/latch_C:-X
Node sxlock/tutorial/latch_C:+SX
Node sxlock/tutorial/latch_C:+S
Node sxlock/tutorial/latch_C:-S
Node sxlock/tutorial/latch_B:+X
Node sxlock/tutorial/latch_B:+SX
Node sxlock/tutorial/latch_B:-SX
Node sxlock/tutorial/latch_B:-S
Dumping arcs for SCC 1:
SCC ARC FROM "sxlock/tutorial/latch_C:+X" TO "sxlock/tutorial/latch_C:-SX" -- MICRO ARC
SCC ARC FROM "sxlock/tutorial/latch_C:-SX" TO "mutex/tutorial/mutex_A"
SCC ARC FROM "sxlock/tutorial/latch_B:-X" TO "sxlock/tutorial/latch_C:+X"
SCC ARC FROM "sxlock/tutorial/latch_B:+S" TO "sxlock/tutorial/latch_B:-X" -- MICRO ARC
SCC ARC FROM "mutex/tutorial/mutex_A" TO "sxlock/tutorial/latch_B:+S"
-- Note that these arcs exist, but are not part of the SCC: --
ARC FROM "sxlock/tutorial/latch_B:-SX" TO "sxlock/tutorial/latch_B:-X"
SCC 1/1 summary:
- Class Size 3
- Node Size 5
- Arc Count 3
- Girth 5
- Circumference 5
The graphical representation of a shared exclusive lock is as follows:
Using this graphical representation, the SCC reported can be represented as:
There is a cycle of girth five, involving the three objects:
- mutex/tutorial/mutex_A
- sxlock/tutorial/latch_B
- sxlock/tutorial/latch_C
By code review, this is correct:
- func_red() holds mutex_A, and waits for an S lock on latch_B,
- func_green() holds latch_B in X state, and waits for an X lock on latch_C,
- func_blue() holds latch_C in SX state, and waits for mutex_A.
As can be seen by reading the diagram, there are several options to break the deadlock, by changing the order or the modes of the locks taken in one of the three functions.
As a side note about the notation used, see how the recursive lock taken by func_green() on latch_B is represented: a "-SX" -> "-X" arc means that the already given SX lock ("-SX") is promoted to a given X lock ("-X").