MySQL LOCK ORDER

Introduction

LOCK ORDER is a debugging tool used to enforce that no deadlocks can happen in the server.

The general idea is as follows.

Every mutex is already instrumented by the performance schema, and assigned a name.

There is a single, global, directed graph that describes the lock sequences allowed at runtime.

Instrumented mutexes are nodes in the graph.

When some code locks mutex A then B, there is an explicit A -> B edge in the graph.

During runtime execution, the LOCK ORDER tool checks whether the actual locks taken comply with the graph.

When an actual lock sequence does not comply, then:

  • either the arc is considered valid, so the graph is incomplete, and it should be amended (provided there are no cycles), to describe more accurately actual dependencies,
  • or the lock sequence is deemed illegal, in which case the code should be changed to avoid it.

Once a state is achieved where:

  • the graph has no cycle,
  • the entire test suite executes within the graph constraints,

then the server is guaranteed to have no deadlocks.

The beauty of this approach is that there is no need to run tests concurrently to expose deadlocks.

When statements are executed, even alone in a single session, any statement that causes a non-compliant execution path will be detected and reported.

Building the code

LOCK ORDER is a debugging tool only. It should not be used in production.

CMake provides a build option, WITH_LOCK_ORDER, to build the server with the tool.
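
For example, with an out-of-source build directory:

  cmake .. -DWITH_LOCK_ORDER=ON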

Running MySQL test with LOCK ORDER

To run a test without LOCK ORDER, use any of the following:

  ./mtr --lock-order=0 main.parser
  export MTR_LOCK_ORDER=0
  ./mtr main.parser

To run a test with LOCK_ORDER, use any of the following:

  ./mtr --lock-order=1 main.parser
  export MTR_LOCK_ORDER=1
  ./mtr main.parser

By default, LOCK ORDER is disabled in mysql-test-run.

Executing a test with LOCK_ORDER enabled will have the following effects:

  • the file lock_order_dependencies.txt is read by the tool, to load the graph to enforce.
  • the file lock_order-(timestamp)-(pid).log is written by the tool, and contains various messages.
  • optionally, the file lock_order.txt is written, and contains a textual representation of the graph, with some analysis.

Lock order file formats

lock_order_dependencies file

The default dependencies file is (root)/mysql-test/lock_order_dependencies.txt.

To use a different dependencies file, use the lock_order_dependencies system variable.
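
For example, the variable can be set on the server command line (a sketch; the option spelling assumes the usual mapping of system variables to server options):

  mysqld --lock-order-dependencies=/path/to/my_dependencies.txt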

The file format is text, with one line per declaration.

Mutex nodes

Every node is named from the performance schema instrumentation, with a shorter prefix.

For example, the mutex mysql_mutex_t LOCK_open is named "wait/synch/mutex/sql/LOCK_open" in the performance schema, simplified as "mutex/sql/LOCK_open" in LOCK ORDER.

When the code locks A then B, this arc is declared in the graph as:

  ARC FROM "A" TO "B"

For example:

  ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data"

Note that arcs are not transitive in the graph.

For example, if the graph has the following arcs:

  ARC FROM "A" TO "B"
  ARC FROM "B" TO "C"

Then the following code will comply:

  mysql_mutex_lock(&A);
  mysql_mutex_lock(&B);
  mysql_mutex_unlock(&A);
  mysql_mutex_lock(&C);
  mysql_mutex_unlock(&B);
  mysql_mutex_unlock(&C);

But the following code will not comply:

  mysql_mutex_lock(&A);
  mysql_mutex_lock(&B);
  mysql_mutex_lock(&C); /* A is still held here */
  mysql_mutex_unlock(&C);
  mysql_mutex_unlock(&B);
  mysql_mutex_unlock(&A);

This happens because the "A" -> "C" transition is not declared: when "C" is locked, "A" is still held, and there is no "A" -> "C" arc in the graph.

This is a desired feature: to understand contention in the server, the real dependencies (the paths actually taken) must be documented explicitly. The tool could, but does not, infer more arcs by transitivity.

Additional metadata can be associated with an ARC, by adding flags to the arc declaration.

The format is

  ARC FROM "A" TO "B" FLAGS <flag1> <flag2> ... <flagN>

Individual flags are separated by spaces.

Supported arc flags are:

  • TRACE
  • DEBUG
  • LOOP
  • IGNORED

Rwlock nodes

Read write locks are named from the performance schema, with a shorter prefix.

There are three kinds of read write locks instrumented:

  • read write locks (mysql_rwlock_t)
  • priority read write locks (mysql_prlock_t)
  • shared exclusive locks (rw_lock_t in innodb)

The lock mysql_rwlock_t LOCK_system_variables_hash, which is a read write lock, is named "wait/synch/rwlock/sql/LOCK_system_variables_hash" in the performance schema, simplified as "rwlock/sql/LOCK_system_variables_hash" in LOCK ORDER.

Read write locks are recursive, but only for read operations. Due to the scheduling behavior of the underlying implementation, a reader might indirectly block another reader when a write request is pending (see the worked example in the tutorial at the end of this page).

When a lock is held on a read write lock, it can be in two states:

  • either READ,
  • or WRITE.

These states are exclusive.

When a thread holds a read write lock on "L" and locks "B", arcs are noted as follows:

  ARC FROM "rwlock/sql/L" STATE "R" TO "B" ...
  ARC FROM "rwlock/sql/L" STATE "W" TO "B" ...

Operations on read write locks are:

  • READ
  • TRY READ
  • WRITE
  • TRY WRITE

When a thread holds a lock on "A" and then locks a read write lock "L", arcs are noted as follows:

  ARC FROM "A" ... TO "rwlock/sql/L" OP "R"
  ARC FROM "A" ... TO "rwlock/sql/L" OP "TRY R"
  ARC FROM "A" ... TO "rwlock/sql/L" OP "W"
  ARC FROM "A" ... TO "rwlock/sql/L" OP "TRY W"

Recursive locks are noted as follows:

  ARC FROM "rwlock/sql/L" STATE "..." TO "rwlock/sql/L" RECURSIVE OP "R"
  ARC FROM "rwlock/sql/L" STATE "..." TO "rwlock/sql/L" RECURSIVE OP "TRY R"

The lock mysql_prlock_t MDL_context::m_LOCK_waiting_for, which is a priority read write lock, is named "wait/synch/prlock/sql/MDL_context::LOCK_waiting_for" in the performance schema, simplified as "prlock/sql/MDL_context::LOCK_waiting_for" in LOCK ORDER.

Priority locks are recursive. A reader will never block another reader, even if a write request is present.

When a lock is held on a priority lock, it can be in two states:

  • either READ,
  • or WRITE.

These states are exclusive.

When a thread holds a priority lock on "L" and locks "B", arcs are noted as follows:

  ARC FROM "prlock/sql/L" STATE "R" TO "B" ...
  ARC FROM "prlock/sql/L" STATE "W" TO "B" ...

Operations on priority locks are:

  • READ
  • WRITE

Note that the READ state can be acquired recursively on the same priority lock.

When a thread holds a lock on "A" and then locks a priority lock "L", arcs are noted as follows:

  ARC FROM "A" ... TO "prlock/sql/L" OP "R"
  ARC FROM "A" ... TO "prlock/sql/L" OP "W"

Recursive locks are noted as follows:

  ARC FROM "prlock/sql/L" STATE "..." TO "prlock/sql/L" RECURSIVE OP "R"

The lock rw_lock_t dict_operation_lock, which is a shared exclusive lock, is named "wait/synch/sxlock/innodb/dict_operation_lock" in the performance schema, simplified as "sxlock/innodb/dict_operation_lock" in LOCK ORDER.

Shared exclusive locks are recursive. Shared exclusive locks are implemented as spin locks, with fallback on condition variables.

When a lock is held on a shared exclusive lock, it can be in three states:

  • SHARED,
  • or SHARED EXCLUSIVE,
  • or EXCLUSIVE.

Because the same lock can be acquired recursively, when multiple locks are taken by the same thread on the same object, the overall equivalent state is computed (for example, SX + X counts as X).

When a thread holds a shared exclusive lock on "L" and locks "B", arcs are noted as follows:

  ARC FROM "sxlock/innodb/L" STATE "S" TO "B" ...
  ARC FROM "sxlock/innodb/L" STATE "SX" TO "B" ...
  ARC FROM "sxlock/innodb/L" STATE "X" TO "B" ...

Operations on shared exclusive locks are:

  • SHARED
  • TRY SHARED
  • SHARED EXCLUSIVE
  • TRY SHARED EXCLUSIVE
  • EXCLUSIVE
  • TRY EXCLUSIVE

Note that some states can be acquired recursively on the same shared exclusive lock.

When a thread holds a lock on "A" and then locks a shared exclusive lock "L", arcs are noted as follows:

  ARC FROM "A" ... TO "sxlock/innodb/L" OP "S"
  ARC FROM "A" ... TO "sxlock/innodb/L" OP "TRY S"
  ARC FROM "A" ... TO "sxlock/innodb/L" OP "SX"
  ARC FROM "A" ... TO "sxlock/innodb/L" OP "TRY SX"
  ARC FROM "A" ... TO "sxlock/innodb/L" OP "X"
  ARC FROM "A" ... TO "sxlock/innodb/L" OP "TRY X"

Recursive locks are noted as follows:

  ARC FROM "sxlock/innodb/L" STATE "..." TO "sxlock/innodb/L" RECURSIVE OP "S"
  ARC FROM "sxlock/innodb/L" STATE "..." TO "sxlock/innodb/L" RECURSIVE OP "SX"
  ARC FROM "sxlock/innodb/L" STATE "..." TO "sxlock/innodb/L" RECURSIVE OP "X"

Cond nodes

Conditions are named from the performance schema, with a shorter prefix.

For example, the condition mysql_cond_t COND_open is named "wait/synch/cond/sql/COND_open" in the performance schema, simplified as "cond/sql/COND_open" in LOCK ORDER.

Explicit arcs are declared between mutexes and associated conditions, so that the tool also enforces that the same mutex is consistently used for the same condition, to comply with the POSIX APIs.

When a condition C is associated with a mutex M, this arc is declared in the graph as:

  BIND "C" TO "M"

For example:

  BIND "cond/sql/COND_open" TO "mutex/sql/LOCK_open"

In the following sequence of code:

  mysql_mutex_lock(M);
  mysql_cond_signal(C);
  mysql_mutex_unlock(M);

The tool will verify, when mysql_cond_signal is called, that condition C is bound to M, and that M is locked.

Note that holding a mutex when using signal or broadcast is recommended, but not mandatory.

To allow the following code:

  mysql_cond_signal(C); // mutex M not locked

The condition must be flagged explicitly as using 'UNFAIR' scheduling, as in:

  BIND "C" TO "M" FLAGS UNFAIR

For example:

  BIND "cond/sql/Source_info::start_cond" TO "mutex/sql/Source_info::run_lock" FLAGS UNFAIR

File nodes

Files are named from the performance schema, with a shorter prefix.

For example, the relay log file is named "wait/io/file/sql/relaylog" in the performance schema, simplified as "file/sql/relaylog" in LOCK ORDER.

Individual file I/O operations (read, write, etc.) are not documented. When any file I/O is performed while holding a lock, the dependency is documented, for example:

  ARC FROM "rwlock/sql/channel_lock" STATE "W" TO "file/sql/relaylog"
  ARC FROM "sxlock/innodb/dict_operation_lock" STATE "X" TO "file/innodb/innodb_data_file"

lock_order.log file

During execution, the server writes a log file under the build directory, in a subdirectory named lock-order. The file is named lock_order-(timestamp)-(pid).log, where (pid) is the process id of mysqld.

The log file contains various messages printed by LOCK ORDER.

lock_order.txt file

This file is an optional output.

To print the lock_order.txt file, two actions are required:

  • enable the lock_order_print_txt system variable,
  • send a COM_DEBUG command to the server, using mysqladmin.

The COM_DEBUG command causes the file to be printed. It is desirable to load dynamic plugins and components before dumping this report, so that lock names instrumented inside the loaded code are checked against the dependencies graph.
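
For example, assuming a running server with lock_order_print_txt enabled, the COM_DEBUG command can be sent with:

  mysqladmin debug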

A helper test, lock_order.cycle, performs all the steps required to dump the lock_order.txt report.

The following commands:

  export MTR_LOCK_ORDER=1
  ./mtr lock_order.cycle

will generate the graph dump as a side effect.

The generated file contains the following sections:

  • DEPENDENCY GRAPH
  • SCC ANALYSIS (full graph)
  • IGNORED NODES
  • LOOP ARC
  • SCC ANALYSIS (revised graph)
  • UNRESOLVED ARCS

The section "DEPENDENCY GRAPH" is a textual representation of the graph, to facilitate investigations.

Each node is dumped with incoming and outgoing arcs, for example:

NODE: mutex/sql/LOCK_open
  16 incoming arcs:
    FROM: mutex/sql/tz_LOCK
    FROM: mutex/p_dyn_loader/key_component_id_by_urn_mutex
    FROM: mutex/sql/LOCK_plugin_install
    FROM: mutex/sql/LOCK_table_cache
    FROM: mutex/sql/key_mta_temp_table_LOCK
    FROM: mutex/sql/LOCK_event_queue
    FROM: mutex/sql/LOCK_global_system_variables
    FROM: mutex/sql/LOCK_reset_gtid_table
    FROM: mutex/sql/Source_info::data_lock
    FROM: mutex/sql/Source_info::run_lock
    FROM: mutex/sql/MYSQL_BIN_LOG::LOCK_index
    FROM: mutex/sql/MYSQL_BIN_LOG::LOCK_log
    FROM: mutex/sql/MYSQL_RELAY_LOG::LOCK_log
    FROM: mutex/sql/Relay_log_info::data_lock
    FROM: mutex/sql/Relay_log_info::run_lock
    FROM: mutex/sql/TABLE_SHARE::LOCK_ha_data -- LOOP FLAG
  22 outgoing arcs:
    TO: mutex/blackhole/blackhole
    TO: mutex/archive/Archive_share::mutex
    TO: mutex/myisam/MYISAM_SHARE::intern_lock
    TO: mutex/innodb/sync_array_mutex
    TO: mutex/innodb/rtr_active_mutex
    TO: mutex/innodb/srv_sys_mutex
    TO: mutex/innodb/rw_lock_list_mutex
    TO: mutex/innodb/rw_lock_debug_mutex
    TO: mutex/innodb/dict_table_mutex
    TO: mutex/innodb/dict_sys_mutex
    TO: mutex/innodb/innobase_share_mutex
    TO: mutex/csv/tina
    TO: mutex/sql/LOCK_plugin
    TO: cond/sql/COND_open
    TO: mutex/mysys/KEY_CACHE::cache_lock
    TO: mutex/mysys/THR_LOCK_heap
    TO: mutex/mysys/THR_LOCK_myisam
    TO: mutex/mysys/THR_LOCK_open
    TO: mutex/sql/DEBUG_SYNC::mutex
    TO: mutex/sql/MDL_wait::LOCK_wait_status
    TO: mutex/sql/TABLE_SHARE::LOCK_ha_data
    TO: mutex/sql/THD::LOCK_current_cond -- LOOP FLAG

The section "SCC ANALYSIS (full graph)" is a "Strongly Connected Component" (SCC) analysis of the entire graph.

See https://en.wikipedia.org/wiki/Strongly_connected_component

For each SCC, the report prints the nodes that are part of the SCC:

Found SCC number 1 of size 26:
mutex/innodb/dict_sys_mutex
...
mutex/sql/LOCK_offline_mode

Found SCC number 2 of size 2:
mutex/sql/Relay_log_info::run_lock
mutex/sql/Source_info::run_lock

Number of SCC found: 2

Then the arcs internal to each SCC are printed.

Dumping arcs for SCC 1:
SCC ARC FROM "mutex/myisam/MYISAM_SHARE::intern_lock" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/innodb/parser_mutex" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/innodb/dict_sys_mutex" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/LOCK_global_system_variables"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/THD::LOCK_current_cond"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/myisam/MYISAM_SHARE::intern_lock"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_plugin"
...
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/THD::LOCK_current_cond"

Dumping arcs for SCC 2:
SCC ARC FROM "mutex/sql/Relay_log_info::run_lock" TO "mutex/sql/Source_info::run_lock"
SCC ARC FROM "mutex/sql/Source_info::run_lock" TO "mutex/sql/Relay_log_info::run_lock"

The section "IGNORED NODES" prints the nodes flagged as IGNORED in the dependencies file:

IGNORED NODE: mutex/sql/LOCK_thd_list
IGNORED NODE: mutex/sql/LOCK_event_queue
IGNORED NODE: mutex/sql/LOCK_offline_mode
IGNORED NODE: mutex/sql/LOCK_global_system_variables

The section "LOOP ARC" prints the arcs flagged as LOOP in the dependencies file:

LOOP ARC FROM mutex/myisam/MYISAM_SHARE::intern_lock TO mutex/sql/LOCK_plugin
LOOP ARC FROM mutex/innodb/dict_sys_mutex TO mutex/sql/LOCK_plugin
LOOP ARC FROM mutex/mysys/THR_LOCK_myisam TO mutex/sql/LOCK_plugin
LOOP ARC FROM mutex/sql/LOCK_plugin TO mutex/sql/LOCK_global_system_variables
LOOP ARC FROM mutex/sql/LOCK_plugin TO mutex/sql/THD::LOCK_current_cond
LOOP ARC FROM mutex/sql/TABLE_SHARE::LOCK_ha_data TO mutex/sql/LOCK_table_cache
LOOP ARC FROM mutex/sql/LOCK_open TO mutex/sql/THD::LOCK_current_cond
LOOP ARC FROM mutex/sql/TABLE_SHARE::LOCK_ha_data TO mutex/sql/LOCK_open
LOOP ARC FROM mutex/sql/LOCK_global_system_variables TO mutex/sql/LOCK_thd_list
LOOP ARC FROM mutex/sql/TABLE_SHARE::LOCK_ha_data TO mutex/sql/THD::LOCK_thd_data

The section "SCC ANALYSIS (revised graph)" prints the strongly connected components of a sub graph, obtained by:

  • ignoring nodes tagged IGNORED,
  • ignoring arcs tagged LOOP.

Found SCC number 1 of size 14:
mutex/sql/THD::LOCK_current_cond
mutex/sql/MYSQL_RELAY_LOG::LOCK_log_end_pos
mutex/sql/Relay_log_info::log_space_lock
mutex/sql/MYSQL_RELAY_LOG::LOCK_index
mutex/sql/THD::LOCK_thd_data
mutex/sql/MYSQL_BIN_LOG::LOCK_log
mutex/sql/LOCK_reset_gtid_table
mutex/sql/MYSQL_BIN_LOG::LOCK_sync
mutex/sql/MYSQL_BIN_LOG::LOCK_commit
mutex/sql/MYSQL_BIN_LOG::LOCK_index
mutex/sql/MYSQL_RELAY_LOG::LOCK_log
mutex/sql/Relay_log_info::data_lock
mutex/sql/key_mta_temp_table_LOCK
mutex/sql/Source_info::data_lock

Found SCC number 2 of size 2:
mutex/sql/Relay_log_info::run_lock
mutex/sql/Source_info::run_lock

Number of SCC found: 2

Dumping arcs for SCC 1:
...

Dumping arcs for SCC 2:
...

Finally, the section "UNRESOLVED ARCS" lists arcs found in the dependency graph that could not be matched with actual nodes from the code.

UNRESOLVED ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_log" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/LOCK_plugin_install" TO "mutex/semisync/LOCK_binlog_"
UNRESOLVED ARC FROM "mutex/sql/LOCK_plugin_install" TO "mutex/semisync/Ack_receiver::m_mutex"
UNRESOLVED ARC FROM "mutex/semisync/LOCK_binlog_" TO "cond/semisync/COND_binlog_send_"
UNRESOLVED ARC FROM "mutex/semisync/Ack_receiver::m_mutex" TO "cond/semisync/Ack_receiver::m_cond"

Arcs can be unresolved for two reasons:

  • The code changed, and a node was either removed or renamed. The fix is to correct the dependency file accordingly.
  • A node is defined in a plugin (in this example, semisync), but the plugin is not loaded in the lock_order.cycle test script. The fix is to add the missing plugin.

Methodology

First, the graph defining valid lock sequences should be unique for the server. Any attempt to use different graphs for different tests is fundamentally flawed.

Secondly, documenting the dependency graph helps to document the design, as lock ordering in the code does not happen by accident: it is the result of design decisions by development.

In an ideal world, the dependency graph should be documented up front when writing the code, and testing should only verify that the code complies with the graph.

In practice, such a graph is not documented; only fragments of it are "common knowledge".

As a result, the complete graph must be discovered by reverse engineering the code, and then documented.

The LOCK_ORDER tool supports both:

  • discovery, to rebuild a graph from runtime execution
  • enforcement, to ensure runtime execution complies with constraints

Collect missing arcs

Start with an empty lock_order_dependencies.txt file, and run a test with LOCK ORDER enabled.

For example,

  ./mtr main.parser

The resulting lock_order-(timestamp)-(pid).log file will contain numerous messages, like:

MISSING: ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_open"
Trace: using missing arc mutex/sql/LOCK_table_cache (/home/malff/GIT_LOCK_ORDER/sql/table_cache.h:140)
 -> mutex/sql/LOCK_open (/home/malff/GIT_LOCK_ORDER/sql/table_cache.h:319)

Here, the tool detected a lock sequence (LOCK_table_cache then LOCK_open) that is not declared in the graph. The "MISSING:" line prints the arc definition that should be added in the lock_order_dependencies.txt file, for easier copy and paste. The "Trace:" message gives details about where each lock was taken in the source code.

An efficient way to add at once all the missing arcs found while running tests is as follows:

  cat lock_order*.log | grep MISSING | sort -u

Run this pipeline, remove the MISSING: prefix, and add the result to the dependency graph. Then run the tests again, with the new graph, and repeat until no more missing arcs are found.
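
For example, the whole step can be combined into a single pipeline (a sketch, assuming the log files are in the current directory):

  grep -h MISSING lock_order*.log | sed 's/^MISSING: //' | sort -u >> lock_order_dependencies.txt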

Perform graph analysis

The tool processes the dependency graph to detect "Strongly Connected Components". See https://en.wikipedia.org/wiki/Strongly_connected_component

Strictly speaking, a Strongly Connected Component can be a single node. In the context of LOCK_ORDER, "SCC" refers to Strongly Connected Components of size greater than or equal to 2, that is, an actual cluster of several nodes.

This computation is done when dumping the lock_order.txt file.

A dedicated mtr test is written as a helper:

  ./mtr lock_order.cycle

Then read the section named "SCC ANALYSIS (full graph)" in the lock_order.txt file.

At time of writing, it reads as:

SCC ANALYSIS (full graph):
==========================

Found SCC number 1 of size 26:
mutex/innodb/dict_sys_mutex
mutex/sql/LOCK_plugin
mutex/sql/LOCK_global_system_variables
mutex/innodb/parser_mutex
mutex/sql/LOCK_table_cache
mutex/myisam/MYISAM_SHARE::intern_lock
mutex/mysys/THR_LOCK_myisam
mutex/sql/LOCK_open
mutex/sql/TABLE_SHARE::LOCK_ha_data
mutex/sql/THD::LOCK_thd_data
mutex/sql/LOCK_event_queue
mutex/sql/MYSQL_BIN_LOG::LOCK_commit
mutex/sql/THD::LOCK_current_cond
mutex/sql/MYSQL_RELAY_LOG::LOCK_log_end_pos
mutex/sql/Relay_log_info::log_space_lock
mutex/sql/LOCK_thd_list
mutex/sql/MYSQL_RELAY_LOG::LOCK_index
mutex/sql/MYSQL_RELAY_LOG::LOCK_log
mutex/sql/Relay_log_info::data_lock
mutex/sql/key_mta_temp_table_LOCK
mutex/sql/Source_info::data_lock
mutex/sql/MYSQL_BIN_LOG::LOCK_log
mutex/sql/LOCK_reset_gtid_table
mutex/sql/MYSQL_BIN_LOG::LOCK_sync
mutex/sql/MYSQL_BIN_LOG::LOCK_index
mutex/sql/LOCK_offline_mode

Found SCC number 2 of size 2:
mutex/sql/Relay_log_info::run_lock
mutex/sql/Source_info::run_lock

Number of SCC found: 2

The tool found two Strongly Connected Components (SCC).

The details about each arc are printed:

Dumping arcs for SCC 1:
SCC ARC FROM "mutex/myisam/MYISAM_SHARE::intern_lock" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/innodb/parser_mutex" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/innodb/dict_sys_mutex" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/LOCK_global_system_variables"
SCC ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/THD::LOCK_current_cond"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/myisam/MYISAM_SHARE::intern_lock"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/innodb/dict_sys_mutex"
SCC ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_plugin"
...
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/LOCK_plugin"
SCC ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/THD::LOCK_current_cond"

Dumping arcs for SCC 2:
SCC ARC FROM "mutex/sql/Relay_log_info::run_lock" TO "mutex/sql/Source_info::run_lock"
SCC ARC FROM "mutex/sql/Source_info::run_lock" TO "mutex/sql/Relay_log_info::run_lock"

Note that only arcs within the same SCC are printed here, as they are the ones that need closer investigation. Arcs coming in or out of this SCC are omitted, to reduce noise.

Break Strongly Connected Components

By definition, an SCC of size greater than one is a cluster of nodes in which every node is reachable, following lock sequences, from every other node. Hence, SCCs should not exist, and the code must be refactored to avoid them.

In the list of SCC arcs printed, some (human) analysis is required to decide:

  • which arcs are valid, and compliant with the MySQL server design,
  • which arcs are invalid, pointing to a flaw in the code.

A bug fix is then required to change the affected server code, either to avoid taking the locks, or to take them in the proper order.

Now, this is a sensitive task, for the following reasons:

  • deciding which arc to remove is by itself difficult
  • changing the server code and / or design to change lock order can be even more difficult
  • removing an arc alone might not even fix anything, if there is another path in the graph that closes the same loop.

For this reason, the tool supports ways to simulate "what-if" scenarios, to see in practice what the overall graph would look like if a given arc were removed.

First, some nodes can (temporarily) be ignored entirely, to simplify the graph analysis, and identify smaller sub graphs in a big SCC.

By ignoring some nodes, a big SCC can be broken down into smaller, independent sub graphs, which helps to investigate, identify, and resolve several deadlock root causes in isolation.

To ignore a node "A", use the following syntax in the dependency graph:

NODE "A" IGNORED

For example, using

NODE "mutex/sql/LOCK_event_queue" IGNORED
NODE "mutex/sql/LOCK_global_system_variables" IGNORED
NODE "mutex/sql/LOCK_offline_mode" IGNORED
NODE "mutex/sql/LOCK_thd_list" IGNORED

will produce a graph without these nodes, also ignoring arcs from and to these nodes.

Secondly, arcs that are considered loops to fix in the code can be marked explicitly, like this:

ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/LOCK_open" FLAGS LOOP
ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/LOCK_table_cache" FLAGS LOOP
ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/THD::LOCK_thd_data" FLAGS LOOP

After tagging some IGNORED nodes or LOOP arcs, generate the lock_order.txt report again, and perform the analysis again.

The section "SCC ANALYSIS (full graph)" will be identical, as the real graph did not change.

The section "SCC ANALYSIS (revised graph)" will show what the graph would look like, with the loops fixed.

The goal is to iteratively tweak the LOOP flags in the graph, and perform analysis on the revised graph, until:

  • the list of LOOP arcs can be reasonably fixed in the code, without causing too much re-engineering effort.
  • the resulting revised graph has simpler SCCs (ideally, there should be none left), showing progress towards the resolution of deadlocks.

Once a viable set of LOOP arcs to remove is identified, file a bug report to address the issue found.

Get reported bugs fixed

Each time a dependency is flagged as a LOOP, a matching bug report should be filed, and that bug should be eventually resolved.

Marking nodes as IGNORED or arcs as LOOP is equivalent to using suppressions in valgrind to avoid error messages.

This can be convenient for investigating other areas further, but it is by no means a satisfactory resolution in itself.

To achieve a state where the server can be declared as deadlock free with reasonable confidence, all the following conditions are required:

  • the GCOV code coverage is satisfactory, and in particular covers all lines of code taking locks.
  • all mutexes in the code are instrumented with the performance schema
  • the dependency graph contains no SCC
  • the dependency graph contains no LOOP arcs
  • the dependency graph contains no IGNORED nodes
  • the test suite passes under LOCK_ORDER without any complaints from the tool.

Advanced tooling

To facilitate investigations and debugging, the following features are available:

  • Tracing
  • Debugging
  • Simulating loop arcs
  • Simulating ignored nodes

Tracing dependencies

When an arc from A to B exists in the graph, it is sometimes necessary to understand where in the source code the A -> B dependency is actually taken.

By declaring the arc with the TRACE flag, as in

  ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data" FLAGS TRACE

the tool will:

  • capture the current statement text,
  • capture the source code location, and the call stack, when the first lock is taken,
  • capture the source code location, and the call stack, when the second lock is taken,

and print all the details in the log file when this arc is found during runtime execution.

An example of a trace:

Trace: using arc mutex/sql/LOCK_open (/home/malff/GIT_LOCK_ORDER/sql/sql_base.cc:1704) -> mutex/sql/TABLE_SHARE::LOCK_ha_data (/home/malff/GIT_LOCK_ORDER/sql/handler.cc:7764)
statement when first lock was acquired:
ALTER TABLE t1 ADD PARTITION
stack when the first lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN14LO_stack_traceC2Ev+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN7LO_lock18record_stack_traceEv+0x24) [0x2e9fe52]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN9LO_thread14add_mutex_lockEP8LO_mutexPKci+0x18c) [0x2e9aeb4]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN15LO_mutex_locker3endEv+0x3e) [0x2ea02dc]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b30112]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f39]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z18close_thread_tableP3THDPP5TABLE+0x20e) [0x2b34196]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z19close_thread_tablesP3THD+0x3fd) [0x2b33e83]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z21mysql_execute_commandP3THDb+0x5dca) [0x2be875c]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z11mysql_parseP3THDP12Parser_stateb+0x672) [0x2bea1bb]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x1496) [0x2be0291]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z10do_commandP3THD+0x448) [0x2bde867]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[17] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[18] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]
stack when the second lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN14LO_stack_traceC2Ev+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN8LO_graph5checkEPK7LO_lockS2_+0x447) [0x2e99d25]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN9LO_thread11check_locksEPK7LO_lock+0xf1) [0x2e9aa99]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN9LO_thread14add_mutex_lockEP8LO_mutexPKci+0x1c9) [0x2e9aef1]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN15LO_mutex_locker3endEv+0x3e) [0x2ea02dc]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2f04334]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN7handler19lock_shared_ha_dataEv+0x6c) [0x2f18e10]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN11ha_innopart5closeEv+0xec) [0x4125be8]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_ZN7handler8ha_closeEv+0x173) [0x2f0aa1d]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z8closefrmP5TABLEb+0x8c) [0x2d1e118]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z18intern_close_tableP5TABLE+0xe5) [0x2b32d42]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f45]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z18close_thread_tableP3THDPP5TABLE+0x20e) [0x2b34196]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z19close_thread_tablesP3THD+0x3fd) [0x2b33e83]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z21mysql_execute_commandP3THDb+0x5dca) [0x2be875c]
[17] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z11mysql_parseP3THDP12Parser_stateb+0x672) [0x2bea1bb]
[18] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x1496) [0x2be0291]
[19] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(_Z10do_commandP3THD+0x448) [0x2bde867]
[20] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[21] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[22] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[23] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[24] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]

Note that C++ symbol names are mangled. The log can be demangled by running it through c++filt, for example:
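
  c++filt < lock_order-(timestamp)-(pid).log

This produces the following output: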

Trace: using arc mutex/sql/LOCK_open (/home/malff/GIT_LOCK_ORDER/sql/sql_base.cc:1704) -> mutex/sql/TABLE_SHARE::LOCK_ha_data (/home/malff/GIT_LOCK_ORDER/sql/handler.cc:7764)
statement when first lock was acquired:
ALTER TABLE t1 ADD PARTITION
stack when the first lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_stack_trace::LO_stack_trace()+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_lock::record_stack_trace()+0x24) [0x2e9fe52]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_thread::add_mutex_lock(LO_mutex*, char const*, int)+0x18c) [0x2e9aeb4]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_mutex_locker::end()+0x3e) [0x2ea02dc]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b30112]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f39]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_table(THD*, TABLE**)+0x20e) [0x2b34196]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_tables(THD*)+0x3fd) [0x2b33e83]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x5dca) [0x2be875c]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x672) [0x2bea1bb]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1496) [0x2be0291]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(do_command(THD*)+0x448) [0x2bde867]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[17] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[18] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]
stack when the second lock was acquired:
[0] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_stack_trace::LO_stack_trace()+0x28) [0x2e9f654]
[1] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_graph::check(LO_lock const*, LO_lock const*)+0x447) [0x2e99d25]
[2] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_thread::check_locks(LO_lock const*)+0xf1) [0x2e9aa99]
[3] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_thread::add_mutex_lock(LO_mutex*, char const*, int)+0x1c9) [0x2e9aef1]
[4] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(LO_mutex_locker::end()+0x3e) [0x2ea02dc]
[5] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2e9d834]
[6] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2f04334]
[7] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(handler::lock_shared_ha_data()+0x6c) [0x2f18e10]
[8] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(ha_innopart::close()+0xec) [0x4125be8]
[9] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(handler::ha_close()+0x173) [0x2f0aa1d]
[10] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(closefrm(TABLE*, bool)+0x8c) [0x2d1e118]
[11] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(intern_close_table(TABLE*)+0xe5) [0x2b32d42]
[12] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b33f45]
[13] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_table(THD*, TABLE**)+0x20e) [0x2b34196]
[14] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2b336db]
[15] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(close_thread_tables(THD*)+0x3fd) [0x2b33e83]
[16] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x5dca) [0x2be875c]
[17] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x672) [0x2bea1bb]
[18] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1496) [0x2be0291]
[19] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(do_command(THD*)+0x448) [0x2bde867]
[20] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x2d799ba]
[21] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld(lo_spawn_thread+0xda) [0x2e9cb5c]
[22] /home/malff/GIT_LOCK_ORDER/build/runtime_output_directory/mysqld() [0x466465e]
[23] /lib64/libpthread.so.0(+0x80a4) [0x7f094120a0a4]
[24] /lib64/libc.so.6(clone+0x6d) [0x7f093f75102d]

Debugging dependencies

When tracing is not enough, the next step is to break the server execution in a debugger, to understand and investigate the context.

The tool makes it possible to put breakpoints on dependencies (not just functions), using the DEBUG flag on an arc, as in:

  ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data" FLAGS DEBUG

When this arc is found at runtime, an assert will fail, which can be caught.

To help diagnostics, the tool will construct a string that details the reason for the failed assert.

In a debugger, a good place to put breakpoints to debug specific arcs is the function debug_lock_order_break_here().
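
For example, with gdb:

  (gdb) break debug_lock_order_break_here
  (gdb) run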

Simulating loops

Loops can be declared explicitly in the dependency graph, using the LOOP flag:

  ARC FROM "mutex/sql/TABLE_SHARE::LOCK_ha_data" TO "mutex/sql/LOCK_open" FLAGS LOOP

Simulating ignored nodes

To facilitate investigation, nodes can be ignored to produce a smaller graph:

  NODE "mutex/sql/LOCK_thd_list" IGNORED

Understanding the notation

This is a mini tutorial with examples, to better understand the notations used in file lock_order.txt.

Basic mutex loop

Sample code

Assume some server code as follows:

  mysql_mutex_t mutex_A;
  mysql_mutex_t mutex_B;
  mysql_mutex_t mutex_C;

  void func_red() {
    mysql_mutex_lock(&mutex_A);
    mysql_mutex_lock(&mutex_B);
    do_something_red();
    mysql_mutex_unlock(&mutex_B);
    mysql_mutex_unlock(&mutex_A);
  }

  void func_green() {
    mysql_mutex_lock(&mutex_B);
    mysql_mutex_lock(&mutex_C);
    do_something_green();
    mysql_mutex_unlock(&mutex_C);
    mysql_mutex_unlock(&mutex_B);
  }

  void func_blue() {
    mysql_mutex_lock(&mutex_C);
    mysql_mutex_lock(&mutex_A);
    do_something_blue();
    mysql_mutex_unlock(&mutex_A);
    mysql_mutex_unlock(&mutex_C);
  }

Then, assume three different threads (red, green and blue) execute the corresponding functions in the server.

Dependency graph

When running tests, the lock_order tool will find the following dependencies, to add in the lock_order_dependencies.txt file:

  ARC FROM "mutex/sql/A" TO "mutex/sql/B"
  ARC FROM "mutex/sql/B" TO "mutex/sql/C"
  ARC FROM "mutex/sql/C" TO "mutex/sql/A"

Lock order report

Executing the lock_order.cycle test, to get the lock_order.txt report, will show the following SCC:

  Dumping classes for SCC 1:
  SCC Class mutex/sql/A
  SCC Class mutex/sql/B
  SCC Class mutex/sql/C

  Dumping nodes for SCC 1:
  SCC Node mutex/sql/A Girth 3
  SCC Node mutex/sql/B Girth 3
  SCC Node mutex/sql/C Girth 3

  Dumping arcs for SCC 1:
  SCC ARC FROM "mutex/sql/A" TO "mutex/sql/B"
  SCC ARC FROM "mutex/sql/B" TO "mutex/sql/C"
  SCC ARC FROM "mutex/sql/C" TO "mutex/sql/A"

  SCC 1/1 summary:
   - Class Size 3
   - Node Size 3
   - Arc Count 3
   - Girth 3
   - Circumference 3

In other words, the tool found a cycle in the code: A -> B -> C -> A.

This cycle involves three nodes, so the cycle girth is 3.

A possible fix is to change the implementation of func_blue() to take locks on A then C, in that order.
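
For illustration, a sketch of the fixed function, based on the sample code above:

  void func_blue() {
    /* Take mutex_A first, then mutex_C, following the A -> C order. */
    mysql_mutex_lock(&mutex_A);
    mysql_mutex_lock(&mutex_C);
    do_something_blue();
    mysql_mutex_unlock(&mutex_C);
    mysql_mutex_unlock(&mutex_A);
  }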

Fixing the code is not enough, as the lock_order_dependencies.txt file now contains an arc (C -> A) that is never taken in the code.

Once both:

  • the code is fixed
  • the C -> A arc is replaced by an A -> C arc in lock_order_dependencies.txt

the tool will then detect no more deadlocks involving these nodes.
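
For reference, the lock_order_dependencies.txt entries for this example would then read:

  ARC FROM "mutex/sql/A" TO "mutex/sql/B"
  ARC FROM "mutex/sql/B" TO "mutex/sql/C"
  ARC FROM "mutex/sql/A" TO "mutex/sql/C"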

With read write lock

Building on the previous example, let's now change B and C to read write locks.

Sample code

  mysql_mutex_t mutex_A;
  mysql_rwlock_t rwlock_B;
  mysql_rwlock_t rwlock_C;

  void func_red() {
    mysql_mutex_lock(&mutex_A);
    mysql_rwlock_rdlock(&rwlock_B);
    do_something_red();
    mysql_rwlock_unlock(&rwlock_B);
    mysql_mutex_unlock(&mutex_A);
  }

  void func_green() {
    mysql_rwlock_rdlock(&rwlock_B);
    mysql_rwlock_wrlock(&rwlock_C);
    do_something_green();
    mysql_rwlock_unlock(&rwlock_C);
    mysql_rwlock_unlock(&rwlock_B);
  }

  void func_blue() {
    mysql_rwlock_rdlock(&rwlock_C);
    mysql_mutex_lock(&mutex_A);
    do_something_blue();
    mysql_mutex_unlock(&mutex_A);
    mysql_rwlock_unlock(&rwlock_C);
  }

Dependency graph

The dependencies found, to add in the lock_order_dependencies.txt file, are:

  ARC FROM "mutex/sql/A" TO "rwlock/sql/B" OP "R"
  ARC FROM "rwlock/sql/B" STATE "R" TO "rwlock/sql/C" OP "W"
  ARC FROM "rwlock/sql/C" STATE "R" TO "mutex/sql/A"

Lock order report

File lock_order.txt indicates an SCC, so there is a cycle.

  Dumping classes for SCC 1:
  SCC Class mutex/sql/A
  SCC Class rwlock/sql/B
  SCC Class rwlock/sql/C

  Dumping nodes for SCC 1:
  SCC Node mutex/sql/A
  SCC Node rwlock/sql/B:-R
  SCC Node rwlock/sql/B:+R
  SCC Node rwlock/sql/C:-R
  SCC Node rwlock/sql/C:+W

  -- Note that these nodes exist, but are not part of the SCC: --
  Node rwlock/sql/B:+W
  Node rwlock/sql/B:-W
  Node rwlock/sql/C:+R
  Node rwlock/sql/C:-W

  Dumping arcs for SCC 1:
  SCC ARC FROM "mutex/sql/A" TO "rwlock/sql/B:+R"
  SCC ARC FROM "rwlock/sql/B:+R" TO "rwlock/sql/B:-R" -- MICRO ARC
  SCC ARC FROM "rwlock/sql/B:-R" TO "rwlock/sql/C:+W"
  SCC ARC FROM "rwlock/sql/C:+W" TO "rwlock/sql/C:-R" -- MICRO ARC
  SCC ARC FROM "rwlock/sql/C:-R" TO "mutex/sql/A"

  SCC 1/1 summary:
   - Class Size 3
   - Node Size 5
   - Arc Count 5
   - Girth 5
   - Circumference 5

First, notice how the read write lock class "rwlock/sql/B" is represented by four nodes, named:

  • "rwlock/sql/B:+R", incoming read lock
  • "rwlock/sql/B:-R", outgoing read lock
  • "rwlock/sql/B:+W", incoming write lock
  • "rwlock/sql/B:-W", outgoing write lock

A practical way to represent this graphically is to draw a box for the lock, that contains four ports. Internal connections between ports represent the logic table for the lock: an arc from "+R" to "-W" means that an incoming request for a read lock ("+R") is blocked by a write lock request already given ("-W").

Micro arcs represent the wired logic of the lock itself; they cannot be changed.

Representing the complete SCC report with this box-and-ports notation reveals a cycle of girth 5 involving the following 3 objects:

  • mutex_A
  • rwlock_B
  • rwlock_C

The lock order tool indicates that this is a deadlock. Let's verify this with a test scenario:

  • t1: thread red executes function func_red
  • t2: thread red locks mutex_A
  • t3: thread red is interrupted, and sleeps for a while
  • t4: thread blue executes function func_blue
  • t5: thread blue locks rwlock_C in read mode
  • t6: thread blue attempts to lock mutex_A and is blocked.
  • t7: thread green executes function func_green
  • t8: thread green locks rwlock_B in read mode
  • t9: thread green attempts to lock rwlock_C in write mode and is blocked.
  • t20: thread red wakes up
  • t21: thread red attempts to lock rwlock_B in read mode ...

Now, at t21, rwlock_B is locked in read mode by thread green, so a request for a read lock should be granted, right?

Lock order claims there is a deadlock, because a read can indirectly block a read:

Consider the following code:

  void func_grey() {
    mysql_rwlock_wrlock(&rwlock_B);
    mysql_rwlock_unlock(&rwlock_B);
  }

Executing func_grey() in thread grey any time after t8 will cause thread grey to block, with a write lock request queued in rwlock_B.

Such a request will in turn block thread red at t21.

The system is in deadlock:

  • thread red holds mutex_A, waits for rwlock_B:+R
  • thread blue holds rwlock_C:-R, waits for mutex_A
  • thread green holds rwlock_B:-R, waits for rwlock_C:+W
  • thread grey holds nothing, waits for rwlock_B:+W

A possible fix is to replace rwlock_B by a priority lock, prlock_B. Priority locks differ from read write locks precisely on this scheduling policy, as a read lock will never block another read lock, even when write lock requests are present.
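
A minimal sketch of such a fix, assuming the mysql_prlock_rdlock / mysql_prlock_unlock APIs that mirror the mysql_rwlock ones:

  mysql_prlock_t prlock_B;

  void func_green() {
    /* With a priority lock, this read lock cannot be blocked
       indirectly by a pending write request. */
    mysql_prlock_rdlock(&prlock_B);
    mysql_rwlock_wrlock(&rwlock_C);
    do_something_green();
    mysql_rwlock_unlock(&rwlock_C);
    mysql_prlock_unlock(&prlock_B);
  }

func_red() would take mysql_prlock_rdlock(&prlock_B) in the same way.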

The internal representation of a priority lock is the same, except that the "+R" -> "-R" micro arc is absent, allowing readers to proceed in parallel.

With shared exclusive locks

Let's now look at innodb rw_lock_t, which is a shared exclusive read write lock.

Sample code

Consider the following code:

  mysql_mutex_t mutex_A;
  /* These are innodb locks, see storage/innobase/include/sync0rw.h */
  rw_lock_t latch_B;
  rw_lock_t latch_C;

  void func_red() {
    mysql_mutex_lock(&mutex_A);
    rw_lock_s_lock(&latch_B);
    do_something_red();
    rw_lock_s_unlock(&latch_B);
    mysql_mutex_unlock(&mutex_A);
  }

  void func_green() {
    rw_lock_sx_lock(&latch_B); /* Holds a SX lock. */
    rw_lock_x_lock(&latch_B);  /* Recursive, now holds a X lock. */
    rw_lock_x_lock(&latch_C);
    do_something_green();
    rw_lock_x_unlock(&latch_C);
    rw_lock_x_unlock(&latch_B);
    rw_lock_sx_unlock(&latch_B);
  }

  void func_blue() {
    rw_lock_sx_lock(&latch_C);
    mysql_mutex_lock(&mutex_A);
    do_something_blue();
    mysql_mutex_unlock(&mutex_A);
    rw_lock_sx_unlock(&latch_C);
  }

Dependency graph

When executing func_red(), the tool detects the following dependency:

  ARC FROM "mutex/tutorial/mutex_A" TO "sxlock/tutorial/latch_B" OP "S"

Likewise for func_green(), the dependencies found are:

  ARC FROM "sxlock/tutorial/latch_B" STATE "SX" TO "sxlock/tutorial/latch_B" RECURSIVE OP "X"
  ARC FROM "sxlock/tutorial/latch_B" STATE "X" TO "sxlock/tutorial/latch_C" OP "X"

Notice how the tool:

  • detects that some locks are recursive,
  • infers the overall STATE of a lock based on all the lock calls made.

And finally for func_blue(), the tool detects:

  ARC FROM "sxlock/tutorial/latch_C" STATE "SX" TO "mutex/tutorial/mutex_A"

Lock order report

  Dumping classes for SCC 1:
  SCC Class sxlock/tutorial/latch_C
  SCC Class sxlock/tutorial/latch_B
  SCC Class mutex/tutorial/mutex_A

  Dumping nodes for SCC 1:
  SCC Node sxlock/tutorial/latch_C:+X Girth 5
  SCC Node sxlock/tutorial/latch_C:-SX Girth 5
  SCC Node sxlock/tutorial/latch_B:-X Girth 5
  SCC Node sxlock/tutorial/latch_B:+S Girth 5
  SCC Node mutex/tutorial/mutex_A Girth 5

  -- Note that these nodes exist, but are not part of the SCC: --
  Node sxlock/tutorial/latch_C:-X
  Node sxlock/tutorial/latch_C:+SX
  Node sxlock/tutorial/latch_C:+S
  Node sxlock/tutorial/latch_C:-S
  Node sxlock/tutorial/latch_B:+X
  Node sxlock/tutorial/latch_B:+SX
  Node sxlock/tutorial/latch_B:-SX
  Node sxlock/tutorial/latch_B:-S

  Dumping arcs for SCC 1:
  SCC ARC FROM "sxlock/tutorial/latch_C:+X" TO "sxlock/tutorial/latch_C:-SX" -- MICRO ARC
  SCC ARC FROM "sxlock/tutorial/latch_C:-SX" TO "mutex/tutorial/mutex_A"
  SCC ARC FROM "sxlock/tutorial/latch_B:-X" TO "sxlock/tutorial/latch_C:+X"
  SCC ARC FROM "sxlock/tutorial/latch_B:+S" TO "sxlock/tutorial/latch_B:-X" -- MICRO ARC
  SCC ARC FROM "mutex/tutorial/mutex_A" TO "sxlock/tutorial/latch_B:+S"

  -- Note that these arcs exist, but are not part of the SCC: --
  ARC FROM "sxlock/tutorial/latch_B:-SX" TO "sxlock/tutorial/latch_B:-X"

  SCC 1/1 summary:
   - Class Size 3
   - Node Size 5
   - Arc Count 3
   - Girth 5
   - Circumference 5

The graphical representation of a shared exclusive lock follows the same box-and-ports notation. Using it, the SCC reported corresponds to a cycle of girth five involving the three objects:

  • mutex_A
  • latch_B
  • latch_C

By code review, this is correct:

  • func_red() will hold mutex_A, blocking func_blue()
  • func_blue() will hold a SX lock on latch_C, blocking func_green()
  • func_green() will hold a X lock on latch_B, blocking func_red()

As can be seen by reading the diagram, there are several options to break the deadlock (see the sketch after this list):

  • the order of locks between mutex_A and latch_C can be reversed, breaking the loop,
  • func_green() could avoid taking an X lock on latch_B, and use an SX lock only. This breaks the cycle because an S lock (from func_red()) and an SX lock (from the modified func_green()) do not block each other, making the graph loop free.
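
For illustration, a sketch of the second option, based on the sample code above:

  void func_green() {
    rw_lock_sx_lock(&latch_B); /* SX only, no recursive X lock. */
    rw_lock_x_lock(&latch_C);
    do_something_green();
    rw_lock_x_unlock(&latch_C);
    rw_lock_sx_unlock(&latch_B);
  }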

As a side note about the notation used, see how the recursive lock taken by func_green() on latch_B is represented: an "-SX" -> "-X" arc means that the already granted SX lock ("-SX") is promoted to a granted X lock ("-X").