WL#3262: Add mutex lock order checking to the server

Affects: Server-8.0   —   Status: Complete

Rationale

The MySQL server is a multi-threaded application, which uses numerous locks.

The list of locks present in the server is constantly evolving, as new features are implemented and code is refactored for performance.

As with any multi-threaded application using locking primitives, there is always a risk of encountering a deadlock during execution, when multiple locks are held at once.

For MySQL, the effect of a deadlock is catastrophic, and causes a complete loss of service.

The goal of this task is to provide a methodology, and tooling to support it, to enforce that the runtime execution is free of deadlocks.

Methodology

An oriented graph is defined globally for the server, which documents a lock order for each lock.

In this graph, an "A -> B" edge represents that lock B can happen after lock A.

This graph represents the server design with regards to locks, and as such is a deliverable checked in with the source code in git.

To prevent deadlocks during execution:

  • the graph must be acyclic.
  • code paths taken during server execution must comply with the documented graph.
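
The first requirement, checking that the graph is acyclic, can be sketched with a standard depth-first search. The graph representation and names below are illustrative, not taken from the server source:

```python
# Detect cycles in a lock-order graph given as {node: [successors]}.
# The example graphs below are hypothetical, not the real server graph.

def find_cycle(graph):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for succ in graph.get(node, ()):
            if color.get(succ, WHITE) == GRAY:  # back edge: cycle found
                return stack[stack.index(succ):] + [succ]
            if color.get(succ, WHITE) == WHITE:
                cycle = visit(succ)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

acyclic = {"A": ["B", "C"], "B": ["C"], "C": []}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_cycle(acyclic))  # None
print(find_cycle(cyclic))   # ['A', 'B', 'C', 'A']
```

Reporting the offending cycle, not merely its existence, is what makes such an analysis actionable: the cycle names the exact arcs that contradict the intended design.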

Tooling

Some tooling, available to developers in DEBUG builds, helps to ensure that:

  • the graph defined is indeed free of cycles.
  • the runtime execution of the server does comply with the graph.

Build

NF-1.1

A new CMake option, "WITH_LOCK_ORDER", is used to build the server from source with the LOCK_ORDER tooling included.

When building without this option, the server is unchanged.
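
As a sketch, a build enabling the tooling might look like the following; WITH_LOCK_ORDER is the option named above, while WITH_DEBUG and the out-of-source layout are the usual MySQL CMake conventions, assumed here since the tooling targets DEBUG builds:

```shell
# Configure an out-of-source debug build with the LOCK_ORDER tooling included.
mkdir build && cd build
cmake .. -DWITH_DEBUG=ON -DWITH_LOCK_ORDER=ON
make
```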

Configuration

NF-2.1

In mysqld, a new system variable, "lock-order", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It controls whether the tool is enabled at runtime.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

Note that when lock-order=OFF, none of the related lock-order-xxx variables have an effect.

NF-2.2

In mysqld, a new system variable, "lock-order-output-directory", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It defines the directory where logs are written.

Scope: Global. Access: Read-only. Type: String. Default value: empty.

NF-2.3

In mysqld, a new system variable, "lock-order-dependencies", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It specifies the path to the file lock_order_dependencies.txt.

Scope: Global. Access: Read-only. Type: String. Default value: empty.

NF-2.4

In mysqld, a new system variable, "lock-order-extra-dependencies", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It specifies the path to a second dependency file, which is useful to provide the dependency graph for third-party code separately.

Scope: Global. Access: Read-only. Type: String. Default value: empty.

NF-2.5

In mysqld, a new system variable, "lock-order-print-txt", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes the tool to perform graph analysis and print a textual report.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.6

In mysqld, a new system variable, "lock-order-trace-loop", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes the tool to print a trace in the log file when it encounters at runtime a dependency flagged as a loop.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.7

In mysqld, a new system variable, "lock-order-debug-loop", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes a debug assert when the tool encounters at runtime a dependency flagged as a loop.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.8

In mysqld, a new system variable, "lock-order-trace-missing-arc", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes the tool to print a trace in the log file when it encounters at runtime a dependency that is not declared.

Scope: Global. Access: Read-only. Type: Boolean. Default value: ON.

NF-2.9

In mysqld, a new system variable, "lock-order-debug-missing-arc", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes a debug assert when the tool encounters at runtime a dependency that is not declared.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.10

In mysqld, a new system variable, "lock-order-trace-missing-unlock", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes the tool to print a trace in the log file when it encounters at runtime a lock that is destroyed while still held.

Scope: Global. Access: Read-only. Type: Boolean. Default value: ON.

NF-2.11

In mysqld, a new system variable, "lock-order-debug-missing-unlock", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes a debug assert when the tool encounters at runtime a lock that is destroyed while still held.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.12

In mysqld, a new system variable, "lock-order-trace-missing-key", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes the tool to print a trace in the log file when it encounters at runtime an object not properly instrumented with the performance schema.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.13

In mysqld, a new system variable, "lock-order-debug-missing-key", is defined. This system variable only exists in WITH_LOCK_ORDER builds. It causes a debug assert when the tool encounters at runtime an object not properly instrumented with the performance schema.

Scope: Global. Access: Read-only. Type: Boolean. Default value: OFF.

NF-2.14

In mysql-test-run, a new option, "lock-order", is defined. This option enables or disables the tool when executing a test script.
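
For illustration only, the variables above could be set in an option file as follows; the output directory path is hypothetical, and the dependency file path follows the location given later in this worklog:

```ini
# Hypothetical my.cnf fragment for a WITH_LOCK_ORDER debug build.
[mysqld]
lock-order=ON
lock-order-output-directory=/tmp/lock_order_logs
lock-order-dependencies=mysql-test/lock_order_dependencies.txt
lock-order-trace-missing-arc=ON
lock-order-debug-missing-arc=OFF
```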

Execution

NF-3.1

During runtime, when mysqld is started with "lock-order-print-txt", the lock order graph is analyzed, and cycles are reported.

NF-3.2

During runtime, when mysqld is started with "lock-order", and mysqld encounters an "A -> B" lock sequence that is not declared in the graph, an error is raised.

Documentation

NF-4.1

Developer documentation, covering basic usage of the tool and the methodology used, is delivered in doxygen pages.

Source Configuration Management

NF-5.1

The initial graph representing the server lock order is a deliverable, checked in with the source code.

High-Level Specification

There will be a unique graph.

This graph is oriented: an edge going from A to B means that it is ok to lock B when holding a lock on A. This graph is acyclic: a cycle in the graph reflects a bug in the design. Relations in this graph are not transitive:

  • if edges (A --> B) and (B --> C) exist, the edge (A --> C) may or may not exist.
  • if locking C while holding a lock on A is considered valid, an explicit A --> C edge should exist.
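
In the dependency-file syntax shown later in this worklog, non-transitivity means every permitted pair must be declared on its own line; the node names here are hypothetical:

```text
ARC FROM "mutex/example/A" TO "mutex/example/B"
ARC FROM "mutex/example/B" TO "mutex/example/C"
ARC FROM "mutex/example/A" TO "mutex/example/C"
```

Without the third arc, locking C while holding A would be reported as a missing dependency at runtime, even though A --> B and B --> C are both declared.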

The graph is explicitly declared up-front.

In particular, edges "A --> B" must be explicitly declared.

At runtime, when locking a mutex X, the code checks for every mutex M held by the current thread that there is an explicit M --> X edge in the graph, or prints a message in a trace file and/or asserts.

The check happens in real time when a mutex is locked, so that failures to comply with the pre-defined graph are detected on the spot -- not post-mortem at mutex destroy or server shutdown -- which gives developers access to the exact query and call stack that caused the illegal lock order.

By virtue of being explicitly declared up-front, the graph is unique for the entire code base, not only for a given server run.

In particular, this enforces that test 1 executed with server run 1 and test 2 executed with server run 2 both comply with the *same* dependency graph. Any post-mortem analysis based only on data collected during a single server run does not provide this assurance.

For example, the "main" test suite executed with mysql-test-run can run around 600 tests while starting a server about 100 times, and this does not take into account the other test suites.

There is a huge difference between:

(a) enforcing 100 times that on average 6 tests are compatible between themselves, using a graph that may or may not be compatible with the graph of a different server run;

(b) enforcing 600 times that every test is compatible with the unique, explicit dependency graph.

Any solution that only provides (a) does not resolve the problem.

By virtue of being explicitly declared up-front, the dependency graph also serves as design documentation, which is critically needed.

By being enforced at runtime, this documentation is by definition up to date with the code.

This solution also addresses a problem related to management of software development.

Assume, for example, that there is common code used by 10 different teams to implement 10 different projects in parallel, that each project adds a new mutex, and that the combined code at the end creates a graph between the 10 new and existing mutexes:

(c) when doing post-mortem analysis to find loops in an undocumented graph, the last team that merges to the common code repository is the team blamed for closing a loop, and the team that gets to cancel its project.

(d) when doing analysis to find loops in a pre-documented graph, code review, and in particular design review, should catch each team creating a suspicious dependency, and design issues should be resolved in the code that caused the problem (team X), not in the code that exposed the problem (the last team to merge, in case c).

Any solution that only provides (c) does not resolve the problem: in fact, it makes the code base even worse.

Implementation

Nodes in the graph are objects already instrumented for the performance schema.

This includes mutexes, but also rwlocks, which can benefit from this as well.

This includes every kind of mutex, not only safe_mutex: code that implements its own mutex / rwlock / prlock / other can still benefit from the deadlock detection (namely, innodb).

Nodes in the graph are not mutex instances but classes, identified by the name already assigned by the instrumentation, such as "mutex/sql/LOCK_open" for mysql_mutex_t LOCK_open.

The mutex deadlock detector is *another* implementation of the instrumentation interface, which collects data with the PSI_server->xxx() calls that are already in place.

In other words, there are zero code changes for the existing code already instrumented for the performance schema.

Another benefit of this approach is that not only mutex deadlocks can be detected, but also cases of poorly scaling code.

For example, it may be deemed acceptable to hold mutex "mutex/sql/LOCK_open" while performing file IO on "file/sql/FRM", so that there is an edge between these nodes in the graph, but it may be unacceptable to hold mutex "mutex/sql/LOCK_thread_count" while writing to a file: the absence of an edge here will cause the code enforcing the graph order to complain when this is attempted at runtime, automatically pointing out areas in the code that could be revisited to gain performance.

See for example bugs like http://bugs.mysql.com/bug.php?id=41158 "DROP TABLE holds LOCK_open during unlink()".

A complete graph with both mutexes and files, with documented edges, can prevent this type of bug.

Interface Specification

The lock order graph is declared in a text file, with its own language.

The full syntax is documented in the doxygen documentation (a deliverable), so it is not repeated in full here.

Example of graph, located in file <root>/mysql-test/lock_order_dependencies.txt:

malff@linux-8edv:mysql-test> grep "mutex.sql.LOCK_open" lock_order_dependencies.txt
ARC FROM "mutex/alog/Filter_database::mutex" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/group_rpl/LOCK_recovery_donor_selection" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/innodb/dict_sys_mutex" TO "mutex/sql/LOCK_open" FLAGS LOOP
ARC FROM "mutex/p_dyn_loader/key_component_id_by_urn_mutex" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/key_mts_temp_table_LOCK" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/LOCK_event_queue" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/LOCK_global_system_variables" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/archive/Archive_share::mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/blackhole/blackhole"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/csv/tina"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/federated/federated"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/dict_sys_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/dict_table_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/innobase_share_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/rtr_active_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/rw_lock_debug_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/rw_lock_list_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/srv_sys_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/innodb/sync_array_mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/myisam/MYISAM_SHARE::intern_lock"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/mysys/KEY_CACHE::cache_lock"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/mysys/THR_LOCK_heap"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/mysys/THR_LOCK_myisam"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/mysys/THR_LOCK_open"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/session/LOCK_srv_session_threads"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/DEBUG_SYNC::mutex"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/LOCK_plugin_delete" FLAGS LOOP
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/LOCK_plugin" FLAGS LOOP COMMENT "Bug#28786704"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/LOCK_status" FLAGS LOOP
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/MDL_wait::LOCK_wait_status"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/TABLE_SHARE::LOCK_ha_data"
ARC FROM "mutex/sql/LOCK_open" TO "mutex/sql/THD::LOCK_current_cond" FLAGS LOOP
ARC FROM "mutex/sql/LOCK_open" TO "sxlock/innodb/dict_table_stats"
ARC FROM "mutex/sql/LOCK_open" TO "rwlock/sql/CRYPTO_dynlock_value::lock"
ARC FROM "mutex/sql/LOCK_open" TO "rwlock/sql/LOCK_system_variables_hash"
ARC FROM "mutex/sql/LOCK_open" TO "rwlock/sql/MDL_context::LOCK_waiting_for"
ARC FROM "mutex/sql/LOCK_open" TO "rwlock/sql/MDL_lock::rwlock"
ARC FROM "mutex/sql/LOCK_plugin_install" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/LOCK_plugin" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/LOCK_reset_gtid_table" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/LOCK_table_cache" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/Master_info::data_lock" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/Master_info::run_lock" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_commit" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_index" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/MYSQL_BIN_LOG::LOCK_log" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/MYSQL_RELAY_LOG::LOCK_log" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/Relay_log_info::data_lock" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/Relay_log_info::run_lock" TO "mutex/sql/LOCK_open"
ARC FROM "mutex/sql/tz_LOCK" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/components/LOCK_dynamic_loader" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/components/LOCK_registry" TO "mutex/sql/LOCK_open"
ARC FROM "sxlock/innodb/dict_operation_lock" TO "mutex/sql/LOCK_open"
ARC FROM "sxlock/innodb/fts_cache_rw_lock" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/channel_lock" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/gtid_commit_rollback" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/gtid_mode_lock" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/LOGGER::LOCK_logger" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/MDL_context::LOCK_waiting_for" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/MDL_lock::rwlock" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/THR_LOCK_servers" TO "mutex/sql/LOCK_open"
ARC FROM "rwlock/sql/Trans_delegate::lock" TO "mutex/sql/LOCK_open"
BIND "cond/sql/COND_open" TO "mutex/sql/LOCK_open"

This list gives all the arcs going to and from LOCK_open.

During runtime execution, if the tool finds a mutex X with:

  • lock(X)
  • followed by lock(mutex/sql/LOCK_open)

the tool will raise an error if no "X -> mutex/sql/LOCK_open" arc is declared.

Likewise, if the tool finds a mutex X with:

  • lock(mutex/sql/LOCK_open)
  • followed by lock(X)

the tool will raise an error if no "mutex/sql/LOCK_open -> X" arc is declared.