Innodb Performance Schema data lock instrumentation

Data lock iterators

To provide content to the performance_schema.data_locks table, innodb implements Innodb_data_lock_iterator.

Likewise, table performance_schema.data_lock_waits is populated with Innodb_data_lock_wait_iterator.

Both these iterators need to return data present in the innodb engine memory, which implies taking the proper mutex locks when inspecting it. The structure to inspect here is the transaction list (trx_sys).
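
As a rough, self-contained illustration (not actual innodb code), the sketch below models the moving parts in plain C++: a transaction list protected by a mutex, standing in for trx_sys and its mutex, and a report_row() function standing in for the performance schema container that Innodb_data_lock_iterator fills. All the names used here (TrxSys, Trx, Lock, report_row) are invented for this sketch.

  #include <cstdint>
  #include <cstdio>
  #include <list>
  #include <mutex>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for an innodb record or table lock.
  struct Lock {
    std::string object_name;  // e.g. "test.t1"
    std::string mode;         // e.g. "X,REC_NOT_GAP"
  };

  // Hypothetical stand-in for a transaction (trx_t) and the locks it holds.
  struct Trx {
    std::uint64_t trx_id;
    std::vector<Lock> locks;
  };

  // Hypothetical stand-in for the global transaction list (trx_sys),
  // which must only be inspected while holding its mutex.
  struct TrxSys {
    std::mutex mutex;
    std::list<Trx> trx_list;
  };

  // Stand-in for filling one row of performance_schema.data_locks.
  void report_row(const Trx &trx, const Lock &lock) {
    std::printf("trx %llu holds %s on %s\n",
                static_cast<unsigned long long>(trx.trx_id),
                lock.mode.c_str(), lock.object_name.c_str());
  }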

How this scan is implemented is critical for performance.

No full scan

Consider this implementation:

  • Take all necessary locks
  • Scan all the innodb internal locks
  • Report all of them to the performance schema
  • Release all the locks taken

This implementation materializes the entire table.
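
On the toy TrxSys model sketched in the introduction (all names hypothetical), this rejected approach would look roughly as follows: one critical section that walks every transaction and reports every lock before the mutex is released.

  // Rejected: materialize everything in a single mutex acquisition.
  // The mutex is held for the whole walk, however long that takes,
  // and every row is produced before it is released.
  void scan_full_materialize(TrxSys &sys) {
    std::lock_guard<std::mutex> guard(sys.mutex);  // freeze the engine view
    for (const Trx &trx : sys.trx_list) {
      for (const Lock &lock : trx.locks) {
        report_row(trx, lock);  // all N transactions reported in one pass
      }
    }
  }  // mutex released only once the entire table is materialized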

The benefits with this approach are:

  • The materialized table is consistent

The problems with this approach are:

  • The innodb engine is frozen for the entire duration, for a time that is unpredictable.
  • Memory consumption spikes without bound.
  • Materializing all rows upfront is incompatible with supporting an index

For example, with N = 10,000 transactions, a single scan reports the locks of all 10,000 transactions at once.

This alternative is rejected.

No single row scan

Consider this implementation:

  • Take all necessary locks
  • Resume the scan on innodb internal locks for 1 record
  • Report this record to the performance schema
  • Release all the locks taken

This implementation returns a row for a single transaction, or even a single lock, at a time.
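
Again on the toy model from the introduction, a single-row scan has to re-locate its position on every call, because the mutex is released between calls; that re-walk from the head of the list is what makes the whole scan O(N^2). The pos cursor below is an invented detail.

  #include <cstddef>

  // Rejected: report one transaction per call.
  // pos is the caller's restart cursor. Since no lock is held between
  // calls, each call re-walks the list from the head to reach element
  // pos, so scanning all N transactions costs O(N^2) list operations.
  // Returns false once the scan is complete.
  bool scan_single_row(TrxSys &sys, std::size_t pos) {
    std::lock_guard<std::mutex> guard(sys.mutex);
    std::size_t i = 0;
    for (const Trx &trx : sys.trx_list) {
      if (i++ < pos) continue;  // skip records reported by earlier calls
      for (const Lock &lock : trx.locks) {
        report_row(trx, lock);
      }
      return true;  // stop after a single transaction
    }
    return false;  // past the end: scan finished
  }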

The benefits with this approach are:

  • Memory consumption is well bounded, and low.

The problems with this approach are:

  • Data reported can be very inconsistent.
  • Implementing a restartable scan, on a very dynamic structure, without holding any lock, is complex.
  • Even assuming the restartable scan itself is solved, looping N times to find element i, then i+1, then i+2, ... in a list ends up with O(N^2) complexity, consuming CPU.

For example with N = 10,000 transactions, the trx_list would be scanned 10,000 times, returning 1 record each time. The total number of operations on the list is 100 million (10,000 × 10,000).

This alternative is rejected.

Restartable batch scan

What is implemented is:

  • Take all necessary locks
  • Resume the scan on innodb internal locks, for a given record range
  • Report all the records in the range to the performance schema
  • Release all the locks taken

This is a compromise, with the following properties:

  • Memory consumption is bounded by the number of records returned in each range.
  • The duration of mutex locks on innodb structures is bounded by the number of records in each range.
  • The data returned is not consistent, but at least it is "consistent by chunks".
  • The overall scan performs N/RANGE passes over the list, where RANGE is the number of records per batch, for a total of roughly N^2/RANGE operations. This is still technically O(N^2), but in practice should be reasonable.

For example with N = 10,000 transactions and RANGE = 256, there are 40 batches on the trx list, where each batch reports (up to) 256 trx, with their locks. The total number of operations on the list is 400 thousand (40 batches × 10,000 ≈ N^2/RANGE). A simplified sketch of this batch loop is given below.
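
A minimal sketch of this compromise, once more on the toy model from the introduction: each call takes the mutex, skips to the start of the requested range, reports up to range transactions, then releases the mutex. The cursor handling and the caller loop are invented details; the real Innodb_data_lock_iterator differs.

  #include <cstddef>

  // Implemented compromise (sketch): report up to `range` transactions
  // per mutex acquisition. pos is the restart cursor, advanced by the
  // caller after each batch. Mutex hold time and the number of rows
  // produced per call are both bounded by `range`.
  // Returns the number of transactions reported (0 when the scan is done).
  std::size_t scan_batch(TrxSys &sys, std::size_t pos, std::size_t range) {
    std::lock_guard<std::mutex> guard(sys.mutex);
    std::size_t i = 0;
    std::size_t reported = 0;
    for (const Trx &trx : sys.trx_list) {
      if (i++ < pos) continue;  // skip records from previous batches
      for (const Lock &lock : trx.locks) {
        report_row(trx, lock);  // rows are consistent within this batch
      }
      if (++reported == range) break;  // bound the critical section
    }
    return reported;
  }

  // Caller side: N/RANGE batches, each re-walking the list once,
  // hence roughly N^2/RANGE list operations in total.
  void scan_all(TrxSys &sys, std::size_t range = 256) {
    std::size_t pos = 0;
    while (std::size_t n = scan_batch(sys, pos, range)) {
      pos += n;
    }
  }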