InnoDB Performance Schema data lock instrumentation
Data lock iterators
To provide content to the performance_schema.data_locks table, InnoDB implements Innodb_data_lock_iterator.
Likewise, the performance_schema.data_lock_waits table is populated with Innodb_data_lock_wait_iterator.
Both iterators need to return data present in the InnoDB engine memory, which implies taking the proper mutexes when inspecting it. The structure to inspect here is the transaction list (trx_sys).
How this scan is implemented is critical for performance.
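The shape of this interaction can be sketched as follows. The names below are simplified stand-ins, not the actual MySQL/InnoDB headers; the only point illustrated is that the performance schema calls the engine-side iterator repeatedly, and the engine decides how many rows to add per call and which latches to hold while doing so.

```cpp
/* Hypothetical, simplified sketch of an engine data lock iterator.
   The real classes and signatures differ; only the calling pattern
   is illustrated. */
class Data_lock_container;  // stand-in for the performance schema container

class Data_lock_iterator_sketch {
 public:
  virtual ~Data_lock_iterator_sketch() = default;

  /* Called in a loop by the performance schema.  Each call adds some
     rows to the container and returns true once the engine has
     nothing more to report.  The engine acquires and releases its own
     latches inside each call. */
  virtual bool scan(Data_lock_container *container) = 0;
};
```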
No full scan
Consider this implementation:
- Take all necessary locks
- Scan all the InnoDB internal locks
- Report all of them to the performance schema
- Release all the locks taken
This implementation materializes the entire table.
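A minimal sketch of this rejected approach, using hypothetical stand-in types (std::mutex for the InnoDB latches, std::vector for the transaction list and for the performance schema container) rather than the real structures:

```cpp
#include <mutex>
#include <vector>

// Hypothetical stand-ins for InnoDB internals, for illustration only.
struct Lock {};
struct Transaction { std::vector<Lock> locks; };

std::mutex engine_latch;                 // models the latches protecting trx_sys
std::vector<Transaction> transactions;   // models the transaction list
std::vector<Lock> materialized;          // models the performance schema container

// Rejected: hold the latch and copy every lock of every transaction
// in one pass.  The engine is frozen for the whole walk and the
// materialized copy grows with the total number of locks.
void full_scan() {
  std::lock_guard<std::mutex> guard(engine_latch);
  for (const Transaction &trx : transactions) {
    for (const Lock &lock : trx.locks) {
      materialized.push_back(lock);
    }
  }
}
```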
The benefits of this approach are:
- The materialized table is consistent
The problems with this approach are:
- The InnoDB engine is frozen for the entire duration of the scan, for a time that is unpredictable.
- Memory consumption spikes without bound
- Materializing all rows upfront is incompatible with supporting an index
For example, with N = 10,000 transactions, a single scan reports the locks of all 10,000 transactions at once.
This alternative is rejected.
No single row scan
Consider this implementation:
- Take all necessary locks
- Resume the scan on InnoDB internal locks for 1 record
- Report this record to the performance schema
- Release all the locks taken
This implementation returns a row for a single transaction, or even a single lock, at a time.
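A minimal sketch of this rejected approach, again with hypothetical stand-in types; note how every call re-walks the list from the head, which is where the O(N^2) cost discussed below comes from:

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical stand-ins for InnoDB internals, for illustration only.
struct Lock {};
struct Transaction { std::vector<Lock> locks; };

std::mutex engine_latch;
std::vector<Transaction> transactions;

// Rejected: each call re-walks the list from the head to reach
// `position`, reports one row, and releases the latch.  Scanning the
// whole table this way costs O(N^2) list operations for N transactions.
std::optional<Lock> single_row_scan(std::size_t position) {
  std::lock_guard<std::mutex> guard(engine_latch);
  std::size_t i = 0;
  for (const Transaction &trx : transactions) {  // restart from the head each call
    if (i++ == position && !trx.locks.empty()) {
      return trx.locks.front();                  // one row per call
    }
  }
  return std::nullopt;                           // past the end: scan finished
}
```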
The benefits of this approach are:
- Memory consumption is well bounded, and low.
The problems with this approach are:
- Data reported can be very inconsistent.
- Implementing a restartable scan, on a very dynamic structure, without holding any lock, is complex.
- Even assuming the restartable scan itself is solved, looping N times to find elements i, i+1, i+2, ... in a list ends up with O(N^2) complexity, consuming CPU.
For example, with N = 10,000 transactions, the trx_list would be scanned 10,000 times to return 1 record each time. The total number of operations on the list is 100 million.
This alternative is rejected.
Restartable batch scan
What is implemented is:
- As the lock sys data structures are sharded, with each shard having its own latch, we inspect the shards one by one to avoid latching the whole lock system
- We first process table locks, then record locks
- Table locks are processed one table at a time
- Record locks are processed one internal hash table bucket at a time
This is a compromise, with the following properties:
- Memory consumption is bounded by the number of locks in each bucket and on each table.
- The duration for which latches are held on InnoDB structures is bounded by the number of locks in each bucket and on each table.
- The data returned is not consistent, but at least it is "consistent by chunks"
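A minimal sketch of the batch approach, with hypothetical stand-in types for the sharded lock sys; the real implementation iterates over table locks and record lock hash buckets, but the general shape is similar: latch one shard, report everything in one bucket, release, and remember where to resume.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for the sharded lock sys, for illustration only.
struct Lock {};
struct Shard {
  std::mutex latch;            // each shard has its own latch
  std::vector<Lock> bucket;    // models one record lock hash bucket
};

std::array<Shard, 16> shards;  // models the sharded lock sys

// Implemented compromise (sketch): latch one shard, copy the locks of
// one bucket into `out`, release the latch, and remember which bucket
// to visit next.  Both memory use and latch hold time are bounded by
// the size of a single bucket.
bool batch_scan(std::size_t *next_bucket, std::vector<Lock> *out) {
  if (*next_bucket >= shards.size()) return true;  // scan finished

  Shard &shard = shards[*next_bucket];
  {
    std::lock_guard<std::mutex> guard(shard.latch);
    out->insert(out->end(), shard.bucket.begin(), shard.bucket.end());
  }
  ++(*next_bucket);                                 // resume here on the next call
  return false;                                     // more buckets remain
}
```

Because the saved position is a bucket index rather than an element inside a bucket, a restart never has to re-walk earlier buckets, which keeps the total cost linear in the number of locks.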