WL#7304: Improve MDL performance and scalability by implementing "fast-path" for DML locks

Affects: Server-5.7   —   Status: Complete

Since typical user workload/workload used in typical benchmarks consists of DML
statements it makes sense to improve performance/scalability by optimizing MDL
subsystem for such type of statements.

One of possible approaches is to implemenent "fast-path" for metadata locks
acquired by DML statements, which will convert acquisition/release of DML lock
(S, SH, SW, SR locks) into counter increment/decrement (under protection of
MDL_lock::m_rwlock) instead of more complex code involving list manipulation.
(Indeed this means that MDL lock acquisition for DDL becomes more complex).

Benchmarking of draft patch implementing this idea shown that it provides at
least 10% performance improvement in single-table OLTP_RO/POINT_SELECT SysBench
tests.

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-4.html
There are no functional requirements for this task as user visible functionality
is not supposed to be changed by it.

NF1) The only non-functional requirement is performance improvement for
POINT_SELECT/OLTP_RO tests of 1-table SysBench/InnoDB tables/multi-core machine.
None needed. See LLD for design.
General idea
============

General idea is to split all lock types for each of MDL namespaces
in two sets:

A) "DML" lock types
   1) Each type from this set should be compatible with all other
      types from the set (including itself).
   2) These types should be common for DML operations

Our goal is to optimize acquisition and release of locks of this type
by avoiding complex checks and manipulations on m_waiting/m_granted
sets/lists and replacing it with a check of and increment/decrement
of integer counters. We will call the latter type of acquisition/release
"fast path". Use of "fast path" should reduce the size of critical
section associated with MDL_lock::m_rwlock lock in the common case
and thus increases scalability.

2) "DDL" lock types
   1) Granted or pending lock of those type is incompatible with some
      other lock (including itself).
   2) Not common for DML operations

These locks have to be always acquired in the old fashion - involving
manipulations with m_waiting/m_granted sets/lists, i.e. using "slow
path". Moreover in the presence of active/pending locks from "slow"
set we have to acquire even locks of "DML" type using "slow path".

"DML" and "DDL" lock sets
===========================

1) For GLOBAL/COMMIT and SCHEMA namespaces (i.e. namespaces with lock
   represented by MDL_scoped_lock class):

   "DML" locks set consists of IX lock type.
   "DDL" locks set consists of S and X lock types.

   Note that DML statements that change data acquire only IX locks in
   GLOBAL/COMMIT namespaces so the above makes perfect sense.

2) For all other namespaces (i.e. represented by MDL_object_lock class):

   "DML" locks set consists of S, SH, SR and SW locks.
   "DDL" locks set consists of SU, SNW, SNRW and X.

   Again normally DML statements (including queries to I_S and prepare
   phase for prepared statements) acquire locks only from "DML" set.


Main code transformation
========================

MDL_lock object gets two new members:

- MDL_lock::m_ddl_locks_granted_pending_count - number of granted
  or pending locks of "DDL" types. Necessary to quickly verify that
  we can grant "DML" locks without further checking.

- MDL_lock::m_fast_path_granted_count - packed counter of number of
  granted locks of specific "DML" type which were granted using
  fast-path algorithm and not using "slow path".

  For namespaces using MDL_scoped_lock we can use 60 bits of 64-bit
  variable to count number of IX locks (we will use remaining 4
  bits in future MDL work).

  For namespaces using MDL_object_lock we can use chunks of 20 bits
  for S, SR and SW locks. SH locks do not need separate counter
  as they differ from S only in case of presence of "DDL" locks,
  in which case fast path optimization can't be used.
  We could have used separate counters for each type but this
  would complicate future switch to lock-free MDL implementation.
  (Again 4-bits reserved for future use).

  I assume that overflow is not an issue as we are far away from
  handling 2^20 concurrent connections.

The above two members are to be protected by MDL_lock::m_rwlock lock.
In future we will switch to updating MDL_lock::m_fast_path_granted_count
using atomic primitives (see WL#7306 and WL#7305, which is prerequisite
for the former).


Essentially we replace:

1) addition/removal of "DML" lock to MDL_lock::m_granted list during
   lock acquisition/release with and incrementing/decrementing of
   corresponding part of m_fast_path_granted_count.
2) check for granted or pending locks which conflict with "DML" lock
   is replaced with check on m_ddl_locks_granted_pending_count counter.

We still allocate MDL_ticket objects for requests which are satisfied
using fast path algoritm, but we mark them using MDL_ticket::m_is_fast_path
member, so we know that such ticket can be released in simplified fashion
without need to remove from MDL_lock::m_granted list.

In some cases we will need to do so called "materialization" of fast path
tickets which will clear this flag, add ticket to appropriate m_granted
list and decrement corresponding m_fast_path_granted_count counter under
protection of MDL_lock::m_rwlock. These cases are marked by (***) below.

Let us analyze places which can be affected by the above transformations:

1) The fact that we no longer include "DML" lock in some cases into
   MDL_lock::m_granted affects:
   
   a) Naturally, MDL_context::try_acquire_lock_impl() and
      MDL_context::release_lock() where this logic is to be implemented.
      
   b) MDL_lock::is_empty() should take value of m_fast_path_granted_count
      into account.
   
   c) Manipulation with MDL_lock::reschedule_waiters() is not affected -
      since we only add locks to m_granted there.
      
   d) MDL_lock::can_grant_lock():
      *) In addition to m_granted.bitmap() we should take into account
         contents of m_fast_path_granted_count when looking at granted
         locks.
      *) For code determining if incompatible lock belongs to same context
         to work properly we need to consider two cases:
         I) We are trying to acquire "DDL" lock. In this case for check to
            work we need to materialize "fast path" locks in advance.
            (e.g. consider acquiring X after S). (***)
         II) We are acquiring "DML" lock. Since such type of lock can only
             conflict with locks from "DDL" set and such locks will always
             present in "m_granted" explicitly, there is no issue here.
             (e.g. consider acquiring SW after SNW).

   e) MDL_lock::clone_ticket() needs to be adjusted to take into account
      "fast path" tickets.

   f) MDL_object_lock::notify_conflicting_locks().
      Nowadays this method does something only for tickets belonging to
      contexts with MDL_context::m_needs_lock_thr_abort set. I.e. contexts
      with open HANDLERs. We can always use slow path/materialize tickets
      for such contexts. (***)
   g) MDL_scoped_lock::notify_conflicting_locks().
      Can be removed after removal of INSERT DELAYED functionality.
   h) MDL_context::upgrade_shared_lock(). Needs to be adjusted to take into
      account both that new lock can be of "DML" type and that lock which
      is upgraded can be "fast path" lock.
   i) MDL_lock::visit_subgraph(). For deadlock detection to work properly
      we need to "materialize" all tickets belonging to context which is
      about to start waiting in MDL or other cases covered by our deadlock
      detector. (***)
   j) MDL_ticket::downgrade_lock(). We downgrade only "DDL" locks so we
      don't need to do anything about "m_granted" here.

2) The fact that we need to properly reflect number of active and pending
   "DDL" locks in m_ddl_locks_granted_pending_count might affect the
   following places where we modify MDL_lock::m_granted and
   MDL_lock::m_waiting:

   a) MDL_context::try_acquire_lock_impl() and MDL_context::release_lock()
      need to increment and decrement this counter when appropriate.
   b) MDL_lock::reschedule_waiters() is not affected as we only make
      pending lock granted there.
   c) MDL_lock::clone_ticket() should increment
      m_ddl_locks_granted_pending_count if "slow" type of lock ticket
      is cloned.
   d) MDL_context::upgrade_shared_lock() should be extended to handle
      m_ddl_locks_granted_pending_count when upgrading from "slow" ticket.
   e) MDL_ticket::downgrade_lock() should be extended to decrement
      m_ddl_locks_granted_pending_count if we are downgrading to "DML" type
      of lock.
   f) MDL_context::acquire_lock() should be extended to increment
      m_ddl_locks_granted_pending_count when we are adding or removing "DDL"
      type of lock to/from m_waiting list.

The above items can be used as a draft plan for unit tests covering the
suggested changes.