WL#7305: Improve MDL scalability by using lock-free hash

Affects: Server-5.7 — Status: Complete

Scalability of MDL subsystem for workloads where MDL_map_partition::m_mutex
becomes bottleneck can be improved by changing MDL_map to use lock-free hash
instead of our normal HASH implementation protected by mutex.

More importantly such a transition opens the way to making MDL acquisition for
DML statements lock-free.

Draft patch implementing the latter step (which includes transition to lock-free
hash) has shown promising performance results in preliminary benchmarks (e.g. in
some cases performance was on par with server version running with MDL disabled).

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-4.html

Functional:

FR1) --metadata_locks_cache_size and --metadata_locks_hash_instances startup
     parameters should no longer have any effect and should be marked as
     deprecated/non-working in help message. Attempt to specify their values
     leads to deprecation warning in error log.

Non-functional:

NF1) The only non-functional requirement is that there should not be any
     significant performance regression in standard SysBench tests.

None needed. See LLD for design.

The idea
========
Note that the main benefit of this task is that it opens the way
for implementing WL#7306 which brings significant improvement to
MDL performance/scalability in some scenarios.

The basic idea behind this task is to change MDL_map implementation
to use LF_HASH instead of partitioned HASH container, where each
partition is protected by individual mutexes.

Nice results of such a change:

- Since on systems with atomic support LF_HASH is lock-free we can
  remove MDL_map_partition::m_mutex and potential concurrency
  bottleneck associated with it.
- For the same reason it doesn't make sense to partition LF_HASH.
  So we can return to scheme with one hash for the whole MDL_map
  and remove MDL_map_partition class and mdl_locks_hash_partitions
  start-up parameter.
- Thanks to the fact that LF_HASH is integrated with LF_ALLOCATOR
  and uses per-thread hazard pointers to avoid objects in the hash
  to be deleted immediately after they were looked up, we can get
  rid of all MDL_map/MDL_lock machinery responsible for reference
  counting (i.e. MDL_lock::m_ref_usage/m_ref_release/m_version).
- We also won't need MDL_map_partition::m_unused_locks_cache as
  LF_ALLOCATOR has its own mechanism for caching objects which are
  expensive to create/destroy.

Since it is tricky to use LF_HASH with objects of different types
stored in LF_ALLOCATOR, implementing the above changes will require
getting rid of MDL_object_lock/MDL_scoped_lock dichotomy. This can
be done by moving out their differences to strategy class referenced
from MDL_lock object by pointer.


Details
=======

Changes to LF_HASH/LF_ALLOCATOR
-------------------------------
LF_HASH/LF_ALLOCATOR combination is more suited for storing POD types
and not classes, so it needs to be adjusted to properly support storage
of MDL_lock objects. The problem is that LF_HASH assumes that during
insert operation is to OK to initialize at least initial part of the
object stored in hash using memcpy() and thus this initial part should
have POD-property. The remainder of object is initialized and
deinitialized by special LF_ALLOCATOR hooks so it can be non-POD
(e.g. contain mutexes). Due to optimization, objects which were deleted
from the hash can be reused for newly inserted objects without passing
through deinit/init hooks, so one needs to ensure that non-POD part of
object deleted from LF_HASH is always in pristine shape.

Here we have a choice:
a) either to adjust MDL_lock to make it suit the above pattern (i.e. keep 
POD part at the beginning, move non-POD part to the tail and ensure it
returns to initial state at the point when object is deleted from the hash)
b) or extend LF_HASH implementation to use a hook instead of memcpy for
finalizing initialization during insert.

Note that to implement option a) cleanly we, AFAIU, need to make MDL_lock
a standard-layout class (in addition to making its initial part trivially
copyable. Currently it is not standard-layout). Particularly, this means
that we need to ensure that all classes used by MDL_lock (including list
templates!) have standard-layout.

Also we should consider adjusting LF_HASH to support user-specified hash
function (e.g. MurmurHash3) as it is done for HASH container.

Changes to MDL_lock
-------------------
As it was mentioned above in addition/order to do the above changes we
need to get rid of MDL_scoped_lock/MDL_object_lock dichotomy. This is
doable since we no longer need MDL_object_lock specific members related
to MDL_map_partition::m_unused_locks_cache and since all MDL_lock's
virtual functions can be moved to helper strategy class, referenced
from MDL_lock. In fact, instead of having hierarchy consisting of
abstract strategy and two descendant classes for concrete strategies
we can have a single strategy class, which can contain pointers to
static member functions of MDL_lock class and array members for
MDL_lock::m_granted_incompatible arrays and alike + two instances of the
class for the concrete strategies. Such approach with hand-implemented
vtable actually allows to save a few memory dereferences when accessing
strategy functions/members.

Here is example illustrating this idea.

Current code looks like this:

class MDL_lock
{
public:
  ...
  virtual const bitmap_t *incompatible_granted_types_bitmap() const = 0;
  ...
  virtual bitmap_t fast_path_granted_bitmap() const = 0;
  ...
};

class MDL_scoped_lock : public MDL_lock
{
public:
  ...
  virtual const bitmap_t *incompatible_granted_types_bitmap() const
  {
    return m_granted_incompatible;
  }
  ...
  virtual bitmap_t fast_path_granted_bitmap() const
  {
    return m_fast_path_granted_packed_count ?
           MDL_BIT(MDL_INTENTION_EXCLUSIVE) : 0;
  }
  ...
private:
  static const bitmap_t m_granted_incompatible[MDL_TYPE_END];
};

And it is converted to:

class MDL_lock
{
public:
  ...
  const bitmap_t *incompatible_granted_types_bitmap() const
  {
    return m_strategy->m_granted_incompatible;
  }
  ...
  bitmap_t fast_path_granted_bitmap() const
  {
    return m_strategy->fast_path_granted_bitmap(this);
  }
  ...

  static bitmap_t fast_path_granted_bitmap(const MDL_lock *lock)
  {
    return lock->m_fast_path_granted_packed_count ?
MDL_BIT(MDL_INTENTION_EXCLUSIVE) : 0;
  }
  ...
private:
  ...
  MDL_lock_strategy *m_strategy;
};

class MDL_lock_strategy
{
public:
  ...
  bitmap_t m_granted_incompatible[MDL_TYPE_END];
  ...
  bitmap_t (*fast_path_granted_bitmap) (const MDL_lock *lock);
};

MDL_lock_strategy scoped_lock_strategy =
{
  ...
  { // m_granted_incompatible
    MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_SHARED),
    MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_INTENTION_EXCLUSIVE), 0, 0, 0, 0, 0, 0,
    MDL_BIT(MDL_EXCLUSIVE) | MDL_BIT(MDL_SHARED) | MDL_BIT(MDL_INTENTION_EXCLUSIVE)
  },
  ...
  &MDL_lock::scoped_fast_path_granted_bitmap,
  ...
};


Changes to MDL_context
----------------------
Each thread accessing or updating LF_HASH needs to allocate an appropriate
number pins (aka hazard pointers) from its LF_ALLOCATOR pinbox and pass
references to these pins to LF_HASH functions.

This is easy to achieve if we will add a MDL_context::m_pins member for
storing pointer to pins object and allocate pins before trying to call
any of LF_HASH functions (and store in this member for later reuse).
Indeed, pins should be released back to pinbox when MDL_context object
is destroyed.

Changes to MDL_map::find_or_insert()
------------------------------------
Changes to this function is fairly straighforward and can be described
with the following pseudo-code:

MDL_map::find_or_insert(key)
{
  // Code handling MDL_lock objects for GLOBAL and COMMIT namespaces

retry:
  while ((lock= lf_hash_search(hash, pins, key)) != NULL)
  {
    /*
      MDL_lock for key isn't present in hash, try to insert new object.
      This can fail due to concurrent inserts.
    */
    lf_hash_insert(hash, pins, new_object_for_key);
  }

  /*
    We have pointer to MDL_lock for the key here which is pinned, so it
    can't be deleted until we unpin it (but it can be marked as destroyed).
  */
  lock(lock->m_rwlock);

  /*
    Check if hash contained object marked as destroyed or it was
    marked as such while it was pinned by us.
  */
  if (lock->is_destroyed)
  {
    unlock(lock->m_rwlock);
    /*
      We can't unpin object earlier as lf_hash_delete() might have
      been called for it already and so LF_ALLOCATOR is free to
      deallocate it once unpinned.
    */
    unpin(pins);
    goto retry;
  }
  /*
    Object was not marked as destroyed.
    Since it can't be deleted from hash and deallocated until this
    happens we can unpin it and work with it safely while
    MDL_lock::m_rwlock is held.
  */
  unpin(pins);

  return lock;
}

Note that above pseudo-code is simplified and doesn't take into account the
fact that MDL_lock objects for GLOBAL and COMMIT namespaces are statically
allocated singletons and thus don't need unpinning.

Changes to MDL_map::remove()
----------------------------
Again changes to this function can be described by the following
pseudo-code:

MDL_map::remove(lock)
{
  // Code handling MDL_lock objects for GLOBAL and COMMIT namespaces

  /*
    Mark the object as destroyed, after this point other threads
    can't rely on object being around unless it is pinned.
  */
  lock->m_is_destroyed= true;

  unlock(lock->m_rwlock);

  /*
    Even though other threads can't rely on MDL_lock being around
    once MDL_lock::m_destroyed is set, we know that it was not
    removed from the hash yet (as it is responsibility of current
    thread) and thus was not deallocated.
    And since lf_hash_delete() finds and pins the object for the
    key as its first step and keeps pins until its end it is safe
    to use MDL_lock::key as parameter to lf_hash_delete().
  */
  lf_hash_delete(hash, pins, lock->key);
}

User visible changes
--------------------
Since we no longer partition MDL_map and don't use caching for MDL_lock
objects metadata_locks_cache_size and metadata_locks_hash_instances
parameters become unnecessary. The are no longer have any effect,
marked and documented as deprecated (and emit appropriate warnings)
and should be removed eventually.