WL#3726: DDL locking for all metadata objects

Affects: Server-5.5   —   Status: Complete

NOTE: This task is needed for implementation of foreign
keys. Progress reports for this work can be found in
WL#148.


Implement DDL locking for all metadata objects.
This is necessary to solve BUG#25144 "replication / binlog with view breaks".
Resolution of the architecture meeting in Stockholm on BUG#25144 and BUG#989:
Extend open_cache to be:

1) transaction-wide
2) supports all objects types:
  * base tables
  * views
  * stored procedures and functions
  * time zone tables
  * databases
3) designed to contain only relevant data

Version: 5.1 (changed to version 6.0 after discussion; this change is considered
too risky for 5.1 at this late stage).

This worklog only implements item 2) and 3).
Please see WL#4284 for support of metadata locks across multiple statements in a
transaction.
Table of contents
=================

Introduction

Part I -- The new MDL API.
 A. The new API
   A-1 Main data structures
   A-2 Initialization and destruction
   A-3 Acquirer's primitives to work with lock objects
   A-4 Acquisition of locks
   A-5 Releasing locks
   A-6 Utility functions
 B. Deployment of the API.
 C. Metadata lock on a view.

Part II -- Transactional locks.

Appendix A. Future work.

Introduction
============

This worklog task consists of two parts.

Part I introduces a new API for metadata locking.
Its goal is to encapsulate locking functionality in a standalone
and extensible module. This part also aligns implementation of
auto-repair with the locking protocol supported by the system
(for details see Deployment of the API).
It generalizes the concept of a metadata lock and
allows to lock any object type, including views and stored
programs. Finally, it fixes BUG#25144, which root cause is in
absence of metadata locking for views.

Part II introduces transactional metadata locks, i.e. extends life
span of a metadata lock from one statement to one transaction.
It's primary goal is to fix BUG#989, which occurred because
currently MySQL keeps no metadata locks for tables used in
a transaction.  Note: Part II is not yet ready and is not
submitted for a review.

Please refer to MySQL internals manual for the description of the
current system and terminology:
http://forge.mysql.com/w/images/0/0a/Mdl.pdf.

Part I Metadata locking API.
============================

A. The new API.
---------------

Implementation of metadata locks is moved from a table cache into
a standalone module that has no dependencies on other parts of the
system.
The interface of the module is clearly defined: functions are
provided to acquire shared and exclusive locks and release locks.
The current locking protocol and algorithms, however, are kept
mostly unchanged: improvement of locking algorithms is not 
the goal of this task.

Since MySQL metadata locking relies on a number of external
invariants to resolve deadlocks, the module communicates with the
rest of the server by means of 2 identified external functions,
used at particular moments of lock acquisition.

Correct acquisition of metadata locks ensures consistency of the
data dictionary, statement and row based replication. (See
chapter "What metadata locks protect" in metadata locking
documentation).
The metadata locks are acquired for every operation on the
database state. They are intended to be acquired first order
during statement execution and thus the API requires nothing but
object names to operate (compare with the current API which
requires a TABLE_LIST instance to represent a lock request).

A-1 Main data structures
------------------------

/**
  Context of the owner of metadata locks. I.e. each server connection
  has such a context.
*/
struct MDL_CONTEXT;

/**
  A pending lock request or a granted metadata lock.
  A lock is requested or granted based on a fully
  qualified name and type. E.g. for a table the key
  consists of <0 (=table)>++.
  Later in this document this triple will be referred to
  simply as "key" or "name".
*/
struct MDL_EL;

/**
  The lock context. Created internally for an acquired lock.
  For a given name, there exists only one
  MDL_LOCK instance, and it exists only when the lock has been
  granted. Can be seen as an MDL subsystem's version of
  TABLE_SHARE.
*/
struct MDL_LOCK;

A-2 Initialiaztion and destruction
----------------------------------

/**
  Initialize the metadata locking subsystem.

  This function is called at server startup.

  In particular, initializes the new global mutex and
  the associated condition variable: LOCK_mdl and COND_mdl.
  These locking primitives are implementation details of the MDL
  subsystem and are private to it.

  Note, that even though the new implementation adds acquisition
  of a new global mutex to the execution flow of almost every SQL
  statement, the design capitalizes on that to later save on 
  look ups in the table definition cache. This leads to reduced
  contention overall and on LOCK_open in particular.
  Please see the description of mdl_acquire_shared_lock() for details.
*/
void mdl_init();

/**
  Release resources of metadata locking subsystem.

  Destroys the global mutex and the condition variable.
  Called at server shutdown.
*/
void mdl_destroy();
/**
  Initialize a metadata locking context.
  This is to be called when a new server connection is created.
  This is a no-cost operation to just reset to NULL the list
  of lock requests that this originator currently has.
*/
void mdl_context_init(MDL_CONTEXT *context);

/**
  Destroy metadata locking context.

  Assumes and asserts that there are no active or pending locks
  associated with this context at the time of the destruction.

  Currently does nothing. Asserts that there are no pending
  or satisfied lock requests. The pending locks must be released
  prior to destruction. This is a new way to express
  the assertion that all tables are closed before
  a connection is destroyed.
*/
void mdl_context_destroy(MDL_CONTEXT *context);

A-3 Acquirer's primitives to work with lock objects
---------------------------------------------------

/**
  Initialize a lock request.

  This is to be used for every lock request.

  Note that initialization and allocation are split
  into two calls. This is to allow flexible memory management
  of lock requests. Normally a lock request is stored
  in statement memory (e.g. is a member of struct TABLE_LIST),
  but we would also like to allow allocation of lock
  requests in other memory roots, for example in the grant
  subsystem, to lock privilege tables.

  The MDL subsystem does not own or manage memory of lock
  requests. Instead it assumes that the life time of every lock
  request encloses calls to mdl_acquire_shared_lock() and
  mdl_release_locks().

  @param  mdl        Pointer to an MDL_EL object to initialize
  @param  key_buff   Pointer to the buffer for key for the lock request
                     (should be at least strlen(db) + strlen(name)
                     + 2 bytes, or, if the lengths are not known,
                     MAX_DBNAME_LENGTH)
  @param  type       Id of type of object to be locked
  @param  db         Name of database to which the object belongs
  @param  name       Name of of the object

  Stores the database name, object name and the type in the key
  buffer. Initializes mdl_el to point to the key.
  We can't simply initialize mdl_el with type, db and name
  by-pointer because of the underlying HASH implementation
  requires the key to be a contiguous buffer.

  Suggested lock types: TABLE - 0 PROCEDURE - 1 FUNCTION - 2
  Note that tables and views have the same lock type, since
  they share the same name space in the SQL standard.
*/

void mdl_init_lock(MDL_EL *mdl_el, char *key_buff, int type, const
                   char *db, const char *name);

/**
  Allocate and initialize one lock request.
  Same as mdl_init_lock(), but allocates the lock and the key buffer
  on a memory root. Necessary to lock ad-hoc tables, e.g.
  mysql.* tables of grant and data dictionary subsystems.

  @param  type       Id of type of object to be locked
  @param  db         Name of database to which object belongs
  @param  name       Name of of object
  @param  root       MEM_ROOT on which object should be allocated

  @retval 0      Error
  @retval non-0  Pointer to an object representing a lock request
*/

MDL_EL *mdl_alloc_lock(int type, const char *db, const char *name,
                       MEM_ROOT *root);
/**
  Add a lock request to the list of lock requests of
  the context.

  The procedure to acquire metadata locks is:
  - allocate and initialize lock requests
  (mdl_alloc_lock())
  - associate them with a context
  (mdl_add_lock())
  - call mdl_acquire_shared_lock()/mdl_release_lock()
  (maybe repeatedly).

  Associates a lock request with the given context. Note, that
  since we continue using pre-locking and try-and-back-off
  techniques, this function asserts that there are no granted
  locks when a new lock request is added to the context.

  @param  context    The MDL context to associate the lock with.
                     There should be no more than one context per
                     connection, to avoid deadlocks.
  @param  lock       The lock request to be added.
  @param  exclusive  Type of the lock requested
*/
void mdl_add_lock(MDL_CONTEXT *context, MDL_EL *lock, bool exclusive);

/**
   Clear all lock requests in the context (clear the context).

   Disassociates lock requests from the context.
   All granted locks must be released prior to calling this
   function.

   In other words, the expected procedure to release locks is:
   - mdl_release_locks();
   - mdl_remove_all_locks();

   We could possibly merge mdl_remove_all_locks() and
   mdl_release_locks(), but this function comes in handy
   when we need to back off: in that case we release all
   the locks acquired so-far but do not free them, since
   we know that the respective lock requests will be used
   again.

   @param context Context to be cleared.
*/
void mdl_remove_all_locks(MDL_CONTEXT *context);

A-4 Acquisition of locks
------------------------

/**
   Try to acquire one shared lock.

   Unlike exclusive locks, shared locks are acquired one by
   one. This is interface is chosen to simplify introduction of
   the new locking API to the system. mdl_acquire_shared_lock()
   is currently used from open_table(), and there we have only one
   table to work with.

   In future we may consider allocating multiple shared locks
   at once.

   This function must be called after the lock is added to
   a context. There must be no prior acquired exclusive lock in
   the context.

   @param lock  Lock request object for lock to be acquired

   @retval  FALSE   success
   @retval  TRUE    a conflicting lock exists. Another attempt
                    should be made after releasing all current
                    locks and waiting for conflicting lock go
                    away (using mdl_wait_for_locks()).
*/

bool mdl_acquire_shared_lock(MDL_CONTEXT *context, MDL_EL *lock);

Pseudocode:
{
  

  let mdl_lock = look up an MDL_LOCK in the HASH
                 using object name
  if (mdl_lock is found)
  {
    if (there is an exclusive lock, granted or pending)
    {
      /*
        We have other locks granted on this name
        and must back off.
      */
      return TRUE;
    }
  }
  else
  {
    insert a new MDL_LOCK into the HASH;
  }
  Append the lock to the list of granted locks in MDL_LOCK
  return FALSE;
}

/**
  A callback to the server internals that is used to address
  special cases of the locking protocol.
  Invoked when acquiring an exclusive lock, for each thread that
  has a conflicting shared metadata lock.

  This function:

  - aborts waiting of the thread on a data lock, to make it notice
    the pending exclusive lock and back off.
  - if the thread has an open HANDLER to the table in question,
    closes the cursor of the HANDLER
  - if the thread is an INSERT DELAYED thread, sends it a KILL
    signal to terminate it.

  @note This function does not wait for the thread 
  to give away its locks. Waiting is done outside for all threads
  at once.

  @param in_use The thread to wake up

  @retval  TRUE  if the thread was woken up
  @retval  FALSE otherwise (e.g. it was not waiting for a table-level lock).

  @note It is one of two places where border between MDL and the rest of
        the server is broken.
*/

extern bool notify_thread_having_shared_lock(THD *in_use)


/**
  Acquire exclusive locks. The context must contain the list of
  locks to be acquired. There must be no granted locks in the
  context.

  This is a replacement of lock_table_names(). It is used in
  RENAME, DROP and other DDL SQL statements.

  @param context  A context containing requests for exclusive locks
                  The context may not have other lock requests.
*/
void mdl_acquire_exclusive_locks(MDL_CONTEXT *context);

Pseudocode:
{
  

  foreach (mdl_el: request for an exclusive lock from the context)
  {
    Assert: the request is for an exclusive lock.

    let mdl_lock = look up an MDL_LOCK in the HASH using object name

    if (mdl_lock is NULL)
    {
      add a new mdl_lock instance to the cache
    }
    append the lock request to the list of pending exclusive locks
  }

  while (there is some competing shared or exclusive lock)
  {
    foreach (mdl_el: request for an exclusive lock from the context)
    {
      let mdl_lock = mdl_el->lock;
      if (there are active readers in mdl_lock)
      {
        foreach (mdl_el_dep: reader in mdl_lock)
          notify_thread_having_shared_lock(mdl_el_dep->thd);
      }
      else if (there is a pending exclusive lock request ahead of
               us in the lsit or a granted exclusive lock)
        need_wait= TRUE;
    }
    if (need_wait)
      wait_on(LOCK_mdl, COND_mdl);
  }
  foreach (mdl_el: request for an exclusive lock)
  {
    move the lock from "pending list" to "granted list";
  }
}

/**
  Upgrade a shared metadata lock to exclusive.

  Used in ALTER TABLE, when a copy of the table with the
  new definition has been constructed.

  @pre To avoid deadlocks, this must be the only lock in the
  context.

  @param context Context to which shared long belongs
  @param type    Id of object type
  @param db      Name of the database
  @param name    Name of the object
*/
void mdl_upgrade_shared_lock_to_exclusive(MDL_CONTEXT *context, int type,
                                          const char *db, const char *name);

Pseudocode:
{
  

  let mdl_lock = look up the lock object by name

  add lock type to the list of pending exclusive locks
  remove lock type from the list of granted shared locks

  while (there is some competing shared or exclusive lock)
  {
    if (there are active readers in mdl_lock)
    {
      foreach (mdl_el_dep: reader in mdl_lock)
        notify_thread_having_shared_lock(mdl_el_dep->thd);
    }
    else if (there is a pending exclusive lock request ahead of us or
             a granted exclusive lock)
      need_wait= TRUE;
    if (need_wait)
      wait_on(LOCK_mdl, COND_mdl);
  }
  move the lock from "pending list" to "granted list";
}

/**
  Try to acquire an exclusive lock on the object if there are
  no conflicting locks.

  Similar to the previous function, but returns
  immediately without any side effect if encounters a lock
  conflict. Otherwise takes the lock.

  This function is used in CREATE TABLE ... LIKE
  to acquire a lock on the table to be created.
  In this statement we don't want to block and wait for the lock if
  the table already exists.

  @param context The context containing the lock request
  @param lock    The lock request

  @retval FALSE the lock was granted
  @retval TRUE  there were conflicting locks.
*/
bool mdl_try_acquire_exclusive_lock(MDL_CONTEXT *context, MDL_EL *lock);

/**
  Wait until there are no old versions of tables in the
  table definition cache for the locks that we try to
  acquire.

  This is the second point of interaction of the meta-data
  locking with the rest of the server.

  It is used by mdl_wait_for_locks(), which is called
  by the acquirer after a backs off.

  In other words, if mdl_acquire_shared_lock() returns
  TRUE the acquire must call:
  - mdl_release_locks()
  - mdl_wait_for_locks()
  and then try again.

  @param names Names of tables to be waited for
*/
extern void tdc_wait_for_old_versions(List *names);

/**
  Wait until there will be no locks that conflict with lock
  requests in the context.

  This is a part of the locking protocol and must be
  used by the acquirer of shared locks after a back-off.

  Does not acquire the locks!

  @param context Context with which lock requests are associated.
*/
void mdl_wait_for_locks(MDL_CONTEXT *context);

A-5 Releasing locks
-------------------

/**
   Release all locks associated with the context,
   but leave them in the context as lock requests.

   This function is used to back off in case of a lock
   conflict. It is also used to release shared locks
   in the end of an SQL statement.

   @param context The context with which the locks to be
                  released are associated.
*/
void mdl_release_locks(MDL_CONTEXT *context);

Pseudocode:
{
  
  foreach (mdl_el: lock in the context)
  {
    if (the lock is granted)
      move the lock from "granted list" to "pending list";
  }
  broadcast(COND_mdl);
}

/**
   Release all exclusive locks associated with context.
   Removes the locks from the context.

   @param context Context with exclusive locks.

   @note Shared locks are left intact.
*/
void mdl_release_exclusive_locks(MDL_CONTEXT *context);

/**
  Release a lock.
  Removes the lock from the context.

  @param context Context containing lock in question
  @param lock    Lock to be released
*/
void mdl_release_lock(MDL_CONTEXT *context, MDL_EL *lock);
/**
  Downgrade all exclusive locks in the context to
  shared.

  @param context A context with exclusive locks.
*/
void mdl_downgrade_exclusive_locks(MDL_CONTEXT *context);

A-6 Utility functions
--------------------

/**
  Associate pointer to an opaque object with a lock.

  @param l             Lock request for the lock with which the
                       object should be associated.
  @param cached_object Pointer to the object
  @param release_hook  Cleanup function to be called when MDL subsystem
                       decides to remove lock or associate another object.

 This is used to cache a pointer to TABLE_SHARE in the lock
 structure. Such caching can save one acquisition of LOCK_open
 and one table definition cache lookup for every table.

 Since the pointer may be stored only inside an acquired lock,
 the caching is only effective when there is more than one lock
 granted on a given table.

 This function has the following usage pattern:
 - try to acquire an MDL lock
 - when done, call for mdl_get_cached_object(). If it
 returns NULL, our thread has the only lock on this table.
     -  look up TABLE_SHARE in the table definition cache
     -  call mdl_set_cache_object() to assign the share
     to the opaque pointer.

 The release hook is invoked when the last shared metadata
 lock on this name is released.
*/

void mdl_set_cached_object(MDL_EL *l, void *cached_object,
                           mdl_cached_object_release_hook release_hook);

/**
  Get a pointer to an opaque object that associated with the lock.

  @param  l Lock request for the lock with which the object is
            associated.

  @return Pointer to an opaque object associated with the lock.
*/
void *mdl_get_cached_object(MDL_EL *l);


Part I. B. Deployment of the API
================================

The new API obsoletes the following functions:

- lock_table_name(), lock_table_names(), unlock_table_names(),
  lock_table_names_exclusively(), lock_table_name_if_not_cached(),
  lock_and_wait_for_table_name(),
  table_cache_insert_placeholder(),
  is_table_name_exclusively_locked_by_this_thread(),
  mysql_wait_completed_table(), get_name_lock(),
  release_name_lock()


- since the locking role of the table cache was moved
  to the MDL subsystem, it became possible to merge the table
  cache and the table definition cache. All open tables
  for a given name are attached to the respective TABLE_SHARE.
  A list of unused TABLE instances is still maintained
  in unused_list and is purged on FIFO basis.

  Used and unused TABLE elements are now attached to their
  respective TABLE_SHAREs, i.e. elements of the table definition
  cache.  The following structures and functions were removed:
  open_cache, table_cache_key(), table_cache_init(),
  table_cache_destroy(), table_cache_free(),
  remove_db_from_cache() TABLE_SHARE elements do not have to be
  looked up in the table definition cache in all  cases: they are
  attached to the metadata lock if there is one by means of
  mdl_{get,set}_cached_object() API.

  This addresses the performance bottleneck identified
  in BUG#33948.

- auto-repair implementation is rewritten to not acquire
  an exclusive metadata lock from within open_table().
  Since acquisition of an exclusive lock must wait for
  other threads to give away their locks, theoretically
  this could lead to deadlocks. Even though it's practically
  impossible to trigger this situation with auto-repair, the
  change was done to fullfill the assertions of the implemented
  locking protocol. 
  Instead of starting auto-repair, open_table() now returns
  a special return code, that causes the thread close
  all currently open tables, repair the table in question and
  try to reopen the tables after that.

Part I. C. A metadata lock on a view
====================================

Since views and base tables share the same name space according
to the SQL standard, a metadata lock on a view has the same
type (0) as one on a table. Acqusition and release primitives
are identical as well.

Part II. Transactional locks
============================



Appendix A: Future work
=======================

Most todo from this list items will be addressed in separate
tasks.  They are collected here to provide a roadmap
for metadata locking.

1. INFORMATION_SCHEMA and NDB cluster binlog invokes NDB auto-discovery
frivolously: ndbcluster_find_files() is called while keeping other
metadata locks. This may lead to deadlocks.

Auto-discovery should be re-written to not be invoked while keeping
other metaddata locks.

Requires architecture work and prototyping.

2. Auto-discovery code invoked inside open_unireg_entry() (Data Dictionary,
addition of a table to the table definition cache).

The same disease as in item 1.

DONE

3. TRUNCATE implementation. The current code first accesses the
data dictionary to detect the storage engine of the subject table
on hand, and only then acquires an exclusive metadata lock. This
is an opportunity for a race condition and must be eliminated.

A minor code change that can be done in a separate patch.

4. ALTER VIEW implementation. A similar gap is present
between an access to the data dictionary to establish the table type
(a view or a base table) and an exclusive metadata lock.

A minor code change that can be done in a separate patch.

5. INFORMATION_SCHEMA should use MDL subsystem to acquire
special metadata locks on used tables. A new lock type will be
introduced for this purpose.

A fairly substantial change but has been approved by the
architecture board in Orlando. Requires specification and
implementation.

DONE

6. LOCK TABLE t1 WRITE should acquire an shared-for-upgrade
lock type on the table.

When this is done, FLUSH TABLE or DROP TABLE under LOCK TABLES
can upgrade the lock type to EXCLUSIVE without having to
abort transactions, simply by waiting for all shared locks to go
away).

A minor change that can be addressed in a separate patch.
DONE

7. It should be possible to lock a table with LOCK TABLE statement
that does not yet exist.

At the same time, it should be impossible to use tables outside the
set locked with LOCK TABLES, even for DDL.

Example:

LOCK TABLE t1 WRITE, t2 WRITE; 
DROP TABLE t1;
ALTER TABLE t2 RENAME t1;
-- should be allowed (currently fails during ALTER)

LOCK TABLE t1;
DROP TABLE t2;
ALTER TABLE t1 RENAME t2;
-- should be prohibited

A minor change that can be addressed in a separate patch.
Since is an incompatible change, must be done in an alpha.
Should be done together with item 6.

8. Support for metadata locks for a database (schema), tablespace,
  user.

In order to support RENAME DATABASE and proper operation
with tablespaces, MySQL will need to implement metadata
locks on schemas, tablespaces, and other compound
objects.

Two alternative solutions have been identified:
 * do not support hierarchies inside MDL subsystem. If a database
 is used in a statement, lock it in shared mode just as if it
 was any other object type (table, procedure, view).

 * represent the list of locks requested or granted as a hierarchy
 or a tree. Allow to lock parts of this tree.

Requires prototyping and an approval of architects.
Will be addressed in a separate worklog task.