WL#6407: Code reorganization to avoid race condition between server main thread and the kill server thread

Affects: Server-5.7   —   Status: Complete

Many issues related to server shutdown and the ready_to_exit flag have been
fixed in the past. The flag is also believed to be the root cause of some
unexplained spurious failures in automated tests.
Refer to Bug#11763896 for more information.

Currently the Server Main Thread and the Kill Server Thread are synchronized
through the ready_to_exit flag. Once this flag is set, both threads perform
cleanup operations concurrently, which sometimes leads to a double free of
resources.

Resource cleanup (clean_up() and mysqld_exit()) is common code for abort and
shutdown. Cleanup is done from the main thread, the signal thread or the kill
server thread, and the work is distributed among these threads, which makes it
vulnerable to sporadic failures.

This worklog comprises the following activities:

a) Reorganize the code so that cleanup of resources is performed only by the
main thread, instead of the kill server thread.

b) In storage/perfschema/pfs_lock.h, the assert for double free is relaxed
(refer to function allocated_to_free()). Remove the ready_to_exit flag check in
this function.
One goal of this worklog is to enable currently disabled code in
shutdown_performance_schema and cleanup_performance_schema, and to remove all
the PFS-related valgrind suppression patterns.

c) Many mutexes, read-write locks and condition variables are defined in global
scope (in mysqld.cc), and mutex acquire/release calls are spread throughout the
code. This leads to multiple mutexes being used to guard the same global
variable, wrong order of acquisition leading to deadlock, etc. To avoid these
issues, these pthread primitives should be encapsulated in the global object
they guard, rather than being acquired/released all over the code.

User Documentation
==================

This is internal work only. No user docs required.

Existing Signal Handling Mechanism
----------------------------------
Signals such as SIGINT, SIGTERM and SIGQUIT are handled by a separate thread,
the signal handling thread (signal_thread), whereas SIGSEGV, SIGABRT, SIGILL
and SIGFPE are handled using the old-style signal handling mechanism, by
registering a signal handler named handle_fatal_signal().

Signals are blocked by the mysqld main thread, and all threads created by it,
including the client handling service threads, inherit this signal mask. The
exception is the signal handling thread, which explicitly waits on these signals
using the sigwait() function.
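
The mask-and-wait pattern described above can be sketched as follows. This is a
minimal illustration, not the server code: the function names
(termination_signals(), signal_thread_body(),
wait_for_termination_signal_demo()) are made up for the example, and SIGTERM is
delivered with pthread_kill() only so that the sketch is self-contained.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Termination signals handled by the dedicated thread (illustrative subset).
static sigset_t termination_signals()
{
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGINT);
  sigaddset(&set, SIGTERM);
  sigaddset(&set, SIGQUIT);
  return set;
}

// Body of the hypothetical signal handling thread: block in sigwait() until
// one of the termination signals arrives and report which one it was.
static void *signal_thread_body(void *arg)
{
  sigset_t set= termination_signals();
  int received= 0;
  int rc= sigwait(&set, &received);
  assert(rc == 0);
  (void) rc;
  *static_cast<int *>(arg)= received;
  return nullptr;
}

// Blocks the signals in the calling thread, starts the signal thread (which
// inherits the mask), sends SIGTERM to it and returns the signal it saw.
int wait_for_termination_signal_demo()
{
  sigset_t set= termination_signals();
  pthread_sigmask(SIG_BLOCK, &set, nullptr);

  int received= 0;
  pthread_t tid;
  pthread_create(&tid, nullptr, signal_thread_body, &received);
  pthread_kill(tid, SIGTERM);   // stays pending until sigwait() consumes it
  pthread_join(tid, nullptr);
  return received;
}
```

Because the signal is blocked in every thread, it is never delivered
asynchronously; it stays pending until the one thread sitting in sigwait()
picks it up.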

Existing Shutdown Mechanism
---------------------------
The signal handling thread is created using the DETACHABLE flag, and
synchronization between the signal handling thread, the kill server thread and
the mysqld main thread for cleanup is done using the ready_to_exit flag.

On receipt of a termination signal, the signal handling thread creates the kill
server thread (using the DETACHABLE flag), which is responsible for performing
cleanup of resources and shutdown of ancillary daemon threads.
The clean_up() function sets the ready_to_exit flag so that the mysqld main
thread continues with the rest of the cleanup operations and exits.

If a shutdown request is sent from a client (mysqladmin), the kill_mysql()
function is called, which in turn sends a termination signal to the signal
handling thread to initiate shutdown.
  
poll() and select() are called in blocking mode, so the abort_loop flag (set by
the signal handling thread or the kill_mysql() function during server shutdown)
is never checked while they block, and setting it does not break the accept
loop in the handle_connections_sockets() function.
In order to exit from the loop, either the signal handling thread or the kill
server thread closes/shuts down the server socket in the close_connections()
function.
   
unireg_abort() is called from the kill server thread for signals other than
SIGINT and SIGTERM coming from the signal handling thread. The unireg_abort()
function is designed to be called only from the main thread, in case of an error
during server startup. If called from any other thread, it may end up executing
mysqld_exit() concurrently with the main thread, leading to a double free of
mutexes and other resources handled by mysqld_exit().

In MySQL 5.1, kill_server() is registered as the signal handler for all signals,
so the probability of this race condition is higher in the 5.1 source base than
in 5.5 and trunk.


Changes to Shutdown Mechanism
------------------------------
The ready_to_exit flag is removed so that cleanup operations are no longer
performed by more than one thread. This is ensured by creating the signal
handling thread as joinable (PTHREAD_CREATE_JOINABLE) and joining it from the
mysqld main thread before the main thread performs cleanup. On Windows, the
shutdown handler thread is likewise created joinable and joined by the mysqld
main thread before cleanup.

The clean_up() function, which was executed from the signal handling thread or
the kill server thread, is moved to the server main thread, so that the thread
which allocates the resources also performs the cleanup.

The kill server thread is no longer required and will be removed:
close_connections() will be called by the signal thread on receipt of a
termination signal. On Windows, it will be called by the shutdown handler
thread.

After the changes, the high-level thread flow and functionality would be as
follows:

   mysqld main thread
          |
          |
      init_resources(mutex,etc)
          |
          | - - - - - - - - - - - - - ->create signal thread
          |                                     |
          |                               Wait for signal
   create_handlers/slaves/etc                   |
          |                             close_connections()
          |                                     |
    join signal thread<- - - - - - - - - - exit thread
          |
     cleanup_resources()
          |
      mysqld_exit()
        

After the signal handling thread exits, the handlers for fatal signals such as
SIGSEGV are not deregistered and remain available for debugging issues during
shutdown.
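
The ordering guarantee of the new flow can be sketched as below. This is an
illustration under stated assumptions, not the server code: close_connections()
and cleanup_resources() are stubs that only record that they ran,
shutdown_flow_demo() is a made-up driver, and SIGTERM is sent with
pthread_kill() to keep the example self-contained.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <string>
#include <vector>

// Order in which the steps ran. Only one thread writes at a time: the main
// thread before pthread_create() and after pthread_join(), the signal thread
// in between.
static std::vector<std::string> steps;

static void close_connections() { steps.push_back("close_connections"); }
static void cleanup_resources() { steps.push_back("cleanup_resources"); }

static sigset_t sigterm_set()
{
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGTERM);
  return set;
}

// Signal thread: wait for the termination signal, close connections, exit.
static void *signal_thread(void *)
{
  sigset_t set= sigterm_set();
  int sig= 0;
  sigwait(&set, &sig);
  close_connections();
  return nullptr;
}

std::vector<std::string> shutdown_flow_demo()
{
  steps.clear();
  steps.push_back("init_resources");

  sigset_t set= sigterm_set();
  pthread_sigmask(SIG_BLOCK, &set, nullptr);

  // Joinable instead of detached, so the main thread can wait for it.
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  pthread_t tid;
  int rc= pthread_create(&tid, &attr, signal_thread, nullptr);
  assert(rc == 0);
  (void) rc;
  pthread_attr_destroy(&attr);

  pthread_kill(tid, SIGTERM);    // stands in for an external shutdown request
  pthread_join(tid, nullptr);    // cleanup starts only after this returns
  cleanup_resources();
  return steps;
}
```

Since cleanup_resources() runs only after pthread_join() returns, no flag is
needed to decide which thread frees the resources.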

      
Other Modifications
--------------------
The MySQL implementation of pthread_join() for Windows does not work when the
thread to be joined has already finished.
Two new functions will be added. On non-Windows platforms they redirect to the
pthread functions; on Windows they emulate the following behavior:

  mythread_create() creates the thread using _beginthreadex() and returns
                    the handle.

  mythread_join()   takes the handle as input and waits for the thread to
                    finish.

These two functions will be used only for creating and joining the signal
handling thread.
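
One possible shape for these two wrappers is sketched below. The signatures,
the mythread_handle and mythread_start typedefs and the start_args trampoline
are assumptions made for the example; the worklog only states that
_beginthreadex() is used and that the handle is what gets joined.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#include <process.h>
typedef HANDLE mythread_handle;
#else
#include <pthread.h>
typedef pthread_t mythread_handle;
#endif

typedef void *(*mythread_start)(void *);

#ifdef _WIN32
// _beginthreadex() expects "unsigned __stdcall f(void*)", so adapt to it.
struct start_args { mythread_start func; void *arg; };

static unsigned __stdcall win_thread_start(void *p)
{
  start_args *args= static_cast<start_args *>(p);
  mythread_start func= args->func;
  void *arg= args->arg;
  delete args;
  func(arg);
  return 0;
}
#endif

// Creates a joinable thread. Returns 0 on success.
int mythread_create(mythread_handle *handle, mythread_start func, void *arg)
{
  assert(handle != nullptr);
#ifdef _WIN32
  start_args *args= new start_args;
  args->func= func;
  args->arg= arg;
  uintptr_t h= _beginthreadex(nullptr, 0, win_thread_start, args, 0, nullptr);
  if (h == 0)
  {
    delete args;
    return 1;
  }
  *handle= reinterpret_cast<HANDLE>(h);
  return 0;
#else
  return pthread_create(handle, nullptr, func, arg);
#endif
}

// Waits for the thread to finish. Returns 0 on success.
int mythread_join(mythread_handle handle)
{
#ifdef _WIN32
  // The handle stays valid after the thread has finished, so this also works
  // when the thread exited before the join.
  DWORD rc= WaitForSingleObject(handle, INFINITE);
  CloseHandle(handle);
  return rc == WAIT_OBJECT_0 ? 0 : 1;
#else
  return pthread_join(handle, nullptr);
#endif
}

static void *set_flag(void *arg)
{
  *static_cast<bool *>(arg)= true;
  return nullptr;
}

bool create_and_join_demo()
{
  bool ran= false;
  mythread_handle h;
  if (mythread_create(&h, set_flag, &ran) != 0)
    return false;
  return mythread_join(h) == 0 && ran;
}
```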

The assert in storage/perfschema/pfs_lock.h, which tolerated a double free when
the ready_to_exit flag was set, is modified. After the changes, a double free
triggers the assert.

The cleanup code in the performance schema (shutdown_performance_schema() and
cleanup_performance_schema() functions), which was commented out, is re-enabled.

The kill_in_progress and shutdown_in_progress flags are removed.

The killed_threads status variable is removed, as it is not displayed in
SHOW STATUS.

Impact
------
The following modules can be affected:
a) operating system related code
b) performance schema
c) embedded mode

Regression testing needs to be done on the above modules for all operating
systems.

Code Reorganization
-------------------
Two classes, Global_THD_manager and Blocked_thread_manager, will be added as
part of the code reorganization.

Global_THD_manager
------------------
Global_THD_manager is a singleton and encapsulates access to the global THD list
and the associated mutexes and condition variables. It maintains the set of all
registered threads (global_THD_list) and provides mutators for inserting
(add_thd()) and removing (remove_thd()) an element.
It also provides functions to count THDs, find a THD and perform an action on
all THDs: get_thd_count(), find_thd() and do_for_all_thd(). The LOCK_THD_count
mutex, which guards the THD list, is encapsulated within this class and is
acquired/released when the corresponding accessors or mutators are called.
COND_THD_count, which is used to notify/wait when threads are registered or
deregistered, is also moved to this class.

Thread-level statistics such as thread_created and num_thread_running are moved
from mysqld.cc to the Global_THD_manager class. The mutex and read/write lock
(LOCK_THD_count, thread_running_lock) which guard these statistics are
encapsulated in this class.

A) enum_thd_lock_type
---------------------

It is used in the Global_THD_manager::add_thd() and remove_thd() functions to
specify whether LOCK_THD_count should be acquired during the operation. It is
defined as follows:

enum enum_thd_lock_type {
   THD_NO_LOCK=0, //do not acquire LOCK_THD_count mutex
   THD_LOCK       //acquire LOCK_THD_count mutex
};
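
A minimal sketch of how the lock type could steer add_thd() and remove_thd() is
given below. Mini_THD_manager, the one-field THD stand-in and lock_type_demo()
are hypothetical, and std::mutex plays the role of LOCK_THD_count; the real
class uses mysql_mutex_t.

```cpp
#include <cassert>
#include <mutex>
#include <set>

struct THD { unsigned long thread_id; };   // stand-in for the server's THD

enum enum_thd_lock_type {
  THD_NO_LOCK= 0, // do not acquire the mutex, the caller already holds it
  THD_LOCK        // acquire the mutex inside the call
};

class Mini_THD_manager   // hypothetical, for illustration only
{
public:
  void acquire_thd_lock() { m_lock.lock(); }
  void release_thd_lock() { m_lock.unlock(); }

  void add_thd(THD *thd, enum_thd_lock_type lock_type)
  {
    if (lock_type == THD_LOCK)
      m_lock.lock();
    m_list.insert(thd);
    if (lock_type == THD_LOCK)
      m_lock.unlock();
  }

  void remove_thd(THD *thd, enum_thd_lock_type lock_type)
  {
    if (lock_type == THD_LOCK)
      m_lock.lock();
    m_list.erase(thd);
    if (lock_type == THD_LOCK)
      m_lock.unlock();
  }

  unsigned get_thd_count()
  {
    std::lock_guard<std::mutex> guard(m_lock);
    return static_cast<unsigned>(m_list.size());
  }

private:
  std::mutex m_lock;        // plays the role of LOCK_THD_count
  std::set<THD*> m_list;    // plays the role of the global THD list
};

unsigned lock_type_demo()
{
  Mini_THD_manager mgr;
  THD a= {1}, b= {2};
  mgr.add_thd(&a, THD_LOCK);        // manager takes the mutex itself
  mgr.acquire_thd_lock();
  mgr.add_thd(&b, THD_NO_LOCK);     // caller already holds the mutex
  mgr.remove_thd(&a, THD_NO_LOCK);
  mgr.release_thd_lock();
  return mgr.get_thd_count();       // only b is left
}
```

THD_NO_LOCK lets a caller group several list operations under one acquisition
of the mutex, at the cost of having to hold it correctly itself.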

B) Do_THD and Do_THD_Impl 
--------------------------
These two classes help implement the Global_THD_manager::do_for_all_thd()
method. To perform an action on every thd in the global thread list, the user
subclasses Do_THD_Impl and overrides operator().

class Do_THD_Impl
{
public:
  virtual ~Do_THD_Impl() {}
  virtual void operator()(THD*) = 0;
};

class Do_THD : public std::unary_function<THD*, void>
{
public:
  explicit Do_THD(Do_THD_Impl *impl) : m_impl(impl) {}

  void operator()(THD* thd)
  {
    m_impl->operator()(thd);
  }
private:
  Do_THD_Impl *m_impl;
};
  

In the current code, the following places perform actions on all thds in the
global thread list. They are rewritten to use the do_for_all_thd() method.

a) Adjust offset of binary log file for slaves (binlog.cc).
b) Count threads which are using the binary log file (binlog.cc).
c) Count the number of worker threads in the event scheduler
   (event_scheduler.cc).
d) Set KILL_CONNECTION flag on all thds (mysqld.cc).
e) Close vio connection for all thds (mysqld.cc).
f) List client process information (sql_show.cc).
g) I_S on client process (sql_show.cc).
h) Collect status of all threads (sql_show.cc).
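
As an illustration of the pattern, the sketch below applies an action in the
spirit of item (d) to a small list. The THD stand-in with its killed field,
Set_kill_flag and do_for_all_demo() are hypothetical, the list is a plain local
std::set, and Do_THD is written without the std::unary_function base because
that base was removed in C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

struct THD { unsigned long thread_id; bool killed; };   // stand-in type

class Do_THD_Impl
{
public:
  virtual ~Do_THD_Impl() {}
  virtual void operator()(THD*) = 0;
};

// Cheap, copyable wrapper handed to the STL algorithm.
class Do_THD
{
public:
  explicit Do_THD(Do_THD_Impl *impl) : m_impl(impl) {}
  void operator()(THD *thd) { m_impl->operator()(thd); }
private:
  Do_THD_Impl *m_impl;
};

// Hypothetical action: mark every thd as killed and count them.
class Set_kill_flag : public Do_THD_Impl
{
public:
  Set_kill_flag() : m_count(0) {}
  virtual void operator()(THD *thd)
  {
    thd->killed= true;
    ++m_count;
  }
  int count() const { return m_count; }
private:
  int m_count;
};

int do_for_all_demo()
{
  THD a= {1, false}, b= {2, false};
  std::set<THD*> list;
  list.insert(&a);
  list.insert(&b);

  Set_kill_flag action;
  std::for_each(list.begin(), list.end(), Do_THD(&action));
  assert(a.killed && b.killed);
  return action.count();
}
```

std::for_each() takes its functor by value. Because Do_THD only carries a
pointer to the implementation, state accumulated in the Do_THD_Impl subclass
(here the counter) survives that copy.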


C) Find_THD and Find_THD_Impl
-----------------------------
These two classes help implement the Global_THD_manager::find_thd() method.
To find a matching THD in the global thread list, the user subclasses
Find_THD_Impl and overrides operator() with the logic that identifies the THD.

class Find_THD_Impl
{
public:
  virtual ~Find_THD_Impl() {}
  virtual bool operator()(THD*) = 0;
};

class Find_THD : public std::unary_function<THD*, bool>
{
public:
  explicit Find_THD(Find_THD_Impl *impl) : m_impl(impl) {}

  bool operator()(THD* thd)
  {
    return m_impl->operator()(thd);
  }
private:
  Find_THD_Impl *m_impl;
};

Global_THD_manager::find_thd() acquires the LOCK_THD_count mutex before it
calls the overridden operator(). The LOCK_THD_data mutex of the matching THD
must be acquired while LOCK_THD_count is still held, to avoid a race condition.
A class which subclasses Find_THD_Impl should therefore acquire it inside its
operator() override.
Also note that the caller of Global_THD_manager::find_thd() needs to release
the LOCK_THD_data mutex. Please refer to the sample implementation given in
Section E for more information.

In the current code, the following places search for a thd in the global thread
list. They are rewritten to use the find_thd() method.
a) Find zombie dump thread in the global thd list (rpl_master.cc).
b) Find thd based on the thread id for the kill_one_thread() function
   (sql_parse.cc).
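
How find_thd() itself might be built on top of these classes is sketched below.
This is an assumption about the implementation, not a copy of it: the list and
mutex are file-level statics instead of members, std::mutex stands in for
mysql_mutex_t, and the THD stand-in, Find_by_id and find_thd_demo() are
hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <set>

struct THD   // stand-in type
{
  unsigned long thread_id;
  std::mutex LOCK_thd_data;
};

class Find_THD_Impl
{
public:
  virtual ~Find_THD_Impl() {}
  virtual bool operator()(THD*) = 0;
};

class Find_THD
{
public:
  explicit Find_THD(Find_THD_Impl *impl) : m_impl(impl) {}
  bool operator()(THD *thd) { return m_impl->operator()(thd); }
private:
  Find_THD_Impl *m_impl;
};

static std::mutex LOCK_thd_count;        // guards global_thd_list
static std::set<THD*> global_thd_list;

// Runs the predicate under LOCK_thd_count and returns the first match.
THD *find_thd(Find_THD_Impl *func)
{
  std::lock_guard<std::mutex> guard(LOCK_thd_count);
  std::set<THD*>::iterator it=
    std::find_if(global_thd_list.begin(), global_thd_list.end(),
                 Find_THD(func));
  return it == global_thd_list.end() ? nullptr : *it;
}

class Find_by_id : public Find_THD_Impl
{
public:
  explicit Find_by_id(unsigned long id) : m_id(id) {}
  virtual bool operator()(THD *thd)
  {
    if (thd->thread_id != m_id)
      return false;
    thd->LOCK_thd_data.lock();   // taken while LOCK_thd_count is still held
    return true;
  }
private:
  unsigned long m_id;
};

bool find_thd_demo()
{
  THD a, b;
  a.thread_id= 7;
  b.thread_id= 9;
  global_thd_list.insert(&a);
  global_thd_list.insert(&b);

  Find_by_id by_id(9);
  THD *found= find_thd(&by_id);
  bool ok= (found == &b);
  if (found)
    found->LOCK_thd_data.unlock();   // the caller releases LOCK_thd_data

  Find_by_id missing(42);
  ok= ok && find_thd(&missing) == nullptr;
  global_thd_list.clear();
  return ok;
}
```

The matching thd is handed back with its LOCK_thd_data held, so it cannot
change under the caller between the lookup and the first use.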


D) Global_THD_manager class
---------------------------
class Global_THD_manager
{
  public:
  static Global_THD_manager* get_instance();
  static void destroy_instance();
  
  /*
    Call func function for all thds in global thd list after
    taking local copy of global thd list. Acquires LOCK_thd_remove 
    to prevent removal from global_thd_list.
  */
  int do_for_all_thd_copy(Do_THD_Impl *func);

  // Call func function for all thds in global thd list.
  int do_for_all_thd(Do_THD_Impl *func);

  /*
    This function calls func() for all thds in global thd list
    to find the matching thd specified in func(). Returns NULL if
    no thd matches.
    Note:
    A class which subclasses Find_THD_Impl and overrides operator() should
    acquire the LOCK_THD_data mutex there, as this mutex should be acquired
    while holding the LOCK_THD_count mutex to avoid a race condition.
    The caller of this function needs to release the LOCK_THD_data mutex.
  */
  THD* find_thd(Find_THD_Impl *func);
  
  /*
    Add THD to global THD list. If lock_type is THD_NO_LOCK it assumes 
    that caller already holds LOCK_THD_count mutex.
  */
  void add_thd(THD *thd, enum_thd_lock_type lock_type);

  /*
    Remove THD from global THD list. If lock_type is THD_NO_LOCK it assumes 
    that caller already holds LOCK_THD_count mutex.
  */
  void remove_thd(THD *thd, enum_thd_lock_type lock_type);

  // Accessors/mutators for status variable thread_running.
  uint get_num_thread_running()  { return num_thread_running; }
  void inc_thread_running();
  void dec_thread_running();

  // Accessors/mutators for status variable thread_created.
  void inc_thread_created();
  ulonglong get_num_thread_created();

  // Acquire LOCK_THD_count mutex.
  void acquire_thd_lock();

  // Release LOCK_THD_count mutex.
  void release_thd_lock();

  /*
    Assert used in functions to validate LOCK_THD_count mutex is [not] held
    by caller.
  */
  void assert_if_not_mutex_owner();
  void assert_if_mutex_owner();

  // Wait on COND_THD_count
  void wait_thd();

  /*
    Perform timed wait on COND_THD_count.
    Returns zero on success, EINTR when interrupted,
    ETIMEDOUT if the absolute time specified by abstime passes
    before the condition is signaled or broadcasted.
  */
  int timed_wait_thd(struct timespec *abstime);

  // Sends broadcast to all threads waiting on COND_THD_count.
  void notify_all_thd();

  // Returns the count of items in global_THD_list.
  uint get_thd_count();

  private:
  Global_THD_manager(); //singleton
  ~Global_THD_manager();

  // Initializes condition variables and mutex.
  void init();
  void deinit();

  // Singleton instance.
  static Global_THD_manager *thd_manager;

  std::set<THD*> *global_thd_list;
  uint global_thd_count;

  mysql_cond_t COND_thd_count;
  // Mutex to guard global_THD_list.
  mysql_mutex_t LOCK_thd_count;

  // Mutex used to guard removal of elements from global_thd_list.
  mysql_mutex_t LOCK_thd_remove;

  // Guards thread_running statistics.
  my_atomic_rwlock_t thread_running_lock;

  // Count of active threads which are running queries in the system.
  uint num_thread_running;

  // Cumulative number of threads created by mysqld daemon.
  ulonglong thread_created;
};
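
The difference between do_for_all_thd() and do_for_all_thd_copy() can be
sketched as follows. This is one reading of the comment on
do_for_all_thd_copy(), not the actual implementation: the mutexes and the list
are file-level statics, std::mutex stands in for mysql_mutex_t, the sketch
returns nothing because the meaning of the declared int is not specified here,
and Sum_ids and do_for_all_thd_copy_demo() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <vector>

struct THD { unsigned long thread_id; };   // stand-in type

class Do_THD_Impl
{
public:
  virtual ~Do_THD_Impl() {}
  virtual void operator()(THD*) = 0;
};

static std::mutex LOCK_thd_count;    // guards global_thd_list
static std::mutex LOCK_thd_remove;   // guards removal from global_thd_list
static std::set<THD*> global_thd_list;

void add_thd(THD *thd)
{
  std::lock_guard<std::mutex> count_guard(LOCK_thd_count);
  global_thd_list.insert(thd);
}

void remove_thd(THD *thd)
{
  std::lock_guard<std::mutex> remove_guard(LOCK_thd_remove);
  std::lock_guard<std::mutex> count_guard(LOCK_thd_count);
  global_thd_list.erase(thd);
}

void do_for_all_thd_copy(Do_THD_Impl *func)
{
  // Held for the whole call: no thd can leave the list while we iterate.
  std::lock_guard<std::mutex> remove_guard(LOCK_thd_remove);

  std::vector<THD*> copy;
  {
    // Held only while copying, so new thds can register during the callbacks.
    std::lock_guard<std::mutex> count_guard(LOCK_thd_count);
    copy.assign(global_thd_list.begin(), global_thd_list.end());
  }

  for (std::size_t i= 0; i < copy.size(); ++i)
    (*func)(copy[i]);
}

// Hypothetical action: add up the thread ids that were visited.
class Sum_ids : public Do_THD_Impl
{
public:
  Sum_ids() : m_sum(0) {}
  virtual void operator()(THD *thd) { m_sum+= thd->thread_id; }
  unsigned long sum() const { return m_sum; }
private:
  unsigned long m_sum;
};

unsigned long do_for_all_thd_copy_demo()
{
  THD a= {1}, b= {2};
  add_thd(&a);
  add_thd(&b);
  Sum_ids action;
  do_for_all_thd_copy(&action);
  remove_thd(&a);
  remove_thd(&b);
  return action.sum();
}
```

Iterating over a copy keeps LOCK_thd_count free during potentially slow
callbacks, while LOCK_thd_remove still guarantees that every pointer in the
copy stays registered.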

E) Find_thd_with_id class
-------------------------
Sample code which finds a thd in the global thd list by implementing the
Find_THD_Impl interface and calling the find_thd() method is given below.

/*
  Callback function used by kill_one_thread to find thd based
  on the thread id. 
*/
class Find_thd_with_id: public Find_THD_Impl
{
public:
  Find_thd_with_id(ulong value): m_id(value) {}
  virtual bool operator()(THD *thd)
  {
    if (thd->get_command() == COM_DAEMON)
      return false;
    if (thd->thread_id == m_id) 
    {
      mysql_mutex_lock(&thd->LOCK_thd_data);
      return true;
    }
    return false;
  }
private:
  ulong m_id;
};

The following code snippet can be used to get the matching thd:

//id represents the thread id to be searched for
Global_THD_manager *thd_manager= Global_THD_manager::get_instance();
Find_thd_with_id find_thd_with_id(id);
THD* tmp= thd_manager->find_thd(&find_thd_with_id); 
if (tmp)
{
  // Perform operations using tmp
  mysql_mutex_unlock(&tmp->LOCK_thd_data);
}

Note that the caller releases LOCK_thd_data once it has finished using the thd
returned by find_thd().


F) Blocked_thread_manager class
--------------------------------

The Blocked_thread_manager class is a singleton and is responsible for managing
the thread cache. Idle threads are added to the thread cache and reused when
work arrives from clients.
After servicing a client, a thread calls the block_thread() method to register
itself in the cache and waits on the condition variable COND_thread_cache until
the mysqld main thread calls wakeup_thread() to service a new incoming client
connection.
waiting_thd_list, counters and settings such as blocked_thread_count and
max_cached_threads, and condition variables such as COND_thread_cache and
COND_flush_thread_cache are moved to this class from mysqld.cc.

class Blocked_thread_manager
{
  public:
  static Blocked_thread_manager* get_instance();
  static void destroy_instance();
  
  // Kill all threads which are in cache. Called during shutdown
  void kill_cached_threads();

  // Block idle thread to wait till work arrives
  bool block_thread();

  // Wakeup idle thread to perform some task for thd
  bool wakeup_thread(THD *thd);

  // Returns the total number of blocked threads
  uint get_num_blocked_threads()  { return blocked_thread_count; }
  
  private:
  Blocked_thread_manager(); //singleton
  ~Blocked_thread_manager();

  // Singleton instance
  static Blocked_thread_manager *blocked_thread_manager;

  // Holds THD which will be picked up by blocked thread on receipt of
  // COND_thread_cache signal
  std::list<THD*> *waiting_thd_list;

  // Condition variable on which a blocked thread waits
  mysql_cond_t COND_thread_cache;

  // Condition variable used during shutdown to stop all cached threads
  mysql_cond_t COND_flush_thread_cache;

  // Set during shutdown to stop all blocked threads
  bool kill_blocked_threads_flag;

  // Represents the number of threads to be woken up
  uint wake_thread;

  // Represents the total number of threads blocked
  uint blocked_thread_count;

  // Represents system variable thread_cache_size
  uint max_cached_threads;
};
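
The block/wakeup handshake described at the start of this section can be
sketched as below. This is a simplified model under stated assumptions, not the
proposed class: Mini_blocked_thread_manager and thread_cache_demo() are
hypothetical, std::mutex and std::condition_variable replace the mysql_*
primitives, block_thread() returns the THD to serve instead of a bool, and
max_cached_threads is not enforced.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>

struct THD { unsigned long thread_id; };   // stand-in type

class Mini_blocked_thread_manager   // hypothetical, for illustration only
{
public:
  Mini_blocked_thread_manager() : m_kill(false), m_wake(0), m_blocked(0) {}

  // Called by an idle thread. Returns the THD to serve, or nullptr when the
  // cache is being flushed at shutdown.
  THD *block_thread()
  {
    std::unique_lock<std::mutex> lock(m_lock);
    ++m_blocked;
    m_cond.wait(lock, [this] { return m_kill || m_wake > 0; });
    --m_blocked;
    if (m_wake == 0)
      return nullptr;               // woken by kill_cached_threads()
    --m_wake;
    THD *thd= m_waiting.front();
    m_waiting.pop_front();
    return thd;
  }

  // Hands thd to a blocked thread. Returns false if none is available.
  bool wakeup_thread(THD *thd)
  {
    std::lock_guard<std::mutex> guard(m_lock);
    if (m_blocked <= m_wake)
      return false;
    m_waiting.push_back(thd);
    ++m_wake;
    m_cond.notify_one();
    return true;
  }

  void kill_cached_threads()
  {
    std::lock_guard<std::mutex> guard(m_lock);
    m_kill= true;
    m_cond.notify_all();
  }

  unsigned get_num_blocked_threads()
  {
    std::lock_guard<std::mutex> guard(m_lock);
    return m_blocked;
  }

private:
  std::mutex m_lock;
  std::condition_variable m_cond;   // role of COND_thread_cache
  std::list<THD*> m_waiting;        // role of waiting_thd_list
  bool m_kill;                      // role of kill_blocked_threads_flag
  unsigned m_wake;                  // role of wake_thread
  unsigned m_blocked;               // role of blocked_thread_count
};

bool thread_cache_demo()
{
  Mini_blocked_thread_manager cache;
  THD client= {42};
  THD *served= nullptr;

  bool rejected= !cache.wakeup_thread(&client);   // nobody is blocked yet

  std::thread idle([&] { served= cache.block_thread(); });
  while (cache.get_num_blocked_threads() == 0)
    std::this_thread::yield();

  bool woken= cache.wakeup_thread(&client);
  idle.join();
  return rejected && woken && served == &client;
}
```

Counting pending wakeups separately from blocked threads means wakeup_thread()
never queues more THDs than there are idle threads to pick them up.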