WL#441: Use a thread pool to serve user requests

Affects: Server-6.0 — Status: Complete

Description
High Level Architecture
Low Level Design

-- Background 
 
The current MySQL design works as follows with respect to how 
threads are used: 
 
- A main thread loops over the socket for incoming connections. 
- When a new connection is made, it's accept():ed to get the  I/O 
socket. 
- A new THD is allocated and initialized with this socket. 
- A new thread is created to serve this socket. (Actually, with a 
thread cache, it might reuse an existing thtread.) 
- This thread then repeatedly serves commands from the client. 
- When the connection is closed, the thread is terminated (or 
turned over to the thread cache for reuse). 
 
This is simple and in effect a way to avoid multiplexing over all 
connections (using select/poll) to serve requests (which would also 
require non-blocking I/O in order to handle high throughput). 
 
However, there are some serious drawbacks: 
 
- There will be as many threads as there are active connections. 
- Creating threads (and killing them) is, in the thread context, an 
expensive operation. 
 
The first cons is the biggest problem since each thread requires a 
certain amount of system (and kernel) resources. Among others, each 
thread has its own stack. To accomodate a large number of 
simultanous connections, the stack size has to be kept small, up to 
a point where either it's too small or the server consumes 
enourmous amounts of memory. Or, at some point we will run out of 
some other (kernel) resource, or the scheduling overhead will start 
to cause problems. (And blaming the kernel is not really fair: An 
improved kernel or scheduler can only push the limit slightly 
further, but not solve the actual problem.) 
 
In short: The above design does not scale. 
 
 
-- How it should be 
 
In general, when designing a threaded application, you do not want 
to have an unlimited number of threads, you want to have the number 
of threads under strict control. 
 
A common design is to have a pool of threads serving requests. This 
pool can be tuned to a suitable number proportional to the number 
of CPUs you have available. (Exactly how many per CPU depends on 
the applications; how much internal serialization and wait states 
you have among the threads, but usually it's a quite small number.) 
In servers where each connection has a very short lifetime, say in 
a web server, this is simple. The threads in the pool simply loop 
over a mutex wrapped accept(), each accepting a new connection, 
handle it, and then return to the mutex. 
 
 
-- A proposal 
 
In the MySQL case, however, we have the problem that connections 
might last for a very long (or indefinite) time, so the above 
solution obviously doesn't work. Instead I propose the following 
solution: 
 
- One (main) thread loops over a select/poll where one (or more) of 
the sockets is the one for new connections. 
- If a new connection was made, accept it and add the new socket to 
the set of sockets. 
- Input on the other sockets is read and the incomming messages 
each assembled. (Since we should use non-blocking I/O for 
performance, we can have partial requests in the making here.) 
- Each complete request is passed on to a pthread condition 
variable protected queue. 
 
- A (fixed) pool of service threads waits for new requests on the 
queue. 
- When a new requests is received in the queue, one of the threads 
will grab it and handle it, and then return to grab new requests 
from the queue. 
 
Note that what identifies a particular connection is the THD 
struct, so this what we pass through the queue to the worker 
threads in the pool. 
 
There are some variations on this theme: 
 
- The thread pool is created at boot and never changed; the number 
is of course configurable. But it's possible to make this tunable 
in run-time if we want. (Being able to grow the thread pool without 
having to shut down could be desirable.) 
 
- It's possible to have more than one "main" thread doing the I/O, 
if the I/O is believed to be a bottleneck in large, high-load 
installations. However, this also causes some problems, with load 
balancing among other things. A first step should be limited to 
just having one main thread. 
 
 
Issues: 
- There might be a problem with limitations of the FD set size in 
select and/or poll on some systems. 
- This is all under the assumption that the I/O can be done 
non-blocking. 
- It's always possible that I overlooked something in the current 
mysqld structure that makes all this difficult or impossible...

We should create a general interface that can be used to have many
different thread handling designs. All code in MySQL and mysys that
are hard coupled with threads (like thr_alarm and thr_lock) should be
made losely coupled so we can change physical threads between calls.

Initially we have should have implementations for Solaris (using
port_create, from the event completion framework) and windows (using
IOCP)..

Later we can enhance it to other areas (like epoll under Linux?)

Create general 'scheduler' object that is the main interface to the
different different thread handling designs. It will be a global object,
setup once at at start, that mainly is used as dispatch functions:

class scheduler_functions
{
public:
  uint max_threads;
  /*
    To initialize the scheduler object. This is called once.
    If the return value is 1, we need to abort mysqld
  */
  bool (*init)(void);
  /*
    Called to setup the thread.  Normally this calles the global function
    init_new_connection_handler_thread().
  */
  bool (*init_new_connection_thread)(void);
  /* Tell scheduler that we have a new connection */
  void (*add_connection)(THD *thd);
  /* Tell scheduler that we have want to kill a connection */
  void (*post_kill_notification)(THD *thd);
  /* Called by handle_one_connection() when we want to end a connection */
  bool (*end_thread)(THD *thd, bool cache_thread);
  /* Free all resources */
  void (*end)(void);
};


THD is extended with a contained class:

class thd_scheduler, that different scheduling implementations can use
to store states associated with one connection or statement.  (This is
to free the scheduler functions to have to maintain memory for holding
states)


- Added 'my_thread_id' as a typedef for our internal thread identifier to be
  used when identifying an internal working thread. In other words, a physical
  thread may over time do the work for many 'internal threads'.

- Changed some mysys functions to make them independent of the physical thread.
  (This allows us to switch physical thread more eaasily)

  - Make thr_alarm_kill() to depend on thread_id instead of thread
    (Safer as we can't accidently abort the wrong thread)

  - Renamed 'my_thread_id()' function to my_thead_dbug_id()
    (to not conflict with the new 'my_thread_id' typedef)

  - Make thr_abort_locks_for_thread() depend on thread_id instead of thread.
    (Safer as we can't accidently abort the wrong thread)

- MySQL SQL functions GET_LOCK and RELEASE_LOCK now works on
  my_thread_id instead of pthread_t

- my_thread_dbug_id() (former my_thread_id()), now always returns our own
  thread id (makes it easier to debug thing as things are now the more the
  same over different platforms).


Other cleanups done at the same time:

- Moved --one-thread to be handled trough the scheduler infrastructure.
  (Less code, faster code and fewer #ifdefs than before)

- Move all connection setup code from sql_parse.cc to sql_connect.cc
  (Should have done it a long time ago)

- Ensure that 'pseudo_thread_id' is setup the same way in every place.

- Removed thd->block_signals, as these where never used.


Implementation details:

mysqld.cc:
- Add global variable 'thread_scheduler' that contains dispatch functions
  for thread scheduling.  By using this for creating threads,
  initialization new connections etc we can easily have many different way
  to schedule query execution.

- Split create_new_thread() into reusable sub functions:
   - create_thread_to_handle_connection() and
     one_thread_per_connection_end() to handle the old
     one-thread-per-connection scheduling.

- Split 'end_thread()' into 'unlink_thd()' and 'cache_thread()'

- Added option --thread-handling=# that can take options:
  one-thread-per-connection, no-threads or pool-of-threads

- Added option --pool-of-threads, to specify how many threads should be
  created if we are using --thread-handling=pool-of-threads.


sql_class.cc:
- In THD::store_globals(), we set my_thread_var->id to be thd->thread_id.
  'thd->thread_id' is the 'connection specific identifier' that is unique
   over the whole connection.

  We are in effect renaming the physical thread to be associcated with the
  connection until the next call to store_globals().
  This allows us, in a pool-of-threads environment, to use different
  physical threads to execute different statements for the same connection.

  During one statement one thread is associated with one THD and one
  mysys_var object.

- Use new interfaces to thr_alarm_kill().

- Send thread-kill signal to scheduler (if not slave threads, which are not
  handled through the scheduler)

sql_class.h:

- Remove thd->dbug_thread_id
- thd->real_id is now only usable for debugging; One should not use this
  for anything else as this may change value during the lifetime of the
  connection.
- 'thd->thread_id' is now a connection identifier.
- Added 'thd->scheduler' as an element to THD, to allow different scheduler
  implementation to use THD to store it's states.


sql_connect.cc:
- Split handle_one_connection() into multiple functions that can be called
  separately by the scheduler functions.


sql_show.cc:
- Don't send pthread_kill() to threads to detect if they exists.
  (Not that useful and causes problems with future thread_handling code)


scheduler.h:

- Declarations needed for scheduler.cc and also for the thd_scheduler, which
  is different for each schduler implementation.

scheduler.cc:

- Implementations of the following schedulers:

- one-thread-per-connection
- no-threads
- pool-of-threads ; Implemented with event completion framework (Solaris)
- pool-of-threads ; Implemented with IOCP (windows)