WL#6936: Implementation of server-side statement timeout

Affects: Server-Prototype Only — Status: In-Documentation

Description
Requirements
High Level Architecture
Low Level Design

Implement server-side time limit for the execution of top-level
read-only SELECT statements. After completing specified amount
of time, the statement should be aborted without affecting
the session (connection). This feature will help in interrupting
and aborting the SELECT statements when their execution time
crosses specified amount of time limit instead of just waiting
for those queries to complete.

This work log addresses the bug
BUG#68252


Patch for this WorkLog is Submitted by Davi Arnaut, see BUG#68252.

Function Requirements:
----------------------

F-1: The execution time limit should be specified in milliseconds.
     Zero (0) is a special value meaning no time limit.
     Using of negative numbers is forbidden and must result in an error.

F-2: The execution time limit can be set per SELECT statement.

     For example:
      SELECT MAX_STATEMENT_TIME = 109 * FROM my_table;

F-3: The execution time limit can be set per connection
     (for all SELECT statements executed in a particular session).

     NOTE: setting the execution time limit per session is applicable also
     to prepared SELECT statements.

     For example:
      SET SESSION MAX_STATEMENT_TIME = 100; 
      SET GLOBAL MAX_STATEMENT_TIME = 100;   

F-4: The execution time limit can be set globally per server. The global
     value is the default value for session execution time limit.

     The default value of global execution time limit is 0, which means
     there is no execution time limit.

F-5: The MAX_STATEMENT_TIME clause takes precedence over the
     MAX_STATEMENT_TIME session variable.

     For example:
      SET SESSION MAX_STATEMENT_TIME = 100;
      SELECT MAX_STATEMENT_TIME = 200 ...;

      The SELECT statement will have MAX_STATEMENT_TIME == 200.

F-6: The execution time limit is ignored for non-read-only SELECT
     statements. A note (SQL-condition of the NOTE level) must be reported
     in that case.

     For the scope of this WL the following statements are considered
     read-only:

      - a statement that either doesn't access tables at all
        (SELECT 1+2, for example)

      - a statement that accesses tables in the read-only mode
        (i.e. only reads data)

      - "being read-only" is a property that is determined during the
        execution (runtime), not during the parsing. So, the same statement
        can be considered read-only or not depending on the external
        conditions.

        For example:

           SELECT f1();

           f1(): IF (@a = 1) THEN SELECT ... ; ELSE INSERT INTO ... END IF; 

      This requirement will be achieved by cancelling the timer if SELECT
      turned out to be read-write. In function open_tables(), we already
      have a logic check whether any stored functions of statements
      modifies data. If we find any stored function modifying data then
      timer will be reset for such SELECT statements.

F-7: The MAX_STATEMENT_TIME clause is allowed for top-level SELECT
     statements only. Using this clause in other statements must result
     in an error (either a parsing one, or a runtime one).

     The MAX_STATEMENT_TIME session variable is ignored for non-top-level
     SELECT statements.

F-8: The MAX_STATEMENT_TIME clause is not allowed in SELECT statements
     within stored programs (stored procedures, stored functions, triggers,
     events). Using the clause in such statements must result in an error
     (either a parsing one, or a runtime one). The MAX_STATEMENT_TIME
     session variable is ignored for such statements.

     NOTE: this requirement might be lifted in the future. The reason for
     having it now is that simply allowing MAX_STATEMENT_TIME would
     introduce misleading and unreliable behavior for the users as only 1st
     MAX_STATEMENT_TIME in a query is allowed. In the future we might want
     allow to reset timers for the inner SELECT statements.

F-9: The MAX_STATEMENT_TIME clause is allowed in prepared statements
     provided that it's a top-level read-only SELECT statement.

F-10:Proper error message should be reported on interrupting the
     read-only SELECT statements for exceeding the execution
     time limit.

F-11:SHOW STATUS should return the number of statements interrupted because
     of timeout.

F-12:The MAX_STATEMENT_TIME session and global variable must be available
     through SHOW VARIABLES.

Non-function Requirements:
--------------------------

NF-1: Should not have any effect on queries executed by slave
      in replication environment.

      To achieve this a new check will be introduced before setting timer
      for SELECT. If SELECT is from SLAVE thread then timer will not
      be set.

NF-2: Generally, this WL doesn't affect operation of replication.      
      
      The 'SET MAX_STATEMENT_TIME' statements and the MAX_STATEMENT_TIME
      SELECT-clause are written to binlog file. So, the replication
      between newer master and older slave will be affected (failed).
      However, using older slave and newer master is not a recommended
      scenario anyway.

NF-3: Should work on all the platforms.

NF-4: Should work with the thread pool plugin.

NF-5: Using the MAX_STATEMENT_TIME clause will affect the Query Cache
      (same statements which differ only in MAX_STATEMENT_TIME clause will
      be treated as different statements).

NF-6: When the MAX_STATEMENT_TIME clause is specified it
      - either must be in effect (i.e. the corresponding timer must be set
        up for the connection);
      - or it is ignored with a NOTE;
      - or an error is reported.

High level design overview
--------------------------

  - A new timer notification thread per server is introduced. This thread
    is responsible for notifying the thread executing statement about timer
    expiration.

  - Timer notification thread is started at the server startup time by
    calling function "my_timer_init_ext" function.

  - This thread exists till the end of the server and during server
    shutdown, function "my_timer_deinit" is called to stop  this server.

  - Timer for the SELECT statement can be set by the MAX_STATEMENT_TIME
    (timeout) SELECT-clause at statement level or by the MAX_STATEMENT_TIME
    session variable.

  - If timeout value is specified then for SELECT statement timer
    is started by calling function "thd_timer_set".

  - Notifier thread waits on the timers to expire.

  - If SELECT statement execution is completed before the timer
    expires then we cancel the timer.

  - If timer set for statement expired when query is still executing
    then the timer notifier thread executes timer notification function.
    Which interrupts query execution and sets the "THD::killed" state to
    "THD::KILLED_TIMEOUT".

  - The interrupted query exits with following error message:
    "Query execution was interrupted, max_statement_time exceeded".

NOTE: MAX_STATEMENT_TIME specified is a soft hint for a query, since there
might be small delay in notifying and interrupting an execution of
Statement. The interruption may not happen exactly at Nth millisecond.

Interface Specification:
------------------------

I-1: To specify the execution time limit with SELECT statement
     optional keyword "MAX_STATEMENT_TIME=" should be
     introduced.

     For example: SELECT MAX_STATEMENT_TIME=10 * FROM my_table;

    select_option:
              ...
              | MAX_STATEMENT_TIME_SYM EQ ulong_num
                { ... }
              ;

     Type of MAX_STATEMENT_TIME is "ulong" and allowed value range
     is 0 to ULONG_MAX. Negative values result in an error (parsing or
     runtime error). Zero means "no timeout".
 
I-2: System variable "max_statement_time" of SESSION and GLOBAL scope
     should be added.
 
     Type of MAX_STATEMENT_TIME is "ulong" and allowed value range
     is 0 to ULONG_MAX. Negative values result in an error (parsing or
     runtime error). Zero means "no timeout".

     Note:
     If variable "max_statement_time" is 0 and timeout is not
     specified at statement level also then no timer is started
     for the statement. 

I-3: The following error message should be added to report an error
     on SELECT query execution timeout.

     "Query execution was interrupted, max_statement_time exceeded"

I-4: The following files should be added for TIMER implementation,
      my_timer.h           : Declaration of timer functions.

      mysys/posix_timers.c : Timer implementation using 
                             timer_settime, timer_gettime -
                             arm/disarm and fetch state of POSIX
                             per process timer. 

      mysys/kqueue_timer.c : Timer implementation using kqueue
.
      mysys/win_timer.c    : Timer implementation for windows.

      sql/sql_timer.h      : Abstraction over timer
      sql/sql_timer.cc       implementation.


I-5: Number of timed out queries is accounted and listed by "SHOW STATUS".

I-6: No new tools added.

I-8: No new commands added.

I-9: If the MAX_STATEMENT_TIME is specified at system variable level then there
     will not be any change in the Query Cache Component behavior.
     If MAX_STATEMENT_TIME is set at statement level then also Query Cache
     works as it is until there is a change in MAX_STATEMENT_TIME value
     for queries which are semantically same but textually different.
     for example:
         stmt-1: SELECT MAX_STATEMENT_TIME=2 fld1, fld2 from table1;
         stmt-2: SELECT MAX_STATEMENT_TIME=5 fld1, fld2 from table1;

     For Query Cache Component stmt-1 and stmt-2 are different so there
     will be cache miss in this case.

1. TIMER:
----------
To time the select query execution, new timer module will be
added. All the functionalities supported by the timer module
are declared in the "my_timer.h" file. This header file is 
included by the modules actually implementing the timer.

Following are the functions declared in "my_timer.h"

  /* Initialize internal components. */
  int my_timer_init_ext(void);

  /* Release any resources acquired. */
  void my_timer_deinit(void);

  /* Create a timer object. */
  int my_timer_create(my_timer_t *timer);

  /*
    Set the time (in milliseconds) until the next expiration of
    the timer.
  */
  int my_timer_set(my_timer_t *timer, unsigned long time);

  /* Reset the time until the next expiration of the timer. */
  int my_timer_reset(my_timer_t *timer, int *state);

  /* Delete a timer object. */
  void my_timer_delete(my_timer_t *timer);

  typedef struct st_my_timer my_timer_t;
  /** Non-copyable timer object. */
  struct st_my_timer
  {
    /** Timer ID used to identify the timer in timer requests. */
    os_timer_t id;

    /** Timer expiration notification function. */
    void (*notify_function)(my_timer_t *);
  }; 

These functions are implemented in posix_timer.c and
kqueue_timer.c.  

 posix_timers.c : Timer implementation using POSIX per process 
                  timer. (using timer_settime, timer_gettime)

 kqueue_timer.c : Timer implementation using kqueue.

Depending on the timer available on system above modules are
compiled and built. 

1.1 Implementation details of posix_timer.c:
********************************************
1.1.1 my_timer_init_ext:
~~~~~~~~~~~~~~~~~~~~~~~~
This function starts the timer notification thread. The
notification thread loop algorithm is as below,

  sigset_t set;
  siginfo_t info;

  sigaddset(&set, SIGRTMIN);
  sigaddset(&set, SIGRTMIN+1);

  while(1)
  {
    if (sigwaitinfo(&set, &info) < 0)
      continue;       

    if (info.si_signo == SIGRTMIN)
    {
        my_timer *timer= info.si_value.sival_ptr;
        timer->notify_function(timer);
    }
    else (info.si_signo == SIGRTMIN + 1)
        break;
  }

1.1.2 my_timer_deinit:
~~~~~~~~~~~~~~~~~~~~~~
This function stops the timer notification thread by sending
signal "SIGRTMIN+1".
  pthread_kill(timer_notifier_thread, SIGRTMIN+1);

1.1.3 my_timer_create(my_timer_t *timer):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function creates the timer object as below,

  struct sigevent sigev;

  sigev.sigev_value.sival_ptr= timer; 
  sigev.sigev_signo= SIGRTMIN;
  ..
  return timer_create(CLOCK_MONOTONIC, &sigev, &timer->id);

1.1.4 my_timer_set(my_timer_t *timer, ulong time):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function sets the timer to "time" for the query by
calling a function.  

        timer_settime(timer->id, 0, time, NULL);

1.1.5 my_timer_reset(my_timer_t *timer, int *state):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function resets the timer set for the query by calling a
function,
        timer_settime(timer->id, 0, &zero_time_spec,
                      &old_time_spec);

If the timer is not signaled yes then set state as true. 

1.1.6 my_timer_delete(my_timer_t *timer):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function deletes the timer created by calling a function,

         timer_delete(timer->id);


1.2 Implementation details of kqueue_timer.c:
*********************************************
1.2.1 my_timer_init_ext:
~~~~~~~~~~~~~~~~~~~~~~~~
This function starts the timer notification thread. The
notification thread loop algorithm is below,
  
  struct kevent kev;      
        
  /* create a fd for event notification */
  kq_fd= kevent();

  EV_SET(&kev, 0, EVFILT_USER, EV_ADD, 0, 0, 0);
  kevent(kq_fd, &kev, 1, NULL, 0, NULL);

  while (1)
  {
    if (kevent(kq_fd, NULL, 0, *kev, 1, NULL) < 0)
      continue;

    if (kev.filter == EVFILT_TIMER)
    {
      my_timer_t *timer= kev.udata;
      timer->notify_function(timer);
    }
    else if (kev.filter == EVFILT_USER)
      break;
  }

1.2.2 my_timer_deinit:
~~~~~~~~~~~~~~~~~~~~~~
This function stops the notification thread by triggering the
event EVFILT_USER.

  struct kevent kev;

  EV_SET(&kev, 0, EVFILT_USER, 0, NOTE_TRIGGER, 0, 0);

  kevent(kq_fd, &kev, 1, NULL, 0, NULL);

1.2.3 my_timer_create:
~~~~~~~~~~~~~~~~~~~~~~
This function creates a new timer object. We will just set
my_timer_t.id in this function.

1.2.4 my_timer_set(my_timer_t *timer, ulong time):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function sets the timer to "time" for the query by adding
a event to kqueue.  

  struct kevent kev;

  EV_SET(&kev, timer->id, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0,
         time, timer);

  kevent(kq_fd, &kev, 1, NULL, 0, NULL);

1.2.5 my_timer_reset(my_timer_t *timer, int *state):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
This function resets the timer set for the query removing
the event added to kqueue.

  struct kevent kev;

  EV_SET(&kev, timer->id, EVFILT_TIMER, EV_DELETE, 0, 0, NULL);

  kevent(kq_fd, &kev, 1, NULL, 0, NULL);

If kevent call is successful then set state as 1.

1.2.6 my_timer_delete(my_timer_t *timer):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function deletes the timer by deleting the event added
to kqueue.

  struct kevent kev;

  EV_SET(&kev, timer->id, EVFILT_TIMER, EV_DELETE, 0, 0, NULL);

  kevent(kq_fd, &kev, 1, NULL, 0, NULL);


The timer implementation is abstracted through the module 
"sql_timer". 

1.3 Implementation details of win_timer.c
*********************************************
1.3.1 my_timer_init_ext:
~~~~~~~~~~~~~~~~~~~~~~~~
This function starts the timer notification thread. The
notification thread loop algorithm is below,
  
  //Get IO Completion status.
  if (GetQueuedCompletionStatus(io_compl_port, &timer_state, &compl_key,
                                &overlapped, INFINITE) == 0)
    break;

  timer= (my_timer_t*)compl_key;

  // timer is cancelled.
  if (timer->id == 0)
    continue;

  timer->id= 0;
  timer->notify_function(timer);

1.3.2 my_timer_deinit:
~~~~~~~~~~~~~~~~~~~~~~
This function stops the notification thread and closes
delete timer queue and IO completion port.

  // This breaks notification thread too.
  DeleteTimerQueueEx(timer_queue, NULL);

  CloseHandle(io_compl_port);

1.3.3 my_timer_create:
~~~~~~~~~~~~~~~~~~~~~~
This function creates a new timer object. We will just set
my_timer_t.id in this function.

1.3.4 my_timer_set(my_timer_t *timer, ulong time):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function sets the timer to "time" for the query by adding
a timer in timer queue.

  CreateTimerQueueTimer(&timer->id, timer_queue,
                        timer_callback_function, timer, time, 0,
                        WT_EXECUTEONLYONCE);

1.3.5 my_timer_reset(my_timer_t *timer, int *state):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
This function resets the timer by Deleting the timer from
the timer queue.

  DeleteTimerQueueTimer(timer_queue, timer->id, NULL);

If call is successful then timer->id is set to 0.

1.3.6 my_timer_delete(my_timer_t *timer):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function deletes the timer by deleting the timer from
timer queue.

  DeleteTimerQueueTimer(timer_queue, timer->id, NULL);

The timer implementation is abstracted through the module 
"sql_timer". 


1.4 sql_timer:
**************
"sql_timer" Module provides the functions to set, reset and
end the timer. 

Following is the new structure added to create the object to
hold the timer information of the thread.

typedef st_my_timer thd_timer_t;
struct st_thd_timer
{
  THD *thd;
  my_timer_t timer;
  /* to avoid execution of callback and reset at the same time*/
  pthread_mutex_t mutex; 
  bool destroy;
};

1.4.1 thd_timer_set(THD *thd, thd_timer_t *ttp, ulong time): 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function is called to set the timer for the query. This
function creates "timer" object if it is not created and then
set the timer.

   if (!ttp)
     ttp= malloc(sizeof(thd_timer_t));

   pthread_mutex_init(ttp->mutex, MY_MUTEX_FAST_INIT);
   ttp->timer.notify_function= timer_callback;
   ttp->thd= thd;

   my_timer_create(ttp->timer);

   my_timer_set(ttp->timer, time);


1.4.2 timer_callback(my_timer_t *timer):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
It is a function registered to execute on timer expiration. This
function just calls THD::awake(TIMEOUT).

   pthread_mutex_lock(timer->mutex);
   pthread_mutex_lock(&thd->LOCK_thd_data);

   thd->awake(THD::KILL_TIMEOUT); 

   pthread_mutex_unlock(&thd->LOCK_thd_data);
   pthread_mutex_unlock(timer->mutex);

   thd_timer_end(ttp);

THD::KILL_TIMEOUT is a new enum value added for kill status of
THD.

1.4.3 thd_timer_reset(thd_timer_t *ttp):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
This function is called to reset the timer set for query by
calling function "my_timer_reset".

   status= my_timer_reset(*ttp->timer, &state);
   //Return timer to cache if its not signaled yet
   if (state)
   {
     pthread_mutex_lock(&ttp->mutex);
     ttp->thd= NULL;
     pthread_mutex_unlock(&ttp->mutex);
     return (ttp);
   }
  
   return NULL;
  
1.4.4 thd_timer_end(thd_timer_t *ttp):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function ends and deletes the timer created for the query
by calling function,

       my_timer_delete(&ttp->timer);
       pthread_mutex_destroy(&ttp_mutex);

        
2. Extending SELECT:
----------------------
To specify the execution time limit in SELECT query, SELECT 
statement will be  extended to include optional keyword
MAX_STATEMENT_TIME=. 

    For example: SELECT MAX_STATEMENT_TIME=10 * FROM my_table;

    sql_yacc.yy:
    ************
    select_option:
              ...
              | MAX_STATEMENT_TIME_SYM EQ ulong_num
                {
                  if (Lex->current_select != &Lex->select_lex  ||
                      (Lex->sql_command == SQLCOM_CREATE_TABLE ||
                       Lex->sql_command == SQLCOM_CREATE_VIEW  || 
                       Lex->sql_command == SQLCOM_REPLACE_SELECT ||
                       Lex->sql_command == SQLCOM_INSERT_SELECT) 

                  {
                    //Error
                  }
                  Lex->max_statement_time= $3; 
                }
              ;

New member is added to hold the "max_statement_time" to LEX.

    LEX {
      ...
      ulong max_statement_time;
    };


3. System variable for execution timeout:
------------------------------------------
New system variable "max_statement_time" will be added to set
execution time for SELECT for sessions.

  sys_var.cc:
  ***********
  static Sys_var_ulong Sys_max_statement_time(
         "max_statement_time",
         "Kill any statement that takes over the"
         " specified number of milliseconds",
         SESSION(max_statement_time), NO_CMD_LINE,
         VALID_RANGE(0, ULONG_MAX), DEFAULT(0), BLOCK_SIZE(1));


4. New status members:
-----------------------        
Following new status members will be added to "system_status_var",
  
  struct system_status_var {
        ...
        
        /* Number of times timer expired */
        ulong max_statement_time_exceeded; 

        /* Number of timer set */
        ulong max_statement_timer_set;

        /*Number of times timer set failed */
        ulong max_statement_time_set_failed;
  };


5. New state for THD::killed:
-----------------------------
New state enum value "THD::KILL_TIMEOUT" will be added for
"THD::killed" state member. 


6. THD::timer:
---------------
New members "timer" and "timer_cache" will be added to THD
class to hold the timer object.

 struct THD {
   ...
   public:
       /** Timer object. */
       struct st_thd_timer *timer, *timer_cache;
 };


8. Algorithm to set and reset timer for statement:
---------------------------------------------------
New function "set_statement_timer(THD *thd)" will be added to set
timer for SELECT statements. 

  set_statement_timer(THD *thd)
  {
     ulong max_statement_time;


     max_statement_time= thd->lex->max_statement_time ? :
                                   thd->variables.max_statement_time;

     thd->timer= thd_timer_set(thd, thd->timer_cache, 
                               max_statement_time);
     thd->timer_cache= NULL;
  }

timer is reset by calling function "reset_statement_timer",

reset_statement_timer(THD *thd)
{
  thd->timer_cache= thd_timer_reset(thd->timer);
  thd->timer= NULL;
}


9. Changes to THD::awake() function:
------------------------------------
Query kill and query timeout scenarios are handled in the same
way. Only kill state for query timeout is set to THD::TIMEOUT.


10. Changes to open_tables() function:
---------------------------------------
In open_tables if timer is set and if there are any routines
which modifies data then reset timer to avoid timeing out
non-read-only SELECT statements.

bool open_tables()
{
  ...
  
  if (thd->timer && some_routine_modifies_data)
    reset_statement_timer(thd);

err:
  ...
}


11. Error message:
------------------
New error message will be added to report on SELECT query 
execution time

sql/share/errmsg-utf8.txt:

  ER_QUERY_TIMEOUT XXXXX
    eng "Query execution was interrupted, max_statement_time
    exceeded"


12. Flow chart diagrams:
-----------------------

Please find the attachments containing flowcharts for

*. Initializing the timer.

*. Creating and setting timer object.

*. Reset timer.