WL#4259: Debug Sync Facility

Affects: Server-5.1   —   Status: Complete

The Debug Sync Facility allows to place synchronization points in the
code:

    open_tables(...)

    DEBUG_SYNC(thd, after_open_tables);

    lock_tables(...)

When activated, a sync point can

  - Send a signal and/or
  - Wait for a signal

Nomenclature:

- signal:             A value of a global variable that persists
                      until overwritten by a new signal. The global
                      variable can also be seen as a "signal post"
                      or "flag mast". Then the signal is what is
                      attached to the "signal post" or "flag mast".

- send a signal:      Assign the value (the signal) to the global
                      variable ("set a flag") and broadcast a
                      global condition to wake those waiting for
                      a signal.

- wait for a signal:  Loop over waiting for the global condition until
                      the global value matches the wait-for signal.

By default, all sync points are inactive. They do nothing (except of
burning a couple of CPU cycles for checking if they are active).

A sync point becomes active when an action is requested for it.

For every sync point there can be one action per thread only. Every
thread can request multiple actions, but only one per sync point. In
other words, a thread can activate multiple sync points.


The proposal includes three ways how the action for a sync point can be
requested. Only one way will be implemented:

  - User Variables
  - SQL Syntax Extension
  - System Variable


==== User Variables (Syntax Proposal #1) ====

By setting a user variable with the same name as the sync point, you
specify a string with the requested action:

    SET @after_open_tables= 'SIGNAL opened WAIT_FOR flushed';

This method has the downside that it pollutes the name space of user
variables.

Another downside is the performance. A sync point needs to search for
"its" user variable every time the execution runs through it. [I call
this a "sync point hit" or simply a "hit".] User variables are kept in a
hash. This is optimized for user variable handling, but not for sync
points. Since good sync points are probably at frequently executed
places, a hash look up might have a measurable performance hit.

When a sync point finds its action, it needs to parse the user variable
string, which is also bad for performance.

Because of the performance problems the sync points must not exist in
production servers, if this method is implemented. Test case design
precautions are to be taken as mentioned in the "Dbug Sleep" section.

A prototype implementation is here: http://lists.mysql.com/commits/39192
(Warning: The action string syntax is very different in that patch.)

This proposal resulted from the initial idea. The downsides are so grave
that no one supports it any more.


==== SQL Syntax Extension (Syntax Proposal #2) ====

By a special SQL statement you request an action for a sync point:

    DEBUG SYNC after_open_tables SIGNAL opened WAIT_FOR flushed;

The main advantage is the excellent readability.

In contrast to the user variables the parsing happens when the 'DEBUG'
statement is parsed. The sync point does not need to do this.

The parsed action can be put into a new container (or say data
structure) that can be optimized for sync point performance. More about
this later.

The performance of this method may be good enough to have it even in a
production server.

A possible downside is the enlargement of the parser, which might make
it slightly slower.

Another possible downside might be that the new keywords (here 'DEBUG',
'SYNC', 'SIGNAL', 'WAIT_FOR') require quoting when statements like "SHOW
CREATE TABLE" are executed under SET sql_quote_show_create=0 and table
elements are named like them.

A prototype implementation is here: http://lists.mysql.com/commits/40450

This proposal is from Monty. He favors it.


==== System Variable (Syntax Proposal #3) ====

By setting a system variable you request an action for a sync point:

    SET DEBUG_SYNC= 'after_open_tables SIGNAL opened WAIT_FOR flushed';

There is no name space pollution. System variable names are under full
control of MySQL developers. No clash with user defined names can
happen. No new syntax keywords are defined, so there is neither any
pollution on this side.

Like with the SQL syntax extension, the parsing happens when the
'DEBUG_SYNC' variable is set. The sync point does not need to do this.

Like with the SQL syntax extension, the parsed action can be put into a
new container (or say data structure) that can be optimized for sync
point performance. More about this later.

There is no parser extension, so no performance hit on this side.

The performance of this method may be good enough to have it even in a
production server.

A small performance regression might be possible due to the fact that
there is one more system variable. Whenever a system variable is set,
the search for the variable in the list of variables might take some CPU
cycles longer. However, setting a system variable is not a frequent
action.

The readability of the synchronization requests is worse that with the
syntax extension. A bit odd is the fact that the assignment of this
variable does not just set the variable value itself. But we do already
have similar behavior with the 'debug' variable. Setting
'key_buffer_size' does also do a lot more than just change the variable
value.

This method requires an extra parser for the string, though an extremely
trivial one. There are no loops and there is a fixed order of elements.

A prototype implementation is here: http://lists.mysql.com/commits/40767

This proposal is favored by Ingo and, if understood correctly, by
Konstantin, Magnus, Rafal, Sergei. I haven't heard other opinions so far.


==== Synchronization Point Action Details ====

Here is an example how to activate and use the sync points, using the
syntax extension proposal (syntax proposal #2):

    --connection conn1
    DEBUG SYNC after_open_tables SIGNAL opened WAIT_FOR flushed;
    send INSERT INTO t1 VALUES(1);
        --connection conn2
        DEBUG SYNC now WAIT_FOR opened;
        DEBUG SYNC after_abort_locks SIGNAL flushed;
        FLUSH TABLE t1;

When conn1 runs through the INSERT statement, it hits the sync point
'after_open_tables'. It notices that it is active and executes its
action. It sends the signal 'opened' and waits for another thread to
send the signal 'flushed'.

conn2 waits immediately at the special sync point 'now' for another
thread to send the 'opened' signal.

A signal remains in effect until it is overwritten. If conn1 signals
'opened' before conn2 reaches 'now', it will still find the 'opened'
signal. It does not wait in this case.

When conn2 reaches 'after_abort_locks', it signals 'flushed', which lets
conn1 awake.

Normally the activation of a sync point is cleared when it has been
executed. Sometimes it is necessary to keep the sync point active for
another execution. You can add an execute count to the action:

    DEBUG SYNC name SIGNAL sig EXECUTE 3;

This will set an activation counter to 3. Each execution decrements the
counter. After the third execution the sync point becomes inactive in
this example.

One of the primary goals of this facility is to eliminate sleeps from
the test suite. In most cases it should be possible to rewrite test
cases so that they do not need to sleep. (But this facility cannot
synchronize multiple processes.) However to support developing of the
tests, and as a last resort, sync point waiting times out. There is a
default timeout, but it can be overridden:

    DEBUG SYNC name WAIT_FOR sig TIMEOUT 10 EXECUTE 2;

TIMEOUT 0 is special: If the signal is not present, the wait times out
immediately.

When a wait timed out (even on TIMEOUT 0), a warning is generated so
that it shows up in the test result.

You can throw an error message and kill the query when a synchronization
point is hit a certain number of times:

    DEBUG SYNC name HIT_LIMIT 3;

Or combine it with signal and/or wait:

    DEBUG SYNC name SIGNAL sig EXECUTE 2 HIT_LIMIT 3;

Here the first two hits send the signal, the third hit returns the error
message and kills the query.

For cases where you are not sure that an action is taken and thus
cleared in any case, you can force to clear (deactivate) a sync point:

    DEBUG SYNC name CLEAR;

If you want to clear all actions and clear the global signal, use:

    DEBUG SYNC RESET;

This is the only way to reset the global signal to an empty string.

For testing of the facility itself you can test a sync point:

    SET DEBUG_SYNC= 'name TEST';


==== Implementation ====

The facility shall be an optional part of the MySQL server. This
requires a switch to the configure tool.

===== Enabling/Disabling of the whole Facility =====

If the Debug Sync Facility is compiled into a production
server, it must be disabled by default. It is to be enabled by a mysqld
command line option:

    --debug-sync

This could also be combined with setting a default wait timeout:

    --debug-sync-timeout[=default_wait_timeout_value_in_seconds]

'default_wait_timeout_value_in_seconds' is the default timeout for the
WAIT_FOR action. If set to zero, the facility stays disabled.

The facility is enabled by default in the test suite, but can be
disabled with:

    mysql-test-run.pl ... --debug-sync-timeout=0 ...

Likewise the default wait timeout can be set:

    mysql-test-run.pl ... --debug-sync-timeout=10 ...

Pseudo code for a sync point:

    #define DEBUG_SYNC(thd, sync_point_identifier)
              if (unlikely(opt_debug_sync))
                debug_sync(thd, sync_point_identifier)

Alternatively one could use -1 for disabling. Then one could have a
default timeout of zero.

===== Enum sync points (implementation proposal #1) =====

Monty suggested to have an enum with synchronization point identifiers
and an array of actions per sync point referenced from THD.

    void debug_sync(thd, sync_point_identifier)
    {
      if (!thd->debug_sync_action[sync_point_identifier].activation_count)
        return;
      execute sync point action
    }

The DEBUG SYNC statement searches for the array slot of the
sync point by using a typelib. It does then modify the array slot.

The advantage is that inactive sync points have little overhead. However
to add a new sync point one must:

    1. Edit the enum in debug_sync.h
    2. Edit the string array in debug_sync.cc
    3. Edit the file where the sync point is to be put.

Then the make compiles almost all of the server. Even if this takes just
two minutes, in my experience it takes a couple of attempts until a sync
point is at a good place. Even when moving a sync point within the same
file, one should rename it so that it reflects its position. Thus it is
easy to burn 15-20 minutes.

I would like to be able to use the facility for quick experiments ("see
what happens ..."), even if I do not aim for a new test case. I would
have to add sync points and remove them later. Not much fun with that
much of editing.

This proposal is from Monty. He favors it.


====== Alternative for enum sync points ======

one could split the enum in an extra header file and include this just
in those files that needs it. But then there is another step:

    4. Verify that the file with the new sync point includes debug_sync.h

I feel that these 3-4 steps are likely to discourage developers from
using this facility widely.


===== String type sync points (implementation proposal #2) =====

Use strings as sync point identifiers:

    DEBUG_SYNC(thd, "after_open_tables");

No enum and no string array need to be edited.

The macro would look like:

    #define DEBUG_SYNC(thd, sync_point_name)
              if (unlikely(opt_debug_sync))
                debug_sync(thd, STRING_WITH_LEN(sync_point_name))

The DEBUG SYNC statement adds the action to an dynamic array
and keeps it sorted by string length and string. In my prototype
implementation only 2 of 12 sync point names have the same length.
String comparison will rarely be done for searching, but just for the
final verification.

The debug_sync() function would do a binary search over the array.
In my prototype implementation for merge.test, I don't have more than 3
actions active at a time. Even a very complicated case with 4-7 active
actions would have a sync point find its action in three steps.

When the debug sync facility is disabled, it has the same
overhead as the other approach. The command line option is checked first
in any case.

When there are no active actions in a thread, this is detected
immediately after the command line option check. The search function is
not even entered.


===== Enum + String sync points (implementation proposal #3) =====

One could implement both approaches. The developer can first play with
the string identifiers. When his sync points prove useful and he wants
to keep them, he could turn them into "static" sync points.

This would mean that the DEBUG SYNC statement would first try
to find the sync point in the typelib and then in the dynamic array.

The developer would have to add

    DEBUG_SYNC_STR(thd, "after_update");

And change it later to

    DEBUG_SYNC(thd, AFTER_UPDATE);

plus editing the enum and the string array.


===== Spare Enum sync points (implementation proposal #4) =====

If we decide against string type sync points, we could have a couple
of predefined, not location bound sync point identifiers that can be
used during test case development, and have to be turned into location
bound identifiers before commit/push. For example:

    #ifndef DBUG_OFF
      /* WARNING: use these only for test case development ... */
      DEBUG_SYNC_POINT_1, DEBUG_SYNC_POINT_2, ...
      "debug_sync_point_1", "debug_sync_point_2", ...
    #endif

and have them to be renamed to "after_insert" or such and add the
belonging enums and strings before commit/push.


===== Self organizing string sync points (implementation proposal #5) =====

The trick is that the DEBUG_SYNC macro provides static storage for
the sync point index, which replaces the enum value. The index is
dynamically selected when the sync point is hit for the first time.

We have a global dynamic array of sync point names and a thread specific
array of sync point actions:

    external int  debug_sync_point_count= 0;
    external char **debug_sync_point_name= NULL;

    int                  THD::debug_sync_point_count;
    st_debug_sync_action *THD::debug_sync_point_action;

The global array is empty when the server starts. Every time a sync
point is hit for the first time, a new array element is established for
it in the global array. Its array index is stored in static storage.
This index is then used to address the thread specific array. On a later
hit, the index is present already. Only the second step is taken then.

Pseudo code:

    #define DEBUG_SYNC(thd, sync_point_name)
              if (unlikely(opt_debug_sync))
              {
                static uint sync_point_index= 0;
                debug_sync(thd, &sync_point_index, sync_point_name);
              }

    void debug_sync(thd, *sync_point_index, sync_point_name)
    {
      if (!*sync_point_index)
        *sync_point_index= debug_sync_point_new(thd, sync_point_name);
      /*
        sync_point_index 0 is reserved for an always inactive action.
      */
      if (*sync_point_index < thd->debug_sync_point_count &&
          !thd->debug_sync_action[*sync_point_index].activation_count)
        return;
      execute sync point action
    }

    int debug_sync_point_new(thd, sync_point_name)
    {
      
      int sync_point_index= debug_sync_point_count++;
      debug_sync_point_name= realloc(debug_sync_point_name,
                                     debug_sync_point_count);
      /* If malloc error, return special sync_point_index 0 */
      debug_sync_point_name[sync_point_index]= sync_point_name;
      
      return(sync_point_index);
    }

When a thread starts, it allocates an action array of the current size
of the global sync point name array. When a "new" sync point is hit in a
thread, its index is beyond the threads array. This means that the sync
point is inactive in this thread. When an action is requested for a
"new" sync point, the thread array is reallocated to the current size of
the global array.

This is a complicated algorithm, but now we find the action for a sync
point relatively quickly like so (beginning from the second hit of a
sync point):

    if (unlikely(opt_debug_sync) &&
        sync_point_index &&
        sync_point_index < thd->debug_sync_point_count &&
        thd->debug_sync_point_action[sync_point_index].activation_count)

This are just two conditions more than with enum sync points
(sync_point_index && sync_point_index < thd->debug_sync_point_count).

A downside of this proposal is that a sync point must be run through at
least once before an action can be requested for it. Otherwise it would
not appear in the sync point list and no index can be retrieved. While
this should be easy to do in the tests, it is still somewhat
uncomfortable, if not annoying.

A possible way to get around this could be to "queue" the request if the
sync point is not registered. A sync point that is hit for the first
time registers itself and scans the queue.


=== Further reading ===

For a discussion of other methods to synchronize threads see
http://forge.mysql.com/wiki/MySQL_Internals_Test_Synchronization


Comments:

Implementation proposal #5:
to limit run-time impact, on C++ we can use constructors to register points on
load. In GCC and ICC we could use __attribute__("constructor").
Also, mallocs could go away.

Syntax proposal #3:
variable name should be debug_syncronize (I'd personally prefer debug_sync,
even). It's not SQL and doesn't deal with data - it works and refers to
server's source code. Just as "debug" variable does. It's cleaner to have
a dedicated namespace for these stuff. Will help in the future too, when
we'll be adding more similar features.