WL#6128: Session Tracker: Add GTIDs context to the OK packet

Affects: Server-5.7   —   Status: Complete   —   Priority: Medium

MOTIVATION
==========

As part of the framework to implement several degrees of session
consistency over a farm of MySQL servers, session state needs to be
put in the OK packet. This information gives the connector (or
middleware) the input it needs to act and make sure it provides the
correct consistency level to the application when routing queries to
different servers. For example, putting GTIDs in the OK packet may
help the connector to track dependencies between transactions and
database state transitions, by comparing the GTID state on different
servers with the GTID information it holds.

OBJECTIVE
=========

Therefore, this worklog adds a tracker to the response packet of the
MySQL protocol, so that the collected session state can be passed to
the connector and used to implement session consistency.

In this worklog, we will be taking, packing and sending the data
provided by WL#6972, i.e., a set of GTIDs.

SCOPE
=====

This worklog implements only the tracker. The part that changes the
server and replication layer to collect the necessary data is designed
and implemented in WL#6972.

REQUIREMENTS
============

F1. If session_track_gtids is set to a value other than 'OFF' then the
    OK packet MUST include the collected data.

F2. The tracker will consider the state changed whenever a GTID is
    collected.

F3. The GTIDs representation MUST be extensible in the future without
    breaking the connectors w.r.t. the structure of the information
    exchanged. Therefore, an encoding specification field must exist
    in the payload of the OK packet, to give the connector/decoder the
    ability to decide whether it is able to decode the payload or not.

INTRODUCTION
============

To implement different degrees of session consistency over a set of
MySQL servers, the connector or the middleware software needs to track
the state of different servers in the distributed system. The
mechanism under development will collect data after a transaction
commits (the current design considers a set of GTIDs - see
WL#6972). The data is extracted from the server and from the
replication layer. Then, it is put in the OK packet allowing the
connector to track distributed state.

In the current design, the data provided to the tracker is a set of
GTIDs. GTIDs allow tracking dependencies between transactions and
database state transitions. However, the data could be any other type
of information, provided that it can be used to track dependencies
among changes, for instance, an "epoch" from cluster. Getting back to
GTIDs, knowing which transactions have been applied where makes it
easier for the connector to compare states and identify dependencies
among transactions.

For instance (and very roughly), consider that the user has configured
the connector to make sure that every read it issues observes the
latest write that the same user issued before it (even if the
connector decides that the read operation is to be done on a
slave). Resorting to GTIDs, the connector can implicitly compare the
slave's state against the master's state and defer the read until all
meaningful writes coming from the master have been replayed on the
slave.

The following diagram illustrates in detail a possible sequence.

     ,-----------.          ,---------.          ,------.          ,-----.
     |Application|          |Connector|          |Master|          |Slave|
     `-----+-----'          `----+----'          `--+---'          `--+--'
           |     INSERT...       |                  |                 |   
           |-------------------->|                  |                 |   
           |                     |                  |                 |   
           |                     |    INSERT...     |                 |   
           |                     |----------------->|                 |   
           |                     |                  |                 |   
           |                     |   (OK, gtids)    |                 |   
           |                     |<- - - - - - - - -|                 |   
           |                     |                  |                 |   
           |         OK          |                  |                 |   
           |<- - - - - - - - - - |                  |                 |   
           |                     |                  |                 |   
           |     SELECT...       |                  |                 |   
           |-------------------->|                  |                 |   
           |                     |                  |                 |   
           |                     |          GTID_WAIT(gtids)          |   
           |                     |----------------------------------->|   
           |                     |                  |                 |   
           |                     |             SELECT...              |   
           |                     |----------------------------------->|   
           |                     |                  |                 |   
           |                     |             (OK, gtids)            |   
           |                     |<- - - - - - - - - - - - - - - - - -|   
           |                     |                  |                 |   
           |         OK          |                  |                 |   
           |<- - - - - - - - - - |                  |                 |   
     ,-----+-----.          ,----+----.          ,--+---.          ,--+--.
     |Application|          |Connector|          |Master|          |Slave|
     `-----------'          `---------'          `------'          `-----'

Note that even though it is possible to query the server for the "last
committed global transaction ID", this requires an extra roundtrip,
which adds latency and may degrade performance.
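
The connector side of the sequence above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
struct and its members are hypothetical names, the GTID text is assumed
to be already extracted from the OK packet, and the abstract
GTID_WAIT(gtids) step of the diagram is represented here by the
server's WAIT_FOR_EXECUTED_GTID_SET() function. A real connector may
merge GTID sets and wait differently.

```cpp
#include <string>

// Hypothetical connector-side state for one application session.
struct Session_consistency_state
{
  // GTID set text taken from the OK packet of the latest write.
  std::string last_write_gtids;

  // Called when an OK packet for a write arrives from the master.
  // An OK packet without GTID data leaves the remembered set untouched.
  void on_write_ok(const std::string &gtids_from_ok_packet)
  {
    if (!gtids_from_ok_packet.empty())
      last_write_gtids = gtids_from_ok_packet;
  }

  // Builds the statement to run on the slave before routing a read to it.
  // Returns an empty string when there is nothing to wait for.
  // Assumes the GTID text needs no quoting or escaping.
  std::string wait_statement(unsigned timeout_seconds) const
  {
    if (last_write_gtids.empty())
      return std::string();
    return "SELECT WAIT_FOR_EXECUTED_GTID_SET('" + last_write_gtids +
           "', " + std::to_string(timeout_seconds) + ")";
  }
};
```

No extra roundtrip to the master is needed here, because the GTIDs
arrive with the OK packet of the write itself.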

PROBLEM STATEMENT
=================

To be able to correctly track the GTID state, the necessary data must
be returned to the connector in the OK packet after each transaction
commits.

Since this worklog handles the data collected according to the WL#6972
specification, the data inserted into the OK packet is a GTID set.
Which elements are in this set depends on the consistency level.

The information that is put into the OK packet is controlled by the
session variable introduced in WL#6972, which is named
session_track_gtids. This is relevant to this worklog, since it also
controls whether the tracker is enabled or not.

The main problem is then how to serialize and put into the OK packet
the data to be returned. This is the focus of this worklog.

SOLUTION
========

The solution to the problem mentioned is to implement a new
State_tracker, as defined in WL#4797. This state tracker will collect
the data provided by WL#6972 (a GTID set) and add it to the OK packet.

Encoding
--------

The encoding will be done as a plain string. In the future we may
decide to encode in a different format (e.g., JSON).

Optimizations
-------------

There will not be any optimization regarding keeping and/or encoding
partial state in this version. This may be added as an optimization in
the future.

Open-Ended Encoding
-------------------

Given that we may want to encode using different formats or even do
some optimizations to the way connectors and the server keep state of
which GTIDs have been exchanged, we will reserve a field in the
payload to state which encoding specification the GTID state
uses. In the future, the connector may check this field and decide
which unpacking mechanism it should use if it chooses to decode the
GTID set in the OK packet.

This opens room for implementing a configuration option that governs
the encoding of the GTID payload in the tracker, e.g.,

   SET @@session_track_gtids_encoding_spec='JSON';
   SET @@session_track_gtids_encoding_spec='DEFAULT';
   SET @@session_track_gtids_encoding_spec='PLAIN | DELTA';

etc.

This worklog implements only one type of encoding, so there is no
point in adding the variable right away. If need be (for instance, if
more encoding types are added), it can be implemented as part of
another worklog.
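
How a connector might act on the encoding specification field can be
sketched as below. The function and constant names are hypothetical and
not part of this worklog. The only value assumed to be known is 0, the
plain string encoding, and any other value makes the connector skip the
payload instead of misinterpreting it.

```cpp
#include <cstdint>
#include <string>

// Hypothetical identifier for the plain string encoding (the only one
// implemented by this worklog).
const std::uint64_t ENCODING_SPEC_PLAIN_STRING = 0;

// Decodes the raw GTID bytes according to encoding_spec.
// Returns true and fills *gtids_text when the specification is known;
// returns false and leaves *gtids_text untouched otherwise, so that the
// caller can ignore payloads produced by encoders it does not know.
bool decode_gtids(std::uint64_t encoding_spec, const std::string &raw,
                  std::string *gtids_text)
{
  switch (encoding_spec)
  {
  case ENCODING_SPEC_PLAIN_STRING:
    *gtids_text = raw;   // plain string: the bytes are the GTID set text
    return true;
  default:
    return false;        // unknown encoding: do not attempt to decode
  }
}
```

New encodings would add a case to the switch, while older connectors
keep working because they reject what they cannot decode.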

Payload (ABI change)
--------------------

Considering the above, the payload serialized by the tracker into the
OK packet is:

+--------------+--------------+---------------+--------------+----------+
| tracker type |  entity len  | encoding spec |  gtids len   |   gtids  |
| (1-9 bytes)  | (1-9 bytes)  |  (1-9 bytes)  | (1-9 bytes)  | (N bytes)|
+--------------+--------------+---------------+--------------+----------+

Given that there is only one encoder, the default encoding spec is set
to be number 0 and as such it shall be encoded using only one byte.
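
As a rough illustration of this layout, the sketch below serializes one
tracker entry using the protocol's length-encoded integers (1, 3, 4 or
9 bytes depending on the value). The helper names are hypothetical. It
assumes that the tracker type is 3, the position of
SESSION_TRACK_GTIDS in enum_session_state_type, and that "entity len"
counts every byte that follows it (encoding spec, gtids len and
gtids).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: appends value as a length-encoded integer.
void append_lenenc_int(std::string &out, std::uint64_t value)
{
  if (value < 251)
  {
    out.push_back(static_cast<char>(value));   // fits in a single byte
    return;
  }
  int bytes;
  if (value < (1ULL << 16))      { out.push_back(static_cast<char>(0xFC)); bytes = 2; }
  else if (value < (1ULL << 24)) { out.push_back(static_cast<char>(0xFD)); bytes = 3; }
  else                           { out.push_back(static_cast<char>(0xFE)); bytes = 8; }
  for (int i = 0; i < bytes; ++i)              // little-endian payload
    out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: builds the tracker entry for a GTID set given as text.
std::string encode_gtids_entry(const std::string &gtids)
{
  const std::uint64_t tracker_type = 3;    // assumed: SESSION_TRACK_GTIDS
  const std::uint64_t encoding_spec = 0;   // plain string encoding

  // Everything after "entity len": encoding spec, gtids len, gtids.
  std::string entity;
  append_lenenc_int(entity, encoding_spec);
  append_lenenc_int(entity, gtids.size());
  entity.append(gtids);

  std::string out;
  append_lenenc_int(out, tracker_type);
  append_lenenc_int(out, entity.size());
  out.append(entity);
  return out;
}
```

For a 40-character GTID set such as
"3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5", every integer fits in one
byte, so under these assumptions the entry is 4 header bytes followed
by the 40 bytes of text.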

API Change
----------

The list of trackers in the mysql_com.h interface needs to be
extended. SESSION_TRACK_GTIDS is added to the list of session state
types:

   enum enum_session_state_type
   {
     SESSION_TRACK_SYSTEM_VARIABLES,     /* Session system variables */
     SESSION_TRACK_SCHEMA,               /* Current schema */
     SESSION_TRACK_STATE_CHANGE,         /* track session state changes */
     SESSION_TRACK_GTIDS                 /* track gtids */
   };
 
   #define SESSION_TRACK_BEGIN SESSION_TRACK_SYSTEM_VARIABLES
   #define SESSION_TRACK_END SESSION_TRACK_GTIDS

TASKS
-----

The work needed to implement this worklog may be split into the
following subtasks:

A. Create a new State_tracker for tracking GTIDs:

   class Session_gtids_ctx_encoder;
   class Session_gtids_tracker : public State_tracker,
                                        Session_gtids_ctx::Ctx_change_listener
   {
   private:
     Session_gtids_ctx_encoder *encoder;
     (...)
   public:
     /* Delegates serialization of the collected GTIDs to the encoder. */
     bool store(THD* thd, String& buf) { return encoder->encode(thd, buf); }
     (...)
   };

   This class will listen for changes in the GTIDs context and will
   hold a reference to an instance of an encoder.

   We shall not have instances of encoders when they are not
   needed. As such, the lifetime of an instance depends on the value
   assigned to the session_track_gtids variable.

B. Create an interface for the gtid set encoder.

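   class Session_gtids_ctx_encoder
   {
   public:
     virtual ~Session_gtids_ctx_encoder() {}
     /* Identifier written to the encoding spec field of the payload. */
     virtual ulonglong encoding_specification()= 0;
     /* Serializes the collected GTIDs into buf. */
     virtual bool encode(THD *thd, String& buf)= 0;
   };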

   This is an abstract class that defines the interface for the
   encoder.

C. Create an implementation of the encoder interface.

   class Session_gtids_ctx_encoder_string : public Session_gtids_ctx_encoder
   {
   public:
     ulonglong encoding_specification() { return 0; }
     bool encode(THD *thd, String& buf)
     {
       /* encoding implementation goes here */
       (...)
     }
   };

   This is the default encoder implementation. Different encoders
   shall extend either the abstract class or a specialized class and 
   implement their own encode member function. The long integer
   returned by the encoding_specification member function MUST be
   unique.

D. Define a listener interface. This interface must be implemented by
   the tracker to listen to Session_gtids_ctx changes.

   class Session_gtids_ctx 
   {
   public:
     class Ctx_change_listener
     {
     public:
       Ctx_change_listener() {}
       virtual void notify_session_gtids_ctx_change()= 0;
     private:
       // not implemented
       Ctx_change_listener(const Ctx_change_listener& rsc);
       Ctx_change_listener& operator=(const Ctx_change_listener& rsc);
     };
   };

   The Session_gtids_tracker must extend Session_gtids_ctx::Ctx_change_listener
   and implement the member function:

     void notify_session_gtids_ctx_change();

   The implementation will then call the State_tracker interface function

     void mark_as_changed(LEX_CSTRING *name);

   By extending Session_gtids_ctx::Ctx_change_listener, implementing
   notify_session_gtids_ctx_change and registering with
   Session_gtids_ctx, the link between the GTIDs collector and the
   tracker + encoder is established without any cross-dependency
   between the two.

E. Extend the Session_gtids_ctx with member functions to register a
   listener. 

   The class Session_gtids_ctx interface must be extended with two
   new member function declarations:

     /**
      Registers the listener. The pointer MUST NOT be NULL.

      @param listener a pointer to the listener to register.
     */
     void
     register_ctx_change_listener(Session_gtids_ctx::Ctx_change_listener*
                                  listener);

     /**
      Unregisters the listener. The listener MUST have been registered
      previously.

      @param listener a pointer to the listener to unregister.
     */
     void
     unregister_ctx_change_listener(Session_gtids_ctx::Ctx_change_listener*
                                    listener);

   NOTES:
   - GTIDs are collected only if there are listeners registered.
   - The lifetime of the internal structures Gtid_set and Sid_map is
     governed by whether there is a listener registered or not, so
     that memory is not used unnecessarily.

F. Extend the list of session trackers in the mysql_com.h interface:

   enum enum_session_state_type
   {
     SESSION_TRACK_SYSTEM_VARIABLES,     /* Session system variables */
     SESSION_TRACK_SCHEMA,               /* Current schema */
     SESSION_TRACK_STATE_CHANGE,         /* track session state changes */
     SESSION_TRACK_GTIDS                 /* track gtids */
   };
 
   #define SESSION_TRACK_BEGIN SESSION_TRACK_SYSTEM_VARIABLES
   #define SESSION_TRACK_END SESSION_TRACK_GTIDS

G. Add support in libmysql to read the data from the new
   SESSION_TRACK_GTIDS, in "void read_ok_ex(MYSQL *mysql, ulong length)".

FOR TESTING PURPOSES:

H. Change client/mysqltest.cc to support the new SESSION_TRACK_GTIDS 
   tracker type.
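
The listener wiring described in tasks D and E can be sketched in a
self-contained form as follows. All class and member names below are
hypothetical stand-ins rather than the server classes: a std::string
takes the place of Gtid_set and Sid_map, and a boolean flag takes the
place of the call to mark_as_changed(). The sketch only illustrates
that collection happens while a listener is registered and that the
collected state is released once the last listener goes away.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-in for Session_gtids_ctx.
class Gtids_ctx_sketch
{
public:
  // Stand-in for Session_gtids_ctx::Ctx_change_listener.
  class Listener
  {
  public:
    virtual ~Listener() {}
    virtual void notify_change() = 0;
  };

  void register_listener(Listener *listener)
  {
    assert(listener != nullptr);   // the pointer MUST NOT be NULL
    m_listeners.insert(listener);
  }

  void unregister_listener(Listener *listener)
  {
    m_listeners.erase(listener);
    if (m_listeners.empty())
      m_collected.clear();         // nobody is listening: release the state
  }

  // Called after a transaction commits. Returns true if the GTID was
  // collected, false if it was skipped because no listener is registered.
  bool notify_after_gtid(const std::string &gtid)
  {
    if (m_listeners.empty())
      return false;
    if (!m_collected.empty())
      m_collected += ",";
    m_collected += gtid;
    for (Listener *l : m_listeners)
      l->notify_change();
    return true;
  }

  const std::string &collected() const { return m_collected; }

private:
  std::set<Listener *> m_listeners;
  std::string m_collected;         // stand-in for Gtid_set + Sid_map
};

// Stand-in for Session_gtids_tracker.
class Tracker_sketch : public Gtids_ctx_sketch::Listener
{
public:
  Tracker_sketch() : m_changed(false) {}
  void notify_change() override { m_changed = true; }  // mark_as_changed()
  bool is_changed() const { return m_changed; }

private:
  bool m_changed;
};
```

The context only knows the listener interface and the tracker only
knows the context, which is how the cross-dependency is avoided.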