WL#6972: Collect GTIDs to include in the protocol's OK packet

Affects: Server-5.7   —   Status: Complete   —   Priority: Medium

EXECUTIVE SUMMARY
=================

This worklog implements a mechanism to collect the set of GTIDs that
must be sent over the wire in the response (OK) packet.

DETAILS
=======

This worklog is a stepping stone towards implementing session consistency
throughout a MySQL-based replicated system. It is built on top of the
GTID infrastructure.

There are three levels of session consistency:

  L1. SESSION_CONSISTENCY_BEST_EFFORT

      No session consistency guarantees. The current situation. No
      changes required.

  L2. SESSION_CONSISTENCY_READ_OWN_WRITES

      If an application, A, issues T1 on S1 (master) and then issues a
      read-only or read-write transaction T2 on S2 (slave), then T2
      will be executed only after S2 has replayed T1 (i.e., through
      replication).

      As such, it is said that the application will always read its
      own writes, regardless of the server where these reads are
      issued.

  L3. SESSION_CONSISTENCY_READ_ALL_WRITES

      Assume an application, A, that issues the read-write transaction
      T1 on S1 (master) and commits it. Later, A issues another
      read-write transaction, T2, on S1 and commits it as well. Later
      still, through a different connection, A issues a read-only
      transaction, T3.

      If A then goes to S2 (slave) and issues a read-only transaction,
      T4, the connector will ensure that T4 is executed only after T1
      and T2 have been replicated and applied on S2.

The decision on when, and for which transactions, to wait is made by
leveraging the GTID information that this worklog exposes. The
connector decides whether to wait or not; the server provides the
connector with sufficient knowledge (a GTID set) for it to make that
decision.
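
To make the connector's role concrete, the wait decision can be sketched
as a set-inclusion test. This is a hypothetical model, not connector code:
`GtidSet`, `is_subset` and `must_wait` are illustrative names, and a GTID
set is simplified to a map from source UUID to individual sequence numbers
(the server itself stores intervals).

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified GTID set: source UUID -> executed sequence numbers.
using GtidSet = std::map<std::string, std::set<long long>>;

// True if every GTID in `a` is also contained in `b`.
bool is_subset(const GtidSet &a, const GtidSet &b) {
  for (const auto &[uuid, seqs] : a) {
    auto it = b.find(uuid);
    for (long long seq : seqs) {
      if (it == b.end() || it->second.count(seq) == 0) return false;
    }
  }
  return true;
}

// The connector must wait before running a transaction on the replica if the
// GTIDs it collected from OK packets have not all been applied there yet.
bool must_wait(const GtidSet &collected_from_ok_packets,
               const GtidSet &replica_gtid_executed) {
  return !is_subset(collected_from_ok_packets, replica_gtid_executed);
}
```

On a real replica, the waiting itself could be delegated to the server,
for example with the WAIT_FOR_EXECUTED_GTID_SET() SQL function available
in MySQL 5.7; the sketch only covers the decision.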

CONSISTENCY LEVELS CONFIGURATION
--------------------------------

Three levels of consistency:

  - SESSION_CONSISTENCY_BEST_EFFORT
  - SESSION_CONSISTENCY_READ_OWN_WRITES
  - SESSION_CONSISTENCY_READ_ALL_WRITES

These require that the server provides different information on
the set of GTIDs that have been seen and/or introduced by the
most recently executed statement.

The server exports an interface to control which GTIDs to track. This
interface builds on the session state tracking infrastructure and is
detailed further in the Low-Level Description section. The new GTID
tracker is controlled by the dynamic variable @@SESSION_TRACK_GTIDS,
which can be set to one of the following values:

 - OFF
 - OWN_GTID
 - ALL_GTIDS

The mapping to the consistency levels is the following:

 SESSION_CONSISTENCY_BEST_EFFORT     -> SESSION_TRACK_GTIDS= OFF
 SESSION_CONSISTENCY_READ_OWN_WRITES -> SESSION_TRACK_GTIDS= OWN_GTID
 SESSION_CONSISTENCY_READ_ALL_WRITES -> SESSION_TRACK_GTIDS= ALL_GTIDS
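
The mapping above is a fixed, one-to-one table, so a connector could
encode it as a simple switch. The enum `Session_consistency` and the
helper names below are assumptions for illustration; only the variable
name and its three values come from this worklog.

```cpp
#include <string>

// Hypothetical enum mirroring the consistency levels named in this worklog.
enum class Session_consistency {
  BEST_EFFORT,      // SESSION_CONSISTENCY_BEST_EFFORT
  READ_OWN_WRITES,  // SESSION_CONSISTENCY_READ_OWN_WRITES
  READ_ALL_WRITES   // SESSION_CONSISTENCY_READ_ALL_WRITES
};

// Value a connector would assign to @@SESSION.SESSION_TRACK_GTIDS.
std::string tracker_value_for(Session_consistency level) {
  switch (level) {
    case Session_consistency::BEST_EFFORT:     return "OFF";
    case Session_consistency::READ_OWN_WRITES: return "OWN_GTID";
    case Session_consistency::READ_ALL_WRITES: return "ALL_GTIDS";
  }
  return "OFF";  // unreachable; silences missing-return warnings
}

// Statement a connector could send when the session is opened.
std::string set_statement_for(Session_consistency level) {
  return "SET SESSION session_track_gtids = " + tracker_value_for(level);
}
```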

Finally, the specification mandates that the following information be
returned in the OK packet:

 |---------------------+-------+---------------+---------------------|
 | session_track_gtids | OFF   | OWN_GTID      | ALL_GTIDS           |
 |          vs         |       |               |                     |
 |       scenario      |       |               |                     |
 |---------------------+-------+---------------+---------------------|
 | Single RW trx       | Empty | Its own GTID  | All GTIDs or delta* |
 | Single RO trx       | Empty | Empty         | All GTIDs or delta* |
 | Multiple RW trx     | Empty | Its own GTIDs | All GTIDs or delta* |
 | Multiple RO trx     | Empty | Empty         | All GTIDs or delta* |
 | RW and RO trx       | Empty | Its own GTIDs | All GTIDs or delta* |
 |---------------------+-------+---------------+---------------------|
 (* Initially, all GTIDs are returned; returning only the delta is a
    possible later optimization.)
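
The table can be read as a function of the tracking mode and of the
transactions committed since the last reply. The following sketch assumes
a hypothetical `Committed_trx` record and represents GTIDs as strings; for
ALL_GTIDS it returns the whole GTID_EXECUTED, matching the "all GTIDs"
choice rather than the delta.

```cpp
#include <string>
#include <vector>

enum class Track_gtids { OFF, OWN_GTID, ALL_GTIDS };

// One transaction committed by the session since the previous reply.
struct Committed_trx {
  bool read_write;   // RO transactions have no GTID of their own
  std::string gtid;  // e.g. "uuid:7"; empty for RO transactions
};

// GTIDs to report in the OK packet, following the table above.
// `gtid_executed` stands for @@GLOBAL.GTID_EXECUTED after the last commit.
std::vector<std::string> gtids_for_ok_packet(
    Track_gtids mode, const std::vector<Committed_trx> &trxs,
    const std::string &gtid_executed) {
  std::vector<std::string> out;
  switch (mode) {
    case Track_gtids::OFF:
      break;  // always empty
    case Track_gtids::OWN_GTID:
      for (const auto &t : trxs)
        if (t.read_write) out.push_back(t.gtid);  // own RW GTIDs only
      break;
    case Track_gtids::ALL_GTIDS:
      if (!trxs.empty()) out.push_back(gtid_executed);  // all, not delta
      break;
  }
  return out;
}
```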

FR1. The set of GTIDs listed in the table above MUST be saved before the
     reply packet is sent to the client after a transaction finishes. The
     GTIDs SHALL be discarded after being included in the OK packet.

FR2. The user MUST have a means to tell the server, dynamically, which
     GTIDs to gather. This facility covers the cases listed in the table
     above and is controlled through a new dynamic variable,
     SESSION_TRACK_GTIDS, which is introduced in the High Level
     Specification section.

FR3. The functionality designed in this worklog shall only be available
     if the server is operating with GTID_MODE=ON.

FR4. GTIDs for implicitly terminated transactions SHALL be collected and
     included in the response packet of the statement that terminated the
     ongoing transaction. (E.g., for BEGIN; INSERT ...; BEGIN; the second
     BEGIN returns the GTID of the transaction containing the INSERT.)

FR5. No GTIDs shall be collected on ROLLBACK, for either OWN_GTID or
     ALL_GTIDS.

FR6. Collecting GTIDs for prepared statements follows the same rules as
     for non-prepared statements, so the requirements listed above also
     apply.

FR7. The new system variable, session_track_gtids, SHALL NOT be settable
     inside a transactional context.

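
FR1, FR4 and FR5 can be illustrated with a toy session that tracks
OWN_GTID only. `Toy_session` and its methods are hypothetical; GTIDs are
formatted as `uuid:n` strings, and a transaction becomes RW as soon as it
executes a write.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical session model with OWN_GTID tracking: each method returns the
// GTIDs that the reply to that statement would carry.
class Toy_session {
 public:
  explicit Toy_session(std::string uuid) : m_uuid(std::move(uuid)) {}

  std::vector<std::string> begin() {
    commit_if_active();  // FR4: BEGIN implicitly commits an open transaction
    m_active = true;
    return reply();
  }
  std::vector<std::string> insert() {
    bool autocommit = !m_active;  // outside BEGIN, the statement is its own trx
    m_active = m_wrote = true;
    if (autocommit) commit_if_active();
    return reply();
  }
  std::vector<std::string> commit() {
    commit_if_active();
    return reply();
  }
  std::vector<std::string> rollback() {
    m_active = m_wrote = false;  // FR5: nothing is collected on ROLLBACK
    return reply();
  }

 private:
  void commit_if_active() {
    if (m_active && m_wrote)  // only RW transactions get a GTID
      m_pending.push_back(m_uuid + ":" + std::to_string(++m_next_seq));
    m_active = m_wrote = false;
  }
  // FR1: collected GTIDs are sent in one reply, then discarded.
  std::vector<std::string> reply() {
    std::vector<std::string> out;
    out.swap(m_pending);
    return out;
  }

  std::string m_uuid;
  long long m_next_seq = 0;
  bool m_active = false;
  bool m_wrote = false;
  std::vector<std::string> m_pending;
};
```
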
The protocol changes are made in WL#4797, and appending the GTIDs to
the OK packet is done in WL#6128. Therefore, this worklog does not
describe any protocol changes.

The user-visible interface consists of a single new option:

SESSION_TRACK_GTIDS
-------------------

  - Type: Server System Variable
  - Settable: Yes
  - Scope: GLOBAL, SESSION
  - Default: OFF
  - Valid Values: OFF, OWN_GTID, ALL_GTIDS
  - Description:

    Controls the GTID information that is appended to the
    MySQL protocol OK packet. The values are:

    - OFF          - No GTIDs are included in the OK packet.

    - OWN_GTID     - Collect GTIDs generated by successfully committed
                     RW transactions. Once a RW transaction commits,
                     its GTID is included in the OK packet for the last
                     statement in the transaction. RO transactions do
                     not generate GTIDs, so no GTIDs are included in the
                     OK packet for these transactions.

                     A RW transaction's GTID is included only once, at
                     the time the RW transaction commits.

    - ALL_GTIDS    - The GTID_EXECUTED at the time the current
                     transaction commits, regardless of whether it is
                     RW or RO.
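
The value check and the FR7 restriction can be sketched as follows.
`parse_track_gtids` and `set_track_gtids` are hypothetical helpers, not
server functions; the case-insensitive comparison mirrors how enum
system variable values are accepted, and the `in_transaction` flag stands
in for whatever check the server performs.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

enum class Track_gtids { OFF, OWN_GTID, ALL_GTIDS };

// Case-insensitive lookup of a SESSION_TRACK_GTIDS value (hypothetical).
std::optional<Track_gtids> parse_track_gtids(std::string v) {
  std::transform(v.begin(), v.end(), v.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  if (v == "OFF") return Track_gtids::OFF;
  if (v == "OWN_GTID") return Track_gtids::OWN_GTID;
  if (v == "ALL_GTIDS") return Track_gtids::ALL_GTIDS;
  return std::nullopt;  // not one of the valid values
}

// Returns false, leaving `current` unchanged, for an invalid value or when
// a transaction is active (FR7).
bool set_track_gtids(Track_gtids &current, const std::string &value,
                     bool in_transaction) {
  if (in_transaction) return false;
  auto parsed = parse_track_gtids(value);
  if (!parsed) return false;
  current = *parsed;
  return true;
}
```
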

TASKS
-----

The work needed to implement this worklog may be split into the 
following subtasks:

A. Create replication context object for tracking session
   consistency related data.

   class Rpl_consistency_ctx
   {
     ...
   };

B. Add the following properties to the context:

   /* Maps between SIDNO and SID (UUID). */
   Sid_map m_sid_map;

   /**
     Set holding the transaction identifiers of the GTIDs
     to reply back in the response packet.

     Lifecycle: emptied after the reply is sent back to the application.
     Remains empty until either:
     - a RW transaction commits, its GTID is written to the binary log,
       and @@SESSION_TRACK_GTIDS is set to OWN_GTID or ALL_GTIDS; or
     - a RO transaction is committed while @@SESSION_TRACK_GTIDS is
       set to ALL_GTIDS.
   */
   Gtid_set m_gtid_set;
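
To illustrate what m_sid_map and m_gtid_set hold, here is a deliberately
simplified model. `Toy_sid_map` and `Toy_gtid_set` are hypothetical
stand-ins, not the server's Sid_map and Gtid_set APIs: UUIDs are mapped to
small integers (SIDNOs), and each SIDNO keeps disjoint, merged intervals of
transaction numbers, printed in the familiar `uuid:1-3:5` notation (the
separator and ordering between UUIDs are simplified).

```cpp
#include <iterator>
#include <map>
#include <string>

// Hypothetical Sid_map stand-in: assigns a SIDNO (starting at 1) to each UUID.
class Toy_sid_map {
 public:
  int add(const std::string &uuid) {
    auto it = m_to_sidno.find(uuid);
    if (it != m_to_sidno.end()) return it->second;
    int sidno = static_cast<int>(m_to_sidno.size()) + 1;
    m_to_sidno.emplace(uuid, sidno);
    m_to_uuid.emplace(sidno, uuid);
    return sidno;
  }
  const std::string &uuid(int sidno) const { return m_to_uuid.at(sidno); }

 private:
  std::map<std::string, int> m_to_sidno;
  std::map<int, std::string> m_to_uuid;
};

// Hypothetical Gtid_set stand-in: per SIDNO, disjoint intervals start -> end.
class Toy_gtid_set {
 public:
  explicit Toy_gtid_set(Toy_sid_map *sid_map) : m_sid_map(sid_map) {}

  void add(const std::string &uuid, long long gno) {
    auto &iv = m_intervals[m_sid_map->add(uuid)];
    auto next = iv.upper_bound(gno);  // first interval starting after gno
    if (next != iv.begin()) {
      auto prev = std::prev(next);
      if (prev->second >= gno) return;  // already in the set
      if (prev->second + 1 == gno) {    // extend the previous interval
        prev->second = gno;
        if (next != iv.end() && next->first == gno + 1) {  // close the gap
          prev->second = next->second;
          iv.erase(next);
        }
        return;
      }
    }
    if (next != iv.end() && next->first == gno + 1) {  // extend next downwards
      long long end = next->second;
      iv.erase(next);
      iv.emplace(gno, end);
      return;
    }
    iv.emplace(gno, gno);  // new single-GTID interval
  }

  bool is_empty() const { return m_intervals.empty(); }
  void clear() { m_intervals.clear(); }  // e.g. after the reply is sent

  std::string to_string() const {
    std::string out;
    for (const auto &[sidno, iv] : m_intervals) {
      if (!out.empty()) out += ",";
      out += m_sid_map->uuid(sidno);
      for (const auto &[start, end] : iv) {
        out += ":" + std::to_string(start);
        if (end != start) out += "-" + std::to_string(end);
      }
    }
    return out;
  }

 private:
  Toy_sid_map *m_sid_map;
  std::map<int, std::map<long long, long long>> m_intervals;
};
```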

C. Create an accessor for m_gtid_set in Rpl_consistency_ctx.

   const Gtid_set &get_gtids() const { return m_gtid_set; }

D. Create member functions of Rpl_consistency_ctx that are called at
   specific points of the transaction execution flow. Two of them save
   the relevant GTIDs; the goal is to encapsulate inside them the logic
   that decides which GTIDs to store. Calls to these functions are
   placed at the relevant points of the execution, and the functions
   then save either @@GLOBAL.GTID_EXECUTED or
   thd->variables.gtid_next.gtid. A third function is called after the
   response packet has been sent.

   Note that, alternatively, we could register this class as a
   Trans_observer and a Binlog_storage_observer. However, the hooks that
   trigger notifications for these observers are currently too tightly
   coupled to the binary log. Nonetheless, if these hooks are refactored
   later, adapting the current logic will be fairly easy, since the
   behavior is already encapsulated in these member functions.

   /**
     This function MUST be called when a GTID is propagated through
     the replication protocol. This could mean, for instance, that it
     has been written to the binary log, and thus slaves will get it.

     This function SHALL store the GTID if
     thd->variables.session_track_gtids is set to OWN_GTID.

     @param thd   The thread context.
     @return true on error, false otherwise.
   */
   bool notify_after_transaction_replicated(THD *thd);

   /**
     This function MUST be called after a transaction is committed
     in the server. It should be called regardless of whether it is a
     RO or RW transaction. DDL statements are also considered
     transactions for this purpose.

     This function SHALL store GTID_EXECUTED if
     thd->variables.session_track_gtids is set to ALL_GTIDS.

     @param thd    The thread context.
     @return true on error, false otherwise.
   */
   bool notify_after_transaction_commit(THD *thd);

   /**
     This function MUST be called after the response packet is sent to
     the connected client. The implementation may act on the collected
     GTID state, for instance to do garbage collection (the collected
     GTIDs are discarded once reported, as required by FR1).

     @param thd The thread context.
     @return true on error, false otherwise.
   */
   bool notify_after_response_packet(THD *thd);
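
The intended behavior of these notification functions can be sketched on
simplified types. `Toy_thd` and `Toy_consistency_ctx` are hypothetical
stand-ins for THD and Rpl_consistency_ctx; GTIDs are plain strings and
GTID_EXECUTED is modeled as a set of them. The sketch follows the rules
stated above: OWN_GTID stores the transaction's own GTID once it is
replicated, ALL_GTIDS stores GTID_EXECUTED at commit, and the set is
emptied once the reply has been sent.

```cpp
#include <set>
#include <string>

enum class Track_gtids { OFF, OWN_GTID, ALL_GTIDS };

// Hypothetical THD stand-in holding only the fields these hooks read.
struct Toy_thd {
  Track_gtids session_track_gtids = Track_gtids::OFF;
  std::string gtid_next;                     // GTID of the committing trx
  std::set<std::string> global_gtid_executed;  // @@GLOBAL.GTID_EXECUTED
};

class Toy_consistency_ctx {
 public:
  const std::set<std::string> &get_gtids() const { return m_gtid_set; }

  // Called once the transaction's GTID has been written to the binary log.
  bool notify_after_transaction_replicated(const Toy_thd &thd) {
    if (thd.session_track_gtids == Track_gtids::OWN_GTID &&
        !thd.gtid_next.empty())
      m_gtid_set.insert(thd.gtid_next);
    return false;  // no error
  }

  // Called after every commit, whether RO, RW or DDL.
  bool notify_after_transaction_commit(const Toy_thd &thd) {
    if (thd.session_track_gtids == Track_gtids::ALL_GTIDS)
      m_gtid_set.insert(thd.global_gtid_executed.begin(),
                        thd.global_gtid_executed.end());
    return false;
  }

  // Called after the OK packet has been sent: discard what was reported.
  bool notify_after_response_packet(const Toy_thd &) {
    m_gtid_set.clear();
    return false;
  }

 private:
  std::set<std::string> m_gtid_set;
};
```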

E. Place calls to notify_after_transaction_commit in trans_commit_stmt,
   trans_commit and trans_commit_implicit.

F. Place calls to notify_after_transaction_replicated in 
   Gtid_state::update_on_flush.

G. In sql_parse.cc, dispatch_command:

   Add the following call after the response has been sent to the client:

   thd->rpl_session_ctx.notify_after_response_packet(thd);
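
The ordering in step G matters: the collected GTIDs must be copied into
the OK packet before the context is cleared. A minimal sketch, assuming
hypothetical `Toy_ctx` and `Ok_packet` types in place of the real THD,
protocol classes and session-state payload:

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the consistency context.
struct Toy_ctx {
  std::set<std::string> gtids;
  void notify_after_response_packet() { gtids.clear(); }
};

// Hypothetical stand-in for the OK packet's GTID payload.
struct Ok_packet {
  std::vector<std::string> gtids;
};

// Models the tail of dispatch_command: execute, reply, then notify.
Ok_packet dispatch_statement(Toy_ctx &ctx,
                             const std::function<void(Toy_ctx &)> &execute) {
  execute(ctx);  // may commit and collect GTIDs
  Ok_packet ok{std::vector<std::string>(ctx.gtids.begin(), ctx.gtids.end())};
  // ... the packet would be written to the client connection here ...
  ctx.notify_after_response_packet();  // only after the reply is built/sent
  return ok;
}
```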

H. In sys_vars.cc add the new option:

   /* TYPELIB name lists are NULL-terminated. */
   static const char *session_track_gtids_names[]=
     { "OFF", "OWN_GTID", "ALL_GTIDS", NullS };
   static Sys_var_enum Sys_session_track_gtids(
          "session_track_gtids",
          "Controls the amount of global transaction ids to be "
          "included in the response packet sent by the server. "
          "(Default: OFF).",
          SESSION_VAR(session_track_gtids), CMD_LINE(REQUIRED_ARG),
          session_track_gtids_names, DEFAULT(OFF),
          NO_MUTEX_GUARD, NOT_IN_BINLOG,
          /* FR7 also requires a check rejecting changes inside a transaction. */
          ON_CHECK(NULL),
          ON_UPDATE(NULL));