WL#7334: Group Replication: Read-set free Certification Module - Part 2

Affects: Server-5.7   —   Status: Complete

EXECUTIVE SUMMARY
=================

This worklog is a continuation of WL#6833.
As part of the global task to provide MySQL with clustered, fault-tolerant,
multi-master update-everywhere replication, this worklog covers the tasks that
were postponed from the first working prototype of the GCS certifier, namely:
  - Stable transaction computation
  - Garbage collection
  - Persistence of the next sequence number
It also covers some additional functionality, such as exposing the certification
database so that other modules, such as recovery, can use it.

The Certifier module is responsible for four major tasks:

1. Identifying conflicts using the Certification Database.
2. Updating the stable set through periodic message passing between all nodes
   in the GCS cluster.
3. Garbage collecting the Certification Database through a periodic purging
   mechanism based on the stable set computation.
4. Persisting the next sequence number at a node for later use.
   When a node rejoins a group, next_seqno is initialized to the higher of two
   values: the GNO of LAST_GTID_EXECUTED on the local server and the highest
   GNO in GTID_Retrieved from the donor.
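
The choice of next_seqno on rejoin can be sketched as below. This is only an
illustration: the helper name initial_next_seqno and the type rpl_gno_sketch
are hypothetical, and whether the chosen value is used as is or incremented
before first use is not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for the server's GNO type (assumption, not the real typedef).
typedef std::int64_t rpl_gno_sketch;

// Hypothetical helper: picks the value next_seqno starts from on rejoin.
//   local_executed_gno  - GNO of LAST_GTID_EXECUTED on the local server
//   donor_retrieved_gno - highest GNO in GTID_Retrieved from the donor
inline rpl_gno_sketch initial_next_seqno(rpl_gno_sketch local_executed_gno,
                                         rpl_gno_sketch donor_retrieved_gno)
{
  // Sketch-level sanity check: GNOs are never negative.
  assert(local_executed_gno >= 0 && donor_retrieved_gno >= 0);
  return std::max(local_executed_gno, donor_retrieved_gno);
}
```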

WL#6833 handles the first task; tasks 2 and 3 are handled in this worklog.
For the Certifier-related functional and non-functional requirements, please
refer to WL#6833.
The requirements for the tasks handled here are:

 Functional Requirements:

F1. The next sequence number is to be persisted. Whenever a node that was
previously part of the GCS cluster rejoins the same group, this number is reused
or updated, depending on the group's current sequence number.

 Non-Functional Requirements:
NF1. Stable transactions are to remain mapped in the CertificationDb until
     garbage collection.
NF2. Transactions marked stable are to be persisted in the stable list until
     garbage collection.
NF3. On recovery, the CertificationDb and the last stable transaction
     information are to be exported and must be ready to use as soon as the
     node status changes from Recovering to Online in the group.
NF4. The CertificationDb and the stable transaction information are not
     persisted for a node that leaves the group.

The basic interface is defined by the public methods of the Certifier class.
The methods made available by this worklog are as follows:

1. Provide the certification database to the recovery module.
   hash_map get_CertificationDb(void);
2. Share the latest stable transaction information across components.
   list get_stable(void);
3. Invoke garbage collection on various events and at regular intervals of
   time.
   bool garbage_collect(void);

These are the basic methods that allow other modules in GCS to communicate with
the Certifier.

For further details, refer to WL#6833.

  /**
   * This member function shall add transactions to the stable set.
   *
   * @param gtid     GTIDs of the transactions to be added to the stable set.
   *                 These transactions have committed successfully and are not
   *                 part of the snapshot of subsequent optimistic transactions.
   *
   * @return         false if the transactions are successfully marked stable,
   *                 true otherwise.
   */
  bool set_stable(Gtid_set gtid);

  /**
   * This member function shall return the set of GTIDs of the stable 
   * transactions.
   * This information is further used for garbage collection.
   * 
   */
  Gtid_set get_stable(void);
  
  /**
   * This member function shall garbage collect all transactions that are stable
   * and remove them from the Certification Database, i.e. the Certified item
   * hash-mapped to the transaction sequence number.
   *
   * @return         false, if successful garbage collection, true otherwise.
   */
  bool garbage_collect(void);

  /**
   * This member function shall return the certified hash-map.
   * See note [2].
   */
  cert_db get_CertificationDb(void);

The stable set is calculated by broadcasting informational messages[1] carrying
GTID_executed between nodes at regular intervals.
Each message is sent by a new thread that wakes up on a fixed timeout.
The timeout is reset on all nodes every time there is a view change.
The message is handled in the handle_message_delivery event handler.
Such informational messages are:
	- handled by the certifier
	- retained, in that the GTID_executed information they carry is kept
	  until the broadcast from every node has been received.
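
One round of this exchange can be sketched as follows. The sketch assumes that
a transaction is stable once it appears in the GTID_executed of every node, so
the stable set is taken as the intersection of the received sets. The class
StableSetRoundSketch, its methods, and the use of a plain set of GNOs in place
of Gtid_set are all hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Simplified stand-in for Gtid_set: GNOs under a single group UUID.
typedef std::set<long long> GnoSetSketch;

class StableSetRoundSketch {
public:
  explicit StableSetRoundSketch(std::size_t group_size)
      : group_size_(group_size) {
    assert(group_size > 0);
  }

  // Keeps one node's GTID_executed until all nodes have reported.
  // Returns true once a broadcast from every node has been received.
  bool on_message(const std::string& node_id, const GnoSetSketch& executed) {
    received_[node_id] = executed;
    return received_.size() == group_size_;
  }

  // Intersection of all received sets; empty while the round is incomplete.
  GnoSetSketch compute_stable() const {
    if (received_.size() != group_size_) return GnoSetSketch();
    std::map<std::string, GnoSetSketch>::const_iterator it = received_.begin();
    GnoSetSketch stable = it->second;
    for (++it; it != received_.end(); ++it) {
      GnoSetSketch common;
      std::set_intersection(stable.begin(), stable.end(),
                            it->second.begin(), it->second.end(),
                            std::inserter(common, common.begin()));
      stable.swap(common);
    }
    return stable;
  }

private:
  std::size_t group_size_;
  std::map<std::string, GnoSetSketch> received_;
};
```

With two nodes reporting {1, 2, 3} and {2, 3, 4}, the sketch yields {2, 3} as
the stable set.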

Garbage collection is invoked after every fixed number of rounds of stable set
calculation.
The Certification Db is locked while records are deleted.
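
A minimal sketch of this scheme is shown below. It assumes the Certification Db
maps a certified item to a transaction sequence number and that a single mutex
guards it. The class GcSketch, its members, and the rounds_per_gc parameter are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <string>

class GcSketch {
public:
  explicit GcSketch(unsigned rounds_per_gc)
      : rounds_per_gc_(rounds_per_gc), rounds_(0) {
    assert(rounds_per_gc > 0);
  }

  void add_certified_item(const std::string& item, long long seqno) {
    std::lock_guard<std::mutex> guard(lock_);
    cert_db_[item] = seqno;
  }

  // Called at the end of each stable set round; every rounds_per_gc rounds
  // it triggers garbage collection.
  void on_stable_round(const std::set<long long>& stable) {
    bool run_gc;
    {
      std::lock_guard<std::mutex> guard(lock_);
      stable_.insert(stable.begin(), stable.end());
      run_gc = (++rounds_ % rounds_per_gc_ == 0);
    }
    if (run_gc) garbage_collect();
  }

  // Removes every item whose sequence number is stable. The database is
  // locked for the whole purge. Returns false on success, true otherwise.
  bool garbage_collect() {
    std::lock_guard<std::mutex> guard(lock_);
    for (std::map<std::string, long long>::iterator it = cert_db_.begin();
         it != cert_db_.end();) {
      if (stable_.count(it->second) != 0)
        it = cert_db_.erase(it);
      else
        ++it;
    }
    return false;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return cert_db_.size();
  }

private:
  mutable std::mutex lock_;
  unsigned rounds_per_gc_;
  unsigned rounds_;
  std::map<std::string, long long> cert_db_;
  std::set<long long> stable_;
};
```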

Notes:
[1] The informational message can either reuse MSG_STATE_EXCHANGE from WL#7332
    or be introduced as a new message type.
[2] When the certification DB is provided, a copy of the map is made under
    guarded access.
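
The copy under guarded access described in note [2] might look like the sketch
below. The names cert_db_sketch and CertDbHolderSketch are hypothetical, and a
plain mutex is assumed as the guard; the caller receives its own snapshot and
later changes to the certifier's map do not affect it.

```cpp
#include <map>
#include <mutex>
#include <string>

// Stand-in for the cert_db type: certified item -> sequence number.
typedef std::map<std::string, long long> cert_db_sketch;

class CertDbHolderSketch {
public:
  void add_certified_item(const std::string& item, long long seqno) {
    std::lock_guard<std::mutex> guard(lock_);
    cert_db_[item] = seqno;
  }

  // Returns a copy of the map, taken while holding the lock.
  cert_db_sketch get_certification_db() const {
    std::lock_guard<std::mutex> guard(lock_);
    return cert_db_;
  }

private:
  mutable std::mutex lock_;
  cert_db_sketch cert_db_;
};
```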