WL#14019: Automatic connection failover for Async Replication Channels - Step II: Automatic senders list

Affects: Server-8.0   —   Status: Complete

EXECUTIVE SUMMARY

This worklog is the second step of 'WL#12649: Automatic connection failover for Async Replication Channels'. It focuses on keeping the receiver's sender list in sync with changes to the Group Replication members, especially their state and role.

After this worklog is implemented, the list of senders that serve as potential replication connection failover targets is updated automatically for senders that belong to a group (Group Replication). The list of targets (senders) is dynamic and follows the group membership: the receiver updates its list of target senders whenever the membership of the group changes. In addition, the receiver always stays connected, through the asynchronous replication channel, to the source with the highest failover weight, even when the existing asynchronous replication connection has not failed or disconnected. This is useful for a User/DBA who wants the receiver to always stay connected to the primary of the group: by always giving the primary the higher failover weight, the binary logs on the receiver are kept in sync with minimum delay.

TERMINOLOGY

Let us first define the terms used in this worklog.

Asynchronous Replication terminology:

The roles that servers play while engaging in asynchronous replication:

  • sender: endpoint that sends data.
  • receiver: endpoint that receives data from the sender.

  • sender list: a list of senders in which each item contains the connection details of a sender and a priority. The user adds this sender list so that the receiver can connect to a new sender if the existing sender fails. The sender with the highest priority is chosen for the next connection.

sender list                               sender list
 [sender1] -----> [receiver]               [sender1] ---x--> [receiver]
                                                                ^
                             --------->                         |
 [sender2]                                 [sender2] -----------|


       sender1 fails, receiver connects to sender2
       -------------------------------------------

  • weight (priority): When more than one sender is present for the same channel, the sender with the highest weight (priority) is selected next for that channel. The weight is a number between 1 and 100, where 1 means the lowest priority and 100 the highest. To avoid confusion with group_replication_member_weight, it is also called 'failover weight' in this design document.

  • Group Configuration: assigns different weights to the primary and secondary members of the group, so that the User/DBA can control which member the receiver stays connected to through the asynchronous replication channel.

The group configuration contains these weights:

  • primary weight: the weight assigned to the primary of the group.
  • secondary weight: the weight assigned to the rest of the members of the group that have quorum.
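
The sender list and failover weight defined above can be illustrated with a minimal Python sketch. It is not the server implementation: the names `Sender` and `next_sender` are hypothetical, and the tie-breaking rule (the first entry with the highest weight wins) is an assumption, since this section does not say how ties are resolved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sender:
    """One entry of the sender list (hypothetical model, not a server type)."""
    host: str
    port: int
    weight: int  # failover weight: 1 is the lowest priority, 100 the highest

    def __post_init__(self) -> None:
        if not 1 <= self.weight <= 100:
            raise ValueError("failover weight must be between 1 and 100")


def next_sender(sender_list: List[Sender],
                failed: Optional[Sender] = None) -> Optional[Sender]:
    """Return the sender with the highest weight, skipping the failed one.

    Assumption: on equal weights the earliest entry in the list is chosen.
    """
    candidates = [s for s in sender_list if s != failed]
    return max(candidates, key=lambda s: s.weight, default=None)
```

For example, with sender1 at weight 50 and sender2 at weight 80, `next_sender` returns sender2; if sender2 is passed as `failed`, it returns sender1.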

MOTIVATION

The main driver of this worklog is to keep the sender list stored on the receiver in sync with group membership changes at the sender's end, thereby eliminating the need for the user to manually update the sender list (on the receiver) every time a member joins or leaves the group.

Also, in some scenarios the User/DBA may want the receiver to always stay connected to the sender with the highest failover weight. This is useful for keeping the binary logs of the receiver in sync with those of the primary of the group with minimum delay, by always assigning the primary the highest failover weight. This worklog provides changes that keep the receiver connected to the sender with the highest weight, and a way for the User/DBA to assign the highest failover weight to the primary.

USER STORIES

  • As a MySQL DBA I want to set up asynchronous replication between two servers, from S1 (sender), which is a member of a group (Group Replication), to R1 (receiver), and when the group membership changes on S1's side, I want the sender list stored on R1 to be synchronized automatically, so that I do not have to manually update the sender list (on the receiver) every time.

New member joins group:
-----------------------

sender list [S1, S2]                      sender list [S1, S2, S3]
[S1:50] ---> [R1]                         [S1:50] ------> [R1]

                       ------->
[S2:50]              S3 joins group       [S2:50]

                                          [S3:50]


       S3 joins sender group, sender list on R1 (receiver)
       which initially had [S1, S2] is updated and now
       contains [S1, S2, S3].
       ------------------------------------------------


Existing member leaves group:
-----------------------------

sender list [S1, S2, S3]                  sender list [S1, S2]
[S1:50] ---> [R1]                         [S1:50] ------> [R1]

                       ------->
[S2:50]              S3 leaves group      [S2:50]


[S3:50]


       S3 leaves sender group, sender list on R1 (receiver)
       which initially had [S1, S2, S3] is updated and now
       contains [S1, S2].
       ------------------------------------------------
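
The join and leave scenarios above amount to reconciling the stored sender list with the current group membership. The following Python sketch shows that reconciliation under stated assumptions: the function name `sync_sender_list` is hypothetical, the sender list is modeled as a plain mapping from member name to failover weight, and new members receive a default weight of 50 only because that is the value used in the diagrams.

```python
from typing import Dict, Set


def sync_sender_list(sender_list: Dict[str, int],
                     group_members: Set[str],
                     default_weight: int = 50) -> Dict[str, int]:
    """Return a sender list that matches the current group membership.

    Members that left the group are dropped, members that joined are added
    with default_weight (an assumption), and existing entries keep their weight.
    """
    # Keep only the senders that are still members of the group.
    updated = {name: weight for name, weight in sender_list.items()
               if name in group_members}
    # Add the members that are not yet in the sender list.
    for name in group_members:
        updated.setdefault(name, default_weight)
    return updated
```

Applied to the first diagram, `sync_sender_list({"S1": 50, "S2": 50}, {"S1", "S2", "S3"})` yields a list that also contains S3; applied to the second, passing `{"S1", "S2"}` as the membership removes S3 again.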

  • As a MySQL DBA I want to set up asynchronous replication between two servers, from S1 (sender), which is a member of a group (Group Replication), to R1 (receiver), and if S1 loses quorum, I want asynchronous replication to automatically fail over the replication connection to one of S3, S4 or S5, choosing the one with the highest weight.

sender list [S1,S2,S3,S4,S5]           sender list [S1,S2,S3,S4,S5]
-----------                            -----------
| [S1:50]-|----> [R1]                  | [S1:50]-| ---x--> [R1]
|         |                            | [S2:50] |          ^
|         |              ------->      -----------          |
| [S2:50] |            Majority Lost                        |
| [S3:50] |                            -----------          |
| [S4:50] |                            | [S3:50]-|-----------
| [S5:50] |                            | [S4:50] |
-----------                            | [S5:50] |
                                       -----------


       S3, S4 and S5 are not reachable from S1 and S2, so S1 and S2
       have lost majority. The asynchronous replication connection
       fails over to one of S3, S4 and S5.
       ------------------------------------------------
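
The majority condition in this scenario can be made concrete with a short sketch: a partition retains quorum only if it holds strictly more than half of the group members. The function names are hypothetical, and how the receiver actually learns which members are reachable is outside what this section describes.

```python
from typing import Iterable, Optional, Set


def has_majority(partition_size: int, group_size: int) -> bool:
    """A partition has quorum when it holds more than half of the group."""
    return 2 * partition_size > group_size


def majority_partition(partitions: Iterable[Set[str]],
                       group_size: int) -> Optional[Set[str]]:
    """Return the partition that still has quorum, or None if none does."""
    for partition in partitions:
        if has_majority(len(partition), group_size):
            return partition
    return None
```

In the diagram the group has five members: the partition {S1, S2} holds 2 of 5 and has lost majority, while {S3, S4, S5} holds 3 of 5 and keeps it, so the failover targets come from the second partition.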

  • As a MySQL DBA I want to set up asynchronous replication between two servers, from S1 (sender) to R1 (receiver), and if a new sender S3 is added with a higher failover weight than S1, I want asynchronous replication to automatically switch to S3, as it is the sender with the higher weight.

sender list [S1, S2]                      sender list [S1, S2, S3]
[S1:60] ---> [R1]                         [S1:60]          [R1]
                                                            ^
                      ------->                              |
[S2:50]             S3 added with         [S2:50]           |
                    higher weight                           |
                                                            |
                                          [S3:70] ----------|


    S3 is added to the sender list with a higher weight than S1 and
    S2. R1 reconnects to S3.
    -------------------------------------------------------------
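
The decision illustrated above, reconnecting even though the current connection is healthy, can be sketched as a comparison between the weight of the current sender and the highest weight in the list. The name `switch_target` is hypothetical, and requiring a strictly higher weight (so that equal weights do not trigger a reconnect) is an assumption, not something this section states.

```python
from typing import Dict, Optional


def switch_target(current: Optional[str],
                  sender_list: Dict[str, int]) -> Optional[str]:
    """Return the sender to reconnect to, or None to keep the connection.

    Assumption: a switch happens only when another sender has a strictly
    higher failover weight than the currently connected one.
    """
    if not sender_list:
        return None
    best = max(sender_list, key=lambda name: sender_list[name])
    if current is None or current not in sender_list:
        return best
    if sender_list[best] > sender_list[current]:
        return best
    return None
```

With the list {S1: 60, S2: 50} and R1 connected to S1, the function returns None; once S3 is added with weight 70 it returns S3.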

  • As a MySQL DBA I want to set up asynchronous replication between two servers, from S1 (sender), which is the primary member of a group (Group Replication), to R1 (receiver), and if the primary changes, i.e. S2 becomes PRIMARY and S1 becomes SECONDARY, I want asynchronous replication to automatically switch the replication connection to S2.

sender list [S1:PRI, S2, S3]              sender list [S1, S2:PRI, S3]
[S1:60] ---> [R1]                         [S1:50]          [R1]
                                                            ^
                       ------->                             |
[S2:50]             S2 becomes PRIMARY    [S2:60] ----------|


[S3:50]                                   [S3:50]


       The primary changes from S1 to S2. The weights are updated to
       match the new roles and the asynchronous replication connection
       switches to S2.
       ----------------------------------------------------------------
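
The primary change above follows from deriving each member's failover weight from its role using the primary weight and secondary weight of the group configuration. A minimal sketch, assuming a hypothetical `assign_weights` helper and roles given as the strings PRIMARY and SECONDARY:

```python
from typing import Dict


def assign_weights(roles: Dict[str, str],
                   primary_weight: int,
                   secondary_weight: int) -> Dict[str, int]:
    """Map each group member to a failover weight based on its role."""
    return {
        name: primary_weight if role == "PRIMARY" else secondary_weight
        for name, role in roles.items()
    }


# Before the change S1 is the primary; afterwards S2 is.
before = assign_weights(
    {"S1": "PRIMARY", "S2": "SECONDARY", "S3": "SECONDARY"}, 60, 50)
after = assign_weights(
    {"S1": "SECONDARY", "S2": "PRIMARY", "S3": "SECONDARY"}, 60, 50)
```

Here `before` gives S1 the weight 60 and `after` gives it to S2, so a receiver that always follows the highest weight moves its connection from S1 to S2, as in the diagram.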