WL#11123: Group Replication: hold reads and writes when the new primary has replication backlog to apply

Affects: Server-8.0   —   Status: Complete

Executive Summary

This worklog implements a fencing mechanism for when a new primary is promoted in Group Replication (GR). The fence holds connections that try to read from or write to the new primary until it has applied the whole backlog of pending changes that came from the old primary. As a result, applications do not read stale data; in exchange, their reads and writes are held for a short period of time (during the new primary promotion).

Motivation

There is an expectation that if an application reads from the primary, it will always read its own writes. This expectation is reinforced, even if only implicitly, when GR is deployed in single-primary mode.

However, on primary fail-over, the new primary will not reject or hold reads after being promoted. Instead, it will turn on conflict detection while the backlog is being applied. This means that if the application wrote A to the previous primary and then reads A from the new primary, it may see a stale value (i.e., miss its previous write, because it is still in the backlog to be applied).

For example:

  1. session1 @ server1 (primary) : W(a) C
  2. server1 fails, server2 is promoted and is now the new primary
  3. server2 is still applying the backlog (hence W(a) has not been applied yet to server2)
  4. router fails over application to server2 (new primary)
  5. session1 @ server2 (new primary) : R(a) C
  6. Application reads an old version of "a", even though it was always connected to the primary (!)
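
The sequence above can be reproduced with a small toy model. This is only an illustrative sketch, not GR code: the `Member` class, the `write_on_primary` helper and the in-memory backlog queue are hypothetical names introduced here, and replication is reduced to queueing a change on the secondary.

```python
from collections import deque


class Member:
    """Toy group member: a key-value store plus a backlog of unapplied writes."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.backlog = deque()  # replicated writes received but not yet applied

    def apply_backlog(self):
        # Apply every queued write in the order it was received.
        while self.backlog:
            key, value = self.backlog.popleft()
            self.data[key] = value

    def read(self, key):
        return self.data.get(key)


def write_on_primary(primary, secondaries, key, value):
    """Commit on the primary; secondaries only queue the change (hypothetical helper)."""
    primary.data[key] = value
    for member in secondaries:
        member.backlog.append((key, value))


server1 = Member("server1")
server2 = Member("server2")
for member in (server1, server2):
    member.data["a"] = "old"

# Step 1: session1 writes "a" on the primary and commits.
write_on_primary(server1, [server2], "a", "new")

# Steps 2-4: server1 fails and server2 is promoted while W(a) is still queued.
new_primary = server2

# Steps 5-6: session1 reads "a" from the new primary and misses its own write.
stale = new_primary.read("a")

# Only after the backlog is applied does the read return the committed value.
new_primary.apply_backlog()
fresh = new_primary.read("a")

print(stale, fresh)
```

In this model the first read returns the old value only because nothing stops the read while the backlog is non-empty, which is exactly the gap this worklog addresses.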

Users of GR do not expect this behavior: the primary member should always have the most recent data from the group.

NOTES:

  - In async master-slave replication there is no provision in the system to prevent anything like this -- meaning that MySQL users have always faced this issue and they choose how to deal with it. The same techniques that users use today in async master-slave promotion can be applied to GR single primary promotion;
  - And the current GR behavior is already better than the case in async master-slave promotions;
  - But, GR is raising the bar and expectations are higher when it comes to synchrony and consistency;
  - The user/middleware/router can hold incoming reads/writes to the new primary until it finishes applying the backlog or we can do it in the server itself.

BIG FAT NOTE: This is not about implementing read your writes consistency, but rather making sure that the following simple and straightforward assumption holds: if one reads from the primary, at any point in time, one will always read one's own writes.

Potential Solutions

There are three approaches to fix this (though 2 of them are pretty much the same):

  1. Fix it in the router, so that it fences the new primary until the backlog is flushed;
  2. Fix it in GR, so that the plugin rejects incoming traffic, until its backlog is flushed;
  3. Fix it in GR, so that the plugin holds incoming traffic, until its backlog is flushed.

NOTES:

  - Something similar to #3 will have to be implemented for "WL#10379: Group Replication: consistent reads";
  - #1 is valid only for InnoDB Cluster setups, i.e., when MySQL Router is deployed;
  - #2 is just a variation of #3. #1 vs #2,#3 is really what the decision boils down to.

Chosen Solution

Incoming traffic to a member that is being elected as primary will be put on hold until the member has applied all of its backlog. After that, the held traffic resumes and reads retrieve the most recent data from the group.

The option can be enabled at GLOBAL or SESSION scope. The goal is to allow the creation of a session that can perform operations without impacting the other sessions that are on hold until the backlog is applied.
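
The hold-until-applied behavior and its two scopes can be sketched with a simple gate. This is a minimal model under stated assumptions, not the plugin implementation: `FencedPrimary`, `global_hold` and `session_hold` are hypothetical names, and the rule that a session-level value overrides the global one is assumed here by analogy with how MySQL system variables that have both scopes generally behave.

```python
import threading


class FencedPrimary:
    """Toy new primary that holds statements until its backlog is applied."""

    def __init__(self, data, backlog, global_hold=True):
        self.data = dict(data)
        self.backlog = list(backlog)
        self.global_hold = global_hold             # hypothetical GLOBAL setting
        self.backlog_applied = threading.Event()   # set once the backlog is empty

    def apply_backlog(self):
        for key, value in self.backlog:
            self.data[key] = value
        self.backlog.clear()
        self.backlog_applied.set()  # releases every held session at once

    def _wait_if_held(self, session_hold):
        # Assumption: a SESSION value, when given, overrides the GLOBAL one.
        hold = self.global_hold if session_hold is None else session_hold
        if hold:
            self.backlog_applied.wait()

    def read(self, key, session_hold=None):
        self._wait_if_held(session_hold)
        return self.data.get(key)

    def write(self, key, value, session_hold=None):
        self._wait_if_held(session_hold)
        self.data[key] = value


primary = FencedPrimary({"a": "old"}, [("a", "new")])

# A session that disabled the hold is not blocked, and may see stale data.
unfenced_value = primary.read("a", session_hold=False)

# A default session is held while the backlog is pending.
results = []
reader = threading.Thread(
    target=lambda: results.append(primary.read("a")), daemon=True
)
reader.start()
reader.join(timeout=0.2)
was_held = reader.is_alive()  # still blocked: the backlog has not been applied

# Once the backlog is applied, the held read resumes and sees the latest value.
primary.apply_backlog()
reader.join()

print(unfenced_value, was_held, results)
```

Holding, rather than rejecting, means the application does not need retry logic for the promotion window; the session that opts out trades that guarantee for the ability to operate on the member while the others wait.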