In terms of distributed consistency guarantees, either in normal or failure repair operations, Group Replication has always been an eventual consistency system. This means that as soon as the incoming traffic slows down or stops, all group members have the same data content. The events that relate to the consistency of a system can be split into control operations, either manual or automatically triggered by failures; and data flow operations.
For Group Replication, the control operations that can be evaluated in terms of consistency are:
a member joining or leaving, which is covered by Group Replication's Section 20.5.4, “Distributed Recovery” and write protection.
network failures, which are covered by the fencing modes.
in single-primary groups, primary failover, which can also be an operation triggered by
group_replication_set_as_primary()
.
In a single-primary group, in the event of a primary failover when a secondary is promoted to primary, the new primary can either be made available to application traffic immediately, regardless of how large the replication backlog is, or alternatively access to it can be restricted until the backlog has been applied.
With the first approach, the group takes the minimum time
possible to secure a stable group membership after a primary
failure by electing a new primary and then allowing data
access immediately while it is still applying any possible
backlog from the old primary. Write consistency is ensured,
but reads can temporarily retrieve stale data while the new
primary applies the backlog. For example, if client C1 wrote
A=2 WHERE A=1
on the old primary just
before its failure, when client C1 is reconnected to the new
primary it could potentially read A=1
until
the new primary applies its backlog and catches up with the
state of the old primary before it left the group.
With the second alternative, the system secures a stable group
membership after the primary failure and elects a new primary
in the same way as the first alternative, but in this case the
group then waits until the new primary applies all backlog and
only then does it permit data access. This ensures that in a
situation as described previously, when client C1 is
reconnected to the new primary it reads
A=2
. However, the trade-off is that the
time required to failover is then proportional to the size of
the backlog, which on a correctly configured group should be
small
.
Prior to MySQL 8.0.14 there was no way to set the failover
policy; by default, availability was maximized as described in
the first approach. In a group with members running MySQL
8.0.14 and higher, you can determine the level of transaction
consistency guarantees provided by members during primary
failover using the
group_replication_consistency
variable. See
Impact of Consistency on Primary Election.
Data flow is relevant to group consistency guarantees due to
the reads and writes executed against a group, especially when
these operations are distributed across all members. Data flow
operations apply to both modes of Group Replication:
single-primary and multi-primary, however to make this
explanation clearer it is restricted to single-primary mode.
The usual way to split incoming read or write transactions
across a single-primary group's members is to route writes to
the primary and evenly distribute reads to the secondaries.
Since the group should behave as a single entity, it is
reasonable to expect that writes on the primary are
instantaneously available on the secondaries. Although Group
Replication is written using Group Communication System (GCS)
protocols that implement the Paxos algorithm, some parts of
Group Replication are asynchronous, which implies that data is
asynchronously applied to secondaries. This means that a
client C2 can write B=2 WHERE B=1
on the
primary, immediately connect to a secondary and read
B=1
. This is because the secondary is still
applying backlog, and has not applied the transaction which
was applied by the primary.
You configure a group's consistency guarantee based on the point at which you want to synchronize transactions across the group. To help you understand the concept, this section simplifies the points of synchronizing transactions across a group to be at the time of a read operation or at the time of a write operation. If data is synchronized at the time of a read, the current client session waits until a given point, which is the point in time that all preceding update transactions have been applied, before it can start executing. With this approach, only this session is affected, all other concurrent data operations are not affected.
If data is synchronized at the time of write, the writing session waits until all secondaries have written their data. Group Replication uses a total order on writes, and therefore this implies waiting for this and all preceding writes that are in secondaries’ queues to be applied. Therefore when using this synchronization point, the writing session waits for all secondaries queues to be applied.
Any alternative ensures that in the situation described for
client C2 would always read B=2
even if
immediately connected to a secondary. Each alternative has its
advantages and disadvantages, which are directly related to
your system workload. The following examples describe
different types of workloads and advise which point of
synchronization is appropriate.
Imagine the following situations:
You want to load-balance reads without deploying additional restrictions on which server you read from to avoid reading stale data, group writes are much less common than group reads.
For a group that has predominantly read-only data, you want read/write transactions to be applied everywhere once they commit, so that subsequent reads are done on up-to-date data that includes the latest write. This ensures that you do not pay the synchronization cost for every read-only transaction, but only for read/write transactions.
In these cases, you should choose to synchronize on writes.
Imagine the following situations:
You want to load balance your reads without deploying additional restrictions on which server you read from to avoid reading stale data, group writes are much more common than group reads.
You want specific transactions in your workload to always read up-to-date data from the group, for example whenever sensitive data is updated (such as credentials for a file or similar data) and you want to enforce that reads retrieve the most up to date value.
In these cases, you should choose to synchronize on reads.