Fail early, fail fast - preventing stale reads on a member that dropped out of the group!

In Group Replication, starting with MySQL 8.0.12, the user is able to specify the behavior of members that enter an error state. The member can either shoot itself in the head or set itself to read-only.

But what does that mean in practice? This blog post will give you a background of what problems this feature solves and how you can use it to build more fault-tolerant and highly-available systems.

Introduction

The Group Replication plugin strives to provide a uniform view of the data, i.e. if you write to a primary you expect to (eventually) read that write on another member of the group. All members in the group act as if they were one single indivisible entity, and the user views the group as such.
With Group Replication we provide fault-tolerance through various mechanisms. One such mechanism is the reconfiguration of the group when a member leaves (either of its own accord or by being kicked out by the other members). You can think of group reconfiguration as a series of coordinated actions that result in an updated shared state in the group. In the case of a member leaving, this group reconfiguration essentially means that the rest of the members will exchange that information between themselves and eventually agree on the set of members that make up the group. On the other hand, the member that left the group will enter an OFFLINE or ERROR state, depending on whether it left of its own accord or was kicked out, and stay in a read-only mode.
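
As a rough illustration of the states mentioned above, the membership as seen by a given server can be inspected through the performance_schema.replication_group_members table. This is only a sketch: the rows returned depend on which member you run it on, and a member that has already left the group will typically report only itself.

```sql
-- List every member the local server currently knows about, with its state
-- (for example ONLINE, RECOVERING, OFFLINE, ERROR or UNREACHABLE).
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
  FROM performance_schema.replication_group_members;
```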

The problem with this scenario is that a member that leaves the group, even if it is in read-only mode, is still up and running and thus still accepting queries. This means that if you are holding a connection to a member that left involuntarily, you may end up reading stale data, because the rest of the group keeps being updated but that member does not (note that if you are using MySQL Router then your app is not exposed to this, because Router will sever the connection in this case).
Originally, this read-only mode was intended to let the DBA assess what went wrong and then manually rejoin the member to the group. Since then, a need to specify the behaviour of a member that has left the group has arisen. When we think of middleware built on top of MySQL Group Replication, we realize that these layers want to act fast when a member stops participating in the group. This is important for DBAs too: consider a scenario where the DBA sets up a watchdog tool (like systemd) that restarts killed processes, in the hope of minimizing the chance of serving stale reads to consumers of the database.
So, in order to fulfill this need, we introduced group_replication_exit_state_action, a configuration option for Group Replication that allows the user to specify whether the server should go into its usual read-only mode or simply abort.

But what could go wrong in a group?

As mentioned above, the group_replication_exit_state_action kicks in once a member involuntarily leaves the group, either because it detected a failure and decided to leave or it was kicked out of the group.
We will go through each possible scenario and provide a succinct explanation of why it happens.

The transaction applier fails

You can view the transaction applier as what’s actually executing the transactions broadcast in the group. In practice, it’s a bit more complicated than that, but let’s imagine for now that it sits on top of the traditional replication technologies.
So, for example, imagine we have some filesystem issues. The applier would not be able to apply transactions that were already certified within the group (transaction certification is another crucial mechanism of Group Replication) and it would error out. This is one of the situations in which the member leaves the group.
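
When the applier does fail, the error it hit can usually be found in the Performance Schema. The query below is a minimal sketch that assumes the standard group_replication_applier channel name; the exact error reported depends on what went wrong on your server.

```sql
-- Show the last error (if any) reported by the Group Replication applier workers.
SELECT CHANNEL_NAME, WORKER_ID, LAST_ERROR_NUMBER,
       LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_applier_status_by_worker
 WHERE CHANNEL_NAME = 'group_replication_applier';
```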

The member is expelled

The member can be expelled from the group (a nice way to say kicked out) when it is suspected of being dead. Each member in the group has a failure detector that can see if other members in the group are taking too long to reply. If they take too long, they are ‘suspected’ of being dead and eventually kicked out. A small note here is that the time between a member being suspected and actually being expelled by the group is determined by the group_replication_member_expel_timeout system variable.

This scenario results in the member being proactively expelled from the group by other members.
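
The delay between suspicion and expulsion is exposed as the group_replication_member_expel_timeout system variable, which, as far as we are aware, is available from MySQL 8.0.13. The sketch below uses an illustrative value of 10 seconds; pick a value that matches how long a network hiccup you are willing to tolerate, and consider applying the same value on every member.

```sql
-- Assumes MySQL 8.0.13 or later with the Group Replication plugin loaded.
-- Wait an additional 10 seconds (illustrative value) after a member becomes
-- suspected before expelling it from the group.
SET GLOBAL group_replication_member_expel_timeout = 10;

-- Confirm the value currently in effect.
SELECT @@GLOBAL.group_replication_member_expel_timeout;
```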

The member is unable to connect to a majority of the group

As you may recall, Group Replication relies on a consensus algorithm (Paxos) to agree upon which transaction to commit. Consensus algorithms usually require a certain number of participants to be able to reach an agreement on a value, i.e. a consensus. For Group Replication’s purposes, this number is a majority, that is, more than half of the members in the group.
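
To make the arithmetic concrete: a group of N members needs FLOOR(N / 2) + 1 of them to form a majority, so a five-member group needs three. The query below is a small sketch that derives this figure from the membership table; it is only meaningful when run on a member that is still part of the group.

```sql
-- Compute how many members are needed for a majority in the current group.
SELECT COUNT(*)                AS members,
       FLOOR(COUNT(*) / 2) + 1 AS majority_needed
  FROM performance_schema.replication_group_members;
```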

If a member loses contact with a majority of the group, as detected by its built-in failure detector (detailed in the previous scenario), it will leave the group. The rationale is simple: it can’t know whether the network problem lies with itself or with the rest of the group, so it takes a defensive stance and assumes the problem is its own. Also, as with the expel scenario, the time a member waits after losing the majority before it leaves the group is controlled via a system variable, in this case group_replication_unreachable_majority_timeout.
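
As a sketch of how this timeout might be configured, the statement below uses an illustrative value of 30 seconds. Note that, to the best of our knowledge, the default of 0 means the member waits indefinitely rather than leaving, so this scenario only triggers an exit once a non-zero value is set.

```sql
-- Leave the group after 30 seconds (illustrative value) without
-- contact with a majority of the members.
SET GLOBAL group_replication_unreachable_majority_timeout = 30;
```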

Distributed recovery failure

The initial phase for any member joining the group is the recovery phase. To summarize, the joining member must catch up with the rest of the group, so it receives the missing transactions from a donor (another member) and applies them. Eventually it syncs up with the rest of the group. If any error happens during this phase, for example because the joining member’s data has drifted so far from the rest of the group that it cannot apply the missing state, then the member will error out and involuntarily leave the group.
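
If recovery fails, one place to look for the cause is the connection status of the recovery channel. This is a minimal sketch assuming the standard group_replication_recovery channel name; errors raised while applying the received transactions may show up in the applier tables or the error log instead.

```sql
-- Inspect the state and last connection error of the distributed recovery channel.
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_connection_status
 WHERE CHANNEL_NAME = 'group_replication_recovery';
```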

So all of these scenarios result in an involuntary leave and at this point in time, the member will check which group_replication_exit_state_action the user configured and react accordingly.

So how do I use this?

Using this feature is as simple as setting a Group Replication system variable. We introduced the group_replication_exit_state_action system variable, which is an enumeration that allows the following values:

READ_ONLY
ABORT_SERVER

So, in order to enable the new abort behaviour, you can configure the group_replication_exit_state_action variable like so:

SET GLOBAL group_replication_exit_state_action = 'ABORT_SERVER';
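
SET GLOBAL only lasts until the server restarts. If you want the choice to survive a restart, one option in MySQL 8.0 is SET PERSIST, sketched below under the assumption that the Group Replication plugin is already loaded; setting the variable in the server's option file is an equally valid alternative.

```sql
-- Apply the setting now and persist it across server restarts.
SET PERSIST group_replication_exit_state_action = 'ABORT_SERVER';

-- Verify the value currently in effect.
SELECT @@GLOBAL.group_replication_exit_state_action;
```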

group_replication_exit_state_action set to READ_ONLY

If we set the group_replication_exit_state_action system variable to READ_ONLY, a member that leaves the group involuntarily will behave as in previous versions, meaning it will go into super_read_only mode and enter the ERROR state.
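
A quick way to confirm that a member has switched to read-only mode is to look at the corresponding system variables on that member. This is just a sketch; a value of 1 means the mode is enabled.

```sql
-- Check whether the server is currently refusing writes,
-- including writes from users with administrative privileges.
SELECT @@GLOBAL.super_read_only AS super_read_only,
       @@GLOBAL.read_only       AS read_only;
```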

group_replication_exit_state_action set to ABORT_SERVER

On the other hand, if we set it to ABORT_SERVER, the member, upon an involuntary leave, will log that occurrence and shut itself down (bear in mind that ABORT_SERVER is the default as of MySQL 8.0.12; later releases changed the default to READ_ONLY, so check the documentation for your version).

What do I get from this feature?

You can reap many benefits from this, the main ones being avoiding stale reads and enabling automatic fail-over or restart (by monitoring the process state or the connection to the server).
Imagine you set up a system or use a tool where you define a pool of servers that you can use. When one of those servers fails, you want to remove it from the pool. In previous versions, using Group Replication, you would have to query each member state and verify that it didn’t enter the ERROR state. Now you can simply set group_replication_exit_state_action to ABORT_SERVER and once you know that your connection to that server has been severed, then you know it has failed.
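
For comparison, the polling approach described above could look something like the following sketch, run periodically against each server in the pool. It looks up the local member's own row by matching MEMBER_ID against the server's UUID; how the pool reacts to an ERROR result is left to the tool.

```sql
-- Report the state of the server this query is run on.
-- A result of ERROR means the member has left the group involuntarily.
SELECT MEMBER_STATE
  FROM performance_schema.replication_group_members
 WHERE MEMBER_ID = @@GLOBAL.server_uuid;
```

With ABORT_SERVER, this check becomes unnecessary for detecting the failure, since the connection itself goes away.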
There are many other scenarios where this might be useful, but in short it helps high availability by promptly informing the user that the server has errored out in the context of Group Replication.

Summary

We introduced a new system variable called group_replication_exit_state_action that allows the user to specify the behaviour of a member that involuntarily leaves the group. The behaviour can be either to shut itself down (by setting it to ABORT_SERVER) or to go into super_read_only mode (by setting it to READ_ONLY). This helps the user avoid stale reads by knowing right away that a server has failed.