MySQL InnoDB Cluster – Controlling data consistency and expel timeouts

This blog post follows the series that we have been composing to detail every single new feature added in the latest MySQL InnoDB Cluster release. As listed in the release announcement of 8.0.14, we’re very happy with new features that greatly enhance the whole experience and capabilities of InnoDB cluster!

This post is focused on two very hot topics: “Defining the timeout to expel an unresponsive member of a cluster” and “Increasing data consistency by enabling the ‘read your writes’ fencing mechanism”.

Defining the timeout to expel an unresponsive member of a cluster

MySQL InnoDB cluster aims to give full control of the cluster’s settings whilst maintaining ease-of-use and flexibility. Group Replication has introduced important new features to handle particular scenarios and also to extend the management capabilities.

The underlying Group Replication plugin automatically manages the cluster membership, i.e. it acknowledges that servers have joined or left the cluster. A server can leave the cluster voluntarily, through a DBA’s action, in which case it informs the other members that it is leaving; or it can leave involuntarily. In the latter case, the other members first need to detect the event (through the failure detector) before they can expel the member.
Until this release, the period of time between the suspicion that a cluster member has failed or become unreachable and its eviction from the cluster was predefined and could not be changed.

Many scenarios can result in an undesired member eviction, but the main ones are:

Flaky network prone to false suspicions (such as WAN)
Maintenance tasks in an InnoDB cluster member

To improve the management of clusters in such scenarios, Group Replication has introduced an option to configure the failure detector window to allow for delays or suspension of previously active members of the cluster.

To support this feature, the AdminAPI command dba.createCluster() was extended with a new option, expelTimeout, which defines the period of time, in seconds, that cluster members should wait for a non-responding member before evicting it from the cluster.
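As a minimal sketch of how the option is passed, the snippet below builds the options object for dba.createCluster(). The helper name, the cluster name "testCluster" and the value 30 are hypothetical, and the 0 to 3600 second range is an assumption based on our reading of the 8.0 documentation, so please check the reference for your Shell version. The helper itself runs under Node.js; the AdminAPI call is left as a comment because the dba object only exists inside MySQL Shell.

```javascript
// Assumed upper bound for expelTimeout, in seconds (verify for your version).
const EXPEL_TIMEOUT_MAX = 3600;

// Hypothetical helper: validates the value and returns the options object.
function clusterOptionsWithExpelTimeout(seconds) {
  if (!Number.isInteger(seconds) || seconds < 0 || seconds > EXPEL_TIMEOUT_MAX) {
    throw new RangeError(
      "expelTimeout must be an integer between 0 and " + EXPEL_TIMEOUT_MAX);
  }
  return { expelTimeout: seconds };
}

// Illustrative value only: wait 30 seconds before evicting a silent member.
const options = clusterOptionsWithExpelTimeout(30);
console.log(JSON.stringify(options)); // {"expelTimeout":30}

// Inside MySQL Shell (JavaScript mode), connected to the seed instance:
//   var cluster = dba.createCluster("testCluster", options);
```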

This option is a general cluster setting, meaning all members of the cluster will have the same value. Hence, the option is only allowed when creating the cluster with dba.createCluster(), and any new member added to the cluster will automatically obtain and use the value used by the cluster.

Note: The option is cluster-wide because otherwise the timeout actually applied would be the one set on the “killer node”, i.e. the lowest value. This may not be what the user really wants, since the “killer node” role is determined automatically from the instance’s position in the group.
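To get a feel for what a given expelTimeout means in wall-clock terms, here is a small worked example. It assumes, based on the Group Replication documentation, that a silent member is first suspected after a detection period of about 5 seconds and that expelTimeout is counted from that point on. The function name is hypothetical and the figures are only an approximation.

```javascript
// Assumption: detection period before a suspicion is raised, in seconds.
const DETECTION_PERIOD_SECONDS = 5;

// Approximate time between a member going silent and its eviction.
function secondsUntilExpel(expelTimeout) {
  return DETECTION_PERIOD_SECONDS + expelTimeout;
}

for (const timeout of [0, 30, 300]) {
  console.log("expelTimeout=" + timeout + " -> about " +
              secondsUntilExpel(timeout) + " s until eviction");
}
```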

Increasing data consistency by enabling the “read your writes” fencing mechanism

As previously mentioned, a default cluster setup runs in single-primary mode, i.e. the cluster has a single primary server that accepts read and write queries (R/W). There is then an expectation that a client reading from the primary will always read its own writes. However, in the event of a primary failover, the newly elected primary does not reject or hold reads after being promoted. The main consequence of this behavior is that if a client wrote X to the previous primary and then reads X from the new primary, it may see stale data because the backlog is still being applied.

Since this is a general expectation, Group Replication now offers better control over this behavior through a new option that configures how a cluster behaves in such a situation. This option enables a fencing mechanism that prevents connections from reading from or writing to the new primary until it has applied any pending backlog of changes coming from the old primary.

The newly elected primary can either:

Allow reads even if the backlog isn’t fully applied.

Note: As a precaution, writes are still blocked, because super_read_only mode remains enabled while the backlog is being applied.

Block read and write queries until the backlog is fully applied.

Note: This ensures that clients always read the newest value which they have written, but also means that clients might have to wait until the backlog has been applied before they can read from the new primary.

In order to support this feature, the dba.createCluster() command was extended with a new option failoverConsistency that allows defining the consistency guarantees for primary failover in single-primary mode. The option has two possible values:

EVENTUAL (default): read queries allowed in new primary
BEFORE_ON_PRIMARY_FAILOVER: “read your writes”
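The difference between the two values can be illustrated with a toy model. This is not Group Replication code: the NewPrimary class and its methods are hypothetical, and the fence is modelled by draining the backlog synchronously, whereas a real server would hold the query until the applier has caught up.

```javascript
// Toy model: a new primary holding changes that were replicated from the
// old primary but have not been applied yet.
class NewPrimary {
  constructor(appliedData, backlog, failoverConsistency) {
    this.data = new Map(Object.entries(appliedData));
    this.backlog = backlog.slice(); // pending [key, value] pairs
    this.failoverConsistency = failoverConsistency;
  }

  // Apply one pending change, as the applier would do over time.
  applyNext() {
    const next = this.backlog.shift();
    if (next) this.data.set(next[0], next[1]);
  }

  read(key) {
    if (this.failoverConsistency === "BEFORE_ON_PRIMARY_FAILOVER") {
      // Fence: the read only proceeds once the backlog is fully applied.
      while (this.backlog.length > 0) this.applyNext();
    }
    return this.data.get(key);
  }
}

// The client wrote x = 2 on the old primary just before it failed.
const eventual = new NewPrimary({ x: 1 }, [["x", 2]], "EVENTUAL");
const fenced = new NewPrimary({ x: 1 }, [["x", 2]], "BEFORE_ON_PRIMARY_FAILOVER");

console.log(eventual.read("x")); // 1 (stale value)
console.log(fenced.read("x"));   // 2 (the client reads its own write)
```

With EVENTUAL the read returns immediately but may be stale; with BEFORE_ON_PRIMARY_FAILOVER the read is correct but pays for the time needed to apply the backlog.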

This option is a general cluster configuration, meaning all members of the cluster will have the same value. Hence, the option cannot be set per instance: it is defined when creating the cluster with dba.createCluster(), and any new member added to the cluster will automatically obtain and use the value used by the cluster. It can also be changed afterwards with <Cluster>.setOption(), as shown previously.
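A minimal sketch of both steps follows. The helper name and the cluster name "testCluster" are hypothetical. The helper runs under Node.js, while the AdminAPI calls are shown as comments because they need a MySQL Shell session connected to a real instance.

```javascript
// The two values described above.
const FAILOVER_CONSISTENCY_VALUES = ["EVENTUAL", "BEFORE_ON_PRIMARY_FAILOVER"];

// Hypothetical helper: validates the value and returns the options object.
function clusterOptionsWithFailoverConsistency(value) {
  if (!FAILOVER_CONSISTENCY_VALUES.includes(value)) {
    throw new Error("failoverConsistency must be one of: " +
                    FAILOVER_CONSISTENCY_VALUES.join(", "));
  }
  return { failoverConsistency: value };
}

const consistencyOptions =
  clusterOptionsWithFailoverConsistency("BEFORE_ON_PRIMARY_FAILOVER");
console.log(JSON.stringify(consistencyOptions));

// Inside MySQL Shell (JavaScript mode), connected to the seed instance:
//   var cluster = dba.createCluster("testCluster", consistencyOptions);
//
// Later, to change the value on the running cluster:
//   cluster.setOption("failoverConsistency", "EVENTUAL");
```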

Try it now and send us your feedback

MySQL Shell 8.0.14 GA is available for download from the following links.

MySQL Community Downloads website: https://dev.mysql.com/downloads/shell/
MySQL Shell is also available on GitHub: https://github.com/mysql/mysql-shell

The documentation of MySQL Shell can be found at https://dev.mysql.com/doc/mysql-shell/8.0/en/ and the official documentation of InnoDB cluster can be found in the MySQL InnoDB Cluster User Guide.

Enjoy, and thank you for using MySQL!