Announcing MySQL InnoDB Cluster Read Replicas

MySQL's first Innovation Release, 8.1.0, is out, and with it we're introducing MySQL InnoDB Cluster Read Replicas.

The main purpose of secondaries in a MySQL InnoDB Cluster is to be ready to take over when the primary member fails (High Availability); this is done using MySQL Group Replication. Another common use for the secondaries is to offload read workloads from the primary. With MySQL InnoDB Cluster Read Replicas, it's now possible to add asynchronous replicas to the topology: to offload read traffic from the primary or the secondaries, to have dedicated or special-purpose read replicas (e.g. for reporting), or to scale reads beyond what the secondaries can handle by adding multiple read replicas.

[Figure: MySQL InnoDB Cluster Read Replicas]

MySQL InnoDB Cluster handles the operation of these replicas: MySQL Router automatically redirects traffic to them, while MySQL Shell takes care of the configuration, the initial provisioning (using the InnoDB Clone plugin), the replication setup, and any desired topology changes. Read Replicas also work with MySQL InnoDB ClusterSet, where they can be added to both the primary and replica clusters.

MySQL Router is aware of those Read Replicas and will redirect read traffic to them (depending on some configuration options). The read replicas replicate from either the primary or one of the secondaries and automatically reconnect to another member if replication breaks or the membership changes (e.g. a new primary is elected).

Creating Read Replicas

Adding a Read Replica to a Cluster is very similar to adding a secondary instance; the AdminAPI's look and feel haven't changed, to ensure optimal usability.

So, assuming you have a running Cluster, you can add a Read Replica with the new command:

<Cluster>.addReplicaInstance(instance[, options])

mysqlsh-js> cluster.addReplicaInstance("rome4:3306")

Setting up 'rome4:3306' as a Read Replica of Cluster 'rome'.

Validating instance configuration at rome4:3306...

This instance reports its own address as rome4:3306

Instance configuration is suitable.
* Checking transaction state of the instance...

NOTE: The target instance 'rome4:3306' has not been pre-provisioned (GTID set is empty).
The Shell is unable to decide whether incremental state recovery can correctly provision it.
The safest and most convenient way to provision a new instance is through automatic clone
provisioning, which will completely overwrite the state of 'rome4:3306' with a physical
snapshot from an existing cluster member. To use this method by default, set the
'recoveryMethod' option to 'clone'.

The incremental state recovery may be safely used if you are sure all updates ever executed
in the cluster were done with GTIDs enabled, there are no purged transactions and the new
instance contains the same GTID set as the cluster or a subset of it. To use this method by
default, set the 'recoveryMethod' option to 'incremental'.


Please select a recovery method [C]lone/[I]ncremental recovery/[A]bort (default Clone): c
Monitoring Clone based state recovery of the new member. Press ^C to abort the operation.
Clone based state recovery is now in progress.

NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.

* Waiting for clone to finish...
NOTE: rome4:3306 is being cloned from rome1:3306
** Stage DROP DATA: Completed
** Clone Transfer  
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed

NOTE: rome4:3306 is shutting down...

* Waiting for server restart... ready 
* rome4:3306 has restarted, waiting for clone to finish...
** Stage RESTART: Completed
* Clone process has finished: 4.30 GB transferred in 4 sec (1.08 GB/s)

* Configuring Read-Replica managed replication channel...
** Changing replication source of rome4:3306 to rome1:3306

* Waiting for Read-Replica 'rome4:3306' to synchronize with Cluster...
** Transactions replicated  ############################################################  100% 

'rome4:3306' successfully added as a Read-Replica of Cluster 'rome'.

And with that ease, a new replica was added to the Cluster. As with regular Cluster members, both Clone and Incremental provisioning are available for Read Replicas.

As usual, the Cluster's status can be checked with <Cluster>.status():

mysqlsh-js> cluster.status()
{
    "clusterName": "rome", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "rome1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "rome1:3306": {
                "address": "rome1:3306", 
                "memberRole": "PRIMARY", 
                "mode": "R/W", 
                <strong>"readReplicas": {
                    "rome4:3306</strong><strong>": {
                        "address": "rome4:3306</strong><strong>", 
                        "role": "READ_REPLICA", 
                        "status": "ONLINE", 
                        "version": "8.1.0"
                    }
                }</strong>, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.1.0"
            }, 
            "rome2:3306": {
                "address": "rome2:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.1.0"
            }, 
            "rome3:3306": {
                "address": "rome3:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.1.0"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "rome1:3306"
}

You may have noticed that the new Read Replica, running on rome4:3306, is placed under the Cluster's primary, rome1:3306, which indicates that it is replicating from it.

Replicating from the primary has advantages, such as reducing potential replication lag; however, it may also increase the load on the primary. To allow flexibility and freedom of choice depending on the use case, we have made this configurable.

Under the Hood: Follow the source

As shown above, Read Replicas replicate from the primary member by default and, in the event of a failover or switchover, automatically re-establish the connection to the new primary of the Cluster. In other words, they always follow the primary and are kept synchronized with the Cluster.

This is achieved with MySQL replication's asynchronous replication connection failover feature. Each replica monitors the status of Group Replication and stores a list of potential source servers; in the event of a connection failure to the current source, a new one is selected from that list.

That list is populated either manually or automatically, with the latter supported for Group Replication topologies. In that case, the failover mechanism monitors changes to the group membership and adds or removes candidates from the list accordingly. On top of that, it's also capable of distinguishing between primary and secondary members. This works the same way replica clusters in MySQL InnoDB ClusterSet replicate from the primary cluster.
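
On the replica itself, that candidate list can be inspected in the Performance Schema (a minimal sketch; the exact channel name managed by the Shell is omitted here):

mysqlsh-js> \sql SELECT channel_name, host, port, weight FROM performance_schema.replication_asynchronous_connection_failover;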

With such a flexible framework in place, the AdminAPI can manage and configure Read Replicas according to each user's needs.

Under the Hood: Configuring Read Replicas

Choose your failover candidate(s)

We have seen how the automatic failover mechanism works, so we can now look at the possible configurations. In summary, there are three approaches (sketched right after this list):

  • Prefer to replicate from the PRIMARY
  • Prefer replicating from one of the SECONDARIES
  • Have a LIST of possible candidates
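
Each approach maps to the replicationSources option of the AdminAPI. As a minimal sketch (assuming a spare instance named rome6:3306 is available):

mysqlsh-js> cluster.addReplicaInstance("rome6:3306", {replicationSources: "primary"})    // follow the PRIMARY (default)
mysqlsh-js> cluster.addReplicaInstance("rome6:3306", {replicationSources: "secondary"})  // follow one of the SECONDARIES
mysqlsh-js> cluster.addReplicaInstance("rome6:3306", {replicationSources: ["rome2:3306", "rome3:3306"]})  // fixed list of candidates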

This preference can either be defined when adding a new Read Replica, as sketched above, or changed at any time for an existing one. In the diagram below, the replicas in Rome are configured to follow a secondary, while the replicas in Brussels replicate from the primary.

[Figure: MySQL InnoDB Cluster Read Replicas]

As an example, let's add a new Replica to the Cluster using a specific instance as source and with a predefined list of failover candidates:

mysqlsh-js> cluster.addReplicaInstance("rome5:3306", {replicationSources: ["rome2:3306", "rome3:3306"]})
Setting up 'rome5:3306' as a Read Replica of Cluster 'rome'.

Validating instance configuration at rome5:3306...

This instance reports its own address as rome5:3306

Instance configuration is suitable.
* Checking transaction state of the instance...
NOTE: A GTID set check of the MySQL instance at 'rome5:3306' determined that it is missing transactions that were purged from all cluster members.
NOTE: The target instance 'rome5:3306' has not been pre-provisioned (GTID set is empty). The Shell is unable to determine whether the instance has pre-existing data that would be overwritten with clone based recovery.
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of 'rome5:3306' with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.


Please select a recovery method [C]lone/[A]bort (default Clone): c
* Waiting for the donor to synchronize with PRIMARY...
** Transactions replicated  ############################################################  100% 


Monitoring Clone based state recovery of the new member. Press ^C to abort the operation.
Clone based state recovery is now in progress.

NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.

* Waiting for clone to finish...
NOTE: rome5:3306 is being cloned from rome2:3306
** Stage DROP DATA: Completed
** Clone Transfer  
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed

NOTE: rome5:3306 is shutting down...

* Waiting for server restart... ready 
* rome5:3306 has restarted, waiting for clone to finish...
** Stage RESTART: Completed
* Clone process has finished: 4.30 GB transferred in 6 sec (717.27 MB/s)

* Configuring Read-Replica managed replication channel...
** Changing replication source of rome5:3306 to rome2:3306

* Waiting for Read-Replica 'rome5:3306' to synchronize with Cluster...
** Transactions replicated  ############################################################  100% 

'rome5:3306' successfully added as a Read-Replica of Cluster 'rome'.

That completes the addition of a new Read Replica running on rome5:3306.

This Replica uses a fixed list of failover candidates, composed of rome2:3306 and rome3:3306. The first member of the list is the current active source; the remaining members are the potential failover candidates. The order dictates the weight: the first entry has the highest weight, the last one the lowest.
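
Those weights can be verified on rome5:3306 in the same Performance Schema table mentioned earlier (again, a sketch):

mysqlsh-js> \sql SELECT host, port, weight FROM performance_schema.replication_asynchronous_connection_failover ORDER BY weight DESC;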

Let's check the Cluster's description for a more visual understanding of the current topology:

 
mysqlsh-js> cluster.describe()
{
    "clusterName": "rome", 
    "defaultReplicaSet": {
        "name": "default", 
        "topology": [
            {
                "address": "rome1:3306", 
                "label": "rome1:3306", 
                "role": "HA"
            }, 
            {
                "address": "rome2:3306", 
                "label": "rome2:3306", 
                "role": "HA"
            }, 
            {
                "address": "rome3:3306", 
                "label": "rome3:3306", 
                "role": "HA"
            }, 
            {
                "address": "rome4:3306", 
                "label": "rome4:3306", 
                "replicationSources": [
                    "PRIMARY"
                ], 
                "role": "READ_REPLICA"
            }, 
            {
                "address": "rome5:3306", 
                "label": "rome5:3306", 
                "replicationSources": [
                    "rome2:3306", 
                    "rome3:3306"
                ], 
                "role": "READ_REPLICA"
            }
        ], 
        "topologyMode": "Single-Primary"
    }
}

Also, using the extended status, we can see more information about the current topology:

 
mysqlsh-js> cluster.status({extended:1})
{
    "clusterName": "rome", 
    "defaultReplicaSet": {
        "GRProtocolVersion": "8.0.27", 
        "communicationStack": "MYSQL", 
        "groupName": "33cfdab9-3469-11ee-9f3b-d08e7912e4ee", 
        "groupViewChangeUuid": "33cfe2b0-3469-11ee-9f3b-d08e7912e4ee", 
        "groupViewId": "16913336945761559:7", 
        "name": "default", 
        "paxosSingleLeader": "OFF", 
        "primary": "rome1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "rome1:3306": {
                "address": "rome1:3306", 
                "applierWorkerThreads": 4, 
                "fenceSysVars": [], 
                "memberId": "e304af5d-3466-11ee-8d97-d08e7912e4ee", 
                "memberRole": "PRIMARY", 
                "memberState": "ONLINE", 
                "mode": "R/W", 
                <strong>"readReplicas": {
                    "rome4:3306</strong><strong>": {
                        "address": "rome4:3306</strong><strong>", 
                        "applierStatus": "APPLIED_ALL", 
                        "applierThreadState": "Waiting for an event from Coordinator", 
                        "applierWorkerThreads": 4, 
                        "receiverStatus": "ON", 
                        "receiverThreadState": "Waiting for source to send event", 
                        "replicationLag": null, 
                        "replicationSources": [
                            "PRIMARY"
                        ], 
                        "replicationSsl": "TLS_AES_256_GCM_SHA384 TLSv1.3", 
                        "role": "READ_REPLICA", 
                        "status": "ONLINE", 
                        "version": "8.1.0"
                    }
                }</strong>, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.1.0"
            }, 
            "rome2:3306": {
                "address": "rome2:3306", 
                "applierWorkerThreads": 4, 
                "fenceSysVars": [
                    "read_only", 
                    "super_read_only"
                ], 
                "memberId": "e6eb91c6-3466-11ee-aca6-d08e7912e4ee", 
                "memberRole": "SECONDARY", 
                "memberState": "ONLINE", 
                "mode": "R/O", 
                <strong>"readReplicas": {
                    "rome5:3306": {
                        "address": "rome5:3306", 
                        "applierStatus": "APPLIED_ALL", 
                        "applierThreadState": "Waiting for an event from Coordinator", 
                        "applierWorkerThreads": 4, 
                        "receiverStatus": "ON", 
                        "receiverThreadState": "Waiting for source to send event", 
                        "replicationLag": null, 
                        "replicationSources": [
                            "rome2:3306", 
                            "rome3:3306"
                        ], 
                        "role": "READ_REPLICA", 
                        "status": "ONLINE", 
                        "version": "8.1.0"
                    }
                }, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.1.0"
            },</strong> 
            "rome3:3306": {
                "address": "rome3:3306", 
                "applierWorkerThreads": 4, 
                "fenceSysVars": [
                    "read_only", 
                    "super_read_only"
                ], 
                "memberId": "ea08833f-3466-11ee-b87c-d08e7912e4ee", 
                "memberRole": "SECONDARY", 
                "memberState": "ONLINE", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": "applier_queue_applied", 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.1.0"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "rome1:3306", 
    "metadataVersion": "2.2.0"
}

Change your failover candidate(s)

Changing the source configuration is as easy as changing an instance option: Cluster.setInstanceOption() was extended with a new option, replicationSources, to allow exactly that.

Let's instruct our first Read Replica to prefer replicating from a Secondary instance rather than the Primary.

mysqlsh-js> cluster.setInstanceOption("rome4:3306", "replicationSources", "secondary")
Setting the value of 'replicationSources' to 'secondary' in the instance: 'rome4:3306' ...

WARNING: To update the replication channel with the changes the Read-Replica must be reconfigured using Cluster.rejoinInstance().
Successfully set the value of 'replicationSources' to 'secondary' in the cluster member: 'rome4:3306'.

And, for the new settings to take immediate effect, we need to force a rejoin of the instance into the Cluster using a well-known AdminAPI command, Cluster.rejoinInstance():

mysqlsh-js> cluster.rejoinInstance("rome4:3306")
Rejoining Read-Replica 'rome4:3306' to Cluster 'rome'...

* Checking transaction state of the instance...
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of 'rome4:3306' with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.

The incremental state recovery may be safely used if you are sure all updates ever executed in the cluster were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the cluster or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.

Incremental state recovery was selected because it seems to be safely usable.

NOTE: User 'mysql_innodb_replica_2506922964'@'%' already existed at instance 'rome1:3306'. It will be deleted and created again with a new password.
** Changing replication source of rome4:3306 to rome2:3306

* Waiting for Read-Replica 'rome4:3306' to synchronize with Cluster...
** Transactions replicated  ############################################################  100% 

Read-Replica 'rome4:3306' successfully rejoined to the Cluster 'rome'.

NOTE: Read Replicas can be removed from their Cluster like any regular instance, i.e. using Cluster.removeInstance().
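
For example, removing the Read Replica running on rome5:3306 would be as simple as (a sketch):

mysqlsh-js> cluster.removeInstance("rome5:3306")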

Routing Reads To The Replica

MySQL Router plays a crucial role in InnoDB Cluster and, as such, now has full awareness of Read Replicas.

Sitting between your application and the Cluster, redirecting client traffic to the right destinations, Router is able to use Read Replicas for read-only requests. But Secondary members can serve those too, so how does it choose?

Configuring Router's destination pool

By default, Router's behavior remains unchanged, i.e. read traffic is routed to the Cluster's secondary members. However, that is now configurable.

The "read_only_targets" mode, accepts the possible following behaviors:

  • secondaries: Only SECONDARY members of the target Cluster should be used for R/O traffic (default).
  • read_replicas: Only Read Replicas of the target Cluster should be used for R/O traffic.
  • all: All Read Replicas of the target Cluster should be used along with the other SECONDARY Cluster members for R/O traffic.

The mode is configurable with another well-known command, .setRoutingOption(), which can be used to configure it at the Router, Cluster, or ClusterSet level.

For example:

mysqlsh-js> cluster.setRoutingOption("read_only_targets", "all")
Routing option 'read_only_targets' successfully updated.
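
The same command also accepts a Router name as its first argument, scoping the option to a single Router instead of the whole Cluster (a sketch, using the 'domus::' Router visible in the output below):

mysqlsh-js> cluster.setRoutingOption("domus::", "read_only_targets", "read_replicas")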

 

And the current configuration in place can be checked with .routingOptions():

 
mysqlsh-js> cluster.routingOptions()
{
    "clusterName": "rome", 
    "global": {
        "read_only_targets": "all", 
        "stats_updates_frequency": null, 
        "tags": {}
    }, 
    "routers": {
        "domus::": {}
    }
}

The following diagram shows a more complex topology, with a ClusterSet composed of a Primary Cluster (Rome) and a Replica Cluster (Brussels) with multiple Routers deployed.

[Figure: MySQL InnoDB Cluster Read Replicas]

In detail, the Primary Cluster is a 3-member Cluster with 3 Read Replicas attached to it. The Read Replicas are configured with 'replicationSources: "secondary"', so they replicate from the Secondary members of the Cluster.

The 2 Routers deployed in that Data Center use the Primary Cluster as target_cluster and are configured to use all read-only targets, i.e. 'read_only_targets: all'.

The Replica Cluster, in the other Data Center, is also a 3-member Cluster and has 3 Read Replicas, configured to replicate from its Primary member. The Routers there have different configurations: the first is configured to use the Primary Cluster as target_cluster and only Read Replicas as read-only targets, i.e. 'read_only_targets: read_replicas'; the other is configured to use Brussels as the target_cluster and only Secondary members as read-only targets, i.e. 'read_only_targets: secondaries'.

Health Checking and Quarantine

MySQL Router is a stateless service, so it depends on the InnoDB Cluster membership information to perform accurate routing. Group Replication, being based on a Paxos implementation, provides a group membership service that defines which servers are online and participating in the group. By leveraging that information, Router avoids having to connect to each member to check its state.

However, when Read Replicas are part of the Cluster, that information is not available in the group's membership info, so Router cannot rely on that mechanism for them. Moreover, the group membership information might not be accurate from Router's standpoint, since it conveys how the Cluster members perceive availability among themselves, which may differ from Router's own perspective.

To tackle those challenges, Router implements a built-in quarantine mechanism.

In short, when Router attempts a connection to a destination endpoint as a result of a new user connection and that attempt fails, it puts the destination in quarantine. The destination won't stay in quarantine indefinitely: once a timeout is hit, Router performs a health check on that server to determine whether it can be removed from quarantine.

The quarantine mechanism is configurable, on two fronts:

  • The quarantine threshold: how many failed connection attempts are accepted before a server is put in quarantine.
  • The health-check interval: how frequently quarantined servers should be checked to determine whether they have recovered.

By default, both settings have a value of 1, meaning that a single failed connection results in the target instance being put in quarantine, and a health check is performed every second to assess whether quarantined instances can be removed from the quarantine pool.
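
These knobs are exposed as MySQL Router configuration options. As a hedged sketch (to our knowledge these correspond to the error_quarantine_threshold and error_quarantine_interval options of the [destination_status] section in mysqlrouter.conf; the values below are illustrative):

[destination_status]
# quarantine a destination after 3 consecutive failed connection attempts
error_quarantine_threshold=3
# health-check quarantined destinations every 5 seconds
error_quarantine_interval=5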

Replication Lag

Since Read Replicas use asynchronous replication, replication lag is a factor to take into consideration. If an instance is lagging behind, the DBA may want to hide it from applications until it catches up and the lag is reduced.

In addition, the DBA may want to hide a Read Replica from traffic to:

  • Perform rolling upgrades on it without disrupting incoming traffic
  • Perform maintenance operations or configuration changes on it without having to stop MySQL
  • Exclude it from the read workload while backups are taken or reports are generated, to avoid affecting other queries
  • Exclude it from any read traffic because it's intended to be a backup server

As with regular Cluster members, this can be achieved by instructing Router not to use the instance, by marking it with a specific tag, which is covered in the next section.

Hiding Replicas

To hide instances from any Router traffic, the built-in '_hidden' tag can be used; it is set very easily with the .setInstanceOption() command:

mysqlsh-js> cluster.setInstanceOption("rome5:3306", "tag:_hidden", true)

Note that this functionality is not limited to replicas; secondaries can also be marked as hidden.
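
To bring the instance back into Router's destination pool, set the tag back to false:

mysqlsh-js> cluster.setInstanceOption("rome5:3306", "tag:_hidden", false)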

Summary

Read Replicas allow scaling and improving the performance of read-intensive workloads, while offloading other Cluster members and additionally providing even more redundancy for the dataset.

More Information

For more information, please consult the documentation on the topic.

As always, we are eager to listen to the community feedback. You can reach us at #shell and #mysql_innodb_cluster in Slack: https://mysqlcommunity.slack.com/

If you find a problem or a missing feature, please reach out to us via a bug report, or a support ticket.

Resources:

  • Download our MySQL Community Shell & Server packages at https://dev.mysql.com/downloads/.
  • MySQL Enterprise binaries are available on My Oracle Support: https://support.oracle.com (Choose the “Patches & Updates” tab, and then choose the “Product or Family (Advanced Search)” side tab in the “Patch Search” portlet.)
  • or use Oracle Software Delivery Cloud on http://edelivery.oracle.com/ to download MySQL Enterprise binaries.

Enjoy, and Thank you for using MySQL!