MySQL Group Replication: A Quick Start Guide

With the new Group Replication Beta release—0.8 for MySQL 5.7.14—available for download on labs.mysql.com, I expect more and more people to begin engaging in various forms of proof-of-concept attempts. So I wanted to create a “quick start guide” (to replace this outdated getting started post) focused on an example/test Linux environment. It covers how to get started, shows what a full working test setup can look like, and includes some best practice and operations advice along the way.

Note: While I use Linux in the example walkthrough, Group Replication will be supported on all MySQL 5.7 supported platforms. For the 0.8 Beta release we’ve provided a common subset of packages for testing (all 64 bit): Ubuntu 16.04, EL6, EL7, FreeBSD 10, Solaris 11 (SPARC and x86), and Windows.

What is Group Replication?

Group Replication is a plugin for the standard MySQL 5.7 Server. This plugin provides virtually synchronous replication, with built-in conflict detection/handling and consistency guarantees, all of which support multi-master, write-anywhere usage. It allows you to move from a stand-alone instance of MySQL, which is a single point of failure, to a natively distributed, highly available MySQL service (the Group Replication set) made up of N MySQL instances (the group members). Individual machines and/or MySQL instances can then fail or be taken offline for maintenance while the distributed MySQL service continues to operate and handle application traffic.

Note: Group Replication is all about providing highly available replica sets; data and writes are duplicated on each member in the group. For scaling beyond what a single system can bear, you will need an orchestration and sharding framework built around N Group Replication sets, where each replica set maintains and manages a given shard or partition of your total dataset. This type of setup—often called a “sharded cluster” today—will allow you to scale reads and writes linearly and without limit. Stay tuned for future announcements on related work!

Why Would I Use Group Replication?

If MySQL downtime would have a serious impact on your business and its operations, then you should use Group Replication. In practice, this means that most user-facing production systems should leverage Group Replication as it provides a native means for making a MySQL database highly available.

Where Can I Get It?

Given the early development stages, it’s always important to start with the latest available packages. Before GA, they will be available on labs.mysql.com, and after GA they will be available as part of the standard MySQL 5.7 Server releases (just as, for example, the Semi-Sync replication plugin is today).

Group Replication has also been marked as a “rapid” server plugin (see the informal rapid plugin concept introduced here), so after the initial GA release you can expect new releases of the plugin—containing bug fixes and new features—in later MySQL 5.7 Enterprise and Community builds. The rate of bug fixes and improvements is quite high with every passing month—a testament to the QA and development teams—so you should always start with the latest release and plan to upgrade with some regularity. The ability to do rolling upgrades also helps us significantly here.

How Would I Set It Up?

First, install the MySQL 5.7 packages that include the Group Replication plugin (the group_replication.so file, located in the @@global.plugin_dir directory) on each machine that you want to participate in Group Replication. You want these machines to be on different physical hardware if possible (hardware fails, and you need redundancy there as well), with a reliable, low latency network connection between all of them, as they need to coordinate, communicate, form consensus, and of course replicate data between them.

You also want the group to consist of at least 3 members, because writes require acknowledgement from a majority of the group. With 3 members, one can fail and the other two can continue on without needing agreement from the failed member, since 2 out of 3 is still a majority. You also generally want an odd number of members: with an even split across a network partition, neither side has a majority, and each half cannot tell whether the other half has failed or is simply unreachable, so both sides block writes in order to preserve consistency. Getting out of that situation requires manual intervention, forcing a reconfiguration of the group membership using the group_replication_force_members option.

Note: From here on out, I’ll use a 3 member group as the example setup—consisting of hosts hanode2/192.168.1.92, hanode3/192.168.1.93, and hanode4/192.168.1.94—where each node is running Oracle Linux 7. A 3 node group is a good starting point, and you can always go from 3 to 5 (or more) later.

Once you have the packages installed on all of the machines that you want to participate, we can move on to configuring each MySQL instance for Group Replication so that together they form a single highly available service. Below is the basic MySQL configuration that I use on all 3 members in my lab setup, this one being specifically from hanode2/192.168.1.92 (a few settings vary per member, e.g. server-id, group_replication_local_address, group_replication_group_seeds). It should give you a good place to start; modify the parameters to fit your setup (filesystem paths, IP addresses, etc.):
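
The important pieces look something like the following sketch (the group name UUID, file paths, and IP/port values here are illustrative and must be adapted to your environment; group_replication_group_name simply has to be a valid UUID that is identical on every member):

    [mysqld]
    # --- General ---
    datadir   = /var/lib/mysql
    socket    = /var/lib/mysql/mysql.sock
    server-id = 2                              # unique on each member

    # --- Replication/GTID prerequisites for Group Replication ---
    gtid_mode                        = ON
    enforce_gtid_consistency         = ON
    log_bin                          = binlog
    log_slave_updates                = ON
    binlog_format                    = ROW
    binlog_checksum                  = NONE
    master_info_repository           = TABLE
    relay_log_info_repository        = TABLE
    transaction_write_set_extraction = XXHASH64

    # --- Group Replication ---
    plugin-load                       = group_replication.so    # or INSTALL PLUGIN at runtime
    group_replication_group_name      = 550fa9ee-a1f8-4b6d-9bfe-c03c12cd1c72
    group_replication_start_on_boot   = ON
    group_replication_local_address   = "192.168.1.92:6606"     # this member's GR communication endpoint
    group_replication_group_seeds     = "192.168.1.92:6606,192.168.1.93:6606,192.168.1.94:6606"
    group_replication_bootstrap_group = OFF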

Aside from the static config file, there’s also one runtime configuration step needed. We need to specify valid MySQL account credentials that this node can use when requesting GTIDs from existing members of the group—any available seed or donor from those listed in group_replication_group_seeds—whenever it needs to perform an automated recovery (e.g. when taking the node offline for maintenance and then having it re-join the group):
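
Something along these lines does the trick (a sketch: rpl_user/rpl_pass are the example credentials used later in this post; create the account however fits your environment, it just needs the REPLICATION SLAVE privilege):

    -- Create the recovery account locally on each node (binary logging is
    -- skipped so these statements are not replicated while we're setting up).
    SET SQL_LOG_BIN=0;
    CREATE USER 'rpl_user'@'%' IDENTIFIED BY 'rpl_pass';
    GRANT REPLICATION SLAVE ON *.* TO 'rpl_user'@'%';
    SET SQL_LOG_BIN=1;

    -- Tell the group_replication_recovery channel which credentials to use:
    CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass'
      FOR CHANNEL 'group_replication_recovery';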

Note: If SELinux is enabled—you can check with sestatus -v—then you’ll need to enable the use of the Group Replication communication port (6606 in my example config) by mysqld so that it can bind to it and listen there. You can see which ports MySQL is currently allowed to use with semanage port -l | grep mysqld, and you can add the necessary port (using 6606 again from my example config) with semanage port -a -t mysqld_port_t -p tcp 6606.

Note: If iptables is enabled, then you’ll also need to open up the same port for communication between the machines. You can see the current firewall rules in place on each machine with iptables -L, and you can allow communication over the necessary port (again using port 6606 from my example config) with iptables -A INPUT -p tcp --dport 6606 -j ACCEPT.

How Can I Start or Bootstrap the Group?

A Group Replication set is meant to be alive 24/7/365, so the initial bootstrapping of the group is a special case. We need to pick one member and declare that it is the bootstrap node by setting group_replication_bootstrap_group=ON. This means that:

  1. It will not try to participate in any group communication when starting, but will instead configure the group as consisting only of itself.
  2. Any subsequent member that attempts to join the group will sync itself up with the state of this instance.
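
In practice, bootstrapping the chosen first member looks roughly like this (a sketch, assuming the example configuration shown earlier; the same can also be done via the config file rather than SET GLOBAL):

    -- On the first member only. STOP first, in case start_on_boot already
    -- tried (and failed) to join a group that does not exist yet.
    STOP GROUP_REPLICATION;
    SET GLOBAL group_replication_bootstrap_group = ON;
    START GROUP_REPLICATION;
    SET GLOBAL group_replication_bootstrap_group = OFF;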

Just remember to turn group_replication_bootstrap_group=OFF again after the first member is up. You can verify that it’s bootstrapped the group by looking to see that it lists itself as the only ONLINE member:
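
For example (illustrative output from the hanode2 bootstrap member in my setup):

    mysql> SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
        ->   FROM performance_schema.replication_group_members;
    +-------------+-------------+--------------+
    | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +-------------+-------------+--------------+
    | hanode2     |        3306 | ONLINE       |
    +-------------+-------------+--------------+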

How Can I Add Additional Members?

Now that we’ve bootstrapped the group (we have a group made up of only one member), we can add additional members (2 more in our 3 node test setup). In order to add a new member to an existing group, you need to take the following steps:

  1. Take a backup from one of the current members of the group using your preferred MySQL backup tool, e.g. mysqldump or mysqlbackup, and restore that backup onto the node that you want to add to the group, thus applying a snapshot of the state (tables, rows, and other SQL objects, along with the GTID metadata) from that current member of the group. This part is no different from setting up a slave in a standard async MySQL master/slave replication setup (see the sketch after this list).
  2. Set up the configuration file so that this new node can participate in group replication generally (see the two group_replication specific sections in the example config above), and become a member of this group specifically (group_replication_group_name).
  3. Specify valid MySQL credentials that this node will use when requesting GTIDs from existing members of the group (a seed or donor) necessary to perform an automated recovery (such as when joining the group, which we’ll do next): CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' FOR CHANNEL 'group_replication_recovery';
  4. Have the new node join to become a member with: STOP GROUP_REPLICATION; START GROUP_REPLICATION; (STOP is necessary because we have start_on_boot enabled).
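
For step 1, a mysqldump based sketch might look like the following (hostnames are from my example setup; the root account and dump options are illustrative):

    # On an existing member (hanode2), take a consistent snapshot; with GTIDs
    # enabled, mysqldump records the purged GTID set in the dump file.
    mysqldump -h 192.168.1.92 -u root -p --single-transaction \
      --triggers --routines --events --all-databases > group_seed.sql

    # On the joining node (hanode4), clear any local GTID history and restore.
    mysql -h 192.168.1.94 -u root -p -e "RESET MASTER"
    mysql -h 192.168.1.94 -u root -p < group_seed.sql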

After the node has joined the group and become a member, it enters the RECOVERING state, where it automatically syncs up with the rest of the group—using the MySQL account credentials we specified in step 3—by requesting and applying all GTIDs which the group has executed but the joining node has not (i.e. any transactions the group has executed since the snapshot taken via the backup in step 1). Once the node has reached a synchronization point with the group (also referred to as a view), its state goes from RECOVERING to ONLINE, and it is then a fully functioning participant in the virtually synchronous replication group.

Note: You should also now execute step 3 (the CHANGE MASTER TO) on the node we used to bootstrap the group, if you haven’t previously done so. Then that node can later perform any automated recovery work as needed. If you’re unsure, you can check the current credentials in place with: select user_name, user_password from mysql.slave_master_info where channel_name = 'group_replication_recovery';

How Can I Monitor the Group and Member Status?

For current status, query the Performance Schema replication tables on any member: primarily performance_schema.replication_group_members (membership and per-member state) and performance_schema.replication_group_member_stats (certification and queue statistics). For past status, and to piece together a timeline and process flow across the group, use the timestamped MySQL error log messages on each member; all Group Replication messages will be prepended with the Plugin group_replication tag.
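
For example, a quick membership check from any member (the output shown is illustrative for my 3 node setup):

    mysql> SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
        ->   FROM performance_schema.replication_group_members;
    +-------------+-------------+--------------+
    | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
    +-------------+-------------+--------------+
    | hanode2     |        3306 | ONLINE       |
    | hanode3     |        3306 | ONLINE       |
    | hanode4     |        3306 | ONLINE       |
    +-------------+-------------+--------------+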

How Can I Debug Failures?

Your tools here are the same ones noted above for monitoring. When something goes wrong, the MySQL error logs on each node are your primary resource for figuring out what happened and why. We have pretty good info/debug style logging today that covers the entire code path: MySQL_Server->Group_Replication_plugin->MySQL_GCS_library (see upcoming blog posts that cover the internal MySQL Group Communication System component in more detail). If something fails or doesn’t work as expected, always look to the error log for answers.

What About Application Traffic Routing and Failover?

While we work on a full end-to-end solution around Group Replication—covering routing, sharding, automation/orchestration, administration, and more—we can use HAProxy for client traffic routing in our test setup. Its health checks are easily customizable, so we can specify routing rules specific to Group Replication: we only want connections routed to ONLINE members of the group that are in the primary partition and not in read-only mode. For this, I’ve installed HAProxy 1.5 on one of my primary physical lab machines, a 4th machine in this test setup (mylab/192.168.1.10) running Oracle Linux 6, whose yum repo provides HAProxy 1.5.4.

Until we’re able to add some similar things to the official SYS schema, let’s first go ahead and add the following helpful functions and views on any one member of the group so that they get replicated everywhere (we’ll leverage these in the HAProxy setup next):
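
The definitions below are a simplified sketch of that idea (the helper function name gr_member_in_primary_partition and the use of the sys schema are my own choices here; the gr_member_routing_candidate_status view is what the routing check script below relies on):

    -- Helper: returns 'YES' when this member is ONLINE and can still see a
    -- majority of the group, i.e. it is part of the primary partition.
    DELIMITER $$
    CREATE FUNCTION sys.gr_member_in_primary_partition()
      RETURNS VARCHAR(3)
      READS SQL DATA
    BEGIN
      RETURN (
        SELECT IF(
          MEMBER_STATE = 'ONLINE'
          AND (SELECT COUNT(*)
                 FROM performance_schema.replication_group_members
                WHERE MEMBER_STATE IN ('ONLINE', 'RECOVERING'))
              > (SELECT COUNT(*)
                   FROM performance_schema.replication_group_members) / 2,
          'YES', 'NO')
          FROM performance_schema.replication_group_members
         WHERE MEMBER_ID = @@global.server_uuid);
    END$$
    DELIMITER ;

    -- Routing candidate status for this member: is it in the primary
    -- partition, and is it writable (not in read_only/super_read_only mode)?
    CREATE VIEW sys.gr_member_routing_candidate_status AS
      SELECT
        sys.gr_member_in_primary_partition() AS viable_candidate,
        IF((SELECT GROUP_CONCAT(VARIABLE_VALUE ORDER BY VARIABLE_NAME)
              FROM performance_schema.global_variables
             WHERE VARIABLE_NAME IN ('read_only', 'super_read_only')) = 'OFF,OFF',
           'NO', 'YES') AS read_only;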

We can then use the gr_member_routing_candidate_status view in a shell script—mysql_gr_routing_check.sh—which we’ll then place on all 3 Group Replication member machines (modify the username and password to match valid accounts on your 3 Group Replication members):
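
Here’s a sketch of what that script can look like (the gr_router account and password are placeholders; any account with SELECT rights on sys.gr_member_routing_candidate_status will do):

    #!/bin/bash
    # mysql_gr_routing_check.sh
    # Called by xinetd on behalf of HAProxy's httpchk: prints an HTTP 200
    # response when this member is a viable routing candidate, 503 otherwise.

    MYSQL_USER="gr_router"        # placeholder: change me
    MYSQL_PASS="gr_router_pass"   # placeholder: change me
    MYSQL_HOST="127.0.0.1"
    MYSQL_PORT="3306"

    # Expect "YES<TAB>NO": viable candidate and not read-only.
    status=$(mysql -N -s -h "$MYSQL_HOST" -P "$MYSQL_PORT" \
               -u "$MYSQL_USER" -p"$MYSQL_PASS" \
               -e "SELECT viable_candidate, read_only FROM sys.gr_member_routing_candidate_status" \
               2>/dev/null)

    if [ "$status" = $'YES\tNO' ]; then
        /bin/echo -en "HTTP/1.1 200 OK\r\n"
        /bin/echo -en "Content-Type: text/plain\r\n\r\n"
        /bin/echo -en "MySQL Group Replication member is a routing candidate.\r\n"
    else
        /bin/echo -en "HTTP/1.1 503 Service Unavailable\r\n"
        /bin/echo -en "Content-Type: text/plain\r\n\r\n"
        /bin/echo -en "MySQL Group Replication member is not a routing candidate.\r\n"
    fi

Remember to make the script executable by the mysql user (e.g. chmod 750) on each member.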

I then copied mysql_gr_routing_check.sh to /var/lib/mysql-files on all three member machines; that directory is owned by mysql:mysql with 750 permissions (drwxr-x---. 2 mysql mysql 60 Aug  2 13:33 /var/lib/mysql-files), so we have at least some basic security in place. We can then leverage xinetd to let HAProxy call that shell script via its httpchk health check over port 6446, by first creating a service file called /etc/xinetd.d/mysql_gr_routing_check (modify the port used and the IP CIDR range allowed as needed) on all three members:
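
A sketch of that service file (the user and only_from values are assumptions to adjust for your packaging and network):

    # /etc/xinetd.d/mysql_gr_routing_check
    service mysql_gr_routing_check
    {
        flags           = REUSE
        socket_type     = stream
        port            = 6446
        wait            = no
        user            = mysql
        server          = /var/lib/mysql-files/mysql_gr_routing_check.sh
        disable         = no
        only_from       = 192.168.1.0/24
        log_on_failure  += USERID
        per_source      = UNLIMITED
    }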

Note: If you don’t have xinetd installed, you can install it on many RPM based distros with yum install xinetd. You would then also need to ensure that the xinetd service is enabled using chkconfig or systemctl, etc.

Note: You will also need to specify the service/port combination used in /etc/services in order for xinetd to successfully load the service. So for my example here, make sure that you have this line in there (replacing any existing MySQL Proxy line for the same port):
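
In other words, an entry along these lines (the service name must match the file name used under /etc/xinetd.d):

    mysql_gr_routing_check  6446/tcp    # MySQL Group Replication routing check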

Once the new xinetd service config file is in place, let’s (re)start the service with service xinetd restart or systemctl restart xinetd (etc.) to be sure that the new mysql_gr_routing_check service is loaded and active. You can check syslog to verify that the service was loaded OK and there were no errors.

Let’s now verify that we can get the routing candidate’s status from port 6446 on any of the three members’ machines with:
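
For example, with curl (an HTTP 200 response means the member is currently a viable routing candidate; a 503 means it is not):

    curl -i http://192.168.1.92:6446
    curl -i http://192.168.1.93:6446
    curl -i http://192.168.1.94:6446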

We can then modify HAProxy’s /etc/haproxy/haproxy.cfg config file on the machine we want to use for MySQL traffic routing—the application’s single-point-of-entry (mylab/192.168.1.10 in my test setup) for our distributed and highly available MySQL service—and utilize the new HTTP service available on port 6446 on each of the Group Replication member candidates by adding the following at the end of the file:
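
A sketch of that addition (the listen/bind choices and check timings are illustrative; the key pieces are option httpchk and check port 6446 against each member):

    # Route MySQL traffic to Group Replication members that pass the
    # mysql_gr_routing_check HTTP check on port 6446.
    listen mysql-gr-rw
        bind 192.168.1.10:3306
        mode tcp
        option httpchk
        balance leastconn
        server hanode2 192.168.1.92:3306 check port 6446 inter 1000 rise 1 fall 2
        server hanode3 192.168.1.93:3306 check port 6446 inter 1000 rise 1 fall 2
        server hanode4 192.168.1.94:3306 check port 6446 inter 1000 rise 1 fall 2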

Note: If you would instead prefer to do active/passive or “first available” style routing, where all connections get routed to one node and the others serve only as backups, then you can declare the server list this way:
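
That is, mark all but the preferred member as backup servers, for example:

    server hanode2 192.168.1.92:3306 check port 6446 inter 1000 rise 1 fall 2
    server hanode3 192.168.1.93:3306 check port 6446 inter 1000 rise 1 fall 2 backup
    server hanode4 192.168.1.94:3306 check port 6446 inter 1000 rise 1 fall 2 backup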

We’ll then need to restart the haproxy service with service haproxy restart or systemctl restart haproxy (etc.) so that the new config file is loaded and active. We can then use the mysql command-line client to verify that MySQL traffic is getting routed to all 3 member nodes now as we expect:
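
For example (app_user is a placeholder for whatever application account exists on the members):

    # With all three members healthy, repeated connections through HAProxy
    # should show different member hostnames answering.
    for i in 1 2 3; do
        mysql -h 192.168.1.10 -P 3306 -u app_user -p \
              -e "SELECT @@hostname, @@port"
    done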

What’s Next?

As you can see, getting a working MySQL service consisting of 3 Group Replication members is not yet an easy “point and click” or single-command orchestrated operation. The entire process is ripe for orchestration tooling and a new MySQL Router plugin. Our goal is to have an integrated, easy-to-use, end-to-end solution, with Group Replication as its foundation. So please stay tuned as we complete the bigger native MySQL HA picture built around Group Replication!

Conclusion

I hope that you’re able to try out the latest Group Replication Beta, and if you are, that this quick start guide makes the process easier for you. If you have any questions or feedback, please feel free to post a comment here or reach out to me directly. I’d love to get your thoughts on how we can make it better as we approach GA, and beyond.

If you encounter any issues with the release, I would encourage you to file a bug report or open a support ticket. Your feedback is really critical to me and the rest of the team as we work on Group Replication!

As always, THANK YOU for using MySQL!