The multi-master plugin for MySQL is here: “MySQL Group Replication”. It is a virtually synchronous replication solution for MySQL with conflict detection. It also supports automatic group membership management, failure detection and automatic distributed recovery.
The introduction of this new feature called for a substantial amount of testing, as it involves complex functionality such as:
- Servers execute local transactions and broadcast the updates to the group.
- All servers in the group, including the sender, receive the same transactions in the same order and check them for conflicts.
- All servers independently decide to commit a transaction when it has no conflicts.
- A new member can join an existing group, in which case distributed recovery is needed to bring it on par with the other servers.
A great deal of effort has therefore gone into testing this functionality and making MySQL Group Replication stable.
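To make the ordering and certification steps listed above more concrete, here is a minimal Python sketch of a deterministic, write-set based certifier. It is a simplified model, not the plugin's implementation: the `Transaction` record, its `snapshot` counter and its `write_set` of row keys are assumptions made for illustration. What it shows is that members never need to vote: since each one applies the same rule to the same totally ordered stream, they all reach the same commit or rollback decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction record used only by this model."""
    origin: str              # member that executed the transaction locally
    snapshot: int            # certifier version the origin had seen when it executed
    write_set: frozenset     # keys of the rows it modified, e.g. "t1:pk=1"


class Certifier:
    """Simplified certification: a transaction fails if a row it wrote was
    written by another transaction certified after its snapshot."""

    def __init__(self):
        self.version = 0         # number of transactions certified so far
        self.last_write = {}     # row key -> version of the last certified write

    def certify(self, trx):
        if any(self.last_write.get(key, 0) > trx.snapshot for key in trx.write_set):
            return False         # conflict: roll back on every member
        self.version += 1
        for key in trx.write_set:
            self.last_write[key] = self.version
        return True              # no conflict: commit on every member


def decide(ordered_stream):
    """What a single member decides, given the group's total order."""
    certifier = Certifier()
    return [certifier.certify(trx) for trx in ordered_stream]


# s1 and s2 updated the same row concurrently, both from snapshot 0.
stream = [
    Transaction("s1", 0, frozenset({"t1:pk=1"})),
    Transaction("s2", 0, frozenset({"t1:pk=1"})),
    Transaction("s3", 0, frozenset({"t1:pk=2"})),
]
decisions = [decide(stream) for _member in range(3)]   # three members
print(decisions[0])      # [True, False, True]
```

The first of two conflicting transactions in the total order wins; the second is rolled back on every member, while the non-overlapping one commits.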
STEPS TAKEN TO TEST MySQL GROUP REPLICATION
Making the existing MTR rpl suite agnostic to Group Replication
We need to make the MTR rpl suite work with Group Replication so that we can run our large regression test suite against it. To do this, we need to identify the tests that are supported by Group Replication, those that need some modification, and those that are not supported at all due to one or more of the limitations of Group Replication.
One cannot simply turn on the necessary mysqld switches and expect all of the test cases to pass, because behavior differs when running with Group Replication.
Some of the challenges we expected were:
- Tests that do not run in row-based replication mode.
- Tests that run with the MyISAM storage engine.
- Tests that rely on binary log positions, i.e. tests that do not run with GTID_MODE=ON.
- Tests that use tables without a primary key.
To overcome these challenges, we decided to disable such tests for now, before gradually adapting the suite towards full functionality.
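The triage described above can be sketched as a small classifier. This is a hypothetical illustration rather than anything MTR provides: the `MtrTestInfo` fields and the test names are made up, and the checks simply encode the four incompatibilities listed above (non-row-based logging, MyISAM, position-based replication, tables without a primary key).

```python
from dataclasses import dataclass


@dataclass
class MtrTestInfo:
    """Hypothetical summary of what a test needs; field names are assumptions."""
    name: str
    binlog_format: str = "ROW"
    engine: str = "InnoDB"
    uses_gtids: bool = True
    all_tables_have_pk: bool = True


def incompatibilities(test):
    """Return the Group Replication requirements this test violates."""
    reasons = []
    if test.binlog_format.upper() != "ROW":
        reasons.append("not row-based")
    if test.engine.lower() != "innodb":
        reasons.append(f"uses {test.engine}")
    if not test.uses_gtids:
        reasons.append("relies on binlog positions (GTID_MODE=OFF)")
    if not test.all_tables_have_pk:
        reasons.append("table without primary key")
    return reasons


def triage(tests):
    """Split tests into those that can run now and those disabled for now."""
    runnable, disabled = [], {}
    for test in tests:
        reasons = incompatibilities(test)
        if reasons:
            disabled[test.name] = reasons
        else:
            runnable.append(test.name)
    return runnable, disabled


# Hypothetical test names, for illustration only.
runnable, disabled = triage([
    MtrTestInfo("rpl_example_row"),
    MtrTestInfo("rpl_example_stmt", binlog_format="STATEMENT"),
    MtrTestInfo("rpl_example_myisam", engine="MyISAM", all_tables_have_pk=False),
])
print(runnable)    # ['rpl_example_row']
print(disabled)
```

Keeping the reasons per test makes it easy to re-enable tests gradually as each limitation is addressed.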
To run the compatible tests, MTR was adapted in the following way:
- To run the tests with Group Replication, we need a group_name, which is either generated automatically or passed manually by the user. To generate it, we use the UUID() function to get a valid UUID, which is set during the server initialization step in the master-slave.inc call. To set group_name manually instead, the user needs to use the new mtr option $rpl_skip_group_replication_start and pass a valid group_name.
- We skip the CHANGE MASTER and other calls that are only valid for asynchronous replication by using a new mtr option, $rpl_group_replication, which is used when running the tests with Group Replication.
- We use the new WAIT_FOR_EXECUTED_GTID_SET function to sync the servers in the group. A new mtr option, $wait_for_executed_gtid_set, has been introduced for use in Group Replication runs.
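WAIT_FOR_EXECUTED_GTID_SET performs the wait on the server itself; the Python sketch below only models the condition being waited for, namely that a member's executed GTID set contains the target set. It assumes the basic `uuid:interval[:interval]` form with comma-separated entries (no tagged GTIDs), expands intervals into Python sets for clarity rather than efficiency, and uses a hypothetical `read_executed` callable as a stand-in for reading `@@GLOBAL.gtid_executed` from a member.

```python
import time


def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for entry in filter(None, (e.strip() for e in text.replace("\n", "").split(","))):
        uuid, *intervals = entry.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def contains(executed, target):
    """True if every GTID in target is also in executed."""
    have, want = parse_gtid_set(executed), parse_gtid_set(target)
    return all(numbers <= have.get(uuid, set()) for uuid, numbers in want.items())


def wait_until_contains(read_executed, target, timeout=30.0, interval=0.1):
    """Poll read_executed() until it contains target; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if contains(read_executed(), target):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


G = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"      # hypothetical group UUID
print(contains(f"{G}:1-10", f"{G}:1-5:8"))      # True
print(contains(f"{G}:1-4", f"{G}:1-5"))         # False
```

A test syncs the group by taking the writer's executed set as the target and waiting until every other member contains it.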
Tests that have been covered
Using regression testing, we have tried to cover most of the scenarios that could arise with Group Replication in place. The tests cover automatic distributed recovery, transaction application, and both positive and negative certification outcomes. We have also tested the functions related to syncing the servers in the group, as well as the various Performance Schema tables. Finally, we tested the behavior of Group Replication when it is started automatically and after server restarts.
Distributed Testing
MySQL Group Replication ensures synchronous updates on any member in a group of MySQL servers, with conflict handling and failure detection. Distributed recovery is also included to ease the process of adding new members. In fact, due to the nature of group-based mechanisms, the group must support not only the planned exit and addition of members, but also machine and process crashes.
To test this stability and the normal functioning of the group as a whole, we need to test the behavior of the group in a highly distributed scenario. These tests range from basic replication features, through server provisioning, to stress tests involving server and machine crashes.
The new tests used to evaluate the MySQL multi-master features and their stability under group changes are:
- Basic test for group-based replication
The basic test for group-based replication consists of test cases for:
- Basic plugin installation routines (implicit on the test setup)
- Basic member join primitives and recovery
- Basic member state change
- Basic query handling
- Server provisioning test
We also need to test the provisioning of servers in Group Replication scenarios. Some of the provisioning and recovery scenarios tested here are:
- Blank server and normal recovery.
Then, combined with recovery:
- Blank server with data replayed from a mysqldump backup
- A full copy of one member's data directory
- Copied binary logs after data is replayed from a dump
- Copying the data directory and faking transactions
- Copying the data directory and setting GTID_PURGED
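Most of the provisioning variants above differ only in which GTIDs the joiner already holds when it starts recovery: none for a blank server, the GTIDs covered by a dump or copied data directory, or a set declared through GTID_PURGED. The sketch below models the resulting question, which transactions the donor has that the joiner still lacks. It is a simplified illustration with hypothetical helper names and basic GTID-set parsing (no tagged GTIDs), not a description of how the plugin computes this.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for entry in filter(None, (e.strip() for e in text.replace("\n", "").split(","))):
        uuid, *intervals = entry.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def format_gtid_set(gtids):
    """Render {uuid: set of numbers} back into 'uuid:1-3:5' form."""
    parts = []
    for uuid in sorted(gtids):
        numbers = sorted(gtids[uuid])
        if not numbers:
            continue
        ranges, start, prev = [], numbers[0], numbers[0]
        for n in numbers[1:]:
            if n != prev + 1:                  # gap: close the current interval
                ranges.append((start, prev))
                start = n
            prev = n
        ranges.append((start, prev))
        parts.append(uuid + ":" + ":".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges))
    return ",".join(parts)


def missing_on_joiner(donor_executed, joiner_executed):
    """GTIDs the donor has executed but the joiner has not (what recovery must fetch)."""
    donor, joiner = parse_gtid_set(donor_executed), parse_gtid_set(joiner_executed)
    return format_gtid_set({u: nums - joiner.get(u, set()) for u, nums in donor.items()})


G = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"      # hypothetical group UUID
print(missing_on_joiner(f"{G}:1-100", ""))              # blank server
print(missing_on_joiner(f"{G}:1-100", f"{G}:1-80"))     # restored from a dump
print(missing_on_joiner(f"{G}:1-100", f"{G}:1-100"))    # GTID_PURGED covers everything
```

Each provisioning variant can then be checked by asserting that, once recovery finishes, this difference is empty.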
- Basic state change test
This test evaluates state changes in the group. It focuses on well-defined state changes and on verifying how the group reacts to them. Some of the cases identified include:
- When a new member joins the group.
- When the group is left with more than 2 members, and also when it is left with only 2 members.
- When a MySQL server or its machine crashes: the group should tolerate the failure, and the server should be able to come back online as a group member.
- Basic concurrency test
This test consists of executing concurrent requests, where some transactions fail certification and are therefore rolled back. The workload consists of concurrent updates to a set of different columns of the same table.
The objectives were to:
- Verify that the negative certification path works and that the client handles it gracefully.
- Test the data in the certification statistics table.
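The concurrency test and its objectives can be illustrated with a toy model in Python. Everything here is an assumption made for illustration: `ToyGroup` certifies a single row in one total order, `CertificationConflict` stands in for whatever error the client actually receives, and the `checked` and `conflicts` counters only mimic the kind of data the certification statistics table exposes. Because the model detects conflicts per row (as Group Replication does, since write sets are derived from primary keys), updates to different columns of the same row still conflict; the client handles a rollback by retrying with a fresh snapshot.

```python
import threading
from collections import Counter


class CertificationConflict(Exception):
    """Hypothetical stand-in for the error a client gets when certification fails."""


class ToyGroup:
    """Toy model: one row, certified at row granularity in a single total order."""

    def __init__(self, columns):
        self._lock = threading.Lock()          # serializes certification (total order)
        self.version = 0                       # number of certified transactions
        self.row = {c: 0 for c in columns}     # one row, several columns
        self.row_last_write = 0                # version of the last certified write to the row
        self.stats = Counter()                 # "checked" and "conflicts" counters

    def snapshot(self):
        with self._lock:
            return self.version, dict(self.row)

    def commit(self, snap_version, updates):
        with self._lock:
            self.stats["checked"] += 1
            # Any write to this row after our snapshot is a conflict, even if it
            # touched a different column.
            if self.row_last_write > snap_version:
                self.stats["conflicts"] += 1
                raise CertificationConflict()
            self.version += 1
            self.row.update(updates)
            self.row_last_write = self.version


def client(group, column, increments):
    """Increment one column; on a certification failure, retry with a fresh snapshot."""
    done = 0
    while done < increments:
        version, row = group.snapshot()
        try:
            group.commit(version, {column: row[column] + 1})
            done += 1
        except CertificationConflict:
            pass                               # rolled back: retry gracefully


group = ToyGroup(["a", "b", "c"])
threads = [threading.Thread(target=client, args=(group, c, 200)) for c in group.row]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(group.row)                               # {'a': 200, 'b': 200, 'c': 200}
print(group.stats)
```

The number of conflicts varies from run to run, but the final data and the relation "checked minus conflicts equals committed" do not, which makes them good assertions for this kind of test.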
- Stress test
The stress test is intended to simulate a live group, emulating many of the events expected to happen during its lifetime.
It was run with a setup involving the following steps:
- Initialization of N members in the group: we start with a group of N members, which should never drop below 2 members during the test.
- Execute queries in parallel with the test load: a separate thread executes DML queries on randomly chosen online members. These queries do not conflict, since conflicts are not the subject of this test, but they serve to verify group consistency.
- Provoke random group events while the queries are executing, such as:
- Adding and removing a member.
- Crashing a server or killing a machine.
- Verify group consistency by checking that the data is identical on all members and that the other statistics are in sync across all group members.
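The stress-test steps above can be outlined as a small, self-contained simulation. It is a toy model with hypothetical names, not the actual test harness: members are in-memory tables, replication is an append-only ordered log that online members apply, a planned leave and a crash both simply take a member offline, and recovery replays the part of the log a member missed. A real run would drive actual mysqld processes and compare, for example, `CHECKSUM TABLE` results across members.

```python
import hashlib
import itertools
import random


class Member:
    """Toy group member: an in-memory table plus its position in the group log."""

    def __init__(self, name):
        self.name = name
        self.applied = 0                       # number of group-log entries applied
        self.table = {}                        # primary key -> value

    def apply_missing(self, log):
        """Apply every log entry not yet applied (stand-in for distributed recovery)."""
        for pk, value in log[self.applied:]:
            self.table[pk] = value
        self.applied = len(log)

    def checksum(self):
        """Digest of the sorted table contents, so equal data gives equal digests."""
        return hashlib.sha256(repr(sorted(self.table.items())).encode()).hexdigest()


def stress(n_members=3, steps=500, min_members=2, seed=7):
    """Run random group events alongside a DML load, then check consistency."""
    rng = random.Random(seed)
    ids = itertools.count(n_members)           # names for brand-new members
    log = []                                   # totally ordered, certified writes
    online = [Member(f"m{i}") for i in range(n_members)]
    offline = []                               # members that left or crashed

    for step in range(steps):
        # Non-conflicting DML: each step writes its own key; online members apply it.
        log.append((step, rng.randint(0, 1000)))
        for member in online:
            member.apply_missing(log)

        # One random group event; the group never drops below min_members.
        event = rng.choice(["none", "leave", "crash", "join"])
        if event in ("leave", "crash") and len(online) > min_members:
            offline.append(online.pop(rng.randrange(len(online))))
        elif event == "join":
            if offline:
                joiner = offline.pop(rng.randrange(len(offline)))   # rejoin after leaving
            else:
                joiner = Member(f"m{next(ids)}")                   # blank server
            joiner.apply_missing(log)          # recover before counting as online
            online.append(joiner)

    # Consistency check: every online member must hold identical data.
    assert len(online) >= min_members
    assert len({member.checksum() for member in online}) == 1, "group diverged"
    return len(online), len(log)


print(stress())
```

Because the random events are seeded, a failing run can be reproduced exactly by re-running with the same `seed`.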
CONCLUSION
By using the two approaches above, adapting the MTR rpl suite and distributed testing, we have greatly improved test coverage and exercised the functionality thoroughly. This has also allowed us to identify problems related to the distributed nature of Group Replication.
If you have any questions or feedback regarding this project, please post them here. I would love to hear what the community thinks about all of this.