MySQL Blog Archive
Enhanced support for large transactions in Group Replication

MySQL 8.0.16 is out and it enhances Group Replication as usual. For example, the group_replication_exit_state_action system variable has a new default value, and members that the group expels can now automatically try to rejoin. This post presents a new feature MySQL 8.0.16 brings to Group Replication: message fragmentation.

Context

As you may be aware, Group Replication currently uses XCom, a group communication engine. Among other things, XCom atomically broadcasts messages (transactions) to the group’s members and detects when group members fail. Each group member’s Group Replication plugin forwards messages to its local XCom instance when it needs to broadcast them to the group. XCom eventually delivers those messages, in the same order, to the Group Replication plugin of each group member.

XCom is implemented using a single thread of execution, which is responsible both for broadcasting messages and for deciding whether some group member has failed. When your application makes some member broadcast a big message (what counts as "big" depends on your system and workload), the XCom thread has to spend more time than usual processing that message. If the XCom thread of a member is busy processing the big message for too long, it may look to the other members’ XCom instances as if the busy member had failed. If so, the group may evict the busy member.

MySQL 8.0.13 introduced one way to ameliorate this situation in the form of the group_replication_member_expel_timeout system variable. You can use it to tune how long the group waits between suspecting that a member has failed and actually expelling it. For example, if the group suspects a member that is in fact just busy processing a big message, waiting long enough before evicting it may give the busy member time to finish processing the message and be seen as alive by the group again. Increasing the member expulsion timeout in this case is a trade-off, like most decisions: on one side, you can tune the timeout such that big messages do not lead the group to evict busy members; on the other, the system will take more time to evict members that actually fail.
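The trade-off can be made concrete with a deliberately simplified model. The sketch below is not how XCom decides anything internally; the function name, its parameters, and the 5-second suspicion delay used in the example calls are illustrative assumptions. It only captures the idea that a member is expelled if it stays unresponsive past the point of suspicion plus the expel timeout.

```python
def is_expelled(unresponsive_seconds: float,
                suspicion_delay_seconds: float,
                expel_timeout_seconds: float) -> bool:
    """Simplified model (an assumption, not the XCom algorithm).

    The group suspects a member once it has been unresponsive for
    suspicion_delay_seconds, and expels it if it is still unresponsive
    expel_timeout_seconds after that.
    """
    return unresponsive_seconds > suspicion_delay_seconds + expel_timeout_seconds


# A member that is busy with a big message for 8 seconds:
print(is_expelled(8, 5, 0))    # True: no grace period, the busy member is evicted
print(is_expelled(8, 5, 10))   # False: the timeout covers the busy period

# A member that really failed (unresponsive for 60 seconds):
print(is_expelled(60, 5, 10))  # True, but eviction now takes 15 seconds instead of 5
```

The last call shows the cost side of the trade-off: the same timeout that protects a busy member also delays the eviction of a member that is genuinely gone.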

Message fragmentation

This post presents a new feature that the Group Replication plugin from MySQL 8.0.16 onwards can use to cope with big messages: message fragmentation. In a nutshell, you can specify a maximum size for the messages that a member broadcasts to the group. Messages that exceed the maximum size are fragmented into smaller chunks. You can specify the maximum size you allow using the group_replication_communication_max_message_size system variable. The default is 10 MiB, which we expect to circumvent the big message issue in most cases.

Example

Let us go through an explanation of the new feature using an example. Figure 1 shows the new feature at work when the middle green member broadcasts a message to the group.

Figure 1. Fragmenting an outgoing message.
  1. If the message size exceeds the maximum size that the user allows (group_replication_communication_max_message_size), the member fragments the message into chunks that do not exceed the maximum size.
  2. The member broadcasts each chunk to the group, i.e. forwards each chunk individually to XCom.
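The sending side shown in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the idea: the `Fragment` fields and the `fragment_message` function are hypothetical, the actual wire format used by Group Replication is not described in this post, and the sketch bounds only the payload size, ignoring any header overhead.

```python
from dataclasses import dataclass
from typing import List

# Default of group_replication_communication_max_message_size: 10 MiB.
DEFAULT_MAX_MESSAGE_SIZE = 10 * 1024 * 1024


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment layout, for illustration only."""
    message_id: int  # identifies the original message (assumed field)
    index: int       # position of this chunk in the message (assumed field)
    total: int       # number of chunks in the message (assumed field)
    payload: bytes


def fragment_message(message_id: int, payload: bytes,
                     max_size: int = DEFAULT_MAX_MESSAGE_SIZE) -> List[Fragment]:
    """Split payload into chunks whose payload is at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    if len(payload) <= max_size:
        # The message fits within the limit: send it as a single chunk.
        return [Fragment(message_id, 0, 1, payload)]
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)]
    return [Fragment(message_id, i, len(chunks), c) for i, c in enumerate(chunks)]


# A 25-byte message with a 10-byte limit becomes chunks of 10, 10 and 5 bytes.
# Each chunk would then be forwarded to XCom individually.
for frag in fragment_message(7, b"x" * 25, max_size=10):
    print(frag.index, frag.total, len(frag.payload))
```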

XCom eventually delivers the chunks to the group members. Figures 2a–c show the new feature at work when the group members deliver the chunks that the middle green member sent.

Figure 2a. Reassembling an incoming message: first fragment.
  3. The members conclude that the incoming message is actually a fragment of a bigger message.
  4. The members buffer the incoming fragment because they conclude the fragment is a chunk of a still-incomplete message. (Fragments contain the necessary metadata to reach this conclusion.)

Figure 2b. Reassembling an incoming message: second fragment.
  5. See step 3 above.
  6. See step 4 above.

Figure 2c. Reassembling an incoming message: last fragment.
  7. The members conclude that the incoming message is actually a fragment of a bigger message.
  8. The members conclude that the incoming fragment is the last chunk missing, reassemble the original, whole message, and process it.
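The receiving side shown in Figures 2a to 2c can be sketched as a small buffer keyed by message. As before, this is a minimal sketch under stated assumptions: the `Fragment` fields and the `Reassembler` class are hypothetical stand-ins for the metadata that real fragments carry, not the plugin's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment layout, for illustration only."""
    message_id: int  # identifies the original message (assumed field)
    index: int       # position of this chunk in the message (assumed field)
    total: int       # number of chunks in the message (assumed field)
    payload: bytes


class Reassembler:
    """Buffers fragments until every chunk of a message has arrived."""

    def __init__(self) -> None:
        # message_id -> {chunk index: chunk payload}
        self._pending: Dict[int, Dict[int, bytes]] = {}

    def deliver(self, fragment: Fragment) -> Optional[bytes]:
        """Return the whole message once complete, otherwise None."""
        if fragment.total == 1:
            # Not part of a bigger message: nothing to buffer.
            return fragment.payload
        chunks = self._pending.setdefault(fragment.message_id, {})
        chunks[fragment.index] = fragment.payload
        if len(chunks) < fragment.total:
            # Still incomplete: keep the fragment buffered.
            return None
        # Last missing chunk arrived: reassemble in order and stop buffering.
        del self._pending[fragment.message_id]
        return b"".join(chunks[i] for i in range(fragment.total))


reassembler = Reassembler()
print(reassembler.deliver(Fragment(7, 0, 3, b"aaa")))  # None (buffered)
print(reassembler.deliver(Fragment(7, 1, 3, b"bbb")))  # None (buffered)
print(reassembler.deliver(Fragment(7, 2, 3, b"cc")))   # b'aaabbbcc'
```

Because XCom delivers messages in the same order to every member, each member buffers and completes the message at the same point in the delivery sequence.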

Conclusion

MySQL 8.0.16 is out. Group Replication can now ensure that the size of messages exchanged within the group does not exceed a user-defined threshold. This can help you prevent the group from evicting members when your workload generates big messages. Give the new feature a spin and let us know what you think.