WL#11615: MYSQL GCS: Improve XCom Cache Management

Affects: Server-8.0   —   Status: Complete

Introduction

=============

XCom keeps the messages exchanged as part of the consensus protocol in a cache. The XCom cache is the largest consumer of memory in GCS as a whole and one of its most essential components. Among other things, the cache is used for node recovery: if an XCom node becomes unreachable for some time without being expelled, it recovers the messages it missed from the caches of the other nodes when it comes back to the group.

Currently, the cache has a fixed number of entries and a fixed size limit of 1GB. This can be problematic because, since WL#11570, users can define arbitrary expel timeouts. It is therefore possible for the caches of the alive nodes to be exhausted while another node is unreachable but not yet expelled. If that happens, the unreachable node will not be able to recover when it comes back and, as a consequence, will be expelled even though it respected the expel timeout.

Having a fixed number of entries is also problematic in high-load systems with small transactions, since the available cache entries can be depleted quickly. This can cause XCom to become too slow or even block.

User Stories

=============

  • As a MySQL DBA, I want to be able to configure the maximum size of the XCom cache, so that I can choose the ideal cache size, taking into account my system load and network requirements.

  • As a MySQL DBA, I want to set the cache size limit to a higher value than 1GB, so that I can maximize the chances that the messages missed by an unreachable node are still in the other nodes' caches when it becomes reachable again.

  • As a MySQL DBA, I want to be informed when messages needed by an unreachable node have been evicted from the cache, so that I can set a more appropriate value for the maximum size of the cache.

Goal

=====

This worklog will remove the static nature of the XCom cache by allowing users to set the size limit of the XCom cache. To that end, we will expose a new Group Replication (GR) option named group_replication_message_cache_size. In addition, the cache will no longer be bound by a fixed number of entries; instead, it will grow dynamically, as long as the size limit is respected.

To help guide the configuration of the new option, we will print a warning when a message that has been missed by an unreachable node is evicted from the cache of one of the alive nodes. Such an event indicates that the current size limit of that node's cache is not high enough to handle similar failures. The user can then act by adjusting the cache size appropriately.
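The behavior described in this section can be illustrated with a small sketch. It is not XCom code: the class name SizeBoundedCache, its methods, and the lowest_needed parameter are hypothetical, and the eviction policy (oldest message first) is an assumption made for the illustration. It only shows a cache bounded by total bytes rather than by entry count, which records a warning when it evicts a message that an unreachable node still needs.

```python
from collections import OrderedDict


class SizeBoundedCache:
    """Toy message cache bounded by total payload bytes, not by entry count.

    Hypothetical illustration only; message numbers are assumed to be
    unique and inserted in increasing order.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # msg_no -> payload, oldest first
        self.warnings = []            # stand-in for the error log

    def put(self, msg_no, payload, lowest_needed=None):
        """Cache a message, evicting the oldest entries to respect max_bytes.

        lowest_needed is the oldest message number that an unreachable
        (but not yet expelled) node still has to recover, or None when
        every node is up to date.
        """
        self.entries[msg_no] = payload
        self.used_bytes += len(payload)
        # The number of entries is unbounded; only the byte total is limited.
        # The newest message is always kept, even if it alone exceeds the limit.
        while self.used_bytes > self.max_bytes and len(self.entries) > 1:
            old_no, old_payload = self.entries.popitem(last=False)
            self.used_bytes -= len(old_payload)
            if lowest_needed is not None and old_no >= lowest_needed:
                self.warnings.append(
                    f"message {old_no} evicted but still needed by an "
                    "unreachable node; consider a larger cache size limit"
                )


cache = SizeBoundedCache(max_bytes=10)
cache.put(1, b"aaaa")
cache.put(2, b"bbbb")
# A node became unreachable after message 1, so it needs 2 onwards.
cache.put(3, b"cccc", lowest_needed=2)  # evicts 1: not needed, no warning
cache.put(4, b"dddd", lowest_needed=2)  # evicts 2: needed, warning recorded
```

In this sketch the warning is the signal that the limit was too small for the outage that just happened, which is the role the worklog gives to the new log message.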

Scope

======

This worklog is part of the effort to make GR more suitable for WAN operation and more maintenance-friendly. It will give users more control over the configuration of GR. It will also have the side effect of making users aware of the XCom cache and leading them to consider it when doing capacity planning.

This worklog will not altogether prevent unreachable nodes from being expelled. In fact, this situation could only be prevented completely if the cache had infinite memory. It is thus the responsibility of the user to set the cache size limit to a value that is large enough to hold the messages missed by a node that is unreachable for the time specified by the user.
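As a rough aid for choosing such a value, the sizing reasoning above can be written as simple arithmetic. The sketch below is an assumption, not a formula from this worklog: the function estimate_cache_bytes and its parameters are hypothetical, and it ignores per-entry overhead and bursts in traffic. It multiplies the message rate, the average message size, and the time a node may stay unreachable, with an optional headroom factor.

```python
import math


def estimate_cache_bytes(msgs_per_sec, avg_msg_bytes, unreachable_secs, headroom=1.0):
    """Back-of-the-envelope lower bound for the cache size limit, in bytes.

    Hypothetical helper: assumes a steady message rate and ignores any
    per-entry bookkeeping overhead inside the cache.
    """
    raw = msgs_per_sec * avg_msg_bytes * unreachable_secs
    return math.ceil(raw * headroom)


# Example: 1000 messages/s of 2048 bytes each, node unreachable for 60 s.
needed = estimate_cache_bytes(1000, 2048, 60)
needed_with_margin = estimate_cache_bytes(1000, 2048, 60, headroom=1.5)
```

With these example numbers the estimate is about 123 MB without headroom, so the result should be read only as a starting point to be checked against the eviction warnings observed in practice.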

While the worklog will allow the user to have more control over the recovery process, no changes will be made to the recovery process itself.