WL#13979: MySQL GCS: Reduce minimum value of group_replication_message_cache_size

Affects: Server-8.0   —   Status: Complete

EXECUTIVE SUMMARY
-----------------

This worklog lowers the minimum value allowed for the maximum size of
XCom's message cache. The minimum is currently 1GB, and some customers
have asked to be allowed to set the cache size below 1GB, potentially
as low as 128MB.


USER/DEV STORIES
----------------

- As a DBA, I want to be able to cap the XCom cache at a value well
  below 1GB (e.g., 128MB) so that I can successfully deploy InnoDB
  Cluster on a host with a small amount of memory (e.g., 16GB) and
  good network connectivity.

SCOPE
-----

- This work changes nothing other than the lower bound of
  group_replication_message_cache_size.

LIMITS
------

- No new limits.
- Reducing the value has implications on an unstable network: the
  members could end up unable to do implicit reconnections among
  themselves.
- The cache structures take an additional ~50MB per block created.
  Setting the variable to 128MB therefore caps the cached data only; it
  does not include the data structures used to manage and run the
  cache.

FUNCTIONAL REQUIREMENTS
-----------------------

- FR1: The user must be able to set group_replication_message_cache_size to a 
       value as low as 134217728 bytes.

SECURITY CONTEXT
----------------

- No implications on security related areas.

UPGRADE/DOWNGRADE and CROSS-VERSION REPLICATION
-----------------------------------------------

- No implications on upgrades/downgrades.
- We keep the existing default value; only the minimum limit changes.

- Side-note: Users who maintain their own MySQL configuration
  systems/repositories/tooling may have recorded a lower bound of 1GB
  for this variable. In that case, they need to adjust their
  scripts/tooling to match the new lower limit.

USER INTERFACE
--------------

- Nothing changes, apart from the fact that the system variable
  group_replication_message_cache_size now reports 128 MB instead of
  1GB as the lower limit.

  - NAME: group_replication_message_cache_size
    UNIT: Bytes
    SCOPE: Global
    TYPE: Integer
    MINIMUM VALUE: *134217728 (128MB)*
    MAXIMUM VALUE (32-bit platforms): 4294967295 (4GB)
    MAXIMUM VALUE (64-bit platforms): 18446744073709551615 (16EiB)
    DEFAULT: 1073741824 (1GB)
    DYNAMIC: Yes
    PRIVILEGES: SUPER/SYSTEM_VARIABLES_ADMIN
    PERSISTENCE: PERSIST
    DESCRIPTION: Maximum size of the message cache.

  - Compared with the original description of this variable, only the
    MINIMUM VALUE field has changed.
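
For tooling that pre-validates configuration values, the bounds listed
above can be captured as constants. The following is a minimal sketch,
not server code: check_cache_size and the constant names are
hypothetical, and the server performs its own validation regardless.

```python
# Bounds copied from the variable description above.
MIN_CACHE_SIZE = 134217728                    # 128MB, the new minimum
MAX_CACHE_SIZE_32BIT = 4294967295             # 2**32 - 1
MAX_CACHE_SIZE_64BIT = 18446744073709551615   # 2**64 - 1
DEFAULT_CACHE_SIZE = 1073741824               # 1GB, unchanged


def check_cache_size(value: int, is_64bit: bool = True) -> int:
    """Return value if it lies within the documented range.

    Hypothetical helper for external tooling; raises ValueError when
    the value is outside the documented bounds.
    """
    upper = MAX_CACHE_SIZE_64BIT if is_64bit else MAX_CACHE_SIZE_32BIT
    if not MIN_CACHE_SIZE <= value <= upper:
        raise ValueError(
            f"{value} is outside the range [{MIN_CACHE_SIZE}, {upper}]"
        )
    return value
```

Scripts that previously hard-coded 1073741824 as the minimum would only
need to change the MIN_CACHE_SIZE constant in a helper like this one.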

OBSERVABILITY
-------------

- No new things added.
- No old things removed.

DEPLOYMENT and INSTALLATION
---------------------------

- When deploying or configuring, the user may now set this option to a value lower than 1GB.
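
As an illustration, deployment tooling could generate the statement
that applies the new minimum. This is a sketch under assumptions: the
helper name set_persist_statement is hypothetical, and SET PERSIST is
used here because the variable is listed as dynamic and persistable.

```python
VARIABLE = "group_replication_message_cache_size"


def set_persist_statement(size_mib: int) -> str:
    """Build a SET PERSIST statement for a cache size given in MiB.

    Hypothetical helper; it only formats the statement and does not
    check the value against the server's bounds.
    """
    size_bytes = size_mib * 1024 * 1024  # the variable's unit is bytes
    return f"SET PERSIST {VARIABLE} = {size_bytes};"


print(set_persist_statement(128))
# prints: SET PERSIST group_replication_message_cache_size = 134217728;
```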

PROTOCOL
--------

- No changes.


FAILURE MODEL SPECIFICATION
---------------------------

The smaller the cache, the less resilient XCom is to transient
failures of single members. That is, if a member is isolated for a
while and the cache is not large enough to hold the messages that
passed through the system while it was away, the member may not be
able to recover from the other members' caches when it comes back.
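
A rough way to reason about this trade-off is to divide the cache cap
by the rate at which message data flows through the group. The sketch
below assumes a deliberately simplified model in which the cache holds
only the most recent messages up to the cap and traffic is constant;
the function name and the 10MiB/s rate are illustrative assumptions,
not measurements or XCom internals.

```python
def tolerated_isolation_seconds(cache_size_bytes: int,
                                message_bytes_per_second: float) -> float:
    """Estimate how long a member can be away before the messages it
    missed no longer fit in the other members' caches.

    Simplified model: constant traffic, and the cache retains exactly
    the latest cache_size_bytes of message data.
    """
    if message_bytes_per_second <= 0:
        raise ValueError("message rate must be positive")
    return cache_size_bytes / message_bytes_per_second


MIB = 1024 * 1024
rate = 10 * MIB  # assumed group traffic, bytes per second

small = tolerated_isolation_seconds(128 * MIB, rate)    # 12.8 seconds
default = tolerated_isolation_seconds(1024 * MIB, rate)  # 102.4 seconds
print(small, default)
```

Under these assumptions, lowering the cap from 1GB to 128MB shrinks the
tolerated isolation window by a factor of eight.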

When using GR with auto-rejoin enabled, the member will in that case
automatically rejoin, recover from the binary log, and come back
online, which is a much slower way of recovering from such failures.

So, a small cache may lead to suboptimal performance, but the
scenarios remain recoverable in GR.