WL#16239: GR: Flow-control metrics

Affects: Server-9.x   —   Status: Complete

EXECUTIVE SUMMARY
=================

Once this worklog is implemented, Group Replication flow control tracks
additional stats, which are exposed by the
`group_replication_flow_control_stats` component.

From 9.1.0, this feature is available in Enterprise Edition.
From 9.7.0, this feature is available in Community and all other editions.

The new stats report:

 * The number of times flow control throttled
 * The total throttle time, in microseconds
 * The number of sessions currently being throttled
 * The timestamp of the last throttled transaction

User Stories
============

- As a MySQL admin (or MDS operator/SME),
    I want more information on Group Replication flow control execution so
    that I can better tune it to my system load.

Functional Requirements
=======================

- FR1: Create a Group Replication service with one method for each of the 4
       new stats:
         * The number of times flow control throttled
         * The total throttle time, in microseconds
         * The number of sessions currently being throttled
         * The timestamp of the last throttled transaction

- FR2: The stats shall have member scope since they reflect what the local
       member observes.

- FR3: The stats shall be reset on group bootstrap.

- FR4: The stats shall be reset on member join.

- FR5: The stats shall be reset on member automatic rejoin.

- FR6: The stats shall be reset on server restart.

- FR7: Create a new component, called
       `group_replication_flow_control_stats`, that uses the Group
       Replication service methods and creates a performance schema global
       status variable for each new stat.

- FR8: The component `group_replication_flow_control_stats` can only be
       installed when Group Replication is installed.

- FR9: The component shall add a status variable that reports the
       number of times flow control throttled:
       `Gr_flow_control_throttle_count`.

- FR10: The component shall add a status variable that reports the
        total throttle time, in microseconds:
        `Gr_flow_control_throttle_time_sum`.

- FR11: The component shall add a status variable that reports the
        number of sessions currently being throttled:
        `Gr_flow_control_throttle_active_count`.

- FR12: The component shall add a status variable that reports the
        timestamp at which the last transaction was throttled:
        `Gr_flow_control_throttle_last_throttle_timestamp`.

- FR13: Installation of the component shall fail if the Group Replication
        service `group_replication_flow_control_service` cannot be acquired,
        and the following error is returned to the client:
        > INSTALL COMPONENT 'file://component_group_replication_flow_control_stats';
        > ERROR HY000: Cannot satisfy dependency for service
          'group_replication_flow_control_metrics_service' required by component
          'mysql:group_replication_flow_control_stats'.

- FR14: When the component `component_group_replication_flow_control_stats`
        is installed, uninstalling the Group Replication plugin is disallowed
        and the following error is returned to the client session:
        > UNINSTALL PLUGIN group_replication;
        > ERROR HY000: Plugin 'group_replication' cannot be uninstalled now.
          Please uninstall the component 'component_group_replication_flow_control_stats'
          and then UNINSTALL PLUGIN group_replication.

Non-Functional Requirements
===========================

Summary of the approach
=======================

Expand the Group Replication global status variables to include more
information on flow control execution.

A component connects to Group Replication to gather the information and
registers global status variables so that a DBA can access it.


Security context
================

No applicable changes on security context.


Observability
=============

The following metrics will be added:

Gr_flow_control_throttle_count
------------------------------
Number of transactions throttled by the flow control mechanism.

Gr_flow_control_throttle_time_sum
---------------------------------
Total time, in microseconds, that transactions were throttled by the flow
control mechanism.

Gr_flow_control_throttle_active_count
-------------------------------------
Number of transactions currently being throttled by the flow control
mechanism.

Gr_flow_control_throttle_last_throttle_timestamp
------------------------------------------------
Timestamp of the last time a transaction was throttled.
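
Because the count and the time sum are cumulative, a monitoring tool would typically derive an average throttle time from two samples. The following is a minimal sketch of that arithmetic, not part of this worklog: `ThrottleSample` and `average_throttle_time_us` are hypothetical names, and the sketch assumes that a counter going backwards means the stats were reset between the samples.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical snapshot of two of the status variables described above.
struct ThrottleSample {
  std::uint64_t count;        // Gr_flow_control_throttle_count
  std::uint64_t time_sum_us;  // Gr_flow_control_throttle_time_sum
};

// Average throttle time, in microseconds, over the interval between two
// samples. Returns std::nullopt when no average can be computed.
std::optional<double> average_throttle_time_us(const ThrottleSample &earlier,
                                               const ThrottleSample &later) {
  // Assumption: counters only decrease when the stats were reset (for
  // example on rejoin or restart), so the delta is not meaningful.
  if (later.count < earlier.count ||
      later.time_sum_us < earlier.time_sum_us) {
    return std::nullopt;
  }
  const std::uint64_t events = later.count - earlier.count;
  if (events == 0) return std::nullopt;  // nothing throttled in the interval
  return static_cast<double>(later.time_sum_us - earlier.time_sum_us) /
         static_cast<double>(events);
}
```

For example, going from a sample of (10, 1000) to (14, 3000) gives 4 throttle events and 2000 microseconds, an average of 500 microseconds per event.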


Upgrade/downgrade and cross-version replication
===============================================

No applicable changes on Upgrade/downgrade and cross-version replication.


User interface
==============

To use the component, install it with the following statement:

```
INSTALL COMPONENT 'file://component_group_replication_flow_control_stats';
```

When the component is no longer needed, remove it with:

```
UNINSTALL COMPONENT 'file://component_group_replication_flow_control_stats';
```

The metrics can be read through global status variables on the
`performance_schema.global_status` table:

```
mysql> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME LIKE 'Gr_flow_control%';
+--------------------------------------------------+---------------------+
| VARIABLE_NAME                                    | VARIABLE_VALUE      |
+--------------------------------------------------+---------------------+
| Gr_flow_control_throttle_active_count            | 10                  |
| Gr_flow_control_throttle_count                   | 10                  |
| Gr_flow_control_throttle_last_throttle_timestamp | 2024-07-01 12:50:56 |
| Gr_flow_control_throttle_time_sum                | 10                  |
+--------------------------------------------------+---------------------+
```

The metrics can also be read using the `SHOW GLOBAL STATUS` statement:

```
mysql> SHOW GLOBAL STATUS LIKE 'Gr\_flow\_control%';
+--------------------------------------------------+---------------------+
| Variable_name                                    | Value               |
+--------------------------------------------------+---------------------+
| Gr_flow_control_throttle_active_count            | 10                  |
| Gr_flow_control_throttle_count                   | 10                  |
| Gr_flow_control_throttle_last_throttle_timestamp | 2024-07-01 12:50:56 |
| Gr_flow_control_throttle_time_sum                | 10                  |
+--------------------------------------------------+---------------------+
```

Deployment and installation
===========================

No applicable changes on deployment and installation.

Protocol
========

The Group Replication metrics will be extended to track the new stats.

The service `group_replication_flow_control_service` will be implemented in
Group Replication to allow the stats to be queried, with one method for each
stat:

 * get_throttle_count
 * get_throttle_time_sum
 * get_throttle_active_count
 * get_throttle_last_throttle_timestamp

The new component `group_replication_flow_control_stats` will register
the performance_schema global status variables.

The component will create the following global status variables:

 * `Gr_flow_control_throttle_count`
 * `Gr_flow_control_throttle_time_sum`
 * `Gr_flow_control_throttle_active_count`
 * `Gr_flow_control_throttle_last_throttle_timestamp`

These global status variables expose the values made available by the Group
Replication service.
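
The mapping from status variables to service methods can be sketched as follows. This is an illustration under stated assumptions, not the component's code: `FlowControlService`, `FakeFlowControlService`, `read_status_variable` and `kTimestampBufferSize` are hypothetical names, the timestamp buffer size is not given by this worklog, and the methods are assumed to return false on success and true on failure.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <string_view>

// Assumed buffer size for the timestamp; the worklog does not state one.
constexpr std::size_t kTimestampBufferSize = 32;

// Stand-in for the Group Replication service interface. Methods are assumed
// to return false on success and true on failure.
class FlowControlService {
 public:
  virtual ~FlowControlService() = default;
  virtual bool get_throttle_count(std::uint64_t *out) = 0;
  virtual bool get_throttle_time_sum(std::uint64_t *out) = 0;
  virtual bool get_throttle_active_count(std::uint64_t *out) = 0;
  virtual bool get_throttle_last_throttle_timestamp(char *out) = 0;
};

// In-memory fake with fixed values, used only to exercise the lookup below.
class FakeFlowControlService : public FlowControlService {
 public:
  bool get_throttle_count(std::uint64_t *out) override {
    *out = 10;
    return false;
  }
  bool get_throttle_time_sum(std::uint64_t *out) override {
    *out = 2500;
    return false;
  }
  bool get_throttle_active_count(std::uint64_t *out) override {
    *out = 1;
    return false;
  }
  bool get_throttle_last_throttle_timestamp(char *out) override {
    std::snprintf(out, kTimestampBufferSize, "%s", "2024-07-01 12:50:56");
    return false;
  }
};

using CounterGetter = bool (FlowControlService::*)(std::uint64_t *);

// Calls one of the numeric getters and renders the value as text.
std::optional<std::string> read_counter(FlowControlService &service,
                                        CounterGetter getter) {
  std::uint64_t value = 0;
  if ((service.*getter)(&value)) return std::nullopt;  // service call failed
  return std::to_string(value);
}

// Resolves a status variable name to the matching service method. The value
// is fetched at read time, so nothing is cached in the component.
std::optional<std::string> read_status_variable(FlowControlService &service,
                                                std::string_view name) {
  if (name == "Gr_flow_control_throttle_count")
    return read_counter(service, &FlowControlService::get_throttle_count);
  if (name == "Gr_flow_control_throttle_time_sum")
    return read_counter(service, &FlowControlService::get_throttle_time_sum);
  if (name == "Gr_flow_control_throttle_active_count")
    return read_counter(service,
                        &FlowControlService::get_throttle_active_count);
  if (name == "Gr_flow_control_throttle_last_throttle_timestamp") {
    char buffer[kTimestampBufferSize] = {0};
    if (service.get_throttle_last_throttle_timestamp(buffer))
      return std::nullopt;
    return std::string(buffer);
  }
  return std::nullopt;  // unknown variable name
}
```

With the fake above, reading `Gr_flow_control_throttle_count` yields the text `10`, and an unknown name yields no value.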

FAILURE MODEL SPECIFICATION
===========================
There are no repercussions.


SECURITY CONTEXT
================
`SELECT * FROM performance_schema.global_status` statements do not require any
privilege.


UPGRADE/DOWNGRADE AND CROSS-VERSION REPLICATION
===============================================
There are no repercussions.

Summary of changes
==================

Group Replication will change the function `Flow_control_module::do_wait()`
to update the new flow control metrics.
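
One way to picture this bookkeeping is a scope guard placed around the wait. The sketch below is not the code of `Flow_control_module::do_wait()`: `FlowControlMetrics`, `ThrottleScope` and `throttled_wait` are hypothetical names, and it assumes that `reset()` (the reset required by FR3 to FR6) is only called while no session is inside a throttle wait.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical container for the four stats.
struct FlowControlMetrics {
  std::atomic<std::uint64_t> throttle_count{0};
  std::atomic<std::uint64_t> throttle_time_sum_us{0};
  std::atomic<std::uint64_t> throttle_active_count{0};
  std::atomic<std::int64_t> last_throttle_epoch_us{0};  // 0 means "never"

  // Assumption: called on bootstrap, join, automatic rejoin and restart,
  // and only while no session is being throttled.
  void reset() {
    throttle_count.store(0);
    throttle_time_sum_us.store(0);
    throttle_active_count.store(0);
    last_throttle_epoch_us.store(0);
  }
};

// Updates the metrics for exactly one throttled wait.
class ThrottleScope {
 public:
  explicit ThrottleScope(FlowControlMetrics &metrics)
      : metrics_(metrics), start_(std::chrono::steady_clock::now()) {
    using namespace std::chrono;
    metrics_.throttle_count.fetch_add(1);
    metrics_.throttle_active_count.fetch_add(1);
    metrics_.last_throttle_epoch_us.store(
        duration_cast<microseconds>(system_clock::now().time_since_epoch())
            .count());
  }

  ~ThrottleScope() {
    using namespace std::chrono;
    // A monotonic clock measures the wait, so wall-clock jumps do not
    // distort the time sum.
    const auto waited =
        duration_cast<microseconds>(steady_clock::now() - start_);
    metrics_.throttle_time_sum_us.fetch_add(
        static_cast<std::uint64_t>(waited.count()));
    metrics_.throttle_active_count.fetch_sub(1);
  }

  ThrottleScope(const ThrottleScope &) = delete;
  ThrottleScope &operator=(const ThrottleScope &) = delete;

 private:
  FlowControlMetrics &metrics_;
  std::chrono::steady_clock::time_point start_;
};

// Runs `wait` (the blocking part of a throttle) with the metrics updated
// around it. Callers are expected to invoke this only when a transaction
// actually has to wait.
template <typename WaitFn>
void throttled_wait(FlowControlMetrics &metrics, WaitFn &&wait) {
  ThrottleScope scope(metrics);
  wait();
}
```

Using a guard keeps the active count balanced even if the wait exits early, because the destructor always runs.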

Group Replication will provide a service with the following methods:

```
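#include <cstdint>
#include <mysql/components/service.h>

BEGIN_SERVICE_DEFINITION(group_replication_flow_control_service)

// Number of times flow control throttled.
DECLARE_BOOL_METHOD(get_throttle_count, (uint64_t *counts));
// Total throttle time, in microseconds.
DECLARE_BOOL_METHOD(get_throttle_time_sum, (uint64_t *time_sum));
// Number of sessions currently being throttled.
DECLARE_BOOL_METHOD(get_throttle_active_count, (uint64_t *active_count));
// Timestamp at which the last transaction was throttled.
DECLARE_BOOL_METHOD(get_throttle_last_throttle_timestamp, (char *timestamp));

END_SERVICE_DEFINITION(group_replication_flow_control_service)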
```

A component will be created that registers 4 new global status
variables:

 * `Gr_flow_control_throttle_count`
 * `Gr_flow_control_throttle_time_sum`
 * `Gr_flow_control_throttle_active_count`
 * `Gr_flow_control_throttle_last_throttle_timestamp`

The component will acquire the Group Replication service and read the
values whenever the global status variables are read.