WL#14940: GR: Automatic Eviction on Resource Exhaustion

Affects: Server-9.x   —   Status: Complete

EXECUTIVE SUMMARY
=================
This worklog enhances the high-availability (HA) Group Replication (GR) system to
automatically detect and mitigate issues related to lagging secondary servers
and resource exhaustion, thereby ensuring continuous operation and resilience of
the group.

In a GR group, when a secondary's applier lags or the server is swapping,
high-availability problems get worse.
Partially responsive or unreliable secondaries can destabilize the entire group.
This worklog aims to enhance group resilience by implementing automatic ejection
of servers encountering these critical conditions.

Safeguarding the group from problematic members involves:
- Preventing resource depletion within the group by automatically removing
  members facing resource shortages.
- Enhancing the management of MHS operations by seamlessly handling transient
  failures across various instances.
- Implementing self-ejection followed by automatic rejoin, enabling members to
  autonomously provision themselves and ensure continuous operation.

From 9.2.0, this feature is available in Enterprise Edition.
From 9.7.0, this feature is available in Community and all other editions.

Actors
======
1. MySQL Database Service Robot Operator
2. MySQL Database Service Human Operator
3. MySQL Admin users
4. Tools, e.g., sys procedure that outputs a consolidated view over the metrics
- useful for admins or the MySQL Shell.

USER/DEV STORIES
================
Automatic Ejection Mechanism:
As a MySQL user, I want the system to automatically detect when a secondary
server's applier is lagging or swapping excessively so that problematic servers
can be ejected from the group to maintain high availability.

Threshold Detection:
As a MySQL user, I want to define thresholds for resource utilization beyond
which a member is considered to be experiencing critical conditions so that the
system can trigger appropriate actions based on predefined criteria.

Intermittent Failure Handling:
As a MySQL User, I want the system to detect and handle intermittent failures in
secondary servers, so that unreliable members can be identified and removed from
the group to prevent disruptions.

Group Resilience:
As a MySQL user, I want the group to remain operational even in the presence of
faulty or problematic members, so that services can continue to be provided
without interruption.

Automatic Rejoin:
As a MySQL user, I want to automatically rejoin the group after resolving issues
that caused my ejection, so that I can resume contributing to the group's
operations.

Provisioning Automation:
As a MySQL User, I want ejected members to automatically provision themselves
and rejoin the group, so that downtime is minimized and the group remains fully
operational.

Fault Tolerance:
As a MySQL User, I want the system to handle faults gracefully and maintain high
availability, so that services are not disrupted even in the event of individual
member failures.

Quarantine Period Handling:
As a MySQL user, I want the system to enforce a quarantine period for restarted
members before considering them for ejection, so that they have sufficient time
to catch up and synchronize with the group without being prematurely ejected.

FUNCTIONAL REQUIREMENTS
=======================
FR1: Continuous monitoring
The system shall continuously monitor the lag time of appliers on secondary
servers and the available memory on secondary servers at 5-second intervals.

FR2: Applier Channel Lag Handling
A.
Introduce a configurable parameter named
`group_replication_resource_manager.applier_channel_lag` (seconds) to specify
the maximum tolerable lag time for an applier on a secondary server.
The minimum value will be 0, max value allowed will be 43200 and default will be
3600 (1 hour).
Setting `group_replication_resource_manager.applier_channel_lag` to 0 disables
this lag handling behavior.

B.
If the current lag of a secondary applier exceeds
`group_replication_resource_manager.applier_channel_lag` for 10 consecutive
checks, the system shall:
1. Initiate a graceful exit from the current group by invoking the "leave group"
call.
2. Transition the member state to "ERROR".
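
The "10 consecutive checks" rule used by FR2, FR3, and FR4 can be sketched as a
small counter. This is only an illustration: the class name
`Consecutive_breach_tracker` and its methods are hypothetical and are not taken
from the component's source. The sketch assumes that a sample within the limit
resets the streak and that a limit of 0 disables the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts consecutive samples above a limit.
class Consecutive_breach_tracker {
 public:
  explicit Consecutive_breach_tracker(uint32_t required_breaches)
      : m_required(required_breaches) {
    assert(required_breaches > 0);
  }

  // Feeds one sample. Returns true once the limit has been exceeded in
  // m_required consecutive samples. A limit of 0 disables the check.
  bool add_sample(uint64_t value, uint64_t limit) {
    if (limit == 0 || value <= limit) {
      m_consecutive = 0;  // streak broken (or check disabled)
      return false;
    }
    if (++m_consecutive < m_required) return false;
    m_consecutive = 0;  // start a new streak after reporting
    return true;
  }

  uint32_t consecutive() const { return m_consecutive; }

 private:
  uint32_t m_required;
  uint32_t m_consecutive{0};
};
```

With a 5-second sampling interval and 10 required breaches, such a tracker
would report a breach after roughly 50 seconds of continuous lag above the
limit.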

FR3: Recovery Channel Lag Handling
A.
Introduce a configurable parameter named
`group_replication_resource_manager.recovery_channel_lag` (seconds) to specify
the maximum tolerable lag time for a recovery channel on a secondary server.
The minimum value will be 0, max value allowed will be 43200 and default will be
3600 (1 hour).
Setting `group_replication_resource_manager.recovery_channel_lag` to 0 disables
this lag handling behavior.

B.
If the current lag of a recovery channel exceeds
`group_replication_resource_manager.recovery_channel_lag` for 10 consecutive
checks, the system shall:
1. Initiate a graceful exit from the current group by invoking the "leave group"
call.
2. Transition the member state to "ERROR".


FR4: Secondary Server Memory Threshold and Graceful Exit
A. Parameter Introduction
Introduce a configurable parameter named
`group_replication_resource_manager.memory_used_limit` (percentage) to specify
the maximum acceptable used memory level on secondary servers.
The range of this parameter will be from 0 to 100, with the default value set to
100.
Setting `group_replication_resource_manager.memory_used_limit` to 0 disables
memory threshold checking.

B. Memory Threshold Action
If the current memory usage on a secondary server exceeds the
`group_replication_resource_manager.memory_used_limit` for 10 consecutive
checks, the system shall:
1. Initiate a graceful exit from the current group by invoking the "leave group"
call.
2. Transition the member state to "ERROR".
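
As a rough illustration of how a used-memory percentage could be derived from
total and free byte counts, the sketch below mirrors the floating-point
computation shown later in this worklog. The function name
`memory_used_percent` is hypothetical, and truncating toward zero is an
assumption about the rounding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: used memory as an integer percentage in [0, 100].
// Returns 0 when total_bytes is 0 (no information available).
uint32_t memory_used_percent(uint64_t total_bytes, uint64_t free_bytes) {
  if (total_bytes == 0) return 0;
  if (free_bytes > total_bytes) free_bytes = total_bytes;  // defensive clamp
  const double used_ratio = static_cast<double>(total_bytes - free_bytes) /
                            static_cast<double>(total_bytes);
  const uint32_t percent = static_cast<uint32_t>(used_ratio * 100.0);
  assert(percent <= 100);
  return percent;
}
```

Because the comparison in FR4 is "exceeds", the default limit of 100 can never
be exceeded by a value in this range.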

FR5: Automatic Rejoin After Ejection
The automatic rejoin mechanism already exists and is controlled by
`group_replication_autorejoin_tries`.
The evictions caused by this component will allow the member to rejoin when
`group_replication_autorejoin_tries` is enabled.

FR6: Quarantine period
A.
There shall be a quarantine period to prevent immediate re-ejection of a member
attempting to join or rejoin a group after encountering issues.
Each member has its own quarantine period timer, meaning that if member A was
ejected and its quarantine period is ticking, member B can still be ejected.

B.
The quarantine duration is configurable and specified by the parameter
`group_replication_resource_manager.quarantine_time` (seconds).
The default value for `group_replication_resource_manager.quarantine_time` is
set to 3600 seconds (1 hour).
The minimum value will be 0, max value allowed will be 43200.
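
A minimal sketch of the quarantine check, assuming the timer starts when the
member joins and that the window is half-open (a member is out of quarantine
once the elapsed time reaches the configured value). The function name
`in_quarantine` and the choice of `std::chrono::steady_clock` are illustrative
assumptions, not the component's actual code.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: true while the member is still inside its quarantine
// window. A quarantine_time of 0 means there is no quarantine at all.
bool in_quarantine(Clock::time_point joined_at, Clock::time_point now,
                   std::chrono::seconds quarantine_time) {
  assert(now >= joined_at);
  return (now - joined_at) < quarantine_time;
}
```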

FR7: Minimum group members
A.
A group shall have at least 3 members for an eviction to take place, which
prevents a single member from being left in the group.

B.
The primary member shall not be ejected from the group, since disqualifying the
primary is associated with HA downtime.
Please note that the decision to evict a secondary may happen moments before
that secondary is promoted to primary due to a concurrent primary failure; in
this case a just-elected primary may be evicted.
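
The guards from FR6 and FR7 can be combined into a single decision step that
runs after a threshold has been breached. The sketch below is only an
illustration: the type and function names are hypothetical, and the order in
which the guards are evaluated is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome of the eviction decision.
enum class Eviction_decision {
  evict,
  skip_primary,
  skip_quarantine,
  skip_group_too_small
};

// Hypothetical snapshot of what the local member knows when a threshold hits.
struct Member_view {
  bool is_primary;
  bool in_quarantine;
  std::size_t group_size;  // number of members, including this one
};

constexpr std::size_t k_min_group_size = 3;  // from FR7

Eviction_decision decide(const Member_view &member) {
  assert(member.group_size >= 1);
  if (member.is_primary) return Eviction_decision::skip_primary;
  if (member.in_quarantine) return Eviction_decision::skip_quarantine;
  if (member.group_size < k_min_group_size)
    return Eviction_decision::skip_group_too_small;
  return Eviction_decision::evict;
}
```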

FR8: Status Variables for Resource Manager Tracking
Eleven status variables (FR9 to FR17, FR24, and FR25) will be used to monitor
the lags and status of the resource manager.

FR9: Status Variables Gr_resource_manager_applier_channel_lag
The current applier lag value in seconds, indicating the delay in applying
changes to the system.

FR10: Status Variables Gr_resource_manager_recovery_channel_lag
The current recovery lag value in seconds, indicating the delay in applying
changes through the recovery channel.

FR11: Status Variables Gr_resource_manager_memory_used
The percentage of used memory in the system, showing the amount of memory that
is currently being used.

FR12: Status Variables Gr_resource_manager_applier_channel_threshold_hits
The number of samples that exceeded the applier lag threshold, resetting on
eviction. This metric helps identify frequent applier lag issues.

FR13: Status Variables Gr_resource_manager_recovery_channel_threshold_hits
The number of samples that exceeded the recovery lag threshold, resetting on
eviction. This metric highlights frequent recovery lag problems.

FR14: Status Variables Gr_resource_manager_memory_threshold_hits
The number of samples that exceeded the memory usage threshold, resetting on
eviction. This metric detects frequent memory usage issues.

FR15: Status Variables Gr_resource_manager_applier_channel_eviction_timestamp
The timestamp of the last eviction caused by applier channel lag, showing when
the issue occurred.

FR16: Status Variables Gr_resource_manager_recovery_channel_eviction_timestamp
The timestamp of the last eviction caused by recovery channel lag, showing when
the issue occurred.

FR17: Status Variables Gr_resource_manager_memory_eviction_timestamp
The timestamp of the last eviction caused by the low memory, showing when the
issue occurred.

FR18: Component Implementation
The new functionality will be implemented on the
`group_replication_resource_manager` component.

FR19: The status variables shall have member scope since they reflect what the local
member observes.

FR20: The status variables shall be reset on group bootstrap.

FR21: The status variables shall be reset on member join.

FR22: The status variables shall not be reset on member rejoin.

FR23: The status variables shall be reset on server restart.

FR24: Status Variables
Gr_resource_manager_channel_lag_monitoring_error_timestamp
The timestamp of the last error fetching channel lags from the query service.
Empty if there are no errors.

FR25: Status Variables Gr_resource_manager_memory_monitoring_error_timestamp
The timestamp of the last error fetching memory status. Empty if there are no
errors.

NON-FUNCTIONAL REQUIREMENTS
===========================
None

User Visible Changes
====================
New system variables introduced:
1.
NAME: group_replication_resource_manager.applier_channel_lag
VALUES: unsigned int [0 - 43200]
DEFAULT: 3600
SCOPE: global
DYNAMIC: Can be changed while Group Replication is running. Value can be 
different on members.
REPLICATED (written to the binary log): no
PERSIST: PERSIST, PERSIST_ONLY
PRIVILEGES REQUIRED: SYSTEM_VARIABLES_ADMIN
DESCRIPTION: The maximum tolerable lag time in seconds for an applier channel on 
a secondary server.

2.
NAME: group_replication_resource_manager.recovery_channel_lag
VALUES: unsigned int [0 - 43200]
DEFAULT: 3600
SCOPE: global
DYNAMIC: Can be changed while Group Replication is running. Value can be 
different on members.
REPLICATED (written to the binary log): no
PERSIST: PERSIST, PERSIST_ONLY
PRIVILEGES REQUIRED: SYSTEM_VARIABLES_ADMIN
DESCRIPTION: The maximum tolerable lag time in seconds for a recovery channel on 
a secondary server.

3.
NAME: group_replication_resource_manager.memory_used_limit
VALUES: unsigned int [0 - 100]
DEFAULT: 100
SCOPE: global
DYNAMIC: Can be changed while Group Replication is running. Value can be 
different on members.
REPLICATED (written to the binary log): no
PERSIST: PERSIST, PERSIST_ONLY
PRIVILEGES REQUIRED: SYSTEM_VARIABLES_ADMIN
DESCRIPTION: The maximum acceptable used memory level in percentage on secondary 
servers. If the current memory usage exceeds the limit, the member will leave 
the group.

4.
NAME: group_replication_resource_manager.quarantine_time
VALUES: unsigned int [0 - 43200]
DEFAULT: 3600
SCOPE: global
DYNAMIC: Can be changed while Group Replication is running. Value can be 
different on members.
REPLICATED (written to the binary log): no
PERSIST: PERSIST, PERSIST_ONLY
PRIVILEGES REQUIRED: SYSTEM_VARIABLES_ADMIN
DESCRIPTION: Prevent immediate re-ejection for the specified seconds of a member 
attempting to join or rejoin a group after encountering issues.

Upgrades
========
Customers will be able to install the new component:
component_group_replication_resource_manager
The Component Group Replication Resource Manager operates in isolation, keeping 
its data separate from other group members.
This allows for seamless coexistence of multi-version groups, even when older 
versions are present.

Security
========
group_replication_resource_manager is a component.
INSTALL COMPONENT requires the INSERT privilege for the mysql.component system 
table because it adds a row to that table to register the component.

Observability
=============
New status variables introduced:
1.
NAME: Gr_resource_manager_applier_channel_lag
VALUES: unsigned int
DESCRIPTION: The current applier lag value, indicating the delay in seconds in 
applying changes to the system by applier channel.
2.
NAME: Gr_resource_manager_recovery_channel_lag
VALUES: unsigned int
DESCRIPTION: The current recovery lag value, indicating the delay in seconds in 
applying changes to the system by recovery channel.
3.
NAME: Gr_resource_manager_memory_used
VALUES: unsigned int (0 - 100)(percentage)
DESCRIPTION: The percentage of used memory in the system, showing the amount of 
memory that is currently being used.
4.
NAME: Gr_resource_manager_applier_channel_threshold_hits
VALUES: unsigned int
DESCRIPTION: The number of samples that exceeded the applier lag threshold, 
resetting on eviction. This metric helps identify frequent applier lag issues.
5.
NAME: Gr_resource_manager_recovery_channel_threshold_hits
VALUES: unsigned int
DESCRIPTION: The number of samples that exceeded the recovery lag threshold, 
resetting on eviction. This metric highlights frequent recovery lag problems.
6.
NAME: Gr_resource_manager_memory_threshold_hits
VALUES: unsigned int
DESCRIPTION: The number of samples that exceeded the memory usage threshold, 
resetting on eviction. This metric detects frequent memory usage issues.
7.
NAME: Gr_resource_manager_applier_channel_eviction_timestamp
VALUES: string(timestamp)
DESCRIPTION: The timestamp of the last eviction caused by applier lag, showing 
when the issue occurred.
8.
NAME: Gr_resource_manager_recovery_channel_eviction_timestamp
VALUES: string(timestamp)
DESCRIPTION: The timestamp of the last eviction caused by recovery lag, showing 
when the issue occurred.
9.
NAME: Gr_resource_manager_memory_eviction_timestamp
VALUES: string(timestamp)
DESCRIPTION: The timestamp of the last eviction caused by the low memory, 
showing when the issue occurred.
10.
NAME: Gr_resource_manager_channel_lag_monitoring_error_timestamp
VALUES: string(timestamp)
DESCRIPTION: The timestamp of the last error fetching channel lags from the 
query service. Empty if there are no errors.
11.
NAME: Gr_resource_manager_memory_monitoring_error_timestamp
VALUES: string(timestamp)
DESCRIPTION: The timestamp of the last error fetching memory status. Empty if 
there are no errors.

New error messages added:
Name: ER_GR_RM_GR_MGMT_SERVICE_ACQUIRE_FAILED
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Failed to acquire Group 
Replication management service. Verify Group Replication status and 
configuration.'
Rationale: Component "Group Replication Resource Manager" was not able to 
acquire the group_replication.group_replication_management service.

Name: ER_GR_RM_GR_MEMBER_NOT_REMOVED_APPLIER_THRESHOLD_HIT_QUARANTINE
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member threshold 
exceeded: applier channel lag of  seconds exceeded tolerance threshold 
configured to  seconds. 
However, since it joined X seconds ago and the quarantine period is configured 
to Y seconds, the member will not leave the group.'
Rationale: This message informs the DBA that the component "Group Replication 
Resource Manager" observed that the applier channel lag threshold was breached 
but did not initiate a member leave because the member is still within its 
quarantine period.

Name: ER_GR_RM_GR_MEMBER_NOT_REMOVED_RECOVERY_THRESHOLD_HIT_QUARANTINE
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member threshold 
exceeded: recovery channel lag of  seconds exceeded tolerance threshold 
configured to  seconds. 
However, since it joined X seconds ago and the quarantine period is configured 
to Y seconds, the member will not leave the group.'
Rationale: This message informs the DBA that the component "Group Replication 
Resource Manager" observed that the recovery channel lag threshold was breached 
but did not initiate a member leave because the member is still within its 
quarantine period.

Name: ER_GR_RM_GR_MEMBER_NOT_REMOVED_MEMORY_THRESHOLD_HIT_QUARANTINE
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member threshold 
exceeded: memory used of % exceeded tolerance threshold configured to 
%. However, since it joined X 
seconds ago and the quarantine period is configured to Y seconds, the member 
will not leave the group.'
Rationale: This message informs the DBA that the component "Group Replication 
Resource Manager" observed that the memory threshold was breached but did not 
initiate a member leave because the member is still within its quarantine 
period.

Name: ER_GR_RM_GR_MEMBER_NOT_REMOVED_APPLIER_THRESHOLD_HIT_N_MEMBERS
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member threshold 
exceeded: applier channel lag of  seconds exceeded tolerance threshold 
configured to  seconds. 
However, the group has less than 3 members, the member will not leave the group.'
Rationale: This message informs the DBA that the component "Group Replication 
Resource Manager" observed that the applier channel lag threshold was breached 
but did not initiate a member leave because the group has fewer than 3 members.

Name: ER_GR_RM_GR_MEMBER_NOT_REMOVED_RECOVERY_THRESHOLD_HIT_N_MEMBERS
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member threshold 
exceeded: recovery channel lag of  seconds exceeded tolerance threshold 
configured to  seconds. 
However, the group has less than 3 members, the member will not leave the group.'
Rationale: This message informs the DBA that the component "Group Replication 
Resource Manager" observed that the recovery channel lag threshold was breached 
but did not initiate a member leave because the group has fewer than 3 members.

Name: ER_GR_RM_GR_MEMBER_NOT_REMOVED_MEMORY_THRESHOLD_HIT_N_MEMBERS
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member threshold 
exceeded: memory used of % exceeded tolerance threshold configured to 
%. However, the group has less 
than 3 members, the member will not leave the group.'
Rationale: This message informs the DBA that the component "Group Replication 
Resource Manager" observed that the memory threshold was breached but did not 
initiate a member leave because the group has fewer than 3 members.

Name: ER_GR_RM_CHANNEL_LAG_QUERY_EXECUTION_FAILED
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Replication channel lag 
retrieval failed due to query execution error. Retrying in 5 seconds.'
Rationale: The component fetches channel lags from the query service, and the 
query service did not provide output. Only one error message is logged until 
the fetch succeeds.

Name: ER_GR_RM_CHANNEL_LAG_QUERY_EXECUTION_SUCCESS
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Replication channel lag 
retrieved successfully resumed.'
Rationale: A previous error while fetching the channel lag has been resolved 
and the component has resumed its regular behavior.

Name: ER_GR_RM_MEMORY_STATS_FETCH_FAILED
Materialized message:
2024-10-08T07:07:48.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Error fetching system 
memory stats. Retrying in 5 seconds.'
Rationale: Component "Group Replication Resource Manager" uses system APIs to 
fetch memory information, and a system API failed to fetch the memory status. 
Only one error message is logged until the fetch succeeds.

Name: ER_GR_RM_MEMORY_STATS_FETCH_SUCCESS
Materialized message:
2024-10-08T07:07:58.100736Z 0 [WARNING] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Fetching system memory 
stats successfully resumed.'
Rationale: A previous error while fetching the memory status has been resolved 
and the component has resumed its regular behavior.

Name: ER_GR_RM_MEMBER_LEAVING_APPLIER_THRESHOLD_HIT
Materialized message:
2024-10-08T07:07:48.100736Z 0 [ERROR] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member exiting group: 
applier channel lag of  seconds exceeded tolerance threshold configured 
to  seconds.'
Rationale: Component "Group Replication Resource Manager" called the 
group_replication.group_replication_management service to make the member leave 
the group due to high applier lag, and the call returned success.

Name: ER_GR_RM_MEMBER_LEAVING_RECOVERY_THRESHOLD_HIT
Materialized message:
2024-10-08T07:07:48.100736Z 0 [ERROR] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member exiting group: 
recovery channel lag of  seconds exceeded tolerance threshold configured 
to  seconds.'
Rationale: Component "Group Replication Resource Manager" called the 
group_replication.group_replication_management service to make the member leave 
the group due to high recovery lag, and the call returned success.

Name: ER_GR_RM_MEMBER_LEAVING_MEMORY_THRESHOLD_HIT
Materialized message:
2024-10-08T07:07:48.100736Z 0 [ERROR] [MY-XXXXXX] [Server] Component 
component_group_replication_resource_manager reported: 'Member exiting group: 
memory used of % exceeded tolerance threshold configured to %.'
Rationale: Component "Group Replication Resource Manager" called the 
group_replication.group_replication_management service to make the member leave 
the group due to low memory availability, and the call returned success.

Cross-version Replication
=========================
Cross-version replication will not be impacted.
The Component Group Replication Resource Manager operates in isolation, keeping 
its data separate from other group members.
This allows for seamless coexistence of multi-version groups, even when older 
versions are present.

Protocol
========
No changes.

Deployment and Installation
===========================
To use the component, install it with the following statement:
INSTALL COMPONENT 'file://component_group_replication_resource_manager';

To remove the component when it is no longer needed, run:
UNINSTALL COMPONENT 'file://component_group_replication_resource_manager';


Behavior Change
===============
Please refer to the sections `User Visible Changes` and `Upgrades`.

User Interface
==============
To use the component, install it with the following statement:

```
INSTALL COMPONENT 'file://component_group_replication_resource_manager';
```

To remove the component when it is no longer needed, run:

```
UNINSTALL COMPONENT 'file://component_group_replication_resource_manager';
```

The metrics can be read through global status variables on the
`performance_schema.global_status` table:

```
mysql> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME LIKE 
'Gr_resource_manager_%';
VARIABLE_NAME	VARIABLE_VALUE
Gr_resource_manager_applier_channel_lag	1000
Gr_resource_manager_applier_channel_threshold_hits	1
Gr_resource_manager_applier_eviction_timestamp	2024-10-08 07:10:23.011529
Gr_resource_manager_memory_eviction_timestamp	
Gr_resource_manager_memory_free	42
Gr_resource_manager_memory_threshold_hits	0
Gr_resource_manager_recovery_channel_lag	0
Gr_resource_manager_recovery_channel_threshold_hits	1
Gr_resource_manager_recovery_eviction_timestamp	2024-10-08 07:07:48.100782
```

The metrics can also be read using the `SHOW` command:

```
SHOW GLOBAL STATUS LIKE 'Gr_resource_manager%';
Variable_name	Value
Gr_resource_manager_applier_channel_lag	1000
Gr_resource_manager_applier_channel_threshold_hits	1
Gr_resource_manager_applier_eviction_timestamp	
Gr_resource_manager_memory_eviction_timestamp	2024-10-08 07:10:23.011529
Gr_resource_manager_memory_free	42
Gr_resource_manager_memory_threshold_hits	0
Gr_resource_manager_recovery_channel_lag	0
Gr_resource_manager_recovery_channel_threshold_hits	1
Gr_resource_manager_recovery_eviction_timestamp	2024-10-08 07:07:48.100782
```

The variables can be set with the SET command, for example:
SET GLOBAL group_replication_resource_manager.applier_channel_lag = 0;

Monitoring System Memory on macOS, Linux, and Windows:

In this section, we will explore how to gather system memory information on
macOS, Linux, and Windows using different methods and code snippets.
--------------------------------------------------------------------------------
1. macOS
On macOS, we can use system calls to get memory information.
Below is a C++ code snippet that retrieves the total and free memory on a macOS
system.

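#include <mach/mach.h>
#include <sys/sysctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>

// Returns the total physical memory in bytes, or 0 on failure.
uint64_t get_total_memory() {
  uint64_t total_memory = 0;
  int mib[2] = {CTL_HW, HW_MEMSIZE};
  size_t length = sizeof(total_memory);
  if (sysctl(mib, 2, &total_memory, &length, nullptr, 0) != 0) {
    return 0;
  }
  return total_memory;
}

// Returns the free memory in bytes, or 0 on failure.
uint64_t get_free_memory() {
  mach_msg_type_number_t count = HOST_VM_INFO64_COUNT;
  vm_statistics64_data_t vmstat;
  if (host_statistics64(mach_host_self(), HOST_VM_INFO64,
                        reinterpret_cast<host_info64_t>(&vmstat),
                        &count) != KERN_SUCCESS) {
    return 0;
  }
  // free_count is expressed in pages; convert it to bytes.
  return static_cast<uint64_t>(vmstat.free_count) * sysconf(_SC_PAGESIZE);
}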

Example Output:
System_Memory_Info: m_total_bytes: 68719476736
System_Memory_Info: m_free_bytes: 36681252864

--------------------------------------------------------------------------------

2. Linux
On Linux, we can read from the /proc/meminfo file to get detailed memory
information.
This file provides a snapshot of the current memory usage.
This is in line with the Health Monitoring implemented in HeatWave.

Example Command:
cat /proc/meminfo

The used percentage is computed in the same way as in the HeatWave Health
Monitoring code for block devices:

void table_health_block_device::make_row(const Disk_Info &record) {
  m_row.m_device_name = record.m_device_name;
  m_row.m_timestamp = record.m_timestamp;
  m_row.m_total_bytes = record.m_total_bytes;
  m_row.m_avail_bytes = record.m_avail_bytes;
  m_row.m_mount_point = record.m_mount_point;
  double used_pct = static_cast<double>(record.m_total_bytes -
                                        record.m_avail_bytes) /
                    record.m_total_bytes;
  m_row.m_use_percent = static_cast<unsigned int>(used_pct * 100.0);
}
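
As an illustration of reading /proc/meminfo from C++, the sketch below parses
the `MemTotal` and `MemAvailable` fields, which the kernel reports in kB. The
names `Meminfo`, `parse_meminfo`, `read_meminfo`, and `used_percent` are
hypothetical, and the choice of `MemAvailable` rather than `MemFree` is an
assumption; the component may rely on different fields.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical container for the two fields of interest (values in kB).
struct Meminfo {
  uint64_t total_kb{0};
  uint64_t available_kb{0};
};

// Parses "MemTotal:" and "MemAvailable:" from a /proc/meminfo-style stream.
// Returns std::nullopt if either field is missing.
std::optional<Meminfo> parse_meminfo(std::istream &in) {
  Meminfo info;
  bool has_total = false;
  bool has_available = false;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    uint64_t value = 0;
    if (!(fields >> key >> value)) continue;  // skip malformed lines
    if (key == "MemTotal:") {
      info.total_kb = value;
      has_total = true;
    } else if (key == "MemAvailable:") {
      info.available_kb = value;
      has_available = true;
    }
  }
  if (!has_total || !has_available) return std::nullopt;
  return info;
}

// Reads the live values; returns std::nullopt if the file cannot be opened.
std::optional<Meminfo> read_meminfo() {
  std::ifstream file("/proc/meminfo");
  if (!file) return std::nullopt;
  return parse_meminfo(file);
}

// Used-memory percentage (truncated) derived from the parsed values.
unsigned used_percent(const Meminfo &info) {
  assert(info.total_kb > 0 && info.available_kb <= info.total_kb);
  return static_cast<unsigned>((info.total_kb - info.available_kb) * 100 /
                               info.total_kb);
}
```

Parsing from a generic `std::istream` keeps the logic testable without access
to a real /proc filesystem.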

--------------------------------------------------------------------------------

3. Windows
On Windows, we use the GlobalMemoryStatusEx function to get memory information.
Here is a C++ snippet demonstrating this.

#include <windows.h>
#include <cstdint>
#include <iostream>

void get_memory_info() {
  MEMORYSTATUSEX win_mem_info{};
  // dwLength must be set before calling GlobalMemoryStatusEx.
  win_mem_info.dwLength = sizeof(MEMORYSTATUSEX);
  if (GlobalMemoryStatusEx(&win_mem_info)) {
    uint64_t total_memory = win_mem_info.ullTotalPhys;
    uint64_t free_memory = win_mem_info.ullAvailPhys;
    // Output or use the memory information
    std::cout << "System_Memory_Info: m_total_bytes: " << total_memory
              << std::endl;
    std::cout << "System_Memory_Info: m_free_bytes: " << free_memory
              << std::endl;
  } else {
    // Handle error
    std::cerr << "GlobalMemoryStatusEx failed with error: " << GetLastError()
              << std::endl;
  }
}

Example Output:
System_Memory_Info: m_total_bytes: 34359185408
System_Memory_Info: m_free_bytes: 22219345920

--------------------------------------------------------------------------------


4. Conclusion
Monitoring system memory is essential for optimizing performance and ensuring
efficient resource usage.
The methods discussed here provide a way to gather memory information on macOS,
Linux, and Windows.