WL#10380: Group Replication: Monitoring improvements
Affects: Server-8.0
—
Status: Complete
The goal of this worklog is to improve Group Replication monitoring by adding columns to existing performance_schema tables. No existing table or column will be removed, and the names, data types, and order of existing tables and columns will not change, which minimizes the impact on customers. No new performance_schema tables are created for Group Replication.

The worklog extends the existing GR performance_schema tables "replication_group_members" and "replication_group_member_stats" by appending new columns at the end, so the previous column order is preserved. Because only existing GR P_S tables are extended, we expect minimal to no impact on existing customers: compiling and enabling performance_schema, the output of SHOW VARIABLES, and so on remain the same, other than the display of the new columns.

We need to show additional member-related information such as Role (Primary or Secondary) and Version. The "replication_group_members" table shows information for all members (one row per member); this must be maintained for the new columns.

At present, the "replication_group_member_stats" table shows only certification-related information. We need to extend it to show statistics for the applier, the local queue, and related information that might help customers with analysis. Since some machines may be lagging or overloaded, information for all members should be visible from any group member. At present a customer can get the statistics of a member only by logging in to that member.
FR 1. Functional Requirements for Table "replication_group_members".
FR 1.1. MEMBER_ROLE - Table must show ROLE information of group members:
PRIMARY, SECONDARY
FR 1.2. MEMBER_VERSION - Table must show VERSION information of group members.
FR 1.3. At present the table shows information of all members; that must remain.
FR 2. Functional Requirements for Table "replication_group_member_stats".
FR 2.1. Table must show information of ALL members. At present only a single
row is shown, with the statistics of the local member (self).
FR 2.2. Table must show information related to applier, mentioned below:
FR 2.2.1. COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE - Number of
transactions waiting to be applied
FR 2.2.2. COUNT_TRANSACTIONS_REMOTE_APPLIED - Number of transactions applied
FR 2.3. Table must show local member transactions information, mentioned below:
FR 2.3.1. COUNT_TRANSACTIONS_LOCAL_PROPOSED - Number of transactions
originated from the local machine
FR 2.3.2. COUNT_TRANSACTIONS_LOCAL_ROLLBACK - Number of transactions
originated from the local machine but rolled back at GROUP level
(due to conflict)
1) Acronyms and DEFINITIONS
Before going into details, let's first present the terms that will be used in
this design.
1. GR – Group Replication
2. P_S(performance_schema) – Existing MySQL database. MySQL should be
compiled with P_S enabled. Since this work log is about extending existing
P_S tables we will not get into more details of P_S.
3. replication_group_members – Existing table in performance_schema database.
This table is used by GR to show information about GR members. Information
of all members should be displayed.
4. replication_group_member_stats - Existing table in performance_schema
database. This table is used by GR to show statistics about GR members.
Statistics of all members should be displayed.
5. Flow Control - Mechanism in the Group Replication protocol that avoids
too much distance, in terms of transactions applied, between fast
and slow members.
2) OVERVIEW
In this worklog we will extend existing performance_schema tables
"replication_group_members" and "replication_group_member_stats"
by adding new columns showing information of Group replication.
3) replication_group_members
I. Existing
Field          Description
CHANNEL_NAME   Name of the group replication channel.
MEMBER_ID      Identifier for this member; same as the server UUID.
MEMBER_HOST    Network address of this member (host name or IP address).
MEMBER_PORT    Port on which the server is listening.
MEMBER_STATE   Current state of this member; one of OFFLINE, RECOVERING,
               ONLINE, UNREACHABLE, or ERROR.
II. Changes Suggested
New suggested columns:
Field            Type      Null  Key  Default  Description
MEMBER_ROLE      char(64)  NO         NULL     Member role in the group; either PRIMARY or SECONDARY.
MEMBER_VERSION   char(64)  NO         NULL     The MySQL version of the member.
NOTE: Columns will be added in same sequence as listed above.
III. describe performance_schema.replication_group_members
+----------------+----------+------+-----+---------+-------+
| Field          | Type     | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+-------+
| CHANNEL_NAME   | char(64) | NO   |     | NULL    |       |
| MEMBER_ID      | char(36) | NO   |     | NULL    |       |
| MEMBER_HOST    | char(60) | NO   |     | NULL    |       |
| MEMBER_PORT    | int(11)  | YES  |     | NULL    |       |
| MEMBER_STATE   | char(64) | NO   |     | NULL    |       |
| MEMBER_ROLE    | char(64) | NO   |     | NULL    |       |
| MEMBER_VERSION | char(64) | NO   |     | NULL    |       |
+----------------+----------+------+-----+---------+-------+
IV. Information for the new columns, i.e. "MEMBER_ROLE" and "MEMBER_VERSION",
will be fetched from Group_member_info.
4) replication_group_member_stats
At present the table shows information of a single member (self).
The table will be extended to show information of all members.
I. Existing
Field                                Description
CHANNEL_NAME                         Name of the group replication channel
VIEW_ID                              Current view identifier for this group
MEMBER_ID                            Identifier for this member; same as the server UUID
COUNT_TRANSACTIONS_IN_QUEUE          Number of transactions pending certification
COUNT_TRANSACTIONS_CHECKED           Number of transactions already certified
COUNT_CONFLICTS_DETECTED             Number of transactions that were negatively certified
COUNT_TRANSACTIONS_ROWS_VALIDATING   Number of transaction rows that can be used for
                                     certification but have not been garbage collected
TRANSACTIONS_COMMITTED_ALL_MEMBERS   Set of stable group transactions
LAST_CONFLICT_FREE_TRANSACTION       Latest transaction certified without conflicts
II. Changes Suggested
New suggested columns:
Field                                        Type                 Null  Key  Default
    Description
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE   bigint(20) unsigned  NO         NULL
    Transactions received from the group that are waiting to be applied.
COUNT_TRANSACTIONS_REMOTE_APPLIED            bigint(20) unsigned  NO         NULL
    Transactions received from the group that have been applied.
COUNT_TRANSACTIONS_LOCAL_PROPOSED            bigint(20) unsigned  NO         NULL
    Number of local transactions that the member requested the Group Replication plugin to commit.
COUNT_TRANSACTIONS_LOCAL_ROLLBACK            bigint(20) unsigned  NO         NULL
    Number of transactions that originated on the local member but were rolled back at GROUP level.
NOTE: Columns will be added in same sequence as listed above.
III. describe performance_schema.replication_group_member_stats
+--------------------------------------------+---------------------+------+-----+---------+-------+
| Field                                      | Type                | Null | Key | Default | Extra |
+--------------------------------------------+---------------------+------+-----+---------+-------+
| CHANNEL_NAME                               | char(64)            | NO   |     | NULL    |       |
| VIEW_ID                                    | char(60)            | NO   |     | NULL    |       |
| MEMBER_ID                                  | char(36)            | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_IN_QUEUE                | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_CHECKED                 | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_CONFLICTS_DETECTED                   | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_ROWS_VALIDATING         | bigint(20) unsigned | NO   |     | NULL    |       |
| TRANSACTIONS_COMMITTED_ALL_MEMBERS         | longtext            | NO   |     | NULL    |       |
| LAST_CONFLICT_FREE_TRANSACTION             | text                | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_REMOTE_APPLIED          | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_LOCAL_PROPOSED          | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_LOCAL_ROLLBACK          | bigint(20) unsigned | NO   |     | NULL    |       |
+--------------------------------------------+---------------------+------+-----+---------+-------+
IV. Information for the local member (self) will be fetched from local
structures, i.e. from Certifier, applier_module and
Pipeline_stats_member_collector.
Information for remote members will be fetched from the classes storing
broadcast information, i.e. "Pipeline_member_stats".
This presents the latest information for the local member. Information for
the rest of the members will lag by a few seconds (the flow control module
exchange period), and the flag that triggers the broadcast of transaction
identifiers will be set to true only every 30 seconds.
This also means that, when a query is executed on the local member, the local
member's information will be more up to date than that of the rest of the
members in the group.
5) Message exchange
Below information needs to be exchanged between members which is not
presently done.
1. Exchange of additional information between members with flow control
exchanged messages:
i. COUNT_CONFLICTS_DETECTED
ii. COUNT_TRANSACTIONS_ROWS_VALIDATING
iii. COUNT_TRANSACTIONS_LOCAL_ROLLBACK
Information already exchanged in flow control messages that will now be
displayed in P_S.replication_group_member_stats:
i. COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
ii. COUNT_TRANSACTIONS_REMOTE_APPLIED
iii. COUNT_TRANSACTIONS_LOCAL_PROPOSED
2. Exchange of Transaction Identifiers between members with flow control
exchanged messages:
i. TRANSACTIONS_COMMITTED_ALL_MEMBERS
ii. LAST_CONFLICT_FREE_TRANSACTION
Exchange "TRANSACTIONS_COMMITTED_ALL_MEMBERS" and
"LAST_CONFLICT_FREE_TRANSACTION" along with flow control exchanged messages
every 30-60 seconds.
New flags will be introduced; transaction identifiers will be broadcast
only when the flags are set.
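The flag-gated exchange described above can be sketched as follows. This is only an illustration under stated assumptions, not the plugin's code: `StatsCollector`, `StatsMessage`, `set_gtids`, and `next_message` are hypothetical names, and the period is expressed in flow control ticks (for example, 30 ticks if one message is sent per second).

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified flow control message. The GTID strings are
// only meaningful when has_gtids is true.
struct StatsMessage {
  std::int64_t local_rollback = 0;
  bool has_gtids = false;
  std::string committed_all_members;
  std::string last_conflict_free;
};

// Hypothetical collector: builds one message per flow control tick and
// attaches the transaction identifiers only once every gtid_period ticks.
class StatsCollector {
 public:
  explicit StatsCollector(unsigned gtid_period) : m_gtid_period(gtid_period) {}

  // Records the latest transaction identifiers known to the local member.
  void set_gtids(const std::string &committed, const std::string &last) {
    m_committed = committed;
    m_last = last;
  }

  // Called once per flow control tick.
  StatsMessage next_message() {
    StatsMessage msg;
    msg.local_rollback = m_local_rollback;
    if (++m_ticks >= m_gtid_period) {
      m_send_gtids = true;  // the identifiers go out with this message
      m_ticks = 0;
    }
    if (m_send_gtids) {
      msg.has_gtids = true;
      msg.committed_all_members = m_committed;
      msg.last_conflict_free = m_last;
      m_send_gtids = false;
    }
    return msg;
  }

 private:
  unsigned m_gtid_period;
  unsigned m_ticks = 0;
  bool m_send_gtids = false;
  std::int64_t m_local_rollback = 0;
  std::string m_committed;
  std::string m_last;
};
```

With a period of 3, the first two messages carry counters only and the third also carries the identifiers. Keeping the identifiers out of most messages keeps the frequent flow control traffic small.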
--------------------------------------------------------------------------------
1. Add new columns to the SQL scripts of performance_schema and make the
corresponding changes in Group Replication files for the new columns and their
callback fetch functions.
2. Make replication_group_member_stats multi-row. This involves fetching the
number of member rows, passing an additional index parameter, etc.
3. Exchange the flow control counters via the Flow Control mechanism; add new
variables and functions.
4. Implement new classes Pipeline_stats_transaction_info_exchange and
Pipeline_stats_transaction_info_collector for exchange of GTID information
every 60 seconds.
5. Add a variable of the new class to Pipeline_member_stats for information
fetching.
6. At present stats are collected from internal structures; change this to show
information from Flow Control.
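Step 2 above (making the table multi-row) can be sketched as follows. This is a simplified illustration, assuming a row count query followed by a per-index fetch; `StatsRow`, `StatsProvider`, and `scan_all_rows` are hypothetical names, not the server's actual callback interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row of replication_group_member_stats, reduced to two columns.
struct StatsRow {
  std::string member_id;
  std::uint64_t remote_applied = 0;
};

// Hypothetical plugin-side provider: one entry per group member.
class StatsProvider {
 public:
  void add_member(const StatsRow &row) { m_rows.push_back(row); }

  std::size_t get_row_count() const { return m_rows.size(); }

  // Fills *out with the row at the given index. Returns false when the
  // index is out of range, for example if a member left the group between
  // the count query and the fetch.
  bool get_member_stats(std::size_t index, StatsRow *out) const {
    if (index >= m_rows.size()) return false;
    *out = m_rows[index];
    return true;
  }

 private:
  std::vector<StatsRow> m_rows;
};

// Table-side scan: the equivalent of calling make_row(index) for each member.
std::vector<StatsRow> scan_all_rows(const StatsProvider &provider) {
  std::vector<StatsRow> rows;
  const std::size_t count = provider.get_row_count();
  for (std::size_t index = 0; index < count; ++index) {
    StatsRow row;
    if (provider.get_member_stats(index, &row)) rows.push_back(row);
  }
  return rows;
}
```

The single-row version corresponds to always fetching the local member; passing the index down the callback chain is what lets the same fetch function serve every member.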
--------------------------------------------------------------------------------
SUMMARY OF CHANGES
==================
modified: include/mysql/plugin_group_replication.h
modified: rapid/plugin/group_replication/include/applier.h
modified: rapid/plugin/group_replication/include/certifier.h
modified: rapid/plugin/group_replication/include/gcs_event_handlers.h
modified: rapid/plugin/group_replication/include/gcs_plugin_messages.h
modified: rapid/plugin/group_replication/include/member_info.h
modified: rapid/plugin/group_replication/include/pipeline_stats.h
modified: rapid/plugin/group_replication/include/plugin.h
modified: rapid/plugin/group_replication/include/ps_information.h
modified: rapid/plugin/group_replication/src/applier.cc
modified: rapid/plugin/group_replication/src/certifier.cc
modified: rapid/plugin/group_replication/src/gcs_event_handlers.cc
modified: rapid/plugin/group_replication/src/member_info.cc
modified: rapid/plugin/group_replication/src/observer_trans.cc
modified: rapid/plugin/group_replication/src/pipeline_stats.cc
modified: rapid/plugin/group_replication/src/plugin.cc
modified: rapid/plugin/group_replication/src/ps_information.cc
modified: scripts/mysql_system_tables.sql
modified: sql/rpl_group_replication.cc
modified: sql/rpl_group_replication.h
modified: storage/perfschema/table_replication_group_member_stats.cc
modified: storage/perfschema/table_replication_group_member_stats.h
modified: storage/perfschema/table_replication_group_members.cc
modified: storage/perfschema/table_replication_group_members.h
modified: rapid/plugin/group_replication/include/member_version.h
modified: rapid/plugin/group_replication/src/member_version.cc
SQL SCRIPT CHANGES
------------------
Worklog will add columns to existing table, tables are created via below
script file:
scripts/mysql_system_tables.sql
1. Addition of below columns to table
"performance_schema.replication_group_members"
A. MEMBER_ROLE CHAR(64) collate utf8_bin not null
B. MEMBER_VERSION CHAR(64) collate utf8_bin not null
2. Addition of below columns to table
"performance_schema.replication_group_member_stats"
A. COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE BIGINT unsigned not null
B. COUNT_TRANSACTIONS_REMOTE_APPLIED BIGINT unsigned not null
C. COUNT_TRANSACTIONS_LOCAL_PROPOSED BIGINT unsigned not null
D. COUNT_TRANSACTIONS_LOCAL_ROLLBACK BIGINT unsigned not null
3. Similar changes in file:
storage/perfschema/table_replication_group_members.cc
storage/perfschema/table_replication_group_member_stats.cc
A. Add details of new columns to static const TABLE_FIELD_TYPE field_types
B. Indicate increase in number of columns via
table_replication_group_members::m_field_def
Server core changes
-------------------
1A. Changes to make replication_group_member_stats multi-row:
make_row() -> make_row(uint index), with similar changes along the flow below.
An additional index argument is passed through the chain of callback functions:
-> get_group_replication_group_member_stats_info
-> Group_replication_handler::get_group_member_stats_info
-> get_group_member_stats_info
-> plugin_get_group_member_stats
-> get_group_member_stats
1B. Change of function signature, to identify whether the transaction is
local, which is needed to count local rollbacks:
rapid/plugin/group_replication/include/certifier.h
+ void update_certified_transaction_count(bool result,
bool local_transaction);
2A. Variables added for SQL tables:
storage/perfschema/table_replication_group_members.h
+ char member_role[NAME_LEN];
+ uint member_role_length;
+ char member_version[NAME_LEN];
+ uint member_version_length;
storage/perfschema/table_replication_group_member_stats.h
+ ulonglong trx_remote_in_applier_queue;
+ ulonglong trx_remote_applied;
+ ulonglong trx_local_proposed;
+ ulonglong trx_local_rollback;
2B. New methods to fetch data(callback methods to set each column)
storage/perfschema/table_replication_group_member_stats.cc
+ set_transactions_remote_in_applier_queue
To set the applier queue length
+ set_transactions_remote_applied
To set the number of remote transactions applied
+ set_transactions_local_proposed
To set the number of local transactions proposed
+ set_transactions_local_rollback
To set the number of local transactions negatively certified
storage/perfschema/table_replication_group_members.cc
+ set_member_version
To set remote member version
+ set_member_role
To set remote member role
3A. Changed function "get_group_members_info" and "get_group_member_stats":
rapid/plugin/group_replication/src/ps_information.cc
3B. Changes in function "get_group_members_info"
(collects information for table "replication_group_members")
Member version is already present in Group_member_info, we need to
convert it to Hex.
Member role is already present in Group_member_info.
New function get_member_role_string will be implemented to provide
role in string format.
3C. Changes in function "get_group_member_stats"
(collects information for table "replication_group_member_stats")
At present for table "replication_group_member_stats" column data is
collected from local structs/classes.
After code changes:
- MEMBER_ID information will be fetched from Group_member_info.
- VIEW_ID information will be fetched from gcs_module(no changes)
- Rest of information is fetched from "Pipeline_member_stats".
4. Addition of new function "get_version_string" in class "Member_version".
Function will help to retrieve version in string format.
There is already function get_version which returns version in
uint32 format.
+ const std::string get_version_string() const
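The two string conversions mentioned in 3B and 4 can be sketched as below. This is a minimal sketch under assumptions: the `MemberRole` enum is hypothetical, both functions are written as free functions for brevity, and the version is assumed to be packed as one byte each for major, minor, and patch in the low 24 bits of the uint32. The dotted output format is likewise an assumption made for illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical role enumeration with the two values shown in MEMBER_ROLE.
enum class MemberRole { kPrimary, kSecondary };

// Returns the role in string format, as displayed in the table.
std::string get_member_role_string(MemberRole role) {
  switch (role) {
    case MemberRole::kPrimary:
      return "PRIMARY";
    case MemberRole::kSecondary:
      return "SECONDARY";
  }
  return "";  // not reached for valid enum values
}

// Converts a packed version (assumed layout 0x00MMmmpp) into "M.m.p".
std::string get_version_string(std::uint32_t version) {
  const std::uint32_t major_part = (version >> 16) & 0xff;
  const std::uint32_t minor_part = (version >> 8) & 0xff;
  const std::uint32_t patch_part = version & 0xff;
  std::ostringstream out;
  out << major_part << '.' << minor_part << '.' << patch_part;
  return out.str();
}
```

Under the assumed layout, `get_version_string(0x080002)` yields `"8.0.2"`.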
Extension of Pipeline_stats_member_message
------------------------------------------
Some of the transaction count information will be exchanged via the existing
flow control message.
Since this information is small, transmitting it every second will not have a
major impact.
Transaction identifiers will be transmitted every 30-60 seconds; this
information is tagged "Transaction Identifiers" below.
1A. New information being exchanged in class "Pipeline_stats_member_message"
+ // Length of the payload item: 8 bytes
+ PIT_TRANSACTIONS_LOCAL_ROLLBACK
+
+ // Length of the payload item: 8 bytes
+ PIT_TRANSACTIONS_CONFLICTS_DETECTED
+
+ // Length of the payload item: 8 bytes
+ PIT_TRANSACTIONS_ROWS_VALIDATING
// Transaction Identifiers
+ // Length of the payload item: variable
+ PIT_TRANSACTIONS_COMMITTED_ALL_MEMBERS
+ // Length of the payload item: variable
+ PIT_TRANSACTION_LAST_CONFLICT_FREE
1B. Corresponding variables and get methods:
+ int64 get_transactions_local_rollback();
+ int64 get_transactions_conflicts_detected();
+ int64 get_transactions_rows_validating();
+ int64 m_transactions_local_rollback;
+ int64 m_transactions_conflicts_detected;
+ int64 m_transactions_rows_validating;
// Transaction Identifiers get methods
+ const std::string& get_transactions_committed_all_members()
+ const std::string& get_transaction_last_conflict_free()
// Transaction Identifiers variables
+ bool m_have_transactions_committed_all_members;
+ std::string m_transactions_committed_all_members;
+ bool m_have_last_conflict_free_transaction;
+ std::string m_last_conflict_free_transaction;
Flags "m_have_transactions_committed_all_members" and
"m_have_last_conflict_free_transaction" will not be broadcast.
The bool flags may not be needed: checking whether the string is empty can
indicate whether a broadcast is needed. Either approach, flag or empty string,
will work.
"Pipeline_stats_member_collector" manages the broadcast and will set the
flags/fill the strings for broadcast. Hence the flag is mandatory in
class "Pipeline_stats_member_collector".
During decode, if "PIT_TRANSACTIONS_COMMITTED_ALL_MEMBERS" and
"PIT_TRANSACTION_LAST_CONFLICT_FREE" have been received, the flags will be
set/the strings will be filled.
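The optional payload items can be illustrated with a small type-length-value encoder. This is a sketch under assumptions, not the plugin's wire format: `StatsPayload`, `PayloadType`, the 1-byte type and 4-byte little-endian length, and all function names are hypothetical. It shows the point made above: the "have" flags are never sent, and the decoder derives them from the presence of the items.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical payload item types (not the plugin's actual values).
enum PayloadType : std::uint8_t {
  kLocalRollback = 1,
  kCommittedAllMembers = 2,
  kLastConflictFree = 3,
};

struct StatsPayload {
  std::int64_t local_rollback = 0;
  bool have_committed_all_members = false;
  std::string committed_all_members;
  bool have_last_conflict_free = false;
  std::string last_conflict_free;
};

// Appends one item: 1-byte type, 4-byte little-endian length, then the bytes.
void put_item(std::vector<std::uint8_t> *buf, PayloadType type,
              const std::string &bytes) {
  buf->push_back(type);
  const std::uint32_t len = static_cast<std::uint32_t>(bytes.size());
  for (int shift = 0; shift < 32; shift += 8)
    buf->push_back(static_cast<std::uint8_t>(len >> shift));
  buf->insert(buf->end(), bytes.begin(), bytes.end());
}

std::string pack_int64(std::int64_t value) {
  const std::uint64_t u = static_cast<std::uint64_t>(value);
  std::string out(8, '\0');
  for (int i = 0; i < 8; ++i)
    out[i] = static_cast<char>((u >> (8 * i)) & 0xff);
  return out;
}

std::int64_t unpack_int64(const std::string &s) {
  std::uint64_t u = 0;
  for (int i = 0; i < 8; ++i)
    u |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i]))
         << (8 * i);
  return static_cast<std::int64_t>(u);
}

// The counter is always sent; identifiers only when their flag is set.
std::vector<std::uint8_t> encode(const StatsPayload &msg) {
  std::vector<std::uint8_t> buf;
  put_item(&buf, kLocalRollback, pack_int64(msg.local_rollback));
  if (msg.have_committed_all_members)
    put_item(&buf, kCommittedAllMembers, msg.committed_all_members);
  if (msg.have_last_conflict_free)
    put_item(&buf, kLastConflictFree, msg.last_conflict_free);
  return buf;
}

// Returns false on a truncated or malformed buffer. Unknown items are skipped.
bool decode(const std::vector<std::uint8_t> &buf, StatsPayload *out) {
  *out = StatsPayload();
  std::size_t pos = 0;
  while (pos < buf.size()) {
    if (buf.size() - pos < 5) return false;
    const std::uint8_t type = buf[pos];
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
      len |= static_cast<std::uint32_t>(buf[pos + 1 + i]) << (8 * i);
    pos += 5;
    if (buf.size() - pos < len) return false;
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
    const std::string bytes(first, first + static_cast<std::ptrdiff_t>(len));
    pos += len;
    switch (type) {
      case kLocalRollback:
        if (len != 8) return false;
        out->local_rollback = unpack_int64(bytes);
        break;
      case kCommittedAllMembers:
        out->have_committed_all_members = true;
        out->committed_all_members = bytes;
        break;
      case kLastConflictFree:
        out->have_last_conflict_free = true;
        out->last_conflict_free = bytes;
        break;
      default:
        break;  // unknown item type: ignore
    }
  }
  return true;
}
```

Skipping unknown item types in the decoder is one way to let members running an older version ignore items they do not understand; whether the plugin does this is not stated here.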
2A. Local rollback counter will be maintained in class
"Pipeline_stats_member_collector"
Local transaction failure will be maintained here.
+ int64 m_transactions_local_rollback;
+ void increment_transactions_local_rollback();
// Transaction Identifiers
+ bool send_gtids;
// Class "Pipeline_stats_member_collector" actually collects information of
// flow control, creates packet(creates class object of
// "Pipeline_stats_member_message") and sends information.
// Flag "send_gtids" indicates whether Transaction Identifiers have to be
// broadcast or not.
// send_gtids is set every 30 seconds to broadcast GTIDs.
// The GTIDs go out with the next flow control message.
// They are encoded only when send_gtids is set.
2B. Class "Pipeline_stats_member_collector" maintains the number of
transactions waiting to be applied, the number of transactions certified,
the number of transactions applied, the number of local transactions
proposed and the number of local transactions rolled back.
Local member information needs to be fetched from local structures to
provide the latest information for the local member, so get functions need
to be implemented.
+ int64 get_transactions_waiting_apply();
+ int64 get_transactions_certified();
+ int64 get_transactions_applied();
+ int64 get_transactions_local();
+ int64 get_transactions_local_rollback();
2C. In class Certifier, function update_certified_transaction_count will be
modified to take an additional local_transaction flag.
update_certified_transaction_count will call
increment_transactions_local_rollback to count local transactions that were
rolled back.
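The interaction in 2C can be sketched as follows. This is a simplified, single-threaded illustration: `RollbackCounter` and `CertifierSketch` are hypothetical stand-ins for the collector and the certifier, and the condition used (a negative certification result on a local transaction) is an assumption based on the column description.

```cpp
#include <cstdint>

// Hypothetical stand-in for the counter kept in the stats collector.
// Synchronization is omitted; a real implementation would need it.
class RollbackCounter {
 public:
  void increment_transactions_local_rollback() { ++m_local_rollback; }
  std::int64_t get_transactions_local_rollback() const {
    return m_local_rollback;
  }

 private:
  std::int64_t m_local_rollback = 0;
};

// Hypothetical stand-in for the certifier.
class CertifierSketch {
 public:
  explicit CertifierSketch(RollbackCounter *counter) : m_counter(counter) {}

  // result: true if the transaction was positively certified.
  // local_transaction: true if the transaction originated on this member.
  void update_certified_transaction_count(bool result,
                                          bool local_transaction) {
    if (result)
      ++m_positive;
    else
      ++m_negative;
    // Only local transactions that failed certification count as
    // local rollbacks.
    if (!result && local_transaction)
      m_counter->increment_transactions_local_rollback();
  }

  std::int64_t positive() const { return m_positive; }
  std::int64_t negative() const { return m_negative; }

 private:
  RollbackCounter *m_counter;
  std::int64_t m_positive = 0;
  std::int64_t m_negative = 0;
};
```

A remote transaction that fails certification still increases the conflict count but leaves the local rollback counter unchanged.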
3. class "Pipeline_member_stats" will store new information:
+ int64 m_transactions_local_rollback;
+ int64 m_transactions_conflicts_detected;
+ int64 m_transactions_rows_validating;
// Transaction Identifiers
+ std::string tx_committed_all_members;
+ std::string tx_last_conflict_free;
4. To give access to "Pipeline_member_stats", class "Flow_control_module"
needs to return a pointer to the stored member entry:
Pipeline_member_stats * get_pipeline_stats(std::string memberId)
// NOTE: Argument is GCS Member ID HOST:PORT and NOT UUID.
// Class Flow_control_module holds information of all members.
// To access individual member information i.e. object of class
// "Pipeline_member_stats", we need a function in class
// "Flow_control_module" that will return reference to the individual
// member information. get_pipeline_stats is going to serve that purpose.
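A lookup of this kind can be sketched as below. This is an illustration under assumptions: `MemberStatsEntry`, `FlowControlSketch`, and `handle_stats` are hypothetical names, the storage is assumed to be a map keyed by the GCS member ID (HOST:PORT), and locking is omitted.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-member entry, reduced to two of the new counters.
struct MemberStatsEntry {
  std::int64_t transactions_local_rollback = 0;
  std::int64_t transactions_conflicts_detected = 0;
};

// Hypothetical stand-in for the module holding information of all members.
class FlowControlSketch {
 public:
  // Stores or replaces the latest broadcast statistics for a member.
  void handle_stats(const std::string &member_id,
                    const MemberStatsEntry &stats) {
    m_info[member_id] = stats;
  }

  // member_id is the GCS member ID (HOST:PORT), not the server UUID.
  // Returns nullptr when no statistics were received for that member.
  const MemberStatsEntry *get_pipeline_stats(
      const std::string &member_id) const {
    const auto it = m_info.find(member_id);
    return it == m_info.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, MemberStatsEntry> m_info;
};
```

Returning a pointer into internal storage is only safe while the map is not modified concurrently; an implementation used from several threads would need to hold a lock during the read or return a copy instead.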