WL#10380: Group Replication: Monitoring improvements
Affects: Server-8.0
—
Status: Complete
The goal of this worklog is to improve Group Replication monitoring by adding columns to existing performance_schema tables. No existing table or column will be removed, and the names, data types and order of existing tables and columns will not change, which minimizes the impact on customers. No new performance_schema tables are created for Group Replication.

This worklog extends the existing GR performance_schema tables "replication_group_members" and "replication_group_member_stats" by appending new columns at the end, so the previous column order is preserved. Because only existing GR P_S tables are extended, we expect minimal to no impact on existing customers: compiling/enabling performance_schema, the output of SHOW VARIABLES and so on remain the same, other than the display of the new columns.

We need to show additional member information such as Role (Primary or Secondary) and Version. The "replication_group_members" table shows information of all members (one row per member); this must be maintained for the new columns.

At present the GR table "replication_group_member_stats" shows only certification-related information. We need to extend it to show statistics of the applier, the local queue and related information that may help customers with analysis. Since some machines may be lagging or overloaded, information of all members should be visible from any group member. At present a customer can get statistics only by logging in to the member whose information is needed.
FR 1. Functional Requirements for Table "replication_group_members".
  FR 1.1. MEMBER_ROLE - Table must show ROLE information of the group member: PRIMARY or SECONDARY.
  FR 1.2. MEMBER_VERSION - Table must show VERSION information of group members.
  FR 1.3. At present the table shows information of all members; that must remain.

FR 2. Functional Requirements for Table "replication_group_member_stats".
  FR 2.1. Table must show information of ALL members. At present only a single row is shown, with the statistics of the member itself.
  FR 2.2. Table must show the applier information listed below:
    FR 2.2.1. COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE - Number of transactions waiting to be applied
    FR 2.2.2. COUNT_TRANSACTIONS_REMOTE_APPLIED - Number of transactions applied
  FR 2.3. Table must show the local member transaction information listed below:
    FR 2.3.1. COUNT_TRANSACTIONS_LOCAL_PROPOSED - Number of transactions originated from the local machine
    FR 2.3.2. COUNT_TRANSACTIONS_LOCAL_ROLLBACK - Number of transactions originated from the local machine but rolled back at GROUP level (due to conflict)
1) Acronyms and DEFINITIONS

Before going into details, let's first present the terms that will be used in this design.

1. GR - Group Replication
2. P_S (performance_schema) - Existing MySQL database. MySQL should be compiled with P_S enabled. Since this worklog is about extending existing P_S tables, we will not go into more detail about P_S.
3. replication_group_members - Existing table in the performance_schema database. This table is used by GR to show information about GR members. Information of all members should be displayed.
4. replication_group_member_stats - Existing table in the performance_schema database. This table is used by GR to show statistics about GR members. Information of all members should be displayed.
5. Flow Control - Mechanism in the group replication protocol that avoids too much distance, in terms of transactions applied, between fast and slow members.

2) OVERVIEW

In this worklog we will extend the existing performance_schema tables "replication_group_members" and "replication_group_member_stats" by adding new columns showing Group Replication information.

3) replication_group_members

I. Existing

Field         Description
CHANNEL_NAME  Name of the group replication channel.
MEMBER_ID     Identifier for this member; same as the server UUID.
MEMBER_HOST   Network address of this member (host name or IP address).
MEMBER_PORT   Port on which the server is listening.
MEMBER_STATE  Current state of this member; one of OFFLINE, RECOVERING, ONLINE, UNREACHABLE and ERROR.

II. Changes Suggested

New suggested columns:

Field           Type      Null  Key  Default  Description
MEMBER_ROLE     char(64)  NO         NULL     Member role in a group; either PRIMARY or SECONDARY.
MEMBER_VERSION  char(64)  NO         NULL     The MySQL version of the member.

NOTE: Columns will be added in the same sequence as listed above.

III.
describe performance_schema.replication_group_members
+----------------+----------+------+-----+---------+-------+
| Field          | Type     | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+-------+
| CHANNEL_NAME   | char(64) | NO   |     | NULL    |       |
| MEMBER_ID      | char(36) | NO   |     | NULL    |       |
| MEMBER_HOST    | char(60) | NO   |     | NULL    |       |
| MEMBER_PORT    | int(11)  | YES  |     | NULL    |       |
| MEMBER_STATE   | char(64) | NO   |     | NULL    |       |
| MEMBER_ROLE    | char(64) | NO   |     | NULL    |       |
| MEMBER_VERSION | char(64) | NO   |     | NULL    |       |
+----------------+----------+------+-----+---------+-------+

iv. Information for the new columns, i.e. "MEMBER_ROLE" and "MEMBER_VERSION", will be fetched from Group_member_info.

4) replication_group_member_stats

At present the table shows information of a single member (self). The table will be extended to show information of all members.

I. Existing

Field                               Description
CHANNEL_NAME                        Name of the group replication channel
VIEW_ID                             Current view identifier for this group
MEMBER_ID                           Identifier for this member; same as the server UUID
COUNT_TRANSACTIONS_IN_QUEUE         Number of transactions pending certification
COUNT_TRANSACTIONS_CHECKED          Number of transactions already certified
COUNT_CONFLICTS_DETECTED            Number of transactions that were negatively certified
COUNT_TRANSACTIONS_ROWS_VALIDATING  Number of transaction rows that can be used for certification but have not been garbage collected
TRANSACTIONS_COMMITTED_ALL_MEMBERS  Set of stable group transactions
LAST_CONFLICT_FREE_TRANSACTION      Latest transaction certified without conflicts

II. Changes Suggested

New suggested columns:

Field                                       Type                 Null  Key  Default  Description
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE  bigint(20) unsigned  NO         NULL     Transactions received from the group that are waiting to be applied.
COUNT_TRANSACTIONS_REMOTE_APPLIED           bigint(20) unsigned  NO         NULL     Transactions received from the group that have been applied.
COUNT_TRANSACTIONS_LOCAL_PROPOSED           bigint(20) unsigned  NO         NULL     Number of local transactions submitted by the member to the Group Replication plugin for commit.
COUNT_TRANSACTIONS_LOCAL_ROLLBACK           bigint(20) unsigned  NO         NULL     Number of transactions originated from the local member but rolled back at GROUP level.

NOTE: Columns will be added in the same sequence as listed above.

III.
describe performance_schema.replication_group_member_stats
+--------------------------------------------+---------------------+------+-----+---------+-------+
| Field                                      | Type                | Null | Key | Default | Extra |
+--------------------------------------------+---------------------+------+-----+---------+-------+
| CHANNEL_NAME                               | char(64)            | NO   |     | NULL    |       |
| VIEW_ID                                    | char(60)            | NO   |     | NULL    |       |
| MEMBER_ID                                  | char(36)            | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_IN_QUEUE                | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_CHECKED                 | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_CONFLICTS_DETECTED                   | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_ROWS_VALIDATING         | bigint(20) unsigned | NO   |     | NULL    |       |
| TRANSACTIONS_COMMITTED_ALL_MEMBERS         | longtext            | NO   |     | NULL    |       |
| LAST_CONFLICT_FREE_TRANSACTION             | text                | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_REMOTE_APPLIED          | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_LOCAL_PROPOSED          | bigint(20) unsigned | NO   |     | NULL    |       |
| COUNT_TRANSACTIONS_LOCAL_ROLLBACK          | bigint(20) unsigned | NO   |     | NULL    |       |
+--------------------------------------------+---------------------+------+-----+---------+-------+

iv. Information for the local member (self) will be fetched from local structures, i.e. from Certifier, applier_module and Pipeline_stats_member_collector. Information for remote members will be fetched from the classes storing the broadcast information, i.e. "Pipeline_member_stats".

This presents the latest information for the local member. The rest of the members will be behind by seconds (the flow control module exchange time), and Transaction identifiers will be flagged for broadcast every 30 seconds. This also means that when a query is executed on the local member, the local member's information will be more up to date than, and may differ from, that of the rest of the members in the group.

5) Message exchange

The information below needs to be exchanged between members, which is not presently done.

1. Exchange of additional information between members with the flow control exchanged messages:
   i.   COUNT_CONFLICTS_DETECTED
   ii.  COUNT_TRANSACTIONS_ROWS_VALIDATING
   iii. COUNT_TRANSACTIONS_LOCAL_ROLLBACK

   Information already exchanged with the flow control messages that will now be displayed in P_S.replication_group_member_stats:
   i.   COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
   ii.  COUNT_TRANSACTIONS_REMOTE_APPLIED
   iii. COUNT_TRANSACTIONS_LOCAL_PROPOSED

2. Exchange of Transaction Identifiers between members with the flow control exchanged messages:
   i.  TRANSACTIONS_COMMITTED_ALL_MEMBERS
   ii. LAST_CONFLICT_FREE_TRANSACTION

   "TRANSACTIONS_COMMITTED_ALL_MEMBERS" and "LAST_CONFLICT_FREE_TRANSACTION" are exchanged along with the flow control messages every 30-60 seconds. New flags will be introduced; Transaction identifiers will be broadcast only when the flags are set.
--------------------------------------------------------------------------------
1. Add the new columns to the performance_schema SQL scripts and make the corresponding changes in the Group Replication files to declare the new columns and their callback fetch functions.
2. Make replication_group_member_stats multi-row. This involves fetching the number of member rows, passing an additional index parameter, etc.
3. Exchange the additional numbers via the Flow Control mechanism; add new variables and functions.
4. Implement new classes Pipeline_stats_transaction_info_exchange and Pipeline_stats_transaction_info_collector for exchange of GTID information every 60 seconds.
5. Add a variable of the new class to Pipeline_member_stats for information fetching.
6. At present stats are collected from internal structures; change this to show information from Flow Control.
--------------------------------------------------------------------------------

SUMMARY OF CHANGES
==================
modified: include/mysql/plugin_group_replication.h
modified: rapid/plugin/group_replication/include/applier.h
modified: rapid/plugin/group_replication/include/certifier.h
modified: rapid/plugin/group_replication/include/gcs_event_handlers.h
modified: rapid/plugin/group_replication/include/gcs_plugin_messages.h
modified: rapid/plugin/group_replication/include/member_info.h
modified: rapid/plugin/group_replication/include/pipeline_stats.h
modified: rapid/plugin/group_replication/include/plugin.h
modified: rapid/plugin/group_replication/include/ps_information.h
modified: rapid/plugin/group_replication/src/applier.cc
modified: rapid/plugin/group_replication/src/certifier.cc
modified: rapid/plugin/group_replication/src/gcs_event_handlers.cc
modified: rapid/plugin/group_replication/src/member_info.cc
modified: rapid/plugin/group_replication/src/observer_trans.cc
modified: rapid/plugin/group_replication/src/pipeline_stats.cc
modified: rapid/plugin/group_replication/src/plugin.cc
modified: rapid/plugin/group_replication/src/ps_information.cc
modified: scripts/mysql_system_tables.sql
modified: sql/rpl_group_replication.cc
modified: sql/rpl_group_replication.h
modified: storage/perfschema/table_replication_group_member_stats.cc
modified: storage/perfschema/table_replication_group_member_stats.h
modified: storage/perfschema/table_replication_group_members.cc
modified: storage/perfschema/table_replication_group_members.h
modified: rapid/plugin/group_replication/include/member_version.h
modified: rapid/plugin/group_replication/src/member_version.cc

SQL SCRIPT CHANGES
------------------
The worklog adds columns to existing tables. The tables are created via the script file below:
  scripts/mysql_system_tables.sql

1. Addition of the columns below to table "performance_schema.replication_group_members"
   A. MEMBER_ROLE CHAR(64) collate utf8_bin not null
   B. MEMBER_VERSION CHAR(64) collate utf8_bin not null

2. Addition of the columns below to table "performance_schema.replication_group_member_stats"
   A. COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE BIGINT unsigned not null
   B. COUNT_TRANSACTIONS_REMOTE_APPLIED BIGINT unsigned not null
   C. COUNT_TRANSACTIONS_LOCAL_PROPOSED BIGINT unsigned not null
   D. COUNT_TRANSACTIONS_LOCAL_ROLLBACK BIGINT unsigned not null

3. Similar changes in files:
     storage/perfschema/table_replication_group_members.cc
     storage/perfschema/table_replication_group_member_stats.cc
   A. Add details of the new columns to static const TABLE_FIELD_TYPE field_types
   B. Indicate the increase in the number of columns via table_replication_group_members::m_field_def

Server core changes
-------------------
1A. Changes to make replication_group_member_stats multi-row:
    make_row() -> make_row(uint index)
    and similar changes to the flow below. Pass the additional index argument through the chain of callback functions:
    -> get_group_replication_group_member_stats_info
    -> Group_replication_handler::get_group_member_stats_info
    -> get_group_member_stats_info
    -> plugin_get_group_member_stats
    -> get_group_member_stats

1B. Change of function signature, to identify whether a transaction is local, as needed for the local rollback count:
    rapid/plugin/group_replication/include/certifier.h
    + void update_certified_transaction_count(bool result, bool local_transaction);

2A. Variables added for SQL tables:
    storage/perfschema/table_replication_group_members.h
    + char member_role[NAME_LEN];
    + uint member_role_length;
    + char member_version[NAME_LEN];
    + uint member_version_length;

    storage/perfschema/table_replication_group_member_stats.h
    + ulonglong trx_remote_in_applier_queue;
    + ulonglong trx_remote_applied;
    + ulonglong trx_local_proposed;
    + ulonglong trx_local_rollback;

2B. New methods to fetch data (callback methods to set each column):
    storage/perfschema/table_replication_group_member_stats.cc
    + set_transactions_remote_in_applier_queue   To set the applier queue length
    + set_transactions_remote_applied            To set the number of remote transactions applied
    + set_transactions_local_proposed            To set the number of local transactions proposed
    + set_transactions_local_rollback            To set the number of local negatively certified transactions

    storage/perfschema/table_replication_group_members.cc
    + set_member_version   To set remote member version
    + set_member_role      To set remote member role

3A. Changed functions "get_group_members_info" and "get_group_member_stats":
    rapid/plugin/group_replication/src/ps_information.cc

3B. Changes in function "get_group_members_info" (collects information for table "replication_group_members")
    The member version is already present in Group_member_info; we need to convert it to Hex.
    The member role is already present in Group_member_info. A new function get_member_role_string will be implemented to provide the role in string format.

3C. Changes in function "get_group_member_stats" (collects information for table "replication_group_member_stats")
    At present, column data for table "replication_group_member_stats" is collected from local structs/classes.
    After the code changes:
    - MEMBER_ID information will be fetched from Group_member_info.
    - VIEW_ID information will be fetched from gcs_module (no changes).
    - The rest of the information is fetched from "Pipeline_member_stats".

4. Addition of new function "get_version_string" in class "Member_version".
   The function retrieves the version in string format. There is already a function get_version which returns the version in uint32 format.
   + const std::string get_version_string() const

Extension of Pipeline_stats_member_message
------------------------------------------
Some of the transaction count information will be exchanged via the existing flow control message. Since the information is not big, transmitting it every second will not have a major impact. Transaction Identifiers will be transmitted every 30-60 seconds; this information is tagged "Transaction Identifiers" below.

1A. New information being exchanged in class "Pipeline_stats_member_message"
    + // Length of the payload item: 8 bytes
    + PIT_TRANSACTIONS_LOCAL_ROLLBACK
    +
    + // Length of the payload item: 8 bytes
    + PIT_TRANSACTIONS_CONFLICTS_DETECTED
    +
    + // Length of the payload item: 8 bytes
    + PIT_TRANSACTIONS_ROWS_VALIDATING

    // Transaction Identifiers
    + // Length of the payload item: variable
    + PIT_TRANSACTIONS_COMMITTED_ALL_MEMBERS
    + // Length of the payload item: variable
    + PIT_TRANSACTION_LAST_CONFLICT_FREE

1B. Corresponding variables and get methods:
    + int64 get_transactions_local_rollback();
    + int64 get_transactions_conflicts_detected();
    + int64 get_transactions_rows_validating();
    + int64 m_transactions_local_rollback;
    + int64 m_transactions_conflicts_detected;
    + int64 m_transactions_rows_validating;

    // Transaction Identifiers get methods
    + const std::string& get_transactions_committed_all_members()
    + const std::string& get_transaction_last_conflict_free()

    // Transaction Identifiers variables
    + bool m_have_transactions_committed_all_members;
    + std::string m_transactions_committed_all_members;
    + bool m_have_last_conflict_free_transaction;
    + std::string m_last_conflict_free_transaction;

    The flags "m_have_transactions_committed_all_members" and "m_have_last_conflict_free_transaction" are not themselves broadcast.
    The bool flags may not be strictly needed, since an empty string can also indicate whether a broadcast is needed; both the flag and the empty-string approach will work.
    "Pipeline_stats_member_collector" manages the broadcast and sets the flags/fills the strings for broadcast. Hence the flag is mandatory in class "Pipeline_stats_member_collector".
    During decode, if "PIT_TRANSACTIONS_COMMITTED_ALL_MEMBERS" and "PIT_TRANSACTION_LAST_CONFLICT_FREE" have been received, the flags will be set/the strings will be filled.

2A. The local rollback counter will be maintained in class "Pipeline_stats_member_collector". Local transaction failures will be counted here.
    + int64 m_transactions_local_rollback;
    + void increment_transactions_local_rollback();

    // Transaction Identifiers
    + bool send_gtids;
    // Class "Pipeline_stats_member_collector" actually collects the flow
    // control information, creates the packet (an object of class
    // "Pipeline_stats_member_message") and sends the information.
    // Flag "send_gtids" indicates whether Transaction Identifiers have to be
    // broadcast or not.
    // We will set send_gtids every 30 seconds to broadcast GTIDs.
    // GTIDs will go with the next flow control message.
    // Encode only when send_gtids is set.

2B. Class "Pipeline_stats_member_collector" maintains the number of transactions waiting to be applied, the number of transactions certified, the number of transactions applied, the number of local transactions proposed and the number of local transactions rolled back.
    Local member information needs to be fetched from local structures to provide the latest information for the local member, so get functions need to be implemented.
    + int64 get_transactions_waiting_apply();
    + int64 get_transactions_certified();
    + int64 get_transactions_applied();
    + int64 get_transactions_local();
    + int64 get_transactions_local_rollback();

2C. In class Certifier, function update_certified_transaction_count will be modified to take an additional local_transaction flag.
    update_certified_transaction_count will call increment_transactions_local_rollback, i.e. count local transactions that are rolled back.

3. Class "Pipeline_member_stats" will store the new information:
    + int64 m_transactions_local_rollback;
    + int64 m_transactions_conflicts_detected;
    + int64 m_transactions_rows_validating;

    // Transaction Identifiers
    + std::string tx_committed_all_members;
    + std::string tx_last_conflict_free;

4. For access to "Pipeline_member_stats", class "Flow_control_module" needs to return a reference to the member's entry:
    Pipeline_member_stats * get_pipeline_stats(std::string memberId)
    // NOTE: Argument is GCS Member ID HOST:PORT and NOT UUID.
    // Class Flow_control_module holds information of all members.
    // To access individual member information, i.e. an object of class
    // "Pipeline_member_stats", we need a function in class
    // "Flow_control_module" that returns a reference to the individual
    // member information. get_pipeline_stats is going to serve that purpose.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.