WL#3656: PERFORMANCE SCHEMA table for SHOW SLAVE STATUS
Affects: Server-5.7
Status: Complete
AIM:
====
To implement Performance Schema (P_S) tables for the information returned
by SHOW SLAVE STATUS.

BENEFITS:
=========
- The information on slave status can be searched with ad hoc queries.
- Pieces of information can be assigned to variables, and thus used in
  stored procedures.
- Easier testing: individual items can be SELECTed from the Performance
  Schema tables.
- We can get rid of a few old fields to present only relevant information
  and not expose too many internals.
- We can structure the information better, to align it with multi-source
  replication and the multi-threaded slave.

An earlier implementation as an I_S plugin
(https://launchpad.net/is-replication-status) unfortunately does not fit
the definition of I_S, which is to contain metadata rather than any kind
of transient data.

RELATED BUGS:
=============
| MySQL BUG# | STATUS |
|------------+--------|
| BUG#50316  | CLOSED |
| BUG#56414  | CLOSED |
| BUG#35994  | CLOSED |
|------------+--------|

USER DOCUMENTATION:
===================
http://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html
Functional requirements:
F1: P_S tables must be read-only and available for SQL queries
    (SELECT ...) as for regular tables (see the illustrative queries after
    this list).
F2: Before a CHANGE MASTER statement, the P_S tables should be empty.
F3: After a CHANGE MASTER statement, the P_S tables should be filled with
    data.
F4: START/STOP SLAVE statements and a running slave should update the P_S
    "status" tables.
F5: Concurrent access to the P_S replication tables must not affect
    replication (no locks, no errors, etc.).

Non-functional requirements:
NF1: The P_S tables for SHOW SLAVE STATUS data and the SHOW SLAVE STATUS
     output should contain the same values, except for new/removed fields.
NF2: Disabling P_S for the server should not affect the behavior of
     replication.
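To illustrate F1 and the benefits listed above: once the tables exist they
can be queried like any other table. The queries below are illustrative
sketches only (column and table names as defined in sections 2 and 3; no
channel name yet, since CHANNEL_NAME arrives with WL#1697):

  -- Ad hoc query: is the receiver (IO) thread running, and what was its
  -- last error, if any?
  SELECT SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
    FROM performance_schema.replication_connection_status;

  -- Assign a piece of the status to a user variable, e.g. for use in a
  -- stored procedure or in a test.
  SELECT SERVICE_STATE INTO @io_state
    FROM performance_schema.replication_connection_status;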
1. EXPECTATIONS FROM THE P_S TABLES:
====================================

1.1. The new P_S tables must follow all conventions of the P_S database,
     including that the table definition is static and therefore exists at
     bootstrap.
1.2. The type of the attributes must correspond to their internal
     representations in the server.
1.3. The P_S tables must not be updatable by the user.
1.4. The P_S tables must be upgradable. It should be easy to add a new
     attribute or modify an existing one in the future.

2. COMPARISON TO FIELDS SHOWN IN "SHOW SLAVE STATUS":
=====================================================

The current MySQL replication capabilities are implemented using various
threads that can be broadly classified into the following categories:

(a) An IO thread which is responsible for receiving, on the slave(s), the
    transactions executed on the master server.
(b) A coordinator thread to coordinate with the applier threads and
    distribute the replication work between these applier threads.
(c) A set of applier threads which apply, on the slaves, the transactions
    executed on the master server.

Currently, the status of these threads is unnecessarily coupled. The
current worklog organizes all this slave status information into six P_S
tables. In addition, this worklog also renames various fields relative to
their SHOW SLAVE STATUS counterparts. The P_S tables are presented below.
To show an easy correlation, the corresponding fields in SHOW SLAVE STATUS
have also been listed below under the column "OLD NAME".

Note: CHANNEL_NAME will be added to all the P_S tables below as a part of
WL#1697 (Multi Source Replication).

Note: The table titled replication_connection_configuration (section 2.1
below) shows fields that are similar to those shown by the
mysql.slave_master_info table. The idea is to keep the
mysql.slave_master_info table for recovery and persistence purposes and
the table in section 2.1 below for monitoring replication.

2.1
=================================================================
performance_schema.replication_connection_configuration
-----------------------------------------------------------------
NEW NAME                        OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697.
HOST                            Master_Host
PORT                            Master_Port
USER                            Master_User
NETWORK_INTERFACE               Master_Bind
AUTO_POSITION                   Auto_Position
SSL_ALLOWED                     Master_SSL_Allowed
SSL_CA_FILE                     Master_SSL_CA_File
SSL_CA_PATH                     Master_SSL_CA_Path
SSL_CERTIFICATE                 Master_SSL_Cert
SSL_CIPHER                      Master_SSL_Cipher
SSL_KEY                         Master_SSL_Key
SSL_VERIFY_SERVER_CERTIFICATE   Master_SSL_Verify_Server_Cert
SSL_CRL_FILE                    Master_SSL_Crl
SSL_CRL_PATH                    Master_SSL_Crlpath
CONNECTION_RETRY_INTERVAL       Connect_Retry
CONNECTION_RETRY_COUNT          Master_Retry_Count
=================================================================

2.2
=================================================================
performance_schema.replication_connection_status
-----------------------------------------------------------------
NEW NAME                        OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697.
SOURCE_UUID                     MASTER_UUID
THREAD_ID
SERVICE_STATE                   Slave_IO_Running
RECEIVED_TRANSACTION_SET        Retrieved_Gtid_Set
LAST_ERROR_NUMBER               Last_IO_Errno
LAST_ERROR_MESSAGE              Last_IO_Error
LAST_ERROR_TIMESTAMP            Last_IO_Error_Timestamp
=================================================================

Note: If WL#1697 is implemented so as to have one IO thread and one
coordinator thread, we will consider re-organizing tables 2.3, 2.4 and
2.5. The present WL will implement the tables as defined below. If need
be, WL#1697 will change the implementation.

2.3
=================================================================
performance_schema.replication_execute_configuration
-----------------------------------------------------------------
NEW NAME                        OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697.
DESIRED_DELAY                   SQL_Delay
=================================================================

2.4
=================================================================
performance_schema.replication_execute_status
-----------------------------------------------------------------
NEW NAME                        OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697.
SERVICE_STATE
REMAINING_DELAY                 SQL_Remaining_Delay
=================================================================

Note: The errors shown in Last_SQL_Error in SHOW SLAVE STATUS have been
divided into the two tables in 2.5 and 2.6 below. The same is the case
with Last_SQL_Errno and Last_SQL_Error_Timestamp. The idea is to ensure
that no two threads share the same storage for error
messages/numbers/timestamps.

2.5
=================================================================
performance_schema.replication_execute_status_by_coordinator
-----------------------------------------------------------------
NEW NAME                        OLD NAME
-----------------------------------------------------------------
THREAD_ID(PK)
CHANNEL_NAME                    New field, will be added by WL#1697.
SERVICE_STATE                   Slave_SQL_Running
LAST_ERROR_NUMBER               Last_SQL_Errno
LAST_ERROR_MESSAGE              Last_SQL_Error
LAST_ERROR_TIMESTAMP            Last_SQL_Error_Timestamp
=================================================================

2.6
=================================================================
performance_schema.replication_execute_status_by_worker
-----------------------------------------------------------------
NEW NAME                        OLD NAME
-----------------------------------------------------------------
WORKER_ID(PK)
THREAD_ID
CHANNEL_NAME                    New field, will be added by WL#1697.
SERVICE_STATE
LAST_SEEN_TRANSACTION
LAST_ERROR_NUMBER               Last_SQL_Errno
LAST_ERROR_MESSAGE              Last_SQL_Error
LAST_ERROR_TIMESTAMP            Last_SQL_Error_Timestamp
=================================================================

3. DATA TYPES USED TO STORE THE FIELDS IN P_S TABLES:
=====================================================

The tables below show the data types/sizes used to define the DDL of the
P_S tables added as a part of this worklog. (An illustrative DDL sketch
for table 3.2 follows the tables below.)

After STOP SLAVE, the thread_id field will show NULL, and the statistics
for that thread will remain the same as at the point when STOP SLAVE was
executed. These stats are reset on the next START SLAVE.
3.1
=================================================================
performance_schema.replication_connection_configuration
-----------------------------------------------------------------
FIELD NAME                      DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697
HOST                            char(60)
PORT                            int
USER                            char(16)
NETWORK_INTERFACE               char(60)
AUTO_POSITION                   tinyint(1)
SSL_ALLOWED                     enum('Yes', 'No', 'Ignored')
SSL_CA_FILE                     varchar(FN_REFLEN)
SSL_CA_PATH                     varchar(FN_REFLEN)
SSL_CERTIFICATE                 varchar(FN_REFLEN)
SSL_CIPHER                      varchar(FN_REFLEN)
SSL_KEY                         varchar(FN_REFLEN)
SSL_VERIFY_SERVER_CERTIFICATE   enum('Yes', 'No')
SSL_CRL_FILE                    varchar(FN_REFLEN)
SSL_CRL_PATH                    varchar(FN_REFLEN)
CONNECTION_RETRY_INTERVAL       uint
CONNECTION_RETRY_COUNT          ulong
=================================================================

3.2
=================================================================
performance_schema.replication_connection_status
-----------------------------------------------------------------
FIELD NAME                      DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697
SOURCE_UUID                     char(36)
THREAD_ID                       bigint unsigned
SERVICE_STATE                   enum('On', 'Off', 'Connecting')
RECEIVED_TRANSACTION_SET        text
LAST_ERROR_NUMBER               uint
LAST_ERROR_MESSAGE              varchar(MAX_SLAVE_ERRMSG)
LAST_ERROR_TIMESTAMP            timestamp
=================================================================

3.3
=================================================================
performance_schema.replication_execute_configuration
-----------------------------------------------------------------
FIELD NAME                      DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697
DESIRED_DELAY                   uint32
=================================================================

3.4
=================================================================
performance_schema.replication_execute_status
-----------------------------------------------------------------
FIELD NAME                      DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK)                New field, will be added by WL#1697
SERVICE_STATE                   enum('On', 'Off')
REMAINING_DELAY                 integer unsigned
=================================================================

3.5
=================================================================
performance_schema.replication_execute_status_by_coordinator
-----------------------------------------------------------------
FIELD NAME                      DATA TYPE
-----------------------------------------------------------------
THREAD_ID(PK)                   bigint unsigned
CHANNEL_NAME                    New field, will be added by WL#1697
SERVICE_STATE                   enum('On', 'Off')
LAST_ERROR_NUMBER               uint
LAST_ERROR_MESSAGE              varchar(MAX_SLAVE_ERRMSG)
LAST_ERROR_TIMESTAMP            timestamp
=================================================================

3.6
=================================================================
performance_schema.replication_execute_status_by_worker
-----------------------------------------------------------------
FIELD NAME                      DATA TYPE
-----------------------------------------------------------------
WORKER_ID(PK)                   bigint
THREAD_ID                       bigint unsigned
CHANNEL_NAME                    New field, will be added by WL#1697
SERVICE_STATE                   enum('On', 'Off')
LAST_SEEN_TRANSACTION           char(57)
LAST_ERROR_NUMBER               uint
LAST_ERROR_MESSAGE              varchar(MAX_SLAVE_ERRMSG)
LAST_ERROR_TIMESTAMP            timestamp
=================================================================
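The data types above use server-internal symbols (FN_REFLEN,
MAX_SLAVE_ERRMSG, uint, ulong). As an illustration only, the definition in
3.2 might translate into DDL along the following lines; this is a sketch,
not the actual server definition, and the concrete length chosen for
MAX_SLAVE_ERRMSG (1024) is an assumption:

  CREATE TABLE performance_schema.replication_connection_status (
    SOURCE_UUID               CHAR(36),
    THREAD_ID                 BIGINT UNSIGNED,
    SERVICE_STATE             ENUM('On','Off','Connecting'),
    RECEIVED_TRANSACTION_SET  TEXT,
    LAST_ERROR_NUMBER         INT UNSIGNED,
    LAST_ERROR_MESSAGE        VARCHAR(1024),  -- assumed MAX_SLAVE_ERRMSG
    LAST_ERROR_TIMESTAMP      TIMESTAMP
  ) ENGINE=PERFORMANCE_SCHEMA;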
4. REMOVED FIELDS:
==================

SHOW SLAVE STATUS has a number of fields which, the replication team
thinks, will be removed in the future. Below is a list of fields that were
part of the output of SHOW SLAVE STATUS but will not be found in the P_S
tables.

4.1 The following fields use a filename and offset. We should rely only on
    GTIDs instead.
      Master_Log_File
      Read_Master_Log_Pos
      Relay_Log_File
      Relay_Log_Pos
      RELAY_MASTER_LOG_FILE
      EXEC_MASTER_LOG_POS
      UNTIL_CONDITION
      Until_Log_File
      Until_Log_Pos

4.2 The following fields use server_id. We should rely on server_uuid
    instead.
      MASTER_SERVER_ID
      Replicate_Ignore_Server_Ids

4.3 "Skip_Counter" uses the event count. We should not show 'events' to
    the user; we should show transactions if anything. This option already
    does not work with GTIDs.

4.4 SECONDS_BEHIND_MASTER is buggy.

4.5 MASTER_INFO_FILE refers to the master.info file, which is not very
    useful now that we have crash-safe slave tables. The replication team
    is planning to deprecate it in the future. Until it is deprecated, the
    field, if needed, can be obtained from SHOW SLAVE STATUS. We ensure
    that we will not remove SHOW SLAVE STATUS unless we remove the
    master.info file.

4.6 The following fields did not appear to be very useful in the first
    place: they contain the last error by the IO thread *or* the SQL
    thread, so if there is an error in both threads one of them is
    overwritten. Instead we have errors per thread.
      LAST_ERROR
      LAST_ERRNO

4.7 Probably, we can re-implement these legacy filters as Python filters
    later. The idea is to come up with better filtering logic.
      Replicate_Do_DB
      Replicate_Ignore_DB
      Replicate_Do_Table
      Replicate_Ignore_Table
      Replicate_Wild_Do_Table
      Replicate_Wild_Ignore_Table

4.8 In the future we could have multiple receiver threads, so
    "Slave_IO_State" does not really fit in. We may add it later if
    required. For now, the user can get that from the processlist anyway.

4.9 "Relay_Log_Space" will appear together with other fields like
    BYTES_APPLIED, TRANSACTIONS_QUEUED, TRANSACTIONS_APPLIED etc. in a
    table for replication stats.

4.10 "Executed_Gtid_Set" can show a huge set with a lot of text. Instead
     we have decided to show the GTIDs of transactions that are currently
     being executed by the appliers. The set of executed GTIDs can
     alternatively be obtained by reading @@gtid_executed.

5. ADDED FIELDS:
================

5.1 Prior to this WL, errors were not shown per worker; the coordinator
    thread was responsible for showing the error of a worker thread. The
    P_S table in section 2.6 titled "replication_execute_status_by_worker"
    contains one row per worker. The Last_Error_Number/Message/Timestamp
    fields have been added to report errors per worker.

5.2 LAST_SEEN_TRANSACTION: Added this field to the P_S table in section
    2.6 titled "replication_execute_status_by_worker". Below are a few
    scenarios and the values seen in this column in the corresponding
    cases:

    5.2.1 gtid-mode=off: ANONYMOUS (similar to gtid-next), indicating that
          transactions do not have global identifiers and are identified
          by file and position only.

    5.2.2 gtid-mode=on:

      5.2.2.1 No transaction executed yet: "" (empty).

      5.2.2.2 When a transaction is executed, this field is set as soon as
              gtid-next is set. So, starting from this moment, this field
              will always show a GTID.

      5.2.2.3 The GTID shown in 5.2.2.2 is preserved until the next
              transaction is executed. The value is also preserved in case
              of an error, in which case the GTID of the transaction being
              executed at the moment the error occurred is shown.
      5.2.2.4 When the next gtid_log_event is picked up by this worker
              thread, this field is updated soon after gtid_next is set.

5.3 WORKER_ID is the unique identifier of a worker. On STOP SLAVE, the
    thread_id field becomes NULL; the worker_id is preserved.

6. MODIFICATIONS IN SHOW SLAVE STATUS (SSS):
============================================

Since this work is about an interface for monitoring replication health
and performance, the modification of the original interface (i.e. SSS) is
kept to a minimum. The only modified field in SSS is Last_SQL_Error.

6.1 Changes in the Last_SQL_Error field in SSS:
-----------------------------------------------

6.1.1 There is NO change in SSS when the slave is working with MTS=off,
      i.e. the SQL thread is the only applier.

6.1.2 When MTS=on, the Last_SQL_Error field in SSS shows exactly what the
      Last_Error_Message field in the table
      performance_schema.replication_execute_status_by_coordinator shows.
      This field is modified to suggest that there may be more failures in
      the other worker threads, which can be seen in the table
      performance_schema.replication_execute_status_by_worker. This table
      shows the per-worker error, if any. If this table is not available,
      the slave error log can be used. The log or the table
      performance_schema.replication_execute_status_by_worker should also
      be used to learn more about the failure shown in the coordinator's
      table or SSS.

7. LIFE CYCLE OF P_S TABLES:
============================

7.1 Prior to CHANGE MASTER, the P_S tables are empty.

7.2.1 After CHANGE MASTER, the configuration parameters can be seen in the
      tables. The Thread_Id columns at this point will show NULL, as there
      are no active slave threads at this point in time. The Service_State
      columns at this point will say "Off".

7.2.2 The table replication_execute_status_by_worker is an exception to
      the rules mentioned in section 7.2.1. This table is supposed to be
      active only when the slave is operating in multi-threaded mode. So,
      if the variable "slave_parallel_workers" is not set, this table will
      show an empty set. The number of rows in this table equals the
      number of workers. Further, this table is filled for the first time
      at START SLAVE.

7.3 The thread_ids can be seen after START SLAVE. Service_State = "On"
    marks an active thread. (An illustrative session covering 7.1 to 7.3
    is sketched after this section.)

7.4 The tables are preserved after STOP SLAVE or in case of threads dying
    due to an error.
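The following session sketch illustrates 7.1 through 7.3. The host, port
and user values are placeholders, and the comments describe expected
results rather than captured output:

  -- 7.1: before CHANGE MASTER the tables are empty.
  SELECT * FROM performance_schema.replication_connection_configuration;
  -- Empty set

  -- 7.2.1: after CHANGE MASTER the configuration parameters are visible,
  -- but no slave threads are running yet.
  CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_PORT=3306,
                   MASTER_USER='repl', MASTER_AUTO_POSITION=1;
  SELECT HOST, PORT, USER, AUTO_POSITION
    FROM performance_schema.replication_connection_configuration;
  SELECT THREAD_ID, SERVICE_STATE
    FROM performance_schema.replication_connection_status;
  -- THREAD_ID is NULL, SERVICE_STATE is 'Off'

  -- 7.3: after START SLAVE the thread ids appear and active threads show
  -- SERVICE_STATE = 'On'.
  START SLAVE;
  SELECT THREAD_ID, SERVICE_STATE
    FROM performance_schema.replication_connection_status;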
1. FILES ADDED:
===============

For each of the six P_S tables, two files have been added.

1.1 replication_connection_configuration
    - table_replication_connection_configuration.h
    - table_replication_connection_configuration.cc

1.2 replication_connection_status
    - table_replication_connection_status.h
    - table_replication_connection_status.cc

1.3 replication_execute_configuration
    - table_replication_execute_configuration.h
    - table_replication_execute_configuration.cc

1.4 replication_execute_status
    - table_replication_execute_status.h
    - table_replication_execute_status.cc

1.5 replication_execute_status_by_coordinator
    - table_replication_execute_status_by_coordinator.h
    - table_replication_execute_status_by_coordinator.cc

1.6 replication_execute_status_by_worker
    - table_replication_execute_status_by_worker.h
    - table_replication_execute_status_by_worker.cc

2. CLASSES ADDED:
=================

Each ".h" file has a class declaration. The class declarations are
similar, so the present documentation shows only one of them:

/** Table PERFORMANCE_SCHEMA.REPLICATION_CONNECTION_CONFIGURATION. */
class table_replication_connection_configuration: public PFS_engine_table
{
private:
  void fill_rows(Master_info *);

  /** Table share lock. */
  static THR_LOCK m_table_lock;
  /** Fields definition. */
  static TABLE_FIELD_DEF m_field_def;
  /** Current row. */
  st_row_connect_config m_row;
  /** True if the table has been filled. */
  bool m_filled;
  /** Current position. */
  PFS_simple_index m_pos;
  /** Next position. */
  PFS_simple_index m_next_pos;

protected:
  /**
    Read the current row values.
    @param table     Table handle
    @param buf       Row buffer
    @param fields    Table fields
    @param read_all  True if all columns are read.
  */
  virtual int read_row_values(TABLE *table,
                              unsigned char *buf,
                              Field **fields,
                              bool read_all);

  table_replication_connection_configuration();

public:
  ~table_replication_connection_configuration();

  /** Table share. */
  static PFS_engine_table_share m_share;
  static PFS_engine_table* create();
  virtual int rnd_next();
  virtual int rnd_pos(const void *pos);
  virtual void reset_position(void);
};

3. Before this WL, the DYNAMIC_ARRAY of workers inside the class
   Relay_log_info was destroyed at STOP SLAVE. This destruction is now
   delayed until the next START SLAVE. The array is used to extract the
   values displayed in the worker table if a SELECT query is issued on
   that table after STOP SLAVE. Thus, the statistics in the table are
   updated at START SLAVE, except for the service_state and thread_id
   columns.
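As an illustration of point 3 (a sketch; the comments describe expected
behavior, not captured output):

  -- With slave_parallel_workers > 0 and the slave previously running:
  STOP SLAVE;

  -- The per-worker rows are preserved: thread_id shows NULL and
  -- service_state shows 'Off', while the remaining columns keep the
  -- values they had when STOP SLAVE was executed.
  SELECT WORKER_ID, THREAD_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION
    FROM performance_schema.replication_execute_status_by_worker;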