WL#3656: PERFORMANCE SCHEMA table for SHOW SLAVE STATUS
Affects: Server-5.7
Status: Complete
AIM:
====
To implement Performance Schema (P_S) tables for the information returned
by SHOW SLAVE STATUS.

BENEFITS:
=========
- The information on slave status can be searched with ad hoc queries.
- Pieces of information can be assigned to variables, and thus used in
  stored procedures.
- Easier testing: SELECT an item from the Performance Schema tables.
- We can get rid of a few old fields to present only relevant information
  and not expose too many internals.
- We can structure the information better, to align it with multi-source
  replication and the multi-threaded slave.

This was earlier implemented as an I_S plugin
(https://launchpad.net/is-replication-status), which unfortunately does
not fit the definition of I_S, which is to contain meta information, not
any kind of transient data.

RELATED BUGS:
=============
| MySQL BUG# | STATUS |
|------------+--------|
| BUG#50316  | CLOSED |
| BUG#56414  | CLOSED |
| BUG#35994  | CLOSED |
|------------+--------|

USER DOCUMENTATION:
===================
http://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html
Functional requirements:
F1: The P_S tables must be read-only, available for SQL queries
    (SELECT ...) as for regular tables.
F2: Before a CHANGE MASTER statement, the P_S tables should be empty.
F3: After a CHANGE MASTER statement, the P_S tables should be populated
    with data.
F4: START/STOP SLAVE statements and a running slave should update the
    P_S "status" tables.
F5: Concurrent access to the P_S replication tables must not affect
    replication (no locks, no errors, etc.).

Non-functional requirements:
NF1: The P_S tables for SHOW SLAVE STATUS data and the SHOW SLAVE STATUS
     output should contain the same values, except for new/removed fields.
NF2: Disabling P_S for the server must not affect the behavior of
     replication.
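The benefits and requirements above can be illustrated with a short sketch
(assuming a configured slave; table and column names are as defined later
in this worklog):

```sql
-- Ad hoc query on the receiver status (read-only P_S table, F1).
SELECT SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_connection_status;

-- A single piece of information assigned to a user variable,
-- e.g. for later use inside a stored procedure.
SELECT SERVICE_STATE INTO @io_state
  FROM performance_schema.replication_connection_status;
```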
1. EXPECTATIONS FROM THE P_S TABLES:
================================
1.1. The new P_S tables must follow all conventions of the P_S database,
including that the table definition is static and therefore exists at
bootstrap.
1.2. The type of the attributes must correspond to their internal
representations in the server.
1.3. The P_S tables must not be updatable by the user.
1.4. The P_S tables must be upgradable. It should be easy to add a new
attribute or modify an existing one in the future.
2. COMPARISON TO FIELDS SHOWN IN "SHOW SLAVE STATUS":
================================================
The current MySQL replication capabilities are implemented using various
threads that can be broadly classified into the following categories:
(a) An IO thread which is responsible for sending the transactions
executed on the master server to the slave(s).
(b) A coordinator thread to coordinate with the applier threads and
distribute the replication task between these applier threads.
(c) A set of applier threads which apply the transactions executed on
the master server at the slaves.
Currently, the status of these threads is unnecessarily coupled. The current
worklog organizes all this slave status information into six P_S tables.
In addition to that, this worklog will also change the names of various
fields in SHOW SLAVE STATUS. The P_S tables are presented below. To
show an easy correlation, the corresponding fields in SHOW SLAVE STATUS,
have also been mentioned below under the column "OLD NAME".
Note: CHANNEL_NAME will be added to all the P_S tables below as a
part of WL#1697(Multi Source Replication).
Note: The table titled replication_connection_configuration (section 2.1
below) shows fields that are similar to those shown by the
mysql.slave_master_info table. The idea is to keep the
mysql.slave_master_info table for recovery and persistence purposes and
the table in section 2.1 below for monitoring replication.
2.1
=================================================================
performance_schema.replication_connection_configuration
-----------------------------------------------------------------
NEW NAME OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697.
HOST Master_Host
PORT Master_Port
USER Master_User
NETWORK_INTERFACE Master_Bind
AUTO_POSITION Auto_Position
SSL_ALLOWED Master_SSL_Allowed
SSL_CA_FILE Master_SSL_CA_File
SSL_CA_PATH Master_SSL_CA_Path
SSL_CERTIFICATE Master_SSL_Cert
SSL_CIPHER Master_SSL_Cipher
SSL_KEY Master_SSL_Key
SSL_VERIFY_SERVER_CERTIFICATE Master_SSL_Verify_Server_Cert
SSL_CRL_FILE Master_SSL_Crl
SSL_CRL_PATH Master_SSL_Crlpath
CONNECTION_RETRY_INTERVAL Connect_Retry
CONNECTION_RETRY_COUNT Master_Retry_Count
=================================================================
2.2
=================================================================
performance_schema.replication_connection_status
-----------------------------------------------------------------
NEW NAME OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697.
SOURCE_UUID MASTER_UUID
THREAD_ID
SERVICE_STATE Slave_IO_Running
RECEIVED_TRANSACTION_SET Retrieved_Gtid_Set
LAST_ERROR_NUMBER Last_IO_Errno
LAST_ERROR_MESSAGE Last_IO_Error
LAST_ERROR_TIMESTAMP Last_IO_Error_Timestamp
=================================================================
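As a sketch of the correlation above: a monitoring check that previously
read Slave_IO_Running and Last_IO_Error from SHOW SLAVE STATUS could
instead query the new table directly (assuming a configured slave):

```sql
-- Equivalent of inspecting Slave_IO_Running / Last_IO_Errno /
-- Last_IO_Error in SHOW SLAVE STATUS, but filterable with SQL.
SELECT SOURCE_UUID, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_connection_status;
```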
Note: If WL#1697 is implemented so as to have one IO thread and one
coordinator thread, we will consider re-organizing tables 2.3, 2.4
and 2.5. The present WL will implement tables as defined below. If the
need be, WL#1697 will change the implementation.
2.3
=================================================================
performance_schema.replication_execute_configuration
-----------------------------------------------------------------
NEW NAME OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697.
DESIRED_DELAY SQL_Delay
=================================================================
2.4
=================================================================
performance_schema.replication_execute_status
-----------------------------------------------------------------
NEW NAME OLD NAME
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697.
SERVICE_STATE
REMAINING_DELAY SQL_Remaining_Delay
=================================================================
Note: The errors shown in Last_SQL_Error in SHOW SLAVE STATUS have been
divided between the two tables in 2.5 and 2.6 below. The same is the case
with Last_SQL_Errno and Last_SQL_Error_Timestamp. The idea is to
ensure that no two threads share the same storage for error
messages/numbers/timestamps.
2.5
=================================================================
performance_schema.replication_execute_status_by_coordinator
-----------------------------------------------------------------
NEW NAME OLD NAME
-----------------------------------------------------------------
THREAD_ID(PK)
CHANNEL_NAME New field, will be added by WL#1697.
SERVICE_STATE Slave_SQL_Running
LAST_ERROR_NUMBER Last_SQL_Errno
LAST_ERROR_MESSAGE Last_SQL_Error
LAST_ERROR_TIMESTAMP Last_SQL_Error_Timestamp
=================================================================
2.6
=================================================================
performance_schema.replication_execute_status_by_worker
-----------------------------------------------------------------
NEW NAME OLD NAME
-----------------------------------------------------------------
WORKER_ID(PK)
THREAD_ID
CHANNEL_NAME New field, will be added by WL#1697
SERVICE_STATE
LAST_SEEN_TRANSACTION
LAST_ERROR_NUMBER Last_SQL_Errno
LAST_ERROR_MESSAGE Last_SQL_Error
LAST_ERROR_TIMESTAMP Last_SQL_Error_Timestamp
=================================================================
3. DATA TYPES USED TO STORE THE FIELDS IN P_S TABLES:
=================================================
The tables below show the data types/sizes used to define the DDL of the P_S
tables added as a part of this worklog. After STOP SLAVE, the THREAD_ID field
shows NULL, while the statistics for that thread remain the same as at the
point when STOP SLAVE was executed. These stats are reset on the next START
SLAVE.
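A minimal sketch of the THREAD_ID behavior described above (assuming a
slave on which STOP SLAVE has been issued):

```sql
-- After STOP SLAVE, THREAD_ID is NULL but the last known statistics
-- are preserved until the next START SLAVE resets them.
SELECT THREAD_ID, SERVICE_STATE
  FROM performance_schema.replication_connection_status
 WHERE THREAD_ID IS NULL;
```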
3.1
=================================================================
performance_schema.replication_connection_configuration
-----------------------------------------------------------------
FIELD NAME DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697
HOST char(60)
PORT int
USER char(16)
NETWORK_INTERFACE char(60)
AUTO_POSITION tinyint(1)
SSL_ALLOWED enum('Yes', 'No', 'Ignored')
SSL_CA_FILE varchar (FN_REFLEN)
SSL_CA_PATH varchar (FN_REFLEN)
SSL_CERTIFICATE varchar (FN_REFLEN)
SSL_CIPHER varchar (FN_REFLEN)
SSL_KEY varchar (FN_REFLEN)
SSL_VERIFY_SERVER_CERTIFICATE enum('Yes', 'No')
SSL_CRL_FILE varchar (FN_REFLEN)
SSL_CRL_PATH varchar (FN_REFLEN)
CONNECTION_RETRY_INTERVAL uint
CONNECTION_RETRY_COUNT ulong
=================================================================
3.2
=================================================================
performance_schema.replication_connection_status
-----------------------------------------------------------------
FIELD NAME DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697
SOURCE_UUID char(36)
THREAD_ID bigint unsigned
SERVICE_STATE enum('On', 'Off', 'Connecting')
RECEIVED_TRANSACTION_SET text
LAST_ERROR_NUMBER uint
LAST_ERROR_MESSAGE varchar (MAX_SLAVE_ERRMSG)
LAST_ERROR_TIMESTAMP timestamp
=================================================================
3.3
=================================================================
performance_schema.replication_execute_configuration
-----------------------------------------------------------------
FIELD NAME DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697
DESIRED_DELAY uint32
=================================================================
3.4
=================================================================
performance_schema.replication_execute_status
-----------------------------------------------------------------
FIELD NAME DATA TYPE
-----------------------------------------------------------------
CHANNEL_NAME(PK) New field, will be added by WL#1697
SERVICE_STATE enum('On', 'Off')
REMAINING_DELAY integer unsigned
=================================================================
3.5
=================================================================
performance_schema.replication_execute_status_by_coordinator
-----------------------------------------------------------------
FIELD NAME DATA TYPE
-----------------------------------------------------------------
THREAD_ID(PK) bigint unsigned
CHANNEL_NAME New field, will be added by WL#1697
SERVICE_STATE enum('On', 'Off')
LAST_ERROR_NUMBER uint
LAST_ERROR_MESSAGE varchar (MAX_SLAVE_ERRMSG)
LAST_ERROR_TIMESTAMP timestamp
=================================================================
3.6
=================================================================
performance_schema.replication_execute_status_by_worker
-----------------------------------------------------------------
FIELD NAME DATA TYPE
-----------------------------------------------------------------
WORKER_ID(PK) bigint
THREAD_ID bigint unsigned
CHANNEL_NAME New field, will be added by WL#1697
SERVICE_STATE enum('On','Off')
LAST_SEEN_TRANSACTION char(57)
LAST_ERROR_NUMBER uint
LAST_ERROR_MESSAGE varchar (MAX_SLAVE_ERRMSG)
LAST_ERROR_TIMESTAMP timestamp
=================================================================
4. REMOVED FIELDS:
==============
SHOW SLAVE STATUS has a number of fields that the replication team
expects to remove in the future. Below is a list of fields that were
part of the output of SHOW SLAVE STATUS but will not be found in the
P_S tables.
4.1 The following fields use filename and offset. We should rely only on
GTIDs instead.
Master_Log_File
Read_Master_Log_Pos
Relay_Log_File
Relay_Log_Pos
RELAY_MASTER_LOG_FILE
EXEC_MASTER_LOG_POS
UNTIL_CONDITION
Until_Log_File
Until_Log_Pos
4.2 The following fields use server_id. We should rely on server_uuid
instead.
MASTER_SERVER_ID
Replicate_Ignore_Server_Ids
4.3 "Skip_Counter" uses the event count. We should not show 'events' to
the user; we should show transactions, if anything. This option
already does not work with GTIDs.
4.4 SECONDS_BEHIND_MASTER is buggy.
4.5 MASTER_INFO_FILE refers to the master.info file, which is not very
useful now that we have crash-safe slave tables. The replication team
is planning to deprecate it in the future. Until it is deprecated, the
field, if needed, can be obtained from SHOW SLAVE STATUS. We ensure
that we will not remove SHOW SLAVE STATUS before we remove the
master.info file.
4.6 The following fields did not appear to be very useful in the first
place: they contain the last error raised by the IO thread *or* the
SQL thread, so if there is an error in both threads, one of them is
overwritten. Instead, we now have errors per thread.
LAST_ERROR
LAST_ERRNO
4.7 Probably, we can re-implement these legacy filters
as Python filters later. The idea is to come up with better filtering
logic.
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
4.8 In the future we could have multiple receiver threads, so
"Slave_IO_State" doesn't really fit in. We may add it later if
required. For now, the user can get that from the processlist anyway.
4.9 "Relay_Log_Space" will appear together with other fields like
BYTES_APPLIED, TRANSACTIONS_QUEUED, TRANSACTIONS_APPLIED etc in a table
for replication stats.
4.10 "Executed_Gtid_Set" can show a huge set with a lot of text. Instead
we have decided to show GTIDs of transactions that are currently being
executed by the appliers. The set of executed GTIDs can alternatively be
obtained by reading @@gtid_executed.
5. ADDED FIELDS:
============
5.1 Prior to this WL, errors were not shown per worker; the coordinator
thread was responsible for showing an error raised in a worker thread.
The P_S table in section 2.6, titled "replication_execute_status_by_worker",
contains one row per worker. The Last_Error_Number/Message/Timestamp
fields were added to report errors per worker.
5.2 LAST_SEEN_TRANSACTION: Added this field to the P_S table in section 2.6,
titled "replication_execute_status_by_worker". Below are a few scenarios
and the values seen in this column in the corresponding cases:
5.2.1 gtid-mode=off: ANONYMOUS (similar to gtid-next) indicating that
transactions do not have global identifiers, and are
identified by file and position only.
5.2.2 gtid-mode=on:
5.2.2.1 No transaction executed yet: ""(empty)
5.2.2.2 When a transaction is executed, this field is set as soon
as gtid-next is set. So, starting from this moment, this
field will always show a gtid.
5.2.2.3 The gtid shown in 5.2.2.2 is preserved until the next
transaction is executed. The value is also preserved in
case of an error, in which case the gtid of the transaction
being executed at the moment the error occurred is shown.
5.2.2.4 When the next gtid_log_event is picked up by this worker
thread this field is updated soon after gtid_next is set.
5.3 WORKER_ID is the unique identifier of a worker. On STOP SLAVE, the
thread_id field becomes NULL. The worker_id is preserved.
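The per-worker fields described in sections 5.1-5.3 can be monitored
together; a sketch (column names as defined in section 3.6, assuming a
multi-threaded slave):

```sql
-- One row per worker; WORKER_ID survives STOP SLAVE while
-- THREAD_ID becomes NULL.
SELECT WORKER_ID, THREAD_ID, SERVICE_STATE, LAST_SEEN_TRANSACTION
  FROM performance_schema.replication_execute_status_by_worker;
```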
6. MODIFICATIONS IN SHOW SLAVE STATUS(SSS):
==================================
Since this work is about an interface for monitoring replication health and
performance, the modification to the original interface (i.e., SSS) is kept
to a minimum. The only modified field in SSS is Last_SQL_Error.
6.1 Changes in Last_SQL_Error field in SSS:
--------------------------------------
6.1.1 There is NO change in SSS when the slave is working with MTS=off,
i.e., the SQL thread is the only applier.
6.1.2 When MTS=on, the Last_SQL_Error field in SSS shows exactly what
the Last_Error_Message field in the table
performance_schema.replication_execute_status_by_coordinator shows.
This field is modified to suggest that there may be more failures
in the other worker threads, which can be seen in the table
performance_schema.replication_execute_status_by_worker. This table
shows the per-worker error, if any. If this table is not available,
the slave error log can be used. The log or the table
performance_schema.replication_execute_status_by_worker should also
be used to learn more about the failure shown in the coordinator's
table or SSS.
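Following the description above, the per-worker failure behind an error
surfaced in SSS or the coordinator's table can be located with a query
like the following sketch:

```sql
-- Find workers that have recorded an error.
SELECT WORKER_ID, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE,
       LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_execute_status_by_worker
 WHERE LAST_ERROR_NUMBER <> 0;
```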
7. LIFE CYCLE OF PS TABLES:
========================
7.1 Prior to CHANGE MASTER, the PS tables are empty.
7.2.1 After CHANGE MASTER, the tables show the configuration
parameters. The THREAD_ID columns at this point show NULL, as
there are no active slave threads at this point in time. The
SERVICE_STATE columns at this point say "Off".
7.2.2 The table replication_execute_status_by_worker is an exception to the
rules mentioned in section 7.2.1. This table is supposed to be active
only when the slave is operating in multi-threaded mode. So, if the
variable "slave_parallel_workers" is not set, this table shows an
empty set. The number of rows in this table equals the number of
workers. Further, this table is filled for the first time at START SLAVE.
7.3 The thread_ids can be seen after START SLAVE. Service_State= "On" marks
an active thread.
7.4 The tables are preserved after STOP SLAVE or in case of threads dying
due to an error.
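The life cycle above can be sketched as a sequence of statements and the
expected table states (illustrative only; the host name is a placeholder
and exact values depend on the configuration):

```sql
-- 7.1: no CHANGE MASTER yet -> tables are empty.
SELECT * FROM performance_schema.replication_connection_configuration;

CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_PORT=3306;
-- 7.2.1: configuration visible; THREAD_ID is NULL, SERVICE_STATE='Off'.

START SLAVE;
-- 7.3: THREAD_ID populated; SERVICE_STATE='On' marks active threads.

STOP SLAVE;
-- 7.4: rows preserved; THREAD_ID reverts to NULL.
```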
1. FILES ADDED:
===========
For each of the six P_S tables, two files have been added.
1.1 replication_connection_configuration
-table_replication_connection_configuration.h
-table_replication_connection_configuration.cc
1.2 replication_connection_status
-table_replication_connection_status.h
-table_replication_connection_status.cc
1.3 replication_execute_configuration
-table_replication_execute_configuration.h
-table_replication_execute_configuration.cc
1.4 replication_execute_status
-table_replication_execute_status.h
-table_replication_execute_status.cc
1.5 replication_execute_status_by_coordinator
-table_replication_execute_status_by_coordinator.h
-table_replication_execute_status_by_coordinator.cc
1.6 replication_execute_status_by_worker
-table_replication_execute_status_by_worker.h
-table_replication_execute_status_by_worker.cc
2. CLASSES ADDED:
=============
Each ".h" file has a class declaration. The class declarations are
similar. So, the present documentation shows only one of them:
/** Table PERFORMANCE_SCHEMA.REPLICATION_CONNECTION_CONFIGURATION. */
class table_replication_connection_configuration: public
PFS_engine_table
{
private:
void fill_rows(Master_info *);
/** Table share lock. */
static THR_LOCK m_table_lock;
/** Fields definition. */
static TABLE_FIELD_DEF m_field_def;
/** Current row */
st_row_connect_config m_row;
/** True if the table is filled up */
bool m_filled;
/** Current position. */
PFS_simple_index m_pos;
/** Next position. */
PFS_simple_index m_next_pos;
protected:
/**
Read the current row values.
@param table Table handle
@param buf row buffer
@param fields Table fields
@param read_all true if all columns are read.
*/
virtual int read_row_values(TABLE *table,
unsigned char *buf,
Field **fields,
bool read_all);
table_replication_connection_configuration();
public:
~table_replication_connection_configuration();
/** Table share. */
static PFS_engine_table_share m_share;
static PFS_engine_table* create();
virtual int rnd_next();
virtual int rnd_pos(const void *pos);
virtual void reset_position(void);
};
3. Before this WL, the DYNAMIC_ARRAY of workers inside the class
Relay_log_info was destroyed at STOP SLAVE. This destruction is now
delayed until the next START SLAVE. The array is used to extract the
values displayed in the worker table if a SELECT query is fired on it
after STOP SLAVE. Thus, the statistics in the table are updated at
START SLAVE, except for the SERVICE_STATE and THREAD_ID columns.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.