WL#8599: Reduce contention in IO and SQL threads
Affects: Server-8.0 — Status: Complete
With 5.7 MTS the replication slave is able to execute more transactions than the master, even on write-only workloads, as long as the SQL thread is running without the IO thread. If the IO thread is active reading the relay log from the master the overhead can reach 60%, which eats up a lot of the benefit from a more efficient MTS scheduler. This seems to be caused by contention on the MYSQL_RELAY_LOG::LOCK_log.

The main goal (G1) of this worklog is to reduce the contention between receiver (a.k.a. I/O) and applier (a.k.a. SQL) replication threads.

Other References
----------------
BUG#77778: TUNING THE LOG_LOCK CONTENTION FOR IO_THREAD AND SQL_THREAD IN RPL_SLAVE.CC
Functional requirements
-----------------------
F-1: Receiver (a.k.a. I/O) thread should not block read access to the relay log while queuing events.
F-2: Applier (a.k.a. SQL) thread should not block write access to the relay log while reading events.

Non-Functional requirements
---------------------------
NF-1: The performance of the replication threads should not be worse than before this worklog implementation.
NF-2: The overall replication performance (particularly the time to catch up) should be improved, allowing the slave to have greater TPS than the master when using MTS.
Interface Specification
-----------------------
I-1: No new files
I-2: No changes in existing syntax
I-3: No new commands
I-4: No new tools
I-5: No impact on existing functionality

Background
----------
- The relay log has multiple writers:
  - zero, one, or more clients may issue FLUSH RELAY LOGS;
  - one applier (a.k.a. SQL) thread (purging old logs);
  - one receiver (a.k.a. I/O) thread;
- The relay log has multiple readers:
  - zero, one, or more applier threads;
  - zero, one, or more clients may issue SHOW RELAYLOG EVENTS;
- Master_info::description_event has one writer:
  - one receiver thread;
- Master_info::description_event has multiple readers:
  - zero, one, or more clients may issue FLUSH RELAY LOGS;
  - one receiver thread reads it in many places;
- Other Master_info member fields have multiple writers:
  - zero, one, or more clients may issue FLUSH RELAY LOGS;
  - one receiver thread changes them in two places:
    - handle_slave_io(), when connecting to the master or stopping;
    - queue_event(), when queuing new events to the relay log;
- Other Master_info member fields have multiple readers:
  - zero, one, or more applier threads;
  - zero, one, or more clients may issue SHOW SLAVE STATUS;
  - zero, one, or more clients may query replication P_S tables;
- Currently:
  - All relay log operations are serialized by LOCK_log;
  - All access to Master_info member fields (including description_event) is serialized by Master_info::data_lock;
  - Locking order is: 1. data_lock, 2. LOCK_log;

High Level Specification
------------------------
P1) Contention between receiver and applier threads in the "hot log"

A slave server whose applier thread is applying events from the same relay log file to which the receiver thread is appending events received from the master suffers contention that degrades the performance of both threads compared with running either of them alone. According to the P_S data collected, the main issue is contention on relay_log->LOCK_log (P1a).
The next_event() function acquires the relay log LOCK_log when the applier is reading from the "hot log", to prevent other threads from rotating the relay log (and closing the current relay log file) while the applier is in the middle of the "read_event" operation. As the receiver thread also acquires the relay log LOCK_log to write content to the relay log, the use of this lock by next_event() causes contention when the slave is almost caught up (the applier is reading events from the tail of the relay log, and the receiver is writing events to the same relay log file).

A second issue (P1b) is contention related to the GTID implementation, on "wait/synch/rwlock/sql/gtid_commit_rollback". For each Gtid_log_event or Anonymous_gtid_log_event received, the I/O thread acquires the global SID lock to check whether the current server GTID_MODE is correct for the event received. For a Gtid_log_event, it takes the global SID lock opportunistically, because it will need to add the received GTID to the retrieved GTID set, which is based on the global SID map. For an Anonymous_gtid_log_event, it also takes the global SID lock, because it calls the get_gtid_mode() function specifying no lock. The applier thread also verifies the GTID_MODE when applying those events.

A third issue (P1c), affecting only the receiver thread, is that it takes mi->data_lock and rli->LOCK_log twice per event received: once when queuing the event (writing it to the relay log), and once when flushing master info after successfully queuing the event. As mi->data_lock is also taken by monitoring threads (using either SHOW SLAVE STATUS or the replication_connection_status P_S table), monitoring the receiver thread status can also affect the performance of the receiver thread.
Rationale and proposed implementation steps
-------------------------------------------
The first step (S1) this worklog will implement is to use the same approach implemented by the Binlog_sender (WL#5721) to dump binary log contents to slave servers in MySQL 5.7. Instead of using a shared IO_CACHE, the Binlog_sender has its own IO_CACHE, used only to read events from the binary log. This step should solve P1a.

The second step (S2) to be implemented is to make the receiver thread retrieved GTID sets have their own SID map/lock. Most of the time there is no relation between the retrieved GTID sets and the server GTID state (both are used together only when connecting the receiver thread to a master server, by the GTID auto positioning protocol). Also, this step should implement a way of checking the current server GTID_MODE without always relying on a given lock. This step should solve P1b.

The third step (S3) to be implemented is to make the receiver thread flush master info inside the queue_event() function, while holding mi->data_lock and relay_log->LOCK_log. This should solve P1c.

Rationale behind step 2
-----------------------
Thinking about the impact on replication threads when GTID_MODE=ON, we have:

Each receiver thread needs to do three GTID related operations per GTID transaction:
R1) Check if the GTID_MODE allows the GTID type (rdlock);
R2) Store the queued GTID to be used later (rdlock, but it may be upgraded to wrlock and downgraded again for new UUIDs);
R3) Add the queued GTID to the channel RETRIEVED_GTID_SET (rdlock);
Notice that R1 and R2 are done while holding the same lock.
Each applier (or worker) thread needs to do three GTID related operations per GTID transaction:
W1) Check if the GTID_MODE allows the GTID type (rdlock);
W2) Acquire the ownership of the GTID (rdlock, but it may be upgraded to wrlock and downgraded again for new UUIDs);
W3) Add the GTID to GTID_EXECUTED when committed (rdlock);
Notice that W1 and W2 are done while holding the same lock.
Notice that W3 might be done once for many threads committing in a group.

Each binlog sender thread has to do one GTID related operation per GTID event sent to a receiver:
S1) Check if the GTID_MODE allows the GTID type (rdlock);

Any query on the performance_schema.replication_connection_status table will:
M1) Dump the RETRIEVED_GTID_SET for receiver threads (wrlock);

Any monitoring thread executing SHOW SLAVE STATUS will:
M2) Dump the RETRIEVED_GTID_SET for receiver threads (wrlock);
M3) Dump GTID_EXECUTED (wrlock);
Notice that M2 and M3 are done while holding the same lock.

The proposal will make R1, R2, R3, M1 and M2 rely on a channel specific lock instead of relying on the global SID lock.
Currently:

Any channel doing R1/R2/R3 is blocking:
B1.1) any thread doing M1 on same channel;
B1.2) any thread doing M1 on other channels;
B1.3) any thread doing M2/M3;

Any channel worker doing W1/W2/W3 is blocking:
B2.1) any thread doing M1 on any channel;
B2.2) any thread doing M2/M3;

Any monitoring statement doing M1 is blocking:
B3.1) same channel from doing R1/R2/R3;
B3.2) any other channel from doing R1/R2/R3;
B3.3) any worker from doing W1/W2/W3;
B3.4) any binlog sender from doing S1;
B3.5) any other thread doing M1 on same channel;
B3.6) any other thread doing M1 on other channels;
B3.7) any thread doing M2/M3;

Any monitoring statement doing M2/M3 is blocking:
B4.1) any channel from doing R1/R2/R3;
B4.2) any worker from doing W1/W2/W3;
B4.3) any binlog sender from doing S1;
B4.4) any thread doing M1;
B4.5) any other thread doing M2/M3;

The proposed design removes the blocking in cases B1.2, B2.1, B3.2, B3.3, B3.4 and B3.6, so the operations involved can run concurrently.
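
The blocking cases above can be checked against a small lock model. The following C++ sketch is illustrative only and is not server code: LockPlanSketch, HoldSketch and blocks() are hypothetical names, and the model assumes that SHOW SLAVE STATUS (M2/M3) holds the global SID lock together with every channel lock, and that workers and binlog senders keep using the global SID lock in read mode.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ModeSketch { Read, Write };

// One lock held by an operation, and the mode it is held in.
struct HoldSketch {
  std::string lock;
  ModeSketch mode;
};
using OpSketch = std::vector<HoldSketch>;

// Two operations block each other when they share a lock and at least one
// of them holds it in write mode.
bool blocks(const OpSketch &a, const OpSketch &b) {
  for (const HoldSketch &x : a)
    for (const HoldSketch &y : b)
      if (x.lock == y.lock &&
          (x.mode == ModeSketch::Write || y.mode == ModeSketch::Write))
        return true;
  return false;
}

// Which locks each operation takes, before (false) and after (true) the
// retrieved GTID sets get their own per-channel SID lock.
struct LockPlanSketch {
  bool per_channel_sid_lock;

  std::string channel_lock(int channel) const {
    return per_channel_sid_lock ? "channel" + std::to_string(channel)
                                : std::string("global");
  }
  // R1/R2/R3: receiver thread of one channel.
  OpSketch receiver(int channel) const {
    return {{channel_lock(channel), ModeSketch::Read}};
  }
  // W1/W2/W3: applier worker.
  OpSketch worker() const { return {{"global", ModeSketch::Read}}; }
  // S1: binlog sender.
  OpSketch sender() const { return {{"global", ModeSketch::Read}}; }
  // M1: replication_connection_status query for one channel.
  OpSketch dump_retrieved(int channel) const {
    return {{channel_lock(channel), ModeSketch::Write}};
  }
  // M2/M3: SHOW SLAVE STATUS over channels 0..channels-1 (assumed to hold
  // the global lock and all channel locks at once).
  OpSketch show_slave_status(int channels) const {
    OpSketch op{{"global", ModeSketch::Write}};
    for (int ch = 0; ch < channels; ++ch)
      op.push_back({channel_lock(ch), ModeSketch::Write});
    return op;
  }
};
```

Under these assumptions, `blocks(plan.receiver(1), plan.dump_retrieved(2))` is true for `LockPlanSketch{false}` and false for `LockPlanSketch{true}`, which corresponds to case B1.2, while same-channel cases such as B1.1 still block in both plans.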
Each proposed step in the HLS is largely independent of (or incremental to) the other steps. Because of this we intend to split the code review of this worklog into 5 reviews (one per proposed step).

Step 1: Use a separate IO_CACHE on SQL thread always
----------------------------------------------------
The infrastructure for binlog_end_pos handling is already in place, so there is no need for a new variable to determine the relay log "limit" for readers. Most of the changes will be in the next_event() function (sql/rpl_slave.cc) and in Relay_log_info::init_relay_log_pos().

Step 2: Make retrieved GTID sets to use their own SID map/lock
--------------------------------------------------------------
We will create a new PSI_rwlock_key, key_rwlock_receiver_sid_lock, to be used by the SID maps of the receiver threads.

Replace the global_sid_lock by the retrieved_gtid_set sid_lock in every place that deals with the retrieved GTID set:
- SHOW SLAVE STATUS;
- replication_connection_status P_S table;
- MYSQL_BIN_LOG::init_gtid_sets();
- MYSQL_BIN_LOG::open_binlog();
- MYSQL_BIN_LOG::reset_logs();
- MYSQL_BIN_LOG::after_append_to_relay_log();
- Previous_gtids_log_event::Previous_gtids_log_event();
- Relay_log_info::add_logged_gtid();
- Relay_log_info::purge_relay_logs();
- recover_relay_log();
- request_dump();
- queue_event();
- Group Replication/Service Channel Interface;

In order to avoid acquiring the global SID lock to check GTID_MODE, and as this variable doesn't change often, we will introduce a global (atomic) counter of how many times the GTID_MODE was changed since server startup. We will also introduce the Gtid_mode_copy class to hold a copy of the last GTID_MODE, which is returned without acquiring any lock as long as the local GTID mode counter has the same value as the global atomic counter.
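
The counter-based GTID_MODE check can be pictured with the following minimal C++ sketch. It is illustrative only: GlobalGtidModeSketch, GtidModeCopySketch and their members are hypothetical names, a std::mutex stands in for the global SID lock, and the real Gtid_mode_copy class may be organized differently.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the server's GTID_MODE values.
enum class GtidModeSketch { OFF, OFF_PERMISSIVE, ON_PERMISSIVE, ON };

// Global state: the mode, the lock protecting it, and a change counter.
class GlobalGtidModeSketch {
 public:
  // Rare path: change the mode under the lock, then bump the counter.
  void set(GtidModeSketch new_mode) {
    std::lock_guard<std::mutex> guard(lock_);
    mode_ = new_mode;
    change_counter_.fetch_add(1, std::memory_order_release);
  }

  // Slow path for readers: fetch mode and counter together under the lock.
  void read_locked(GtidModeSketch *mode, unsigned long *counter) {
    std::lock_guard<std::mutex> guard(lock_);
    *mode = mode_;
    *counter = change_counter_.load(std::memory_order_relaxed);
  }

  // Fast path for readers: a single atomic load, no lock.
  unsigned long change_counter() const {
    return change_counter_.load(std::memory_order_acquire);
  }

 private:
  std::mutex lock_;  // stands in for the global SID lock (assumption)
  GtidModeSketch mode_ = GtidModeSketch::OFF;
  std::atomic<unsigned long> change_counter_{0};
};

// Per-thread copy: takes the lock only when the global counter has moved.
class GtidModeCopySketch {
 public:
  explicit GtidModeCopySketch(GlobalGtidModeSketch &global) : global_(global) {
    refresh();
  }

  GtidModeSketch get() {
    if (global_.change_counter() != local_counter_) refresh();
    return mode_;
  }

  // Number of times the slow (locked) path was taken; for illustration.
  int refresh_count() const { return refresh_count_; }

 private:
  void refresh() {
    global_.read_locked(&mode_, &local_counter_);
    ++refresh_count_;
  }

  GlobalGtidModeSketch &global_;
  GtidModeSketch mode_ = GtidModeSketch::OFF;
  unsigned long local_counter_ = 0;
  int refresh_count_ = 0;
};
```

In this sketch each replication thread would own one GtidModeCopySketch, so repeated get() calls cost one atomic load each until the mode is changed, at which point the next get() takes the lock once to refresh the copy.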
Step 3: Optimize mi->data_lock/relay_log->LOCK_log acquires per event
---------------------------------------------------------------------
This should only affect the queue_event() and handle_slave_io() functions.

The main idea is to add a new parameter to flush_master_info() to tell it not to acquire locks (asserting that the locks are already taken). Then, move the flush_master_info() call from handle_slave_io() to queue_event(), to a place where the locks are already acquired.

Finally, we need to make queue_event() return two possible errors: error while queuing, and error while flushing info. The return code will be handled by the caller, which throws the error according to the failure.
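
The shape of this change can be sketched in C++ as follows. This is a simplified model under stated assumptions, not the server code: the `_sketch` functions, MasterInfoSketch, TrackedMutexSketch, QueueResultSketch and the fail_* switches are all hypothetical, and the real functions take different parameters.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Mutex wrapper that records whether it is held and how often it was taken.
class TrackedMutexSketch {
 public:
  void lock() {
    mutex_.lock();
    held_.store(true);
    ++acquisitions_;
  }
  void unlock() {
    held_.store(false);
    mutex_.unlock();
  }
  bool is_held() const { return held_.load(); }
  int acquisitions() const { return acquisitions_; }

 private:
  std::mutex mutex_;
  std::atomic<bool> held_{false};
  int acquisitions_ = 0;
};

// The two distinct failures the caller has to tell apart.
enum class QueueResultSketch { OK, QUEUE_ERROR, FLUSH_ERROR };

struct MasterInfoSketch {
  TrackedMutexSketch data_lock;  // models mi->data_lock
  TrackedMutexSketch log_lock;   // models relay_log->LOCK_log
  std::string relay_log;         // models the relay log contents
  int flush_count = 0;
  bool fail_queue = false;       // test switches, purely illustrative
  bool fail_flush = false;
};

// Returns true on success. With need_lock == false the caller must already
// hold both locks, which is only asserted here.
bool flush_master_info_sketch(MasterInfoSketch &mi, bool need_lock) {
  if (need_lock) {
    std::lock_guard<TrackedMutexSketch> data_guard(mi.data_lock);
    std::lock_guard<TrackedMutexSketch> log_guard(mi.log_lock);
    if (mi.fail_flush) return false;
    ++mi.flush_count;
    return true;
  }
  assert(mi.data_lock.is_held() && mi.log_lock.is_held());
  if (mi.fail_flush) return false;
  ++mi.flush_count;
  return true;
}

// Queues one event and flushes master info under a single lock acquisition.
QueueResultSketch queue_event_sketch(MasterInfoSketch &mi,
                                     const std::string &event) {
  // Locking order: 1. data_lock, 2. LOCK_log.
  std::lock_guard<TrackedMutexSketch> data_guard(mi.data_lock);
  std::lock_guard<TrackedMutexSketch> log_guard(mi.log_lock);
  if (mi.fail_queue) return QueueResultSketch::QUEUE_ERROR;
  mi.relay_log += event;
  if (!flush_master_info_sketch(mi, /*need_lock=*/false))
    return QueueResultSketch::FLUSH_ERROR;
  return QueueResultSketch::OK;
}
```

In this model each successfully queued event takes data_lock and log_lock once instead of twice, and the caller can map QUEUE_ERROR and FLUSH_ERROR to different error reports.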
Copyright (c) 2000, 2023, Oracle Corporation and/or its affiliates. All rights reserved.