WL#7742: Slave performance: Waiting for more transactions to enter Binlog Group Commit (BGC) queues
Affects: Server-5.7
—
Status: Complete
EXECUTIVE SUMMARY
=================

This worklog adds two new options that introduce an artificial delay into the
binary log group commit procedure. The delay gives more transactions a chance
to be flushed and synced to disk together, which reduces the overall time
spent committing a group of transactions (the bigger the groups, the fewer
sync operations are needed). With the correct tuning, this can make the slave
perform several times faster without compromising the master's throughput.

The options are named binlog-group-commit-sync-delay and
binlog-group-commit-sync-no-delay-count. The former takes the number of
microseconds to wait. The latter takes the number of transactions that the
server waits for before it aborts the wait.

REFERENCES
==========

- PB2 entry: http://pb2.no.oracle.com/web.py?template=show_pushes&branch=mysql-trunk-wl7742
- RB entry: http://rb.no.oracle.com/rb/r/4965/
Functional requirements:

F1: The new options SHALL be dynamically changeable.

F2: The new options SHALL have GLOBAL scope only. SESSION scope does not make
    sense for them.

F3: Zero SHALL be an allowed value of the new options. A zero value means
    that the corresponding feature is deactivated.

F4: binlog-group-commit-sync-delay does not interact directly with the
    existing binlog-max-flush-queue-time option.

F5: binlog-group-commit-sync-no-delay-count does not interact directly with
    the existing binlog-max-flush-queue-time option.

F6: If binlog-group-commit-sync-no-delay-count is zero, the server SHALL wait
    for the entire binlog-group-commit-sync-delay to elapse.

F7: If binlog-group-commit-sync-delay is zero, the server SHALL ignore
    binlog-group-commit-sync-no-delay-count.
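As an illustration of how F3, F6 and F7 combine, the following is a minimal
sketch that maps the two option values onto the resulting behavior. The names
WaitMode and classify_wait are hypothetical and are not part of the server
code; the sketch only restates the requirements above.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not taken from the server source.
enum class WaitMode
{
  Disabled,     // no artificial delay at all
  FullDelay,    // always wait for the whole delay
  DelayOrCount  // wait until the delay elapses or the count is reached
};

// Restates F3, F6 and F7 as a pure function of the two option values.
inline WaitMode classify_wait(std::uint64_t sync_delay_usec,
                              std::uint64_t no_delay_count)
{
  if (sync_delay_usec == 0)
    return WaitMode::Disabled;   // F3 and F7: the count is ignored
  if (no_delay_count == 0)
    return WaitMode::FullDelay;  // F6: the entire delay must elapse
  return WaitMode::DelayOrCount; // both set: whichever comes first
}
```

For example, classify_wait(0, 10) yields WaitMode::Disabled because a zero
delay makes the count irrelevant.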
1. INTRODUCTION
===============

This worklog presents two new options that allow the user to introduce a
delay in the binlog group commit pipeline, just before the server starts
processing the flush or sync queue.

Roughly, the binlog group commit (BGC) pipeline is divided into three stages:
FLUSH, SYNC and COMMIT. Each stage has a leader thread that handles the
procedure on behalf of the other threads, which are parked. Once all stages
are done, the leader thread notifies the parked threads that they have either
been committed successfully or ended in error.

- In the FLUSH stage, the leader thread keeps flushing incoming threads'
  caches until the queue of that stage becomes empty or
  binlog-max-flush-queue-time elapses.
- In the SYNC stage, the leader thread syncs the binary log file to disk.
- In the COMMIT stage, the leader thread commits transactions to the engine
  and notifies parked transactions that they should wake up and resume their
  after-commit operations and cleanup.

The idea behind this worklog is to introduce a delay in the FLUSH or SYNC
stage, before the leader thread actually starts processing the flush queue or
syncs the binary log file. The exact place to put the wait needs to be
determined by running some tests.

With an additional wait by the leader thread, there is a chance that more
sessions gather on the flush queue and thus become eligible to be flushed,
synced and committed as part of the same, bigger group commit. This should
increase performance on the master in some cases, and also on the
multi-threaded applier on the slave side. Why? Because the wait also
increases the chance that more transactions prepare at the same logical point
in time, more transactions become eligible to be scheduled in parallel on the
slave.

2. NEW OPTIONS
==============

A new CLI option and system variable is introduced to configure the delay:

  NAME:    binlog-group-commit-sync-delay
  SCOPE:   GLOBAL
  DEFAULT: 0
  RANGE:   0 - 1000000 (usec)
  DYNAMIC: Yes

A new CLI option and system variable is introduced to configure the maximum
number of sessions to wait for before exiting the waiting procedure:

  NAME:    binlog-group-commit-sync-no-delay-count
  SCOPE:   GLOBAL
  DEFAULT: 0
  RANGE:   0 - 100000 (max connections)
  DYNAMIC: Yes

3. BEHAVIOR OF binlog-group-commit-sync-delay
=============================================

This variable controls whether the leader thread for the FLUSH or SYNC stage
of the binlog group commit waits before it actually processes the stage queue
or syncs the binary log file. The wait gives more follower threads a chance
to reach the flush queue. It amounts to a sleep operation before the queue is
processed or before the binary log file is synced.

4. BEHAVIOR OF binlog-group-commit-sync-no-delay-count
======================================================

This variable sets the maximum number of sessions to wait for before aborting
the current wait procedure, as specified by binlog-group-commit-sync-delay.
Setting this option means that the server exits the wait procedure as soon as
the number of queued sessions reaches the given count, even if the timeout
has not yet elapsed. If binlog-group-commit-sync-delay is set to 0, this
option is a no-op.

5. INTERACTION BETWEEN binlog-group-commit-sync-delay and binlog-max-flush-queue-time
====================================================

binlog-group-commit-sync-delay is not evaluated while the flush queue is
being processed; therefore, it does not interact with
binlog-max-flush-queue-time. The behavior of binlog-max-flush-queue-time
remains intact.

6. INTERACTION BETWEEN binlog-group-commit-sync-no-delay-count and binlog-max-flush-queue-time
==================================================================

binlog-group-commit-sync-no-delay-count is not evaluated while the flush
queue is being processed; therefore, it does not interact with
binlog-max-flush-queue-time. The behavior of binlog-max-flush-queue-time
remains intact.
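To make the interplay described in sections 3 and 4 concrete, here is a
minimal sketch of an idealized model that computes how long the leader would
wait for a given pattern of session arrivals. It is an assumption-laden
illustration, not server code: the function name wait_duration_usec and its
parameters are hypothetical, arrivals are assumed to be sorted, and the model
ignores any polling granularity of the actual wait.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Idealized model of the leader's wait (hypothetical helper, not server code).
//
// arrivals_usec[i] is the time, measured from the moment the leader starts
// waiting, at which the i-th additional session joins the stage queue.
// The vector is assumed to be sorted in ascending order.
// initial_queue_size is the number of sessions already queued at that moment.
inline std::uint64_t wait_duration_usec(
    const std::vector<std::uint64_t> &arrivals_usec,
    std::size_t initial_queue_size,
    std::uint64_t delay_usec,
    std::size_t no_delay_count)
{
  if (delay_usec == 0)
    return 0;                  // feature deactivated, count is a no-op
  if (no_delay_count == 0)
    return delay_usec;         // no count limit: wait for the whole delay
  if (initial_queue_size >= no_delay_count)
    return 0;                  // enough sessions are already queued

  const std::size_t needed= no_delay_count - initial_queue_size;
  if (arrivals_usec.size() < needed)
    return delay_usec;         // the count is never reached

  // The wait ends when the count is reached or the delay elapses,
  // whichever happens first.
  const std::uint64_t reached_at= arrivals_usec[needed - 1];
  return reached_at < delay_usec ? reached_at : delay_usec;
}
```

With one session already queued, arrivals at 100, 200, 300 and 900 usec, a
delay of 1000 usec and a count of 3, the model ends the wait at 200 usec,
when the third session arrives. With a count of 0 it waits the full 1000
usec.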
The following details the major low-level changes needed to implement the
design blocks described in the previous sections.

1. Introduce a waiting mechanism on the FLUSH or SYNC stage so that the
   leader thread waits according to the value of
   binlog-group-commit-sync-delay. Performance tests will dictate the best
   place to put this wait.

   DECISION: Given that basic tests did not show a difference between the two
   approaches, let's wait before SYNCing. If some difference is found later,
   we can still move the wait point to before the flush procedure.

   The change is similar to the following:

---------------------------------------------------------------------------
                           thd->thread_id, thd->commit_error));
     DBUG_RETURN(finish_commit(thd));
   }
+
+  /* Shall introduce a delay. */
+  stage_manager.wait_count_or_timeout(
+    opt_binlog_group_commit_sync_no_delay_count,
+    opt_binlog_group_commit_sync_delay,
+    Stage_manager::SYNC_STAGE);
+
   THD *final_queue= stage_manager.fetch_queue_for(Stage_manager::SYNC_STAGE);
   if (flush_error == 0 && total_bytes > 0)
   {

and

+  time_t wait_count_or_timeout(ulong count, time_t usec, StageID stage)
+  {
+    time_t to_wait= usec;
+    time_t delta= static_cast<time_t>(to_wait * 0.1);
+    while ((static_cast<ulong>(m_queue[stage].get_size()) < count ||
+            count == 0) && to_wait > 0)
+    {
+      my_sleep(delta);
+      to_wait -= delta;
+    }
+
+    return to_wait;
+  }
---------------------------------------------------------------------------

2. Introduce code for the new variables, similar to the following:

+static Sys_var_ulong Sys_binlog_group_commit_sync_delay(
+       "binlog_group_commit_sync_delay",
+       "The number of microseconds the server waits for the "
+       "binary log group commit sync queue to fill before "
+       "continuing. Default: 0. Min: 0. Max: 1000000.",
+       GLOBAL_VAR(opt_binlog_group_commit_sync_delay),
+       CMD_LINE(REQUIRED_ARG),
+       VALID_RANGE(0, 1000000 /* max 1 sec */), DEFAULT(0), BLOCK_SIZE(1),
+       NO_MUTEX_GUARD, NOT_IN_BINLOG);
+
+static Sys_var_ulong Sys_binlog_group_commit_sync_no_delay_count(
+       "binlog_group_commit_sync_no_delay_count",
+       "If there are this many transactions in the commit sync "
+       "queue and the server is waiting for more transactions "
+       "to be enqueued (as set using --binlog-group-commit-sync-delay), "
+       "the commit procedure resumes.",
+       GLOBAL_VAR(opt_binlog_group_commit_sync_no_delay_count),
+       CMD_LINE(REQUIRED_ARG),
+       VALID_RANGE(0, 100000 /* max connections */),
+       DEFAULT(0), BLOCK_SIZE(1),
+       NO_MUTEX_GUARD, NOT_IN_BINLOG);
+
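The polling wait from item 1 can also be sketched as a stand-alone function
that is easy to exercise outside the server. This is only an illustration
under stated assumptions: wait_count_or_timeout_sketch is a hypothetical
name, and the queue-size and sleep callbacks stand in for the stage queue and
my_sleep. The sketch deviates from the diff in two ways. It enforces a
minimum polling step of 1 usec, because a step of 10% of a delay below 10
usec truncates to zero and the loop would then never make progress. It also
clamps the returned remainder at zero.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Stand-alone sketch of the polling wait; all names are hypothetical.
//
// count      : stop waiting once this many sessions are queued (0 = no limit)
// usec       : maximum time to wait, in microseconds
// queue_size : returns the current number of queued sessions
// sleep_usec : blocks for the given number of microseconds; injected so that
//              the loop can be driven by a fake clock in tests
//
// Returns the part of the delay that was left unused (never negative).
inline std::int64_t wait_count_or_timeout_sketch(
    std::uint64_t count, std::int64_t usec,
    const std::function<std::uint64_t()> &queue_size,
    const std::function<void(std::int64_t)> &sleep_usec)
{
  std::int64_t to_wait= usec;

  // Poll in steps of 10% of the delay, but never less than 1 usec.
  const std::int64_t delta= std::max<std::int64_t>(1, to_wait / 10);

  while (to_wait > 0 && (count == 0 || queue_size() < count))
  {
    sleep_usec(delta);
    to_wait-= delta;
  }

  // The last step may overshoot the delay, so clamp the remainder at zero.
  return std::max<std::int64_t>(0, to_wait);
}
```

Because the queue is only checked once per step, the wait can end up to one
step (10% of the delay) later than the moment the count is actually reached.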