WL#5754: Query event parallel execution
Affects: Server-Prototype Only — Status: Complete — Priority: Medium
MOTIVATION ---------- This worklog keeps track of the changes that are needed to make parallel INTER-transaction applier at the slave when master is configured to log in STATEMENT format. In a nutshell, on the master side, it details some of the changes needed to log every database on involved in the execution of a statement. On the slave side, it addresses some work that is needed for supporting temporary tables when slave is operating in parallel mode. OBJECTIVE --------- This is a sibling of the base framework of WL#5569 implementing full and effective parallelization for Query events. FEATURES -------- Transparency to the engine type; Query content is without any limit in particular can contain User variables and/or modify a temporary table; Parallelization is by the actual db name; Cross-database updates are supported as well; Automatic back-and-forth switching from PARALLEL to SEQUENTIAL in case of a partitioning conflict such as a being distributed to execute query updates a database that is in use by another Worker thread.
================================================================================ High-Level Specification ================================================================================ 0. Parallelization for Query to follow the same semantics, rules - recovery included - and controlled by the user interface by WL#5569; 1. Query can include arbitrary features: call routines, select from or update a temporary table, refer to user variables etc; 2. Transparent detection of partitioning conflicts and fall back to the sequential execution in such cases; 3. To prove scaling-up performance close to the linear in certain use cases. An outstanding difference of the Query event from the Rows event bundle is that the Query event can update multiple databases and does not have explicit names anywhere in the event header. Lack of this info is being tackled by introducing and exploiting a new Status variable (Updated-db:s) of the Query-log-event class. To-be-updated database names, including ones from involved stored routines, are all known at the tables locking time and are gathered in a list that is eventually copied into Updated-db:s new status variable at logging into the binary log. While building-up support for the User, Random and Int- vars is rather straightforward task requiring only to hold on with assigning the events until their parent query has shown up, the temporary tables case is a special one. For the temp tables Coordinator and Workers continue to use the current placeholder of all replicated slave tables but pieces of code that modify the placeholder are converted into critical sections guarded by a new mutex.
I. Database names propagation and handling on the slave side Coordinator does not sql-parse a Query-log-event but rather relies on the info prepared on the master side. Gathering the list of updated databases spawns into few cases: 1. DML THD::decide_logging_format(tables) 2. CREATE [temporary] table mysql_create_table() 3. DROP table(s) mysql_rm_table_no_locks() 4. CREATE database mysql_create_db() 5. DROP database mysql_rm_db() 6. CREATE sf, sp sp_create_routine() 7. DROP sf, sp sp_drop_routine() 8. CREATE|DROP trigger mysql_create_or_drop_trigger() 9. ALTER, CREATE|DROP index mysql_alter_table() 10. CREATE, DROP event 11. RENAME table(s) 12. ALTER|CREATE view, DROP VIEW(s) 13. GRANT, REVOKE (notice these two can affect multiple databases). Notice that the rest of DDL queries including ALTER DATABASE, ALTER PROCEDURE, ALTER EVENT or dealing with index, tablespaces or the query logs do not change any database. In the 1st DML case all write-level locked tables' databases are of interest. In the rest of cases the database names are found in an individual context. The list of the updated databases is stored into a new thd's member to be consulted at Query_log_event::Query_log_event() constructor to produce the value of Updated-db:s stored into the Query-log-event's header along with other status. Because the status area requires to be of a limited size, the Updated-db is subject to a maximum number of database that it can hold. In case the number of databases exceeds it the list is not stored and the event is applied sequentially on the slave. The runtime maximum is defined as MAX_DBS_IN_QUERY_MTS. It can be modified via cmake-configure and is subject to a constraint: MAX_DBS_IN_QUERY_MTS < OVER_MAX_DBS_IN_QUERY_MTS == 254. The latter constant is not supposed for changes. Its value is set into the status var db number part in the mentioned case of the actual number is over MAX_DBS_IN_QUERY_MTS. Notice, the slave compares *its* MAX_DBS_IN_QUERY_MTS against the number in the event and if the latter is greater the event is handled sequentially. The sequential applying is doomed to the Query events from the Old Master or events that don't have the new status variable set at all. At instantiation on the slave the event contains the list of databases and the core of WL#5569 distribution algorithm is applied. Log_event::mts_get_dbs() new methods returns the list of databases to a caller inside of Slave_worker* Log_event::get_slave_worker_id() to be looped though to invoke get_slave_worker() per each db. In case a db is occupied the function hangs until the db got released. The latter is nohow different from waiting a db of a new query of a being scheduled transaction. Notice, a conservative estimate for the total size of a replication event through @@max_allow_packet and MAX_LOG_EVENT_HEADER becomes lesser precise because MAX_LOG_EVENT_HEADER is incremented by maximum of the new status variable. The increment amount is proportional to the maximum number of databases that the status var can hold and exceeds 1K (de-facto the unit of measure of @@max_allow_packet) in the default case of MAX_DBS_IN_QUERY_MTS == 16. II. Temporary tables On the slave side Query-log-event is regarded as a parent of possible satellite events that can prepare the parent in the binlog. Logics of scheduling such mini-group of events is similar to one for the Table-map + Rows-log-event case implemented in WL#5569. The temporary table case in a Query-log event is handled as the following. Coordinator's sql thread thd->temporary_tables remains to be the placeholder of all replicated temporary tables. Checking out in the list, inserting into and removal from the list is done by any of Workers and also by Coordinator thanks to deploying a mutex that is acquired exclusively by slave threads. The ordinary connection threads don't grab the mutex. The following cases represent all access possibilities for the slave threads: Insert: open_table_uncached() initial population Look-up: open_table() - look-up for a temporary table Release: mark_used_tables_as_free_for_reuse() DROP: find_temporary_table() Removal: close_temporary_tables() any access to thd->temporary_tables should 1. make sure `thd' is the Coordinator's thd 2. surround an access codes with lock/unlock of rli->mts_temp_tables_lock
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.