WL#6965: InnoDB: Truncate UNDO logs/tablespace
Status: Complete
Let me first clarify difference between TRUNCATE and SHRINK. TRUNCATE: - Truncating undo tablespace to make it equivaluent to what it was when it was created new. The physical size of undo tablespace file will be reset to its default initially configured size and all the undo logs within the tablespace will be removed SHRINKING: - Shrinking undo tablespace to reduce the file size. (New size of undo tablespace can be anything between least possible size and current size. The undo records up to the shrink size are maintained and are valid. This WL aims at truncating (and not shrinkng) of UNDO tablespace. UNDO logs and UNDO tablespaces are used interchangebly. ----------------- When transaction performs any DML activity it would result in generation of UNDO logs. Once transaction is committed then INSERT generated UNDO logs are cleared immediately but UPDATE/DELETE generated UNDO logs are kept to service MVCC. Purge tracks the limit and once it identifies that the said UNDO log entries are no more needed it removes it from the UNDO logs. Removal here means pages related to these entries can be marked as free and ready-for-reallocation. This activity doesn't result in reducing the size of UNDO logs/tablespace. On a heavily loaded system even though we re-use the UNDO log pages. In order to implement MVCC (and purge) new pages areadded to the UNDO logs at a higher rate than purge can truncate (remove). This can result in an increase in the undo log file size and can consume a significant size on disk. WL will try to address this problem by periodically truncating undo logs/tablespace. Truncate will be done online without affecting server operation While the undo tablespace is taken offline for truncate it would not be available for attaching to new transactions and so the total transaction handling capacity of the server will reduced till TRUNCATE is complete. The truncate of the UNDO tablespace is a millisecond operation and therefore no significant user impact expected. Note: Other UNDO tablespaces will be available to service transaction so it will not stall the server except we can see some slowness or delayed allocation. One of the tablespace will be selected for truncate and will be made inactive for that time period. During this time-period there should be another UNDO tablespace to service the new upcoming transactions. Requirement for UNDO tablespace truncate: - There should be at-least 2 UNDO tablespaces and innobase_undo_logs should be configured accordingly so that there are enough rsegs. - Server should be configured to use UNDO tablespace (if used with system tablespace it will never be truncated.)
- Functional Requirement: - Server should be started with at-least 2 UNDO tablespaces and innodb_undo_logs set to a value such that there are at-least 2 REDO rollback segments. (--innodb_undo_tablespace=2 --innodb_undo_logs=35) UNDO log count of 35 suggest following 1st rseg resides in system tablespace that is 0 2nd - 33rd rseg resides in system temporary tablespace (these are non-redo rsegs) 34th and 35th rsegs resides in undo tablespaces (normally wih space-ids 1, 2 .... etc) - Server should keep periodic watch on size of UNDO tablespace. If size of UNDO tablespace > threshold then Server should kick-start TRUNCATE. If tablespace is made inactive for TRUNCATE then number of transactions, server can support, will reduce. Based on the load the threshold should be configured taking into consideraion above criteria of reduced transaction handling. Note: threshold is configurable. Check HLS for more details. - On truncate, size of tablespace should be resorted to what it was while was created. (NOTE: Verification of this could be bit tricky if tablespace get re-used before user go to the needed directory to check the size.) - Till date, purge of rsegs use to happen once in 128 times of purge action invocation. With this WL we would introduce a variable innodb_purge_rseg_truncate_frequency that can help control how frequenty this action should take place. Permissible Vaue: 1 - 128. - Complete undo-truncate action is controlled using a flag name innodb_undo_log_truncate (on/off or 0/1). Default it is turned-off. Adminsitrator can trun it on based on workload. Only when it is on UNDO-Tablespace will be marked for truncate. - Non-Functional Requirement: - TRUNCATE shouldn't affect MVCC correctness.
Understanding tablespace-rseg-transaction relationship ====================================================== Let's first understand how these 3 components are inter-related. To start with, scope of the WL is limited only if undo logs resides in undo tablespace (not residing in system tablespace) and so discussion will only consider related scenario. - User can specify undo tablespaces/logs to create using --innodb_undo_tablespaces=n --innodb_undo_logs=m - If m > n (m >= 34) then from rsegs from (34..to.. m) are allocated tablespaces in round-robbin fashion. (Note: rseg id start with 0 indices) 0 -> rseg resides always in system tablespace 1..32 -> rseg resides in shared temporary tablespace (ibtmp1) 33 -> undo-tablespace-0 34 -> undo-tablespace-1 .... 33+n-1 -> undo-tablespace-n-1 (undo tablespace indexes from 0 so n-1 is last). 33+n -> undo-tablespace-0 33+n+1 -> undo-tablespace-1 .... m -> undo-tablespace-x That means, there is many-one mapping between undo logs and tablespaces. (There is one-one between undo logs and rseg so both terms can be used interchangeably) - Besides this, each undo logs/rseg has 1024 slots (0 .... 1023). That means 1024 t can share same rseg. txn rseg tablespace 1 2 0 0 3 4 1 1 5 6 2 0 7 8 9 4 0 10 11 Many -> One Many -> One This means to flag a tablespace as free for truncate all the trxs that are mapped to that tablespace should be complete, besides, ensuring that undo logs of these trxs are not needed by purge or MVCC. For INSERT, UNDO logs are immediately freed (on commit) For UPDATE (INCLUDING DELETE), UNDO logs are not freed on trx_commit (as they might be needed by MVCC) Instead, purge looks at the limit and try to free undo log pages once it has consumed the undo logs and limit (enforced using trx->id) signals it that these undo logs are not needed for MVCC. ------------------------------------------------------------------------------- Approach ======== 1. Detect if UNDO log needs truncate. This could be based on some size threshold. (TODO: Size threshold should be dynamically configurable.) 2. Check if UNDO logs can be truncated. There has to be some minimum number of undo logs in order to continue smooth operation of server. For example: if there is only 1 undo tablespace then we can't schedule it for truncate else it would halt complete server operation. 3. Once identified, mark it so that rsegs associated with it will not be allocated to new upcoming transactions. 4. As part of purge workload check if any UNDO tablespace is marked for truncate If found, then try to see if all the pages are free. (purge enforces this using the limit (trx->id) clause). All free pages means that UNDO log can be truncated now. If there is cached page then check if undo log record are needed based on limit as they are not purged by the purge thread. 5. Initiate truncate. Truncating the tablespace to the original size. 6. Once complete re-enable the rsegs for allocation. ------------------------------------------------------------------------------- Changes List: ============ New enviornment variable introduced: ------------------------------------ - innodb_max_undo_logs_size: Threshold after which UNDO tablespace will qualify for truncate. default: 1024 MB / 1 GB min: 10 MM max: ULONGLONG_MAX - innodb_purge_rseg_truncate_frequency: Rate at which undo log purge should be invoked as part of purge action. Value = n means invoke undo log purge on every nth iteration of purge invocation. default: 128 min: 1 max: 128 - innodb_undo_log_truncate: Enable/Disable undo-tablespace truncate. default: off (boolean variable. possible value on/off or 0/1). If ON only then undo tablespace will be marked for truncate. Note: If variable is truned-off while a undo-tablespace is already marked for truncate then truncate action for such undo-tablespace is not aborted. It continues but no new tablespace is selected unless user turn it on again. Both variables are dynamically configurable and have global scope. Identify UNDO tablespace to truncate: ------------------------------------- 1. Iterate over all the undo-tablespaces and check size of each tablespace. If size of tablespace is > threshold then mark the tablespace active for truncate. Ensure that there are other usuable undo-tablespace(s) and rsegs (besides marked one) so that operation of server doesn't halt. 2. Once tablespace is selected, then mark all the rseg residing in that tablespace as inactive for allocation. This will avoid allocation of these rsegs to new read-write transactions. Existing transaction will continue to use these rsegs. Purge action: ------------- 1. Purge thread based on innodb_purge_rseg_truncate_frequency will invoke undo truncate action. If UNDO tablespace is not marked for truncate then purge thread will free the undo log segment as it does now. If UNDO tablespace is marked for truncate then purge thread will avoid freeing of undo log segment instead just maintain the stat that this undo log segment can be freed. In short it does delay free as anyway tablespace is going to get truncated. |---------------| |--------------| | ONLINE | undo tblsp | OFFLINE | all rsegs | rseg |--> size --->| rseg not |--> in undo tblsp | available | > threshold |available | are free | for allocation| |for allocation| | |---------------| |--------------| | A A | | | (truncate fail) v | |------------------------------------| | (truncate success) | TRUNCATE | |-------------------------| Truncate undo tablespace which in | | turn trucates rsegs residing in it | | Reinitialize rsegs. | |----------------------------------- | Truncate action: ---------------- 1. Check if all the rsegs residing in the marked UNDO tablespace are free. Free here means: All the undo records residing in all the rsegs are no more needed for MVCC. Once this is ensured a truncate action can be initiated. 2. Stop all operation on UNDO tablespace. Ensure that pages from buffer pool are either flushed or released. 3. log-checkpoint. This is to ensure that REDO records related to UNDO tablepsace are no more accessible/valid once we truncate the undo tablespace. 4. DDL log to safeguard truncate action from crash. DDL log file name undo_<space_id>_trunc.log is created in srv-log-dir. 5. Truncate the file back to original size that exist when undo tablespace is created new. Re-initialize tablespace and rseg headers. 6. Remove DDL log marking action complete. 7. Re-enable all the rsegs making them avaiable for re-allocation. Fixup action during server startup: ----------------------------------- 1. During server startup all the available UNDO tablespaces are opened. 2. As part of the operation, check if there is corresponding UNDO tablespace DDL log file. If present, then initiate the truncate action as it indicates that truncate action was not successful. 3. Once done, normal server flow will open then new file and initialize rsegs accordingly. By default, all rsegs are in active mode. Finding Optimal value of params: ------------------------------- - If undo-truncate is going in parallel as quoted above we would surely see a drop in TPS. How much drop in TPS depends on lot of varying factors. Let me list them here. - Number of tablespaces. More the better - Number of undo logs. More the better - Threshold. More the better - innodb_purge_rseg_truncate_frequency. More the better - Machine configuration and IO sub-system. Truncate is fil-system operation. Depending on how fast the IO sub-system is. - Length of parallel running trxn. If any undo-tablespace hold an active data then it can't be truncated even if active data occupies few bytes in undo-tablespace. - Invocation/Scheduling of Purge thread. Truncate action is sub-action of purge thread and purge thread is invoked only when it has significant work to do. - We recently did a study to understand this effect and here are results from aupra01/06. ====================================================================== Machine Used: SURPA-01 -------- Loaded data using: ./sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=1000000 --mysql-socket=/export-c/kbauskar/mysql.sock prepare --mysql-user=root workload: ./sysbench --test=oltp --init-rng=on --oltp-dist-type=uniform --mysql-socket=/export-c/kbauskar/mysql.sock --mysql-user=root --max-requests=0 --mysql-table-engine=innodb --max-time=60 --oltp-table-size=1000000 --num-threads=xxxxx --oltp-skip-trx=off run Strategy: 5 runs for each thread. Ignore first 2 runs and then take average of last 3 runs. -------- Configuration used: Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=35 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2 --innodb_undo_logs=35 --innodb_max_undo_log_size=10M --innodb_purge_rseg_truncate_frequency=1 Results: Baseline (mysql-trunk) WL#6965 regression threads 1 297 292 2 520 534 4 1033 1049 8 1880 1888 16 3454 3452 32 6177 6153 64 7072 6647 6% 128 6974 5924 15% - No regression for thread 1-32 as there is no undo-truncate taking place. - Regression starts from 64 threads as undo-log size goes beyond 10M threshold. - For 128 threads almost at any given time one of the undo tablespaces (out of configured 2) is being truncated. So an expected regression of 15%. -------- Configuration used: Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=35 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2 --innodb_undo_logs=35 --innodb_max_undo_log_size=100M Results: threads baseline WL regression 128 6974 6192 11% - Regression went down by 4% moment we reduced the frequency of undo-truncate. (by increasing threshold and keeping the rseg-purge-frequency to default of 128). -------- Configuration used: Baseline: --innodb_undo_tablespaces=8 --innodb_undo_logs=128 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=8 --innodb_undo_logs=128 --innodb_max_undo_log_size=10M --innodb_purge_rseg_truncate_frequency=1 Results: threads baseline WL regression 128 7095 6888 2.91% - Regression down to 2-3% with 8 undo tablespace and 128 undo logs. Note: threshold is still 10M so frequency of undo-truncate is still higher. -------- Configuration used: Baseline: --innodb_undo_tablespaces=8 --innodb_undo_logs=128 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=8 --innodb_undo_logs=128 --innodb_max_undo_log_size=100M Results: threads baseline WL regression 128 7095 7027 0.009% - No regression if we increase the threshold from 10M -> 100M. Default value of threshold if 1G given the frequency of undo-truncate that user might need. -------- Configuration used: Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=128 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2 --innodb_undo_logs=128 --innodb_max_undo_log_size=100M Results: threads baseline WL regression 128 7155 6882 3.8% -------- Machine Used: SURPA-06 Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=35 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2 --innodb_undo_logs=35 --innodb_max_undo_log_size=10M --innodb_purge_rseg_truncate_frequency=1 Baseline WL thrds 1 314 344 2 580 589 4 1286 1213 8 2507 2379 16 4725 4607 32 8080 8046 64 7076 6331 10% 128 5625 2105 62% --------------- Baseline: --innodb_undo_tablespaces=8 --innodb_undo_logs=128 WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=8 --innodb_undo_logs=128 --innodb_max_undo_log_size=10M --innodb_purge_rseg_truncate_frequency=1 thrds Baseline WL %regression 128 6919 6831 1.28% ---------------- Effect of using innodb_purge_rseg_truncate_frequency 1 or 128 WL innodb_purge_rseg_truncate_frequency = 1 .... TPS: 7075 innodb_purge_rseg_truncate_frequency = 128 (default and constant on Baseline) .... TPS: 7178 So if we reduce the frequency of freeing undo-pages then we see increase in tps (with all other setting remaining unchanged). With value 1 of 1 rollback segments are pages are freed on every invocation of purge. Check the significance of this variable for more details. ------------------ Effect of undo-tablespace Baseline: --innodb_undo_tablespaces=128 --innodb_undo_logs=128 WL: innodb_undo_log_truncate=1 --innodb_undo_tablespaces=128 --innodb_undo_logs=128 --innodb_max_undo_log_size=10M --innodb_purge_rseg_truncate_frequency=1 Note: 128 undo-tablespace Baseline (tps): 6918 WL (tps): 6984 ====================================================================== Let's take a case to understand the issue above. 1. When we ran workload with 2 undo-tablespace and 35 undo-logs (which means 2 redo logs as 32 are reserved for non-redo) performance gap was higher. 2. When we ran workload with 8 undo-tablespace and 128 undo-logs (which emans 95 redo logs) performance gap reduced. Why ? a. Amount of generated undo information continue to remain same in both the cases mentioned above. With threshold of 10M it tend to fillout 2 undo-tablespaces quickly and that means at any given time one of the tablespace has size > 10M causing truncate to be active. b. If 1 undo tablespace (out of 2) goes down it is like lossing 50% of your resources vs if 1 undo-tablespace goes down (out of 8) it is like loosing 12.5% of your resources. c. We also observed that if threshold is increased from 10 -> 100M then frequency of truncate reduces and so gap difference is reduced significantly. d. innodb_purge_rseg_truncate_frequency plays a key role in deciding when to clearing up pages in undo-tablespace. For baseline this value is constant to 128. If set to 1 in WL it means for every instance of purge invocation it will try to free rollback segment. (Note: Freeing of rollback segment is different from truncate of rollback segment. Free will make those pages re-usable.) ======================================== Based on our existing testing we observed following optimal value. (note: optimal value defintion may change based on other factor and machine.) - undo-tablespace >= 8 (more the better) - undo-logs = 128 - innodb_purge_rseg_truncate_frequency = default - innodb_max_undo_log_size >= 1G (more the better)
Let's understand the low level changes by describing the change list depicted above. 1. Handler to track truncate of undo tablespace: ------------------------------------------------ /** Track UNDO tablespace mark for truncate. */ class undo_trunc_t { ...... public: undo_trunc_logger_t undo_logger; private: /** UNDO tablespace is mark for truncate. */ ulint m_undo_for_trunc; /** rseg that resides in UNDO tablespace marked for truncate. */ rseg_for_trunc_t m_rseg_for_trunc; /** Start scanning for UNDO tablespace from this space_id. This is to avoid bias selection of one tablespace always. */ ulint m_scan_start; }; Reference to this class will be stored in purge handler and it would be used for tracking the tablespace that is marked for truncate along with related rsegs. 2. Handler to track ddl log needed to safeguard truncate action from crash: --------------------------------------------------------------------------- /** UNDO log truncate logger. Needed to track state of truncate during crash. A DDL_LOG kind of file will be created that will be removed on successful truncate but if server crashes before successful truncate this file will be used to initiate fix-up action during server start. */ class undo_trunc_logger_t { ...... public: /** Magic Number to indicate truncate action is complete. */ const static ib_uint32_t s_magic; /** Truncate Log file Prefix. */ const static char* s_log_prefix; /** Truncate Log file Extension. */ const static char* s_log_ext; private: char* m_log_file_name; }; 3. Selecting UNOD tablespace for truncate: ------------------------------------------ /** Iterate over all the UNDO tablespaces and check if any of the UNDO tablespace qualifies for TRUNCATE (size > threshold). @param[in,out] undo_trunc undo truncate tracker */ static void trx_purge_mark_undo_for_truncate( undo_trunc_t* undo_trunc); Important validation checks done by the API /* Validation/Qualification checks a. At-least 2 UNDO tablespaces so even if one UNDO tablespace is being truncated server can continue to operate. b. At-least 2 UNDO redo rseg/undo logs (besides the default rseg-0) b. At-least 1 UNDO tablespace size > threshold. */ Besides this API will mark the rsegs residing the selected tablespace as inactive. 4. Making rseg inactive: ------------------------ If UNDO tablespace is being truncated then rsegs residing in that tablespace needs to be marked too so that we can avoid allocation of them to new booting transaction. skip_allocation helps in tracking this. pages_marked_freed is use to cache the count of undo segment that can be freed but we delay the free action as tablespace is being truncated anyway. struct trx_rseg_t { ..... /** if true then skip allocating this rseg as the tablespace it resides in is marked for truncate. */ bool skip_allocation; /** track the segment that can be freed. If UNDO tablespace where the segment resides is marked for truncate then we delay free operation and instead let the truncate do the actual free. */ ulint pages_marked_freed; }; 5. Actual truncate action: -------------------------- /** Iterate over selected UNDO tablespace and check if all the rsegs that resides in the tablespace are free. @param[in] limit truncate_limit @param[in,out] undo_trunc undo truncate tracker */ static void trx_purge_initiate_truncate( purge_iter_t* limit, undo_trunc_t* undo_trunc); - API will check if all the rsegs are free based on their size. - rseg->curr_size count is adjusted to consider pages_marked_freed. ulint size_of_rsegs = rseg->curr_size - rseg->pages_marked_freed; - Also, if there are cached undo segment then ensure that records within these segment qualifies for purge. This can done by comparing their trx_no with purge view limit. for (undo = UT_LIST_GET_FIRST(rseg->update_undo_cached); undo != NULL && all_free; undo = UT_LIST_GET_NEXT(undo_list, undo)) { if (limit->trx_no < undo->trx_id) { // segment might be needed for MVCC. } } Same for other cached segment. - Write DDL log using undo_logger. - Truncate the tablespace. This would include physical file truncate and adjusting the space handler (fil_space_t) for the size. Note: space_id is not changing. - Write the header and Re-initialize rseg. skip_allocation will be reset to false and so post truncate immediately rseg can be allocated to new booting transaction. /** Truncate UNDO tablespace, reinitialize header and rseg. @param[in] undo_trunc UNDO tablespace handler @return true if success else false. */ bool trx_undo_truncate_tablespace( undo_trunc_t* undo_trunc); - Remove the DDL log. 6. Fixup action during crash: ----------------------------- - During server start srv_undo_tablespaces_init() api will open all the available undo tablespace. - Before opening any undo tablespace check if any undo tablespace needs fix-up. This can be determined by checking for presence of ddl log file. if (undo_trunc.needs_fix_up(space_id)) { ... srv_undo_tablespace_create fsp_header_init } needs_fix_up will check for presence of log file and if log file is present then for presence of magic-number. If magic-number if present then it suggest that truncate action was complete but unlink of file was not so simply remove the file and report that file didn't exist. If file exist and magic-number if not present it suggest that truncate action was not complete. So complete the truncate action. Instead of truncating the file we can simply delete and re-create the file as file is yet to open.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.