WL#6965: InnoDB: Truncate UNDO logs/tablespace
Status: Complete
Let me first clarify difference between TRUNCATE and SHRINK. TRUNCATE: - Truncating undo tablespace to make it equivaluent to what it was when it was created new. The physical size of undo tablespace file will be reset to its default initially configured size and all the undo logs within the tablespace will be removed SHRINKING: - Shrinking undo tablespace to reduce the file size. (New size of undo tablespace can be anything between least possible size and current size. The undo records up to the shrink size are maintained and are valid. This WL aims at truncating (and not shrinkng) of UNDO tablespace. UNDO logs and UNDO tablespaces are used interchangebly. ----------------- When transaction performs any DML activity it would result in generation of UNDO logs. Once transaction is committed then INSERT generated UNDO logs are cleared immediately but UPDATE/DELETE generated UNDO logs are kept to service MVCC. Purge tracks the limit and once it identifies that the said UNDO log entries are no more needed it removes it from the UNDO logs. Removal here means pages related to these entries can be marked as free and ready-for-reallocation. This activity doesn't result in reducing the size of UNDO logs/tablespace. On a heavily loaded system even though we re-use the UNDO log pages. In order to implement MVCC (and purge) new pages areadded to the UNDO logs at a higher rate than purge can truncate (remove). This can result in an increase in the undo log file size and can consume a significant size on disk. WL will try to address this problem by periodically truncating undo logs/tablespace. Truncate will be done online without affecting server operation While the undo tablespace is taken offline for truncate it would not be available for attaching to new transactions and so the total transaction handling capacity of the server will reduced till TRUNCATE is complete. The truncate of the UNDO tablespace is a millisecond operation and therefore no significant user impact expected. Note: Other UNDO tablespaces will be available to service transaction so it will not stall the server except we can see some slowness or delayed allocation. One of the tablespace will be selected for truncate and will be made inactive for that time period. During this time-period there should be another UNDO tablespace to service the new upcoming transactions. Requirement for UNDO tablespace truncate: - There should be at-least 2 UNDO tablespaces and innobase_undo_logs should be configured accordingly so that there are enough rsegs. - Server should be configured to use UNDO tablespace (if used with system tablespace it will never be truncated.)
- Functional Requirement:
- Server should be started with at-least 2 UNDO tablespaces and
innodb_undo_logs set to a value such that there are at-least 2 REDO rollback
segments. (--innodb_undo_tablespace=2 --innodb_undo_logs=35)
UNDO log count of 35 suggest following
1st rseg resides in system tablespace that is 0
2nd - 33rd rseg resides in system temporary tablespace (these are non-redo
rsegs)
34th and 35th rsegs resides in undo tablespaces (normally wih space-ids
1, 2 .... etc)
- Server should keep periodic watch on size of UNDO tablespace.
If size of UNDO tablespace > threshold then Server should
kick-start TRUNCATE. If tablespace is made inactive for TRUNCATE
then number of transactions, server can support, will reduce.
Based on the load the threshold should be configured taking into
consideraion above criteria of reduced transaction handling.
Note: threshold is configurable. Check HLS for more details.
- On truncate, size of tablespace should be resorted to what it
was while was created.
(NOTE: Verification of this could be bit tricky if tablespace get re-used
before user go to the needed directory to check the size.)
- Till date, purge of rsegs use to happen once in 128 times of purge action
invocation. With this WL we would introduce a variable
innodb_purge_rseg_truncate_frequency that can help control how frequenty
this action should take place.
Permissible Vaue: 1 - 128.
- Complete undo-truncate action is controlled using a flag name
innodb_undo_log_truncate (on/off or 0/1). Default it is turned-off.
Adminsitrator can trun it on based on workload. Only when it is on
UNDO-Tablespace will be marked for truncate.
- Non-Functional Requirement:
- TRUNCATE shouldn't affect MVCC correctness.
Understanding tablespace-rseg-transaction relationship
======================================================
Let's first understand how these 3 components are inter-related.
To start with, scope of the WL is limited only if undo logs resides in undo
tablespace (not residing in system tablespace) and so discussion will only
consider related scenario.
- User can specify undo tablespaces/logs to create using
--innodb_undo_tablespaces=n
--innodb_undo_logs=m
- If m > n (m >= 34) then from rsegs from (34..to.. m) are allocated tablespaces
in round-robbin fashion.
(Note: rseg id start with 0 indices)
0 -> rseg resides always in system tablespace
1..32 -> rseg resides in shared temporary tablespace (ibtmp1)
33 -> undo-tablespace-0
34 -> undo-tablespace-1
....
33+n-1 -> undo-tablespace-n-1 (undo tablespace indexes from 0 so n-1 is last).
33+n -> undo-tablespace-0
33+n+1 -> undo-tablespace-1
....
m -> undo-tablespace-x
That means, there is many-one mapping between undo logs and tablespaces.
(There is one-one between undo logs and rseg so both terms can be used
interchangeably)
- Besides this, each undo logs/rseg has 1024 slots (0 .... 1023).
That means 1024 t can share same rseg.
txn rseg tablespace
1
2 0 0
3
4 1 1
5
6 2 0
7
8
9 4 0
10
11
Many -> One
Many -> One
This means to flag a tablespace as free for truncate all the trxs that are
mapped to that tablespace should be complete, besides, ensuring that undo
logs of these trxs are not needed by purge or MVCC.
For INSERT, UNDO logs are immediately freed (on commit)
For UPDATE (INCLUDING DELETE), UNDO logs are not freed on trx_commit
(as they might be needed by MVCC)
Instead, purge looks at the limit and try to free undo log pages once
it has consumed the undo logs and limit (enforced using trx->id)
signals it that these undo logs are not needed for MVCC.
-------------------------------------------------------------------------------
Approach
========
1. Detect if UNDO log needs truncate. This could be based on some size
threshold. (TODO: Size threshold should be dynamically configurable.)
2. Check if UNDO logs can be truncated. There has to be some minimum
number of undo logs in order to continue smooth operation of server.
For example: if there is only 1 undo tablespace then we can't schedule
it for truncate else it would halt complete server operation.
3. Once identified, mark it so that rsegs associated with it will not
be allocated to new upcoming transactions.
4. As part of purge workload check if any UNDO tablespace is marked for
truncate If found, then try to see if all the pages are free.
(purge enforces this using the limit (trx->id) clause).
All free pages means that UNDO log can be truncated now.
If there is cached page then check if undo log record are needed based
on limit as they are not purged by the purge thread.
5. Initiate truncate. Truncating the tablespace to the original size.
6. Once complete re-enable the rsegs for allocation.
-------------------------------------------------------------------------------
Changes List:
============
New enviornment variable introduced:
------------------------------------
- innodb_max_undo_logs_size: Threshold after which UNDO tablespace
will qualify for truncate.
default: 1024 MB / 1 GB
min: 10 MM
max: ULONGLONG_MAX
- innodb_purge_rseg_truncate_frequency:
Rate at which undo log purge should be invoked
as part of purge action. Value = n means invoke undo log purge on every
nth iteration of purge invocation.
default: 128
min: 1
max: 128
- innodb_undo_log_truncate: Enable/Disable undo-tablespace truncate.
default: off (boolean variable. possible value on/off or 0/1).
If ON only then undo tablespace will be marked for truncate.
Note: If variable is truned-off while a undo-tablespace is already
marked for truncate then truncate action for such undo-tablespace
is not aborted. It continues but no new tablespace is selected unless
user turn it on again.
Both variables are dynamically configurable and have global scope.
Identify UNDO tablespace to truncate:
-------------------------------------
1. Iterate over all the undo-tablespaces and check size of each tablespace.
If size of tablespace is > threshold then mark the tablespace active for
truncate.
Ensure that there are other usuable undo-tablespace(s) and rsegs
(besides marked one) so that operation of server doesn't halt.
2. Once tablespace is selected, then mark all the rseg residing in that
tablespace as inactive for allocation. This will avoid allocation of these
rsegs to new read-write transactions.
Existing transaction will continue to use these rsegs.
Purge action:
-------------
1. Purge thread based on innodb_purge_rseg_truncate_frequency will invoke
undo truncate action. If UNDO tablespace is not marked for truncate then
purge thread will free the undo log segment as it does now.
If UNDO tablespace is marked for truncate then purge thread will avoid
freeing of undo log segment instead just maintain the stat that this
undo log segment can be freed. In short it does delay free as anyway
tablespace is going to get truncated.
|---------------| |--------------|
| ONLINE | undo tblsp | OFFLINE | all rsegs
| rseg |--> size --->| rseg not |--> in undo tblsp
| available | > threshold |available | are free
| for allocation| |for allocation| |
|---------------| |--------------| |
A A |
| | (truncate fail) v
| |------------------------------------|
| (truncate success) | TRUNCATE |
|-------------------------| Truncate undo tablespace which in |
| turn trucates rsegs residing in it |
| Reinitialize rsegs. |
|----------------------------------- |
Truncate action:
----------------
1. Check if all the rsegs residing in the marked UNDO tablespace are free.
Free here means: All the undo records residing in all the rsegs are no more
needed for MVCC.
Once this is ensured a truncate action can be initiated.
2. Stop all operation on UNDO tablespace. Ensure that pages from buffer pool
are either flushed or released.
3. log-checkpoint. This is to ensure that REDO records related to UNDO
tablepsace are no more accessible/valid once we truncate the undo
tablespace.
4. DDL log to safeguard truncate action from crash.
DDL log file name undo_<space_id>_trunc.log is created in srv-log-dir.
5. Truncate the file back to original size that exist when undo tablespace is
created new. Re-initialize tablespace and rseg headers.
6. Remove DDL log marking action complete.
7. Re-enable all the rsegs making them avaiable for re-allocation.
Fixup action during server startup:
-----------------------------------
1. During server startup all the available UNDO tablespaces are opened.
2. As part of the operation, check if there is corresponding UNDO tablespace
DDL log file. If present, then initiate the truncate action as it
indicates that truncate action was not successful.
3. Once done, normal server flow will open then new file and initialize rsegs
accordingly. By default, all rsegs are in active mode.
Finding Optimal value of params:
-------------------------------
- If undo-truncate is going in parallel as quoted above we would surely see a
drop in TPS. How much drop in TPS depends on lot of varying factors. Let me list
them here.
- Number of tablespaces. More the better
- Number of undo logs. More the better
- Threshold. More the better
- innodb_purge_rseg_truncate_frequency. More the better
- Machine configuration and IO sub-system. Truncate is fil-system operation.
Depending on how fast the IO sub-system is.
- Length of parallel running trxn. If any undo-tablespace hold an active data
then it can't be truncated even if active data occupies few bytes in
undo-tablespace.
- Invocation/Scheduling of Purge thread. Truncate action is sub-action of purge
thread and purge thread is invoked only when it has significant work to do.
- We recently did a study to understand this effect and here are results from
aupra01/06.
======================================================================
Machine Used: SURPA-01
--------
Loaded data using:
./sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=1000000
--mysql-socket=/export-c/kbauskar/mysql.sock prepare --mysql-user=root
workload:
./sysbench --test=oltp --init-rng=on --oltp-dist-type=uniform
--mysql-socket=/export-c/kbauskar/mysql.sock --mysql-user=root --max-requests=0
--mysql-table-engine=innodb --max-time=60 --oltp-table-size=1000000
--num-threads=xxxxx --oltp-skip-trx=off run
Strategy:
5 runs for each thread. Ignore first 2 runs and then take average of last 3 runs.
--------
Configuration used:
Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=35
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2
--innodb_undo_logs=35 --innodb_max_undo_log_size=10M
--innodb_purge_rseg_truncate_frequency=1
Results:
Baseline (mysql-trunk) WL#6965 regression
threads
1 297 292
2 520 534
4 1033 1049
8 1880 1888
16 3454 3452
32 6177 6153
64 7072 6647 6%
128 6974 5924 15%
- No regression for thread 1-32 as there is no undo-truncate taking place.
- Regression starts from 64 threads as undo-log size goes beyond 10M threshold.
- For 128 threads almost at any given time one of the undo tablespaces (out of
configured 2) is being truncated.
So an expected regression of 15%.
--------
Configuration used:
Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=35
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2
--innodb_undo_logs=35 --innodb_max_undo_log_size=100M
Results:
threads baseline WL regression
128 6974 6192 11%
- Regression went down by 4% moment we reduced the frequency of undo-truncate.
(by increasing threshold and keeping the rseg-purge-frequency to default of 128).
--------
Configuration used:
Baseline: --innodb_undo_tablespaces=8 --innodb_undo_logs=128
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=8
--innodb_undo_logs=128 --innodb_max_undo_log_size=10M
--innodb_purge_rseg_truncate_frequency=1
Results:
threads baseline WL regression
128 7095 6888 2.91%
- Regression down to 2-3% with 8 undo tablespace and 128 undo logs. Note:
threshold is still 10M so frequency of undo-truncate is still higher.
--------
Configuration used:
Baseline: --innodb_undo_tablespaces=8 --innodb_undo_logs=128
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=8
--innodb_undo_logs=128 --innodb_max_undo_log_size=100M
Results:
threads baseline WL regression
128 7095 7027 0.009%
- No regression if we increase the threshold from 10M -> 100M. Default value of
threshold if 1G given the frequency of undo-truncate that user might need.
--------
Configuration used:
Baseline: --innodb_undo_tablespaces=2 --innodb_undo_logs=128
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2
--innodb_undo_logs=128 --innodb_max_undo_log_size=100M
Results:
threads baseline WL regression
128 7155 6882 3.8%
--------
Machine Used: SURPA-06
Baseline:
--innodb_undo_tablespaces=2 --innodb_undo_logs=35
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=2
--innodb_undo_logs=35 --innodb_max_undo_log_size=10M
--innodb_purge_rseg_truncate_frequency=1
Baseline WL
thrds
1 314 344
2 580 589
4 1286 1213
8 2507 2379
16 4725 4607
32 8080 8046
64 7076 6331 10%
128 5625 2105 62%
---------------
Baseline: --innodb_undo_tablespaces=8 --innodb_undo_logs=128
WL: --innodb_undo_log_truncate=1 --innodb_undo_tablespaces=8
--innodb_undo_logs=128 --innodb_max_undo_log_size=10M
--innodb_purge_rseg_truncate_frequency=1
thrds Baseline WL %regression
128 6919 6831 1.28%
----------------
Effect of using innodb_purge_rseg_truncate_frequency 1 or 128
WL
innodb_purge_rseg_truncate_frequency = 1 .... TPS: 7075
innodb_purge_rseg_truncate_frequency = 128 (default and constant on Baseline)
.... TPS: 7178
So if we reduce the frequency of freeing undo-pages then we see increase in tps
(with all other setting remaining unchanged).
With value 1 of 1 rollback segments are pages are freed on every invocation of
purge. Check the significance of this variable for more details.
------------------
Effect of undo-tablespace
Baseline: --innodb_undo_tablespaces=128 --innodb_undo_logs=128
WL: innodb_undo_log_truncate=1 --innodb_undo_tablespaces=128
--innodb_undo_logs=128 --innodb_max_undo_log_size=10M
--innodb_purge_rseg_truncate_frequency=1
Note: 128 undo-tablespace
Baseline (tps): 6918
WL (tps): 6984
======================================================================
Let's take a case to understand the issue above.
1. When we ran workload with 2 undo-tablespace and 35 undo-logs (which means 2
redo logs as 32 are reserved for non-redo) performance gap was higher.
2. When we ran workload with 8 undo-tablespace and 128 undo-logs (which emans 95
redo logs) performance gap reduced.
Why ?
a. Amount of generated undo information continue to remain same in both the
cases mentioned above. With threshold of 10M it tend to fillout 2
undo-tablespaces quickly and that means at any given time one of the tablespace
has size > 10M causing truncate to be active.
b. If 1 undo tablespace (out of 2) goes down it is like lossing 50% of your
resources vs if 1 undo-tablespace goes down (out of 8) it is like loosing 12.5%
of your resources.
c. We also observed that if threshold is increased from 10 -> 100M then
frequency of truncate reduces and so gap difference is reduced significantly.
d. innodb_purge_rseg_truncate_frequency plays a key role in deciding when to
clearing up pages in undo-tablespace. For baseline this value is constant to
128. If set to 1 in WL it means for every instance of purge invocation it will
try to free rollback segment. (Note: Freeing of rollback segment is different
from truncate of rollback segment. Free will make those pages re-usable.)
========================================
Based on our existing testing we observed following optimal value.
(note: optimal value defintion may change based on other factor and machine.)
- undo-tablespace >= 8 (more the better)
- undo-logs = 128
- innodb_purge_rseg_truncate_frequency = default
- innodb_max_undo_log_size >= 1G (more the better)
Let's understand the low level changes by describing the change list
depicted above.
1. Handler to track truncate of undo tablespace:
------------------------------------------------
/** Track UNDO tablespace mark for truncate. */
class undo_trunc_t {
......
public:
undo_trunc_logger_t undo_logger;
private:
/** UNDO tablespace is mark for truncate. */
ulint m_undo_for_trunc;
/** rseg that resides in UNDO tablespace marked for truncate. */
rseg_for_trunc_t m_rseg_for_trunc;
/** Start scanning for UNDO tablespace from this space_id.
This is to avoid bias selection of one tablespace always. */
ulint m_scan_start;
};
Reference to this class will be stored in purge handler and it would be
used for tracking the tablespace that is marked for truncate along with
related rsegs.
2. Handler to track ddl log needed to safeguard truncate action from crash:
---------------------------------------------------------------------------
/** UNDO log truncate logger. Needed to track state of truncate
during crash. A DDL_LOG kind of file will be created that will be removed
on successful truncate but if server crashes before successful truncate
this file will be used to initiate fix-up action during server start. */
class undo_trunc_logger_t {
......
public:
/** Magic Number to indicate truncate action is complete. */
const static ib_uint32_t s_magic;
/** Truncate Log file Prefix. */
const static char* s_log_prefix;
/** Truncate Log file Extension. */
const static char* s_log_ext;
private:
char* m_log_file_name;
};
3. Selecting UNOD tablespace for truncate:
------------------------------------------
/** Iterate over all the UNDO tablespaces and check if any of the UNDO
tablespace qualifies for TRUNCATE (size > threshold).
@param[in,out] undo_trunc undo truncate tracker */
static
void
trx_purge_mark_undo_for_truncate(
undo_trunc_t* undo_trunc);
Important validation checks done by the API
/* Validation/Qualification checks
a. At-least 2 UNDO tablespaces so even if one UNDO tablespace
is being truncated server can continue to operate.
b. At-least 2 UNDO redo rseg/undo logs (besides the default rseg-0)
b. At-least 1 UNDO tablespace size > threshold. */
Besides this API will mark the rsegs residing the selected tablespace as inactive.
4. Making rseg inactive:
------------------------
If UNDO tablespace is being truncated then rsegs residing in that tablespace
needs to be marked too so that we can avoid allocation of them to new booting
transaction. skip_allocation helps in tracking this.
pages_marked_freed is use to cache the count of undo segment that can be freed
but we delay the free action as tablespace is being truncated anyway.
struct trx_rseg_t {
.....
/** if true then skip allocating this rseg as the tablespace
it resides in is marked for truncate. */
bool skip_allocation;
/** track the segment that can be freed. If UNDO tablespace
where the segment resides is marked for truncate then we delay
free operation and instead let the truncate do the actual free. */
ulint pages_marked_freed;
};
5. Actual truncate action:
--------------------------
/** Iterate over selected UNDO tablespace and check if all the rsegs
that resides in the tablespace are free.
@param[in] limit truncate_limit
@param[in,out] undo_trunc undo truncate tracker */
static
void
trx_purge_initiate_truncate(
purge_iter_t* limit,
undo_trunc_t* undo_trunc);
- API will check if all the rsegs are free based on their size.
- rseg->curr_size count is adjusted to consider pages_marked_freed.
ulint size_of_rsegs =
rseg->curr_size - rseg->pages_marked_freed;
- Also, if there are cached undo segment then ensure that records within
these segment qualifies for purge. This can done by comparing their trx_no
with purge view limit.
for (undo = UT_LIST_GET_FIRST(rseg->update_undo_cached);
undo != NULL && all_free;
undo = UT_LIST_GET_NEXT(undo_list, undo)) {
if (limit->trx_no < undo->trx_id) {
// segment might be needed for MVCC.
}
}
Same for other cached segment.
- Write DDL log using undo_logger.
- Truncate the tablespace. This would include physical file truncate and
adjusting the space handler (fil_space_t) for the size. Note: space_id is
not changing.
- Write the header and Re-initialize rseg. skip_allocation will be reset to
false and so post truncate immediately rseg can be allocated to new
booting transaction.
/** Truncate UNDO tablespace, reinitialize header and rseg.
@param[in] undo_trunc UNDO tablespace handler
@return true if success else false. */
bool
trx_undo_truncate_tablespace(
undo_trunc_t* undo_trunc);
- Remove the DDL log.
6. Fixup action during crash:
-----------------------------
- During server start srv_undo_tablespaces_init() api will open all the
available undo tablespace.
- Before opening any undo tablespace check if any undo tablespace needs
fix-up. This can be determined by checking for presence of ddl log file.
if (undo_trunc.needs_fix_up(space_id)) {
...
srv_undo_tablespace_create
fsp_header_init
}
needs_fix_up will check for presence of log file and if log file is present
then for presence of magic-number. If magic-number if present then it
suggest that truncate action was complete but unlink of file was not so
simply remove the file and report that file didn't exist.
If file exist and magic-number if not present it suggest that truncate
action was not complete. So complete the truncate action.
Instead of truncating the file we can simply delete and re-create the file
as file is yet to open.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.