WL#9507: InnoDB: Make the number of undo tablespaces and rollback segments dynamic

Affects: Server-8.0   —   Status: Complete

There are two primary goals of this worklog:

Requirements
============
1. Initiate the process of moving rollback segments and undo logs away from the
system tablespace.  The following two tasks are listed in their own worklogs for
tracking purposes, but will be done within this work.
   WL#10498: Change the default value of innodb-undo-tablespaces from 0 to 2. We
need at least 2 undo tablespaces for undo truncation to work. 
   WL#10499: Change the default value for innodb_undo_log_truncate to true.
The goal is that the system tablespace will no longer be used for rollback
segments.  So at a later time we should change the minimum value of
undo_tablespaces to 2. 

2. Allow the the number of undo tablespaces to be set at runtime using the
existing interface which is the setting innodb-undo-tablespaces.  This work
prepares the way for WL9508 which introduces CREATE UNDO TABLESPACE.

Tasks
=====
1. This worklog will include a redesign of the way rollback segments are found
in undo tablespaces.  This will change the meaning of the rollback segment
identifier associated with each modified clustered index record. In addition, it
will create a new rollback segment mapping page in each undo tablespace it
creates and give that tablespace a new default name so it can be distinguished
from the old style.

2. This worklog will allow the setting innodb_undo_tablespaces to be increased
or decreased at runtime or at startup.  If increased, new undo tablespaces will
be created and then atomically added to the list of active undo tablespaces.  If
decreased, the list of active undo tablespaces is atomically reduced in size. 
Existing undo tablespaces will continue to be used by existing transactions and
eventually will become inactive.  Version 8.0 style undo tablespaces will not be
deleted.  In this way, innodb_undo_tablespaces can be easily increased again.
Note that older undo tablespaces are deleted after they are replaced by v8.0
undo tablespaces.

If innodb_undo_tablespaces is set to 0, which is the default for v5.7, this
means that the system tablespace is used for rollback segments.

If innodb_undo_tablespaces is set at runtime from 0 to a number between 1 and
127, new undo tablespaces will be created with the specified number of rollback
segments.  Once they are ready to use, new transactions will be assigned to
these rollback segments on a round robin basis and rollback segments in the
system tablespace will no longer be assigned.  Once all the transactions using
undo logs in the system tablespace have been fully purged, the rollback segments
there will only consist of one page per rollback segment.  

If innodb_undo_tablespaces is > 0 and then set to 0 at runtime, then the
rollback segments in the system tablespace will be used for new transactions and
the existing undo tablespaces will eventually become unused.  But unused v8.0
undo tablespaces will not be deleted in case the setting is changed again.

The default setting for innodb_rollback_segments will stay at 128. v5.7 reserved
32 of the 128 rollback segment slots available in the system tablespace for
temporary rollback segments physically found in the temporary tablespace.

3. This worklog will use innodb_rollback_segments to indicate how many rollback
segments are in each tablespace. That includes the temporary tablespace, the
system tablespace, and each undo tablespace. Using the default setting of 128,
combined with innodb_undo_tablespaces=0, it is possible that rollback segments
will need to be added when innodb_undo_tablespaces is changed.

4. The setting innodb_rollback_segments will also be adjustable at startup and
at runtime.  

5. In short, this worklog will add the ability to add new undo tablespaces and
rollback segments at any time.  But it will not delete any undo tablespaces or
rollback segments in case they are needed in the future.

6. This worklog will add the ability to upgrade undo tablespaces from
existing databases to the new undo tablespaces mentioned in task #1. Any old
undo tablespaces in a 5.7 database will be replaced by v8.0 undo tablespaces.

FR1: (WL#10498) Change the default value for innodb-undo-tablespaces to 2.

FR2: (WL#10499) Change the default value for innodb-undo-log-truncate to true;

FR3: innodb-undo-tablespaces can be changed at runtime or offline to any value
from 0 to 127.

FR4: A deprecation message is put to the error log if innodb_undo_tablespaces is
set to 0, either during runtime or at startup. It will look like this:
   [Info] InnoDB: Setting 'innodb_undo_tablespaces' to 0 is deprecated and will
not be supported in a future release.

FR5: innodb-rollback-segments can be changed at runtime or offline to any
value from 1 to 128. V5.7 allowed this to be reduced from what was originally
created. Now it can be increased to a higher number.

FR6: The setting 'innodb-rollback-segments' is no longer a database wide limit.
 It now controls the number of rollback segments per tablespace.  That includes
the temporary tablespace, the system tablespace (if innodb_undo_tablespaces=0),
and every undo tablespace (if innodb_undo_tablespaces > 0).

FR7: A v5.7 database will be upgraded when a v8.0 server is started on
it. That database can be the result of either a normal or fast shutdown 
(innodb_fast_shutdown=0 or =1).
FR7a: If this upgrade occurs with innodb_undo_tablespaces=0, 2 undo tablespaces
will be used for that instance of InnoDB and this message will be put into the 
log:
   [Info] InnoDB: Database upgrade will use 2 undo tablespaces instead of 0.
Please set innodb_undo_tablespaces=2 or higher.

FR8: Implicit external undo tablespaces are named "undo_nnn" where nnn is the
undo_space_num. The space_id is related to the undo_space_num like this:
   undo_space_num = 0xFFFFFFF0 - undo_space_id
   undo_space_id  = 0xFFFFFFF0 - undo_space_num

FR9: The setting innodb_undo_logs is deleted in this worklog. So if it is used,
there will be an error message. The setting is deprecated in 5.7 with WL#10322.

This worklog creates the infrastructure to create and drop undo tablespaces at runtime. It also establishes a new undo tablespace file format and the upgrade procedures associated with that. All this is done without requiring any new interfaces. This task will allow the number of Undo Tablespaces to be changed during runtime by changing that value of innodb_undo_tablespaces.

Terminology


RSEG = structure in memory to represent a rollback segment in a datafile rseg_id = a number from 0 to 127 that is put into 7 bits of a rollback pointer

Task List


Task #1. Create a separate array to track rollback segments in the temporary tablespace.

WHY: These are not needed for recovery so the mapping from rseg_id to a Rollback Segment Header Page can be done in memory only. They will never need to be persistent. So they need to be tracked separately. There are only 128 numbers that can be tracked in the rollback pointer for every clustered record change. There is no need to reserve any of those numbers for changes to temporary tables.

Task #2. Switch from a fixed array of RSEG pointers to a std::vector

See LLD for detail.

WHY: 1) The purge code already uses a std::vector to track RSEG objects.

    2) After upgrading from a 5.7 system tablespace that does not use undo

tablespaces, there will be gaps in the numbering of RSEG IDs from 1 to 32 in existing databases (new databases with be initialized without that gap.) This vector of durable RSEGs will eliminate that 32 slot gap so that round robin traversal of this list will not slow down.

Task#3. Switch from a fixed array to a std::vector to list space_id_t for undo tablespaces.

  For similar reasons described above, this list will go from a fixed

unchanging size to a dynamic list as tablespaces are added and dropped. This list of undo tablespaces will become the source of the round robin algorithm to assign RSEG objects to transactions once we do item 4 below.

WHY: When undo tablespaces are randomly dropped by DROP UNDO TABLESPACE there will be holes in this array. The worst case is if slot 1 and 127 undo tablespaces so that every other transaction will need to traverse 125 empty slots looking for an available rollback segment. A vector allows faster access since the array is adjusted to the actual size and there are no empty slots.

The following tasks will be done in this worklog:

Task #4. Change the way rollback segments are located by purge, MVCC, transaction rollback and rollback segment assignment at start transaction time.

MVCC and transaction rollback only need to know the space_id associated with a 7-bit rseg_id in a rollback pointer. But purge and start transaction need to find the Rollback Segment Header Page for each rollback segment. Both the space_id and page_num are currently found in the TRX_SYS page in a 256 slot array of which only the first 128 slots are used in v5.6 and v5.7.

We should drop this dependency on the TRX_SYS page and the system tablespace by making a way to translate a 7-bit number into a space_id for MVCC and transaction rollback, and making a way to find each rollback segment within an undo tablespace so that purge and start transaction can do their jobs.

  See LLD for Details.

WHY: 1) It will be a good thing to reduce dependency on the system tablespace.

    2) Increase the number of possible rollback segments. Note that it is
       questionable if any customer has ever run out of 128 rollback segments
       since each rollback segment can contain about 4000 Undo Segments.
       But using more of them may be better in a highly concurrent environment.
    3) Make Rollback Segments independent of transaction rollback pointers.
    4) Allow discovery, translation and identification of undo space IDs quick,
       easy and deterministic, even during redo recovery.

Task #5. Implement in memory storage for rollback segments within undo tablespaces.

Each undo tablespace will have its own list of rollback segments. So the algorithm for assigning a rollback segment to a transaction will need to choose a tablespace first by round robin and then a rollback segment by round-robin.

By preallocating these vectors and only adding new rollback segments to the back of the vector when in concurrent mode, these vectors will not need any new latches. See LLD for a description.

WHY: What used to be a fixed size array can have holes and will be changing size when undo tablespaces can be added and dropped.

Task #6. Deprecate the innodb_undo_logs setting in the latest v5.7 (WL#10322) and delete this setting in this worklog for v8.0. Use innodb_rollback_segments to specify the number of rollback segments that is used within each undo tablespace created. The default will remain 128 so that even when innodb-undo-tablespaces=0 which is the default is in v5.7, the total number of rollback segments will not be less than v5.7.

WHY: 1) These two settings alias each other. The user does not know which will be used if one is different from the other. The second alias setting called innodb_undo_logs uses the wrong name. They actually control the number of Rollback Segments. And innodb_rollback_segments is the setting that reports a deprecation warning!

    2) For 5.6 and 5.7, the minimum value accepted is 1, which is OK if you do

not use innodb_undo_tablespaces > 0, which is the default. But if you try to use extra undo tablespaces, and you forget to set innodb_undo_logs or innodb_rollback_segments to at least 33 + the number of undo tablespaces you want, then the undo tablespaces will be created but not used.

    3) innodb_rollback_segments will change meaning from the number of rollback

segments in a bootstrapped database to the number of rollback segments in a tablespace.

See WL10322: Deprecate innodb_undo_logs in 5.7

Task #7. Create routines to handle changes in innodb_undo_tablespaces and innodb-rollback-segments.

If innodb-undo-tablespaces is increased, create the requested number of new undo tablespaces and fill them with the set number of rollback segments. If innodb-undo-tablespaces is decreased, reduce the active number of undo tablespaces used to assign rollback segments to a new transaction. Keep the inactive undo tablespaces in the vector, even after they are no longer being used.

If innodb-rollback-segments is increased, create the requested number of rollback segments in each known undo tablespace, or in the system tablespace if innodb-undo-tablespace=0. If innodb-rollback-segments is decreased, reduce the active number of rollback segments used to assign one to a new transaction. Keep the inactive rollback segments in the vectors, even after they are no longer being used. They will remain in the tablespace and use only one empty header page.

WHY: 1) This code is the core ability to add tablespaces. It be necessary for CREATE UNDO TABLESPACE.

    2) If a database is initialized with innodb-undo-tablespaces > 1, only one

rollback segment will be created in the system tablespace. Then if innodb-undo-tablespace is set to 0 at runtime, we will need to be able to create the necessary number of rollback segments in the system tablespace. Since this routine is necessary, it is very little extra effort to also be able to add rollback segments to any undo tablespace in response to a change in innodb-rollback-segments. This will provide a more thorough user experience.

Task #8. Create an upgrade path for existing databases.

When innodb starts up with an existing database from an older engine, it will attempt to upgrade the old data dictionary to the new DD. As part of this process, the undo tablespaces and rollback segments will also be upgraded. The tricky part is if there are existing undo logs in the existing rollback segments. We will allow the purge threads to process these undo logs before the new rollback segments and undo tablespaces are used.

But we must not make any changes to the TRX_SYS page or delete any old undo tablespaces until the upgrade process has succeeded.

Fortunately, new undo tablespaces do not need the TRX_SYS page. So they can be created at the normal time in the startup process. They must be created early in the upgrade process since they will be used for creating the new DD and upgrading the user tables. And they can be created before the existing undo logs are purged because these new undo tablespaces do not get in the way of the purge process, nor do they change any existing rollback segment references.

If the upgrade process fails after neew undo tablespaces have been created, they do not seen by an earlier engine. They can be deleted by the user. If an upgrade is attempted again, they will be used. If they do not yet have enough rollback segments, they will be added during startup.

During the purge of old undo logs, a flag will be used to indicate that undo logs are being purged. While this flag is set, the interpretation of the 7-bit rollback pointer is done as if it is a slot in the TRX_SYS page instead of an undo space number that can be converted directly to an undo space ID. After purge of old undo is complete and that flag is turned off, rollback pointers will be interpreted the new way.

Even though the default value in this worklog is innodb_undo_tablespaces=2, the minimum accepted value is still 0. We will set that minimum value to 2. Until then, if MySQL is started with innodb_undo_tablespaces=0 and upgrade is needed, we will ignore that value and use innodb_undo_tablespaces=2. This will prevent the TRX_SYS page from being modified during the upgrade process, so nothing needs to be 'undone' to the TRX_SYS page if the upgrade fails.

Task#9. Optional but not recommended: Delete unused rollback segments in the system tablespace.

If the value of innodb-undo-tablespaces changes from 0 to >0 at runtime, the engine already has code that stops using the rollback segments in the system tablespace. This happens lazily by assigning to each new transaction only rollback segments from undo tablespaces. Eventually, all transactions that were using rollback segments in the system tablespace will be completed and those rollback segments will no longer be used.

It has been suggested that unused rollback segments in the system tablespace should be removed. They only consist of an inode and a header page. With default settings, these empty rollback segments will use up 128 slots in the TRX_SYS page, 128 inodes, and 128 empty rollback segment header pages. This overhead could be deleted from the system tablespace to save space. But it might be handy to keep them there in case all the undo tablespaces are dropped at runtime. In this worklog that could happen if innodb-undo-tablespace is set back to 0 at runtime. If the existing rollback segments in the system tablespace are deleted, then it should no longer be possible to set innodb_undo_tablespaces=0 or drop the last undo tablespace.

WHY: The idea is that when a bunch of new undo tablespaces are created, we need to stop using the rollback segments in the system tablespace. That much is already being done and will continue. The question is whether to delete these header pages and make the inodes and pages available for reuse. But a more important reason to quit using these pages is to be able to free up space used by the system tablespace back to the file system. This can only be done by optimizing or rebuilding the system tablespace.

Task #2. Detail - Switch from a fixed array of RSEG pointers to a std::vector

We will preset the size of this vector to 128 so that it does not need to be
resized. This allows new entries to continue to be added without mutex 
protection.

The list of rollback segments is accessed in 2 ways:
  1. By the purge thread traversing from beginning to end

  2. At the start of each transaction. The list of rollback segments is
traversed from a starting point using a round-robin algorithm.

It is not necessary to access the list of rollback segments when undoing records
for MVCC or rollback.  The rollback pointer contains a page number and offset of
a specific undo record, plus a 7-bit number called an rseg_id.  The rseg_id is
used to find the space_id of the tablespace containing that undo record.  

Task #4 Detail - Change the way rollback segments are located

A Rollback Pointer contains a 7-bit identifier which is only used to find the
undo tablespace where the undo record is found. This 7-bit number is usually
referred to as the rseg_id, but we can also call it the undo_space_num. 

An analysis of the InnoDB code reveals that the 7-bit rseg_id from a rollback
pointer is only used to find the space_id.  The TRX_SYS page contains both the
space_id and the page number of the rollback segment header page for each 
rseg_id. 

If all undo tablespaces use fixed space_ids in a reserved range of near
0xFFFFFFFF, then an rseg_id between 1 and 127 will map directly to an Undo
Tablespace ID and 0 will map to the system tablespace as before. 

Redo log space IDs are reserved from 0xFFFFFFF0 to 0xFFFFFFFF (see
SRV_LOG_SPACE_FIRST_ID).  So we will reserve numbers from 0xFFFFFFFF0 - 1  to
0xFFFFFFF0 - 127 for undo tablespace IDs. 
   undo_space_num = 1 to 127
   undo_space_id  = 0xffffffef to 0xffffff71  or (uint32) -17 to (uint32) -143
   SRV_LOG_SPACE_FIRST_ID = 0xfffffff0  or (uint32) -16
   SPACE_UNKNOWN  = 0xffffffff  or (uint32) -1

   undo_space_id = SRV_LOG_SPACE_FIRST_ID - undo_space_num
   undo_space_num = SRV_LOG_SPACE_FIRST_ID - undo_space_id

The following chart shows this mapping
      RSEG_ID     SPACE_ID
      =======  ================
         0      0
         1     -17  (0xffffffef)
         2     -18  (0xffffffee)
         3     -19  (0xffffffed)
        ...    ...
        127   -143  (0xffffff71)

These undo tablespaces can easily be discovered at startup since the space_ids
are at a fixed range and the predicted file name can be assembled just as it is
done now.  CREATE UNDO TABLESPACE can create an unpredictable file
name with ADD DATAFILE.  So we will need to implement a way to discover
these undo tablespaces not only using the data dictionary but also during
startup before redo recovery when the DD is not yet available. 

Purge can find a list of each Rollback Segment Header Page in an Undo Tablespace
by reading a page at a fixed page number that contains a list of Rollback
Segment Header page numbers, similar to the TRX_SYS page. This page could be
called an RSEG_ARAY page type.  The size of this array would be limited to the
(page size - overhead) / 4 so it could potentially be very large compared to
what is available now.  We could go from a max of 128 rollback segments to
~512,000 (128 undo tablespaces * ~4000 for 16k page size). Initially though, we
will set the default and maximum number of rollback segments per tablespace to
128 since that is the previous limit for the setting innodb-rollback-segments.

Task #5 Detail - Implement in memory storage for rollback segments

All undo tablespace and RSEG vectors will be preallocated to the maximum size of
128 slots.  The active number of slots may increase in size up to 128. But they
will not decrease.  Only the current active size of these vectors will be 
reduced.

If innodb-undo-tablespaces or innodb-rollback-segments is reduced, purge will
continue to attempt to purge all undo tablespaces and rollback segments in each
vector.  Even after these inactive undo tablespaces and rollback segments are
completely purged and empty of undo logs and segments, this work log will not
attempt to delete them and take them out of the vectors.  In this way, there
will be no need for any new mutexes. Only after a clean restart with a smaller
value for innodb_undo_tablespaces or innobd-rollback-segments will the vectors
hold fewer objects. The ability to delete a fully purged undo tablespace and
associated in-memory objects will be added later.

Task #8. Detail - Create an upgrade path for existing databases.

Here are the steps involved in upgrading the undo tablespaces and rollback 
segments:

1. Discover if upgrade is required. If so, set srv_is_upgrade_mode=true.
   See innobase_init_files() just before calling srv_start().

2. If upgrading and innodb_undo_tablespaces=0, use innodb_undo_tablespaces=2
instead so that the TRX_SYS page does not need to be modified until upgrade is
successful.
   See srv_start().

3. Discover all existing undo tablespaces whether old or new style.
   See srv_start() -> srv_undo_tablespaces_open().

4. Create any new 8.0 undo tablespaces that will be needed. 
   See srv_start() -> srv_undo_tablespaces_create().

5. Find all existing rollback segments whether old or new and set up the
in-memory rseg objects.
   See srv_start() -> trx_sys_init_at_db_start() -> trx_rsegs_init().

6. Search all existing rollback segments for undo logs, add to purge queue if 
found.
   See srv_start() -> trx_sys_init_at_db_start() -> trx_lists_init_at_db_start()
-> See trx_resurrect().

7. If upgrading and undo logs are found, set srv_upgrade_old_undo_found=true.
   See srv_start().

8. While this is true, make sure DB_ROLL_PTR is decoded the old way where the
7-bit rseg_id is a slot in the TRX_SYS page.
   See trx_rseg_id_to_space_id().

9. Create new rollback segments in the new undo tablespaces from step 4 and add
rseg objects. These are empty and are not on the purge queue.
   See srv_start() -> trx_rseg_adjust_rollback_segments().

10. Start purge threads and wait for purge to be become empty. 
   See innobase_init_files() -> srv_start_purge_threads().

11. When Purge becomes empty, turn srv_upgrade_old_undo_found=false.  From this
point forward, any change that creates an undo log will use the new undo
tablespaces and rollback segments.
    See innobase_init_files() after waiting for purge to become empty.

12. Create Dictionary tables and finish the upgrade process.
    See dd_upgrade_*.

13. When the upgrade succeeds, delete any old undo tablespaces and clean up the
slots in the TRX_SYS rseg array that pointed to rollback segments in those old
undo tablespaces. If the slots in the TRX_SYS page referred to rollback segments
in the system tablespace, leave them alone.  This worklog does not delete
useable rollback segments.
    See dd_upgrade_finish().

14. Set srv_is_upgrade_mode=false.
    See dd_upgrade_finish().