WL#9508: InnoDB: Support CREATE/ALTER/DROP UNDO TABLESPACE

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

Requirements
============
1) Create SQL commands to CREATE and DROP undo tablespaces to fulfill the
primary requirement: "Users of InnoDB should be able to add and drop
undo tablespaces through a SQL interface at runtime so that the location of each
Undo Tablespace can be chosen individually".

2) Stop creating undo tablespaces through the innodb_undo_tablespaces setting so
that there is only one way to add or drop undo tablespaces. 

SYNTAX
======

CREATE UNDO TABLESPACE
----------------------
MySQL should support the UNDO keyword in the following syntax:

   CREATE UNDO TABLESPACE `undo99` ADD DATAFILE 'undo99.ibu';

The datafile can be a file name only or it can be a full absolute path. But it
must end with the suffix ".ibu".

If a simple file name is used will be put into the location specified by
--innodb-undo-directory which is a read-only startup variable.  If
--innodb-undo-directory is not explicitly set, the datadir is used.

Any absolute path location must be in the list of known directories defined by
--datadir, --innodb-directories, --innodb-home-directory & --innodb-undo-directory.

But the datafile may not be a relative path since there would be confusion as to
what it is relative to; the undo directory or the datadir where all other
relative paths originate from.

If you want to use an undo datafile location other than the datadir but you do
not want the stored filename to be an absolute path, you can set
--innodb-undo-directory to a relative path. It will be relative to the datadir.
A simple file name in the ADD DATAFILE parameter will be added to that path. 
Then the filepath shown in information_schema.files and the path stored in the
DD will be a relative path.

DROP UNDO TABLESPACE
--------------------
Any undo tablespace created by the above statement should be dropped using:

   DROP UNDO TABLESPACE `undo99`;

ALTER UNDO TABLESPACE
---------------------
In order to DROP any existing tablespace it must be empty in the current design.
Likewise, in order to drop an undo tablespace, it must also be empty. To
accomplish this, we will introduce a command that will mark the UNDO tablespace
as 'inactive'.

   ALTER UNDO TABLESPACE 'undo99' SET INACTIVE;

This command will mark the undo tablespace so that it will no longer be used to
assign rollback segments to new transactions. In order for that tablespace to
become empty, all the active transactions that used undo logs in the rollback
segments in that tablespace must be completed, as well as any transaction that
was started before those transactions completed.  Then the purge thread must
clean up all undo logs and undo log segments in this tablespaces' rollback
segments.  Finally, when the rollback segments are no longer used, it is fully
inactive and empty and can be dropped.

This same process is currently being used to truncate undo tablespaces. The
purge thread does the work to clean them up and to truncate them. It will do the
same for undo tablespaces that have been explicitly marked as 'inactive'. It
will not delete the tablespace because instead of dropping it, the user may do:

   ALTER UNDO TABLESPACE 'undo99' SET ACTIVE;

This could be done at any time by the user instead of DROP UNDO TABLESPACE.  

If DROP UNDO TABLESPACE is done at any time before the tablespace has been
truncated, then it will return HA_ERR_TABLESPACE_IS_NOT_EMPTY as InnoDB does now
for a normal tablespace.


INNODB_UNDO_TABLESPACES
=======================
WL#10498 changed the default value of innodb_undo_tablespaces to 2.
WL#10583 then changed the minimum value of innodb_undo_tablespaces to 2.
With 2 undo tablespaces, it is possible to use innodb_undo_log_truncate=ON.
So WL#10499 changed the default value of innodb_undo_log_truncate to ON.

We would like to not use innodb_undo_tablespaces anymore and instead rely on SQL
to create undo tablespaces. But we need to have rollback segments created
somewhere when the instance is initialized.  This location for rollback segments
must exist before any SQL can be accepted.  The pre-8.0 engine creates one
rollback segment in the system tablespace very early during startup.  WL9507
stopped using that rollback segment and created rollback segments in separate
undo tablespaces instead.  We want to continue using separate undo tablespaces
that can be truncated because we want to move away from any dependence on the
system tablespace. So we should create at least 2 undo tablespaces by default
during initialization.

Instead of having 2 ways of creating and dropping undo tablespaces, we should
stop using and deprecate innodb_undo_tablespaces. Then we will create 2 default
independent undo tablespaces as part of each instance.  These undo tablespaces
files are currently named 'undo_001' & 'undo_002' with no extension.  This was
done in WL#9507. Since the explicit undo tablespaces will have a file extension
of '.ibu', it will be beneficial to leave the current naming of these implicit
undo tablespaces the same so that there will be no filename conflicts. 

The tablespace names for the 2 implicit undo tablespaces are 'innodb_undo_001'
and 'innodb_undo_002'.  Since the create tablespace process does not allow the
user to create a tablespace name that starts with 'innodb', these names also
will not conflict with explicit undo tablespace names. 

Implicit undo tablespaces are created in the default undo directory and they are
not permitted to be dropped.  But they can be made inactive with;
    ALTER UNDO TABLESPACE `innodb_undo_001` SET INACTIVE;
Both implicit undo tablespaces will be made active at bootstrap and remain
active until they are SET INACTIVE. There must be at least 2 explicit undo
tablespaces active before these two implicit undo tablespaces can be made inactive. 


UNDO TABLESPACE DISCOVERY
=========================
All undo tablespaces must be discovered and opened before redo recovery and
before any other datafile is opened so that uncommitted transactions can be
rolled back, including rollback of the data dictionary.
Even undo tablespaces that were not involved in redo recovery must be open and
available before recovery.

This means that the data dictionary cannot be used at startup for undo file
discovery.

Implicitly created undo tablespaces are found easily before recovery because
they are always in the default undo directory. But explicit SQL generated undo
tablespaces can be placed in separate locations specified in the ADD DATAFILE
statement. That location is saved in the data dictionary but these remote undo
tablespaces must be discovered and opened before the DD can be used.

The setting --innodb-undo-directory will be the default location for undo
datafiles but they will also be allowed in or under any of the known directories
in --innodb-directories.  In this way, InnoDB will know where to scan in order
to discover undo datafiles.  In order for discovery to work as implemented in
WL8619, we cannot allow ADD DATAFILE locations that are outside of these valid
directories. When InnoDB starts up, only the directories listed in --datadir,
--innodb-home-directory, --innodb-undo-directory and --innodb-directories will
be scanned.  

Any undo datafile that is not found in this scan cannot be used. The directory
scanning code implemented in WL#8619 will be used for this. This code discovers
all undo tablespaces from known directories before recovery. Thus if recovery
happens with missing undo tablespaces, any uncommitted transaction in that undo
tablespace will not get rolled back.  The database would be inconsistent.  This
worklog adds an error message to the error log if an undo tablespace known to
the DD is not found at startup. 

UNDO TABLESPACE PORTABILITY
===========================
Explicit undo tablespaces created with CREATE UNDO TABLESPACE can be moved while
offline to any location that is in or under one of the known directories
specified in --innodb-directories, --innodb-undo-directory, --datadir or the
current directory.  These files will be discovered at startup and assumed to be
the current undo tablespaces which were moved.

Implicit undo tablespaces must always be located in the --innodb-undo-directory.
If they are moved while offline, then mysqld must be started with that new
directory in --innodb-undo-directory.

INFORMATION_SCHEMA.INNODB_TABLESPACES
=====================================
In order to determine if an inactive undo tablespace is empty and ready to drop,
the user can query INFORMATION_SCHEMA.INNODB_TABLESPACES.  There will be a new
column called "STATE". This worklog will implement this display column. 

This state will be persisted in DD table mysql.tablespaces as part of
se_private_data. It will be another key/value pair of {state="<string>"}.The key
{discard=<1/0>} will be removed as part of this worklog and state='discard' will
be used instead.

For undo tablespaces, the values can be 'active', 'inactive', and
'empty'. For general and file-per-table tablespaces the column may contain
either 'normal', 'discarded' or 'corrupted'.

The four undo tablespace states have the following meaning:
active:    These rollback segments can be allocated to new transactions.
inactive:  These rollback segments are no longer being used by new transactions.
           They are 'inactive'. The truncate process is happening. This undo
           tablespace was either selected by the purge thread implicitly or it
           was made inactive explicitly by an ALTER UNDO TABLESPACE statement
           with SET INACTIVE.
           When the truncation process is complete, the next state is ACTIVE If
           it was set implicitly, or EMPTY if it was set explicitly.
empty:     This undo tablespace has been truncated and is no longer active.
           It is ready to be either dropped or set active again explicitly with
           SET ACTIVE.

The three general and file-per-table tablespace states have the following meaning:
normal:    A normal active general or file-per-table tablespace.
discarded: A file-per-table tablespace that has been discarded using
           ALTER TABLE name DISCARD TABLESPACE;
corrupted: A general or file-per-table tablespace that has been identified by
           InnoDB as having been corrupted.
FR1: CREATE UNDO TABLESPACE `name` ADD DATAFILE 'path/to/filename.ibu'; can be
used to create up to 125 new undo tablespaces in addition to the 2 default undo
tablespaces implicitly created at startup.
FR2: The ADD DATAFILE statement can contain any existing filepath and must be
located in or under any of the known InnoDB paths given in
--innodb-undo-directory, --innodb-directories, --datafile and 
--innodb_data_home_dir.
FR3: The ADD DATAFILE filename must use an extension of 'ibu'.
FR4: The ADD DATAFILE filename must not be a relative path.  It can be a simple
file name like 'undo_003.ibu' or it can be a full absolute file path.
FR5: The setting --innodb-undo-tablespaces is deprecated in 5.7. It will be
ignored in this worklog and deleted in a future release.
FR6: There will always be two implicit default undo tablespaces called
'innodb_undo_001' and 'innodb_undo_002'. The datafiles are located in
--innodb-undo-directory and called 'undo_001' and 'undo_002'.  They cannot be
dropped but they can be SET INACTIVE or SET ACTIVE.
FR7: The default undo tablespaces and all SQL generated undo tablespaces can be
truncated if --innodb-undo-log-truncate=ON and the datafile size becomes greater
than --innodb-max-undo-log-size.
FR8: ALTER UNDO TABLESPACE `name` SET INACTIVE; will cause that undo tablespace
to stop being used by new transactions.
FR9: The user can query information_schema.innodb_tablespaces to see which undo
tablespaces are active, inactive and ready to be dropped.
FR10: ALTER UNDO TABLESPACE `name` SET ACTIVE; will cause an inactive undo
tablespace to start being used again.
FR11: DROP UNDO TABLESPACE `name`; will delete an inactive and unused undo
tablespace.
FR12: DROP UNDO TABLESPACE `name`; will fail if the undo tablespace does not
exist, or is active, or has active undo logs.
FR13: Concurrent CREATE/DROP/ALTER UNDO TABLESPACE commands can work on
different tablespaces but will serialize on the same tablespace name.
FR14: The current state of an undo tablespace can be seen with:
SELECT SPACE, NAME, SPACE_TYPE, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
WHERE SPACE_TYPE = 'Undo';
Algorithm: CREATE UNDO TABLESPACE
=================================
CREATE UNDO TABLESPACE will work similar to CREATE TABLESPACE as far as syntax
and DD updates.  It will create the independent undo tablespace using the same
code introduced in WL#9507. The new tablespace is not used until it is fully
created and ready to be used.  Until then no other thread can use this
tablespace.  There is one global vector of undo tablespace objects that must be
added to under mutex protection.  This is done only after the tablespace has
been created with a header page, an RSEG_ARRAY page, and the required number of
rollback segments.

Algorithm: DROP UNDO TABLESPACE
===============================
DROP TABLESPACE only works for tablespaces that are empty.  So DROP UNDO
TABLESPACE will also only drop an empty undo tablespace. The undo tablespace to
be dropped must be in the `empty` state and all rollback segments must be empty
of undo segments.

Only explicitly created undo tablespaces that have the suffix '.ibu' can be
dropped. The 2 implicit undo tablespaces cannot be dropped.

Algorithm: ALTER UNDO TABLESPACE `name` SET INACTIVE
==================================================
WL#6965 introduced the ability to truncate undo logs and tablespaces. WL#9507
makes use of the WL#6965 ability to stop using existing undo tablespaces when it
needs to reduce the number of undo tablespaces.  Here, we will use this ALTER
statement to explicitly set an undo tablespace as inactive.  WL#6965 implicitly
picks one tablespace at a time for truncation when innodb_undo_log_truncate=ON.

Once an undo tablespace is marked like this, rollback segments owned by that
tablespace are no longer assigned to any new transactions.  Eventually, the
purge process will free up all undo segments in that tablespace.  When a purge
thread finds a completely empty undo tablespace that was marked for truncate
that purge thread will truncate it.

'ALTER UNDO TABLESPACE `name` SET INACTIVE' will just start the process of
emptying. When that undo tablespace is empty, it will be truncated and rebuilt
just like WL#6965.  Then it will stay empty and inactive until it is explicitly
dropped or set active again. The transaction for 'ALTER UNDO TABLESPACE `name`
SET INACTIVE' will set a boolean flag in the dictionary and set a flag in memory
so that the 'inactive' state is known and durable. After this transaction,
'ALTER UNDO TABLESPACE `name` SET INACTIVE' can return immediately.

There must necessarily be a gap in time between 'ALTER UNDO TABLESPACE `name`
SET INACTIVE' and 'DROP UNDO TABLESPACE `name`'. The user must have a way to
know if the undo tablespace is ready to be dropped. So we must provide a way to
monitor which undo tablespaces are active, inactive but not empty, and inactive
plus empty.  A new column called 'STATE' will be added to
INFORMATION_SCHEMA.INNODB_TABLESPACES in order to show these states.

It does not matter if the undo tablespace being set inactive is the one chosen
to hold the undo record for the 'ALTER UNDO TABLESPACE `name` SET INACTIVE'
transaction.  The undo log record for that transaction is like any other
Transaction Undo Log (trx_undo_t) in that undo tablespace.  They all must stay
around until they can be purged.  The undo tablespace and its rollback segments
are all still available and usable and recoverable in a crash even if the undo
tablespace is inactive and marked for truncation.  These undo tablespaces are
just not receiving any new Transaction Undo Logs.  Once this transaction is
older than any other active transaction, the undo records can be purged.  The
change indicating that this tablespace is marked inactive is durable and the
undo records for it are no longer needed.  And that transaction will have been
one of the last transactions put into that undo tablespace. 

If a crash happens during the 'ALTER UNDO TABLESPACE `name` SET INACTIVE'
transaction the tablespace will be used again after recovery since the ALTER
statement will roll back. 

If a crash happens after that commit, the recovery will see that the tablespace
is inactive and marked for truncate.  It will remain available until it is empty.

In order to allow undo tablespace truncation to continue after a crash, WL#6965
creates a file called undo_<space_num>_trunc.log while the truncate of the undo
tablespace is in progress. That was done before a transactional data dictionary
was available.  The DROP UNDO TABLESPACE algorithm, just like the DROP
TABLESPACE command, will use the DDL_LOG.

Concurrency
===========
MDL should be used like any other TABLESPACE DDL so that concurrent UNDO
TABLESPACE DDL cannot be processed concurrently on the same tablespace.

In addition, it is necessary to serialize creation and deletion of all undo
tablespaces so that more than one undo tablespace is not being created or
dropped concurrently. This includes the undo truncation process.  This is
accomplished by undo::ddl_mutex.

Undo Tablespace State Transitions
=================================
There are 3 Undo Tablespace States mentioned above, which can be seen in the
information_schema.innodb_tablespaces table; active, inactive & empty.
Internally, InnoDB will differentiate spaces that are inactive_implicit and
inactive_explicit.  

The state transitions can be summarized below;
1. During normal background undo truncation:
     active -> inactive_implicit -> active
2. SET INACTIVE followed by DROP
     active -> inactive_explicit -> empty {-> DROP deletes the undo tablespace.}
3. SET INACTIVE followed by SET ACTIVE
     active -> inactive_explicit -> active

In addition, it is possible to do SET INACTIVE while in the
'inactive_implicit' state.
2a. When being truncated,  SET INACTIVE followed by DROP
     active -> inactive_implicit-> inactive_explicit -> empty {-> DROP}
3b. When being truncated, SET INACTIVE followed by SET ACTIVE
     active -> inactive_implicit -> inactive_explicit -> active
 
It is also possible to do SET INACTIVE and then immediately do SET ACTIVE
before the truncation has completed.
3b. SET INACTIVE followed immediately by SET ACTIVE
     active -> inactive_explicit -> inactive_implicit -> active

At startup, the states that are persistently stored in the DD will be applied
to the in-memory objects after the DD has been recovered.  Before recovery,
the undo tablespaces must be opened and made ready. So if any undo tablespace
is in the process of being truncated when the engine went down, that process
will be completed at startup.  It cannot consult the DD since this is before
recovery, so InnoDB consults the truncation log file.

This means that just after recovery, all undo tablespaces found will be marked
as active. Then the DD is consulted and the persisted states are applied.

For this reason, there is no reason for the purge thread to do a DD transaction
to persist the state 'inactive_implicit'.  The state is tracked in the
in-memory object but it does not need to be persisted in the DD since the
truncation log file is used for crash recovery purposes.

The purge thread chooses an undo tablespace to truncate because it is too big. 
If a crash happens before it is actually truncated, it can be left active at
startup and it will be chosen again.

This means that even though there are 4 states tracked in memory, there are only
3 states written to the DD. And since information_schema.innodb_tablespaces is
filled from the contents of the DD.tablespaces record, the 'inactive_implicit'
state will never actually be seen.  This is probably fine since undo truncation
happens transiently in the background. That is why the state name in
dd::tablespaces::se_private field is just named 'inactive'.

Upgrading to v8.0
=================
Since MySQL v5.7 allowed up to 127 external implicit tablespaces and this
worklog hard codes that number to 2, it will be necessary for the DBA to
explicitly create the number of undo tablespaces that they need.  

The upgrade process will create the two new implicit undo tablespaces and use
those for new transactions.  Once all the old undo tablespaces (named 'undo001'
, 'undo002'. etc), are purged of any undo logs, they are deleted. 

So even if there are more than 2 undo tablespaces in a 5.7 instance, there will
only be 2 undo tablespaces after upgrade to 8.0.

Replication
===========
The number and distribution of undo tablespaces has historically been
controlled by configuration variables, which are not replicated.  By isolating
configuration changes to each node, the DBA can setup different nodes in
different ways.

Each node should have its own independent set of UNDO tablespaces and rollback
segments. Changes to that configuration by SQL commands should not replicate to
other nodes. 

Setting: innodb_undo_log_truncate
=================================
If this setting is ON, background undo truncation will continue to occur like it
did in 5.7.  Both the 2 implicit undo tablespaces and any other explicit undo
tablespaces that are active will be included in this effort by the purge thread
to locate undo spaces that are too big and truncate them one at a time.

If this setting is OFF, undo tablespaces will not be truncated unles it is done
explicitly by an ALTER UNDO TABLESPACE name SET INACTIVE;  So if the user does
not want undo truncation to occur regularly, they can either set
innodb_max_undo_log_size to a larger size, or they can set
innodb_undo_log_truncate=OFF, monitor the undo tablespace sizes directly and
truncate any that become too big with ALTER UNDO TABLESPACE name SET INACTIVE;
fololowed by ALTER UNDO TABLESPACE name SET ACTIVE;