WL#5535: Separate tablespace(s) for the InnoDB undo log

Status: Complete

Currently, the InnoDB transaction undo log lives in the system tablespace. There
is some provision in the code to have it in different tablespaces. Splitting the
undo log into multiple tablespaces could lift some performance bottlenecks (mutex
contention). It has been requested in BUG#25697.

Moving the undo log outside the system tablespace would also benefit users of
solid-state storage (SSD). As noted in BUG#56283, it could be useful to move
the user tablespaces to SSD and keep the system tablespace on HDD. The reason to
keep the system tablespace on HDD is the sequential bulk writes to the
doublewrite buffer that would quickly wear out SSDs due to their limited number
of rewrites. If we disable the insert buffer and move the undo logs away from
the system tablespace, it will be practically read-only except for DDL
operations and the doublewrite buffer.

Moving the undo log to different tablespace(s) doesn't require a file format
change, but it will break backward compatibility. Older versions of InnoDB
will not be able to access the UNDO logs that reside in their own tablespace
because they will not know where to find them. In addition, InnoDB opens all
system tablespaces before doing recovery; the UNDO tablespaces qualify as system
tablespaces, and older versions will not know about these new system tablespaces.

* Have one tablespace per undo log (insert, update, update_key_or_delete).


* More flexibility in using posix_fadvise to avoid file system cache pollution:
* Users can control how many rollback segments they want to create and whether to
  use separate tablespaces or not.


* Breaks backward compatibility

Allow users to move the additional UNDO logs to separate tablespaces. For this we
introduce 3 new parameters:

  1. innodb_undo_directory (string)
     Path to the UNDO tablespace directory; default is ".". The value can be an
     absolute path. The default value will create the UNDO log in the process's
     current working directory.

  2. innodb_undo_tablespaces (ulint)
     Allows a 1-N mapping between UNDO logs and tablespaces; default is 0.

  3. innodb_undo_logs (ulint)
     Default is 1. If the value is greater than the current number of UNDO logs,
     the additional UNDO log segments are created. If innodb_undo_tablespaces > 0
     they are created outside ibdata1 (the system tablespace); otherwise they
     are created in ibdata1.
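
The interaction between innodb_undo_logs and innodb_undo_tablespaces can be
illustrated with a minimal Python sketch. It is not the InnoDB implementation:
the function name, the round-robin assignment, and the assumption that UNDO
tablespaces use space ids 1..innodb_undo_tablespaces are all hypothetical and
serve only to show where newly created UNDO logs would end up.

```python
SYSTEM_SPACE_ID = 0  # space id of the system tablespace (ibdata1)


def place_new_undo_logs(existing_logs, innodb_undo_logs, innodb_undo_tablespaces):
    """Return the space id chosen for each additional UNDO log.

    Assumptions (not taken from the worklog): UNDO tablespaces use space
    ids 1..innodb_undo_tablespaces, and new logs are spread round-robin.
    """
    placements = []
    # Logs are only created when innodb_undo_logs exceeds the current count.
    for log_no in range(existing_logs, innodb_undo_logs):
        if innodb_undo_tablespaces > 0:
            # Outside ibdata1: pick one of the separate UNDO tablespaces.
            space_id = 1 + (log_no % innodb_undo_tablespaces)
        else:
            # No separate tablespaces configured: stay in ibdata1.
            space_id = SYSTEM_SPACE_ID
        placements.append(space_id)
    return placements
```

With innodb_undo_tablespaces = 0 every additional log stays in space id 0, and
a value of innodb_undo_logs that does not exceed the current count creates
nothing.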

Rename innodb_rollback_segments to innodb_undo_logs.

Refuse to startup if innodb_undo_tablespaces doesn't match what InnoDB can 

With the above settings users have more flexibility than is available with the 
current design. Currently we create all 128 UNDO logs at once.

Users can't drop UNDO tablespaces or the UNDO segments created within them.

At startup, before we open any rollback segment (a.k.a. UNDO log) for recovery or
MVCC, we first check how many rollback segments are currently configured
and where they are located. If they are already in a tablespace separate from the
system tablespace (space id 0), we try to open these tablespaces using the
innodb_undo_directory path and "undoN" as the file name, where N == space id.
Failure to open the currently in-use tablespaces is a fatal error. This is because 
recovery cannot proceed unless we have access to all the configured UNDO 
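
The file naming and the fatal-error rule described above can be sketched in
Python as follows. This is only an illustration: both function names are
hypothetical, a file-existence check stands in for actually opening a
tablespace, and the exception stands in for whatever InnoDB does on a fatal
startup error.

```python
import os


def undo_tablespace_path(undo_directory, space_id):
    """Build <undo_directory>/undoN, where N is the tablespace's space id."""
    return os.path.join(undo_directory, "undo%d" % space_id)


def locate_in_use_undo_tablespaces(undo_directory, in_use_space_ids):
    """Return the paths of all in-use UNDO tablespaces outside space id 0.

    Raises RuntimeError if any of them is missing, mirroring the rule that
    failure to open an in-use UNDO tablespace is fatal.
    """
    paths = []
    for space_id in sorted(in_use_space_ids):
        if space_id == 0:
            # Rollback segment lives in the system tablespace; nothing to find.
            continue
        path = undo_tablespace_path(undo_directory, space_id)
        if not os.path.isfile(path):
            raise RuntimeError("cannot open UNDO tablespace %s" % path)
        paths.append(path)
    return paths
```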

If the above operations are successful then we try to open up to
innodb_undo_tablespaces tablespaces. If we can't open the total number of
tablespaces that the user is requesting then we print out an error message with a
suggestion about the correctly configured and discovered tablespaces. The UNDO
tablespace ids are expected to be contiguous, monotonically increasing and less
than any user-created tablespace id. This is an invariant and is checked by
various parts of the code.
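
The invariant on UNDO tablespace ids can be expressed as a small Python check.
This is a sketch for illustration, not code from InnoDB; the function name and
the list-based inputs are assumptions.

```python
def undo_space_ids_valid(undo_space_ids, user_space_ids):
    """Check the invariant on UNDO tablespace ids.

    The ids must be contiguous and monotonically increasing (each id is the
    previous one plus 1) and all of them must be smaller than every
    user-created tablespace id.
    """
    ids = list(undo_space_ids)
    # Contiguous and increasing: consecutive ids differ by exactly 1.
    for prev, cur in zip(ids, ids[1:]):
        if cur != prev + 1:
            return False
    # Every UNDO id must be below the smallest user tablespace id.
    users = list(user_space_ids)
    if ids and users and max(ids) >= min(users):
        return False
    return True
```

For example, UNDO ids [1, 2, 3] with user ids [4, 7] satisfy the invariant,
while [1, 3] (a gap) or [1, 2] alongside a user tablespace with id 2 do not.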

Note: All of this doesn't require a file format change because the on-disk data 
structures have the necessary provision to support separate UNDO log tablespaces. 
It is the code in previous versions of InnoDB that has shortcomings.

  * Needs to know the path to the UNDO tablespaces
  * Needs to open the tablespaces
  * Hard coded assertions about rseg space id being 0 (system tablespace), mainly 
  * Assumes system space id will always be 0; this has ramifications in the
    fil0fil.c LRU code.

We extend the notion of system tablespace to include the UNDO tablespaces too.

Patch: rb561