WL#6915: InnoDB: Undo logs for temp-tables (and related objects) should reside in temp-tablespace

Status: Complete   —   Priority: Medium

Introduction:
------------

The UNDO logs currently reside in the system-tablespace or in dedicated undo-
tablespaces (undo tablespaces are configurable by user and are optional) for all 
objects created in system.

This work log adds an additional type of UNDO log, one that is not redo logged
and resides in the temp-tablespace. These non-redo-logged UNDO
logs are not  required during recovery but only used for rollback action.
(Purging is blocked for temp-tables, MVCC is not applicable for temp-tables)
Note: The temp-tablespace is always recreated on startup.

This new type of UNDO log is being introduced for the use by temp-tables, both 
compressed and Non-compressed.

WL helps in addressing this requirement.

Visible Difference (if any):
---------------------------

Note change in info message that explicitly notify user about number of redo and
non-redo rollback segment active.

2013-05-10 17:20:42 7609 [Note] InnoDB: 96 redo rollback segment(s) found. 1
redo rollback segment(s) are active.
2013-05-10 17:20:42 7609 [Note] InnoDB: 32 non-redo rollback segment(s) are active.

How would this help ?
----------------------

- Since they are not required for crash recovery we can avoid REDO logging.
  On crash/startup temp-tablespace is re-created therefore there is no
  reference of the temp-tables on restart.

  This small change helps in saving lot of un-needed IO arising out of
  REDO logging and further help in improving performance.

----------------------------------------------------------------------

chtest validtion needed for pb2 runs:

################################################################################
#    innodb_wl6915_debug          passed all enabled (<WHATEVER>_CHECK <> 0) tests
# ------------------------------------------------------------------
# Top level file     = ./suite/innodb/t/innodb_wl6915_debug.test
# Current directory  =
/opt/projects/innodb/codebase/mysql-tree/mysql-trunk-wl6915/mysql-test
# MTR_BINDIR         =
/opt/projects/innodb/codebase/mysql-tree/mysql-trunk-wl6915/bld
# TREE_DIR_CHECK     = 1     # Check if the directories used are consistent
# GREP_CHECK         = 1     # Simple result check with grep
# EXTENDED_CHECK     = 1     # Extended check testcases
# REPEAT_CHECK       = 1     # The test must be repeatable
# ASSIGNED_DIR_CHECK = 1     # Check with explicit assigned MTR
#                            # options tmpdir and vardir.
# OPTION_CHECK       = 1     # Check with various server/mysqltest
#                            # options
# OPTION_CHECK_EXT   = 1     # Extended check of such options
# SESSION_CHECK      = 1     # Check for disconnect in time etc.
# CPU_LOAD_CHECK     = 1     # Check with high parallel CPU load
# DISK_LOAD_CHECK    = 1     # Check with high parallel Disk load

################################################################################
#    innodb_wl6915          passed all enabled (<WHATEVER>_CHECK <> 0) tests
# ------------------------------------------------------------------
# Top level file     = ./suite/innodb/t/innodb_wl6915.test
# Current directory  =
/opt/projects/innodb/codebase/mysql-tree/mysql-trunk-wl6915/mysql-test
# MTR_BINDIR         =
/opt/projects/innodb/codebase/mysql-tree/mysql-trunk-wl6915/bld
# TREE_DIR_CHECK     = 1     # Check if the directories used are consistent
# GREP_CHECK         = 1     # Simple result check with grep
# EXTENDED_CHECK     = 1     # Extended check testcases
# REPEAT_CHECK       = 1     # The test must be repeatable
# ASSIGNED_DIR_CHECK = 1     # Check with explicit assigned MTR
#                            # options tmpdir and vardir.
# OPTION_CHECK       = 1     # Check with various server/mysqltest
#                            # options
# OPTION_CHECK_EXT   = 1     # Extended check of such options
# SESSION_CHECK      = 1     # Check for disconnect in time etc.
# CPU_LOAD_CHECK     = 1     # Check with high parallel CPU load
# DISK_LOAD_CHECK    = 1     # Check with high parallel Disk load

----------------------------------------------------------------------------


High Level Approach Description:
--------------------------------

- Each server instance has 128 rollback segment slots.
  A transaction is assigned a rollback segment which it uses for UNDO logging.
  (Segment is expanded to add more pages based on size of UNDO logging).

- The plan is to reserve few slots for non-redo rollback segments.
  UNDO logs written to these rollback segments will not generate any REDO logs.
  Given the constraint these rollback segments will be used only for
  objects that don't need to be restored on re-start e.g., temp-tables.

- Number of reserved Slots for non-redo rollback segments: 32
  Currently the non-redo rollback segments will reside in the temp-tablespace. 
Eventually other tablespace that satisfy restart condition could be used.

===============================================================================

- With 32 slots reserved for non-redo rsegs the number of slots available for 
transactions that write to normal tables will be reduced from 128 to 96. The 
impact of this change is explained below:

---------------------------------------------------------------------------

Fresh DB scenarios:

CASE-1: Default sceanrio (no undo-logs configured):
  - slot-0: rsegs reside in the system-tablespace. (redo rsegs)
  - slot-1 .... slot-N: rsegs reside in the temp-tablespace. (non-redo rsegs)
  - Slot-N+1 .... slot-127: rsegs reside in the system-tablespace (redo rsegs).

CASE-2: Undo logs = M (M = 1)
  - slot-0: rsegs reside in the system-tablespace. (redo rsegs)
  - slot-1 .... slot-N: rsegs reside in the  temp-tablespace. (non-redo rsegs)
  1st rollback segment is always created in system-tablespace and so with
  lower limit (M = 1) no extra rollback segment is created in undo-tablespace.
  This means all transactions will use system-tablespace residing rollback
  segment only.

CASE-3: Undo logs = M (1 < M <= N)
  - slot-0: rsegs reside in the system-tablespace. (redo rsegs)
  - slot-1 .... slot-N: rsegs reside in the temp-tablespace. (non-redo rsegs)
  (Only 1 slot remain active .... more generic case of above scenario.)

CASE-4: Undo logs = M (1 < N < M <= 128)
  - slot-0: rsegs bounded system-tablespace. (redo rsegs)
  - slot-1 .... slot-N: rsegs bounded to temp-tablespace. (non-redo rsegs)
  - slot-N+1 .... slot-M: rsegs bounded to undo-tablespace (redo rsegs)

----------------------------------------------------------------------------

Upgrade DB scenarios:
(Note: for upgrade clean-shutdown is must).

CASE-1: Default sceanrios (no undo-logs configured)
    Existing setup:
      - slots-0 .... slot-127: rsegs bounded to system-tablespace
    Upgraded setup:
      - CASE-1 from fresh db scenario

CASE-2: Undo logs = M (M = 1)
    Existing setup
      - slot-0: rsegs bounded to system-tablespace (redo rsegs)
    Upgraded setup:
      - CASE-2 from fresh db scenario

CASE-3: Undo logs = M (1 < M <= N)
    Existing setup:
      - slot-0: rsegs bounded to system-tablespace.
      - slot-1...slot-M: rsegs bounded to undo-tablespace.
    Upgraded setup:
      - CASE-3 from fresh db scenario

CASE-4: Undo logs = M (1 < N < M <= 128)
    Existing setup:
      - slot-0: rsegs bounded to system-tablespace.
      - slot-1....slot-M: rsegs bounded to undo tablespace.
    Upgraded setup:
     - CASE-4 from fresh db scenario

-------------------------------------------------------------------------

User is notified about active rollback segments and found rollback segments
using following messages printed during server startup.
Note: there was a generic message before that use to print total rollback
segment active which is now revised to print redo/non-redo rollback segment
and also info about real active rollback segment that would take part in undo
logging.

<snippet>
2013-04-09 09:51:11 26965 [Note] InnoDB: 96 redo rollback segment(s) found. 96
redo rollback segment(s) are active.
2013-04-09 09:51:11 26965 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
</snippet>

-------------------------------------------------------------------------

Currently each transaction is assigned a rollback segment that it uses for
undo logging.

With this new semantics each transaction will be assigned 2 rollback segments.
First comes from redo rollback pool and second one from non-redo
rollback pool. The non-redo rseg is assigned to a transaction only if the 
transaction updates a temporary table.

Read only transactions are not affected by this change except that their rollback 
segment is always a non-redo rollback segment because only temp-tables are allowed 
to be modified under this mode.

--------------------------------------------------------------------------

The split of the rsegs between redo logged and non-redo logged rsegs will reduce 
the upper limit, however it should not have an impact in the real world. For this 
impact to have an effect we would need 96K transactions writing to normal tables 
before we run out UNDO slots.

Maximum number of active transaction at given time will now be reduced
from 128 * 1024 = 128K to 128 - 32 = 96 * 1024 = 96K 
as one rollback segment can be assigned to 1024 transaction at any given time.

(Note: 96K is still a very high limit.)

96K: provided all of them are operating with non-temp-tables.
32K: if all of them are operating with temp-table.

-------------------------------------------------------------------------

Benchmarking experiement using sysbench to ensure there is no regression in
non-temp table case due to reduced undo log rollback segments.


		mysql-trunk	mysql-trunl-wl6915
threads

32		7209		7214

64		7430		7626

128		6719		6865

256		4652		4841

512		2031		2229

1024		698		764


Temporary table UNDO log changes
================================

The objective is to have a special UNDO segment for transactions that update
temporary tables, for both compressed and normal temporary tables. This
temporary table specific UNDO segment should not be REDO logged, this is to
reduce the cost of IO. The temporary tables will not be recovered during crash
recovery therefore we don't care about durability. However, we need the UNDO
segments to rollback to savepoint during normal transaction processing. This WL
proposes to use the temporary tablespace to store these special UNDO logs. The
temporary tablespace update operations are not REDO logged.


Assigning an additional rseg to trx_t
--------------------------------------

Each trx_t should have a rseg that can be used for undo logging of temporary
tables. There are other 3 variables related to UNDO logging that are anchored in

trx_t trx_rseg_t* rseg;
rollback segment assigned to thetransaction, or NULL if not assigned yet

trx_undo_t* insert_undo; 
pointer to the insert undo log, or NULL if no inserts performed yet

trx_undo_t* update_undo;	pointer to the update undo log or NULL if not updates
performed yet

In the new design we will encapsulate these state variables in a new structure.

struct trx_undo_ptr_t {

fields from above;

};

For now there are 2 rollback segments per transaction but in order to keep
design generic will encapsulate it in single structure and will maintain its
instance in trx_t struct.

struct trx_t{

.... existing member variables ....

trx_rsegs_t rsegs; /* rollback segments for undo logging */

};

struct trx_rsegs_t {
/** undo log ptr bounded to system/undo tablespace used for undo logging
of all objects those needs to be recovered on crash. */
trx_undo_ptr_t m_redo;


/** undo log ptr bounded to temp tablespace used for undo logging of
objects only that don't need to be recovered on crash.
Can be NULL if transaction doesn't modify any such table that doesn't
need to be recovered. */
trx_undo_ptr_t m_noredo;
};

Important note: roll_limit that points to least undo-no up to which we can
rollback should be updated irrespective of which rseg is being used.

Transaction is assigned redo rollback segment by default (unless it is read-only
transaction). Non-Redo rollback segment is assigned only if transaction involves
modification of temp-tables.

trx_undo_report_row_operation() function is used for undo logging. Modify
function to add check for assignment of non-redo rollback segment. Also,
explicit turn-off REDO logging if transaction involves use of temp-table.
allocation of non-redo rollback segments

trx_sys is central structure that holds references to all the active rollback
segments. On start, based on configuration N rollback segments are created (open
in case server is not running in boot-strapped mode). In default configuration,
all of 128 rollback segments resides in system-tablespace.

Idea is to reserve 32 slots from these 128 slots (note: we are not increasing
number of slots) for non-redo rollback segments and keep remaining slots open
for redo logged rollback segments.

In boot-strap mode: Create 32 rollback segments residing in temp-tablespace
Skip use of these segments while creating redo rollback segments
irrespective of whether redo is being created in system or undo tablespace.

trx_sys_create_rsegs(): Creates extra rollback segments. Modify interface to
accept the reserved limit of non-redo tablespace and then create them before
creating other rollback segments. Existing logic will detect presence of these
rollback segments and will create only delta rollback segments.

trx_rseg_create()/trx_sysf_rseg_find_free(): Searches for free slots and creates
and assigns rollback segment under free slot. Logic of find free slots needs a
small modification, Why ? Non-Redo rollback segments are re-created on server
startup and so even though there is registered entry about it in sys-header
(created during boot-strapped) it needs to be ignored if slot is reserved for
non-redo rollback segment so that new rollback segment is created and
corresponding slot is updated accordingly.

Rest of the logic of creating rollback segment in specified space remains
unchanged (trx_rseg_mem_create).

At the end trx_sys will have N rollback segments with
slot-0: redo rollback segment residing in system tablespace
slot-1-32: non-redo rollback segment residing in temp-tablespace
slot-33-slot-127: redo rollback segment residing in system/undo tablespace.


Purging of rollback segments
----------------------------

Once transaction commit we don't need rollback segments and so can be purged
(unless there is other view using it in which case purge will not proceed).
Given that we now have 2 rollback segments per transaction, both needs to be
purged. Logically they are single entity and so they should be added as single
block to purge queue with same trx->no. Currently purge queue has facility to
accept only one rollback segment.

We plan to revise this interface and make it more generic so that purge can
accept array of rollback segments for given transaction.

rseg_queue_t is now replaced with more generic class name TrxUndoRsegs

class TrxUndoRsegs {

.... member function mainly getter/setter + constructor + comparator
based on trx_no ....

private:
        typedef std::vector<trx_rseg_t*> trx_rsegs_t;
private:
        /** Transaction number of a transaction of which rollback segments
        are part off. */
        trx_id_t                m_trx_no;

        /** Rollback segments of a transaction, scheduled for purge. */
        trx_rsegs_t             m_rsegs;
};

Iterator interface over TrxUndoRsegs to access rollback segments.

/**
Chooses the rollback segment with the smallest trx_no. */
struct TrxUndoRsegsIterator {

        /** Constructor */
        TrxUndoRsegsIterator(trx_purge_t* purge_sys);

        /**
        Sets the next rseg to purge in m_purge_sys.
        @return zip_size if log is for a compressed table, ULINT_UNDEFINED if
        no rollback segments to purge, 0 for non compressed tables. */
        ulint operator++(int);

private:
        // Disable copying
        TrxUndoRsegsIterator(const TrxUndoRsegsIterator&);
        TrxUndoRsegsIterator& operator=(const TrxUndoRsegsIterator&);

        /** The purge system pointer */
        trx_purge_t*                    m_purge_sys;

        /** The current element to process */
        TrxUndoRsegs                    m_trx_undo_rsegs;

        /** Track the current element in m_trx_undo_rseg */
        TrxUndoRsegs::iterator_t        m_iter;

        /** Sentinel value */
        static const TrxUndoRsegs       NullElement;
};

trx_purge_get_rseg_with_min_trx_id() is now replaced by operator++.

purge_sys now holds an iterator to TrxUndoRsegs. Once it is done
serving the assigned rollback segment it moves to next rollback segment
by incrementing the iterator.

Whenever purge_sys request for next rollback segment it is poped from
purge_queue and assigned to purge system through purge_sys->rseg.
If there are more than one rollback segment assigned to a transaction
next request for assigning rollback segment will be served by assigning
rollback segment from same transaction rollback segment vector.
(New element is not popped from Purge Queue).

In order to add redo and non-redo rollback segments as logical one block
we need to lock both the rollback segments at same time
(which is not possible given the dead-lock sync-rules) and so we have
added separate sync-level for redo mutex and non-redo mutex.
Sequence enforces acquisition of redo mutex first (as it is always present)
and then non-redo (optional if non-redo rollback segment is assigned).

#define SYNC_REDO_RSEG         600
#define SYNC_NOREDO_RSEG       599


Effect on rollback with multiple rollback segments
--------------------------------------------------

Rollback uses rollback-segments to perform the rollback. Currently it read
undo-record from assigned rollback segments and simply apply these undo record.
With multiple rollback segments, it needs to read undo records from both the
rollback segments and apply them. (On crash, non-redo rollback segments are not
restored and so not active during rollback which is fine given that objects
pointed by them are not restored too.)

Additional related changes
---------------------------

- Qualifying temp-tablespace as non-user tablespace:
Update function that classify tablespace as user or system based on space-id.

/** Determine if (i) is a user tablespace id or not. */
# define fil_is_user_tablespace_id(i) ((i) > srv_undo_tablespaces_open)