WL#5522: InnoDB: Transportable Single-Table Tablespaces

Status: Complete

Bug#11746326 25491: DETACH AND ATTACH INNODB TABLESPACE FILES.

Allow users to 'cold' export a table from a running MySQL server instance to
another running instance. It can also be used to do a full table restore on the
same instance. For this we will need to extend the SQL server syntax and 
introduce:

   FLUSH TABLE[S] t1,...,tn FOR EXPORT;


The above SQL clause will lock table t[1-n] in SHARED mode and block
transactions that attempt to update the table. It will however allow read-only
operations to continue. When control returns from the above statement to the
user, the table's dirty pages should all be synced to disk with any change
buffer entries merged prior to sync. The objective is to make the tablespace
transactionally consistent on disk. The quiesce (export) process will also
create a binary file that contains the meta-data required for importing the 
table. The naming convention of the meta-data file is illustrated by the 
following the use case:

   use test;
   CREATE TABLE t(c1 INT) engin=InnoDB;
   FLUSH TABLE t FOR EXPORT; -- quiesce the table and create the meta data file
   UNLOCK TABLES;

   $innodb_data_home_dir/test/t.cfg

This will be created in the same directory that the target tablespace (.ibd)
resides in. The contents of any previous .cfg file will be overwritten. The
format of this file is described in the HLS.

The user must copy the .ibd files and the .cfg file(s) to the destination
directory before releasing the SHARED locks.

After copying the .ibd file(s) the user must release the SHARED locks on the
table(s) using the following syntax:

  UNLOCK TABLES;

When there are foreign key constraints between tables, users should export all
tables at the same (logical) point of time. Stop updating any of the tables,
commit all transactions related to them, acquire S locks on the tables and
export the tables. In that way, the constraints will hold also in the receiving
InnoDB instance.

To IMPORT the tablespace, the user must create the table on the receiving
instance if it doesn't already exist and it is currently the users
responsibility to ensure that the schema matches that of the importing
tablespace. Before a tablespace can be imported InnoDB has to DISCARD the
tablespace that is attached to the table. The syntax for this operation is:

   ALTER TABLE t DISCARD TABLESPACE;

The user must stop all updates on the tables, prior to the IMPORT. For tables
that are in a foreign key relationship, the updates to all the tables in the
group must be stopped.

After detaching/discarding the tablesapce the user must initate the IMPORT using
the following syntax:

  ALTER TABLE t IMPORT TABLESPACE;

After the above operation completes, the table(s) should be ready for use as 
normal.
Transportable Tablespaces aka Bug#11746326 25491: DETACH AND ATTACH INNODB
TABLESPACE FILES.

Requirements

    * Must be independent of size of table(s)
    * Must execute in time proportional to copying files at os level
    * Must support partitioned tables
    * Must be transactionally consistent
    * Must be referentially consistent [1]

Acceptable Limitations
    * Only possible when innodb_file_per_table=ON
    * Cannot quiesce tables that reside in the system tablespace
    * Will require quiesce (new read-only transactions allowed on
      affected table(s))
    * May require that database to which the tables(s) is (are) being
      imported do not have tables with the same name(s)
    * Cannot import tablespace with a page size that doesn't match the
configured page size of the importing instance
    * Will not work with partitions, requires changes to MySQL code:
          mysql> alter table part discard tablespace;
          ERROR 1031 (HY000): Table storage engine for 'part' doesn't have this
option

Usage Scenario

    * Exporting instance:
         1.  FLUSH TABLES t1,...,tn FOR EXPORT;
         2.  Copy the output of SHOW CREATE TABLE t[1-n];
         3.  Copy t[1-n].ibd
         4.  Copy t[1-n].cfg - the table meta data file
         4.  UNLOCK TABLES;

    * Importing instance:
         1.  Replay the SHOW CREATE TABLE t[1-n] dump if table doesn't exist on
target.
         2.  ALTER TABLE t[1-n] DISCARD TABLESPACE;
         3.  Move the discarded tablespace t1.ibt, ..., tn.ibt out of the 
directory
         4.  Move the copied t1.ibd,...,tn.ibd file(s) in place
         5.  Move the copied t1.cfg,...,tn.cfg file(s) in place
         6.  ALTER TABLE t[1-n] IMPORT TABLESPACE;

FLUSH TABLES t1,.....,tn FOR EXPORT; will need to run synchronously and
quiesce the tablespaces for the relevant tables.

[1] When there are foreign key constraints between tables, users
    should export all related tables at the same (logical) point
    of time. The READ ONLY lock will prevent further updates of the
    tables. In that way, the constraints will hold also in the
    receiving InnoDB instance.

There are potential replication related issues that will need to be tested
thoroughly. We will need to document the limitations and possible work arounds
after testing.

Limitations:
  * No checking during import for foreign key relationships
  * Synchronous operation, ie. sync to disk will be done during "FLUSH TABLES 
...
" and if multiple files are specified then each file will be synced serially.
Exporting the tablespace
========================
 * Lock the table in SHARED mode
 * Stop purge coordinator thread
 * Merge change buffer entries (if any) for the table
 * Sync dirty pages to disk
 * Write the table meta-data to a binary .cfg file, see HLS for V1 format.
 * Release the shared LOCK when requested by the user

Discarding the tablespace
=========================
 * Lock the table in X mode
 * Detach the tablespace from the table
 * Free the pages used by the tablespace in the buffer pool
   o This can (and perhaps should) be avoided by always generating an new space
id when we DISCARD (or IMPORT) the table.
     - We can use the newer page invalidate algorithm
     - Gets rid of accidental UNDO
     - Gets rid of REDO issues during recovery
     - Gets rid of stale dblwr buffer copy problem
     - Gets rid of change buffer issues

Importing the tablespace
========================
    * Reset transaction identifiers DB_TRX_ID and the undo log DB_ROLL_PTR in
the cluster index
    * Update the system max next value for generated keys (DB_ROW_ID)
    * Tablespace identifiers
    * Index identifiers
    * Reset log sequence numbers (LSN) stamped on page headers
    * Update the index root page numbers for the indexes

Import algorithm
================
We scan the blocks in extents and modify individual blocks rather than using 
logical index structure.

foreach page in tablespace {
  1. Check each page for corruption.

  2. Update the space id and LSN on every page
     * For the header page
       - Validate the flags
       - Update the LSN

  3. On Btree pages
     * Set the index id
     * Update the max trx id
     * In a cluster index, update the system columns
     * In a cluster index, update the BLOB ptr, set the space id
     * Purge delete marked records, but only if they can be easily
       removed from the page
     * Keep a counter of number of rows, ie. non-delete-marked rows
     * Keep a counter of number of delete marked rows
     * Keep a counter of number of purge failure
     * If a page is stamped with an index id that isn't in the .cfg file
       we assume it is deleted and the page can be ignored.
     * We can't tell free pages from allocated paes (for now). Therefore
       the assumption is that the free pages are either empty or are logically
       consistent. TODO: Cache the extent bitmap and check free pages.

   4. Set the page state to dirty so that it will be written to disk.
}

If there were some delete marked records in the indexes that could not be purged 
optimistically then we make another pass over the index which has such records 
to purge them. This operation will use the buffer pool and traverse the indexes 
logically.

Crash Recovery During Import
============================

If the tablespace file that is being imported is in an inconsistent state, it
should be noticed by import-time consistency checks.

The page-level part of import (adjusting tablespace identifiers and log sequence
numbers) will be done outside the buffer pool. That is, the crash recovery 
should consider the tablespace file to be discarded. The IMPORT operation should 
be atomic, it is either imported or not. The only requirement is that the to be 
IMPORTed tablspace is transactionally consistent. If it was exported using  the 
new FLUSH TABLES ... FOR EXPORT syntax then it should be.