WL#6469: InnoDB: Improve CREATE/DROP performance for temporary tables.

Status: Complete   —   Priority: Medium


Who will be benefited ?
- MySQL InnoDB user community that uses InnoDB temp-tables.

How can we achieve it ?

Temp-table is a special case with lifetime bounded to connection lifetime 
or server lifetime + limited visibility (only visible to trxs executing 
on the connection that created it).

These 2 features will help us limit lot of actions that we have to execute
for general-tables (non-temp-tables) especially logging.

Validation of optimizaton ?
sql-bench (mysql native performance utility) can be used to validate if
optimization has really helped. test-create test-case of this utility try to 
create/drop tons of table (load is configurable) using various scenario.
(Note: sql-bench by default will create/drop non-temp-table. Just modify
server-cfg (a single line change) to enable create/drop of temp-table). 

Optimize performance of InnoDB temp-table.

There are 2 parts to this problem viz.
 - DDL optimization (this worklog)
 - DML optimization (WL#6470)

DDL statements that will be affected:
   * DROP
   * ALTER TABLE ...

Temporary tables are only visible in the connection/session in which they are 
created. They are also bound by the lifetime of the server. They are discarded
on startup.

For DDL changes we will not make the table definitions persistent.
This will eliminate all IO during the table create/drop phase.


- Temp-table definitions will not be written to SYS_XXXX tables.
  This will limit visibility of temp-tables through SYS_XXXX tables.
  Parallely we have added I_S.INNODB_TEMP_TABLE_INFO table to view
  all temp-tables active in system when query is fired.

- Number of Temp-Tables will be limited by the available 
  memory on the system running the server process. This is because their
  definition cannot be put on the LRU list and swapped out. This memory
  is only needed for the table definitions.

- As temp-tables information is not logged to SYS_XXXX table while
  recovering from crash it is not possible to identify orphaned temp-table
  objects left over by crash. This issue is addressed by WL#6560.

- Insert Buffering is disabled for temp-table given the overhead vs its usage
  in temp-table sceanrio.


- Important Note from documenation perspective:
  - temp-tablespace default named as ibtmp1 will be located in data-dir.
    (parallel to ibdata1). (as against tmpdir).
  - user can change location of temp-tablespace using relative path syntax
    while specifying innodb_temp_data_file_path (Same as innodb_data_file_path)
DDL optimization is made up off optimizing multiple DDL
stmts viz. CREATE/DROP/TRUNCATE/.... etc.

We will go over each stmt and will see general actions that
are performed in each stmt. That will help us markout actions
that can be avoided for temp-tables.

For some of these that we need for temp-table we may direct
inerface for some we might need to develop a new one.



- creation of table (temp + non-temp) involves following steps:
  - create in-memory table (dict_table_t) object along with fields.
  - create and assign innodb wide unique table-id.
  - create tablespace (per-table tablespace if configured).
  - create and insert entry to SYS_TABLES and SYS_FIELDS.
  - add in-memory object to dictionary cache.

  - create default pk if user hasn't specified one.
  - create all indexes as specified by user. this involves:
    - create in-memory index (dict_index_t) object.
    - create and assign innodb wide unique index-id.
    - create and insert entry to SYS_INDEXES/SYS_FIELDS table.
    - add in-memory object to dictionary cache.
    - create b-tree object for storing index data and update
      SYS_XXXX table accordingly. Importantly root-page-no.
      (root-page-no is also available in index->page_no 
      in-memory structure).

- looking at above steps from temp-table perspective we can 
  reduce it to following:

  - create in-memory table object along with fields
  - create and assign table-id
  - add in-memory object to dictionary cache.
  - create indexes (in-memory objects) and related b-tree object.

  We are skipping adding of entries to SYS_XXXX tables as it 
  further results in IO + logging actions.

  Temp-table can be maintained in memory given their lifetime is 
  bounded by connection or server lifetime.

  While this limit their view through SYS_XXXX tables which is 
  just an informative view and is not needed for correctness.

  Besides this even from existing action we do avoid logging 
  for temp-table cases viz. during generation of table-id.

  As temp-table is not persistent to disk (in form of SYS_XXXX)
  tables we need to ensure temp-table object are not evicted
  from the memory and so added to non-LRU list.

interface changes:

Normal create-path will create and execute create-ins-graph-plan.
Given temp-table has reduced actions we can directly invoke 
these actions from create_table_def()/create_index()
This decision demands addition or modification of interfaces
so that they can be invoked directly from create_XXXX() level.

- build_tablespace():
  Add build_tablespace interface. This interface will just
  create tablespace, if configured. Currently this action is 
  embbeded within function that also update SYS_XXXX and so
  we segregated this as independent interface. 
  Temp-Table path will just invoke this interface.

- dict_build_index_def():
  Update index information viz. index-id, space-id to in-memory

- dict_create_index_tree():
  Create B-tree and update root-page-no to in-memory index object.
  dict_create_index_tree_step() also update root-page-no to 
  SYS_XXXX table.



- dropping of table (temp + non-temp) involves following:
  - Remove entries from all SYS_XXXX tables.
  - drop indexes. this include freeing btree nodes too.
  - remove in-memory object from dictionary cache.
  - delete tablespace.

For temp-table we just need to execute following actions:
  - drop indexes.
  - remove in-memory object.
  - delete tablespace.

  Dropping of indexes involves freeing of btree-node which is logged
  action. We avoid logging this action as we haven't logged creation
  of btree. For triggering this action we importantly need root-page-no.
  This is still accessible from dict_index_t object (in general case
  it is loaded from SYS_XXXX table) as temp-table is created in same 

  Post these 2 actions we remove the per-table tablespace file if

interface change:

- dict_drop_index_tree():  
  drop index tree (free btree-nodes). 
  There is no corresponding SYS_XXXX table entry to update.

- existing interface named dict_drop_index_tree() has been renamed to 
  dict_drop_index_tree_step(). This naming is inline with other naming
  convention as this is just a step as part of drop-plan.



- truncate of table involves following:
  - drop and recreate the indexes. this will ensure clean 0 entries.
  - assign and update new table-id.
  - if re-create space, is configured then re-create the space and 
    update SYS_XXXX tables.

 For temp-table same actions are present except way of getting page_no
 is different. for temp-table we get it directly from in-memory index
 object. For general table it is loaded by querying SYS_XXXX table.

interface change:

- dict_truncate_index_tree():
  drop index tree (free btree nodes) and re-create new index tree using
  same index-id. Update index with new root-page-no.

- existing interface named dict_truncate_index_tree() has been renamed to 
  dict_truncate_index_tree_step(). This naming is inline with other naming
  convention as this is just a step as part of truncate-plan.



- discard tablespace will de-associate per-table tablespace from table.
  - remove insert buffering reserved buffers.
  - assign undo segment.
  - update SYS_XXXX table to reflect discard status.
  - update SYS_XXXX table to reflect table/index id. 
  - generate and assign new table-id.
  - remove tablespace.

- for temp-table:
  - for temp-table discard/import of tablespace is blocked.



- Logging is are of 2 types: undo/redo.

UNDO logging:

  undo logging is done to undo/rollback segment allotted within
  shared tablespace.
  For DDL stmt there is minimal or no undo logging.
  If present, we have avoided that by using separate path for temp-table.

REDO logging:

  REDO logging is done to redo-log-files created in data-dir during 
  server start-up.

  REDO-logging is done on each action (especially during create or 
  drop of objects).
  Blocking redo logging at logging level when commit is being 
  done is difficult as it is system-wide operation and so 
  difficult to determine which log-stmt corresonds to which object.
  Instead we do block redo-logging at logical level viz. during 
  creation of objects.

  For example: dict_hdr_get_new_id() 
  Generation of new table/index id is logged. For temp-table
  we avoid this logging. Logging is controlled through a mini-trx (mtr).
  mtr has facility to select logging level.
  std logging level: 
  with default set to MTR_LOG_ALL.

  for temp-table we select MTR_LOG_NONE.



- dict_table_t (table) object set flag for temporary table.
  We can test this flag using dict_table_is_temporary().
  flags2 & DICT_TF2_TEMPORARY.



- dict_hdr_get_new_id(): generates table/index/space id.
  We now pass-in table object so that function can determine if 
  action is for temp-table and accordingly turn-off logging.