WL#7682: Optimizing temporary tables

Status: Complete

There are many cases where internal modules need lightweight, ultra-fast
tables for quick intermediate operations.

Temporary tables help address this requirement, but because temporary tables
are also accessible externally to end users, not all of their semantics can
be relaxed.

To solve this problem and further improve the performance of temporary
tables, we are introducing a sub-class of temporary tables named intrinsic
tables. Intrinsic tables inherit all the properties of temporary tables
(visible only to the connection that created them, removed automatically when
the connection closes, etc.), but they are further optimized for performance
based on their use-cases.

TERMS IN USE:
=============

Temporary Table:
----------------
Temporary table is a normal table with following restricted semantics:
1. Temporary table visibility is limited to the connection/session that created it.
2. Temporary table lifetime is bounded to connection/session lifetime
   (unless dropped explicitly).

External Temporary Table:
------------------------
A temporary table created by a user with the "CREATE TEMPORARY TABLE" statement
is classified as an external temporary table. This table inherits all the
temporary table semantics. Its transaction semantics are the same as those of
a normal table, so a rollback is expected to work as it does for a normal
table.

Intrinsic Temporary Table:
--------------------------
This is a special kind of temporary table meant for modules such as the
Optimizer, the FTS Query Processor, etc., mainly to stage data during
plan execution. A user can also create this table by setting a special flag
and then using the same "CREATE TEMPORARY TABLE" syntax.

Given the usage scenario, these tables don't support rollback.
In addition, their locking semantics have been relaxed.
Relaxed locking is a property of temporary tables that is currently
implemented only for intrinsic tables and might be extended to other
temporary tables in due course (out of the scope of this WL).

=========================================================================

Functional requirement:

1. Introduce an optimized version of temporary tables through intrinsic tables.
   These tables will inherit the properties of temporary tables with some
   added semantics:
   - No undo logging.
   - Relaxed latching.
   - No transactional semantics.
   - Atomicity is restricted to the row level, not the statement level
     (like MyISAM).
   - Operational even in read-only mode.
   - Optimized for performance.

Non-Functional requirement:

1. Introduction of these tables shouldn't affect or interfere with any other
   table operation (including any change in semantics).

2. There shouldn't be any performance regression.

3. DDL and DML operations in read-only mode are enabled only for intrinsic
   tables. Other types of tables will continue to respect read-only mode as
   before.

APPROACH:
========

We need to optimize temporary tables, but given the usage semantics associated
with them it is not possible to turn off some of the extra overhead. Instead,
we introduce a special category of temporary table named "intrinsic tables".
These tables will inherit the properties of temporary tables but will be
optimized for performance based on their usage.

Let's understand the differences between intrinsic and normal temporary tables.

Supported Operations:

1. DML: Insert/Update/Delete
2. Select
3. DDL: Create/Drop
4. DDL + DML operations: supported on intrinsic tables even in read-only mode.
   (This is not supported by normal temporary tables, at least not for now.)

Not supported (but supported by normal temporary tables):
1. DDL: Truncate, Alter
2. Transaction semantics: begin/commit/rollback have no effect.
3. Handling of duplicate semantics:
   The granularity/atomicity of a statement is limited to the row level
   instead of the statement level.
   For example, if an insert statement with multiple rows fails at the nth
   row, the first n-1 rows will still make it into the table.
4. Stats generated by the same workload on an intrinsic table and a temporary
   table may differ, as an intrinsic table might choose a different algorithm
   to complete the same operation. For example, UPDATE is never in-place.
5. Not visible through information_schema.innodb_temp_table_info or any
   other medium.
6. Intrinsic tables can only be uncompressed. An attempt to create a
   compressed intrinsic table will fall back to normal temporary table
   semantics, ignoring the intrinsic table setting.

Since intrinsic tables are a subset of temporary tables, an operation that is
not supported by generic temporary tables is not supported by intrinsic
tables either, unless specified otherwise.

INNODB CHANGE LIST:
==================

Let's list the changes that need to be done.

- Introducing intrinsic tables:
  A special type of temporary table meant for internal usage. It can also be
  created externally by setting a special flag (innodb_create_intrinsic) and
  using the "CREATE TEMPORARY TABLE" syntax.

- Turning off UNDO logging:
  In InnoDB, if a table is intrinsic we plan to turn off UNDO
  logging to save on IO, as there is no use-case that needs rollback.

  Important: When UNDO logging is not used, transactions cannot be used to
  group multiple DML statements together atomically. In addition, single SQL
  statements that affect multiple records also do not succeed or fail
  atomically. However, each row that is INSERTED, UPDATED or DELETED is
  changed atomically. For example, if one of the secondary key insertions of
  an inserted record had a duplicate key conflict, all previously successful
  index changes for that one record will be undone.
  
- Get rid of hidden columns for intrinsic temp-tables:
  - InnoDB adds the hidden columns ROW_ID, TRX_ID and ROLL_PTR to a record
    in order to support uniqueness, MVCC and rollback.
    There is no need for ROLL_PTR, as undo logging is turned off for
    intrinsic tables.
  - ROW_ID and TRX_ID are retained to enforce uniqueness and to help create
    a consistent view while processing UPDATE, respectively.

- Turn-off checksum for the shared temporary tablespace:
  - Checksum can help us recover a corrupted tablespace but since temporary
    tables do not need to be recovered and the shared temporary tablespace is
    created new and empty each time the engine is started, the page checksum
    can be turned-off for the shared temporary tablespace.

- Optimizing intrinsic temporary table handler access:
  - All table handlers are cached in the InnoDB Dictionary which acts as a
    central storage. Handlers can be retrieved using table-id or table-name.
  - For intrinsic temporary tables that are really short-lived the overhead
    of adding such a table to central storage and then removing it is pretty
    costly in a multi-threaded environment. Also, given the scope of temporary
    tables, handler access is needed only to the thread that has created it.
  - With this requirement in place, handlers to such tables can be cached in
    a thread specific variable using table-name as a key to search the needed
    handler.
  
- ROW_ID/TRX_ID optimization:
  - If the clustered index is not explicitly defined on the table, InnoDB
    will auto-create one using a specially appended ROW_ID column.
  - Data for this ROW_ID column is fetched from a central ROW_ID generator,
    which can be a costly operation since it is a shared resource.
  - Ideally, a ROW_ID could be generated locally for each table, but that
    might require persisting one counter per table to disk. Intrinsic
    temporary tables need no persistence, so they instead use a localized
    ROW_ID that is cached only in memory and reset to 0 on restart.
    (The same semantics are inherited for TRX_ID.)

- TABLE_ID and INDEX_ID Optimization:
  - A table_id is used as the key to search for a table-handler from InnoDB
    dictionary.  Since an intrinsic temporary table is not cached to the
    InnoDB dictionary there is no need to fetch it from central generator.
    So TABLE_ID can be locally assigned.
  - The same is true for INDEX_ID except that indexes within a table are
    assigned unique numbers sequentially starting from 1.

- Optimized Cursor Interface:
  - The optimized cursor interface will directly interact with the InnoDB
    storage structures, bypassing the InnoDB graph, locking and transaction
    layers.
  - Besides simplifying the interface, this avoids the overhead associated
    with these layers.

- Relaxing locking semantics:
  - Given that intrinsic tables (and, for that matter, temporary tables too)
    are not shared across connections, they operate in single-threaded mode.
    Locking semantics for these tables can be relaxed at all levels.
    In short, intrinsic table pages only need to be pinned to keep them in
    memory; there is no need to latch them during modification.

- Double-write buffer:
  - Given that intrinsic tables are short-lived, the doublewrite buffer is
    disabled for these tables to avoid merge and initial lookup overhead.
  (This change will be at the tablespace level, so externally created
   temporary tables residing in the shared temporary tablespace will have it
   turned off too.)
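The row-level atomicity described under "Turning off UNDO logging" can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not InnoDB code: two `std::set` containers stand in for a clustered index and a unique secondary index, and `Row`, `IntrinsicTableModel`, `insert_row` and `insert_rows` are hypothetical names.

```cpp
// Minimal model, not InnoDB code: a clustered index and one unique
// secondary index are represented by std::set; all names are hypothetical.
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct Row {
    int pk;  // clustered index key
    int sk;  // unique secondary index key
};

struct IntrinsicTableModel {
    std::set<int> clust;  // clustered index
    std::set<int> sec;    // unique secondary index

    // Applies one row to every index. If a later index reports a duplicate,
    // the entries already added for this row are removed again, so the row
    // is all-or-nothing without any undo log.
    bool insert_row(const Row& r) {
        if (!clust.insert(r.pk).second) {
            return false;  // duplicate in the first index: nothing changed
        }
        if (!sec.insert(r.sk).second) {
            clust.erase(r.pk);  // undo this row's clustered index entry
            return false;
        }
        return true;
    }

    // A multi-row statement stops at the first failing row. Rows inserted
    // before it stay in the table (row-level, not statement-level, atomicity).
    std::size_t insert_rows(const std::vector<Row>& rows) {
        std::size_t inserted = 0;
        for (const Row& r : rows) {
            if (!insert_row(r)) {
                break;
            }
            ++inserted;
        }
        return inserted;
    }
};
```

With rows (1,10), (2,20), (3,10), (4,40), the third row collides on the secondary key: its clustered entry is removed, the statement stops, and rows 1 and 2 remain.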

ENABLING DDL+DML ops ON INTRINSIC TABLE IN READ ONLY MODE
=========================================================

Let us list down the steps involved in enabling DDL + DML operations
on intrinsic temporary tables in server read only mode.

1. Enabling Creation of a shared temporary tablespace in read-only mode:

   The shared temporary tablespace is re-created on each server startup.
   Currently, in read-only mode this creation is suppressed, as no DDL or DML
   operations are expected. Going forward, we need to enable DDL + DML
   operations on intrinsic temporary tables; these tables reside in the
   shared temporary tablespace, so in read-only mode creation of this
   tablespace should be allowed.

   Let's list steps involved and highlight the needed changes:

   - Delete existing tablespace. (No Change).

   - Generate a space_id using dict_hdr_get_new_id() for a shared temporary
     tablespace. "dict_hdr_get_new_id()" API in normal flow will persist
     the updated space_id to the disk. In read-only mode this will not be needed
     as there will be no tablespace created in read-only mode that will survive
     restart. This effectively means there will be no write request created by
     this flow but still a valid space_id will be assigned.

   - Creating and extending the tablespace file.
     Creating a new file and extending it involves a lot of os_file_xxxx
     operations, but most of them are blocked in read-only mode as no write
     operation is expected.

     (Ideally, read-only mode blocking should be done at a higher level (the
     logical level of the DB), and the physical level should not be aware of
     any such logic.)

     With the newly added requirement, all the low-level operations will be
     enabled in read-only mode, and checks, if any, will be restricted to the
     higher level.

     Also, some os_file_xxxx operations set file open modes based on the
     read-only status; for such operations the read-only status will be
     passed from the higher level as a parameter (instead of using the
     global default).

   - Suppressing read-only checks in Tablespace.
     The Tablespace module also has checks that block creation of tablespace
     files if the mode is read-only. This will be handled by setting a
     tablespace configuration flag that indicates "allow-create-in-read-only"
     behavior. The flag will be set only for the shared temporary tablespace,
     so that its creation is not blocked in read-only mode. Creation of other
     tablespaces will still be blocked by these checks.

2. Enable creation of buffer flush thread.
   
   The buffer flush thread writes dirty blocks to disk, but this thread
   is not spawned in read-only mode since no dirty blocks are generated.
   
   This thread will once again be enabled even in read-only mode because
   DDL+DMLs on intrinsic temporary tables will generate dirty pages that
   need to be written to the disk.  Various checks will be added to allow
   writes only to the shared temporary tablespace.

3. Enable write AIO thread in read only mode

   On systems that support asynchronous IO, the InnoDB IO framework will
   try to use it.  Currently in read-only mode, only READ AIO threads are
   created. This WL will enable creation of WRITE AIO threads to flush
   dirty pages that belong to the shared temporary tablespace.

4. Suppress read-only check for intrinsic temporary table at DDL/DML level

   There are checks that block execution of DDL/DML APIs if the server is
   running in read-only mode. These checks will be relaxed to allow intrinsic
   temporary tables.

5. Flushing log to disk

   There is no REDO log generated by any of the DDL or DML operations
   done on intrinsic temporary table and so there is no need to flush
   REDO log to the disk. Also, these tables don't survive restart so there
   is no need of the REDO log in case of a crash.
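As a rough illustration of the guard mentioned in step 2 ("Various checks will be added to allow writes only to the shared temporary tablespace"), the sketch below filters a batch of dirty pages in read-only mode. `DirtyPage`, `may_flush`, `select_flush_batch` and the explicit `tmp_space_id` parameter are hypothetical; the real checks live in InnoDB's flush code and may be structured differently.

```cpp
// Sketch only: hypothetical names, not InnoDB's flush code.
#include <cassert>
#include <cstdint>
#include <vector>

using space_id_t = std::uint32_t;

struct DirtyPage {
    space_id_t space_id;
    std::uint32_t page_no;
};

// True if the flush thread may write this page. In read-only mode only
// pages of the shared temporary tablespace qualify; otherwise all do.
bool may_flush(const DirtyPage& page, bool read_only_mode,
               space_id_t tmp_space_id) {
    return !read_only_mode || page.space_id == tmp_space_id;
}

// Collects the pages a flush batch would actually write.
std::vector<DirtyPage> select_flush_batch(const std::vector<DirtyPage>& dirty,
                                          bool read_only_mode,
                                          space_id_t tmp_space_id) {
    std::vector<DirtyPage> batch;
    for (const DirtyPage& p : dirty) {
        if (may_flush(p, read_only_mode, tmp_space_id)) {
            batch.push_back(p);
        }
    }
    return batch;
}
```

In read-only mode no other tablespace should produce dirty pages in the first place, so a check like this acts as a safety net rather than the primary mechanism.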

This section lists more details and the actual code changes used to achieve
the optimizations listed above (in the HLS).

==============================================================================

Intrinsic temporary tables:

- Intrinsic tables are a special type of temporary table and will be created
  using the same "CREATE TEMPORARY TABLE" syntax, provided a special flag,
  innodb_create_intrinsic, is set.

  Currently there is no flag/bit in InnoDB to identify intrinsic tables.
  A new flag named DICT_TF2_INTRINSIC will be added to capture this.

  Besides this, an API to check whether a table is intrinsic will be added
  as follows:

        /** Check whether a table is intrinsic.
        @param[in]     table   table to check
        @return true if intrinsic table flag is set. */
        bool dict_table_is_intrinsic(const dict_table_t* table);

==============================================================================

Turning off undo logging:

- The InnoDB framework provides the capability to turn off undo logging:
  based on the condition, the proper flag has to be set.
  Undo logging for intrinsic temporary tables during DDL is already
  turned off as part of the temporary table optimization.
  Undo logging for intrinsic temporary tables during DML needs to be turned
  off:

        if (dict_table_is_intrinsic(table)) {
                flags |= BTR_NO_UNDO_LOG_FLAG;
        }

  (Undo logging can be turned off for all intrinsic tables, as there is
   no use-case where an intrinsic table needs a rollback action.
   The flags above refer to an operation flag passed to a function and have
   nothing to do with table->flags/flags2.)

===============================================================================

Getting rid of hidden columns:

- For intrinsic temporary tables we plan to turn off undo logging.
  This means we can skip appending the hidden column ROLL_PTR.

  InnoDB adds this special column while the table is being created.
  Switch off the addition of this special hidden column for intrinsic
  tables:

  if (!dict_table_is_intrinsic_temporary(table)) {
	dict_mem_table_add_col(DATA_ROLL_PTR ....)
  }

  Memory allocation and population of the column need to be skipped too.
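A minimal sketch of which hidden system columns end up in a clustered index record, assuming the rules stated in this WL: ROW_ID only when no explicit clustered index exists, TRX_ID always, and ROLL_PTR only when undo logging is on. The helper `hidden_columns` is hypothetical; DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR are InnoDB's names for these columns.

```cpp
// Sketch only: hidden_columns() is a hypothetical helper.
#include <cassert>
#include <string>
#include <vector>

// Returns the hidden system columns appended to a clustered index record.
std::vector<std::string> hidden_columns(bool has_user_primary_key,
                                        bool is_intrinsic) {
    std::vector<std::string> cols;
    if (!has_user_primary_key) {
        cols.push_back("DB_ROW_ID");    // auto-generated clustered index key
    }
    cols.push_back("DB_TRX_ID");        // kept: consistent view during UPDATE
    if (!is_intrinsic) {
        cols.push_back("DB_ROLL_PTR");  // only meaningful with undo logging
    }
    return cols;
}
```

For an intrinsic table without a primary key this yields two columns instead of three, which is the saving the section describes.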

===============================================================================

Turn-off checksum for shared temporary tablespace:

- If a tablespace is corrupted, the server has to shut down and the
  tablespace has to be corrected using the innochecksum tool.

  The shared temporary tablespace is removed on shutdown, so even if it is
  corrupted it will be removed and re-created on restart; there is no
  use-case for correcting it using checksums.

  Given the short lifetime of the tables residing in this tablespace, and
  given that most of these tables hold derived data that can be
  reconstructed if needed, it is advisable to turn off checksums and save on
  the overhead.

  InnoDB already provides the capability to turn off checksums, but only at
  the global level. The same check is enhanced to ignore/skip checksums if
  advised by the caller. The caller can evaluate this based on space_id.

......

  Checksum validation happens when a page is loaded into memory (the buffer
  pool) through the following interface. The check there needs to be
  suppressed:
        buf_page_is_corrupted()

ibool
buf_page_is_corrupted(
/*==================*/
        bool            check_lsn,      /*!< in: true if we need to check
                                        and complain about the LSN */
        const byte*     read_buf,       /*!< in: a database page */
        ulint           zip_size,       /*!< in: size of compressed page;
                                        0 for uncompressed pages */
        bool            skip_checksum   /*!< in: if true, skip checksum. */

.......

  The checksum is also updated when a modified page is being written to disk:
        buf_flush_init_for_writing()


void
buf_flush_init_for_writing(
/*=======================*/
        byte*   page,           /*!< in/out: page */
        void*   page_zip_,      /*!< in/out: compressed page, or NULL */
        lsn_t   newest_lsn,     /*!< in: newest modification lsn
                                to the page */
        bool    skip_checksum)  /*!< in: if true, disable/skip checksum. */

.......

  The check should be bypassed and a default magic value should be written
  in the checksum field.

  The caller can evaluate skip_checksum using the tablespace id.
  The following new interface can help in this evaluation:

/********************************************************************//**
Check if checksum is enabled for the given space.
@param space_id        verify if checksum is enabled for given space_id.
@return true if checksum is enabled for given space. */
UNIV_INLINE
bool
fsp_is_checksum_enabled(
       ulint   space_id);
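The write/read pair above can be sketched as follows. This is illustrative only: the FNV-1a hash stands in for InnoDB's page checksum, `kNoChecksumMagic` is a placeholder for "a default magic value", `kTmpSpaceId` is an arbitrary id for the shared temporary tablespace, and `fsp_is_checksum_enabled_sketch`, `init_for_writing` and `is_corrupted` are hypothetical counterparts of the functions named above.

```cpp
// Sketch only: checksum algorithm, magic value and space id are stand-ins.
#include <cassert>
#include <cstdint>
#include <vector>

using space_id_t = std::uint32_t;

constexpr std::uint32_t kNoChecksumMagic = 0xDEADBEEF;  // placeholder magic
constexpr space_id_t kTmpSpaceId = 7;  // hypothetical shared temp space id

bool fsp_is_checksum_enabled_sketch(space_id_t space_id) {
    return space_id != kTmpSpaceId;
}

// Stand-in checksum: 32-bit FNV-1a over the page payload.
std::uint32_t page_checksum(const std::vector<std::uint8_t>& payload) {
    std::uint32_t h = 2166136261u;
    for (std::uint8_t b : payload) {
        h ^= b;
        h *= 16777619u;
    }
    return h;
}

struct Page {
    space_id_t space_id;
    std::vector<std::uint8_t> payload;
    std::uint32_t checksum_field;
};

// Before a write: store a real checksum, or the magic value if skipped.
void init_for_writing(Page& page) {
    const bool skip = !fsp_is_checksum_enabled_sketch(page.space_id);
    page.checksum_field = skip ? kNoChecksumMagic : page_checksum(page.payload);
}

// After a read: validate unless checksums are disabled for this space.
bool is_corrupted(const Page& page) {
    if (!fsp_is_checksum_enabled_sketch(page.space_id)) {
        return false;
    }
    return page.checksum_field != page_checksum(page.payload);
}
```

Pages of the temporary space never pay for hashing on either path, while every other space keeps full validation.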

===============================================================================

- Optimizing intrinsic table handler access:
  - Intrinsic temporary tables are really short-lived, so adding them to
    the central InnoDB dictionary can be costly, given that the dictionary
    is a shared resource used by many other operations.
    Also, intrinsic temporary tables are not accessible beyond the scope of
    the connection that created them, so there is no need to maintain them
    in the central dictionary.

  - The handler reference of such a table can be cached in a thread-specific
    structure.

  - In the MySQL-InnoDB environment, the THD structure is thread specific and
    suits this requirement. A table handler can be created during the create
    call and then added to THD using the table name as the key.

  - Currently the trx object associated with the connection is cached in THD.
    A new structure named innodb_private_t can collect all such variables
    that need to be cached in THD. This new indirection will make it scalable
    for future needs.

    typedef std::map<std::string, dict_table_t*>   table_cache_t;

    class innodb_private_t {
    public:
        trx_t*          trx;            /*!< transaction handler. */
        table_cache_t*  open_tables;    /*!< handler of tables that are
                                        created or open but not added to
                                        InnoDB dictionary as they are
                                        session specific.
                                        Currently, limited to intrinsic
                                        temporary tables only. */

        void register_table_handler(const char* table_name,
                                    dict_table_t* table);
        dict_table_t* lookup_table_handler(const char* table_name);
        void unregister_table_handler(const char* table_name);
    };


   A reference to the structure will be created during first access
   and assigned to the thd_ha_data() ha_ptr:

   /** Obtain the private handler of InnoDB session specific data.
   @return reference to private handler */
   __attribute__((warn_unused_result, nonnull))
   static inline
   innodb_private_t*&
   thd_to_innodb_private(THD* thd);

   This change also means that for intrinsic temporary tables there is no
   need to acquire the dict_sys mutex, so the related assert needs to be
   relaxed:

       ut_ad(mutex_own(&dict_sys->mutex)
             || dict_table_is_intrinsic_temporary(index->table));
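A self-contained sketch of the session-local handler cache, assuming a `thread_local` object can stand in for the THD-attached `innodb_private_t`. `DictTable`, `SessionTableCache` and `session_cache` are hypothetical names; the point is that lookups go through a per-thread map keyed by table name, so no global mutex is taken.

```cpp
// Sketch only: DictTable stands in for dict_table_t, and a thread_local
// object stands in for the per-THD innodb_private_t.
#include <cassert>
#include <map>
#include <string>

struct DictTable {
    std::string name;
};

class SessionTableCache {
public:
    void register_table_handler(const std::string& name, DictTable* table) {
        m_open_tables[name] = table;
    }

    DictTable* lookup_table_handler(const std::string& name) const {
        auto it = m_open_tables.find(name);
        return it == m_open_tables.end() ? nullptr : it->second;
    }

    void unregister_table_handler(const std::string& name) {
        m_open_tables.erase(name);
    }

private:
    // No mutex: only the owning session's thread touches this map.
    std::map<std::string, DictTable*> m_open_tables;
};

// One cache per thread, created lazily on first access.
SessionTableCache& session_cache() {
    thread_local SessionTableCache cache;
    return cache;
}
```

Because each thread gets its own `SessionTableCache`, a table registered by one session is simply invisible to others, matching the visibility rule of temporary tables.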

===============================================================================

- ROW_ID/TRX_ID optimization:
  - For intrinsic temporary tables, the row-id can be generated locally by
    maintaining a counter in dict_table_t. Only one thread can insert into
    the table, so there is no need to protect this counter with a mutex, nor
    is it persisted to disk, as temporary tables are recreated.

  /** Get table localized row-id and increment the row-id counter for next use.
  @return row-id. */
  UNIV_INLINE row_id_t dict_table_get_table_localized_row_id(dict_table_t* table);

  The same can be done for TRX_ID. Instead of using the central trx_id
  generator, a local TRX_ID can be generated and re-used.

  /** Get table localized trx-id and increment the trx-id counter for next use.
  @return trx-id. */
  UNIV_INLINE trx_id_t dict_table_get_table_localized_trx_id(dict_table_t* table);
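The localized counters can be sketched as plain fields that are read and incremented without any mutex or atomic, which is only safe under the single-owner assumption stated above. `IntrinsicTableCounters` and the two getter functions are hypothetical; in this sketch the first id handed out is 0.

```cpp
// Sketch only: hypothetical per-table counters, kept in memory only.
#include <cassert>
#include <cstdint>

using row_id_t = std::uint64_t;
using trx_id_t = std::uint64_t;

struct IntrinsicTableCounters {
    row_id_t next_row_id = 0;  // starts fresh for every new table
    trx_id_t next_trx_id = 0;
};

// Returns the current value and advances the counter. No mutex or atomic:
// a single session thread owns the table.
row_id_t get_table_localized_row_id(IntrinsicTableCounters& t) {
    return t.next_row_id++;
}

trx_id_t get_table_localized_trx_id(IntrinsicTableCounters& t) {
    return t.next_trx_id++;
}
```

Each table owns independent counters, so two intrinsic tables can hand out the same ids without conflict; uniqueness is only needed within a table.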

===============================================================================

- TABLE_ID and INDEX_ID optimization:
  - The table-id for an intrinsic temporary table can be generated locally;
    it is simply assigned a default value of ULINT_UNDEFINED, as it is not
    used in any real operation.

        if (!dict_table_is_intrinsic_temporary(table)) {
                dict_hdr_get_new_id(&table->id, NULL, NULL, table, false);
        } else {
                table->id = ULINT_UNDEFINED;
        }

  - index_id is needed to distinguish and search for indexes within a given
    table object, so index_id should be unique within the table.
    It can simply be assigned consecutively, starting with 1.

       if (!dict_table_is_intrinsic_temporary(table)) {
               dict_hdr_get_new_id(NULL, &index->id, NULL, table, false);
       } else {
               /* Indexes are re-loaded by id in the process of creation.
               If the same id were used for all indexes, only the first index
               would ever be retrieved, whereas each index is expected to be
               returned in turn. */
               if (UT_LIST_GET_LEN(table->indexes) > 0) {
                       index->id = UT_LIST_GET_LAST(table->indexes)->id + 1;
               } else {
                       index->id = 1;
               }
       }
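The local index-id rule (last index id + 1, starting at 1) and the id-based lookup it protects can be sketched as below. `TableSketch`, `IndexSketch`, `next_local_index_id`, `add_index` and `find_index` are hypothetical stand-ins for `dict_table_t`, `dict_index_t` and the dictionary lookup.

```cpp
// Sketch only: hypothetical stand-ins for dict_table_t / dict_index_t.
#include <cassert>
#include <cstdint>
#include <vector>

using index_id_t = std::uint64_t;

struct IndexSketch {
    index_id_t id;
};

struct TableSketch {
    std::vector<IndexSketch> indexes;  // in creation order
};

// Last index id + 1, or 1 for the first index of the table.
index_id_t next_local_index_id(const TableSketch& table) {
    return table.indexes.empty() ? 1 : table.indexes.back().id + 1;
}

void add_index(TableSketch& table) {
    table.indexes.push_back(IndexSketch{next_local_index_id(table)});
}

// Lookup by id: with distinct ids each index is reachable.
const IndexSketch* find_index(const TableSketch& table, index_id_t id) {
    for (const IndexSketch& idx : table.indexes) {
        if (idx.id == id) {
            return &idx;
        }
    }
    return nullptr;
}
```

If every index shared one id, `find_index` would always return the first index, which is the failure the code comment above guards against.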

===============================================================================

- New Cursor interface for intrinsic temporary table:
  - Intrinsic temporary table will bypass all the locking and transaction code
    using the new cursor interface. Interface will directly interact with low
    level data storage structure.

  - Given that undo logging will be switched off, an insert failure in any
    of the indexes will roll back that specific insert from all the indexes.
    Complete statement-level rollback will not be done, so successfully
    inserted rows will continue to exist even if they are part of the same
    statement.

INSERT interface:
----------------

If the clustered index is auto-generated, a row-id is appended to each record,
which means records arrive in sorted order, so an optimized interface for
loading sorted data can be used. Instead of searching for the insert position,
this interface caches the last inserted position and inserts directly next
to it.

+/***************************************************************//**
+This is a specialized function meant for direct insertion to
+auto-generated clustered index based on cached position from
+last successful insert. To be used when data is sorted.
+
+@param[in]     flags   undo logging and locking flags
+@param[in]     mode    BTR_MODIFY_LEAF or BTR_MODIFY_TREE.
+                       depending on whether we wish optimistic or
+                       pessimistic descent down the index tree
+@param[in/out] index   clustered index
+@param[in]     n_uniq  0 or index->n_uniq
+@param[in/out] entry   index entry to insert
+@param[in]     n_ext   number of externally stored columns
+@param[in]     thr     query thread
+
+@return error code */
+
+dberr_t
+row_ins_sorted_clust_index_entry(
+       ulint           flags,
+       ulint           mode,
+       dict_index_t*   index,
+       ulint           n_uniq,
+       dtuple_t*       entry,
+       ulint           n_ext,
+       que_thr_t*      thr)

The function also avoids committing the mtr unless there is a pessimistic
insert that can cause a split/change in the tree structure.
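A sketch of the cached-position idea, with a sorted `std::vector` standing in for the clustered index. `SortedAppendIndex` is hypothetical: when the new key is larger than the last inserted key and that key is still the last entry, it appends directly; otherwise it falls back to a search, as a normal B-tree descent would. Mini-transaction handling is not modelled.

```cpp
// Sketch only: a sorted vector models the clustered index; not InnoDB code.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class SortedAppendIndex {
public:
    // Returns true if the fast path (cached position) was used.
    bool insert(std::uint64_t key) {
        if (m_has_last && key > m_keys[m_last] && m_last + 1 == m_keys.size()) {
            m_keys.push_back(key);  // insert right after the cached record
            m_last = m_keys.size() - 1;
            return true;
        }
        // Slow path: search for the position, like a normal tree descent.
        auto it = std::lower_bound(m_keys.begin(), m_keys.end(), key);
        m_last = static_cast<std::size_t>(it - m_keys.begin());
        m_keys.insert(it, key);
        m_has_last = true;
        return false;
    }

    const std::vector<std::uint64_t>& keys() const { return m_keys; }

private:
    std::vector<std::uint64_t> m_keys;
    std::size_t m_last = 0;   // position of the last inserted key
    bool m_has_last = false;  // false until the first insert
};
```

With monotonically increasing ROW_IDs every insert after the first takes the fast path, which is exactly the case the specialized function targets.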

DELETE/UPDATE interface:
------------------------

Given the usage priority of delete/update, for now they are kept simple.
DELETE sets the delete flag, and UPDATE is done as a DELETE followed by an
INSERT. The UPDATE action is applied only to the indexes that are affected
by the update statement, while DELETE is applied to all the index records.
This simplicity facilitates an explicit rollback action, as roll-ptr is not
maintained.

+/*********************************************************************//**
+Delete row from table (corresponding entries from all the indexes).
+The function maintains cursors to the entries so that an explicit rollback
+can be invoked in case the update action following the delete fails.
+
+@param[in]     node            update node carrying information to delete.
+@param[out]    delete_entries  vector of cursor to deleted entries.
+@param[in]     restore_delete  if true, then restore DELETE records by
+                               unmarking delete.
+@param[in]     update_index    bitmap indicating which all index needs to
+                               be updated.
+@return error code or DB_SUCCESS */
+static
+dberr_t
+row_delete_for_mysql_using_cursor(
+       const upd_node_t*       node,
+       cursors_t&              delete_entries,
+       bool                    restore_delete,
+       const index_update_t&   update_index)

+/*********************************************************************//**
+Does an update of a row for MySQL by inserting new entry with update values.
+@param[in]     node            update node carrying information to delete.
+@param[out]    delete_entries  vector of cursor to deleted entries.
+@param[in]     thr             thread handler
+@param[in]     update_index    bitmap indicating which all index needs to
+                               be updated.
+@return error code or DB_SUCCESS */
+static
+dberr_t
+row_update_for_mysql_using_cursor(
+       const upd_node_t*       node,
+       cursors_t&              delete_entries,
+       que_thr_t*              thr,
+       const index_update_t&   update_index)
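The DELETE-then-INSERT update with explicit restore can be sketched on a single index, using a delete-mark flag instead of physical removal. `Rec`, `IndexModel` and its methods are hypothetical; the functions above work on cursors across all affected indexes.

```cpp
// Sketch only: one index modelled as a vector of delete-markable records.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rec {
    int key;
    std::string value;
    bool delete_marked;
};

struct IndexModel {
    std::vector<Rec> recs;

    Rec* find_live(int key) {
        for (Rec& r : recs) {
            if (r.key == key && !r.delete_marked) return &r;
        }
        return nullptr;
    }

    bool insert(int key, const std::string& value) {
        if (find_live(key) != nullptr) return false;  // duplicate key
        recs.push_back(Rec{key, value, false});
        return true;
    }

    // UPDATE as DELETE (mark) followed by INSERT. With no roll pointer,
    // a failed insert is handled by explicitly un-marking the old record.
    bool update(int old_key, int new_key, const std::string& new_value) {
        Rec* old_rec = find_live(old_key);
        if (old_rec == nullptr) return false;
        // Keep a position, not a pointer: push_back may reallocate.
        const std::size_t pos = static_cast<std::size_t>(old_rec - recs.data());
        recs[pos].delete_marked = true;
        if (!insert(new_key, new_value)) {
            recs[pos].delete_marked = false;  // explicit "rollback"
            return false;
        }
        return true;
    }
};
```

A failed update therefore leaves the old record live again, while a successful one leaves a delete-marked old record plus the new record.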


SELECT interface:
----------------

The select interface will be optimized by avoiding locking overhead.
Instead, it will directly open a cursor on the needed data storage and
traverse it as demanded. Given that intrinsic tables are not shared, there is
no need for persistent cursor positioning; just maintaining the last
traversed record should be enough.

/********************************************************************//**
Searches for rows in the database using a cursor.
The function is meant for temporary tables that are not shared across
connections, so a lot of complexity is removed, especially that related to
locking and transactions. The cursor simply acts as an iterator over the table.

@param buf     buffer for the fetched row in MySQL format
@param mode    search mode PAGE_CUR_L
@param prebuilt        prebuilt struct for the table handler; this contains
               the info to search_tuple, index; if search tuple contains 0
               fields then we position the cursor at the start or the end of
               the index, depending on 'mode'
@param match_mode
               0 or ROW_SEL_EXACT or ROW_SEL_EXACT_PREFIX
@param direction
               0 or ROW_SEL_NEXT or ROW_SEL_PREV; Note: if this is != 0,
               then prebuilt must have a pcur with stored position! When
               opening a cursor, 'direction' should be 0.
@return DB_SUCCESS or error code */
static
dberr_t
row_search_for_mysql_for_session_local_table(
       byte*           buf,
       ulint           mode,
       row_prebuilt_t* prebuilt,
       ulint           match_mode,
       ulint           direction)

===============================================================================

- DoubleWrite Buffer

  As per the current design, there is a facility to disable the doublewrite
  buffer for the complete server (all tablespaces), but we need a facility to
  disable it only for the temporary tablespace.

  This can be achieved by adding selective checks in the doublewrite buffer
  APIs as follows:

  buf_dblwr_update(): updates the doublewrite buffer once the IO request for
  a block is complete. Disabled if the block resides in the temporary
  tablespace.

  buf_dblwr_flush_buffered_writes(): copies the data to the doublewrite
  buffer and schedules it for write. If a block is not written to the
  doublewrite buffer, the normal block IO proceeds.

          if (buf_dblwr->first_free == 0) {

                /* Wake possible simulated aio thread as there could be
                system temporary tablespace pages active for flushing.
                Note: system temporary tablespace pages are not scheduled
                for doublewrite. */
                os_aio_simulated_wake_handler_threads();

Note: This will disable the doublewrite buffer for the complete temporary
tablespace, which in fact is good and something we have long been aiming for.
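To make the selective bypass concrete, the sketch below splits a flush batch into pages that go through the doublewrite buffer and pages written directly. `FlushRequest`, `FlushPlan`, `plan_flush` and `must_wake_io_without_dblwr` are hypothetical names; the last one mirrors the `first_free == 0` case, where nothing was copied to the doublewrite buffer yet temporary tablespace pages may still be waiting for IO.

```cpp
// Sketch only: hypothetical types, not the buf_dblwr implementation.
#include <cassert>
#include <cstdint>
#include <vector>

using space_id_t = std::uint32_t;

struct FlushRequest {
    space_id_t space_id;
    std::uint32_t page_no;
};

struct FlushPlan {
    std::vector<FlushRequest> via_doublewrite;  // copied to dblwr first
    std::vector<FlushRequest> direct;           // written in place directly
};

// Temporary tablespace pages always skip the doublewrite buffer.
FlushPlan plan_flush(const std::vector<FlushRequest>& batch,
                     space_id_t tmp_space_id, bool dblwr_enabled) {
    FlushPlan plan;
    for (const FlushRequest& r : batch) {
        if (dblwr_enabled && r.space_id != tmp_space_id) {
            plan.via_doublewrite.push_back(r);
        } else {
            plan.direct.push_back(r);
        }
    }
    return plan;
}

// Nothing went to the doublewrite buffer, but direct writes are pending,
// so IO handler threads still have to be woken up.
bool must_wake_io_without_dblwr(const FlushPlan& plan) {
    return plan.via_doublewrite.empty() && !plan.direct.empty();
}
```

The wake-up check is why the original snippet calls `os_aio_simulated_wake_handler_threads()` even when `first_free == 0`.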

===============================================================================

Misc:

Besides major changes listed above there are some minor changes for optimization:

1. Instead of creating a copy of the record during insert, insert the record
   directly on the page.

2. If the key is fixed-length, avoid re-computing rec_get_offsets; re-use the
   computed offsets.

===============================================================================

Enabling support for DDL+DML operations on InnoDB intrinsic temporary tables in
read-only mode.

The section below sketches the low-level details of the changes that need to
be done for this WL.

1. Enabling creation of shared temporary tablespace in read-only mode:
   
   - Disable persistence of space-id to disk in read-only mode by turning off
     LOGGING completely.

        } else if (disable_redo) {
                mtr_set_log_mode(&mtr,
                        (srv_read_only_mode ? MTR_LOG_NONE : MTR_LOG_NO_REDO));
        }

     MTR_LOG_NONE avoids writing the dirty pages to disk, thereby blocking
     persistence of the modified page to the system tablespace (space-id = 0).

   - Some of the os_file_xxxx APIs make decisions based on srv_read_only_mode.
     These APIs will continue to work as before, except that the read_only
     mode will now be passed from the caller. This results in the following
     API changes:

     -# define os_file_create(key, name, create, purpose, type, success)     \
     +# define os_file_create(key, name, create, purpose, type, read_only,   \
     +                       success)  


     -# define os_file_create_simple(key, name, create, access, success)     \
     +# define os_file_create_simple(key, name, create, access,              \
     +               read_only, success)  

      # define os_file_create_simple_no_error_handling(                      \
     -               key, name, create_mode, access, success)                \
     +               key, name, create_mode, access, read_only, success)     \

      # define os_aio(type, mode, name, file, buf, offset,                   \
     -               n, message1, message2)                                  \
     +               n, read_only, message1, message2)                       \

   ----------

	+       bool            read_only_mode,
	+                               /*!< in: if true read only mode
	+                               checks are enforced. */


   - Removal of debug asserts in some of the low-level os_file_xxxx APIs that
     check for invocation in read-only mode.

     For example, os_file_write_func() has a safety assert that blocks its
     invocation in srv_read_only_mode. As discussed before, low-level APIs
     shouldn't be aware of srv_read_only_mode; this logic should be limited
     to the logical level only. That said, if any such dependency exists, it
     should be removed.

   - The caller evaluates the mode based on the tablespace it is operating on.
     For the shared temporary tablespace, read_only_mode is passed as false;
     for all other tablespaces, read_only_mode is taken from
     srv_read_only_mode.

     For example:

		        node->handle = os_file_create_simple_no_error_handling(
		                innodb_data_file_key, node->name, OS_FILE_OPEN,
	-                       OS_FILE_READ_ONLY, &success);
	+                       OS_FILE_READ_ONLY,
	+                       (space->id == srv_tmp_space.space_id())
	+                       ? false : srv_read_only_mode, &success);
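The caller-side rule in the example above can be isolated as a pure function. In this hypothetical sketch, effective_read_only() is an illustrative name, and tmp_space_id stands in for srv_tmp_space.space_id().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the read_only value a caller passes down for a given
   tablespace. The shared temporary tablespace (tmp_space_id, a stand-in for
   srv_tmp_space.space_id()) is always writable; every other tablespace
   follows the server-wide setting. */
static bool effective_read_only(uint32_t space_id, uint32_t tmp_space_id,
                                bool srv_read_only_mode)
{
        return (space_id == tmp_space_id) ? false : srv_read_only_mode;
}
```

For example, with a temporary tablespace id of 100 (an arbitrary illustrative value), space 100 is writable even when srv_read_only_mode is true, while space 0 remains read-only.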


   ....

  - Suppress the read-only checks in the Tablespace object.

    The Tablespace object also enforces read-only checks. Because it is used
    at the logical level, these checks can be controlled for the shared
    temporary tablespace by setting an appropriate flag.

    class Tablespace {
    public:
        /** Read-only mode is respected by default. */
        Tablespace() : m_ignore_read_only(false) {}

        /** @return true if server read-only mode is ignored for this
        tablespace. */
        bool get_ignore_read_only() const
        {
                return(m_ignore_read_only);
        }

        /** Set whether server read-only mode is ignored for this tablespace.
        @param ignore_read_only true to ignore read-only mode */
        void set_ignore_read_only(bool ignore_read_only)
        {
                m_ignore_read_only = ignore_read_only;
        }

    private:
        /** Ignore server read-only configuration for this tablespace. */
        bool            m_ignore_read_only;
    };

    The ignore_read_only flag is set to true only for the shared temporary
    tablespace. Its default value is false, meaning read-only mode is
    respected.

    The flag would be used along these lines:

            file.m_handle = os_file_create(
                innodb_data_file_key, file.m_filename, file.m_open_flags,
                OS_FILE_NORMAL, OS_DATA_FILE,
                m_ignore_read_only ? false : srv_read_only_mode, &success);

2. Enable creation of the buffer flush thread in read-only mode.

	-       if (!srv_read_only_mode) {
	-               os_thread_create(buf_flush_page_cleaner_thread, NULL, NULL);
	-       }
	+       /* Even in read-only mode there could be flush job generated by
	+       intrinsic temporary table operations. */
	+       os_thread_create(buf_flush_page_cleaner_thread, NULL, NULL);


  Thread creation was previously skipped because no write requests are
  expected in read-only mode. With these changes, however, write requests to
  the shared temporary tablespace can occur.

3. Enable the write AIO framework in read-only mode:

   Again, this was disabled because no write requests were expected. Enabling
   it involves removing the conditional code around initialization of the AIO
   write framework and adjusting the segment numbers accordingly.

+       /* Initialize write aio segment. */
+       os_aio_write_array = os_aio_array_create(
+               n_write_segs * n_per_seg, n_write_segs);
+
+       if (os_aio_write_array == NULL) {
+               return(false);
+       }
+
+       for (ulint i = n_segments; i < (n_write_segs + n_segments); ++i) {
+               ut_a(i < SRV_MAX_N_IO_THREADS);
+               srv_io_thread_function[i] = "write thread";
+       }
+
+       n_segments += n_write_segs;
+

---------------


-               ut_ad(!srv_read_only_mode);
                ut_a(array == os_aio_write_array);

                seg_len = os_aio_write_array->n_slots
                        / os_aio_write_array->n_segments;

-               segment = os_aio_read_array->n_segments + 2
-                       + slot->pos / seg_len;
+               segment = os_aio_read_array->n_segments
+                         + (srv_read_only_mode ? 0 : 2) + slot->pos / seg_len;
        }
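The segment adjustment above is plain arithmetic. This hypothetical sketch (write_segment_no() is an illustrative name) reproduces it. It assumes that the two extra segments present outside read-only mode are the insert-buffer and log segments, which are not created in read-only mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the arithmetic above: map a slot in the
   write array to its global AIO segment number. Outside read-only mode two
   extra segments (assumed: insert buffer and log) precede the read segments. */
static size_t write_segment_no(size_t n_read_segs, size_t n_write_slots,
                               size_t n_write_segs, size_t slot_pos,
                               bool read_only_mode)
{
        assert(n_write_segs > 0 && n_write_slots >= n_write_segs);
        assert(slot_pos < n_write_slots);

        size_t seg_len = n_write_slots / n_write_segs; /* slots per segment */

        return n_read_segs + (read_only_mode ? 0 : 2) + slot_pos / seg_len;
}
```

As a worked example, take 4 read segments and 4 write segments sharing 256 write slots, so seg_len = 64. Slot 70 lies in write segment 70 / 64 = 1. Its global segment number is 4 + 2 + 1 = 7 in normal mode and 4 + 0 + 1 = 5 in read-only mode.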

4. Relax the checks that block invocation of DDL/DML APIs in read-only mode.

   - Each check is relaxed by combining it with a test on the table type, so
     that it no longer fires for intrinsic temporary tables.

        /* Step-1: Validation checks before we commence write_row operation. */
-       if (srv_read_only_mode) {
+       if (srv_read_only_mode
+           && !dict_table_is_intrinsic_temporary(prebuilt->table)) {
                ib_senderrf(ha_thd(), IB_LOG_LEVEL_WARN, ER_READ_ONLY_MODE);
                DBUG_RETURN(HA_ERR_TABLE_READONLY);
        }



   This allows the APIs to execute in read-only mode if the table is an
   intrinsic temporary table.
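The relaxed check can be read as a two-input predicate. In this hypothetical sketch, write_rejected() is an illustrative name, and its boolean argument stands in for dict_table_is_intrinsic_temporary(prebuilt->table).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate for the relaxed check: a write is rejected only in
   read-only mode, and only when the table is not intrinsic temporary. */
static bool write_rejected(bool srv_read_only_mode, bool is_intrinsic_temp)
{
        return srv_read_only_mode && !is_intrinsic_temp;
}
```

Only one of the four combinations is rejected: read-only mode on a table that is not intrinsic temporary. Outside read-only mode, behavior is unchanged for every table.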

  Also, avoid the explicit call that would flush the redo log to disk.
 

+       if (!srv_read_only_mode && !is_intrinsic_temp_table) { 
+               log_buffer_flush_to_disk();
+       }

  The flush is invoked only if read-only mode is off and the table is not an
  intrinsic temporary table. Intrinsic temporary tables don't generate redo
  log even outside read-only mode, so skipping the flush is a useful
  optimization there as well.
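The flush condition can be isolated the same way. In this hypothetical sketch, should_flush_redo() is an illustrative name, and its arguments mirror srv_read_only_mode and is_intrinsic_temp_table from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: the redo log is flushed only when the server is
   writable and the table actually produces redo (i.e. it is not an
   intrinsic temporary table). */
static bool should_flush_redo(bool srv_read_only_mode,
                              bool is_intrinsic_temp_table)
{
        return !srv_read_only_mode && !is_intrinsic_temp_table;
}
```

Note that the intrinsic-table case skips the flush even on a writable server. This is the optimization described above: there is no redo for such tables, so there is nothing to flush.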