WL#7682: Optimizing temporary tables
Status: Complete
There are many cases where an internal module needs lightweight, ultra-fast tables for quick intermediate operations. Temporary tables help address this requirement, but because temporary tables are also accessible to end users, not all of their semantics can be relaxed. To solve this problem and further improve the performance of temporary tables, we are introducing a sub-class of temporary table named intrinsic tables. Intrinsic tables inherit all the properties of temporary tables, such as being visible only to the connection that created them and being removed automatically when the connection closes, but they are further optimized for performance given their use cases.
TERMS IN USE:
=============
Temporary Table:
----------------
A temporary table is a normal table with the following restricted semantics:
1. Temporary table visibility is limited to the connection/session that created it.
2. Temporary table lifetime is bound to the connection/session lifetime
(unless dropped explicitly).
External Temporary Table:
------------------------
A temporary table created by the user with the "CREATE TEMPORARY TABLE"
statement is classified as an external temporary table. This table inherits
all the temporary table semantics. Transaction semantics for this table are
the same as for a normal table, so rollback is expected to work as it does
with a normal table.
Intrinsic Temporary Table:
--------------------------
This is a special kind of temporary table meant for modules such as the
Optimizer, FTS Query Processor, etc., mainly to stage data during
plan execution. A user can create this table by setting a special flag
and then using the same semantics as "CREATE TEMPORARY TABLE".
These tables don't support rollback given the usage scenario.
Besides this, locking semantics for these tables have been relaxed.
Relaxed locking is a property of temporary tables in general that is
currently implemented only for intrinsic tables and might be expanded
to other temporary tables in due course (outside the scope of this WL).
=========================================================================
Functional requirement:
1. Introducing optimized version of temporary table through intrinsic tables.
These tables will inherit properties of temporary table with some added
semantics
- No Undo Logging.
- Relaxed latching.
- No transactional semantics.
- Atomicity is restricted at row level and not at statement level.
(like MyISAM).
- Operational even in read-only mode.
- Optimized for performance.
Non-Functional requirement:
1. Introduction of these tables shouldn't affect or interfere with any other
table operation (including any change in semantics).
2. There shouldn't be any performance regression.
3. Read-only mode is enabled only for intrinsic table. Other types of table will
continue to respect read only mode as before.
APPROACH:
========
We need to optimize temporary tables, but given the usage semantics associated
with them it is not possible to turn off some of the extra overhead. Instead
we introduce a special category of temporary table named "intrinsic tables".
These tables will inherit the properties of temporary tables but will be
optimized for performance based on their usage.
Let's understand the differences between intrinsic and normal temporary table.
Supported Operations:
1. DML: Insert/Update/Delete
2. Select
3. DDL: Create/Drop
4. DDL + DML operation: Supported on intrinsic table even in read-only mode.
(This is not supported by normal temporary tables, at least for now.)
Non-supported (but supported by temporary table)
1. DDL: Truncate, Alter
2. Transaction semantics. begin/commit/rollback doesn't have any effect.
3. Handling of duplicate semantics:
Granularity/Atomicity of statement is limited to row level instead of
statement.
For example: If insert statement with multiple rows fails at nth row,
n-1 rows will still successfully make it to the table.
4. Stats generated by same workload on intrinsic table and temporary table
may differ as intrinsic table might choose to use different algorithm
to complete the same operation. For example: UPDATE is never inplace.
5. Not visible through information_schema.innodb_temp_table_info or any
other medium.
6. Intrinsic tables can be non-compressed only. An attempt to create a
compressed intrinsic table will fall back to normal temporary table
semantics, ignoring the intrinsic table setting.
Intrinsic tables are a sub-set of temporary tables: if an operation is not
supported by generic temporary tables, it will not be supported by intrinsic
tables unless specified otherwise.
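The row-level atomicity described in item 3 can be illustrated with a small standalone sketch. This is not InnoDB code: `ToyIntrinsicTable` and `insert_rows` are hypothetical names, and a `std::set` stands in for a unique index. The sketch only shows that rows accepted before a failure stay in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an intrinsic table with one unique key.
struct ToyIntrinsicTable {
    std::set<int> rows;
};

// Insert keys one by one and stop at the first duplicate.
// Rows inserted before the failure are NOT rolled back, because there is
// no undo log to roll them back with.
// Returns the number of rows that made it into the table.
std::size_t insert_rows(ToyIntrinsicTable& table, const std::vector<int>& keys)
{
    std::size_t inserted = 0;

    for (int key : keys) {
        if (!table.rows.insert(key).second) {
            break;  // duplicate key: the statement fails at this row
        }
        ++inserted;
    }

    assert(inserted <= keys.size());
    return inserted;
}
```

With the keys 1, 2, 3, 2, 4 the statement fails at the fourth row, and the first three rows remain in the table.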
INNODB CHANGE LIST:
==================
Let's list the changes that need to be made.
- Introducing intrinsic table:
Special type of temporary table meant for internal usage can be created
externally by setting special flag (innodb_create_intrinsic) using
"CREATE TEMPORARY TABLE" semantics.
- Turning off UNDO logging:
In InnoDB, if a table is intrinsic we plan to turn-off UNDO
logging to save on IO as there is no use-case to support rollback.
Important: When UNDO logging is not used, transactions cannot be used to
group multiple DML statements together atomically. In addition, single SQL
statements that affect multiple records also do not succeed or fail
atomically. However, each row which is INSERTED, UPDATED or DELETED is done
atomically. For example, if one of the secondary key insertions of an
inserted record had a duplicate key conflict, all previously successful index
changes for that one record will be undone.
- Get rid of hidden columns for intrinsic temp-tables:
- InnoDB adds the hidden columns ROW_ID, TRX_ID and ROLL_PTR to a record
in order to support uniqueness, MVCC and rollback.
There is no need for ROLL_PTR as undo logging is turned off on intrinsic tables.
- ROW_ID and TRX_ID are retained: ROW_ID to enforce uniqueness, and TRX_ID to
help create a consistent view while processing UPDATE.
- Turn-off checksum for the shared temporary tablespace:
- Checksum can help us recover a corrupted tablespace but since temporary
tables do not need to be recovered and the shared temporary tablespace is
created new and empty each time the engine is started, the page checksum
can be turned-off for the shared temporary tablespace.
- Optimizing intrinsic temporary table handler access:
- All table handlers are cached in the InnoDB Dictionary which acts as a
central storage. Handlers can be retrieved using table-id or table-name.
- For intrinsic temporary tables that are really short-lived the overhead
of adding such a table to central storage and then removing it is pretty
costly in a multi-threaded environment. Also, given the scope of temporary
tables, handler access is needed only to the thread that has created it.
- With this requirement in place, handlers to such tables can be cached in
a thread specific variable using table-name as a key to search the needed
handler.
- ROW_ID/TRX_ID optimization:
- If the clustered index is not explicitly defined on the table, InnoDB
will auto-create one using a specially appended ROW_ID column.
- Data for this ROW_ID column is fetched from a central ROW_ID generator,
which can be a costly operation given that it is a shared resource.
- Ideally, a ROW_ID can be generated locally for a table, but that might
demand persisting one counter per table to disk. For intrinsic temporary
tables there is no need for persistence, so instead use a localized ROW_ID
that is cached only in memory and reset to 0 on restart.
(The same semantics are inherited for TRX_ID.)
- TABLE_ID and INDEX_ID Optimization:
- A table_id is used as the key to search for a table-handler from InnoDB
dictionary. Since an intrinsic temporary table is not cached to the
InnoDB dictionary there is no need to fetch it from central generator.
So TABLE_ID can be locally assigned.
- The same is true for INDEX_ID except that indexes within a table are
assigned unique numbers sequentially starting from 1.
- Optimizer Cursor Interface:
- Optimized cursor interface will directly interact with the InnoDB Storage
Structure, by-passing the InnoDB Graph, Locking and Transaction layer.
- Besides simplifying the interface this helps optimize overhead associated
with these layers.
- Relaxing locking semantics:
- Given that intrinsic tables (and for that matter temporary tables too) are
not shared across connections, they operate in single-threaded mode.
Locking semantics for these tables can be relaxed at all levels.
In short, an intrinsic table page just needs to be pinned to keep it in
memory; there is no need to latch it during modification.
- Double-write buffer:
- Given that intrinsic tables are short-lived, the double-write buffer is
disabled for these tables to avoid merge and initial lookup overhead.
(This change will be at tablespace level, so externally created temporary
tables residing in the shared temporary tablespace will have it turned off
too.)
ENABLING DDL+DML ops ON INTRINSIC TABLE IN READ ONLY MODE
=========================================================
Let us list down the steps involved in enabling DDL + DML operations
on intrinsic temporary tables in server read only mode.
1. Enabling Creation of a shared temporary tablespace in read-only mode:
Shared temporary tablespace is re-created on each server startup.
Currently in read-only mode this creation is suppressed as no DDL or DML
operations are expected. Moving forward we need to enable DDL + DML
operations on intrinsic temporary tables. These tables reside in the
shared temporary tablespace, so creation of that tablespace must be
allowed in read-only mode.
Let's list steps involved and highlight the needed changes:
- Delete existing tablespace. (No Change).
- Generate a space_id using dict_hdr_get_new_id() for a shared temporary
tablespace. "dict_hdr_get_new_id()" API in normal flow will persist
the updated space_id to the disk. In read-only mode this will not be needed
as there will be no tablespace created in read-only mode that will survive
restart. This effectively means there will be no write request created by
this flow but still a valid space_id will be assigned.
- Creating and Extending tablespace file.
Creating a new file and extending it involves a lot of os_file_xxxx
operations, but most of them are blocked in read-only mode as no write
operation is expected.
(Ideally, read-only mode blocking should be done at a higher level (the
logical level of the DB) and the physical level should not be aware of any
such logic.)
With the newly added requirement, all the low-level operations will be
enabled in read-only mode and checks, if any, will be restricted to the
higher level.
Also, some os_file_xxxx operations set file open modes based on read-only
status. For such operations the read-only status or mode will be passed from
the higher level as a parameter (instead of using the global default).
- Suppressing read-only checks in Tablespace.
The Tablespace module also has checks to block creation of tablespace
files if mode is read-only. This will be handled by setting a tablespace
configuration flag that indicates "allow-create-in-read-only" behavior.
Flag will be set only for shared temporary tablespace so that creation of
it is not blocked in read-only. Other tablespace creation will be blocked
by these checks.
2. Enable creation of buffer flush thread.
The Buffer Flush thread writes dirty blocks to disk. But this thread
is not spawned in read-only mode since there is no dirty block generated.
This thread will once again be enabled even in read-only mode because
DDL+DMLs on intrinsic temporary tables will generate dirty pages that
need to be written to the disk. Various checks will be added to allow
writes only to the shared temporary tablespace.
3. Enable write AIO thread in read only mode
On systems that support asynchronous IO, the InnoDB IO framework will
try to use it. Currently in read-only mode, only READ AIO threads are
created. This WL will enable creation of WRITE AIO threads to flush
dirty pages that belong to the shared temporary tablespace.
4. Suppress read-only check for intrinsic temporary table at DDL/DML level
There are checks that block execution of DDL/DML APIs if the server is
running in read-only mode. These checks will be relaxed to allow intrinsic
temporary tables.
5. Flushing log to disk
There is no REDO log generated by any of the DDL or DML operations
done on intrinsic temporary table and so there is no need to flush
REDO log to the disk. Also, these tables don't survive restart so there
is no need of the REDO log in case of a crash.
This section lists more detail and actual code changes used to achieve the
optimization listed above (in HLS).
==============================================================================
Intrinsic temporary tables:
- Intrinsic tables are a special type of temporary table and will be created
using the same syntax as "CREATE TEMPORARY TABLE", provided that the special
flag innodb_create_intrinsic is set.
Currently there is no flag/bit in InnoDB to capture an intrinsic table.
A new flag named DICT_TF2_INTRINSIC will be added to capture this.
Besides this, an API to check whether a table is intrinsic will be added
as follows:
/** Check whether the table is intrinsic.
@param[in]	table	table to check
@return true if intrinsic table flag is set. */
bool dict_table_is_intrinsic(const dict_table_t* table)
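A minimal sketch of how such a predicate might test a flag bit follows. The bit value and the reduced `toy_dict_table_t` structure are assumptions made for illustration; the WL does not state the flag's value or which field of dict_table_t holds it.

```cpp
#include <cassert>

// Assumed bit position, chosen only for this sketch.
const unsigned long TOY_TF2_INTRINSIC = 1UL << 7;

// Reduced stand-in for dict_table_t: only a flags2 word is modelled.
struct toy_dict_table_t {
    unsigned long flags2;
};

// Return true if the intrinsic bit is set in the table's flags2 word.
bool toy_dict_table_is_intrinsic(const toy_dict_table_t* table)
{
    assert(table != nullptr);
    return (table->flags2 & TOY_TF2_INTRINSIC) != 0;
}
```

Keeping the test behind a single predicate means callers do not depend on where the bit is stored.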
==============================================================================
Turning off undo logging:
- The InnoDB framework provides the capability to turn off undo logging.
Based on the condition, the proper flag has to be set.
Undo logging for intrinsic temporary tables during DDL is already
turned off as part of temporary table optimization.
Undo logging for intrinsic temporary tables during DML needs to be turned off.
	if (dict_table_is_intrinsic(table)) {
		flags |= BTR_NO_UNDO_LOG_FLAG;
	}
(Undo logging can be turned off for all intrinsic tables as there is
no use-case in which an intrinsic table needs a rollback action.
"flags" above refers to an operation flag passed to the function and has
nothing to do with table->flags/flags2.)
===============================================================================
Getting rid of hidden columns:
- For intrinsic temporary tables we plan to turn off undo logging.
This suggests we can stop appending the hidden roll-ptr column.
InnoDB adds this special column while the table is being created.
Switch off addition of this special hidden column.
	if (!dict_table_is_intrinsic_temporary(table)) {
		dict_mem_table_add_col(DATA_ROLL_PTR ....)
	}
Memory allocation and population of the column need to be skipped too.
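The effect on the set of hidden columns can be sketched as below. The function name `toy_system_columns` and the use of strings for column names are illustrative assumptions; real InnoDB column metadata is richer than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the hidden system columns appended at table creation.
// ROLL_PTR is skipped for intrinsic tables because undo logging is off.
std::vector<std::string> toy_system_columns(bool is_intrinsic)
{
    std::vector<std::string> cols = {"ROW_ID", "TRX_ID"};

    if (!is_intrinsic) {
        cols.push_back("ROLL_PTR");
    }

    assert(cols.size() == (is_intrinsic ? 2u : 3u));
    return cols;
}
```

Any code that sizes or fills a record from this list then skips the roll-ptr slot automatically, which is the second half of the change described above.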
===============================================================================
Turn-off checksum for shared temporary tablespace:
- If a tablespace is corrupted, the server has to be shut down and the
tablespace has to be corrected using the innochecksum tool.
The shared temporary tablespace is removed on shutdown, so even if it is
corrupted it will be removed and re-created on restart, and there is no
use-case for correcting it using the checksum.
Given the short lifetime of the tables residing in this tablespace, and
given that most of these tables hold derived data which can be
re-constructed if needed, it is advisable to turn off the checksum and save
the overhead.
InnoDB already provides the capability to turn off checksums, but only at
the global level. The same check is enhanced to skip the checksum if advised
by the caller. The caller can evaluate this based on space_id.
......
Checksum validation happens while page is loaded into memory/buffer manager
from cache through following interface. Check here needs to be suppressed.
buf_page_is_corrupted()
ibool
buf_page_is_corrupted(
/*==================*/
	bool		check_lsn,	/*!< in: true if we need to check
					and complain about the LSN */
	const byte*	read_buf,	/*!< in: a database page */
	ulint		zip_size,	/*!< in: size of compressed page;
					0 for uncompressed pages */
	bool		skip_checksum)	/*!< in: if true, skip checksum. */
.......
Checksum is also updated when page is modified and being written to disk.
buf_flush_init_for_writing()
void
buf_flush_init_for_writing(
/*=======================*/
	byte*	page,		/*!< in/out: page */
	void*	page_zip_,	/*!< in/out: compressed page, or NULL */
	lsn_t	newest_lsn,	/*!< in: newest modification lsn
				to the page */
	bool	skip_checksum)	/*!< in: if true, disable/skip checksum. */
.......
The check should be bypassed and a default magic value should be written
in checksum field.
Caller can evaluate skip_checksum using tablespace_id.
Following new interface can help in this evaluation:
/********************************************************************//**
Check if checksum is enabled for the given space.
@param space_id verify if checksum is enabled for given space_id.
@return true if checksum is enabled for given space. */
UNIV_INLINE
bool
fsp_is_checksum_enabled(
ulint space_id)
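The write-side behaviour (store a fixed magic value instead of a computed checksum when the caller asks to skip it) might look roughly like the sketch below. The magic value, the temporary space id, the byte-sum "checksum" and all `toy_` names are assumptions of this sketch and are not taken from the InnoDB source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative constants; both values are assumptions of this sketch.
const std::uint32_t TOY_NO_CHECKSUM_MAGIC = 0xDEADBEEFU;
const std::uint32_t TOY_TEMP_SPACE_ID = 42;

// Checksums stay enabled for every space except the temporary one.
bool toy_is_checksum_enabled(std::uint32_t space_id)
{
    return space_id != TOY_TEMP_SPACE_ID;
}

// Value to store in the page checksum field before a write.
// A trivial byte sum stands in for the real checksum algorithm.
std::uint32_t toy_checksum_for_write(
    const unsigned char* page,
    std::size_t len,
    bool skip_checksum)
{
    assert(page != nullptr);

    if (skip_checksum) {
        return TOY_NO_CHECKSUM_MAGIC;
    }

    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += page[i];
    }
    return sum;
}
```

A caller would derive `skip_checksum` from the space id, for example `!toy_is_checksum_enabled(space_id)`, and the read path would use the same flag to bypass validation.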
===============================================================================
- Optimizing intrinsic table handler access:
- Intrinsic temporary tables are really short-lived, so adding them to
the central InnoDB dictionary can be costly given that the dictionary is a
shared resource used for a lot of other operations.
Also, intrinsic temporary tables are not accessible beyond the scope of the
connection that created them, so there is no need to maintain them in the
central dictionary.
- The handler reference of such a table can be cached in a thread-specific
structure.
- In the MySQL-InnoDB environment, the THD structure is thread specific and
suits this requirement. The table handler can be created during the create
call and then added to THD using the table name as key.
- Currently the trx object associated with the connection is cached in THD.
A new structure named innodb_private_t can collect all such variables
that need to be cached in THD. This new indirection makes it scalable for
future needs.
typedef std::map<std::string, dict_table_t*> table_cache_t;
class innodb_private_t {
public:
	trx_t*		trx;		/*!< transaction handler. */
	table_cache_t*	open_tables;	/*!< handler of tables that are
					created or open but not added to
					InnoDB dictionary as they are
					session specific.
					Currently, limited to intrinsic
					temporary tables only. */
	void register_table_handler(table_name, table);
	dict_table_t* lookup_table_handler(table_name);
	void unregister_table_handler(table_name);
};
A reference to the structure will be created during first access
and assigned to the thd_ha_data() ha_ptr.
/** Obtain the private handler of InnoDB session specific data.
@return reference to private handler */
__attribute__((warn_unused_result, nonnull))
static inline
innodb_private_t*&
thd_to_innodb_private(THD* thd);
This change also means that for intrinsic temporary tables there is no need
to acquire the dict_sys mutex, so the related assert needs to be relaxed:
	ut_ad(mutex_own(&dict_sys->mutex)
	      || dict_table_is_intrinsic_temporary(index->table));
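The register/lookup/unregister interface described in this section can be sketched with a plain `std::map` keyed by table name. `toy_table_t` and `toy_session_tables` are hypothetical stand-ins for dict_table_t and the per-session structure; the key type is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for dict_table_t.
struct toy_table_t {
    std::string name;
};

// Per-session cache of table handlers that are not in the shared dictionary.
class toy_session_tables {
public:
    void register_table_handler(const std::string& name, toy_table_t* table)
    {
        assert(table != nullptr);
        m_open_tables[name] = table;
    }

    // Returns nullptr when the session has no such table.
    toy_table_t* lookup_table_handler(const std::string& name) const
    {
        std::map<std::string, toy_table_t*>::const_iterator it
            = m_open_tables.find(name);
        return it == m_open_tables.end() ? nullptr : it->second;
    }

    void unregister_table_handler(const std::string& name)
    {
        m_open_tables.erase(name);
    }

private:
    // Owned by exactly one session, so no mutex is taken.
    std::map<std::string, toy_table_t*> m_open_tables;
};
```

Because the map belongs to one session, none of the three operations needs the dictionary mutex, which is the saving this section is after.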
===============================================================================
- ROW_ID/TRX_ID optimization:
- For an intrinsic temporary table, the row-id can be generated locally by
maintaining a counter in dict_table_t. Only one thread can insert into the
table, so there is no need to protect this counter with a mutex, nor is it
persisted to disk, as temporary tables are recreated.
/** Get table localized row-id and increment the row-id counter for next use.
@return row-id. */
UNIV_INLINE row_id_t dict_table_get_table_localized_row_id(table)
The same can be done for TRX_ID. Instead of using the central trx_id
generator, a local TRX_ID can be generated and re-used.
/** Get table localized trx-id and increment the trx-id counter for next use.
@return trx-id. */
UNIV_INLINE trx_id_t dict_table_get_table_localized_trx_id(table)
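A sketch of such per-table counters is shown below, assuming the counters start at 0 and that each call returns the current value before advancing it. `toy_local_ids_t` and the function names are illustrative; the actual field names in dict_table_t are not given in this WL.

```cpp
#include <cassert>
#include <cstdint>

// Per-table, in-memory counters. Nothing here is persisted to disk.
struct toy_local_ids_t {
    std::uint64_t next_row_id = 0;
    std::uint64_t next_trx_id = 0;
};

// Return the current row-id and advance the counter for the next use.
// No mutex is taken: only the owning session inserts into the table.
inline std::uint64_t toy_get_localized_row_id(toy_local_ids_t* ids)
{
    assert(ids != nullptr);
    return ids->next_row_id++;
}

// Same scheme for the trx-id.
inline std::uint64_t toy_get_localized_trx_id(toy_local_ids_t* ids)
{
    assert(ids != nullptr);
    return ids->next_trx_id++;
}
```

Two tables keep independent sequences, so ids are unique only within one table, which is sufficient because the table is never shared.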
===============================================================================
- TABLE_ID and INDEX_ID optimization:
- The table-id for an intrinsic temporary table can be assigned locally. It
is given a default value of ULINT_UNDEFINED as it will not be used in any
real operation.
	if (!dict_table_is_intrinsic_temporary(table)) {
		dict_hdr_get_new_id(&table->id, NULL, NULL, table, false);
	} else {
		table->id = ULINT_UNDEFINED;
	}
- index_id is needed to distinguish and search for indexes within a given
table object, so index_id must be unique within the table.
It can simply be assigned consecutively, starting with 1.
	if (!dict_table_is_intrinsic_temporary(table)) {
		dict_hdr_get_new_id(NULL, &index->id, NULL, table, false);
	} else {
		/* Indexes are re-loaded by id in the process of creation.
		If the same id were used for all indexes, only the first
		index would ever be retrieved, when the expected result is
		each index in turn. */
		if (UT_LIST_GET_LEN(table->indexes) > 0) {
			index->id = UT_LIST_GET_LAST(table->indexes)->id + 1;
		} else {
			index->id = 1;
		}
	}
===============================================================================
- New Cursor interface for intrinsic temporary table:
- Intrinsic temporary table will bypass all the locking and transaction code
using the new cursor interface. Interface will directly interact with low
level data storage structure.
- Given that undo will be switched off, insert failure in any of the
index will result in rollback of that specific insert from all the indexes.
Complete statement level rollback will not be done and so successfully
inserted rows even if they are part of same statement will continue to exist.
INSERT interface:
----------------
If the clustered index is auto-generated, a row-id is appended to each record,
which means records arrive in sorted order and an optimized interface for
loading sorted data can be used. This interface avoids searching for the
insert position; instead it caches the last inserted position and inserts
directly next to it.
+/***************************************************************//**
+This is a specialized function meant for direct insertion to
+auto-generated clustered index based on cached position from
+last successful insert. To be used when data is sorted.
+
+@param[in] flags undo logging and locking flags
+@param[in] mode BTR_MODIFY_LEAF or BTR_MODIFY_TREE.
+ depending on whether we wish optimistic or
+ pessimistic descent down the index tree
+@param[in/out] index clustered index
+@param[in/out] entry index entry to insert
+@param[in] thr query thread
+
+@return error code */
+
+dberr_t
+row_ins_sorted_clust_index_entry(
+ ulint flags,
+ ulint mode,
+ dict_index_t* index,
+ ulint n_uniq,
+ dtuple_t* entry,
+ ulint n_ext,
+ que_thr_t* thr)
The function also avoids committing the mtr unless a pessimistic insert
causes a split or other change in the tree structure.
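The idea of inserting next to a cached position instead of searching can be sketched with a linked list. This is a simplification under stated assumptions: `toy_sorted_index` is a hypothetical class, a `std::list` stands in for the leaf level of the clustered index, and page splits are not modelled.

```cpp
#include <cassert>
#include <iterator>
#include <list>

class toy_sorted_index {
public:
    toy_sorted_index() : m_rows(), m_last(m_rows.end()) {}

    // Insert a row-id that is larger than every earlier one.
    // The cached position of the previous insert replaces the search step.
    void insert_sorted(unsigned long row_id)
    {
        assert(m_rows.empty() || *m_last < row_id);

        std::list<unsigned long>::iterator pos = m_rows.empty()
            ? m_rows.end() : std::next(m_last);

        m_last = m_rows.insert(pos, row_id);
    }

    const std::list<unsigned long>& rows() const { return m_rows; }

private:
    std::list<unsigned long> m_rows;
    // Position of the last successful insert.
    std::list<unsigned long>::iterator m_last;
};
```

The assertion documents the precondition that makes the shortcut valid: the input must already be sorted, which holds when keys are auto-generated row-ids.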
DELETE/UPDATE interface:
------------------------
Given the usage priority of delete/update, both are kept simple for now.
DELETE sets the delete flag, and UPDATE is done as a
DELETE followed by an INSERT.
The UPDATE action is applied only to the indexes affected by
the update statement. DELETE is applied to all the index records.
This simplicity facilitates explicit rollback, as roll-ptr is not maintained.
+/*********************************************************************//**
+Delete row from table (corresponding entries from all the indexes).
+Function will maintain cursors to the entries to invoke explicit rollback
+just in case the update action following the delete fails.
+
+@param[in] node update node carrying information to delete.
+@param[out] delete_entries vector of cursor to deleted entries.
+@param[in] restore_delete if true, then restore DELETE records by
+ unmarking delete.
+@param[in] update_index bitmap indicating which all index needs to
+ be updated.
+@return error code or DB_SUCCESS */
+static
+dberr_t
+row_delete_for_mysql_using_cursor(
+ const upd_node_t* node,
+ cursors_t& delete_entries,
+ bool restore_delete,
+ const index_update_t& update_index)
+/*********************************************************************//**
+Does an update of a row for MySQL by inserting new entry with update values.
+@param[in] node update node carrying information to delete.
+@param[out] delete_entries vector of cursor to deleted entries.
+@param[in] thr thread handler
+@param[in] update_index bitmap indicating which all index needs to
+ be updated.
+@return error code or DB_SUCCESS */
+static
+dberr_t
+row_update_for_mysql_using_cursor(
+ const upd_node_t* node,
+ cursors_t& delete_entries,
+ que_thr_t* thr,
+ const index_update_t& update_index)
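The delete-mark, insert, and restore-on-failure sequence can be sketched on a single unique index as follows. The map-based index and the name `toy_update_key` are assumptions for illustration; the real functions operate on cursors over several indexes.

```cpp
#include <cassert>
#include <map>

// key -> delete-marked flag; a live row has the flag set to false.
typedef std::map<int, bool> toy_index_t;

// Update a row's key as delete-mark followed by insert. If the insert hits
// a live duplicate, the delete mark is removed again. This is the explicit
// rollback: there is no roll-ptr to follow back to an older version.
bool toy_update_key(toy_index_t& index, int old_key, int new_key)
{
    toy_index_t::iterator old_it = index.find(old_key);

    if (old_it == index.end() || old_it->second) {
        return false;  // no live row to update
    }

    old_it->second = true;  // DELETE: set the delete mark

    toy_index_t::iterator new_it = index.find(new_key);

    if (new_it != index.end() && !new_it->second) {
        old_it->second = false;  // restore: unmark the delete
        return false;
    }

    index[new_key] = false;  // INSERT the new version
    assert(index.count(new_key) == 1);
    return true;
}
```

Remembering where the delete mark was set (here the iterator `old_it`) plays the role of the vector of cursors to deleted entries.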
SELECT interface:
----------------
The select interface will be optimized by avoiding locking overhead.
Instead it will directly open the cursor on the needed data storage and
traverse it as demanded. Given that intrinsic tables are not shared, there is
no need for persistent cursor positioning; just maintaining the last
traversed record should be enough.
/********************************************************************//**
Searches for rows in the database using a cursor.
The function is meant for temporary tables that are not shared across
connections, so much of the complexity is removed, especially locking and
transaction related. The cursor simply acts as an iterator over the table.
@param buf buffer for the fetched row in MySQL format
@param mode search mode PAGE_CUR_L
@param prebuilt prebuilt struct for the table handler; this contains
the info to search_tuple, index; if search tuple contains 0
field then we position the cursor at start or the end of index,
depending on 'mode'
@param match_mode
0 or ROW_SEL_EXACT or ROW_SEL_EXACT_PREFIX
@param direction
0 or ROW_SEL_NEXT or ROW_SEL_PREV; Note: if this is != 0,
then prebuilt must have a pcur with stored position! In opening
of a cursor 'direction' should be 0.
@return DB_SUCCESS or error code */
static
dberr_t
row_search_for_mysql_for_session_local_table(
byte* buf,
ulint mode,
row_prebuilt_t* prebuilt,
ulint match_mode,
ulint direction)
===============================================================================
- DoubleWrite Buffer
As per the current design there is a facility to disable the doublewrite
buffer for the complete server (all tablespaces), but we need a facility to
disable it only for the temporary tablespace.
This can be achieved by adding selective checks in the doublewrite buffer
APIs as follows:
buf_dblwr_update(): Updates the doublewrite buffer once the IO request for a
block is complete. Disabled if the block resides in the temporary tablespace.
buf_dblwr_flush_buffered_writes(): Copies the data to the doublewrite buffer
and schedules it for write. If a block is not written to the doublewrite
buffer, then let the normal block IO proceed.
if (buf_dblwr->first_free == 0) {
/* Wake possible simulated aio thread as there could be
system temporary tablespace pages active for flushing.
Note: system temporary tablespace pages are not scheduled
for doublewrite. */
os_aio_simulated_wake_handler_threads();
Note: This will disable the doublewrite buffer for the complete temporary
tablespace, which is in fact desirable and something we have been aiming at
for a long time.
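The selective check amounts to one extra condition on the tablespace of the page being written. A minimal sketch, in which the function name and the way the temporary space id is passed in are assumptions:

```cpp
#include <cassert>

// Decide whether a page write should go through the doublewrite buffer.
// The buffer is bypassed for every page of the temporary tablespace, even
// when doublewrite is enabled for the server as a whole.
bool toy_use_doublewrite(
    bool dblwr_enabled,
    unsigned long space_id,
    unsigned long temp_space_id)
{
    return dblwr_enabled && space_id != temp_space_id;
}
```

Pages for which this returns false are written directly with normal block IO.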
===============================================================================
Misc:
Besides major changes listed above there are some minor changes for optimization:
1. Instead of creating a copy of the record during insert, insert the record
directly on the page.
2. If the key is fixed length, avoid re-computing rec_get_offsets; re-use the
offsets already computed.
===============================================================================
Enabling support for DDL+DML operation on InnoDB intrinsic temporary table in
readonly mode.
The section below sketches the low-level details of the changes that need to
be made for this WL.
1. Enabling creation of shared temporary tablespace in read-only mode:
- Disable persistence of space-id to disk in read-only mode by turning off
logging completely.
	} else if (disable_redo) {
		mtr_set_log_mode(&mtr,
			(srv_read_only_mode ? MTR_LOG_NONE : MTR_LOG_NO_REDO));
	}
MTR_LOG_NONE avoids writing the dirty pages to disk, thereby blocking
persistence of the modified page to the system tablespace (space-id = 0).
- Some of the os_file_xxxx APIs make decisions based on srv_read_only_mode.
These APIs will continue to work as before, except that the read-only mode
will now be passed in by the caller. This results in the following API changes:
-# define os_file_create(key, name, create, purpose, type, success) \
+# define os_file_create(key, name, create, purpose, type, read_only, \
+ success)
-# define os_file_create_simple(key, name, create, access, success) \
+# define os_file_create_simple(key, name, create, access, \
+ read_only, success)
# define os_file_create_simple_no_error_handling( \
- key, name, create_mode, access, success) \
+ key, name, create_mode, access, read_only, success) \
# define os_aio(type, mode, name, file, buf, offset, \
- n, message1, message2) \
+ n, read_only, message1, message2) \
----------
+ bool read_only_mode,
+ /*!< in: if true read only mode
+ checks are enforced. */
- Removal of debug asserts in some of the low-level os_file_xxxx APIs that
check for invocation in read-only mode.
For example: os_file_write_func() has a safety assert that blocks its
invocation in srv_read_only_mode. As discussed before, low-level APIs
shouldn't be aware of srv_read_only_mode; this logic should be limited to
the logical level only. That said, if any such dependency exists it
should be removed.
- The caller will evaluate the mode based on the tablespace it is operating on.
For the shared temporary tablespace, read_only_mode will be passed as false,
and for all other tablespaces read_only_mode will be evaluated using
srv_read_only_mode.
For example:
node->handle = os_file_create_simple_no_error_handling(
innodb_data_file_key, node->name, OS_FILE_OPEN,
- OS_FILE_READ_ONLY, &success);
+ OS_FILE_READ_ONLY,
+ (space->id == srv_tmp_space.space_id())
+ ? false : srv_read_only_mode, &success);
....
- Suppressing check for read-only in Tablespace object.
Tablespace object too enforces read-only checks. Tablespace object is used
at logical level and so these checks can be controlled for shared temporary
tablespace by setting appropriate flags.
class Tablespace {
	/**
	@return read only status for tablespace. */
	bool get_ignore_read_only()
	{
		return(m_ignore_read_only);
	}
	/** Set Ignore Read Only Status for tablespace.
	@param read_only_status	read only status indicator */
	void set_ignore_read_only(bool read_only_status)
	{
		m_ignore_read_only = read_only_status;
	}
	/** Ignore server read only configuration for this tablespace. */
	bool	m_ignore_read_only;
}
Flag ignore_read_only will be set to true only for shared temporary
tablespace. Default value of flag will be false that means respect
read-only mode.
Usage of flag would be something of this sort:
file.m_handle = os_file_create(
innodb_data_file_key, file.m_filename, file.m_open_flags,
OS_FILE_NORMAL, OS_DATA_FILE,
m_ignore_read_only ? false : srv_read_only_mode, &success);
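How the flag turns the server-wide setting into the value passed to the file APIs can be sketched as below. `ToyTablespace` and `effective_read_only` are hypothetical names; the server's read-only mode is passed in as a parameter so the sketch stays self-contained.

```cpp
#include <cassert>

class ToyTablespace {
public:
    // The flag defaults to false, i.e. respect the server's read-only mode.
    explicit ToyTablespace(bool ignore_read_only = false)
        : m_ignore_read_only(ignore_read_only) {}

    // Value to pass as the read_only argument of the file APIs.
    bool effective_read_only(bool srv_read_only_mode) const
    {
        return m_ignore_read_only ? false : srv_read_only_mode;
    }

private:
    // Ignore server read-only configuration for this tablespace.
    bool m_ignore_read_only;
};
```

Only the shared temporary tablespace would be constructed with the flag set, so every other tablespace keeps its existing behaviour.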
2. Enable creation of buffer flush thread in read-only mode.
- if (!srv_read_only_mode) {
- os_thread_create(buf_flush_page_cleaner_thread, NULL, NULL);
- }
+ /* Even in read-only mode there could be flush job generated by
+ intrinsic temporary table operations. */
+ os_thread_create(buf_flush_page_cleaner_thread, NULL, NULL);
Previously this thread was not created because no write request is expected
in read-only mode. With these changes, write requests to the shared
temporary tablespace can be expected.
3. Enabling write aio framework in read-only mode:
Again, this was blocked because no write request was expected. It needs to
be enabled, which involves removing the conditional code around creation of
the AIO write framework and adjusting the segment numbers accordingly.
+ /* Initialize write aio segment. */
+ os_aio_write_array = os_aio_array_create(
+ n_write_segs * n_per_seg, n_write_segs);
+
+ if (os_aio_write_array == NULL) {
+ return(false);
+ }
+
+ for (ulint i = n_segments; i < (n_write_segs + n_segments); ++i) {
+ ut_a(i < SRV_MAX_N_IO_THREADS);
+ srv_io_thread_function[i] = "write thread";
+ }
+
+ n_segments += n_write_segs;
+
---------------
- ut_ad(!srv_read_only_mode);
ut_a(array == os_aio_write_array);
seg_len = os_aio_write_array->n_slots
/ os_aio_write_array->n_segments;
- segment = os_aio_read_array->n_segments + 2
- + slot->pos / seg_len;
+ segment = os_aio_read_array->n_segments
+ + (srv_read_only_mode ? 0 : 2) + slot->pos / seg_len;
}
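The adjusted segment arithmetic from the excerpt can be isolated into a small function. The function name and parameter list are assumptions; the offset of 2 is taken directly from the excerpt and is treated here as the count of dedicated segments that exist only outside read-only mode.

```cpp
#include <cassert>

// Map a slot position in the write array to a global segment number.
// Write segments follow the read segments, plus an offset of 2 that
// applies only when the server is not in read-only mode.
unsigned long toy_write_segment(
    bool read_only,
    unsigned long n_read_segs,
    unsigned long n_write_slots,
    unsigned long n_write_segs,
    unsigned long slot_pos)
{
    assert(n_write_segs > 0);
    assert(n_write_slots % n_write_segs == 0);
    assert(slot_pos < n_write_slots);

    const unsigned long seg_len = n_write_slots / n_write_segs;

    return n_read_segs + (read_only ? 0 : 2) + slot_pos / seg_len;
}
```

For example, with 4 read segments, 8 write slots in 2 write segments and slot position 5, the result is 5 in read-only mode and 7 otherwise.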
4. Relaxing check that block invocation of DDL/DML APIs in read-only mode.
- Checks will be relaxed by combining them with a test on the table type.
/* Step-1: Validation checks before we commence write_row operation. */
- if (srv_read_only_mode) {
+ if (srv_read_only_mode
+ && !dict_table_is_intrinsic_temporary(prebuilt->table)) {
ib_senderrf(ha_thd(), IB_LOG_LEVEL_WARN, ER_READ_ONLY_MODE);
DBUG_RETURN(HA_ERR_TABLE_READONLY);
This will allow execution of APIs in read-only mode if table is intrinsic
temporary.
Also, avoid explicit call that would flush REDO log to disk.
+ if (!srv_read_only_mode && !is_intrinsic_temp_table) {
+ log_buffer_flush_to_disk();
+ }
The call will be invoked only if read-only is false and the table is not an
intrinsic temporary table. Even in non-read-only mode, intrinsic temporary
tables don't generate REDO log, so the optimization is helpful.
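The two conditions in this step can be written as standalone predicates to make their truth tables explicit. The function names are hypothetical, and the server mode is passed in as a parameter instead of being read from a global.

```cpp
#include <cassert>

// True when a DDL/DML call must be rejected with a read-only error.
// Intrinsic temporary tables are exempt from the read-only check.
bool toy_dml_blocked(bool srv_read_only_mode, bool is_intrinsic_temp_table)
{
    return srv_read_only_mode && !is_intrinsic_temp_table;
}

// True when the REDO log needs an explicit flush to disk.
// Intrinsic temporary tables never generate REDO, in any mode.
bool toy_needs_log_flush(bool srv_read_only_mode, bool is_intrinsic_temp_table)
{
    return !srv_read_only_mode && !is_intrinsic_temp_table;
}
```

The only combination in which either predicate is true for an intrinsic table is none: such tables are never blocked and never trigger a log flush.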
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.