WL#9559: InnoDB_New_DD: Support in-place ALTER PARTITION
Affects: Server-8.0 — Status: Complete — Priority: Medium
InnoDB now supports the native partitioning which is supposed to rely on ha_innopart handler. However, for some DDL operations like 'ALTER TABLE ... PARTITION ...' are not implemented in this way, instead, they are done by fast_alter_partition_table(). This worklog will support the DDLs mentioned above by in-place APIs of ha_innopart, not to depend on fast_alter_partition_table() any more.
FR1. This worklog should implement these operations by native partitioning 1) ADD PARTITION 2) DROP PARTITION(RANGE/LIST PARITTION only) 3) COALESCE PARTITION(HASH/KEY PARTITION only) 4) REORGANIZE PARTITION 5) REBUILD PARTITION 6) EXCHANGE PARTITION After this worklog, the native partitioning will not depend on fast_alter_partition_table(). 'EXCHANGE PARITITION' will have its own handler API calledha_innopart::exchange_partition_low(). FR2. These following operations would be kept as is, not in this worklog's scope 1) PARTITION BY 2) REMOVE PARTITIONING 3) TRUNCATE PARTITION ... (ALL) 4) OPTIMIZE PARTITION 5) REPAIR PARTITION 6) ANALYZE PARTITION 7) CHECK PARTITION 8) DISCARD PARTITION 9) IMPORT PARTITION FR3. This worklog can't guarantee the crash-safe DDL which will be done as separate work. FR4. As we know, for statements like 'ALTER TABLE ... PARTITION ...', it was impossible to specify the 'ALGORITHM=', and the 'LOCK='. So there was no COPY way for this kind of statements. Let's take COALESCE PARTITION for example, there are 2 options: Option 1. Create a brand new partitioned table with less partitions, and copy all the data from old table to this new one, and finally do rename, no handler::*inplace_alter_table() APis involved. Option 2. Through handler::*inplace_alter_table() APIs, create the new partitions of the table internally, and copy the data inside the SE. Only one exception here is EXCHANGE PARTITION, which already supports 'ALGORITHM=COPY/INPLACE, LOCK=*', and this would be kept after this worklog. FR5. According to FR4, some error messages will be deprecated since option 1 is supported. Before this worklog, if we specify ALGORITHM=X or LOCK=X would result in an error like: ALTER TABLE t1 ALGORITHM=COPY, ADD PARTITION ALGORITHM=COPY/INPLACE is not supported. Reason: Partition specific operations do not yet support LOCK/ALGORITHM. Try ALGORITHM=DEFAULT. ALTER TABLE t1 ALGORITHM=INPLACE, LOCK=NONE, ADD PARTITION LOCK=NONE/SHARED/EXCLUSIVE is not supported. Reason: Partition specific operations do not yet support LOCK/ALGORITHM. Try LOCK=DEFAULT. After this worklog, there won't be these errors. That is, following combinations are supported: 1. ALGORITHM=COPY, LOCK=DEFAULT/SHARED/LOCK. 2. ALGORITHM=INPLACE, LOCK=DEFAULT/SHARED/LOCK, this is for operations which require copying data between partitions, like 'REORGANIZE PARTITION', 'ADD PARTITION' for HASH and KEY partitions, 'COALESCE PARTITION' and 'REBUILD PARTITION'. 3. ALGORITHM=INPLACE, LOCK=NONE, this is for 'ADD PARTITION' for RANGE and LIST partitions, and 'DROP PARTITION'. Any invalid combination would result in the same error messages as common ALTER TABLE operations. For example, ALGORITHM=COPY, LOCK=NONE is not allowed, and 'LOCK=NONE is not supported. Reason: COPY algorithm requires a lock. Try LOCK=SHARED.' would be issued too. FR6. During the altering, the table is MDL locked, so there would be no concurrent DMLs. Only exceptions are for 'ADD PARTITION' for RANGE and LIST partitions and 'DROP PARTITION', which do nothing during inplace_alter_table(), so no need to block concurrent DMLs. FR7. The autoinc behavior in this worklog would be: 1) The next autoinc counter should not be changed after the DDLs of 1)-5) in FR1 2) Regarding to 6) in FR1, the next counter of the partitioned table would be reset to the to be swapped table's counter if the latter is bigger; the next counter of the to be swapped table would be reset to the to be swapped partition's counter if the latter is bigger. This promises the counter would not be re-used.
This worklog will focus on these operations: 1. ADD PARTITION 2. DROP PARTITION(RANGE/LIST) 3. COALESCE PARTITION(HASH/KEY) 4. REORGANIZE PARTITION 5. REBUILD PARTITION 6. EXCHANGE PARTITION Requisite from server --------------------- These ALTER PARTITION operations now should be done by the native partitioning APIs, to make InnoDB partitioning a true native partitioning. However, some preparation work should still be done by Server, such as: 1. There should be a way to know that the native partitioning can support inplace alter partition(By introducing a new flag in server) 2. New APIs shall be modified/added for EXCHANGE PARTITION, which is Partition_helper::exchange_partition_low() 3. REBUILD shall be done through the APIs 4. Server shall let the inplace alter APIs know what do the APIs should do to every old and new partition. There are some partition states defined already, the inplace alter APIs shall be able to get these for all partitions. 5. Server shall provide the inplace alter APIs a function as what it already does in server layer(get_partition_id) to know which partition will a specified row be inserted to. Restrictions ------------ In fact, this is the same as it is before this worklog: 1. Only a single PARTITION BY, ADD PARTITION, DROP PARTITION, REORGANIZE PARTITION, COALESCE PARTITION or EXCHANGE PARTITION clause can be used in a given ALTER TABLE statement. 2. It is possible for an ALTER TABLE statement to contain a PARTITION BY or REMOVE PARTITIONING clause in an addition to other alter specifications, however, this should go like ALGORITHM = COPY, instead of INPLACE APIs ---- Operations of ADD/DROP/COALESCE/REORGANIZE/REBUILD will be done by ha_innopart::*inplace_alter_table() Operation of EXCHANGE PARTITION will be done by ha_innopart::exchange_partition_low inherited from Partition_helper::exchange_partition_low() ALTER PARTITION --------------- The partition_state for every partition can be got from the partition_elements passed to ha_innopart::*inplace_alter_table() APIs, or else, the used partitions should be got in ha_innopart::*partition_low() APIs. Notes: Currently, the DROP work would simply drop the partition table, without considering the rollback. The rollback would be implemented as separate work. Notes: It's possible that the to be created (sub)partition has the same name with some existing one. In this case, if there is no conflict with data file, it should be fine to create the new partition with the new(same) name. For example, if old partition is in tablespace 'innodb_system' and the new one will reside in innodb_file_per_table. However, conflict of in-memory table objects still exist in this scenario. So to make things easy for now, as long as the name gets no change, a temporary table name/data file name should be used, and finally rename it. 1. ADD PARTITION(RANGE/LIST) PREPARE: newly introduced partitions should be created, with either the final table name or a temporary table name ALTER: do nothing COMMIT: If a temporary table name was used, do rename, which should succeed because the old partition should have been dropped ROLLBACK: do nothing 2. DROP PARTITION(RANGE/LIST) PREPARE: Update internal structures for new table ALTER: do nothing COMMIT: drop the partitions ROLLBACK: do nothing 3. ADD PARTITION(HASH/KEY) PREPARE: newly introduced partitions should be created, including the to be added partitions N(with the final data file names) and the to be changed ones T(with temporary data file names); T is necessary since all the data would be re-arranged, so old partitions can't be used directly ALTER: copy data from all necessary old partitions(specified by server) of old table to the partitions of the new table according to the partitioning rule; server will provide the API for native partitioning to know which partition every row in the old table belongs to in new table COMMIT: for those to be changed partitions, the old partitions O have to be renamed to some other temporary ones; then every new partitions of T should be renamed to its according one of O; no work on partitions of N; finally, all partitions of renamed O should be dropped ROLLBACK: do nothing for now 4. COALESCE PARTITION This works similarly to above 3. Only exception is that there is no partitions of N. And renaming of O may not affect all the old partitions, only the leaving partitions should be renamed 5. REORGANIZE PARTITION PREPARE: The reorganized into partitions should be created, with either final name or a temporary table name ALTER: copy data from all necessary old partitions(specified by server) of old table to the partitions of the new table according to the partitioning rule; this is the same as 3 COMMIT: drop the to be reorganized partitions from old table; also make sure that drop should be done before possible rename for reorganized into partitions ROLLBACK: do nothing for now Example: Supposing we have a table like: CREATE TABLE tr ( id INT NOT NULL AUTO_INCREMENT, name VARCHAR(50), purchased DATE, KEY(id)) PARTITION BY RANGE(YEAR(purchased)) ( PARTITION p0 VALUES LESS THAN (1990), PARTITION p1 VALUES LESS THAN (1995), PARTITION p2 VALUES LESS THAN (2000), PARTITION p3 VALUES LESS THAN (2005) ); Let's take REORGANIZE PARTITION for example: ALTER TABLE tr REORGANIZE PARTITION p2, p3 INTO ( PARTITION p21 VALUES LESS THAN(1997), PARTITION p22 VALUES LESS THAN(2000), PARTITION p31 VALUES LESS THAN(2002), PARTITION p32 VALUES LESS THAN(2005)); In this example, the table contains 4 partitions p0-p3 originally, and partition p2 and p3 will be reorganized into 4 new partitions. So there will be 6 partitions after reorganization: p0, p1, p21, p22, p31, p32. PREPARE: new internal structure for p0 and p1 will also point to the data file of p0 and p1; p21, p22, p31 and p32 will be created in this step ALTER: data will be copied from original p2 and p3 to p21, p22, p31 and p32. COMMIT: old partitions of p2 and p3 will be dropped; all internal updates will be committed ROLLBACK: do nothing for now Also note that ADD PARTITION(HASH/KEY) and COALESCE PARTITION will always create all partitions with temporary names, and copy data from old partitions to new ones. For example: Assume the table tr is partitioned by hash with 4 partitions, and we issue 'ALTER TABLE tr ADD PARTITION PARTITIONS 2;' New partitions with temporary name will be created like p0#TMP, p1#TMP ... p5#TMP. Data from old p0 to p3 will be copied to new partitions. During COMMIT, p0 to p3 will be renamed to some other temporary table names, like #sqlxxxx#p0..p3. Then p0#TMP...p5#TMP would be renamed into p0...p5. Finally, old partitions #sqlxxxx#p0..p3 can be dropped. This is similar to COALESCE PARTITION. 6. REBUILD PARTITION This should work in a similar way to above 3/4/5. Only difference is data are always copied from the old partitions to the same new partitions. 7. EXCHANGE PARTITION In ha_innopart::exchange_partition_low(), the to be exchanged partition and the table can be got. The structures of the partition and table shall be the same. We have to exchange the se_private_data/se_private_id/options, etc. so that the partition will point to the data of table while the table will point to the data of the partition. Also, since both partition and table's tablespaces are of implicit(innodb_file_per_table), data files have to be renamed to each other. Locking ------- Currently, all these operations are done with an upgradable shared metadata lock which blocks all attempts to update table data, allowing reads. Furthermore, during the change of partitioning, exclusive lock is required to install the new partitioning. This should still be held in this worklog. See FR6 too. Others ------ Since we now support these operations through the native partitioning APIs, we can remove some invalid code like ha_innopart::*_new_partition(), etc. which should not be called any more.
Copyright (c) 2000, 2019, Oracle Corporation and/or its affiliates. All rights reserved.