Starting MySQL 9.1.2, you can update Lakehouse tables using selective load. Selective load lets you edit the list of files that comprise a Lakehouse table without altering the directories in the Object Storage bucket.
To edit the list of files comprising a Lakehouse table, you
can add or remove the file paths specified in the
engine_attribute
option for the table. If a
new file path is specified, then the data from the file is
loaded into the table. If an existing file path is removed,
then the data pertaining to the file is removed from the
table.
Note the following points about the
engine_attribute
option:
Ensure that the
engine_attribute
option specifies all the files to be loaded.You can modify only the file parameters in the
engine_attribute
option.In case the data refresh fails, the value of the
engine_attribute
option is not updated, and an error message is displayed.
You can edit the engine_attribute
option
using one of the following methods:
For more information about the
engine_attribute
option, see
Section 5.2.2, “Lakehouse External Table Syntax”.
Using the ENGINE_ATTRIBUTE
option with
the ALTER
TABLE
statement,
you can update the files of the
ENGINE_ATTRIBUTE
of a table, without
reloading the entire table. The changes are applied
incrementally.
Edit the value of the ENGINE_ATTRIBUTE
option using the ALTER TABLE
statement as shown below:
Press CTRL+C to copymysql> ALTER TABLE TableName ENGINE_ATTRIBUTE = '{NewEngineAttributeJSON}';
Replace the following:
TableName
: the name of the table.NewEngineAttributeJSON
: JSON object that specifies the list of files to include or exclude from the table.
The following example shows how to create a table
supplier_pq
, and then add a new
supplier.parquet
file to the table after
the table is loaded:
Press CTRL+C to copymysql> CREATE TABLE 'supplier_pq' ( 's_suppkey' int NOT NULL, 's_name' char(25) DEFAULT NULL, 's_address' varchar(40) DEFAULT NULL, 's_nationkey' int DEFAULT NULL, 's_phone' char(15) DEFAULT NULL, 's_acctbal' double DEFAULT NULL, 's_comment' varchar(101) DEFAULT NULL, PRIMARY KEY ('s_suppkey'), KEY 'i_s_nationkey' ('s_nationkey') ) ENGINE=Lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE= '{ "file": [ { "name": "src_data/parquet/tpch/supplier.parquet", "bucket": "myBucket", "region": "myRegion", "namespace": "myNamespace" } ], "dialect": { "format": "parquet" } }';
Press CTRL+C to copymysql> ALTER TABLE 'supplier_pq' ENGINE_ATTRIBUTE= '{ "file": [ { "name": "src_data/parquet/tpch/supplier.parquet", "bucket": "myBucket", "region": "myRegion", "namespace": "myNamespace" }, { "name": "src_data2/parquet/tpch/supplier.parquet", "bucket": "myBucket", "region": "myRegion", "namespace": "myNamespace" } ], "dialect": { "format": "parquet" } }';
You can use
Auto Parallel Load
to edit the value of the engine_attribute
option in the following scenarios:
-
If a table is created without the
engine_attribute
option set, you can set theengine_attribute
option using Auto Parallel Load. Auto Parallel Load then runs theALTER TABLE
statement, and loads the data from the files you specified into the table.This method also works on empty InnoDB tables. The InnoDB tables are later converted to Lakehouse tables.
If a table is created with the
engine_attribute
set, but the table is not yet loaded, you can set theengine_attribute
option using Auto Parallel Load. Auto Parallel Load uses the new value of theengine_attribute
option to run theALTER TABLE
statement, and loads data only from the files you specified into the table.-
If a table with the
engine_attribute
option set is already loaded, then Auto Parallel Load lets you edit theengine_attribute
option value only if Lakehouse Incremental Load is enabled. To enable Lakehouse Incremental Load, you must set therefresh_external_tables
option toTRUE
. Auto Parallel Load then runs theALTER TABLE
statement, and loads the data from the new files you specified into the table, and removes data for the files that you removed. The changes are applied incrementally.If
refresh_external_tables
option is not set toTRUE
, selective load fails to run and an error message is displayed.
The following example shows how to edit the
engine_attribute
option using Auto Parallel Load with
the refresh_external_tables
option set to
TRUE
:
Press CTRL+C to copymysql> SET @input_list = '[{ "db_name": "tpch", "tables": [{ "table_name": "supplier_pq", "engine_attribute": { "dialect": {"format": "parquet"}, "file": [{ "name": "src_data/parquet/tpch/supplier.parquet", "bucket": "myBucket", "namespace": "myNamespace", "region": "myRegion" }, { "name": "src_data2/parquet/tpch/supplier.parquet", "bucket": "myBucket", "namespace": "myNamespace", "region": "myRegion" }] } }] }]'; mysql> SET @options = JSON_OBJECT('mode', 'normal', 'refresh_external_tables', TRUE); mysql> CALL sys.heatwave_load(@input_list, @options);