Related Documentation Download this Manual
PDF (US Ltr) - 2.1Mb
PDF (A4) - 2.1Mb


HeatWave User Guide  /  ...  /  Adding or Removing Files Using Selective Load

5.4.2 Adding or Removing Files Using Selective Load

Starting MySQL 9.1.2, you can update Lakehouse tables using selective load. Selective load lets you edit the list of files that comprise a Lakehouse table without altering the directories in the Object Storage bucket.

To edit the list of files comprising a Lakehouse table, you can add or remove the file paths specified in the engine_attribute option for the table. If a new file path is specified, then the data from the file is loaded into the table. If an existing file path is removed, then the data pertaining to the file is removed from the table.

Note the following points about the engine_attribute option:

  • Ensure that the engine_attribute option specifies all the files to be loaded.

  • You can modify only the file parameters in the engine_attribute option.

  • In case the data refresh fails, the value of the engine_attribute option is not updated, and an error message is displayed.

You can edit the engine_attribute option using one of the following methods:

For more information about the engine_attribute option, see Section 5.2.2, “Lakehouse External Table Syntax”.

Using the ALTER TABLE Statement

Using the ENGINE_ATTRIBUTE option with the ALTER TABLE statement,

you can update the files of the ENGINE_ATTRIBUTE of a table, without reloading the entire table. The changes are applied incrementally.

Edit the value of the ENGINE_ATTRIBUTE option using the ALTER TABLE statement as shown below:

Press CTRL+C to copy
mysql> ALTER TABLE TableName ENGINE_ATTRIBUTE = '{NewEngineAttributeJSON}';

Replace the following:

  • TableName: the name of the table.

  • NewEngineAttributeJSON: JSON object that specifies the list of files to include or exclude from the table.

The following example shows how to create a table supplier_pq, and then add a new supplier.parquet file to the table after the table is loaded:

Press CTRL+C to copy
mysql> CREATE TABLE 'supplier_pq' ( 's_suppkey' int NOT NULL, 's_name' char(25) DEFAULT NULL, 's_address' varchar(40) DEFAULT NULL, 's_nationkey' int DEFAULT NULL, 's_phone' char(15) DEFAULT NULL, 's_acctbal' double DEFAULT NULL, 's_comment' varchar(101) DEFAULT NULL, PRIMARY KEY ('s_suppkey'), KEY 'i_s_nationkey' ('s_nationkey') ) ENGINE=Lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE= '{ "file": [ { "name": "src_data/parquet/tpch/supplier.parquet", "bucket": "myBucket", "region": "myRegion", "namespace": "myNamespace" } ], "dialect": { "format": "parquet" } }';
Press CTRL+C to copy
mysql> ALTER TABLE 'supplier_pq' ENGINE_ATTRIBUTE= '{ "file": [ { "name": "src_data/parquet/tpch/supplier.parquet", "bucket": "myBucket", "region": "myRegion", "namespace": "myNamespace" }, { "name": "src_data2/parquet/tpch/supplier.parquet", "bucket": "myBucket", "region": "myRegion", "namespace": "myNamespace" } ], "dialect": { "format": "parquet" } }';

Using Auto Parallel Load

You can use Auto Parallel Load to edit the value of the engine_attribute option in the following scenarios:

  • If a table is created without the engine_attribute option set, you can set the engine_attribute option using Auto Parallel Load. Auto Parallel Load then runs the ALTER TABLE statement, and loads the data from the files you specified into the table.

    This method also works on empty InnoDB tables. The InnoDB tables are later converted to Lakehouse tables.

  • If a table is created with the engine_attribute set, but the table is not yet loaded, you can set the engine_attribute option using Auto Parallel Load. Auto Parallel Load uses the new value of the engine_attribute option to run the ALTER TABLE statement, and loads data only from the files you specified into the table.

  • If a table with the engine_attribute option set is already loaded, then Auto Parallel Load lets you edit the engine_attribute option value only if Lakehouse Incremental Load is enabled. To enable Lakehouse Incremental Load, you must set the refresh_external_tables option to TRUE. Auto Parallel Load then runs the ALTER TABLE statement, and loads the data from the new files you specified into the table, and removes data for the files that you removed. The changes are applied incrementally.

    If refresh_external_tables option is not set to TRUE, selective load fails to run and an error message is displayed.

The following example shows how to edit the engine_attribute option using Auto Parallel Load with the refresh_external_tables option set to TRUE:

Press CTRL+C to copy
mysql> SET @input_list = '[{ "db_name": "tpch", "tables": [{ "table_name": "supplier_pq", "engine_attribute": { "dialect": {"format": "parquet"}, "file": [{ "name": "src_data/parquet/tpch/supplier.parquet", "bucket": "myBucket", "namespace": "myNamespace", "region": "myRegion" }, { "name": "src_data2/parquet/tpch/supplier.parquet", "bucket": "myBucket", "namespace": "myNamespace", "region": "myRegion" }] } }] }]'; mysql> SET @options = JSON_OBJECT('mode', 'normal', 'refresh_external_tables', TRUE); mysql> CALL sys.heatwave_load(@input_list, @options);