As of MySQL 9.1.2, you can update external tables using Selective Load, which lets you edit the list of files that make up an external table without altering the directories in the Object Storage bucket.
This topic describes how to load structured data using Lakehouse Auto Parallel Load or manually.
To edit the list of files comprising an external table, add or remove the file paths specified in the ENGINE_ATTRIBUTE option for the table. If a new file path is specified, the data from that file is loaded into the table. If an existing file path is removed, the data from that file is removed from the table.
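As a minimal sketch of the idea (the table, bucket, tenancy, and file names here are placeholders, not part of the walkthrough later in this topic), adding a second file path to the list loads that file's rows, while removing an existing path would remove that file's rows:

```sql
-- Hypothetical external table originally built from file_a.csv only.
-- Listing file_b.csv alongside it loads the new file's data incrementally.
ALTER TABLE my_external_table ENGINE_ATTRIBUTE='{
  "dialect": {"format": "csv", "has_header": true},
  "file": [
    {"uri": "oci://my_bucket@my_tenancy/data/file_a.csv"},
    {"uri": "oci://my_bucket@my_tenancy/data/file_b.csv"}
  ]
}';
```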
Note the following points about the ENGINE_ATTRIBUTE option:
- Ensure that the ENGINE_ATTRIBUTE option specifies all the files to be loaded.
- You can modify only the file parameters in the ENGINE_ATTRIBUTE option.
- If Lakehouse automatically detected dialect parameters, such as delimiters, you must include those parameters in subsequent loads using Selective Load.
- If the data refresh fails, the value of the ENGINE_ATTRIBUTE option is not updated, and an error message is displayed.
You can edit the ENGINE_ATTRIBUTE option using one of the following methods. For more information about the ENGINE_ATTRIBUTE option, see Lakehouse External Table Syntax.
You can use the ENGINE_ATTRIBUTE option with the ALTER TABLE statement to add or remove data in an external table without reloading the entire table. Lakehouse applies the changes incrementally.
The following example performs an initial manual load of five files from an Object Storage folder, and then performs a subsequent Selective Load that removes three of those files from the external table. The example uses resource principals to load the data.
Complete the following steps to perform the initial load of data:
Upload the files to load into Object Storage. See Uploading an Object Storage Object to a Bucket. This example creates the Object Storage folder data_files and uploads the files data_file_1.csv, data_file_2.csv, data_file_3.csv, data_file_4.csv, and data_file_5.csv.
In the terminal window, create and use the database that stores the table.
mysql> CREATE DATABASE data_tables;
mysql> USE data_tables;
Set up the CREATE TABLE statement and the ENGINE_ATTRIBUTE options to specify the parameters needed to process the external files. As of MySQL 9.4.0, you can use the CREATE EXTERNAL TABLE statement. In versions earlier than MySQL 9.4.0, you must use the CREATE TABLE statement and set ENGINE to lakehouse and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.
Ensure that the table has the correct data type for each column. For this example, columns are defined according to the data in the files in the data_files folder.
As of MySQL 9.4.0, you can use SQL syntax to set options for external tables. For earlier versions, you must set options using JSON syntax. To learn more, see Lakehouse External Table JSON Syntax and Lakehouse External Table SQL Syntax.
The following example uses JSON syntax:
mysql> CREATE EXTERNAL TABLE data_table_1(
         col_1 varchar(2) DEFAULT NULL,
         col_2 smallint unsigned DEFAULT NULL,
         col_3 varchar(4) DEFAULT NULL,
         col_4 varchar(4) DEFAULT NULL,
         col_5 tinyint unsigned DEFAULT NULL,
         col_6 smallint unsigned DEFAULT NULL,
         col_7 smallint unsigned DEFAULT NULL,
         col_8 tinyint unsigned DEFAULT NULL)
       ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "has_header": true}, "file": [{"uri": "oci://bucket_1@tenant1/data_files/"}]}';
Where:
- The CREATE EXTERNAL TABLE statement creates the table data_table_1, and automatically sets ENGINE to lakehouse and SECONDARY_ENGINE to rapid.
- Each column for the table is defined according to the files in the data_files folder.
- format defines the format of the external file: csv.
- has_header identifies a header in the external file.
- The file parameters define the resource principal URI used to access the files that create the external table.
The following example uses SQL syntax:
mysql> CREATE EXTERNAL TABLE data_table_1(
         col_1 varchar(2) DEFAULT NULL,
         col_2 smallint unsigned DEFAULT NULL,
         col_3 varchar(4) DEFAULT NULL,
         col_4 varchar(4) DEFAULT NULL,
         col_5 tinyint unsigned DEFAULT NULL,
         col_6 smallint unsigned DEFAULT NULL,
         col_7 smallint unsigned DEFAULT NULL,
         col_8 tinyint unsigned DEFAULT NULL)
       FILE_FORMAT = (FORMAT csv HEADER ON)
       FILES = (URI = 'oci://bucket_1@tenant1/data_files/');
If you are on MySQL 9.1.2 or earlier, you must update dialect with the field_delimiter and record_delimiter parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more.
Use the ALTER TABLE statement with the SECONDARY_LOAD clause to load the data and create the external table.
mysql> ALTER TABLE data_table_1 SECONDARY_LOAD;
Warning (code 3877): Command executed during preprocessing: 'ALTER TABLE `data_tables`.`data_table_1` ENGINE_ATTRIBUTE='{"file": [{"uri": "oci://bucket_1@tenant1/data_files/"}], "dialect": {"format": "csv", "has_header": true, "field_delimiter": ",", "record_delimiter": "\\n"}}''.
Note the dialect parameters that must be used in subsequent loads. Lakehouse automatically detected the field_delimiter and record_delimiter values. If you need to review the dialect parameters later, run SHOW CREATE TABLE table_name;.
Optionally, query the number of rows in the table to confirm all files loaded successfully.
mysql> SELECT COUNT(*) FROM data_table_1;
+----------+
| count(*) |
+----------+
|     4130 |
+----------+
For this example, you now want to remove data_file_1.csv, data_file_2.csv, and data_file_3.csv. You want to keep data_file_4.csv and data_file_5.csv.
Complete the following steps to perform the Selective Load and remove three files from the table:
Update the ENGINE_ATTRIBUTE parameters for the table so that the table includes only data_file_4.csv and data_file_5.csv. Make sure all dialect parameters are included.
mysql> ALTER TABLE data_table_1
       ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "has_header": true, "field_delimiter": ",", "record_delimiter": "\\n"},
                         "file": [{"uri": "oci://bucket_1@tenant1/data_files/data_file_4.csv"},
                                  {"uri": "oci://bucket_1@tenant1/data_files/data_file_5.csv"}]}';
Optionally, query the number of rows in the table to confirm the appropriate files loaded successfully.
mysql> SELECT COUNT(*) FROM data_table_1;
+----------+
| count(*) |
+----------+
|     2746 |
+----------+
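Selective Load also works in the other direction. As a hedged sketch, suppose you later upload a hypothetical data_file_6.csv (not part of this example's data set) to the same folder; listing it alongside the files already in the table loads only the new file's data incrementally:

```sql
-- data_file_6.csv is hypothetical; the dialect parameters must match the
-- values Lakehouse detected during the initial load.
ALTER TABLE data_table_1
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "has_header": true, "field_delimiter": ",", "record_delimiter": "\\n"},
                  "file": [{"uri": "oci://bucket_1@tenant1/data_files/data_file_4.csv"},
                           {"uri": "oci://bucket_1@tenant1/data_files/data_file_5.csv"},
                           {"uri": "oci://bucket_1@tenant1/data_files/data_file_6.csv"}]}';
```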
You can use Auto Parallel Load to edit the value of the engine_attribute option in the following scenarios:
- If you create a table without the engine_attribute option set, you can set the engine_attribute option using Auto Parallel Load. Auto Parallel Load then runs the ALTER TABLE statement and loads the data from the files you specified into the table. This method also works on empty DB System tables, which are later converted to external tables.
- If you create a table with the engine_attribute option set, but the table is unloaded, you can set the engine_attribute option using Auto Parallel Load. Auto Parallel Load uses the new value of the engine_attribute option to run the ALTER TABLE statement and loads data only from the files you specified into the table.
- If a table with the engine_attribute option set is already loaded, Auto Parallel Load lets you edit the engine_attribute option value only if Lakehouse Incremental Load is enabled. To enable Lakehouse Incremental Load, you must set the refresh_external_tables option to TRUE. Auto Parallel Load then runs the ALTER TABLE statement and adds or removes data as defined in the new file paths. Lakehouse applies the changes incrementally.
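Sketched in terms of the options parameter of sys.HEATWAVE_LOAD, enabling Lakehouse Incremental Load for the already-loaded-table scenario looks like the following minimal fragment (it assumes @input_list was already set, as shown in the full walkthrough in this topic):

```sql
-- Enable Lakehouse Incremental Load so Auto Parallel Load may edit the
-- engine_attribute of an already-loaded table.
SET @options = JSON_OBJECT('mode', 'normal', 'refresh_external_tables', TRUE);
CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), @options);
```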
The following example performs an initial automatic load of five files from an Object Storage folder, and then performs a subsequent Selective Load that removes three of those files from the external table. The example uses resource principals to load the data.
Complete the following steps to perform the initial load of data:
Upload the files to load into Object Storage. This example creates the Object Storage folder data_files and uploads the files data_file_1.csv, data_file_2.csv, data_file_3.csv, data_file_4.csv, and data_file_5.csv.
Create a session variable that defines the input_list used to load the external files. To review all syntax options, see Auto Parallel Load Syntax.
mysql> SET @input_list = '[{
         "db_name": "data_tables",
         "tables": [{
           "table_name": "data_table_1",
           "engine_attribute": {
             "dialect": {"format": "csv", "has_header": true},
             "file": [{"uri": "oci://bucket_1@tenant1/data_files/"}]
           }
         }]
       }]';
Where:
- @input_list is the name of the session variable.
- db_name identifies the database to store the table: data_tables. The database is automatically created if it does not exist.
- table_name sets the table name to store the data: data_table_1. The table is automatically created if it does not exist.
- engine_attribute defines the parameters of the external file.
- format defines the format of the external file: csv.
- has_header identifies a header in the external file. Auto Parallel Load then infers the column names from the first row in the file.
- file defines the resource principal URI used to access the files. Replace these values with your own. In this example, the uri points to the Object Storage folder storing the files to load.
If you are on MySQL 9.1.2 or earlier, you must update dialect with the field_delimiter and record_delimiter parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more.
Run the HEATWAVE_LOAD command. If you set the options parameter to NULL
, the data loads in normal mode by default.mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL); +------------------------------------------+ | INITIALIZING HEATWAVE AUTO PARALLEL LOAD | +------------------------------------------+ | Version: 4.34 | | | | Load Mode: normal | | Load Policy: disable_unsupported_columns | | Output Mode: normal | | | +------------------------------------------+ +--------------------------------------------------------------------------------------------------------------------+ | LAKEHOUSE AUTO SCHEMA INFERENCE | +--------------------------------------------------------------------------------------------------------------------+ | Verifying external lakehouse tables: 1 | | | | SCHEMA TABLE TABLE IS RAW NUM. OF ESTIMATED SUMMARY OF | | NAME NAME CREATED FILE SIZE COLUMNS ROW COUNT ISSUES | | ------ ----- -------- --------- ------- --------- ---------- | | `data_tables` `data_table_1` NO 40.06 KiB 8 1.38 K | | | | New schemas to be created: 1 | | External lakehouse tables to be created: 1 | | | +--------------------------------------------------------------------------------------------------------------------+ +------------------------------------------------------------------------+ | OFFLOAD ANALYSIS | +------------------------------------------------------------------------+ | Verifying input schemas: 1 | | User excluded items: 0 | | | | SCHEMA OFFLOADABLE OFFLOADABLE SUMMARY OF | | NAME TABLES COLUMNS ISSUES | | ------ ----------- ----------- ---------- | | `data_tables` 1 8 | | | | Total offloadable schemas: 1 | | | +------------------------------------------------------------------------+ +-----------------------------------------------------------------------------------------------------------------------------+ | CAPACITY ESTIMATION | +-----------------------------------------------------------------------------------------------------------------------------+ | Default encoding for string columns: VARLEN (unless 
specified in the schema) | | Estimating memory footprint for 1 schema(s) | | | | TOTAL ESTIMATED ESTIMATED TOTAL DICTIONARY VARLEN ESTIMATED | | SCHEMA OFFLOADABLE HEATWAVE NODE MYSQL NODE STRING ENCODED ENCODED LOAD | | NAME TABLES FOOTPRINT FOOTPRINT COLUMNS COLUMNS COLUMNS TIME | | ------ ----------- --------- --------- ------- ---------- ------- --------- | | `data_tables` 1 4.83 MiB 576.00 KiB 3 0 3 7.00 s | | | | Sufficient MySQL host memory available to load all tables. | | Sufficient HeatWave cluster memory available to load all tables. | | | +-----------------------------------------------------------------------------------------------------------------------------+ +---------------------------------------------------------------------------------------------------------------------------------------+ | EXECUTING LOAD SCRIPT | +---------------------------------------------------------------------------------------------------------------------------------------+ | HeatWave Load script generated | | Retrieve load script containing 3 generated DDL command(s) using the query below: | | Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report" | | SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id; | | | | Adjusting load parallelism dynamically per internal/external table. | | Using current parallelism of 32 thread(s) as maximum for internal tables. | | | | Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema | | | | Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION | | | | Proceeding to load 1 table(s) into HeatWave. 
| | | | Applying changes will take approximately 7.03 s | | | +---------------------------------------------------------------------------------------------------------------------------------------+ +------------------------------------------+ | SCHEMA CREATION | +------------------------------------------+ | Schema `data_tables` creation succeeded! | | Warnings/errors encountered: 0 | | Elapsed time: 5.47 ms | | | +------------------------------------------+ +----------------------------------------------+ | TABLE LOAD | +----------------------------------------------+ | TABLE (1 of 1): `data_tables`.`data_table_1` | | Commands executed successfully: 2 of 2 | | Warnings encountered: 1 | | Table load succeeded! | | Total columns loaded: 8 | | Elapsed time: 3.93 s | | | +----------------------------------------------+ +----------------------------------------------------------------------------------+ | LOAD SUMMARY | +----------------------------------------------------------------------------------+ | | | SCHEMA TABLES TABLES COLUMNS LOAD | | NAME LOADED FAILED LOADED DURATION | | ------ ------ ------ ------- -------- | | `data_tables` 1 0 8 3.93 s | | | | Total errors encountered: 0 | | Total warnings encountered: 4 | | Retrieve the associated logs from the report table using the query below: | | SELECT log FROM sys.heatwave_autopilot_report WHERE type IN ('error', 'warn'); | | | +----------------------------------------------------------------------------------+
Review the SCHEMA CREATION, TABLE LOAD, and LOAD SUMMARY sections for any errors or warnings during the load, and to confirm that the external table loaded successfully.
Optionally, use the new database and query the number of rows in the table to confirm data from all files loaded successfully.
mysql> USE data_tables;
mysql> SELECT COUNT(*) FROM data_table_1;
+----------+
| COUNT(*) |
+----------+
|     4130 |
+----------+
For this example, you now want to remove data_file_1.csv, data_file_2.csv, and data_file_3.csv. You want to keep data_file_4.csv and data_file_5.csv.
Complete the following steps to perform the Selective Load and remove three files from the table:
Run the SHOW CREATE TABLE command to confirm all the dialect
parameters for the loaded table.mysql> SHOW CREATE TABLE data_tables.data_table_1; +--------------+------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +--------------+------------------------------------------------------------------------------------------------------------------+ | data_table_1 | CREATE TABLE `data_table_1` ( `Airline` varchar(2) DEFAULT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN', `Flight` smallint unsigned DEFAULT NULL, `AirportFrom` varchar(4) DEFAULT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN', `AirportTo` varchar(4) DEFAULT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN', `DayOfWeek` tinyint unsigned DEFAULT NULL, `Time` smallint unsigned DEFAULT NULL, `Length` smallint unsigned DEFAULT NULL, `Delay` tinyint unsigned DEFAULT NULL ) ENGINE=Lakehouse DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci SECONDARY_ENGINE=RAPID /*!80021 ENGINE_ATTRIBUTE='{"file": [{"bucket": "bucket_1", "prefix": "data_files/", "region": "us-ashburn-1", "namespace": "tenant_1"}], "dialect": {"format": "csv", "has_header": true, "field_delimiter": ",", "record_delimiter": "\\n"}}' */ | +--------------+-----------------------------------------------------------------------------------------------------------------+
Note all the dialect parameters and ensure that you include them in the subsequent load command. For this example, the dialect parameters are: "format": "csv", "has_header": true, "field_delimiter": ",", "record_delimiter": "\\n".
Create a new input_list variable with the updated files for the table. Update the engine_attribute parameters so that the table includes only data_file_4.csv and data_file_5.csv. Make sure all dialect parameters are included.
mysql> SET @input_list = '[{
         "db_name": "data_tables",
         "tables": [{
           "table_name": "data_table_1",
           "engine_attribute": {
             "dialect": {"format": "csv", "has_header": true, "field_delimiter": ",", "record_delimiter": "\\n"},
             "file": [{"uri": "oci://bucket_1@tenant1/data_files/data_file_4.csv"},
                      {"uri": "oci://bucket_1@tenant1/data_files/data_file_5.csv"}]
           }
         }]
       }]';
Create an options variable that includes the options to load the data in normal mode and enable refresh_external_tables.
mysql> SET @options = JSON_OBJECT('mode', 'normal', 'refresh_external_tables', TRUE);
Run the HEATWAVE_LOAD command with the input_list and options
variables.mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), @options); +------------------------------------------+ | INITIALIZING HEATWAVE AUTO PARALLEL LOAD | +------------------------------------------+ | Version: 4.34 | | | | Load Mode: normal | | Load Policy: disable_unsupported_columns | | Output Mode: normal | | | +------------------------------------------+ +--------------------------------------------------------------------------------------------------------------------+ | LAKEHOUSE AUTO SCHEMA INFERENCE | +--------------------------------------------------------------------------------------------------------------------+ | Verifying external lakehouse tables: 1 | | | | SCHEMA TABLE TABLE IS RAW NUM. OF ESTIMATED SUMMARY OF | | NAME NAME CREATED FILE SIZE COLUMNS ROW COUNT ISSUES | | ------ ----- -------- --------- ------- --------- ---------- | | `data_tables` `data_table_1` YES + 40.06 KiB 8 1.38 K | | | | New schemas to be created: 0 | | External lakehouse tables to be created: 0 | | External lakehouse tables to be refreshed (marked with +): 1 | | | +--------------------------------------------------------------------------------------------------------------------+ +------------------------------------------------------------------------+ | OFFLOAD ANALYSIS | +------------------------------------------------------------------------+ | Verifying input schemas: 1 | | User excluded items: 0 | | | | SCHEMA OFFLOADABLE OFFLOADABLE SUMMARY OF | | NAME TABLES COLUMNS ISSUES | | ------ ----------- ----------- ---------- | | `data_tables` 1 8 | | | | Total offloadable schemas: 1 | | | +------------------------------------------------------------------------+ +-----------------------------------------------------------------------------------------------------------------------------+ | CAPACITY ESTIMATION | +-----------------------------------------------------------------------------------------------------------------------------+ | Default encoding 
for string columns: VARLEN (unless specified in the schema) | | Estimating memory footprint for 1 schema(s) | | Estimates do not include 1 external lakehouse table(s) being refreshed. | | | | TOTAL ESTIMATED ESTIMATED TOTAL DICTIONARY VARLEN ESTIMATED | | SCHEMA OFFLOADABLE HEATWAVE NODE MYSQL NODE STRING ENCODED ENCODED LOAD | | NAME TABLES FOOTPRINT FOOTPRINT COLUMNS COLUMNS COLUMNS TIME | | ------ ----------- --------- --------- ------- ---------- ------- --------- | | `data_tables` 1 0 bytes 0 bytes 3 0 3 0 ps | | | | Sufficient MySQL host memory available to load all tables. | | Sufficient HeatWave cluster memory available to load all tables. | | | +-----------------------------------------------------------------------------------------------------------------------------+ +---------------------------------------------------------------------------------------------------------------------------------------+ | EXECUTING LOAD SCRIPT | +---------------------------------------------------------------------------------------------------------------------------------------+ | HeatWave Load script generated | | Retrieve load script containing 1 generated DDL command(s) using the query below: | | Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report" | | SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id; | | | | Adjusting load parallelism dynamically per internal/external table. | | Using current parallelism of 32 thread(s) as maximum for internal tables. | | | | Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema | | | | Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION | | | | Proceeding to load 1 table(s) into HeatWave. 
| | | | Applying changes will take approximately 0 ps | | | +---------------------------------------------------------------------------------------------------------------------------------------+ +----------------------------------------------+ | TABLE LOAD | +----------------------------------------------+ | TABLE (1 of 1): `data_tables`.`data_table_1` | | Commands executed successfully: 1 of 1 | | Warnings encountered: 0 | | Table load succeeded! | | Total columns loaded: 8 | | Elapsed time: 3.09 s | | | +----------------------------------------------+ +----------------------------------------------------------------------------------+ | LOAD SUMMARY | +----------------------------------------------------------------------------------+ | | | SCHEMA TABLES TABLES COLUMNS LOAD | | NAME LOADED FAILED LOADED DURATION | | ------ ------ ------ ------- -------- | | `data_tables` 1 0 8 3.09 s | | | | Total errors encountered: 0 | | Total warnings encountered: 1 | | Retrieve the associated logs from the report table using the query below: | | SELECT log FROM sys.heatwave_autopilot_report WHERE type IN ('error', 'warn'); | | | +----------------------------------------------------------------------------------+
Optionally, query the number of rows in the table again to confirm the data from the appropriate files loaded successfully.
mysql> SELECT COUNT(*) FROM data_table_1;
+----------+
| count(*) |
+----------+
|     2746 |
+----------+