After you set up an external table with the files to load into it, run the HEATWAVE_LOAD command to automatically create the table and load the data.
Complete the steps to Create an External Table before running the HEATWAVE_LOAD command.
To demonstrate how to load data using Lakehouse Auto Parallel Load, the following example uses the same file as the Create External Table Example and runs the HEATWAVE_LOAD command in dryrun mode to test the creation of the table and the loading of the data beforehand.
- Create a session variable with the characteristics of the input_list to create the external table, and specify the files to use to load data into the table. To review all syntax options, see HEATWAVE_LOAD.
mysql> SET @input_list = '[{ "db_name": "bank_marketing", "tables": [{ "table_name": "bank_train", "engine_attribute": { "dialect": {"format": "csv", "has_header": true}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';
Where:
- @input_list is the name of the session variable.
- db_name identifies the database name to store the table: bank_marketing. The database is automatically created if it does not exist.
- table_name sets the table name to store the data: bank_train. The table is automatically created if it does not exist.
- engine_attribute defines the parameters of the external file.
- format defines the format of the external file: csv.
- has_header identifies a header in the external file. Auto Parallel Load then infers the column names from the first row in the file.
- uri sets the URI link to access the file. Replace mybucket@mynamespace/data_files/data_file_1.csv with your own link.
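The file array can list more than one source to load into the same table. The following is only a sketch: data_file_2.csv is a hypothetical second file name, and both files are assumed to share the same CSV layout.
mysql> SET @input_list = '[{ "db_name": "bank_marketing", "tables": [{ "table_name": "bank_train", "engine_attribute": { "dialect": {"format": "csv", "has_header": true}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}, {"uri": "oci://mybucket@mynamespace/data_files/data_file_2.csv"}] } }] }]';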
If you are on MySQL 9.1.2 or earlier, you need to update dialect with the field_delimiter and record_delimiter parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more.
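For example, on MySQL 9.1.2 or earlier the dialect would need the delimiters spelled out. This is a sketch only; the values shown (a ";" field delimiter and a "\n" record delimiter, which match the delimiters in the generated load script later in this example) must match whatever your file actually uses.
mysql> SET @input_list = '[{ "db_name": "bank_marketing", "tables": [{ "table_name": "bank_train", "engine_attribute": { "dialect": {"format": "csv", "has_header": true, "field_delimiter": ";", "record_delimiter": "\\n"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';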
- Use the HEATWAVE_LOAD command with the dryrun mode option to generate a summary, so you can review the details and address any issues before loading the data. To skip this step, run the command in normal mode. Insert the input_list variable previously created into the command.
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'dryrun'));
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 4.31                            |
|                                          |
| Load Mode: dryrun                        |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal                      |
|                                          |
+------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------+
| LAKEHOUSE AUTO SCHEMA INFERENCE                                                                                      |
+--------------------------------------------------------------------------------------------------------------------+
| Verifying external lakehouse tables: 1                                                                               |
|                                                                                                                      |
|      SCHEMA            TABLE         TABLE IS       RAW        NUM. OF    ESTIMATED    SUMMARY OF                    |
|       NAME              NAME         CREATED     FILE SIZE     COLUMNS    ROW COUNT      ISSUES                     |
|      ------            -----         --------     ---------    -------    ---------    ----------                   |
| `bank_marketing`    `bank_train`        NO        450.66 KiB     17        4.52 K                                   |
|                                                                                                                      |
| New schemas to be created: 1                                                                                         |
| External lakehouse tables to be created: 1                                                                           |
|                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------+
| OFFLOAD ANALYSIS                                                        |
+------------------------------------------------------------------------+
| Verifying input schemas: 1                                              |
| User excluded items: 0                                                  |
|                                                                         |
|      SCHEMA         OFFLOADABLE    OFFLOADABLE    SUMMARY OF            |
|       NAME            TABLES         COLUMNS        ISSUES              |
|      ------         -----------    -----------    ----------            |
| `bank_marketing`         1              17                              |
|                                                                         |
| Total offloadable schemas: 1                                            |
|                                                                         |
+------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema)                                                 |
| Estimating memory footprint for 1 schema(s)                                                                                   |
|                                                                                                                               |
|                        TOTAL        ESTIMATED       ESTIMATED      TOTAL     DICTIONARY    VARLEN     ESTIMATED               |
|      SCHEMA         OFFLOADABLE   HEATWAVE NODE    MYSQL NODE     STRING      ENCODED      ENCODED      LOAD                  |
|       NAME            TABLES        FOOTPRINT      FOOTPRINT     COLUMNS      COLUMNS      COLUMNS      TIME                  |
|      ------         -----------     ---------      ---------     -------     ----------    -------    ---------              |
| `bank_marketing`         1           4.65 MiB       1.12 MiB       17            0           17        7.00 s                |
|                                                                                                                               |
| Sufficient MySQL host memory available to load all tables.                                                                   |
| Sufficient HeatWave cluster memory available to load all tables.                                                             |
|                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------+
| LOAD SCRIPT GENERATION                                                                                               |
+--------------------------------------------------------------------------------------------------------------------+
| Dryrun mode only generates the load script                                                                           |
| Set mode to "normal" in options to load tables                                                                       |
|                                                                                                                      |
| Retrieve load script containing 3 generated DDL commands using the query below:                                      |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"          |
| SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;             |
|                                                                                                                      |
| Applying changes will take approximately 7.00 s                                                                      |
|                                                                                                                      |
| Caution: Executing the generated load script may alter column definitions and secondary engine flags in the schema  |
|                                                                                                                      |
| Total errors encountered: 0                                                                                          |
| Total warnings encountered: 0                                                                                        |
|                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------+
Review the following sections in the output:
- LAKEHOUSE AUTO SCHEMA INFERENCE: Displays the database name, table name, file size, number of columns, and the estimated number of rows. Any issues display here.
- OFFLOAD ANALYSIS: Confirms the number of tables and columns that Lakehouse can load to MySQL HeatWave.
- CAPACITY ESTIMATION: Displays the memory requirements and estimated load time. If there is insufficient memory, you can update the number of nodes for the DB System. See Editing a MySQL HeatWave Cluster in the MySQL HeatWave on OCI Service Guide or Editing a MySQL HeatWave Cluster in the MySQL HeatWave on AWS Service Guide.
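To load the data directly without a dryrun, you can set the mode option to normal explicitly. The following call is a sketch of that variant; it is equivalent to passing NULL for the options, as the next step shows.
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'normal'));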
- If there are no issues or errors in the summary, you can run the HEATWAVE_LOAD command again in normal mode to load the data. If you set the options to NULL, the data loads in normal mode by default.
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 4.31                            |
|                                          |
| Load Mode: normal                        |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal                      |
|                                          |
+------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------+
| LAKEHOUSE AUTO SCHEMA INFERENCE                                                                                      |
+--------------------------------------------------------------------------------------------------------------------+
| Verifying external lakehouse tables: 1                                                                               |
|                                                                                                                      |
|      SCHEMA            TABLE         TABLE IS       RAW        NUM. OF    ESTIMATED    SUMMARY OF                    |
|       NAME              NAME         CREATED     FILE SIZE     COLUMNS    ROW COUNT      ISSUES                     |
|      ------            -----         --------     ---------    -------    ---------    ----------                   |
| `bank_marketing`    `bank_train`        NO        450.66 KiB     17        4.52 K                                   |
|                                                                                                                      |
| New schemas to be created: 1                                                                                         |
| External lakehouse tables to be created: 1                                                                           |
|                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------+
| OFFLOAD ANALYSIS                                                        |
+------------------------------------------------------------------------+
| Verifying input schemas: 1                                              |
| User excluded items: 0                                                  |
|                                                                         |
|      SCHEMA         OFFLOADABLE    OFFLOADABLE    SUMMARY OF            |
|       NAME            TABLES         COLUMNS        ISSUES              |
|      ------         -----------    -----------    ----------            |
| `bank_marketing`         1              17                              |
|                                                                         |
| Total offloadable schemas: 1                                            |
|                                                                         |
+------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema)                                                 |
| Estimating memory footprint for 1 schema(s)                                                                                   |
|                                                                                                                               |
|                        TOTAL        ESTIMATED       ESTIMATED      TOTAL     DICTIONARY    VARLEN     ESTIMATED               |
|      SCHEMA         OFFLOADABLE   HEATWAVE NODE    MYSQL NODE     STRING      ENCODED      ENCODED      LOAD                  |
|       NAME            TABLES        FOOTPRINT      FOOTPRINT     COLUMNS      COLUMNS      COLUMNS      TIME                  |
|      ------         -----------     ---------      ---------     -------     ----------    -------    ---------              |
| `bank_marketing`         1           4.65 MiB       1.12 MiB       17            0           17        7.00 s                |
|                                                                                                                               |
| Sufficient MySQL host memory available to load all tables.                                                                   |
| Sufficient HeatWave cluster memory available to load all tables.                                                             |
|                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------------------------------------------+
| EXECUTING LOAD SCRIPT                                                                                                                   |
+---------------------------------------------------------------------------------------------------------------------------------------+
| HeatWave Load script generated                                                                                                          |
| Retrieve load script containing 3 generated DDL command(s) using the query below:                                                       |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"                             |
| SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;                                |
|                                                                                                                                         |
| Adjusting load parallelism dynamically per internal/external table.                                                                     |
| Using current parallelism of 32 thread(s) as maximum for internal tables.                                                               |
|                                                                                                                                         |
| Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema                          |
|                                                                                                                                         |
| Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION  |
|                                                                                                                                         |
| Proceeding to load 1 table(s) into HeatWave.                                                                                            |
|                                                                                                                                         |
| Applying changes will take approximately 7.03 s                                                                                         |
|                                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------+
+---------------------------------------------+
| SCHEMA CREATION                             |
+---------------------------------------------+
| Schema `bank_marketing` creation succeeded! |
| Warnings/errors encountered: 0              |
| Elapsed time: 4.91 ms                       |
|                                             |
+---------------------------------------------+
+-----------------------------------------------+
| TABLE LOAD                                    |
+-----------------------------------------------+
| TABLE (1 of 1): `bank_marketing`.`bank_train` |
| Commands executed successfully: 2 of 2        |
| Warnings encountered: 0                       |
| Table load succeeded!                         |
| Total columns loaded: 17                      |
| Elapsed time: 4.36 s                          |
|                                               |
+-----------------------------------------------+
+-------------------------------------------------------------------------------+
| LOAD SUMMARY                                                                  |
+-------------------------------------------------------------------------------+
|                                                                               |
|      SCHEMA          TABLES     TABLES     COLUMNS      LOAD                  |
|       NAME           LOADED     FAILED     LOADED     DURATION                |
|      ------          ------     ------     -------    --------                |
| `bank_marketing`       1          0          17        4.36 s                 |
|                                                                               |
| Total errors encountered: 0                                                   |
| Total warnings encountered: 0                                                 |
|                                                                               |
+-------------------------------------------------------------------------------+
Review the SCHEMA CREATION, TABLE LOAD, and LOAD SUMMARY sections for any errors or warnings during the load, and to confirm the external table loaded successfully.
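If the summary reports warnings or errors, you can query the Autopilot report table for the details. The following query is a sketch; it assumes the report rows use the type values "warn" and "error" alongside the "sql" rows shown earlier.
mysql> SELECT id, type, log FROM sys.heatwave_autopilot_report WHERE type IN ("error", "warn") ORDER BY id;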
- Optionally, retrieve the generated load script to review the database and table created for the load. The output is similar to the following example.
mysql> SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;
+-----------------------------------------------------------------------------------------------------------------------------+
| Load Script                                                                                                                   |
+-----------------------------------------------------------------------------------------------------------------------------+
| CREATE DATABASE `bank_marketing`;                                                                                             |
| CREATE TABLE `bank_marketing`.`bank_train`(                                                                                   |
| `age` tinyint unsigned NOT NULL,                                                                                              |
| `job` varchar(13) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                            |
| `marital` varchar(8) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                         |
| `education` varchar(9) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                       |
| `default` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                         |
| `balance` mediumint NOT NULL,                                                                                                 |
| `housing` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                         |
| `loan` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                            |
| `contact` varchar(9) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                         |
| `day` tinyint unsigned NOT NULL,                                                                                              |
| `month` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                           |
| `duration` smallint unsigned NOT NULL,                                                                                        |
| `campaign` tinyint unsigned NOT NULL,                                                                                         |
| `pdays` smallint NOT NULL,                                                                                                    |
| `previous` tinyint unsigned NOT NULL,                                                                                         |
| `poutcome` varchar(7) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                        |
| `y` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN')                                                               |
| ENGINE=lakehouse SECONDARY_ENGINE=RAPID                                                                                       |
| ENGINE_ATTRIBUTE='{"file": [{"par": "https://objectstorage.../bank.csv"}],                                                    |
| "dialect": {"format": "csv", "has_header": true, "field_delimiter": ";", "record_delimiter": "\\n"}}';                        |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `bank_marketing`.`bank_train` SECONDARY_LOAD;                                      |
+-----------------------------------------------------------------------------------------------------------------------------+
- Optionally, query five rows of the table to confirm the data is loaded and accessible.
mysql> SELECT * FROM bank_marketing.bank_train LIMIT 5;
+-----+--------------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+----+
| age | job          | marital  | education | default | balance | housing | loan | contact  | day | month | duration | campaign | pdays | previous | poutcome | y  |
+-----+--------------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+----+
|  37 | technician   | single   | secondary | no      |     267 | no      | no   | cellular |  13 | may   |      121 |        1 |    -1 |        0 | unknown  | no |
|  34 | technician   | single   | tertiary  | no      |     586 | yes     | no   | unknown  |  16 | may   |      489 |        2 |    -1 |        0 | unknown  | no |
|  30 | blue-collar  | single   | secondary | no      |     648 | yes     | yes  | unknown  |   8 | may   |      287 |        1 |    -1 |        0 | unknown  | no |
|  57 | entrepreneur | divorced | secondary | yes     |      25 | yes     | no   | cellular |  11 | may   |      652 |        1 |   370 |        4 | other    | no |
|  26 | admin.       | single   | secondary | no      |      96 | yes     | no   | cellular |  13 | may   |      612 |        2 |   348 |        3 | other    | no |
+-----+--------------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+----+
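As a further check, you can run an analytical query against the loaded table. The query below is only an illustration; it uses the job and y columns from the generated schema to count rows per job category and how many of them have y = 'yes'.
mysql> SELECT job, COUNT(*) AS clients, SUM(y = 'yes') AS subscribed
       FROM bank_marketing.bank_train
       GROUP BY job ORDER BY clients DESC;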
Once you confirm the table successfully loaded into Lakehouse, you can use the data for the following:
If you previously created a table and want to load it with data from the specified file, you can load it with a single Auto Parallel Load command.
For example, assume you created a table with the following command:
mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
FILE_FORMAT = (FORMAT csv)
FILES = (URI = 'oci://mybucket@mynamespace/data_files/data_file_1.csv');
You can later load the table into the MySQL HeatWave Cluster with the data from the specified file by using the following command:
mysql> CALL sys.HEATWAVE_LOAD('[ {"db_name": "db_1", "tables": ["table_1"]} ]', NULL);
The value db_1 refers to the name of the database storing the table, and table_1 refers to the name of the table to load.
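If you previously created several external tables in the same database, a single call might list them all. The following is a sketch; table_2 is a hypothetical second table name.
mysql> CALL sys.HEATWAVE_LOAD('[ {"db_name": "db_1", "tables": ["table_1", "table_2"]} ]', NULL);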
After successfully creating the external table and loading it with data, learn how to do the following: