You can load data into a MySQL HeatWave Cluster by using the Auto Parallel Load feature.
Auto Parallel Load, which can be run manually from any MySQL client or
connector, is implemented as a stored procedure named
heatwave_load
, which resides in the MySQL
sys
schema. Running Auto Parallel Load involves
issuing a CALL
statement for the
stored procedure, which takes schemas
and
options
as arguments.
First, run Auto Parallel Load in dryrun
mode to check
for errors and warnings and to inspect the generated load
script. To load a single schema in dryrun
mode:
mysql> CALL sys.heatwave_load(JSON_ARRAY("schema_name"), JSON_OBJECT("mode","dryrun"));
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"), JSON_OBJECT("mode","dryrun"));
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema) |
| Estimating memory footprint for 1 schema(s) |
| |
| TOTAL ESTIMATED ESTIMATED TOTAL DICTIONARY VARLEN ESTIMATED |
| SCHEMA OFFLOADABLE HEATWAVE NODE MYSQL NODE STRING ENCODED ENCODED LOAD |
| NAME TABLES FOOTPRINT FOOTPRINT COLUMNS COLUMNS COLUMNS TIME |
| ------ ----------- --------- --------- ------- ---------- ------- --------- |
| `tpch` 8 1.79 GiB 1.15 GiB 29 24 5 1.25 min |
| |
| Sufficient MySQL host memory available to load all tables. |
| Sufficient HeatWave cluster memory available to load all tables. |
| |
+-----------------------------------------------------------------------------------------------------------------------------+
12 rows in set (0.2959 sec)
+--------------------------------------------------------------------------------------------------------------------+
| LOAD SCRIPT GENERATION |
+--------------------------------------------------------------------------------------------------------------------+
| Dryrun mode only generates the load script |
| Set mode to "normal" in options to load tables |
| |
| Retrieve load script containing 16 generated DDL commands using the query below: |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report" |
| SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id; |
| |
| Applying changes will take approximately 1.25 min |
| |
| Caution: Executing the generated load script may alter column definitions and secondary engine flags in the schema |
| |
| Total errors encountered: 0 |
| Total warnings encountered: 0 |
| |
+--------------------------------------------------------------------------------------------------------------------+
14 rows in set (0.2959 sec)
This statement loads tpch
schema in
dryrun
mode, and displays the load summary
In dryrun
mode, Auto Parallel Load sends the load script
to the heatwave_autopilot_report
table
only. See Section 5.8.8, “Autopilot Report Table”. It
does not load data into MySQL HeatWave.
If Auto Parallel Load fails with an error, inspect the errors with a query
to the heatwave_autopilot_report
table.
mysql> SELECT log FROM sys.heatwave_autopilot_report
WHERE type="error";
When Auto Parallel Load finishes running, query the
heatwave_autopilot_report
table to check
for warnings.
mysql> SELECT log FROM sys.heatwave_autopilot_report
WHERE type="warn";
Issue the following query to inspect the load script that was generated.
mysql> SELECT log->>"$.sql" AS "Load Script"
FROM sys.heatwave_autopilot_report
WHERE type = "sql" ORDER BY id;
+------------------------------------------------------------------------------+
| Load Script |
+------------------------------------------------------------------------------+
| SET SESSION innodb_parallel_read_threads = 3; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`customer` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 4; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`lineitem` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 1; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`nation` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 4; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`orders` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 4; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`part` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 4; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`partsupp` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 1; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`region` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 1; |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`supplier` SECONDARY_LOAD; |
+------------------------------------------------------------------------------+
16 rows in set (0.3680 sec)
Once you are satisfied with the Auto Parallel Load
CALL
statement and the
generated load script, reissue the
CALL
statement in
normal
mode to load the data into MySQL HeatWave.
For example:
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"), JSON_OBJECT("mode","normal"));
Retrieve DDL statements in a table or use the following statements to produce a list of DDL statements to easily copy and paste.
mysql> SET SESSION group_concat_max_len = 1000000;
mysql> SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ')
FROM sys.heatwave_autopilot_report
WHERE type = "sql" ORDER BY id;
*************************** 1. row ***************************
GROUP_CONCAT(log->>"$.sql" SEPARATOR ' '): SET SESSION innodb_parallel_read_threads = 3;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`customer` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`lineitem` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 1;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`nation` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`orders` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`part` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`partsupp` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 1;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`region` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 1;
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`supplier` SECONDARY_LOAD;
1 row in set (0.2575 sec)
The time required to load data depends on the data size. Auto Parallel Load provides an estimate of the time required to complete the load operation.
Tables are loaded in sequence, ordered by schema and table name. Load-time errors are reported as they are encountered. If an error is encountered while loading a table, the operation is not terminated. Auto Parallel Load continues running, moving on to the next table.
When Auto Parallel Load finishes running, it checks if tables are loaded and shows a summary with the number of tables that were loaded and the number of tables that failed to load.
The auto_enc
option is run in
check
mode by default to ensure that there
is enough memory for string column encoding.
The following example uses the auto_enc
option in check
mode, if you want to ensure
that there is sufficient memory for string column encoding
before attempting a load operation. Insufficient memory can
cause a load failure.
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"),
JSON_OBJECT("mode","dryrun","auto_enc",JSON_OBJECT("mode","check")));
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema) |
| Estimating memory footprint for 1 schema(s) |
| |
| TOTAL ESTIMATED ESTIMATED TOTAL DICTIONARY VARLEN ESTIMATED |
| SCHEMA OFFLOADABLE HEATWAVE NODE MYSQL NODE STRING ENCODED ENCODED LOAD |
| NAME TABLES FOOTPRINT FOOTPRINT COLUMNS COLUMNS COLUMNS TIME |
| ------ ----------- --------- --------- ------- ---------- ------- --------- |
| `tpch` 8 1.79 GiB 1.15 GiB 29 24 5 1.25 min |
| |
| Sufficient MySQL host memory available to load all tables. |
| Sufficient HeatWave cluster memory available to load all tables. |
| |
+-----------------------------------------------------------------------------------------------------------------------------+
12 rows in set (0.2401 sec)
+--------------------------------------------------------------------------------------------------------------------+
| LOAD SCRIPT GENERATION |
+--------------------------------------------------------------------------------------------------------------------+
| Dryrun mode only generates the load script |
| Set mode to "normal" in options to load tables |
| |
| Retrieve load script containing 16 generated DDL commands using the query below: |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report" |
| SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id; |
| |
| Applying changes will take approximately 1.25 min |
| |
| Caution: Executing the generated load script may alter column definitions and secondary engine flags in the schema |
| |
| Total errors encountered: 0 |
| Total warnings encountered: 0 |
| |
+--------------------------------------------------------------------------------------------------------------------+
14 rows in set (0.2401 sec)
The auto_enc
option runs in
check
mode regardless of whether it is
specified explicitly in the Auto Parallel Load call statement.
Look for capacity estimation data in the Auto Parallel Load output. The results indicate whether there is sufficient memory to load all tables.
When MySQL runs Auto Parallel Load, it sends the output including execution
logs and a generated load script to the
MySQL HeatWave_load_report
table in the
sys
schema.
As of MySQL 8.0.32, the
heatwave_load_report
table is deprecated,
and replaced with
heatwave_autopilot_report
table in the sys
schema.
Learn how to load data manually.