Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.8Mb
PDF (A4) - 3.8Mb


HeatWave User Guide  /  ...  /  Load Data Using Auto Parallel Load

4.2.5 Load Data Using Auto Parallel Load

You can load data into a MySQL HeatWave Cluster by using the Auto Parallel Load feature.

Auto Parallel Load, which can be run manually from any MySQL client or connector, is implemented as a stored procedure named heatwave_load, which resides in the MySQL sys schema. Running Auto Parallel Load involves issuing a CALL statement for the stored procedure, which takes schemas and options as arguments.

Load Data Using Auto Parallel Load

First, run Auto Parallel Load in dryrun mode to check for errors and warnings and to inspect the generated load script. To load a single schema in dryrun mode:

mysql> CALL sys.heatwave_load(JSON_ARRAY("schema_name"), JSON_OBJECT("mode","dryrun"));
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"), JSON_OBJECT("mode","dryrun"));
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
| Estimating memory footprint for 1 schema(s)                                                                                 |
|                                                                                                                             |
|                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
| SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
| NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
| ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
| `tpch`                             8        1.79 GiB        1.15 GiB          29             24           5        1.25 min |
|                                                                                                                             |
| Sufficient MySQL host memory available to load all tables.                                                                  |
| Sufficient HeatWave cluster memory available to load all tables.                                                            |
|                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------+
12 rows in set (0.2959 sec)

+--------------------------------------------------------------------------------------------------------------------+
| LOAD SCRIPT GENERATION                                                                                             |
+--------------------------------------------------------------------------------------------------------------------+
| Dryrun mode only generates the load script                                                                         |
| Set mode to "normal" in options to load tables                                                                     |
|                                                                                                                    |
| Retrieve load script containing 16 generated DDL commands using the query below:                                   |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"        |
|   SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;         |
|                                                                                                                    |
| Applying changes will take approximately 1.25 min                                                                  |
|                                                                                                                    |
| Caution: Executing the generated load script may alter column definitions and secondary engine flags in the schema |
|                                                                                                                    |
| Total errors encountered: 0                                                                                        |
| Total warnings encountered: 0                                                                                      |
|                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------+
14 rows in set (0.2959 sec)

This statement loads tpch schema in dryrun mode, and displays the load summary

In dryrun mode, Auto Parallel Load sends the load script to the heatwave_autopilot_report table only. See Section 5.8.8, “Autopilot Report Table”. It does not load data into MySQL HeatWave.

If Auto Parallel Load fails with an error, inspect the errors with a query to the heatwave_autopilot_report table.

mysql> SELECT log FROM sys.heatwave_autopilot_report
      WHERE type="error";

When Auto Parallel Load finishes running, query the heatwave_autopilot_report table to check for warnings.

mysql> SELECT log FROM sys.heatwave_autopilot_report
      WHERE type="warn";

Issue the following query to inspect the load script that was generated.

mysql> SELECT log->>"$.sql" AS "Load Script"
      FROM sys.heatwave_autopilot_report 
      WHERE type = "sql" ORDER BY id;
+------------------------------------------------------------------------------+
| Load Script                                                                  |
+------------------------------------------------------------------------------+
| SET SESSION innodb_parallel_read_threads = 3;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`customer` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 4;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`lineitem` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 1;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`nation` SECONDARY_LOAD;   |
| SET SESSION innodb_parallel_read_threads = 4;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`orders` SECONDARY_LOAD;   |
| SET SESSION innodb_parallel_read_threads = 4;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`part` SECONDARY_LOAD;     |
| SET SESSION innodb_parallel_read_threads = 4;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`partsupp` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 1;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`region` SECONDARY_LOAD;   |
| SET SESSION innodb_parallel_read_threads = 1;                                |
| ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`supplier` SECONDARY_LOAD; |
+------------------------------------------------------------------------------+
16 rows in set (0.3680 sec)

Once you are satisfied with the Auto Parallel Load CALL statement and the generated load script, reissue the CALL statement in normal mode to load the data into MySQL HeatWave. For example:

mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"), JSON_OBJECT("mode","normal"));
Note

Retrieve DDL statements in a table or use the following statements to produce a list of DDL statements to easily copy and paste.

mysql> SET SESSION group_concat_max_len = 1000000;
mysql> SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ')
      FROM sys.heatwave_autopilot_report
      WHERE type = "sql" ORDER BY id;
*************************** 1. row ***************************
GROUP_CONCAT(log->>"$.sql" SEPARATOR ' '): SET SESSION innodb_parallel_read_threads = 3; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`customer` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`lineitem` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 1; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`nation` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`orders` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`part` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 4; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`partsupp` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 1; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`region` SECONDARY_LOAD; SET SESSION innodb_parallel_read_threads = 1; 
ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `tpch`.`supplier` SECONDARY_LOAD;
1 row in set (0.2575 sec)

The time required to load data depends on the data size. Auto Parallel Load provides an estimate of the time required to complete the load operation.

Tables are loaded in sequence, ordered by schema and table name. Load-time errors are reported as they are encountered. If an error is encountered while loading a table, the operation is not terminated. Auto Parallel Load continues running, moving on to the next table.

When Auto Parallel Load finishes running, it checks if tables are loaded and shows a summary with the number of tables that were loaded and the number of tables that failed to load.

Memory Estimation for String Column Encoding

The auto_enc option is run in check mode by default to ensure that there is enough memory for string column encoding.

The following example uses the auto_enc option in check mode, if you want to ensure that there is sufficient memory for string column encoding before attempting a load operation. Insufficient memory can cause a load failure.

mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"), 
      JSON_OBJECT("mode","dryrun","auto_enc",JSON_OBJECT("mode","check")));
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
| Estimating memory footprint for 1 schema(s)                                                                                 |
|                                                                                                                             |
|                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
| SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
| NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
| ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
| `tpch`                             8        1.79 GiB        1.15 GiB          29             24           5        1.25 min |
|                                                                                                                             |
| Sufficient MySQL host memory available to load all tables.                                                                  |
| Sufficient HeatWave cluster memory available to load all tables.                                                            |
|                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------+
12 rows in set (0.2401 sec)

+--------------------------------------------------------------------------------------------------------------------+
| LOAD SCRIPT GENERATION                                                                                             |
+--------------------------------------------------------------------------------------------------------------------+
| Dryrun mode only generates the load script                                                                         |
| Set mode to "normal" in options to load tables                                                                     |
|                                                                                                                    |
| Retrieve load script containing 16 generated DDL commands using the query below:                                   |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"        |
|   SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;         |
|                                                                                                                    |
| Applying changes will take approximately 1.25 min                                                                  |
|                                                                                                                    |
| Caution: Executing the generated load script may alter column definitions and secondary engine flags in the schema |
|                                                                                                                    |
| Total errors encountered: 0                                                                                        |
| Total warnings encountered: 0                                                                                      |
|                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------+
14 rows in set (0.2401 sec)
Note

The auto_enc option runs in check mode regardless of whether it is specified explicitly in the Auto Parallel Load call statement.

Look for capacity estimation data in the Auto Parallel Load output. The results indicate whether there is sufficient memory to load all tables.

Auto Parallel Load Report Table

When MySQL runs Auto Parallel Load, it sends the output including execution logs and a generated load script to the MySQL HeatWave_load_report table in the sys schema.

As of MySQL 8.0.32, the heatwave_load_report table is deprecated, and replaced with heatwave_autopilot_report table in the sys schema.

What's Next