5.2.5.2 Loading Lakehouse Data

Using the defined options, run the load procedure in the following way:

mysql> CALL sys.heatwave_load(CAST(@input_list AS JSON), @options);

This example is run without a defined mode, and defaults to normal mode, generating the script and running it. If the mode is set to dryrun, the script is generated and made available to examine in the LOAD SCRIPT GENERATION section of the Auto Parallel Load process.

The procedure initializes, runs, and displays a report. The report is divided into the following sections:

  • INITIALIZING HEATWAVE AUTO PARALLEL LOAD: Lists the load mode, policy, and output mode.

    For example:

    +------------------------------------------+
    | INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
    +------------------------------------------+
    | Version: 3.11                            |
    |                                          |
    | Load Mode: normal                        |
    | Load Policy: disable_unsupported_columns |
    | Output Mode: normal                      |
    |                                          |
    +------------------------------------------+
    6 rows in set (0.02 sec)
  • LAKEHOUSE AUTO SCHEMA INFERENCE: Displays the details of the table, how many rows and columns it contains, its file size, and the name of the schema.

    For example:

    +--------------------------------------------------------------------------------------------------------------------+
    | LAKEHOUSE AUTO SCHEMA INFERENCE                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    | Verifying external lakehouse tables: 4                                                                             |
    |                                                                                                                    |
    | SCHEMA                   TABLE                    TABLE IS           RAW     NUM. OF      ESTIMATED     SUMMARY OF |
    | NAME                     NAME                     CREATED      FILE SIZE     COLUMNS      ROW COUNT     ISSUES     |
    | ------                   -----                    --------     ---------     -------      ---------     ---------- |
    | `tpch`                   `customer_csv`           NO          232.71 GiB           8          1.5 B                |
    | `tpch`                   `nation_json`            NO            3.66 KiB           1             25                |
    | `tpch`                   `region_avro`            NO           476 bytes           3              9                |
    | `tpch`                   `supplier_pq`            NO            7.46 GiB           7          100 M                |
    |                                                                                                                    |
    | New schemas to be created: 1                                                                                       |
    | External lakehouse tables to be created: 4                                                                         |
    |                                                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    13 rows in set (21.06 sec)
  • OFFLOAD ANALYSIS: Displays an analysis of the number and name of the tables and columns which can be offloaded to HeatWave.

    For example:

    +------------------------------------------------------------------------+
    | OFFLOAD ANALYSIS                                                       |
    +------------------------------------------------------------------------+
    | Verifying input schemas: 1                                             |
    | User excluded items: 0                                                 |
    |                                                                        |
    | SCHEMA                       OFFLOADABLE    OFFLOADABLE     SUMMARY OF |
    | NAME                              TABLES        COLUMNS     ISSUES     |
    | ------                       -----------    -----------     ---------- |
    | `tpch`                                 4             19                |
    |                                                                        |
    | Total offloadable schemas: 1                                           |
    |                                                                        |
    +------------------------------------------------------------------------+
    10 rows in set (21.09 sec)
  • CAPACITY ESTIMATION: Displays the HeatWave cluster and MySQL node memory requirement to process the data and an estimation of the load time.

    For example:

    +-----------------------------------------------------------------------------------------------------------------------------+
    | CAPACITY ESTIMATION                                                                                                         |
    +-----------------------------------------------------------------------------------------------------------------------------+
    | Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
    | Estimating memory footprint for 1 schema(s)                                                                                 |
    |                                                                                                                             |
    |                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
    | SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
    | NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
    | ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
    | `tpch`                             4      193.39 GiB        1.44 MiB          12              0          12       22.08 min |
    |                                                                                                                             |
    | Sufficient MySQL host memory available to load all tables.                                                                  |
    | Sufficient HeatWave cluster memory available to load all tables.                                                            |
    |                                                                                                                             |
    +-----------------------------------------------------------------------------------------------------------------------------+
    12 rows in set (21.10 sec)
    Note

    If there is insufficient memory, update the nodes before proceeding with the load.

  • EXECUTING LOAD: Displays information about the generated script and approximate loading time. For example:

    +---------------------------------------------------------------------------------------------------------------------------------------+
    | EXECUTING LOAD SCRIPT                                                                                                                 |
    +---------------------------------------------------------------------------------------------------------------------------------------+
    | HeatWave Load script generated                                                                                                        |
    |   Retrieve load script containing 9 generated DDL command(s) using the query below:                                                   |
    | Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"                           |
    |   SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;                            |
    |                                                                                                                                       |
    | Adjusting load parallelism dynamically per internal/external table.                                                                   |
    | Using current parallelism of 4 thread(s) as maximum for internal tables.                                                              |
    |                                                                                                                                       |
    | Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema                         |
    |                                                                                                                                       |
    | Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
    |                                                                                                                                       |
    | Proceeding to load 4 table(s) into HeatWave.                                                                                          |
    |                                                                                                                                       |
    | Applying changes will take approximately 22.08 min                                                                                    |
    |                                                                                                                                       |
    +---------------------------------------------------------------------------------------------------------------------------------------+
    16 rows in set (22.08 sec)
  • SCHEMA CREATION: Displays information about the schema creation process and duration.

    For example:

    +-----------------------------------+
    | SCHEMA CREATION                   |
    +-----------------------------------+
    | Schema `tpch` creation succeeded! |
    | Warnings/errors encountered: 0    |
    | Elapsed time: 2.62 ms             |
    |                                   |
    +-----------------------------------+
    4 rows in set (14.70 sec)
  • LOADING TABLE: Displays information on the table load process.

    For example:

    +----------------------------------------+
    | TABLE LOAD                             |
    +----------------------------------------+
    | TABLE (1 of 4): `tpch`.`customer_csv`  |
    | Commands executed successfully: 2 of 2 |
    | Warnings encountered: 0                |
    | Table load succeeded!                  |
    |   Total columns loaded: 8              |
    |   Elapsed time: 19.33 min              |
    |                                        |
    +----------------------------------------+
    7 rows in set (19 min 41.74 sec)
    
    +----------------------------------------+
    | TABLE LOAD                             |
    +----------------------------------------+
    | TABLE (2 of 4): `tpch`.`nation_json`   |
    | Commands executed successfully: 2 of 2 |
    | Warnings encountered: 0                |
    | Table load succeeded!                  |
    |   Total columns loaded: 1              |
    |   Elapsed time: 3.70 s                 |
    |                                        |
    +----------------------------------------+
    7 rows in set (19 min 45.44 sec)
    
    +----------------------------------------+
    | TABLE LOAD                             |
    +----------------------------------------+
    | TABLE (3 of 4): `tpch`.`region_avro`   |
    | Commands executed successfully: 2 of 2 |
    | Warnings encountered: 0                |
    | Table load succeeded!                  |
    |   Total columns loaded: 3              |
    |   Elapsed time: 3.79 s                 |
    |                                        |
    +----------------------------------------+
    7 rows in set (19 min 49.24 sec)
    
    +----------------------------------------+
    | TABLE LOAD                             |
    +----------------------------------------+
    | TABLE (4 of 4): `tpch`.`supplier_pq`   |
    | Commands executed successfully: 2 of 2 |
    | Warnings encountered: 0                |
    | Table load succeeded!                  |
    |   Total columns loaded: 7              |
    |   Elapsed time: 1.96 min               |
    |                                        |
    +----------------------------------------+
    7 rows in set (21 min 46.98 sec)
  • LOAD SUMMARY: Displays a summary of the load process.

    For example:

    +-------------------------------------------------------------------------------+
    | LOAD SUMMARY                                                                  |
    +-------------------------------------------------------------------------------+
    |                                                                               |
    | SCHEMA                          TABLES       TABLES      COLUMNS         LOAD |
    | NAME                            LOADED       FAILED       LOADED     DURATION |
    | ------                          ------       ------      -------     -------- |
    | `tpch`                               4            0           19    21.41 min |
    |                                                                               |
    +-------------------------------------------------------------------------------+
    6 rows in set (21 min 46.98 sec)