Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.8Mb
PDF (A4) - 3.8Mb


4.3.11.2 Refresh Data Using Incremental Load

Available as of MySQL 9.0.0, Lakehouse Incremental Load allows you to refresh the data in an external table.

This topic contains the following sections:

Before You Begin
Refresh Data Overview

Lakehouse Incremental Load uses the existing engine_attribute settings from the initial load. Therefore, when you refresh data, you cannot include engine_attribute settings in subsequent loads of data.

You can add or remove individual files in Object Storage if they still match the defined file pattern or file prefix in the engine_attribute settings for the initial load. If you want to change the files in the engine_attribute settings for subsequent loads, see how to Add or Remove Files Using Selecting Load (supported as of MySQL 9.1.2).

To run Lakehouse Incremental Load, load the data using Lakehouse Auto Parallel Load and set the Auto Parallel Load refresh_external_tables option to true.

If a subsequent call to Auto Parallel Load includes tables that are not yet loaded, then Auto Parallel Load loads them for the first time.

A call to Auto Parallel Load might contain both loaded and unloaded tables. Those that are unloaded are loaded, and those that are already loaded are refreshed.

Perform Initial Load

The following example performs an initial automatic load of three files in an Object Storage folder, and then does a subsequent refresh of the data in the external table by loading an additional two files into the table. The example uses resource principals to load the data.

  1. Upload the files to load into Object Storage. This example creates the Object Storage folder data_files and uploads the files data_file_1.csv, data_file_2.csv, and data_file_3.csv.

  2. Connect to your MySQL HeatWave Database System.

  3. Create a session variable with the characteristics of the input_list to load the external files. To review all syntax options, see Auto Parallel Load Syntax.

    mysql> SET @input_list = '[{
      "db_name": "data_tables",
      "tables": [{
        "table_name": "data_table_1",
        "engine_attribute": {
          "dialect": {"format": "csv", "has_header": true},
          "file": [{"uri": "oci://mybucket@mynamespace/data_files/"}]
        }
      }]
    }]';

    Where:

    • @input_list is the name of the session variable.

    • db_name identifies the database name to store the table: data_tables. The database is automatically created if it does not exist.

    • table_name sets the table name to store the data: data_table_1. The table is automatically created if it does not exist.

    • engine_attribute defines the parameters of the external file.

    • format defines the format of the external file: csv.

    • has_header identifies a header in the external file. Auto Parallel Load then infers the column names from the first row in the file.

    • file defines the uniform resource identifiers (URI) to access the files. Replace these values with your own. The data_files/ prefix defines the Object Storage folder storing the files to load.

    If you are on MySQL 9.1.2 and earlier, you need to update dialect with the field delimiter and record delimiter parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more.

  4. Run the HEATWAVE_LOAD command. If you set the options to NULL, the data loads in normal mode by default.

    mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);

    Review the SCHEMA CREATION, TABLE LOAD, and LOAD SUMMARY sections for any errors or warnings during the load, and to confirm the external table loaded successfully.

  5. Optionally, use the new database and query the number of rows in the table to confirm data from all files loaded successfully.

    mysql> USE data_tables;
    mysql> SELECT COUNT(*) FROM data_table_1;
    +----------+
    | COUNT(*) |
    +----------+
    |     1384 |
    +----------+
Refresh Data

After doing the initial load of data, this example refreshes the data in the table data_table_1 with the loading of additional files into the table.

  1. Upload the new files to load into Object Storage. This example uploads two additional files to the same data_files Object Storage folder: data_file_4.csv and data_file_5.csv.

  2. Create a new session variable with the input_list to set up the incremental load. The configuration must not have any engine_attribute parameters.

    mysql> SET @input_list = '[{"db_name": "data_tables", "tables": [{"table_name": "data_table_1"}]}]';
  3. Create an options variable that includes the options to load the data in normal mode and enable refresh_external_tables.

    mysql> SET @options = JSON_OBJECT('mode', 'normal','refresh_external_tables', TRUE);
  4. Run the HEATWAVE_LOAD command with the input_list and options variables. Tables that are identified to be refreshed are marked with a + symbol. If the files for the table have not changed, then no changes are made to the table. The estimates in the output do not cover the tables that are refreshed.

    mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), @options);
    +------------------------------------------+
    | INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
    +------------------------------------------+
    | Version: 4.34                            |
    |                                          |
    | Load Mode: normal                        |
    | Load Policy: disable_unsupported_columns |
    | Output Mode: normal                      |
    |                                          |
    +------------------------------------------+
    
    +--------------------------------------------------------------------------------------------------------------------+
    | LAKEHOUSE AUTO SCHEMA INFERENCE                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    | Verifying external lakehouse tables: 1                                                                             |
    |                                                                                                                    |
    | SCHEMA                   TABLE                    TABLE IS           RAW     NUM. OF      ESTIMATED     SUMMARY OF |
    | NAME                     NAME                     CREATED      FILE SIZE     COLUMNS      ROW COUNT     ISSUES     |
    | ------                   -----                    --------     ---------     -------      ---------     ---------- |
    | `data_tables`            `data_table_1`           YES +        40.06 KiB           8         1.38 K                |
    |                                                                                                                    |
    | New schemas to be created: 0                                                                                       |
    | External lakehouse tables to be created: 0                                                                         |
    | External lakehouse tables to be refreshed (marked with +): 1                                                       |
    |                                                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    
    +------------------------------------------------------------------------+
    | OFFLOAD ANALYSIS                                                       |
    +------------------------------------------------------------------------+
    | Verifying input schemas: 1                                             |
    | User excluded items: 0                                                 |
    |                                                                        |
    | SCHEMA                       OFFLOADABLE    OFFLOADABLE     SUMMARY OF |
    | NAME                              TABLES        COLUMNS     ISSUES     |
    | ------                       -----------    -----------     ---------- |
    | `data_tables`                          1              8                |
    |                                                                        |
    | Total offloadable schemas: 1                                           |
    |                                                                        |
    +------------------------------------------------------------------------+
    
    +-----------------------------------------------------------------------------------------------------------------------------+
    | CAPACITY ESTIMATION                                                                                                         |
    +-----------------------------------------------------------------------------------------------------------------------------+
    | Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
    | Estimating memory footprint for 1 schema(s)                                                                                 |
    | Estimates do not include 1 external lakehouse table(s) being refreshed.                                                     |
    |                                                                                                                             |
    |                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
    | SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
    | NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
    | ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
    | `data_tables`                      1         0 bytes         0 bytes           3              0           3            0 ps |
    |                                                                                                                             |
    | Sufficient MySQL host memory available to load all tables.                                                                  |
    | Sufficient HeatWave cluster memory available to load all tables.                                                            |
    |                                                                                                                             |
    +-----------------------------------------------------------------------------------------------------------------------------+
    
    +---------------------------------------------------------------------------------------------------------------------------------------+
    | EXECUTING LOAD SCRIPT                                                                                                                 |
    +---------------------------------------------------------------------------------------------------------------------------------------+
    | HeatWave Load script generated                                                                                                        |
    |   Retrieve load script containing 3 generated DDL command(s) using the query below:                                                   |
    | Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"                           |
    |   SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;                            |
    |                                                                                                                                       |
    | Adjusting load parallelism dynamically per internal/external table.                                                                   |
    | Using current parallelism of 32 thread(s) as maximum for internal tables.                                                             |
    |                                                                                                                                       |
    | Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema                         |
    |                                                                                                                                       |
    | Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
    |                                                                                                                                       |
    | Proceeding to load 1 table(s) into HeatWave.                                                                                          |
    |                                                                                                                                       |
    | Applying changes will take approximately 26.00 ms                                                                                     |
    |                                                                                                                                       |
    +---------------------------------------------------------------------------------------------------------------------------------------+
    
    +----------------------------------------------+
    | TABLE LOAD                                   |
    +----------------------------------------------+
    | TABLE (1 of 1): `data_tables`.`data_table_1` |
    | Commands executed successfully: 3 of 3       |
    | Warnings encountered: 1                      |
    | Table load succeeded!                        |
    |   Total columns loaded: 8                    |
    |   Elapsed time: 3.66 s                       |
    |                                              |
    +----------------------------------------------+
    
    +----------------------------------------------------------------------------------+
    | LOAD SUMMARY                                                                     |
    +----------------------------------------------------------------------------------+
    |                                                                                  |
    | SCHEMA                          TABLES       TABLES      COLUMNS         LOAD    |
    | NAME                            LOADED       FAILED       LOADED     DURATION    |
    | ------                          ------       ------      -------     --------    |
    | `data_tables`                        1            0            8       3.66 s    |
    |                                                                                  |
    | Total errors encountered: 0                                                      |
    | Total warnings encountered: 2                                                    |
    | Retrieve the associated logs from the report table using the query below:        |
    |   SELECT log FROM sys.heatwave_autopilot_report WHERE type IN ('error', 'warn'); |
    |                                                                                  |
    +----------------------------------------------------------------------------------+
  5. Optionally, query the number of rows in the table again to confirm the data from the two new files successfully loaded.

    mysql> SELECT COUNT(*) FROM data_table_1;
    +----------+
    | COUNT(*) |
    +----------+
    |     4130 |
    +----------+