Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.8Mb
PDF (A4) - 3.8Mb


4.3.8.1 Load Data from Object Storage

You can use Auto Parallel Load to load data from Object Storage into MySQL HeatWave Cluster.

This topic contains the following sections:

Lakehouse Auto Parallel Load Syntax

The following example shows how to load data with Auto Parallel Load in one command using an OCIFS URI. Since no dialect parameters are included, Lakehouse loads a CSV file by default with all default settings. Options are set to NULL.

mysql> CALL sys.HEATWAVE_LOAD(
  '[{"db_name": "db_1",
  "tables": [{
    "table_name": "table_1",
    "engine_attribute": {
      "file": [{"uri": "oci://bucket_1@tenant_1/data_file_1.csv"}]
    }
  }]
}]', NULL);

While you can specify all parameters in one command, it might be easier for you to first create the input_list and store it in a variable. You can then pass the variable in the HEATWAVE_LOAD command.

The following example uses the same load command, but creates the variable with the input_list and then passes that in the HEATWAVE_LOAD command.

mysql> SET @input_list = '[{
  "db_name": "db_1",
  "tables": [{
    "table_name": "table_1",
    "engine_attribute": {
      "file": [{"uri": "oci://bucket_1@tenant_1/data_file_1.csv"}]
    }
  }]
}]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);

For more examples, review how to Use URI to Load Data with Lakehouse Auto Parallel Load.

Load External Data Using Lakehouse Auto Parallel Load

To demonstrate how to load data using Lakehouse Auto Parallel Load, the following example loads a single file and uses a pre-authenticated request (PAR).

The CSV file in this example is from Bank Marketing. To use this file, visit Bank Marketing and download the bank+marketing.zip file. Unzip the file, and then unzip the bank.zip file. Refer to the bank.csv file.

To load external data using Lakehouse Auto Parallel Load:

  1. Prepare the files to load in the proper format. See Supported File Formats.

  2. Upload the files to load into Object Storage. See Uploading an Object Storage Object to a Bucket in Oracle Cloud Infrastructure Documentation.

  3. Select the method to load the files: PAR, resource principals, or uniform resource identifier (URI). To learn more about each method, see the following:

  4. Connect to your MySQL HeatWave Database System.

  5. Create a session variable with the characteristics of the input_list to load and the external file. To review all syntax options, see Auto Parallel Load Syntax.

    mysql> SET @input_list = '[{
      "db_name": "bank_marketing",
      "tables": [{
        "table_name": "bank_train",
        "engine_attribute": {
          "dialect": {"format": "csv", "has_header": true},
          "file": [{"par": "pre_authenticated_request"}]
        }
      }]
    }]';

    Where:

    • @input_list is the name of the session variable.

    • db_name identifies the database name to store the table: bank_marketing. The database is automatically created if it does not exist.

    • table_name sets the table name to store the data: bank_train. The table is automatically created if it does not exist.

    • engine_attribute defines the parameters of the external file.

    • format defines the format of the external file: csv.

    • has_header identifies a header in the external file. Auto Parallel Load then infers the column names from the first row in the file.

    • par sets the pre-authenticated request link to access the file. Replace pre_authenticated_request with your own link.

    If you are on MySQL 9.1.2 and earlier, you need to update dialect with the field delimiter and record delimiter parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more.

  6. Use the HEATWAVE_LOAD command and use the dryrun mode option to generate a summary, so you can review the details and address any issues before loading the data. To skip this step, run the command in normal mode. Insert the input_list variable previously created into the command.

    mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'dryrun'));
    +------------------------------------------+
    | INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
    +------------------------------------------+
    | Version: 4.31                            |
    |                                          |
    | Load Mode: dryrun                        |
    | Load Policy: disable_unsupported_columns |
    | Output Mode: normal                      |
    |                                          |
    +------------------------------------------+
    
    +--------------------------------------------------------------------------------------------------------------------+
    | LAKEHOUSE AUTO SCHEMA INFERENCE                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    | Verifying external lakehouse tables: 1                                                                             |
    |                                                                                                                    |
    | SCHEMA                   TABLE                    TABLE IS           RAW     NUM. OF      ESTIMATED     SUMMARY OF |
    | NAME                     NAME                     CREATED      FILE SIZE     COLUMNS      ROW COUNT     ISSUES     |
    | ------                   -----                    --------     ---------     -------      ---------     ---------- |
    | `bank_marketing`         `bank_train`             NO          450.66 KiB          17         4.52 K                |
    |                                                                                                                    |
    | New schemas to be created: 1                                                                                       |
    | External lakehouse tables to be created: 1                                                                         |
    |                                                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    
    +------------------------------------------------------------------------+
    | OFFLOAD ANALYSIS                                                       |
    +------------------------------------------------------------------------+
    | Verifying input schemas: 1                                             |
    | User excluded items: 0                                                 |
    |                                                                        |
    | SCHEMA                       OFFLOADABLE    OFFLOADABLE     SUMMARY OF |
    | NAME                              TABLES        COLUMNS     ISSUES     |
    | ------                       -----------    -----------     ---------- |
    | `bank_marketing`                       1             17                |
    |                                                                        |
    | Total offloadable schemas: 1                                           |
    |                                                                        |
    +------------------------------------------------------------------------+
    
    +-----------------------------------------------------------------------------------------------------------------------------+
    | CAPACITY ESTIMATION                                                                                                         |
    +-----------------------------------------------------------------------------------------------------------------------------+
    | Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
    | Estimating memory footprint for 1 schema(s)                                                                                 |
    |                                                                                                                             |
    |                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
    | SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
    | NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
    | ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
    | `bank_marketing`                   1        4.65 MiB        1.12 MiB          17              0          17          7.00 s |
    |                                                                                                                             |
    | Sufficient MySQL host memory available to load all tables.                                                                  |
    | Sufficient HeatWave cluster memory available to load all tables.                                                            |
    |                                                                                                                             |
    +-----------------------------------------------------------------------------------------------------------------------------+
    
    +--------------------------------------------------------------------------------------------------------------------+
    | LOAD SCRIPT GENERATION                                                                                             |
    +--------------------------------------------------------------------------------------------------------------------+
    | Dryrun mode only generates the load script                                                                         |
    | Set mode to "normal" in options to load tables                                                                     |
    |                                                                                                                    |
    | Retrieve load script containing 3 generated DDL commands using the query below:                                    |
    | Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"        |
    |   SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;         |
    |                                                                                                                    |
    | Applying changes will take approximately 7.00 s                                                                    |
    |                                                                                                                    |
    | Caution: Executing the generated load script may alter column definitions and secondary engine flags in the schema |
    |                                                                                                                    |
    | Total errors encountered: 0                                                                                        |
    | Total warnings encountered: 0                                                                                      |
    |                                                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+

    Review the following sections in the load script:

    • LAKEHOUSE AUTO SCHEMA INFERENCE: Displays the database name, table table, file size, number of columns, and the estimated number of rows. Any issues display here.

    • OFFLOAD ANALYSIS: Confirms the number of tables and columns that Lakehouse can load to MySQL HeatWave.

    • CAPACITY ESTIMATION: Displays the memory requirements and estimated load time.

      If there is insufficient memory, you can update the number of nodes for the DB System. See Editing a MySQL HeatWave Cluster in the MySQL HeatWave on OCI Service Guide or Editing a MySQL HeatWave Cluster in the MySQL HeatWave on AWS Service Guide.

  7. If there are no issues or errors in the summary, you can run the HEATWAVE_LOAD command again in normal mode to load the data. If you set the options to NULL, the data loads in normal mode by default.

    mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
    +------------------------------------------+
    | INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
    +------------------------------------------+
    | Version: 4.31                            |
    |                                          |
    | Load Mode: normal                        |
    | Load Policy: disable_unsupported_columns |
    | Output Mode: normal                      |
    |                                          |
    +------------------------------------------+
    
    +--------------------------------------------------------------------------------------------------------------------+
    | LAKEHOUSE AUTO SCHEMA INFERENCE                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    | Verifying external lakehouse tables: 1                                                                             |
    |                                                                                                                    |
    | SCHEMA                   TABLE                    TABLE IS           RAW     NUM. OF      ESTIMATED     SUMMARY OF |
    | NAME                     NAME                     CREATED      FILE SIZE     COLUMNS      ROW COUNT     ISSUES     |
    | ------                   -----                    --------     ---------     -------      ---------     ---------- |
    | `bank_marketing`         `bank_train`             NO          450.66 KiB          17         4.52 K                |
    |                                                                                                                    |
    | New schemas to be created: 1                                                                                       |
    | External lakehouse tables to be created: 1                                                                         |
    |                                                                                                                    |
    +--------------------------------------------------------------------------------------------------------------------+
    
    +------------------------------------------------------------------------+
    | OFFLOAD ANALYSIS                                                       |
    +------------------------------------------------------------------------+
    | Verifying input schemas: 1                                             |
    | User excluded items: 0                                                 |
    |                                                                        |
    | SCHEMA                       OFFLOADABLE    OFFLOADABLE     SUMMARY OF |
    | NAME                              TABLES        COLUMNS     ISSUES     |
    | ------                       -----------    -----------     ---------- |
    | `bank_marketing`                       1             17                |
    |                                                                        |
    | Total offloadable schemas: 1                                           |
    |                                                                        |
    +------------------------------------------------------------------------+
    
    +-----------------------------------------------------------------------------------------------------------------------------+
    | CAPACITY ESTIMATION                                                                                                         |
    +-----------------------------------------------------------------------------------------------------------------------------+
    | Default encoding for string columns: VARLEN (unless specified in the schema)                                                |
    | Estimating memory footprint for 1 schema(s)                                                                                 |
    |                                                                                                                             |
    |                                TOTAL       ESTIMATED       ESTIMATED       TOTAL     DICTIONARY      VARLEN       ESTIMATED |
    | SCHEMA                   OFFLOADABLE   HEATWAVE NODE      MYSQL NODE      STRING        ENCODED     ENCODED            LOAD |
    | NAME                          TABLES       FOOTPRINT       FOOTPRINT     COLUMNS        COLUMNS     COLUMNS            TIME |
    | ------                   -----------       ---------       ---------     -------     ----------     -------       --------- |
    | `bank_marketing`                   1        4.65 MiB        1.12 MiB          17              0          17          7.00 s |
    |                                                                                                                             |
    | Sufficient MySQL host memory available to load all tables.                                                                  |
    | Sufficient HeatWave cluster memory available to load all tables.                                                            |
    |                                                                                                                             |
    +-----------------------------------------------------------------------------------------------------------------------------+
    
    +---------------------------------------------------------------------------------------------------------------------------------------+
    | EXECUTING LOAD SCRIPT                                                                                                                 |
    +---------------------------------------------------------------------------------------------------------------------------------------+
    | HeatWave Load script generated                                                                                                        |
    |   Retrieve load script containing 3 generated DDL command(s) using the query below:                                                   |
    | Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report"                           |
    |   SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;                            |
    |                                                                                                                                       |
    | Adjusting load parallelism dynamically per internal/external table.                                                                   |
    | Using current parallelism of 32 thread(s) as maximum for internal tables.                                                             |
    |                                                                                                                                       |
    | Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema                         |
    |                                                                                                                                       |
    | Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
    |                                                                                                                                       |
    | Proceeding to load 1 table(s) into HeatWave.                                                                                          |
    |                                                                                                                                       |
    | Applying changes will take approximately 7.03 s                                                                                       |
    |                                                                                                                                       |
    +---------------------------------------------------------------------------------------------------------------------------------------+
    
    +---------------------------------------------+
    | SCHEMA CREATION                             |
    +---------------------------------------------+
    | Schema `bank_marketing` creation succeeded! |
    | Warnings/errors encountered: 0              |
    | Elapsed time: 4.91 ms                       |
    |                                             |
    +---------------------------------------------+
    
    +-----------------------------------------------+
    | TABLE LOAD                                    |
    +-----------------------------------------------+
    | TABLE (1 of 1): `bank_marketing`.`bank_train` |
    | Commands executed successfully: 2 of 2        |
    | Warnings encountered: 0                       |
    | Table load succeeded!                         |
    |   Total columns loaded: 17                    |
    |   Elapsed time: 4.36 s                        |
    |                                               |
    +-----------------------------------------------+
    
    +-------------------------------------------------------------------------------+
    | LOAD SUMMARY                                                                  |
    +-------------------------------------------------------------------------------+
    |                                                                               |
    | SCHEMA                          TABLES       TABLES      COLUMNS         LOAD |
    | NAME                            LOADED       FAILED       LOADED     DURATION |
    | ------                          ------       ------      -------     -------- |
    | `bank_marketing`                     1            0           17       4.36 s |
    |                                                                               |
    | Total errors encountered: 0                                                   |
    | Total warnings encountered: 0                                                 |
    |                                                                               |
    +-------------------------------------------------------------------------------+

    Review the SCHEMA CREATION, TABLE LOAD, and LOAD SUMMARY sections for any errors or warnings during the load, and to confirm the external table loaded successfully.

  8. Optionally, generate a load script to review a summary of the database and table created for the load. The output is similar to the following example.

    mysql> SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;
    +-----------------------------------------------------------------------------------------------------------------------------+
    | Load Script                                                                                                                 |
    +-----------------------------------------------------------------------------------------------------------------------------+
    | CREATE DATABASE `tpch`;                                                                                                     |
    | CREATE TABLE `bank_marketing`.`bank_train`(                                                                                 |
    |   `age` tinyint unsigned NOT NULL,                                                                                          |
    |   `job` varchar(13) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                        |
    |   `marital` varchar(8) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                     |
    |   `education` varchar(9) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                   |
    |   `default` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                     |
    |   `balance` mediumint NOT NULL,                                                                                             |
    |   `housing` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                     |
    |   `loan` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                        |
    |   `contact` varchar(9) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                     |
    |   `day` tinyint unsigned NOT NULL,                                                                                          |
    |   `month` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                       |
    |   `duration` smallint unsigned NOT NULL,                                                                                    |
    |   `campaign` tinyint unsigned NOT NULL,                                                                                     |
    |   `pdays` smallint NOT NULL,                                                                                                |
    |   `previous` tinyint unsigned NOT NULL,                                                                                     |
    |   `poutcome` varchar(7) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',                                                    |
    |   `y` varchar(3) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN')                                                           |
    |   ENGINE=lakehouse SECONDARY_ENGINE=RAPID                                                                                   |
    |   ENGINE_ATTRIBUTE='{"file": [{"par": "https://objectstorage.../bank.csv"}],                                                |
    |                      "dialect": {"format": "csv", "has_header": true, "field_delimiter": ";", "record_delimiter": "\\n"}}'; |
    |   ALTER TABLE /*+ AUTOPILOT_DISABLE_CHECK */ `bank_marketing`.`bank_train` SECONDARY_LOAD;                                  |
    +-----------------------------------------------------------------------------------------------------------------------------+
  9. Optionally, query five rows of the table to confirm the data is loaded and accessible.

    mysql> SELECT * FROM bank_marketing.bank_train LIMIT 5;
    +-----+--------------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+----+
    | age | job          | marital  | education | default | balance | housing | loan | contact  | day | month | duration | campaign | pdays | previous | poutcome | y  |
    +-----+--------------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+----+
    |  37 | technician   | single   | secondary | no      |     267 | no      | no   | cellular |  13 | may   |      121 |        1 |    -1 |        0 | unknown  | no |
    |  34 | technician   | single   | tertiary  | no      |     586 | yes     | no   | unknown  |  16 | may   |      489 |        2 |    -1 |        0 | unknown  | no |
    |  30 | blue-collar  | single   | secondary | no      |     648 | yes     | yes  | unknown  |   8 | may   |      287 |        1 |    -1 |        0 | unknown  | no |
    |  57 | entrepreneur | divorced | secondary | yes     |      25 | yes     | no   | cellular |  11 | may   |      652 |        1 |   370 |        4 | other    | no |
    |  26 | admin.       | single   | secondary | no      |      96 | yes     | no   | cellular |  13 | may   |      612 |        2 |   348 |        3 | other    | no |
    +-----+--------------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+----+

    Once you confirm the table successfully loaded into Lakehouse, you can use the data for the following:

What's Next

Review different ways to specify files to load using Lakehouse Auto Parallel Load: