Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.8Mb
PDF (A4) - 3.8Mb


4.3.9.1 Load Data from Object Storage

If you choose not to Load Structured Data Using Lakehouse Auto Parallel Load, you can load external data manually.

This topic contains the following sections:

Before You Begin
CREATE TABLE Statement

Depending on the version of MySQL you are using, use the appropriate CREATE TABLE statement.

  • As of MySQL 9.4.0, you can use the CREATE EXTERNAL TABLE statement, which automatically sets ENGINE to lakehouse, and SECONDARY_ENGINE to rapid.

  • In versions earlier than MySQL 9.4.0, you must use the CREATE TABLE statement, and manually set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid.

For example, you can use the following command in MySQL 9.4.0:

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
        ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"}, 
                           "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';

For earlier versions, you must use the following command:

mysql> CREATE TABLE table_1(col_1 int, col_2 int, col_3 int)
        ENGINE=lakehouse
        SECONDARY_ENGINE = rapid
        ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"}, 
                           "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
JSON Syntax and SQL Syntax

As of MySQL 9.4.0, you can set options when creating external tables using SQL syntax. For earlier versions, you must set options using JSON syntax.

The following example sets options usings SQL syntax:

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
        FILE_FORMAT = (FORMAT csv)
        FILES = (URI = 'oci://mybucket@mynamespace/data_files/data_file_1.csv');

The following example sets options using JSON syntax:

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
        ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"}, 
                           "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';

To review external table options in JSON syntax, see Lakehouse External Table JSON Syntax.

To review external table options in SQL syntax, see Lakehouse External Table SQL Syntax.

Loading Tables Statement

To load an external table into MySQL HeatWave, specify the SECONDARY_LOAD clause in an ALTER TABLE statement and use the following syntax.

ALTER TABLE table_name SECONDARY_LOAD [GUIDED {ON | OFF}];

Specify the following clauses in the ALTER TABLE statement:

  • table_name: Specify the name of the table to load to MySQL HeatWave.

  • GUIDED: Available as of MySQL 9.4.1. Allows you to disable and enable Guided Load as needed. Set to OFF to disable Guided Load. By default, Guided Load is enabled.

Guided Load

Available as of MySQL 8.2.0, the Guided Load feature performs a set of checks and validations before loading data.

These checks include the following:

  • Automatically detect tables and columns that cannot be loaded. If there are tables and columns that are not compatible, stop the load.

  • Automatically set SECONDARY_ENGINE to rapid.

  • Detect any errors with ENGINE_ATTRIBUTE and report them.

  • Infer the table definition and make any necessary adjustments before loading data. These adjustments are similar to those performed by Autopilot during Lakehouse Auto Parallel Load. See: About Lakehouse Auto Parallel Load Schema Inference. If the inferred table definition is not compatible, stop the load.

  • Predict the amount of memory required for loading data. If the required memory is not available, stop the load.

  • Infer the record and field delimiters for CSV files and make the necessary adjustments.

As of MySQL 9.4.1, you have the option to disable Guided Load if you want to skip these checks.

The following example manually creates an external table, and then loads the table into MySQL HeatWave with Guided Load disabled:

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
        FILE_FORMAT = (FORMAT csv)
        FILES = (URI = 'oci://mybucket@mynamespace/data_files/data_file_1.csv');
mysql> ALTER TABLE table_1 SECONDARY_LOAD GUIDED OFF;
Load External Data Manually Example

To demonstrate how to load data manually, the following example loads a single file and selects to use a pre-authenticated request (PAR).

The CSV file in this example is from Bank Marketing. To use this file, visit Bank Marketing and download the bank+marketing.zip file. Unzip the file, and then unzip the bank.zip file. Refer to the bank.csv file.

To load external data manually:

  1. Prepare the files to load in the proper format. See Supported File Formats.

  2. Upload the files to load into Object Storage. See Uploading an Object Storage Object to a Bucket in Oracle Cloud Infrastructure Documentation.

  3. Select the method to load the files: PAR, resource principals, or uniform resource identifier (URI). To learn more about each method, see the following:

  4. Connect to your MySQL HeatWave Database System.

  5. In the terminal window, create and use the database to store the table.

    mysql> CREATE DATABASE bank_data;
    mysql> USE DATABASE bank_data;
  6. Set up the CREATE TABLE statement and the ENGINE_ATTRIBUTE options to specify the parameters needed to process the external files. As of MySQL 9.4.0, you can use the CREATE EXTERNAL TABLE statement. In versions earlier than 9.4.0, you must use the CREATE TABLE statement and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

    Ensure that the table has the correct data type for each column. For this example, columns are defined according to the data in the bank.csv file. See the following to learn more:

    As of MySQL 9.4.0, you can use SQL syntax to set options for external tables. For earlier versions, you must set options using JSON syntax. To learn more, see Lakehouse External Table JSON Syntax and Lakehouse External Table SQL Syntax.

    The following example uses JSON syntax:

    mysql> CREATE EXTERNAL TABLE bank_marketing(
            age int,
            job varchar(255),
            marital varchar(255),
            education varchar(255),
            default1 varchar(255),
            balance float,
            housing varchar(255),
            loan varchar(255),
            contact varchar(255),
            day int,
            month varchar(255),
            duration float,
            campaign int,
            pdays float,
            previous float,
            poutcome varchar(255),
            y varchar(255)
        )
            ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "has_header": true},
                               "file": [{ "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenant_1/b/bucket_1/o/bank.csv"}]}';

    Where:

    • The CREATE EXTERNAL TABLE statement creates the table bank_marketing and automatically sets ENGINE to lakehouse, and SECONDARY_ENGINE to rapid.

    • Each column for the table is defined according to the bank.csv file.

    • format defines the format of the external file: csv.

    • has_header identifies a header in the external file.

    • par sets the pre-authenticated request link to access the file. Replace the PAR in the example with your own.

    The following example uses SQL syntax:

    mysql> CREATE EXTERNAL TABLE bank_marketing(
            age int,
            job varchar(255),
            marital varchar(255),
            education varchar(255),
            default1 varchar(255),
            balance float,
            housing varchar(255),
            loan varchar(255),
            contact varchar(255),
            day int,
            month varchar(255),
            duration float,
            campaign int,
            pdays float,
            previous float,
            poutcome varchar(255),
            y varchar(255)
        )
            FILE_FORMAT = (FORMAT csv HEADER ON)
            FILES = (URL = 'https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenant_1/b/bucket_1/o/bank.csv');

    If you are on MySQL 9.1.2 and earlier, you need to update dialect with the field delimiter and record delimiter parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more.

  7. Use the ALTER TABLE and SECONDARY_LOAD commands to load the data and create the external table.

    mysql> ALTER TABLE bank_marketing SECONDARY_LOAD;
    Warning (code 3877): Command executed during preprocessing: 'ALTER TABLE `bank_data`.`bank_marketing` 
    ENGINE_ATTRIBUTE='{"file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenant_1/b/bucket_1/o/bank.csv"}], 
                       "dialect": {"format": "csv", "has_header": true, "field_delimiter": ";", "record_delimiter": "\\n"}}''.

    Review the message to confirm the external table is successfully created. For MySQL 9.1.2 and later, you can also review the field_delimiter and record_delimiter values that MySQL HeatWave automatically detected for the table.

  8. Optionally, query five rows of the table to confirm the data is loaded and accessible.

    mysql> SELECT * FROM bank_marketing.bank_train LIMIT 5;
    +-----+-------------+----------+-----------+----------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
    | age | job         | marital  | education | default1 | balance | housing | loan | contact   | day | month | duration | campaign | pdays | previous | poutcome | y  |
    +-----+-------------+----------+-----------+----------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
    |  37 | services    | married  | secondary | no       |    4760 | yes     | no   | cellular  |   8 | may   |      182 |        2 |   169 |        2 | failure  | no |
    |  32 | technician  | single   | secondary | no       |    2979 | no      | no   | cellular  |  25 | may   |      156 |        1 |    -1 |        0 | unknown  | no |
    |  43 | management  | married  | tertiary  | no       |     690 | yes     | no   | cellular  |   6 | aug   |      171 |        3 |    -1 |        0 | unknown  | no |
    |  50 | blue-collar | divorced | secondary | no       |     203 | yes     | no   | telephone |  19 | nov   |      265 |        1 |   127 |        4 | other    | no |
    |  34 | blue-collar | married  | secondary | no       |     322 | yes     | no   | cellular  |  20 | apr   |       10 |        3 |    -1 |        0 | unknown  | no |
    +-----+-------------+----------+-----------+----------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
What's Next

Once you confirm the table successfully loaded into Lakehouse, you can use the data for the following: