You can use Auto Parallel Load to set up the automatic creation of an
external table along with the loading of data from files in
Object Storage. We recommend setting up Auto Parallel Load with an
input_list
variable, and then using the
HEATWAVE_LOAD
command, which creates the
external table and loads the data in one operation. To learn
more about loading data, see
Load
Structured Data Using Lakehouse Auto Parallel Load.
This topic contains the following sections:
-
Prepare to create external tables by reviewing the following:
-
Review the relevant syntax for creating external tables and loading data:
Auto Parallel Load Syntax for the HEATWAVE_LOAD command
The following example shows how to create an external table
with Auto Parallel Load in one command using an
OCIFS URI.
Since no dialect parameters are included, Lakehouse
specifies a CSV file by default with all default settings.
Options are set to NULL
.
mysql> CALL sys.HEATWAVE_LOAD(
'[{"db_name": "db_1",
"tables": [{
"table_name": "table_1",
"engine_attribute": {
"file": [{"uri": "oci://bucket_1@tenant_1/data_file_1.csv"}]
}
}]
}]', NULL);
While you can specify all parameters in one command, it
might be easier for you to first create the
input_list
and store it in a variable.
You can then pass the variable in the
HEATWAVE_LOAD
command when you load the
data.
The following example uses the same command to create the
external table, but creates the variable with the
input_list
, and then passes that in the
HEATWAVE_LOAD
command to load the data.
mysql> SET @input_list = '[{
"db_name": "db_1",
"tables": [{
"table_name": "table_1",
"engine_attribute": {
"file": [{"uri": "oci://bucket_1@tenant_1/data_file_1.csv"}]
}
}]
}]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
To demonstrate how to create an external table using Lakehouse Auto Parallel Load, the following example specifies a single CSV file by using an OCIFS URI.
The CSV file in this example is from
Bank
Marketing. To use this file, visit
Bank
Marketing and download the
bank+marketing.zip
file. Unzip the file,
and then unzip the bank.zip
file. Refer
to the bank.csv
file.
To load external data using Lakehouse Auto Parallel Load:
Prepare the files to load in the proper format. See Supported File Formats.
Upload the files to load into Object Storage. See Uploading an Object Storage Object to a Bucket in Oracle Cloud Infrastructure Documentation.
-
Select the method to specify the files that will load the data in the external table: uniform resource identifier (URI), resource principals, or PAR. To learn more about each method, see the following:
-
Create a session variable with the characteristics of the
input_list
to create the external table and specify the file to use to load data into the table. To review all syntax options, see HEATWAVE_LOAD.mysql> SET @input_list = '[{ "db_name": "bank_marketing", "tables": [{ "table_name": "bank_train", "engine_attribute": { "dialect": {"format": "csv", "has_header": true}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';
Where:
@input_list
is the name of the session variable.db_name
identifies the database name to store the table:bank_marketing
. The database is automatically created if it does not exist.table_name
sets the table name to store the data:bank_train
. The table is automatically created if it does not exist.engine_attribute
defines the parameters of the external file.format
defines the format of the external file:csv
.has_header
identifies a header in the external file. Auto Parallel Load then infers the column names from the first row in the file.uri
sets the URI link to access the file. Replacemybucket@mynamespace/data_files/data_file_1.csv
with your own link.
If you are on MySQL 9.1.2 and earlier, you need to update
dialect
with thefield delimiter
andrecord delimiter
parameters. As of MySQL 9.2.0, MySQL HeatWave Lakehouse can automatically detect these values. See Lakehouse External Table Syntax to learn more. -
Run the
HEATWAVE_LOAD
command with the characteristics of theinput_list
innormal
mode to automatically create the external table and load data into the table. See Load Structured Data Using Lakehouse Auto Parallel Load for more information about loading data indryrun
mode to test out the creation of the table and the loading of data beforehand. Keep in mind that theinput_list
variable is valid for the current session. If you end the session, you can no longer use the variable.mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'normal'));
-
Review different ways to specify files when creating external tables by using Lakehouse Auto Parallel Load:
Review how to Load Structured Data Using Lakehouse Auto Parallel Load.