External tables are used by the load procedure to load the data from Object Storage. This section describes the external table syntax.
While it is possible to create your external table manually, it is recommended to use the HeatWave Lakehouse Auto Parallel Load procedure for its analysis and generated report and load script.
mysql>CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
"dialect":
{
"format": "csv",
"field_delimiter": "|",
"record_delimiter": "|\n",
"escape_character": "\\",
"quotation_marks":"\"",
"skip_rows": 0,
"encoding": "utf8mb4",
"date_format": "auto",
"time_format": "auto",
"trim_spaces": false,
"is_strict_mode": true
},
"file": [file1, file2, …]
}';
ENGINE
: Set tolakehouse
.SECONDARY_ENGINE
: Set torapid
.-
ENGINE_ATTRIBUTE
: JSON object. Defines the location of files, the file format, and how the file format is handled.-
dialect
: (optional) defines how the data is configured.dialect
contains the following:-
format
:csv
(default) orparquet
NoteIf you use the Parquet format, do not define any dialect parameters, other than
format
. HeatWave returns an error if they are defined. Only"format":"parquet"
is required.The remaining dialect parameters pertain to CSV, only.
It is not possible to define multiple formats per table.
field_delimiter
: Default"|"
. Defines one or more characters used to enclose fields. Maximum field delimiter length is 64 characters.record_delimiter
: Default"|\n"
. Defines one or more characters used to delimit records. Maximum field delimiter length is 64 characters.escape_character
: Default"\\"
. Defines one or more characters used to escape special characters.quotation_marks
: Default"\""
. Defines one or more characters used to enclose fields.skip_rows
: Default0
. Defines the number of rows to skip at the start of the file. The maximum value is 20.encoding
: Default"utf8mb4"
. Defines the character encoding.date_format
: Default"auto"
. Seedate_format
for information on the supported formats.time_format
: Default"\""
. For more information, see String and Numeric Literals in Date and Time Context.trim_spaces
:true
|false
(default). If true, leading and trailing spaces are removed.-
is_strict_mode
: whether the loading will take place in strict mode (true
) or non-strict mode (false
).If set to
true
, an error is thrown, and loading stops, if parsing or formatting errors are found in the file, and for empty columns. If set tofalse
, a warning is displayed, but loading does not stop.If
is_strict_mode
is not defined, the value ofsql_mode
is used. See Strict SQL Mode for more information.NoteIf you are using the Parquet file format, you must use
sql_mode
to define strict mode.is_strict_mode
is used for CSV, only.
-
-
file
: (required) includes a list of Object Storage files, prefixes, or both.NoteThe
file
parameter supports a maximum of 256 file locations. If you need to define more than 256, it is recommended to store them under the same bucket or prefix and update thefile
parameter accordingly.If using a PAR to access the data:
-
par
: the PAR URL.NoteDo not specify a region, namespace, or bucket when using PARs. That information is contained in the PAR URL and will generate an error if defined in separate parameters.
prefix
: if you are using a prefix, the prefix name.name
: if you are using a specific file, the filename.is_strict_mode
: whether the loading will take place in strict mode (true
) or non-strict mode (false
) by overriding globalsql_mode
. This parameter overridesis_strict_mode
defined in thedialect
array.
If using a resource principal to access the data directly:
bucket
: the bucket name.namespace
: the tenancy namespace.region
: the region your tenancy is in.prefix
: if you are using a prefix, the prefix name.name
: if you are using a specific file, the filename.is_strict_mode
: whether the loading will take place in strict mode (true
) or non-strict mode (false
) by overriding globalsql_mode
. This parameter overridesis_strict_mode
defined in thedialect
array.
-
-
If you are using a PAR link, the file
parameter
has the following syntax:
"file": [{"par": "PARURL", "prefix": "prefix" | "name": "filename", "is_strict_mode": true|false}]
If the PAR is defined on a bucket, you must ensure Enable Object Listing is enabled.
If you are using a resource principal for authentication, the
file
parameter has the following syntax:
"file": [{"region": "regionName", "namespace": tenancyNamespace", "bucket": "bucketName",
"name": "filename|foldername"}]
You can also define a prefix instead of a name:
"file": [{"region": "regionName", "namespace": tenancyNamespace", "bucket": "bucketName",
"prefix": "prefix"}]
HeatWave Lakehouse extends the Section 2.2.3, “Loading Data Using Auto Parallel Load” functionality to enable loading from external sources, such as Oracle Cloud Infrastructure Object Storage.
Auto Parallel Load analyzes the data to infer the table structure if not provided.
By default, the load operation analyzes and executes the
generated SQL script. To review the report and generated
loading script before implementing it, run the load with the
dryrun
option.
The @load_params
option of Auto Parallel Load
is extended with the external_tables
parameter. This parameter contains the following:
-
db_object
: the details of one or more tables. Eachdb_object
contains the following:"db_name"
: name of the database. If the database does not exist, it is created by the load command."tables"
: details of the tables to load.
The db_object
takes the following syntax:
db_object: {
"db_name": "name",
"tables": [table, table, ...]
}
db_name
: the schema name as specified in the external table definition.tables
: array of one or more tables.
Each table
is defined in a JSON object named
table
:
table: {
"table_name": "name",
"sampling": true | false,
"dialect": {dialect_options},
"file": [file1, file2, ...],
}
table_name
: the name of the table to load.-
sampling
: if set totrue
(default), the schema is inferred by sampling the data and a variety of statistics are collected. The inferred schema is used to generate aCREATE TABLE
statement. The statistics are used to estimate storage requirements and load times.NoteIf set to false, a full scan is performed to infer the schema and collect the relevant statistics. Depending on the size of the data, this can take a long time.
dialect
: formatting options for the input format. See thedialect
parameter in Section 4.5, “Lakehouse External Table Syntax” for more information.-
file
: the location of the data in Object Storage. This can be a path to a file, a prefix, or a PAR.For information on
file
syntax, see Section 4.5, “Lakehouse External Table Syntax”.