External tables use the table-level and column-level ENGINE_ATTRIBUTE options of the CREATE TABLE statement to specify the parameters needed to process data stored externally in Object Storage. The column-level ENGINE_ATTRIBUTE options override specific dialect parameters, if required.

You can specify these ENGINE_ATTRIBUTE options as follows:
CREATE TABLE table_name
(col_name data_type [ENGINE_ATTRIBUTE='Col_Engine_Attribute_JSON'], ...)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='Tbl_Engine_Attribute_JSON';
The external table ENGINE_ATTRIBUTE is a JSON object that consists of the dialect parameters and the file parameters. They are represented as follows:

- file_JSON_array is a JSON array containing the JSON objects representing the Resource Principal (RP) or Pre-Authenticated Request (PAR) file location details and the file parameters.
- dialect_JSON_object is a JSON object containing the dialect parameters that specify the data format in the files.
Tbl_Engine_Attribute_JSON: {
begin-object
"file": file_JSON_array
[, "dialect": dialect_JSON_object ]
end-object
}
file_JSON_array: {
begin-array file_def [, file_def] … end-array
}
file_def: file_RP_JSON | file_PAR_JSON
file_RP_JSON: {
"bucket": "bucket_name",
"namespace": "namespace_name",
"region": "region_name",
{"prefix": "prefix_value" | "name": "filename" | "pattern": "pattern_value"},
"is_strict_mode": {true | false},
"allow_missing_files": {true | false}
}
file_PAR_JSON: {
"par": "PAR URL",
[{"prefix": "prefix_value" | "name": "filename" | "pattern": "pattern_value"},]
"is_strict_mode": {true | false},
"allow_missing_files": {true | false}
}
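To make the productions above concrete, the following is a minimal sketch of a resource-principal file definition. The table, column, bucket, namespace, and region names are hypothetical placeholders, not values from this document.

```sql
-- Hypothetical example: bucket, namespace, region, and table names are placeholders.
CREATE TABLE orders (
  order_id INT,
  order_total DECIMAL(10,2)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{
    "bucket": "sales-bucket",
    "namespace": "mytenancy",
    "region": "us-ashburn-1",
    "prefix": "orders/"
  }]
}';
```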
dialect_JSON_object: begin-object dialect_option_key_value [, dialect_option_key_value]... end-object
dialect_option_key_value: {
"format": {"avro" | "csv" | "json" | "parquet"}
| "check_constraints": {true | false}
| "field_delimiter": {"|" | "," | "\t" | ";" | "auto" | "custom_field_delimiter"}
| "record_delimiter": {"\r" | "\n" | "\r\n" | "auto" | "custom_record_delimiter"}
| "escape_character": {"\\" | "custom_escape_character"}
| "quotation_marks": {"\"" | "custom_quotation_marks"}
| "encoding": "utf8mb4"
| "date_format": "custom_date_format"
| "time_format": "custom_time_format"
| "timestamp_format": "custom_timestamp_format"
| "trim_spaces": {true | false}
| "skip_rows": Integer_20
| "has_header": {true | false}
| "is_strict_mode": {true | false}
}
Integer_20: Represents an integer value between 0 and 20.
When you specify ENGINE_ATTRIBUTE parameters in JSON format, you must escape certain characters using \ (backslash). For example, to specify a linefeed as a record delimiter, you must specify it as "\\n" in the ENGINE_ATTRIBUTE parameter's JSON value.

In the syntax description, square brackets ("[" and "]") indicate optional words or clauses.

The syntax description uses a vertical bar ("|") to separate alternatives. The list of alternatives is enclosed in square brackets ("[" and "]") when one member may be chosen, and in curly braces ("{" and "}") when one member must be chosen.

The terms begin-array and end-array in the syntax represent a JSON array. When writing the actual JSON, replace them with [ and ].

The terms begin-object and end-object in the syntax represent a JSON object. When writing the actual JSON, replace them with { and }.
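As a sketch of the escaping rule, the following hypothetical CSV table uses a tab field delimiter and a linefeed record delimiter; both are written with doubled backslashes inside the JSON string. All names are placeholders.

```sql
-- Hypothetical example: note "\\t" and "\\n" rather than "\t" and "\n"
-- inside the ENGINE_ATTRIBUTE JSON value.
CREATE TABLE access_log (
  ts VARCHAR(32),
  msg VARCHAR(1024)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "log-bucket", "namespace": "mytenancy",
            "region": "us-ashburn-1", "prefix": "logs/"}],
  "dialect": {"format": "csv",
              "field_delimiter": "\\t",
              "record_delimiter": "\\n"}
}';
```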
Some dialect parameters can be overridden at the column level by specifying column-level ENGINE_ATTRIBUTE parameters. You can define date_format, time_format, and timestamp_format as optional column-level ENGINE_ATTRIBUTE parameters. For more information, refer to the dialect parameters.
Col_Engine_Attribute_JSON: begin-object column_option_key_value [, column_option_key_value]... end-object
column_option_key_value: {
"date_format": "custom_date_format"
| "time_format": "custom_time_format"
| "timestamp_format": "custom_timestamp_format"
}
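A sketch of a column-level override, assuming hypothetical names throughout: the table-wide dialect sets one date_format, and a single column overrides it.

```sql
-- Hypothetical example: placeholder names throughout.
CREATE TABLE events (
  event_day DATE ENGINE_ATTRIBUTE='{"date_format": "%d-%m-%Y"}',
  label VARCHAR(64)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "events-bucket", "namespace": "mytenancy",
            "region": "us-ashburn-1", "prefix": "events/"}],
  "dialect": {"format": "csv", "date_format": "%Y-%m-%d"}
}';
-- event_day is parsed with %d-%m-%Y; any other DATE column would use
-- the table-wide %Y-%m-%d format.
```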
For more details about column_definition, refer to the CREATE TABLE statement.
The mandatory parameters required to create an external table are:

- ENGINE: Set to lakehouse.
- SECONDARY_ENGINE: Set to rapid.
- ENGINE_ATTRIBUTE: A JSON object literal that defines the location of the files, the file format, and how the file format is handled. Use key-value pairs in JSON format to specify the options. Lakehouse uses the default setting if an option is not defined. Use NULL to specify no arguments.
- file: Defines the Object Storage files. You can define the file locations using a resource principal or a PAR URL. Lakehouse supports a maximum of 256 file locations. To define more than 256 files, store the files under the same bucket or use prefix or pattern.
- file parameters for resource principals:
  - bucket: The bucket name.
  - namespace: The tenancy namespace.
  - region: The region that the tenancy is in. In MySQL 9.2.1 or later, if the region is not specified, the region linked to the instance is set as the tenancy region.
- file parameters for pre-authenticated requests:
  - par: The PAR URL. Do not specify a region, namespace, or bucket with par. These values are included in the PAR URL and generate an error if defined as separate parameters. For more information, refer to the pre-authenticated request recommendations.
- The following file parameters apply to all file formats. Use one of the parameters, unless the target defines a specific file.
  - name: A specific Object Storage file name. For pre-authenticated requests, the file name provided in the PAR URL is considered.
  - pattern: A regular expression that defines a set of Object Storage files. The pattern follows the Modified ECMAScript regular expression grammar.
  - prefix: The prefix for a set of Object Storage files. For pre-authenticated requests, the prefix or bucket name present in the PAR URL is considered. For more information, refer to the pre-authenticated request examples and the file name, pattern, and prefix examples.
- is_strict_mode: Loads data in strict mode or non-strict mode, based on the selected option. This overrides the is_strict_mode dialect parameter. The supported options are:
  - true: Loads the data in strict mode. Missing files, empty columns, formatting errors, or parsing errors throw an error, and loading stops.
  - false: Loads the data in non-strict mode. Missing files, empty columns, formatting errors, or parsing errors display a warning, and the data is loaded.
- allow_missing_files: Ignores missing files and completes the data loading, based on the selected option. This overrides the is_strict_mode file parameter for missing files, and the allow_missing_files dialect parameter. The supported options are:
  - true: If any file is missing, an error is not thrown and data loading continues with the existing files, unless all of the files are missing.
  - false: If any file is missing, an error is thrown and the data is not loaded.
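The two file-level flags can be combined as in the following sketch (placeholder names throughout): formatting problems produce warnings instead of errors, and loading continues when some, but not all, matching files are absent.

```sql
-- Hypothetical example: placeholder names throughout.
CREATE TABLE staging (
  raw_line VARCHAR(2048)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "staging-bucket", "namespace": "mytenancy",
            "region": "us-ashburn-1", "prefix": "incoming/",
            "is_strict_mode": false,
            "allow_missing_files": true}]
}';
```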
The dialect parameters are optional and are used to define the data configuration.
The following dialect parameters are applicable to all file formats (CSV, JSON, Parquet, and Avro).
- format: The file format defined in a table. You can define only one file format per table. The supported file formats are avro, csv, json, and parquet.
- check_constraints: Supported as of MySQL 8.4.0. Lakehouse validates primary key and unique key constraints during the initial load, based on the selected option. The supported options are:
  - true: The default value. Lakehouse validates primary key and unique key constraints only during the initial load of the table. If there are subsequent loads or refreshes of the table, validation does not occur.
  - false: Lakehouse does not validate primary key and unique key constraints.
- is_strict_mode: Loads data in strict mode or non-strict mode, based on the selected option. This setting overrides the global sql_mode. By default, is_strict_mode is set to the sql_mode value. See Strict SQL Mode. The file common parameter is_strict_mode can override this setting. The supported options are:
  - true: Loads the data in strict mode. Missing files, empty columns, formatting errors, or parsing errors throw an error, and loading stops.
  - false: Loads the data in non-strict mode. Missing files, empty columns, formatting errors, or parsing errors display a warning, and the data is loaded.
  If no value is set, the setting is defined by the sql_mode.
  As of MySQL 8.4.0, the dialect parameter is_strict_mode applies to all file formats. Before MySQL 8.4.0, it applies only to the CSV file format. For the Avro and Parquet file formats, use the file parameter is_strict_mode to define strict mode before MySQL 8.4.0.
- allow_missing_files: Supported as of MySQL 8.4.0. Ignores missing files and completes the data loading, based on the selected option. This overrides the dialect parameter is_strict_mode for missing files. The supported options are:
  - true: An error is not thrown for a missing file, and data loading continues with the existing files. An error is thrown if all of the files are missing.
  - false: If any file is missing, an error is thrown and the data is not loaded.
  If no value is set, the setting is defined by is_strict_mode. If no value is set for is_strict_mode, the setting is defined by the sql_mode.
A missing file is defined as:

- With the name parameter: There is no file with that name.
- With the pattern parameter: There are no files that match the pattern.
- With the prefix parameter: There are no files with that prefix.
Refer to the following table to review how different combinations of settings for allow_missing_files and is_strict_mode affect errors for missing files.
Table 5.1 Combinations of Settings for allow_missing_files and is_strict_mode

| Setting for allow_missing_files and is_strict_mode | Description of Possible Errors |
|---|---|
| allow_missing_files=true, is_strict_mode=true | No error generated unless all files are missing. |
| allow_missing_files=false, is_strict_mode=true | Error generated for any missing files. |
| allow_missing_files=true, is_strict_mode=false | No error generated unless all files are missing. |
| allow_missing_files=false, is_strict_mode=false | Error generated for any missing files. |
| allow_missing_files not set, is_strict_mode=true | Error generated for any missing files. |
| allow_missing_files not set, is_strict_mode=false | No error generated unless all files are missing. |
| allow_missing_files=true, is_strict_mode not set | No error generated unless all files are missing. |
| allow_missing_files=false, is_strict_mode not set | Error generated for any missing files. |
| Neither allow_missing_files nor is_strict_mode set | Error generation depends on the setting for sql_mode. |
The following dialect parameters are applicable to the CSV and JSON file formats. If these parameters are used with the Avro or Parquet file formats, an error is generated.
- encoding: Defines the character encoding. The default is "utf8mb4".
- record_delimiter: Defines one or more characters used to delimit records. The maximum record delimiter length is 64 characters. You can add any string as a custom_record_delimiter.
  The default record delimiter for json is "\n". The only alternative for json is "\r\n".
  For CSV files, if you set the record_delimiter to auto, Lakehouse Autopilot automatically detects the record delimiters and composite record delimiters with field delimiters as prefixes. Supported as of MySQL 9.2.0, and set as the default value. The following record delimiters are detected automatically by Lakehouse Autopilot:
  - \r: Carriage return.
  - \n: Line feed. This is the default for all MySQL versions prior to 9.2.0.
  - \r\n: Carriage return and line feed.
The following dialect parameters are applicable only to the CSV file format.

- field_delimiter: Defines one or more characters used to delimit fields. The maximum field delimiter length is 64 characters. You can add any string as a custom_field_delimiter.
  When the field_delimiter is set to auto, Lakehouse Autopilot automatically detects the field delimiters. Supported as of MySQL 9.2.0, and set as the default value. The following field delimiters are detected automatically by Lakehouse Autopilot:
  - |: Pipe. The default for all MySQL versions prior to 9.2.0.
  - ,: Comma.
  - \t: Tab.
  - ;: Semicolon.
- escape_character: Defines one or more characters used to escape special characters. The default is "\\". You can add any string as a custom_escape_character.
- quotation_marks: Defines one or more characters used to enclose fields. The default is "\"". You can add any string as a custom_quotation_marks.
- date_format: The date format, see: date_format. You can also set date formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored while loading data in auto parallel load.
- time_format: The time format, see: String and Numeric Literals in Date and Time Context. You can also set time formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored while loading data in auto parallel load.
- timestamp_format: Supported as of MySQL 9.0.1. The timestamp format, see: date_format. You can also set timestamp formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored while loading data in auto parallel load.
- trim_spaces: Removes or retains the leading and trailing spaces, based on the selected option. The default is false.
- skip_rows: The number of rows to skip at the start of the file. The default is 0, and the maximum is 20.
- has_header: Specifies whether the CSV file has a header row, based on the selected option. The default is false.
  If has_header and skip_rows are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.
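The interaction of skip_rows and has_header can be sketched as follows (placeholder names throughout): with skip_rows set to 2 and has_header set to true, the first two rows of the file are skipped and the third row is treated as the header.

```sql
-- Hypothetical example: placeholder names throughout.
CREATE TABLE survey (
  answer_1 VARCHAR(16),
  answer_2 VARCHAR(16)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "survey-bucket", "namespace": "mytenancy",
            "region": "us-ashburn-1", "name": "responses.csv"}],
  "dialect": {"format": "csv",
              "skip_rows": 2,
              "has_header": true,
              "trim_spaces": true}
}';
```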