External tables use table-level and column-level ENGINE_ATTRIBUTE options of the CREATE TABLE statement to specify the parameters needed to process data stored externally in Object Storage. The column-level ENGINE_ATTRIBUTE overrides specific dialect parameters, if required.
The following syntax shows how to specify these ENGINE_ATTRIBUTE options as of MySQL 9.4.0. You can use the CREATE EXTERNAL TABLE statement, which automatically sets ENGINE to lakehouse, and SECONDARY_ENGINE to rapid.
CREATE EXTERNAL TABLE table_name
(col_name data_type [ENGINE_ATTRIBUTE='Col_Engine_Attribute_JSON'], ...)
ENGINE_ATTRIBUTE='Tbl_Engine_Attribute_JSON';
The following syntax shows how to specify these ENGINE_ATTRIBUTE options in versions earlier than MySQL 9.4.0. You must use the CREATE TABLE statement, and manually set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid.
CREATE TABLE table_name
(col_name data_type [ENGINE_ATTRIBUTE='Col_Engine_Attribute_JSON'], ...)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='Tbl_Engine_Attribute_JSON';
To learn how to load data into tables created with the CREATE TABLE statement, see Load Structured Data Manually. You can also use ENGINE_ATTRIBUTE options with the HEATWAVE_LOAD command to load data using Lakehouse Auto Parallel Load. See Auto Parallel Load Syntax and Load Structured Data Using Lakehouse Auto Parallel Load.
The external table ENGINE_ATTRIBUTE is a JSON object that consists of the dialect parameters and the file parameters. They are represented in the following ways:
- file_JSON_array is a JSON array containing the JSON objects representing the file location details and file parameters. You can use the following methods to specify these details:
  - Uniform Resource Identifiers (URI): Available as of MySQL 9.3.1 for MySQL HeatWave on OCI. You can provide an OCIFS URI, PAR URI, or Native URI.
  - Resource Principals: See Resource Principals in Oracle Cloud Infrastructure Documentation.
  - Pre-Authenticated Requests (PAR): See Object Storage Pre-Authenticated Requests in Oracle Cloud Infrastructure Documentation.

  Refer to File Options.
- dialect_JSON_object is a JSON object containing the dialect parameters to specify the data format in the files.
Tbl_Engine_Attribute_JSON: {
begin-object
"file": file_JSON_array
[, "dialect": dialect_JSON_object ]
end-object
}
file_JSON_array: {
begin-array file_def [, file_def] … end-array
}
file_def: file_RP_JSON | file_PAR_JSON | file_URI_JSON
file_RP_JSON: {
"bucket": "bucket_name",
"namespace": "namespace_name",
"region": "region_name",
{"prefix": "prefix_value" | "name": "filename" | "pattern" : "pattern_value"},
"is_strict_mode": {true | false},
"allow_missing_files": {true | false}
}
file_PAR_JSON: {
"par": "PAR URL",
["prefix": "prefix_value", | "name": "filename", | "pattern" : "pattern_value",]
"is_strict_mode": {true | false},
"allow_missing_files": {true | false}
}
file_URI_JSON: {
("uri": "OCIFS_URI") | ("uri": "PAR_URI") | ("uri": "NATIVE_URI"),
"is_strict_mode": {true | false},
"allow_missing_files": {true | false}
}
event_based_incremental_load: "auto_refresh_event_source": {"stream_OCID"}
dialect_JSON_object: begin-object dialect_option_key_value [, dialect_option_key_value]... end-object
dialect_option_key_value: {
"format": {"avro" | "csv" | "json" | "parquet"}
| "check_constraints": {true | false}
| "field_delimiter": {"|" | "," | "\t" | ";" | "auto" | "custom_field_delimiter"}
| "record_delimiter": {"\r" | "\n" | "\r\n" | "auto" | "custom_record_delimiter"}
| "escape_character": {"\\" | "custom_escape_character"}
| "quotation_marks": {"\"" | "custom_quotation_marks"}
| "encoding": "utf8mb4"
| "date_format": "custom_date_format"
| "time_format": "custom_time_format"
| "timestamp_format": "custom_timestamp_format"
| "trim_spaces": {true | false}
| "skip_rows": Integer_20
| "has_header": {true | false}
| "is_strict_mode": {true | false}
| "compression": {"zip" | "gzip" | "bzip2" | "auto"}
| "embed_model_id": {"MySQL HeatWave_in_database_embedding_model"}
}
Integer_20: Represents an integer value between 0 and 20.
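Filled in, the grammar above yields an ordinary JSON object. The following sketch uses Python only because it ships a JSON library; the bucket, namespace, and prefix values are hypothetical. It builds one table-level attribute with a resource-principal file entry and a dialect object, then serializes it to the string that would follow ENGINE_ATTRIBUTE=.

```python
import json

# A hypothetical table-level ENGINE_ATTRIBUTE following the grammar above:
# a "file" array with one resource-principal entry plus a "dialect" object.
# All names (bucket_1, tenant_1, data_files/) are made up for illustration.
tbl_engine_attribute = {
    "file": [
        {
            "bucket": "bucket_1",
            "namespace": "tenant_1",
            "region": "us-ashburn-1",
            "prefix": "data_files/",
            "allow_missing_files": True,
        }
    ],
    "dialect": {"format": "csv", "has_header": True},
}

# Serialize to the string literal that would follow ENGINE_ATTRIBUTE=.
literal = json.dumps(tbl_engine_attribute)
print(literal)
```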
When you specify ENGINE_ATTRIBUTE parameters in JSON format, you must escape certain characters using \ (backslash). For example, to specify a line feed as a record delimiter, you need to specify it as "\\n" in the ENGINE_ATTRIBUTE parameter's JSON value.

In the syntax description, square brackets ("[" and "]") indicate optional words or clauses.

The syntax description uses a vertical bar ("|") to separate alternatives. The list of alternatives is enclosed in square brackets ("[" and "]") when choosing a member is optional. The list of alternatives is enclosed in curly braces ("{" and "}") when you must choose one member.

The terms begin-array and end-array in the syntax represent a JSON array. While writing the sample, replace them with [ and ].

The terms begin-object and end-object in the syntax represent a JSON object. While writing the sample, replace them with { and }.
You can override some options at the column level by specifying column-level parameters. You can set date_format, time_format, timestamp_format, and model as optional column-level ENGINE_ATTRIBUTE parameters. For more information, refer to the dialect parameters.
Col_Engine_Attribute_JSON: begin-object column_option_key_value [, column_option_key_value]... end-object
column_option_key_value: {
"date_format": "custom_date_format"
| "time_format": "custom_time_format"
| "timestamp_format": "custom_timestamp_format"
| "model": {MySQL HeatWave_in_database_embedding_model}
}
- model: Supported as of MySQL 9.3.2. Lets you specify the column-level embedding model for VECTOR columns. You must specify a model supported by MySQL HeatWave GenAI. If the vector uses another embedding model, you cannot use it with GenAI features. You can include an unsupported embedding model in the column or table comments for your own reference. You cannot use this to load data using Lakehouse Auto Parallel Load. Learn more about the Vector Data Type.

  The following example manually loads a table and sets the embedding model for the column to multilingual-e5-small. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(
    col_1 VECTOR(5) ENGINE_ATTRIBUTE '{"model": "multilingual-e5-small"}')
    ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
    "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
The following example loads a table manually with a uniform resource identifier (URI) and sets the date_format, time_format, and timestamp_format at the column level. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 date ENGINE_ATTRIBUTE '{"date_format": "%W %M %Y"}',
col_2 time ENGINE_ATTRIBUTE '{"time_format": "hh:mm:ss"}',
col_3 timestamp ENGINE_ATTRIBUTE '{"timestamp_format": "%Y-%m-%d %H:%i:%s"}',
col_4 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
For more information about other column definitions, see Dialect Parameters for CSV Files.
The following parameters are required to create an external table:
- ENGINE: Set to lakehouse if you are on a version earlier than MySQL 9.4.0. For MySQL 9.4.0 and later, you can use the CREATE EXTERNAL TABLE statement, which automatically sets ENGINE to lakehouse.
- SECONDARY_ENGINE: Set to rapid if you are on a version earlier than MySQL 9.4.0. For MySQL 9.4.0 and later, you can use the CREATE EXTERNAL TABLE statement, which automatically sets SECONDARY_ENGINE to rapid.
- ENGINE_ATTRIBUTE: A JSON object literal. Defines the location of files, the file format, and how the file format is handled. Use key-value pairs in JSON format to specify options. Lakehouse uses the default setting if there is no defined option. Use NULL to specify no arguments.
- file: Defines the Object Storage files. You can define the file locations using resource principals, a uniform resource identifier (URI) (as of MySQL 9.3.1 for MySQL HeatWave on OCI), or a PAR URL. As of MySQL 9.3.1, Lakehouse supports a maximum of 170 file locations. Before MySQL 9.3.1, Lakehouse supports a maximum of 256 file locations. To define more than the maximum number of files, store the files under the same bucket or use prefix or pattern.
.-
file
parameters for uniform resource identifiers (As of MySQL 9.3.1 for MySQL HeatWave on OCI):-
uri
: The URL depending on the type of URI you define. Do not provide a pattern, prefix, or name parameter with theuri
. These details are inferred by theobject_path
you define.The following examples use the URI syntax for OCIFS:
oci://bucket_name@namespace_name/object_path
-
Name: If the
object_path
is neither a glob pattern nor prefix.The following example loads a table using Lakehouse Auto Parallel Load with a single file by using an OCIFS URI.
mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]'; mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - Prefix: If the object_path is not a glob pattern and ends with an unencoded / character, such as a folder path.

    The following example loads a table using Lakehouse Auto Parallel Load with a prefix by using an OCIFS URI, which specifies files in an Object Storage folder.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - Glob pattern: If the object_path contains at least one of the following unencoded characters: ?, *, or [. To use these characters as literals, you need to escape or encode them as needed, depending on the URI syntax. Regex patterns are not supported. See Glob Patterns in the Oracle Cloud Infrastructure Documentation to learn more.

    The following example loads a table using Lakehouse Auto Parallel Load by using an unencoded * character to create a glob pattern.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_*.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
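The ?, *, and [ ] wildcards behave like ordinary shell-style globs. As a rough stand-in for Object Storage glob matching (illustration only, not the OCI implementation; the object names are hypothetical), Python's fnmatch module shows which names such a pattern selects:

```python
from fnmatch import fnmatchcase

# Hypothetical object names, matched against the glob pattern from the
# URI above. fnmatch supports the same ?, *, and [ ] wildcards, though
# its * also crosses "/" separators, unlike some path-aware globs.
pattern = "data_files/data_file_*.csv"
objects = [
    "data_files/data_file_1.csv",
    "data_files/data_file_2.csv",
    "data_files/notes.txt",
]
matched = [name for name in objects if fnmatchcase(name, pattern)]
print(matched)
```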
  Review additional examples that use URIs to load data automatically or manually.
- file parameters for resource principals:
  - bucket_name: Buckets allow you to store objects in a compartment. To learn more about buckets, see Object Storage Buckets in Oracle Cloud Infrastructure Documentation.
  - namespace_name: This is the top-level container for all buckets and objects. To learn how to view your namespace, see Object Storage Namespaces in Oracle Cloud Infrastructure Documentation.
  - region_name: You need to define the region identifier for your OCI tenancy. To view the list of available regions and region identifiers, see Regions and Availability Domains in Oracle Cloud Infrastructure Documentation. As of MySQL 9.2.1, if the region is not specified, the instance's associated region is used as the default region. Otherwise, the specified region is used.

  The following example loads a table manually with a single file by using resource principals. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
    ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
    "file": [{"region": "regionName", "namespace": "tenant_1", "bucket": "bucket_1", "name": "data_file_1.csv"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;

  Review additional examples that use resource principals to load structured data using Lakehouse Auto Parallel Load or manually.
- file parameters for pre-authenticated requests:
  - par: The PAR URL. Do not specify a region, namespace, or bucket with par. These values are included in the PAR URL and generate an error if defined as separate parameters. For more information, refer to PAR Recommendations.

  The following example loads a table with a single file specified with a PAR.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/data_file_1.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);

  Review additional examples that use PARs to load structured data using Lakehouse Auto Parallel Load or manually.
- The following file parameters apply only to PAR and resource principals. Use one of the parameters, unless the target defines a specific file.
  - name: A specific Object Storage file name. For pre-authenticated requests, the file name provided in the PAR URL is considered.

    The following example uses a PAR and the name parameter to specify a file in an Object Storage bucket. The PAR points to the Object Storage bucket.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{ "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/", "name": "data_files/data_file_1.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - pattern: A regular expression that defines a set of Object Storage files.

    The following example uses resource principals and a pattern to load files that have a numerical suffix.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{ "region": "us-ashburn-1", "namespace": "tenant_1", "bucket": "bucket_1", "pattern": "data_files/data_file_\\\\d+\\\\.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
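After MySQL string unescaping and JSON unescaping, the four-backslash pattern value above reduces to a plain regular expression. A short Python sketch (illustration only; the object names are hypothetical) shows what that expression matches:

```python
import re

# The JSON value "data_files/data_file_\\\\d+\\\\.csv" reduces, after the
# two unescaping layers, to the regular expression below.
pattern = re.compile(r"data_files/data_file_\d+\.csv")
objects = ["data_files/data_file_7.csv", "data_files/data_file_old.csv"]
matched = [name for name in objects if pattern.fullmatch(name)]
print(matched)
```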
  - prefix: The prefix for a set of Object Storage files. For pre-authenticated requests, the prefix or bucket name present in the PAR URL is considered.

    The following example uses resource principals and a prefix to specify files to load in an Object Storage folder.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{ "region": "us-ashburn-1", "namespace": "tenant_1", "bucket": "bucket_1", "prefix": "data_files/"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);

  Review examples that use PARs to load structured data using Lakehouse Auto Parallel Load or manually.

  Review examples that use resource principals to load structured data using Lakehouse Auto Parallel Load or manually.
The following file parameters are optional:
- compression: Supported as of MySQL 9.3.2. Use this to load compressed files. Compression is supported for all file formats:
  - CSV: The default value is no compression. You can set compression to zip, gzip, or bzip2.

    The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and uses a CSV file compressed in zip format.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv", "compression": "zip"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv.zip"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - JSON: The default value is no compression. You can set compression to zip, gzip, or bzip2.

    The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and uses a JSON file compressed in gzip format.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "json", "compression": "gzip"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.json.gz"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - Avro: The default value is auto. You cannot set any other options for compression, as Avro files declare their compression format in the file metadata.

    The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and uses an Avro file with the auto compression option.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "avro", "compression": "auto"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.avro"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - Parquet: The default value is auto. You cannot set any other options for compression, as Parquet files declare their compression format in the file metadata.

    The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and uses a Parquet file with the auto compression option.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "parquet", "compression": "auto"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.parquet"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  Note the following when loading compressed files:
  - Each compressed file must contain only a single original file. You cannot provide a compressed file with archives or folders with multiple files.
  - Files in zip format must be compressed with the Deflate algorithm.
  - If loading multiple files, all the files must be in the same compression format.
  - When loading multiple compressed files, it is best to prepare files of similar size, in the range of 80-120 MB each. Otherwise, you may experience longer loading times.
- is_strict_mode: Loads data in strict mode or non-strict mode, based on the selected option. This overrides the is_strict_mode dialect parameter. To learn about the default value, see the is_strict_mode dialect parameter in Common Dialect Parameters. The supported options are:
  - true: Loads the data in strict mode. The data loading stops if there is an error due to missing files, empty columns, formatting errors, or parsing errors.
  - false: Loads the data in non-strict mode. Missing files, empty columns, formatting errors, or parsing errors display a warning, and data is loaded.

  The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and sets is_strict_mode to false in the file parameters.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv", "is_strict_mode": false}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
- allow_missing_files: Supported as of MySQL 8.4.0. Ignores missing files and completes the data loading based on the selected option. This overrides the dialect parameter is_strict_mode for missing files. To learn about the default value, see the allow_missing_files dialect parameter in Common Dialect Parameters. The supported options are:
  - true: If any file is missing, no error occurs and data loading continues with the existing files. An error occurs if all the files are missing.
  - false: If any file is missing, an error occurs and data is not loaded.

  The following example loads a table manually from a glob pattern specified with a uniform resource identifier (URI) and sets allow_missing_files to true in the file parameters. If any of the specified files in the pattern are not available, an error does not occur and data continues loading for the existing files. If all files are missing, an error occurs. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
    ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
    "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_[1-10].csv", "allow_missing_files": true}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
- auto_refresh_event_source: Supported as of MySQL 9.4.1. Specifies a Stream OCID for event-based incremental load. This string value is verified as a valid Stream resource identifier. Learn more about Resource Identifiers in Oracle Cloud Infrastructure Documentation. Learn how to Refresh Data Using Event-Based Incremental Load.

  The following example loads a table using Lakehouse Auto Parallel Load and specifies a Stream OCID to set up event-based incremental load.

mysql> SET @input_list = '[{ "db_name": "lakehouse_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "auto_refresh_event_source": "ocid1.stream.oc1.iad...", "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
The dialect parameters are optional and are used to define the data configuration.
The following dialect parameters are applicable to all file formats (CSV, JSON, Parquet, and Avro).
- format: The file format defined in a table. You can define only one file format per table. The supported file formats are:
  - CSV: The default file format.

    The following example loads a CSV file into a table automatically with a uniform resource identifier (URI).

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - JSON: Supported as of MySQL 8.4.0. Only Newline Delimited JSON files are supported. Tables created with json format must only have a single column that conforms to the JSON data type.

    The following example loads a JSON file into a table manually with a uniform resource identifier (URI). If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
    ENGINE_ATTRIBUTE='{"dialect": {"format": "json"},
    "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.json"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
  - Parquet: The Parquet file format.

    The following example loads a Parquet file into a table automatically with a uniform resource identifier (URI).

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "parquet"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.parquet"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
  - Avro: The Avro file format.

    The following example loads an Avro file into a table manually with a uniform resource identifier (URI). If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
    ENGINE_ATTRIBUTE='{"dialect": {"format": "avro"},
    "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.avro"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
- check_constraints: Supported as of MySQL 8.4.0. Lakehouse validates primary key and unique key constraints during the initial load based on the selected option. The supported options are:
  - true: The default value. Lakehouse validates primary key and unique key constraints only during the initial load of the table. If there are subsequent loads or refreshes of the table, validation does not occur.
  - false: Lakehouse does not validate primary key and unique key constraints.

  The following example loads a table manually with a uniform resource identifier (URI) and sets check_constraints to false. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
    ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "check_constraints": false},
    "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
- is_strict_mode: Loads data in strict mode or non-strict mode, based on the selected option. By default, is_strict_mode is set to the sql_mode value. See Strict SQL Mode. The file common parameter is_strict_mode can override this setting. The supported options are:
  - true: Loads the data in strict mode. The data loading stops if there is an error due to missing files, empty columns, formatting errors, or parsing errors.

    As of MySQL 8.4.0, the dialect parameter is_strict_mode applies to all file formats. Before MySQL 8.4.0, it only applies to the CSV file format. For Avro and Parquet file formats, use the file parameter is_strict_mode to define strict mode before MySQL 8.4.0.
  - false: Loads the data in non-strict mode. Missing files, empty columns, formatting errors, or parsing errors display a warning, and data is loaded.

  If no value is set, the setting is defined by the sql_mode. This setting overrides the global sql_mode setting for handling missing files. Review Table 4.5, “Combinations of Settings for allow_missing_files and is_strict_mode”.

  The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and sets is_strict_mode to false in the dialect parameters.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv", "is_strict_mode": false}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
- allow_missing_files: Supported as of MySQL 8.4.0. Ignores missing files and completes the data loading based on the selected option. This overrides the dialect parameter is_strict_mode for missing files. The supported options are:
  - true: If any file is missing, no error occurs and data loading continues with the existing files. An error occurs if all the files are missing.
  - false: If any file is missing, an error occurs and data is not loaded.

  If no value is set, the setting is defined by is_strict_mode. If no value is set for is_strict_mode, the setting is defined by the sql_mode. Review Table 4.5, “Combinations of Settings for allow_missing_files and is_strict_mode”.

  A missing file is defined as:
  - With the name parameter: There is no file with that name.
  - With the pattern parameter: There are no files that match the pattern.
  - With the prefix parameter: There are no files with that prefix.

  The following example loads a table from a glob pattern specified with a uniform resource identifier (URI) and sets allow_missing_files to true in the dialect parameters. If any of the specified files in the pattern are not available, an error does not occur and data continues loading for the existing files. If all files are missing, an error occurs. If a file-specific setting is set, it may override the allow_missing_files setting.

mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv", "allow_missing_files": true}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_[1-10].csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
Refer to the following table to review how different combinations of settings for allow_missing_files and is_strict_mode affect errors for missing files.
Table 4.5 Combinations of Settings for allow_missing_files and is_strict_mode

| allow_missing_files | is_strict_mode | Possible Errors |
|---|---|---|
| T | T | No error generated unless all files are missing. |
| T | F | No error generated unless all files are missing. |
| T | Not set | No error generated unless all files are missing. |
| Not set | F | No error generated unless all files are missing. |
| F | T | Error generated for any missing files. |
| F | F | Error generated for any missing files. |
| F | Not set | Error generated for any missing files. |
| Not set | T | Error generated for any missing files. |
| Not set | Not set | Error generation depends on the setting for sql_mode. |
The following dialect parameters are applicable to CSV and JSON file formats. If these parameters are used with Avro or Parquet file formats, an error occurs.
- encoding: Defines the character encoding. The default is "utf8mb4".
- record_delimiter: Defines one or more characters used to delimit records. The maximum record delimiter length is 64 characters. You can add any string as a custom_record_delimiter.

  The default record delimiter for json is "\n". The only alternative for json is "\r\n".

  For CSV files, if you set the record_delimiter to auto, Auto Parallel Load automatically detects the record delimiters and composite record delimiters with field delimiters as prefixes. Supported as of MySQL 9.2.0 and set as the default value.

  The following record delimiters are detected automatically by Auto Parallel Load:
  - \r: Carriage return.
  - \n: Line feed. This is the default for all MySQL versions prior to 9.2.0.
  - \r\n: Carriage return and line feed.
The following dialect parameters are applicable to CSV and Parquet file formats.
- embed_model_id: Supported as of MySQL 9.3.2. Lets you specify the embedding model for VECTOR columns in the table. You must specify a model supported by MySQL HeatWave GenAI. You cannot use this to load data manually. Learn more about the Vector Data Type.

  The following example loads a table using Lakehouse Auto Parallel Load and sets the embedding model for all VECTOR columns detected in the table to minilm. Keep in mind that if the table already exists and there are VECTOR columns with an embedding model already set, the existing model is not replaced with the one specified in this example.

mysql> SET @input_list = '[{ "db_name": "data_db", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv", "embed_model_id": "minilm"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
The following dialect parameters are applicable only to the CSV file format.
- field_delimiter: Defines one or more characters used to delimit fields. The maximum field delimiter length is 64 characters. You can add any string as a custom_field_delimiter.

  When the field_delimiter is set to auto, Auto Parallel Load automatically detects the field delimiters. Supported as of MySQL 9.2.0 and set as the default value.

  The following field delimiters are detected automatically by Auto Parallel Load:
  - |: Pipe. Default for all MySQL versions prior to 9.2.0.
  - ,: Comma.
  - \t: Tab.
  - ;: Semicolon.
  The following example loads a table manually with a uniform resource identifier (URI) and sets the record_delimiter and field_delimiter. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int)
    ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "record_delimiter": "\\r\\n", "field_delimiter": ";"},
    "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
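As a rough illustration of what delimiter auto-detection does (this uses Python's csv.Sniffer, which is not the Lakehouse implementation; the sample rows are made up), a detector inspects a sample of the data and infers the most consistent field delimiter:

```python
import csv

# Made-up sample: three semicolon-delimited rows with \r\n record delimiters.
sample = "a;b;c\r\n1;2;3\r\n4;5;6\r\n"

# Restrict detection to the same candidate set Auto Parallel Load checks:
# pipe, comma, tab, and semicolon.
dialect = csv.Sniffer().sniff(sample, delimiters=";,|\t")
print(dialect.delimiter)
```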
-

escape_character: Defines one or more characters used to escape special characters. The default is "\\". You can add any string as a custom_escape_character.

-

quotation_marks: Defines one or more characters used to enclose fields. The default is "\'". You can add any string as a custom_quotation_marks.
The following example loads a table manually with a uniform resource identifier (URI) and sets the escape_character and quotation_marks. If you are on a version earlier than MySQL 9.4.0, you must use CREATE TABLE and set ENGINE to lakehouse, and SECONDARY_ENGINE to rapid. See CREATE TABLE Statement.

mysql> CREATE EXTERNAL TABLE table_1(
         col_1 int,
         col_2 int,
         col_3 int)
       ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "escape_character": "\\\\", "quotation_marks": "\\""},
                          "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql> ALTER TABLE table_1 SECONDARY_LOAD;
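The backslashes in escape_character and quotation_marks values pass through two layers of unescaping: first the MySQL single-quoted string literal (with the default sql_mode, where backslash escapes apply), then the JSON parser. The sketch below emulates both layers to show what the server ends up with; the string-replace emulation of SQL unescaping is an assumption for illustration, not MySQL's parser.

```python
import json

# ENGINE_ATTRIBUTE text as it appears inside the SQL single quotes.
sql_literal = r'{"dialect": {"escape_character": "\\\\", "quotation_marks": "\\""}}'

# Emulate MySQL's backslash handling in string literals (default sql_mode):
# \\ becomes \ and \" becomes ". A placeholder keeps the two rules from
# interfering with each other.
after_sql = (sql_literal.replace('\\\\', '\x00')
                        .replace('\\"', '"')
                        .replace('\x00', '\\'))

# The second layer: the server parses the remaining text as JSON.
parsed = json.loads(after_sql)
print(parsed["dialect"]["escape_character"])   # a single backslash
print(parsed["dialect"]["quotation_marks"])    # a double quote
```

Working backward from the character you want (one backslash, or one double quote) through both layers explains why four backslashes appear in the SQL text.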
-

date_format: The date format, see: date_format. You can also set date formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored if you load data using Lakehouse Auto Parallel Load.

-

time_format: The time format, see: String and Numeric Literals in Date and Time Context. You can also set time formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored if you load data using Lakehouse Auto Parallel Load.
option, which overrides the format in the dialect parameter. This format is ignored if you load data using Lakehouse Auto Parallel Load.-
timestamp_format
: Supported as of MySQL 9.0.1. The timestamp format, see:date_format
. You can also set timestamp formats for each column by using the columnENGINE_ATTRIBUTE
option, which overrides the format in the dialect parameter. This format is ignored if you load data using Lakehouse Auto Parallel Load.The following example loads a table manually with a uniform resource identifier (URI) and sets the
date_format
,time_format
, andtimestamp_format
. If you are on a version earlier than MySQL 9.4.0, you must useCREATE TABLE
and setENGINE
tolakehouse
, andSECONDARY_ENGINE
torapid
. See CREATE TABLE Statement.mysql> CREATE EXTERNAL TABLE table_1( col_1 date, col_2 time, col_3 timestamp, col_4 int) ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "date_format": "%W %M %Y", "time_format": "hh:mm:ss", "timestamp_format": "%Y-%m-%d %H:%i:%s"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}'; mysql> ALTER TABLE table_1 SECONDARY_LOAD;
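MySQL format specifiers resemble, but do not always match, the strftime-style codes used in other tools: for example, MySQL uses %i for minutes where Python's strptime uses %M. The sketch below parses a timestamp in the format from the example above via a small, assumed mapping to Python codes; it is an illustration, not part of MySQL.

```python
from datetime import datetime

# Hypothetical mapping for just the two formats used in the example above.
# MySQL %i (minutes) and %s (seconds) map to Python %M and %S.
mysql_to_strptime = {
    "%W %M %Y": "%A %B %Y",                    # weekday name, month name, year
    "%Y-%m-%d %H:%i:%s": "%Y-%m-%d %H:%M:%S",  # MySQL %i/%s -> Python %M/%S
}

fmt = mysql_to_strptime["%Y-%m-%d %H:%i:%s"]
ts = datetime.strptime("2024-05-01 13:45:30", fmt)
print(ts.hour, ts.minute, ts.second)
```

Checking a sample value like this before loading is a quick way to confirm the format string matches the data in your files.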
-

trim_spaces: Removes or retains leading and trailing spaces in field values, based on the set option. The default value is false.

The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and sets trim_spaces to true.

mysql> SET @input_list = '[{
         "db_name": "data_tables",
         "tables": [{
           "table_name": "table_1",
           "engine_attribute": {
             "dialect": {"format": "csv", "trim_spaces": true},
             "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]
           }
         }]
       }]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
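The effect of trim_spaces on a single CSV record can be shown in one line; this is just an illustration of the documented behavior, not HeatWave's loader.

```python
# With trim_spaces=true, leading and trailing spaces are removed from each
# field value before it is stored.
record = [" 1 ", "  alpha", "beta  "]
trimmed = [field.strip() for field in record]
print(trimmed)  # ['1', 'alpha', 'beta']
```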
-

skip_rows: The number of rows to skip at the start of the file. The default value is 0 and the maximum value is 20.
.-
has_header
: Adds a header row to theCSV
file, based on the selected option. The default isfalse
.If
has_header
andskip_rows
are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.The following example loads a table using Lakehouse Auto Parallel Load with a uniform resource identifier (URI) and sets
skip_rows
andhas_header
. Since both parameters are set, the first row is skipped and the second row is set as the header row.mysql> SET @input_list = '[{ "db_name": "data_tables", "tables": [{ "table_name": "table_1", "engine_attribute": { "dialect": {"format": "csv", "skip_rows": 1, "has_header": true}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}] } }] }]'; mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
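The order of operations when skip_rows and has_header are both set can be sketched in a few lines. This is an illustration of the documented behavior under a made-up two-line file, not HeatWave's actual loader.

```python
import csv
import io

# A file whose first line is a comment to skip, followed by a header row
# and two data rows (made up for this sketch).
raw = "generated 2024-05-01\ncol_1,col_2,col_3\n1,2,3\n4,5,6\n"
skip_rows, has_header = 1, True

lines = io.StringIO(raw)
for _ in range(skip_rows):
    next(lines)                                  # 1. discard skip_rows lines
reader = csv.reader(lines)
header = next(reader) if has_header else None    # 2. next row is the header
rows = list(reader)                              # 3. the rest is data
print(header, rows)
```

With skip_rows=1 and has_header=true, the comment line is discarded and the second line supplies the column names, matching the example above.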
Learn how to Load Structured Data Using Lakehouse Auto Parallel Load
Learn how to Load Structured Data Manually
Learn how to Refresh Data Using Event-Based Incremental Load