Lakehouse uses external tables to load the data from Object Storage.
MySQL 8.4.0 adds support for the following:
The JSON file format.
Support for primary key and unique key constraint validation.
is_strict_mode
as adialect
parameter now supports all file formats. It can override the globalsql_mode
.allow_missing_files
is adialect
parameter and afile
parameter that can allow missing files or not.
mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
"dialect":
{
"format": "avro" | "csv" | "json" | "parquet",
"check_constraints": true | false,
"field_delimiter": "|",
"record_delimiter": "|\n",
"escape_character": "\\",
"quotation_marks":"\"",
"skip_rows": 0,
"encoding": "utf8mb4",
"date_format": "auto",
"time_format": "auto",
"trim_spaces": true | false,
"has_header": true | false,
"is_strict_mode": true | false,
"allow_missing_files": true | false
},
"file":
[
file_section [, file_section, file_section, ...]
]
}';
file_section: {
"bucket": "bucket_name",
"namespace": "namespace",
"region": "region",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false,
"allow_missing_files": true | false
}
or
file_section: {
"par": "PAR URL",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false,
"allow_missing_files": true | false
}
Before MySQL 8.4.0:
mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
"dialect":
{
"format": "avro" | "csv" | "parquet",
"field_delimiter": "|",
"record_delimiter": "|\n",
"escape_character": "\\",
"quotation_marks":"\"",
"skip_rows": 0,
"encoding": "utf8mb4",
"date_format": "auto",
"time_format": "auto",
"trim_spaces": true | false,
"is_strict_mode": true | false,
"has_header": true | false
},
"file":
[
file_section [, file_section, file_section, ...]
]
}';
file_section: {
"bucket": "bucket_name",
"namespace": "namespace",
"region": "region",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false
}
or
file_section: {
"par": "PAR URL",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false
}
ENGINE
: Set tolakehouse
.SECONDARY_ENGINE
: Set torapid
.-
ENGINE_ATTRIBUTE
: JSON object literal. Defines the location of files, the file format, and how the file format is handled.Use key-value pairs in
JSON
format to specifyoptions
. Lakehouse uses the default setting if there is no defined option. UseNULL
to specify no arguments.-
dialect
: Defines the data configuration. Optional.-
dialect
parameters that apply to all file formats:-
format
:avro
,csv
,json
orparquet
. The default iscsv
.It is not possible to define multiple formats per table.
Tables created with
json
format must only have a single column that conforms to theJSON
data type, see: The JSON Data Type.As of MySQL 8.4.0,
json
only supports Newline Delimited JSON files.For
parquet
see: Section 4.5.1, “Parquet Data Type Conversions”. -
check_constraints
: Whether to validate primary key and unique key constraints or not. The default istrue
. Supported as of MySQL 8.4.0.If set to
true
, then Lakehouse validates primary key and unique key constraints.If set to
false
, then Lakehouse does not validate primary key and unique key constraints. -
is_strict_mode
: Whether the loading takes place in strict mode,true
, or non-strict mode,false
. This setting overrides the globalsql_mode
. The default is the value ofsql_mode
. See Strict SQL Mode. Thefile
common parameteris_strict_mode
can override this setting.If set to
true
, then missing files, empty columns, formatting errors or parsing errors throw an error, and loading stops.If set to
false
, then missing files, empty columns, formatting errors or parsing errors display a warning, and loading continues.As of MySQL 8.4.0, the
dialect
parameteris_strict_mode
applies to all file formats. Before MySQL 8.4.0, it only applies to the CSV file format. For Avro and Parquet file formats, use thefile
parameteris_strict_mode
to define strict mode before MySQL 8.4.0. -
allow_missing_files
: Whether to allow missing files or not. This overrides thedialect
parameteris_strict_mode
for missing files. Supported as of MySQL 8.4.0.If set to
true
, then any missing files do not throw an error, and loading does not stop, unless all files are missing.If set to
false
, then any missing files throw an error, and loading stops.A missing file is defined as:
With the
name
parameter: there is no file with that name.With the
pattern
parameter: there are no files that match the pattern.With the
prefix
parameter: there are no files with that prefix.
-
-
dialect
parameters that only apply tocsv
andjson
:The use of any of these parameters with
avro
orparquet
will produce an error.encoding
: Defines the character encoding. The default is"utf8mb4"
.-
record_delimiter
: Defines one or more characters used to delimit records. The maximum field delimiter length is 64 characters.The default for
csv
is"|\n"
.The default for
json
is"\n"
. The only alternative forjson
is"\r\n"
.
-
dialect
parameters that only apply tocsv
:The use of any of these parameters with
avro
,json
orparquet
will produce an error.field_delimiter
: Defines one or more characters used to enclose fields. The maximum field delimiter length is 64 characters. The default is"|"
.escape_character
: Defines one or more characters used to escape special characters. The default is"\\"
.quotation_marks
: Defines one or more characters used to enclose fields. The default is"\""
.-
skip_rows
: The number of rows to skip at the start of the file. The maximum value is20
. The default is0
.See comments for
has_header
. date_format
: The date format, see:date_format
. The default is"auto"
.time_format
: The time format, see: String and Numeric Literals in Date and Time Context. The default is"auto"
.trim_spaces
: Whether to remove leading and trailing spaces, or not. The default isfalse
.-
has_header
: Whether the CSV file has a header row, or not. The default isfalse
.If
has_header
andskip_rows
are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.
-
-
file
: Defines the Object Storage files. Required.Lakehouse supports a maximum of 256 file locations. To define more than 256, store the files under the same bucket or use
prefix
orpattern
.-
file
parameters for resource principals, see: Section 4.3.2, “Resource Principals”.bucket
: The bucket name.namespace
: The tenancy namespaceregion
: The region that the tenancy is in.
-
file
parameters for pre-authenticated requests, see: Section 4.3.1, “Pre-Authenticated Requests”.-
par
: The PAR URL.Do not specify a
region
,namespace
orbucket
withpar
. That information is contained in the PAR URL and will generate an error if defined in separate parameters. See: Section 4.3.1.1, “Recommendations”.
-
-
file
parameters that apply to all file formats:-
Use one or more of the following parameters, unless the target defines a specific file:
name
: A specific Object Storage file name.pattern
: A regular expression that defines a set of Object Storage files. The pattern follows the modified ECMAScript regular expression grammar, see: Modified ECMAScript regular expression grammar.prefix
: The prefix for a set of Object Storage files.
is_strict_mode
: Whether the loading takes place in strict mode,true
, or non-strict mode,false
. This overrides thedialect
parameteris_strict_mode
.-
allow_missing_files
: Whether to allow missing files or not. This overrides thefile
parameteris_strict_mode
for missing files, and thedialect
parameterallow_missing_files
. Supported as of MySQL 8.4.0.If set to
true
, then any missing files do not throw an error, and loading does not stop.If set to
false
, then any missing files throw an error, and loading stops.
-
-
-