Lakehouse uses external tables to load the data from Object Storage.
MySQL 9.0.1-u1 adds support for the following:
timestampe_format
as adialect
parameter allows you to customize the format for columns of theDATETIME
andTIMESTAMP
data types.ENGINE_ATTRIBUTE
option for specific columns. Use this option to customize the format for columns of theDATETIME
andTIMESTAMP
data types. Formats set with this parameter override formats set with thedialect
parameter.
mysql> CREATE TABLE table_name create_definition
(column_name ENGINE_ATTRIBUTE '
{"date_format": "default",
"time_format": "default",
"timestamp_format": "default"
}')
ENGINE=lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE='{
"dialect":
{
"format": "avro" | "csv" | "json" | "parquet",
"check_constraints": true | false,
"field_delimiter": "|",
"record_delimiter": "|\n",
"escape_character": "\\",
"quotation_marks":"\"",
"skip_rows": 0,
"encoding": "utf8mb4",
"date_format": "default",
"time_format": "default",
"timestamp_format": "default",
"trim_spaces": true | false,
"has_header": true | false,
"is_strict_mode": true | false,
"allow_missing_files": true | false
},
"file":
[
file_section [, file_section, file_section, ...]
]
}';
file_section: {
"bucket": "bucket_name",
"namespace": "namespace",
"region": "region",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false,
"allow_missing_files": true | false
}
or
file_section: {
"par": "PAR URL",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false,
"allow_missing_files": true | false
}
MySQL 8.4.0 adds support for the following:
The JSON file format.
Support for primary key and unique key constraint validation.
is_strict_mode
as adialect
parameter now supports all file formats. It can override the globalsql_mode
.allow_missing_files
is adialect
parameter and afile
parameter that can allow missing files or not.
mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
"dialect":
{
"format": "avro" | "csv" | "json" | "parquet",
"check_constraints": true | false,
"field_delimiter": "|",
"record_delimiter": "|\n",
"escape_character": "\\",
"quotation_marks":"\"",
"skip_rows": 0,
"encoding": "utf8mb4",
"date_format": "auto",
"time_format": "auto",
"trim_spaces": true | false,
"has_header": true | false,
"is_strict_mode": true | false,
"allow_missing_files": true | false
},
"file":
[
file_section [, file_section, file_section, ...]
]
}';
file_section: {
"bucket": "bucket_name",
"namespace": "namespace",
"region": "region",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false,
"allow_missing_files": true | false
}
or
file_section: {
"par": "PAR URL",
("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
"is_strict_mode": true | false,
"allow_missing_files": true | false
}
Before MySQL 8.4.0:
mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
"dialect":
{
"format": "avro" | "csv" | "parquet",
"field_delimiter": "|",
"record_delimiter": "|\n",
"escape_character": "\\",
"quotation_marks":"\"",
"skip_rows": 0,
"encoding": "utf8mb4",
"date_format": "auto",
"time_format": "auto",
"trim_spaces": true | false,
"is_strict_mode": true | false,
"has_header": true | false
},
"file":
[
file_section [, file_section, file_section, ...]
]
}';
file_section: {
"bucket": "bucket_name",
"namespace": "namespace",
"region": "region",
("prefix": "prefix") | ("name": "filename") | ("pattern" : "pattern"),
"is_strict_mode": true | false
}
or
file_section: {
"par": "PAR URL",
("prefix": "prefix") | ("name": "filename") | ("pattern" : "pattern"),
"is_strict_mode": true | false
}
ENGINE
: Set tolakehouse
.SECONDARY_ENGINE
: Set torapid
.-
ENGINE_ATTRIBUTE
: JSON object literal. Defines the location of files, the file format, and how the file format is handled.Use key-value pairs in
JSON
format to specifyoptions
. Lakehouse uses the default setting if there is no defined option. UseNULL
to specify no arguments.-
dialect
: Defines the data configuration. Optional.-
dialect
parameters that apply to all file formats:-
format
:avro
,csv
,json
orparquet
. The default iscsv
.It is not possible to define multiple formats per table.
Tables created with
json
format must only have a single column that conforms to theJSON
data type, see: The JSON Data Type.As of MySQL 8.4.0,
json
only supports Newline Delimited JSON files.For
parquet
see: Section 5.6.1, “Parquet Data Type Conversions”. -
check_constraints
: Whether to validate primary key and unique key constraints or not. The default istrue
. Supported as of MySQL 8.4.0.If set to
true
, then Lakehouse validates primary key and unique key constraints.If set to
false
, then Lakehouse does not validate primary key and unique key constraints. -
is_strict_mode
: Whether the loading takes place in strict mode,true
, or non-strict mode,false
. This setting overrides the globalsql_mode
. The default is the value ofsql_mode
. See Strict SQL Mode. Thefile
common parameteris_strict_mode
can override this setting.If set to
true
, then missing files, empty columns, formatting errors or parsing errors throw an error, and loading stops.If set to
false
, then missing files, empty columns, formatting errors or parsing errors display a warning, and loading continues.As of MySQL 8.4.0, the
dialect
parameteris_strict_mode
applies to all file formats. Before MySQL 8.4.0, it only applies to the CSV file format. For Avro and Parquet file formats, use thefile
parameteris_strict_mode
to define strict mode before MySQL 8.4.0. -
allow_missing_files
: Whether to allow missing files or not. This overrides thedialect
parameteris_strict_mode
for missing files. Supported as of MySQL 8.4.0.If set to
true
, then any missing files do not throw an error, and loading does not stop, unless all files are missing.If set to
false
, then any missing files throw an error, and loading stops.A missing file is defined as:
With the
name
parameter: there is no file with that name.With the
pattern
parameter: there are no files that match the pattern.With the
prefix
parameter: there are no files with that prefix.
-
-
dialect
parameters that only apply tocsv
andjson
:The use of any of these parameters with
avro
orparquet
will produce an error.encoding
: Defines the character encoding. The default is"utf8mb4"
.-
record_delimiter
: Defines one or more characters used to delimit records. The maximum field delimiter length is 64 characters.The default for
csv
is"|\n"
.The default for
json
is"\n"
. The only alternative forjson
is"\r\n"
.
-
dialect
parameters that only apply tocsv
:The use of any of these parameters with
avro
,json
orparquet
will produce an error.field_delimiter
: Defines one or more characters used to enclose fields. The maximum field delimiter length is 64 characters. The default is"|"
.escape_character
: Defines one or more characters used to escape special characters. The default is"\\"
.quotation_marks
: Defines one or more characters used to enclose fields. The default is"\""
.-
skip_rows
: The number of rows to skip at the start of the file. The maximum value is20
. The default is0
.See comments for
has_header
. date_format
: The date format, see:date_format
. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.time_format
: The time format, see: String and Numeric Literals in Date and Time Context. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.timestampe_format
: The timestamp format, see:date_format
. Optionally, you can set timestamp formats for each column by using the columnENGINE_ATTRIBUTE
option, which overrides the format in thedialect
parameter. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.trim_spaces
: Whether to remove leading and trailing spaces, or not. The default isfalse
.-
has_header
: Whether the CSV file has a header row, or not. The default isfalse
.If
has_header
andskip_rows
are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.
-
-
file
: Defines the Object Storage files. Required.Lakehouse supports a maximum of 256 file locations. To define more than 256, store the files under the same bucket or use
prefix
orpattern
.-
file
parameters for resource principals, see: Section 5.4.2, “Resource Principals”.bucket
: The bucket name.namespace
: The tenancy namespaceregion
: The region that the tenancy is in.
-
file
parameters for pre-authenticated requests, see: Section 5.4.1, “Pre-Authenticated Requests”.-
par
: The PAR URL.Do not specify a
region
,namespace
orbucket
withpar
. That information is contained in the PAR URL and will generate an error if defined in separate parameters. See: Section 5.4.1.1, “Recommendations”.
-
-
file
parameters that apply to all file formats:-
Use one or more of the following parameters, unless the target defines a specific file:
name
: A specific Object Storage file name.pattern
: A regular expression that defines a set of Object Storage files. The pattern follows the modified ECMAScript regular expression grammar, see: Modified ECMAScript regular expression grammar.prefix
: The prefix for a set of Object Storage files.
See: Section 5.2.2.1, “File Name, Pattern and Prefix Examples”.
is_strict_mode
: Whether the loading takes place in strict mode,true
, or non-strict mode,false
. This overrides thedialect
parameteris_strict_mode
.-
allow_missing_files
: Whether to allow missing files or not. This overrides thefile
parameteris_strict_mode
for missing files, and thedialect
parameterallow_missing_files
. Supported as of MySQL 8.4.0.If set to
true
, then any missing files do not throw an error, and loading does not stop.If set to
false
, then any missing files throw an error, and loading stops.
-
-
-