MySQL :: HeatWave User Guide :: 5.2.2 Lakehouse External Table Syntax

HeatWave User Guide / ... / Lakehouse External Table Syntax

5.2.2 Lakehouse External Table Syntax

5.2.2.1 File Name, Pattern and Prefix Examples

Lakehouse uses external tables to load the data from Object Storage.

MySQL 9.2.0 adds support for the following:

Loading CSV files without knowing the exact values of the following dialect parameters in advance:

record_delimiter: If set to auto, Lakehouse Autopilot can automatically detect the record delimiters and composite record delimiters with field delimiters as prefixes.

The following record delimiters can be detected automatically: \r (carriage return), \n (line feed), \r\n (carriage return and line feed), and ; (semicolon).

Default value of record_delimiter for CSV files is auto.
field_delimiter: If set to auto, Lakehouse Autopilot can automatically detect the field delimiters.

The following field delimiters can be detected automatically: | (pipe), , (comma), \t (tab), and ; (semicolon).

Default value of field_delimiter for CSV files is auto.

The detected delimiters are then used in the generated CREATE TABLE statement.

Press CTRL+C to copy
 
mysql>CREATE TABLE table_name create_definition 
(column_name ENGINE_ATTRIBUTE '
  {"date_format": "default", 
   "time_format": "default",
   "timestamp_format": "default"
  }')
ENGINE=lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE='{
  "dialect": 
  {
    "format": "avro" | "csv" | "json" | "parquet",
    "check_constraints": true | false,
    "field_delimiter": "|" | "," | "\t" | ";" | "auto", 
    "record_delimiter": "\r" | "\n" | "\r\n" | "auto", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "default", 
    "time_format": "default",
    "timestamp_format": "default",
    "trim_spaces": true | false,
    "has_header": true | false,
    "is_strict_mode": true | false,
    "allow_missing_files": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

or

file_section: {
  "par": "PAR URL",
  ["prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),]
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

MySQL 9.0.1-u1 adds support for the following:

timestamp_format as a dialect parameter allows you to customize the format for columns of the DATETIME and TIMESTAMP data types.
ENGINE_ATTRIBUTE option for specific columns. Use this option to customize the format for columns of the DATETIME and TIMESTAMP data types. Formats set with this parameter override formats set with the dialect parameter.

Press CTRL+C to copy
 
mysql>CREATE TABLE table_name create_definition 
(column_name ENGINE_ATTRIBUTE '
  {"date_format": "default", 
   "time_format": "default",
   "timestamp_format": "default"
  }')
ENGINE=lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE='{
  "dialect": 
  {
    "format": "avro" | "csv" | "json" | "parquet",
    "check_constraints": true | false,
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "default", 
    "time_format": "default",
    "timestamp_format": "default",
    "trim_spaces": true | false,
    "has_header": true | false,
    "is_strict_mode": true | false,
    "allow_missing_files": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

or

file_section: {
  "par": "PAR URL",
  ["prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),]
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

MySQL 8.4.0 adds support for the following:

The JSON file format.
Support for primary key and unique key constraint validation.
is_strict_mode as a dialect parameter now supports all file formats. It can override the global sql_mode.
allow_missing_files is a dialect parameter and a file parameter that can allow missing files or not.

Press CTRL+C to copy
 
mysql>CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid 
ENGINE_ATTRIBUTE='{                    
  "dialect": 
  {
    "format": "avro" | "csv" | "json" | "parquet",
    "check_constraints": true | false,
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "auto", 
    "time_format": "auto", 
    "trim_spaces": true | false,
    "has_header": true | false,
    "is_strict_mode": true | false,
    "allow_missing_files": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

or

file_section: {
  "par": "PAR URL",
  ["prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),]
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

Before MySQL 8.4.0:

Press CTRL+C to copy
 
mysql>CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid 
ENGINE_ATTRIBUTE='{                    
  "dialect": 
  {
    "format": "avro" | "csv" | "parquet",
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "auto", 
    "time_format": "auto", 
    "trim_spaces": true | false,
    "is_strict_mode": true | false,
    "has_header": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename") | ("pattern" : "pattern"),
  "is_strict_mode": true | false
}

or

file_section: {
  "par": "PAR URL",
  ["prefix": "prefix") | ("name": "filename") | ("pattern" : "pattern"),]
  "is_strict_mode": true | false
}

ENGINE: Set to lakehouse.
SECONDARY_ENGINE: Set to rapid.
ENGINE_ATTRIBUTE: JSON object literal. Defines the location of files, the file format, and how the file format is handled.

Use key-value pairs in JSON format to specify options. Lakehouse uses the default setting if there is no defined option. Use NULL to specify no arguments.
- dialect: Defines the data configuration. Optional.
  - dialect parameters that apply to all file formats:
    
    format: avro, csv, json or parquet. The default is csv.
    
    It is not possible to define multiple formats per table.
    
    Tables created with json format must only have a single column that conforms to the JSON data type, see: The JSON Data Type.
    
    As of MySQL 8.4.0, json only supports Newline Delimited JSON files.
    
    For parquet see: Section 5.7.1, “Parquet Data Type Conversions”.
    
    check_constraints: Whether to validate primary key and unique key constraints or not. The default is true. Supported as of MySQL 8.4.0.
    
    If set to true, then Lakehouse validates primary key and unique key constraints.
    
    If set to false, then Lakehouse does not validate primary key and unique key constraints.
    
    is_strict_mode: Whether the loading takes place in strict mode, true, or non-strict mode, false. This setting overrides the global sql_mode. The default is the value of sql_mode. See Strict SQL Mode. The file common parameter is_strict_mode can override this setting.
    
    If set to true, then missing files, empty columns, formatting errors or parsing errors throw an error, and loading stops.
    
    If set to false, then missing files, empty columns, formatting errors or parsing errors display a warning, and loading continues.
    
    As of MySQL 8.4.0, the dialect parameter is_strict_mode applies to all file formats. Before MySQL 8.4.0, it only applies to the CSV file format. For Avro and Parquet file formats, use the file parameter is_strict_mode to define strict mode before MySQL 8.4.0.
    
    allow_missing_files: Whether to allow missing files or not. This overrides the dialect parameter is_strict_mode for missing files. Supported as of MySQL 8.4.0.
    
    If set to true, then any missing files do not throw an error, and loading does not stop, unless all files are missing.
    
    If set to false, then any missing files throw an error, and loading stops.
    
    A missing file is defined as:
    
    With the name parameter: there is no file with that name.
    
    With the pattern parameter: there are no files that match the pattern.
    
    With the prefix parameter: there are no files with that prefix.
  - dialect parameters that only apply to csv and json:
    
    The use of any of these parameters with avro or parquet will produce an error.
    
    encoding: Defines the character encoding. The default is "utf8mb4".
    
    record_delimiter: Defines one or more characters used to delimit records. The maximum field delimiter length is 64 characters.
    
    The default record delimiter for csv is "auto", and the default for json is "\n". The only alternative for json is "\r\n".
    
    As of MySQL 9.2.0, Lakehouse Autopilot can automatically detect the record delimiters and composite record delimiters with field delimiters as prefixes. You can load the files, without knowing the exact dialect parameters.
    
    The record delimiters detected automatically are: \r (carriage return), \n (line feed), \r\n (carriage return and line feed), and ; (semicolon).
  - dialect parameters that only apply to csv:
    
    The use of any of these parameters with avro, json or parquet will produce an error.
    
    field_delimiter: Defines one or more characters used to enclose fields. The maximum field delimiter length is 64 characters. The default is "auto".
    
    As of MySQL 9.2.0, Lakehouse Autopilot can automatically detect the field delimiters. You can load the files, without knowing the exact dialect parameters.
    
    The field delimiters detected automatically are: | (pipe), , (comma), \t (tab), and ; (semicolon).
    
    escape_character: Defines one or more characters used to escape special characters. The default is "\\".
    
    escape_character: Defines one or more characters used to escape special characters. The default is "\\".
    
    quotation_marks: Defines one or more characters used to enclose fields. The default is "\'".
    
    skip_rows: The number of rows to skip at the start of the file. The maximum value is 20. The default is 0.
    
    See comments for has_header.
    
    date_format: The date format, see: date_format. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.
    
    time_format: The time format, see: String and Numeric Literals in Date and Time Context. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.
    
    timestamp_format: The timestamp format, see: date_format. Optionally, you can set timestamp formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”. Supported as of MySQL 9.0.1.
    
    trim_spaces: Whether to remove leading and trailing spaces, or not. The default is false.
    
    has_header: Whether the CSV file has a header row, or not. The default is false.
    
    If has_header and skip_rows are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.
- file: Defines the Object Storage files. Required.
  
  Lakehouse supports a maximum of 256 file locations. To define more than 256, store the files under the same bucket or use prefix or pattern.
  - file parameters for resource principals, see: Section 5.5.2, “Resource Principals”.
    
    bucket: The bucket name.
    
    namespace: The tenancy namespace
    
    region: The region that the tenancy is in.
  - file parameters for pre-authenticated requests, see: Section 5.5.1, “Pre-Authenticated Requests”.
    
    par: The PAR URL.
    
    Do not specify a region, namespace or bucket with par. That information is contained in the PAR URL and will generate an error if defined in separate parameters. See: Section 5.5.1.1, “Recommendations”.
  - file parameters that apply to all file formats:
    
    Use one or more of the following parameters, unless the target defines a specific file:
    
    name: A specific Object Storage file name. For Section 5.5.1, “Pre-Authenticated Requests”, the file name provided in the PAR URL is considered.
    
    pattern: A regular expression that defines a set of Object Storage files. The pattern follows the modified ECMAScript regular expression grammar, see: Modified ECMAScript regular expression grammar.
    
    prefix: The prefix for a set of Object Storage files. For Section 5.5.1, “Pre-Authenticated Requests”, prefix or bucket name present in the PAR URL is considered. For more information, Section 5.5.1.2, “Pre-Authenticated Request Examples”.
    
    See: Section 5.2.2.1, “File Name, Pattern and Prefix Examples”.
    
    is_strict_mode: Whether the loading takes place in strict mode, true, or non-strict mode, false. This overrides the dialect parameter is_strict_mode.
    
    allow_missing_files: Whether to allow missing files or not. This overrides the file parameter is_strict_mode for missing files, and the dialect parameter allow_missing_files. Supported as of MySQL 8.4.0.
    
    If set to true, then any missing files do not throw an error, and loading does not stop.
    
    If set to false, then any missing files throw an error, and loading stops.

PREV HOME UP NEXT