HeatWave User Guide


5.2.2.1 External Table Syntax

External tables use the table-level and column-level ENGINE_ATTRIBUTE options of the CREATE TABLE statement to specify the parameters needed to process data stored externally in Object Storage. The column-level ENGINE_ATTRIBUTE overrides specific dialect parameters, if required.

You can specify these ENGINE_ATTRIBUTE options as follows:

CREATE TABLE table_name 
(col_name data_type [ENGINE_ATTRIBUTE='Col_Engine_Attribute_JSON'], ...)
ENGINE=lakehouse 
SECONDARY_ENGINE=rapid 
ENGINE_ATTRIBUTE='Tbl_Engine_Attribute_JSON';
External Table ENGINE_ATTRIBUTE Options

The external table ENGINE_ATTRIBUTE is a JSON object that consists of the file parameters and, optionally, the dialect parameters. They are represented as follows:

Tbl_Engine_Attribute_JSON: {
  begin-object
    "file": file_JSON_array
    [, "dialect": dialect_JSON_object ]
  end-object
}

file_JSON_array: {
  begin-array file_def [, file_def] … end-array
}    

file_def: file_RP_JSON | file_PAR_JSON

file_RP_JSON: {
  "bucket": "bucket_name",
  "namespace": "namespace_name",
  "region": "region_name",
  {"prefix": "prefix_value" | "name": "filename" | "pattern": "pattern_value"},
  "is_strict_mode": {true | false},
  "allow_missing_files": {true | false}
}

file_PAR_JSON: {
  "par": "PAR URL",
  [{"prefix": "prefix_value" | "name": "filename" | "pattern": "pattern_value"},]
  "is_strict_mode": {true | false},
  "allow_missing_files": {true | false}
}

dialect_JSON_object: begin-object dialect_option_key_value [, dialect_option_key_value]... end-object
 
dialect_option_key_value: {
    "format": {"avro" | "csv" | "json" | "parquet"}  
  | "check_constraints": {true | false}
  | "field_delimiter": {"|" | "," | "\t" | ";" | "auto" | "custom_field_delimiter"} 
  | "record_delimiter": {"\r" | "\n" | "\r\n" | "auto" | "custom_record_delimiter"} 
  | "escape_character": {"\\" | "custom_escape_character"} 
  | "quotation_marks": {"\"" | "custom_quotation_marks"}  
  | "encoding": "utf8mb4"
  | "date_format": "custom_date_format" 
  | "time_format": "custom_time_format"
  | "timestamp_format": "custom_timestamp_format"
  | "trim_spaces": {true | false}
  | "skip_rows": Integer_20
  | "has_header": {true | false}
  | "is_strict_mode": {true | false}
} 
Integer_20: Represents an integer value between 0 and 20.
Note
  • When you specify ENGINE_ATTRIBUTE parameters in JSON format, you must escape certain characters using \ (backslash). For example, to specify linefeed as a record delimiter, you need to specify it as "\\n" in the ENGINE_ATTRIBUTE parameter’s JSON value.

  • In the syntax description, square brackets ("[" and "]") indicate optional words or clauses.

  • The syntax description uses a vertical bar ("|") to separate alternatives. A list of alternatives is enclosed in square brackets ("[" and "]") when one member may be chosen, and in curly braces ("{" and "}") when one member must be chosen.

  • The terms begin-array and end-array in the syntax represent a JSON array. While writing the sample, replace them with [].

  • The terms begin-object and end-object in the syntax represent a JSON object. While writing the sample, replace them with {}.
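
The grammar above can be combined into a complete statement as follows. This is an illustrative sketch: the table, bucket, namespace, and region values are placeholders, not real resources.

CREATE TABLE orders (
  order_id INT,
  order_date DATE,
  amount DECIMAL(10,2)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "my-bucket",
            "namespace": "my-namespace",
            "region": "us-ashburn-1",
            "prefix": "orders/"}],
  "dialect": {"format": "csv",
              "field_delimiter": ",",
              "record_delimiter": "\\n"}
}';

Note that the linefeed record delimiter is written as "\\n", following the escaping rule described in the note above.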

Column Definitions

Some dialect parameters can be overridden at the column level by specifying column-level ENGINE_ATTRIBUTE parameters. You can define date_format, time_format, and timestamp_format as optional column-level ENGINE_ATTRIBUTE parameters. For more information, refer to dialect parameters.

Col_Engine_Attribute_JSON: begin-object column_option_key_value [, column_option_key_value]... end-object
 
column_option_key_value: {
    "date_format": "custom_date_format"  
  | "time_format": "custom_time_format"
  | "timestamp_format": "custom_timestamp_format"
}

For more details about column_definition, refer to CREATE TABLE statement.
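
For example, a column-level ENGINE_ATTRIBUTE can override the dialect-level date format for one column. This is a sketch; the table name, object names, and format strings are illustrative placeholders.

CREATE TABLE shipments (
  shipment_id INT,
  shipped_on DATE ENGINE_ATTRIBUTE='{"date_format": "%d/%m/%Y"}'
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "my-bucket",
            "namespace": "my-namespace",
            "region": "us-ashburn-1",
            "prefix": "shipments/"}],
  "dialect": {"format": "csv", "date_format": "%Y-%m-%d"}
}';

Here the shipped_on column is parsed with the column-level format, while any other date columns use the dialect-level format.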

File Options

The mandatory parameters required to create an external table are:

  • ENGINE: Set to lakehouse.

  • SECONDARY_ENGINE: Set to rapid.

  • ENGINE_ATTRIBUTE: JSON object literal. Defines the location of files, the file format, and how the file format is handled.

    Use key-value pairs in JSON format to specify options. Lakehouse uses the default setting if there is no defined option. Use NULL to specify no arguments.

  • file: Defines the Object Storage files. You can define the file locations using resource principal or PAR URL.

    Lakehouse supports a maximum of 256 file locations. To define more than 256 files, store the files under the same bucket or use prefix or pattern.

    • file parameters for resource principals:

      • bucket: The bucket name.

      • namespace: The tenancy namespace.

      • region: The region that the tenancy is in.

        In MySQL version 9.2.1 or later, if the region is not specified, the region linked to the instance is set as the tenancy region.

    • file parameters for pre-authenticated requests:

      • par: The PAR URL.

        Do not specify a region, namespace or bucket with par. These values are included in the PAR URL and will generate an error if defined as separate parameters. For more information, refer to pre-authenticated request recommendations.

  • The following file parameters apply to all file formats and to both resource principal and PAR file definitions:

  • is_strict_mode: Loads data in strict mode or non-strict mode, based on the selected option. This overrides the is_strict_mode dialect parameter. The supported options are:

    • true: Loads the data in strict mode. Missing files, empty columns, formatting errors, or parsing errors throw an error, and loading stops.

    • false: Loads the data in non-strict mode. Missing files, empty columns, formatting errors, or parsing errors generate a warning, and the data is loaded.

  • allow_missing_files: Determines whether data loading continues when files are missing. This overrides the is_strict_mode file parameter for missing files, and the allow_missing_files dialect parameter. The supported options are:

    • true: If a file is missing, no error is thrown and data loading continues with the available files, unless none of the files are available.

    • false: If any file is missing, an error is thrown and the data is not loaded.
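
As a sketch of the file parameters, the following definition reads CSV files through a PAR URL and tolerates missing files. The PAR URL is a placeholder, not a working link.

CREATE TABLE clicks (
  user_id INT,
  url VARCHAR(2048)
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/TOKEN/n/my-namespace/b/my-bucket/o/",
            "prefix": "clicks/",
            "allow_missing_files": true}],
  "dialect": {"format": "csv"}
}';

With allow_missing_files set to true, loading continues with whatever files match the prefix, and fails only if none are available.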

Dialect Parameters

The dialect parameters are optional and are used to define the data configuration.

The following dialect parameters are applicable to all file formats (CSV, JSON, Parquet and Avro).

  • format: The file format defined in a table. You can define only one file format per table. The supported file formats are:

    • CSV: The default file format.

    • JSON: Supported as of MySQL 8.4.0. Only Newline Delimited JSON files are supported. Tables created with the json format must have only a single column, which must conform to the JSON data type.

    • Parquet: Parquet file format.

    • Avro

  • check_constraints: Supported as of MySQL 8.4.0. Lakehouse validates primary key and unique key constraints during the initial load based on the selected option. The supported options are:

    • true: The default value. Lakehouse validates primary key and unique key constraints only during the initial load of the table. If there are subsequent loads or refreshes of the table, validation does not occur.

    • false: Lakehouse does not validate primary key and unique key constraints.

  • is_strict_mode: Loads data in strict mode or non-strict mode, based on the selected option. This setting overrides the global sql_mode. By default, is_strict_mode is set to the sql_mode value. See Strict SQL Mode. The file common parameter is_strict_mode can override this setting. The supported options are:

    • true: Loads the data in strict mode. Missing files, empty columns, formatting errors, or parsing errors throw an error, and loading stops.

      As of MySQL 8.4.0, the dialect parameter is_strict_mode applies to all file formats. Before MySQL 8.4.0, it only applies to the CSV file format. For Avro and Parquet file formats, use the file parameter is_strict_mode to define strict mode before MySQL 8.4.0.

    • false: Loads the data in non-strict mode. Missing files, empty columns, formatting errors, or parsing errors generate a warning, and the data is loaded.

    • If no value is set, the setting is defined by the sql_mode.

  • allow_missing_files: Supported as of MySQL 8.4.0. Ignores the missing files and completes the data loading based on the selected option. This overrides the dialect parameter is_strict_mode for missing files. The supported options are:

    • true: No error is thrown for missing files, and data loading continues with the available files. An error is thrown only if all of the files are missing.

    • false: If any file is missing, an error is thrown and the data is not loaded.

    • If no value is set, the setting is defined by is_strict_mode. If no value is set for is_strict_mode, the setting is defined by the sql_mode.

    A missing file is defined as:

    • With the name parameter: There is no file with that name.

    • With the pattern parameter: There are no files that match the pattern.

    • With the prefix parameter: There are no files with that prefix.

Refer to the following table to review how different combinations of settings for allow_missing_files and is_strict_mode affect errors for missing files.

Table 5.1 Combinations of Settings for allow_missing_files and is_strict_mode

allow_missing_files   is_strict_mode   Description of Possible Errors
-------------------   --------------   -----------------------------------------------------
true                  true             No error generated unless all files are missing.
false                 false            Error generated for any missing files.
true                  false            No error generated unless all files are missing.
false                 true             Error generated for any missing files.
not set               true             Error generated for any missing files.
not set               false            No error generated unless all files are missing.
true                  not set          No error generated unless all files are missing.
false                 not set          Error generated for any missing files.
not set               not set          Error generation depends on the setting for sql_mode.

The following dialect parameters are applicable to the CSV and JSON file formats.

Note

If these parameters are used with Avro or Parquet file formats, an error will be generated.

  • encoding: Defines the character encoding. The default is "utf8mb4".

  • record_delimiter: Defines one or more characters used to delimit records. The maximum record delimiter length is 64 characters. You can add any string as a custom_record_delimiter.

    The default record delimiter for json is "\n". The only alternative for json is "\r\n".

    For CSV files, if you set record_delimiter to auto, Lakehouse Autopilot automatically detects the record delimiter, including composite record delimiters with field delimiters as prefixes. auto is supported as of MySQL 9.2.0 and is the default value.

    The following record delimiters are detected automatically by Lakehouse Autopilot:

    • \r: Carriage return.

    • \n: Line feed. This is the default for all MySQL versions prior to 9.2.0.

    • \r\n: Carriage return and line feed.
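
As an example of the json format, the following sketch maps Newline Delimited JSON files to the single JSON column the format requires; the object names are placeholders.

CREATE TABLE log_events (
  doc JSON
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "my-bucket",
            "namespace": "my-namespace",
            "region": "us-ashburn-1",
            "prefix": "logs/"}],
  "dialect": {"format": "json", "record_delimiter": "\\n"}
}';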

The following dialect parameters are applicable only to CSV file formats.

  • field_delimiter: Defines one or more characters used to separate fields. The maximum field delimiter length is 64 characters. You can add any string as a custom_field_delimiter.

    When field_delimiter is set to auto, Lakehouse Autopilot automatically detects the field delimiter. auto is supported as of MySQL 9.2.0 and is the default value.

    The following field delimiters are detected automatically by Lakehouse Autopilot:

    • |: Pipe. Default for all MySQL versions prior to 9.2.0.

    • ,: Comma

    • \t: Tab

    • ;: Semicolon

  • escape_character: Defines one or more characters used to escape special characters. The default is "\\". You can add any string as a custom_escape_character.

  • quotation_marks: Defines one or more characters used to enclose fields. The default is "\"". You can add any string as a custom_quotation_marks.

  • date_format: The date format, see: date_format. You can also set date formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored while loading data in auto parallel load.

  • time_format: The time format, see: String and Numeric Literals in Date and Time Context. You can also set time formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored while loading data in auto parallel load.

  • timestamp_format: Supported as of MySQL 9.0.1. The timestamp format, see: date_format. You can also set timestamp formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored while loading data in auto parallel load.

  • trim_spaces: Removes or retains leading and trailing spaces, based on the selected option. The default value is false.

  • skip_rows: The number of rows to skip at the start of the file. The default value is 0 and the maximum value is 20.

  • has_header: Indicates whether the CSV file contains a header row, based on the selected option. The default is false.

    If has_header and skip_rows are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.
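
The interaction of the CSV-only parameters can be sketched as follows: Lakehouse skips the first two rows of each file, treats the next row as the header, and trims spaces around semicolon-delimited fields. The object names are placeholders.

CREATE TABLE survey_results (
  respondent_id INT,
  answer VARCHAR(255),
  submitted DATE
)
ENGINE=lakehouse
SECONDARY_ENGINE=rapid
ENGINE_ATTRIBUTE='{
  "file": [{"bucket": "my-bucket",
            "namespace": "my-namespace",
            "region": "us-ashburn-1",
            "prefix": "survey/"}],
  "dialect": {"format": "csv",
              "field_delimiter": ";",
              "has_header": true,
              "skip_rows": 2,
              "trim_spaces": true}
}';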