HeatWave User Guide  /  ...  /  Lakehouse External Table Syntax

5.2.2 Lakehouse External Table Syntax

Lakehouse uses external tables to load the data from Object Storage.

MySQL 9.0.1-u1 adds support for the following:

  • timestampe_format as a dialect parameter allows you to customize the format for columns of the DATETIME and TIMESTAMP data types.

  • ENGINE_ATTRIBUTE option for specific columns. Use this option to customize the format for columns of the DATETIME and TIMESTAMP data types. Formats set with this parameter override formats set with the dialect parameter.

mysql> CREATE TABLE table_name create_definition 
(column_name ENGINE_ATTRIBUTE '
  {"date_format": "default", 
   "time_format": "default",
   "timestamp_format": "default"
  }')
ENGINE=lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE='{
  "dialect": 
  {
    "format": "avro" | "csv" | "json" | "parquet",
    "check_constraints": true | false,
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "default", 
    "time_format": "default",
    "timestamp_format": "default",
    "trim_spaces": true | false,
    "has_header": true | false,
    "is_strict_mode": true | false,
    "allow_missing_files": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

or

file_section: {
  "par": "PAR URL",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

MySQL 8.4.0 adds support for the following:

  • The JSON file format.

  • Support for primary key and unique key constraint validation.

  • is_strict_mode as a dialect parameter now supports all file formats. It can override the global sql_mode.

  • allow_missing_files is a dialect parameter and a file parameter that can allow missing files or not.

mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid 
ENGINE_ATTRIBUTE='{                    
  "dialect": 
  {
    "format": "avro" | "csv" | "json" | "parquet",
    "check_constraints": true | false,
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "auto", 
    "time_format": "auto", 
    "trim_spaces": true | false,
    "has_header": true | false,
    "is_strict_mode": true | false,
    "allow_missing_files": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

or

file_section: {
  "par": "PAR URL",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false,
  "allow_missing_files": true | false
}

Before MySQL 8.4.0:

mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid 
ENGINE_ATTRIBUTE='{                    
  "dialect": 
  {
    "format": "avro" | "csv" | "parquet",
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "auto", 
    "time_format": "auto", 
    "trim_spaces": true | false,
    "is_strict_mode": true | false,
    "has_header": true | false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename") | ("pattern" : "pattern"),
  "is_strict_mode": true | false
}

or

file_section: {
  "par": "PAR URL",
  ("prefix": "prefix") | ("name": "filename") | ("pattern" : "pattern"),
  "is_strict_mode": true | false
}
  • ENGINE: Set to lakehouse.

  • SECONDARY_ENGINE: Set to rapid.

  • ENGINE_ATTRIBUTE: JSON object literal. Defines the location of files, the file format, and how the file format is handled.

    Use key-value pairs in JSON format to specify options. Lakehouse uses the default setting if there is no defined option. Use NULL to specify no arguments.

    • dialect: Defines the data configuration. Optional.

      • dialect parameters that apply to all file formats:

        • format: avro, csv, json or parquet. The default is csv.

          It is not possible to define multiple formats per table.

          Tables created with json format must only have a single column that conforms to the JSON data type, see: The JSON Data Type.

          As of MySQL 8.4.0, json only supports Newline Delimited JSON files.

          For parquet see: Section 5.6.1, “Parquet Data Type Conversions”.

        • check_constraints: Whether to validate primary key and unique key constraints or not. The default is true. Supported as of MySQL 8.4.0.

          If set to true, then Lakehouse validates primary key and unique key constraints.

          If set to false, then Lakehouse does not validate primary key and unique key constraints.

        • is_strict_mode: Whether the loading takes place in strict mode, true, or non-strict mode, false. This setting overrides the global sql_mode. The default is the value of sql_mode. See Strict SQL Mode. The file common parameter is_strict_mode can override this setting.

          If set to true, then missing files, empty columns, formatting errors or parsing errors throw an error, and loading stops.

          If set to false, then missing files, empty columns, formatting errors or parsing errors display a warning, and loading continues.

          As of MySQL 8.4.0, the dialect parameter is_strict_mode applies to all file formats. Before MySQL 8.4.0, it only applies to the CSV file format. For Avro and Parquet file formats, use the file parameter is_strict_mode to define strict mode before MySQL 8.4.0.

        • allow_missing_files: Whether to allow missing files or not. This overrides the dialect parameter is_strict_mode for missing files. Supported as of MySQL 8.4.0.

          If set to true, then any missing files do not throw an error, and loading does not stop, unless all files are missing.

          If set to false, then any missing files throw an error, and loading stops.

          A missing file is defined as:

          • With the name parameter: there is no file with that name.

          • With the pattern parameter: there are no files that match the pattern.

          • With the prefix parameter: there are no files with that prefix.

      • dialect parameters that only apply to csv and json:

        The use of any of these parameters with avro or parquet will produce an error.

        • encoding: Defines the character encoding. The default is "utf8mb4".

        • record_delimiter: Defines one or more characters used to delimit records. The maximum field delimiter length is 64 characters.

          The default for csv is "|\n".

          The default for json is "\n". The only alternative for json is "\r\n".

      • dialect parameters that only apply to csv:

        The use of any of these parameters with avro, json or parquet will produce an error.

        • field_delimiter: Defines one or more characters used to enclose fields. The maximum field delimiter length is 64 characters. The default is "|".

        • escape_character: Defines one or more characters used to escape special characters. The default is "\\".

        • quotation_marks: Defines one or more characters used to enclose fields. The default is "\"".

        • skip_rows: The number of rows to skip at the start of the file. The maximum value is 20. The default is 0.

          See comments for has_header.

        • date_format: The date format, see: date_format. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.

        • time_format: The time format, see: String and Numeric Literals in Date and Time Context. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.

        • timestampe_format: The timestamp format, see: date_format. Optionally, you can set timestamp formats for each column by using the column ENGINE_ATTRIBUTE option, which overrides the format in the dialect parameter. This format is ignored during auto parallel load. See Section 5.2.4, “Loading Data Using Auto Parallel Load”.

        • trim_spaces: Whether to remove leading and trailing spaces, or not. The default is false.

        • has_header: Whether the CSV file has a header row, or not. The default is false.

          If has_header and skip_rows are both defined, Lakehouse first skips the number of rows, and then uses the next row as the header row.

    • file: Defines the Object Storage files. Required.

      Lakehouse supports a maximum of 256 file locations. To define more than 256, store the files under the same bucket or use prefix or pattern.

      • file parameters for resource principals, see: Section 5.4.2, “Resource Principals”.

        • bucket: The bucket name.

        • namespace: The tenancy namespace

        • region: The region that the tenancy is in.

      • file parameters for pre-authenticated requests, see: Section 5.4.1, “Pre-Authenticated Requests”.

        • par: The PAR URL.

          Do not specify a region, namespace or bucket with par. That information is contained in the PAR URL and will generate an error if defined in separate parameters. See: Section 5.4.1.1, “Recommendations”.

      • file parameters that apply to all file formats:

        • Use one or more of the following parameters, unless the target defines a specific file:

          • name: A specific Object Storage file name.

          • pattern: A regular expression that defines a set of Object Storage files. The pattern follows the modified ECMAScript regular expression grammar, see: Modified ECMAScript regular expression grammar.

          • prefix: The prefix for a set of Object Storage files.

          See: Section 5.2.2.1, “File Name, Pattern and Prefix Examples”.

        • is_strict_mode: Whether the loading takes place in strict mode, true, or non-strict mode, false. This overrides the dialect parameter is_strict_mode.

        • allow_missing_files: Whether to allow missing files or not. This overrides the file parameter is_strict_mode for missing files, and the dialect parameter allow_missing_files. Supported as of MySQL 8.4.0.

          If set to true, then any missing files do not throw an error, and loading does not stop.

          If set to false, then any missing files throw an error, and loading stops.