MySQL HeatWave User Guide  /  ...  /  Lakehouse External Table Syntax

4.3.1 Lakehouse External Table Syntax

External tables are used by the load procedure to load the data from Object Storage.

mysql> CREATE TABLE table_name create_definition ENGINE=lakehouse SECONDARY_ENGINE=rapid 
ENGINE_ATTRIBUTE='{                    
  "dialect": 
  {
    "format": "avro" | "csv" | "parquet",
    "field_delimiter": "|", 
    "record_delimiter": "|\n", 
    "escape_character": "\\", 
    "quotation_marks":"\"", 
    "skip_rows": 0,  
    "encoding": "utf8mb4", 
    "date_format": "auto", 
    "time_format": "auto", 
    "trim_spaces": false,
    "is_strict_mode": true | false,
    "has_header": false
  },
  "file":
  [
    file_section [, file_section, file_section, ...]
  ]
}';

file_section: {
  "bucket": "bucket_name",
  "namespace": "namespace",
  "region": "region",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false
}

or

file_section: {
  "par": "PAR URL",
  ("prefix": "prefix") | ("name": "filename")| ("pattern" : "pattern"),
  "is_strict_mode": true | false
}
  • ENGINE: Set to lakehouse.

  • SECONDARY_ENGINE: Set to rapid.

  • ENGINE_ATTRIBUTE: JSON object. Defines the location of files, the file format, and how the file format is handled.

    • dialect: Defines the data configuration. Optional.

      • dialect common parameters:

        • format: avro, csv or parquet. The default is csv.

          It is not possible to define multiple formats per table.

          For Avro and Parquet file formats, use the file common parameter is_strict_mode to define strict mode.

      • dialect parameters that apply to csv only.

        The use of any of these parameters with avro or parquet will produce an error.

        • field_delimiter: Default "|". Defines one or more characters used to enclose fields. Maximum field delimiter length is 64 characters.

        • record_delimiter: Default "|\n". Defines one or more characters used to delimit records. Maximum field delimiter length is 64 characters.

        • escape_character: Default "\\". Defines one or more characters used to escape special characters.

        • quotation_marks: Default "\"". Defines one or more characters used to enclose fields.

        • skip_rows: Default 0. Defines the number of rows to skip at the start of the file. The maximum value is 20.

          See comments for has_header.

        • encoding: Default "utf8mb4". Defines the character encoding.

        • date_format: Default "auto". See date_format for information on the supported formats.

        • time_format: Default "auto". For more information, see String and Numeric Literals in Date and Time Context.

        • trim_spaces: true | false, default. If true, leading and trailing spaces are removed.

        • is_strict_mode: whether the loading will take place in strict mode, true, or non-strict mode, false. See the file common parameter is_strict_mode.

        • has_header: Default false. Defines whether the CSV file has a header row.

          If has_header and skip_rows are both defined, HeatWave first skips the number of rows, and then uses the next row as the header row.

    • file: Defines the Object Storage files. Required.

      HeatWave supports a maximum of 256 file locations. To define more than 256, store the files under the same bucket or use prefix or pattern.

      • file parameters for resource principals, see: Section 4.5.2, “Resource Principals”.

        • bucket: The bucket name.

        • namespace: The tenancy namespace

        • region: The region that the tenancy is in.

      • file parameters for pre-authenticated requests, see: Section 4.5.1, “Pre-Authenticated Requests”.

        • par: The PAR URL.

          If the PAR is defined on a bucket, use Enable Object Listing when creating the PAR in the console.

          Do not specify a region, namespace or bucket with par. That information is contained in the PAR URL and will generate an error if defined in separate parameters.

      • file common parameters.

        • Use one or more of the following parameters, required.

          • prefix: The prefix for a set of Object Storage files.

          • name: A specific Object Storage file name.

          • pattern: A regular expression that defines a set of Object Storage files. The pattern follows the modified ECMAScript regular expression grammar, see: Modified ECMAScript regular expression grammar

        • is_strict_mode: Whether the loading will take place in strict mode, true, or non-strict mode, false, by overriding global sql_mode. This parameter overrides is_strict_mode defined in the dialect array for CSV files. The default is the value of sql_mode. See Strict SQL Mode.

          If set to true, an error is thrown, and loading stops, if parsing or formatting errors are found in the file, and for empty columns.

          If set to false, a warning is displayed, but loading does not stop.