4.4.2 Lakehouse Auto Parallel Load

For the full Auto Parallel Load syntax, see Section 2.2.3, “Loading Data Using Auto Parallel Load”. HeatWave Lakehouse extends Auto Parallel Load with the external_tables option, a JSON array that contains one or more db_object elements:

db_object: {
    "db_name": "name",
    "tables": JSON_ARRAY(table [, table] ...)
}

table: {
    "table_name": "name",
    "sampling": true|false,
    "file": JSON_ARRAY(file_section [, file_section]...),
    "dialect": {dialect_section}
}

  • db_object: the details of one or more tables. Each db_object contains the following:

    • db_name: name of the database. If the database does not exist, Lakehouse Auto Parallel Load creates it during the load process.

    • tables: a JSON array of table objects. Each table contains the following:

      • table_name: the name of the table to load.

      • sampling: if set to true (the default), Lakehouse Auto Parallel Load infers the schema and collects statistics by sampling the data.

        If set to false, Lakehouse Auto Parallel Load performs a full scan to infer the schema and collect statistics. Depending on the size of the data, this can take a long time.

        Auto Parallel Load uses the inferred schema to generate CREATE TABLE statements. The statistics are used to estimate storage requirements and load times.

      • dialect: details about the file format. See the dialect parameter in Section 4.3.1, “Lakehouse External Table Syntax”.

      • file: the location of the data in Object Storage. Access can use a pre-authenticated request or a resource principal, and the location can be a path to a file, a file prefix, or a file pattern. See the file parameter in Section 4.3.1, “Lakehouse External Table Syntax”, and Section 4.5, “Access Object Storage”.
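
The options described above can be combined as in the following sketch. It is illustrative only: the database name sales_db, the table name orders, and the <PAR_URL> placeholder are hypothetical and must be replaced with real values, and the CSV dialect settings are assumptions about the file being loaded. The call uses dryrun mode, so it should only generate and report the load script rather than load data.

```sql
-- Describe one external table. All names and the PAR URL are placeholders.
SET @ext_tables = '[{
  "db_name": "sales_db",
  "tables": [{
    "table_name": "orders",
    "sampling": true,
    "dialect": {"format": "csv", "field_delimiter": ",", "record_delimiter": "\\n"},
    "file": [{"par": "<PAR_URL>"}]
  }]
}]';

-- Pass the array through the external_tables option.
-- mode = dryrun: generate the load script without loading any data.
SET @options = JSON_OBJECT(
  'mode', 'dryrun',
  'external_tables', CAST(@ext_tables AS JSON)
);

CALL sys.heatwave_load(JSON_ARRAY('sales_db'), @options);
```

After reviewing the generated CREATE TABLE statements and the storage and load-time estimates, setting mode to normal would run the actual load. Setting "sampling" to false in the same structure requests a full scan instead of sampling.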