4.4.1 Lakehouse Auto Parallel Load Schema Inference

Lakehouse Auto Parallel Load includes schema inference, and uses it in one of two ways:

  • Lakehouse Auto Parallel Load analyzes the data, infers the table structure, and creates the database and all tables. This requires only the name of the database, the name of each table, and the external file parameters. Lakehouse Auto Parallel Load then generates the CREATE DATABASE and CREATE TABLE statements. For an example, see Section 4.4.3.1, “Load Configuration”.

    Lakehouse Auto Parallel Load uses header information from the external files to define the column names. If this is not available, Lakehouse Auto Parallel Load defines the column names sequentially: col_1, col_2, col_3 ...

  • As of MySQL 8.3.0, if the tables are already defined, Lakehouse Auto Parallel Load analyzes the data, infers the table structure, and then modifies the structure to avoid errors during data load. For example, if a table defines a column as TINYINT, but Lakehouse Auto Parallel Load infers that the data requires SMALLINT, MEDIUMINT, INT, or BIGINT, then Lakehouse Auto Parallel Load modifies the structure accordingly. If the inferred data type is incompatible with the table definition, Lakehouse Auto Parallel Load raises an error, and specifies the column as NOT SECONDARY.
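
The two behaviors described above, sequential column names and widening of integer types, can be illustrated with a minimal Python sketch. This is not HeatWave code and does not reflect how Lakehouse Auto Parallel Load is implemented. The function names (default_column_names, infer_int_type, reconcile) are hypothetical, and only signed integer types are considered. The value ranges are the standard MySQL signed integer ranges. The ValueError raised for a non-integer column is only a stand-in for the incompatible-type case, and marking the column as NOT SECONDARY is not modeled.

```python
# Illustrative sketch only; not part of HeatWave or any MySQL API.
# Signed MySQL integer types, narrowest first, with their value ranges.
INT_TYPES = [
    ("TINYINT", -2**7, 2**7 - 1),
    ("SMALLINT", -2**15, 2**15 - 1),
    ("MEDIUMINT", -2**23, 2**23 - 1),
    ("INT", -2**31, 2**31 - 1),
    ("BIGINT", -2**63, 2**63 - 1),
]


def default_column_names(n_columns):
    """Fallback names when no header is available: col_1, col_2, ..."""
    return [f"col_{i}" for i in range(1, n_columns + 1)]


def infer_int_type(values):
    """Return the narrowest signed integer type that holds every value."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values do not fit in a signed BIGINT")


def reconcile(defined_type, values):
    """Keep the defined type if it is wide enough; otherwise widen it."""
    order = [name for name, _, _ in INT_TYPES]
    if defined_type not in order:
        # Stand-in for the incompatible-type case described in the text.
        raise ValueError(f"{defined_type} is not an integer type")
    inferred = infer_int_type(values)
    # The wider of the two types is the one later in the ordered list.
    return max(defined_type, inferred, key=order.index)
```

For instance, reconcile("TINYINT", [12, 300]) returns "SMALLINT" because 300 exceeds the TINYINT maximum of 127, while reconcile("INT", [1, 2]) leaves the column as "INT" because the defined type is already wide enough.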