Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.9Mb
PDF (A4) - 3.9Mb


4.3.7 About Lakehouse Auto Parallel Load

Lakehouse Auto Parallel Load, which extends the Auto Parallel Load feature of MySQL HeatWave, facilitates the process of loading data from Object Storage into MySQL HeatWave by automating many of the steps involved, including:

  • Excluding schemas, tables, and columns that Auto Parallel Load cannot load.

  • Verifying that there is sufficient memory available for the data.

  • Optimizing load parallelism based on machine learning models.

  • Loading data into MySQL HeatWave.

  • Defining LAKEHOUSE as the engine for tables that MySQL HeatWave loads.

  • Defining the ENGINE_ATTRIBUTE for tables that MySQL HeatWave loads.

Lakehouse Auto Parallel Load also includes Lakehouse Incremental Load, which can refresh tables after an initial load.

Requirements and Privileges for Auto Parallel Load

To run Auto Parallel Load in normal mode, the MySQL HeatWave Cluster must be active.

To use Auto Parallel Load, ask the admin user to grant you the following privileges. Replace user_name and database_name in the commands with the appropriate user name and database name.

The PROCESS privilege on the appropriate database in the DB System.

mysql> GRANT PROCESS ON database_name.* TO 'user_name'@'%';

The EXECUTE privilege on the sys schema.

mysql> GRANT EXECUTE ON sys.* TO 'user_name'@'%';

The SELECT privilege.

mysql> GRANT SELECT ON performance_schema.* TO 'user_name'@'%';

About Lakehouse Auto Parallel Load Schema Inference

Lakehouse Auto Parallel Load includes schema inference, and uses it in one of two ways:

  • Lakehouse Auto Parallel Load analyzes the data, infers the table structure, and creates the database and all tables. This only requires the name of the database, the names of each table, the external file parameters, and then Lakehouse Auto Parallel Load generates the CREATE DATABASE and CREATE TABLE statements.

    Lakehouse Auto Parallel Load uses header information from the external files to define the column names. If this is not available, Lakehouse Auto Parallel Load defines the column names sequentially: col_1, col_2, col_3 ...

  • If the tables already exist, Lakehouse Auto Parallel Load analyzes the data, infers the table structure, and then modifies the structure to avoid errors during data load. For example, if a table defines a column with TINYINT, but Lakehouse Auto Parallel Load infers that the data requires SMALLINT MEDIUMINT, INT, or BIGINT, then Lakehouse Auto Parallel Load modifies the structure accordingly. If the inferred data type is incompatible with the table definition, Lakehouse Auto Parallel Load raises an error, and specifies the column as NOT SECONDARY. To learn more, see Load External Data Using Lakehouse Auto Parallel Load with an Existing Table.

    Note

    If you are on a version earlier than MySQL 8.4.0, refer to Lakehouse Auto Parallel Load with The external_tables Option for more information on the appropriate syntax to use.

  • As of MySQL 9.5.0, Lakehouse Auto Parallel Load is able to infer the format of temporal data types. To do this, specify the appropriate external table option (EXTERNAL_FORMAT, DATE FORMAT, DATETIME FORMAT, and TIME FORMAT for SQL syntax, and date_format, time_format, and timestamp_format for JSON syntax) and set the option to auto. The auto setting is the default value if no temporal format is specified. To review examples, see Lakehouse External Table SQL Syntax and External Table JSON Syntax.

    Inferring formats of temporal data types has the following requirements and specifications:

    • This is only available for files in CSV format.

    • Only the temporal formats specified at the dialect level are considered during inference. Column level formats, if any, are ignored during inference.

    • To review supported values for format specifiers of all temporal formats, see date_format. Not including at least one supported format specifier generates an error.

    • If you use an ambiguous format to specify temporal data types (for example, date_format as %m:%d:%y and time_format as %h:%i:%s), and the data matches both formats, the following order of precedence is used for inferring the data type: DATE > TIME > TIMESTAMP > DATETIME.

    • Any extra characters in the temporal format that are not format specifiers must match exactly with the corresponding characters in the data for the temporal data type to be recognized. For example, the specified format "timestamp_format": "abc%m:%d:%yabc%h:%i:%s" would match "abc3:21:2022abc8:22:44", but not "3:21:2022 8:22:44".

    • Any setting for the strict mode option (STRICT_LOAD or is_strict_mode) does not affect inference for temporal data types. This also includes MySQL modes such as NO_ZERO_DATE and NO_ZERO_IN_DATE. Invalid and zero dates are still considered as valid temporal values during inference. The impact of the sql_mode is considered during the loading of tables into Lakehouse.

    • If you set a specific format at the column level for a temporal data type when creating a table, and a different format is inferred when loading the table to Lakehouse, the specific format used for the column when creating the table is prioritized and used for loading the column.

Review the following table of supported formats for temporal data types. If multiple formats are valid for the inferred data, the first valid format listed in the following table is used as the inferred temporal format.

Table 4.8 Supported formats for inferred temporal data types

Temporal Data Type Format Pattern Example Values
DATE %Y-%m-%d

2025-08-09

2025-08-9

2025-8-09

DATE %m/%d/%Y

08/09/2025

08/9/2025

8/09/2025

DATE %d/%m/%Y

09/08/2025

09/8/2025

9/08/2025

TIME %H:%i:%s.%f (24-hour format) 22:30:05.100200
TIME %H:%i (24-hour format) 22:30
TIME %I:%i %p (12-hour format) 10:30 PM
TIME %I:%i:%S %p (12-hour format) 10:30:05 PM
TIMESTAMP %Y-%m-%d %H:%i:%s.%f 2025-08-20 14:55:30.123456
TIMESTAMP %Y-%m-%d %H:%i 2025-08-20 14:55
TIMESTAMP %Y-%m-%dT%H:%i:%s.%f 22025-08-20T14:55:01.000000
TIMESTAMP %Y-%m-%dT%H:%i 2025-08-20T14:55