Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 1.8Mb
PDF (A4) - 1.8Mb


2.2.3.2 Auto Parallel Load Syntax

MySQL 8.4.0 adds support for the following:

  • An input_list JSON array replaces the db_list JSON array. This adds an include list to exactly specify the tables and columns to load for a set of queries. It is no longer necessary to include a complete schema, and exclude unnecessary tables and columns.

    input_list is backwards compatible with db_list.

  • A validation mode for external files.

mysql> CALL sys.heatwave_load (input_list,[options]);

input_list: {
     JSON_ARRAY(input [,input] ...)
}

options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          ["mode",{"normal"|"dryrun"|"validation"}]
          ["output",{"normal"|"compact"|"silent"|"help"}]
          ["sql_mode","sql_mode"]
          ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
          ["set_load_parallelism",{true|false}]
          ["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
          }
}

input: {
     'db_name' | db_object
}
  
db_object: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          "db_name": "db_name",
          ["tables": JSON_ARRAY(table [, table] ...)]
          ["exclude_tables": JSON_ARRAY(table [, table] ...)]
          }
}

table: {
     'table_name' | table_object
}

table_object: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          "table_name": "table_name",
          ['engine_attribute': engine_attribute_object],
          ['columns': JSON_ARRAY('column_name' [, 'column_name', ...]}],
          ['exclude_columns': JSON_ARRAY('column_name' [, 'column_name', ...]}]
          }
}

engine_attribute_object: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          "sampling": true|false,
          "dialect": {dialect_section},
          "file": JSON_ARRAY(file_section [, file_section]...), 
          }
}

My SQL 8.0.33-u3 adds support for HeatWave Lakehouse with external_tables, see: Chapter 5, HeatWave Lakehouse:

mysql> CALL sys.heatwave_load (db_list,[options]);

db_list: {
     JSON_ARRAY(["schema_name","schema_name"] ...)
}

options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          ["mode",{"normal"|"dryrun"}]
          ["output",{"normal"|"compact"|"silent"|"help"}]
          ["sql_mode","sql_mode"]
          ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
          ["exclude_list",JSON_ARRAY(schema_name_1, schema_name_2.table_name_1, schema_name_3.table_name_2.column_name_1, ...)]
          ["set_load_parallelism",{true|false}]
          ["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
          ["external_tables",JSON_ARRAY(db_object [, db_object]... )]
          }
}
  
db_object: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          "db_name": "name",
          "tables": JSON_ARRAY(table [, table] ...)
          }
}
  
table: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          "table_name": "name",
          "sampling": true|false,
          "dialect": {dialect_section},
          "file": JSON_ARRAY(file_section [, file_section]...), 
          }
}

Before MySQL 8.0.33-u3:

mysql> CALL sys.heatwave_load (db_list,[options]);

db_list: {
     JSON_ARRAY(["schema_name","schema_name"] ...)
}

options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          ["mode",{"normal"|"dryrun"}]
          ["output",{"normal"|"compact"|"silent"|"help"}]
          ["sql_mode","sql_mode"]
          ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
          ["exclude_list",JSON_ARRAY(schema_name_1, schema_name_2.table_name_1, schema_name_3.table_name_2.column_name_1, ...)]
          ["set_load_parallelism",{true|false}]
          ["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
          }
}

As of MySQL 8.4.0 use input_list to define what to load. input_list is a JSON array and requires one or more valid input which can be either a valid schema name or a db_object. An empty array is permitted to view the Auto Parallel Load command-line help, see Section 2.2.3.5, “Auto Parallel Load Command-Line Help”. This is backwards compatible with db_list.

Before MySQL 8.4.0, db_list specifies the schemas to load. The list is a JSON array and requires one or more valid schema names. An empty array is permitted to view the Auto Parallel Load command-line help.

Use key-value pairs in JSON format to specify parameters. HeatWave uses the default setting if there is no option setting. Use NULL to specify no arguments.

For syntax examples, see Section 2.2.3.6, “Auto Parallel Load Examples”.

Auto Parallel Load options is a JSON object literal that includes:

  • mode: Defines the Auto Parallel Load operational mode. Permitted values are:

    • normal: The default. Generates and executes the load script.

    • dryrun: Generates a load script only. Auto Parallel Load executes in dryrun mode automatically if the HeatWave Cluster is not active.

    • validation: Only use with Lakehouse. validation performs the same checks as dryrun and also validates external files before loading. It follows all the options and the load configuration, for example column information, sql_mode, is_strict_mode and allow_missing_files, but does not load any tables. It uses schema inference and might modify the schema, see: Section 5.2.4.1, “Lakehouse Auto Parallel Load Schema Inference”. validation is faster than a full load, particularly for large tables. The memory requirement is similar to running a full load.

      Note

      validation requires created tables.

  • output: Defines how Auto Parallel Load produces output. Permitted values are:

  • sql_mode: Defines the SQL mode used while loading tables. Auto Parallel Load does not support the MySQL global or session sql_mode variable. To run Auto Parallel Load with a non-oci-default SQL mode configuration, specify the configuration using the Auto Parallel Load sql_mode option as a string value. If no SQL modes are specified, the default OCI SQL mode configuration is used.

    For information about SQL modes, see Server SQL Modes.

  • policy: Defines the policy for handling of tables containing columns with unsupported data types. Permitted values are:

    • disable_unsupported_columns: The default. Disable columns with unsupported data types and include the table in the load script. Columns that are explicitly pre-defined as NOT SECONDARY are ignored (they are neither disabled or enabled).

      Auto Parallel Load does not generate statements to disable columns that are explicitly defined as NOT SECONDARY.

    • not_disable_unsupported_columns: Exclude the table from the load script if the table contains a column with an unsupported data type.

      A column with an unsupported data type that is explicitly defined as a NOT SECONDARY column does not cause the table to be excluded. For information about defining columns as NOT SECONDARY, see Section 2.2.2.1, “Excluding Table Columns”.

  • exclude_list: Defines a list of schemas, tables, and columns to exclude from the load script. Names must be fully qualified without backticks.

    Do not use as of MySQL 8.4.0. Use db_object with tables, exclude_tables, columns or exclude_columns instead. exclude_list will be deprecated in a future release.

    Auto Parallel Load automatically excludes database objects that cannot be offloaded, according to the default policy setting. These objects need not be specified explicitly in the exclude list. System schemas, non-InnoDB tables, tables that are already loaded in HeatWave, and columns explicitly defined as NOT SECONDARY are automatically excluded.

  • set_load_parallelism: Enabled by default. Optimizes load parallelism based on machine-learning models by optimizing the innodb_parallel_read_threads variable setting before loading each table.

  • auto_enc: Checks if there is enough memory for string column encoding. Settings include:

    • mode: Defines the auto_enc operational mode. Permitted values are:

      • off: Disables the auto_enc option. No memory checks are performed.

      • check: The default. Checks if there is enough memory on the MySQL node for dictionary-encoded columns and if there is enough root heap memory for variable-length column encoding overhead. Dictionary-encoded columns require memory on the MySQL node for dictionaries. For each loaded table, 4MB of memory (the default heap segment size) must be allocated from the root heap for variable-length column encoding overhead. As of MySQL 8.0.30, the default heap segment size is reduced from 4MB to a default of 64KB per table. If there is not enough memory, Auto Parallel Load executes in dryrun mode and prints a warning about insufficient memory. The auto_enc option runs check mode if it is not specified explicitly and set to off. For more information, see Section 2.2.3.4, “Memory Estimation for String Column Encoding”.

  • external_tables: non-InnoDB tables which do not store any data, but refer to data stored externally. For the external_tables syntax, see: Section 5.2.4.2, “Lakehouse Auto Parallel Load with the external_tables Option”.

    Do not use as of MySQL 8.4.0. Use db_object with tables or exclude_tables instead. external_tables will be deprecated in a future release.

The db_object is a JSON object literal that includes:

  • db_name: The name of the database to load.

  • Use one or other of the following, but not both. The use of both parameters will throw an error.

    • tables: An optional JSON array of table to include in the load.

    • exclude_tables: As of MySQL 8.4.0, an optional JSON array of table to exclude from the load.

  • table: Either a valid table name or a table_object.

  • As of MySQL 8.4.0, table_object is a JSON object literal that includes:

    • table_name: The name of the table to load.

    • engine_attribute: A JSON object literal that includes:

    • Use one or other of the following, but not both. The use of both parameters will throw an error.

      • columns: An optional JSON array of column_name to include in the load.

      • exclude_columns: An optional JSON array of column_name to exclude from the load.