MySQL 9.0.0 adds support for Lakehouse Incremental Load with the
refresh_external_tables
option, see:
Section 5.2.6, “Lakehouse Incremental Load”.
mysql> CALL sys.heatwave_load (input_list,[options]);
input_list: {
JSON_ARRAY(input [,input] ...)
}
options: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
["mode",{"normal"|"dryrun"|"validation"}]
["output",{"normal"|"compact"|"silent"|"help"}]
["sql_mode","sql_mode"]
["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
["set_load_parallelism",{true|false}]
["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
["refresh_external_tables",{true|false}]
}
}
input: {
'db_name' | db_object
}
db_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"db_name": "db_name",
["tables": JSON_ARRAY(table [, table] ...)]
["exclude_tables": JSON_ARRAY(table [, table] ...)]
}
}
table: {
'table_name' | table_object
}
table_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"table_name": "table_name",
['engine_attribute': engine_attribute_object],
['columns': JSON_ARRAY('column_name' [, 'column_name', ...])],
['exclude_columns': JSON_ARRAY('column_name' [, 'column_name', ...])]
}
}
engine_attribute_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"sampling": true|false,
"dialect": {dialect_section},
"file": JSON_ARRAY(file_section [, file_section]...)
}
}
MySQL 8.4.0 adds support for the following:
-
An
input_list
JSON array replaces thedb_list
JSON array. This adds an include list to exactly specify the tables and columns to load for a set of queries. It is no longer necessary to include a complete schema, and exclude unnecessary tables and columns.input_list
is backwards compatible withdb_list
. A
validation
mode for external files.
mysql> CALL sys.heatwave_load (input_list,[options]);
input_list: {
JSON_ARRAY(input [,input] ...)
}
options: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
["mode",{"normal"|"dryrun"|"validation"}]
["output",{"normal"|"compact"|"silent"|"help"}]
["sql_mode","sql_mode"]
["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
["set_load_parallelism",{true|false}]
["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
}
}
input: {
'db_name' | db_object
}
db_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"db_name": "db_name",
["tables": JSON_ARRAY(table [, table] ...)]
["exclude_tables": JSON_ARRAY(table [, table] ...)]
}
}
table: {
'table_name' | table_object
}
table_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"table_name": "table_name",
['engine_attribute': engine_attribute_object],
['columns': JSON_ARRAY('column_name' [, 'column_name', ...])],
['exclude_columns': JSON_ARRAY('column_name' [, 'column_name', ...])]
}
}
engine_attribute_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"sampling": true|false,
"dialect": {dialect_section},
"file": JSON_ARRAY(file_section [, file_section]...)
}
}
Use input_list
to define what to load.
input_list
is a JSON array and requires
one or more valid input
which can be
either a valid schema name or a
db_object
. An empty array is permitted to
view the Auto Parallel Load command-line help, see
Section 2.2.3.5, “Auto Parallel Load Command-Line Help”. This is
backwards compatible with db_list
.
Use key-value pairs in JSON
format to
specify parameters. HeatWave uses the default setting if
there is no option setting. Use NULL
to
specify no arguments.
For syntax examples, see Section 2.2.3.6, “Auto Parallel Load Examples”.
Auto Parallel Load options
is a JSON object literal
that includes:
-
mode
: Defines the Auto Parallel Load operational mode. Permitted values are:normal
: The default. Generates and executes the load script.dryrun
: Generates a load script only. Auto Parallel Load executes indryrun
mode automatically if the HeatWave Cluster is not active.-
validation
: Only use with Lakehouse.validation
performs the same checks asdryrun
and also validates external files before loading. It follows all theoptions
and the load configuration, for example column information,sql_mode
,is_strict_mode
andallow_missing_files
, but does not load any tables. It uses schema inference and might modify the schema, see: Section 5.2.4.1, “Lakehouse Auto Parallel Load Schema Inference”.validation
is faster than a full load, particularly for large tables. The memory requirement is similar to running a full load.Notevalidation
requires created tables.
-
output
: Defines how Auto Parallel Load produces output. Permitted values are:normal
: The default. Produces summarized output and sends it tostdout
and to theheatwave_autopilot_report
table. See Section 2.7.9, “Autopilot Report Table”.silent
: Sends output to theheatwave_autopilot_report
table only. See Section 2.7.9, “Autopilot Report Table”. Thesilent
output type is useful if human-readable output is not required; when the output is consumed by a script, for example. For an example of a stored procedure with an Auto Parallel Load call that uses thesilent
output type, see Section 2.2.3.6, “Auto Parallel Load Examples”.compact
: Produces compact output.help
: Displays Auto Parallel Load command-line help. See Section 2.2.3.5, “Auto Parallel Load Command-Line Help”.
-
sql_mode
: Defines the SQL mode used while loading tables. Auto Parallel Load does not support the MySQL global or sessionsql_mode
variable. To run Auto Parallel Load with a non-oci-default SQL mode configuration, specify the configuration using the Auto Parallel Loadsql_mode
option as a string value. If no SQL modes are specified, the default OCI SQL mode configuration is used.For information about SQL modes, see Server SQL Modes.
-
policy
: Defines the policy for handling of tables containing columns with unsupported data types. Permitted values are:-
disable_unsupported_columns
: The default. Disable columns with unsupported data types and include the table in the load script. Columns that are explicitly pre-defined asNOT SECONDARY
are ignored (they are neither disabled or enabled).Auto Parallel Load does not generate statements to disable columns that are explicitly defined as
NOT SECONDARY
. -
not_disable_unsupported_columns
: Exclude the table from the load script if the table contains a column with an unsupported data type.A column with an unsupported data type that is explicitly defined as a
NOT SECONDARY
column does not cause the table to be excluded. For information about defining columns asNOT SECONDARY
, see Section 2.2.2.1, “Excluding Table Columns”.
-
set_load_parallelism
: Enabled by default. Optimizes load parallelism based on machine-learning models by optimizing theinnodb_parallel_read_threads
variable setting before loading each table.-
auto_enc
: Checks if there is enough memory for string column encoding. Settings include:-
mode
: Defines theauto_enc
operational mode. Permitted values are:off
: Disables theauto_enc
option. No memory checks are performed.check
: The default. Checks if there is enough memory on the MySQL node for dictionary-encoded columns and if there is enough root heap memory for variable-length column encoding overhead. Dictionary-encoded columns require memory on the MySQL node for dictionaries. For each loaded table, 64KB of memory, the default heap segment size, must be allocated from the root heap for variable-length column encoding overhead. If there is not enough memory, Auto Parallel Load executes indryrun
mode and prints a warning about insufficient memory. Theauto_enc
option runscheck
mode if it is not specified explicitly and set tooff
. For more information, see Section 2.2.3.4, “Memory Estimation for String Column Encoding”.
-
-
refresh_external_tables
: Only use with Lakehouse. Set totrue
to refresh external tables. See: Section 5.2.6, “Lakehouse Incremental Load”.Set to
false
, the default setting, to only load new tables, unloaded tables, and not refresh external tables.This option only works with
mode
set tonormal
. All othermode
settings ignore this option.
The db_object
is a JSON object literal
that includes:
db_name
: The name of the database to load.-
Use one or other of the following, but not both. The use of both parameters will throw an error.
tables
: An optional JSON array oftable
to include in the load.exclude_tables
: As of MySQL 8.4.0, an optional JSON array oftable
to exclude from the load.
table
: Either a valid table name or atable_object
.-
As of MySQL 8.4.0,
table_object
is a JSON object literal that includes:table_name
: The name of the table to load.-
engine_attribute
: A JSON object literal that includes:-
sampling
: Only use with Lakehouse. If set totrue
, the default setting, Lakehouse Auto Parallel Load samples the data to infer the schema and collect statistics.If set to
false
, Lakehouse Auto Parallel Load performs a full scan to infer the schema and collect statistics. Depending on the size of the data, this can take a long time.Auto Parallel Load uses the inferred schema to generate
CREATE TABLE
statements. The statistics are used to estimate storage requirements and load times. See: Section 5.2.4.1, “Lakehouse Auto Parallel Load Schema Inference”. For
dialect
andfile
, see: Section 5.2.2, “Lakehouse External Table Syntax”.
-
-
Use one or other of the following, but not both. The use of both parameters will throw an error.
columns
: An optional JSON array ofcolumn_name
to include in the load.exclude_columns
: An optional JSON array ofcolumn_name
to exclude from the load.
Before MySQL 8.4.0, the following syntax will be deprecated in a future release.
mysql> CALL sys.heatwave_load (db_list,[options]);
db_list: {
JSON_ARRAY(["schema_name","schema_name"] ...)
}
options: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
["mode",{"normal"|"dryrun"}]
["output",{"normal"|"compact"|"silent"|"help"}]
["sql_mode","sql_mode"]
["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
["exclude_list",JSON_ARRAY(schema_name_1, schema_name_2.table_name_1, schema_name_3.table_name_2.column_name_1, ...)]
["set_load_parallelism",{true|false}]
["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
["external_tables",JSON_ARRAY(db_object [, db_object]... )]
}
}
db_object: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"db_name": "name",
"tables": JSON_ARRAY(table [, table] ...)
}
}
table: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value": {
"table_name": "name",
"sampling": true|false,
"dialect": {dialect_section},
"file": JSON_ARRAY(file_section [, file_section]...)
}
}
db_list
specifies the schemas to load.
The list is a JSON array and requires one or more valid
schema names. An empty array is permitted to view the
Auto Parallel Load command-line help.
Auto Parallel Load options
is a JSON object literal.
The following options will be deprecated in a future
release:
-
exclude_list
: Defines a list of schemas, tables, and columns to exclude from the load script. Names must be fully qualified without backticks.Use
db_object
withtables
,exclude_tables
,columns
orexclude_columns
instead.Auto Parallel Load automatically excludes database objects that cannot be offloaded, according to the default
policy
setting. These objects need not be specified explicitly in the exclude list. System schemas, non-InnoDB
tables, tables that are already loaded in HeatWave, and columns explicitly defined asNOT SECONDARY
are automatically excluded. -
external_tables
: non-InnoDB tables which do not store any data, but refer to data stored externally. For theexternal_tables
syntax, see: Section 5.2.4.2, “Lakehouse Auto Parallel Load with the external_tables Option”.Use
db_object
withtables
orexclude_tables
instead.