Related Documentation Download this Manual
PDF (US Ltr) - 0.6Mb
PDF (A4) - 0.6Mb


HeatWave User Guide  /  Loading Data  /  Auto Parallel Load

4.1 Auto Parallel Load

Loading data into HeatWave involves several manual steps. The time required to perform these steps typically depends on the number of schemas, tables, and columns. Auto Parallel Load facilitates the process by automating many of the steps involved, including:

  • Excluding schemas, tables, and columns that cannot be loaded.

  • Generating a load script with DDL statements for preparing and loading each table.

  • Verifying that there is sufficient memory available for the data.

  • Applying options and settings affecting the data load process.

  • Optimizing load parallelism based on machine-learning models.

  • Loading data into HeatWave by executing the generated load script.

Auto Parallel Load is implemented as a stored procedure named heatwave_load, which resides in the MySQL sys schema. Running Auto Parallel Load involves issuing a CALL statement for the stored procedure, which takes a list of schemas and options as arguments.

CALL sys.heatwave_load (db_list,[options]);

You can write an Auto Parallel Load CALL statement manually using the syntax described below, or use the Auto Parallel Load load command that is generated when performing a node count estimate in the OCI console. See Generating a Node Count Estimate in the MySQL DB System User Guide.

Auto Parallel Load can be run from any MySQL client or connector.

Auto Parallel Load is described under the following topics in this section:

Auto Parallel Load Syntax

CALL sys.heatwave_load (db_list,[options]);

db_list: {
     JSON_ARRAY({""|"schema_name"[,"schema_name"] ...)
}

options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
          "key","value": {
          ["mode",{"normal"|"dryrun"}]
          ["output",{"normal"|"compact"|"silent"|"help"}]
          ["sql_mode","sql_mode"]
          ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
          ["exclude_list",JSON_ARRAY("db_object"[,"db_object"] ...)]
          ["set_load_parallelism",{TRUE|FALSE}]
          ["auto_enc",JSON_OBJECT("mode",{"off"|"check"}]
}

db_list specifies the schemas to load. The list is specified as a JSON_ARRAY. Specifying an empty array is permitted for viewing the Auto Parallel Load command-line help (see Auto Parallel Load Command-Line Help). Otherwise, one or more valid schema names are required.

options are specified as key-value pairs in JSON object format. If an option is not specified, the default setting is used. If no options are specified, NULL can be specified in place of the option's argument.

For syntax examples, see Auto Parallel Load Examples.

Auto Parallel Load options include:

  • mode: Defines the Auto Parallel Load operational mode. Permitted values are:

    • normal: The default. Generates and executes the load script.

    • dryrun: Generates a load script only. Auto Parallel Load executes in dryrun mode automatically if the HeatWave cluster is not active.

  • output: Defines how Auto Parallel Load produces output. Permitted values are:

  • sql_mode: Defines the SQL mode used while loading tables. Auto Parallel Load does not support the MySQL global or session sql_mode variable. To run Auto Parallel Load with a non-oci-default SQL mode configuration, specify the configuration using the Auto Parallel Load sql_mode option as a string value. If no SQL modes are specified, the default OCI SQL mode configuration is used.

    For information about SQL modes, see Server SQL Modes.

  • policy: Defines the policy for handling of tables containing columns with unsupported data types. Permitted values are:

    • disable_unsupported_columns: The default. Disable columns with unsupported data types and include the table in the load script. Columns that are explicitly pre-defined as NOT SECONDARY are ignored (they are neither disabled or enabled).

      Auto Parallel Load does not generate statements to disable columns that are explicitly defined as NOT SECONDARY.

    • not_disable_unsupported_columns: Exclude the table from the load script if the table contains a column with an unsupported data type.

      A column with an unsupported data type that is explicitly defined as a NOT SECONDARY column does not cause the table to be excluded. For information about defining columns as NOT SECONDARY, see Section 3.2, “Excluding Table Columns”.

  • exclude_list: Defines a list of database objects (schemas, tables, and columns) to exclude from the load script. Names must be fully qualified without backticks, as in the following example:

    CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"), 
    JSON_OBJECT("exclude_list", JSON_ARRAY("db0.t1","db0.t2", "db0.t3.c1")));

    Auto Parallel Load automatically excludes database objects that cannot be offloaded (according to the default policy setting). These objects need not be specified explicitly in the exclude list. System schemas, non-InnoDB tables, tables that are already loaded in HeatWave, and columns explicitly defined as NOT SECONDARY are excluded automatically.

  • set_load_parallelism: Enabled by default. Optimizes load parallelism based on machine-learning models by optimizing the innodb_parallel_read_threads variable setting before loading each table.

  • auto_enc: Checks if there is enough memory on the MySQL node for dictionary-encoded columns. Settings include:

    • mode: Defines the auto_enc operational mode. Permitted values are:

      • off: Disables the auto_enc option.

      • check: The default. Checks if there is enough memory on the MySQL node for dictionary-encoded columns. Dictionary-encoded columns require memory on the MySQL node for dictionaries. If there is not enough memory, Auto Parallel Load executes in "dryrun" mode and prints a warning about insufficient memory. The auto_enc option runs check mode if it is not specified explicitly and set to off. For more information, see Dictionary Size Estimation.

Auto Parallel Load Requirements

  • The user must have the following MySQL privileges:

  • To run Auto Parallel Load in "normal" mode, the HeatWave cluster must be active.

Running Auto Parallel Load

Run Auto Parallel Load in "dryrun" mode first to check for errors and warnings and to inspect the generated load script. A simple Auto Parallel Load call in "dryrun" mode that specifies a list of schemas ("db0","db1","db2","db3") is written as follows:

CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"), 
JSON_OBJECT("mode","dryrun"));

An Auto Parallel Load call in "dryrun" mode with additional options specified appears as follows:

CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"),
JSON_OBJECT("mode","dryrun", "exclude_list", JSON_ARRAY("db0.t1","db0.t2", "db0.t3"),
"policy","disable_unsupported_columns",
"auto_enc",JSON_OBJECT("mode","check")));

For information about Auto Parallel Load options, see Auto Parallel Load Syntax.

In "dryrun" mode, Auto Parallel Load sends the load script to the heatwave_load_report table only. It does not load data into HeatWave.

If Auto Parallel Load fails with an error, inspect the errors by querying the heatwave_load_report table:

SELECT log FROM sys.heatwave_load_report WHERE type="error";

When Auto Parallel Load finishes running, use the following query to check for warnings:

SELECT log FROM sys.heatwave_load_report WHERE type="warn";

Issue the following query to inspect the load script that was generated:

SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_load_report 
WHERE type = "sql" ORDER BY id;

Once you are satisfied with the Auto Parallel Load CALL statement and the generated load script, reissue the CALL statement in "normal" mode to load the data into HeatWave. For example:

CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"),
JSON_OBJECT("mode","normal","exclude_list",JSON_ARRAY("db0.t1","db0.t2", "db0.t3"),
"policy","disable_unsupported_columns",
"auto_enc",JSON_OBJECT("mode","check")));
Note

You can retrieve DDL statements in a table or use the following statements to produce a list of DDL statements that you can easily copy and paste:

SET SESSION group_concat_max_len = 1000000;
SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ') FROM sys.heatwave_load_report
  WHERE type = "sql" ORDER BY id;

The time required to load data depends on the data size. Auto Parallel Load provides an estimate of the time required to complete the load operation.

Tables are loaded in sequence, ordered by schema and table name. Load-time errors are reported as they are encountered. If an error is encountered while loading a table, the operation is not terminated. Auto Parallel Load continues running, moving on to the next table.

When Auto Parallel Load finishes running, it checks if tables are loaded and shows a summary with the number of tables that were loaded and the number of tables that failed to load.

Dictionary Size Estimation

The auto_enc option is run in check mode by default to ensure that there is enough memory on the MySQL node for dictionary-encoded string columns.

The following example uses the auto_enc option in check mode, which is useful if you have dictionary-encoded columns and want to ensure that there is enough memory on the MySQL node for the associated dictionaries before attempting a load operation. Insufficient memory on the MySQL node can cause a load failure.

CALL sys.heatwave_load(JSON_ARRAY("tpch"), 
  JSON_OBJECT("mode","dryrun","auto_enc",JSON_OBJECT("mode","check")));
Note

The auto_enc option runs in check mode regardless of whether it is specified explicitly in the Auto Parallel Load call statement.

Look for capacity estimation data in the Auto Parallel Load output. The results indicate whether there is sufficient memory to load all tables.

The Auto Parallel Load Report Table

When Auto Parallel Load is run, output including Auto Parallel Load execution logs and the generated load script is sent to the heatwave_load_report table in the sys schema.

The heatwave_load_report table is a temporary table. It contains data from the last execution of Auto Parallel Load. Data is only available for the current session and is lost when the session terminates or when the server is shut down.

Auto Parallel Load Report Table Query Examples

The heatwave_load_report table can be queried after running Auto Parallel Load, as in the following examples:

  • View error information in case Auto Parallel Load stops unexpectedly:

    SELECT log FROM sys.heatwave_load_report WHERE type="error";
  • View warnings to find out why tables cannot be loaded:

    SELECT log FROM sys.heatwave_load_report WHERE type="warn";
  • View the generated load script to see commands that would be executed by Auto Parallel Load in "normal" mode:

    SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_load_report 
    WHERE type = "sql" ORDER BY id;
  • View the number of load commands generated:

    SELECT Count(*) AS "Total Load Commands Generated" FROM sys.heatwave_load_report 
    WHERE type = "sql" ORDER BY id;
  • View load script data for a particular table:

    SELECT log->>"$.sql" FROM sys.heatwave_load_report  
    WHERE type="sql" AND log->>"$.schema_name" = "db0" AND log->>"$.table_name" = "tbl" 
    ORDER BY id;
  • Concatenate Auto Parallel Load generated DDL statements into a single string that can be copied and pasted for execution. The group_concat_max_len variable sets the result length in bytes for the GROUP_CONCAT() function to accommodate a potentially long string. (The default group_concat_max_len setting is 1024 bytes.)

    SET SESSION group_concat_max_len = 1000000;
    SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ') FROM sys.heatwave_load_report 
      WHERE type = "sql" ORDER BY id;

Auto Parallel Load Command-Line Help

To view Auto Parallel Load command-line help, issue the following statement:

CALL sys.heatwave_load(JSON_ARRAY(""),JSON_OBJECT("output","help"));

The command-line help provides usage documentation for the Auto Parallel Load utility.

Auto Parallel Load Examples

  • Load the tables belonging to a single schema. No options are specified, which means that the default options are used.

    CALL sys.heatwave_load(JSON_ARRAY("db0"),NULL);
  • Run Auto Parallel Load in dryrun mode to determine if there are any warnings.

    CALL sys.heatwave_load(JSON_ARRAY("tpch"), JSON_OBJECT("mode","dryrun"));
  • Load the tables belonging to multiple schemas. No options are specified, which means that the default options are used.

    CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"),NULL);
  • Load the tables belonging to multiple schemas. The not_disable_unsupported_columns policy causes tables with unsupported columns to be excluded from the load operation. Unsupported columns are those with unsupported data types.

    CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"), 
    JSON_OBJECT("policy","not_disable_unsupported_columns"));
  • Load the tables belonging to multiple schemas, excluding specified tables and a particular column:

    CALL sys.heatwave_load(JSON_ARRAY("db0","db1","db2","db3"), 
    JSON_OBJECT("exclude_list",JSON_ARRAY("db0.t1","db0.t2","db0.t3.c1")));
  • Load tables that begin with an hw prefix from a schema named schema_customer_1.

    SET @exc_list = (SELECT JSON_OBJECT('exclude_list',
                     JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name))) 
                     FROM information_schema.tables 
                     WHERE table_schema = 'schema_customer_1' 
                     AND table_name NOT LIKE 'hw%');
    CALL sys.heatwave_load(JSON_ARRAY('schema_customer_1'), @exc_list);
  • Load all schemas with tables that start with an hw prefix.

    SET @db_list = (SELECT json_arrayagg(schema_name) FROM information_schema.schemata);
    SET @exc_list = (SELECT JSON_OBJECT('exclude_list',
                     JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name))) 
                     FROM information_schema.tables 
                     WHERE table_schema NOT IN
                     ('mysql','information_schema', 'performance_schema','sys') 
                     AND table_name NOT LIKE 'hw%');
    CALL sys.heatwave_load(@db_list, @exc_list);

    You can check db_list and exc_list using SELECT JSON_PRETTY(@db_list); and SELECT JSON_PRETTY(@exc_list);

  • Call Auto Parallel Load from a stored procedure:

    DROP PROCEDURE IF EXISTS auto_load_wrapper;
    DELIMITER //
    CREATE PROCEDURE auto_load_wrapper()
    BEGIN
      -- AUTOMATED INPUT
      SET @db_list = (SELECT JSON_ARRAYAGG(schema_name) FROM information_schema.schemata);
      SET @exc_list = (SELECT JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name)) 
      FROM information_schema.tables WHERE table_schema = "db0");
    
      CALL sys.heatwave_load(@db_list, JSON_OBJECT("output","silent","exclude_list", 
      CAST(@exc_list AS JSON)));
    
      -- CUSTOM OUTPUT
      SELECT log as 'Unsupported objects' FROM sys.heatwave_load_report WHERE type="warn" 
      AND stage="VERIFICATION" and log like "%Unsupported%";
      SELECT Count(*) AS "Total Load commands Generated" 
      FROM sys.heatwave_load_report WHERE type = "sql" ORDER BY id;
    
    END //
    DELIMITER ;
    
    CALL auto_load_wrapper();