Related Documentation Download this Manual
PDF (US Ltr) - 0.6Mb
PDF (A4) - 0.6Mb


HeatWave User Guide  /  Best Practices

Chapter 9 Best Practices

HeatWave best practices are described under the following topics in this section:

Provisioning

To determine the appropriate HeatWave cluster size for a workload, generate a node count estimate in the console. Node count estimates are generated by the HeatWave Auto Provisioning feature, which uses machine learning models to predict the number of required nodes based on node shape and data sampling. For instructions, see Generating a Node Count Estimate, in the MySQL Database Service User Guide.

Generate a node count estimate:

  • When adding a HeatWave cluster to a DB System, to determine the number of nodes required for the data you intend to load.

  • Periodically, to ensure that you have an appropriate number of HeatWave nodes for your data. Over time, data size may increase or decrease, so it is important to monitor the size of your data by performing node count estimates.

  • When encountering out-of-memory errors while running queries. In this case, the HeatWave cluster may not have sufficient memory capacity.

  • When the data growth rate is high.

  • When the transaction rate (the rate of updates and inserts) is high.

Importing Data into the MySQL DB System

MySQL Shell is the recommended utility for importing data into the MySQL DB System. MySQL Shell dump and load utilities are purpose-built for use with MySQL Database Service; useful for all types of exports and imports. MySQL Shell supports export to, and import from, Object Storage. The minimum supported source version of MySQL is 5.7.9. For more information, see Importing and Exporting Databases, in the MySQL Database Service User Guide.

Inbound Replication

For an OLTP workload that resides in an on-premise instance of MySQL Server, inbound replication is recommended for replicating data to the MySQL DB System for offload to the HeatWave cluster. For more information, see Replication, in the MySQL Database Service User Guide.

Preparing Data

The following practices are recommended when preparing data for loading into HeatWave:

  • Instead of preparing and loading tables into HeatWave manually, consider using the Auto Parallel Load utility. See Section 4.1, “Auto Parallel Load”.

    Tip

    An Auto Parallel Load command is generated in the console when performing a node count estimate. For more information, see Generating a Node Count Estimate.

  • To minimize the number of HeatWave nodes required for your data, exclude table columns that are not accessed by your queries. For information about excluding columns, see Section 3.2, “Excluding Table Columns”.

  • To save space in memory, set CHAR, VARCHAR, and TEXT-type column lengths to the minimum length required for the longest string value.

  • Where appropriate, apply dictionary encoding to CHAR, VARCHAR, and TEXT-type columns. Dictionary encoding reduces memory consumption on the HeatWave cluster nodes. Use the following criteria when selecting string columns for dictionary encoding:

    1. The column is not used as a key in JOIN queries.

    2. Your queries do not perform operations such as LIKE, SUBSTR, CONCAT, etc., on the column. Variable-length encoding supports string functions and operators and LIKE predicates; dictionary encoding does not.

    3. The column has a limited number of distinct values. Dictionary encoding is best suited to columns with a limited number of distinct values, such as country columns.

    4. The column is expected to have few new values added during change propagation. Avoid dictionary encoding for columns with a high number of inserts and updates. Adding a significant number of a new, unique values to a dictionary encoded column can cause a change propagation failure.

    The following columns from the TPC Benchmark™ H (TPC-H) provide examples of string columns that are suitable and unsuitable for dictionary encoding:

    • ORDERS.O_ORDERPRIORITY

      This column is used only in range queries. The values associated with column are limited. During updates, it is unlikely for a significant number of new, unique values to be added. These characteristics make the column suitable for dictionary encoding.

    • LINEITEM.L_COMMENT

      This column is not used in joins or other complex expressions, but as a comment field, values are expected to be unique, making the column unsuitable for dictionary encoding.

    When in doubt about choosing an encoding type, use variable-length encoding, which is applied by default when tables are loaded into HeatWave, or use the HeatWave Encoding Advisor to obtain encoding recommendations. See Section 8.1, “Auto Encoding”.

  • Data is partitioned by the table's primary key when no data placement keys are defined. Only consider defining data placement keys if partitioning data by the primary key does not provide suitable performance.

    Reserve the use of data placement keys for the most time-consuming queries. In such cases, define data placement keys on:

    • The most frequently used JOIN keys.

    • The keys of the longest running queries.

    Consider using Auto Data Placement for data placement recommendations. See Section 8.2, “Auto Data Placement”.

Loading Data

Instead of preparing and loading tables into HeatWave manually, consider using the Auto Parallel Load utility. See Section 4.1, “Auto Parallel Load”.

The loading of data into HeatWave can be classified into three types: Initial Bulk Load, Incremental Bulk Load, and Change Propagation.

  • Initial Bulk Load: Performed when loading data into HeatWave for the first time, or when reloading data after a failure or intended stoppage. The best time to perform an initial bulk load is during off-peak hours, as bulk load operations can affect OLTP performance on the MySQL DB System.

  • Incremental Bulk Load: Performed when there is a substantial amount of data to load into tables that are already loaded in HeatWave. An incremental bulk load involves these steps:

    1. Performing a SECONDARY_UNLOAD operation to unload a table from HeatWave. See Chapter 5, Unloading Tables.

    2. Importing data into the table on the MySQL DB System node. See Importing and Exporting Databases in the MySQL Database Service User Guide.

    3. Performing a SECONDARY_LOAD operation to reload the table into HeatWave. See Chapter 4, Loading Data.

    Depending on the amount of data, an incremental bulk load may be a more expedient method of loading new data than waiting for change propagation to occur. It also provides greater control over when new data is loaded. As with initial build loads, the best time to perform an incremental bulk load is during off-peak hours, as bulk load operations can affect OLTP performance on the MySQL DB System.

  • Change Propagation: After tables are loaded into HeatWave, data changes are automatically propagated from InnoDB tables on the MySQL DB System to their counterpart tables in HeatWave. See Section 4.2, “Change Propagation”.

Use the following strategies to improve load performance:

  • Increase the number of read threads

    For medium to large tables, increase the number of read threads to 32 by setting the innodb_parallel_read_threads variable on the MySQL DB System.

    mysql> SET SESSION innodb_parallel_read_threads = 32;

    If the MySQL DB System is not busy, you can increase the value to 64.

    Tip

    The Auto Parallel Load utility automatically optimizes the number of parallel read threads for each table. See Section 4.1, “Auto Parallel Load”.

  • Load tables concurrently

    If you have many small and medium tables (less than 20GB in size), load tables from multiple sessions:

    Session 1: 
    mysql> ALTER TABLE supplier SECONDARY_LOAD;
    
    Session 2: 
    mysql> ALTER TABLE parts SECONDARY_LOAD;
    
    Session 3: 
    mysql> ALTER TABLE region SECONDARY_LOAD;
    
    Session 4: 
    mysql> ALTER TABLE partsupp SECONDARY_LOAD;
  • Avoid or reduce conflicting operations

    Data load operations share resources with other OLTP DML and DDL operations on the MySQL DB System. To improve load performance, avoid or reduce conflicting DDL and DML operations. For example, avoid running DDL and large DML operations on the LINEITEM table while executing an ALTER TABLE LINEITEM SECONDARY_LOAD operation.

Auto Encoding and Auto Data Placement

The Advisor utility analyzes your data and HeatWave query history to provide string column encoding and data placement key recommendations. Consider re-running Advisor for updated recommendations when queries change, when data changes significantly, and after reloading modified tables.

In all cases, re-run your queries before running Advisor. See Chapter 8, Workload Optimization using Advisor.

Running Queries

The following practices are recommended when running queries:

  • If a query fails to offload and you cannot identify the reason, enable tracing and query the INFORMATION_SCHEMA.OPTIMIZER_TRACE table to debug the query. See Debugging Queries.

    If the optimizer trace does not return all of the trace information, increase the optimizer trace buffer size. The MISSING_BYTES_BEYOND_MAX_MEM_SIZE column of the INFORMATION_SCHEMA.OPTIMIZER_TRACE table shows how many bytes are missing from a trace. If the column shows a non-zero value, increase the optimizer_trace_max_mem_size setting accordingly. For example:

    SET optimizer_trace_max_mem_size=1000000;
  • If an INFORMATION_SCHEMA.OPTIMIZER_TRACE query trace indicates that a subquery is not yet supported, try unnesting the subquery. For example, the following query contains a subquery and is not offloaded as indicated by the EXPLAIN output, which does not show Using secondary engine.

    mysql> EXPLAIN SELECT COUNT(*) FROM orders o WHERE o_totalprice> (SELECT AVG(o_totalprice) 
           FROM orders WHERE o_custkey=o.o_custkey)\G
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: o
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 14862970
         filtered: 100.00
            Extra: Using where
    *************************** 2. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: orders
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 14862970
         filtered: 10.00
            Extra: Using where
    2 rows in set, 2 warnings (0.00 sec)

    This query can be rewritten as follows to unnest the subquery so that it can be offloaded.

    mysql> EXPLAIN SELECT COUNT(*) FROM orders o, 
           (SELECT o_custkey, AVG(o_totalprice) a_totalprice 
           FROM orders GROUP BY o_custkey)a 
           WHERE o.o_custkey=a.o_custkey AND o.o_totalprice>a.a_totalprice;
  • By default, SELECT queries are offloaded to HeatWave for execution and fall back to the MySQL DB system if that is not possible. To force a query to execute on HeatWave or fail if that is not possible, set the use_secondary_engine variable to FORCED. In this mode, a SELECT statement returns an error if it cannot be offloaded. The use_secondary_engine variable can be set as shown:

    • Using a SET statement before running queries:

      mysql> SET SESSION use_secondary_engine = FORCED
    • Using a SET_VAR optimizer hint when issuing a query:

      mysql> SELECT /*+ SET_VAR(use_secondary_engine = FORCED) */ ... FROM ...
  • If you encounter out-of-memory errors when running queries:

    1. Avoid or rewrite queries that produce a Cartesian product. In the following query, a JOIN predicated is not defined between the supplier and nation tables, which causes the query to select all rows from both tables:

      mysql> SELECT s_nationkey, s_suppkey, l_comment FROM lineitem, supplier, nation 
             WHERE s_suppkey = l_suppkey LIMIT 10;
      ERROR 3015 (HY000): Out of memory in storage engine 'Failure detected in RAPID; query 
      execution cannot proceed'.

      To avoid the Cartesian product, add a relevant predicate between the supplier and nation tables to filter out rows:

      mysql> SELECT s_nationkey, s_suppkey, l_comment FROM lineitem, supplier, nation 
             WHERE s_nationkey = n_nationkey and s_suppkey = l_suppkey LIMIT 10;
    2. Avoid or rewrite queries that produce a Cartesian product introduced by the MySQL optimizer. Due to lack of quality statistics or non-optimal cost decisions, MySQL optimizer may introduce one or more Cartesian products in a query even if a query has predicates defined among all participating tables. For example:

      mysql> SELECT o_orderkey, c_custkey, l_shipdate, s_nationkey, s_suppkey, l_comment 
             FROM lineitem, supplier, nation, customer, orders 
             WHERE c_custkey = o_custkey AND o_orderkey = l_orderkey
             AND c_nationkey = s_nationkey AND c_nationkey = n_nationkey AND c_custkey < 3000000
             LIMIT 10;
      ERROR 3015 (HY000): Out of memory in storage engine 'Failure detected in RAPID; 
      query execution cannot proceed'.

      The EXPLAIN plan output shows that there is no common predicate between the first two table entries (NATION and SUPPLIER).

      mysql> EXPLAIN SELECT o_orderkey, c_custkey, l_shipdate, s_nationkey, s_suppkey, l_comment 
             FROM lineitem, supplier, nation, customer, orders 
             WHERE c_custkey = o_custkey AND o_orderkey = l_orderkey AND c_nationkey = s_nationkey 
             AND c_nationkey = n_nationkey AND c_custkey < 3000000 LIMIT 10\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: supplier
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 99626
           filtered: 100.00
              Extra: Using secondary engine RAPID
      *************************** 2. row ***************************
                 id: 1
        select_type: SIMPLE
              table: nation
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 25
           filtered: 10.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
      *************************** 3. row ***************************
                 id: 1
        select_type: SIMPLE
              table: customer
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 1382274
           filtered: 5.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
      *************************** 4. row ***************************
                 id: 1
        select_type: SIMPLE
              table: orders
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 14862970
           filtered: 10.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
      *************************** 5. row ***************************
                 id: 1
        select_type: SIMPLE
              table: lineitem
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 56834662
           filtered: 10.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID

      To force a join order so that there are predicates associated with each pair of tables, add a STRAIGHT_JOIN hint. For example:

      mysql> EXPLAIN SELECT o_orderkey, c_custkey, l_shipdate, s_nationkey, s_suppkey, l_comment 
             FROM SUPPLIER STRAIGHT_JOIN CUSTOMER STRAIGHT_JOIN NATION STRAIGHT_JOIN ORDERS 
             STRAIGHT_JOIN LINEITEM WHERE c_custkey = o_custkey and o_orderkey = l_orderkey 
             AND c_nationkey = s_nationkey AND c_nationkey = n_nationkey AND c_custkey < 3000000
             LIMIT 10\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: supplier
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 99626
           filtered: 100.00
              Extra: Using secondary engine RAPID
      *************************** 2. row ***************************
                 id: 1
        select_type: SIMPLE
              table: customer
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 1382274
           filtered: 5.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
      *************************** 3. row ***************************
                 id: 1
        select_type: SIMPLE
              table: nation
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 25
           filtered: 10.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
      *************************** 4. row ***************************
                 id: 1
        select_type: SIMPLE
              table: orders
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 14862970
           filtered: 10.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
      *************************** 5. row ***************************
                 id: 1
        select_type: SIMPLE
              table: lineitem
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 56834662
           filtered: 10.00
              Extra: Using where; Using join buffer (hash join); Using secondary engine RAPID
    3. Avoid or rewrite queries that produce a very large result set. This is a common cause of out of memory errors during query processing. Use aggregation functions, a GROUP BY clause, or a LIMIT clause to reduce the result set size.

    4. Avoid or rewrite queries that produce a very large intermediate result set. In certain cases, large result sets can be avoided by adding a STRAIGHT_JOIN hint, which enforces a join order in a decreasing of selectiveness.

    5. Check the size of your data by performing a node count estimate. If your data has grown substantially, the HeatWave cluster may require additional nodes. For instructions, see Generating a Node Count Estimate, in the MySQL Database Service User Guide.

    6. HeatWave optimizes for network usage by default. Try running the query with the MIN_MEM_CONSUMPTION strategy by setting by setting rapid_execution_strategy to MIN_MEM_CONSUMPTION. The rapid_execution_strategy variable can be set as shown:

      • Using a SET statement before running queries:

        mysql> SET SESSION rapid_execution_strategy = MIN_MEM_CONSUMPTION;
      • Using a SET_VAR optimizer hint when issuing a query:

        mysql> SELECT /*+ SET_VAR(rapid_execution_strategy = MIN_MEM_CONSUMPTION) */ ... FROM ...
      • Unloading tables that are not used. These tables consume memory on HeatWave nodes unnecessarily. See Chapter 5, Unloading Tables.

      • Excluding table columns that are not accessed by your queries. These columns consume memory on HeatWave nodes unnecessarily. This strategy requires reloading data. See Section 3.2, “Excluding Table Columns”.

    7. After running queries, consider using HeatWave Advisor for encoding and data placement recommendations. See Chapter 8, Workload Optimization using Advisor.

Monitoring

The following monitoring practices are recommended:

  • Monitor operating system memory usage. Use the console to set an alarm to notify you when memory usage on HeatWave nodes remains above 450GB for an extended period of time. If memory usage exceeds this threshold, either reduce the size of your data or add nodes to the HeatWave cluster. For information about using metrics, alarms, and notifications, refer to MySQL Database Service Metrics, in the MySQL Database Service User Guide.

  • Monitor change propagation status. If change propagation is interrupted, table data becomes stale and queries that access tables with stale data are not offloaded. For instructions, see Section 4.2, “Change Propagation”.

Reloading Data

Reloading data is recommended in the following cases:

  • After resizing the cluster by adding or removing nodes. Reloading data distributes the data among all nodes of the resized cluster.

  • After a maintenance window. Maintenance involves a restart, which requires that you reload data into HeatWave. Consider setting up a MySQL Database Service event notification or Service Connector Hub notification to let you know when an update has occurred.

  • When the HeatWave cluster is restarted. Data in the HeatWave cluster is lost in this case, requiring reload.