Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.1Mb
PDF (A4) - 2.1Mb


HeatWave User Guide  /  ...  /  Preparing Data

2.9.1 Preparing Data

The following practices are recommended when preparing data for loading into HeatWave:

  • Instead of preparing and loading tables into HeatWave manually, consider using the Auto Parallel Load utility. See Section 2.2.3, “Loading Data Using Auto Parallel Load”.

  • To minimize the number of HeatWave nodes required for your data, exclude table columns that are not accessed by your queries. For information about excluding columns, see Section 2.2.2.1, “Excluding Table Columns”.

  • To save space in memory, set CHAR, VARCHAR, and TEXT type column lengths to the minimum length required for the longest string value.

  • Where appropriate, apply dictionary encoding to CHAR, VARCHAR, and TEXT type columns. Dictionary encoding reduces memory consumption on the HeatWave Cluster nodes. Use the following criteria when selecting string columns for dictionary encoding:

    1. The column is not used as a key in JOIN queries.

    2. Your queries do not perform operations such as LIKE, SUBSTR, CONCAT, etc., on the column. Variable-length encoding supports string functions and operators and LIKE predicates; dictionary encoding does not.

    3. The column has a limited number of distinct values. Dictionary encoding is best suited to columns with a limited number of distinct values, such as country columns.

    4. The column is expected to have few new values added during change propagation. Avoid dictionary encoding for columns with a high number of inserts and updates. Adding a significant number of a new, unique values to a dictionary encoded column can cause a change propagation failure.

    The following columns from the TPC Benchmark™ H (TPC-H) provide examples of string columns that are suitable and unsuitable for dictionary encoding:

    • ORDERS.O_ORDERPRIORITY

      This column is used only in range queries. The values associated with column are limited. During updates, it is unlikely for a significant number of new, unique values to be added. These characteristics make the column suitable for dictionary encoding.

    • LINEITEM.L_COMMENT

      This column is not used in joins or other complex expressions, but as a comment field, values are expected to be unique, making the column unsuitable for dictionary encoding.

    When in doubt about choosing an encoding type, use variable-length encoding, which is applied by default when tables are loaded into HeatWave, or use Auto Encoding to obtain encoding recommendations. See Section 2.7.4, “Auto Encoding”.

  • Data is partitioned by the table primary key when no data placement keys are defined. Only consider defining data placement keys if partitioning data by the primary key does not provide suitable performance.

    Reserve the use of data placement keys for the most time-consuming queries. In such cases, define data placement keys on:

    • The most frequently used JOIN keys.

    • The keys of the longest running queries.

    Consider using Auto Data Placement for data placement recommendations. See Section 2.7.5, “Auto Data Placement”.