Related Documentation Download this Manual
PDF (US Ltr) - 0.6Mb
PDF (A4) - 0.6Mb


HeatWave User Guide  /  Preparing Data  /  Defining Data Placement Keys

3.4 Defining Data Placement Keys

When data is loaded into HeatWave, it is partitioned by the table's primary key and sliced horizontally for distribution among HeatWave nodes by default. The data placement key feature permits partitioning data by JOIN or GROUP BY key columns instead, which can improve JOIN or GROUP BY query performance by avoiding costs associated with redistributing data among HeatWave nodes at query execution time.

Generally, use data placement keys only if partitioning by the primary key does not provide adequate performance. Also, reserve data placement keys for the most time-consuming queries. In such cases, define data placement keys on the most frequently used JOIN keys and the keys of the longest running queries.

Tip

For data placement key recommendations, use the Advisor utility after loading tables into HeatWave and running queries. For more information, see Chapter 8, Workload Optimization using Advisor.

Defining a data placement key requires adding a column comment with the data placement keyword string:

RAPID_COLUMN=DATA_PLACEMENT_KEY=N

where N is an index value that defines the priority order of data placement keys.

  • The index must start with 1.

  • Permitted index values range from 1 to 16, inclusive.

  • An index value cannot be repeated in the same table. For example, you cannot assign an index value of 2 to more than one column in the same table.

  • Gaps in index values are not permitted. For example, if you define a data placement key column with an index value of 3, there must also be two other data placement key columns with index values of 1 and 2, respectively.

You can define the data placement keyword string in a CREATE TABLE or ALTER TABLE statement:

CREATE TABLE orders (date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1');
ALTER TABLE orders MODIFY date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';

The following example shows multiple columns defined as data placement keys. Although a primary key is defined, data is partitioned by the data placement keys, which are prioritized over the primary key.

CREATE TABLE orders (
  id INT PRIMARY KEY, 
  date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1',
  price FLOAT COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2'
);

When defining multiple columns as data placement keys, prioritize the keys according to query cost. For example, assign DATA_PLACEMENT_KEY=1 to the key of the costliest query, and DATA_PLACEMENT_KEY=2 to the key of the next costliest query, and so on.

Note

Other information is permitted in column comments. For example, it is permitted to specify a column description alongside a data placement keyword string:

COMMENT 'column_description RAPID_COLUMN=DATA_PLACEMENT_KEY=1'

Usage notes:

  • JOIN and GROUP BY query optimizations are only applied if at least one of the JOIN or GROUP BY relations has a key that matches the defined data placement key.

  • If a JOIN operation can be executed with or without the JOIN and GROUP BY query optimization, a compilation-time cost model determines how the query is executed. The cost model uses estimated statistics.

  • A data placement key cannot be defined on a dictionary-encoded string column but are permitted on variable-length encoded columns. HeatWave applies variable-length encoding to string columns by default. See Section 3.3, “Encoding String Columns”.

  • A data placement key can only be defined on a column with a supported data type. See Section 11.1, “Supported Data Types”.

  • A data placement key column cannot be defined as a NOT SECONDARY column. See Section 3.2, “Excluding Table Columns”.

  • For related metadata queries, see Section 11.5, “Metadata Queries”.