When data is loaded into HeatWave, it is partitioned by the
table's primary key and sliced horizontally for distribution
among HeatWave nodes by default. The data placement key feature
permits partitioning data by
GROUP BY key columns instead, which can
query performance by avoiding costs associated with
redistributing data among HeatWave nodes at query execution time.
Generally, use data placement keys only if partitioning by the
primary key does not provide adequate performance. Also,
reserve data placement keys for the most time-consuming
queries. In such cases, define data placement keys on the most
JOIN keys and the keys of
the longest running queries.
For data placement key recommendations, use the Advisor utility after loading tables into HeatWave and running queries. For more information, see Section 2.7.3, “Workload Optimization using Advisor”.
Defining a data placement key requires adding a column comment with the data placement keyword string:
N is an index value that
defines the priority order of data placement keys.
The index must start with 1.
Permitted index values range from 1 to 16, inclusive.
An index value cannot be repeated in the same table. For example, you cannot assign an index value of 2 to more than one column in the same table.
Gaps in index values are not permitted. For example, if you define a data placement key column with an index value of 3, there must also be two other data placement key columns with index values of 1 and 2, respectively.
CREATE TABLE orders (date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1');
ALTER TABLE orders MODIFY date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';
The following example shows multiple columns defined as data placement keys. Although a primary key is defined, data is partitioned by the data placement keys, which are prioritized over the primary key.
CREATE TABLE orders ( id INT PRIMARY KEY, date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1', price FLOAT COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2' );
When defining multiple columns as data placement keys,
prioritize the keys according to query cost. For example,
DATA_PLACEMENT_KEY=1 to the key of
the costliest query, and
DATA_PLACEMENT_KEY=2 to the key of the next
costliest query, and so on.
Other information is permitted in column comments. For example, it is permitted to specify a column description alongside a data placement keyword string:
COMMENT 'column_description RAPID_COLUMN=DATA_PLACEMENT_KEY=1'
To modify or remove a data placement key, refer to the procedure described in Section 2.4, “Modifying Tables”.
GROUP BYquery optimizations are only applied if at least one of the
GROUP BYrelations has a key that matches the defined data placement key.
JOINoperation can be executed with or without the
GROUP BYquery optimization, a compilation-time cost model determines how the query is executed. The cost model uses estimated statistics.
A data placement key cannot be defined on a dictionary-encoded string column but are permitted on variable-length encoded columns. HeatWave applies variable-length encoding to string columns by default. See Section 2.7.1, “Encoding String Columns”.
A data placement key can only be defined on a column with a supported data type. See Section 2.9, “Supported Data Types”.
A data placement key column cannot be defined as a
NOT SECONDARYcolumn. See Excluding Table Columns.
For related metadata queries, see Section 2.15, “Metadata Queries”.