When data is loaded into HeatWave, it is partitioned by the
table primary key and sliced horizontally for distribution
among HeatWave nodes by default. The data placement key feature
permits partitioning data by JOIN
or
GROUP BY
key columns instead, which can
improve JOIN
or GROUP BY
query performance by avoiding costs associated with
redistributing data among HeatWave nodes at query execution time.
Generally, use data placement keys only if partitioning by the
primary key does not provide adequate performance. Also,
reserve data placement keys for the most time-consuming
queries. In such cases, define data placement keys on the most
frequently used JOIN
keys and the keys of
the longest running queries.
For data placement key recommendations, use HeatWave Autopilot Advisor after loading tables into HeatWave and running queries. For more information, see Section 2.7, “Workload Optimization for OLAP”.
Defining a data placement key requires adding a column comment with the data placement keyword string:
$> RAPID_COLUMN=DATA_PLACEMENT_KEY=N
where N
is an index value that
defines the priority order of data placement keys.
The index must start with 1.
Permitted index values range from 1 to 16, inclusive.
An index value cannot be repeated in the same table. For example, you cannot assign an index value of 2 to more than one column in the same table.
Gaps in index values are not permitted. For example, if you define a data placement key column with an index value of 3, there must also be two other data placement key columns with index values of 1 and 2, respectively.
You can define the data placement keyword string in a
CREATE TABLE
or
ALTER TABLE
statement:
mysql> CREATE TABLE orders (date DATE
COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1');
mysql> ALTER TABLE orders MODIFY date DATE
COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';
The following example shows multiple columns defined as data placement keys. Although a primary key is defined, data is partitioned by the data placement keys, which are prioritized over the primary key.
mysql> CREATE TABLE orders (
id INT PRIMARY KEY,
date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1',
price FLOAT COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2');
When defining multiple columns as data placement keys,
prioritize the keys according to query cost. For example,
assign DATA_PLACEMENT_KEY=1
to the key of
the costliest query, and
DATA_PLACEMENT_KEY=2
to the key of the next
costliest query, and so on.
Other information is permitted in column comments. For example, it is permitted to specify a column description alongside a data placement keyword string:
mysql> COMMENT 'column_description RAPID_COLUMN=DATA_PLACEMENT_KEY=1'
To modify or remove a data placement key, refer to the procedure described in Section 2.4, “Modifying Tables”.
JOIN
andGROUP BY
query optimizations are only applied if at least one of theJOIN
orGROUP BY
relations has a key that matches the defined data placement key.If a
JOIN
operation can be executed with or without theJOIN
andGROUP BY
query optimization, a compilation-time cost model determines how the query is executed. The cost model uses estimated statistics.A data placement key cannot be defined on a dictionary-encoded string column but are permitted on variable-length encoded columns. HeatWave applies variable-length encoding to string columns by default. See Section 2.7.1, “Encoding String Columns”.
A data placement key can only be defined on a column with a supported data type. See Section 2.10, “Supported Data Types”.
A data placement key column cannot be defined as a
NOT SECONDARY
column. See Section 2.2.2.1, “Excluding Table Columns”.For related metadata queries, see Section 2.16, “Metadata Queries”.