When data is loaded into MySQL HeatWave, it is partitioned by the table
primary key and sliced horizontally for distribution among
MySQL HeatWave nodes by default. However, this may not be optimal for
queries involving JOIN
or GROUP
BY
operations as they incur costs due to data
redistribution among MySQL HeatWave nodes at query execution time
In case of such queries, the data placement key feature permits
partitioning data by JOIN
or GROUP
BY
key columns instead of primary keys. To use this
feature, define data placement keys on the most frequently used
JOIN
keys and the keys of the longest running
queries.
For data placement key recommendations, use MySQL HeatWave Autopilot Advisor after loading tables into MySQL HeatWave and running queries. For more information, see Section 5.8, “Optimize Workloads for OLAP”.
This topic includes the following sections:
Review the requirements.
The examples in this topic use the sample database
airportdb
. To learn how to download the sample database, see AirportDB Analytics Quickstart.
Defining a data placement key requires adding a column comment with the data placement keyword string:
Press CTRL+C to copyRAPID_COLUMN=DATA_PLACEMENT_KEY=N
where N
is an index value that
defines the priority order of data placement keys. The rules
for assigning a data placement index are:
The index must start with 1.
Index values range from 1 to 16.
Each column must have an unique index value. For example, you cannot assign the index value of 2 to more than one column in the same table.
Always assign consecutive index values, without gaps.For example, if you define a data placement key column with an index value of 3, there must also be two other data placement key columns with index values of 1 and 2, respectively.
The following example shows multiple columns defined as data placement keys. Although a primary key is defined, data is partitioned by the data placement keys, which are prioritized over the primary key.
Press CTRL+C to copymysql> CREATE TABLE airline_discount( id INT PRIMARY KEY, date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1', discount FLOAT COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2'); Query OK, 0 rows affected (0.2448 sec)
When defining multiple columns as data placement keys,
prioritize the keys according to query cost. For example,
assign DATA_PLACEMENT_KEY=1
to the key of
the costliest query, and
DATA_PLACEMENT_KEY=2
to the key of the
second costliest query, and so on.
Other information is permitted in column comments. For example, it is permitted to specify a column description alongside a data placement keyword string:
Press CTRL+C to copyCOMMENT 'column_description RAPID_COLUMN=DATA_PLACEMENT_KEY=1'
To view the data placement type of the tables loaded into
MySQL HeatWave, PrimaryKey
means the data
placement key is using the primary key of the table:
Press CTRL+C to copymysql> SELECT SCHEMA_NAME, TABLE_NAME, DATA_PLACEMENT_TYPE, LOAD_STATUS FROM performance_schema.rpd_table_id JOIN performance_schema.rpd_tables USING (ID);
You can also view data placement keys for an individual table using SHOW CREATE TABLE.
For information about data placement key index values, see Section 5.8.2, “Data Placement Keys”.
To determine whether a JOIN
or
GROUP BY
query used data placement
partitions, examine the QEP_TEXT
column in
the performance_schema.rpd_query_stats
table. This column contains prepart
(pre-partitioning) data, which indicates how the query was
distributed across MySQL HeatWave nodes.
For GROUP BY
operations, the
prepart
data is displayed as:
Press CTRL+C to copy"prepart": #
Where, # represents the number of nodes.
For JOIN
operations, the
prepart
data appears as:
Press CTRL+C to copy"prepart": [#, #]
where [#, #]
represents the number of
MySQL HeatWave nodes used in the left and the right branches
respectively. If either value is greater than 1, that branch
used data placement partitions. If both values are 1 (i.e.,
"prepart": [1, 1]), then data placement partitions is not used
both the branches.
To query QEP_TEXT
prepart
data for the last executed query:
Press CTRL+C to copymysql> SELECT CONCAT( '"prepart":[', (JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][0]')), "," ,(JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][1]')) , ']' ) FROM performance_schema.rpd_query_stats WHERE query_id = (select max(query_id) FROM performance_schema.rpd_query_stats); +-----------------------------------------------------------------------------+ | concat( '"prepart":[', (JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][0]')), | |"," ,(JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][1]')) , ']' ) | +-----------------------------------------------------------------------------+ | "prepart":[2,2] | +-----------------------------------------------------------------------------+
Refer to the following example to understand how to use MySQL HeatWave data placement keys to improve query performance by controlling data distribution and verifying execution plan efficiency.
-
To begin with, check whether the target table (airline_discount) has any existing data placement keys, using SELECT statement:
Press CTRL+C to copymysql> SELECT COLUMN_NAME, COLUMN_COMMENT FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE 'airline_discount' AND COLUMN_COMMENT LIKE '%DATA_PLACEMENT_KEY%';/ add output
-
Modify the table (airline_discount) to add (airline_discount)the from and to columns as placement keys, so that MySQL HeatWave can locate related data across nodes and reduce data movement during joins.
Press CTRL+C to copymysql> ALTER TABLE flight MODIFY `from` smallint NOT NULL COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1', MODIFY `to` smallint NOT NULL COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2'; Add output
-
Use the following query to inspect the number of partitions used in the most recent offloaded query:
Press CTRL+C to copymysql> SELECT QEP_TEXT->>"$**.nPreparts" FROM performance_schema.rpd_query_stats WHERE query_id = (select max(query_id) FROM performance_schema.rpd_query_stats); Add output
The
nPreparts
value represents the number of partitions involved. A lowernPreparts
value indicates more efficient data placement and reduced inter-node communication.
JOIN
andGROUP BY
query optimizations are only applied if at least one of theJOIN
orGROUP BY
relations has a key that matches the defined data placement key.If you execute a
JOIN
operation with or without theJOIN
andGROUP BY
query optimization, a compilation-time cost model determines how the query is executed. The cost model uses estimated statistics.A data placement key cannot be defined on a dictionary-encoded string column but are permitted on variable-length encoded columns. MySQL HeatWave applies variable-length encoding to string columns by default. See Section 5.8.1.4, “Encoding String Columns”.
A data placement key can only be defined on a column with a supported data type. See Section 4.2.1, “Supported Data Types for MySQL HeatWave”.
A data placement key column cannot be defined as a
NOT SECONDARY
column. See Section 4.2.6.1, “Exclude Table Columns”.
Learn about Autopilot Advisor.