Auto Data Placement generates data placement key
recommendations. Data placement keys are used to partition table
data among MySQL HeatWave nodes when loading tables. Partitioning table
data by JOIN
and GROUP BY
key columns can improve query performance by avoiding costs
associated with redistributing data among MySQL HeatWave nodes at query
execution time. The Data Placement Advisor generates data
placement key recommendations by analyzing table statistics and
MySQL HeatWave query history. For more information about data placement
keys, see Section 5.8.2, “Data Placement Keys”.
This topic contains the following sections:
To run Advisor for both encoding and data placement recommendations, run Auto Encoding first, apply the recommended encodings, rerun the queries, and then run Auto Data Placement. This sequence allows data placement performance benefits to be calculated with string column encodings in place, which provides for greater accuracy from Advisor internal models.
For Advisor to generate data placement recommendations:
Tables must be loaded in MySQL HeatWave.
-
There must be a query history with at least 5 queries. A query is counted if it includes a
JOIN
on tables loaded in the MySQL HeatWave Cluster orGROUP BY
keys. A query executed on a table that is no longer loaded or that was reloaded since the query was run is not counted.To view the query history, query the
performance_schema.rpd_query_stats
table. For example:mysql> SELECT query_id, LEFT(query_text,160) FROM performance_schema.rpd_query_stats; *************************** 1. row *************************** query_id: 14 LEFT(query_text,160): SELECT country from airport_geo *************************** 2. row *************************** query_id: 15 LEFT(query_text,160): CREATE TABLE flight_from_US SELECT * FROM flight WHERE `from` IN (SELECT airport_id FROM airport_geo WHERE country="UNITED STATES") *************************** 3. row *************************** query_id: 16 LEFT(query_text,160): CREATE TABLE flight_from_US SELECT * FROM flight WHERE `from` IN (SELECT airport_id FROM airport_geo WHERE country="UNITED STATES") *************************** 4. row *************************** query_id: 17 LEFT(query_text,160): CREATE TABLE flight_from_US SELECT * FROM flight WHERE `from` IN (SELECT airport_id FROM airport_geo WHERE country="UNITED STATES") ... ... 9 rows in set (0.2179 sec)
For the most accurate data placement recommendations, run
Advisor on one schema at a time. In the following example,
Advisor is run on the airportdb
schema
using the target_schema
option. No other
options
are specified, which means
that the default option settings are used.
mysql> CALL sys.heatwave_advisor(JSON_OBJECT('target_schema',JSON_ARRAY('tpch_1024')));
Advisor output provides information about each stage of Advisor execution. The data placement suggestion output shows suggested data placement keys and the estimated performance benefit of applying the keys.
When you run the MySQL HeatWave Advisor, it provides a query to retrieve the generated DDL statements needed to implement the recommended data placement keys. You cannot add or modify data placement keys without reloading the table. Therefore, the Advisor creates DDL statements to unload the table, apply the new keys, and then reload the table.
mysql> CALL sys.heatwave_advisor(JSON_OBJECT('target_schema',JSON_ARRAY('tpch_1024')));
+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.12 |
| |
| Output Mode: normal |
| Excluded Queries: 0 |
| Target Schemas: 1 |
| |
+-------------------------------+
6 rows in set (0.01 sec)
+---------------------------------------------------------+
| ANALYZING LOADED DATA |
+---------------------------------------------------------+
| Total 8 tables loaded in HeatWave for 1 schemas |
| Tables excluded by user: 0 (within target schemas) |
| |
| SCHEMA TABLES COLUMNS |
| NAME LOADED LOADED |
| ------ ------ ------ |
| `tpch_1024` 8 61 |
| |
+---------------------------------------------------------+
8 rows in set (0.02 sec)
+----------------------------------------------------------------------+
| AUTO DATA PLACEMENT |
+----------------------------------------------------------------------+
| Auto Data Placement Configuration: |
| Minimum benefit threshold: 1% |
| Producing Data Placement suggestions for current setup: |
| Tables Loaded: 8 |
| Queries used: 189 |
| Total query execution time: 22.75 min |
| Most recent query executed on: Tuesday 8th June 2021 16:29:02 |
| Oldest query executed on: Tuesday 8th June 2021 16:05:43 |
| HeatWave cluster size: 5 nodes |
| |
| All possible Data Placement combinations based on query history: 120 |
| Explored Data Placement combinations after pruning: 90 |
| |
+----------------------------------------------------------------------+
16 rows in set (12.38 sec)
+---------------------------------------------------------------------------------------+
| DATA PLACEMENT SUGGESTIONS |
+---------------------------------------------------------------------------------------+
| Total Data Placement suggestions produced for 2 tables |
| |
| TABLE DATA PLACEMENT DATA PLACEMENT |
| NAME CURRENT KEY SUGGESTED KEY |
| ------ -------------- -------------- |
| `tpch_1024`.`LINEITEM` L_ORDERKEY, L_LINENUMBER L_ORDERKEY |
| `tpch_1024`.`SUPPLIER` S_SUPPKEY S_NATIONKEY |
| |
| Expected benefit after applying Data Placement suggestions |
| Runtime saving: 6.17 min |
| Performance benefit: 27% |
| |
+---------------------------------------------------------------------------------------+
12 rows in set (16.42 sec)
+-------------------------------------------------------------------------------------------+
| SCRIPT GENERATION |
+-------------------------------------------------------------------------------------------+
| Script generated for applying suggestions for 2 loaded tables |
| |
| Applying changes takes approximately 1.18 h |
| |
| Retrieve script containing 12 generated DDL commands using the query below: |
| SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql"|
| ORDER BY id; |
| |
| Caution: Executing the generated script alters the column comment and secondary engine |
| flags in the schema |
| |
+-------------------------------------------------------------------------------------------+
9 rows in set (16.43 sec)
SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql"
ORDER BY id;
+-------------------------------------------------------------------------------------------+
| SQL Script |
+-------------------------------------------------------------------------------------------+
| SET SESSION innodb_parallel_read_threads = 48; |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_UNLOAD; |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_ENGINE=NULL; |
| ALTER TABLE `tpch_1024`.`LINEITEM` MODIFY `L_ORDERKEY` bigint NOT NULL COMMENT |
| 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1'; |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_ENGINE=RAPID; |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_LOAD; |
| SET SESSION innodb_parallel_read_threads = 48; |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_UNLOAD; |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_ENGINE=NULL; |
| ALTER TABLE `tpch_1024`.`SUPPLIER` MODIFY `S_NATIONKEY` int NOT NULL COMMENT |
| 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1'; |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_ENGINE=RAPID; |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_LOAD; |
+-------------------------------------------------------------------------------------------+
12 rows in set (0.00 sec)
If a table already has data placement keys or columns are customized before running Advisor, Advisor may generate DDL statements for removing previously defined data placement keys.
Advisor provides recommendations only if data placement keys are estimated to improve query performance. If not, an information message is returned and no recommendations are provided.
Advisor provides data placement key recommendations based on approximate models. Recommendations are therefore not guaranteed to improve query performance.
-
Invoking Advisor without any
options
runs the Data Placement Advisor with the default option settings.mysql> CALL sys.heatwave_advisor(NULL);
-
Running Advisor with only the
target_schema
option runs the Data Placement Advisor on the specified schemas with the default option settings.mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("tpch","employees")));
-
Run Advisor with the data placement
max_combinations
andbenefit_threshold
parameters. For information about theseoptions
, see Section 10.1.5, “HEATWAVE_ADVISOR”.mysql> CALL sys.heatwave_advisor(JSON_OBJECT("auto_dp", JSON_OBJECT("max_combinations",100,"benefit_threshold",20)));
-
The following example shows how to view the MySQL HeatWave query history by querying the
performance_schema.rpd_query_stats
table, and how to exclude specific queries from Data Placement Advisor analysis using theexclude_query
option:mysql> SELECT query_id, LEFT(query_text,160) FROM performance_schema.rpd_query_stats; mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema", JSON_ARRAY("tpch"),"exclude_query",JSON_ARRAY(1,11,12,14)));
-
This example demonstrates how to invoke the Data Placement Advisor with
options
specified in a variable:mysql> SET @options = JSON_OBJECT( "target_schema", JSON_ARRAY("analytics45","sample_schema"), "exclude_query", JSON_ARRAY(12,24), "auto_dp", JSON_OBJECT( "benefit_threshold", 12.5, "max_combinations", 100 )); mysql> CALL sys.heatwave_advisor( @options );
-
This example demonstrates how to invoke Advisor in silent output mode, which is useful if the output is consumed by a script, for example. Auto Data Placement is run by default if no option such as
auto_enc
orquery_insights
is specified.mysql> CALL sys.heatwave_advisor(JSON_OBJECT("output","silent"));
Learn more about Auto Query Time Estimation.