Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 3.8Mb
PDF (A4) - 3.8Mb


5.8.5 Auto Data Placement

Auto Data Placement generates data placement key recommendations. Data placement keys are used to partition table data among MySQL HeatWave nodes when loading tables. Partitioning table data by JOIN and GROUP BY key columns can improve query performance by avoiding costs associated with redistributing data among MySQL HeatWave nodes at query execution time. The Data Placement Advisor generates data placement key recommendations by analyzing table statistics and MySQL HeatWave query history. For more information about data placement keys, see Section 5.8.2, “Data Placement Keys”.

This topic contains the following sections:

Run Auto Data Placement

Note

To run Advisor for both encoding and data placement recommendations, run Auto Encoding first, apply the recommended encodings, rerun the queries, and then run Auto Data Placement. This sequence allows data placement performance benefits to be calculated with string column encodings in place, which provides for greater accuracy from Advisor internal models.

For Advisor to generate data placement recommendations:

  • Tables must be loaded in MySQL HeatWave.

  • There must be a query history with at least 5 queries. A query is counted if it includes a JOIN on tables loaded in the MySQL HeatWave Cluster or GROUP BY keys. A query executed on a table that is no longer loaded or that was reloaded since the query was run is not counted.

    To view the query history, query the performance_schema.rpd_query_stats table. For example:

    mysql> SELECT query_id, LEFT(query_text,160) 
              FROM performance_schema.rpd_query_stats;
    *************************** 1. row ***************************
                query_id: 14
    LEFT(query_text,160): SELECT country from airport_geo
    *************************** 2. row ***************************
                query_id: 15
    LEFT(query_text,160): CREATE TABLE flight_from_US SELECT * FROM flight WHERE `from` IN (SELECT airport_id FROM airport_geo WHERE country="UNITED STATES")
    *************************** 3. row ***************************
                query_id: 16
    LEFT(query_text,160): CREATE TABLE flight_from_US SELECT * FROM flight WHERE `from` IN (SELECT airport_id FROM airport_geo WHERE country="UNITED STATES")
    *************************** 4. row ***************************
                query_id: 17
    LEFT(query_text,160): CREATE TABLE flight_from_US SELECT * FROM flight WHERE `from` IN (SELECT airport_id FROM airport_geo WHERE country="UNITED STATES")
    ...
    ...
    9 rows in set (0.2179 sec)

For the most accurate data placement recommendations, run Advisor on one schema at a time. In the following example, Advisor is run on the airportdb schema using the target_schema option. No other options are specified, which means that the default option settings are used.

mysql> CALL sys.heatwave_advisor(JSON_OBJECT('target_schema',JSON_ARRAY('tpch_1024')));

Advisor output provides information about each stage of Advisor execution. The data placement suggestion output shows suggested data placement keys and the estimated performance benefit of applying the keys.

When you run the MySQL HeatWave Advisor, it provides a query to retrieve the generated DDL statements needed to implement the recommended data placement keys. You cannot add or modify data placement keys without reloading the table. Therefore, the Advisor creates DDL statements to unload the table, apply the new keys, and then reload the table.

mysql> CALL sys.heatwave_advisor(JSON_OBJECT('target_schema',JSON_ARRAY('tpch_1024')));
+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.12                 |
|                               |
| Output Mode: normal           |
| Excluded Queries: 0           |
| Target Schemas: 1             |
|                               |
+-------------------------------+
6 rows in set (0.01 sec)

+---------------------------------------------------------+
| ANALYZING LOADED DATA                                   |
+---------------------------------------------------------+
| Total 8 tables loaded in HeatWave for 1 schemas         |
| Tables excluded by user: 0 (within target schemas)      |
|                                                         |
| SCHEMA                            TABLES        COLUMNS |
| NAME                              LOADED         LOADED |
| ------                            ------         ------ |
| `tpch_1024`                            8             61 |
|                                                         |
+---------------------------------------------------------+
8 rows in set (0.02 sec)

+----------------------------------------------------------------------+ 
| AUTO DATA PLACEMENT                                                  |
+----------------------------------------------------------------------+
| Auto Data Placement Configuration:                                   |
|   Minimum benefit threshold: 1%                                      |
| Producing Data Placement suggestions for current setup:              |
|   Tables Loaded: 8                                                   |
|   Queries used: 189                                                  |
|     Total query execution time: 22.75 min                            |
|     Most recent query executed on: Tuesday 8th June 2021 16:29:02    |
|     Oldest query executed on: Tuesday 8th June 2021 16:05:43         |
|   HeatWave cluster size: 5 nodes                                     |
|                                                                      |
| All possible Data Placement combinations based on query history: 120 |
| Explored Data Placement combinations after pruning: 90               |
|                                                                      |
+----------------------------------------------------------------------+
16 rows in set (12.38 sec)

+---------------------------------------------------------------------------------------+
| DATA PLACEMENT SUGGESTIONS                                                            |
+---------------------------------------------------------------------------------------+
| Total Data Placement suggestions produced for 2 tables                                |
|                                                                                       |
| TABLE                                         DATA PLACEMENT           DATA PLACEMENT |
| NAME                                             CURRENT KEY            SUGGESTED KEY |
| ------                                        --------------           -------------- |
| `tpch_1024`.`LINEITEM`              L_ORDERKEY, L_LINENUMBER               L_ORDERKEY |
| `tpch_1024`.`SUPPLIER`                             S_SUPPKEY              S_NATIONKEY |
|                                                                                       |
| Expected benefit after applying Data Placement suggestions                            |
|   Runtime saving: 6.17 min                                                            |
|   Performance benefit: 27%                                                            |
|                                                                                       |
+---------------------------------------------------------------------------------------+
12 rows in set (16.42 sec)

+-------------------------------------------------------------------------------------------+
| SCRIPT GENERATION                                                                         |
+-------------------------------------------------------------------------------------------+
| Script generated for applying suggestions for 2 loaded tables                             |
|                                                                                           |
| Applying changes takes approximately 1.18 h                                               |
|                                                                                           |
| Retrieve script containing 12 generated DDL commands using the query below:               |
|   SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql"|
|   ORDER BY id;                                                                            |
|                                                                                           |
| Caution: Executing the generated script alters the column comment and secondary engine    |
| flags in the schema                                                                       |
|                                                                                           |
+-------------------------------------------------------------------------------------------+
9 rows in set (16.43 sec)


SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql" 
 ORDER BY id;
+-------------------------------------------------------------------------------------------+
| SQL Script                                                                                |
+-------------------------------------------------------------------------------------------+
| SET SESSION innodb_parallel_read_threads = 48;                                            |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_UNLOAD;                                      |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_ENGINE=NULL;                                 |
| ALTER TABLE `tpch_1024`.`LINEITEM` MODIFY `L_ORDERKEY` bigint NOT NULL COMMENT            |
|  'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';                                                     |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_ENGINE=RAPID;                                |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_LOAD;                                        |
| SET SESSION innodb_parallel_read_threads = 48;                                            |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_UNLOAD;                                      |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_ENGINE=NULL;                                 |
| ALTER TABLE `tpch_1024`.`SUPPLIER` MODIFY `S_NATIONKEY` int NOT NULL COMMENT              |
|  'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';                                                     |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_ENGINE=RAPID;                                |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_LOAD;                                        |
+-------------------------------------------------------------------------------------------+
12 rows in set (0.00 sec)

Usage Notes

  • If a table already has data placement keys or columns are customized before running Advisor, Advisor may generate DDL statements for removing previously defined data placement keys.

  • Advisor provides recommendations only if data placement keys are estimated to improve query performance. If not, an information message is returned and no recommendations are provided.

  • Advisor provides data placement key recommendations based on approximate models. Recommendations are therefore not guaranteed to improve query performance.

Auto Data Placement Examples

  • Invoking Advisor without any options runs the Data Placement Advisor with the default option settings.

    mysql> CALL sys.heatwave_advisor(NULL);
  • Running Advisor with only the target_schema option runs the Data Placement Advisor on the specified schemas with the default option settings.

    mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("tpch","employees")));
  • Run Advisor with the data placement max_combinations and benefit_threshold parameters. For information about these options, see Section 10.1.5, “HEATWAVE_ADVISOR”.

    mysql> CALL sys.heatwave_advisor(JSON_OBJECT("auto_dp",
              JSON_OBJECT("max_combinations",100,"benefit_threshold",20)));
  • The following example shows how to view the MySQL HeatWave query history by querying the performance_schema.rpd_query_stats table, and how to exclude specific queries from Data Placement Advisor analysis using the exclude_query option:

    mysql> SELECT query_id, LEFT(query_text,160) 
              FROM performance_schema.rpd_query_stats;
    
    mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",
              JSON_ARRAY("tpch"),"exclude_query",JSON_ARRAY(1,11,12,14)));
  • This example demonstrates how to invoke the Data Placement Advisor with options specified in a variable:

    mysql> SET @options = JSON_OBJECT(
                "target_schema", JSON_ARRAY("analytics45","sample_schema"),
                "exclude_query", JSON_ARRAY(12,24),
                "auto_dp", JSON_OBJECT(
                       "benefit_threshold", 12.5,
                       "max_combinations", 100
    ));
    
    mysql> CALL sys.heatwave_advisor( @options );
  • This example demonstrates how to invoke Advisor in silent output mode, which is useful if the output is consumed by a script, for example. Auto Data Placement is run by default if no option such as auto_enc or query_insights is specified.

    mysql> CALL sys.heatwave_advisor(JSON_OBJECT("output","silent"));

What's Next

Learn more about Auto Query Time Estimation.