Related Documentation Download this Manual
PDF (US Ltr) - 0.6Mb
PDF (A4) - 0.6Mb


8.2 Auto Data Placement

Auto Data Placement generates data placement key recommendations. Data placement keys are used to partition table data among HeatWave nodes when loading tables. Partitioning table data by JOIN and GROUP BY key columns can improve query performance by avoiding costs associated with redistributing data among HeatWave nodes at query execution time. The Data Placement Advisor generates data placement key recommendations by analyzing table statistics and HeatWave query history. For more information about data placement keys, see Section 3.4, “Defining Data Placement Keys”.

Auto Data Placement Syntax

CALL sys.heatwave_advisor ([options]);
  
 options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
        "key","value": 
        ["output",{"normal"|"silent"|"help"}]
        ["target_schema",JSON_ARRAY({"schema_name"[,"schema_name"]}]
        ["exclude_query",JSON_ARRAY("query_id"[,"query_id"] ...)]
        ["query_session_id",JSON_ARRAY("query_session_id"[,"query_session_id"] ...)]
        ["auto_dp",JSON_OBJECT(auto_dp_option)]
}

  auto_dp_option: {
                 ["benefit_threshold",N]
                 ["max_combinations",N]
 }

For syntax examples, see Section 8.4, “Advisor Examples”.

Advisor options are specified as key-value pairs in JSON-object format. Options include:

  • output: Defines how Advisor produces output. Permitted values are:

  • target_schema: Defines one or more schemas for Advisor to analyze. The list is specified in JSON-array format. If a target schema is not specified, all schemas in HeatWave are analyzed. When a target schema is specified, Advisor generates recommendations for tables belonging to the target schema. For the most accurate recommendations, specify one schema at a time. Only run Advisor on multiple schemas if your queries access tables in multiple schemas.

  • exclude_query: Defines the IDs of queries to exclude when Advisor analyzes query statistics. To identify query IDs, query the performance_schema.rpd_query_stats table. For a query example, see Section 8.4, “Advisor Examples”.

  • query_session_id: Defines session IDs for filtering queries by session ID. To identify session IDs, query the performance_schema.rpd_query_stats table. For a query example, see Section 8.4, “Advisor Examples”.

  • auto_dp: Defines settings for the Data Placement feature, which recommends data placement keys. Settings include:

    • benefit_threshold: The minimum query performance improvement expressed as a percentage value. Advisor only suggests data placement keys estimated to meet or exceed the benefit_threshold. The default value is 0.01 (1%). Query performance is a combined measure of all analyzed queries.

    • max_combinations: The maximum number of data placement key combinations Advisor considers before making recommendations. The default is 10000. The supported range is 1 to 100000. Specifying fewer combinations generates recommendations more quickly but recommendations may not be optimal.

Running Auto Data Placement

Note

If you intend to run Advisor for both encoding and data placement recommendations, it is recommended that you run Auto Encoding first, apply the recommended encodings, rerun your queries, and then run Auto Data Placement. This sequence allows data placement performance benefits to be calculated with string column encodings in place, which provides for greater accuracy from Advisor internal models.

For Advisor to generate data placement recommendations:

  • Tables must be loaded in HeatWave.

  • There must be a query history with at least 5 queries. A query is counted if it includes a JOIN on tables loaded in the HeatWave cluster or GROUP BY keys. A query executed on a table that is no longer loaded or that was reloaded since the query was run is not counted.

    To view the query history, query the performance_schema.rpd_query_stats table. For example:

    mysql> SELECT query_id, LEFT(query_text,160) FROM performance_schema.rpd_query_stats;

For the most accurate data placement recommendations, run Advisor on one schema at a time. In the following example, Advisor is run on the tpch_1024 schema using the target_schema option. No other options are specified, which means that the default option settings are used.

CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("tpch_1024")));

Advisor output provides information about each stage of Advisor execution. The data placement suggestion output shows suggested data placement keys and the estimated performance benefit of applying the keys.

The script generation output provides a query for retrieving the generated DDL statements for implementing the suggested data placement keys. Data placement keys cannot be added to a table or modified without reloading the table. Therefore, Advisor generates DDL statements for unloading the table, adding the keys, and reloading the table.

mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("tpch_1024")));
+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.12                 |
|                               |
| Output Mode: normal           |
| Excluded Queries: 0           |
| Target Schemas: 1             |
|                               |
+-------------------------------+
6 rows in set (0.01 sec)

+---------------------------------------------------------+
| ANALYZING LOADED DATA                                   |
+---------------------------------------------------------+
| Total 8 tables loaded in HeatWave for 1 schemas         |
| Tables excluded by user: 0 (within target schemas)      |
|                                                         |
| SCHEMA                            TABLES        COLUMNS |
| NAME                              LOADED         LOADED |
| ------                            ------         ------ |
| `tpch_1024`                            8             61 |
|                                                         |
+---------------------------------------------------------+
8 rows in set (0.02 sec)

+----------------------------------------------------------------------+ 
| AUTO DATA PLACEMENT                                                  |
+----------------------------------------------------------------------+
| Auto Data Placement Configuration:                                   |                                                                                         |                                                                      |
|   Minimum benefit threshold: 1%                                      |                                                                                         |                                                                      |                                                                                         
| Producing Data Placement suggestions for current setup:              |                                                                                         |                                                                      |                                                                                         
|   Tables Loaded: 8                                                   |
|   Queries used: 189                                                  |
|     Total query execution time: 22.75 min                            |
|     Most recent query executed on: Tuesday 8th June 2021 16:29:02    |
|     Oldest query executed on: Tuesday 8th June 2021 16:05:43         |
|   HeatWave cluster size: 5 nodes                                     |
|                                                                      |
| All possible Data Placement combinations based on query history: 120 |
| Explored Data Placement combinations after pruning: 90               |
|                                                                      |
+----------------------------------------------------------------------+
16 rows in set (12.38 sec)

+---------------------------------------------------------------------------------------+
| DATA PLACEMENT SUGGESTIONS                                                            |
+---------------------------------------------------------------------------------------+
| Total Data Placement suggestions produced for 2 tables                                |
|                                                                                       |
| TABLE                                         DATA PLACEMENT           DATA PLACEMENT |
| NAME                                             CURRENT KEY            SUGGESTED KEY |
| ------                                        --------------           -------------- |
| `tpch_1024`.`LINEITEM`              L_ORDERKEY, L_LINENUMBER               L_ORDERKEY |
| `tpch_1024`.`SUPPLIER`                             S_SUPPKEY              S_NATIONKEY |
|                                                                                       |
| Expected benefit after applying Data Placement suggestions                            |
|   Runtime saving: 6.17 min                                                            |
|   Performance benefit: 27%                                                            |
|                                                                                       |
+---------------------------------------------------------------------------------------+
12 rows in set (16.42 sec)

+-------------------------------------------------------------------------------------------+
| SCRIPT GENERATION                                                                         |
+-------------------------------------------------------------------------------------------+
| Script generated for applying suggestions for 2 loaded tables                             |
|                                                                                           |
| Applying changes will take approximately 1.18 h                                           |
|                                                                                           |
| Retrieve script containing 12 generated DDL commands using the query below:               |
|   SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql"|
|   ORDER BY id;                                                                            |
|                                                                                           |
| Caution: Executing the generated script will alter the column comment and secondary engine|
| flags in the schema                                                                       |
|                                                                                           |
+-------------------------------------------------------------------------------------------+
9 rows in set (16.43 sec)


SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql" 
 ORDER BY id;
+-------------------------------------------------------------------------------------------+
| SQL Script                                                                                |
+-------------------------------------------------------------------------------------------+
| SET SESSION innodb_parallel_read_threads = 48;                                            |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_UNLOAD;                                      |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_ENGINE=NULL;                                 |
| ALTER TABLE `tpch_1024`.`LINEITEM` MODIFY `L_ORDERKEY` bigint NOT NULL COMMENT            |
|  'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';                                                     |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_ENGINE=RAPID;                                |
| ALTER TABLE `tpch_1024`.`LINEITEM` SECONDARY_LOAD;                                        |
| SET SESSION innodb_parallel_read_threads = 48;                                            |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_UNLOAD;                                      |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_ENGINE=NULL;                                 |
| ALTER TABLE `tpch_1024`.`SUPPLIER` MODIFY `S_NATIONKEY` int NOT NULL COMMENT              |
|  'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';                                                     |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_ENGINE=RAPID;                                |
| ALTER TABLE `tpch_1024`.`SUPPLIER` SECONDARY_LOAD;                                        |
+-------------------------------------------------------------------------------------------+
12 rows in set (0.00 sec)

Usage Notes:

  • If a table already has data placement keys or columns are customized prior to running Advisor, Advisor may generate DDL statements for removing previously defined data placement keys.

  • Advisor provides recommendations only if data placement keys are estimated to improve query performance. If not, an information message is returned and no recommendations are provided.

  • Advisor provides data placement key recommendations based on approximate models. Recommendations are therefore not guaranteed to improve query performance.