Related Documentation Download this Manual
PDF (US Ltr) - 0.6Mb
PDF (A4) - 0.6Mb


8.1 Auto Encoding

Auto Encoding provides string column encoding recommendations. Choosing the right string column encodings can improve the performance of queries accessing those columns. The type of encoding applied to string columns also affects the amount of memory required on HeatWave nodes. HeatWave supports two string column encoding types: variable-length and dictionary. HeatWave applies variable-length encoding to string columns by default when data is loaded, which may not be the optimal encoding choice in all cases. Auto Encoding generates string column encoding recommendations by analyzing column data, HeatWave query history, and available MySQL node memory. For more information about string column encoding, see Section 3.3, “Encoding String Columns”.

Auto Encoding Syntax

CALL sys.heatwave_advisor ([options]);
  
 options: {
     JSON_OBJECT("key","value"[,"key","value"] ...)
        "key","value": 
        ["output",{"normal"|"silent"|"help"}]
        [target_schema,JSON_ARRAY({"schema_name"[,"schema_name"]}]
        ["exclude_query",JSON_ARRAY("query_id"[,"query_id"] ...)]
        ["query_session_id",JSON_ARRAY("query_session_id"[,"query_session_id"] ...)]
        ["auto_enc",JSON_OBJECT(auto_enc_option)]
}

 auto_enc_option: {
                 ["mode",{"off"|"recommend"}]
                 ["fixed_enc",JSON_OBJECT("schema.tbl.col",{"varlen"|"dictionary"}
                    [,"schema.tbl.col",{"varlen"|"dictionary"}] ...]
  }

For syntax examples, see Section 8.4, “Advisor Examples”.

Advisor options are specified as key-value pairs in JSON-object format. Options include:

  • output: Defines how Advisor produces output. Permitted values are:

  • target_schema: Defines one or more schemas for Advisor to analyze. The list is specified in JSON-array format. If a target schema is not specified, all schemas in the HeatWave cluster are analyzed. When a target schema is specified, Advisor generates recommendations for tables belonging to the target schema. For the most accurate recommendations, specify one schema at a time. Only run Advisor on multiple schemas if your queries access tables in multiple schemas.

  • exclude_query: Defines the IDs of queries to exclude when Advisor analyzes query statistics. To identify query IDs, query the performance_schema.rpd_query_stats table. For a query example, see Section 8.4, “Advisor Examples”.

  • query_session_id: Defines session IDs for filtering queries by session ID. To identify session IDs, query the performance_schema.rpd_query_stats table. For a query example, see Section 8.4, “Advisor Examples”.

  • auto_enc: Defines settings for Auto Encoding, which provides string column encoding recommendations. Settings include:

    • mode: Defines the operational mode. Permitted values are:

      • off: The default. Disables the Auto Encoding feature.

      • recommend: The Auto Encoding feature recommends string column encodings.

    • fixed_enc: Defines an encoding type for specified columns. Use this option if you know the encoding you want for a specific column and you are not interested in an encoding recommendation for that column. Only applicable in recommend mode. Columns with a fixed encoding type are excluded from encoding recommendations. The fixed_enc key is a fully qualified column name without backticks in the following format: schema_name.tbl_name.col_name. The value is the encoding type; either varlen or dictionary. Multiple key-value pairs can be specified in a comma-separated list.

Running Auto Encoding

Auto Encoding is enabled by specifying the auto_enc option in recommend mode. See Auto Encoding Syntax.

Note

If you intend to run Advisor for both encoding and data placement recommendations, it is recommended that you run Auto Encoding first, apply the recommended encodings, rerun your queries, and then run Auto Data Placement. This sequence allows data placement performance benefits to be calculated with string column encodings in place, which provides for greater accuracy from Advisor internal models.

For Advisor to provide string column encoding recommendations, tables must be loaded in HeatWave and a query history must be available. Run the queries that you intend to use or run a representative set of queries. Failing to do so can affect query offload after Auto Encoding recommendations are implemented due to query constraints associated with dictionary encoding. For dictionary encoding limitations, see Section 11.4.2, “Dictionary Encoding”.

In the following example, Auto Encoding is run in recommend mode, which analyzes column data, checks the amount of memory on the MySQL node, and provides encoding recommendations intended to reduce the amount of space required on HeatWave nodes and optimize query performance. There is no target schema specified, so Auto Encoding runs on all schemas loaded in HeatWave

CALL sys.heatwave_advisor(JSON_OBJECT("auto_enc",JSON_OBJECT("mode","recommend")));

The fixed_enc option can be used in recommend mode to specify an encoding for specific columns. These columns are excluded from consideration when Auto Encoding generates recommendations. Manually encoded columns are also excluded from consideration. (For manual encoding instructions, see Section 3.3, “Encoding String Columns”.)

CALL sys.heatwave_advisor(JSON_OBJECT("auto_enc",JSON_OBJECT("mode","recommend","fixed_enc", 
JSON_OBJECT("tpch.CUSTOMER.C_ADDRESS","varlen"))));

Advisor output provides information about each stage of Advisor execution, including recommended column encodings and estimated HeatWave cluster memory savings.

mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("tpch_1024"), 
       "auto_enc",JSON_OBJECT("mode","recommend")));
+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.12                 |
|                               |
| Output Mode: normal           |
| Excluded Queries: 0           |
| Target Schemas: 1             |
|                               |
+-------------------------------+
6 rows in set (0.01 sec)

+---------------------------------------------------------+
| ANALYZING LOADED DATA                                   |
+---------------------------------------------------------+
| Total 8 tables loaded in HeatWave for 1 schemas         |
| Tables excluded by user: 0 (within target schemas)      |
|                                                         |
| SCHEMA                            TABLES        COLUMNS |
| NAME                              LOADED         LOADED |
| ------                            ------         ------ |
| `tpch_1024`                            8             61 |
|                                                         |
+---------------------------------------------------------+
8 rows in set (0.15 sec)

+-------------------------------------------------------------------------------------------+
| ENCODING SUGGESTIONS                                                                      |
+-------------------------------------------------------------------------------------------+
| Total Auto Encoding suggestions produced for 21 columns                                   |
| Queries executed: 200                                                                     |
|   Total query execution time: 28.82 min                                                   |
|   Most recent query executed on: Tuesday 8th June 2021 14:42:13                           |
|   Oldest query executed on: Tuesday 8th June 2021 14:11:45                                |
|                                                                                           |
|                                                        CURRENT           SUGGESTED        |
| COLUMN                                                  COLUMN              COLUMN        |
| NAME                                                  ENCODING            ENCODING        |
| ------                                                --------           ---------        |
| `tpch_1024`.`CUSTOMER`.`C_ADDRESS`                      VARLEN          DICTIONARY        |
| `tpch_1024`.`CUSTOMER`.`C_COMMENT`                      VARLEN          DICTIONARY        |
| `tpch_1024`.`CUSTOMER`.`C_MKTSEGMENT`                   VARLEN          DICTIONARY        |
| `tpch_1024`.`CUSTOMER`.`C_NAME`                         VARLEN          DICTIONARY        |
| `tpch_1024`.`LINEITEM`.`L_COMMENT`                      VARLEN          DICTIONARY        |
| `tpch_1024`.`LINEITEM`.`L_SHIPINSTRUCT`                 VARLEN          DICTIONARY        |
| `tpch_1024`.`LINEITEM`.`L_SHIPMODE`                     VARLEN          DICTIONARY        |
| `tpch_1024`.`NATION`.`N_COMMENT`                        VARLEN          DICTIONARY        |
| `tpch_1024`.`NATION`.`N_NAME`                           VARLEN          DICTIONARY        |
| `tpch_1024`.`ORDERS`.`O_CLERK`                          VARLEN          DICTIONARY        |
| `tpch_1024`.`ORDERS`.`O_ORDERPRIORITY`                  VARLEN          DICTIONARY        |
| `tpch_1024`.`PART`.`P_BRAND`                            VARLEN          DICTIONARY        |
| `tpch_1024`.`PART`.`P_COMMENT`                          VARLEN          DICTIONARY        |
| `tpch_1024`.`PART`.`P_CONTAINER`                        VARLEN          DICTIONARY        |
| `tpch_1024`.`PART`.`P_MFGR`                             VARLEN          DICTIONARY        |
| `tpch_1024`.`PARTSUPP`.`PS_COMMENT`                     VARLEN          DICTIONARY        |
| `tpch_1024`.`REGION`.`R_COMMENT`                        VARLEN          DICTIONARY        |
| `tpch_1024`.`REGION`.`R_NAME`                           VARLEN          DICTIONARY        |
| `tpch_1024`.`SUPPLIER`.`S_ADDRESS`                      VARLEN          DICTIONARY        |
| `tpch_1024`.`SUPPLIER`.`S_NAME`                         VARLEN          DICTIONARY        |
| `tpch_1024`.`SUPPLIER`.`S_PHONE`                        VARLEN          DICTIONARY        |
|                                                                                           |
| Applying the suggested encodings might improve query performance and cluster memory usage.|
|   Estimated HeatWave cluster memory savings: 355.60 GiB                                   |
|                                                                                           |
+-------------------------------------------------------------------------------------------+
35 rows in set (18.18 sec)

+-------------------------------------------------------------------------------------------+
| SCRIPT GENERATION                                                                         |
+-------------------------------------------------------------------------------------------+
| Script generated for applying suggestions for 8 loaded tables                             |
|                                                                                           |
| Applying changes will take approximately 1.64 h                                           |
|                                                                                           |
| Retrieve script containing 61 generated DDL commands using the query below:               |
|   SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql"|
|   ORDER BY id;                                                                            |
|                                                                                           |
| Caution: Executing the generated script will alter the column comment and secondary engine|
| flags in the schema                                                                       |
|                                                                                           |
+-------------------------------------------------------------------------------------------+
9 rows in set (18.20 sec)

To inspect the load script, which includes the DDL statements required to implement the recommended encodings, query the heatwave_advisor_report table:

SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report 
WHERE type = "sql" ORDER BY id;

To concatenate generated DDL statements into a single string that can be copied and pasted for execution, issue the statements that follow. The group_concat_max_len variable sets the result length in bytes for the GROUP_CONCAT() function to accommodate a potentially long string. (The default group_concat_max_len setting is 1024 bytes.)

SET SESSION group_concat_max_len = 1000000;
SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ') FROM sys.heatwave_advisor_report 
  WHERE type = "sql" ORDER BY id;

Usage Notes:

  • Auto Encoding analyzes string columns (CHAR, VARCHAR, and TEXT-type columns) of tables that are loaded in HeatWave. Automatically or manually excluded columns, columns greater than 8000 bytes, and columns with manually defined encodings are excluded from consideration. Auto Encoding also analyzes HeatWave query history to identify query constraints that preclude the use of dictionary encoding. Dictionary-encoded columns are not supported in JOIN operations, with string functions and operators, or in LIKE predicates. For dictionary encoding limitations, see Section 11.4.2, “Dictionary Encoding”.

  • The time required to generate encoding recommendations depends on the number of queries to be analyzed, the number of operators, and the complexity of each query.

  • Encoding recommendations for the same table may differ after changes to data or data statistics. For example, changes to table cardinality or the number of distinct values in a column can affect recommendations.

  • Auto Encoding does not generate recommendations for a given table if existing encodings do not require modification.

  • Auto Encoding only recommends dictionary encoding if it is expected to reduce the amount of memory required on HeatWave nodes.

  • If there is not enough MySQL node memory for the dictionaries of all columns that would benefit from dictionary encoding, the columns estimated to save the most memory are recommended for dictionary encoding.

  • Auto Encoding uses the current state of tables loaded in HeatWave when generating recommendations. Concurrent change propagation activity is not considered.

  • Encoding recommendations are based on estimates and are therefore not guaranteed to reduce the memory required on HeatWave nodes or improve query performance.