Auto Encoding recommends string column encodings. Using the right string column encodings can reduce the amount of memory required on HeatWave nodes and improve query performance. HeatWave supports two string column encoding types: variable-length and dictionary. HeatWave applies variable-length encoding to string columns by default when data is loaded. Auto Encoding generates string column encoding recommendations by analyzing column data, HeatWave query history, query performance data, and available memory on the MySQL node. For more information about string column encoding, see Section 2.7.1, “Encoding String Columns”.
CALL sys.heatwave_advisor ([options]);
options: {
JSON_OBJECT("key","value"[,"key","value"] ...)
"key","value":
["output",{"normal"|"silent"|"help"}]
[target_schema,JSON_ARRAY({"schema_name"[,"schema_name"]}]
["exclude_query",JSON_ARRAY("query_id"[,"query_id"] ...)]
["query_session_id",JSON_ARRAY("query_session_id"[,"query_session_id"] ...)]
["auto_enc",JSON_OBJECT(auto_enc_option)]
}
auto_enc_option: {
["mode",{"off"|"recommend"}]
["fixed_enc",JSON_OBJECT("schema.tbl.col",{"varlen"|"dictionary"}
[,"schema.tbl.col",{"varlen"|"dictionary"}] ...]
}
For syntax examples, see Section 2.7.3.4, “Advisor Examples”.
Advisor options are specified as key-value pairs in
JSON
-object format. Options include:
-
output
: Defines how Advisor produces output. Permitted values are:normal
: The default. Produces summarized output that is sent tostdout
and to theheatwave_advisor_report
table. (See Section 2.7.3.5, “Advisor Report Table”.)silent
: Sends output to theheatwave_advisor_report
table only. (See Section 2.7.3.5, “Advisor Report Table”.) The"silent"
output type is useful if human-readable output is not required; when the output is consumed by a script, for example.help
: Displays Advisor command-line help. See Section 2.7.3.4, “Advisor Examples”.
target_schema
: Defines one or more schemas for Advisor to analyze. The list is specified inJSON
-array format. If a target schema is not specified, all schemas in the HeatWave Cluster are analyzed. When a target schema is specified, Advisor generates recommendations for tables belonging to the target schema. For the most accurate recommendations, specify one schema at a time. Only run Advisor on multiple schemas if your queries access tables in multiple schemas.exclude_query
: Defines the IDs of queries to exclude when Advisor analyzes query statistics. To identify query IDs, query theperformance_schema.rpd_query_stats
table. For a query example, see Section 2.7.3.4, “Advisor Examples”.query_session_id
: Defines session IDs for filtering queries by session ID. To identify session IDs, query theperformance_schema.rpd_query_stats
table. For a query example, see Section 2.7.3.4, “Advisor Examples”.-
auto_enc
: Defines settings for Auto Encoding, which provides string column encoding recommendations. Settings include:-
mode
: Defines the operational mode. Permitted values are:off
: The default. Disables the Auto Encoding feature.recommend
: The Auto Encoding feature recommends string column encodings.
fixed_enc
: Defines an encoding type for specified columns. Use this option if you know the encoding you want for a specific column and you are not interested in an encoding recommendation for that column. Only applicable inrecommend
mode. Columns with a fixed encoding type are excluded from encoding recommendations. Thefixed_enc
key is a fully qualified column name without backticks in the following format:schema_name.tbl_name.col_name
. The value is the encoding type; eithervarlen
ordictionary
. Multiple key-value pairs can be specified in a comma-separated list.
-
Auto Encoding is enabled by specifying the
auto_enc
option in
recommend
mode. See
Auto Encoding Syntax.
If you intend to run Advisor for both encoding and data placement recommendations, it is recommended that you run Auto Encoding first, apply the recommended encodings, rerun your queries, and then run Auto Data Placement. This sequence allows data placement performance benefits to be calculated with string column encodings in place, which provides for greater accuracy from Advisor internal models.
For Advisor to provide string column encoding recommendations, tables must be loaded in HeatWave and a query history must be available. Run the queries that you intend to use or run a representative set of queries. Failing to do so can affect query offload after Auto Encoding recommendations are implemented due to query constraints associated with dictionary encoding. For dictionary encoding limitations, see Section 2.12.2, “Dictionary Encoding”.
In the following example, Auto Encoding is run in
recommend
mode, which analyzes column
data, checks the amount of memory on the MySQL node, and
provides encoding recommendations intended to reduce the
amount of space required on HeatWave nodes and optimize
query performance. There is no target schema specified, so
Auto Encoding runs on all schemas loaded in HeatWave
CALL sys.heatwave_advisor(JSON_OBJECT("auto_enc",JSON_OBJECT("mode","recommend")));
The fixed_enc
option can be used in
recommend
mode to specify an encoding
for specific columns. These columns are excluded from
consideration when Auto Encoding generates
recommendations. Manually encoded columns are also
excluded from consideration. (For manual encoding
instructions, see
Section 2.7.1, “Encoding String Columns”.)
CALL sys.heatwave_advisor(JSON_OBJECT("auto_enc",JSON_OBJECT("mode","recommend","fixed_enc",
JSON_OBJECT("tpch.CUSTOMER.C_ADDRESS","varlen"))));
Advisor output provides information about each stage of Advisor execution, including recommended column encodings and estimated HeatWave Cluster memory savings.
mysql> CALL sys.heatwave_advisor(JSON_OBJECT("target_schema",JSON_ARRAY("tpch_1024"),
"auto_enc",JSON_OBJECT("mode","recommend")));
+-------------------------------+
| INITIALIZING HEATWAVE ADVISOR |
+-------------------------------+
| Version: 1.12 |
| |
| Output Mode: normal |
| Excluded Queries: 0 |
| Target Schemas: 1 |
| |
+-------------------------------+
6 rows in set (0.01 sec)
+---------------------------------------------------------+
| ANALYZING LOADED DATA |
+---------------------------------------------------------+
| Total 8 tables loaded in HeatWave for 1 schemas |
| Tables excluded by user: 0 (within target schemas) |
| |
| SCHEMA TABLES COLUMNS |
| NAME LOADED LOADED |
| ------ ------ ------ |
| `tpch_1024` 8 61 |
| |
+---------------------------------------------------------+
8 rows in set (0.15 sec)
+-------------------------------------------------------------------------------------------+
| ENCODING SUGGESTIONS |
+-------------------------------------------------------------------------------------------+
| Total Auto Encoding suggestions produced for 21 columns |
| Queries executed: 200 |
| Total query execution time: 28.82 min |
| Most recent query executed on: Tuesday 8th June 2021 14:42:13 |
| Oldest query executed on: Tuesday 8th June 2021 14:11:45 |
| |
| CURRENT SUGGESTED |
| COLUMN COLUMN COLUMN |
| NAME ENCODING ENCODING |
| ------ -------- --------- |
| `tpch_1024`.`CUSTOMER`.`C_ADDRESS` VARLEN DICTIONARY |
| `tpch_1024`.`CUSTOMER`.`C_COMMENT` VARLEN DICTIONARY |
| `tpch_1024`.`CUSTOMER`.`C_MKTSEGMENT` VARLEN DICTIONARY |
| `tpch_1024`.`CUSTOMER`.`C_NAME` VARLEN DICTIONARY |
| `tpch_1024`.`LINEITEM`.`L_COMMENT` VARLEN DICTIONARY |
| `tpch_1024`.`LINEITEM`.`L_SHIPINSTRUCT` VARLEN DICTIONARY |
| `tpch_1024`.`LINEITEM`.`L_SHIPMODE` VARLEN DICTIONARY |
| `tpch_1024`.`NATION`.`N_COMMENT` VARLEN DICTIONARY |
| `tpch_1024`.`NATION`.`N_NAME` VARLEN DICTIONARY |
| `tpch_1024`.`ORDERS`.`O_CLERK` VARLEN DICTIONARY |
| `tpch_1024`.`ORDERS`.`O_ORDERPRIORITY` VARLEN DICTIONARY |
| `tpch_1024`.`PART`.`P_BRAND` VARLEN DICTIONARY |
| `tpch_1024`.`PART`.`P_COMMENT` VARLEN DICTIONARY |
| `tpch_1024`.`PART`.`P_CONTAINER` VARLEN DICTIONARY |
| `tpch_1024`.`PART`.`P_MFGR` VARLEN DICTIONARY |
| `tpch_1024`.`PARTSUPP`.`PS_COMMENT` VARLEN DICTIONARY |
| `tpch_1024`.`REGION`.`R_COMMENT` VARLEN DICTIONARY |
| `tpch_1024`.`REGION`.`R_NAME` VARLEN DICTIONARY |
| `tpch_1024`.`SUPPLIER`.`S_ADDRESS` VARLEN DICTIONARY |
| `tpch_1024`.`SUPPLIER`.`S_NAME` VARLEN DICTIONARY |
| `tpch_1024`.`SUPPLIER`.`S_PHONE` VARLEN DICTIONARY |
| |
| Applying the suggested encodings might improve query performance and cluster memory usage.|
| Estimated HeatWave Cluster memory savings: 355.60 GiB |
| |
+-------------------------------------------------------------------------------------------+
35 rows in set (18.18 sec)
+-------------------------------------------------------------------------------------------+
| SCRIPT GENERATION |
+-------------------------------------------------------------------------------------------+
| Script generated for applying suggestions for 8 loaded tables |
| |
| Applying changes will take approximately 1.64 h |
| |
| Retrieve script containing 61 generated DDL commands using the query below: |
| SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report WHERE type = "sql"|
| ORDER BY id; |
| |
| Caution: Executing the generated script will alter the column comment and secondary engine|
| flags in the schema |
| |
+-------------------------------------------------------------------------------------------+
9 rows in set (18.20 sec)
To inspect the load script, which includes the DDL
statements required to implement the recommended
encodings, query the
heatwave_advisor_report
table:
SELECT log->>"$.sql" AS "SQL Script" FROM sys.heatwave_advisor_report
WHERE type = "sql" ORDER BY id;
To concatenate generated DDL statements into a single
string that can be copied and pasted for execution, issue
the statements that follow. The
group_concat_max_len
variable sets the result length in bytes for the
GROUP_CONCAT()
function to
accommodate a potentially long string. (The default
group_concat_max_len
setting is 1024 bytes.)
SET SESSION group_concat_max_len = 1000000;
SELECT GROUP_CONCAT(log->>"$.sql" SEPARATOR ' ') FROM sys.heatwave_advisor_report
WHERE type = "sql" ORDER BY id;
Usage Notes:
Auto Encoding analyzes string columns (
CHAR
,VARCHAR
, andTEXT
-type columns) of tables that are loaded in HeatWave. Automatically or manually excluded columns, columns greater than 65532 bytes, and columns with manually defined encodings are excluded from consideration. Auto Encoding also analyzes HeatWave query history to identify query constraints that preclude the use of dictionary encoding. Dictionary-encoded columns are not supported inJOIN
operations, with string functions and operators, or inLIKE
predicates. For dictionary encoding limitations, see Section 2.12.2, “Dictionary Encoding”.The time required to generate encoding recommendations depends on the number of queries to be analyzed, the number of operators, and the complexity of each query.
Encoding recommendations for the same table may differ after changes to data or data statistics. For example, changes to table cardinality or the number of distinct values in a column can affect recommendations.
Auto Encoding does not generate recommendations for a given table if existing encodings do not require modification.
Auto Encoding only recommends dictionary encoding if it is expected to reduce the amount of memory required on HeatWave nodes.
If there is not enough MySQL node memory for the dictionaries of all columns that would benefit from dictionary encoding, the columns estimated to save the most memory are recommended for dictionary encoding.
Auto Encoding uses the current state of tables loaded in HeatWave when generating recommendations. Concurrent change propagation activity is not considered.
Encoding recommendations are based on estimates and are therefore not guaranteed to reduce the memory required on HeatWave nodes or improve query performance.