Encoding string columns helps accelerate the processing of queries that access those columns. HeatWave supports two string column encoding types:
Variable-length encoding (
VARLEN
)Dictionary encoding (
SORTED
)
When tables are loaded into HeatWave, variable-length encoding
is applied to CHAR
,
VARCHAR
, and
TEXT
type columns by default.
To use dictionary encoding, you must define the
RAPID_COLUMN=ENCODING=SORTED
keyword string
in a column comment before loading the table. The keyword
string must be uppercase; otherwise, it is ignored.
You can define the keyword string in a
CREATE TABLE
or
ALTER TABLE
statement, as
shown:
mysql> CREATE TABLE orders (name VARCHAR(100)
COMMENT 'RAPID_COLUMN=ENCODING=SORTED');
mysql> ALTER TABLE orders MODIFY name VARCHAR(100)
COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
If necessary, you can specify variable-length encoding
explicitly using the
RAPID_COLUMN=ENCODING=VARLEN
keyword
string.
Other information is permitted in column comments. For example, it is permitted for a column description to be specified alongside a column encoding keyword string:
COMMENT
'
column_description
RAPID_COLUMN=ENCODING=SORTED'
For string column encoding recommendations, use HeatWave Autopilot Advisor after loading tables into HeatWave and running queries. For more information, see Section 2.7, “Workload Optimization for OLAP”.
To modify or remove a string column encoding, refer to the procedure described in Section 2.4, “Modifying Tables”.
If you intend to run JOIN
operations
involving string columns or use string functions and
operators, variable-length encoding is recommended.
Variable-length encoding provides more expression, filter,
function, and operator support than dictionary encoding.
Otherwise, select the encoding type based on the number of
distinct values in the string column relative to the
cardinality of the table.
Variable-length encoding (
VARLEN
) is best suited to columns with a high number of distinct values, such as “comment” columns.Dictionary encoding (
SORTED
) is best suited to columns with a low number of distinct values, such as “country” columns.
Variable-length encoding requires space for column values on the HeatWave nodes. Dictionary encoding requires space on the MySQL DB System node for dictionaries.
The following table provides an overview of encoding type characteristics:
Table 2.1 Column Encoding Type Characteristics
Encoding Type | Expression, Filter, Function, and Operator Support | Best Suited To | Space Required On |
---|---|---|---|
Variable-length (VARLEN) |
Supports JOIN operations, string functions and
operators, and LIKE predicates. See
Section 2.14.1, “Variable-length Encoding”. |
Columns with a high number of distinct values | HeatWave nodes |
Dictionary (SORTED) |
Does not support JOIN operations, string functions
and operators, or LIKE predicates. |
Columns with a low number of distinct values | MySQL DB System node |
For additional information about string column encoding, see Section 2.14, “String Column Encoding Reference”.