Related Documentation Download this Manual
PDF (US Ltr) - 0.6Mb
PDF (A4) - 0.6Mb


HeatWave User Guide  /  Preparing Data  /  Encoding String Columns

3.3 Encoding String Columns

Encoding string columns helps accelerate the processing of queries that access those columns. HeatWave supports two string column encoding types:

  • Variable-length encoding (VARLEN)

  • Dictionary encoding (SORTED)

When tables are loaded into HeatWave, variable-length encoding is applied to CHAR, VARCHAR, and TEXT-type columns by default. To use dictionary encoding, you must define the RAPID_COLUMN=ENCODING=SORTED keyword string in a column comment before loading the table. The keyword string must be uppercase; otherwise, it is ignored.

You can define the keyword string in a CREATE TABLE or ALTER TABLE statement, as shown:

CREATE TABLE orders (name VARCHAR(100) COMMENT 'RAPID_COLUMN=ENCODING=SORTED');
ALTER TABLE orders MODIFY name VARCHAR(100) COMMENT 'RAPID_COLUMN=ENCODING=SORTED';

If necessary, you can specify variable-length encoding explicitly using the RAPID_COLUMN=ENCODING=VARLEN keyword string.

Note

Other information is permitted in column comments. For example, it is permitted for a column description to be specified alongside a column encoding keyword string:

COMMENT 'column_description RAPID_COLUMN=ENCODING=SORTED'

Tip

For string column encoding recommendations, use the Advisor utility after loading tables into HeatWave and running queries. For more information, see Chapter 8, Workload Optimization using Advisor.

Encoding Type Selection

If you intend to run JOIN operations involving string columns or use string functions and operators, variable-length encoding is recommended. Variable-length encoding provides more expression, filter, function, and operator support than dictionary encoding. Otherwise, select the encoding type based on the number of distinct values in the string column relative to the cardinality of the table.

  • Variable-length encoding (VARLEN) is best suited to columns with a high number of distinct values, such as comment columns.

  • Dictionary encoding (SORTED) is best suited to columns with a low number of distinct values, such as country columns.

Variable-length encoding requires space for column values on the HeatWave nodes. Dictionary encoding requires space on the MySQL DB System node for dictionaries.

The following table provides an overview of encoding type characteristics:

Table 3.1 Column Encoding Type Characteristics

Encoding Type Expression, Filter, Function, and Operator Support Best Suited To Space Required On
Variable-length (VARLEN) Supports JOIN operations, string functions and operators, and LIKE predicates. See Section 11.4.1, “Variable-length Encoding”. Columns with a high number of distinct values HeatWave nodes
Dictionary (SORTED) Does not support JOIN operations, string functions and operators, or LIKE predicates. Columns with a low number of distinct values MySQL DB System node

For additional information about string column encoding, see Section 11.4, “String Column Encoding Reference”.