InnoDB
supports secondary indexes on virtual
generated columns. Other index types are not supported. A
secondary index defined on a virtual column is sometimes
referred to as a “virtual index”.
A secondary index may be created on one or more virtual columns
or on a combination of virtual columns and regular columns or
stored generated columns. Secondary indexes that include virtual
columns may be defined as UNIQUE
.
When a secondary index is created on a virtual generated column, generated column values are materialized in the records of the index. If the index is a covering index (one that includes all the columns retrieved by a query), generated column values are retrieved from materialized values in the index structure instead of computed “on the fly”.
There are additional write costs to consider when using a
secondary index on a virtual column due to computation performed
when materializing virtual column values in secondary index
records during INSERT
and
UPDATE
operations. Even with
additional write costs, secondary indexes on virtual columns may
be preferable to generated stored columns,
which are materialized in the clustered index, resulting in
larger tables that require more disk space and memory. If a
secondary index is not defined on a virtual column, there are
additional costs for reads, as virtual column values must be
computed each time the column's row is examined.
Values of an indexed virtual column are MVCC-logged to avoid
unnecessary recomputation of generated column values during
rollback or during a purge operation. The data length of logged
values is limited by the index key limit of 767 bytes for
COMPACT
and REDUNDANT
row
formats, and 3072 bytes for DYNAMIC
and
COMPRESSED
row formats.
Adding or dropping a secondary index on a virtual column is an in-place operation.
As noted elsewhere, JSON
columns cannot be indexed directly. To create an index that
references such a column indirectly, you can define a
generated column that extracts the information that should be
indexed, then create an index on the generated column, as
shown in this example:
mysql> CREATE TABLE jemp (
-> c JSON,
-> g INT GENERATED ALWAYS AS (c->"$.id"),
-> INDEX i (g)
-> );
Query OK, 0 rows affected (0.28 sec)
mysql> INSERT INTO jemp (c) VALUES
> ('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
> ('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');
Query OK, 4 rows affected (0.04 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c->>"$.name" AS name
> FROM jemp WHERE g > 2;
+--------+
| name |
+--------+
| Barney |
| Betty |
+--------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT c->>"$.name" AS name
> FROM jemp WHERE g > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: jemp
partitions: NULL
type: range
possible_keys: i
key: i
key_len: 5
ref: NULL
rows: 2
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select json_unquote(json_extract(`test`.`jemp`.`c`,'$.name'))
AS `name` from `test`.`jemp` where (`test`.`jemp`.`g` > 2)
1 row in set (0.00 sec)
(We have wrapped the output from the last statement in this example to fit the viewing area.)
When you use EXPLAIN
on a
SELECT
or other SQL statement
containing one or more expressions that use the
->
or ->>
operator, these expressions are translated into their
equivalents using JSON_EXTRACT()
and (if
needed) JSON_UNQUOTE()
instead, as shown
here in the output from SHOW
WARNINGS
immediately following this
EXPLAIN
statement:
mysql> EXPLAIN SELECT c->>"$.name"
> FROM jemp WHERE g > 2 ORDER BY c->"$.name"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: jemp
partitions: NULL
type: range
possible_keys: i
key: i
key_len: 5
ref: NULL
rows: 2
filtered: 100.00
Extra: Using where; Using filesort
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select json_unquote(json_extract(`test`.`jemp`.`c`,'$.name')) AS
`c->>"$.name"` from `test`.`jemp` where (`test`.`jemp`.`g` > 2) order by
json_extract(`test`.`jemp`.`c`,'$.name')
1 row in set (0.00 sec)
See the descriptions of the
->
and
->>
operators, as well as those of the
JSON_EXTRACT()
and
JSON_UNQUOTE()
functions, for
additional information and examples.
This technique also can be used to provide indexes that
indirectly reference columns of other types that cannot be
indexed directly, such as GEOMETRY
columns.
It is also possible to create an index on a
JSON
column using the
JSON_VALUE()
function with an
expression that can be used to optimize queries employing the
expression. See the description of that function for more
information and examples.
JSON columns and indirect indexing in NDB Cluster
It is also possible to use indirect indexing of JSON columns in MySQL NDB Cluster, subject to the following conditions:
NDB
handles aJSON
column value internally as aBLOB
. This means that anyNDB
table having one or more JSON columns must have a primary key, else it cannot be recorded in the binary log.The
NDB
storage engine does not support indexing of virtual columns. Since the default for generated columns isVIRTUAL
, you must specify explicitly the generated column to which to apply the indirect index asSTORED
.
The CREATE TABLE
statement
used to create the table jempn
shown here
is a version of the jemp
table shown
previously, with modifications making it compatible with
NDB
:
CREATE TABLE jempn (
a BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c JSON DEFAULT NULL,
g INT GENERATED ALWAYS AS (c->"$.id") STORED,
INDEX i (g)
) ENGINE=NDB;
We can populate this table using the following
INSERT
statement:
INSERT INTO jempn (c) VALUES
('{"id": "1", "name": "Fred"}'),
('{"id": "2", "name": "Wilma"}'),
('{"id": "3", "name": "Barney"}'),
('{"id": "4", "name": "Betty"}');
Now NDB
can use index i
,
as shown here:
mysql> EXPLAIN SELECT c->>"$.name" AS name
-> FROM jempn WHERE g > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: jempn
partitions: p0,p1,p2,p3
type: range
possible_keys: i
key: i
key_len: 5
ref: NULL
rows: 3
filtered: 100.00
Extra: Using pushed condition (`test`.`jempn`.`g` > 2)
1 row in set, 1 warning (0.01 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select
json_unquote(json_extract(`test`.`jempn`.`c`,'$.name')) AS `name` from
`test`.`jempn` where (`test`.`jempn`.`g` > 2)
1 row in set (0.00 sec)
You should keep in mind that a stored generated column, as
well as any index on such a column, uses
DataMemory
.