Generated Columns is a new feature available in the latest lab release. This work is based on a contribution by Andrey Zhakov. Thanks, Andrey! The Optimizer team modified it to follow the current MySQL design, and to lift a number of limitations.
The syntax is:
<type> [ GENERATED ALWAYS ] AS ( <expression> ) [ VIRTUAL|STORED ]
[ UNIQUE [KEY] ] [ [PRIMARY] KEY ] [ NOT NULL ] [ COMMENT <text> ]
There are two kinds of Generated Columns: virtual (default) and stored. Virtual means that the column will be calculated on the fly when a record is read from a table. Stored means that the column will be calculated when a new record is written in the table, and after that it will be treated as a regular field. Both types can have NOT NULL restrictions, but only a stored Generated Column can be be a part of an index.
There are still a few limitations on Generated Columns:
- The generation expression used can only call native deterministic functions; stored routines and UDFs are not yet supported.
- The length of the generation expressions for a table are subject to the limitations of the .frm file, and thus they can’t be longer than 64K in total. So you can have one field with a generation expression length of 64K, or you can have 30 fields with an average expression length of 2K each.
- A Generated Column can’t refer to itself or to other Generated Columns that are later defined, but it can refer to any previously defined Generated Column. This limitation does not apply to regular columns. A Generated Column can refer to any regular one, no matter where it’s defined.
12345678mysql> CREATE TABLE sales (-> name VARCHAR(20),-> price_eur DOUBLE,-> amount INT,-> total_eur DOUBLE AS (price_eur * amount),-> total_usd DOUBLE AS (total_eur * xrate),-> xrate DOUBLE);Query OK, 0 rows affected (0,16 sec)
What can Generated Columns can be used for? Many things. To name few:
- As a materialized cache for often used expressions:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748CREATE TABLE xmltable (username VARCHAR (20) AS (ExtractValue(doc,'/user/username')),doc TEXT);INSERT INTO xmltable (doc) VALUES('<user><id>1</id><username>tony</username><name>Tony Stark</name><info>A cool one</info></user>'),('<user><id>2</id><username>ned</username><name>Eddard Stark</name><info>A cold one</info></user>'),('<user><id>3</id><username>berty</username><name>Albert Stark</name><info>A farmer</info></user>');SELECT ExtractValue(doc,'/user/id')FROM xmltableWHEREExtractValue(doc,'/user/username') = 'tony' ORExtractValue(doc,'/user/username') = 'ned' ORExtractValue(doc,'/user/username') = 'berty';+------------------------------+| ExtractValue(doc,'/user/id') |+------------------------------+| 1 || 2 || 3 |+------------------------------+SELECT ExtractValue(doc,'/user/id')FROM xmltableWHERE username = 'tony' OR username = 'ned' OR username = 'berty';+------------------------------+| ExtractValue(doc,'/user/id') |+------------------------------+| 1 || 2 || 3 |+------------------------------+
The result of those two queries is exactly the same, but in the first one ExtractValue(doc,’/user/username’) will be evaluated 3 times per record read, while in the second only once per record read. If the “username” columns would be defined as STORED then the generation expression will be evaluated only when a record is inserted or updated.Another similar case is that Generated Columns (GC) could be used to add flexibility by replacing often used expressions with a GC. For example, if you have a bunch of applications that work on the same database then it might be practical to have a unified way to access data without need to keep all apps in sync.
- Providing indexes for joins with non-relational data:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869CREATE TABLE users (userid INT, username VARCHAR(16));CREATE TABLE comments(userid int AS (ExtractValue(comment,'/comment/userid')) STORED,comment TEXT,KEY(userid));INSERT INTO users VALUES (1, 'tony'),(2, 'ned'), (3, 'berty');INSERT INTO comments (comment) VALUES("<comment><id>1</id><userid>1</userid><text>I definitely need a new suit</text></comment>"),("<comment><id>2</id><userid>2</userid><text>No sight of global warming yet</text></comment>"),("<comment><id>3</id><userid>2</userid><text>Traffic jams in King's Landing is truly horrible</text></comment>");SELECTextractvalue(comment,'/comment/id') as id,extractvalue(comment,'/comment/text') as textFROM users JOINcomments ON users.userid=comments.useridWHERE username = 'ned';+------+--------------------------------------------------+| id | text |+------+--------------------------------------------------+| 2 | No sight of global warming yet || 3 | Traffic jams in King's Landing is truly horrible |+------+--------------------------------------------------+EXPLAIN SELECTextractvalue(comment,'/comment/id') AS id,extractvalue(comment,'/comment/text') AS textFROM users JOINcomments ON users.userid=comments.useridWHERE username = 'ned'\G*************************** 1. row ***************************id: 1select_type: SIMPLEtable: userspartitions: NULLtype: ALLpossible_keys: PRIMARYkey: NULLkey_len: NULLref: NULLrows: 3filtered: 33.33Extra: Using where*************************** 2. row ***************************id: 1select_type: SIMPLEtable: commentspartitions: NULLtype: refpossible_keys: useridkey: useridkey_len: 5ref: test.users.useridrows: 2filtered: 100.00Extra: NULL
Here ref access over an index is used to access a table with XML data.
- Working around a limited set of partitioning functions:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152CREATE TABLE characters (race VARCHAR(16) AS (ExtractValue(chars,'/character/race')) STORED,chars TEXT)PARTITION BY LIST COLUMNS (race) (PARTITION hum VALUES IN ("human"),PARTITION hlf VALUES IN ("halfling"),PARTITION elf VALUES IN ("elf"),PARTITION orc VALUES IN ("orc"));INSERT INTO characters(chars) VALUES('<character><name>Elrond</name><race>elf</race></character>'),('<character><name>Bilbo</name><race>halfling</race></character>'),('<character><name>Frodo</name><race>halfling</race></character>');SELECTExtractValue(chars,'/character/name')FROM charactersWHERE race='halfling';+---------------------------------------+| ExtractValue(chars,'/character/name') |+---------------------------------------+| Bilbo || Frodo |+---------------------------------------+EXPLAIN SELECTExtractValue(chars,'/character/name')FROM charactersWHERE race='halfling'\G*************************** 1. row ***************************id: 1select_type: SIMPLEtable: characterspartitions: hlftype: ALLpossible_keys:NULL key: NULLkey_len: NULLref: NULLrows: 2filtered: 50.00Extra: Using where
Note that only one partition is going to be scanned due to partition pruning.
As always, there are more than one way to do things. Generated Columns adds yet another means of solving a variety of interesting challenges. It now also becomes more convenient to deal with derivatives of relational and non-relational data. We look forward to seeing all of the interesting ways that you apply the feature!