WL#6312: Referencing new row in INSERT ... VALUES ... ON DUPLICATE KEY UPDATE

Affects: Server-8.0 — Status: Complete

Description
Requirements
High Level Architecture
Low Level Design

This WL extends the INSERT ... VALUES/SET ... ON DUPLICATE KEY UPDATE syntax to make it possible to declare an alias for the new row and columns in that row, and refer to those aliases in the UPDATE expression.

The intention with this new feature is to be able to replace VALUES(<expression>) clauses with row and column alias names. See also WL#13325.

INSERT ... SELECT ... ON DUPLICATE KEY UPDATE is not affected. Named references to the old and new row are already possible in these statements.

New syntax

INSERT ... VALUES

A new optional AS clause is allowed after the VALUES clause:

INSERT ... VALUES(...) opt_alias ON DUPLICATE KEY UPDATE ...

<opt_alias> ::=
   /* empty */
 | AS <new row alias>
 | AS <new row alias(<column alias list>)

<column alias list> ::=
   <column alias>, <column alias list>
 | <column alias>

Example:

INSERT INTO t1(col1, col2)
  VALUES (1, 2) AS new(col1, new2)
  ON DUPLICATE KEY UPDATE col1 = new.col1 + new2;

INSERT ... SET

The update list can be followed by an optional AS clause:

INSERT ... SET update_list opt_alias ON DUPLICATE KEY UPDATE ...

Examples:

INSERT INTO t1
  SET col1 = 1, col2 = 2
  ON DUPLICATE KEY UPDATE c = b;

INSERT INTO t1
  SET col1 = 1, col2 = 2 AS new(col1, new2)
  ON DUPLICATE KEY UPDATE col1 = new.col1 + new2;

F-1a: The INSERT ... VALUES ... ON DUPLICATE KEY syntax MUST allow an optional AS clause after the VALUES clause.
F-1b: The INSERT ... SET ... ON DUPLICATE KEY syntax MUST allow an optional AS clause after the update_list.
F-2: The row alias name and column alias names given in the AS clause MUST be valid in the entire ON DUPLICATE KEY UPDATE clause.
F-3: The row alias name and column alias names given in the AS clause MUST NOT be valid outside the ON DUPLICATE KEY UPDATE clause.
F-4: If a column alias name is mentioned more than once in a column alias list, error ER_DUP_FIELDNAME MUST be raised.
F-5: If the name of the table inserted into and the new row alias name are the same, error ER_NONUNIQ_TABLE MUST be raised.
F-6: If the new row alias name in the AS clause is the same as the name of an existing table, references to that name resolve to the row in the scope of the row alias name, unless overridden, c.f. F-7.
F-7: If, in the scope of a new row alias name declared in an AS clause, another new table or row alias with the same name is declared (e.g., in an AS clause of a subquery), the new row alias is hidden, and the name resolves to the inner table/row.
F-8: The name of the table being updated is a valid reference to the old row in the scope of an ON DUPLICATE KEY UPDATE clause unless hidden by another table name or alias in an inner scope (e.g., an alias in a subquery).

NF-1: No performance impact.

I-1: No new files.
I-2: Interface SQL01 is extended with an optional new AS clause in INSERT ... VALUES ... ON DUPLICATE KEY UPDATE statements (see LLD).; Interface SQL01 is extended with an optional new AS clause in INSERT ... SET ... ON DUPLICATE KEY UPDATE statements if the update_list is enclosed in parentheses (see LLD).; There are no new reserved words.
I-3: No new commands.
I-4: No new tools.
I-5: No new erros or warnings.

Syntax

Note: This is a syntax description, not an implementation in yacc/bison.

INSERT ... VALUES (...) opt_alias ON DUPLICATE KEY UPDATE ...

<opt_alias> ::=
   /* empty */
 | AS <new row alias>
 | AS <new row alias(<column alias list>)

<column alias list> ::=
   <column alias>, <column alias list>
 | <column alias>

INSERT ... SET (update_list) opt_alias ON DUPLICATE KEY UPDATE ...

Resolving

The new row alias name declared in the AS clause overrides table names. E.g.:

CREATE TABLE t2 ...;
CREATE TABLE t3 ...;
INSERT INTO t1 ...
  VALUES (...) AS t2
  ON DUPLICATE KEY UPDATE col1 = (SELECT col1 FROM t3 WHERE t3.col1 = t2.col1);

This will use the col1 value from the new row of t1, not from table t2.

Rationale: If the new row alias conflict with, or is overridden by, table names, the author of an INSERT statement will, at the time of writing the statement, need full knowledge of which tables will exist when the statement is run. This would make it practically impossible for an application to contain INSERT ... ON DUPLICATE KEY UPDATE statements that will always work.

A similar case is where a table with the same name as the row alias is used in the FROM clause of a subquery:

CREATE TABLE t2 ...;
INSERT INTO t1 ...
  VALUES (...) AS t2
  ON DUPLICATE KEY UPDATE col1 = (SELECT t2.col1 FROM t2);

In this case, t2 is in the FROM clause, and will always resolve to a table name. Hence, the new row alias name is hidden in the context of the subquery.

If the new row alias name declared in the AS clause is reused in other aliases, e.g., AS clauses in subqueries, the new row alias is overridden, e.g.:

CREATE TABLE t2 ...;
INSERT INTO t1 ...
  VALUES ... AS t3
  ON DUPLICATE KEY UPDATE col1 = (SELECT t3.col1 FROM t2 AS t3);

In this case, t3.col1 refers to col1 of table t2, not the new row in the VALUES clause.

Rationale: This mimics how inner queries are allowed to override names of outer queries. Unlike the previous case, the author of the INSERT statement has full control of names used in subqueries.