WL#9807: X Protocol Crud.Insert with upsert

Affects: Server-8.0   —   Status: Complete

The UPSERT command is a very useful operation that allows performing an insert or 
update, depending on whether the record already exists or not, in a single step, 
atomically.

MySQL supports the ON DUPLICATE KEY UPDATE extension to INSERT, which has this 
behaviour, and should be used for implementation.

Limitations
-----------

The intended behaviour is:

* if records exists, replace it with our the values presented, other side create

Building a ON DUPLICATE KEY UPDATE for collections is straight-forward:

  INSERT INTO collection VALUES (key, value) ON DUPLICATE KEY
    UPDATE SET _doc=value;
    
as the structure of the underlying table is known (value goes into _doc) and
unique keys are based on generated columns and aren't updated directly.

For UPSERTS into tables the behaviour is different:

  INSERT INTO collection VALUES ("a", "b", "c") ON DUPLICATE KEY
    UPDATE SET ??="f", ??="b", ??="c";

The column names and keys are not known, unless they come with the query
or they're queried from MySQL. The 2nd introduces the need for
metadata locking, caching and additional queries, which would be a
performance issue.

Therefore this worklog limits the feature to collections only.
FR1 - When a document inserted, if another item already exists with the 
same unique key value (that is, inserting it would cause a duplicate key error), 
the document should replace the original duplicate item.

FR2 - When an upsert leads to an Update, the primary key of the original item
must be preserved

FR3 - Upsert operations on tables should be rejected with an Error.

FR4 - If an upsert on collections with one or more unique keys in addition to the 
PK results in multiple matches/duplicate keys on the same row, the operation must 
succeed.

FR5 - If an upsert on collections with one or more unique keys in addition to the 
PK results in multiple matches/duplicate keys on different rows, the operation 
must fail with an error.
Notes
=====

- For collections, it is only possible to replace the entire document. Partial 
changes/updates are not supported for Upserts at the moment, for the sake of 
simplicity.

- Upsert for tables is not supported. The reason is that for upsert to work, we 
need to know the structure of the table beforehand, at the time that the SQL 
translation is done. For collections, we know the structure, but for tables, the 
column names and keys are not known, unless they come with the query or they're 
queried from MySQL. The 2nd introduces the need for metadata locking, caching 
and additional queries, which would be a performance issue. In the end, the 
effort needed to make it work and/or the usability of the feature would 
outweight the potential benefits.


When a key conflict occurs during insert, the following behavior will take 
place:

- For collections with no additional unique keys other than the primary key, the 
entire document will be replaced. In such cases, the _id is the only column that 
can conflict, so its value will be preserved anyway.

- For collections with a unique key in addition to the _id primary key:
   - If the conflict is on the _id, the document must be replaced while keeping 
the _id (same case as if there was a primary key only)
   - If the conflict is on a key other than _id, the operation must fail with an 
error, to preserve the immutability of the document _id

Summary
-------

The following table hashes out the expected action for different combinations of 
duplicate primary key (PK) and duplicate UNIQUE secondary key cases when there 
are 0, 1 and 2+ unique key conflicts.

Values mean:
-   No key
O   OK, no conflict, the value is unique
X   Conflict
Y   Conflict with different rows (eg PK matches a row and SK matches another)

PK   SK   Note                      Action
--------------------------------------------------------------
O    -    No matches                Normal INSERT
O    O    No matches, SK OK         Normal INSERT
O    X    SK matches, PK OK         UPDATE on the matching SK, maintain old PK
X    -    PK matches                UPDATE on the matching PK
X    O    PK matches, SK OK         UPDATE on the matching PK
X    X    PK and SK match same row  UPDATE on the matching PK
X    Y    PK and SK match diff rows Unique key error

If there are multiple secondary keys, the case where they all match the same row 
or none at all is treated the same as having a single secondary key match (X X). 
If the secondary keys match different rows, it is treated as matches on 
different rows (X Y).


Protocol
========

diff --git a/rapid/plugin/x/protocol/mysqlx_crud.proto 
b/rapid/plugin/x/protocol/mysqlx_crud.proto
index e046f2c..b5c4992 100644
--- a/rapid/plugin/x/protocol/mysqlx_crud.proto
+++ b/rapid/plugin/x/protocol/mysqlx_crud.proto
@@ -137,6 +137,7 @@ message Find {
 // :param projection: name of the columns to insert data into (empty if 
data_model is DOCUMENT)
 // :param row: set of rows to insert into the collection/table (a single 
expression with a JSON document literal or an OBJECT expression)
 // :param args: values for parameters used in row expressions
+// :param upsert: true if this should be treated as an Upsert (that is, update 
on 
duplicate key)
 // :Returns: :protobuf:msg:`Mysqlx.Resultset::`
 message Insert {
   required Collection collection = 1;
@@ -148,7 +149,8 @@ message Insert {
     repeated Mysqlx.Expr.Expr field = 1;
   };
   repeated TypedRow row = 4;
   repeated Mysqlx.Datatypes.Scalar args = 5;
+  optional bool upsert = 6;
 };
 
 // Update documents/rows in a collection/table


SQL Mapping
===========

If the upsert flag is true, the INSERT statement shall generate.

INSERT INTO collection (doc) VALUES (val1)
   ON DUPLICATE KEY 
   UPDATE doc = JSON_SET(VALUES(doc), '$._id', JSON_EXTRACT(doc, '$._id'));


For Tables, an error should be generated in case a upsert is requested.


Compatibility
=============

Compatibility assurance lies on client side. Clients that use new features (in
this case XMessage - field) must ensure backward compatibility by follow the
procedure described under: X Protocol expectations for supported protobuf fields