WL#9807: X Protocol Crud.Insert with upsert
Affects: Server-8.0
—
Status: Complete
The UPSERT command is a very useful operation that allows performing an insert or
update, depending on whether the record already exists or not, in a single step,
atomically.
MySQL supports the ON DUPLICATE KEY UPDATE extension to INSERT, which has this
behaviour, and should be used for implementation.
Limitations
-----------
The intended behaviour is:
* if records exists, replace it with our the values presented, other side create
Building a ON DUPLICATE KEY UPDATE for collections is straight-forward:
INSERT INTO collection VALUES (key, value) ON DUPLICATE KEY
UPDATE SET _doc=value;
as the structure of the underlying table is known (value goes into _doc) and
unique keys are based on generated columns and aren't updated directly.
For UPSERTS into tables the behaviour is different:
INSERT INTO collection VALUES ("a", "b", "c") ON DUPLICATE KEY
UPDATE SET ??="f", ??="b", ??="c";
The column names and keys are not known, unless they come with the query
or they're queried from MySQL. The 2nd introduces the need for
metadata locking, caching and additional queries, which would be a
performance issue.
Therefore this worklog limits the feature to collections only.
FR1 - When a document inserted, if another item already exists with the same unique key value (that is, inserting it would cause a duplicate key error), the document should replace the original duplicate item. FR2 - When an upsert leads to an Update, the primary key of the original item must be preserved FR3 - Upsert operations on tables should be rejected with an Error. FR4 - If an upsert on collections with one or more unique keys in addition to the PK results in multiple matches/duplicate keys on the same row, the operation must succeed. FR5 - If an upsert on collections with one or more unique keys in addition to the PK results in multiple matches/duplicate keys on different rows, the operation must fail with an error.
Notes
=====
- For collections, it is only possible to replace the entire document. Partial
changes/updates are not supported for Upserts at the moment, for the sake of
simplicity.
- Upsert for tables is not supported. The reason is that for upsert to work, we
need to know the structure of the table beforehand, at the time that the SQL
translation is done. For collections, we know the structure, but for tables, the
column names and keys are not known, unless they come with the query or they're
queried from MySQL. The 2nd introduces the need for metadata locking, caching
and additional queries, which would be a performance issue. In the end, the
effort needed to make it work and/or the usability of the feature would
outweight the potential benefits.
When a key conflict occurs during insert, the following behavior will take
place:
- For collections with no additional unique keys other than the primary key, the
entire document will be replaced. In such cases, the _id is the only column that
can conflict, so its value will be preserved anyway.
- For collections with a unique key in addition to the _id primary key:
- If the conflict is on the _id, the document must be replaced while keeping
the _id (same case as if there was a primary key only)
- If the conflict is on a key other than _id, the operation must fail with an
error, to preserve the immutability of the document _id
Summary
-------
The following table hashes out the expected action for different combinations of
duplicate primary key (PK) and duplicate UNIQUE secondary key cases when there
are 0, 1 and 2+ unique key conflicts.
Values mean:
- No key
O OK, no conflict, the value is unique
X Conflict
Y Conflict with different rows (eg PK matches a row and SK matches another)
PK SK Note Action
--------------------------------------------------------------
O - No matches Normal INSERT
O O No matches, SK OK Normal INSERT
O X SK matches, PK OK UPDATE on the matching SK, maintain old PK
X - PK matches UPDATE on the matching PK
X O PK matches, SK OK UPDATE on the matching PK
X X PK and SK match same row UPDATE on the matching PK
X Y PK and SK match diff rows Unique key error
If there are multiple secondary keys, the case where they all match the same row
or none at all is treated the same as having a single secondary key match (X X).
If the secondary keys match different rows, it is treated as matches on
different rows (X Y).
Protocol
========
diff --git a/rapid/plugin/x/protocol/mysqlx_crud.proto
b/rapid/plugin/x/protocol/mysqlx_crud.proto
index e046f2c..b5c4992 100644
--- a/rapid/plugin/x/protocol/mysqlx_crud.proto
+++ b/rapid/plugin/x/protocol/mysqlx_crud.proto
@@ -137,6 +137,7 @@ message Find {
// :param projection: name of the columns to insert data into (empty if
data_model is DOCUMENT)
// :param row: set of rows to insert into the collection/table (a single
expression with a JSON document literal or an OBJECT expression)
// :param args: values for parameters used in row expressions
+// :param upsert: true if this should be treated as an Upsert (that is, update
on
duplicate key)
// :Returns: :protobuf:msg:`Mysqlx.Resultset::`
message Insert {
required Collection collection = 1;
@@ -148,7 +149,8 @@ message Insert {
repeated Mysqlx.Expr.Expr field = 1;
};
repeated TypedRow row = 4;
repeated Mysqlx.Datatypes.Scalar args = 5;
+ optional bool upsert = 6;
};
// Update documents/rows in a collection/table
SQL Mapping
===========
If the upsert flag is true, the INSERT statement shall generate.
INSERT INTO collection (doc) VALUES (val1)
ON DUPLICATE KEY
UPDATE doc = JSON_SET(VALUES(doc), '$._id', JSON_EXTRACT(doc, '$._id'));
For Tables, an error should be generated in case a upsert is requested.
Compatibility
=============
Compatibility assurance lies on client side. Clients that use new features (in
this case XMessage - field) must ensure backward compatibility by follow the
procedure described under: X Protocol expectations for supported protobuf fields
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.