WL#10612: X Protocol IN Operator for JSON

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

Related to DevAPI

Currently, it's not possible to express queries that compare against the contents 
of an array or object.

For example, checking whether an array from an object contains an element or one-
of/all-of of the elements in another list is not possible.

The expression syntax must be extended to support such an operator, using the 
JSON_CONTAINS() function.


Sample expressions that must be possible:
=========================================

3 in [1,2,3,4]

doc.field in [1,2,3,4]

4 in doc.array

doc.field in doc.array

json_quote(substr(field, 1, 5)) in doc.array


Sample expressions that are not valid
=====================================

substr(field, 1, 5) in doc.array

Does not produce a JSON value, cannot implicitly cast because string won't produce 
a castable JSON value
It must be possible to use the cont_in and not_cont_in operator to check for 
inclusion of any combination of:

FR1 - JSON literals

FR2 - JSON document paths

FR3 - Arbitrary expressions when explicitly casted to a JSON value

FR4 - An illegal value found in an operand must result in an error 
   FR4.1 - A function call which result is not explicitly converted to a JSON 
value is illegal
   FR4.2 - An expression result which is not explicitly converted to a JSON value 
is illegal
   FR4.3 - A JSON function call from a list of allowed functions that knowingly 
produce a JSON value is legal

FR5 - If an expression appears in a modify() operation, a NULL result produced by 
JSON function errors must not result in a NULL document

FR6 - String encoded JSON values are accepted and converted to JSON, but they must 
be have their content_type set to JSON at the protocol message level
New CONTAINS and NOT_CONTAINS operators to be introduced in any expression valid 
in CRUD operations.
They are different from the IN operator, in that IN works with SQL and requires 
a set of literals at the right side.

CONTAINS should accept JSON values at the left and right sides of the operator, 
including expressions that generate a JSON value. Expressions that do not 
generate valid JSON values shall be still accepted by the plugin, with the 
expectation that they would eventually be rejected during execution by the 
optimizer.


Protocol
========

At the protocol level, a binary operator called "contains" and not "contains" 
shall be introduced, as follows:

--- a/rapid/plugin/x/protocol/mysqlx_expr.proto
+++ b/rapid/plugin/x/protocol/mysqlx_expr.proto
@@ -187,6 +187,8 @@ message FunctionCall {
 //     * ``like``
 //     * ``not_like``
 //     * ``cast``
+//     * ``cont_in``
+//     * ``not_cont_in``
 //
 //   Using special representation, with more than 2 params
 //     * ``in`` (param[0] IN (param[1], param[2], ...))


Example expression sub-message:

  criteria {
    type: OPERATOR
    operator {
      name: "cont_in"
      param {
        type: LITERAL
        literal {
          type: V_OCTETS
          v_octets {value:"\"one\""}
        }
      }
      param {
        type: IDENT
        identifier {
          document_path {
            type: MEMBER
            value: "tags"
          }
        }
      }
    }
  }

collection.find('"one" in tags')


Mapping to SQL
==============

The JSON_CONTAINS() function shall be used as for MySQL 8.

Both operands must be JSON values and the expression generator must perform the 
following mapping, depending on the type of input:

literal string -> JSON_QUOTE()
other literals -> CAST( as JSON)
document_path -> JSON_EXTRACT(document_path)
array -> JSON_ARRAY()
object -> JSON_OBJECT()

Not allowed to appear in a JSON expression without explicit cast:

function_call (except for whitelisted JSON functions)
column
arbitrary expression

White list of JSON functions is defined as the following regular expression:
JSON_(APPEND|ARRAY|ARRAY_APPEND|ARRAY_INSERT|EXTRACT|INSERT|KEYS|MERGE|OBJECT|QU
OTE|REMOVE|REPLACE|SEARCH|SET)