WL#10604: Create multi-value index

Affects: Server-8.0   —   Status: Complete

Motivation

Support for evaluating expressions involving document fields that are arrays of multiple values, with the possibility of creating indexes on them.

Goals

  • Ensuring 'cont_in' and 'not_cont_in' operators uses indexes.
  • Introducing a new function and/or operator that map to the new JSON_OVERLAPS() function (e.g. X overlaps Y).
  • Introducing a new kind of indexed field: "array".
  • Extending the index type with a CHAR(n) type for homogeneity.

Examples:

{"emails":["foo@bar.com", "info@bar.com"]}

"foo@bar.com" in $.emails --> true
"bla" in $.emails --> false
"bla" not in $.emails --> true

["foo@bar.com", "info@bla.com"] overlaps $.emails --> true

Indexing arrays require the type of the contents to be specified at the time of the index creation, besides specification that the index is on an array.

Example:

coll.createIndex("emails_idx", {fields: [{"field": "$.emails", "type":"CHAR(128)", "array": true}]});

Limitations

  • All array values must be of the same type and the type must be specified at the time the index is created
  • Only arrays of numeric, date and string values are allowed. Nested objects or arrays are not.
  • The index must always accept an empty array field. There is no possibility to create the NOT NULL index.

Index Creation

FR1 - MUST be able to create a plain index on an array field of a collection

FR2 - MUST fail if not allowed type is requested for an array field have to be cast. Allowed array content types must be one of (derived from WL8955): - BINARY[(N)] - CHAR[(N)] [charset_info] - DATE - DATETIME - TIME - DECIMAL[(M[,D])] - SIGNED [INTEGER] - UNSIGNED [INTEGER]

FR3 - MUST be able to omit 'required' option for a scalar field; default is false

FR4 - MUST fail if 'required'=true option is requested for an array field

FR5 - MUST be able to create plain index on scalar field casted to 'CHAR[(N)] [charset_info]' (e.g. CHAR(20) CHARACTER SET latin1 COLLATE latin1_general_cs). This type is not supported by plain index yet.

FR6 - MUST be able to create plain index on scalar field casted to 'TEXT[(N)] [charset_info]' (e.g. TEXT(20) CHARACTER SET latin1 COLLATE latin1_general_cs). Charset info for TEXT type is not supported by plain index yet.

FR7 - MUST be able to use 'CHAR(64)' type as default for a plain index for array fields

FR8 - MUST be able to create a multi-field plain index based on array fields or a mix of the array and scalar fields.

Array Operations

In addition to the existing 'cont_in' and 'not_cont_in' operators, a 'overlaps' and 'not_overlaps' operators are added.

FR9 - Must be allowed to use JSON_OVERLAPS() function in expressions for arrays and/or objects

FR10 - A new operator that performs an "overlap" check must be added, which should map to the JSON_OVERLAPS() function

FR11 - The 'overlaps' operator must take two operands and evaluate to true if the 1st contains one or more values that are also contained in the 2nd. Need to use the same operands limitation as 'cont_in' operator use.

FR12 - Same as FR10, but negated ('not_overlaps')

Notes:

  • The JSON_OVERLAPS() function as described in WL8955 allows operating on Objects and Arrays, but indexing is only possible on Arrays.

Index Usage

Assuming the existence of a suitable plain index on an array field of a collection:

FR13 - The 'cont_in' and 'not_cont_in' operators must use that index if the 2nd operand refers to a document fields

FR14 - The 'overlaps' and 'not_overlaps' operators must use the index if any of the operands refer to a document field

Protocol

The 'create_collection_index' admin command of the mysqlx namespace is used to create plain indexes for collections. It takes the following list of arguments, encoded in the args field of the message:

  • schema: string
  • collection: string
  • name: string
  • unique: bool
  • type: sting [optional, default:INDEX]
  • fields[]:
    • field: string
    • type: string [optional, default:TEXT(64)]
    • required: bool

This WL shall extend the field information sub-object, to allow the use of a new optional boolean attribute named "array", indicating that the field is to be treated as an array of values, make the attribute "required" as an optional and change default value of the attribute "type".

  • schema: string
  • collection: string
  • name: string
  • unique: bool
  • type: sting [optional, default:INDEX]
  • fields[]:
    • field: string
    • type: string [optional, default:TEXT(64); CHAR(64) for 'array'=true]
    • required: bool [optional, default:false]
    • array: bool [optional, default:false]

SQL Mapping

Plain indexing on array fields is possible to operate directly on document paths. There is not necessary to generate any extra virtual/stored column to a collection table.

The creation of the index is achieved by altering an existing table by construction:

ALTER TABLE ADD INDEX ((CAST(JSON_EXTRACT(doc,) AS ARRAY)));

Example: for admin command:

create_collection_index {schema:'foo', collection:'bar', name:'emails_idx', unique:true, fields:{ field:'$.emails', type:'CHAR(128)', array:true}}

the SQL query should be generated as follows:

ALTER TABLE foo.bar ADD UNIQUE INDEX emails_idx ((CAST(JSON_EXTRACT(doc,'$.emails') AS CHAR(128) ARRAY)));

Error and Warnings

Property Value
Error code ER_X_CMD_ARGUMENT_VALUE
Error text Invalid or unsupported type specification for array index '%s'
New error code No
Property Value
Error code ER_X_CMD_ARGUMENT_VALUE
Error text Unsupported argument specification for '%s'
New error code No
Property Value
Error code ER_X_EXPR_BAD_NUM_ARGS
Error text OVERLAPS expression requires two parameters
New error code No
Property Value
Error code ER_X_EXPR_BAD_NUM_ARGS
Error text NOT_OVERLAPS expression requires two parameters
New error code No