WL#10604: Create multi-value index
Motivation
Support for evaluating expressions involving document fields that are arrays of multiple values, with the possibility of creating indexes on them.
Goals
- Ensuring 'cont_in' and 'not_cont_in' operators uses indexes.
- Introducing a new function and/or operator that map to the new JSON_OVERLAPS() function (e.g. X overlaps Y).
- Introducing a new kind of indexed field: "array".
- Extending the index type with a CHAR(n) type for homogeneity.
Examples:
{"emails":["foo@bar.com", "info@bar.com"]}
"foo@bar.com" in $.emails --> true
"bla" in $.emails --> false
"bla" not in $.emails --> true
["foo@bar.com", "info@bla.com"] overlaps $.emails --> true
Indexing arrays require the type of the contents to be specified at the time of the index creation, besides specification that the index is on an array.
Example:
coll.createIndex("emails_idx", {fields: [{"field": "$.emails", "type":"CHAR(128)", "array": true}]});
Limitations
- All array values must be of the same type and the type must be specified at the time the index is created
- Only arrays of numeric, date and string values are allowed. Nested objects or arrays are not.
- The index must always accept an empty array field. There is no possibility to create the NOT NULL index.
Index Creation
FR1 - MUST be able to create a plain index on an array field of a collection
FR2 - MUST fail if not allowed type is requested for an array field have
to be cast. Allowed array content types must be one of (derived from WL8955):
- BINARY[(N)]
- CHAR[(N)] [charset_info]
- DATE
- DATETIME
- TIME
- DECIMAL[(M[,D])]
- SIGNED [INTEGER]
- UNSIGNED [INTEGER]
FR3 - MUST be able to omit 'required' option for a scalar field; default is false
FR4 - MUST fail if 'required'=true option is requested for an array field
FR5 - MUST be able to create plain index on scalar field casted to
'CHAR[(N)] [charset_info]' (e.g. CHAR(20) CHARACTER SET latin1 COLLATE latin1_general_cs).
This type is not supported by plain index yet.
FR6 - MUST be able to create plain index on scalar field casted to 'TEXT[(N)] [charset_info]' (e.g. TEXT(20) CHARACTER SET latin1 COLLATE latin1_general_cs). Charset info for TEXT type is not supported by plain index yet.
FR7 - MUST be able to use 'CHAR(64)' type as default for a plain index for array fields
FR8 - MUST be able to create a multi-field plain index based on array fields or a mix of the array and scalar fields.
Array Operations
In addition to the existing 'cont_in' and 'not_cont_in' operators, a 'overlaps' and 'not_overlaps' operators are added.
FR9 - Must be allowed to use JSON_OVERLAPS() function in expressions for arrays and/or objects
FR10 - A new operator that performs an "overlap" check must be added, which should map to the JSON_OVERLAPS() function
FR11 - The 'overlaps' operator must take two operands and evaluate to true if the 1st contains one or more values that are also contained in the 2nd. Need to use the same operands limitation as 'cont_in' operator use.
FR12 - Same as FR10, but negated ('not_overlaps')
Notes:
- The JSON_OVERLAPS() function as described in WL8955 allows operating on Objects and Arrays, but indexing is only possible on Arrays.
Index Usage
Assuming the existence of a suitable plain index on an array field of a collection:
FR13 - The 'cont_in' and 'not_cont_in' operators must use that index if the 2nd operand refers to a document fields
FR14 - The 'overlaps' and 'not_overlaps' operators must use the index if any of the operands refer to a document field
Protocol
The 'create_collection_index' admin command of the mysqlx namespace is used to create plain indexes for collections. It takes the following list of arguments, encoded in the args field of the message:
- schema: string
- collection: string
- name: string
- unique: bool
- type: sting [optional, default:INDEX]
- fields[]:
- field: string
- type: string [optional, default:TEXT(64)]
- required: bool
This WL shall extend the field information sub-object, to allow the use of a new optional boolean attribute named "array", indicating that the field is to be treated as an array of values, make the attribute "required" as an optional and change default value of the attribute "type".
- schema: string
- collection: string
- name: string
- unique: bool
- type: sting [optional, default:INDEX]
- fields[]:
- field: string
- type: string [optional, default:TEXT(64); CHAR(64) for 'array'=true]
- required: bool [optional, default:false]
- array: bool [optional, default:false]
SQL Mapping
Plain indexing on array fields is possible to operate directly on document paths. There is not necessary to generate any extra virtual/stored column to a collection table.
The creation of the index is achieved by altering an existing table by construction:
ALTER TABLE doc
,
Example: for admin command:
create_collection_index {schema:'foo', collection:'bar', name:'emails_idx', unique:true, fields:{ field:'$.emails', type:'CHAR(128)', array:true}}
the SQL query should be generated as follows:
ALTER TABLE foo
.bar
ADD UNIQUE INDEX emails_idx
((CAST(JSON_EXTRACT(doc
,'$.emails') AS CHAR(128) ARRAY)));
Error and Warnings
Property | Value |
---|---|
Error code | ER_X_CMD_ARGUMENT_VALUE |
Error text | Invalid or unsupported type specification for array index '%s' |
New error code | No |
Property | Value |
---|---|
Error code | ER_X_CMD_ARGUMENT_VALUE |
Error text | Unsupported argument specification for '%s' |
New error code | No |
Property | Value |
---|---|
Error code | ER_X_EXPR_BAD_NUM_ARGS |
Error text | OVERLAPS expression requires two parameters |
New error code | No |
Property | Value |
---|---|
Error code | ER_X_EXPR_BAD_NUM_ARGS |
Error text | NOT_OVERLAPS expression requires two parameters |
New error code | No |