WL#12965: X Protocol: add schema validation to create collection

Affects: Server-8.0   —   Status: Complete

Motivation

User can insert any kind of document into a collection created by X Plugin. Thus it is the users responsibility to validate the content of inserted document.

In case when multiple applications operate on the same collection, all need to implement their own document validation.

Moving the validation from the client to the server allows to:

  • there will be no mismatches in patters on different clients, only "server" is going to be responsible for it,
  • detect inconsistencies in the dataset
  • describe/document the schema of the collection

In case of: bulk insert, prototyping, buggy json-schema, user might want to disable the validation and re-enable it again.

Goal

X Plugin's "createCollection()" functionality should allow to specify a JSON Schema for a collection.

WL#11999 introduces JSON_SCHEMA_VALID()

<bool> = JSON_SCHEMA_VALID(<json_schema>, <json_document>); # introduced by WL#11999

which will be called by the X Plugin when the user calls

var coll = schema.createCollection("mycollection", {
    validator: validationDoc
})

on the DevAPI.

Functional requirements

F1. User must be able to create a collection with validation schema

a. User must be able to insert a document that is valid according to the validation schema

b. User mustn't be able to insert a document that is not valid according to the validation schema

c. User must be able to update to a valid document according to the validation schema

d. User mustn't be able to update a invalid document according to the validation schema

F2. User should be able to change the validation schema on a collection

a. User must be able to insert a document that is valid according to new validation schema

b. User mustn't be able to insert a document that is not valid according to new validation schema

c. The schema change must fail in case when documents existing in the collection are not valid according to new schema

d. User must be able to update to a valid document according to the new validation schema

e. User mustn't be able to update a invalid document according to the new validation schema

F3. User must be able to enable/disable the validation on a collection, without removing the schema

a. User must be able to insert non-valid document when the validation is disabled

b. User mustn't be able to insert non-valid document when the validation is enabled

c. Enabling validation must fail in case when in the collection there are not valid documents according to the schema

d. User must be able to update to a invalid document according to the validation schema when validation is disabled

e. User mustn't be able to update a invalid document according to the validation schema when validation is enabled

F4. Validation schema provided by the user must be validated using a meta-schema.

Non-functional requirements

N1. In case when user set "validation.level" to "OFF", then there must be no performance degradation visible

N2. The performance degradation must be small as possible in case when the validation is enabled

The feature doesn't have direct impact on X Protocol, rather on admin-commands which are application level feature. "create_collection" admin-command should be extended with support of json_schema validation and option that allows enabling or disabling the validation. "ensure_collection" does the same thing as "create_collection", thus it should be extended in the same way. User must be able to change or query it the validation schema. There are no admin-commands that can be used for it, thus following new admin-commands are needed: "modify_collection_options", "get_collection_options".

Protocol

create_collection admin command

"create_collection" command can be executed through namespace "mysqlx". It passes the arguments in "Mysqlx.Datatypes.Object" as JSON like object, which makes the things easier for serializing and parsing.

Options added to "create_collection" should be grouped as "options" field.

All json-schema related options must be group into "validation" field (under "options") and must consist of following fields:

  • "schema" - json-schema may be encoded as protobuf string or by "Mysqlx.Datatypes.Object". This field is required.
  • "level" - enables or disabled validation. Possible values: "strict", "off". Default value for this field is "strict".

The object under "validation" filed must be constructed as following:

Mysqlx.Sql.StmtExecute {
  stmt: "create_collection"
  args {
    type: OBJECT obj {
      fld {
        key:'schema'
        value{type:SCALAR scalar{type:V_STRING v_string{value:'db-schema'}}}
        }
      fld {
        key:'name'
        value{type:SCALAR scalar{type:V_STRING v_string{value:'table-name'}}}
      }
      fld {
        key:'options'
        value{
          type:OBJECT
          obj{
            fld {
              key:'validation'
              value {
                type:OBJECT
                obj{
                  fld {
                    key:'schema'
                    value{type:SCALAR scalar{
                      type:V_STRING
                      v_string{value:'...json-schema...'}}}
                  }
                  fld {
                    key:'level'
                    value{type:SCALAR scalar{
                      type:V_STRING
                      v_string{value:'strict'}}}
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  namespace: "mysqlx"
}

or the schema may be encoded as protobuf Mysqlx.Datatype.Object:

Mysqlx.Sql.StmtExecute {
  stmt: "create_collection"
  args {
    type: OBJECT obj {
      fld {
        key:'schema'
        value{type:SCALAR scalar{type:V_STRING v_string{value:'xtest'}}}
        }
      fld {
        key:'name'
        value{type:SCALAR scalar{type:V_STRING v_string{value:'\000'}}}
      }
      fld {
        key:'options'
        value{
          type:OBJECT
          obj{
            fld {
              key:'validation'
              value{
                type:OBJECT
                obj{
                  fld {
                    key:'schema'
                    value{type:OBJECT
                      obj{
                        ... json-schema fields ...
                      }
                    }
                  }
                  fld {
                    key:'level'
                    value{type:SCALAR scalar{
                      type:V_STRING
                      v_string{value:'strict'}}}
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  namespace: "mysqlx"
}

SQL mapping for "create_collection"

In case when "create_collection" was executed without "options" or "validation" filed, following SQL must be executed:

CREATE TABLE dbschema.table_name
  (doc JSON,
   _id VARBINARY(32) GENERATED ALWAYS AS
      JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) STORED PRIMARY KEY
   _json_schema JSON GENERATED ALWAYS AS
      '{"type":"object"}',
   CONSTRAINT %CONSTRAIN_NAME% CHECK
     (JSON_SCHEMA_VALID(_json_schema, doc)) NOT ENFORCED
  ) CHARSET utf8mb4 ENGINE=InnoDB;

In case when the "validation" was present, the default value of "jsonschema" column changes from '{"type":"object"}' to the value specified in "validation.schema" field. The "level" field if it was set to "strict", then the check constrain changes from "NOT ENFORCED" to "ENFORCED". The name of the constrain must be unique inside a database, thus the %CONSTRAIN_NAME% string must be build like following:

  1. start with dollar character, like all internal X Plugin sql identifiers
  2. followed by "val_strict_"
  3. followed by hashed table name (using same hash algorithm as in column names for indexed fields)

In case of collection with name "test_coll", the constrain name should be following::

$val_strict_889D1D09A01D7029B101675FBE738D1897A37F32

ensure_collection admin command

This admin command must fulfill the same requirements as create_collection, this means that it needs to implement "validation" option and generate similar SQL in that case.

The main difference between the create_collection and ensure_collection commands is handling already existing collections. If there already exist a collection with a given name then ensure_collection should update the validation schema for that collection. If the new collection from the request does not contain an optional options field with a "schema" field then the default schema '{"type":"object"}' should be set.

modify_collection_options admin command

The "modify_collection_options" admin-command must be must accept 'Mysqlx.Datatypes.Object' with following fields:

  • "schema" - defines the database schema in which the collection exists. This field is required.
  • "name" - defines the collection name. This field is required.
  • "options" - one of options that can be changed by the user, currently only "validation" option is available which its defined the same was as "validation" in "create_collection", still with the difference that all its fields are optional. This field is required, "validation" is optional.

User must specify at least one option for modification.

SQL mapping for "modify_collection_options"

In case that modification fails because of missing "jsonschema" column or missing "$val_strict_..." constrain, then X Plugin must assume that the collection was created on old version (not supporting validation) of the plugin and it must update the collection with following:

 ALTER TABLE schema.name
    ADD COLUMN _json_schema JSON GENERATED ALWAYS AS
      ('..users schema...'|'{"type":"object"}') VIRTUAL,
    ADD CONSTRAINT $val_strict_... CHECK
      (JSON_SCHEMA_VALID(_json_schema, doc)) [NOT] ENFORCED;

Where the "$val_strict_..." is set to "enforced" only in case when user set "strict" in "validation.level".

The successful modification should go according to following schema:

  1. "validation" has only "schema":

    ALTER TABLE schema.name MODIFY COLUMN jsonschema JSON GENERATED ALWAYS AS ('..users schema...') VIRTUAL;

  2. "validation" has only "level" set to "strict":

    ALTER TABLE schema.name ALTER CHECK $val_strict_... ENFORCED;

  3. "validation" has only "level" set to "off":

    ALTER TABLE schema.name ALTER CHECK $val_strict_... NOT ENFORCED;

  4. "validation" has "schema" set and "level" set to "off":

    ALTER TABLE schema.name ALTER CHECK $valstrict_... NOT ENFORCED, MODIFY COLUMN jsonschema JSON GENERATED ALWAYS AS ('..users schema...') VIRTUAL;

  5. "validation" has "schema" set and "level" set to "strict":

    ALTER TABLE schema.name MODIFY COLUMN jsonschema JSON GENERATED ALWAYS AS ('..users schema...') VIRTUAL, ALTER CHECK $val_strict_... ENFORCED;

  6. "validation" is empty (missing fields: "schema", "level"):

    return ER_X_CMD_ARGUMENT_OBJECT_EMPTY with following error message: "Arguments value used under \"validation\", must be an object with at last one field"

get_collection_options admin command

The "get_collection_options" admin-command must accept 'Mysqlx.Datatypes.Object' with following fields:

  • "schema" - defines the database schema in which the collection exists. This field is required.
  • "name" - defines the collection name. This field is required.
  • "options" - an array with options that client would like to see. This worklog must introduce "validation" option.

Flow on X Protocol layer, must be following:

client->server: StmtExecute(stmt="create_collection", ns="mysqlx"...)
client<-server: ColumnMetaData(name="Result", type=STRING)
client<-server: Row(string="...json object...")
client<-server: FetchDone
client<-server: StmtExecuteOk

Where the resultset contains one column and one row, in which the field is formated as json object where each option is encoded as field(in root document or sub-document):

{
  "validation": {
    "level": "on",
    "schema": {
       "type": "object"
    }
  }
}

SQL mapping for "get_collection_options"

To return "validation" object, X Plugin must build it using following queries

  • validation.schema SELECT GENERATION_EXPRESSION FROM COLUMNS WHERE TABLE_SCHEMA="%db_name%" AND TABLE_NAME="%table_name%" AND COLUMN_NAME="jsonschema";

    The result of this query is json object escaped to place it in some sort of string type. There is additional processing needed to get the json from it. It may be parsed by X Plugin or by added some sort of conversion UDF. If the query doesn't return an row, then it must be treated that collection has following schema: '' or '{"type":"object"}'

  • validation.level SELECT IF(count(*);"strict";"off") from information_schema.TABLE_CONSTRAINTS WHERE TABLE_SCHEMA=%db_name% and TABLE_NAME=%table_name% and CONSTRAINT_NAME = "$val_strict_..." and ENFORCED="YES";

Both queries could be combined into a single query and the result may be streamed directly to the user.

Access to information schema should be possible for any user, still the content of its tables changes depending on user permissions. In case when:

  • user doesn't have access to the collection
  • user doesn't have access to the database
  • the collection doesn't exist
  • the database doesn't exist

those queries will return an empty resultset. Assuming that user must have at least "SELECT" privilege to the collection, to check that X Plugin can execute following query:

SELECT 1 FROM schema.collection LIMIT 1;

which may return user friendly error in all those cases when query to information_schema returns an empty resultset.

Meta-schema validation

When executing a command that requires the user to provide a validation schema ("create_collection", "ensure_collection" and possibly a "modify_collection_options" command) the user provided validation schema must be validated with a meta-schema to prevent using an invalid schema. If the user provided schema is classified as invalid then the command processing must fail with an ER_X_INVALID_VALIDATION_SCHEMA error.

Meta-schema should be hard-coded, without a possiblity for modification. Meta-schema can be found at: http://json-schema.org/draft-04/schema

To prevent typos and using an invalid fields in validation schema there should exist an additional restraint in the meta-schema by setting a top level "additionalProperties" property to false.

Backward compatibility

Collections created before this worklog was introduced have no notion of a validation schema. It may have the following effects:

  1. For create_collection when there exists old-style collection with the same name no action should be taken and ER_TABLE_EXISTS_ERROR should be returned.
  2. For ensure_collection when there exists an old-style collection with the same name it should be updated by adding a validation schema (if user has not provided one then the default '{"type":"object"}' schema should be used).
  3. For get_collection_options called for an old-style collection it should proceed as if the given collection was created with a default schema.
  4. For modify_collection_options called for an old-style collection there are few different cases: a) Modification that only changes the validation level should set the default validation schema and requested validation level. b) Modification that only changes the validation schema should set the requested validation schema with a default validation level. c) Modification that changes both validation level and schema should set both values accordingly. In case when the new validation schema would invalidate the existing data no action should be taken and collection should not be modified.

Instrumentation

  • Mysqlx_stmt_modify_collection_options - status variable counting the number of "modify_collection_options" admin-commands received.
Property Value
Name Mysqlx_stmt_modify_collection_options
Type LONGLONG
Scope SESSION
Default 0

* Mysqlx_stmt_get_collection_options - status variable counting the number of "get_collection_options" admin-commands received.

Property Value
Name Mysqlx_stmt_get_collection_options
Type LONGLONG
Scope SESSION
Default 0

Error and warnings

In case when creation|modification collection fails because of "validation" options, then those admin commands must forward errors generated by JSON_SCHEMA_VALID to client.

  • When insertion or modification of a document fails because of validation, server returns ER_CHECK_CONSTRAINT_VIOLATED with error message containing "$val_strict_...", then following error should be returned:
Property Value
Error code ER_X_DOCUMENT_DOESNT_MATCH_EXPECTED_SCHEMA
Error text Document is not valid, according to the schema assigned to
collection
New error code Yes
  • When an unknown options was used in "modify_collection_options" or in "get_collection_options", "create_collection" then following error must be returned:
Property Value
Error code ER_X_COLLECTION_OPTION_DOESNT_EXISTS
Error text Requested collection option '%s', doesn't exists.
New error code Yes

* Allowed values for "validation_level" field are defined in this design, thus following error must be return when user set some other value:

Property Value
Error code ER_X_CMD_ARGUMENT_VALUE
Error text Argument value '%s' for validation_level is invalid
New error code No

* "schema" argument of commands "create_collection", "modify_collection_options" and "get_collection_options" is required, if its invalid (for example empty), following error must be returned:

Property Value
Error code ER_X_BAD_SCHEMA
Error text Invalid schema
New error code No

* "collection" argument of commands "create_collection", "modify_collection_options" and "get_collection_options" is required, if its invalid (for example empty), following error must be returned:

Property Value
Error code ER_X_BAD_TABLE
Error text Invalid collection name
New error code No

* At least one option or sub-option must be changed by "modify_collection_options", if use didn't specified what should be change then following error must be returned:

Property Value
Error code ER_X_CMD_ARGUMENT_OBJECT_EMPTY
Error text Arguments value used under "%s", must be an object with at
last one field
Param 1 Field name, in this case "validation"
New error code No

* When the user provided schema is not a valid JSON then the following error must be returned:

Property Value
Error code ER_X_INVALID_VALIDATION_SCHEMA
Error text Validation schema is not a valid JSON
New error code Yes

* When the user provided schema did not pass the meta-schema validation then the following error must be returned:

Property Value
Error code ER_X_INVALID_VALIDATION_SCHEMA
Error text JSON validation schema location %s failed requirement: '%s'
at meta schema location '%s'
Param 1 Location of invalid field in the user provided schema
Param 2 Failed requirement name taken from the meta-schema
Param 3 Failed requirement position in the meta-schema
New error code Yes

* Unsupported argument for any command must return the following error:

Property Value
Error code ER_X_CMD_INVALID_ARGUMENT
Error text '%s' is not a valid field for %s command
Param 1 Unsupported argument name
Param 2 Command name
New error code Yes