WL#11999: Add support for JSON Schema

Affects: Server-8.0   —   Status: Complete

Contents


What is JSON Schema?

JSON Schema is a way to describe the structure of a JSON document. The JSON Schema is a JSON document in itself, and allows you to define required names/attributes, data types etc. As an example, here is an example of a JSON Schema describing a geographic coordinate (taken from json-schema.org):

 {
   "id": "http://json-schema.org/geo",
   "$schema": "http://json-schema.org/draft-06/schema#",
   "description": "A geographical coordinate",
   "type": "object",
   "properties": {
     "latitude": {
       "type": "number"
     },
     "longitude": {
       "type": "number"
     }
   }
 }

The first interesting part is the "type" attribute. It says that the JSON document MUST be an object (start with '{'). Next, it says that the object may have two properties named "latitude" and "longitude". Both of these are required to be a number. So, the following JSON document is valid according to the above JSON Schema:

 {
   "latitude": 63.444697,
   "longitude": 10.445118
 }

However, the following is also valid because neither "latitude" or "longitude" are required:

 {}

In order to make those attributes required, we can add a required attribute to the JSON Schema:

 {
   "id": "http://json-schema.org/geo",
   "$schema": "http://json-schema.org/draft-06/schema#",
   "description": "A geographical coordinate",
   "type": "object",
   "properties": {
     "latitude": {
       "type": "number"
     },
     "longitude": {
       "type": "number"
     }
   }
   "required": ["latitude", "longitude"]
 }

A nice feature would be to have a function that validates a JSON value against a JSON Schema:

 SELECT JSON_SCHEMA_VALID('{
   "id": "http://json-schema.org/geo",
   "$schema": "http://json-schema.org/draft-06/schema#",
   "description": "A geographical coordinate",
   "type": "object",
   "properties": {
     "latitude": {
       "type": "number",
       "minimum": -90,
       "maximum": 90
     },
     "longitude": {
       "type": "number",
       "minimum": -180,
       "maximum": 180
     }
   },
   "required": ["latitude", "longitude"]
 }','{
   "latitude": 63.444697,
   "longitude": 10.445118
 }');
 => Returns "true"
 SELECT JSON_SCHEMA_VALID('{
   "id": "http://json-schema.org/geo",
   "$schema": "http://json-schema.org/draft-06/schema#",
   "description": "A geographical coordinate",
   "type": "object",
   "properties": {
     "latitude": {
       "type": "number",
       "minimum": -90,
       "maximum": 90
     },
     "longitude": {
       "type": "number",
       "minimum": -180,
       "maximum": 180
     }
   },
   "required": ["latitude", "longitude"]
 }','{}');
 => Returns "false" (missing required properties)

rapidjson (which we already use) supports validation of the JSON Schema Draft v4: https://github.com/Tencent/rapidjson/blob/master/doc/schema.md

Other DBMS

As far as we know, MongoDB is the only DBMS that has native support for JSON Schema validation.

MongoDB

Starting from version 3.6, MongoDB also supports JSON Schema (version 4) validation. It's is defined by passing a validator to the createCollection command:

 db.createCollection("coordinates", {
   validator: {
     "$jsonSchema": {
       "bsonType": "object",
       "properties": {
         "latitude": {
           "type": "number",
           "minimum": -90,
           "maximum": 90
         },
         "longitude": {
           "type": "number",
           "minimum": -180,
           "maximum": 180
         }
       },
       "required": [
         "latitude",
         "longitude"
       ]
     }
   }
 })

MongoDB does not support the $ref keyword, which is a pointer to an external JSON Schema (local or remote).

PostgreSQL

There is no native support for JSON schema, but there are extensions that can be used as CHECK constraints (https://github.com/gavinwahl/postgres-json-schema):

 CREATE TABLE example (id serial PRIMARY KEY, data jsonb);
 ALTER TABLE example
   ADD CONSTRAINT data_is_valid CHECK (
     validate_json_schema('{"type": "object"}', data));

The above CHECK constraint enforces that every JSON value stored in the column data MUST be a JSON object. Remote references are not supported.

Possible problems

  • The JSON Schema has a keyword $ref that is a pointer to an external JSON Schema (local or remote). To fully support this, we would need to be able to fetch data from the internet. This may cause security issues, and it might be wise to not support this keyword.

  • For strings it's possible to specify a regex pattern. rapidjson gives us two regex engines to choose between: their internal home-made and std::regex. MySQL however is now using ICU, and it would perhaps be good if we used the same regex engine in both JSON Schema and rest of the server.