WL#13005: Implement JSON_SCHEMA_VALIDATION_REPORT
WL#11999 implemented the function JSON_SCHEMA_VALID that makes it possible to validate a JSON document against a JSON Schema. This worklog is about implementing JSON_SCHEMA_VALIDATION_REPORT(<json schema>, <json doc>) that prints out a structured JSON object giving a more detailed report of the JSON Schema validation
Example usage:
mysql> SELECT JSON_PRETTY(JSON_SCHEMA_VALIDATION_REPORT('{ '> "type": "array" '> }','["test"]' -> )) AS result; +---------------------+ | result | +---------------------+ | { "valid": true } | +---------------------+ 1 row in set (0.00 sec) mysql> SELECT JSON_PRETTY(JSON_SCHEMA_VALIDATION_REPORT('{ '> "type": "object" '> }','["test"]' -> )) AS result; +------------------------------------------------------------------------------- | result +------------------------------------------------------------------------------- | { "valid": false, "reason": "The JSON document location '#' failed requirement 'type' at JSON Schema location '#'", "schema-location": "#", "document-location": "#", "schema-failed-keyword": "type" } | +------------------------------------------------------------------------------- 1 row in set (0.00 sec)
F-1: The server MUST accept the new function <json> = JSON_SCHEMA_VALIDATION_REPORT(<json>, <json>)
F-2: The first argument to JSON_SCHEMA_VALIDATION_REPORT MUST be the JSON Schema definition, and the second argument MUST be the JSON document the user wants to validate.
F-3: The function JSON_SCHEMA_VALIDATION_REPORT MUST return SQL NULL if any of its arguments are SQL NULL.
F-4: If the first argument given to JSON_SCHEMA_VALIDATION_REPORT isn't a valid JSON or a string that can be parsed as valid JSON value, the server MUST return the error message ER_INVALID_TYPE_FOR_JSON
F-5: If the second argument given to JSON_SCHEMA_VALIDATION_REPORT isn't a valid JSON or a string that can be parsed as valid JSON value, the server MUST return the error message ER_INVALID_TYPE_FOR_JSON
F-6: If the provided JSON Schema contains a pointer to an external JSON Schema (local or remote), the server MUST return the error ER_NOT_SUPPORTED_YET.
F-7: If the first argument is a valid JSON/parsable as JSON, but not a JSON object, the server MUST return the error ER_INVALID_JSON_TYPE
F-8: If the JSON document is considered valid according to the JSON Schema, the function MUST return a JSON object with a single property: { "valid": true }
F-9: If the JSON document is not considered valid according to the JSON Schema, the function MUST return a JSON object with five properties: "valid", "reason", "schema-location", "document-location" and "schema-failed-keyword"
Contents |
Return value
We choose to return a structured output to that external tools can provide more error information to the user. This section describes the format of the structured output.
Valid JSON document
If the JSON document is considered valid according to the JSON Schema, the function will return a JSON object with one property, "valid", with the value "true":
mysql> SELECT JSON_PRETTY(JSON_SCHEMA_VALIDATION_REPORT('{ '> "type": "array" '> }','["test"]' -> )) AS result; +---------------------+ | result | +---------------------+ | { "valid": true } | +---------------------+ 1 row in set (0.00 sec)
Invalid JSON document
If the JSON document is considered invalid, the returned JSON object will contain five properties; "valid", "reason", "schema-location", "document- location" and "schema-failed-keyword":
-
"valid" tells whether or not the JSON document was considered valid
-
"reason" gives out a human readable string saying where the validation failed
-
"schema-location" is a JSON pointer URI fragment (see below) that points to where in the JSON Schema the validation failed
-
"document-location" is a JSON pointer URI fragment that points to where in the JSON document the validation failed
-
"schema-failed-keyword" is a string that says which keyword/property in the JSON Schema that was violated
mysql> SELECT JSON_PRETTY(JSON_SCHEMA_VALIDATION_REPORT('{ '> "type": "object" '> }','["test"]' -> )) AS result; +------------------------------------------------------------------------------- | result +------------------------------------------------------------------------------- | { "valid": false, "reason": "The JSON document location '#' failed requirement 'type' at JSON Schema location '#'", "schema-location": "#", "document-location": "#", "schema-fail-keyword": "type" } | +------------------------------------------------------------------------------- 1 row in set (0.00 sec)
JSON Pointer
"JSON Pointer defines a string syntax for identifying a specific value within a JavaScript Object Notation (JSON) document." [1]. It is to JSON what XPath is to XML. "URI Fragment" is one way to encode the JSON Pointer, and the most basic JSON Pointer is "#". This means the root of the document. "#/prop_1" is the property "prop_1", located in the root element:
{ "prop_1": 123 <--- This property }
Note that this is different from JSON path that other functions in MySQL uses (JSON_EXTRACT('$.prop_1', json_arg) etc.). We have chosen to print out the JSON pointer since that's what rapidjson is giving us. If we see that there is a need for the JSON path as well, we will consider adding that later.
[1] https://tools.ietf.org/html/rfc6901
regex patterns and external JSON schema
As with WL#11999, we rely on rapidjson on doing the JSON Schema validation as well as error reporting. There are a few things we can adjust in rapidjson, and those are 1) the regex engine we want to use, and 2) how to handle external JSON schemas. We have chosen to use std::regex as the regex engine, and we return error in case of any pointers to external JSON schemas (see WL#11999 for a reasoning around these desicions).
New Item_func_json_schema_validation_report
We will implement one new Item, "Item_func_json_schema_validation_report" that will handle the new function. This Item will inherit from Item_json_func since it returns a JSON object.
Most of the code in Item_func_json_valid will be factored out so it can be reused by this new Item, since they share almost all functionality and behavior.
New class Json_schema_validation_report
WL#11999 implemented a framework for validating a JSON document against a JSON Schema, and the most important function is:
bool Json_schema_validator::is_valid_json_schema(const char *document_str, size_t document_length, const char *function_name, bool *is_valid) const;
In order to support JSON_SCHEMA_VALIDATION_REPORT, we extend this function with one extra output argument, "report":
bool Json_schema_validator::is_valid_json_schema( const char *document_str, size_t document_length, const char *function_name, bool *is_valid, Json_schema_validation_report *report) const {
The class Json_schema_validation_report has the following definition:
class Json_schema_validation_report { public: const std::string human_readable_reason() const; const LEX_CSTRING schema_location() const; const LEX_CSTRING schema_failed_keyword() const; const LEX_CSTRING document_location() const; };
If the new output argument is nullptr, the function will behave as before. If the new output argument is not nullptr AND the JSON document is found invalid according to the JSON Schema, the Json_schema_validation_report will be populated with error information from rapidjson. This will keep JSON_SCHEMA_VALID untouched in terms of performance, and will provide the same code path for both JSON_SCHEMA_VALID and JSON_SCHEMA_VALIDATION_REPORT.