WL#7909: Server side JSON functions

Status: Complete   —   Priority: Medium

Provide a basic set of built-in functions for working with JSON data.

Requirements:

1. Lets users construct JSON data values from other relational data.

2. Lets users extract relational data from JSON data values.

3. Lets users minimally introspect the structure of JSON values and text (validity, length, depth, keys).

4. Works on strings which are utf8mb4 encoded.

5. Performance should be suitable for read-intensive applications.

Non-requirements:

A. May produce surprising results on strings which are not utf8mb4 encoded.

B. There is limited support for decimal values nested inside JSON documents.

C. Performance may not be suitable for write-intensive applications.

Server-side JSON Functions (Functional Spec)

Revision Description Date Authors
17 Various improvements:
  • Clarify that JSON_APPEND() auto-wraps non-arrays just like JSON_SET(), JSON_REPLACE(), and JSON_INSERT().
  • Add a new JSON_ARRAY_INSERT() function for inserting values into the middle of arrays.
July 20, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
16 Various improvements:
  • Fix some formatting errors.
  • Clarify that we raise an error if there are wildcard/ellipsis tokens in the path expression passed to JSON_CONTAINS().
  • Clarify that JSON_TYPE(), JSON_UNQUOTE(), and JSON_QUOTE() return strings which are encoded in the utf8mb4 character set with the utf8mb4_bin collation.
  • Clarify that JSON_CONTAINS_PATH() considers a wildcarded path to be satisfied if at least one expansion of the path is satisfied.
  • Explain how to use UDFs with JSON values.
July 3, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
15 Various improvements:
  • Clarify that JSN_SEARCH() escape characters must be constant expressions.
  • Clarify that JSN_UNQUOTE() raises an exception if a double-quoted string isn't a valid JSON string literal.
  • Change all function names from JSN_*() TO JSON_*().
  • Change the behavior of JSON_EXTRACT(). If the path expression(s) COULD locate multiple matches on SOME document, then the result(s) are wrapped in an array. This is the behavior suggested by Alfredo and Oystein Grovlen. This behavior makes it easier for applications to distinguish between a single match (which is an array value) and an array of multiple matches.
June 8, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
14 Various improvements:
  • Add a section describing how aggregates operate on JSON values.
  • Clarify that string atoms are double-quoted when a JSON value is serialized to a character type.
  • Clarify the definition of keyNames in path expressions.
  • Make JSON_OBJECT() and JSON_ARRAY() behave like their Postgres counterparts.
  • Clarify the behavior of JSON_CONTAINS().
  • Implicitly CAST geometry values to be geojson objects when they are used as ANY_JSON_ATOM arguments. This means that GEOMETRY is no longer an expected return value of JSON_TYPE().
May 27, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
13 Various improvements: April 8, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
12 Various improvements:
  • Allow JSON_UNQUOTE() to operate on JSON values.
  • Apply the general treatment of null arguments (makes the result null) to null keys passed to JSON_OBJECT().
  • Clarify that JSON_VALID() can operate on TEXT or JSON values, not on ALL_JSON values.
  • Change the return type of JSON_SEARCH() to JSON.
  • Clarify the semantics of the ellipsis token (**) in path expressions.
  • Clarify that non-utf8 character encodings can be used, but they are implicitly coerced into utf8mb4.
  • Clarify that the double-quoted strings returned by JSON_SEARCH() are valid path expressions when they are unquoted by using JSON_UNQUOTE().
  • Clarify the matching rules for double-quoted key names in path expressions.
  • Clarify how boolean expressions are treated when used as ANY_JSON_ATOMs.
  • Clarify that JSON_TYPE() returns INTEGER (rather than INT65) for integer types, STRING for ENUM and SET types, and GEOMETRY for spatial types.
  • Change the catch-all type name from OTHER to OPAQUE.
  • Rename JSON_ROWOBJECT to JSON_OBJECT().
  • Add new JSON_CONTAINS() function.
  • Change the allorSome argument of JSON_CONTAINS_PATH() to be like the oneOrAll argument of JSON_SEARCH(), per architectural review.
  • Simplify the syntax of path ellipses, per architectural review.
  • Forbid wildcards and ellipses in path arguments to the JSON_*() functions which update documents, per architectural review.
April 3, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
11 Various improvements: February 24, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
10 More improvements: February 4, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
9 Corrections to 8th draft:
  • Don't allow wildcard or ellipsis tokens in the path expressions of certain JSON_UPDATE() expressions.
  • Clarify that by "utf8", we mean the MySQL utf8 character encoding, with a maximum length of 3 bytes per character.
  • Clarify that more verbose CAST syntax may be useful when CASTing json values to CHAR in sessions/databases whose default character set is NOT utf8.
  • Clarify that path expressions, like json text, must be encoded in utf8.
  • Shorten the function prefix from MYJSON_ to JSON_, as requested by Edwin.
February 2, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
8 Corrections to 7th draft:
  • Added JSON_QUOTE() and JSON_UNQUOTE() functions.
  • Clarified that the character types under consideration are utf8 encoded.
  • Clarified that string values are implicitly CAST to JSON when UPDATE/INSERTing JSON columns. All other values require explicit CASTs. This is the Postgres behavior and we felt that it struck a nice balance between ease-of-use and strong typing.
January 7, 2015 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
7 Corrections to 6th draft:
  • Clarify that member names in path expressions can be ECMAScript identifiers or ECMAScript string literals.
  • Clarify that path expressions don't have to have any path legs.
December 3, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
6 Corrections to 5th draft, based on review by Alfredo Kojima, Mike Zinner, and the optimizer group.
  • Change the name of JSON_COUNT() to JSON_LENGTH().
  • Give human-readable names to the magic numbers used for the appendInsertReplace argument to JSON_UPDATE(). Support an "append to array" mode.
  • Added new functions: JSON_KEYS() and JSON_TYPE().
  • Add special string (de)serialization behavior for more date/time types.
  • Added the * and ** wildcard characters to path expressions.
  • Allow the _ wildcard character in JSON_SEARCH() strings. Add a search argument to choose between "exists" and "find all" behaviors.
December 1, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima.
5 Corrections to 4th draft. November 17, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen
4 Corrections to 3rd draft:
  • Clarify that strings are NOT implicitly CAST to JSON when passed as ANY_JSON_ATOM arguments.
  • Clarify that when a JSON value is CAST to another datatype but the CAST fails, then a NULL is returned and a warning is raised (but not an error).
  • Extend JSON_EXTRACT() to support multiple paths.
  • Clarify how ANY_JSON_ATOMs are serialized to strings.
  • Clarify what situations cause us to treat an ANY_JSON_ATOM as a boolean rather than an int.
  • Clarify that inexact numerics in ANY_JSON are treated as doubles. That is, they may lose precision.
November 14, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen
3 Corrections to 2nd draft:
  • Add an escape character override argument to JSON_SEARCH().
  • Don't allow JSON values to be stored in BINARY/BLOB columns. Only JSON and character type columns can hold JSON values.
  • Only allow utf8 encoded character strings as JSON documents.
  • Remove special section on DATETIME values because any valid MySQL type can be put in a JSON value.
  • Added a section on CASTs to/from JSON Values.
November 10, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen
2 Incorporate feedback on rev 1:
  • Overload the term "valid" to refer to well-formed paths as well as well-formed json documents.
  • Expand the path syntax to include column references.
  • Add a section on array wrapping and adjust the descriptions of JSON_UPDATE() and JSON_MERGE() accordingly.
  • Correct typo (JSON_OBJECT should be JSON_OBJECT()).
  • Stipulate that ANY_JSON inputs are always normalized to sensible values.
  • Add regular descriptions of return values and null handling to all functions.
  • Revamp the document based on the definition of well-formed in RFC 7159 rather than RFC 4627.
  • Change the return type of JSON_EXTRACT() to JSON.
November 6, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen
1 First cut. November 4, 2014 Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen

Overview

Here we describe a number of builtin functions which will be added to MySQL 5.7. These functions should be useful in constructing and examining json documents. This feature is described by worklog WL-7909 (http://wl.no.oracle.com/worklog/Optimizer-Sprint/?tid=7909).

Terms and Conventions

This functional spec refers to the following documents:

For the remainder of this spec, we use the following terms:

  • ANY_JSON - Json documents can be stored on disk in character types, as well as in the new JSON type. The ANY_JSON wildcard refers to any data type which can hold json documents.
  • ANY_JSON_ATOM - JSON values can contain atomic values which correspond to any MySQL type. These SQL types are known collectively as the ANY_JSON_ATOM types. Special coercions apply when JSON values contain binary data and they are serialized to character-typed columns.
  • binary type - The following MySQL datatypes are known collectively as the binary types:
    • BINARY - Fixed length binary data, (255 bytes max).
    • LONGBLOB - Binary data up to (232 - 1) bytes long.
    • MEDIUMBLOB - Binary data up to (224 - 1) bytes long.
    • BLOB - Binary data up to (216 - 1) bytes long.
    • TINYBLOB - A small blob column (255 bytes max).
    • VARBINARY - Variable length binary data, up to (216 - 1) bytes long.
  • utf8 character type - The following MySQL datatypes are known collectively as the utf8 character types. They all involve the 1-4 byte utf8mb4 character encoding (or its proper subsets, ascii and utf8):
    • CHAR - Fixed length character data, using the ascii, utf8, or utf8mb4 character sets.
    • LONGTEXT - Character data up to (232 - 1) bytes long, using the ascii, utf8, or utf8mb4 character sets.
    • MEDIUMTEXT - Character data up to (224 - 1) bytes long, using the ascii, utf8, or utf8mb4 character sets.
    • TEXT - Character data up to (216 - 1) bytes long, using the ascii, utf8, or utf8mb4 character sets.
    • TINYTEXT - A small text column (255 bytes max), using the ascii, utf8, or utf8mb4 character sets.
    • VARCHAR - Variable length character data, using the ascii, utf8, or utf8mb4 character sets.
  • json document - This is what RFC 7159 calls

    "json text": a well-formed string of json tokens.

  • sensible - A sensible json document is one which is valid and which contains no object whose members have the same key names.
  • target release - This is the release which will contain this functionality. The current plan is that this will be 5.7.
  • valid - The draft Standard defines "valid" json text as a character string which is well-formed according to the grammar in RFC 7159. The JSON datatype is by definition "valid". We also use "valid" to refer to well-formed path expressions. In short, valid means well-formed.

Note that JSON strings are case-sensitive. This is described by section 8.3 (String Comparison) of RFC 7159. So, for example, "lastName" and "lastname" identify different members of a JSON object.

In this spec, we adopt the following convention for displaying DATETIME values:

  • internal - When we mean to show an internal representation of a DATETIME, we print it without any quotes and with a ':' rather than a space separating the day part from the seconds part. So, for instance, 2014-11-10:00:00:00.000000.
  • string - When we mean to show a DATETIME as a string, as for instance as the result of a CAST( datetimeCol AS CHAR), then we use double-quotes and use a space to separate the day part from the seconds part. So for instance, "2014-11-04 00:00:00.000000".

Path Syntax

Various functions require a path expression in order to drill down to a specific element in a document. In these paths, the leading $ character represents the document under consideration. Path legs are separated by periods. Cells in arrays are represented by [N], where N is a non-negative integer. Path expressions, like json text, should be encoded using the ascii, utf8, or utf8mb4 character sets. Other character encodings are implicitly coerced to utf8mb4. The full path syntax follows:


pathExpression> ::= scope  [ ( pathLeg )* ]

scope ::= [ columnReference ] dollarSign

columnReference ::= [ [ databaseIdentifier period  ] tableIdentifier period ] columnIdentifier

databaseIdentifier ::= sqlIdentifier

tableIdentifier ::= sqlIdentifier

columnIdentifier ::= sqlIdentifier

pathLeg ::= member | arrayLocation | doubleAsterisk

member ::= period ( keyName | asterisk )

arrayLocation ::= leftBracket ( non-negative-integer | asterisk ) rightBracket

keyName ::= ECMAScript-identifier | double-quoted-string-literal

doubleAsterisk ::= **


For the functions described in this spec, the scope is the function's document argument. A columnReference is meaningless. For these functions, scope is always "$". An attempt to qualify the scope with a columnReference will raise an error.

A path may not end in **. A path may not contain ***.

The wildcard * and ** tokens have the following meanings:

  • The path leg, .* means "the values of all members in the object".
  • The path leg, [*] means "the values of all cells in the array".
  • The path expression prefix**suffix is an ellipsis meaning "all paths which begin with prefix and end with suffix". So for instance, given the path expression $**.a, the prefix is $, the suffix is .a, so $[3].b[2].c.a would be a qualifying path.

Key names match as follows:

  • unquoted - An unquoted key name in a path expression is exactly matched to member names in documents.
  • quoted - Framing double-quotes are stripped off a key name in a path expression. Then the stripped result is exactly matched to member names in documents. The framing double-quotes allow you to match member names which are not legal ECMAScript identifiers.

Given the following document...

{ "a": [ [ 3, 2 ], [ { "c" : "d" }, 1 ] ], "b": { "c" : 6 }, "one potato": 7, "b.c" : 8 }

...the following paths evaluate like this...

$.a[0] evaluates to [ 3, 2 ]

$.a[0][1] evaluates to 2

$.a[1] evaluates to [ { "c" : "d" }, 1 ]

$.a[1][0] evaluates to { "c" : "d" }

$.a[1][0].c evaluates to "d"

$."one potato" evaluates to 7

$.b.c evaluates to 6

$."b.c" evaluates to 8

And given the following document...

[ 3, { "c" : "d" }, 1 ]

...the following paths evaluate like this...

$[0] evaluates to 3

$[1] evaluates to { "c" : "d" }

$[1].c evaluates to "d"

And given the following document...

{
  "a" : { "b" : "c" },
  "d" : { "b" : "e" },
  "f" : {
          "b" : "g",
          "h" : {
                   "i" : { "j" : "k", "l" : "m" }
                }
        }
}

...the following paths evaluate like this...

$.f**.j evaluates to "k"

$.f**.i.* evaluates to [ "k", "m" ]

And given the following document...

{
  "f" : [
          { "b" : "g", "m" : { "k": "n" } },
          true,
          [ "i", "j", { "k" : "l" } ]
        ]
}

...the following paths evaluate like this...

$.f[2][*].k evaluates to "l"

$.f**.k evaluates to [ "n", "l" ]

Array Wrapping

The Standard defines a convention by which values can be "wrapped" with square brackets in order to convert them into json arrays. This behavior is described at a high level in section 4.46.5 ("Overview of SQL/JSON path language") and in detail in section 9.35 ("SQL/JSON path language: syntax and semantics") GR7. Scalar values are wrapped as single-element arrays in contexts where they must be combined with arrays. If a pathExpression identifies a non-array value, then pathExpression[ 0 ] evaluates to the same value as pathExpression.

Normalization

When a string type is used as an ANY_JSON argument, the string is parsed into a JSON value. An error is raised if the string is not a valid json document. Otherwise, the parsed value is normalized so that it represents a sensible json document. That is, duplicate members are removed. The first (left-most) key/value pair is retained and subsequent members with the same name are thrown away. It is our intention that JSON values are always both valid and sensible. Functions which construct JSON values always construct sensible values.

ANY_JSON

Valid Json documents can be stored on disk in columns whose type can be any of the following:

  • JSON - This is a new datatype introduced by target release. It holds normalized json documents in a structure which can be searched efficiently and incrementally deserialized.
  • utf8 character types - Any of the utf8 character types.

The syntax descriptions below use the pseudo-type ANY_JSON to indicate that an argument's type can be any of the json-bearing datatypes above.

When a utf8 character type is passed as an ANY_JSON argument, it is implicitly CAST into a JSON value. That is, it is parsed as though it were a json document. If it fails to parse, then an error is raised. Other character encodings are implicitly coerced into utf8mb4 when they are used as ANY_JSON arguments.

Warning! When a utf8 character type is CAST into a JSON value, any inexact numerics inside it are treated as DOUBLEs. This means that money values may lose precision.

MySQL supports two other so-called "string" datatypes. Supporting these types will be considered a stretch goal.

  • ENUM - An enumerated type represented internally as a 16 bit integer.
  • SET - A set of values represented internally as a 64 bit integer.

ANY_JSON_ATOM

JSON values can be built up out of any MySQL types, including other JSON-typed values.

An error is raised if an inexact numeric Infinity or NaN is passed as an ANY_JSON_ATOM argument.

When a utf8 character type is passed as an ANY_JSON_ATOM argument, it is NOT implicitly CAST into a JSON value. If the user wants the utf8 character type to be treated as a JSON value, then it must be wrapped with an explicit CAST( value AS JSON ) clause. Other character encodings are implicitly coerced into utf8mb4 before they are explicitly CAST to JSON for use as ANY_JSON_ATOM arguments.

When an integer is passed as an ANY_JSON_ATOM argument, we will try to determine if it represents a boolean. If so, it will be treated as a true or false literal. If not, it will be treated as an integer. The following constructs are treated as booleans:

  • Boolean expressions built from AND, OR, NOT, XOR, and the following operators, predicates, and functions.
  • The true and false literals.
  • Comparison operators: <, <=, >, >=, =, <=>, !=, <>, IS NULL, IS NOT NULL, IS NOT, REGEXP.
  • The coalescing NULLIF operator, when all branches are boolean values.
  • These predicates:
    • between predicate
    • in predicate
    • like predicate
    • quantified comparison predicate
    • exists predicate
  • Boolean-valued functions like:
    • JSON_VALID()
    • JSON_CONTAINS_PATH()
    • GTID_SUBSET()

A JSON value may need to be serialized into a string format. This may happen when storing a JSON value in a utf8 character type column. This may also happen when sending clients the results of ad-hoc queries (as opposed to PREPAREd queries); this is because ad-hoc query results are transported to the client over a text-based protocol, not over the binary protocol used by PREPAREd statements. When a JSON value is serialized into a string format, the following happens to its ANY_JSON_ATOMs:

  • null - Nulls are treated as the json null literal.
  • string - String ANY_JSON_ATOMs are double-quoted.
  • boolean, numeric - These ANY_JSON_ATOMs are converted to string form as though they were CAST to a utf8 character type.
  • DATETIME - DATETIME values are serialized into double-quoted strings of the form "YYYY-MM-DD HH:MM:SS[.fraction]". In the reverse direction, strings of that form are parsed into DATETIME values when they are included in json documents.
  • TIMESTAMP - TIMESTAMP values are serialized into the same string format as DATETIME values. Note, however, that in the reverse direction, those strings are parsed back into DATETIME values. This is because DATETIMEs have a broader range than TIMESTAMPs.
  • DATE - As a stretch goal, DATE will be serialized into double-quoted strings of the form "YYYY-MM-DD". In the reverse direction, strings of that form will be parsed into DATE values when they appear in json documents.
  • TIME - As a stretch goal, TIME values will be serialized into double-quoted strings of the form "HH:MM:SS[.fraction]". In the reverse direction, strings of that form will be parsed into TIME values when they appear in json documents.
  • The expression (a) is treated as the scalar a.
  • An error is raised when a multi-column row like (a, b) is used as a JSON scalar.
  • other - All other ANY_JSON_ATOMs are serialized into double-quoted strings of this form: "base64:typeXX:". Here XX is a type id number and is a base64 encoding of the ANY_JSON_ATOM's binary value.

CASTs to/from JSON Values

JSON values can be explicitly CAST as follows:

other type CAST ( otherType to JSON) CAST (JSON to otherType)
JSON nop nop
utf8 character type The string is parsed into a JSON value. The JSON value is serialized into a string.
other character types Other character encodings are implicitly coerced to utf8mb4 and then treated as described above for utf8 character types. The JSON value is serialized into utf8mb4 and then that string is CAST to the other character encoding. This may result in gibberish.
untyped NULL Results in a NULL value of type JSON -
geometry and subtypes The geometry value is converted into a JSON document by calling ST_AsGeoJSON. Illegal CAST. The workaround is ST_GeomFromGeoJSON( CAST( jsonValue AS CHAR ) )
all other types Results in a JSON document consisting of one, scalar value. Succeeds if the JSON document consists of a single, scalar value of the target type and that scalar value can be CAST to the target type. Otherwise, returns a NULL of the target type and raises a warning (but not an error).

When a JSON value is put into a target value whose type is known, the JSON value is implicitly CAST to the target value using the rules described above.

For example:

CREATE TABLE casts( descr VARCHAR( 20 ), jsonColumn JSON );

INSERT INTO casts VALUES
( 'scalar int', CAST( 1 AS JSON ) ),
( 'scalar datetime', CAST( CURRENT_TIMESTAMP AS JSON ) ),
( 'scalar boolean', CAST( true AS JSON ) ),
( 'scalar bitstring', CAST( b'1010' AS JSON ) ),
( 'scalar string', CAST( '"abc"' AS JSON ) ),
( 'scalar int 2', CAST( '2' AS JSON ) ),
( 'array', CAST( '[ 1, "abc" ]' AS JSON ) )
;

# returns the integer 1
SELECT CAST(  jsonColumn AS UNSIGNED )
FROM casts
WHERE descr = 'scalar int';

# returns, say, the datetime value 2014-11-10:00:00:00.000000
SELECT CAST(  jsonColumn AS DATETIME )
FROM casts
WHERE descr = 'scalar datetime';

# returns NULL
SELECT CAST(  jsonColumn AS UNSIGNED )
FROM casts
WHERE descr = 'array';

# returns the string '"abc"'
SELECT CAST(  jsonColumn AS CHAR )
FROM casts
WHERE descr = 'scalar string';

# returns the string '[ 1, "abc" ]'
SELECT CAST(  jsonColumn AS CHAR )
FROM casts
WHERE descr = 'array';

Implicit casting only occurs when UPDATE/INSERTing a string value into a JSON column. So...

# errors
INSERT INTO casts VALUES( 'scalar int', 1 );
INSERT INTO casts VALUES( 'scalar datetime', CURRENT_DATE );
INSERT INTO casts VALUES( 'scalar boolean', true );
INSERT INTO casts VALUES( 'scalar bitstring', b'1010' );

# these succeed
INSERT INTO casts VALUES( 'scalar string', '"abc"' );
INSERT INTO casts VALUES( 'scalar int', '2' );
INSERT INTO casts VALUES( 'array', '[ 1, "abc" ]' );

Note that the following statement may produce gibberish in a session/database whose default character set is something other than ascii, utf8, or utf8mb4:

SELECT CAST(  jsonColumn AS CHAR ) FROM casts;

The following, more verbose syntax is recommended for such a session/database:

SELECT CAST(  jsonColumn AS CHAR CHARACTER SET 'utf8mb4' ) FROM casts;

Aggregates on JSON Values

When aggregating JSON values, SQL NULLs are ignored (as is done with other datatypes). With the exception of the MAX, MIN, and GROUP_CONCAT aggregates, the JSON value is converted to a result numeric type and then passed to the aggregate operator. For JSON values which are numeric scalars, the conversion to a numeric type should be sensible, although truncation may occur and precision may be lost. For JSON values which are not numeric scalars, the conversion to a numeric type is repeatable (deterministic) but not defined here.

Aggregate Result data type Other behavior
AVG DOUBLE -
BIT_AND BIGINT UNSIGNED -
BIT_OR BIGINT UNSIGNED -
BIT_XOR BIGINT UNSIGNED -
COUNT BIGINT No conversion is performed. Values are simply counted.
COUNT(DISTINCT) BIGINT No conversion is performed. Values are simply counted. Distinctness is determined by the comparison rules described in worklog WL#8249.
GROUP_CONCAT VARCHAR -
MAX JSON Order is determined by the comparison rules described in worklog WL#8249.
MIN JSON Order is determined by the comparison rules described in worklog WL#8249.
STD DOUBLE -
STDDEV_POP DOUBLE -
STDDEV_SAMP DOUBLE -
STDDEV DOUBLE -
VAR_POP DOUBLE -
VAR_SAMP DOUBLE -
VARIANCE DOUBLE -
SUM DOUBLE -

UDFs and JSON Values

Developers may want to process JSON data with user-defined functions which are coded in C or in another language which can use C calling conventions. UDFs support a limited spectrum of datatypes: STRING, INTEGER, REAL, and DECIMAL. Typically, JSON data will pass to/from UDFs as STRINGs. Note that these STRINGs appear to MySQL as char arrays with the vacuous "binary" character encoding. In order to use UDF return values as JSON values, you must declare their character encoding with a CAST expression. E.g.:

CREATE FUNCTION make_json_document RETURNS STRING SONAME "my_udfs.so";

INSERT INTO json_table( doc ) VALUES ( CAST ( make_json_document() AS CHAR CHARACTER SET gb2312 ) );

JSON Functions

We introduce the following new functions.

JSON_VALID()

This method returns true if the indicated text is a valid json document. This method has the same meaning as the Standard's IS JSON predicate.

BOOLEAN JSON_VALID( JSON doc )
BOOLEAN JSON_VALID( TEXT doc )

Returns:

  • null - Returns NULL if doc is NULL.
  • true - Returns true if doc is already a JSON value. Also returns true if doc is a string type and parses as a valid json document.
  • false - Otherwise returns false.

For example:

# returns 1
SELECT JSON_VALID( '{ "firstName" : "Fred", "lastName" : "Flintstone" }' );

# returns 1
SELECT JSON_VALID( '3' );

# returns NULL as IS JSON would
SELECT JSON_VALID( null );

JSON_TYPE()

This method returns the type of a JSON value.

TINYTEXT JSON_TYPE( ANY_JSON doc )

Returns:

  • null - Returns NULL if doc is NULL.
  • typename - Otherwise, returns one of the following strings, encoded in the utf8mb4 character set with the utf8mb4_bin collation:
    • Purely JSON types:
      • "OBJECT" - JSON objects.
      • "ARRAY" - JSON arrays.
      • "BOOLEAN" - The JSON/MySQL literals true and false.
      • "NULL" - The JSON null literal.
    • Numeric types:
      • "INTEGER" - MySQL [ UNSIGNED ] TINYINT, [ UNSIGNED ] SMALLINT, [ UNSIGNED ] INT and [ UNSIGNED ] BIGINT scalars.
      • "DOUBLE" - MySQL FLOAT, DOUBLE, NUMERIC, and DECIMAL scalars.
    • Date/time types:
      • "DATETIME"- MySQL DATETIME and TIMESTAMP scalars.
      • "DATE" - MySQL DATE scalars.
      • "TIME" - MySQL TIME scalars.
    • String types:
      • "STRING" - MySQL utf8 character type scalars, including ENUM and SET scalars.
    • Binary types:
      • "BLOB" - MySQL binary type scalars.
      • "BIT" - MySQL BIT scalars, regardless of length.
    • Other types:
      • "OPAQUE" - All other types.

Behavior:

  • invalid - Raises an error if doc is not a valid json document.

For example:

INSERT INTO orders( orderID, quantity )
  SELECT
   r.orderID,
   CASE( JSON_TYPE( r.doc ) )
     WHEN "INTEGER" THEN CAST( r.doc AS UNSIGNED INT )
     WHEN "OBJECT" THEN CAST( JSON_EXTRACT( r.doc, '$.quantity' ) AS UNSIGNED INT )
     ELSE NULL
   END
  FROM rawOrders r;

JSON_KEYS()

This method returns an array of all keys in the indicated object.

JSON JSON_KEYS( ANY_JSON doc [, TEXT path ] )

Returns:

  • null - Returns NULL if any of the arguments are NULL or if doc+path does not locate an object.
  • empty - Returns an empty array if doc+path locates an empty object.
  • array - Otherwise returns an array of keys.

Behavior:

  • invalid - Raises an error if doc is not a valid json document or if the path is not a valid path expression.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

For example:

# returns ["a", "b"]
SELECT JSON_KEYS
(
  '{ "a" : "foo", "b" : [ true, { "c" : "123" } ] }'
);

# returns []
SELECT JSON_KEYS
(
  '{ "a" : "foo", "b" : [ true, { "c" : {} } ] }',
  '$.b[1].c'
);

# returns NULL
SELECT JSON_KEYS
(
  '{ "a" : "foo", "b" : [ true, { "c" : {} } ] }',
  '$.a.b[2]'
);

JSON_EXTRACT()

This method returns the value of the document element identified by the indicated path(s).

JSON JSON_EXTRACT( ANY_JSON doc, TEXT path [ (,TEXT path)* ] )

Returns:

  • null - Returns NULL if any of the arguments are NULL or if the paths don't locate any values in the document.
  • value - Otherwise, if there is only one path expression AND that path expressions doesn't contain a wildcard or ellipsis token, then the match is returned. It is NOT wrapped in an array.
  • array - Otherwise, the match(es) are returned, wrapped in an array. The elements in the array occur in the order of the

    corresponding path arguments.

The effect of these rules can be summed up as follows: If the path arguments COULD locate multiple matches on SOME document, then the result(s) are wrapped in an array.

Behavior:

  • invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression.

For example:

# returns a JSON value containing just the string "123"
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : "123" } ] }',
  '$.b[ 1 ].c'
);

# returns a JSON value containing just the number 123
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '$.b[ 1 ].c'
);

# raises an error because the document is not valid
SELECT JSON_EXTRACT
(
  '{ "a" : [ }',
  '$.b[ 1 ].c'
);

# raises an error because the path is invalid
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '$.b[ 1 ].'
);

# returns a JSON value containing the number 123 (because of auto-wrapping)
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '$.b[ 1 ].c[ 0 ]'
);

# returns null because the path, although valid, does not identify a value
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '$.b[ 1 ].c[ 1 ]'
);

# returns a JSON value containing the number 123 (due to normalization)
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }',
  '$.b[ 1 ].c'
);

# returns a JSON array ["foo", true]
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }',
  '$.a', '$.b[0]'
);

# returns a the 'true' literal wrapped in an array
SELECT JSON_EXTRACT
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }',
  '$.d', '$.b[0]'
);

# returns NULL because no matches are found
SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[*].b' ) jdoc;

# returns NULL because no matches are found
SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[0].b' ) jdoc;

# returns the scalar 1
SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[0].a' ) jdoc;

# returns [1, 2] (multiple matches wrapped in an array)
SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[*].a' ) jdoc;

# returns [1] because multiple matches are possible
SELECT JSON_EXTRACT( '[ { "a": 1 }, { "b": 2 } ]', '$[*].a' ) jdoc;

# returns [3, 4]
SELECT JSON_EXTRACT( '[ { "a": [3,4] }, { "b": 2 } ]', '$[0].a' ) jdoc;

# returns [[3, 4]]. an extra array wraps the result because multiple matches are possible
SELECT JSON_EXTRACT( '[ { "a": [3,4] }, { "b": 2 } ]', '$[*].a' ) jdoc;

# returns [[3, 4]]. an extra array wraps the result because multiple matches are possible
SELECT JSON_EXTRACT( '[ { "a": [3,4] }, { "b": 2 } ]', '$[0].a', '$[1].a' ) jdoc;

It should be possible to CAST a JSON_EXTRACT() result to CHAR type in order to run full-text searches and build function indexes on json documents. That is, to search and index expressions like this:

SELECT CAST( JSON_EXTRACT( ...) AS CHAR )

JSON_REMOVE()

This function removes the elements identified by the paths. The function takes one or more path clauses and processes them one after the other.

JSON JSON_REMOVE( ANY_JSON doc, TEXT path [ (, TEXT path)* ]  )

Returns:

  • null - Returns NULL if any of the arguments is NULL.
  • original document - Returns the original document if none of the paths identifies an element.
  • value - Otherwise, a smaller document is returned, the result of removing the values located by the paths.

Behavior:

  • invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression. Raises an error when any of the path expressions is $.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

Each path is applied in order, producing a new document, which is then operated on by the next path. So...

JSON_REMOVE( doc, path1, path2 )

=

JSON_REMOVE( MYSJON_REMOVE( doc, path1 ), path2 )

For example:

# returns the document {"a": "foo", "b": [true]}
SELECT JSON_REMOVE
(
  '{"a" : "foo", "b" : [true, {"c" : 123}]}',
  '$.b[ 1 ]'
);

# returns {"a": "foo", "b": [true, {}]} due to normalization
SELECT JSON_REMOVE
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }',
  '$.b[ 1 ].c'
);

# returns {"a": "foo", "b": [true, {}]}
SELECT JSON_REMOVE
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '$.b[ 1 ].c'
);

# returns the original document because the path doesn't identify an element
SELECT JSON_REMOVE
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123, "d" : 456 } ] }',
  '$.b[ 1 ].e'
);

JSON_APPEND()

This method updates a document, appending values to the end of the indicated arrays.

JSON JSON_APPEND
(
    ANY_JSON doc,
    TEXT path, ANY_JSON_ATOM value
    [ (, TEXT path, ANY_JSON_ATOM value)* ]
)

Returns:

  • null - Returns NULL if any of the arguments are null.
  • document - Otherwise returns a valid JSON value produced by appending the indicated values at the end of the indicated arrays. Scalars and objects, identified by paths, are auto-wrapped as is done for the sister functions JSON_INSERT(), JSON_SET(), and JSON_REPLACE().
  • original - May return a document which is identical to the original document.

Behavior:

  • invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...

JSON_APPEND
(
    doc,
    path1, value1,
    path2, value2
)

=

JSON_APPEND
(
  JSON_APPEND
  (
    doc,
    path1, value1
  ),
  path2, value2
)

For example:

# should return {"a": "foo", "b": ["bar", 4], "c": ["wibble", "grape"]} due to autowrapping
SELECT JSON_APPEND
(
   '{ "a" : "foo", "b" : "bar", "c" : "wibble" }',
   '$.b', 4,
   '$.c', "grape"
);

# should return {"a": "foo", "b": [1, 2, 3, 4], "c": ["apple", "pear", "grape"]}
SELECT JSON_APPEND
(
   '{ "a" : "foo", "b" : [ 1, 2, 3 ], "c" : [ "apple", "pear" ] }',
   '$.b', 4,
   '$.c', "grape"
);

JSON_SET()

This method updates a document. Replaces the indicated value if it exists. Inserts the indicated value if it doesn't exist.

JSON JSON_SET
(
    ANY_JSON doc,
    TEXT path, ANY_JSON_ATOM value
    [ (, TEXT path, ANY_JSON_ATOM value)* ]
)

Returns:

  • null - Returns NULL if any of the arguments are null.
  • document - Otherwise returns a valid JSON value produced by inserting or replacing the indicated values at the indicated locations.
  • original - May return a document which is identical to the original document.

Behavior:

  • invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
  • insert - As discussed above, the only values which can be inserted are new members in existing objects or new cells in existing (possibly auto-wrapped) arrays.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

The following values can be inserted into a document. If the path does not identify one of these types of values, then the insert is ignored.

  • A new member in an existing object.
  • A new array slot past the end of an existing value. If the value is not an array, it is auto-wrapped as a single-element array. The existing array is extended by one new slot.

Given the following document...

{ "a" : "foo", "b" : [ 1, 2, 3 ] }

...the following paths have these meanings...

# this path is possible because it references a new member of an existing object
$.c

# this path is synonymous with $.a (see the section on Array Wrapping above).
$.a[0]

# a request to insert at this location would auto-wrap the current
# value of $.a and add an additional slot to that array
$.a[1]

# this path is an impossible reference to an array as though it were
# an object. a request to insert at this location would be ignored.
$.b.c

# this path references an array cell at an index which doesn't exist yet
# but which could be created in order to satisfy an insert request
$.b[3]

When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...

JSON_SET
(
    doc,
    path1, value1,
    path2, value2
)

=

JSON_SET
(
  JSON_SET
  (
    doc,
    path1, value1
  ),
  path2, value2
)

For example:

# returns {"a": {}, "b": [1, 2, 3]}
SELECT JSON_SET
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.a',
  JSON_OBJECT()
);

# returns {"a": "foo", "b": [1, 2, 3], "c": [true, false]}
SELECT JSON_SET
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.c',
  JSON_ARRAY( true, false )
);

# returns {"a": "foo", "b": [1, 2, 3], "c": [true, false]}
SELECT JSON_SET
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.c',
  JSON_ARRAY( CAST( 'true' AS JSON ), CAST( 'false' AS JSON ) )
);

# returns [1, 2]
SELECT JSON_SET
(
  '1',
  '$[3]',
  2
);

# returns {"a": {"b": false, "c": true}}
SELECT JSON_SET
(
   '{ "a" : "foo"}',
   '$.a', JSON_OBJECT( 'b', false ),
   '$.a.c', true
);

JSON_INSERT()

This method updates a document. If the indicated value does NOT exist, then it is inserted. But if the value does exist, then nothing is done; the existing value is NOT overridden.

JSON JSON_INSERT
(
    ANY_JSON doc,
    TEXT path, ANY_JSON_ATOM value
    [ (, TEXT path, ANY_JSON_ATOM value)* ]
)

Returns:

  • null - Returns NULL if any of the arguments are null.
  • document - Otherwise returns a valid JSON value produced by inserting the indicated values at the indicated locations.
  • original - May return a document which is identical to the original document.

The following values can be inserted into a document. If the path does not identify one of these types of values, then the insert is ignored.

  • A new member in an existing object.
  • A new array slot past the end of an existing value. If the value is not an array, then it is auto-wrapped as a single-element array. The existing array is extended by one new slot.

Behavior:

  • invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
  • insert - As discussed above, the only values which can be inserted are new members in existing objects or new slots in existing (possibly auto-wrapped) arrays.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...

JSON_INSERT
(
    doc,
    path1, value1,
    path2, value2
)

=

JSON_INSERT
(
  JSON_INSERT
  (
    doc,
    path1, value1
  ),
  path2, value2
)

For example:

# returns the original document because the path does exist
SELECT JSON_INSERT
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.a',
  true
);

# inserts a number, returns '{"a": "foo", "b": [1, 2, 3], "c": 123}
SELECT JSON_INSERT
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.c',
  123
);

# inserts a string, returns '{"a": "foo", "b": [1, 2, 3], "c": "123"}
SELECT JSON_INSERT
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.c',
  '123'
);

# returns '{"a": ["foo", true], "b": [1, 2, 3]}'
SELECT JSON_INSERT
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.a[1]',
  true
);

# returns {"a": "foo", "b": true}
SELECT JSON_INSERT
(
   '{ "a" : "foo"}',
   '$.b', true,
   '$.b', false
);

JSON_ARRAY_INSERT()

This method updates a document. If a path identifies a cell in an existing array, then the corresponding new value is inserted into that cell and all trailing cells are right-shifted. Otherwise, nothing is done.

    JSON JSON_ARRAY_INSERT
    (
        ANY_JSON doc,
        TEXT path, ANY_JSON_ATOM value
        [ (, TEXT path, ANY_JSON_ATOM value)* ]
    )

Returns:

  • null - Returns NULL if any of the arguments are null.
  • document - Otherwise returns a valid JSON value produced by inserting the indicated values at the indicated cell positions.
  • original - May return a document which is identical to the original document.

Behavior:

  • invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
  • cell identifier - Raises an error if the last leg in the path expression is NOT a cell identifier.
  • big index - If the cell identifier indicates a position past the end of the array, then the new value is simply inserted at the end of the existing array.
  • auto-wrapping - Scalars and objects are NOT auto-wrapped.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...

JSON_ARRAY_INSERT
(
    doc,
    path1, value1,
    path2, value2
)

=

JSON_ARRAY_INSERT
(
  JSON_ARRAY_INSERT
  (
    doc,
    path1, value1
  ),
  path2, value2
)

For example:

# raises an error because the path doesn't end in a cell identifier
SELECT JSON_ARRAY_INSERT
(
  '{ "a": [ 1, 2, 3 ] }',
  '$.a', 4
);

# returns { "a": [ 4, 1, 2, 3 ] }
SELECT JSON_ARRAY_INSERT
(
  '{ "a": [ 1, 2, 3 ] }',
  '$.a[ 0 ]', 4
);

# returns { "a": [ 1, 2, 4, 3 ] }
SELECT JSON_ARRAY_INSERT
(
  '{ "a": [ 1, 2, 3 ] }',
  '$.a[ 2 ]', 4
);

# returns { "a": [ 1, 2, 3, 4 ] }
SELECT JSON_ARRAY_INSERT
(
  '{ "a": [ 1, 2, 3 ] }',
  '$.a[ 100 ]', 4
);

# no auto-wrapping. returns { "a": true }
SELECT JSON_ARRAY_INSERT
(
  '{ "a": true }',
  '$.a[ 0 ]', false
);

# raises an error because of the wildcard
SELECT JSON_ARRAY_INSERT
(
  '[ [ 1, 2, 3 ], [ 4, 5, 6 ] ]',
  '$[*][0]', false
);

JSON_REPLACE()

This method updates a document. If the indicated value exists, then it is replaced with the new value. But nothing happens if the value doesn't exist.

JSON JSON_REPLACE
(
    ANY_JSON doc,
    TEXT path, ANY_JSON_ATOM value
    [ (, TEXT path, ANY_JSON_ATOM value)* ]
)

Returns:

  • null - Returns NULL if any of the arguments are null.
  • document - Otherwise returns a valid JSON value produced by replacing the indicated values at the indicated locations.
  • original - May return a document which is identical to the original document.

Behavior:

  • invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...

JSON_REPLACE
(
    doc,
    path1, value1,
    path2, value2
)

=

JSON_REPLACE
(
  JSON_REPLACE
  (
    doc,
    path1, value1
  ),
  path2, value2
)

For example:

# returns the original document because the path doesn't exist
SELECT JSON_REPLACE
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.c',
  true
);

# returns '{"a": true, "b": [1, 2, 3]}'
SELECT JSON_REPLACE
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.a[0]',
  true
);

# returns the original document because the path doesn't exist
SELECT JSON_REPLACE
(
  '{ "a" : "foo", "b" : [ 1, 2, 3 ] }',
  '$.b[5]',
  true
);

JSON_MERGE()

This function merges two or more json documents into a single document.

JSON JSON_MERGE( ANY_JSON doc, ANY_JSON doc2 [ (, ANY_JSON otherdocs )* ] )

Returns:

  • null - Returns NULL if any of the arguments are null.
  • document - Otherwise returns a sensible JSON value produced by merging the documents as described below.

Behavior:

  • invalid - Raises an error if any of the documents isn't valid.
  • scalars - If any of the documents that are being merged is a scalar, each scalar document is autowrapped as a single value array before merging.
  • arrays - When merging a left element with a right element, if both elements are arrays, then the result is the left array concatenated with the right array. For instance, [ 1, 2 ] merged with [ 3, 4 ] is [ 1, 2, 3, 4 ].
  • array and object - When merging an array with an object, the object is autowrapped as an array and then the rule above is applied. So [ 1, 2 ] merged with { "a" : true } is [ 1, 2, { "a": true } ].
  • objects - When merging two objects, the two objects are concatenated into a single, larger object. So { "a" : "foo" } merged with { "b" : 5 } is { "a" : "foo", "b" : 5 }.
  • duplicates - When two objects are merged and they share a key, the values associated with the shared key are merged.

For example:

# returns [{"a": "foo", "b": [true, {"c": 123}]}, 5, 6]
SELECT JSON_MERGE
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '[ 5, 6]'
);

# returns {"a": "foo", "b": [true, {"c": 123}, false, 34]}
SELECT JSON_MERGE
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '{ "b": [ false, 34 ] }'
);

# returns {"a": "foo", "b": [true, {"c": 123}, "bar"]}
SELECT JSON_MERGE
(
  '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }',
  '{ "b": "bar" }'
);

# returns {"a": {"b": 1, "c": 1}}
SELECT JSON_MERGE
(
  '{ "a" : { "b" : 1 } }',
  '{ "a" : { "c" : 1 } }'
);

JSON_SEARCH()

This function returns path(s) to the given string. The returned path(s) identify object members or array slots which are character strings. If the string occurs more than once in the search scope (either the whole document or the given paths), and if the oneOrAll argument is "one", then one matching path is returned; but which one is not defined. If the oneOrAll argument is "all" and more than one string matches, then all matching paths are returned in a JSON array. If path arguments are supplied, then the search is performed only inside those paths.

JSON JSON_SEARCH
(
    ANY_JSON doc,
    TINYTEXT oneOrAll,
    TEXT searchString
    [
      , CHAR( 1 ) escapeCharacter
      [ (, TEXT path)* ]
    ]
)

Returns:

  • null - Returns NULL if doc or searchString is null. Returns NULL if any of the path arguments is null. Returns NULL if all of the paths are non-existent. Returns NULL if the string isn't found.
  • escapeCharacter - If you want to search for strings containing the special % or _ wildcard characters, then you can prefix the % or _ with the default escape character, '\'. If you need to change the escape character, then you can specify a non-null escapeCharacter argument. Specifying null for this argument has the same meaning as specifying '\'.
  • paths - Otherwise, returns a double-quoted string or array of double-quoted strings which are further refinements of the given paths. The returned paths identify object members or array slots holding the matched strings. The oneOrAll argument determines whether the search terminates on the first match or whether all matches are returned. The double-quoted strings are valid path expressions when they are stripped of their quotes by using JSON_UNQUOTE().

The oneOrAll argument may take the following values:

  • one - Terminate the search quickly, on the first match. It is undefined which match is considered first. The function returns a JSON value containing one path string.
  • all - Return all matching paths. The function returns a JSON array containing all matching path strings. The order of the matching paths is not defined.

The % character matches any number of characters (even 0), as it does for the LIKE operator. So, for example:

  • abc - Matches only the exact string "abc".
  • abc% - Matches "abc", "abcd", etc..
  • a%bc - Matches "abc", "afoobc", etc..

The _ character matches exactly one character, as it does for the LIKE operator. So, for example:

  • a_c - Matches "abc", "a2c", etc..
  • a_b_c - Matches "a1b2c", "a3b4c", etc..

Behavior:

  • invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression.
  • constant - Raises an error if escapeCharacter is not a constant expression.
  • empty - If no path qualifiers are given, then the whole document is searched.

For example:

# returns null because numeric values don't match string values
SELECT JSON_SEARCH
(
  '{ "a" : 123, "b" : [ 123, 456 ] }',
  'one',
  '123'
);

# returns "$.b[2]"
SELECT JSON_SEARCH
(
  '{ "a" : "123", "b" : [ 123, "789", "123", "456", "123" ] }',
  'one',
  '123',
  null,
  '$.b'
);

# could return either "$.a" or "$.b.key"
SELECT JSON_SEARCH
(
  '{ "a" : "123", "b" : { "key" : "123" } }',
  'one',
  '123'
);

# returns "$.b.key"
SELECT JSON_SEARCH
(
  '{ "a" : "1243", "b" : { "key" : "1234" } }',
  'one',
  '123%'
);

# returns "$.b.c"
SELECT JSON_SEARCH
(
  '{ "a" : "1243", "b" : { "key" : "1234", "c": "directorysub%directoryabc" } }',
  'one',
  'dir%torysub@%dir%',
  '@'
);

# returns null because the path doesn't exist
SELECT JSON_SEARCH
(
  '{ "a" : "1243", "b" : { "key" : "1234" } }',
  'one',
  '123%',
  null,
  '$.c'
);

# returns $."one potato"
SELECT JSON_UNQUOTE
(
  JSON_SEARCH
  (
    '{ "onepotato": "foot", "one potato": "food" , "one \\"potato": "fool" }',
    'all',
    'food'
  )
);

JSON_CONTAINS()

This function checks whether a candidate JSON document is contained in a target document or in the piece of the target document identified by a path. See below for a definition of containment.

BOOLEAN JSON_CONTAINS
(
    ANY_JSON target,
    ANY_JSON candidate,
    [ , TEXT path ]
)

Returns:

  • null - Returns NULL if any of the arguments are null.
  • missing - Returns NULL if the path does not identify a section of the target document.
  • true - Returns true if the candidate document is contained in the sub-document identified by target and path.
  • false - Returns false if the candidate document is NOT contained in the path- identified sub-document of target.

Behavior:

  • invalid - Raises an error if either target or candidate is not a valid json document or if the path is not a valid path expression.
  • wildcards - Raises an error if the path expression contains wildcard or ellipsis tokens.

Containment is defined as follows:

  • scalar - A candidate scalar is contained in a target scalar iff they are comparable (defined below) and are equal.
  • array - A candidate array is contained in a target array iff every cell in the candidate is contained in SOME cell of the target.
  • autowrapping - A candidate non-array is contained in a target array iff the candidate is contained in SOME cell of the target.
  • object - A candidate object is contained in a target object iff for each key CK in the candidate...
    • There is a key TK with the same name in the target.
    • And the value bound to CK is contained in the value bound to TK.
  • otherwise - If the candidate and the target don't match any of the above rules, then the candidate is NOT contained in the target.

The following table describes which scalar types are comparable. X marks a comparable combination:

- NULL DECIMAL INT UINT DOUBLE STRING BOOLEAN DATE TIME DATETIME TIMESTAMP OPAQUE
NULL X - - - - - - - - - - -
DECIMAL - X X X X - - - - - - -
INT - X X X X - - - - - - -
UINT - X X X X - - - - - - -
DOUBLE - X X X X - - - - - - -
STRING - - - - - X - - - - - -
BOOLEAN - - - - - - X - - - - -
DATE - - - - - - - X - - - -
TIME - - - - - - - - X - - -
DATETIME - - - - - - - - - X X -
TIMESTAMP - - - - - - - - - X X -
OPAQUE - - - - - - - - - - - X

For example:

# returns 1
SELECT JSON_CONTAINS
(
   CAST('[1, 4, 6]' AS JSON),
   CAST('[1, 6]' AS JSON)
);

# returns 1; even with nested cast off elements
SELECT JSON_CONTAINS
(
   CAST('{"person": {"id": 1, "country": "norway"}}' AS JSON),
   CAST('{"person": {"country": "norway"}}' AS JSON)
);

# returns 1; reordering and duplicates are ok
SELECT JSON_CONTAINS
(
   CAST('[1,3,5]' AS JSON),
   CAST('[5,3,1,5]' AS JSON)
);

# return 0; no type conversion is performed
SELECT JSON_CONTAINS
(
   CAST('[3.14]' AS JSON),
   CAST('[3]' AS JSON)
);

# returns 1, due to auto-wrapping
SELECT JSON_CONTAINS
(
   CAST('[1, 2, 3]' AS JSON),
   CAST(3 AS JSON)
);

JSON_CONTAINS_PATH()

This function checks for the existence of the specified path. Returns false if none of the paths identifies an existing object member or array slot. Otherwise, returns true or false depending on the oneOrAll directive.

BOOLEAN JSON_CONTAINS_PATH
(
    ANY_JSON doc,
    TINYTEXT oneOrAll,
    TEXT path
    [ (, TEXT path)* ]
)

Returns:

  • null - Returns NULL if any of the arguments are null.
  • true/false - Otherwise, returns true or false depending on the value of the oneOrAll argument and on whether the paths identify actual values in the document.

The oneOrAll argument takes these values:

  • all - All of the paths must exist for the function to return true.
  • one - At least one of the paths must exist for the function to return true.

Behavior:

  • invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression. Raises an error if oneOrAll is not 'one' or 'all'.
  • satisfied - A wildcarded path is said to exist if some concrete expansion of the path exists.

For example:

# returns 0 because there is no element at $.a.c
SELECT JSON_CONTAINS_PATH
(
  '{ "a" : 123, "b" : [ 123, 456 ] }',
  'all',
  '$.a.c',
  '$.b[1]'
);

# returns 1 because there is an element at $.b[1]
SELECT JSON_CONTAINS_PATH
(
  '{ "a" : 123, "b" : [ 123, 456 ] }',
  'one',
  '$.a.c',
  '$.b[1]'
);

# returns 0 because there is no element at the given path
SELECT JSON_CONTAINS_PATH
(
  '{ "a" : 123, "b" : [ 123, 456 ] }',
  'all',
  '$.c'
);

# returns 1 because there is an element at $.b[1].c.d
SELECT JSON_CONTAINS_PATH
(
  '{ "a" : 123, "b" : [ 123, { "c" : { "d" : true } } ] }',
  'all',
  '$.b[1].c.d'
);

JSON_LENGTH()

This function counts the number of members in an object or the number of slots in an array. The function takes an optional path argument. Without the path qualifier, the function operates on the top level object or array. With the path qualifier, the function operates on the object or array identified by the path.

INT JSON_LENGTH
(
    ANY_JSON doc
    [, TEXT path ]
)

Returns:

  • null - Returns NULL if any of the arguments is null. Returns NULL if the path does not exist.
  • count - Otherwise, returns:
    • scalar - Returns 1 if the document and path identify a scalar atom.
    • object - If the document and the path identify an object, then returns the number of its members.
    • array- If the document and the path identify an array, then returns the number of cells in the array.

Behavior:

  • invalid - Raises an error if doc is not a valid json document or if the path is not a valid path expression.
  • wildcard - Raises an error if the path expression contains a * or ** wildcard token.

For example:

# returns 0
SELECT JSON_LENGTH
(
  '{}'
);

# returns 1
SELECT JSON_LENGTH
(
  '3'
);

# returns 2
SELECT JSON_LENGTH
(
  '{ "a" : 123, "b" : [ 123, 456, 789 ] }'
);

# returns 3
SELECT JSON_LENGTH
(
  '{ "a" : 123, "b" : [ 123, 456, 789 ] }',
  '$.b'
);

# returns null because the path does not exist
SELECT JSON_LENGTH
(
  '{ "a" : 123, "b" : [ 123, 456, 789 ] }',
  '$.c'
);

JSON_DEPTH()

This function computes the maximum depth of a json document. Any empty object or array has depth 1. An object with members having atomic values has depth 2. And so on.

INT JSON_DEPTH
(
    ANY_JSON doc
)

Returns:

  • null - Returns NULL if doc is NULL.
  • number - Otherwise returns the depth of the document as described above.

Behavior:

  • invalid - Raises an error if doc is not a valid json document.

For example:

# returns 1
SELECT JSON_DEPTH
(
  '{}'
);

# returns 1
SELECT JSON_DEPTH
(
  '[]'
);

# returns 1
SELECT JSON_DEPTH( '"abc"' );

# returns 1
SELECT JSON_DEPTH( CAST( '"abc"' AS JSON ) );

# error, 1 is not an ANY_JSON
SELECT JSON_DEPTH( 1 );

# error, 'abc' does not parse as a JSON document
SELECT JSON_DEPTH( 'abc' );

# returns 1
SELECT JSON_DEPTH( CAST( 1 AS JSON ) );

# returns 2
SELECT JSON_DEPTH
(
  '{ "a" : true, "b" : false, "c" : null }'
);

# returns 2
SELECT JSON_DEPTH
(
  '[ "a", true, "b" , false, "c" , null ]'
);

# returns 2
SELECT JSON_DEPTH
(
  '{ "a" : true, "b" : {}, "c" : null }'
);

# returns 2
SELECT JSON_DEPTH
(
  '[ "a", true, "b" , {}, "c" , null ]'
);

# returns 3
SELECT JSON_DEPTH
(
  '{ "a" : true, "b" : { "e" : false }, "c" : null }'
);

# returns 3
SELECT JSON_DEPTH
(
  '[ "a", true, "b" , { "e" : false }, "c" , null ]'
);

# raises an error because the document is not valid
SELECT JSON_DEPTH
(
  '[ "a", true, "b" , { "e" : false }, "c" , null'
);

JSON_ARRAY()

This function builds a JSON array out of ANY_JSON_ATOMs.

JSON JSON_ARRAY
(
    [ ANY_JSON_ATOM atom [ (, ANY_JSON_ATOM atom)* ] ]
)

Returns a (possibly empty) JSON array.

Behavior:

  • null - Atoms which evaluate as SQL NULLs are translated into JSON null literals.

For example:

# returns the empty array: []
SELECT JSON_ARRAY();

# returns ["Accounting", {"processed": true }]
SELECT JSON_ARRAY( d.deptName, CAST( '{ "processed" : true }' AS JSON ) )
FROM department d
WHERE id = 405;

# returns ["Accounting", 405, true, 2014-11-04:00:00:00.000000]
SELECT JSON_ARRAY( d.deptName, d.id, d.isExempt and true, current_date )
FROM department d
WHERE id = 405;

# returns '["Accounting", 405, true, "2014-11-04 00:00:00.000000"]'
SELECT CAST( JSON_ARRAY( d.deptName, d.id, d.isExempt and true, current_date ) AS CHAR )
FROM department d
WHERE id = 405;

# returns ["Accounting", 405, true, 18:53:38]
SELECT JSON_ARRAY( d.deptName, d.id, d.isExempt and true, current_time )
FROM department d
WHERE id = 405;

# stores a JSON value in a JSON-typed column
INSERT INTO json_table( json_column )
  SELECT JSON_ARRAY( d.deptName, d.id, d.blobColumn )
  FROM department d
  WHERE id = 405;

JSON_OBJECT()

This function builds a JSON object out of ANY_JSON_ATOMs.

JSON JSON_OBJECT
(
    [
      TEXT keyName, ANY_JSON_ATOM atom
      [ (,TEXT keyName, ANY_JSON_ATOM atom)* ]
    ]
)

Returns a (possibly empty) JSON object.

Behavior:

  • error - Raises an error if any of the key names are null.
  • null - Atoms which evaluate as SQL NULLs are translated into JSON null literals.

For example:

# returns the empty object: {}
SELECT JSON_OBJECT();

# returns {"deptName": "Accounting", "id": 405, "isExempt": true, "date": 2014-11-0400:00:00.000000}
SELECT JSON_OBJECT
(
  'deptName', d.deptName,
  'id', d.id,
  'isExempt', d.isExempt and true,
  'date', current_date
)
FROM department d
WHERE id = 405;

JSON_UNQUOTE()

This method strips double-quotes which frame a text value, and unescapes interior quotes (and some other characters) as necessary. It is used to turn a valid JSON string literal into a SQL string literal.

LONGTEXT JSON_UNQUOTE( LONGTEXT textValue )
LONGTEXT JSON_UNQUOTE( JSON jsonValue )

Returns a string which is encoded in the utf8mb4 character set with the utf8mb4_bin collation:

  • json - If the argument is a JSON value, then the value is implicitly serialized into a string as described above in CASTs to/from JSON Values. Then the following rules are applied.
  • null - Returns NULL if textValue is NULL.
  • implicit coercion - If textValue is not a utf8 character type, then it is implicitly coerced to utf8mb4.
  • not quoted - Returns textValue untouched if textValue does not start and end with double-quotes.
  • invalid - Raises an exception if textValue starts and ends with double-quotes but isn't a valid JSON string literal
  • stripped - Otherwise, strips the framing double-quotes from textValue and unescapes interior double-quotes and other characters as described below.

C-style escape sequences and unicode escape sequences are unescaped as follows:

Common name C-style escape notation Escape sequence in UTF-8 bytes Transformed to unescaped UTF-8 bytes
quote \" %x5C %x22 %x22
backslash \\ %x5C %x5C %x5C
backspace \b %x5C %x62 %x08
formfeed \f %x5C %x66 %x0C
linefeed \n %x5C %x6E %x0A
carriage-return \r %x5C %x72 %x0D
tab \t %x5C %x74 %x09
unicode \uXXXX %x5C %x75 followed by 4 hex digits UTF-8 bytes for corresponding unicode character

For example:

# returns the SQL string literal abc
SELECT JSON_UNQUOTE( '"abc"' );

# returns the SQL string literal "abc
SELECT JSON_UNQUOTE( '"abc' );

# raises an error
SELECT JSON_UNQUOTE( 123 );

# returns the SQL string literal abc
SELECT JSON_UNQUOTE
( CAST( CAST( '"abc"' AS JSON ) AS CHAR ) );

# returns 1
SELECT JSON_UNQUOTE
(
  CAST(
    JSON_EXTRACT( '{ "userName" : "fred" }', '$.userName' )
    AS CHAR
  )
) = 'fred';

# returns 0
SELECT
  CAST(
    JSON_EXTRACT( '{ "userName" : "fred" }', '$.userName' )
    AS CHAR
  ) = 'fred';

JSON_QUOTE()

This method wraps double-quotes around a text value, escaping interior quotes (and some other characters) as necessary. It is used to turn a text value into a valid JSON string literal.

LONGTEXT JSON_QUOTE( LONGTEXT textValue )

Returns a string which is encoded in the utf8mb4 character set with the utf8mb4_bin collation:

  • null - Returns NULL if textValue is NULL.
  • implicit coercion - If textValue is not a utf8 character type, then it is implicitly coerced to utf8mb4.
  • quoted - Otherwise, wraps double-quotes around textValue and escapes any interior double-quotes. Other characters are escaped as described below.

The following UTF-8 byte sequences are transformed into C-style escape sequences and unicode escape sequences as follows:

Common name C-style escape notation Original unescaped UTF-8 bytes Transformed to escape sequence in UTF-8 bytes
quote \" %x22 %x5C %x22
backslash \\ %x5C %x5C %x5C
backspace \b %x08 %x5C %x62
formfeed \f %x0C %x5C %x66
linefeed \n %x0A %x5C %x6E
carriage-return \r %x0D %x5C %x72
tab \t %x09 %x5C %x74
unicode \uXXXX A hex number in the range of 00-1F, except for the ones handled above (backspace, formfeed, linefeed, carriage-return, and tab). %x5C %x75 followed by 4 hex digits

For example:

# returns "abc"
SELECT JSON_QUOTE( 'abc' );

# raises an error
SELECT JSON_QUOTE( 123 );

# returns the JSON document consisting of the string scalar "123"
SELECT CAST( JSON_QUOTE( '123' ) AS JSON );

(Compiled by Dag Wanvik and Rick Hillegas)

------------------ Related Worklogs --------------------

The implementation depends on other, related worklogs:

WL#8132 (JSON Storage) WL#8249 (JSON Comparator)

------------------ Abstractions --------------------

The implementation is built around 4 major abstractions:

json_binary::Value

 This is the optimized, in-memory representation of a JSON data value.  For more information, see WL#8132.

Json_dom

 This is a JSON data value unpacked into a tree of element nodes. Each node represents an object, an array, a number, a string, or a literal (true, false, null). Unpacking a JSON data value is an expensive operation.

Json_wrapper

 This is an indirection layer which hides whether a JSON value is represented as a json_binary::Value or a Json_dom. It permits optimal drill-down into a JSON value without requiring unnecessary or excessive unpacking and parsing. A wrapper can serialize itself as a json text string or as a json_binary::Value.

Json_path

 This is a parsed json path expression, represented as a sequence of path legs.
------------------ Memory Management --------------------

json_binary::Value objects are mostly allocated on the stack. The seek() functionality, however, can copy them into a std::vector which keeps them on the heap for row-duration.

json_dom objects are allocated on the heap and are destroyed when the query cursor advances to the next row.

Json_wrappers are re-used across rows. They are allocated out of the query-duration MEMROOT. When a Json_wrapper is re-used, its old contents (json_binary::Value or Json_dom) are destroyed and new contents are poked into the wrapper.

Json_path objects are allocated on the stack and are destroyed when they go out of scope. A Json_path object contains a std::vector of Json_path_leg objects, and this vector allocates space on the heap. The space allocated by the vector is automatically freed when the Json_path object is destroyed.

------------------ JSON Data Value --------------------

The JSON data value is implemented as a new method on the Item class: val_json(). The val_json() method returns a Json_wrapper via an output parameter. The val_json() method obeys this contract:

 /**
   Get a JSON value from an Item.
   All subclasses that can return a JSON value, should override this
   function. The function in the base class is not expected to be
   called. If it is called, it most likely means that some subclass
   is missing an override of val_json().
   @param[in,out] str
     The buffer used for reading the text or binary representation of
     the JSON value. The buffer has to stay alive as long as the
     Json_wrapper is alive, since it may contain data pointed to by
     the Json_wrapper.
   @param[in,out] result The resulting Json_wrapper.
   @return true if successful, false on failure
 */
 virtual bool val_json(String *str, Json_wrapper *result)
------------------ JSN_*() Functions --------------------

The JSN_*() functions are modelled as subclasses of Item_func. The subclasses live in item_func_json.h/cc. All functions which return JSON values are modelled as subclasses of a new intermediate abstract class called Item_func_json:

These are the functions which return int values via Item_func.val_int():

Item_func

 Item_int_func
   Item_func_json_valid (JSN_VALID)
   Item_func_json_length (JSN_LENGTH)
   Item_func_json_contains_path (JSN_CONTAINS_PATH)
   Item_func_json_depth (JSN_DEPTH)

These are the functions which return string values via Item_func.val_str():

Item_func

 Item_str_func
   Item_func_json_type (JSN_TYPE)
   Item_func_json_search (JSN_SEARCH)
   Item_func_json_unquote (JSN_UNQUOTE)
   Item_func_json_quote (JSN_QUOTE)

These are the functions which return JSON values via Item_func.val_json():

Item_func

 Item_func_json
   Item_func_json_keys (JSN_KEYS)
   Item_func_json_extract (JSN_EXTRACT)
   Item_func_json_remove (JSN_REMOVE)
   Item_func_json_append (JSN_APPEND)
   Item_func_json_set (JSN_SET)
   Item_func_json_insert (JSN_INSERT)
   Item_func_json_replace (JSN_REPLACE)
   Item_func_json_merge (JSN_MERGE)
   Item_func_json_array (JSN_ARRAY)
   Item_func_json_row_object (JSN_ROW_OBJECT)
   Item_json_typecast (implements CAST(... AS JSON))

Note that Item_func_json.val_str() serializes the JSON value as a string, when the function result is used in a string context.

The val_int(), val_str(), and val_json() methods perform argument validation, checking for nulls and parsing errors. The number of function arguments is verified earlier on in the create_native methods in item_create.cc. The complicated logic is then distributed as follows:

JSN_VALID

 Forwarded to a helper method, json_is_valid().

JSN_LENGTH JSN_TYPE JSN_KEYS JSN_EXTRACT JSN_CONTAINS_PATH JSN_DEPTH

 Forwarded to the Json_wrapper.

JSN_ARRAY JSN_ROW_OBJECT JSN_QUOTE JSN_UNQUOTE

 Work done inline.

JSN_REMOVE JSN_APPEND JSN_SET JSN_REPLACE JSN_INSERT JSN_MERGE

 Work is performed inline after materializing the Json_dom.

JSN_SEARCH

 Search semantics handled by a fabricated Item_func_like node.
------------------ Helper Functions --------------------

The JSN_*() functions rely on several helper functions in item_json_func.cc, including json_is_valid(), get_json_atom_wrapper(), parse_json() and parse_path():

/**

Helper method for Item_func_json_* methods. Check if a JSON item or
JSON text is valid and, for the latter, optionally construct a DOM
tree (i.e. only if valid).
@param[in]     args       Item_func::args alias
@param[in]     arg_idx    Index (0-based) of argument into the args array
@param[out]    value      Item_func_json_*::m_value alias
@param[in]     func_name  Name of the user-invoked JSN_ function
@param[out]    need_parse True if we needed to parse to validate
@param[in,out] dom        If non-null, we want any text parsed DOM
                          returned at the location pointed to
@param[in]     require_str_or_json
                          If true, generate an error if other types used
                          as input
@returns true if the text is valid JSON (or NULL), else false
*/

static bool json_is_valid(Item **args,

                             uint arg_idx,
                             String &value,
                             const char *func_name,
                             bool &need_parse,
                             Json_dom **dom,
                             bool require_str_or_json)

/**

 Get a JSON value in args[args_idx], either by an existing JSON
 value, or by parsing a string as a JSON text.
 @param[in]     args        the arguments
 @param[in]     arg_idx     the argument index
 @param[in,out] str         string buffer to use
 @param[in]     func_name   the name of the function we are executing
 @param[in,out] wr          the resulting JSON value location
 @result true if we found a value or NULL, false if not.
  • /

bool get_json_wrapper(Item **args,

                     uint arg_idx,
                     String *str,
                     const char *func_name,
                     Json_wrapper *wrapper)

/**

 Get a SQL value in args[args_idx] and convert it to a JSON
 value, possibly a JSON scalar, so that it can be used to
 construct/update other JSON documents.
 @param[in]     args        the arguments
 @param[in]     arg_idx     the argument index
 @param[in,out] value       string buffer to use
 @param[in,out] tmp         temporary scratch space for converting strings to
                            the correct charset; only used if accept_string is
                            true and conversion is needed
 @param[in,out] wr          the resulting JSON value location
 @param[in,out] scalar      optional location to store a parsed JSON boolean
 @param[in]     accept_string  true if the value is not to be treated and parsed
                               as JSON text string
 @result true if we found a value or NULL, false if not.
  • /

bool get_json_atom_wrapper(Item **args,

                          uint arg_idx,
                          String *value,
                          String *tmp,
                          Json_wrapper *wr,
                          Json_scalar_holder *scalar,
                          bool accept_string)

/**

Helper method for Item_func_json_* methods. Assumes that
the caller has already verified that the path expression is not null.
Raises an error if the path expression is syntactically incorrect.
Raises an error if the path expression contains wildcard tokens but is not
supposed to. Otherwise returns a Json_path object.
@param[in]  path_expression  A string Item to be interpreted as a path.
@param[out] value            Holder for path string
@param[in]  forbid_wildcards True if the path shouldn't contain * or **.
@returns The parsed path. NULL if an error occurred, or a NULL string
*/

static Json_path *parse_path(Item * path_expression, String &value,

                            my_bool forbid_wildcards)
------------------ JSON parsing --------------------

The JSON parsing is done via a lightly modified verson of the open source Rapidjson https://github.com/miloyip/rapidjson project.

The parsing of JSON text happens in parse_json() and Json_dom::parse(const char *, size_t , const char **, size_t *).

If we do not need a DOM to be constructed, the parsing happens by calling rapidjson's method

document.AcceptAnyRoot(true).ParseStream(StringStreamSafe)

which return true if the text is valid JSON. This is used by JSN_VALID, since it this case, we do not need a DOM. If we need a DOM, cf. the "dom" argument to json_is_valid, the rapidjson parser is invoked via a SAX style parser, that is, we call rapidjson with our callback class (Rapid_json_handler which implements the callback interface BaseReaderHandler).

reader.AcceptAnyRoot().Parse(ss, handler);

Rapid_json_handler lives in json_dom.cc The callbacks incrementally builds a DOM, cf. Rapid_json_handler::get_built_doc().

------------------ Json_wrapper.seek() --------------------

JSN_*() functions with path arguments position into JSON values via Json_wrapper.seek(). When the path contains an ellipsis, this method materializes the DOM because duplicate detection with Json_binary::Value is difficult. When this happens, the method forwards to Json_dom.seek().

The approach is a little tricky due to the ellipsis token which can appear in path expressions. The ellipsis token is a MySQL extension and is not part of the SQL Standard. The ellipsis token gives rise to the following situation, which is described by this header comment on Json_wrapper.seek():

   Special care must be taken when the path expression contains more than one
   ellipsis (**) token. That is because multiple paths with ellipses may
   identify the same value. Consider the following document:
   { "a": { "x" : { "b": { "y": { "b": { "z": { "c": 100 } } } } } } }
   The innermost value (the number 100) has the following unique,
   non-wildcarded address:
   $.a.x.b.y.b.z.c
   That location is reached by both of the following paths which include
   the ellipsis token:
   $.a.x.b.**.c
   $.a.x.b.y.b.**.c
   And those addresses both satisfy the following path expression which has
   two ellipses:
   $.a.**.b.**.c
   In this case, we only want to return one instance of $.a.x.b.y.b.z.c

The approach taken by Json_dom.seek() is to maintain two vectors of matches as the legs of the path are walked in sequence. The vectors have these meanings:

doms - This is the list of elements which match the first N path legs so far. This result was computed by the previous loop iteration.

candidates - This is the result which is being computed. That is, the list of elements which match the first N+1 legs. Duplicates are eliminated as the candidate vector is filled in.

In order to avoid unnecessary creation of vectors, the two vectors are swapped with each loop iteration.

The seek() method relies, in turn, on a recursive Json_dom.find_child_doms() method. This method handles the tricky semantics of the ellipsis token.