WL#7909: Server side JSON functions
Provide a basic set of built-in functions for working with JSON data.
Requirements:
1. Lets users construct JSON data values from other relational data.
2. Lets users extract relational data from JSON data values.
3. Lets users minimally introspect the structure of JSON values and text (validity, length, depth, keys).
4. Works on strings which are utf8mb4 encoded.
5. Performance should be suitable for read-intensive applications.
Non-requirements:
A. May produce surprising results on strings which are not utf8mb4 encoded.
B. There is limited support for decimal values nested inside JSON documents.
C. Performance may not be suitable for write-intensive applications.
Server-side JSON Functions (Functional Spec)
Revision | Description | Date | Authors |
17 | Various improvements:
|
July 20, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
16 | Various improvements:
|
July 3, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
15 | Various improvements:
|
June 8, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
14 | Various improvements:
|
May 27, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
13 | Various improvements:
|
April 8, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
12 | Various improvements:
|
April 3, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
11 | Various improvements:
|
February 24, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
10 | More improvements:
|
February 4, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
9 | Corrections to 8th draft:
|
February 2, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
8 | Corrections to 7th draft:
|
January 7, 2015 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
7 | Corrections to 6th draft:
|
December 3, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
6 | Corrections to 5th draft, based on review by Alfredo Kojima,
Mike Zinner, and the optimizer group.
|
December 1, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen, Alfredo Kojima. |
5 | Corrections to 4th draft. | November 17, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen |
4 | Corrections to 3rd draft:
|
November 14, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen |
3 | Corrections to 2nd draft:
|
November 10, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen |
2 | Incorporate feedback on rev 1:
|
November 6, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen |
1 | First cut. | November 4, 2014 | Rick Hillegas, Evgeny Potemkin, Dag Wanvik, Knut Anders Hatlen |
- Overview
- Terms and Conventions
- Path Syntax
- Array Wrapping
- Normalization
- ANY_JSON
- ANY_JSON_ATOM
- CASTs to/from JSON Values
- Aggregates on JSON Values
- UDFs and JSON Values
- JSON Functions
Overview
Here we describe a number of builtin functions which will be added to MySQL 5.7. These functions should be useful in constructing and examining json documents. This feature is described by worklog WL-7909 (http://wl.no.oracle.com/worklog/Optimizer-Sprint/?tid=7909).
Terms and Conventions
This functional spec refers to the following documents:
- ECMAScript-262 - The
ECMAScript Language Specification,
edition 5.1, dated June 2011 (http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf).
This is the grammar used by the Standard.
- RFC 7159 - The grammar for well-formed json documents is defined by
RFC 7159 (Tim Bray, March 2014, http://www.rfc-editor.org/rfc/rfc7159.txt).
- Standard - We expect that the next version of the ANSI/ISO SQL Standard
will define language for manipulating json documents in relational
databases. The new json language will be included in part 2 of the Standard (known as the "Standard" in this document).
This functional spec is informed by preliminary drafts of
part 2, dated 2014-10-26, 2014-11-02, and 2014-11-04.
See https://stbeehive.oracle.com/teamcollab/library/st/SQL+standards/Documents/Working%20Drafts.
For the remainder of this spec, we use the following terms:
- ANY_JSON - Json documents can be stored on disk in character types, as well as in the new JSON type. The ANY_JSON wildcard refers to any data type which can hold json documents.
- ANY_JSON_ATOM - JSON values can contain atomic values which correspond to any MySQL type. These SQL types are known collectively as the ANY_JSON_ATOM types. Special coercions apply when JSON values contain binary data and they are serialized to character-typed columns.
- binary type - The following MySQL datatypes are known
collectively as the binary types:
- BINARY - Fixed length binary data, (255 bytes max).
- LONGBLOB - Binary data up to (232 - 1) bytes long.
- MEDIUMBLOB - Binary data up to (224 - 1) bytes long.
- BLOB - Binary data up to (216 - 1) bytes long.
- TINYBLOB - A small blob column (255 bytes max).
- VARBINARY - Variable length binary data, up to (216 - 1) bytes long.
- utf8 character type - The following MySQL datatypes are known
collectively as the utf8 character types. They all involve the
1-4 byte utf8mb4
character encoding (or its proper subsets, ascii and utf8):
- CHAR - Fixed length character data, using the ascii, utf8, or utf8mb4 character sets.
- LONGTEXT - Character data up to (232 - 1) bytes long, using the ascii, utf8, or utf8mb4 character sets.
- MEDIUMTEXT - Character data up to (224 - 1) bytes long, using the ascii, utf8, or utf8mb4 character sets.
- TEXT - Character data up to (216 - 1) bytes long, using the ascii, utf8, or utf8mb4 character sets.
- TINYTEXT - A small text column (255 bytes max), using the ascii, utf8, or utf8mb4 character sets.
- VARCHAR - Variable length character data, using the ascii, utf8, or utf8mb4 character sets.
- json document - This is what RFC 7159 calls
"json text": a well-formed string of json tokens.
- sensible - A sensible json document is one which is valid and which contains no object whose members have the same key names.
- target release - This is the release which will contain this functionality. The current plan is that this will be 5.7.
- valid - The draft Standard defines "valid" json text as a character string which is well-formed according to the grammar in RFC 7159. The JSON datatype is by definition "valid". We also use "valid" to refer to well-formed path expressions. In short, valid means well-formed.
Note that JSON strings are case-sensitive. This is described by section 8.3 (String Comparison) of RFC 7159. So, for example, "lastName" and "lastname" identify different members of a JSON object.
In this spec, we adopt the following convention for displaying DATETIME values:
- internal - When we mean to show an internal representation of a DATETIME, we print it without any quotes and with a ':' rather than a space separating the day part from the seconds part. So, for instance, 2014-11-10:00:00:00.000000.
- string - When we mean to show a DATETIME as a string, as for instance as the result of a CAST( datetimeCol AS CHAR), then we use double-quotes and use a space to separate the day part from the seconds part. So for instance, "2014-11-04 00:00:00.000000".
Path Syntax
Various functions require a path expression in order to drill down to a specific element in a document. In these paths, the leading $ character represents the document under consideration. Path legs are separated by periods. Cells in arrays are represented by [N], where N is a non-negative integer. Path expressions, like json text, should be encoded using the ascii, utf8, or utf8mb4 character sets. Other character encodings are implicitly coerced to utf8mb4. The full path syntax follows:
pathExpression> ::= scope [ ( pathLeg )* ] scope ::= [ columnReference ] dollarSign columnReference ::= [ [ databaseIdentifier period ] tableIdentifier period ] columnIdentifier databaseIdentifier ::= sqlIdentifier tableIdentifier ::= sqlIdentifier columnIdentifier ::= sqlIdentifier pathLeg ::= member | arrayLocation | doubleAsterisk member ::= period ( keyName | asterisk ) arrayLocation ::= leftBracket ( non-negative-integer | asterisk ) rightBracket keyName ::= ECMAScript-identifier | double-quoted-string-literal doubleAsterisk ::= **
For the functions described in this spec, the scope is the function's document argument. A columnReference is meaningless. For these functions, scope is always "$". An attempt to qualify the scope with a columnReference will raise an error.
A path may not end in **. A path may not contain ***.
The wildcard * and ** tokens have the following meanings:
- The path leg, .* means "the values of all members in the object".
- The path leg, [*] means "the values of all cells in the array".
- The path expression prefix**suffix is an ellipsis meaning "all paths which begin with prefix and end with suffix". So for instance, given the path expression $**.a, the prefix is $, the suffix is .a, so $[3].b[2].c.a would be a qualifying path.
Key names match as follows:
- unquoted - An unquoted key name in a path expression is exactly matched to member names in documents.
- quoted - Framing double-quotes are stripped off a key name in a path expression. Then the stripped result is exactly matched to member names in documents. The framing double-quotes allow you to match member names which are not legal ECMAScript identifiers.
Given the following document...
{ "a": [ [ 3, 2 ], [ { "c" : "d" }, 1 ] ], "b": { "c" : 6 }, "one potato": 7, "b.c" : 8 }
...the following paths evaluate like this...
$.a[0] evaluates to [ 3, 2 ] $.a[0][1] evaluates to 2 $.a[1] evaluates to [ { "c" : "d" }, 1 ] $.a[1][0] evaluates to { "c" : "d" } $.a[1][0].c evaluates to "d" $."one potato" evaluates to 7 $.b.c evaluates to 6 $."b.c" evaluates to 8
And given the following document...
[ 3, { "c" : "d" }, 1 ]
...the following paths evaluate like this...
$[0] evaluates to 3 $[1] evaluates to { "c" : "d" } $[1].c evaluates to "d"
And given the following document...
{ "a" : { "b" : "c" }, "d" : { "b" : "e" }, "f" : { "b" : "g", "h" : { "i" : { "j" : "k", "l" : "m" } } } }
...the following paths evaluate like this...
$.f**.j evaluates to "k" $.f**.i.* evaluates to [ "k", "m" ]
And given the following document...
{ "f" : [ { "b" : "g", "m" : { "k": "n" } }, true, [ "i", "j", { "k" : "l" } ] ] }
...the following paths evaluate like this...
$.f[2][*].k evaluates to "l" $.f**.k evaluates to [ "n", "l" ]
Array Wrapping
The Standard defines a convention by which values can be "wrapped" with square brackets in order to convert them into json arrays. This behavior is described at a high level in section 4.46.5 ("Overview of SQL/JSON path language") and in detail in section 9.35 ("SQL/JSON path language: syntax and semantics") GR7. Scalar values are wrapped as single-element arrays in contexts where they must be combined with arrays. If a pathExpression identifies a non-array value, then pathExpression[ 0 ] evaluates to the same value as pathExpression.
Normalization
When a string type is used as an ANY_JSON argument, the string is parsed into a JSON value. An error is raised if the string is not a valid json document. Otherwise, the parsed value is normalized so that it represents a sensible json document. That is, duplicate members are removed. The first (left-most) key/value pair is retained and subsequent members with the same name are thrown away. It is our intention that JSON values are always both valid and sensible. Functions which construct JSON values always construct sensible values.
ANY_JSON
Valid Json documents can be stored on disk in columns whose type can be any of the following:
- JSON - This is a new datatype introduced by target release. It holds normalized json documents in a structure which can be searched efficiently and incrementally deserialized.
- utf8 character types - Any of the utf8 character types.
The syntax descriptions below use the pseudo-type ANY_JSON to indicate that an argument's type can be any of the json-bearing datatypes above.
When a utf8 character type is passed as an ANY_JSON argument, it is implicitly CAST into a JSON value. That is, it is parsed as though it were a json document. If it fails to parse, then an error is raised. Other character encodings are implicitly coerced into utf8mb4 when they are used as ANY_JSON arguments.
Warning! When a utf8 character type is CAST into a JSON value, any inexact numerics inside it are treated as DOUBLEs. This means that money values may lose precision.
MySQL supports two other so-called "string" datatypes. Supporting these types will be considered a stretch goal.
- ENUM - An enumerated type represented internally as a 16 bit integer.
- SET - A set of values represented internally as a 64 bit integer.
ANY_JSON_ATOM
JSON values can be built up out of any MySQL types, including other JSON-typed values.
An error is raised if an inexact numeric Infinity or NaN is passed as an ANY_JSON_ATOM argument.
When a utf8 character type is passed as an ANY_JSON_ATOM argument, it is NOT implicitly CAST into a JSON value. If the user wants the utf8 character type to be treated as a JSON value, then it must be wrapped with an explicit CAST( value AS JSON ) clause. Other character encodings are implicitly coerced into utf8mb4 before they are explicitly CAST to JSON for use as ANY_JSON_ATOM arguments.
When an integer is passed as an ANY_JSON_ATOM argument, we will try to determine if it represents a boolean. If so, it will be treated as a true or false literal. If not, it will be treated as an integer. The following constructs are treated as booleans:
- Boolean expressions built from AND, OR, NOT, XOR, and the following operators, predicates, and functions.
- The true and false literals.
- Comparison operators: <, <=, >, >=, =, <=>, !=, <>, IS NULL, IS NOT NULL, IS NOT, REGEXP.
- The coalescing NULLIF operator, when all branches are boolean values.
- These predicates:
- between predicate
- in predicate
- like predicate
- quantified comparison predicate
- exists predicate
- Boolean-valued functions like:
- JSON_VALID()
- JSON_CONTAINS_PATH()
- GTID_SUBSET()
A JSON value may need to be serialized into a string format. This may happen when storing a JSON value in a utf8 character type column. This may also happen when sending clients the results of ad-hoc queries (as opposed to PREPAREd queries); this is because ad-hoc query results are transported to the client over a text-based protocol, not over the binary protocol used by PREPAREd statements. When a JSON value is serialized into a string format, the following happens to its ANY_JSON_ATOMs:
- null - Nulls are treated as the json null literal.
- string - String ANY_JSON_ATOMs are double-quoted.
- boolean, numeric - These ANY_JSON_ATOMs are converted to string form as though they were CAST to a utf8 character type.
- DATETIME - DATETIME values are serialized into double-quoted strings of the form "YYYY-MM-DD HH:MM:SS[.fraction]". In the reverse direction, strings of that form are parsed into DATETIME values when they are included in json documents.
- TIMESTAMP - TIMESTAMP values are serialized into the same string format as DATETIME values. Note, however, that in the reverse direction, those strings are parsed back into DATETIME values. This is because DATETIMEs have a broader range than TIMESTAMPs.
- DATE - As a stretch goal, DATE will be serialized into double-quoted strings of the form "YYYY-MM-DD". In the reverse direction, strings of that form will be parsed into DATE values when they appear in json documents.
- TIME - As a stretch goal, TIME values will be serialized into double-quoted strings of the form "HH:MM:SS[.fraction]". In the reverse direction, strings of that form will be parsed into TIME values when they appear in json documents.
- The expression (a) is treated as the scalar a.
- An error is raised when a multi-column row like (a, b) is used as a JSON scalar.
- other - All other ANY_JSON_ATOMs are serialized into double-quoted strings of this form: "base64:typeXX:<binary data>". Here XX is a type id number and <binary data> is a base64 encoding of the ANY_JSON_ATOM's binary value.
CASTs to/from JSON Values
JSON values can be explicitly CAST as follows:
other type | CAST ( otherType to JSON) | CAST (JSON to otherType) |
JSON | nop | nop |
utf8 character type | The string is parsed into a JSON value. | The JSON value is serialized into a string. |
other character types | Other character encodings are implicitly coerced to utf8mb4 and then treated as described above for utf8 character types. | The JSON value is serialized into utf8mb4 and then that string is CAST to the other character encoding. This may result in gibberish. |
untyped NULL | Results in a NULL value of type JSON | - |
geometry and subtypes | The geometry value is converted into a JSON document by calling ST_AsGeoJSON. | Illegal CAST. The workaround is ST_GeomFromGeoJSON( CAST( jsonValue AS CHAR ) ) |
all other types | Results in a JSON document consisting of one, scalar value. | Succeeds if the JSON document consists of a single, scalar value of the target type and that scalar value can be CAST to the target type. Otherwise, returns a NULL of the target type and raises a warning (but not an error). |
When a JSON value is put into a target value whose type is known, the JSON value is implicitly CAST to the target value using the rules described above.
For example:
CREATE TABLE casts( descr VARCHAR( 20 ), jsonColumn JSON ); INSERT INTO casts VALUES ( 'scalar int', CAST( 1 AS JSON ) ), ( 'scalar datetime', CAST( CURRENT_TIMESTAMP AS JSON ) ), ( 'scalar boolean', CAST( true AS JSON ) ), ( 'scalar bitstring', CAST( b'1010' AS JSON ) ), ( 'scalar string', CAST( '"abc"' AS JSON ) ), ( 'scalar int 2', CAST( '2' AS JSON ) ), ( 'array', CAST( '[ 1, "abc" ]' AS JSON ) ) ; # returns the integer 1 SELECT CAST( jsonColumn AS UNSIGNED ) FROM casts WHERE descr = 'scalar int'; # returns, say, the datetime value 2014-11-10:00:00:00.000000 SELECT CAST( jsonColumn AS DATETIME ) FROM casts WHERE descr = 'scalar datetime'; # returns NULL SELECT CAST( jsonColumn AS UNSIGNED ) FROM casts WHERE descr = 'array'; # returns the string '"abc"' SELECT CAST( jsonColumn AS CHAR ) FROM casts WHERE descr = 'scalar string'; # returns the string '[ 1, "abc" ]' SELECT CAST( jsonColumn AS CHAR ) FROM casts WHERE descr = 'array';
Implicit casting only occurs when UPDATE/INSERTing a string value into a JSON column. So...
# errors INSERT INTO casts VALUES( 'scalar int', 1 ); INSERT INTO casts VALUES( 'scalar datetime', CURRENT_DATE ); INSERT INTO casts VALUES( 'scalar boolean', true ); INSERT INTO casts VALUES( 'scalar bitstring', b'1010' ); # these succeed INSERT INTO casts VALUES( 'scalar string', '"abc"' ); INSERT INTO casts VALUES( 'scalar int', '2' ); INSERT INTO casts VALUES( 'array', '[ 1, "abc" ]' );
Note that the following statement may produce gibberish in a session/database whose default character set is something other than ascii, utf8, or utf8mb4:
SELECT CAST( jsonColumn AS CHAR ) FROM casts;
The following, more verbose syntax is recommended for such a session/database:
SELECT CAST( jsonColumn AS CHAR CHARACTER SET 'utf8mb4' ) FROM casts;
Aggregates on JSON Values
When aggregating JSON values, SQL NULLs are ignored (as is done with other datatypes). With the exception of the MAX, MIN, and GROUP_CONCAT aggregates, the JSON value is converted to a result numeric type and then passed to the aggregate operator. For JSON values which are numeric scalars, the conversion to a numeric type should be sensible, although truncation may occur and precision may be lost. For JSON values which are not numeric scalars, the conversion to a numeric type is repeatable (deterministic) but not defined here.
Aggregate | Result data type | Other behavior |
AVG | DOUBLE | - |
BIT_AND | BIGINT UNSIGNED | - |
BIT_OR | BIGINT UNSIGNED | - |
BIT_XOR | BIGINT UNSIGNED | - |
COUNT | BIGINT | No conversion is performed. Values are simply counted. |
COUNT(DISTINCT) | BIGINT | No conversion is performed. Values are simply counted. Distinctness is determined by the comparison rules described in worklog WL#8249. |
GROUP_CONCAT | VARCHAR | - |
MAX | JSON | Order is determined by the comparison rules described in worklog WL#8249. |
MIN | JSON | Order is determined by the comparison rules described in worklog WL#8249. |
STD | DOUBLE | - |
STDDEV_POP | DOUBLE | - |
STDDEV_SAMP | DOUBLE | - |
STDDEV | DOUBLE | - |
VAR_POP | DOUBLE | - |
VAR_SAMP | DOUBLE | - |
VARIANCE | DOUBLE | - |
SUM | DOUBLE | - |
UDFs and JSON Values
Developers may want to process JSON data with user-defined functions which are coded in C or in another language which can use C calling conventions. UDFs support a limited spectrum of datatypes: STRING, INTEGER, REAL, and DECIMAL. Typically, JSON data will pass to/from UDFs as STRINGs. Note that these STRINGs appear to MySQL as char arrays with the vacuous "binary" character encoding. In order to use UDF return values as JSON values, you must declare their character encoding with a CAST expression. E.g.:
CREATE FUNCTION make_json_document RETURNS STRING SONAME "my_udfs.so"; INSERT INTO json_table( doc ) VALUES ( CAST ( make_json_document() AS CHAR CHARACTER SET gb2312 ) );
JSON Functions
We introduce the following new functions.
JSON_VALID()
This method returns true if the indicated text is a valid json document. This method has the same meaning as the Standard's IS JSON predicate.
BOOLEAN JSON_VALID( JSON doc ) BOOLEAN JSON_VALID( TEXT doc )Returns:
- null - Returns NULL if doc is NULL.
- true - Returns true if doc is already a JSON value. Also returns true if doc is a string type and parses as a valid json document.
- false - Otherwise returns false.
For example:
# returns 1 SELECT JSON_VALID( '{ "firstName" : "Fred", "lastName" : "Flintstone" }' ); # returns 1 SELECT JSON_VALID( '3' ); # returns NULL as IS JSON would SELECT JSON_VALID( null );JSON_TYPE()
This method returns the type of a JSON value.
TINYTEXT JSON_TYPE( ANY_JSON doc )Returns:
- null - Returns NULL if doc is NULL.
- typename - Otherwise, returns one of the following strings, encoded in the utf8mb4 character set with the utf8mb4_bin collation:
- Purely JSON types:
- "OBJECT" - JSON objects.
- "ARRAY" - JSON arrays.
- "BOOLEAN" - The JSON/MySQL literals true and false.
- "NULL" - The JSON null literal.
- Numeric types:
- "INTEGER" - MySQL [ UNSIGNED ] TINYINT, [ UNSIGNED ] SMALLINT, [ UNSIGNED ] INT and [ UNSIGNED ] BIGINT scalars.
- "DOUBLE" - MySQL FLOAT, DOUBLE, NUMERIC, and DECIMAL scalars.
- Date/time types:
- "DATETIME"- MySQL DATETIME and TIMESTAMP scalars.
- "DATE" - MySQL DATE scalars.
- "TIME" - MySQL TIME scalars.
- String types:
- "STRING" - MySQL utf8 character type scalars, including ENUM and SET scalars.
- Binary types:
- "BLOB" - MySQL binary type scalars.
- "BIT" - MySQL BIT scalars, regardless of length.
- Other types:
- "OPAQUE" - All other types.
Behavior:
- invalid - Raises an error if doc is not a valid json document.
For example:
INSERT INTO orders( orderID, quantity ) SELECT r.orderID, CASE( JSON_TYPE( r.doc ) ) WHEN "INTEGER" THEN CAST( r.doc AS UNSIGNED INT ) WHEN "OBJECT" THEN CAST( JSON_EXTRACT( r.doc, '$.quantity' ) AS UNSIGNED INT ) ELSE NULL END FROM rawOrders r;JSON_KEYS()
This method returns an array of all keys in the indicated object.
JSON JSON_KEYS( ANY_JSON doc [, TEXT path ] )Returns:
- null - Returns NULL if any of the arguments are NULL or if doc+path does not locate an object.
- empty - Returns an empty array if doc+path locates an empty object.
- array - Otherwise returns an array of keys.
Behavior:
- invalid - Raises an error if doc is not a valid json document or if the path is not a valid path expression.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
For example:
# returns ["a", "b"] SELECT JSON_KEYS ( '{ "a" : "foo", "b" : [ true, { "c" : "123" } ] }' ); # returns [] SELECT JSON_KEYS ( '{ "a" : "foo", "b" : [ true, { "c" : {} } ] }', '$.b[1].c' ); # returns NULL SELECT JSON_KEYS ( '{ "a" : "foo", "b" : [ true, { "c" : {} } ] }', '$.a.b[2]' );JSON_EXTRACT()
This method returns the value of the document element identified by the indicated path(s).
JSON JSON_EXTRACT( ANY_JSON doc, TEXT path [ (,TEXT path)* ] )Returns:
- null - Returns NULL if any of the arguments are NULL or if the paths don't locate any values in the document.
- value - Otherwise, if there is only one path expression AND that path expressions doesn't contain a wildcard or ellipsis token, then the match is returned. It is NOT wrapped in an array.
- array - Otherwise, the match(es) are returned, wrapped in an array. The elements in the array occur in the order of the
corresponding path arguments.
The effect of these rules can be summed up as follows: If the path arguments COULD locate multiple matches on SOME document, then the result(s) are wrapped in an array.
Behavior:
- invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression.
For example:
# returns a JSON value containing just the string "123" SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : "123" } ] }', '$.b[ 1 ].c' ); # returns a JSON value containing just the number 123 SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '$.b[ 1 ].c' ); # raises an error because the document is not valid SELECT JSON_EXTRACT ( '{ "a" : [ }', '$.b[ 1 ].c' ); # raises an error because the path is invalid SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '$.b[ 1 ].' ); # returns a JSON value containing the number 123 (because of auto-wrapping) SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '$.b[ 1 ].c[ 0 ]' ); # returns null because the path, although valid, does not identify a value SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '$.b[ 1 ].c[ 1 ]' ); # returns a JSON value containing the number 123 (due to normalization) SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }', '$.b[ 1 ].c' ); # returns a JSON array ["foo", true] SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }', '$.a', '$.b[0]' ); # returns a the 'true' literal wrapped in an array SELECT JSON_EXTRACT ( '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }', '$.d', '$.b[0]' ); # returns NULL because no matches are found SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[*].b' ) jdoc; # returns NULL because no matches are found SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[0].b' ) jdoc; # returns the scalar 1 SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[0].a' ) jdoc; # returns [1, 2] (multiple matches wrapped in an array) SELECT JSON_EXTRACT( '[ { "a": 1 }, { "a": 2 } ]', '$[*].a' ) jdoc; # returns [1] because multiple matches are possible SELECT JSON_EXTRACT( '[ { "a": 1 }, { "b": 2 } ]', '$[*].a' ) jdoc; # returns [3, 4] SELECT JSON_EXTRACT( '[ { "a": [3,4] }, { "b": 2 } ]', '$[0].a' ) jdoc; # returns [[3, 4]]. an extra array wraps the result because multiple matches are possible SELECT JSON_EXTRACT( '[ { "a": [3,4] }, { "b": 2 } ]', '$[*].a' ) jdoc; # returns [[3, 4]]. an extra array wraps the result because multiple matches are possible SELECT JSON_EXTRACT( '[ { "a": [3,4] }, { "b": 2 } ]', '$[0].a', '$[1].a' ) jdoc;It should be possible to CAST a JSON_EXTRACT() result to CHAR type in order to run full-text searches and build function indexes on json documents. That is, to search and index expressions like this:
SELECT CAST( JSON_EXTRACT( ...) AS CHAR )JSON_REMOVE()
This function removes the elements identified by the paths. The function takes one or more path clauses and processes them one after the other.
JSON JSON_REMOVE( ANY_JSON doc, TEXT path [ (, TEXT path)* ] )Returns:
- null - Returns NULL if any of the arguments is NULL.
- original document - Returns the original document if none of the paths identifies an element.
- value - Otherwise, a smaller document is returned, the result of removing the values located by the paths.
Behavior:
- invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression. Raises an error when any of the path expressions is $.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
Each path is applied in order, producing a new document, which is then operated on by the next path. So...
JSON_REMOVE( doc, path1, path2 ) = JSON_REMOVE( MYSJON_REMOVE( doc, path1 ), path2 )For example:
# returns the document {"a": "foo", "b": [true]} SELECT JSON_REMOVE ( '{"a" : "foo", "b" : [true, {"c" : 123}]}', '$.b[ 1 ]' ); # returns {"a": "foo", "b": [true, {}]} due to normalization SELECT JSON_REMOVE ( '{ "a" : "foo", "b" : [ true, { "c" : 123, "c" : 456 } ] }', '$.b[ 1 ].c' ); # returns {"a": "foo", "b": [true, {}]} SELECT JSON_REMOVE ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '$.b[ 1 ].c' ); # returns the original document because the path doesn't identify an element SELECT JSON_REMOVE ( '{ "a" : "foo", "b" : [ true, { "c" : 123, "d" : 456 } ] }', '$.b[ 1 ].e' );JSON_APPEND()
This method updates a document, appending values to the end of the indicated arrays.
JSON JSON_APPEND ( ANY_JSON doc, TEXT path, ANY_JSON_ATOM value [ (, TEXT path, ANY_JSON_ATOM value)* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- document - Otherwise returns a valid JSON value produced by appending the indicated values at the end of the indicated arrays. Scalars and objects, identified by paths, are auto-wrapped as is done for the sister functions JSON_INSERT(), JSON_SET(), and JSON_REPLACE().
- original - May return a document which is identical to the original document.
Behavior:
- invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...
JSON_APPEND ( doc, path1, value1, path2, value2 ) = JSON_APPEND ( JSON_APPEND ( doc, path1, value1 ), path2, value2 )For example:
# should return {"a": "foo", "b": ["bar", 4], "c": ["wibble", "grape"]} due to autowrapping SELECT JSON_APPEND ( '{ "a" : "foo", "b" : "bar", "c" : "wibble" }', '$.b', 4, '$.c', "grape" ); # should return {"a": "foo", "b": [1, 2, 3, 4], "c": ["apple", "pear", "grape"]} SELECT JSON_APPEND ( '{ "a" : "foo", "b" : [ 1, 2, 3 ], "c" : [ "apple", "pear" ] }', '$.b', 4, '$.c', "grape" );JSON_SET()
This method updates a document. Replaces the indicated value if it exists. Inserts the indicated value if it doesn't exist.
JSON JSON_SET ( ANY_JSON doc, TEXT path, ANY_JSON_ATOM value [ (, TEXT path, ANY_JSON_ATOM value)* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- document - Otherwise returns a valid JSON value produced by inserting or replacing the indicated values at the indicated locations.
- original - May return a document which is identical to the original document.
Behavior:
- invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
- insert - As discussed above, the only values which can be inserted are new members in existing objects or new cells in existing (possibly auto-wrapped) arrays.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
The following values can be inserted into a document. If the path does not identify one of these types of values, then the insert is ignored.
- A new member in an existing object.
- A new array slot past the end of an existing value. If the value is not an array, it is auto-wrapped as a single-element array. The existing array is extended by one new slot.
Given the following document...
{ "a" : "foo", "b" : [ 1, 2, 3 ] }...the following paths have these meanings...
# this path is possible because it references a new member of an existing object $.c # this path is synonymous with $.a (see the section on <a href="#Array Wrapping">Array Wrapping</a> above). $.a[0] # a request to insert at this location would auto-wrap the current # value of $.a and add an additional slot to that array $.a[1] # this path is an impossible reference to an array as though it were # an object. a request to insert at this location would be ignored. $.b.c # this path references an array cell at an index which doesn't exist yet # but which could be created in order to satisfy an insert request $.b[3]When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...
JSON_SET ( doc, path1, value1, path2, value2 ) = JSON_SET ( JSON_SET ( doc, path1, value1 ), path2, value2 )For example:
# returns {"a": {}, "b": [1, 2, 3]} SELECT JSON_SET ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.a', JSON_OBJECT() ); # returns {"a": "foo", "b": [1, 2, 3], "c": [true, false]} SELECT JSON_SET ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.c', JSON_ARRAY( true, false ) ); # returns {"a": "foo", "b": [1, 2, 3], "c": [true, false]} SELECT JSON_SET ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.c', JSON_ARRAY( CAST( 'true' AS JSON ), CAST( 'false' AS JSON ) ) ); # returns [1, 2] SELECT JSON_SET ( '1', '$[3]', 2 ); # returns {"a": {"b": false, "c": true}} SELECT JSON_SET ( '{ "a" : "foo"}', '$.a', JSON_OBJECT( 'b', false ), '$.a.c', true );JSON_INSERT()
This method updates a document. If the indicated value does NOT exist, then it is inserted. But if the value does exist, then nothing is done; the existing value is NOT overridden.
JSON JSON_INSERT ( ANY_JSON doc, TEXT path, ANY_JSON_ATOM value [ (, TEXT path, ANY_JSON_ATOM value)* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- document - Otherwise returns a valid JSON value produced by inserting the indicated values at the indicated locations.
- original - May return a document which is identical to the original document.
The following values can be inserted into a document. If the path does not identify one of these types of values, then the insert is ignored.
- A new member in an existing object.
- A new array slot past the end of an existing value. If the value is not an array, then it is auto-wrapped as a single-element array. The existing array is extended by one new slot.
Behavior:
- invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
- insert - As discussed above, the only values which can be inserted are new members in existing objects or new slots in existing (possibly auto-wrapped) arrays.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...
JSON_INSERT ( doc, path1, value1, path2, value2 ) = JSON_INSERT ( JSON_INSERT ( doc, path1, value1 ), path2, value2 )For example:
# returns the original document because the path does exist SELECT JSON_INSERT ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.a', true ); # inserts a number, returns '{"a": "foo", "b": [1, 2, 3], "c": 123} SELECT JSON_INSERT ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.c', 123 ); # inserts a string, returns '{"a": "foo", "b": [1, 2, 3], "c": "123"} SELECT JSON_INSERT ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.c', '123' ); # returns '{"a": ["foo", true], "b": [1, 2, 3]}' SELECT JSON_INSERT ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.a[1]', true ); # returns {"a": "foo", "b": true} SELECT JSON_INSERT ( '{ "a" : "foo"}', '$.b', true, '$.b', false );JSON_ARRAY_INSERT()
This method updates a document. If a path identifies a cell in an existing array, then the corresponding new value is inserted into that cell and all trailing cells are right-shifted. Otherwise, nothing is done.
JSON JSON_ARRAY_INSERT ( ANY_JSON doc, TEXT path, ANY_JSON_ATOM value [ (, TEXT path, ANY_JSON_ATOM value)* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- document - Otherwise returns a valid JSON value produced by inserting the indicated values at the indicated cell positions.
- original - May return a document which is identical to the original document.
Behavior:
- invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
- cell identifier - Raises an error if the last leg in the path expression is NOT a cell identifier.
- big index - If the cell identifier indicates a position past the end of the array, then the new value is simply inserted at the end of the existing array.
- auto-wrapping - Scalars and objects are NOT auto-wrapped.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...
JSON_ARRAY_INSERT ( doc, path1, value1, path2, value2 ) = JSON_ARRAY_INSERT ( JSON_ARRAY_INSERT ( doc, path1, value1 ), path2, value2 )For example:
# raises an error because the path doesn't end in a cell identifier SELECT JSON_ARRAY_INSERT ( '{ "a": [ 1, 2, 3 ] }', '$.a', 4 ); # returns { "a": [ 4, 1, 2, 3 ] } SELECT JSON_ARRAY_INSERT ( '{ "a": [ 1, 2, 3 ] }', '$.a[ 0 ]', 4 ); # returns { "a": [ 1, 2, 4, 3 ] } SELECT JSON_ARRAY_INSERT ( '{ "a": [ 1, 2, 3 ] }', '$.a[ 2 ]', 4 ); # returns { "a": [ 1, 2, 3, 4 ] } SELECT JSON_ARRAY_INSERT ( '{ "a": [ 1, 2, 3 ] }', '$.a[ 100 ]', 4 ); # no auto-wrapping. returns { "a": true } SELECT JSON_ARRAY_INSERT ( '{ "a": true }', '$.a[ 0 ]', false ); # raises an error because of the wildcard SELECT JSON_ARRAY_INSERT ( '[ [ 1, 2, 3 ], [ 4, 5, 6 ] ]', '$[*][0]', false );JSON_REPLACE()
This method updates a document. If the indicated value exists, then it is replaced with the new value. But nothing happens if the value doesn't exist.
JSON JSON_REPLACE ( ANY_JSON doc, TEXT path, ANY_JSON_ATOM value [ (, TEXT path, ANY_JSON_ATOM value)* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- document - Otherwise returns a valid JSON value produced by replacing the indicated values at the indicated locations.
- original - May return a document which is identical to the original document.
Behavior:
- invalid - Raises an error if doc is not null and is not a valid json document. Raises an error if any of the paths is not a valid path expression.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
When multiple path/value pairs are supplied, they are evaluated in order. The first path/value pair is evaluated and a new document is created. The second path/value pair is then evaluated against that new document. So...
JSON_REPLACE ( doc, path1, value1, path2, value2 ) = JSON_REPLACE ( JSON_REPLACE ( doc, path1, value1 ), path2, value2 )For example:
# returns the original document because the path doesn't exist SELECT JSON_REPLACE ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.c', true ); # returns '{"a": true, "b": [1, 2, 3]}' SELECT JSON_REPLACE ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.a[0]', true ); # returns the original document because the path doesn't exist SELECT JSON_REPLACE ( '{ "a" : "foo", "b" : [ 1, 2, 3 ] }', '$.b[5]', true );JSON_MERGE()
This function merges two or more json documents into a single document.
JSON JSON_MERGE( ANY_JSON doc, ANY_JSON doc2 [ (, ANY_JSON otherdocs )* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- document - Otherwise returns a sensible JSON value produced by merging the documents as described below.
Behavior:
- invalid - Raises an error if any of the documents isn't valid.
- scalars - If any of the documents that are being merged is a scalar, each scalar document is autowrapped as a single value array before merging.
- arrays - When merging a left element with a right element, if both elements are arrays, then the result is the left array concatenated with the right array. For instance, [ 1, 2 ] merged with [ 3, 4 ] is [ 1, 2, 3, 4 ].
- array and object - When merging an array with an object, the object is autowrapped as an array and then the rule above is applied. So [ 1, 2 ] merged with { "a" : true } is [ 1, 2, { "a": true } ].
- objects - When merging two objects, the two objects are concatenated into a single, larger object. So { "a" : "foo" } merged with { "b" : 5 } is { "a" : "foo", "b" : 5 }.
- duplicates - When two objects are merged and they share a key, the values associated with the shared key are merged.
For example:
# returns [{"a": "foo", "b": [true, {"c": 123}]}, 5, 6] SELECT JSON_MERGE ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '[ 5, 6]' ); # returns {"a": "foo", "b": [true, {"c": 123}, false, 34]} SELECT JSON_MERGE ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '{ "b": [ false, 34 ] }' ); # returns {"a": "foo", "b": [true, {"c": 123}, "bar"]} SELECT JSON_MERGE ( '{ "a" : "foo", "b" : [ true, { "c" : 123 } ] }', '{ "b": "bar" }' ); # returns {"a": {"b": 1, "c": 1}} SELECT JSON_MERGE ( '{ "a" : { "b" : 1 } }', '{ "a" : { "c" : 1 } }' );JSON_SEARCH()
This function returns path(s) to the given string. The returned path(s) identify object members or array slots which are character strings. If the string occurs more than once in the search scope (either the whole document or the given paths), and if the oneOrAll argument is "one", then one matching path is returned; but which one is not defined. If the oneOrAll argument is "all" and more than one string matches, then all matching paths are returned in a JSON array. If path arguments are supplied, then the search is performed only inside those paths.
JSON JSON_SEARCH ( ANY_JSON doc, TINYTEXT oneOrAll, TEXT searchString [ , CHAR( 1 ) escapeCharacter [ (, TEXT path)* ] ] )Returns:
- null - Returns NULL if doc or searchString is null. Returns NULL if any of the path arguments is null. Returns NULL if all of the paths are non-existent. Returns NULL if the string isn't found.
- escapeCharacter - If you want to search for strings containing the special % or _ wildcard characters, then you can prefix the % or _ with the default escape character, '\'. If you need to change the escape character, then you can specify a non-null escapeCharacter argument. Specifying null for this argument has the same meaning as specifying '\'.
- paths - Otherwise, returns a double-quoted string or array of double-quoted strings which are further refinements of the given paths. The returned paths identify object members or array slots holding the matched strings. The oneOrAll argument determines whether the search terminates on the first match or whether all matches are returned. The double-quoted strings are valid path expressions when they are stripped of their quotes by using JSON_UNQUOTE().
The oneOrAll argument may take the following values:
- one - Terminate the search quickly, on the first match. It is undefined which match is considered first. The function returns a JSON value containing one path string.
- all - Return all matching paths. The function returns a JSON array containing all matching path strings. The order of the matching paths is not defined.
The % character matches any number of characters (even 0), as it does for the LIKE operator. So, for example:
- abc - Matches only the exact string "abc".
- abc% - Matches "abc", "abcd", etc..
- a%bc - Matches "abc", "afoobc", etc..
The _ character matches exactly one character, as it does for the LIKE operator. So, for example:
- a_c - Matches "abc", "a2c", etc..
- a_b_c - Matches "a1b2c", "a3b4c", etc..
Behavior:
- invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression.
- constant - Raises an error if escapeCharacter is not a constant expression.
- empty - If no path qualifiers are given, then the whole document is searched.
For example:
# returns null because numeric values don't match string values SELECT JSON_SEARCH ( '{ "a" : 123, "b" : [ 123, 456 ] }', 'one', '123' ); # returns "$.b[2]" SELECT JSON_SEARCH ( '{ "a" : "123", "b" : [ 123, "789", "123", "456", "123" ] }', 'one', '123', null, '$.b' ); # could return either "$.a" or "$.b.key" SELECT JSON_SEARCH ( '{ "a" : "123", "b" : { "key" : "123" } }', 'one', '123' ); # returns "$.b.key" SELECT JSON_SEARCH ( '{ "a" : "1243", "b" : { "key" : "1234" } }', 'one', '123%' ); # returns "$.b.c" SELECT JSON_SEARCH ( '{ "a" : "1243", "b" : { "key" : "1234", "c": "directorysub%directoryabc" } }', 'one', 'dir%torysub@%dir%', '@' ); # returns null because the path doesn't exist SELECT JSON_SEARCH ( '{ "a" : "1243", "b" : { "key" : "1234" } }', 'one', '123%', null, '$.c' ); # returns $."one potato" SELECT JSON_UNQUOTE ( JSON_SEARCH ( '{ "onepotato": "foot", "one potato": "food" , "one \\"potato": "fool" }', 'all', 'food' ) );JSON_CONTAINS()
This function checks whether a candidate JSON document is contained in a target document or in the piece of the target document identified by a path. See below for a definition of containment.
BOOLEAN JSON_CONTAINS ( ANY_JSON target, ANY_JSON candidate, [ , TEXT path ] )Returns:
- null - Returns NULL if any of the arguments are null.
- missing - Returns NULL if the path does not identify a section of the target document.
- true - Returns true if the candidate document is contained in the sub-document identified by target and path.
- false - Returns false if the candidate document is NOT contained in the path- identified sub-document of target.
Behavior:
- invalid - Raises an error if either target or candidate is not a valid json document or if the path is not a valid path expression.
- wildcards - Raises an error if the path expression contains wildcard or ellipsis tokens.
Containment is defined as follows:
- scalar - A candidate scalar is contained in a target scalar iff they are comparable (defined below) and are equal.
- array - A candidate array is contained in a target array iff every cell in the candidate is contained in SOME cell of the target.
- autowrapping - A candidate non-array is contained in a target array iff the candidate is contained in SOME cell of the target.
- object - A candidate object is contained in a target object iff for each key CK in the candidate...
- There is a key TK with the same name in the target.
- And the value bound to CK is contained in the value bound to TK.
- otherwise - If the candidate and the target don't match any of the above rules, then the candidate is NOT contained in the target.
The following table describes which scalar types are comparable. X marks a comparable combination:
- NULL DECIMAL INT UINT DOUBLE STRING BOOLEAN DATE TIME DATETIME TIMESTAMP OPAQUE NULL X - - - - - - - - - - - DECIMAL - X X X X - - - - - - - INT - X X X X - - - - - - - UINT - X X X X - - - - - - - DOUBLE - X X X X - - - - - - - STRING - - - - - X - - - - - - BOOLEAN - - - - - - X - - - - - DATE - - - - - - - X - - - - TIME - - - - - - - - X - - - DATETIME - - - - - - - - - X X - TIMESTAMP - - - - - - - - - X X - OPAQUE - - - - - - - - - - - X For example:
# returns 1 SELECT JSON_CONTAINS ( CAST('[1, 4, 6]' AS JSON), CAST('[1, 6]' AS JSON) ); # returns 1; even with nested cast off elements SELECT JSON_CONTAINS ( CAST('{"person": {"id": 1, "country": "norway"}}' AS JSON), CAST('{"person": {"country": "norway"}}' AS JSON) ); # returns 1; reordering and duplicates are ok SELECT JSON_CONTAINS ( CAST('[1,3,5]' AS JSON), CAST('[5,3,1,5]' AS JSON) ); # return 0; no type conversion is performed SELECT JSON_CONTAINS ( CAST('[3.14]' AS JSON), CAST('[3]' AS JSON) ); # returns 1, due to auto-wrapping SELECT JSON_CONTAINS ( CAST('[1, 2, 3]' AS JSON), CAST(3 AS JSON) );JSON_CONTAINS_PATH()
This function checks for the existence of the specified path. Returns false if none of the paths identifies an existing object member or array slot. Otherwise, returns true or false depending on the oneOrAll directive.
BOOLEAN JSON_CONTAINS_PATH ( ANY_JSON doc, TINYTEXT oneOrAll, TEXT path [ (, TEXT path)* ] )Returns:
- null - Returns NULL if any of the arguments are null.
- true/false - Otherwise, returns true or false depending on the value of the oneOrAll argument and on whether the paths identify actual values in the document.
The oneOrAll argument takes these values:
- all - All of the paths must exist for the function to return true.
- one - At least one of the paths must exist for the function to return true.
Behavior:
- invalid - Raises an error if doc is not a valid json document or if any of the paths is not a valid path expression. Raises an error if oneOrAll is not 'one' or 'all'.
- satisfied - A wildcarded path is said to exist if some concrete expansion of the path exists.
For example:
# returns 0 because there is no element at $.a.c SELECT JSON_CONTAINS_PATH ( '{ "a" : 123, "b" : [ 123, 456 ] }', 'all', '$.a.c', '$.b[1]' ); # returns 1 because there is an element at $.b[1] SELECT JSON_CONTAINS_PATH ( '{ "a" : 123, "b" : [ 123, 456 ] }', 'one', '$.a.c', '$.b[1]' ); # returns 0 because there is no element at the given path SELECT JSON_CONTAINS_PATH ( '{ "a" : 123, "b" : [ 123, 456 ] }', 'all', '$.c' ); # returns 1 because there is an element at $.b[1].c.d SELECT JSON_CONTAINS_PATH ( '{ "a" : 123, "b" : [ 123, { "c" : { "d" : true } } ] }', 'all', '$.b[1].c.d' );JSON_LENGTH()
This function counts the number of members in an object or the number of slots in an array. The function takes an optional path argument. Without the path qualifier, the function operates on the top level object or array. With the path qualifier, the function operates on the object or array identified by the path.
INT JSON_LENGTH ( ANY_JSON doc [, TEXT path ] )Returns:
- null - Returns NULL if any of the arguments is null. Returns NULL if the path does not exist.
- count - Otherwise, returns:
- scalar - Returns 1 if the document and path identify a scalar atom.
- object - If the document and the path identify an object, then returns the number of its members.
- array- If the document and the path identify an array, then returns the number of cells in the array.
Behavior:
- invalid - Raises an error if doc is not a valid json document or if the path is not a valid path expression.
- wildcard - Raises an error if the path expression contains a * or ** wildcard token.
For example:
# returns 0 SELECT JSON_LENGTH ( '{}' ); # returns 1 SELECT JSON_LENGTH ( '3' ); # returns 2 SELECT JSON_LENGTH ( '{ "a" : 123, "b" : [ 123, 456, 789 ] }' ); # returns 3 SELECT JSON_LENGTH ( '{ "a" : 123, "b" : [ 123, 456, 789 ] }', '$.b' ); # returns null because the path does not exist SELECT JSON_LENGTH ( '{ "a" : 123, "b" : [ 123, 456, 789 ] }', '$.c' );JSON_DEPTH()
This function computes the maximum depth of a json document. Any empty object or array has depth 1. An object with members having atomic values has depth 2. And so on.
INT JSON_DEPTH ( ANY_JSON doc )Returns:
- null - Returns NULL if doc is NULL.
- number - Otherwise returns the depth of the document as described above.
Behavior:
- invalid - Raises an error if doc is not a valid json document.
For example:
# returns 1 SELECT JSON_DEPTH ( '{}' ); # returns 1 SELECT JSON_DEPTH ( '[]' ); # returns 1 SELECT JSON_DEPTH( '"abc"' ); # returns 1 SELECT JSON_DEPTH( CAST( '"abc"' AS JSON ) ); # error, 1 is not an ANY_JSON SELECT JSON_DEPTH( 1 ); # error, 'abc' does not parse as a JSON document SELECT JSON_DEPTH( 'abc' ); # returns 1 SELECT JSON_DEPTH( CAST( 1 AS JSON ) ); # returns 2 SELECT JSON_DEPTH ( '{ "a" : true, "b" : false, "c" : null }' ); # returns 2 SELECT JSON_DEPTH ( '[ "a", true, "b" , false, "c" , null ]' ); # returns 2 SELECT JSON_DEPTH ( '{ "a" : true, "b" : {}, "c" : null }' ); # returns 2 SELECT JSON_DEPTH ( '[ "a", true, "b" , {}, "c" , null ]' ); # returns 3 SELECT JSON_DEPTH ( '{ "a" : true, "b" : { "e" : false }, "c" : null }' ); # returns 3 SELECT JSON_DEPTH ( '[ "a", true, "b" , { "e" : false }, "c" , null ]' ); # raises an error because the document is not valid SELECT JSON_DEPTH ( '[ "a", true, "b" , { "e" : false }, "c" , null' );JSON_ARRAY()
This function builds a JSON array out of ANY_JSON_ATOMs.
JSON JSON_ARRAY ( [ ANY_JSON_ATOM atom [ (, ANY_JSON_ATOM atom)* ] ] )Returns a (possibly empty) JSON array.
Behavior:
- null - Atoms which evaluate as SQL NULLs are translated into JSON null literals.
For example:
# returns the empty array: [] SELECT JSON_ARRAY(); # returns ["Accounting", {"processed": true }] SELECT JSON_ARRAY( d.deptName, CAST( '{ "processed" : true }' AS JSON ) ) FROM department d WHERE id = 405; # returns ["Accounting", 405, true, 2014-11-04:00:00:00.000000] SELECT JSON_ARRAY( d.deptName, d.id, d.isExempt and true, current_date ) FROM department d WHERE id = 405; # returns '["Accounting", 405, true, "2014-11-04 00:00:00.000000"]' SELECT CAST( JSON_ARRAY( d.deptName, d.id, d.isExempt and true, current_date ) AS CHAR ) FROM department d WHERE id = 405; # returns ["Accounting", 405, true, 18:53:38] SELECT JSON_ARRAY( d.deptName, d.id, d.isExempt and true, current_time ) FROM department d WHERE id = 405; # stores a JSON value in a JSON-typed column INSERT INTO json_table( json_column ) SELECT JSON_ARRAY( d.deptName, d.id, d.blobColumn ) FROM department d WHERE id = 405;JSON_OBJECT()
This function builds a JSON object out of ANY_JSON_ATOMs.
JSON JSON_OBJECT ( [ TEXT keyName, ANY_JSON_ATOM atom [ (,TEXT keyName, ANY_JSON_ATOM atom)* ] ] )Returns a (possibly empty) JSON object.
Behavior:
- error - Raises an error if any of the key names are null.
- null - Atoms which evaluate as SQL NULLs are translated into JSON null literals.
For example:
# returns the empty object: {} SELECT JSON_OBJECT(); # returns {"deptName": "Accounting", "id": 405, "isExempt": true, "date": 2014-11-0400:00:00.000000} SELECT JSON_OBJECT ( 'deptName', d.deptName, 'id', d.id, 'isExempt', d.isExempt and true, 'date', current_date ) FROM department d WHERE id = 405;JSON_UNQUOTE()
This method strips double-quotes which frame a text value, and unescapes interior quotes (and some other characters) as necessary. It is used to turn a valid JSON string literal into a SQL string literal.
LONGTEXT JSON_UNQUOTE( LONGTEXT textValue ) LONGTEXT JSON_UNQUOTE( JSON jsonValue )Returns a string which is encoded in the utf8mb4 character set with the utf8mb4_bin collation:
- json - If the argument is a JSON value, then the value is implicitly serialized into a string as described above in CASTs to/from JSON Values. Then the following rules are applied.
- null - Returns NULL if textValue is NULL.
- implicit coercion - If textValue is not a utf8 character type, then it is implicitly coerced to utf8mb4.
- not quoted - Returns textValue untouched if textValue does not start and end with double-quotes.
- invalid - Raises an exception if textValue starts and ends with double-quotes but isn't a valid JSON string literal
- stripped - Otherwise, strips the framing double-quotes from textValue and unescapes interior double-quotes and other characters as described below.
C-style escape sequences and unicode escape sequences are unescaped as follows:
Common name C-style escape notation Escape sequence in UTF-8 bytes Transformed to unescaped UTF-8 bytes quote \" %x5C %x22 %x22 backslash \\ %x5C %x5C %x5C backspace \b %x5C %x62 %x08 formfeed \f %x5C %x66 %x0C linefeed \n %x5C %x6E %x0A carriage-return \r %x5C %x72 %x0D tab \t %x5C %x74 %x09 unicode \uXXXX %x5C %x75 followed by 4 hex digits UTF-8 bytes for corresponding unicode character For example:
# returns the SQL string literal abc SELECT JSON_UNQUOTE( '"abc"' ); # returns the SQL string literal "abc SELECT JSON_UNQUOTE( '"abc' ); # raises an error SELECT JSON_UNQUOTE( 123 ); # returns the SQL string literal abc SELECT JSON_UNQUOTE ( CAST( CAST( '"abc"' AS JSON ) AS CHAR ) ); # returns 1 SELECT JSON_UNQUOTE ( CAST( JSON_EXTRACT( '{ "userName" : "fred" }', '$.userName' ) AS CHAR ) ) = 'fred'; # returns 0 SELECT CAST( JSON_EXTRACT( '{ "userName" : "fred" }', '$.userName' ) AS CHAR ) = 'fred';JSON_QUOTE()
This method wraps double-quotes around a text value, escaping interior quotes (and some other characters) as necessary. It is used to turn a text value into a valid JSON string literal.
LONGTEXT JSON_QUOTE( LONGTEXT textValue )Returns a string which is encoded in the utf8mb4 character set with the utf8mb4_bin collation:
- null - Returns NULL if textValue is NULL.
- implicit coercion - If textValue is not a utf8 character type, then it is implicitly coerced to utf8mb4.
- quoted - Otherwise, wraps double-quotes around textValue and escapes any interior double-quotes. Other characters are escaped as described below.
The following UTF-8 byte sequences are transformed into C-style escape sequences and unicode escape sequences as follows:
Common name C-style escape notation Original unescaped UTF-8 bytes Transformed to escape sequence in UTF-8 bytes quote \" %x22 %x5C %x22 backslash \\ %x5C %x5C %x5C backspace \b %x08 %x5C %x62 formfeed \f %x0C %x5C %x66 linefeed \n %x0A %x5C %x6E carriage-return \r %x0D %x5C %x72 tab \t %x09 %x5C %x74 unicode \uXXXX A hex number in the range of 00-1F, except for the ones handled above (backspace, formfeed, linefeed, carriage-return, and tab). %x5C %x75 followed by 4 hex digits For example:
# returns "abc" SELECT JSON_QUOTE( 'abc' ); # raises an error SELECT JSON_QUOTE( 123 ); # returns the JSON document consisting of the string scalar "123" SELECT CAST( JSON_QUOTE( '123' ) AS JSON );
(Compiled by Dag Wanvik and Rick Hillegas)
------------------ Related Worklogs --------------------The implementation depends on other, related worklogs:
WL#8132 (JSON Storage) WL#8249 (JSON Comparator)
------------------ Abstractions --------------------The implementation is built around 4 major abstractions:
json_binary::Value
This is the optimized, in-memory representation of a JSON data value. For more information, see WL#8132.
Json_dom
This is a JSON data value unpacked into a tree of element nodes. Each node represents an object, an array, a number, a string, or a literal (true, false, null). Unpacking a JSON data value is an expensive operation.
Json_wrapper
This is an indirection layer which hides whether a JSON value is represented as a json_binary::Value or a Json_dom. It permits optimal drill-down into a JSON value without requiring unnecessary or excessive unpacking and parsing. A wrapper can serialize itself as a json text string or as a json_binary::Value.
Json_path
This is a parsed json path expression, represented as a sequence of path legs.------------------ Memory Management --------------------
json_binary::Value objects are mostly allocated on the stack. The seek() functionality, however, can copy them into a std::vector which keeps them on the heap for row-duration.
json_dom objects are allocated on the heap and are destroyed when the query cursor advances to the next row.
Json_wrappers are re-used across rows. They are allocated out of the query-duration MEMROOT. When a Json_wrapper is re-used, its old contents (json_binary::Value or Json_dom) are destroyed and new contents are poked into the wrapper.
Json_path objects are allocated on the stack and are destroyed when they go out of scope. A Json_path object contains a std::vector of Json_path_leg objects, and this vector allocates space on the heap. The space allocated by the vector is automatically freed when the Json_path object is destroyed.
------------------ JSON Data Value --------------------The JSON data value is implemented as a new method on the Item class: val_json(). The val_json() method returns a Json_wrapper via an output parameter. The val_json() method obeys this contract:
/** Get a JSON value from an Item.
All subclasses that can return a JSON value, should override this function. The function in the base class is not expected to be called. If it is called, it most likely means that some subclass is missing an override of val_json().
@param[in,out] str The buffer used for reading the text or binary representation of the JSON value. The buffer has to stay alive as long as the Json_wrapper is alive, since it may contain data pointed to by the Json_wrapper.
@param[in,out] result The resulting Json_wrapper.
@return true if successful, false on failure */ virtual bool val_json(String *str, Json_wrapper *result)------------------ JSN_*() Functions --------------------
The JSN_*() functions are modelled as subclasses of Item_func. The subclasses live in item_func_json.h/cc. All functions which return JSON values are modelled as subclasses of a new intermediate abstract class called Item_func_json:
These are the functions which return int values via Item_func.val_int():
Item_func
Item_int_func Item_func_json_valid (JSN_VALID) Item_func_json_length (JSN_LENGTH) Item_func_json_contains_path (JSN_CONTAINS_PATH) Item_func_json_depth (JSN_DEPTH)
These are the functions which return string values via Item_func.val_str():
Item_func
Item_str_func Item_func_json_type (JSN_TYPE) Item_func_json_search (JSN_SEARCH) Item_func_json_unquote (JSN_UNQUOTE) Item_func_json_quote (JSN_QUOTE)
These are the functions which return JSON values via Item_func.val_json():
Item_func
Item_func_json Item_func_json_keys (JSN_KEYS) Item_func_json_extract (JSN_EXTRACT) Item_func_json_remove (JSN_REMOVE) Item_func_json_append (JSN_APPEND) Item_func_json_set (JSN_SET) Item_func_json_insert (JSN_INSERT) Item_func_json_replace (JSN_REPLACE) Item_func_json_merge (JSN_MERGE) Item_func_json_array (JSN_ARRAY) Item_func_json_row_object (JSN_ROW_OBJECT) Item_json_typecast (implements CAST(... AS JSON))
Note that Item_func_json.val_str() serializes the JSON value as a string, when the function result is used in a string context.
The val_int(), val_str(), and val_json() methods perform argument validation, checking for nulls and parsing errors. The number of function arguments is verified earlier on in the create_native methods in item_create.cc. The complicated logic is then distributed as follows:
JSN_VALID
Forwarded to a helper method, json_is_valid().
JSN_LENGTH JSN_TYPE JSN_KEYS JSN_EXTRACT JSN_CONTAINS_PATH JSN_DEPTH
Forwarded to the Json_wrapper.
JSN_ARRAY JSN_ROW_OBJECT JSN_QUOTE JSN_UNQUOTE
Work done inline.
JSN_REMOVE JSN_APPEND JSN_SET JSN_REPLACE JSN_INSERT JSN_MERGE
Work is performed inline after materializing the Json_dom.
JSN_SEARCH
Search semantics handled by a fabricated Item_func_like node.------------------ Helper Functions --------------------
The JSN_*() functions rely on several helper functions in item_json_func.cc, including json_is_valid(), get_json_atom_wrapper(), parse_json() and parse_path():
/**
Helper method for Item_func_json_* methods. Check if a JSON item or JSON text is valid and, for the latter, optionally construct a DOM tree (i.e. only if valid).
@param[in] args Item_func::args alias @param[in] arg_idx Index (0-based) of argument into the args array @param[out] value Item_func_json_*::m_value alias @param[in] func_name Name of the user-invoked JSN_ function @param[out] need_parse True if we needed to parse to validate @param[in,out] dom If non-null, we want any text parsed DOM returned at the location pointed to @param[in] require_str_or_json If true, generate an error if other types used as input
@returns true if the text is valid JSON (or NULL), else false */
static bool json_is_valid(Item **args,
uint arg_idx, String &value, const char *func_name, bool &need_parse, Json_dom **dom, bool require_str_or_json)
/**
Get a JSON value in args[args_idx], either by an existing JSON value, or by parsing a string as a JSON text.
@param[in] args the arguments @param[in] arg_idx the argument index @param[in,out] str string buffer to use @param[in] func_name the name of the function we are executing @param[in,out] wr the resulting JSON value location
@result true if we found a value or NULL, false if not.
- /
bool get_json_wrapper(Item **args,
uint arg_idx, String *str, const char *func_name, Json_wrapper *wrapper)
/**
Get a SQL value in args[args_idx] and convert it to a JSON value, possibly a JSON scalar, so that it can be used to construct/update other JSON documents.
@param[in] args the arguments @param[in] arg_idx the argument index @param[in,out] value string buffer to use @param[in,out] tmp temporary scratch space for converting strings to the correct charset; only used if accept_string is true and conversion is needed @param[in,out] wr the resulting JSON value location @param[in,out] scalar optional location to store a parsed JSON boolean @param[in] accept_string true if the value is not to be treated and parsed as JSON text string
@result true if we found a value or NULL, false if not.
- /
bool get_json_atom_wrapper(Item **args,
uint arg_idx, String *value, String *tmp, Json_wrapper *wr, Json_scalar_holder *scalar, bool accept_string)
/**
Helper method for Item_func_json_* methods. Assumes that the caller has already verified that the path expression is not null. Raises an error if the path expression is syntactically incorrect. Raises an error if the path expression contains wildcard tokens but is not supposed to. Otherwise returns a Json_path object.
@param[in] path_expression A string Item to be interpreted as a path. @param[out] value Holder for path string @param[in] forbid_wildcards True if the path shouldn't contain * or **.
@returns The parsed path. NULL if an error occurred, or a NULL string */
static Json_path *parse_path(Item * path_expression, String &value,
my_bool forbid_wildcards)------------------ JSON parsing --------------------
The JSON parsing is done via a lightly modified verson of the open source Rapidjson https://github.com/miloyip/rapidjson project.
The parsing of JSON text happens in parse_json() and Json_dom::parse(const char *, size_t , const char **, size_t *).
If we do not need a DOM to be constructed, the parsing happens by calling rapidjson's method
document.AcceptAnyRoot(true).ParseStream(StringStreamSafe)
which return true if the text is valid JSON. This is used by JSN_VALID, since it this case, we do not need a DOM. If we need a DOM, cf. the "dom" argument to json_is_valid, the rapidjson parser is invoked via a SAX style parser, that is, we call rapidjson with our callback class (Rapid_json_handler which implements the callback interface BaseReaderHandler).
reader.AcceptAnyRoot().Parse(ss, handler);
Rapid_json_handler lives in json_dom.cc The callbacks incrementally builds a DOM, cf. Rapid_json_handler::get_built_doc().
------------------ Json_wrapper.seek() --------------------JSN_*() functions with path arguments position into JSON values via Json_wrapper.seek(). When the path contains an ellipsis, this method materializes the DOM because duplicate detection with Json_binary::Value is difficult. When this happens, the method forwards to Json_dom.seek().
The approach is a little tricky due to the ellipsis token which can appear in path expressions. The ellipsis token is a MySQL extension and is not part of the SQL Standard. The ellipsis token gives rise to the following situation, which is described by this header comment on Json_wrapper.seek():
Special care must be taken when the path expression contains more than one ellipsis (**) token. That is because multiple paths with ellipses may identify the same value. Consider the following document:
{ "a": { "x" : { "b": { "y": { "b": { "z": { "c": 100 } } } } } } }
The innermost value (the number 100) has the following unique, non-wildcarded address:
$.a.x.b.y.b.z.c
That location is reached by both of the following paths which include the ellipsis token:
$.a.x.b.**.c $.a.x.b.y.b.**.c
And those addresses both satisfy the following path expression which has two ellipses:
$.a.**.b.**.c
In this case, we only want to return one instance of $.a.x.b.y.b.z.c
The approach taken by Json_dom.seek() is to maintain two vectors of matches as the legs of the path are walked in sequence. The vectors have these meanings:
doms - This is the list of elements which match the first N path legs so far. This result was computed by the previous loop iteration.
candidates - This is the result which is being computed. That is, the list of elements which match the first N+1 legs. Duplicates are eliminated as the candidate vector is filled in.
In order to avoid unnecessary creation of vectors, the two vectors are swapped with each loop iteration.
The seek() method relies, in turn, on a recursive Json_dom.find_child_doms() method. This method handles the tricky semantics of the ellipsis token.