WL#8249: JSON comparator

Affects: Server-Prototype Only   —   Status: Complete

Goal of this WL is to introduce JSON comparator, similar to DATE/TIME/DATETIME comparator, that would allow comparisons of JSON scalars vs SQL constants, JSON scalars vs JSON scalars. The comparator would rely on the DOM built in scope of WL#7909.

The comparator will convert the SQL constant to a JSON scalar and compare the values. Rules are as following:
JSON scalar vs JSON scalar) true if the two scalars have the same type and the same value
JSON array/object vs JSON scalar) always false
JSON array vs JSON array) true if length and contents match
JSON object vs JSON object) true if they have the same set of keys, and the same associated values

It is not a goal of this WL to implement support for ordering JSON values using ORDER BY.

Comparing JSON values with other JSON values

The JSON comparator will provide a total order of all JSON values. This order can be used to determine if one JSON value is smaller than, equal to or greater than another JSON value. This can be used for comparing JSON values with the =, <, <=, >, >=, <> and <=> operators.

The primary order of JSON values will be: null < number < string < object < array < boolean < date < time < datetime/timestamp < opaque

Internally, in each category, the ordering will be as described below.

Number

JSON values can contain both exact numbers (int/uint/decimal) and inexact numbers (double). If we use a mix of MySQL's existing comparators for the respective types, we get inconsistent ordering. For example, given the scalars 9223372036854775805, 9223372036854775806, 9223372036854775807 and 9.223372036854776E18, the existing comparators for comparing numbers would give these orderings:

int comparator:
  9223372036854775805 < 9223372036854775806 < 9223372036854775807
  (not defined for 9.223372036854776E18)

double comparator:
  9223372036854775805 = 9223372036854775806 = 9223372036854775807 = 9.223372036854776E18

When comparing inexact numbers with exact numbers, we will convert the inexact number to an exact number, so that the ordering will be consistent and not lose precision for the exact numbers. The order will be such as this:

JSON comparator:
  9223372036854775805 < 9223372036854775806 < 9223372036854775807 < 9.223372036854776E18 = 9223372036854776000 < 9223372036854776001

String

Strings will be ordered lexicographically by calling memcmp() on the first N bytes of the utf8mb4 representation of the two strings being compared, where N is the length of the shorter string. If the first N bytes of the two strings are identical, the shorter string is considered smaller than the longer string. For example: "a" < "ab" < "b" < "bc"

This ordering is equivalent to the ordering of SQL strings with collation utf8mb4_bin.

Object

Two JSON objects are equal if they have the same set of keys, and each value is equal to the value associated with the same key in the other object. The order of two objects that are not equal, is unspecified, but the order is deterministic.

Array

Two JSON arrays are equal if they have the same length, and each value in the first array is equal to the value on the same position in the other array.

If they are not equal, their order is determined by the elements on the first position where there is a difference. The array with the smaller value on that position is ordered first. If all values of the shorter array are equal to the corresponding values in the longer array, the shorter array is ordered first.

Examples:

[] < ["a"] < ["ab"] < ["ab", "cd", "ef"] < ["ab", "ef"]

Boolean

false < true

Date

The earlier date is ordered before the more recent date.

Time

The smaller of the two time values is ordered before the larger one.

Datetime/timestamp

A value that represents an earlier point in time, is ordered before a value that represents a later point in time. A datetime value that represents the same point in time as timestamp value, is equal to that timestamp value.

Opaque

If the field types of two opaque values are different, the value with the smaller field type is ordered before the other value. Otherwise, the first N bytes of the two values are compared by memcmp(), where N is the number of bytes in the shorter value. If there is no difference in the first N bytes, the shorter value is ordered before the longer value.

Comparing JSON values with SQL values

If a JSON value is compared with an SQL value, the SQL value is converted to JSON, and the two JSON values are compared as described above. The conversion from SQL to JSON is performed using the rules of ANY_JSON_ATOM in WL#7909.

If a JSON value is compared with SQL NULL, the result is UNKNOWN.

Examples:

# returns true
SELECT CAST(1 AS JSON) = 1;
# returns false
SELECT CAST(1 AS JSON) = '1.0';
# returns true
SELECT JSN_EXTRACT('{"a":"b"}', '$.a') = 'b';
# returns unknown
SELECT JSN_EXTRACT('{"a":"b"}', '$.a') = NULL;

The Json_wrapper class will be extended with a new member function that compares two JSON values according to the rules laid out in the high-level specification above:

 /**
   Compare this JSON value to another JSON value.
   @param[in] other the other JSON value
   @retval -1 if this JSON value is less than the other JSON value
   @retval 0 if the two JSON values are equal
   @retval 1 if this JSON value is greater than the other JSON value
 */
 int compare(const Json_wrapper &other) const;

The Arg_comparator class will get a new member function called compare_json(), which will use Json_wrapper::compare() to compare the two arguments Arg_comparator::a and Arg_comparator::b.

Arg_comparator::compare_json() will transform the arguments a and b to Json_wrapper objects by using the function get_json_atom_wrapper() provided by WL#7909. The returned Json_wrapper objects could wrap a JSON binary value, a JSON DOM or an SQL value that is implicitly converted to a JSON scalar value, so that compare_json() doesn't need any special logic to handle the differences between the various ways a JSON value could be represented.

The decision about which comparator to use, is made by Arg_comparator::set_cmp_func(). We will change this function so that it chooses Arg_comparator::compare_json() whenever at least one of the arguments to the comparator has field type MYSQL_TYPE_JSON.

Use of indexes is disabled for predicates that compare a JSON value to a value of some other SQL type. This is because the index would follow the comparison rules of the data type of the column on which it is defined, which are different from the rules of the JSON comparator. The disabling happens in opt_range.cc/comparable_in_index() for range scans, and in sql_optimizer.cc/add_key_field() for ref access.

Optimizations:

1) To allow quick comparison of values where the types differ (for example when comparing a boolean to an object), we will add a two-dimensional array that tells if the two values are known to be different without looking at their contents, like this:

 int cmp= type_comparison[a->type()][b->type()]
 if (cmp != 0)
   return cmp;

If the array cell is 0, it means either that a and b are of the same type, or that they have similar types (like integer and double) that can be compared, and their contents will have to be examined. If it's not 0, the comparison can return immediately.

2) The get_json_atom_wrapper() function currently allocates a Json_dom object on the heap if it's called on an Item that represents an SQL value. This would cause a predicate such as json_col = sql_col to allocate one Json_dom object per row in order to transform sql_col into a JSON value. To avoid this, we will add a new parameter to get_json_atom_wrapper(), in which the caller can provide a pointer to pre-allocated memory where the Json_dom object can be put instead of the heap. The pre-allocated memory will be provided in the form of a boost::variant over all the scalar sub-classes of Json_dom, in a struct called Json_scalar_holder:

 struct Json_scalar_holder : public Sql_alloc
 {
   boost::variant<boost::blank, Json_string, Json_decimal, Json_int, Json_uint,
                  Json_double, Json_boolean, Json_null, Json_datetime,
                  Json_opaque> m_val;
 };

3) In order to improve the performance in the common case where a JSON value is compared to an SQL literal, the Json_scalar_holder object will be cached in an instance variable in Arg_comparator. When the comparator detects that the SQL value is a constant (by calling Item::const_item()) and that it has Json_scalar_holder value cached from the previous row, it will use the cached value directly and avoid the SQL to JSON conversion for all except the first row.