WL#9192: Add JSON_STORAGE_SIZE / JSON_STORAGE_FREE functions

Affects: Server-8.0   —   Status: Complete

It is not currently possible to be able to tell the disk-size of the JSON native type. In most cases, the size will be similar to text-based (since there is no compression etc), but in in-person presentations of the new JSON functionality, wanting to know actual size is one of the top requests.

Since the type also supports future in-place update (WL#8963), there is a potential for fragmentation (for example, there could be an in-place delete, or a long string in the middle of the document could have been replaced with a shorter one).

I propose the following two functions:

1) JSON_STORAGE_SIZE(jsn) : Accepts either a json string or json binary type. For a string it creates an intermediate binary representation of and returns the required storage size in bytes. For a native-type it returns the actual size in bytes.

2) JSON_STORAGE_FREE(jsn) : Returns the free space of a json binary type in bytes (including fragmentation and padding saved for inplace update). For a JSON string, create an intermediate binary representation and return the padding that is taken for this object.

F-1: The functions should return NULL when the input is NULL.

F-2: The functions should raise an error if the input does not evaluate to a JSON value or a string that contains valid JSON text.

F-3: If the input is a JSON value on binary form, the functions should return the actual size or free space in that binary value.

F-4: If the input is a JSON value that is not on binary form, the functions should return the size or free space that would be in a binary JSON value created by converting the input value to the binary format.

F-5: If the input is a character string that is a valid JSON text, the functions should return the size or free space that would be in a binary JSON value created by parsing the input as JSON and converting the result to the binary format.

We will add the functions JSON_STORAGE_SIZE and JSON_STORAGE_FREE as described below.

JSON_STORAGE_SIZE

This function calculates the size in bytes of the binary representation of a JSON value.

INT JSON_STORAGE_SIZE(doc JSON)
INT JSON_STORAGE_SIZE(doc TEXT)

Returns:

  • null - Returns NULL if doc is NULL.
  • actual size - Returns the actual storage size of doc if the type of doc is JSON and it refers to a binary document in a table.
  • predicted size - Otherwise, doc is parsed as JSON (if its type is not already JSON) and converted to a JSON binary representation, and the size of the binary representation is returned.

Raises an error if doc is a string which is not a valid JSON text.

Examples:

# Returns NULL.
SELECT JSON_STORAGE_SIZE(NULL);
# Returns the size used by each json_col in t1.
SELECT JSON_STORAGE_SIZE(json_col) FROM t1;
# Returns the size of the 1st array element in each json_col in t1.
SELECT JSON_STORAGE_SIZE(json_col->'$[0]') FROM t1;
# Parses text_col, converts it to binary, and returns the size of the
# generated binary representation. Raises an error if text_col
# contains text that is not NULL and not valid JSON.
SELECT JSON_STORAGE_SIZE(text_col) FROM t1;
# Generates the binary representation of a JSON array and returns its size.
SELECT JSON_STORAGE_SIZE(JSON_ARRAY(1, 2, 3));
# Parses the JSON text, converts it to binary JSON, and returns the
# size of the generated binary representation.
SELECT JSON_STORAGE_SIZE('[1, 2, 3]');
# Raises an error.
SELECT JSON_STORAGE_SIZE('this is not JSON');

JSON_STORAGE_FREE

INT JSON_STORAGE_FREE(doc JSON)
INT JSON_STORAGE_FREE(doc TEXT)

Returns:

  • null - Returns NULL if doc is NULL.
  • actual fragmentation - Returns the actual amount of unused space in the stored representation of doc if the type of doc is JSON and it refers to a binary value stored in a table.
  • predicted fragmentation - Otherwise, returns the amount of unused space in a JSON binary representation created by parsing doc as JSON (if its type is not already JSON) and converting it to binary.

Raises an error if doc is a string which is not a valid JSON text.

At the moment, predicted fragmentation is always zero, since no padding is added to the binary representation when a JSON value is serialized. If padding of values is added at some point, predicted fragmentation will be the amount of padding that is added if doc is converted to JSON binary representation.

actual fragmentation can be greater than zero if a partial update (WL#8963) has been performed on the column and replaced a larger value with a smaller one.

Examples:

# Returns NULL.
SELECT JSON_STORAGE_FREE(NULL);
# Returns the amount of free space in each json_col in t1.
SELECT JSON_STORAGE_FREE(json_col) FROM t1;
# Returns the amount of free space that would be in a JSON binary
# created by first parsing text_col as JSON and then converting it to
# binary. Should be zero as long as no padding is added by default.
# (Raises an error if any of the text_col values contains invalid
# JSON. Returns NULL for the rows where text_col is NULL.)
SELECT JSON_STORAGE_FREE(text_col) FROM t1;
# Returns the amount of free space that would be in a JSON binary
# created by serializing the JSON array. Should be zero when there is
# no padding.
SELECT JSON_STORAGE_FREE(JSON_ARRAY(1, 2, 3));
# Returns the amount of free space that would be in a JSON binary
# created by parsing the input text as JSON and serializing the JSON
# array. Should be zero when there is no padding.
SELECT JSON_STORAGE_FREE('[1, 2, 3]');
# Raises an error.
SELECT JSON_STORAGE_FREE('this is not JSON');

New Item classes

Two new classes will be added: Item_func_json_storage_size and Item_func_json_storage_free. Both of them will inherit from Item_func_int and override the val_int() function, which will calculate the value to return.

The classes will live in sql/item_json_func.h and sql/item_json_func.cc.

The new functions will be wired into the parser by adding entries for them in the func_array array in sql/item_create.cc.

Item_func_json_storage_size::val_int() will calculate the size of the binary by using the Field_blob::data_size() function if the function argument is a reference to a JSON column, or by serializing the argument with Json_wrapper::to_binary() otherwise.

Item_func_json_storage_free::val_int() will use a new function Json_wrapper::get_free_space() for calculating the amount of free space in a JSON binary value.

Changes to the json_binary::Value class

A new function is added to the json_binary::Value class:

/**
  Get the amount of unused space in the binary representation of this value.
  @param      thd    THD handle
  @param[out] space  the amount of free space
  @return false on success, true on error
*/
bool get_free_space(const THD *thd, size_t *space) const

Changes to the Json_wrapper class

/**
  Calculate the amount of unused space inside a JSON binary value.
  @param[out] space  the amount of unused space, or zero if this is a DOM
  @return false on success
  @return true if the JSON binary value was invalid
*/
bool get_free_space(size_t *space) const;

The actual calculation happens in json_binary::Value::get_free_space(). If the wrapper is wrapping a DOM rather than a binary, this function will return 0 (zero) free space.