WL#8867: Add JSON table functions
Affects: Server-8.0
Status: Complete
JSON_TABLE creates a relational view of JSON data. It maps the result of a
JSON data evaluation into relational rows and columns. You can query the
result returned by the function as a regular relational table using SQL.
There is a need to pick JSON data apart into several rows in order to use
SQL effectively on the data, e.g. to join, project, or aggregate it.

Examples

  select * from json_table(
    '[{"a":"3"},{"a":2},{"b":1},{"a":0}]',
    "$[*]" columns (id for ordinality,
                    jpath varchar(100) path "$.a",
                    jexst int exists path '$.b')
  ) as tt;

  +----+-------+-------+
  | id | jpath | jexst |
  +----+-------+-------+
  |  1 | 3     |     0 |
  |  2 | 2     |     0 |
  |  3 | NULL  |     1 |
  |  4 | 0     |     0 |
  +----+-------+-------+

  select * from json_table(
    '[ {"a":"3", "n": ["b","a","c"]},
       {"a":2, "n": [1,2]},
       {"b":1, "n": ["zzz"]},
       {"a":0, "n": [0.1, 0.02]} ]',
    "$[*]" columns (
      id for ordinality,
      jpath json path "$.a",
      jexst int exists path '$.b',
      nested path "$.n[*]" columns (
        id_n1 for ordinality, jpath_n1 json path "$"),
      nested path "$.n[*]" columns (
        id_n2 for ordinality, jpath_n2 json path "$")
    )
  ) as tt;

  +------+-------+-------+-------+----------+-------+----------+
  | id   | jpath | jexst | id_n1 | jpath_n1 | id_n2 | jpath_n2 |
  +------+-------+-------+-------+----------+-------+----------+
  |    1 | "3"   |     0 |     1 | "b"      |  NULL | NULL     |
  |    1 | "3"   |     0 |     2 | "a"      |  NULL | NULL     |
  |    1 | "3"   |     0 |     3 | "c"      |  NULL | NULL     |
  |    1 | "3"   |     0 |  NULL | NULL     |     1 | "b"      |
  |    1 | "3"   |     0 |  NULL | NULL     |     2 | "a"      |
  |    1 | "3"   |     0 |  NULL | NULL     |     3 | "c"      |
  |    2 | 2     |     0 |     1 | 1        |  NULL | NULL     |
  |    2 | 2     |     0 |     2 | 2        |  NULL | NULL     |
  |    2 | 2     |     0 |  NULL | NULL     |     1 | 1        |
  |    2 | 2     |     0 |  NULL | NULL     |     2 | 2        |
  |    3 | NULL  |     1 |     1 | "zzz"    |  NULL | NULL     |
  |    3 | NULL  |     1 |  NULL | NULL     |     1 | "zzz"    |
  |    4 | 0     |     0 |     1 | 0.1      |  NULL | NULL     |
  |    4 | 0     |     0 |     2 | 0.02     |  NULL | NULL     |
  |    4 | 0     |     0 |  NULL | NULL     |     1 | 0.1      |
  |    4 | 0     |     0 |  NULL | NULL     |     2 | 0.02     |
  +------+-------+-------+-------+----------+-------+----------+
  16 rows in set (0,00 sec)

Other popular databases have similar capabilities, e.g. PostgreSQL
(json_each [1]), SQL Server (OPENJSON [2]), Oracle DB (JSON_TABLE [3]).
This implementation is designed to closely follow Oracle's, but is somewhat
more limited. What we miss compared to Oracle:
*) The JSON query wrapper clause for a JSON QUERY column. Same as above, we
   rely on the already existing implicit conversion.
*) The JSON_TABLE ON ERROR clause. Currently we always throw an error from
   JSON functions on incorrect input, so for consistency reasons this one is
   dropped.

A related (inverse) capability not covered in this worklog is the ability to
compose a single JSON value based on a result set, cf. the JSON aggregation
functions [4].

Besides the JSON_TABLE function, this WL introduces LATERAL JOIN for table
functions. Since there's currently only one table function, it will be used
only for JSON_TABLE.

[1] http://www.postgresql.org/docs/9.5/static/functions-json.html
[2] http://blogs.msdn.com/b/jocapc/archive/2015/05/16/json-support-in-sql-server-2016.aspx
[3] https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6287
Functional requirements
-----------------------
F1) This WL should add the JSON_TABLE function, which is able to represent a
    JSON document as a relational table according to the HLS specification.
F2) The JSON_TABLE function should accept any JSON- and string-typed
    expression as the data source.
F3) JSON_TABLE should be able to refer to fields specified prior to it in
    the FROM clause, except when JSON_TABLE is RIGHT JOIN'ed to a table.
F4) JSON_TABLE should accept any string-typed, valid JSON path expression.
F5) JSON_TABLE should use an in-memory tmp table, and overflow to disk when
    needed.
F6) An attempt to save non-scalar data (an array or object) to a scalar (an
    SQL-typed) field should trigger the ON ERROR clause.
F7) A missing value for a field should trigger the ON MISSING clause.
F8) Default values for the ON MISSING and ON ERROR clauses are both given as
    a valid JSON doc in a string. Specifying a non-scalar default value for
    a scalar field should throw an error (see the sketch after the
    requirements).

Non-functional requirements
---------------------------
NF1) WL shouldn't introduce bugs or regressions
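A minimal, hedged sketch of F6 and F8 (the JSON literals and column names
below are made up for illustration; the expected outcomes follow the
requirements above):

  -- F6: "a" is an array, so it cannot be coerced into the scalar INT
  -- column; the ON ERROR clause fires and the default '0' is stored instead
  SELECT *
  FROM JSON_TABLE('[{"a": [1,2]}]', '$[*]'
         COLUMNS (a INT PATH '$.a' DEFAULT '0' ON ERROR)) AS jt;

  -- F8: a non-scalar default for a scalar column should itself raise an
  -- error, so this statement is expected to fail
  SELECT *
  FROM JSON_TABLE('[{"a": 1}]', '$[*]'
         COLUMNS (a INT PATH '$.a' DEFAULT '[1,2]' ON ERROR)) AS jt;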
Overview
--------
This WL introduces a function of a new kind, a table function, called
JSON_TABLE. It is used to extract data from a JSON document and form a
relational table that can be processed further using SQL. It is a virtual
table; in this sense it is like a derived table, the only difference being
how the result data is obtained. Just like a derived table, it is specified
in the FROM clause, uses a tmp table to store its result, and is able to
provide generated indexes for ref access.

In order to feed data to JSON_TABLE, this WL introduces LATERAL JOIN. It is
implicit, thus doesn't introduce any new syntax, and is used only for
joining table functions.

New syntax
----------
JSON_TABLE(<JSON data source expr>, <string path> <columns clause>)
  [AS] <alias>

columns_clause:
  COLUMNS ( <columns_list> )
;

columns_list:
  column
| columns_list , column
;

column:
  <name> FOR ORDINALITY
| <name> <type> PATH <string path> [ <on empty> ] [ <on error> ]
| <name> <type> EXISTS PATH <string path>
| NESTED [ PATH ] <string path> <columns clause>
;

on_error:
  NULL ON ERROR
| ERROR ON ERROR
| DEFAULT <string> ON ERROR
;

on_empty:
  NULL ON EMPTY
| ERROR ON EMPTY
| DEFAULT <string> ON EMPTY
;

The alias is mandatory for a table function.

JSON data source expr: an expression that returns JSON data for JSON_TABLE.
E.g. a constant ('{"a":1}'), a column (t1.jsn_data, given that table t1 is
specified prior to JSON_TABLE in the FROM clause), or a function call
(JSON_EXTRACT(t1.jsn_data, '$.post.comments')).

String path: a constant string containing a JSON path expression. The path
is applied to the data source. The JSON value that matches the path is
called the row source and is used to generate a row of relational data. The
COLUMNS clause evaluates the row source, finds specific JSON values within
the row source, and returns those JSON values as SQL values in individual
columns of a row of relational data.

There are 4 types of columns supported by JSON_TABLE:

1) <name> FOR ORDINALITY

This column enumerates rows produced by the COLUMNS clause. It can be used
to distinguish parent rows that have the same values when multiple rows are
generated by a NESTED PATH clause. The count starts from 1; the column type
is UNSIGNED INT.

2) <name> <type> PATH <string path> [ <on empty> ] [ <on error> ]

Columns of this type are used to extract scalar values specified by
<string path>. <type> is a MySQL type and specifies the column type.
JSON_TABLE extracts the data as JSON and then, using the regular automatic
conversion applicable to JSON data, coerces it to the column type. Depending
on the column type, JSON_TABLE behaves slightly differently:

If the column type is an SQL type, then only a scalar value can be saved in
the column. Saving an object/array triggers the 'on error' clause. The
'on error' clause is also triggered when an error occurs during coercion of
the value saved in JSON to the table field, e.g. saving 'asd' to an integer
field. A missing value triggers the 'on empty' clause. A key for ref access
can be generated on such a column, similar to the automatically generated
keys for derived tables.

If the column type is varchar, then in addition to the above, the saved
values are unquoted.

If the column type is JSON, then only missing values trigger the 'on empty'
clause. Objects/arrays are saved as is. Ref keys can't be generated on such
a column.

The optional <on error> clause defines what JSON_TABLE does when an error
occurs, e.g. when an object/array is saved to a scalar column (the exact
conditions depend on the column type, see above):

  ERROR: an error is thrown.
  NULL: the column is set to NULL; this is the default behavior. If an error
    occurs during coercion, a warning is thrown.
  DEFAULT <string>: the given string is parsed as JSON (it has to be valid)
    and stored instead of the object/array. A warning is thrown if the error
    is caused by coercion. Column type rules also apply to the default
    value.
When a value saved to a field gets truncated, e.g. saving 3.14159 into a
DECIMAL(10,1) field, a warning is issued independently of the ON ERROR
clause. When the default value gets truncated, the warning is issued only
once per statement.

The optional <on empty> clause defines what JSON_TABLE does when data is
missing (depending on the column type). This clause is also triggered for a
column in a NESTED PATH clause when the latter doesn't have a match and a
NULL complemented row is produced for it. The clause takes the following
values:

  ERROR: an error is thrown.
  NULL: the column is set to NULL; this is the default behavior.
  DEFAULT <string>: the given string is parsed as JSON (it has to be valid)
    and stored instead of the missing value. Column type rules also apply to
    the default value.

  select * from json_table(
    '[{"a":"3"},{"a":2},{"b":1},{"a":0},{"a":[1,2]}]',
    "$[*]" columns (
      id for ordinality,
      jpath varchar(100) path "$.a" default '999' on error
                                    default '111' on empty,
      jsn_path json path "$.a" default '{"x": 333}' on empty,
      jexst int exists path '$.b')
  ) as tt;

  +------+-------+------------+-------+
  | id   | jpath | jsn_path   | jexst |
  +------+-------+------------+-------+
  |    1 | 3     | "3"        |     0 |
  |    2 | 2     | 2          |     0 |
  |    3 | 111   | {"x": 333} |     1 |
  |    4 | 0     | 0          |     0 |
  |    5 | 999   | [1, 2]     |     0 |
  +------+-------+------------+-------+

3) <name> <type> EXISTS PATH <string path>

This column returns either 0 or 1 depending on whether data is present at
the location specified by <string path>. <type> is the same as in the
previous column type.

4) NESTED [ PATH ] <string path> <columns clause>

This clause allows flattening nested objects or arrays in JSON data into a
single row, along with the JSON values from the parent object or array.
Multiple PATH clauses allow projecting JSON values from multiple levels of
nesting into a single row. The path is relative to the parent path - the row
path of JSON_TABLE, or the path of the parent NESTED PATH clause in case of
deeper nesting.

Column names are subject to the same rules and limitations as the field
names of regular tables.

Behaviour
---------
All constant strings are checked to be valid JSON/JSON path; an error is
thrown otherwise.

How records are produced
------------------------
Each match of the path in the COLUMNS clause results in a single row of the
result table. E.g.

  SELECT *
  FROM JSON_TABLE('[1,2,3]', '$[*]' COLUMNS (num INT PATH '$')) AS jt;

results in

  +------+
  | num  |
  +------+
  |    1 |
  |    2 |
  |    3 |
  +------+

Here '$[*]' matches each element of the array, and '$' saves its value in
the result field.

A NESTED PATH clause produces a set of records for each match in the COLUMNS
clause it belongs to. If there's no match, all columns of the NESTED PATH
are set to NULL. This implements an OUTER join between the top clause and
the NESTED PATH. An INNER join can be emulated by applying a condition in
the WHERE clause (see the additional example after the EXPLAIN section).
E.g.

  SELECT *
  FROM JSON_TABLE(
    '[ {"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}, {"a":3}]',
    '$[*]' COLUMNS (
      a INT PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$'))) AS jt;

would yield

  +------+------+
  | a    | b    |
  +------+------+
  |    1 |   11 |
  |    1 |  111 |
  |    2 |   22 |
  |    2 |  222 |
  |    3 | NULL |
  +------+------+

When there are two or more NESTED PATH clauses in the same COLUMNS clause,
they are called sibling NESTED PATHs and are processed one after another,
one at a time. While one NESTED PATH is producing records, the columns of
sibling NESTED PATH expressions are set to NULL. As a result, the total
number of records for a single match in the containing COLUMNS clause is the
sum, not the product, of all records produced by the NESTED PATHs.
For example:

  SELECT *
  FROM JSON_TABLE(
    '[ {"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]',
    '$[*]' COLUMNS (
      a INT PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (b1 INT PATH '$'),
      NESTED PATH '$.b[*]' COLUMNS (b2 INT PATH '$'))) AS jt;

returns:

  +------+------+------+
  | a    | b1   | b2   |
  +------+------+------+
  |    1 |   11 | NULL |
  |    1 |  111 | NULL |
  |    1 | NULL |   11 |
  |    1 | NULL |  111 |
  |    2 |   22 | NULL |
  |    2 |  222 | NULL |
  |    2 | NULL |   22 |
  |    2 | NULL |  222 |
  +------+------+------+

The ORDINALITY column enumerates records produced by the COLUMNS clause and
can be used to distinguish parent records of a NESTED PATH, especially if
the values in the parent records are the same:

  SELECT *
  FROM JSON_TABLE(
    '[{"a": "a_val",
       "b": [ {"c": "c_val", "l": [1,2]} ]
      },
      {"a": "a_val",
       "b": [ {"c": "c_val", "l": [11]}, {"c": "c_val", "l": [22]} ]
      }]',
    '$[*]' COLUMNS (
      top_ord FOR ORDINALITY,
      apath VARCHAR(10) PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (
        bpath VARCHAR(10) PATH '$.c',
        ord FOR ORDINALITY,
        NESTED PATH '$.l[*]' COLUMNS (lpath varchar(10) PATH '$')
      )
    )) as jt;

results in the following set:

  +---------+---------+---------+------+-------+
  | top_ord | apath   | bpath   | ord  | lpath |
  +---------+---------+---------+------+-------+
  |       1 | a_val   | c_val   |    1 | 1     |
  |       1 | a_val   | c_val   |    1 | 2     |
  |       2 | a_val   | c_val   |    1 | 11    |
  |       2 | a_val   | c_val   |    2 | 22    |
  +---------+---------+---------+------+-------+

The source document contains an array of two elements, and each element
produces two records. The apath and bpath values are the same over the whole
result set and can't be used to determine whether the lpath values came from
the same parent or from different ones. The ord column tells that the first
two values came from a single object, since its value is the same over the
set of records with top_ord == 1. The second two values are from different
objects, since ord's values differ over the set of records with
top_ord == 2.

Execution algorithm
-------------------
The approach above is implemented in the following way. The execution
algorithm uses an array of states, a flattened list of fields from all
COLUMNS clauses, and a stack of columns. For each NESTED PATH clause a
single state element is used. Each field points to the appropriate state;
this is set up during fix_fields. A NESTED PATH column points to two states,
its own and its children's. The stack is used to deal with NESTED PATH
clauses, both nested and sibling. Sibling NESTED PATH columns are linked via
the next_nested/prev_nested fields. Consider the example:

  JSON_TABLE('...', '$[*]' COLUMNS(
    id1 FOR ORDINALITY,
    NESTED PATH '$.a' COLUMNS ( id2 FOR ORDINALITY),
    NESTED PATH '$.b' COLUMNS ( id3 FOR ORDINALITY)
  )) AS JT

The in-memory layout will be the following (own/chld denote which state
element a field uses as its own state and as its children's state):

  states:  | 1 | 2 | 3 | 4 |

  fields:  | NP '$[*]' | id1 | NP '$.a' | id2 | NP '$.b' | id3 |
           | own=1     |  2  | own=2    |  3  | own=2    |  4  |
           | chld=2    |     | chld=3   |     | chld=4   |     |

The execution consists of two loops, main and stack. The former starts
fetching the data and fills a row for storing in the result table. The
latter continues fetching data. The stack loop starts right after the main
one. Execution starts with obtaining the JSON value from JSON_TABLE's source
argument and saving it to state #1.
Main loop

Item_func_json_table::fill_list() starts from the first column in the list,
which is the root NESTED PATH (the top COLUMNS clause). It seeks the given
path and saves the array of matches and the first matched value in the
child's state - #2. It also sets state #2's rowid to 1. Then fill_list()
pushes the current NESTED PATH (the root one) to the stack and goes to the
next field. It's the ORDINALITY column; it takes the rowid from state #2 and
saves it to the result item. Next is a NESTED PATH; it seeks its own path
'$.a' in the current value in state #2 and saves the array of matches and
the first matched value to the child's state - #3. Again, fill_list() pushes
the current NESTED PATH (the first sibling) to the stack (now it's
([NP1],[NP root])). Then it's again an ORDINALITY column. The next is
NESTED PATH sibling #2. It checks whether there's a previous sibling
currently running (supplying records), and there is - NESTED PATH #1 - thus
all fields (including nested ones) of NESTED PATH #2 are set to NULL and
skipped. fill_list() reaches the end of the list of fields and writes the
row to the result table.

Stack loop

The top element is taken from the stack, and its JSON iterator and rowid are
incremented. If the iterator hasn't reached the end, then we can produce one
more record: fill_list() increments the rowid in its child state,
repositions the list to the column after the one from the stack top, and
goes to the beginning of the main loop. When there are no more matches, the
top element is popped (now the stack is ([NP root])). If the popped column
doesn't have a next sibling, then fill_list() repeats the stack loop with
the new stack top. When there is a next sibling NESTED PATH (like in our
case, #2), the columns of the current one (#1) are set to NULL, NESTED PATH
#2's path '$.b' is sought in the current value in state #2, and the array of
matched values and the first matched value are saved to the child's state -
#4. The rowid in state #4 is set to 1. NESTED PATH #2 is pushed to the stack
(now it's ([NP2],[NP root])), the list of fields is repositioned to column
id3 (next after the stack's top, NP2), and fill_list() goes to the beginning
of the main loop.

The stack loop ends normally when there are no more elements in the stack.
This means there are no more matches in the top NESTED PATH clause; this is
the end of JSON_TABLE execution and fill_list() exits.

LATERAL JOIN
------------
This WL introduces a new kind of join - LATERAL JOIN. It's implicit, thus
doesn't add any new syntax, and is always used only for table functions. It
allows a table function to refer to fields from tables specified prior to
the table function in the FROM clause. It's used to feed data to the table
function.
Consider an example:

  SELECT * FROM t1;
  +-----------------------------------------------------------------------------+
  | jsn                                                                         |
  +-----------------------------------------------------------------------------+
  | {"a": "a_val", "b": [{"c": "c_val", "l": [1, 2]}]}                          |
  | {"a": "a_val", "b": [{"c": "c_val", "l": [11]}, {"c": "c_val", "l": [22]}]} |
  +-----------------------------------------------------------------------------+

Then the following query

  SELECT jt.*
  FROM t1,
    JSON_TABLE(jsn, '$' COLUMNS (
      apath VARCHAR(10) PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (
        bpath VARCHAR(10) PATH '$.c',
        ord FOR ORDINALITY,
        NESTED PATH '$.l[*]' COLUMNS (lpath varchar(10) PATH '$')
      ))) as jt;

would return

  +---------+---------+------+-------+
  | apath   | bpath   | ord  | lpath |
  +---------+---------+------+-------+
  | a_val   | c_val   |    1 | 1     |
  | a_val   | c_val   |    1 | 2     |
  | a_val   | c_val   |    1 | 11    |
  | a_val   | c_val   |    2 | 22    |
  +---------+---------+------+-------+

The lateral join is implemented by setting the new member variable
end_lateral_table in SELECT_LEX to the table function's TABLE_LIST object.
This variable is taken into account when setting the last name resolution
table in find_field_in_tables(). This way, when the table function's
TABLE_LIST is being set up and fix_fields() is called on the source column,
name resolution stops right before the table function and name lookup can
find the column only in the already opened tables. Since the data has to be
readable when JSON_TABLE uses it, the table function can't be RIGHT JOIN'ed
to the table it feeds JSON data from.

EXPLAIN
-------
Each table function is printed by EXPLAIN as a regular table, with an
addition. Structured explain adds two fields to the output:
"table_function" with the function name, and "using_temporary_table", which
is always set to true. The whole output could look like the following:

  explain format=json select * from
  json_table(
    '[ {"a":"3", "n": { "l": 1}},
       {"a":2, "n": { "l": 1}},
       {"b":1, "n": { "l": 1}},
       {"a":0, "n": { "l": 1}} ]',
    "$[*]" columns (
      id for ordinality,
      jpath varchar(100) path "$.a",
      jexst int exists path '$.b',
      nested path "$.n" columns (
        id_n for ordinality,
        jpath_n varchar(50) path "$.l")
    )
  ) as tt;

  EXPLAIN
  {
    "query_block": {
      "select_id": 1,
      "cost_info": {
        "query_cost": "7.22"
      },
      "table": {
        "table_name": "tt",
        "access_type": "ALL",
        "rows_examined_per_scan": 42,
        "rows_produced_per_join": 42,
        "filtered": "100.00",
        "table_function": "json_table",
        "using_temporary_table": true,
        "cost_info": {
          "read_cost": "3.02",
          "eval_cost": "4.20",
          "prefix_cost": "7.22",
          "data_read_per_join": "7K"
        },
        "used_columns": [
          "id",
          "jpath",
          "jexst",
          "id_n",
          "jpath_n"
        ]
      }
    }
  }
  Warnings:
  Note 1003 /* select#1 */ select `tt`.`id` AS `id`,`tt`.`jpath` AS `jpath`,
  `tt`.`jexst` AS `jexst`,`tt`.`id_n` AS `id_n`,`tt`.`jpath_n` AS `jpath_n`
  from json_table('[\n   {"a":"3", "n": { "l": 1}},\n   {"a":2, "n": { "l":
  1}},\n   {"b":1, "n": { "l": 1}},\n   {"a":0, "n": { "l": 1}}\n   ]',
  '$[*]' columns (id for ordinality, jpath varchar(100) path '$.a', jexst
  int exists path '$.b', nested path '$.n' columns (id_n for ordinality,
  jpath_n varchar(50) path '$.l'))) `tt`

Notes on the expanded query printout:
*) the JSON data source expression is printed as is
*) JSON path expressions are always printed in single quotes
*) when both ON EMPTY and ON ERROR clauses are given, the latter always
   follows the former

In traditional EXPLAIN, the two additions, "Table function" and
"Using temporary", are added to the Extra field:

  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: tt
     partitions: NULL
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 42
       filtered: 100.00
          Extra: Table function: json_table; Using temporary
  1 row in set, 1 warning (0,01 sec)
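Relating back to the "How records are produced" section: the OUTER-join
behaviour of NESTED PATH can be narrowed down to an INNER join with a WHERE
condition. A minimal sketch, reusing the document from that example:

  -- Filtering out the NULL-complemented rows emulates an INNER join
  SELECT *
  FROM JSON_TABLE(
    '[ {"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}, {"a":3}]',
    '$[*]' COLUMNS (
      a INT PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$'))) AS jt
  WHERE jt.b IS NOT NULL;

The NULL-complemented row for a = 3, whose object has no "b" member, no
longer appears in the result.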
Overview
--------
In order to add support for table functions, this WL adds/changes code in
the following areas:
*) parser, to add the new syntax
*) the JSON_TABLE table function itself
*) resolver, to support LATERAL join
*) optimizer and executor, to unify support for table functions with derived
   tables, as internally they're the same, except for the way the data is
   filled
*) derived tables code, mostly renaming, to make it more sensible for table
   functions
*) virtual tmp table code, which is extended to be able to create a real
   table; a wrapper is added around it
*) EXPLAIN code, to print out the table function
*) List_iterator can now position itself to a given element

New structures
--------------
Several new structures are used to hold column info and are used during
execution:

  /// Column description for JSON_TABLE function
  class Json_table_column: public Create_field
  {
  public:
    /// Column type
    enum jt_column_enum jtc_type;
    /// Type of ON ERROR clause
    enum jtc_on_enum on_error;
    /// Type of ON MISSING clause
    enum jtc_on_enum on_missing;
    /// Default value string for ON MISSING clause
    LEX_STRING *default_missing_str;
    /// Parsed JSON for default value of ON MISSING clause
    Json_wrapper default_missing_json;
    /// Default value string for ON ERROR clause
    LEX_STRING *default_error_str;
    /// Parsed JSON string for ON ERROR clause
    Json_wrapper default_error_json;
    /// List of nested columns, valid only for NESTED PATH
    List<Json_table_column> *nested_columns;
    /// Item to store column's value
    Item *itm;
    /// Nested path
    LEX_STRING m_path_str;
    /// Parsed nested path
    Json_path m_path_json;
    /// Pointer to data
    JT_stack_element *jds;
    /// Pointer to child's data, valid only for NESTED PATH
    JT_stack_element *child_jds, *parent_jds;
    /// Next sibling nested path
    Json_table_column *next_nested, *prev_nested, *last;
  };

This class represents a column of JSON_TABLE during parse time and during
execution. Columns are grouped in lists; each list represents a single
COLUMNS clause.

  class JT_stack_element
  {
  public:
    /// Vector of found values
    Json_wrapper_vector v;
    /// Iterator for the vector above
    Json_wrapper_vector::iterator it;
    /// Current value
    Json_wrapper w;
    /// Current rowid
    uint rowid;
    /// TRUE <=> NESTED PATH associated with this element is producing
    /// records
    bool in_run;
  };

This class represents the data source for columns in a NESTED PATH clause.
The data is filled and iterated when a NESTED PATH column is processed.
Columns from that NESTED PATH use this element to extract their data. These
elements are stored in the Item_func_json_table::jds array.

Here's an example of how the two structures above are used to represent a
JSON_TABLE call:

  JSON_TABLE(..., '$[*]' COLUMNS (
    id FOR ORDINALITY,
    val JSON PATH '$.val',
    NESTED PATH '$.arr_1[*]' COLUMNS (
      arr1_id FOR ORDINALITY,
      arr1_val JSON PATH '$'),
    NESTED PATH '$.arr_2[*]' COLUMNS (
      arr2_id FOR ORDINALITY,
      arr2_val JSON PATH '$')) as JT

This will be represented as follows. The Item_func_json_table::jds array
holds one element per NESTED PATH clause (including the root one), and the
::jds members of the NESTED PATH columns point into it:

  ::jds    | NP '$[*]' | NP '$.arr_1[*]' | NP '$.arr_2[*]' |

The column lists are linked via ::nested_columns and form a tree:

  < [ NP '$[*]' ] >
    ::nested_columns
      < [ id ], [ val ], [ NP '$.arr_1[*]' ], [ NP '$.arr_2[*]' ] >
                             ::nested_columns     ::nested_columns
                             < [ arr1_id ],       < [ arr2_id ],
                               [ arr1_val ] >       [ arr2_val ] >

I.e. in total there will be 4 lists, linked via nested_columns.
Also there are two more lists used by the JSON_TABLE implementation:

*) Item_func_json_table::vt_list - the list which is used to create the
   internal temporary table for the result. For the example above it will
   contain the following:

   < [ id ], [ val ], [ arr1_id ], [ arr1_val ], [ arr2_id ], [ arr2_val ] >

*) Item_func_json_table::flat_list - this list is used for producing
   records. Essentially it's the flattened tree of columns. In case of the
   example, it will look like the following:

   < [ NP '$[*]' ], [ id ], [ val ], [ NP '$.arr_1[*]' ], [ arr1_id ],
     [ arr1_val ], [ NP '$.arr_2[*]' ], [ arr2_id ], [ arr2_val ] >

Besides the above, JSON_TABLE uses a list of Items, one Item per non-NESTED
PATH column, which are used to store the values of a single row. When all
columns are filled, i.e. the columns' values are stored in the Items, these
Items are written to the temporary table. The list is stored in
Item_func_json_table::row.

JSON_TABLE function
-------------------
The table function itself is implemented as the Item_func_json_table class,
an Item_json_func subclass. The fix_fields() function fixes the first arg
(the data source), validates all paths and default values, and generates the
vt_list and flat_list lists by recursively traversing the parsed tree of
COLUMNS clause lists with the help of the create_json_table_col_list()
function. This function handles a single COLUMNS clause list at a time.

The entry point for filling the result table is the val_int() function. The
function execution, whose algorithm is described in the HLS, is split into 3
functions:
*) val_int() fetches the value from the first arg and stores it in the first
   data source element - jds[0] - then calls
*) fill_list(), which implements both the main JSON_TABLE execution loop and
   the stack loop. The former goes through the flat_list and calls
   fill_column() for each column; the latter picks the appropriate
   NESTED PATH column for producing the next row. For setting the columns of
   a NESTED PATH to their default values, the set_subtree_to_default()
   helper function is used.
*) fill_column() actually stores data to the result table. It handles all
   kinds of columns.

Parser
------
The changes to the parser are trivial and simply support the new syntax
(described in the HLS). The WL adds 4 new tokens: JSON_TABLE, NESTED,
ORDINALITY, PATH.

Resolver
--------
In the resolver this WL implements the implicit LATERAL join for table
functions. It's always used to join table functions, JSON_TABLE in
particular. There's no way to enable it for regular tables. The lateral join
is automatically disabled when the table function's source is constant,
i.e. when it doesn't force the table function to be dependent on other
tables.

The lateral join is implemented by using the new member variable
SELECT_LEX::end_lateral_table. When it's non-NULL, it's used as the last
name resolution table, and it is set to point to the table function's
TABLE_LIST object by SELECT_LEX::resolve_derived(). So when
TABLE_LIST::setup_table_function() calls fix_fields() on the first arg
(JSON_TABLE's data source), the resolver can find fields only in the tables
already opened prior to the JSON_TABLE.

Optimizer and Executor
----------------------
As table functions are unified with derived tables, there is almost no need
for dedicated handling of table functions in either the optimizer or the
executor. Minor exceptions are:
*) TABLE_LIST::fetch_number_of_rows() always returns 42, as there's no way
   to estimate how many rows a table function will return.
*) The executor uses a new helper function,
   join_materialize_table_function(), to fill the table function's result
   table.
   It basically calls val_int() on the table function.
*) A new wrapper class is added to handle the table function's result table:

   /**
     Class for writing the result of a table function to the result table
   */
   class Table_function_result : public Sql_alloc
   {
     THD *thd;
     TABLE *table;
   public:
     Table_function_result(THD *thd_arg) :thd(thd_arg), table(NULL) {}
     /** Create, but do not instantiate, the result table */
     bool create_result_table(List<Create_field> *column_types,
                              ulonglong options,
                              const char *table_alias);
     /** Write the given row of items to the result table and handle
         overflow to disk */
     bool write_row(List<Item> *row);
     /** Return the field with the given index */
     Field *get_field(uint i)
     {
       DBUG_ASSERT(i < table->s->fields);
       return table->field[i];
     }
     /** Delete all rows in the table */
     void empty_table();
     /** Set the default row */
     void default_row() {};
     friend bool TABLE_LIST::setup_table_function(THD *thd);
   };

*) The executor can now call the materialization function many times, once
   per read of the first row, if the new QEP_TAB::rematerialize flag is set,
   which happens when JSON_TABLE's data source argument isn't a constant (a
   short sketch follows the EXPLAIN notes below).

Derived tables
--------------
Table functions are unified with derived tables and are processed in a very
similar way, although there are several exceptions:
*) Table functions are set up by means of TABLE_LIST::setup_table_function()
   instead of TABLE_LIST::setup_materialized_derived(). It fixes the table
   function's item itself (which fixes all its arguments) and instantiates
   the Table_function_result object. The rest is the same as for derived
   tables.
*) create_derived() is renamed to create_materialized_table(), and
   SELECT_LEX::resolve_derived() to SELECT_LEX::resolve_placeholder_tables(),
   in order to make the functions' names more relevant to the functions
   themselves.
*) create_materialized_table() doesn't support MyISAM tables as the result
   table for table functions. The reason is that MyISAM requires more wiring
   in and it didn't work out of the box, for some reason. Since there's
   already a plan to drop MyISAM completely, it's been decided not to waste
   time on dealing with MyISAM's issues.

Virtual tables
--------------
Prior to this WL, internal virtual tmp tables were used only for storing SP
variables, and a real instantiated table wasn't needed. The virtual tmp
table code accepts a list of Create_field (create_tmp_table accepts a list
of Items), and this is more convenient for table functions. Due to this, the
virtual tmp table code is extended to be able to instantiate a real table
for table functions.

EXPLAIN
-------
Changes in this area are pretty straightforward and just print out
JSON_TABLE with its arguments in EXPLAIN's query printout. Also, for table
functions EXPLAIN prints "Table function: <function name>; Using temporary"
in the Extra field.
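A hedged illustration of the rematerialization rule above (table t1 and its
jsn column are borrowed from the LATERAL JOIN example in the HLS; the
comments describe the expected behaviour, not traced executor internals):

  -- Constant data source: the result table can be filled once and reused
  SELECT *
  FROM JSON_TABLE('[1,2,3]', '$[*]' COLUMNS (n INT PATH '$')) AS jt;

  -- Data source depends on t1.jsn: the table function's result table has to
  -- be refilled (rematerialized) for every row read from t1
  SELECT jt.*
  FROM t1,
       JSON_TABLE(t1.jsn, '$' COLUMNS (a VARCHAR(10) PATH '$.a')) AS jt;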