WL#8867: Add JSON table functions
Affects: Server-8.0
Status: Complete
JSON_TABLE creates a relational view of JSON data. It maps the result of a
JSON data evaluation into relational rows and columns. You can query the
result returned by the function as a regular relational table using SQL.
There is a need to pick JSON data apart into several rows in order to use
SQL effectively on the data, e.g. to join, project, or aggregate it.

Examples

  select * from json_table(
    '[{"a":"3"},{"a":2},{"b":1},{"a":0}]',
    "$[*]" columns (id for ordinality,
                    jpath varchar(100) path "$.a",
                    jexst int exists path '$.b')
  ) as tt;

  +----+-------+-------+
  | id | jpath | jexst |
  +----+-------+-------+
  |  1 | 3     |     0 |
  |  2 | 2     |     0 |
  |  3 | NULL  |     1 |
  |  4 | 0     |     0 |
  +----+-------+-------+

  select * from json_table(
    '[ {"a":"3", "n": ["b","a","c"]},
       {"a":2, "n": [1,2]},
       {"b":1, "n": ["zzz"]},
       {"a":0, "n": [0.1, 0.02]} ]',
    "$[*]" columns (
      id for ordinality,
      jpath json path "$.a",
      jexst int exists path '$.b',
      nested path "$.n[*]" columns (
        id_n1 for ordinality, jpath_n1 json path "$"),
      nested path "$.n[*]" columns (
        id_n2 for ordinality, jpath_n2 json path "$")
    )
  ) as tt;

  +------+-------+-------+-------+----------+-------+----------+
  | id   | jpath | jexst | id_n1 | jpath_n1 | id_n2 | jpath_n2 |
  +------+-------+-------+-------+----------+-------+----------+
  |    1 | "3"   |     0 |     1 | "b"      |  NULL | NULL     |
  |    1 | "3"   |     0 |     2 | "a"      |  NULL | NULL     |
  |    1 | "3"   |     0 |     3 | "c"      |  NULL | NULL     |
  |    1 | "3"   |     0 |  NULL | NULL     |     1 | "b"      |
  |    1 | "3"   |     0 |  NULL | NULL     |     2 | "a"      |
  |    1 | "3"   |     0 |  NULL | NULL     |     3 | "c"      |
  |    2 | 2     |     0 |     1 | 1        |  NULL | NULL     |
  |    2 | 2     |     0 |     2 | 2        |  NULL | NULL     |
  |    2 | 2     |     0 |  NULL | NULL     |     1 | 1        |
  |    2 | 2     |     0 |  NULL | NULL     |     2 | 2        |
  |    3 | NULL  |     1 |     1 | "zzz"    |  NULL | NULL     |
  |    3 | NULL  |     1 |  NULL | NULL     |     1 | "zzz"    |
  |    4 | 0     |     0 |     1 | 0.1      |  NULL | NULL     |
  |    4 | 0     |     0 |     2 | 0.02     |  NULL | NULL     |
  |    4 | 0     |     0 |  NULL | NULL     |     1 | 0.1      |
  |    4 | 0     |     0 |  NULL | NULL     |     2 | 0.02     |
  +------+-------+-------+-------+----------+-------+----------+
  16 rows in set (0,00 sec)

Other popular databases have similar capabilities, e.g. PostgreSQL
(json_each [1]), SQL Server (OPENJSON [2]), Oracle DB (JSON_TABLE [3]).
This implementation is designed to closely follow Oracle's, but is somewhat
more limited. What we miss compared to Oracle:
*) The JSON query wrapper clause for a JSON QUERY column. Same as above, we
   rely on the already existing implicit conversion.
*) The JSON_TABLE ON ERROR clause. Currently we always throw an error from
   JSON functions on incorrect input, so for consistency reasons this one is
   dropped.

A related (inverse) capability not covered in this worklog is the ability to
compose a single JSON value based on a result set, cf. the JSON aggregation
functions [4].

Besides the JSON_TABLE function, this WL introduces LATERAL JOIN for table
functions. Since there's currently only one table function, it will be used
only for JSON_TABLE.

[1] http://www.postgresql.org/docs/9.5/static/functions-json.html
[2] http://blogs.msdn.com/b/jocapc/archive/2015/05/16/json-support-in-sql-server-2016.aspx
[3] https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6287
Functional requirements
-----------------------
F1) This WL should add the JSON_TABLE function, which is able to represent a
    JSON document as a relational table according to the HLS specification.
F2) The JSON_TABLE function should accept any JSON- and string-typed
    expression as the data source.
F3) JSON_TABLE should be able to refer to fields specified prior to it in
    the FROM clause, except when JSON_TABLE is RIGHT JOIN'ed to a table.
F4) JSON_TABLE should accept any string-typed, valid JSON path expression.
F5) JSON_TABLE should use an in-memory tmp table, and overflow to disk when
    needed.
F6) An attempt to save non-scalar data (an array or object) to a scalar (an
    SQL-typed) field should trigger the ON ERROR clause.
F7) A missing value for a field should trigger the ON MISSING clause.
F8) Default values for the ON MISSING and ON ERROR clauses are both given as
    a valid JSON doc in a string. Specifying a non-scalar default value for
    a scalar field should throw an error (see the sketch after the
    requirements).

Non-functional requirements
---------------------------
NF1) WL shouldn't introduce bugs or regressions
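A minimal, hedged sketch of F6 and F8 (the JSON literals and column names
below are made up for illustration; the expected outcomes follow the
requirements above):

  -- F6: "a" is an array, so it cannot be coerced into the scalar INT
  -- column; the ON ERROR clause fires and the default '0' is stored instead
  SELECT *
  FROM JSON_TABLE('[{"a": [1,2]}]', '$[*]'
         COLUMNS (a INT PATH '$.a' DEFAULT '0' ON ERROR)) AS jt;

  -- F8: a non-scalar default for a scalar column should itself raise an
  -- error, so this statement is expected to fail
  SELECT *
  FROM JSON_TABLE('[{"a": 1}]', '$[*]'
         COLUMNS (a INT PATH '$.a' DEFAULT '[1,2]' ON ERROR)) AS jt;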
Overview
--------
This WL introduces a function of a new kind, a table function, called
JSON_TABLE. It is used to extract data from a JSON document and form a
relational table that can be processed further using SQL. It is a virtual
table; in this sense it is like a derived table, the only difference being
how the result data is obtained. Just like a derived table, it is specified
in the FROM clause, uses a tmp table to store its result, and is able to
provide generated indexes for ref access.

In order to feed data to JSON_TABLE, this WL introduces LATERAL JOIN. It is
implicit, thus doesn't introduce any new syntax, and is used only for
joining table functions.

New syntax
----------
JSON_TABLE(<JSON data source expr>, <string path> <columns clause>)
  [AS] <alias>

columns_clause:
  COLUMNS ( <columns_list> )
;

columns_list:
  column
| columns_list , column
;

column:
  <name> FOR ORDINALITY
| <name> <type> PATH <string path> [ <on empty> ] [ <on error> ]
| <name> <type> EXISTS PATH <string path>
| NESTED [ PATH ] <string path> <columns clause>
;

on_error:
  NULL ON ERROR
| ERROR ON ERROR
| DEFAULT <string> ON ERROR
;

on_empty:
  NULL ON EMPTY
| ERROR ON EMPTY
| DEFAULT <string> ON EMPTY
;

The alias is mandatory for a table function.

JSON data source expr: an expression that returns JSON data for JSON_TABLE.
E.g. a constant ('{"a":1}'), a column (t1.jsn_data, given that table t1 is
specified prior to JSON_TABLE in the FROM clause), or a function call
(JSON_EXTRACT(t1.jsn_data, '$.post.comments')).

String path: a constant string containing a JSON path expression. The path
is applied to the data source. The JSON value that matches the path is
called the row source and is used to generate a row of relational data. The
COLUMNS clause evaluates the row source, finds specific JSON values within
the row source, and returns those JSON values as SQL values in individual
columns of a row of relational data.

There are 4 types of columns supported by JSON_TABLE:

1) <name> FOR ORDINALITY

This column enumerates rows produced by the COLUMNS clause. It can be used
to distinguish parent rows that have the same values when multiple rows are
generated by a NESTED PATH clause. The count starts from 1; the column type
is UNSIGNED INT.

2) <name> <type> PATH <string path> [ <on empty> ] [ <on error> ]

Columns of this type are used to extract scalar values specified by
<string path>. <type> is a MySQL type and specifies the column type.
JSON_TABLE extracts the data as JSON and then, using the regular automatic
conversion applicable to JSON data, coerces it to the column type. Depending
on the column type, JSON_TABLE behaves slightly differently:

If the column type is an SQL type, then only a scalar value can be saved in
the column. Saving an object/array triggers the 'on error' clause. The
'on error' clause is also triggered when an error occurs during coercion of
the value saved in JSON to the table field, e.g. saving 'asd' to an integer
field. A missing value triggers the 'on empty' clause. A key for ref access
can be generated on such a column, similar to the automatically generated
keys for derived tables.

If the column type is varchar, then in addition to the above, the saved
values are unquoted.

If the column type is JSON, then only missing values trigger the 'on empty'
clause. Objects/arrays are saved as is. Ref keys can't be generated on such
a column.

The optional <on error> clause defines what JSON_TABLE does when an error
occurs, e.g. when an object/array is saved to a scalar column (the exact
conditions depend on the column type, see above):

  ERROR: an error is thrown.
  NULL: the column is set to NULL; this is the default behavior. If an error
    occurs during coercion, a warning is thrown.
  DEFAULT <string>: the given string is parsed as JSON (it has to be valid)
    and stored instead of the object/array. A warning is thrown if the error
    is caused by coercion. Column type rules also apply to the default
    value.
When a value saved to a field gets truncated, e.g. saving 3.14159 into a
DECIMAL(10,1) field, a warning is issued independently of the ON ERROR
clause. When the default value gets truncated, the warning is issued only
once per statement.

The optional <on empty> clause defines what JSON_TABLE does when data is
missing (depending on the column type). This clause is also triggered for a
column in a NESTED PATH clause when the latter doesn't have a match and a
NULL complemented row is produced for it. The clause takes the following
values:

  ERROR: an error is thrown.
  NULL: the column is set to NULL; this is the default behavior.
  DEFAULT <string>: the given string is parsed as JSON (it has to be valid)
    and stored instead of the missing value. Column type rules also apply to
    the default value.

  select * from json_table(
    '[{"a":"3"},{"a":2},{"b":1},{"a":0},{"a":[1,2]}]',
    "$[*]" columns (
      id for ordinality,
      jpath varchar(100) path "$.a" default '999' on error
                                    default '111' on empty,
      jsn_path json path "$.a" default '{"x": 333}' on empty,
      jexst int exists path '$.b')
  ) as tt;

  +------+-------+------------+-------+
  | id   | jpath | jsn_path   | jexst |
  +------+-------+------------+-------+
  |    1 | 3     | "3"        |     0 |
  |    2 | 2     | 2          |     0 |
  |    3 | 111   | {"x": 333} |     1 |
  |    4 | 0     | 0          |     0 |
  |    5 | 999   | [1, 2]     |     0 |
  +------+-------+------------+-------+

3) <name> <type> EXISTS PATH <string path>

This column returns either 0 or 1 depending on whether data is present at
the location specified by <string path>. <type> is the same as in the
previous column type.

4) NESTED [ PATH ] <string path> <columns clause>

This clause allows flattening nested objects or arrays in JSON data into a
single row, along with the JSON values from the parent object or array.
Multiple PATH clauses allow projecting JSON values from multiple levels of
nesting into a single row. The path is relative to the parent path - the row
path of JSON_TABLE, or the path of the parent NESTED PATH clause in case of
deeper nesting.

Column names are subject to the same rules and limitations as the field
names of regular tables.

Behaviour
---------
All constant strings are checked to be valid JSON/JSON path; an error is
thrown otherwise.

How records are produced
------------------------
Each match of the path in the COLUMNS clause results in a single row of the
result table. E.g.

  SELECT *
  FROM JSON_TABLE('[1,2,3]', '$[*]' COLUMNS (num INT PATH '$')) AS jt;

results in

  +------+
  | num  |
  +------+
  |    1 |
  |    2 |
  |    3 |
  +------+

Here '$[*]' matches each element of the array, and '$' saves its value in
the result field.

A NESTED PATH clause produces a set of records for each match in the COLUMNS
clause it belongs to. If there's no match, all columns of the NESTED PATH
are set to NULL. This implements an OUTER join between the top clause and
the NESTED PATH. An INNER join can be emulated by applying a condition in
the WHERE clause (see the additional example after the EXPLAIN section).
E.g.

  SELECT *
  FROM JSON_TABLE(
    '[ {"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}, {"a":3}]',
    '$[*]' COLUMNS (
      a INT PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$'))) AS jt;

would yield

  +------+------+
  | a    | b    |
  +------+------+
  |    1 |   11 |
  |    1 |  111 |
  |    2 |   22 |
  |    2 |  222 |
  |    3 | NULL |
  +------+------+

When there are two or more NESTED PATH clauses in the same COLUMNS clause,
they are called sibling NESTED PATHs and are processed one after another,
one at a time. While one NESTED PATH is producing records, the columns of
sibling NESTED PATH expressions are set to NULL. As a result, the total
number of records for a single match in the containing COLUMNS clause is the
sum, not the product, of all records produced by the NESTED PATHs.
For example:

  SELECT *
  FROM JSON_TABLE(
    '[ {"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}]',
    '$[*]' COLUMNS (
      a INT PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (b1 INT PATH '$'),
      NESTED PATH '$.b[*]' COLUMNS (b2 INT PATH '$'))) AS jt;

returns:

  +------+------+------+
  | a    | b1   | b2   |
  +------+------+------+
  |    1 |   11 | NULL |
  |    1 |  111 | NULL |
  |    1 | NULL |   11 |
  |    1 | NULL |  111 |
  |    2 |   22 | NULL |
  |    2 |  222 | NULL |
  |    2 | NULL |   22 |
  |    2 | NULL |  222 |
  +------+------+------+

The ORDINALITY column enumerates records produced by the COLUMNS clause and
can be used to distinguish parent records of a NESTED PATH, especially if
the values in the parent records are the same:

  SELECT *
  FROM JSON_TABLE(
    '[{"a": "a_val",
       "b": [ {"c": "c_val", "l": [1,2]} ]
      },
      {"a": "a_val",
       "b": [ {"c": "c_val", "l": [11]}, {"c": "c_val", "l": [22]} ]
      }]',
    '$[*]' COLUMNS (
      top_ord FOR ORDINALITY,
      apath VARCHAR(10) PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (
        bpath VARCHAR(10) PATH '$.c',
        ord FOR ORDINALITY,
        NESTED PATH '$.l[*]' COLUMNS (lpath varchar(10) PATH '$')
      )
    )) as jt;

results in the following set:

  +---------+---------+---------+------+-------+
  | top_ord | apath   | bpath   | ord  | lpath |
  +---------+---------+---------+------+-------+
  |       1 | a_val   | c_val   |    1 | 1     |
  |       1 | a_val   | c_val   |    1 | 2     |
  |       2 | a_val   | c_val   |    1 | 11    |
  |       2 | a_val   | c_val   |    2 | 22    |
  +---------+---------+---------+------+-------+

The source document contains an array of two elements, and each element
produces two records. The apath and bpath values are the same over the whole
result set and can't be used to determine whether the lpath values came from
the same parent or from different ones. The ord column tells that the first
two values came from a single object, since its value is the same over the
set of records with top_ord == 1. The second two values are from different
objects, since ord's values differ over the set of records with
top_ord == 2.

Execution algorithm
-------------------
The approach above is implemented in the following way. The execution
algorithm uses an array of states, a flattened list of fields from all
COLUMNS clauses, and a stack of columns. For each NESTED PATH clause a
single state element is used. Each field points to the appropriate state;
this is set up during fix_fields. A NESTED PATH column points to two states,
its own and its children's. The stack is used to deal with NESTED PATH
clauses, both nested and sibling. Sibling NESTED PATH columns are linked via
the next_nested/prev_nested fields. Consider the example:

  JSON_TABLE('...', '$[*]' COLUMNS(
    id1 FOR ORDINALITY,
    NESTED PATH '$.a' COLUMNS ( id2 FOR ORDINALITY),
    NESTED PATH '$.b' COLUMNS ( id3 FOR ORDINALITY)
  )) AS JT

The in-memory layout will be the following (own/chld denote which state
element a field uses as its own state and as its children's state):

  states:  | 1 | 2 | 3 | 4 |

  fields:  | NP '$[*]' | id1 | NP '$.a' | id2 | NP '$.b' | id3 |
           | own=1     |  2  | own=2    |  3  | own=2    |  4  |
           | chld=2    |     | chld=3   |     | chld=4   |     |

The execution consists of two loops, main and stack. The former starts
fetching the data and fills a row for storing in the result table. The
latter continues fetching data. The stack loop starts right after the main
one. Execution starts with obtaining the JSON value from JSON_TABLE's source
argument and saving it to state #1.
Main loop

Item_func_json_table::fill_list() starts from the first column in the list,
which is the root NESTED PATH (the top COLUMNS clause). It seeks the given
path and saves the array of matches and the first matched value in the
child's state - #2. It also sets state #2's rowid to 1. Then fill_list()
pushes the current NESTED PATH (the root one) to the stack and goes to the
next field. It's the ORDINALITY column; it takes the rowid from state #2 and
saves it to the result item. Next is a NESTED PATH; it seeks its own path
'$.a' in the current value in state #2 and saves the array of matches and
the first matched value to the child's state - #3. Again, fill_list() pushes
the current NESTED PATH (the first sibling) to the stack (now it's
([NP1],[NP root])). Then it's again an ORDINALITY column. The next is
NESTED PATH sibling #2. It checks whether there's a previous sibling
currently running (supplying records), and there is - NESTED PATH #1 - thus
all fields (including nested ones) of NESTED PATH #2 are set to NULL and
skipped. fill_list() reaches the end of the list of fields and writes the
row to the result table.

Stack loop

The top element is taken from the stack, and its JSON iterator and rowid are
incremented. If the iterator hasn't reached the end, then we can produce one
more record: fill_list() increments the rowid in its child state,
repositions the list to the column after the one from the stack top, and
goes to the beginning of the main loop. When there are no more matches, the
top element is popped (now the stack is ([NP root])). If the popped column
doesn't have a next sibling, then fill_list() repeats the stack loop with
the new stack top. When there is a next sibling NESTED PATH (like in our
case, #2), the columns of the current one (#1) are set to NULL, NESTED PATH
#2's path '$.b' is sought in the current value in state #2, and the array of
matched values and the first matched value are saved to the child's state -
#4. The rowid in state #4 is set to 1. NESTED PATH #2 is pushed to the stack
(now it's ([NP2],[NP root])), the list of fields is repositioned to column
id3 (next after the stack's top, NP2), and fill_list() goes to the beginning
of the main loop.

The stack loop ends normally when there are no more elements in the stack.
This means there are no more matches in the top NESTED PATH clause; this is
the end of JSON_TABLE execution and fill_list() exits.

LATERAL JOIN
------------
This WL introduces a new kind of join - LATERAL JOIN. It's implicit, thus
doesn't add any new syntax, and is always used only for table functions. It
allows a table function to refer to fields from tables specified prior to
the table function in the FROM clause. It's used to feed data to the table
function.
Consider an example:

  SELECT * FROM t1;
  +-----------------------------------------------------------------------------+
  | jsn                                                                         |
  +-----------------------------------------------------------------------------+
  | {"a": "a_val", "b": [{"c": "c_val", "l": [1, 2]}]}                          |
  | {"a": "a_val", "b": [{"c": "c_val", "l": [11]}, {"c": "c_val", "l": [22]}]} |
  +-----------------------------------------------------------------------------+

Then the following query

  SELECT jt.*
  FROM t1,
    JSON_TABLE(jsn, '$' COLUMNS (
      apath VARCHAR(10) PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (
        bpath VARCHAR(10) PATH '$.c',
        ord FOR ORDINALITY,
        NESTED PATH '$.l[*]' COLUMNS (lpath varchar(10) PATH '$')
      ))) as jt;

would return

  +---------+---------+------+-------+
  | apath   | bpath   | ord  | lpath |
  +---------+---------+------+-------+
  | a_val   | c_val   |    1 | 1     |
  | a_val   | c_val   |    1 | 2     |
  | a_val   | c_val   |    1 | 11    |
  | a_val   | c_val   |    2 | 22    |
  +---------+---------+------+-------+

The lateral join is implemented by setting the new member variable
end_lateral_table in SELECT_LEX to the table function's TABLE_LIST object.
This variable is taken into account when setting the last name resolution
table in find_field_in_tables(). This way, when the table function's
TABLE_LIST is being set up and fix_fields() is called on the source column,
name resolution stops right before the table function and name lookup can
find the column only in the already opened tables. Since the data has to be
readable when JSON_TABLE uses it, the table function can't be RIGHT JOIN'ed
to the table it feeds JSON data from.

EXPLAIN
-------
Each table function is printed by EXPLAIN as a regular table, with an
addition. Structured explain adds two fields to the output:
"table_function" with the function name, and "using_temporary_table", which
is always set to true. The whole output could look like the following:

  explain format=json select * from
  json_table(
    '[ {"a":"3", "n": { "l": 1}},
       {"a":2, "n": { "l": 1}},
       {"b":1, "n": { "l": 1}},
       {"a":0, "n": { "l": 1}} ]',
    "$[*]" columns (
      id for ordinality,
      jpath varchar(100) path "$.a",
      jexst int exists path '$.b',
      nested path "$.n" columns (
        id_n for ordinality,
        jpath_n varchar(50) path "$.l")
    )
  ) as tt;

  EXPLAIN
  {
    "query_block": {
      "select_id": 1,
      "cost_info": {
        "query_cost": "7.22"
      },
      "table": {
        "table_name": "tt",
        "access_type": "ALL",
        "rows_examined_per_scan": 42,
        "rows_produced_per_join": 42,
        "filtered": "100.00",
        "table_function": "json_table",
        "using_temporary_table": true,
        "cost_info": {
          "read_cost": "3.02",
          "eval_cost": "4.20",
          "prefix_cost": "7.22",
          "data_read_per_join": "7K"
        },
        "used_columns": [
          "id",
          "jpath",
          "jexst",
          "id_n",
          "jpath_n"
        ]
      }
    }
  }
  Warnings:
  Note 1003 /* select#1 */ select `tt`.`id` AS `id`,`tt`.`jpath` AS `jpath`,
  `tt`.`jexst` AS `jexst`,`tt`.`id_n` AS `id_n`,`tt`.`jpath_n` AS `jpath_n`
  from json_table('[\n   {"a":"3", "n": { "l": 1}},\n   {"a":2, "n": { "l":
  1}},\n   {"b":1, "n": { "l": 1}},\n   {"a":0, "n": { "l": 1}}\n   ]',
  '$[*]' columns (id for ordinality, jpath varchar(100) path '$.a', jexst
  int exists path '$.b', nested path '$.n' columns (id_n for ordinality,
  jpath_n varchar(50) path '$.l'))) `tt`

Notes on the expanded query printout:
*) the JSON data source expression is printed as is
*) JSON path expressions are always printed in single quotes
*) when both ON EMPTY and ON ERROR clauses are given, the latter always
   follows the former

In traditional EXPLAIN, the two additions, "Table function" and
"Using temporary", are added to the Extra field:

  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: tt
     partitions: NULL
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 42
       filtered: 100.00
          Extra: Table function: json_table; Using temporary
  1 row in set, 1 warning (0,01 sec)
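Relating back to the "How records are produced" section: the OUTER-join
behaviour of NESTED PATH can be narrowed down to an INNER join with a WHERE
condition. A minimal sketch, reusing the document from that example:

  -- Filtering out the NULL-complemented rows emulates an INNER join
  SELECT *
  FROM JSON_TABLE(
    '[ {"a": 1, "b": [11,111]}, {"a": 2, "b": [22,222]}, {"a":3}]',
    '$[*]' COLUMNS (
      a INT PATH '$.a',
      NESTED PATH '$.b[*]' COLUMNS (b INT PATH '$'))) AS jt
  WHERE jt.b IS NOT NULL;

The NULL-complemented row for a = 3, whose object has no "b" member, no
longer appears in the result.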
Overview
--------
In order to add support for table functions, this WL adds/changes code in
the following areas:
*) parser, to add the new syntax
*) the JSON_TABLE table function itself
*) resolver, to support LATERAL join
*) optimizer and executor, to unify support for table functions with derived
   tables, as internally they're the same, except for the way the data is
   filled
*) derived tables code, mostly renaming, to make it more sensible for table
   functions
*) virtual tmp table code, which is extended to be able to create a real
   table; a wrapper is added around it
*) EXPLAIN code, to print out the table function
*) List_iterator can now position itself to a given element

New structures
--------------
Several new structures are used to hold column info and are used during
execution:

  /// Column description for JSON_TABLE function
  class Json_table_column: public Create_field
  {
  public:
    /// Column type
    enum jt_column_enum jtc_type;
    /// Type of ON ERROR clause
    enum jtc_on_enum on_error;
    /// Type of ON MISSING clause
    enum jtc_on_enum on_missing;
    /// Default value string for ON MISSING clause
    LEX_STRING *default_missing_str;
    /// Parsed JSON for default value of ON MISSING clause
    Json_wrapper default_missing_json;
    /// Default value string for ON ERROR clause
    LEX_STRING *default_error_str;
    /// Parsed JSON string for ON ERROR clause
    Json_wrapper default_error_json;
    /// List of nested columns, valid only for NESTED PATH
    List<Json_table_column> *nested_columns;
    /// Item to store column's value
    Item *itm;
    /// Nested path
    LEX_STRING m_path_str;
    /// Parsed nested path
    Json_path m_path_json;
    /// Pointer to data
    JT_stack_element *jds;
    /// Pointer to child's data, valid only for NESTED PATH
    JT_stack_element *child_jds, *parent_jds;
    /// Next sibling nested path
    Json_table_column *next_nested, *prev_nested, *last;
  };

This class represents a column of JSON_TABLE during parse time and during
execution. Columns are grouped in lists; each list represents a single
COLUMNS clause.

  class JT_stack_element
  {
  public:
    /// Vector of found values
    Json_wrapper_vector v;
    /// Iterator for the vector above
    Json_wrapper_vector::iterator it;
    /// Current value
    Json_wrapper w;
    /// Current rowid
    uint rowid;
    /// TRUE <=> NESTED PATH associated with this element is producing
    /// records
    bool in_run;
  };

This class represents the data source for columns in a NESTED PATH clause.
The data is filled and iterated when a NESTED PATH column is processed.
Columns from that NESTED PATH use this element to extract their data. These
elements are stored in the Item_func_json_table::jds array.

Here's an example of how the two structures above are used to represent a
JSON_TABLE call:

  JSON_TABLE(..., '$[*]' COLUMNS (
    id FOR ORDINALITY,
    val JSON PATH '$.val',
    NESTED PATH '$.arr_1[*]' COLUMNS (
      arr1_id FOR ORDINALITY,
      arr1_val JSON PATH '$'),
    NESTED PATH '$.arr_2[*]' COLUMNS (
      arr2_id FOR ORDINALITY,
      arr2_val JSON PATH '$')) as JT

This will be represented as follows. The Item_func_json_table::jds array
holds one element per NESTED PATH clause (including the root one), and the
::jds members of the NESTED PATH columns point into it:

  ::jds    | NP '$[*]' | NP '$.arr_1[*]' | NP '$.arr_2[*]' |

The column lists are linked via ::nested_columns and form a tree:

  < [ NP '$[*]' ] >
    ::nested_columns
      < [ id ], [ val ], [ NP '$.arr_1[*]' ], [ NP '$.arr_2[*]' ] >
                             ::nested_columns     ::nested_columns
                             < [ arr1_id ],       < [ arr2_id ],
                               [ arr1_val ] >       [ arr2_val ] >

I.e. in total there will be 4 lists, linked via nested_columns.
Also there are two more lists used by the JSON_TABLE implementation:

*) Item_func_json_table::vt_list - the list which is used to create the
   internal temporary table for the result. For the example above it will
   contain the following:

   < [ id ], [ val ], [ arr1_id ], [ arr1_val ], [ arr2_id ], [ arr2_val ] >

*) Item_func_json_table::flat_list - this list is used for producing
   records. Essentially it's the flattened tree of columns. In case of the
   example, it will look like the following:

   < [ NP '$[*]' ], [ id ], [ val ], [ NP '$.arr_1[*]' ], [ arr1_id ],
     [ arr1_val ], [ NP '$.arr_2[*]' ], [ arr2_id ], [ arr2_val ] >

Besides the above, JSON_TABLE uses a list of Items, one Item per non-NESTED
PATH column, which are used to store the values of a single row. When all
columns are filled, i.e. the columns' values are stored in the Items, these
Items are written to the temporary table. The list is stored in
Item_func_json_table::row.

JSON_TABLE function
-------------------
The table function itself is implemented as the Item_func_json_table class,
an Item_json_func subclass. The fix_fields() function fixes the first arg
(the data source), validates all paths and default values, and generates the
vt_list and flat_list lists by recursively traversing the parsed tree of
COLUMNS clause lists with the help of the create_json_table_col_list()
function. This function handles a single COLUMNS clause list at a time.

The entry point for filling the result table is the val_int() function. The
function execution, whose algorithm is described in the HLS, is split into 3
functions:
*) val_int() fetches the value from the first arg and stores it in the first
   data source element - jds[0] - then calls
*) fill_list(), which implements both the main JSON_TABLE execution loop and
   the stack loop. The former goes through the flat_list and calls
   fill_column() for each column; the latter picks the appropriate
   NESTED PATH column for producing the next row. For setting the columns of
   a NESTED PATH to their default values, the set_subtree_to_default()
   helper function is used.
*) fill_column() actually stores data to the result table. It handles all
   kinds of columns.

Parser
------
The changes to the parser are trivial and simply support the new syntax
(described in the HLS). The WL adds 4 new tokens: JSON_TABLE, NESTED,
ORDINALITY, PATH.

Resolver
--------
In the resolver this WL implements the implicit LATERAL join for table
functions. It's always used to join table functions, JSON_TABLE in
particular. There's no way to enable it for regular tables. The lateral join
is automatically disabled when the table function's source is constant,
i.e. when it doesn't force the table function to be dependent on other
tables.

The lateral join is implemented by using the new member variable
SELECT_LEX::end_lateral_table. When it's non-NULL, it's used as the last
name resolution table, and it is set to point to the table function's
TABLE_LIST object by SELECT_LEX::resolve_derived(). So when
TABLE_LIST::setup_table_function() calls fix_fields() on the first arg
(JSON_TABLE's data source), the resolver can find fields only in the tables
already opened prior to the JSON_TABLE.

Optimizer and Executor
----------------------
As table functions are unified with derived tables, there is almost no need
for dedicated handling of table functions in either the optimizer or the
executor. Minor exceptions are:
*) TABLE_LIST::fetch_number_of_rows() always returns 42, as there's no way
   to estimate how many rows a table function will return.
*) The executor uses a new helper function,
   join_materialize_table_function(), to fill the table function's result
   table.
   It basically calls val_int() on the table function.
*) A new wrapper class is added to handle the table function's result table:

   /**
     Class for writing the result of a table function to the result table
   */
   class Table_function_result : public Sql_alloc
   {
     THD *thd;
     TABLE *table;
   public:
     Table_function_result(THD *thd_arg) :thd(thd_arg), table(NULL) {}
     /** Create, but do not instantiate, the result table */
     bool create_result_table(List<Create_field> *column_types,
                              ulonglong options,
                              const char *table_alias);
     /** Write the given row of items to the result table and handle
         overflow to disk */
     bool write_row(List<Item> *row);
     /** Return the field with the given index */
     Field *get_field(uint i)
     {
       DBUG_ASSERT(i < table->s->fields);
       return table->field[i];
     }
     /** Delete all rows in the table */
     void empty_table();
     /** Set the default row */
     void default_row() {};
     friend bool TABLE_LIST::setup_table_function(THD *thd);
   };

*) The executor can now call the materialization function many times, once
   per read of the first row, if the new QEP_TAB::rematerialize flag is set,
   which happens when JSON_TABLE's data source argument isn't a constant (a
   short sketch follows the EXPLAIN notes below).

Derived tables
--------------
Table functions are unified with derived tables and are processed in a very
similar way, although there are several exceptions:
*) Table functions are set up by means of TABLE_LIST::setup_table_function()
   instead of TABLE_LIST::setup_materialized_derived(). It fixes the table
   function's item itself (which fixes all its arguments) and instantiates
   the Table_function_result object. The rest is the same as for derived
   tables.
*) create_derived() is renamed to create_materialized_table(), and
   SELECT_LEX::resolve_derived() to SELECT_LEX::resolve_placeholder_tables(),
   in order to make the functions' names more relevant to the functions
   themselves.
*) create_materialized_table() doesn't support MyISAM tables as the result
   table for table functions. The reason is that MyISAM requires more wiring
   in and it didn't work out of the box, for some reason. Since there's
   already a plan to drop MyISAM completely, it's been decided not to waste
   time on dealing with MyISAM's issues.

Virtual tables
--------------
Prior to this WL, internal virtual tmp tables were used only for storing SP
variables, and a real instantiated table wasn't needed. The virtual tmp
table code accepts a list of Create_field (create_tmp_table accepts a list
of Items), and this is more convenient for table functions. Due to this, the
virtual tmp table code is extended to be able to instantiate a real table
for table functions.

EXPLAIN
-------
Changes in this area are pretty straightforward and just print out
JSON_TABLE with its arguments in EXPLAIN's query printout. Also, for table
functions EXPLAIN prints "Table function: <function name>; Using temporary"
in the Extra field.
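A hedged illustration of the rematerialization rule above (table t1 and its
jsn column are borrowed from the LATERAL JOIN example in the HLS; the
comments describe the expected behaviour, not traced executor internals):

  -- Constant data source: the result table can be filled once and reused
  SELECT *
  FROM JSON_TABLE('[1,2,3]', '$[*]' COLUMNS (n INT PATH '$')) AS jt;

  -- Data source depends on t1.jsn: the table function's result table has to
  -- be refilled (rematerialized) for every row read from t1
  SELECT jt.*
  FROM t1,
       JSON_TABLE(t1.jsn, '$' COLUMNS (a VARCHAR(10) PATH '$.a')) AS jt;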