MySQL :: New JSON format for EXPLAIN

MySQL Blog Archive

For the latest blogs go to blogs.oracle.com/mysql

New JSON format for EXPLAIN

Posted on Oct 31, 2024 by Magnus Brevik
Category: Optimizer
Tags:

MySQL provides two great tools for analyzing query plans: EXPLAIN and EXPLAIN ANALYZE. EXPLAIN displays the execution plan chosen by the optimizer and stops before executing, while EXPLAIN ANALYZE actually executes the query and gathers statistics about processing time and number of rows returned by each operation. The output format can be tabular (TRADITIONAL), TREE or JSON. The first two are intended for humans, while the JSON format is primarily aimed at machines, while still being human readable. The JSON format should be the ideal format for automated query analysis, but it has not been properly representing the plan structure since we switched to iterator-based execution plans. This has made it impossible to use the JSON format with EXPLAIN ANALYZE, as EXPLAIN ANALYZE is directly tied to the iterator structure of the plans.

In MySQL 8.3 Community Edition we introduced a new JSON format for EXPLAIN and EXPLAIN ANALYZE, along with the system variable "explain_json_format_version={1,2}" to switch between the formats. The new JSON format reflects how the iterators are built and is a direct match for the TREE format. Each object in the JSON format corresponds to a line in the TREE format, but the JSON format contains more information in a machine readable format that can be accessed by e.g. EXPLAIN INTO or a client side application. This can be especially useful with EXPLAIN ANALYZE, which is supported by the new JSON format, as well as the recently added EXPLAIN INTO and EXPLAIN FOR SCHEMA functionality. All values in the EXPLAIN tree can thus be accessed through the JSON format and analyzed programmatically.

How to use the new JSON format

We cannot talk about a new feature without an example, so let's look at a simple SELECT query:

mysql> EXPLAIN FORMAT=TREE SELECT name, quantity FROM orders JOIN items i ON item_id = i.id WHERE quantity > 1000;
-> Nested loop inner join  (cost=49828 rows=76470)
    -> Filter: ((orders.quantity > 1000) and (orders.item_id is not null))  (cost=23063 rows=76470)
        -> Table scan on orders  (cost=23063 rows=229432)
    -> Single-row index lookup on i using PRIMARY (id=orders.item_id)  (cost=0.25 rows=1)

The TREE format, while compact and easy to read and understand for a human reader, would require manual parsing to analyze in code. With EXPLAIN FORMAT=JSON you can get a JSON object instead, which can be handled with JSON functions either client-side in your preferred programming language, or in MySQL with EXPLAIN INTO. This JSON format was introduced in MySQL 5.6, and is reflective of the plan structure as it was at that time. With the conversion to iterator-based plans this old JSON EXPLAIN format was no longer representative of the internal plan structure.

# Old JSON EXPLAIN format, default
mysql> SET explain_json_format_version=1;
mysql> EXPLAIN FORMAT=JSON SELECT name, quantity FROM orders JOIN items i ON item_id = i.id WHERE quantity > 1000\G
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "49827.84"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "orders",
          "access_type": "ALL",
          "possible_keys": [
            "fk_item_id"
          ],
          "rows_examined_per_scan": 229432,
          "rows_produced_per_join": 76469,
          "filtered": "33.33",
          "cost_info": {
            "read_cost": "15416.48",
            "eval_cost": "7646.97",
            "prefix_cost": "23063.45",
            "data_read_per_join": "1M"
          },
          "used_columns": [
            "item_id",
            "quantity"
          ],
          "attached_condition": "((`customer`.`orders`.`quantity` > 1000) and (`customer`.`orders`.`item_id` is not null))"
        }
      },
      {
        "table": {
          "table_name": "i",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "4",
          "ref": [
            "customer.orders.item_id"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 76469,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "19117.42",
            "eval_cost": "7646.97",
            "prefix_cost": "49827.84",
            "data_read_per_join": "19M"
          },
          "used_columns": [
            "id",
            "name"
          ]
        }
      }
    ]
  }
}

As we can see from this example the iterator structure from the actual execution plan shown in the TREE format is not reflected in the old JSON format. The filter iterator is more of a footnote in the table scan on the orders table, and all the information about rows and cost of the join are under the two table objects. While the values are correct the structure is wrong, and this makes it harder to use the information to optimize your queries.

With the development of the new MySQL hypergraph optimizer, available in HeatWave MySQL, a new JSON EXPLAIN format was introduced, as the old format has no relation to the legacy, the planning process, or the final plan created by the optimizer. Using the Hypergraph optimizer with EXPLAIN FORMAT=JSON will always yield this new JSON format. If you are using the old optimizer and running MySQL 8.3 or newer, the new JSON format can be accessed by setting the system variable "explain_json_format_version" to 2.

# New JSON EXPLAIN format
mysql> SET explain_json_format_version=2;
mysql> EXPLAIN FORMAT=JSON SELECT name, quantity FROM orders JOIN items i ON item_id = i.id WHERE quantity > 1000;
{
  "query": "/* select#1 */ select `i`.`name` AS `name`,`customer`.`orders`.`quantity` AS `quantity` from `customer`.`orders` join `customer`.`items` `i` where ((`i`.`id` = `customer`.`orders`.`item_id`) and (`customer`.`orders`.`quantity` > 1000))",
  "inputs": [
    {
      "inputs": [
        {
          "operation": "Table scan on orders",
          "table_name": "orders",
          "access_type": "table",
          "schema_name": "customer",
          "used_columns": [
            "item_id",
            "quantity"
          ],
          "estimated_rows": 229432,
          "estimated_total_cost": 23063.45
        }
      ],
      "condition": "((orders.quantity > 1000) and (orders.item_id is not null))",
      "operation": "Filter: ((orders.quantity > 1000) and (orders.item_id is not null))",
      "access_type": "filter",
      "estimated_rows": 76469.68433094025,
      "estimated_total_cost": 23063.45
    },
    {
      "alias": "i",
      "covering": false,
      "operation": "Single-row index lookup on i using PRIMARY (id = orders.item_id)",
      "index_name": "PRIMARY",
      "table_name": "items",
      "access_type": "index",
      "schema_name": "customer",
      "used_columns": [
        "id",
        "name"
      ],
      "estimated_rows": 1,
      "lookup_condition": "id = orders.item_id",
      "index_access_type": "index_lookup",
      "estimated_total_cost": 0.25000130770776513
    }
  ],
  "join_type": "inner join",
  "operation": "Nested loop inner join",
  "access_type": "join",
  "estimated_rows": 76469.68433094025,
  "join_algorithm": "nested_loop",
  "estimated_total_cost": 49827.83951582908
}

We won't go through every field in the new JSON format, as most of them are the same as or renamed versions of the old format, or pretty self-explanatory, but I will mention a couple of them.

As mentioned earlier, the new JSON format is a direct match to the TREE format. The top iterator of the TREE format is the top level object in the JSON format, and the child iterators can be found in the "inputs" field in the JSON objects. To map each iterator to the relevant JSON object, you can look at the "operation" field in the JSON, which is what gets printed in the TREE format.

A field present in the old JSON format which has now slightly changed behaviour is "table_name". In the old format "table_name" was actually the alias of the table and not the name of the underlying table. This has been changed in the new format, where "table_name" is now the base table name, and the new "alias" field has been added to table access iterators when the table has an alias. The alias is still what is referred to in the parent iterators of the table accesses.

Using the new JSON format with EXPLAIN ANALYZE

Another neat feature with the new JSON format is that, while the old JSON format was not suitable for showing EXPLAIN ANALYZE output, since the new format is based on the same iterator structure as the TREE format, we can now get the output of EXPLAIN ANALYZE in a JSON format. This allows programmatic analysis of the execution timing, as EXPLAIN ANALYZE FORMAT=JSON contains several fields with information that previously had to be manually parsed from the TREE format.
Let's look at EXPLAIN ANALYZE of our previous query:

mysql> EXPLAIN ANALYZE FORMAT=TREE SELECT name, quantity FROM orders JOIN items i ON item_id = i.id WHERE quantity > 1000\G
-> Nested loop inner join (cost=49828 rows=76470) (actual time=1.95..5182 rows=227103 loops=1)
    -> Filter: ((orders.quantity > 1000) and (orders.item_id is not null)) (cost=23063 rows=76470) (actual time=1.82..1271 rows=227103 loops=1)
        -> Table scan on orders (cost=23063 rows=229432) (actual time=1.76..1135 rows=229376 loops=1)
    -> Single-row index lookup on i using PRIMARY (id=orders.item_id) (cost=0.25 rows=1) (actual time=0.0164..0.0164 rows=1 loops=227103)

And here is the same plan with FORMAT=JSON:

mysql> SET explain_json_format_version=2;
mysql> EXPLAIN ANALYZE FORMAT=JSON SELECT name, quantity FROM orders JOIN items i ON item_id = i.id WHERE quantity > 1000\G
{
  "query": "/* select#1 */ select `i`.`name` AS `name`,`customer`.`orders`.`quantity` AS `quantity` from `customer`.`orders` join `customer`.`items` `i` where ((`i`.`id` = `customer`.`orders`.`item_id`) and (`customer`.`orders`.`quantity` > 1000))",
  "inputs": [
    {
      "inputs": [
        {
          "operation": "Table scan on orders",
          "table_name": "orders",
          "access_type": "table",
          "actual_rows": 229376.0,
          "schema_name": "customer",
          "actual_loops": 1,
          "used_columns": [
            "item_id",
            "quantity"
          ],
          "estimated_rows": 229432.0,
          "actual_last_row_ms": 1123.953248,
          "actual_first_row_ms": 1.868662,
          "estimated_total_cost": 23063.45
        }
      ],
      "condition": "((orders.quantity > 1000) and (orders.item_id is not null))",
      "operation": "Filter: ((orders.quantity > 1000) and (orders.item_id is not null))",
      "access_type": "filter",
      "actual_rows": 227103.0,
      "actual_loops": 1,
      "estimated_rows": 76469.68433094025,
      "actual_last_row_ms": 1252.748685,
      "actual_first_row_ms": 1.92276,
      "estimated_total_cost": 23063.45
    },
    {
      "alias": "i",
      "covering": false,
      "operation": "Single-row index lookup on i using PRIMARY (id = orders.item_id)",
      "index_name": "PRIMARY",
      "table_name": "items",
      "access_type": "index",
      "actual_rows": 1.0,
      "schema_name": "customer",
      "actual_loops": 227103,
      "used_columns": [
        "id",
        "name"
      ],
      "estimated_rows": 1.0,
      "lookup_condition": "id = orders.item_id",
      "index_access_type": "index_lookup",
      "actual_last_row_ms": 0.016049919261304342,
      "actual_first_row_ms": 0.015992197227689638,
      "estimated_total_cost": 0.25000130770776513
    }
  ],
  "join_type": "inner join",
  "operation": "Nested loop inner join",
  "access_type": "join",
  "actual_rows": 227103.0,
  "actual_loops": 1,
  "estimated_rows": 76469.68433094025,
  "join_algorithm": "nested_loop",
  "actual_last_row_ms": 5071.693038,
  "actual_first_row_ms": 2.071419,
  "estimated_total_cost": 49827.83951582908
}

With EXPLAIN ANALYZE we get a few extra fields starting with "actual_", which contain information about the execution. In the TREE format these are the same as the execution information in the iterators like "(actual time=<actual_first_row_ms>..<actual_last_row_ms> rows=<actual_rows> loops=<actual_loops>)".

Summary

The new JSON format for EXPLAIN and EXPLAIN ANALYZE was first introduced in MySQL 8.3 Community Edition, and is now also available in the MySQL 8.4 LTS and 9.x Innovation releases on all platforms including HeatWave MySQL on OCI, AWS and Azure. This new format allows for detailed query analysis based on the actual execution plan, either locally or in the server with EXPLAIN INTO. All information found in the TREE format is easily accessible with JSON functions, so you don't have to manually parse the TREE output. This is especially useful with EXPLAIN ANALYZE, as it did not have a JSON format in earlier versions. The new JSON format is already in use by HeatWave Autopilot for index suggestions in the cloud, and should be helpful for gathering additional insights in your own applications as well. Remember to set "explain_json_format_version" to 2, or use the new MySQL hypergraph optimizer in HeatWave MySQL 9.x to enjoy the benefits of the new JSON format for EXPLAIN and EXPLAIN ANALYZE.

Thanks for using MySQL!