WL#5257: Create a first API for the Optimizer trace
Status: Complete
This is a sub-piece of WL#4800 "Optimizer trace/debugger". It is only about
defining the interface that the "optimizer trace" module will expose to the
rest of the Server's sql/ code; this task creates a first version of that
API. Once the interface is developed and approved, the Optimizer team will
be able to use it to add more tracing points to the optimizer code, which is
WL#5842; the present WL already adds some tracing points as a proof of
concept.

Tracing will mainly be for the use of the Optimizer team and the Bug
Verification team. We want to be free to change the shape and content of the
trace at will, for a long, undefined time, until we are satisfied with it,
without caring about backward compatibility or about users' applications
which may read and parse the trace. That is mainly because we are not sure
up-front what the best tracing is.

The prototype, originally by Sergey Petrunia, is located on our usual
internal central host in tree mysql-next-mr-opt-backporting-wl4800.
Tracing is per-session: you can enable tracing for your session only, and
you can see only your own statements' traces. The trace is generated as a
statement executes, kept available for the user to read, and deleted when no
longer needed (see "offset" and "limit" further down, which define this
moment). Trace creation is incremental: the trace is built piece by piece,
first with information about constant table detection, then about ranges,
etc., following the existing flow of the Optimizer code. When the statement
has finished executing, the trace is readable with

  SELECT * FROM information_schema.OPTIMIZER_TRACE;

Taking an example, this query:

  SELECT (SELECT 1 FROM t6 WHERE d = c) AS RESULT FROM t5;

gives this trace:

  {
    "steps": [
      {
        "join_preparation": {
          "select#": 1,
          "steps": [
            {
              "join_preparation": {
                "select#": 2,
                "steps": [
                  {
                    "expanded_query": "/* select#2 */ select 1 from `test`.`t6` where (`test`.`t6`.`d` = `test`.`t5`.`c`)"
                  }
                ]
              }
            },
            {
              "expanded_query": "/* select#1 */ select (/* select#2 */ select 1 from `test`.`t6` where (`test`.`t6`.`d` = `test`.`t5`.`c`)) AS `RESULT` from `test`.`t5`"
            }
          ]
        }
      },
      {
        "join_optimization": {
          "select#": 1,
          "steps": [
            {
              "records_estimation": [
                {
                  "database": "test",
                  "table": "t5",
                  "records": 1,
                  "cost": 1,
                  "table_type": "system"
                }
              ]
            },
            {
              "attaching_conditions_to_tables": {
                "original_condition": null,
                "attached_conditions_computation": [
                ],
                "attached_conditions_summary": [
                ]
              }
            },
            {
              "refine_plan": [
              ]
            }
          ]
        }
      },
      {
        "join_execution": {
          "select#": 1,
          "steps": [
            {
              "subselect_execution": {
                "select#": 2,
                "steps": [
                  {
                    "join_optimization": {
                      "select#": 2,
                      "steps": [
                        {
                          "condition_processing": {
                            "condition": "WHERE",
                            "original_condition": "(`test`.`t6`.`d` = NULL)",
                            "steps": [
                              {
                                "transformation": "equality_propagation",
                                "resulting_condition": "multiple equal(NULL, `test`.`t6`.`d`)"
                              },
                              {
                                "transformation": "constant_propagation",
                                "resulting_condition": "multiple equal(NULL, `test`.`t6`.`d`)"
                              },
                              {
                                "transformation": "trivial_condition_removal",
                                "resulting_condition": "multiple equal(NULL, `test`.`t6`.`d`)"
                              }
                            ]
                          }
                        },
                        {
                          "ref_optimizer_key_uses": [
                            {
                              "database": "test",
                              "table": "t6",
                              "field": "d",
                              "equals": "NULL",
                              "null_rejecting": true
                            }
                          ]
                        },
                        {
                          "records_estimation": [
                            {
                              "database": "test",
                              "table": "t6",
                              "range_analysis": {
                                "table_scan": {
                                  "records": 2,
                                  "cost": 4.5034
                                },
                                "potential_range_indices": [
                                  {
                                    "index": "d",
                                    "usable": true,
                                    "key_parts": [
                                      "d"
                                    ]
                                  }
                                ],
                                "best_covering_index_scan": {
                                  "index": "d",
                                  "cost": 1.4233,
                                  "chosen": true
                                },
                                "setup_range_conditions": [
                                  {
                                    "impossible_condition": {
                                      "cause": "comparison_with_null_always_false"
                                    }
                                  }
                                ],
                                "impossible_range": true
                              },
                              "records": 0,
                              "cause": "impossible_where_condition"
                            }
                          ]
                        }
                      ],
                      "empty_result": {
                        "cause": "no matching row in const table"
                      }
                    }
                  },
                  {
                    "join_execution": {
                      "select#": 2,
                      "steps": [
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  }

The format above is JSON. More details about JSON are in the Low-Level
Design; in short, a JSON object is an unordered set of key/value pairs (like
a Perl associative array), and a JSON array is an ordered list of values
(like a Perl list). The trace starts with a top object which has one
key/value pair: the key is "steps"; the value is an array, whose elements
are objects each having a single key: "join_preparation" for one,
"join_optimization" for another, "join_execution" for another, and so on.
Each such object in turn contains sub-objects or arrays, and so on, down to
the simplest types, which are scalars (string, number, boolean).
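To make the workflow concrete before diving into the details, here is a
minimal session sketch; it assumes the t5/t6 tables of the example above and
uses the variable syntax described in the "New variables" section below:

  # Enable tracing for this session (it is off by default), run the
  # statement, then read its trace from INFORMATION_SCHEMA:
  SET optimizer_trace="enabled=on";
  SELECT (SELECT 1 FROM t6 WHERE d = c) AS RESULT FROM t5;
  SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
  SET optimizer_trace="enabled=off";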
Such a trace is filled only if the user has enabled it with
SET SESSION optimizer_trace="enabled=on"; otherwise the trace is not
created, to save CPU time and memory. @@OPTIMIZER_TRACE is a "set of flags"
variable (like @@OPTIMIZER_SWITCH): it has one master flag to enable/disable
tracing ("enabled=on|off") and additional flags to fine-tune the trace
(described below).

The trace is not only about SELECT statements: a trace is generated for
SELECT, EXPLAIN SELECT, INSERT or REPLACE (with VALUES or SELECT),
UPDATE/DELETE and their multi-table variants, SET (unless it manipulates
@@optimizer_trace), DO, and DECLARE/CASE/IF/RETURN (stored routine language
elements).

QA's automated tests add the requirement that keys can contain only
characters in [a-zA-Z_-]. QA wants to automatically verify that a test is
hitting a certain optimization, so they will inspect keys and values. Likely
they will have to parse the JSON output into an object of the test's native
language and inspect this object ("does the object have property X, and is
it set to true?"); Python and Perl have modules to do such parsing.

Requirement from Support: EXPLAIN SELECT should suffice to produce the
trace; they may not always have the option of executing the SELECT on the
customer's machine, especially if the query is slow.

We expect tracing to be rarely turned on, only when there is a DBA/user to
look at it, and only for a few sessions at a time. Tracing can be compiled
out with a cmake flag (like cmake -DOPTIMIZER_TRACE=0). It is expected that
release binaries will have it compiled in. A consequence is that the most
performance-critical common case is a release binary with tracing compiled
in but not enabled at runtime; it must show as small a performance loss as
possible compared to a vanilla tree not having this WL.

We don't want to have to duplicate code like this:

  DBUG_PRINT("info",("cost %g",cost));
  OPT_TRACE(thd->opt_trace, add("cost",cost));

Thus, any optimizer trace operation, *even* if tracing is run-time disabled,
has an implicit DBUG_PRINT inside. This way, only the second line above is
needed, and several DBUG_PRINT() calls could be removed from the Optimizer
code. A debug binary will refuse to build if optimizer trace is not compiled
in.

In case of a crash (segfault, assertion failure), the stack-trace printer
will print the optimizer trace to the error log, as a debugging help.

As the trace grows, it needs more space, allocated with malloc() or
realloc(); if these OS functions fail to allocate memory, the statement
fails (this is a constraint of the implementation).

New variables
=============

1) --optimizer-trace=name

   Controls tracing of the Optimizer:
   optimizer_trace=option=val[,option=val...], where option is one of
   {enabled, end_marker, one_line} and val is one of {on, off, default}.
   All flags are off by default (a combined usage sketch follows this list).

   1a) "enabled": turns tracing on or off for the session.

   1b) "end_marker":
       - when "off", a key/value pair where the value is an object looks
         like:
           "key": {
             content of object
           }
       - when "on", it looks like:
           "key": {
             content of object
           } /* key */
       The interest is that when "content of object" is large (many lines),
       it is hard to see which object a "}" closes; reminding the reader of
       the closing object's key helps. But note that this isn't
       JSON-compliant.

   1c) "one_line": when "off", the trace looks like the long example above;
       when "on", no whitespace is put in the trace: no "\n", no
       indentation. This is not readable by a human but perfectly readable
       by JSON parsing programs; the only interest is to decrease memory
       usage and client/server transfer.
" 1c) "one_line": when "off" the trace looks like the long example above; when "on", no whitespace is put in the trace: no "\n", no indentation; this is not readable by a human but perfectly readable by JSON parsing programs; the only interest is to decrease memory usage and client/server transfer. 2) --optimizer-trace-features=name Enables/disables tracing of selected features of the Optimizer: optimizer_trace_features=option=val[,option=val...], where option is one of {greedy_search, range_optimizer, dynamic_range, repeated_subselect} and val is one of {on, off, default} All flags are on by default. 2a) "greedy_search": when "off", the search for a best order of tables (you know, the optimizer has to decide whether accessing t1 then t2 is better than accessing t2 then t1) is not traced; this may avoid explosion of the trace for many-table joins 2b) "range_optimizer": when "off", range optimizer is not traced 2c) "dynamic_range": when "off", the calls to the range optimizer during execution (when EXPLAIN says "range checked for each record") are traced only the first time 2d) "repeated_subselect": when "off", the multiple executions of a subquery are traced only the first time. 3) --optimizer-trace-max-mem-size=# Maximum allowed cumulated size of stored optimizer traces Default is 16kB. A 20-table query like in BUG#50595 (file bug.test there) generates a trace which is bigger than 4GB which is unreasonable to fit in memory; the culprit is greedy search. A configurable limit on the memory size of the trace is implemented, controllable by a session variable (with a low default to protect the innocent user). If the limit is exceeded, the trace is truncated, and a column MISSING_BYTES_BEYOND_MAX_MEM_SIZE of the INFORMATION_SCHEMA.OPTIMIZER_TRACE table shows how many bytes could not be written to the trace (if no truncation, the column contains 0). A smarter system may be done in WL#5297. 4) --optimizer-trace-limit=# Maximum number of shown optimizer traces (default 1) --optimizer-trace-offset=# Offset of first optimizer trace to show; see manual (default -1) When we create a trace, the previous trace is automatically deleted. So when executing a stored routine, only the trace of the execution's end will be visible: all the rest is lost. To allow the user to require tracing for a certain statement inside a stored routine (assuming the user cannot change the routine's code to insert SELECT * FROM INFORMATION_OPTIMIZER_TRACE into it), we will support SET optimizer_trace_offset=OFS,optimizer_trace_limit=LIM - if OFS>=0: the OPTIMIZER_TRACE table should contain traces of LIM statements after the OFSth statement counting from when tracing was enabled (i.e. optimizer_trace=enabled=on was issued); - if OFS<0: the OPTIMIZER_TRACE table should contain traces of LIM statements after the (-OFS)th statement counting backwards from the SELECT on OPTIMIZER_TRACE. - any statement reading OPTIMIZER_TRACE does not count - the default is OFS=-1: boils down to the last executed statement. The memory limit on optimizer trace's size applies to the concatenation of all which the implementation needs to keep in memory: - if OFS>=0, it means LIM traces - if OFS<0, it means (-OFS) traces, because, even though [-OFS,-OFS+LIM[ is shown in the table, [-OFS+LIM+1,last] also need to be in memory as they will be shown in the table after executing a few more statement ("the window slides"). 
Security
========

When tracing is enabled, additional security checks are done; docs team:
ping me and I'll explain (it's a bit complicated).

Columns in INFORMATION_SCHEMA.OPTIMIZER_TRACE
=============================================

"QUERY" (string): the verbatim text of the query.

"TRACE" (string): the trace of this query.

"MISSING_BYTES_BEYOND_MAX_MEM_SIZE": how many bytes of trace have been
thrown away because the optimizer-trace-max-mem-size limit was reached.

"INSUFFICIENT_PRIVILEGES": 1 if the trace cannot be shown due to
insufficient privileges of the user.

Misc points for documentation
=============================

This should be mentioned in the doc of the feature:

- If people want to save their trace to a file, they should use SELECT INTO
  DUMPFILE instead of SELECT INTO OUTFILE (OUTFILE escapes '\n', making the
  trace non-JSON-compliant); a sketch follows this section.

This should be mentioned in the manual:

- EXPLAIN EXTENDED, when it prints the query as a warning, now prints the
  numbers of the query's "pieces" in the form /*select#X*/, like:

    /* select#1 */ select (/* select#2 */ select 1 from `test`.`t6` where
    (`test`.`t6`.`d` = `test`.`t5`.`c`)) AS `RESULT` from `test`.`t5`

  These select#X correspond to the "id" column in EXPLAIN output, so the
  printout above can serve to decrypt this "id" column, which was so far
  described in the manual merely as "The SELECT identifier. This is the
  sequential number of the SELECT within the query."
  (http://dev.mysql.com/doc/refman/5.5/en/explain-output.html). When there
  are nested subqueries and unions, it is not trivial to know how the
  parser progresses, and thus not trivial to know what, say, select number
  5 corresponds to; in the printout it is more obvious.

- We must note clearly that traces are subject to change; we change them
  frequently as we understand more of the optimizer code and refactor it.
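A hedged sketch of the DUMPFILE advice above; the file path is illustrative,
and SELECT ... INTO DUMPFILE requires the FILE privilege and a target file
that does not yet exist:

  # DUMPFILE writes the single TRACE value verbatim, so the file remains
  # valid JSON; OUTFILE would escape '\n' and break JSON compliance.
  SELECT TRACE INTO DUMPFILE '/tmp/trace.json'
    FROM information_schema.OPTIMIZER_TRACE;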
Future
======

More ideas are pushed to WL#5297, which will happen later.

All the Low-Level Design is available as Doxygen comments in file
sql/opt_trace.h of the tree, or, much more readable, in autogenerated HTML
at:

  [usual address of our intranet]/~gbichot/mysql-wl4800/html/OPTIMIZER_TRACE.html

The above is the main page, which has an introduction and then links to each
class that has Doxygen comments. A unit test has been added to check that
valid usage of the API produces the expected JSON output, that invalid usage
produces invalid JSON, and to test some important API functions.