After code is generated, the low level sp_instr
instructions are optimized. The optimization focuses on two areas:
Dead code removal,
Jump shortcut resolution.
These two optimizations are performed together, as they both are a problem involving flow analysis in the graph that represents the generated code.
The code that implements these optimizations is
sp_head::optimize().
Do not confuse sp_head::optimize() with the
component named the optimizer, as they are
very different. The former is specific to Stored Programs, and
focuses on improving the flow of statements, whereas the latter
is general to queries, and focuses on finding the best execution
plan when executing a single statement. For the optimizer, see
http://dev.mysql.com/doc/en/optimization.html.
The (Stored Program) optimizer is invoked from only one place, in the following code:
db_load_routine(..., sp_head **sphp, ...)
{
...
(*sphp)->optimize();
...
}
By disabling the call to sp_head::optimize()
and recompiling the code, SHOW PROCEDURE CODE
will display the code before flow
optimization.
When investigating issues related to this area, you may want to
use a DBUG_EXECUTE_IF to avoid recompiling
the server with or without flow optimization every time. Be
careful to shutdown and restart the server with or without the
call to sp_head::optimize() for each test, or
you will find that caching of a Stored
Program code does interfere.
Dead code is also known as unreachable code: code that cannot possibly be executed, because no path in the logic leads to it.
For example, consider the following SQL code:
CREATE PROCEDURE proc_5()
BEGIN
DECLARE i INT DEFAULT 0;
again:
WHILE TRUE DO
BEGIN
set i:= i+1;
SELECT "This code is alive";
IF (i = 100) THEN
LEAVE again;
END IF;
ITERATE again;
SELECT "This code is dead";
END;
END WHILE;
END$$
Before flow optimization, the compiled code is:
SHOW PROCEDURE CODE proc_5; Pos Instruction 0 set i@0 0 1 jump_if_not 10(10) 1 2 set i@0 (i@0 + 1) 3 stmt 0 "SELECT "This code is alive"" 4 jump_if_not 7(7) (i@0 = 100) 5 jump 10 6 jump 7 7 jump 1 8 stmt 0 "SELECT "This code is dead"" 9 jump 1
Note the instruction at position 8: the previous instruction is an unconditional jump, so the flow of control can never reach 8 by coming from 7. Because there exists no jump in the entire code that leads to 8 either, the instruction at 8 is unreachable. By looking further in the flow, because 8 is unreachable and there are no jumps to position 9, the instruction at position 9 is also unreachable.
The instruction at position 6 is also unreachable, for a similar
reason: the THEN part of the if contains a
jump, due to the statement LEAVE again;, so
that the code never executes the jump generated by the compiler
to go from the end of the THEN block to the
statement following the IF.
After detecting all the unreachable instructions, and simplifying the code, the result after flow optimization is:
SHOW PROCEDURE CODE proc_5; Pos Instruction 0 set i@0 0 1 jump_if_not 10(10) 1 2 set i@0 (i@0 + 1) 3 stmt 0 "SELECT "This code is alive"" 4 jump_if_not 1(1) (i@0 = 100) 5 jump 10
The flow optimizer is good at detecting
most of the dead code, but has limitations.
For example, coding in SQL IF FALSE THEN ... END
IF; leads to code that can never be executed, but
since the flow optimizer does neither propagate constants nor
consider impossible conditional jumps, this code will not be
removed.
The goal of the flow optimizer is mostly to perform simple local optimizations with a low cost. It's not a fully featured code optimizer, and does not guard against poor SQL.
The term jump shortcut refers to the following optimization: when instruction A is a jump (conditional or not) that goes to position B, and when B is an unconditional jump to position C, the code can be changed so that A can jump to C directly, taking a shortcut to avoid the unnecessary B. Consider the following SQL code:
CREATE PROCEDURE proc_6(x int, y int, z int)
BEGIN
SELECT "Start";
IF (x > 0)
THEN
BEGIN
SELECT "x looks ok";
IF (y > 0)
THEN
BEGIN
SELECT "so does y";
IF (z > 0)
THEN
SELECT "even z is fine";
ELSE
SELECT "bad z";
END IF;
END;
ELSE
SELECT "bad y";
END IF;
END;
ELSE
SELECT "bad x";
END IF;
SELECT "Finish";
END$$
Before flow optimization, the compiled code is:
SHOW PROCEDURE CODE proc_6; Pos Instruction 0 stmt 0 "SELECT "Start"" 1 jump_if_not 12(13) (x@0 > 0) 2 stmt 0 "SELECT "x looks ok"" 3 jump_if_not 10(11) (y@1 > 0) 4 stmt 0 "SELECT "so does y"" 5 jump_if_not 8(9) (z@2 > 0) 6 stmt 0 "SELECT "even z is fine"" 7 jump 9 8 stmt 0 "SELECT "bad z"" 9 jump 11 10 stmt 0 "SELECT "bad y"" 11 jump 13 12 stmt 0 "SELECT "bad x"" 13 stmt 0 "SELECT "Finish""
Note the jump 9 at position 7: since the
instruction at position 9 is jump 11, the
code at position 7 can be simplified to jump
11. The optimization is also recursive: since the
instruction 11 is jump 13, the final jump
destination for the instruction at position 7 is jump
13. Conditional jumps are optimized also, so that the
instruction 5: jump_if_not 8(9) can be
optimized to jump_if_not 8(13).
After flow optimization, the compiled code is:
SHOW PROCEDURE CODE proc_6; Pos Instruction 0 stmt 0 "SELECT "Start"" 1 jump_if_not 12(13) (x@0 > 0) 2 stmt 0 "SELECT "x looks ok"" 3 jump_if_not 10(13) (y@1 > 0) 4 stmt 0 "SELECT "so does y"" 5 jump_if_not 8(13) (z@2 > 0) 6 stmt 0 "SELECT "even z is fine"" 7 jump 13 8 stmt 0 "SELECT "bad z"" 9 jump 13 10 stmt 0 "SELECT "bad y"" 11 jump 13 12 stmt 0 "SELECT "bad x"" 13 stmt 0 "SELECT "Finish""
Note the differences with every jump instruction.
For clarity, this example has been designed to not involve dead code. Note that in general, an instruction that was reachable before taking a shortcut might become unreachable after the shortcut, so that the optimizations for jump shortcuts and dead code are tightly intertwined.
