WL#12074: Volcano iterator executor base
Affects: Server-8.0
—
Status: Complete
Building on WL#11785 (Volcano iterator design), this worklog aims to deliver the basics of a new SQL executor, using a consistent iterator design.
No (new) functional requirements. All queries should remain the same. Even performance schema results should be the same, except for some unusual queries as detailed in “Performance considerations”. No (new) non-functional requirements. Performance should not regress.
The new executor will not replace the old one immediately; the two will live side-by-side for some time. (However, removing the old one is a prerequisite for a new join optimizer, as the new join optimizer will want to output e.g. bushy joins, which do not have a natural representation in the data structures used by the old one.) This is to reduce the scope of this worklog, and also mitigate risk should there be unexpected issues. Repeating and expanding on the long-term goals from WL#11785: * An execution structure that can trivially support bushy joins, if the optimizer wants to output them. * A framework enabling better EXPLAIN and EXPLAIN ANALYZE, as the entire query execution is modelled with an iterator tree (where each call can be timed by inserting timing/counting iterators), instead of a flat list. * Making nested loop no longer hard-wired into the executor; in particular, hash joins would be on equal footing with nested loop, making it significantly easier to implement (Erik already has a working prototype, although a production-ready implementation would also need a new join optimizer). * More composable and reusable structures with clearer separation of concern; e.g., aggregation (GROUP BY) is implemented twice today, with one copy for temporary tables and one for sending directly to the user (both intertwined with logic for HAVING and actually sending the finished result). After this specific worklog, the new executor will support queries consisting of primary and const tables, joins (except semijoins), filtering (WHERE/HAVING), grouping (except rollup), limit/offset and some forms of materialization (derived tables, and materialization before e.g. a final sorting). In particular, this means it does not support SELECT DISTINCT, CTEs, windowing functions, semijoins, loose scan or BNL/BKA; however, it is a useful subset, and will allow us to run e.g. nearly all DBT-3 queries, assuming BNL is turned off. MySQL will transparently fall back to the old executor whenever an unsupported query is given. New iterators include: * FilterIterator: Implements WHERE/HAVING (but see below). * LimitOffsetIterator: Implements LIMIT/OFFSET (except for SQL_CALC_ROWS, in which case it will only implement OFFSET). * AggregateIterator: Implements aggregate functions, and does grouping if GROUP BY is active. * NestedLoopiterator: Joins two iterators using a nested-loop join (inner join, outer join or anti-join). * MaterializeIterator: Reads from another iterator, puts the results into a temporary table and then reads those results back one or more times. * FakeSingleRowIterator: Returns a single row and then nothing. Used only in certain situations with const tables (e.g., if we only have const tables, we still need an iterator to read that single row from). The optimizer will not be changed as part of this worklog; we have to parse the existing structures set up for the old optimizer and construct our iterator tree from that. When we remove the old executor, we can make the optimizer output more suitable data structures, e.g. a set of simple structures corresponding directly to an iterator tree. In general, queries will return the same result 1:1, but there is one exception: For the MySQL extension dealing with aliases in HAVING, nondeterministic functions may be executed a different number of times from what the user expects. (A partial fix for this was added in 5.7, with a full fix coming in as a side effect of implementing window functions; this worklog effectively reverts to 5.7 behavior.) Given that this is an extension, only appears in very specific cases involving HAVING and temporary tables, and the number of executions of nondeterministic functions is poorly specified anyway, this was deemed acceptable. Regular EXPLAIN will not change at this time. However, we will support an experimental (unsupported, subject to change in later versions) tree format (e.g. EXPLAIN FORMAT=TREE SELECT ...), which will output the generated iterator tree, helping the user understand how execution was actually set up. The tree format needs more visual space but is also much more precise than the old EXPLAIN format; e.g., instead of just “Using filesort”, it will mark clearly what is sorted, and on which columns, and references to internal jargon will be removed (e.g. it says “Buffered reference lookup” instead of “eq_ref”). There will be no costs in this tree at the current stage. An example EXPLAIN output from a DBT-3 query, with some lines truncated for the benefit of the WL system: -> Aggregate: sum((lineitem.l_extendedprice * (1 - lineitem.l_discount))) -> Nested loop inner join -> Filter: (((part.p_brand = 'Brand#22') and (part.p_container [...] -> Table scan: part -> Filter: (((lineitem.l_shipinstruct = 'DELIVER IN PERSON') and [...] -> Reference lookup on lineitem using i_l_partkey (l_partkey=part.p_partkey)
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.