WL#11785: Volcano iterator design
Make a new API for iterating over records that is powerful enough to replace all existing record iterator abstractions in MySQL, and use that API to replace READ_RECORD and the READ_RECORD-like interface in QEP_TAB (unifying the two).
No functional requirements.
Non-functional requirements: Performance will remain generally the same.
As of today, MySQL contains at least six different abstractions over the concept of a row iterator (not counting the handler interface): 1. The QUICK_SELECT_I interface, abstracting over various ways of reading indexes on the a table (so-called “quicks”), such as range scans, index merge or group min/max. 2. The READ_RECORD interface, abstracting over QUICK_SELECT_I (#1), full table scans, full index scans, and sort buffer results. 3. An interface that's part of QEP_TAB but doesn't have its own name, which abstracts over the function pointers in READ_RECORD (#2), and also various join-specific access types (REF, EQ_REF, full-text search etc.). 4. QEP_operation, which abstracts over temporary tables or join buffering (used as part of BNL and BKA). 5. QEP_TAB::next_select, abtracting over #4, nested-loop joins, GROUP BY processing, and a few other things. 6. Query_result, abstracting over UNION, early EXISTS termination, sending results to the user, and probably a few other things. #1, #4 and #6 are based on C++ classes; #2, #3 and #5 are based on C function pointers. #4, #5 and #6 are partially push-based; the others are pull-based. In addition, there are many kinds of operations that could be abstracted but are done with ad-hoc code outside these four abstractions, such as table materialization, (partially) sorting, and unique. None of the abstractions are capable of composition (e.g. you can't feed sort/limit results directly to another sort/limit). This worklog aims to build a single abstraction that has a small API, but is generic enough to unify the first four abstractions plus joins (possibly all five); basically a unified, composable iterator abstraction borrowed from the classic Volcano database system. The long-term benefits of such an abstraction include: * An execution structure that can trivially support bushy joins, if the optimizer wants to output them. * A framework enabling better EXPLAIN and EXPLAIN ANALYZE, as the entire query execution is modelled with an iterator tree (where each call can be timed by inserting timing/counting iterators), instead of a flat list. * More unified data access throughout the server; e.g., filesort today has to know whether it's reading from a QUICK_SELECT_I or a simple table, with different code paths for each. * Reduced abstraction penalty; you do not need to pay for joins or subselects if you do not use them (e.g., today, even simple selects are modeled as a single-table nested loop join). This specific worklog will not aim to realize all these benefits at once. It will merely create the new abstraction and use it to replace abstractions #2 and #3, as well as sorting. This yields a net result of one abstraction less, and also reduces the confusion stemming from juggling C function pointers around. The new interface will consist of a C++ interface (with also some protected helpers in the base class for convenience) called RowIterator, with the following member functions: * Constructor and destructor. * Init(QEP_TAB*): Actually opens up all required resources, possibly also doing real work (e.g. in SortingIterator, Init() will do the sort); can fail. Init() can be called multiple times, to rewind the iterator. * Read(): Reads a single row, putting rows into the record buffers. Like the existing read_record() abstraction, returns -1 (EOF), 0 (OK) or 1 (error). * UnlockRow(): Like the existing rr_unlock_row, marks the row as “not part of the result set” (i.e., WHERE predicate failed), allowing more relaxed transaction isolation levels to drop any locks (and associated gap locks) on the last row read. The parameter list for the constructor is deliberately vaguely specified, as it is likely to change over time. In particular, in the first version, we are likely to be taking in a TABLE* in all RowIterators, since we will still not be somewhat bound to the TABLE abstraction to specify things like read sets or do error handling (which is bound to the handler). However, this limitation is likely to be lifted in the future. As a positive side effect, the use of destructors makes resource cleanup less spread throughout the code base, as the iterators can take ownership of data and clean it up when it is no longer needed. There will be no functional change from this worklog. There should be no adverse performance effects. RowIterator implementations made as part of this worklog: TableScanIterator - sequential scans IndexScanIterator - full scans along an index IndexRangeScanIterator - wraps a QUICK_SELECT_I SortingIterator - sorts the output of another iterator SortBufferIterator - reads sorted rows from a buffer (used by SortingIterator) SortBufferIndirectIterator - reads row IDs from a buffer and corresponding rows from a table (used by SortingIterator and some forms of unique) SortFileIterator - reads sorted rows from a file (used by SortingIterator) SortFileIndirectIterator - reads row ID from a file and corresponding rows from a table (used by SortingIterator and some forms of unique) RefIterator - right-hand side of REF join access RefOrNullIterator - right-hand side of REF_OR_NULL join access EQRefIterator - right-hand side of EQ_REF join acces ConstIterator - reads a const table (0 or 1 rows) FullTextSearchIterator - searches a full-text index DynamicRangeIterator - QS_DYNAMIC_RANGE; calls the range optimizer for each row and then wraps a QUICK_SELECT_I or table scan as needed PushedJoinRefIterator - reads the output of a join that was pushed down to NDB