WL#11785: Volcano iterator design

Affects: Server-8.0   —   Status: Complete

Make a new API for iterating over records that is powerful enough to replace all existing record iterator abstractions in MySQL, and use that API to replace READ_RECORD and the READ_RECORD-like interface in QEP_TAB (unifying the two).

No functional requirements.

Non-functional requirements: Performance will remain generally the same.

As of today, MySQL contains at least six different abstractions over the
concept of a row iterator (not counting the handler interface):

  1. The QUICK_SELECT_I interface, abstracting over various ways of reading
     indexes on the a table (so-called “quicks”), such as range scans,
     index merge or group min/max.
  2. The READ_RECORD interface, abstracting over QUICK_SELECT_I (#1),
     full table scans, full index scans, and sort buffer results.
  3. An interface that's part of QEP_TAB but doesn't have its own name,
     which abstracts over the function pointers in READ_RECORD (#2), and also
     various join-specific access types (REF, EQ_REF, full-text search etc.).
  4. QEP_operation, which abstracts over temporary tables or join buffering
     (used as part of BNL and BKA).
  5. QEP_TAB::next_select, abtracting over #4, nested-loop joins, GROUP BY
     processing, and a few other things.
  6. Query_result, abstracting over UNION, early EXISTS termination,
     sending results to the user, and probably a few other things.

#1, #4 and #6 are based on C++ classes; #2, #3 and #5 are based on C function
pointers. #4, #5 and #6 are partially push-based; the others are pull-based.
In addition, there are many kinds of operations that could be abstracted but
are done with ad-hoc code outside these four abstractions, such as table
materialization, (partially) sorting, and unique. None of the abstractions
are capable of composition (e.g. you can't feed sort/limit results directly
to another sort/limit).

This worklog aims to build a single abstraction that has a small API, but is
generic enough to unify the first four abstractions plus joins (possibly all
five); basically a unified, composable iterator abstraction borrowed from
the classic Volcano database system. The long-term benefits of such an
abstraction include:

  * An execution structure that can trivially support bushy joins, if the
    optimizer wants to output them.
  * A framework enabling better EXPLAIN and EXPLAIN ANALYZE, as the entire
    query execution is modelled with an iterator tree (where each call can be
    timed by inserting timing/counting iterators), instead of a flat list.
  * More unified data access throughout the server; e.g., filesort today has
    to know whether it's reading from a QUICK_SELECT_I or a simple table, with
    different code paths for each.
  * Reduced abstraction penalty; you do not need to pay for joins or
    subselects if you do not use them (e.g., today, even simple selects are
    modeled as a single-table nested loop join).

This specific worklog will not aim to realize all these benefits at once.
It will merely create the new abstraction and use it to replace abstractions
#2 and #3, as well as sorting. This yields a net result of one abstraction
less, and also reduces the confusion stemming from juggling C function
pointers around.

The new interface will consist of a C++ interface (with also some protected
helpers in the base class for convenience) called RowIterator, with the
following member functions:

  * Constructor and destructor.
  * Init(QEP_TAB*): Actually opens up all required resources, possibly also
    doing real work (e.g. in SortingIterator, Init() will do the sort);
    can fail. Init() can be called multiple times, to rewind the iterator.
  * Read(): Reads a single row, putting rows into the record buffers.
    Like the existing read_record() abstraction, returns -1 (EOF), 0 (OK)
    or 1 (error).
  * UnlockRow(): Like the existing rr_unlock_row, marks the row as “not
    part of the result set” (i.e., WHERE predicate failed), allowing more
    relaxed transaction isolation levels to drop any locks (and associated
    gap locks) on the last row read.

The parameter list for the constructor is deliberately vaguely specified,
as it is likely to change over time. In particular, in the first version,
we are likely to be taking in a TABLE* in all RowIterators, since we will
still not be somewhat bound to the TABLE abstraction to specify things like
read sets or do error handling (which is bound to the handler). However, this
limitation is likely to be lifted in the future.

As a positive side effect, the use of destructors makes resource cleanup less
spread throughout the code base, as the iterators can take ownership of data
and clean it up when it is no longer needed.

There will be no functional change from this worklog. There should be no
adverse performance effects.

RowIterator implementations made as part of this worklog:

  TableScanIterator - sequential scans
  IndexScanIterator - full scans along an index
  IndexRangeScanIterator - wraps a QUICK_SELECT_I
  SortingIterator - sorts the output of another iterator
  SortBufferIterator - reads sorted rows from a buffer (used by 
  SortBufferIndirectIterator - reads row IDs from a buffer and corresponding 
rows from a table (used by SortingIterator and some forms of unique)
  SortFileIterator - reads sorted rows from a file (used by SortingIterator)
  SortFileIndirectIterator - reads row ID from a file and corresponding rows 
from a table (used by SortingIterator and some forms of unique)
  RefIterator - right-hand side of REF join access
  RefOrNullIterator - right-hand side of REF_OR_NULL join access
  EQRefIterator - right-hand side of EQ_REF join acces
  ConstIterator - reads a const table (0 or 1 rows)
  FullTextSearchIterator - searches a full-text index
  DynamicRangeIterator - QS_DYNAMIC_RANGE; calls the range optimizer for each 
row and then wraps a QUICK_SELECT_I or table scan as needed
  PushedJoinRefIterator - reads the output of a join that was pushed down to NDB