Handles materialization; the first call to Init() will scan the given iterator to the end, store the results in a temporary table (optionally with deduplication), and then Read() will allow you to read that table repeatedly without the cost of executing the given subquery many times (unless you ask for rematerialization).
|
| MaterializeIterator (THD *thd, Operands operands, const MaterializePathParameters *path_params, unique_ptr_destroy_only< RowIterator > table_iterator, JOIN *join) |
|
bool | Init () override |
| Initialize or reinitialize the iterator. More...
|
|
int | Read () override |
| Read a single row. More...
|
|
void | SetNullRowFlag (bool is_null_row) override |
| Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row). More...
|
|
void | StartPSIBatchMode () override |
| Start performance schema batch mode, if supported (otherwise ignored). More...
|
|
void | EndPSIBatchModeIfStarted () override |
| Ends performance schema batch mode, if started. More...
|
|
void | UnlockRow () override |
|
const IteratorProfiler * | GetProfiler () const override |
| Get profiling data for this iterator (for 'EXPLAIN ANALYZE'). More...
|
|
const Profiler * | GetTableIterProfiler () const |
|
| TableRowIterator (THD *thd, TABLE *table) |
|
void | UnlockRow () override |
| The default implementation of unlock-row method of RowIterator, used in all access methods except EQRefIterator. More...
|
|
void | SetNullRowFlag (bool is_null_row) override |
| Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row). More...
|
|
void | StartPSIBatchMode () override |
| Start performance schema batch mode, if supported (otherwise ignored). More...
|
|
void | EndPSIBatchModeIfStarted () override |
| Ends performance schema batch mode, if started. More...
|
|
| RowIterator (THD *thd) |
|
virtual | ~RowIterator ()=default |
|
| RowIterator (const RowIterator &)=delete |
|
| RowIterator (RowIterator &&)=default |
|
virtual void | SetOverrideProfiler ([[maybe_unused]] const IteratorProfiler *profiler) |
|
virtual RowIterator * | real_iterator () |
| If this iterator is wrapping a different iterator (e.g. More...
|
|
virtual const RowIterator * | real_iterator () const |
|
|
bool | doing_hash_deduplication () const |
| Whether we are deduplicating using a hash field on the temporary table. More...
|
|
bool | MaterializeRecursive () |
| Recursive materialization happens much like regular materialization, but some steps are repeated multiple times. More...
|
|
bool | MaterializeOperand (const Operand &operand, ha_rows *stored_rows) |
|
int | read_next_row (const Operand &operand) |
|
bool | check_unique_fields_hash_map (TABLE *t, bool write, bool *found, bool *spill) |
| Check presence of row in hash map, and make hash map iterator ready for writing value. More...
|
|
void | backup_or_restore_blob_pointers (bool backup) |
| Save (or restore) blob pointers in Field::m_blob_backup . More...
|
|
void | update_row_in_hash_map () |
|
bool | store_row_in_hash_map () |
| Store the current row image into the hash map. More...
|
|
bool | handle_hash_map_full (const Operand &operand, ha_rows *stored_rows) |
|
bool | process_row (const Operand &operand, Operands &operands, TABLE *t, uchar *set_counter_0, uchar *set_counter_1, bool *read_next) |
|
bool | process_row_hash (const Operand &operand, TABLE *t, ha_rows *stored_rows) |
|
bool | materialize_hash_map (TABLE *t, ha_rows *stored_rows) |
| Walk through de-duplicated rows from in-memory hash table and/or spilled overflow HF chunks [1] and write said rows to table t, updating stored_rows counter. More...
|
|
bool | load_HF_row_into_hash_map () |
| We just read a row from a HF chunk file. More...
|
|
void | init_hash_map_for_new_exec () |
|
template<typename Profiler>
class anonymous_namespace{composite_iterators.cc}::MaterializeIterator< Profiler >
Handles materialization; the first call to Init() will scan the given iterator to the end, store the results in a temporary table (optionally with deduplication), and then Read() will allow you to read that table repeatedly without the cost of executing the given subquery many times (unless you ask for rematerialization).
When materializing, MaterializeIterator takes care of evaluating any items that need it and of storing the results in the fields of the outgoing table; which items are evaluated is governed by the temporary table parameters.
Conceptually (although not performance-wise!), the MaterializeIterator is a no-op if you don't ask for deduplication[1], and in some cases (e.g. when scanning a table only once), we elide it. However, it's not necessarily straightforward to do so by just not inserting the iterator, as the optimizer will have set up everything (e.g., read sets, or what table upstream items will read from) assuming the materialization will happen. The realistic option is therefore to set up everything as if materialization would happen, but not actually write to the table; see StreamingIterator for details.
[1] if we have a UNION DISTINCT or INTERSECT or EXCEPT it is not a no-op
- For UNION DISTINCT, MaterializeIterator de-duplicates rows via a key on the materialized table in one of two ways: a) a unique key, if possible, or b) a non-unique key on a hash of the row, if not. For details, see create_tmp_table.
- INTERSECT and EXCEPT use one of two ways: a) in-memory hashing (with possible spill to disk), in which case the materialized table is keyless, or, if this approach overflows, b) a non-unique key on the materialized table, the keys being the hashes of the rows.
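The materialize-once, read-many behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToyMaterializer` and `RowSource` are hypothetical names, rows are plain integers, the "temporary table" is a `std::vector`, and deduplication uses a `std::set` rather than a key on the table. It does not reflect the server's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a child iterator: returns std::nullopt at end of input.
using RowSource = std::function<std::optional<int>()>;

// Toy model of materialization: the first Init() drains the source into a
// buffer (the "temporary table"); Read() then serves rows from that buffer.
class ToyMaterializer {
 public:
  ToyMaterializer(RowSource source, bool deduplicate)
      : m_source(std::move(source)), m_deduplicate(deduplicate) {}

  // The first call scans the source to the end; later calls only rewind.
  void Init() {
    if (!m_materialized) {
      std::set<int> seen;
      while (std::optional<int> row = m_source()) {
        // Skip rows already stored when deduplication is requested.
        if (m_deduplicate && !seen.insert(*row).second) continue;
        m_rows.push_back(*row);
      }
      m_materialized = true;
    }
    m_pos = 0;
  }

  // Returns the next stored row, or std::nullopt at end of table.
  std::optional<int> Read() {
    assert(m_materialized);  // Read() requires a prior Init().
    if (m_pos >= m_rows.size()) return std::nullopt;
    return m_rows[m_pos++];
  }

 private:
  RowSource m_source;
  bool m_deduplicate;
  bool m_materialized = false;
  std::vector<int> m_rows;
  std::size_t m_pos = 0;
};
```

Calling `Init()` a second time in this sketch rewinds the buffer without touching the source again, which is the cost saving the class description refers to; rematerialization is deliberately left out.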
MaterializeIterator conceptually materializes iterators, not JOINs or Query_expressions. However, there are many details that leak out (e.g., setting performance schema batch mode, slices, reusing CTEs, etc.), so we need to send them in anyway.
'Profiler' should be 'IteratorProfilerImpl' for 'EXPLAIN ANALYZE' and 'DummyIteratorProfiler' otherwise. It is implemented as a template parameter rather than a pointer to a base class in order to minimize the impact this probe has on normal query execution.
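The trade-off behind the template parameter can be sketched as follows. The types `ToyDummyProfiler`, `ToyCountingProfiler` and `ToyIterator` are hypothetical and far simpler than the real profiler classes; the point is only that the profiler call is resolved at compile time, so the no-op variant costs nothing at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical no-op profiler: every call is empty and trivially inlinable.
struct ToyDummyProfiler {
  void RecordRow() {}
  std::uint64_t rows() const { return 0; }
};

// Hypothetical counting profiler, standing in for the EXPLAIN ANALYZE case.
struct ToyCountingProfiler {
  void RecordRow() { ++m_rows; }
  std::uint64_t rows() const { return m_rows; }

 private:
  std::uint64_t m_rows = 0;
};

// The profiler type is chosen at compile time, so no virtual dispatch or
// null-pointer check is needed on the per-row path.
template <typename Profiler>
class ToyIterator {
 public:
  explicit ToyIterator(int limit) : m_limit(limit) { assert(limit >= 0); }

  // Returns 0 while rows remain and -1 at end of input.
  int Read() {
    if (m_next >= m_limit) return -1;
    ++m_next;
    m_profiler.RecordRow();
    return 0;
  }

  const Profiler &profiler() const { return m_profiler; }

 private:
  int m_limit;
  int m_next = 0;
  Profiler m_profiler;
};
```

With `ToyIterator<ToyDummyProfiler>`, a compiler can typically remove the `RecordRow()` call entirely, whereas a base-class pointer would usually cost at least an indirect call per row.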
template<typename Profiler >
bool anonymous_namespace{composite_iterators.cc}::MaterializeIterator< Profiler >::doing_hash_deduplication () const
inline private
Whether we are deduplicating using a hash field on the temporary table.
(This condition mirrors check_unique_fields().) If so, we compute a hash value for every row, look up all rows with the same hash and manually compare them to the row we are trying to insert.
Note that this is not the common way of deduplicating as we go. The common method is to have a regular index on the table over the right columns, and in that case, ha_write_row() will fail with an ignorable error, so that the row is ignored even though check_unique_fields() is not called. However, B-tree indexes have limitations, in particular on length, that sometimes require us to do this instead. See create_tmp_table() for details.
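The hash-then-compare scheme described above can be sketched as follows. This is a simplified illustration, not the server's code: `Row`, `HashRow` and `ToyHashDedupTable` are hypothetical, rows are vectors of strings, and a `std::unordered_multimap` stands in for the non-unique index on the hash field.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Illustrative row hash (not the one the server uses): combines column hashes.
inline std::size_t HashRow(const Row &row) {
  std::size_t h = 0;
  for (const std::string &col : row) {
    h = h * 31 + std::hash<std::string>{}(col);
  }
  return h;
}

class ToyHashDedupTable {
 public:
  using Hasher = std::function<std::size_t(const Row &)>;

  // The hasher is injectable so that collisions can be forced when testing.
  explicit ToyHashDedupTable(Hasher hasher = HashRow)
      : m_hasher(std::move(hasher)) {
    assert(m_hasher != nullptr);
  }

  // Inserts the row unless an identical row is already stored.
  // Returns true if the row was inserted.
  bool InsertIfNew(const Row &row) {
    const std::size_t h = m_hasher(row);
    // Look up every stored row with the same hash and compare in full,
    // since equal hashes do not imply equal rows.
    auto range = m_index.equal_range(h);
    for (auto it = range.first; it != range.second; ++it) {
      if (m_rows[it->second] == row) return false;
    }
    m_index.emplace(h, m_rows.size());
    m_rows.push_back(row);
    return true;
  }

  std::size_t size() const { return m_rows.size(); }

 private:
  Hasher m_hasher;
  std::vector<Row> m_rows;
  std::unordered_multimap<std::size_t, std::size_t> m_index;  // hash -> row number
};
```

The full comparison inside the loop is what makes a non-unique index sufficient: two different rows that happen to share a hash are both kept.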
template<typename Profiler >
void anonymous_namespace{composite_iterators.cc}::MaterializeIterator< Profiler >::SetNullRowFlag (bool is_null_row)
inline override virtual
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).
This is used for outer joins, when an iterator hasn't produced any rows and we need to produce a NULL-complemented row. Init() or Read() won't necessarily reset this flag, so if you ever set it to true, make sure to also set it to false when needed.
Note that this can be called without Init() having been called first. For example, NestedLoopIterator can hit EOF immediately on the outer iterator, which means the inner iterator doesn't get an Init() call, but will still forward SetNullRowFlag to both inner and outer iterators.
TODO: We shouldn't need this. See the comments on AggregateIterator for a bit more discussion on abstracting out a row interface.
Implements RowIterator.
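The effect of the NULL row flag on reads can be illustrated with a small sketch. `ToyRowBuffer` is a hypothetical type, not the server's row buffer: columns are strings, and SQL NULL is modelled as `std::nullopt`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy row buffer: while the flag is set, every column reads as NULL,
// regardless of what leftover data the buffer still holds.
class ToyRowBuffer {
 public:
  explicit ToyRowBuffer(std::vector<std::string> columns)
      : m_columns(std::move(columns)) {}

  // The flag is sticky: nothing else in this class resets it.
  void SetNullRowFlag(bool is_null_row) { m_null_row = is_null_row; }

  // Returns std::nullopt (standing in for SQL NULL) while the flag is set.
  std::optional<std::string> Get(std::size_t i) const {
    assert(i < m_columns.size());
    if (m_null_row) return std::nullopt;
    return m_columns[i];
  }

 private:
  std::vector<std::string> m_columns;
  bool m_null_row = false;
};
```

Because the flag is sticky in this sketch, a caller that sets it for a NULL-complemented row has to clear it again before the next real row is read, mirroring the caveat above.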
template<typename Profiler >
void anonymous_namespace{composite_iterators.cc}::MaterializeIterator< Profiler >::StartPSIBatchMode ()
inline override virtual
Start performance schema batch mode, if supported (otherwise ignored).
PFS batch mode is a mitigation to reduce the overhead of performance schema, typically applied at the innermost table of the entire join. If you start it before scanning the table and then end it afterwards, the entire set of handler calls will be timed only once, as a group, and the costs will be distributed evenly out. This reduces timer overhead.
If you start PFS batch mode, you must also take care to end it at the end of the scan, one way or the other. Do note that this is true even if the query ends abruptly (LIMIT is reached, or an error happens). The easiest workaround for this is to simply call EndPSIBatchModeIfStarted() on the root iterator at the end of the scan. See the PFSBatchMode class for a useful helper.
The rules for starting and ending batch mode are:
1. If you are an iterator with exactly one child (FilterIterator etc.), forward any StartPSIBatchMode() calls to it.
2. If you drive an iterator (read rows from it using a for loop or similar), use PFSBatchMode as described above.
3. If you have multiple children, ignore the call and do your own handling of batch mode as appropriate. For materialization, #2 would typically apply. For joins, it depends on the join type (e.g., NestedLoopIterator applies batch mode only when scanning the innermost table).
The upshot of this is that when scanning a single table, batch mode will typically be activated for that table (since we call StartPSIBatchMode() on the root iterator, and it will trickle all the way down to the table iterator), but for a join, the call will be ignored and the join iterator will activate batch mode by itself as needed.
Reimplemented from RowIterator.
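The forwarding rule for single-child iterators and the "always end what you started" requirement can be sketched together. All types here are hypothetical; in particular, `ToyBatchModeGuard` is only loosely modelled on the idea of a scope-based helper and is not the interface of the real PFSBatchMode class.

```cpp
#include <cassert>

// Hypothetical minimal iterator interface; batch mode is ignored by default.
struct ToyIteratorBase {
  virtual ~ToyIteratorBase() = default;
  virtual void StartPSIBatchMode() {}
  virtual void EndPSIBatchModeIfStarted() {}
};

// Leaf iterator that supports batch mode and records its state.
struct ToyTableIterator : ToyIteratorBase {
  bool batch_active = false;
  int starts = 0;
  void StartPSIBatchMode() override {
    batch_active = true;
    ++starts;
  }
  void EndPSIBatchModeIfStarted() override { batch_active = false; }
};

// Single-child iterator: forwards batch mode calls to its child.
struct ToyFilterIterator : ToyIteratorBase {
  explicit ToyFilterIterator(ToyIteratorBase *child) : m_child(child) {
    assert(child != nullptr);
  }
  void StartPSIBatchMode() override { m_child->StartPSIBatchMode(); }
  void EndPSIBatchModeIfStarted() override {
    m_child->EndPSIBatchModeIfStarted();
  }

 private:
  ToyIteratorBase *m_child;
};

// Scope guard for code that drives an iterator: batch mode is ended on
// every exit path, including early returns and exceptions.
class ToyBatchModeGuard {
 public:
  explicit ToyBatchModeGuard(ToyIteratorBase *it) : m_it(it) {
    m_it->StartPSIBatchMode();
  }
  ~ToyBatchModeGuard() { m_it->EndPSIBatchModeIfStarted(); }
  ToyBatchModeGuard(const ToyBatchModeGuard &) = delete;
  ToyBatchModeGuard &operator=(const ToyBatchModeGuard &) = delete;

 private:
  ToyIteratorBase *m_it;
};
```

Tying the end call to a destructor is one way to satisfy the requirement that batch mode is ended even when the scan stops early.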
template<typename Profiler >
Profiler anonymous_namespace{composite_iterators.cc}::MaterializeIterator< Profiler >::m_profiler
private
Profiling data for this iterator.
Used for 'EXPLAIN ANALYZE'. Note that MaterializeIterator merely (re)materializes a set of rows. It delegates the task of iterating over those rows to m_table_iterator. m_profiler thus records:
- The total number of rows materialized (for the initial materialization and any subsequent rematerialization).
- The total time spent on all materializations.
It does not measure the time spent accessing the materialized rows. That is handled by m_table_iter_profiler. The example below illustrates what 'EXPLAIN ANALYZE' output will be like. (Cost-data has been removed for the sake of simplicity.) The second line represents the MaterializeIterator that materializes x1, and the first line represents m_table_iterator, which is a TableScanIterator in this example.
-> Table scan on x1 (actual time=t1..t2 rows=r1 loops=l1)
    -> Materialize CTE x1 if needed (actual time=t3..t4 rows=r2 loops=l2)
t3 is the average time (across l2 materializations) spent materializing x1. Since MaterializeIterator does no iteration, we always set t3=t4. 'actual time' is cumulative, so that the values for an iterator should include the time spent in all its descendants. Therefore we know that t1*l1 >= t3*l2. (Note that t1 may be smaller than t3. We may re-scan x1 repeatedly without rematerializing it. Restarting a scan is quick, bringing the average time for fetching the first row (t1) down.)
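The relationship between the two lines of output can be checked with a small worked example. The struct and the numbers used with it are made up for illustration and do not come from any real query; the sketch only encodes the inequality t1*l1 >= t3*l2 stated above.

```cpp
#include <cassert>

// Hypothetical summary of one 'EXPLAIN ANALYZE' line: the average time to
// the first row per loop (t1 or t3), and the number of loops (l1 or l2).
struct ToyActualTime {
  double first_row_ms;
  long loops;

  // Total time across all loops: average multiplied by loop count.
  double TotalFirstRowMs() const {
    assert(loops >= 0);
    return first_row_ms * static_cast<double>(loops);
  }
};

// True if the scan's cumulative time covers the materialization's,
// i.e. t1*l1 >= t3*l2.
inline bool IsCumulativeConsistent(const ToyActualTime &scan,
                                   const ToyActualTime &materialize) {
  return scan.TotalFirstRowMs() >= materialize.TotalFirstRowMs();
}
```

For instance, a scan with t1 = 2 ms over l1 = 10 loops and a single materialization with t3 = 15 ms gives 20 >= 15, so the figures are consistent even though t1 is smaller than t3.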