Handles materialization; the first call to Init() will scan the given iterator to the end, store the results in a temporary table (optionally with deduplication), and then Read() will allow you to read that table repeatedly without the cost of executing the given subquery many times (unless you ask for rematerialization). More...

Inheritance: MaterializeIterator< Profiler > inherits from TableRowIterator, which inherits from RowIterator.

Classes
struct	Invalidator

Public Member Functions
	MaterializeIterator (THD *thd, Mem_root_array< materialize_iterator::QueryBlock > query_blocks_to_materialize, const MaterializePathParameters *path_params, unique_ptr_destroy_only< RowIterator > table_iterator, JOIN *join)

bool	Init () override
	Initialize or reinitialize the iterator. More...

int	Read () override
	Read a single row. More...

void	SetNullRowFlag (bool is_null_row) override
	Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row). More...

void	StartPSIBatchMode () override
	Start performance schema batch mode, if supported (otherwise ignored). More...

void	EndPSIBatchModeIfStarted () override
	Ends performance schema batch mode, if started. More...

void	UnlockRow () override

const IteratorProfiler *	GetProfiler () const override
	Get profiling data for this iterator (for 'EXPLAIN ANALYZE'). More...

const Profiler *	GetTableIterProfiler () const

Public Member Functions inherited from TableRowIterator
	TableRowIterator (THD *thd, TABLE *table)

void	UnlockRow () override
	The default implementation of unlock-row method of RowIterator, used in all access methods except EQRefIterator. More...

void	SetNullRowFlag (bool is_null_row) override
	Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row). More...

void	StartPSIBatchMode () override
	Start performance schema batch mode, if supported (otherwise ignored). More...

void	EndPSIBatchModeIfStarted () override
	Ends performance schema batch mode, if started. More...

Public Member Functions inherited from RowIterator
	RowIterator (THD *thd)

virtual	~RowIterator ()=default

	RowIterator (const RowIterator &)=delete

	RowIterator (RowIterator &&)=default

virtual void	SetOverrideProfiler ([[maybe_unused]] const IteratorProfiler *profiler)

virtual RowIterator *	real_iterator ()
	If this iterator is wrapping a different iterator (e.g. More...

virtual const RowIterator *	real_iterator () const

Private Member Functions
bool	doing_hash_deduplication () const
	Whether we are deduplicating using a hash field on the temporary table. More...

bool	doing_deduplication () const
	Whether we are deduplicating, whether through a hash field or a regular unique index. More...

bool	MaterializeRecursive ()
	Recursive materialization happens much like regular materialization, but some steps are repeated multiple times. More...

bool	MaterializeQueryBlock (const materialize_iterator::QueryBlock &query_block, ha_rows *stored_rows)

Private Attributes
Mem_root_array< materialize_iterator::QueryBlock >	m_query_blocks_to_materialize

unique_ptr_destroy_only< RowIterator >	m_table_iterator

Common_table_expr *	m_cte
	If we are materializing a CTE, points to it (otherwise nullptr). More...

Query_expression *	m_query_expression
	The query expression we are materializing. More...

JOIN *const	m_join
	See constructor. More...

const int	m_ref_slice
	The slice to set when accessing temporary table; used if anything upstream (e.g. WHERE, HAVING) wants to evaluate values based on its contents. More...

const bool	m_rematerialize
	If true, we need to materialize anew for each Init() (because the contents of the table will depend on some outer non-constant value). More...

const bool	m_reject_multiple_rows
	See constructor. More...

const ha_rows	m_limit_rows
	See constructor. More...

Mem_root_array< Invalidator >	m_invalidators

Profiler	m_profiler
	Profiling data for this iterator. More...

Profiler	m_table_iter_profiler
	Profiling data for m_table_iterator. More...

Additional Inherited Members
Protected Member Functions inherited from TableRowIterator
int	HandleError (int error)

void	PrintError (int error)

TABLE *	table () const

Protected Member Functions inherited from RowIterator
THD *	thd () const

Detailed Description

template<typename Profiler>
class MaterializeIterator< Profiler >

Handles materialization; the first call to Init() will scan the given iterator to the end, store the results in a temporary table (optionally with deduplication), and then Read() will allow you to read that table repeatedly without the cost of executing the given subquery many times (unless you ask for rematerialization).
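The materialize-once, read-many behaviour described above can be illustrated with a minimal sketch. This is a toy model, not the server's implementation: `ToyMaterializer`, `RowSource` and `CountTo` are hypothetical names, rows are plain integers, an in-memory vector stands in for the temporary table, and `Init()` returning false on success is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical row source: returns the next row, or nullopt at end of input.
using RowSource = std::function<std::optional<int>()>;

// Example source: yields 0, 1, ..., n-1.
inline RowSource CountTo(int n) {
  return [i = 0, n]() mutable -> std::optional<int> {
    if (i < n) return i++;
    return std::nullopt;
  };
}

// Toy model: Init() drains the source into a buffer the first time (or every
// time if `rematerialize` is set); Read() then scans the stored rows.
class ToyMaterializer {
 public:
  ToyMaterializer(std::function<RowSource()> make_source, bool rematerialize)
      : m_make_source(std::move(make_source)), m_rematerialize(rematerialize) {}

  // Returns false on success (an assumption of this sketch).
  bool Init() {
    if (!m_materialized || m_rematerialize) {
      m_rows.clear();
      RowSource source = m_make_source();
      while (std::optional<int> row = source()) m_rows.push_back(*row);
      m_materialized = true;
      ++m_materializations;
    }
    m_pos = 0;  // Rewind the scan over the stored rows.
    return false;
  }

  // 0 = OK (row written to *out), -1 = end of records.
  int Read(int *out) {
    assert(m_materialized);  // Init() must have been called first.
    if (m_pos == m_rows.size()) return -1;
    *out = m_rows[m_pos++];
    return 0;
  }

  int materializations() const { return m_materializations; }

 private:
  std::function<RowSource()> m_make_source;
  const bool m_rematerialize;
  bool m_materialized = false;
  int m_materializations = 0;
  std::vector<int> m_rows;
  std::size_t m_pos = 0;
};
```

Calling `Init()` twice on an instance built with `rematerialize = false` runs the source only once; with `rematerialize = true`, every `Init()` drains a fresh source.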

When materializing, MaterializeIterator takes care of evaluating any items that need evaluation and storing the results in the fields of the outgoing table; which items these are is governed by the temporary table parameters.

Conceptually (although not performance-wise!), the MaterializeIterator is a no-op if you don't ask for deduplication, and in some cases (e.g. when scanning a table only once), we elide it. However, it's not necessarily straightforward to do so by just not inserting the iterator, as the optimizer will have set up everything (e.g., read sets, or what table upstream items will read from) assuming the materialization will happen. The realistic option is therefore to set up everything as if materialization would happen, but not actually write to the table; see StreamingIterator for details.

MaterializeIterator conceptually materializes iterators, not JOINs or Query_expressions. However, there are many details that leak out (e.g., setting performance schema batch mode, slices, reusing CTEs, etc.), so we need to send them in anyway.

'Profiler' should be 'IteratorProfilerImpl' for 'EXPLAIN ANALYZE' and 'DummyIteratorProfiler' otherwise. It is implemented as a template parameter rather than a pointer to a base class in order to minimize the impact this probe has on normal query execution.
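To illustrate why a template parameter can be cheaper than a base-class pointer, here is a minimal sketch. `CountingProfiler`, `NoOpProfiler`, `ToyIterator` and `IncrementNumRows` are hypothetical names chosen for this example and are not the server's profiler classes or interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profiler that actually records data.
struct CountingProfiler {
  std::uint64_t rows = 0;
  void IncrementNumRows(std::uint64_t n) { rows += n; }
};

// Hypothetical no-op profiler: the empty body can be inlined away entirely.
struct NoOpProfiler {
  void IncrementNumRows(std::uint64_t) {}
};

// The profiler type is fixed at compile time, so the call below needs no
// virtual dispatch and no pointer indirection.
template <typename Profiler>
class ToyIterator {
 public:
  explicit ToyIterator(int limit) : m_limit(limit) {}

  // 0 = OK, -1 = end of records.
  int Read() {
    if (m_next >= m_limit) return -1;
    ++m_next;
    m_profiler.IncrementNumRows(1);
    return 0;
  }

  const Profiler &profiler() const { return m_profiler; }

 private:
  int m_limit;
  int m_next = 0;
  Profiler m_profiler;
};
```

With `ToyIterator<NoOpProfiler>`, a compiler is free to drop the profiling call completely, which is the kind of saving the paragraph above refers to.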

Constructor & Destructor Documentation

◆ MaterializeIterator()

template<typename Profiler >

MaterializeIterator< Profiler >::MaterializeIterator	(	THD *	thd,
		Mem_root_array< materialize_iterator::QueryBlock >	query_blocks_to_materialize,
		const MaterializePathParameters *	path_params,
		unique_ptr_destroy_only< RowIterator >	table_iterator,
		JOIN *	join
	)

Parameters

thd	Thread handler.
query_blocks_to_materialize	List of query blocks to materialize.
path_params	MaterializePath settings.
table_iterator	Iterator used for scanning the temporary table after materialization.
join	When materializing within the same JOIN (e.g., into a temporary table before sorting), as opposed to a derived table or a CTE, we may need to change the slice on the join before returning rows from the result table. If so, join and ref_slice would need to be set, and query_blocks_to_materialize should contain only one member, with the same join.

Member Function Documentation

◆ doing_deduplication()

template<typename Profiler >

bool MaterializeIterator< Profiler >::doing_deduplication ( ) const

private

Whether we are deduplicating, whether through a hash field or a regular unique index.

◆ doing_hash_deduplication()

template<typename Profiler >

bool MaterializeIterator< Profiler >::doing_hash_deduplication ( ) const

inline private

Whether we are deduplicating using a hash field on the temporary table.

(This condition mirrors check_unique_constraint().) If so, we compute a hash value for every row, look up all rows with the same hash and manually compare them to the row we are trying to insert.

Note that this is not the common way of deduplicating as we go. The common method is to have a regular index on the table over the right columns, and in that case, ha_write_row() will fail with an ignorable error, so that the row is ignored even though check_unique_constraint() is not called. However, B-tree indexes have limitations, in particular on length, that sometimes require us to do this instead. See create_tmp_table() for details.
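The hash-based path described above (hash every row, look up rows with the same hash, compare them manually) can be sketched as follows. This is an illustration only: `ToyHashDedupTable` is a hypothetical class, rows are modelled as strings, and an in-memory multimap stands in for the hash field on the temporary table.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

class ToyHashDedupTable {
 public:
  // Returns true if the row was stored, false if an equal row already existed.
  bool InsertIfUnique(const std::string &row) {
    const std::size_t hash = std::hash<std::string>{}(row);
    auto [first, last] = m_by_hash.equal_range(hash);
    for (auto it = first; it != last; ++it) {
      // Equal hashes do not prove equal rows, so compare the full contents.
      if (m_rows[it->second] == row) return false;
    }
    m_by_hash.emplace(hash, m_rows.size());
    m_rows.push_back(row);
    return true;
  }

  std::size_t size() const { return m_rows.size(); }

 private:
  std::vector<std::string> m_rows;
  // Maps a row hash to the positions of all stored rows with that hash.
  std::unordered_multimap<std::size_t, std::size_t> m_by_hash;
};
```

The full comparison inside the loop is what makes the scheme correct in the presence of hash collisions.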

◆ EndPSIBatchModeIfStarted()

template<typename Profiler >

void MaterializeIterator< Profiler >::EndPSIBatchModeIfStarted ( )

override virtual

Ends performance schema batch mode, if started.

It's always safe to call this.

Iterators that have children (composite iterators) must forward the EndPSIBatchModeIfStarted() call to every iterator they could conceivably have called StartPSIBatchMode() on. This ensures that after such a call on the root iterator, all handlers are out of batch mode.

Reimplemented from RowIterator.

◆ GetProfiler()

template<typename Profiler >

const IteratorProfiler * MaterializeIterator< Profiler >::GetProfiler ( ) const

inline override virtual

Get profiling data for this iterator (for 'EXPLAIN ANALYZE').

Valid for TimingIterator, MaterializeIterator and TemptableAggregateIterator only.

Reimplemented from RowIterator.

◆ GetTableIterProfiler()

template<typename Profiler >

const Profiler * MaterializeIterator< Profiler >::GetTableIterProfiler ( ) const

inline

◆ Init()

template<typename Profiler >

bool MaterializeIterator< Profiler >::Init ( )

override virtual

Initialize or reinitialize the iterator.

You must always call Init() before trying a Read() (but Init() does not imply Read()).

You can call Init() multiple times; subsequent calls will rewind the iterator (or reposition it, depending on whether the iterator takes in e.g. an Index_lookup) and allow you to read the records anew.

Implements RowIterator.

◆ MaterializeQueryBlock()

template<typename Profiler >

bool MaterializeIterator< Profiler >::MaterializeQueryBlock	(	const materialize_iterator::QueryBlock &	query_block,
		ha_rows *	stored_rows
	)

private

Read the value of TABLE::m_set_counter from record[1]. The value can be found there after a call to check_unique_constraint if the row was found. Note that m_set_counter a priori points to record[0], which is used when writing and updating the counter.

◆ MaterializeRecursive()

template<typename Profiler >

bool MaterializeIterator< Profiler >::MaterializeRecursive ( )

private

Recursive materialization happens much like regular materialization, but some steps are repeated multiple times.

Our general strategy is:

1. Materialize all non-recursive query blocks, once.
2. Materialize all recursive query blocks in turn.
3. Repeat #2 until no query block writes any more rows (i.e., we have converged); for UNION DISTINCT queries, rows removed by deduplication do not count. Each materialization sees only rows that were newly added since the previous iteration; see FollowTailIterator for more details on the implementation.

Note that the result table is written to while other iterators are still reading from it; again, see FollowTailIterator. This means that each run of #2 can potentially run many actual CTE iterations – possibly the entire query to completion if we have only one query block.

This is not how the SQL standard specifies recursive CTE execution (it assumes building up the new result set from scratch for each iteration, using the previous iteration's results), but it is equivalent, and more efficient for the class of queries we support, since we don't need to re-create the same rows over and over again.
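The strategy above can be sketched with a toy model in which the result table is a vector that is appended to while it is still being scanned. Everything here is hypothetical (`RecursiveBlock`, `MaterializeRecursiveToy`, integer rows, UNION ALL semantics with no deduplication), and the per-block read position only loosely imitates what the text attributes to FollowTailIterator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A recursive block maps one previously stored row to zero or more new rows.
using RecursiveBlock = std::function<std::vector<int>(int)>;

std::vector<int> MaterializeRecursiveToy(
    const std::vector<int> &seed_rows,
    const std::vector<RecursiveBlock> &blocks) {
  // Step 1: materialize the non-recursive part once.
  std::vector<int> table = seed_rows;
  // One read position per recursive block, so each block sees every row
  // exactly once, including rows added during its own scan.
  std::vector<std::size_t> tail(blocks.size(), 0);

  bool wrote_rows = true;
  while (wrote_rows) {  // Step 3: repeat until no block writes any rows.
    wrote_rows = false;
    for (std::size_t b = 0; b < blocks.size(); ++b) {  // Step 2.
      // The table grows while we scan it; indexing stays valid across
      // reallocation and the bound is re-evaluated on every iteration.
      while (tail[b] < table.size()) {
        const int row = table[tail[b]++];
        for (int produced : blocks[b](row)) {
          table.push_back(produced);
          wrote_rows = true;
        }
      }
    }
  }
  return table;
}
```

With a single block that maps n to n + 1 while n < 5 and a seed of {1}, the first pass of step 2 already produces all of 1 through 5, matching the remark that one run can execute many CTE iterations.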

◆ Read()

template<typename Profiler >

int MaterializeIterator< Profiler >::Read ( )

override virtual

Read a single row.

The row data is not actually returned from the function; it is put in the table's (or tables', in case of a join) record buffer, i.e., table->record[0].

Return values

0	OK
-1	End of records
1	Error

Implements RowIterator.
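A caller typically drives an iterator with a loop over the return values listed above. The sketch below uses a hypothetical `ToyRowIterator` interface and a plain `int` as the record buffer; it is not the server's RowIterator, and `Init()` returning true on error is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal interface using the same Read() return conventions.
class ToyRowIterator {
 public:
  virtual ~ToyRowIterator() = default;
  virtual bool Init() = 0;  // Assumed here: true means error.
  virtual int Read() = 0;   // 0 = OK, -1 = end of records, 1 = error.
};

class VectorIterator : public ToyRowIterator {
 public:
  VectorIterator(const std::vector<int> *rows, int *record)
      : m_rows(rows), m_record(record) {}

  bool Init() override {
    m_pos = 0;  // Rewind, so the rows can be read anew.
    return false;
  }

  int Read() override {
    if (m_pos == m_rows->size()) return -1;
    // The row goes into the record buffer, not into the return value.
    *m_record = (*m_rows)[m_pos++];
    return 0;
  }

 private:
  const std::vector<int> *m_rows;
  int *m_record;
  std::size_t m_pos = 0;
};

// Drains the iterator; returns true on error, false on a clean end of records.
bool SumAllRows(ToyRowIterator *it, const int *record, long *sum) {
  if (it->Init()) return true;
  *sum = 0;
  for (;;) {
    const int err = it->Read();
    if (err == -1) return false;  // End of records.
    if (err != 0) return true;    // Error.
    *sum += *record;              // Consume the row from the buffer.
  }
}
```

Because `Init()` rewinds, calling `SumAllRows()` twice on the same iterator yields the same sum both times.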

◆ SetNullRowFlag()

template<typename Profiler >

void MaterializeIterator< Profiler >::SetNullRowFlag ( bool is_null_row )

inline override virtual

Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).

This is used for outer joins, when an iterator hasn't produced any rows and we need to produce a NULL-complemented row. Init() or Read() won't necessarily reset this flag, so if you ever set it to true, make sure to also set it to false when needed.

Note that this can be called without Init() having been called first. For example, NestedLoopIterator can hit EOF immediately on the outer iterator, which means the inner iterator doesn't get an Init() call, but will still forward SetNullRowFlag to both inner and outer iterators.

TODO: We shouldn't need this. See the comments on AggregateIterator for a bit more discussion on abstracting out a row interface.

Implements RowIterator.

◆ StartPSIBatchMode()

template<typename Profiler >

void MaterializeIterator< Profiler >::StartPSIBatchMode ( )

inline override virtual

Start performance schema batch mode, if supported (otherwise ignored).

PFS batch mode is a mitigation to reduce the overhead of performance schema, typically applied at the innermost table of the entire join. If you start it before scanning the table and then end it afterwards, the entire set of handler calls will be timed only once, as a group, and the costs will be distributed evenly. This reduces timer overhead.

If you start PFS batch mode, you must also take care to end it at the end of the scan, one way or the other. Do note that this is true even if the query ends abruptly (LIMIT is reached, or an error happens). The easiest workaround for this is to simply call EndPSIBatchModeIfStarted() on the root iterator at the end of the scan. See the PFSBatchMode class for a useful helper.

The rules for starting and ending batch mode are:

1. If you are an iterator with exactly one child (FilterIterator etc.), forward any StartPSIBatchMode() calls to it.
2. If you drive an iterator (read rows from it using a for loop or similar), use PFSBatchMode as described above.
3. If you have multiple children, ignore the call and do your own handling of batch mode as appropriate. For materialization, #2 would typically apply. For joins, it depends on the join type (e.g., NestedLoopIterator applies batch mode only when scanning the innermost table).

The upshot of this is that when scanning a single table, batch mode will typically be activated for that table (since we call StartPSIBatchMode() on the root iterator, and it will trickle all the way down to the table iterator), but for a join, the call will be ignored and the join iterator will activate batch mode by itself as needed.
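The requirement that batch mode be ended even when a scan stops early is a natural fit for a scope guard. The sketch below is a generic illustration of that idea, not a reproduction of the PFSBatchMode helper mentioned above; `ToyBatchIterator`, `ToyBatchModeGuard` and `ScanWithLimit` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for an iterator that tracks whether batch mode is on.
struct ToyBatchIterator {
  bool batch_mode = false;
  void StartPSIBatchMode() { batch_mode = true; }
  void EndPSIBatchModeIfStarted() { batch_mode = false; }  // Always safe.
};

// Scope guard: starts batch mode on construction, ends it on scope exit.
class ToyBatchModeGuard {
 public:
  explicit ToyBatchModeGuard(ToyBatchIterator *it) : m_it(it) {
    m_it->StartPSIBatchMode();
  }
  ~ToyBatchModeGuard() { m_it->EndPSIBatchModeIfStarted(); }
  ToyBatchModeGuard(const ToyBatchModeGuard &) = delete;
  ToyBatchModeGuard &operator=(const ToyBatchModeGuard &) = delete;

 private:
  ToyBatchIterator *m_it;
};

// Simulates a scan that may stop early because a LIMIT was reached.
// Returns the number of rows read.
int ScanWithLimit(ToyBatchIterator *it, int available_rows, int limit) {
  ToyBatchModeGuard guard(it);
  int rows_read = 0;
  for (int i = 0; i < available_rows; ++i) {
    assert(it->batch_mode);  // Batch mode is active during the scan.
    if (rows_read == limit) return rows_read;  // Early exit; guard still runs.
    ++rows_read;
  }
  return rows_read;
}
```

Whichever path leaves `ScanWithLimit()`, the guard's destructor runs, so the iterator is out of batch mode afterwards.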

Reimplemented from RowIterator.

◆ UnlockRow()

template<typename Profiler >

void MaterializeIterator< Profiler >::UnlockRow ( )

inline override virtual

Implements RowIterator.

Member Data Documentation

◆ m_cte

template<typename Profiler >

Common_table_expr* MaterializeIterator< Profiler >::m_cte

private

If we are materializing a CTE, points to it (otherwise nullptr).

Used so that we see if some other iterator already materialized the table, avoiding duplicate work.

◆ m_invalidators

template<typename Profiler >

Mem_root_array<Invalidator> MaterializeIterator< Profiler >::m_invalidators

private

◆ m_join

template<typename Profiler >

JOIN* const MaterializeIterator< Profiler >::m_join

private

See constructor.

◆ m_limit_rows

template<typename Profiler >

const ha_rows MaterializeIterator< Profiler >::m_limit_rows

private

See constructor.

◆ m_profiler

template<typename Profiler >

Profiler MaterializeIterator< Profiler >::m_profiler

private

Profiling data for this iterator.

Used for 'EXPLAIN ANALYZE'. Note that MaterializeIterator merely (re)materializes a set of rows. It delegates the task of iterating over those rows to m_table_iterator. m_profiler thus records:

The total number of rows materialized (for the initial materialization and any subsequent rematerialization).
The total time spent on all materializations.

It does not measure the time spent accessing the materialized rows. That is handled by m_table_iter_profiler. The example below illustrates what 'EXPLAIN ANALYZE' output will be like. (Cost-data has been removed for the sake of simplicity.) The second line represents the MaterializeIterator that materializes x1, and the first line represents m_table_iterator, which is a TableScanIterator in this example.

-> Table scan on x1 (actual time=t1..t2 rows=r1 loops=l1)
    -> Materialize CTE x1 if needed (actual time=t3..t4 rows=r2 loops=l2)

t3 is the average time (across l2 materializations) spent materializing x1. Since MaterializeIterator does no iteration, we always set t3=t4. 'actual time' is cumulative, so the values for an iterator include the time spent in all its descendants. Therefore we know that t1*l1 >= t3*l2. (Note that t1 may be smaller than t3: we may re-scan x1 repeatedly without rematerializing it, and restarting a scan is quick, which brings down the average time for fetching the first row (t1).)
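As a worked example with purely hypothetical numbers: suppose x1 is materialized once (l2 = 1) in t3 = 10 ms and scanned five times (l1 = 5). Assume the first scan needs 10.2 ms to return its first row, because it includes the materialization, and each of the four rescans needs 0.2 ms. Then:

```latex
\begin{align*}
t_1 &= \frac{10.2 + 4 \times 0.2}{5} = \frac{11.0}{5} = 2.2\ \text{ms}, \\
t_1 \cdot l_1 &= 2.2 \times 5 = 11.0\ \text{ms} \;\ge\; t_3 \cdot l_2 = 10 \times 1 = 10\ \text{ms}.
\end{align*}
```

Here t1 (2.2 ms) is smaller than t3 (10 ms), yet the cumulative inequality still holds.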

◆ m_query_blocks_to_materialize

template<typename Profiler >

Mem_root_array<materialize_iterator::QueryBlock> MaterializeIterator< Profiler >::m_query_blocks_to_materialize

private

◆ m_query_expression

template<typename Profiler >

Query_expression* MaterializeIterator< Profiler >::m_query_expression

private

The query expression we are materializing.

For derived tables, we materialize the entire query expression; for materialization within a query expression (e.g. for sorting or for windowing functions), we materialize only parts of it. Used to clear correlated CTEs within the unit when we rematerialize, since they depend on values from outside the query expression, and those values may have changed since last materialization.

◆ m_ref_slice

template<typename Profiler >

const int MaterializeIterator< Profiler >::m_ref_slice

private

The slice to set when accessing temporary table; used if anything upstream (e.g. WHERE, HAVING) wants to evaluate values based on its contents. See constructor.

◆ m_reject_multiple_rows

template<typename Profiler >

const bool MaterializeIterator< Profiler >::m_reject_multiple_rows

private

See constructor.

◆ m_rematerialize

template<typename Profiler >

const bool MaterializeIterator< Profiler >::m_rematerialize

private

If true, we need to materialize anew for each Init() (because the contents of the table will depend on some outer non-constant value).

◆ m_table_iter_profiler

template<typename Profiler >

Profiler MaterializeIterator< Profiler >::m_table_iter_profiler

private

Profiling data for m_table_iterator.

'this' is a descendant of m_table_iterator in 'EXPLAIN ANALYZE' output, and 'elapsed time' should be cumulative. Therefore, m_table_iter_profiler will measure the sum of the time spent materializing the result rows and iterating over those rows.

◆ m_table_iterator

template<typename Profiler >

unique_ptr_destroy_only<RowIterator> MaterializeIterator< Profiler >::m_table_iterator

private

The documentation for this class was generated from the following file:

sql/iterators/composite_iterators.cc
