An iterator that removes consecutive rows that are the same according to a set of items (typically the join key), so-called “loose scan” (not to be confused with “loose index scan”, which is made by the range optimizer).

#include <composite_iterators.h>

Public Member Functions
	RemoveDuplicatesIterator (THD *thd, unique_ptr_destroy_only< RowIterator > source, JOIN *join, std::span< Item * > group_items)

void	SetNullRowFlag (bool is_null_row) override
	Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).

void	StartPSIBatchMode () override
	Start performance schema batch mode, if supported (otherwise ignored).

void	EndPSIBatchModeIfStarted () override
	Ends performance schema batch mode, if started.

void	UnlockRow () override

Public Member Functions inherited from RowIterator
	RowIterator (THD *thd)

virtual	~RowIterator ()=default

	RowIterator (const RowIterator &)=delete

	RowIterator (RowIterator &&)=default

bool	Init ()
	Initialize or reinitialize the iterator.

int	Read ()
	Read a single row.

virtual const IteratorProfiler *	GetProfiler () const
	Get profiling data for this iterator (for 'EXPLAIN ANALYZE').

virtual void	SetOverrideProfiler (const IteratorProfiler *profiler)

virtual RowIterator *	real_iterator ()
	If this iterator is wrapping a different iterator (e.g. More...

virtual const RowIterator *	real_iterator () const

uint64_t	num_init_calls () const
	Returns the number of times Init() has been called on this iterator.

uint64_t	num_rows () const
	Returns the number of times Read() has returned a row successfully from this iterator.

uint64_t	num_full_reads () const
	Returns the number of times the iterator has been fully read.

Private Member Functions
bool	DoInit () override

int	DoRead () override

Private Attributes
unique_ptr_destroy_only< RowIterator >	m_source

Bounds_checked_array< Cached_item * >	m_caches

bool	m_first_row

Additional Inherited Members
Protected Member Functions inherited from RowIterator
THD *	thd () const

Detailed Description

An iterator that removes consecutive rows that are the same according to a set of items (typically the join key), so-called “loose scan” (not to be confused with “loose index scan”, which is made by the range optimizer).

This is similar in spirit to WeedoutIterator (removing duplicates allows us to treat the semijoin as a normal join), but is much cheaper if the data is already ordered/grouped correctly, as the removal can happen before the join, and it does not need a temporary table.
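
As a rough illustration of the idea, the following self-contained sketch drops a row whenever its key equals the key of the row read just before it. Row, VectorSource, ConsecutiveDedup and CollectKeys are hypothetical names invented for this example; they are not the MySQL classes, and the real iterator compares a set of items rather than a single integer key.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row: a join key plus some payload.
struct Row {
  int key;
  int payload;
};

// Hypothetical, simplified row source; not the real RowIterator interface.
class VectorSource {
 public:
  explicit VectorSource(std::vector<Row> rows) : m_rows(std::move(rows)) {}
  void Init() { m_pos = 0; }
  // Returns the next row, or std::nullopt at end of input.
  std::optional<Row> Read() {
    if (m_pos >= m_rows.size()) return std::nullopt;
    return m_rows[m_pos++];
  }

 private:
  std::vector<Row> m_rows;
  std::size_t m_pos = 0;
};

// Skips every row whose key equals the key of the previously read row.
class ConsecutiveDedup {
 public:
  explicit ConsecutiveDedup(VectorSource source)
      : m_source(std::move(source)) {}
  void Init() {
    m_source.Init();
    m_first_row = true;  // The first row after Init() is always returned.
  }
  std::optional<Row> Read() {
    for (;;) {
      std::optional<Row> row = m_source.Read();
      if (!row) return std::nullopt;
      const bool changed = m_first_row || row->key != m_last_key;
      m_first_row = false;
      m_last_key = row->key;
      if (changed) return row;
      // Same key as the previous row, so keep scanning.
    }
  }

 private:
  VectorSource m_source;
  int m_last_key = 0;
  bool m_first_row = true;
};

// Reads the iterator to the end and collects the keys it returned.
std::vector<int> CollectKeys(ConsecutiveDedup &it) {
  std::vector<int> keys;
  it.Init();
  while (std::optional<Row> row = it.Read()) keys.push_back(row->key);
  return keys;
}
```

With input keys 1, 1, 2, 1 the sketch returns 1, 2, 1: only adjacent duplicates are removed, which is why the input has to be ordered or grouped on the key for the result to be duplicate-free.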

Constructor & Destructor Documentation

◆ RemoveDuplicatesIterator()

RemoveDuplicatesIterator::RemoveDuplicatesIterator	(	THD *	thd,
		unique_ptr_destroy_only< RowIterator >	source,
		JOIN *	join,
		std::span< Item * >	group_items
	)

Member Function Documentation

◆ DoInit()

bool RemoveDuplicatesIterator::DoInit ( )

override private virtual

Implements RowIterator.

◆ DoRead()

int RemoveDuplicatesIterator::DoRead ( )

override private virtual

Implements RowIterator.

◆ EndPSIBatchModeIfStarted()

void RemoveDuplicatesIterator::EndPSIBatchModeIfStarted ( )

inline override virtual

Ends performance schema batch mode, if started.

It's always safe to call this.

Iterators that have children (composite iterators) must forward the EndPSIBatchModeIfStarted() call to every iterator they could conceivably have called StartPSIBatchMode() on. This ensures that after such a call on the root iterator, all handlers are out of batch mode.

Reimplemented from RowIterator.

◆ SetNullRowFlag()

void RemoveDuplicatesIterator::SetNullRowFlag ( bool is_null_row )

inline override virtual

Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).

This is used for outer joins, when an iterator hasn't produced any rows and we need to produce a NULL-complemented row. Init() or Read() won't necessarily reset this flag, so if you ever set it to true, make sure to also set it to false when needed.

Note that this can be called without Init() having been called first. For example, NestedLoopIterator can hit EOF immediately on the outer iterator, which means the inner iterator doesn't get an Init() call, but will still forward SetNullRowFlag to both inner and outer iterators.

TODO: We shouldn't need this. See the comments on AggregateIterator for a bit more discussion on abstracting out a row interface.
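
The flag semantics described above can be sketched in a few lines. Everything here is a hypothetical simplification: RowBuffer, TableReader and Wrapper are invented for this example and do not correspond to MySQL types. The sketch only shows that the flag masks whatever is left in the buffer, and that a wrapping iterator passes the call on to its source.

```cpp
#include <optional>

// Hypothetical row buffer holding one value plus the NULL-row flag.
struct RowBuffer {
  int value = 0;
  bool null_row = false;

  // Returns no value while the flag is set, whatever the buffer holds.
  std::optional<int> Get() const {
    if (null_row) return std::nullopt;
    return value;
  }
};

// Hypothetical leaf iterator that owns the flag of its row buffer.
class TableReader {
 public:
  explicit TableReader(RowBuffer *buffer) : m_buffer(buffer) {}
  void SetNullRowFlag(bool is_null_row) { m_buffer->null_row = is_null_row; }

 private:
  RowBuffer *m_buffer;  // Not owned in this sketch.
};

// Hypothetical wrapping iterator: it has no row buffer of its own, so it
// simply forwards the call to its source.
class Wrapper {
 public:
  explicit Wrapper(TableReader *source) : m_source(source) {}
  void SetNullRowFlag(bool is_null_row) {
    m_source->SetNullRowFlag(is_null_row);
  }

 private:
  TableReader *m_source;  // Not owned in this sketch.
};
```

Note that nothing in the sketch clears the flag implicitly; as the text says, the caller that set it to true is responsible for setting it back to false.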

Implements RowIterator.

◆ StartPSIBatchMode()

void RemoveDuplicatesIterator::StartPSIBatchMode ( )

inline override virtual

Start performance schema batch mode, if supported (otherwise ignored).

PFS batch mode is a mitigation to reduce the overhead of performance schema, typically applied at the innermost table of the entire join. If you start it before scanning the table and then end it afterwards, the entire set of handler calls will be timed only once, as a group, and the costs will be distributed evenly across the calls. This reduces timer overhead.

If you start PFS batch mode, you must also take care to end it at the end of the scan, one way or the other. Do note that this is true even if the query ends abruptly (LIMIT is reached, or an error happens). The easiest workaround for this is to simply call EndPSIBatchModeIfStarted() on the root iterator at the end of the scan. See the PFSBatchMode class for a useful helper.

The rules for starting and ending batch mode are:

1. If you are an iterator with exactly one child (FilterIterator etc.), forward any StartPSIBatchMode() calls to it.
2. If you drive an iterator (read rows from it using a for loop or similar), use PFSBatchMode as described above.
3. If you have multiple children, ignore the call and do your own handling of batch mode as appropriate. For materialization, #2 would typically apply. For joins, it depends on the join type (e.g., NestedLoopIterator applies batch mode only when scanning the innermost table).

The upshot of this is that when scanning a single table, batch mode will typically be activated for that table (since we call StartPSIBatchMode() on the root iterator, and it will trickle all the way down to the table iterator), but for a join, the call will be ignored and the join iterator will activate batch mode by itself as needed.
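
The first two rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the server code: Iter, TableIter, SingleChildIter and BatchModeGuard are hypothetical names, and BatchModeGuard only imitates the role the text assigns to the PFSBatchMode helper.

```cpp
// Hypothetical, minimal iterator interface; the real RowIterator has more.
class Iter {
 public:
  virtual ~Iter() = default;
  virtual void StartPSIBatchMode() {}         // Ignored unless overridden.
  virtual void EndPSIBatchModeIfStarted() {}  // Always safe to call.
};

// Stand-in for a table iterator that records whether batch mode is active.
class TableIter : public Iter {
 public:
  void StartPSIBatchMode() override { m_in_batch = true; }
  void EndPSIBatchModeIfStarted() override { m_in_batch = false; }
  bool in_batch() const { return m_in_batch; }

 private:
  bool m_in_batch = false;
};

// First rule: an iterator with exactly one child forwards both calls to it.
class SingleChildIter : public Iter {
 public:
  explicit SingleChildIter(Iter *child) : m_child(child) {}
  void StartPSIBatchMode() override { m_child->StartPSIBatchMode(); }
  void EndPSIBatchModeIfStarted() override {
    m_child->EndPSIBatchModeIfStarted();
  }

 private:
  Iter *m_child;  // Not owned in this sketch.
};

// Second rule: whoever drives the scan must end batch mode on every exit
// path. An RAII guard does that even on early return or exception.
class BatchModeGuard {
 public:
  explicit BatchModeGuard(Iter *root) : m_root(root) {
    m_root->StartPSIBatchMode();
  }
  ~BatchModeGuard() { m_root->EndPSIBatchModeIfStarted(); }
  BatchModeGuard(const BatchModeGuard &) = delete;
  BatchModeGuard &operator=(const BatchModeGuard &) = delete;

 private:
  Iter *m_root;
};
```

Creating a BatchModeGuard on a SingleChildIter that wraps a TableIter turns batch mode on for the table for the lifetime of the guard, and off again when the guard goes out of scope. A multi-child iterator under the third rule would simply keep the empty default StartPSIBatchMode() and manage its children itself.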

Reimplemented from RowIterator.

◆ UnlockRow()

void RemoveDuplicatesIterator::UnlockRow ( )

inline override virtual

Implements RowIterator.

Member Data Documentation

◆ m_caches

Bounds_checked_array<Cached_item *> RemoveDuplicatesIterator::m_caches

private

◆ m_first_row

bool RemoveDuplicatesIterator::m_first_row

private

◆ m_source

unique_ptr_destroy_only<RowIterator> RemoveDuplicatesIterator::m_source

private

The documentation for this class was generated from the following files:

sql/iterators/composite_iterators.h
sql/iterators/composite_iterators.cc
