MySQL 8.0.39
Source Code Documentation
WeedoutIterator Class Reference final

Like semijoin materialization, weedout works on the basic idea that a semijoin is just like an inner join as long as we can get rid of the duplicates somehow.

#include <composite_iterators.h>


Public Member Functions

 WeedoutIterator (THD *thd, unique_ptr_destroy_only< RowIterator > source, SJ_TMP_TABLE *sj, table_map tables_to_get_rowid_for)
 
bool Init () override
 Initialize or reinitialize the iterator.
 
int Read () override
 Read a single row.
 
void SetNullRowFlag (bool is_null_row) override
 Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).
 
void EndPSIBatchModeIfStarted () override
 Ends performance schema batch mode, if started.
 
void UnlockRow () override
 
- Public Member Functions inherited from RowIterator
 RowIterator (THD *thd)
 
virtual ~RowIterator ()=default
 
 RowIterator (const RowIterator &)=delete
 
 RowIterator (RowIterator &&)=default
 
virtual const IteratorProfiler * GetProfiler () const
 Get profiling data for this iterator (for 'EXPLAIN ANALYZE').
 
virtual void SetOverrideProfiler ([[maybe_unused]] const IteratorProfiler *profiler)
 
virtual void StartPSIBatchMode ()
 Start performance schema batch mode, if supported (otherwise ignored).
 
virtual RowIterator * real_iterator ()
 If this iterator is wrapping a different iterator (e.g. More...
 
virtual const RowIterator * real_iterator () const
 

Private Attributes

unique_ptr_destroy_only< RowIterator > m_source
 
SJ_TMP_TABLE * m_sj
 
const table_map m_tables_to_get_rowid_for
 

Additional Inherited Members

- Protected Member Functions inherited from RowIterator
THD * thd () const
 

Detailed Description

Like semijoin materialization, weedout works on the basic idea that a semijoin is just like an inner join as long as we can get rid of the duplicates somehow.

(This is advantageous, because inner joins can be reordered, whereas semijoins generally can't.) However, unlike semijoin materialization, weedout removes duplicates after the join, not before it. Consider something like

SELECT * FROM t1 WHERE a IN ( SELECT b FROM t2 );

Semijoin materialization solves this by materializing t2, with deduplication, and then joining. Weedout joins t1 to t2 and then leaves only one output row per t1 row. The disadvantage is that this potentially needs to discard more rows; the (potential) advantage is that we deduplicate on t1 instead of t2.

Weedout, unlike materialization, works in a streaming fashion; rows are output (or discarded) as they come in, with a temporary table used for recording the row IDs we've seen before. (We need to deduplicate on t1's row IDs, not its contents.) See SJ_TMP_TABLE for details about the table format.
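The streaming deduplication described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the server's implementation: JoinedRow and Weedout() are hypothetical names, the input vector stands in for the rows produced by joining t1 to t2, and a std::unordered_set stands in for the temporary table of row IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one row of the t1-t2 inner join: the row ID of
// the t1 row it came from, plus the matched column value.
struct JoinedRow {
  std::uint64_t t1_rowid;
  int value;
};

// Streams over the joined rows and keeps only the first row seen for each
// t1 row ID. The set plays the role of the weedout temporary table.
std::vector<JoinedRow> Weedout(const std::vector<JoinedRow> &joined) {
  std::unordered_set<std::uint64_t> seen_rowids;
  std::vector<JoinedRow> output;
  for (const JoinedRow &row : joined) {
    // insert() reports whether the row ID was new; duplicates are discarded.
    if (seen_rowids.insert(row.t1_rowid).second) {
      output.push_back(row);
    }
  }
  // Sanity check: weedout can only drop rows, never add them.
  assert(output.size() <= joined.size());
  return output;
}
```

Note that two t1 rows with identical contents but different row IDs are both kept, which is why the deduplication key is the row ID rather than the row contents.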

Constructor & Destructor Documentation

◆ WeedoutIterator()

WeedoutIterator::WeedoutIterator ( THD * thd,
unique_ptr_destroy_only< RowIterator > source,
SJ_TMP_TABLE * sj,
table_map tables_to_get_rowid_for
)

Member Function Documentation

◆ EndPSIBatchModeIfStarted()

void WeedoutIterator::EndPSIBatchModeIfStarted ( )
inline override virtual

Ends performance schema batch mode, if started.

It's always safe to call this.

Iterators that have children (composite iterators) must forward the EndPSIBatchModeIfStarted() call to every iterator they could conceivably have called StartPSIBatchMode() on. This ensures that after such a call on the root iterator, all handlers are out of batch mode.
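The forwarding rule can be illustrated with a toy hierarchy. All class names here (ToyIterator, ToyLeaf, ToyWrapper) are hypothetical and heavily simplified; they only mimic the shape of the interface and assume a composite with a single child, as WeedoutIterator has.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified iterator interface.
class ToyIterator {
 public:
  virtual ~ToyIterator() = default;
  virtual void StartPSIBatchMode() {}
  virtual void EndPSIBatchModeIfStarted() {}
};

// Leaf that tracks whether its (imaginary) handler is in batch mode.
class ToyLeaf : public ToyIterator {
 public:
  void StartPSIBatchMode() override { m_in_batch_mode = true; }
  // Safe to call even if batch mode was never started.
  void EndPSIBatchModeIfStarted() override { m_in_batch_mode = false; }
  bool in_batch_mode() const { return m_in_batch_mode; }

 private:
  bool m_in_batch_mode = false;
};

// Composite with a single child: it forwards both calls to that child.
class ToyWrapper : public ToyIterator {
 public:
  explicit ToyWrapper(std::unique_ptr<ToyIterator> source)
      : m_source(std::move(source)) {
    assert(m_source != nullptr);
  }
  void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
  void EndPSIBatchModeIfStarted() override {
    m_source->EndPSIBatchModeIfStarted();
  }

 private:
  std::unique_ptr<ToyIterator> m_source;
};
```

With this shape, calling EndPSIBatchModeIfStarted() on the outermost wrapper reaches the leaf regardless of how many wrappers sit in between.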

Reimplemented from RowIterator.

◆ Init()

bool WeedoutIterator::Init ( )
override virtual

Initialize or reinitialize the iterator.

You must always call Init() before trying a Read() (but Init() does not imply Read()).

You can call Init() multiple times; subsequent calls will rewind the iterator (or reposition it, depending on whether the iterator takes in e.g. an Index_lookup) and allow you to read the records anew.
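The Init()/Read() contract can be sketched with a toy in-memory iterator. ToyVectorIterator and ReadAll() are hypothetical names, and the sketch assumes that Init() returns false on success, which is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory iterator that follows the Init()/Read() protocol.
class ToyVectorIterator {
 public:
  explicit ToyVectorIterator(std::vector<int> rows)
      : m_rows(std::move(rows)) {}

  // Rewinds to the first row. Assumption: false means success.
  bool Init() {
    m_pos = 0;
    m_initialized = true;
    return false;
  }

  // Returns 0 if a row was placed in the buffer, -1 at end of records.
  int Read() {
    assert(m_initialized);  // Init() must come before the first Read().
    if (m_pos >= m_rows.size()) return -1;
    m_current = m_rows[m_pos++];
    return 0;
  }

  // Stand-in for reading the record buffer after a successful Read().
  int current() const { return m_current; }

 private:
  std::vector<int> m_rows;
  std::size_t m_pos = 0;
  int m_current = 0;
  bool m_initialized = false;
};

// Initializes (or rewinds) the iterator and collects every row it yields.
std::vector<int> ReadAll(ToyVectorIterator *it) {
  std::vector<int> out;
  if (it->Init()) return out;  // Init() failed; nothing to read.
  while (it->Read() == 0) out.push_back(it->current());
  return out;
}
```

Calling ReadAll() twice on the same iterator yields the same rows both times, because the second Init() rewinds it.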

Implements RowIterator.

◆ Read()

int WeedoutIterator::Read ( )
override virtual

Read a single row.

The row data is not actually returned from the function; it is put in the table's (or tables', in case of a join) record buffer, i.e., table->record[0].

Return values
 0   OK
-1   End of records
 1   Error
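A caller typically loops on Read() and branches on the three return values. The sketch below assumes nothing about the real iterator classes: Drain(), DrainResult, and the two callbacks are hypothetical, with `read` standing in for Read() and `on_row` for whatever consumes the record buffer.

```cpp
#include <cassert>
#include <functional>

enum class DrainResult { kEndOfRecords, kError };

// Calls read() until it stops returning 0, invoking on_row() after each
// successful read.
DrainResult Drain(const std::function<int()> &read,
                  const std::function<void()> &on_row) {
  for (;;) {
    const int err = read();
    if (err == 0) {
      on_row();  // A row is available in the record buffer.
      continue;
    }
    if (err == -1) return DrainResult::kEndOfRecords;
    assert(err == 1);  // The only remaining documented value.
    return DrainResult::kError;
  }
}
```

Because the row itself is not returned, on_row() takes no arguments; in this model it would inspect the record buffer directly.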

Implements RowIterator.

◆ SetNullRowFlag()

void WeedoutIterator::SetNullRowFlag ( bool  is_null_row)
inline override virtual

Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).

This is used for outer joins, when an iterator hasn't produced any rows and we need to produce a NULL-complemented row. Init() or Read() won't necessarily reset this flag, so if you ever set it to true, make sure to also set it to false when needed.

Note that this can be called without Init() having been called first. For example, NestedLoopIterator can hit EOF immediately on the outer iterator, which means the inner iterator doesn't get an Init() call, but will still forward SetNullRowFlag to both inner and outer iterators.
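The flag semantics can be modeled with a toy record buffer. ToyRowBuffer and ToyScan are hypothetical; the sketch assumes a single-column buffer and represents SQL NULL as a null pointer.

```cpp
#include <cassert>

// Hypothetical one-column record buffer with a NULL-row flag.
class ToyRowBuffer {
 public:
  void Store(int value) { m_value = value; }
  void SetNullRowFlag(bool is_null_row) { m_null_row = is_null_row; }

  // Returns nullptr (standing in for SQL NULL) while the flag is set, no
  // matter what value is stored in the buffer.
  const int *Get() const { return m_null_row ? nullptr : &m_value; }

 private:
  int m_value = 0;
  bool m_null_row = false;
};

// Hypothetical iterator over that buffer. SetNullRowFlag() only touches the
// buffer, so it works even if the iterator was never initialized.
class ToyScan {
 public:
  explicit ToyScan(ToyRowBuffer *buffer) : m_buffer(buffer) {
    assert(m_buffer != nullptr);
  }
  void SetNullRowFlag(bool is_null_row) {
    m_buffer->SetNullRowFlag(is_null_row);
  }

 private:
  ToyRowBuffer *m_buffer;
};
```

Storing a new value does not clear the flag in this model, mirroring the warning above that the caller is responsible for setting it back to false.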

TODO: We shouldn't need this. See the comments on AggregateIterator for a bit more discussion on abstracting out a row interface.

Implements RowIterator.

◆ UnlockRow()

void WeedoutIterator::UnlockRow ( )
inline override virtual

Implements RowIterator.

Member Data Documentation

◆ m_sj

SJ_TMP_TABLE* WeedoutIterator::m_sj
private

◆ m_source

unique_ptr_destroy_only<RowIterator> WeedoutIterator::m_source
private

◆ m_tables_to_get_rowid_for

const table_map WeedoutIterator::m_tables_to_get_rowid_for
private

The documentation for this class was generated from the following files: