BufferingWindowIterator is like WindowIterator, but deals with window functions that need to buffer rows.

#include <window_iterators.h>

Public Member Functions
	BufferingWindowIterator (THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)

void	SetNullRowFlag (bool is_null_row) override
	Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).

void	StartPSIBatchMode () override
	Start performance schema batch mode, if supported (otherwise ignored).

void	EndPSIBatchModeIfStarted () override
	Ends performance schema batch mode, if started.

void	UnlockRow () override

Public Member Functions inherited from RowIterator
	RowIterator (THD *thd)

virtual	~RowIterator ()=default

	RowIterator (const RowIterator &)=delete

	RowIterator (RowIterator &&)=default

bool	Init ()
	Initialize or reinitialize the iterator.

int	Read ()
	Read a single row.

virtual const IteratorProfiler *	GetProfiler () const
	Get profiling data for this iterator (for 'EXPLAIN ANALYZE').

virtual void	SetOverrideProfiler (const IteratorProfiler *profiler)

virtual RowIterator *	real_iterator ()
	If this iterator is wrapping a different iterator, this gives access to the wrapped iterator.

virtual const RowIterator *	real_iterator () const

uint64_t	num_init_calls () const
	Returns the number of times Init() has been called on this iterator.

uint64_t	num_rows () const
	Returns the number of times Read() has returned a row successfully from this iterator.

uint64_t	num_full_reads () const
	Returns the number of times the iterator has been fully read.

Private Member Functions
bool	DoInit () override

int	DoRead () override

int	ReadBufferedRow (bool new_partition_or_eof)

Private Attributes
unique_ptr_destroy_only< RowIterator > const	m_source
	The iterator we are reading from.

Temp_table_param *	m_temp_table_param
	Parameters for the temporary table we are outputting to.

Window *	m_window
	The window function itself.

JOIN *	m_join
	The join we are a part of.

int	m_input_slice
	The slice we will be using when reading rows.

int	m_output_slice
	The slice we will be using when outputting rows.

bool	m_possibly_buffered_rows
	If true, we may have more buffered rows to process that need to be checked for before reading more rows from the source.

bool	m_last_input_row_started_new_partition
	Whether the last input row started a new partition, and was tucked away to finalize the previous partition; if so, we need to bring it back for processing before we read more rows.

bool	m_eof
	Whether we have seen the last input row.

Additional Inherited Members
Protected Member Functions inherited from RowIterator
THD *	thd () const

Detailed Description

If we don't need to buffer rows to evaluate the window functions, execution is simple; see WindowIterator for details. In that case, we can just evaluate the window functions as we go here, similar to the non-windowing flow.

If we do need buffering, we buffer the row in Read(). Next, we enter a loop calling process_buffered_windowing_record and conditionally return a row: if process_buffered_windowing_record was able to complete evaluation of a row, including its window functions, given what has been buffered so far (cf. output_row_ready), we return that row; otherwise we read more rows and postpone evaluation and returning until we have enough rows in the buffer.
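A minimal C++ sketch of this buffer-then-evaluate loop follows. It is not the server's code: `WindowSumIterator` and its members are hypothetical, it handles a single partition of integers, and it assumes a frame of CURRENT ROW through 2 FOLLOWING for a SUM. `Read()` returns 0 for a row and -1 at EOF, mirroring the RowIterator convention.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical model: a single partition of integer rows, evaluating
// SUM(x) over a frame of CURRENT ROW .. kFollowing FOLLOWING.
class WindowSumIterator {
 public:
  static constexpr std::size_t kFollowing = 2;

  explicit WindowSumIterator(std::vector<int> source)
      : m_source(std::move(source)) {}

  // Returns 0 and sets *out when a row is ready, -1 at EOF.
  int Read(int *out) {
    assert(out != nullptr);
    for (;;) {
      // The oldest buffered row is ready once its whole frame is
      // buffered, or once EOF guarantees no more frame rows will come.
      if (!m_buffer.empty() && (m_buffer.size() > kFollowing || m_eof)) {
        int sum = 0;
        for (std::size_t i = 0; i <= kFollowing && i < m_buffer.size(); ++i)
          sum += m_buffer[i];
        m_buffer.pop_front();  // this row is finished
        *out = sum;
        return 0;
      }
      if (m_eof) return -1;
      // Not enough rows buffered yet: read one more from the source.
      if (m_pos < m_source.size()) {
        m_buffer.push_back(m_source[m_pos++]);
      } else {
        m_eof = true;
      }
    }
  }

 private:
  std::vector<int> m_source;
  std::size_t m_pos = 0;
  std::deque<int> m_buffer;  // plays the role of the frame buffer
  bool m_eof = false;
};
```

Each call either finishes the oldest buffered row or reads one more input row; the last rows of the input are finished only after EOF shows that their frames cannot grow any further.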

When we have read a full partition (or reached EOF), we evaluate any remaining rows. Since we have to read one row past the current partition to detect that the previous row was indeed the last row of that partition, we need to re-establish the first row of the next partition once we are done processing the current one. This is necessary because the record will be overwritten (many times) during evaluation of the window functions in the current partition.
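The partition boundary handling can be modelled the same way. In this hypothetical sketch, `PartitionSumIterator` computes SUM(value) OVER (PARTITION BY key) over input sorted on key; the row that reveals the end of a partition is kept in a `std::optional`, loosely playing the role of m_last_input_row_started_new_partition, and is brought back as the first row of the next partition. The server has to re-establish that row because its record buffer gets overwritten; here the optional simply holds a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

struct InRow {
  int key;    // PARTITION BY expression
  int value;  // argument to SUM
};

struct OutRow {
  int key;
  int value;
  int partition_sum;  // SUM(value) OVER (PARTITION BY key)
};

// Hypothetical model; input must be sorted on key.
class PartitionSumIterator {
 public:
  explicit PartitionSumIterator(std::vector<InRow> source)
      : m_source(std::move(source)) {}

  // Returns 0 and fills *out when a row is ready, -1 at EOF.
  int Read(OutRow *out) {
    assert(out != nullptr);
    if (m_ready.empty() && !FillPartition()) return -1;
    *out = m_ready.front();
    m_ready.pop_front();
    return 0;
  }

 private:
  // Buffers one whole partition; returns false if no input is left.
  bool FillPartition() {
    std::vector<InRow> partition;
    if (m_stashed) {
      // The row that ended the previous partition is brought back first.
      partition.push_back(*m_stashed);
      m_stashed.reset();
    }
    while (m_pos < m_source.size()) {
      const InRow row = m_source[m_pos++];
      if (!partition.empty() && row.key != partition.front().key) {
        // We had to read one row past the partition to see that it ended.
        m_stashed = row;
        break;
      }
      partition.push_back(row);
    }
    if (partition.empty()) return false;
    int sum = 0;
    for (const InRow &r : partition) sum += r.value;
    for (const InRow &r : partition) m_ready.push_back({r.key, r.value, sum});
    return true;
  }

  std::vector<InRow> m_source;
  std::size_t m_pos = 0;
  std::optional<InRow> m_stashed;  // row that started the next partition
  std::deque<OutRow> m_ready;      // evaluated rows waiting to be returned
};
```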

Usually [1], each windowing step involves two or three temporary tables (although not all of them are always materialized; some may just be streamed through a StreamingIterator):

- The input table (IN), corresponding to the parent iterator. It holds (possibly sorted) records ready for windowing, sorted on expressions concatenated from any PARTITION BY and ORDER BY clauses.
- The output table (OUT), as given by temp_table_param: where we write the evaluated records from this step. Note that we may optimize away this last write if we have no final ORDER BY or DISTINCT.
- If we have buffering, the frame buffer (FB), held by Window::m_frame_buffer[_param].

[1] This is not always the case. For the first window, if we have no PARTITION BY or ORDER BY in the window, and there is more than one table in the join, the logical input can consist of more than one table (e.g. a NestedLoopIterator).

The first thing we do in Read() is to copy fields from IN to OUT (copy_fields) and evaluate non-WF (non-window) functions (copy_funcs); those functions read their arguments from IN and store their results in their result_field, which is a field in OUT.

Take SUM(A+FLOOR(B)) OVER (ROWS 2 FOLLOWING) as an example. After the step above, OUT holds A and the result of FLOOR. Next, we buffer (save) the row from OUT into FB: we copy both field A and FLOOR's result_field from OUT to FB, and a single copy_fields() call handles both copy jobs. Then we look at the rows we have buffered and may find that we have enough of the frame to calculate SUM for some row. That row is not necessarily the one we just buffered; it may be an earlier one, and in our example it is the row 2 rows above the buffered row.

If so, to calculate the window functions we bring back the frame's rows: for each one, we first copy field A and FLOOR's result_field back from FB to OUT, so that OUT holds everything SUM needs (A and FLOOR), and then give that OUT row to SUM, which adds the row's value to its total (this happens in copy_funcs). Once we have done this for all rows of the frame, the value of SUM is ready in OUT. We then restore the row that owns this SUM value, in the same way as we restored the frame's rows, and return from Read(); we are done with this row.

On the next Read() call, we loop to check whether we can calculate one more row with the frame we already have, and keep doing so until we cannot calculate any more rows, at which point we are back to just buffering.
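The data movement in this example can be traced with plain structs. The sketch is hypothetical: `InRecord`, `OutRecord` and `FbRecord` stand in for IN, OUT and FB, the helper functions stand in for copy_fields/copy_funcs, and the frame is read as CURRENT ROW through 2 FOLLOWING, an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record layouts for SUM(A+FLOOR(B)) OVER (...).
struct InRecord {  // IN: raw input columns
  double a;
  double b;
};

struct OutRecord {  // OUT: copied column, FLOOR's result_field, SUM's result
  double a = 0;
  double floor_b = 0;
  double sum = 0;
};

struct FbRecord {  // FB: only what the window function needs
  double a;
  double floor_b;
};

// IN -> OUT: copy A (copy_fields) and evaluate the non-window function
// FLOOR(B) into its result_field in OUT (copy_funcs).
void CopyInToOut(const InRecord &in, OutRecord *out) {
  out->a = in.a;
  out->floor_b = std::floor(in.b);
}

// OUT -> FB: one copy step saves both A and FLOOR's result_field.
void BufferRow(const OutRecord &out, std::vector<FbRecord> *fb) {
  fb->push_back({out.a, out.floor_b});
}

// FB -> OUT: computes SUM for row `current`, assuming its frame (CURRENT
// ROW .. 2 FOLLOWING) is fully buffered. Each frame row is copied back
// into OUT first, because SUM only reads from OUT.
void EvaluateRow(const std::vector<FbRecord> &fb, std::size_t current,
                 OutRecord *out) {
  assert(current + 2 < fb.size());
  double total = 0;
  for (std::size_t i = current; i <= current + 2; ++i) {
    out->a = fb[i].a;
    out->floor_b = fb[i].floor_b;
    total += out->a + out->floor_b;  // SUM adds this row's A+FLOOR(B)
  }
  // Restore the row that owns this SUM value, then store the result.
  out->a = fb[current].a;
  out->floor_b = fb[current].floor_b;
  out->sum = total;
}
```

Buffering the third row is what completes the first row's frame, matching the observation that the row ready for evaluation is two rows above the one just buffered.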

Constructor & Destructor Documentation

◆ BufferingWindowIterator()

BufferingWindowIterator::BufferingWindowIterator(THD *thd, unique_ptr_destroy_only<RowIterator> source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)

Member Function Documentation

◆ DoInit()

bool BufferingWindowIterator::DoInit ( )

override, private, virtual

Implements RowIterator.

◆ DoRead()

int BufferingWindowIterator::DoRead ( )

override, private, virtual

Implements RowIterator.
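The member listing shows public, non-virtual Init() and Read() next to private DoInit() and DoRead() overrides, which suggests a non-virtual-interface split: the base class wraps the virtual hooks and can keep the counters behind num_init_calls(), num_rows() and num_full_reads(). The sketch below is a hypothetical `CountingIterator`, not RowIterator itself, and counting a full read each time Read() returns -1 is an assumption. Return codes follow the RowIterator convention (Init() returns true on error; Read() returns 0 for a row, -1 at EOF, 1 on error).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base class: public non-virtual Init()/Read() do the
// bookkeeping, private virtual DoInit()/DoRead() do the work.
class CountingIterator {
 public:
  virtual ~CountingIterator() = default;

  // Returns true on error.
  bool Init() {
    ++m_num_init_calls;
    return DoInit();
  }

  // Returns 0 for a row, -1 at EOF, 1 on error.
  int Read() {
    const int result = DoRead();
    if (result == 0) ++m_num_rows;
    if (result == -1) ++m_num_full_reads;  // assumption: EOF = fully read
    return result;
  }

  std::uint64_t num_init_calls() const { return m_num_init_calls; }
  std::uint64_t num_rows() const { return m_num_rows; }
  std::uint64_t num_full_reads() const { return m_num_full_reads; }

 private:
  virtual bool DoInit() = 0;
  virtual int DoRead() = 0;

  std::uint64_t m_num_init_calls = 0;
  std::uint64_t m_num_rows = 0;
  std::uint64_t m_num_full_reads = 0;
};

// Hypothetical concrete iterator producing `n` rows per scan.
class CountDown : public CountingIterator {
 public:
  explicit CountDown(int n) : m_n(n) {}

 private:
  bool DoInit() override {
    m_left = m_n;  // reinitialization restarts the scan
    return false;
  }
  int DoRead() override {
    if (m_left == 0) return -1;
    --m_left;
    return 0;
  }

  int m_n;
  int m_left = 0;
};
```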

◆ EndPSIBatchModeIfStarted()

void BufferingWindowIterator::EndPSIBatchModeIfStarted ( )

inline, override, virtual

Ends performance schema batch mode, if started.

It's always safe to call this.

Iterators that have children (composite iterators) must forward the EndPSIBatchModeIfStarted() call to every iterator they could conceivably have called StartPSIBatchMode() on. This ensures that after such a call on the root iterator, all handlers are out of batch mode.

Reimplemented from RowIterator.
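The forwarding rule for EndPSIBatchModeIfStarted() can be sketched with hypothetical, heavily simplified classes (`Iterator`, `TableScan`, `Composite` are assumptions, not server types). Ending batch mode is idempotent, so calling it on the root is safe whether or not batch mode was ever started.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified iterator interface (not RowIterator).
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual void StartPSIBatchMode() {}
  virtual void EndPSIBatchModeIfStarted() {}
};

// Leaf standing in for a table iterator whose handler has a batch flag.
class TableScan : public Iterator {
 public:
  void StartPSIBatchMode() override { m_in_batch_mode = true; }
  // Always safe: ending a batch that was never started is a no-op.
  void EndPSIBatchModeIfStarted() override { m_in_batch_mode = false; }
  bool in_batch_mode() const { return m_in_batch_mode; }

 private:
  bool m_in_batch_mode = false;
};

// A composite forwards the end call to every child it could conceivably
// have started batch mode on, so ending at the root reaches all handlers.
class Composite : public Iterator {
 public:
  void Add(std::unique_ptr<Iterator> child) {
    m_children.push_back(std::move(child));
  }
  void EndPSIBatchModeIfStarted() override {
    for (auto &child : m_children) child->EndPSIBatchModeIfStarted();
  }

 private:
  std::vector<std::unique_ptr<Iterator>> m_children;
};
```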

◆ ReadBufferedRow()

int BufferingWindowIterator::ReadBufferedRow ( bool new_partition_or_eof )

private

◆ SetNullRowFlag()

void BufferingWindowIterator::SetNullRowFlag ( bool is_null_row )

inline, override, virtual

Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag is true, you'll get only NULLs no matter what is actually in the buffer (typically some old leftover row).

This is used for outer joins, when an iterator hasn't produced any rows and we need to produce a NULL-complemented row. Init() or Read() won't necessarily reset this flag, so if you ever set it to true, make sure to also set it to false when needed.

Note that this can be called without Init() having been called first. For example, NestedLoopIterator can hit EOF immediately on the outer iterator, which means the inner iterator doesn't get an Init() call, but will still forward SetNullRowFlag to both inner and outer iterators.

TODO: We shouldn't need this. See the comments on AggregateIterator for a bit more discussion on abstracting out a row interface.

Implements RowIterator.
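The flag's semantics can be sketched with a hypothetical `RowBuffer` that returns `std::nullopt` while the NULL-row flag is set. It is not the server's record-buffer machinery; it only illustrates why the flag must be cleared explicitly, since storing new contents leaves it untouched.

```cpp
#include <cassert>
#include <optional>

// Hypothetical row buffer modelling the NULL-row flag.
class RowBuffer {
 public:
  void Store(int value) { m_value = value; }
  void SetNullRowFlag(bool is_null_row) { m_null_row = is_null_row; }

  // While the flag is set, readers see NULL whatever the buffer holds.
  std::optional<int> Get() const {
    if (m_null_row) return std::nullopt;
    return m_value;
  }

 private:
  int m_value = 0;
  bool m_null_row = false;  // Store() deliberately leaves this alone
};
```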

◆ StartPSIBatchMode()

void BufferingWindowIterator::StartPSIBatchMode ( )

inline, override, virtual

Start performance schema batch mode, if supported (otherwise ignored).

PFS batch mode is a mitigation to reduce the overhead of performance schema, typically applied at the innermost table of the entire join. If you start it before scanning the table and end it afterwards, the entire set of handler calls will be timed only once, as a group, and the cost will be distributed evenly among them. This reduces timer overhead.

If you start PFS batch mode, you must also take care to end it at the end of the scan, one way or the other. Do note that this is true even if the query ends abruptly (LIMIT is reached, or an error happens). The easiest workaround for this is to simply call EndPSIBatchModeIfStarted() on the root iterator at the end of the scan. See the PFSBatchMode class for a useful helper.
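Besides calling EndPSIBatchModeIfStarted() on the root iterator, a scope guard makes ending batch mode unconditional. The text points to a PFSBatchMode helper but does not show its interface, so `BatchModeGuard`, `FakeHandler` and `Scan()` below are hypothetical. The destructor runs on every early return, whether LIMIT is reached or an error is reported through the return code.

```cpp
#include <cassert>

// Hypothetical handler that records batch-mode transitions.
struct FakeHandler {
  bool in_batch_mode = false;
  int starts = 0;
  void StartBatch() {
    in_batch_mode = true;
    ++starts;
  }
  void EndBatchIfStarted() { in_batch_mode = false; }
};

// Hypothetical scope guard: the constructor starts batch mode and the
// destructor ends it on every exit path out of the enclosing scope.
class BatchModeGuard {
 public:
  explicit BatchModeGuard(FakeHandler *handler) : m_handler(handler) {
    assert(m_handler != nullptr);
    m_handler->StartBatch();
  }
  ~BatchModeGuard() { m_handler->EndBatchIfStarted(); }
  BatchModeGuard(const BatchModeGuard &) = delete;
  BatchModeGuard &operator=(const BatchModeGuard &) = delete;

 private:
  FakeHandler *m_handler;
};

// Reads up to `rows` rows, stopping early once `limit` rows are read or
// on a simulated error at row `fail_at` (-1 = never).
// Returns the number of rows read, or -1 on error.
int Scan(FakeHandler *handler, int rows, int limit, int fail_at) {
  BatchModeGuard guard(handler);
  int num_read = 0;
  for (int i = 0; i < rows; ++i) {
    if (num_read == limit) return num_read;  // LIMIT reached
    if (i == fail_at) return -1;             // error
    ++num_read;
  }
  return num_read;
}
```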

The rules for starting and ending batch mode are:

1. If you are an iterator with exactly one child (FilterIterator etc.), forward any StartPSIBatchMode() calls to it.
2. If you drive an iterator (read rows from it using a for loop or similar), use PFSBatchMode as described above.
3. If you have multiple children, ignore the call and do your own handling of batch mode as appropriate. For materialization, #2 would typically apply. For joins, it depends on the join type (e.g., NestedLoopIterator applies batch mode only when scanning the innermost table).

The upshot of this is that when scanning a single table, batch mode will typically be activated for that table (since we call StartPSIBatchMode() on the root iterator, and it will trickle all the way down to the table iterator), but for a join, the call will be ignored and the join iterator will activate batch mode by itself as needed.

Reimplemented from RowIterator.
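The first and third rules can be sketched with hypothetical classes: `Filter` has one child and forwards the start call, while `NestedLoop` has two children, ignores the call and turns batch mode on for the inner table only around its scan. None of these are the server's classes; the `std::function` callback stands in for reading the inner rows.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified iterator interface.
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual void StartPSIBatchMode() {}
  virtual void EndPSIBatchModeIfStarted() {}
};

// Leaf standing in for a table iterator whose handler supports batch mode.
class TableScan : public Iterator {
 public:
  void StartPSIBatchMode() override { batch = true; }
  void EndPSIBatchModeIfStarted() override { batch = false; }
  bool batch = false;
};

// Exactly one child: forward both calls to it.
class Filter : public Iterator {
 public:
  explicit Filter(std::unique_ptr<Iterator> child) : m_child(std::move(child)) {}
  void StartPSIBatchMode() override { m_child->StartPSIBatchMode(); }
  void EndPSIBatchModeIfStarted() override {
    m_child->EndPSIBatchModeIfStarted();
  }

 private:
  std::unique_ptr<Iterator> m_child;
};

// Several children: ignore the start call and manage batch mode itself,
// only while scanning the innermost table.
class NestedLoop : public Iterator {
 public:
  NestedLoop(std::unique_ptr<Iterator> outer, std::unique_ptr<Iterator> inner)
      : m_outer(std::move(outer)), m_inner(std::move(inner)) {}
  void StartPSIBatchMode() override {}  // deliberately ignored
  void EndPSIBatchModeIfStarted() override {
    m_outer->EndPSIBatchModeIfStarted();
    m_inner->EndPSIBatchModeIfStarted();
  }
  // In a real join this would run once per outer row.
  void ScanInner(const std::function<void()> &read_inner_rows) {
    m_inner->StartPSIBatchMode();
    read_inner_rows();
    m_inner->EndPSIBatchModeIfStarted();
  }

 private:
  std::unique_ptr<Iterator> m_outer;
  std::unique_ptr<Iterator> m_inner;
};
```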

◆ UnlockRow()

void BufferingWindowIterator::UnlockRow ( )

inline, override, virtual

Implements RowIterator.

Member Data Documentation

◆ m_eof

bool BufferingWindowIterator::m_eof

private

Whether we have seen the last input row.

◆ m_input_slice

int BufferingWindowIterator::m_input_slice

private

The slice we will be using when reading rows.

◆ m_join

JOIN* BufferingWindowIterator::m_join

private

The join we are a part of.

◆ m_last_input_row_started_new_partition

bool BufferingWindowIterator::m_last_input_row_started_new_partition

private

Whether the last input row started a new partition, and was tucked away to finalize the previous partition; if so, we need to bring it back for processing before we read more rows.

◆ m_output_slice

int BufferingWindowIterator::m_output_slice

private

The slice we will be using when outputting rows.

◆ m_possibly_buffered_rows

bool BufferingWindowIterator::m_possibly_buffered_rows

private

If true, we may have more buffered rows to process that need to be checked for before reading more rows from the source.

◆ m_source

unique_ptr_destroy_only<RowIterator> const BufferingWindowIterator::m_source

private

The iterator we are reading from.

◆ m_temp_table_param

Temp_table_param* BufferingWindowIterator::m_temp_table_param

private

Parameters for the temporary table we are outputting to.

◆ m_window

Window* BufferingWindowIterator::m_window

private

The window function itself.

The documentation for this class was generated from the following files:

sql/iterators/window_iterators.h
sql/iterators/window_iterators.cc
