MySQL 9.1.0
Source Code Documentation
anonymous_namespace{window_iterators.cc} Namespace Reference

Functions

void SwitchSlice (JOIN *join, int slice_num)
 
void reset_wf_states (Func_ptr_array *func_ptr, bool framing)
 Minion for reset_framing_wf_states and reset_non_framing_wf_state, q.v. More...
 
void reset_framing_wf_states (Func_ptr_array *func_ptr)
 Walk the function calls and reset any framing window function's window state. More...
 
void reset_non_framing_wf_state (Func_ptr_array *func_ptr)
 Walk the function calls and reset any non-framing window function's window state. More...
 
bool buffer_record_somewhere (THD *thd, Window *w, int64 rowno)
 Save a window frame buffer to frame buffer temporary table. More...
 
bool buffer_windowing_record (THD *thd, Temp_table_param *param, bool *new_partition)
 If we cannot evaluate all window functions for a window on the fly, buffer the current row for later processing by process_buffered_windowing_record. More...
 
bool read_frame_buffer_row (int64 rowno, Window *w, bool for_nth_value)
 Read row rowno from frame buffer tmp file using cached row positions to minimize positioning work. More...
 
void dbug_allow_write_all_columns (Temp_table_param *param, std::map< TABLE *, my_bitmap_map * > &map)
 
void dbug_restore_all_columns (std::map< TABLE *, my_bitmap_map * > &map)
 
bool bring_back_frame_row (THD *thd, Window *w, int64 rowno, Window_retrieve_cached_row_reason reason, int fno=0)
 Bring back buffered data to the record of qep_tab-1 [1], and optionally execute copy_funcs() to the OUT table. More...
 
bool process_wfs_needing_partition_cardinality (THD *thd, Temp_table_param *param, const Window::st_nth &have_nth_value, const Window::st_lead_lag &have_lead_lag, const int64 current_row, Window *w, Window_retrieve_cached_row_reason current_row_reason)
 Process window functions that need partition cardinality. More...
 
bool process_buffered_windowing_record (THD *thd, Temp_table_param *param, const bool new_partition_or_eof, bool *output_row_ready)
 While there are more unprocessed rows ready to process given the current partition/frame state, process such buffered rows by evaluating/aggregating the window functions defined over this window on the current frame, moving the frame if required. More...
 

Function Documentation

◆ bring_back_frame_row()

bool anonymous_namespace{window_iterators.cc}::bring_back_frame_row ( THD thd,
Window w,
int64  rowno,
Window_retrieve_cached_row_reason  reason,
int  fno = 0 
)

Bring back buffered data to the record of qep_tab-1 [1], and optionally execute copy_funcs() to the OUT table.

[1] This is not always the case. For the first window, if we have no PARTITION BY or ORDER BY in the window, and there is more than one table in the join, the logical input can consist of more than one table (qep_tab-1 .. qep_tab-n), so the record accordingly.

This method works by temporarily reversing the "normal" direction of the field copying.

Also make a note of the position of the record we retrieved in the window's m_frame_buffer_positions to be able to optimize succeeding retrievals.

Parameters
thdThe current thread
wThe current window
rownoThe row number (in the partition) to set up
reasonWhat kind of row to retrieve
fnoUsed with NTH_VALUE and LEAD/LAG to specify which window function's position cache to use, i.e. what index of m_frame_buffer_positions to update. For the second LEAD/LAG window function in a query, the index would be REA_MISC_POSITIONS (reason) + <no of NTH functions> + 2.
Returns
true on error

◆ buffer_record_somewhere()

bool anonymous_namespace{window_iterators.cc}::buffer_record_somewhere ( THD thd,
Window w,
int64  rowno 
)

Save a window frame buffer to frame buffer temporary table.

Parameters
thdThe current thread
wThe current window
rownoThe rowno in the current partition (1-based)

◆ buffer_windowing_record()

bool anonymous_namespace{window_iterators.cc}::buffer_windowing_record ( THD thd,
Temp_table_param param,
bool *  new_partition 
)

If we cannot evaluate all window functions for a window on the fly, buffer the current row for later processing by process_buffered_windowing_record.

Parameters
thdCurrent thread
paramThe temporary table parameter
[in,out]new_partitionIf input is not nullptr: sets the bool pointed to to true if a new partition was found and there was a previous partition; if so the buffering of the first row in new partition isn't done and must be repeated later: we save away the row as rowno FBC_FIRST_IN_NEXT_PARTITION, then fetch it back later, cf. end_write_wf. If input is nullptr, this is the "later" call to buffer the first row of the new partition: buffer the row.
Returns
true if error.

◆ dbug_allow_write_all_columns()

void anonymous_namespace{window_iterators.cc}::dbug_allow_write_all_columns ( Temp_table_param param,
std::map< TABLE *, my_bitmap_map * > &  map 
)
inline

◆ dbug_restore_all_columns()

void anonymous_namespace{window_iterators.cc}::dbug_restore_all_columns ( std::map< TABLE *, my_bitmap_map * > &  map)
inline

◆ process_buffered_windowing_record()

bool anonymous_namespace{window_iterators.cc}::process_buffered_windowing_record ( THD thd,
Temp_table_param param,
const bool  new_partition_or_eof,
bool *  output_row_ready 
)

While there are more unprocessed rows ready to process given the current partition/frame state, process such buffered rows by evaluating/aggregating the window functions defined over this window on the current frame, moving the frame if required.

This method contains the main execution time logic of the evaluation window functions if we need buffering for one or more of the window functions defined on the window.

Moving (sliding) frames can be executed using a naive or optimized strategy for aggregate window functions, like SUM or AVG (but not MAX, or MIN). In the naive approach, for each row considered for processing from the buffer, we visit all the rows defined in the frame for that row, essentially leading to N*M complexity, where N is the number of rows in the result set, and M is the number for rows in the frame. This can be slow for large frames, obviously, so we can choose an optimized evaluation strategy using inversion. This means that when rows leave the frame as we move it forward, we re-use the previous aggregate state, but compute the inverse function to eliminate the contribution to the aggregate by the row(s) leaving the frame, and then use the normal aggregate function to add the contribution of the rows moving into the frame. The present method contains code paths for both strategies.

For integral data types, this is safe in the sense that the result will be the same if no overflow occurs during normal evaluation. For floating numbers, optimizing in this way may lead to different results, so it is not done by default, cf the session variable "windowing_use_high_precision".

Since the evaluation strategy is chosen based on the "most difficult" window function defined on the window, we must also be able to evaluate non-aggregates like ROW_NUMBER, NTILE, FIRST_VALUE in the code path of the optimized aggregates, so there is redundant code for those in the naive and optimized code paths. Note that NTILE forms a class of its own of the non-aggregates: it needs two passes over the partition's rows since the cardinality is needed to compute it. Furthermore, FIRST_VALUE and LAST_VALUE heed the frames, but they are not aggregates.

The is a special optimized code path for static aggregates: when the window frame is the default, e.g. the entire partition and there is no ORDER BY specified, the value of the framing window functions, i.e. SUM, AVG, FIRST_VALUE, LAST_VALUE can be evaluated once and for all and saved when we visit and evaluate the first row of the partition. For later rows we restore the aggregate values and just fill in the other fields and evaluate non-framing window functions for the row.

The code paths both for naive execution and optimized execution differ depending on whether we have ROW or RANGE boundaries in a explicit frame.

A word on BLOBs. Below we make copies of rows into the frame buffer. This is a temporary table, so BLOBs get copied in the normal way.

Sometimes we save records containing already computed framing window functions away into memory only: is the lifetime of the referenced BLOBs long enough? We have two cases:

BLOB results from wfs: Any BLOB results will reside in the copies in result fields of the Items ready for the out file, so they no longer need any BLOB memory read from the frame buffer tmp file.

BLOB fields not evaluated by wfs: Any other BLOB field will be copied as well, and would not have life-time past the next read from the frame buffer, but they are never used since we fill in the fields from the current row after evaluation of the window functions, so we don't need to make special copies of such BLOBs. This can be (and was) tested by shredding any BLOBs deallocated by InnoDB at the next read.

We also save away in memory the next record of the next partition while processing the current partition. Any blob there will have its storage from the read of the input file, but we won't be touching that for reading again until after we start processing the next partition and save the saved away next partition row to the frame buffer.

Note that the logic of this function is centered around the window, not around the window function. It is about putting rows in a partition, in a frame, in a set of peers, and passing this information to all window functions attached to this window; each function looks at the partition, frame, or peer set in its own particular way (for example RANK looks at the partition, SUM looks at the frame).

Parameters
thdCurrent thread
paramCurrent temporary table
new_partition_or_eofTrue if (we are about to start a new partition and there was a previous partition) or eof
[out]output_row_readyTrue if there is a row record ready to write to the out table
Returns
true if error

The current window

The frame

This is the row we are currently considering for processing and getting ready for output, cf. output_row_ready.

This is the row number of the last row we have buffered so far.

If true, use code path for static aggregates

If true, use code path for ROW bounds with optimized strategy

If true, use code path for RANGE bounds with optimized strategy

We need to evaluate FIRST_VALUE, or optimized MIN/MAX

We need to evaluate LAST_VALUE, or optimized MIN/MAX

We need to evaluate NTH_VALUE

We need to evaluate LEAD/LAG rows

True if an inversion optimization strategy is used. For common code paths.

RANGE was specified as the bounds unit for the frame

UNBOUNDED FOLLOWING was specified for the frame

Row_number of the first row in the frame. Invariant: lower_limit >= 1 after initialization.

Row_number of the logically last row to be computed in the frame, may be higher than the number of rows in the partition. The actual highest row number is computed later, see upper below.

needs peerset of current row to evaluate a wf for the current row.

needs the last peer of the current row within a frame.

For optimized strategy we want to save away the previous aggregate result and reuse in later round by inversion. This keeps track of whether we managed to compute results for this current row (result are "primed"), so we can use inversion in later rows. Cf Window::m_aggregates_primed.

Possible adjustment of the logical upper_limit: no rows exist beyond last_rowno_in_cache.

< iterates over rows in a frame

< RANGE: # of visited rows seen before the frame

Whether we know the start of the frame yet. The a priori setting is inherited from the previous current row.

◆ process_wfs_needing_partition_cardinality()

bool anonymous_namespace{window_iterators.cc}::process_wfs_needing_partition_cardinality ( THD thd,
Temp_table_param param,
const Window::st_nth have_nth_value,
const Window::st_lead_lag have_lead_lag,
const int64  current_row,
Window w,
Window_retrieve_cached_row_reason  current_row_reason 
)

Process window functions that need partition cardinality.

◆ read_frame_buffer_row()

bool anonymous_namespace{window_iterators.cc}::read_frame_buffer_row ( int64  rowno,
Window w,
bool  for_nth_value 
)

Read row rowno from frame buffer tmp file using cached row positions to minimize positioning work.

◆ reset_framing_wf_states()

void anonymous_namespace{window_iterators.cc}::reset_framing_wf_states ( Func_ptr_array func_ptr)
inline

Walk the function calls and reset any framing window function's window state.

Parameters
func_ptran array of function call items which might represent or contain window function calls

◆ reset_non_framing_wf_state()

void anonymous_namespace{window_iterators.cc}::reset_non_framing_wf_state ( Func_ptr_array func_ptr)
inline

Walk the function calls and reset any non-framing window function's window state.

Parameters
func_ptran array of function call items which might represent or contain window function calls

◆ reset_wf_states()

void anonymous_namespace{window_iterators.cc}::reset_wf_states ( Func_ptr_array func_ptr,
bool  framing 
)
inline

Minion for reset_framing_wf_states and reset_non_framing_wf_state, q.v.

Parameters
func_ptrthe set of functions
framingtrue if we want to reset for framing window functions

◆ SwitchSlice()

void anonymous_namespace{window_iterators.cc}::SwitchSlice ( JOIN join,
int  slice_num 
)