MySQL 9.5.0
Source Code Documentation
window_iterators.h
Go to the documentation of this file.
1#ifndef SQL_ITERATORS_WINDOW_ITERATORS_H_
2#define SQL_ITERATORS_WINDOW_ITERATORS_H_
3
4/* Copyright (c) 2018, 2025, Oracle and/or its affiliates.
5
6 This program is free software; you can redistribute it and/or modify
7 it under the terms of the GNU General Public License, version 2.0,
8 as published by the Free Software Foundation.
9
10 This program is designed to work with certain software (including
11 but not limited to OpenSSL) that is licensed under separate terms,
12 as designated in a particular file or component or in included license
13 documentation. The authors of MySQL hereby grant you an additional
14 permission to link the program and your derivative works with the
15 separately licensed software that they have either included with
16 the program or referenced in the documentation.
17
18 This program is distributed in the hope that it will be useful,
19 but WITHOUT ANY WARRANTY; without even the implied warranty of
20 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 GNU General Public License, version 2.0, for more details.
22
23 You should have received a copy of the GNU General Public License
24 along with this program; if not, write to the Free Software
25 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
26
27#include "my_alloc.h"
29
30class JOIN;
31class THD;
33class Window;
34
35/*
36 WindowIterator is similar to AggregateIterator, but deals with windowed
37 aggregates (i.e., OVER expressions). It deals specifically with aggregates
38 that don't need to buffer rows.
39
40 Window function execution is centered around temporary table materialization;
41 every window corresponds to exactly one materialization (although the
42 “materialization” can often be shortcut to streaming). For every window,
43 we must materialize/evaluate exactly the aggregates that belong to that
44 window, and no others (earlier ones are just copied from the temporary table
45 fields, later ones are ignored). Thus, create_tmp_table() has special logic
46 when materializing a temporary table for a window function; if the
47 Temp_table_param has m_window set (non-nullptr), we ignore all aggregates that
48 don't belong to that window. E.g., assume we have foo() OVER w1, bar() OVER
49 w2, baz() OVER w2, quux() OVER w3, the temporary tables and field lists will
50 look like:
51
52 Temp table | SELECT list
53 foo() bar() baz() |
54 before wnd: | foo() bar() baz()
55 window 1: value ----- ----- | temp_w1.foo bar() baz()
56 window 2: value value value | temp_w2.foo temp_w2.bar temp_w2.baz
57
58 In e.g. step 2, w2.foo is simply copied from w1.foo (through
59 temp_table_param->copy_fields), while w2.bar and w2.baz are evaluated
60 from bar() and baz() (through temp_table_param->copy_func).
61
62 WindowIterator only takes responsibility for resetting the window functions
63 on a window boundary; the rest is handled by correct input ordering (typically
64 through sorting) and delicate ordering of copy_funcs() calls.
65 (BufferingWindowIterator, below, has more intricate logic for feeding rows
66 into the window functions, and only stopping to output new rows whenever
67 process_buffered_windowing_record() signals it is time to do that -- but apart
68 from that, the separation of concerns is much the same.)
69
70 In particular, ordering of copies gets complicated when we have expressions
71 that depend on window functions, or even window functions from multiple
72 windows. Say we have something like foo() OVER w1 + bar() OVER w2.
73 split_sum_funcs() will have made slices for us so that we have separate items
74 for foo() and bar():
75
76 base slice window 1 output window 2 output
77 0: <ref1> + <ref2> + + temp_w2.+
78 1: foo() OVER w1 foo() temp_w1.foo temp_w2.foo
79 2: bar() OVER w2 bar() N/A temp_w2.bar
80
81 We first copy fields and non-WF-related functions into the output table,
82 from the previous slice (e.g., for window 2, we copy temp_w1.foo to
83 temp_w2.foo); these are always safe. Then, we copy/evaluate the window
84 functions themselves (#1 or #2, depending on which window we are evaluating).
85 Finally, we get to the composite item (#0); in order not to evaluate the
86 window functions anew, the references in the add expression must refer to the
87 temporary table fields that we just populated, so we need to be in the
88 _output_ slice. When buffering is active (BufferingWindowIterator), we have
89 more phases to deal with; it would be good to have this documented as well.
90
91 If we are outputting to a temporary table, we take over responsibility
92 for storing the fields from MaterializeIterator, which would otherwise do it.
93 */
94class WindowIterator final : public RowIterator {
95 public:
97 Temp_table_param *temp_table_param, // Includes the window.
98 JOIN *join, int output_slice);
99
100 void SetNullRowFlag(bool is_null_row) override {
101 m_source->SetNullRowFlag(is_null_row);
102 }
103
104 void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
105 void EndPSIBatchModeIfStarted() override {
106 m_source->EndPSIBatchModeIfStarted();
107 }
108
109 void UnlockRow() override {
110 // There's nothing we can do here.
111 }
112
113 private:
114 bool DoInit() override;
115 int DoRead() override;
116
117 /// The iterator we are reading from.
119
120 /// Parameters for the temporary table we are outputting to.
122
123 /// The window function itself.
125
126 /// The join we are a part of.
128
129 /// The slice we will be using when reading rows.
131
132 /// The slice we will be using when outputting rows.
134};
135
136/**
137 BufferingWindowIterator is like WindowIterator, but deals with window
138 functions that need to buffer rows.
139
140 If we don't need to buffer rows to evaluate the window functions, execution
141 is simple; see WindowIterator for details. In that case, we can just evaluate
142 the window functions as we go here, similar to the non-windowing flow.
143
144 If we do need buffering, though, we buffer the row in Read(). Next, we enter a
145 loop calling process_buffered_windowing_record, and conditionally return
146 the row. That is, if process_buffered_windowing_record was able to complete
147 evaluation of a row (cf. output_row_ready), including its window functions
148 given how much has already been buffered, we return a row, else we read more
149 rows, and postpone evaluation and returning till we have enough rows in the
150 buffer.
151
152 When we have read a full partition (or reach EOF), we evaluate any remaining
153 rows. Note that since we have to read one row past the current partition to
154 detect that that previous row was indeed the last row in a partition, we
155 need to re-establish the first row of the next partition when we are done
156 processing the current one. This is because the record will be overwritten
157 (many times) during evaluation of window functions in the current partition.
158
159 Usually [1], for window execution we have two or three tmp tables per
160 windowing step involved (although not all are always materialized;
161 they may be just streaming through StreamingIterator):
162
163 - The input table, corresponding to the parent iterator. Holds (possibly
164 sorted) records ready for windowing, sorted on expressions concatenated from
165 any PARTITION BY and ORDER BY clauses.
166
167 - The output table, as given by temp_table_param: where we write the evaluated
168 records from this step. Note that we may optimize away this last write if
169 we have no final ORDER BY or DISTINCT.
170
171 - If we have buffering, the frame buffer, held by
172 Window::m_frame_buffer[_param].
173
174 [1] This is not always the case. For the first window, if we have no
175 PARTITION BY or ORDER BY in the window, and there is more than one table
176 in the join, the logical input can consist of more than one table
177 (e.g. a NestedLoopIterator).
178
179 The first thing we do in Read() is:
180 We copy fields from IN to OUT (copy_fields), and evaluate non-WF functions
181 (copy_funcs): those functions then read their arguments from IN and store
182 their result into their result_field which is a field in OUT.
183
184 Then, let's take SUM(A+FLOOR(B)) OVER (ROWS 2 FOLLOWING) as example. Above,
185 we have stored A and the result of FLOOR in OUT. Now we buffer (save) the row
186 from OUT into the FB: For that, we copy both field A and FLOOR's result_field
187 from OUT to FB; a single copy_fields() call handles both copy jobs. Then we
188 look at the rows we have buffered and may realize that we have enough of the
189 frame to calculate SUM for a certain row (not necessarily the one we just
190 buffered; might be an earlier row, in our example it is the row which is 2
191 rows above the buffered row). If we do, to calculate WFs, we bring back the
192 frame's rows; which is done by: first copying field A and FLOOR's result_field
193 back from FB to OUT, thus getting in OUT all that SUM needs (A and FLOOR),
194 then giving that OUT row to SUM (SUM will then add the row's value to its
195 total; that happens in copy_funcs). After we have done that on all rows of the
196 frame, we have the values of SUM ready in OUT, we also restore the row which
197 owns this SUM value, in the same way as we restored the frame's rows, and we
198 return from Read() - we're done for this row. However, on the next Read()
199 call, we loop to check if we can calculate one more row with the frame
200 we have, and if so, we do, until we can't calculate any more rows -- in which
201 case we're back to just buffering.
202 */
204 public:
207 Temp_table_param *temp_table_param, // Includes the window.
208 JOIN *join, int output_slice);
209
210 void SetNullRowFlag(bool is_null_row) override {
211 m_source->SetNullRowFlag(is_null_row);
212 }
213
214 void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
215 void EndPSIBatchModeIfStarted() override {
216 m_source->EndPSIBatchModeIfStarted();
217 }
218
219 void UnlockRow() override {
220 // There's nothing we can do here.
221 }
222
223 private:
224 bool DoInit() override;
225 int DoRead() override;
226 int ReadBufferedRow(bool new_partition_or_eof);
227
228 /// The iterator we are reading from.
230
231 /// Parameters for the temporary table we are outputting to.
233
234 /// The window function itself.
236
237 /// The join we are a part of.
239
240 /// The slice we will be using when reading rows.
242
243 /// The slice we will be using when outputting rows.
245
246 /// If true, we may have more buffered rows to process that need to be
247 /// checked for before reading more rows from the source.
249
250 /// Whether the last input row started a new partition, and was tucked away
251 /// to finalize the previous partition; if so, we need to bring it back
252 /// for processing before we read more rows.
254
255 /// Whether we have seen the last input row.
256 bool m_eof;
257};
258
259#endif // SQL_ITERATORS_WINDOW_ITERATORS_H_
BufferingWindowIterator is like WindowIterator, but deals with window functions that need to buffer r...
Definition: window_iterators.h:203
void UnlockRow() override
Definition: window_iterators.h:219
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:238
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:232
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:241
int DoRead() override
Definition: window_iterators.cc:1610
bool DoInit() override
Definition: window_iterators.cc:1594
int ReadBufferedRow(bool new_partition_or_eof)
Definition: window_iterators.cc:1706
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:214
bool m_eof
Whether we have seen the last input row.
Definition: window_iterators.h:256
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:244
bool m_possibly_buffered_rows
If true, we may have more buffered rows to process that need to be checked for before reading more ro...
Definition: window_iterators.h:248
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:229
Window * m_window
The window function itself.
Definition: window_iterators.h:235
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:215
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:210
bool m_last_input_row_started_new_partition
Whether the last input row started a new partition, and was tucked away to finalize the previous part...
Definition: window_iterators.h:253
BufferingWindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1582
Definition: sql_optimizer.h:133
A context for reading through a single table using a chosen access method: index read,...
Definition: row_iterator.h:82
THD * thd() const
Definition: row_iterator.h:255
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:36
Object containing parameters used when creating and using temporary tables.
Definition: temp_table_param.h:97
Definition: window_iterators.h:94
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:133
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:100
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:104
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:127
WindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1534
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:118
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:105
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:121
int DoRead() override
Definition: window_iterators.cc:1559
bool DoInit() override
Definition: window_iterators.cc:1547
void UnlockRow() override
Definition: window_iterators.h:109
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:130
Window * m_window
The window function itself.
Definition: window_iterators.h:124
Represents the (explicit) window of a SQL 2003 section 7.11 <window clause>, or the implicit (inlined...
Definition: window.h:110
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
std::unique_ptr< T, Destroy_only< T > > unique_ptr_destroy_only
std::unique_ptr, but only destroying.
Definition: my_alloc.h:480
std::string join(const detail::range auto &rng, std::string_view delim)
join elements of a range into a string separated by a delimiter.
Definition: string.h:74
repeated Source source
Definition: replication_asynchronous_connection_failover.proto:42