MySQL 8.4.0
Source Code Documentation
window_iterators.h
Go to the documentation of this file.
1#ifndef SQL_ITERATORS_WINDOW_ITERATORS_H_
2#define SQL_ITERATORS_WINDOW_ITERATORS_H_
3
4/* Copyright (c) 2018, 2024, Oracle and/or its affiliates.
5
6 This program is free software; you can redistribute it and/or modify
7 it under the terms of the GNU General Public License, version 2.0,
8 as published by the Free Software Foundation.
9
10 This program is designed to work with certain software (including
11 but not limited to OpenSSL) that is licensed under separate terms,
12 as designated in a particular file or component or in included license
13 documentation. The authors of MySQL hereby grant you an additional
14 permission to link the program and your derivative works with the
15 separately licensed software that they have either included with
16 the program or referenced in the documentation.
17
18 This program is distributed in the hope that it will be useful,
19 but WITHOUT ANY WARRANTY; without even the implied warranty of
20 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 GNU General Public License, version 2.0, for more details.
22
23 You should have received a copy of the GNU General Public License
24 along with this program; if not, write to the Free Software
25 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
26
27#include "my_alloc.h"
29
30class JOIN;
31class THD;
33class Window;
34
35/*
36 WindowIterator is similar to AggregateIterator, but deals with windowed
37 aggregates (i.e., OVER expressions). It deals specifically with aggregates
38 that don't need to buffer rows.
39
40 Window function execution is centered around temporary table materialization;
41 every window corresponds to exactly one materialization (although the
42 “materialization” can often be shortcut to streaming). For every window,
43 we must materialize/evaluate exactly the aggregates that belong to that
44 window, and no others (earlier ones are just copied from the temporary table
45 fields, later ones are ignored). Thus, create_tmp_table() has special logic
46 when materializing a temporary table for a window function; if the
47 Temp_table_param has m_window set (non-nullptr), we ignore all aggregates that
48 don't belong to that window. E.g., assume we have foo() OVER w1, bar() OVER
49 w2, baz() OVER w2, quux() OVER w3, the temporary tables and field lists will
50 look like:
51
52 Temp table | SELECT list
53 foo() bar() baz() |
54 before wnd: | foo() bar() baz()
55 window 1: value ----- ----- | temp_w1.foo bar() baz()
56 window 2: value value value | temp_w2.foo temp_w2.bar temp_w2.baz
57
58 In e.g. step 2, w2.foo is simply copied from w1.foo (through
59 temp_table_param->copy_fields), while w2.bar and w2.baz are evaluated
60 from bar() and baz() (through temp_table_param->copy_func).
61
62 WindowIterator only takes responsibility for resetting the window functions
63 on a window boundary; the rest is handled by correct input ordering (typically
64 through sorting) and delicate ordering of copy_funcs() calls.
65 (BufferingWindowIterator, below, has more intricate logic for feeding rows
66 into the window functions, and only stopping to output new rows whenever
67 process_buffered_windowing_record() signals it is time to do that -- but apart
68 from that, the separation of concerns is much the same.)
69
70 In particular, ordering of copies gets complicated when we have expressions
71 that depend on window functions, or even window functions from multiple
72 windows. Say we have something like foo() OVER w1 + bar() OVER w2.
73 split_sum_funcs() will have made slices for us so that we have separate items
74 for foo() and bar():
75
76 base slice window 1 output window 2 output
77 0: <ref1> + <ref2> + + temp_w2.+
78 1: foo() OVER w1 foo() temp_w1.foo temp_w2.foo
79 2: bar() OVER w2 bar() N/A temp_w2.bar
80
81 We first copy fields and non-WF-related functions into the output table,
82 from the previous slice (e.g., for window 2, we copy temp_w1.foo to
83 temp_w2.foo); these are always safe. Then, we copy/evaluate the window
84 functions themselves (#1 or #2, depending on which window we are evaluating).
85 Finally, we get to the composite item (#0); in order not to evaluate the
86 window functions anew, the references in the add expression must refer to the
87 temporary table fields that we just populated, so we need to be in the
88 _output_ slice. When buffering is active (BufferingWindowIterator), we have
89 more phases to deal with; it would be good to have this documented as well.
90
91 If we are outputting to a temporary table, we take over responsibility
92 for storing the fields from MaterializeIterator, which would otherwise do it.
93 */
94class WindowIterator final : public RowIterator {
95 public:
97 Temp_table_param *temp_table_param, // Includes the window.
98 JOIN *join, int output_slice);
99
100 bool Init() override;
101
102 int Read() override;
103
104 void SetNullRowFlag(bool is_null_row) override {
105 m_source->SetNullRowFlag(is_null_row);
106 }
107
108 void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
109 void EndPSIBatchModeIfStarted() override {
110 m_source->EndPSIBatchModeIfStarted();
111 }
112
113 void UnlockRow() override {
114 // There's nothing we can do here.
115 }
116
117 private:
118 /// The iterator we are reading from.
120
121 /// Parameters for the temporary table we are outputting to.
123
124 /// The window function itself.
126
127 /// The join we are a part of.
129
130 /// The slice we will be using when reading rows.
132
133 /// The slice we will be using when outputting rows.
135};
136
137/**
138 BufferingWindowIterator is like WindowIterator, but deals with window
139 functions that need to buffer rows.
140
141 If we don't need to buffer rows to evaluate the window functions, execution
142 is simple; see WindowIterator for details. In that case, we can just evaluate
143 the window functions as we go here, similar to the non-windowing flow.
144
145 If we do need buffering, though, we buffer the row in Read(). Next, we enter a
146 loop calling process_buffered_windowing_record, and conditionally return
147 the row. That is, if process_buffered_windowing_record was able to complete
148 evaluation of a row (cf. output_row_ready), including its window functions
149 given how much has already been buffered, we return a row, else we read more
150 rows, and postpone evaluation and returning till we have enough rows in the
151 buffer.
152
153 When we have read a full partition (or reach EOF), we evaluate any remaining
154 rows. Note that since we have to read one row past the current partition to
155 detect that that previous row was indeed the last row in a partition, we
156 need to re-establish the first row of the next partition when we are done
157 processing the current one. This is because the record will be overwritten
158 (many times) during evaluation of window functions in the current partition.
159
160 Usually [1], for window execution we have two or three tmp tables per
161 windowing step involved (although not all are always materialized;
162 they may be just streaming through StreamingIterator):
163
164 - The input table, corresponding to the parent iterator. Holds (possibly
165 sorted) records ready for windowing, sorted on expressions concatenated from
166 any PARTITION BY and ORDER BY clauses.
167
168 - The output table, as given by temp_table_param: where we write the evaluated
169 records from this step. Note that we may optimize away this last write if
170 we have no final ORDER BY or DISTINCT.
171
172 - If we have buffering, the frame buffer, held by
173 Window::m_frame_buffer[_param].
174
175 [1] This is not always the case. For the first window, if we have no
176 PARTITION BY or ORDER BY in the window, and there is more than one table
177 in the join, the logical input can consist of more than one table
178 (e.g. a NestedLoopIterator).
179
180 The first thing we do in Read() is:
181 We copy fields from IN to OUT (copy_fields), and evaluate non-WF functions
182 (copy_funcs): those functions then read their arguments from IN and store
183 their result into their result_field which is a field in OUT.
184
185 Then, let's take SUM(A+FLOOR(B)) OVER (ROWS 2 FOLLOWING) as example. Above,
186 we have stored A and the result of FLOOR in OUT. Now we buffer (save) the row
187 from OUT into the FB: For that, we copy both field A and FLOOR's result_field
188 from OUT to FB; a single copy_fields() call handles both copy jobs. Then we
189 look at the rows we have buffered and may realize that we have enough of the
190 frame to calculate SUM for a certain row (not necessarily the one we just
191 buffered; might be an earlier row, in our example it is the row which is 2
192 rows above the buffered row). If we do, to calculate WFs, we bring back the
193 frame's rows; which is done by: first copying field A and FLOOR's result_field
194 back from FB to OUT, thus getting in OUT all that SUM needs (A and FLOOR),
195 then giving that OUT row to SUM (SUM will then add the row's value to its
196 total; that happens in copy_funcs). After we have done that on all rows of the
197 frame, we have the values of SUM ready in OUT, we also restore the row which
198 owns this SUM value, in the same way as we restored the frame's rows, and we
199 return from Read() - we're done for this row. However, on the next Read()
200 call, we loop to check if we can calculate one more row with the frame
201 we have, and if so, we do, until we can't calculate any more rows -- in which
202 case we're back to just buffering.
203 */
205 public:
208 Temp_table_param *temp_table_param, // Includes the window.
209 JOIN *join, int output_slice);
210
211 bool Init() override;
212
213 int Read() override;
214
215 void SetNullRowFlag(bool is_null_row) override {
216 m_source->SetNullRowFlag(is_null_row);
217 }
218
219 void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
220 void EndPSIBatchModeIfStarted() override {
221 m_source->EndPSIBatchModeIfStarted();
222 }
223
224 void UnlockRow() override {
225 // There's nothing we can do here.
226 }
227
228 private:
229 int ReadBufferedRow(bool new_partition_or_eof);
230
231 /// The iterator we are reading from.
233
234 /// Parameters for the temporary table we are outputting to.
236
237 /// The window function itself.
239
240 /// The join we are a part of.
242
243 /// The slice we will be using when reading rows.
245
246 /// The slice we will be using when outputting rows.
248
249 /// If true, we may have more buffered rows to process that need to be
250 /// checked for before reading more rows from the source.
252
253 /// Whether the last input row started a new partition, and was tucked away
254 /// to finalize the previous partition; if so, we need to bring it back
255 /// for processing before we read more rows.
257
258 /// Whether we have seen the last input row.
259 bool m_eof;
260};
261
262#endif // SQL_ITERATORS_WINDOW_ITERATORS_H_
BufferingWindowIterator is like WindowIterator, but deals with window functions that need to buffer r...
Definition: window_iterators.h:204
void UnlockRow() override
Definition: window_iterators.h:224
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:241
bool Init() override
Initialize or reinitialize the iterator.
Definition: window_iterators.cc:1575
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:235
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:244
int Read() override
Read a single row.
Definition: window_iterators.cc:1591
int ReadBufferedRow(bool new_partition_or_eof)
Definition: window_iterators.cc:1687
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:219
bool m_eof
Whether we have seen the last input row.
Definition: window_iterators.h:259
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:247
bool m_possibly_buffered_rows
If true, we may have more buffered rows to process that need to be checked for before reading more ro...
Definition: window_iterators.h:251
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:232
Window * m_window
The window function itself.
Definition: window_iterators.h:238
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:220
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:215
bool m_last_input_row_started_new_partition
Whether the last input row started a new partition, and was tucked away to finalize the previous part...
Definition: window_iterators.h:256
BufferingWindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1563
Definition: sql_optimizer.h:133
A context for reading through a single table using a chosen access method: index read,...
Definition: row_iterator.h:82
THD * thd() const
Definition: row_iterator.h:228
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:36
Object containing parameters used when creating and using temporary tables.
Definition: temp_table_param.h:95
Definition: window_iterators.h:94
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:134
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:104
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:108
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:128
WindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1515
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:119
bool Init() override
Initialize or reinitialize the iterator.
Definition: window_iterators.cc:1528
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:109
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:122
int Read() override
Read a single row.
Definition: window_iterators.cc:1540
void UnlockRow() override
Definition: window_iterators.h:113
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:131
Window * m_window
The window function itself.
Definition: window_iterators.h:125
Represents the (explicit) window of a SQL 2003 section 7.11 <window clause>, or the implicit (inlined...
Definition: window.h:110
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
std::unique_ptr< T, Destroy_only< T > > unique_ptr_destroy_only
std::unique_ptr, but only destroying.
Definition: my_alloc.h:477
std::string join(Container cont, const std::string &delim)
join elements of an container into a string separated by a delimiter.
Definition: string.h:151
repeated Source source
Definition: replication_asynchronous_connection_failover.proto:42