MySQL 8.3.0
Source Code Documentation
window_iterators.h
Go to the documentation of this file.
1#ifndef SQL_ITERATORS_WINDOW_ITERATORS_H_
2#define SQL_ITERATORS_WINDOW_ITERATORS_H_
3
4/* Copyright (c) 2018, 2023, Oracle and/or its affiliates.
5
6 This program is free software; you can redistribute it and/or modify
7 it under the terms of the GNU General Public License, version 2.0,
8 as published by the Free Software Foundation.
9
10 This program is also distributed with certain software (including
11 but not limited to OpenSSL) that is licensed under separate terms,
12 as designated in a particular file or component or in included license
13 documentation. The authors of MySQL hereby grant you an additional
14 permission to link the program and your derivative works with the
15 separately licensed software that they have included with MySQL.
16
17 This program is distributed in the hope that it will be useful,
18 but WITHOUT ANY WARRANTY; without even the implied warranty of
19 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20 GNU General Public License, version 2.0, for more details.
21
22 You should have received a copy of the GNU General Public License
23 along with this program; if not, write to the Free Software
24 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25
26#include "my_alloc.h"
28
29class JOIN;
30class THD;
32class Window;
33
34/*
35 WindowIterator is similar to AggregateIterator, but deals with windowed
36 aggregates (i.e., OVER expressions). It deals specifically with aggregates
37 that don't need to buffer rows.
38
39 Window function execution is centered around temporary table materialization;
40 every window corresponds to exactly one materialization (although the
41 “materialization” can often be shortcut to streaming). For every window,
42 we must materialize/evaluate exactly the aggregates that belong to that
43 window, and no others (earlier ones are just copied from the temporary table
44 fields, later ones are ignored). Thus, create_tmp_table() has special logic
45 when materializing a temporary table for a window function; if the
46 Temp_table_param has m_window set (non-nullptr), we ignore all aggregates that
47 don't belong to that window. E.g., assume we have foo() OVER w1, bar() OVER
48 w2, baz() OVER w2, quux() OVER w3, the temporary tables and field lists will
49 look like:
50
51 Temp table | SELECT list
52 foo() bar() baz() |
53 before wnd: | foo() bar() baz()
54 window 1: value ----- ----- | temp_w1.foo bar() baz()
55 window 2: value value value | temp_w2.foo temp_w2.bar temp_w2.baz
56
57 In e.g. step 2, w2.foo is simply copied from w1.foo (through
58 temp_table_param->copy_fields), while w2.bar and w2.baz are evaluated
59 from bar() and baz() (through temp_table_param->copy_func).
60
61 WindowIterator only takes responsibility for resetting the window functions
62 on a window boundary; the rest is handled by correct input ordering (typically
63 through sorting) and delicate ordering of copy_funcs() calls.
64 (BufferingWindowIterator, below, has more intricate logic for feeding rows
65 into the window functions, and only stopping to output new rows whenever
66 process_buffered_windowing_record() signals it is time to do that -- but apart
67 from that, the separation of concerns is much the same.)
68
69 In particular, ordering of copies gets complicated when we have expressions
70 that depend on window functions, or even window functions from multiple
71 windows. Say we have something like foo() OVER w1 + bar() OVER w2.
72 split_sum_funcs() will have made slices for us so that we have separate items
73 for foo() and bar():
74
75 base slice window 1 output window 2 output
76 0: <ref1> + <ref2> + + temp_w2.+
77 1: foo() OVER w1 foo() temp_w1.foo temp_w2.foo
78 2: bar() OVER w2 bar() N/A temp_w2.bar
79
80 We first copy fields and non-WF-related functions into the output table,
81 from the previous slice (e.g., for window 2, we copy temp_w1.foo to
82 temp_w2.foo); these are always safe. Then, we copy/evaluate the window
83 functions themselves (#1 or #2, depending on which window we are evaluating).
84 Finally, we get to the composite item (#0); in order not to evaluate the
85 window functions anew, the references in the add expression must refer to the
86 temporary table fields that we just populated, so we need to be in the
87 _output_ slice. When buffering is active (BufferingWindowIterator), we have
88 more phases to deal with; it would be good to have this documented as well.
89
90 If we are outputting to a temporary table, we take over responsibility
91 for storing the fields from MaterializeIterator, which would otherwise do it.
92 */
93class WindowIterator final : public RowIterator {
94 public:
96 Temp_table_param *temp_table_param, // Includes the window.
97 JOIN *join, int output_slice);
98
99 bool Init() override;
100
101 int Read() override;
102
103 void SetNullRowFlag(bool is_null_row) override {
104 m_source->SetNullRowFlag(is_null_row);
105 }
106
107 void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
108 void EndPSIBatchModeIfStarted() override {
109 m_source->EndPSIBatchModeIfStarted();
110 }
111
112 void UnlockRow() override {
113 // There's nothing we can do here.
114 }
115
116 private:
117 /// The iterator we are reading from.
119
120 /// Parameters for the temporary table we are outputting to.
122
123 /// The window function itself.
125
126 /// The join we are a part of.
128
129 /// The slice we will be using when reading rows.
131
132 /// The slice we will be using when outputting rows.
134};
135
136/**
137 BufferingWindowIterator is like WindowIterator, but deals with window
138 functions that need to buffer rows.
139
140 If we don't need to buffer rows to evaluate the window functions, execution
141 is simple; see WindowIterator for details. In that case, we can just evaluate
142 the window functions as we go here, similar to the non-windowing flow.
143
144 If we do need buffering, though, we buffer the row in Read(). Next, we enter a
145 loop calling process_buffered_windowing_record, and conditionally return
146 the row. That is, if process_buffered_windowing_record was able to complete
147 evaluation of a row (cf. output_row_ready), including its window functions
148 given how much has already been buffered, we return a row, else we read more
149 rows, and postpone evaluation and returning till we have enough rows in the
150 buffer.
151
152 When we have read a full partition (or reach EOF), we evaluate any remaining
153 rows. Note that since we have to read one row past the current partition to
154 detect that that previous row was indeed the last row in a partition, we
155 need to re-establish the first row of the next partition when we are done
156 processing the current one. This is because the record will be overwritten
157 (many times) during evaluation of window functions in the current partition.
158
159 Usually [1], for window execution we have two or three tmp tables per
160 windowing step involved (although not all are always materialized;
161 they may be just streaming through StreamingIterator):
162
163 - The input table, corresponding to the parent iterator. Holds (possibly
164 sorted) records ready for windowing, sorted on expressions concatenated from
165 any PARTITION BY and ORDER BY clauses.
166
167 - The output table, as given by temp_table_param: where we write the evaluated
168 records from this step. Note that we may optimize away this last write if
169 we have no final ORDER BY or DISTINCT.
170
171 - If we have buffering, the frame buffer, held by
172 Window::m_frame_buffer[_param].
173
174 [1] This is not always the case. For the first window, if we have no
175 PARTITION BY or ORDER BY in the window, and there is more than one table
176 in the join, the logical input can consist of more than one table
177 (e.g. a NestedLoopIterator).
178
179 The first thing we do in Read() is:
180 We copy fields from IN to OUT (copy_fields), and evaluate non-WF functions
181 (copy_funcs): those functions then read their arguments from IN and store
182 their result into their result_field which is a field in OUT.
183
184 Then, let's take SUM(A+FLOOR(B)) OVER (ROWS 2 FOLLOWING) as example. Above,
185 we have stored A and the result of FLOOR in OUT. Now we buffer (save) the row
186 from OUT into the FB: For that, we copy both field A and FLOOR's result_field
187 from OUT to FB; a single copy_fields() call handles both copy jobs. Then we
188 look at the rows we have buffered and may realize that we have enough of the
189 frame to calculate SUM for a certain row (not necessarily the one we just
190 buffered; might be an earlier row, in our example it is the row which is 2
191 rows above the buffered row). If we do, to calculate WFs, we bring back the
192 frame's rows; which is done by: first copying field A and FLOOR's result_field
193 back from FB to OUT, thus getting in OUT all that SUM needs (A and FLOOR),
194 then giving that OUT row to SUM (SUM will then add the row's value to its
195 total; that happens in copy_funcs). After we have done that on all rows of the
196 frame, we have the values of SUM ready in OUT, we also restore the row which
197 owns this SUM value, in the same way as we restored the frame's rows, and we
198 return from Read() - we're done for this row. However, on the next Read()
199 call, we loop to check if we can calculate one more row with the frame
200 we have, and if so, we do, until we can't calculate any more rows -- in which
201 case we're back to just buffering.
202 */
204 public:
207 Temp_table_param *temp_table_param, // Includes the window.
208 JOIN *join, int output_slice);
209
210 bool Init() override;
211
212 int Read() override;
213
214 void SetNullRowFlag(bool is_null_row) override {
215 m_source->SetNullRowFlag(is_null_row);
216 }
217
218 void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
219 void EndPSIBatchModeIfStarted() override {
220 m_source->EndPSIBatchModeIfStarted();
221 }
222
223 void UnlockRow() override {
224 // There's nothing we can do here.
225 }
226
227 private:
228 int ReadBufferedRow(bool new_partition_or_eof);
229
230 /// The iterator we are reading from.
232
233 /// Parameters for the temporary table we are outputting to.
235
236 /// The window function itself.
238
239 /// The join we are a part of.
241
242 /// The slice we will be using when reading rows.
244
245 /// The slice we will be using when outputting rows.
247
248 /// If true, we may have more buffered rows to process that need to be
249 /// checked for before reading more rows from the source.
251
252 /// Whether the last input row started a new partition, and was tucked away
253 /// to finalize the previous partition; if so, we need to bring it back
254 /// for processing before we read more rows.
256
257 /// Whether we have seen the last input row.
258 bool m_eof;
259};
260
261#endif // SQL_ITERATORS_WINDOW_ITERATORS_H_
BufferingWindowIterator is like WindowIterator, but deals with window functions that need to buffer r...
Definition: window_iterators.h:203
void UnlockRow() override
Definition: window_iterators.h:223
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:240
bool Init() override
Initialize or reinitialize the iterator.
Definition: window_iterators.cc:1574
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:234
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:243
int Read() override
Read a single row.
Definition: window_iterators.cc:1590
int ReadBufferedRow(bool new_partition_or_eof)
Definition: window_iterators.cc:1686
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:218
bool m_eof
Whether we have seen the last input row.
Definition: window_iterators.h:258
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:246
bool m_possibly_buffered_rows
If true, we may have more buffered rows to process that need to be checked for before reading more ro...
Definition: window_iterators.h:250
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:231
Window * m_window
The window function itself.
Definition: window_iterators.h:237
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:219
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:214
bool m_last_input_row_started_new_partition
Whether the last input row started a new partition, and was tucked away to finalize the previous part...
Definition: window_iterators.h:255
BufferingWindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1562
Definition: sql_optimizer.h:132
A context for reading through a single table using a chosen access method: index read,...
Definition: row_iterator.h:81
THD * thd() const
Definition: row_iterator.h:227
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:35
Object containing parameters used when creating and using temporary tables.
Definition: temp_table_param.h:94
Definition: window_iterators.h:93
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:133
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:103
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:107
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:127
WindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1514
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:118
bool Init() override
Initialize or reinitialize the iterator.
Definition: window_iterators.cc:1527
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:108
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:121
int Read() override
Read a single row.
Definition: window_iterators.cc:1539
void UnlockRow() override
Definition: window_iterators.h:112
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:130
Window * m_window
The window function itself.
Definition: window_iterators.h:124
Represents the (explicit) window of a SQL 2003 section 7.11 <window clause>, or the implicit (inlined...
Definition: window.h:104
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
std::unique_ptr< T, Destroy_only< T > > unique_ptr_destroy_only
std::unique_ptr, but only destroying.
Definition: my_alloc.h:476
std::string join(Container cont, const std::string &delim)
join elements of an container into a string separated by a delimiter.
Definition: string.h:150
repeated Source source
Definition: replication_asynchronous_connection_failover.proto:41