MySQL  8.0.27
Source Code Documentation
window_iterators.h
Go to the documentation of this file.
1 #ifndef SQL_WINDOW_ITERATORS_INCLUDED
2 #define SQL_WINDOW_ITERATORS_INCLUDED
3 
4 /* Copyright (c) 2018, 2021, Oracle and/or its affiliates.
5 
6  This program is free software; you can redistribute it and/or modify
7  it under the terms of the GNU General Public License, version 2.0,
8  as published by the Free Software Foundation.
9 
10  This program is also distributed with certain software (including
11  but not limited to OpenSSL) that is licensed under separate terms,
12  as designated in a particular file or component or in included license
13  documentation. The authors of MySQL hereby grant you an additional
14  permission to link the program and your derivative works with the
15  separately licensed software that they have included with MySQL.
16 
17  This program is distributed in the hope that it will be useful,
18  but WITHOUT ANY WARRANTY; without even the implied warranty of
19  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20  GNU General Public License, version 2.0, for more details.
21 
22  You should have received a copy of the GNU General Public License
23  along with this program; if not, write to the Free Software
24  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25 
26 #include "my_alloc.h"
27 #include "sql/row_iterator.h"
28 
29 class JOIN;
30 class THD;
31 class Temp_table_param;
32 class Window;
33 
34 /*
35  WindowIterator is similar to AggregateIterator, but deals with windowed
36  aggregates (i.e., OVER expressions). It deals specifically with aggregates
37  that don't need to buffer rows.
38 
39  Window function execution is centered around temporary table materialization;
40  every window corresponds to exactly one materialization (although the
41  “materialization” can often be shortcut to streaming). For every window,
42  we must materialize/evaluate exactly the aggregates that belong to that
43  window, and no others (earlier ones are just copied from the temporary table
44  fields, later ones are ignored). Thus, create_tmp_table() has special logic
45  when materializing a temporary table for a window functon; if the
46  Temp_table_param has m_window set (non-nullptr), we ignore all aggregates that
47  don't belong to that window. E.g., assume we have foo() OVER w1, bar() OVER
48  w2, baz() OVER w2, quux() OVER w3, the temporary tables and field lists will
49  look like:
50 
51  Temp table | SELECT list
52  foo() bar() baz() |
53  before wnd: | foo() bar() baz()
54  window 1: value ----- ----- | temp_w1.foo bar() baz()
55  window 2: value value value | temp_w2.foo temp_w2.bar temp_w2.baz
56 
57  In e.g. step 2, w2.foo is simply copied from w1.foo (through
58  temp_table_param->copy_fields), while w2.bar and w2.baz are evaluated
59  from bar() and baz() (through temp_table_param->copy_func).
60 
61  WindowIterator only takes responsibility for resetting the window functions
62  on a window boundary; the rest is handled by correct input ordering (typically
63  through sorting) and delicate ordering of copy_funcs() calls.
64  (BufferingWindowIterator, below, has more intricate logic for feeding rows
65  into the window functions, and only stopping to output new rows whenever
66  process_buffered_windowing_record() signals it is time to do that -- but apart
67  from that, the separation of concerns is much the same.)
68 
69  In particular, ordering of copies gets complicated when we have expressions
70  that depend on window functions, or even window functions from multiple
71  windows. Say we have something like foo() OVER w1 + bar() OVER w2.
72  split_sum_funcs() will have made slices for us so that we have separate items
73  for foo() and bar():
74 
75  base slice window 1 output window 2 output
76  0: <ref1> + <ref2> + + temp_w2.+
77  1: foo() OVER w1 foo() temp_w1.foo temp_w2.foo
78  2: bar() OVER w2 bar() N/A temp_w2.bar
79 
80  We first copy fields and non-WF-related functions into the output table,
81  from the previous slice (e.g., for window 2, we copy temp_w1.foo to
82  temp_w2.foo); these are always safe. Then, we copy/evaluate the window
83  functions themselves (#1 or #2, depending on which window we are evaluating).
84  Finally, we get to the composite item (#0); in order not to evaluate the
85  window functions anew, the references in the add expression must refer to the
86  temporary table fields that we just populated, so we need to be in the
87  _output_ slice. When buffering is active (BufferingWindowIterator), we have
88  more phases to deal with; it would be good to have this documented as well.
89 
90  If we are outputting to a temporary table, we take over responsibility
91  for storing the fields from MaterializeIterator, which would otherwise do it.
92  */
93 class WindowIterator final : public RowIterator {
94  public:
96  Temp_table_param *temp_table_param, // Includes the window.
97  JOIN *join, int output_slice);
98 
99  bool Init() override;
100 
101  int Read() override;
102 
103  void SetNullRowFlag(bool is_null_row) override {
104  m_source->SetNullRowFlag(is_null_row);
105  }
106 
107  void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
108  void EndPSIBatchModeIfStarted() override {
109  m_source->EndPSIBatchModeIfStarted();
110  }
111 
112  void UnlockRow() override {
113  // There's nothing we can do here.
114  }
115 
116  private:
117  /// The iterator we are reading from.
119 
120  /// Parameters for the temporary table we are outputting to.
122 
123  /// The window function itself.
125 
126  /// The join we are a part of.
128 
129  /// The slice we will be using when reading rows.
131 
132  /// The slice we will be using when outputting rows.
134 };
135 
136 /**
137  BufferingWindowIterator is like WindowIterator, but deals with window
138  functions that need to buffer rows.
139 
140  If we don't need to buffer rows to evaluate the window functions, execution
141  is simple; see WindowIterator for details. In that case, we can just evaluate
142  the window functions as we go here, similar to the non-windowing flow.
143 
144  If we do need buffering, though, we buffer the row in Read(). Next, we enter a
145  loop calling process_buffered_windowing_record, and conditionally return
146  the row. That is, if process_buffered_windowing_record was able to complete
147  evaluation of a row (cf. output_row_ready), including its window functions
148  given how much has already been buffered, we return a row, else we read more
149  rows, and postpone evaluation and returning till we have enough rows in the
150  buffer.
151 
152  When we have read a full partition (or reach EOF), we evaluate any remaining
153  rows. Note that since we have to read one row past the current partition to
154  detect that that previous row was indeed the last row in a partition, we
155  need to re-establish the first row of the next partition when we are done
156  processing the current one. This is because the record will be overwritten
157  (many times) during evaluation of window functions in the current partition.
158 
159  Usually [1], for window execution we have two or three tmp tables per
160  windowing step involved (although not all are always materialized;
161  they may be just streaming through StreamingIterator):
162 
163  - The input table, corresponding to the parent iterator. Holds (possibly
164  sorted) records ready for windowing, sorted on expressions concatenated from
165  any PARTITION BY and ORDER BY clauses.
166 
167  - The output table, as given by temp_table_param: where we write the evaluated
168  records from this step. Note that we may optimize away this last write if
169  we have no final ORDER BY or DISTINCT.
170 
171  - If we have buffering, the frame buffer, held by
172  Window::m_frame_buffer[_param].
173 
174  [1] This is not always the case. For the first window, if we have no
175  PARTITION BY or ORDER BY in the window, and there is more than one table
176  in the join, the logical input can consist of more than one table
177  (e.g. a NestedLoopIterator).
178 
179  The first thing we do in Read() is:
180  We copy fields from IN to OUT (copy_fields), and evaluate non-WF functions
181  (copy_funcs): those functions then read their arguments from IN and store
182  their result into their result_field which is a field in OUT.
183 
184  Then, let's take SUM(A+FLOOR(B)) OVER (ROWS 2 FOLLOWING) as example. Above,
185  we have stored A and the result of FLOOR in OUT. Now we buffer (save) the row
186  from OUT into the FB: For that, we copy both field A and FLOOR's result_field
187  from OUT to FB; a single copy_fields() call handles both copy jobs. Then we
188  look at the rows we have buffered and may realize that we have enough of the
189  frame to calculate SUM for a certain row (not necessarily the one we just
190  buffered; might be an earlier row, in our example it is the row which is 2
191  rows above the buffered row). If we do, to calculate WFs, we bring back the
192  frame's rows; which is done by: first copying field A and FLOOR's result_field
193  back from FB to OUT, thus getting in OUT all that SUM needs (A and FLOOR),
194  then giving that OUT row to SUM (SUM will then add the row's value to its
195  total; that happens in copy_funcs). After we have done that on all rows of the
196  frame, we have the values of SUM ready in OUT, we also restore the row which
197  owns this SUM value, in the same way as we restored the frame's rows, and we
198  return from Read() - we're done for this row. However, on the next Read()
199  call, we loop to check if we can calculate one more row with the frame
200  we have, and if so, we do, until we can't calculate any more rows -- in which
201  case we're back to just buffering.
202  */
203 class BufferingWindowIterator final : public RowIterator {
204  public:
207  Temp_table_param *temp_table_param, // Includes the window.
208  JOIN *join, int output_slice);
209 
210  bool Init() override;
211 
212  int Read() override;
213 
214  void SetNullRowFlag(bool is_null_row) override {
215  m_source->SetNullRowFlag(is_null_row);
216  }
217 
218  void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
219  void EndPSIBatchModeIfStarted() override {
220  m_source->EndPSIBatchModeIfStarted();
221  }
222 
223  void UnlockRow() override {
224  // There's nothing we can do here.
225  }
226 
227  private:
228  int ReadBufferedRow(bool new_partition_or_eof);
229 
230  /// The iterator we are reading from.
232 
233  /// Parameters for the temporary table we are outputting to.
235 
236  /// The window function itself.
238 
239  /// The join we are a part of.
241 
242  /// The slice we will be using when reading rows.
244 
245  /// The slice we will be using when outputting rows.
247 
248  /// If true, we may have more buffered rows to process that need to be
249  /// checked for before reading more rows from the source.
251 
252  /// Whether the last input row started a new partition, and was tucked away
253  /// to finalize the previous partition; if so, we need to bring it back
254  /// for processing before we read more rows.
256 
257  /// Whether we have seen the last input row.
258  bool m_eof;
259 };
260 
261 #endif // SQL_WINDOW_ITERATORS_INCLUDED
BufferingWindowIterator is like WindowIterator, but deals with window functions that need to buffer r...
Definition: window_iterators.h:203
void UnlockRow() override
Definition: window_iterators.h:223
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:240
bool Init() override
Initialize or reinitialize the iterator.
Definition: window_iterators.cc:1575
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:234
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:243
int Read() override
Read a single row.
Definition: window_iterators.cc:1591
int ReadBufferedRow(bool new_partition_or_eof)
Definition: window_iterators.cc:1684
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:218
bool m_eof
Whether we have seen the last input row.
Definition: window_iterators.h:258
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:246
bool m_possibly_buffered_rows
If true, we may have more buffered rows to process that need to be checked for before reading more ro...
Definition: window_iterators.h:250
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:231
Window * m_window
The window function itself.
Definition: window_iterators.h:237
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:219
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:214
bool m_last_input_row_started_new_partition
Whether the last input row started a new partition, and was tucked away to finalize the previous part...
Definition: window_iterators.h:255
BufferingWindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1563
Definition: sql_optimizer.h:125
A context for reading through a single table using a chosen access method: index read,...
Definition: row_iterator.h:60
THD * thd() const
Definition: row_iterator.h:190
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_class.h:821
Object containing parameters used when creating and using temporary tables.
Definition: temp_table_param.h:98
Definition: window_iterators.h:93
int m_output_slice
The slice we will be using when outputting rows.
Definition: window_iterators.h:133
void SetNullRowFlag(bool is_null_row) override
Mark the current row buffer as containing a NULL row or not, so that if you read from it and the flag...
Definition: window_iterators.h:103
void StartPSIBatchMode() override
Start performance schema batch mode, if supported (otherwise ignored).
Definition: window_iterators.h:107
JOIN * m_join
The join we are a part of.
Definition: window_iterators.h:127
WindowIterator(THD *thd, unique_ptr_destroy_only< RowIterator > source, Temp_table_param *temp_table_param, JOIN *join, int output_slice)
Definition: window_iterators.cc:1516
unique_ptr_destroy_only< RowIterator > const m_source
The iterator we are reading from.
Definition: window_iterators.h:118
bool Init() override
Initialize or reinitialize the iterator.
Definition: window_iterators.cc:1529
void EndPSIBatchModeIfStarted() override
Ends performance schema batch mode, if started.
Definition: window_iterators.h:108
Temp_table_param * m_temp_table_param
Parameters for the temporary table we are outputting to.
Definition: window_iterators.h:121
int Read() override
Read a single row.
Definition: window_iterators.cc:1541
void UnlockRow() override
Definition: window_iterators.h:112
int m_input_slice
The slice we will be using when reading rows.
Definition: window_iterators.h:130
Window * m_window
The window function itself.
Definition: window_iterators.h:124
Represents the (explicit) window of a SQL 2003 section 7.11 <window clause>, or the implicit (inlined...
Definition: window.h:94
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
std::unique_ptr< T, Destroy_only< T > > unique_ptr_destroy_only
std::unique_ptr, but only destroying.
Definition: my_alloc.h:464
std::string join(Container cont, const std::string &delim)
join elements of an container into a string separated by a delimiter.
Definition: string.h:144
repeated Source source
Definition: replication_asynchronous_connection_failover.proto:41