MySQL 9.1.0
Source Code Documentation
bka_iterator.h
#ifndef SQL_ITERATORS_BKA_ITERATOR_H_
#define SQL_ITERATORS_BKA_ITERATOR_H_

/* Copyright (c) 2019, 2024, Oracle and/or its affiliates.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License, version 2.0,
   as published by the Free Software Foundation.

   This program is designed to work with certain software (including
   but not limited to OpenSSL) that is licensed under separate terms,
   as designated in a particular file or component or in included license
   documentation. The authors of MySQL hereby grant you an additional
   permission to link the program and your derivative works with the
   separately licensed software that they have either included with
   the program or referenced in the documentation.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License, version 2.0, for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */

/**
  Batch key access (BKA) is a join strategy that uses multi-range read (MRR)
  to get better read ordering on the table on the inner side. It reads
  a block of rows from the outer side, picks up the join keys (refs) from
  each row, and sends them all off in one big read request. The handler can
  then order these and read them in whatever order it would prefer. This is
  especially attractive when used with rotating media; the reads can then be
  ordered such that they do not require constant seeking (disk-sweep MRR,
  or DS-MRR).

  BKA is implemented with two iterators working in concert. The BKAIterator
  reads rows from the outer side into a buffer. When the buffer is full or we
  are out of rows, it sets up the key ranges and hands them over to the
  MultiRangeRowIterator, which does the actual request; the BKAIterator then
  reads rows from it until there are none left. For each inner row returned,
  the MultiRangeRowIterator loads the appropriate outer row(s) from the
  buffer, doing the actual join.

  The reason for this split is twofold. First, it allows us to accurately time
  (for EXPLAIN ANALYZE) the actual table read. Second, and more importantly,
  we can have other iterators between the BKAIterator and the
  MultiRangeRowIterator, in particular FilterIterator.
 */
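// Illustrative sketch (not part of this header): how the two iterators are
// meant to be composed, using only the constructors declared below. The
// concrete values (buffer sizes, join type, the optional FilterIterator in
// between) and the allocation on thd->mem_root are assumptions made for the
// example; in the server, the executor builds this tree from the access
// paths.
//
//   // Inner side: the iterator performing the multi-range reads on t2.
//   auto *mrr = new (thd->mem_root) MultiRangeRowIterator(
//       thd, t2, &t2_ref, mrr_flags, JoinType::INNER, outer_tables,
//       /*store_rowids=*/false, /*tables_to_get_rowid_for=*/0);
//   unique_ptr_destroy_only<RowIterator> inner(mrr);
//   // A FilterIterator (or similar) may be stacked on top of "inner" here.
//
//   // The outer side ("outer_scan") is an arbitrary iterator tree. The BKA
//   // iterator also keeps a raw pointer to the MRR iterator so that it can
//   // hand it key ranges and buffers directly.
//   BKAIterator bka(thd, std::move(outer_scan), outer_tables, std::move(inner),
//                   /*max_memory_available=*/256 * 1024,
//                   /*mrr_bytes_needed_for_single_inner_row=*/64,
//                   /*expected_inner_rows_per_outer_row=*/1.0f,
//                   /*store_rowids=*/false, /*tables_to_get_rowid_for=*/0,
//                   mrr, JoinType::INNER);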

#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <iterator>
#include <memory>

#include "my_alloc.h"

#include "my_inttypes.h"
#include "my_table_map.h"
#include "sql/handler.h"
#include "sql/hash_join_buffer.h"
#include "sql/iterators/row_iterator.h"
#include "sql/join_type.h"
#include "sql/mem_root_array.h"
#include "sql/pack_rows.h"
#include "sql_string.h"
#include "template_utils.h"

class Item;
class JOIN;
class MultiRangeRowIterator;
class THD;
struct Index_lookup;
struct KEY_MULTI_RANGE;
struct TABLE;

/**
  The BKA join iterator, with an arbitrary iterator tree on the outer side
  and a MultiRangeRowIterator on the inner side (possibly with a filter or
  similar in-between). See the file comment for more details.
 */
class BKAIterator final : public RowIterator {
 public:
  /**
    @param thd Thread handle.
    @param outer_input The iterator to read the outer rows from.
    @param outer_input_tables Each outer table involved.
      Used to know which fields we are to read into our buffer.
    @param inner_input The iterator to read the inner rows from.
      Must end up in a MultiRangeRowIterator.
    @param max_memory_available Number of bytes available for row buffers,
      both outer rows and MRR buffers. Note that allocation is incremental,
      so we can allocate less than this.
    @param mrr_bytes_needed_for_single_inner_row Number of bytes MRR needs
      space for in its buffer for holding a single row from the inner table.
    @param expected_inner_rows_per_outer_row Number of inner rows we
      statistically expect for each outer row. Used for dividing the buffer
      space between inner rows and MRR row buffer (if we expect many inner
      rows, we can't load as many outer rows).
    @param store_rowids Whether we need to make sure all tables below us have
      row IDs available, after Read() has been called. Used only if
      we are below a weedout operation.
    @param tables_to_get_rowid_for A map of which tables BKAIterator needs
      to call position() for itself. Tables that are in outer_input_tables
      but not in this map are expected to be handled by some other iterator.
      Tables that are in this map but not in outer_input_tables will be
      ignored.
    @param mrr_iterator Pointer to the MRR iterator at the bottom of
      inner_input. Used to send row ranges and buffers.
    @param join_type What kind of join we are executing.
   */
  BKAIterator(THD *thd, unique_ptr_destroy_only<RowIterator> outer_input,
              const Prealloced_array<TABLE *, 4> &outer_input_tables,
              unique_ptr_destroy_only<RowIterator> inner_input,
              size_t max_memory_available,
              size_t mrr_bytes_needed_for_single_inner_row,
              float expected_inner_rows_per_outer_row, bool store_rowids,
              table_map tables_to_get_rowid_for,
              MultiRangeRowIterator *mrr_iterator, JoinType join_type);

  bool Init() override;

  int Read() override;

  void SetNullRowFlag(bool is_null_row) override {
    m_outer_input->SetNullRowFlag(is_null_row);
    m_inner_input->SetNullRowFlag(is_null_row);
  }

  void UnlockRow() override {
    // Since we don't know which condition caused the row to be rejected,
    // we can't know whether we could also unlock the outer row
    // (it may still be used as part of other joined rows).
    if (m_state == State::RETURNING_JOINED_ROWS) {
      m_inner_input->UnlockRow();
    }
  }

  void EndPSIBatchModeIfStarted() override {
    m_outer_input->EndPSIBatchModeIfStarted();
    m_inner_input->EndPSIBatchModeIfStarted();
  }

 private:
  /// Clear out the MEM_ROOT and prepare for reading rows anew.
  void BeginNewBatch();

  /// If there are more outer rows, begin the next batch. If not,
  /// move to the EOF state.
  void BatchFinished();

  /// Find the next unmatched row, and load it for output as a
  /// NULL-complemented row. (Assumes the NULL row flag has already been set
  /// on the inner table iterator.) Returns 0 if a row was found, -1 if no
  /// row was found. (Errors cannot happen.)
  int MakeNullComplementedRow();

  /// Read a batch of outer rows (BeginNewBatch() must have been called
  /// earlier). Returns -1 for no outer rows found (sets state to END_OF_ROWS),
  /// 0 for OK (sets state to RETURNING_JOINED_ROWS) or 1 for error.
  int ReadOuterRows();

  enum class State {
    /**
      We are about to start reading outer rows into our buffer.
      A single Read() call will fill it up, so there is no
      in-between “currently reading” state.
     */
    NEED_OUTER_ROWS,

    /**
      We are returning rows from the MultiRangeRowIterator.
      (For antijoins, we are looking up the rows, but don't actually
      return them.)
     */
    RETURNING_JOINED_ROWS,

    /**
      We are an outer join or antijoin, and we're returning NULL-complemented
      rows for those outer rows that never had a matching inner row. Note that
      this is done in the BKAIterator and not the MRR iterator, for two
      reasons: First, it gives more sensible EXPLAIN ANALYZE numbers. Second,
      if the MRR iterator synthesized the NULL-complemented rows, they could
      inadvertently be filtered out by a FilterIterator before they reached
      the BKAIterator.
     */
    RETURNING_NULL_COMPLEMENTED_ROWS,

    /**
      Both the outer and inner side are out of rows.
     */
    END_OF_ROWS
  };
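  // Informal lifecycle of m_state, inferred from the documentation in this
  // header (NULL-complementing only applies to outer joins and antijoins):
  //
  //   NEED_OUTER_ROWS  --ReadOuterRows(), got rows-->  RETURNING_JOINED_ROWS
  //   NEED_OUTER_ROWS  --ReadOuterRows(), no rows--->  END_OF_ROWS
  //   RETURNING_JOINED_ROWS  --batch exhausted, outer join/antijoin-->
  //       RETURNING_NULL_COMPLEMENTED_ROWS
  //   RETURNING_JOINED_ROWS / RETURNING_NULL_COMPLEMENTED_ROWS
  //       --more outer rows--> NEED_OUTER_ROWS, otherwise END_OF_ROWS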

  State m_state;

  const unique_ptr_destroy_only<RowIterator> m_outer_input;
  const unique_ptr_destroy_only<RowIterator> m_inner_input;

  /// The MEM_ROOT we are storing the outer rows on, and also allocating MRR
  /// buffer from. In total, this should not go significantly over
  /// m_max_memory_available bytes.
  MEM_ROOT m_mem_root;

  /// Buffered outer rows.
  Mem_root_array<hash_join_buffer::BufferRow> m_rows;

  /// Tables and columns needed for each outer row. Rows/columns that are not
  /// needed are filtered out in the constructor; the rest are read and stored
  /// in m_rows.
  pack_rows::TableCollection m_outer_input_tables;

  /// Used for serializing the row we read from the outer table(s), before it
  /// is stored into the MEM_ROOT and put into m_rows. Should there not be
  /// room in m_rows for the row, it will stay in this variable until we start
  /// reading the next batch of outer rows.
  ///
  /// If there are no BLOB/TEXT columns in the join, we calculate an upper
  /// bound of the row size that is used to preallocate this buffer. In the
  /// case of BLOB/TEXT columns, we cannot calculate a reasonable upper bound,
  /// and the row size is calculated per row. The allocated memory is kept for
  /// the duration of the iterator, so that we (most likely) avoid
  /// reallocations.
  String m_outer_row_buffer;

  /// Whether we have a row in m_outer_row_buffer from the previous batch of
  /// rows that we haven't stored in m_rows yet.
  bool m_has_row_from_previous_batch = false;

  /// For each outer row, how many bytes we need in the MRR buffer (i.e., the
  /// number of bytes we expect to use on rows from the inner table).
  /// This is the expected number of inner rows per key, multiplied by the
  /// (fixed) size of each inner row. We use this information to stop scanning
  /// before we've used up the entire RAM allowance on outer rows, so that
  /// we have space remaining for the inner rows (in the MRR buffer), too.
  size_t m_mrr_bytes_needed_per_row = 0;
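  // Illustrative numbers only (not taken from the source): with
  // expected_inner_rows_per_outer_row = 4 and
  // mrr_bytes_needed_for_single_inner_row = 64, this becomes roughly
  // 4 * 64 = 256 bytes, i.e. every buffered outer row implicitly reserves
  // about 256 bytes of the memory budget for its expected inner rows.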

  /// Estimated number of bytes used on m_mem_root so far.
  size_t m_bytes_used = 0;

  /// Whether we've seen EOF from the outer iterator.
  bool m_end_of_outer_rows = false;

  /// See max_memory_available in the constructor.
  const size_t m_max_memory_available;

  /// See mrr_bytes_needed_for_single_inner_row in the constructor.
  const size_t m_mrr_bytes_needed_for_single_inner_row;

  /// See mrr_iterator in the constructor.
  MultiRangeRowIterator *const m_mrr_iterator;

  /// The join type of the BKA join.
  JoinType m_join_type;

  /// If we are synthesizing NULL-complemented rows (for an outer join or
  /// antijoin), points to the next row within "m_rows" that we haven't
  /// considered yet.
  hash_join_buffer::BufferRow *m_current_pos;
};

/**
  The iterator actually doing the reads from the inner table during BKA.
  See the file comment.
 */
class MultiRangeRowIterator final : public TableRowIterator {
 public:
  /**
    @param thd Thread handle.
    @param table The inner table to scan.
    @param ref The index condition we are looking up on.
    @param mrr_flags Flags passed on to MRR.
    @param join_type
      What kind of BKA join this MRR iterator is part of.
    @param outer_input_tables
      Which tables are on the left side of the BKA join (the MRR iterator
      is always alone on the right side). This is needed so that it can
      unpack the rows into the right tables, with the right format.
    @param store_rowids Whether we need to keep row IDs.
    @param tables_to_get_rowid_for
      Tables we need to call table->file->position() for; if a table
      is present in outer_input_tables but not this, some other iterator
      will make sure that table has the correct row ID already present
      after Read().
   */
  MultiRangeRowIterator(THD *thd, TABLE *table, Index_lookup *ref,
                        int mrr_flags, JoinType join_type,
                        const Prealloced_array<TABLE *, 4> &outer_input_tables,
                        bool store_rowids, table_map tables_to_get_rowid_for);

  /**
    Specify which outer rows to read inner rows for.
    Must be called before Init(), and be valid until the last Read().
   */
  void set_rows(const hash_join_buffer::BufferRow *begin,
                const hash_join_buffer::BufferRow *end) {
    m_begin = begin;
    m_end = end;
  }

  /**
    Specify an unused chunk of memory MRR can use for the returned inner rows.
    Must be called before Init(), and must be at least big enough to hold
    one inner row.
   */
  void set_mrr_buffer(uchar *ptr, size_t size) {
    m_mrr_buffer.buffer = ptr;
    m_mrr_buffer.buffer_end = ptr + size;
  }

  /**
    Specify an unused chunk of memory that we can use to mark which inner rows
    have been read (by the parent BKA iterator) or not. This is used for outer
    joins to know which rows need NULL-complemented versions, and for semijoins
    and antijoins to avoid matching the same inner row more than once.

    Must be called before Init() for semijoins, outer joins and antijoins, and
    never called otherwise. There must be room for at least one bit per row
    given in set_rows().
   */
  void set_match_flag_buffer(uchar *ptr) { m_match_flag_buffer = ptr; }

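  // Informal sketch of how the parent BKAIterator drives this iterator for
  // each batch, pieced together from the contracts documented above (the
  // exact call sites live in bka_iterator.cc, and the reads normally go
  // through whatever iterators sit on top of this one):
  //
  //   mrr->set_rows(batch_begin, batch_end);       // [begin, end) outer rows
  //   mrr->set_mrr_buffer(mrr_buf, mrr_buf_size);  // scratch space for MRR
  //   mrr->set_match_flag_buffer(match_bits);      // outer/semi/antijoin only
  //   inner_input->Init();                         // issues the multi-range read
  //   while (inner_input->Read() == 0) { ... }     // one joined row at a time
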
  /**
    Mark that the BKA iterator has seen the last row we returned from Read().
    (It could have been discarded by a FilterIterator before it reached the
    BKA iterator.) This is a no-op for inner joins; see set_match_flag_buffer().
   */
  void MarkLastRowAsRead() {
    if (m_match_flag_buffer != nullptr) {
      size_t row_number = std::distance(m_begin, m_last_row_returned);
      m_match_flag_buffer[row_number / 8] |= 1 << (row_number % 8);
    }
  }

  /**
    Check whether the given row has been marked as read
    (using MarkLastRowAsRead()) or not. Used internally when doing semijoins,
    and also by the BKAIterator when synthesizing NULL-complemented rows for
    outer joins or antijoins.
   */
  bool RowHasBeenRead(const hash_join_buffer::BufferRow *row) const {
    assert(m_match_flag_buffer != nullptr);
    size_t row_number = std::distance(m_begin, row);
    return m_match_flag_buffer[row_number / 8] & (1 << (row_number % 8));
  }
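  // Worked example of the flag bookkeeping above: if the last row returned
  // from Read() matched the third outer row of the batch (row_number == 2),
  // MarkLastRowAsRead() sets bit 2 in byte 0 of m_match_flag_buffer
  // (mask 1 << 2 == 0x04), and RowHasBeenRead() on that same outer row tests
  // exactly the same bit.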

  /**
    Do the actual multi-range read with the rows given by set_rows() and using
    the temporary buffer given in set_mrr_buffer().
   */
  bool Init() override;

  /**
    Read another inner row (if any) and load the appropriate outer row(s)
    into the associated table buffers.
   */
  int Read() override;

 private:
  // Thunks from function pointers to the actual callbacks.
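  // The handler's MRR interface (see handler.h) takes plain C function
  // pointers together with an opaque pointer, so member functions cannot be
  // passed directly; each static thunk below casts the opaque pointer back to
  // the MultiRangeRowIterator it was registered with and forwards the call.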
  static range_seq_t MrrInitCallbackThunk(void *init_params, uint n_ranges,
                                          uint flags) {
    return (pointer_cast<MultiRangeRowIterator *>(init_params))
        ->MrrInitCallback(n_ranges, flags);
  }
  static uint MrrNextCallbackThunk(void *init_params, KEY_MULTI_RANGE *range) {
    return (pointer_cast<MultiRangeRowIterator *>(init_params))
        ->MrrNextCallback(range);
  }
  static bool MrrSkipRecordCallbackThunk(range_seq_t seq, char *range_info,
                                         uchar *) {
    return (reinterpret_cast<MultiRangeRowIterator *>(seq))
        ->MrrSkipRecord(range_info);
  }

  // Callbacks we get from the handler during the actual read.
  range_seq_t MrrInitCallback(uint n_ranges, uint flags);
  uint MrrNextCallback(KEY_MULTI_RANGE *range);
  bool MrrSkipIndexTuple(char *range_info);
  bool MrrSkipRecord(char *range_info);

  /// Handler for the table we are reading from.
  handler *const m_file;

  /// The index condition.
  Index_lookup *const m_ref;

  /// Flags passed on to MRR.
  const int m_mrr_flags;

  /// Current outer rows to read inner rows for. Set by set_rows().
  const hash_join_buffer::BufferRow *m_begin;
  const hash_join_buffer::BufferRow *m_end;

  /// Which row we are at in the [m_begin, m_end) range.
  /// Used during the MRR callbacks.
  const hash_join_buffer::BufferRow *m_current_pos;

  /// What row we last returned from Read() (used for MarkLastRowAsRead()).
  const hash_join_buffer::BufferRow *m_last_row_returned;

  /// Temporary space for storing inner rows, used by MRR.
  /// Set by set_mrr_buffer().
  HANDLER_BUFFER m_mrr_buffer;

  /// See set_match_flag_buffer().
  uchar *m_match_flag_buffer = nullptr;

  /// Tables and columns needed for each outer row. Same as
  /// m_outer_input_tables in the corresponding BKAIterator.
  pack_rows::TableCollection m_outer_input_tables;

  /// The join type of the BKA join we are part of. Same as m_join_type in the
  /// corresponding BKAIterator.
  const JoinType m_join_type;
};

#endif  // SQL_ITERATORS_BKA_ITERATOR_H_