MySQL 9.0.0
Source Code Documentation
hash_join_buffer.h
#ifndef SQL_ITERATORS_HASH_JOIN_BUFFER_H_
#define SQL_ITERATORS_HASH_JOIN_BUFFER_H_

/* Copyright (c) 2019, 2024, Oracle and/or its affiliates.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License, version 2.0,
   as published by the Free Software Foundation.

   This program is designed to work with certain software (including
   but not limited to OpenSSL) that is licensed under separate terms,
   as designated in a particular file or component or in included license
   documentation.  The authors of MySQL hereby grant you an additional
   permission to link the program and your derivative works with the
   separately licensed software that they have either included with
   the program or referenced in the documentation.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License, version 2.0, for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA */

/// @file
///
/// This file contains the HashJoinRowBuffer class and related
/// functions/classes.
///
/// A HashJoinRowBuffer is a row buffer that can hold a limited number of rows.
/// The rows are stored in a hash table, which allows for constant-time lookup
/// on average. The HashJoinRowBuffer maintains its own internal MEM_ROOT,
/// where all of the data is allocated.
///
/// The HashJoinRowBuffer contains an operand with rows from one or more
/// tables, keyed on the value we join on. Consider the following trivial
/// example:
///
///   SELECT t1.data FROM t1 JOIN t2 ON (t1.key = t2.key);
///
/// Let us say that the table "t2" is stored in a HashJoinRowBuffer. In this
/// case, the hash table key will be the value found in "t2.key", since that is
/// the side of the join condition that belongs to t2. If we have multiple
/// equalities, they will be concatenated together in order to form the hash
/// table key. The hash table key is a std::string_view.
///
/// In order to store a row, we use the function StoreRow. See the comments
/// attached to the function for more details.
///
/// The amount of memory a HashJoinRowBuffer instance can use is limited by the
/// system variable "join_buffer_size". However, note that we check whether we
/// have exceeded the memory limit _after_ we have inserted data into the row
/// buffer. As such, we will probably use a little bit more memory than
/// specified by join_buffer_size.
///
/// The primary use case for these classes is, as the name implies,
/// for implementing hash join.

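The check-after-insert behaviour described in the file comment can be illustrated with a small standalone sketch. Everything here is hypothetical and is not MySQL code: ToyRowBuffer, ToyStoreResult and the byte accounting (key size plus row size) are assumptions chosen only to show why a buffer that inserts first and checks the limit afterwards can end up slightly over its budget.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type, loosely mirroring ROW_STORED / BUFFER_FULL.
enum class ToyStoreResult { kRowStored, kBufferFull };

// Hypothetical build-side buffer; a stand-in, not HashJoinRowBuffer.
class ToyRowBuffer {
 public:
  explicit ToyRowBuffer(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Insert first, then compare usage against the limit. The row that
  // crosses the limit is still stored, so usage may exceed max_bytes_.
  ToyStoreResult Store(const std::string &key, const std::string &row) {
    rows_[key].push_back(row);
    used_bytes_ += key.size() + row.size();
    return used_bytes_ > max_bytes_ ? ToyStoreResult::kBufferFull
                                    : ToyStoreResult::kRowStored;
  }

  // Probe: return all stored rows for the key, or nullptr if none match.
  const std::vector<std::string> *Find(const std::string &key) const {
    auto it = rows_.find(key);
    return it == rows_.end() ? nullptr : &it->second;
  }

  std::size_t used_bytes() const { return used_bytes_; }

 private:
  std::size_t max_bytes_;
  std::size_t used_bytes_ = 0;
  std::unordered_map<std::string, std::vector<std::string>> rows_;
};
```

In this sketch a caller would treat kBufferFull as the signal to stop filling the buffer, knowing that the row that triggered it is already stored.
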
#include <stddef.h>
#include <cassert>
#include <memory>
#include <optional>
#include <string_view>
#include <vector>

#include "my_alloc.h"
#include "sql/immutable_string.h"
#include "sql/pack_rows.h"
#include "sql_string.h"

class HashJoinCondition;
class THD;

namespace hash_join_buffer {

/// The key type for the hash structure in HashJoinRowBuffer.
///
/// A key consists of the value from one or more columns, taken from the join
/// condition(s) in the query. E.g., if the join condition is
/// (t1.col1 = t2.col1 AND t1.col2 = t2.col2), the key is (col1, col2), with the
/// two key parts concatenated together.
///
/// What the data actually contains depends on the comparison context for the
/// join condition. For instance, if the join condition is between a string
/// column and an integer column, the comparison will be done in a string
/// context, and thus the integers will be converted to strings before storing.
/// So the data we store in the key is in some cases converted, so that we can
/// hash and compare it byte-by-byte (e.g. decimals), while other types are
/// already comparable byte-by-byte (e.g. integers), and thus stored as-is.
///
/// Note that the key data can come from items as well as fields if the join
/// condition is an expression. E.g. if the join condition is
/// UPPER(t1.col1) = UPPER(t2.col1), the join key data will come from an Item
/// instead of a Field.
///
/// The Key class never takes ownership of the data. As such, the user must
/// ensure that the data has the proper lifetime. When storing rows in the row
/// buffer, the data must have the same lifetime as the row buffer itself.
/// When using the Key class for lookups in the row buffer, the same lifetime is
/// not needed; the key data only needs to stay valid while the lookup is done.
using Key = std::string_view;

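As a rough illustration of a concatenated, non-owning key, the sketch below builds a two-part key in caller-owned storage and returns a std::string_view over it. ToyKey, AppendKeyPart and MakeKey are hypothetical names, and appending the raw bytes of fixed-width integers is an assumption made for simplicity; it is not a description of how MySQL encodes key parts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical alias; non-owning, like Key in the listing.
using ToyKey = std::string_view;

// Append the raw bytes of one fixed-width integer key part to `buffer`.
inline void AppendKeyPart(std::int64_t value, std::string *buffer) {
  char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));
  buffer->append(bytes, sizeof(bytes));
}

// Build the key for (col1, col2) in caller-owned storage and return a view
// of it. The view is only valid while `buffer` is alive and unmodified.
inline ToyKey MakeKey(std::int64_t col1, std::int64_t col2,
                      std::string *buffer) {
  buffer->clear();
  AppendKeyPart(col1, buffer);
  AppendKeyPart(col2, buffer);
  return ToyKey(buffer->data(), buffer->size());
}
```

Because the view does not own its bytes, a key stored alongside a row must point into memory that lives as long as the buffer, whereas a lookup key can point into a scratch string that is reused for the next probe.
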
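// A row in the hash join buffer is the same as the Key class.
using BufferRow = Key;

// A convenience form of LoadIntoTableBuffers() that also verifies the end
// pointer for us.
void LoadBufferRowIntoTableBuffers(const pack_rows::TableCollection &tables,
                                   BufferRow row);

// A convenience form of the above that also decodes the LinkedImmutableString
// for us.
void LoadImmutableStringIntoTableBuffers(
    const pack_rows::TableCollection &tables, LinkedImmutableString row);

enum class StoreRowResult { ROW_STORED, BUFFER_FULL, FATAL_ERROR };
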
class HashJoinRowBuffer {
 public:
  // Construct the buffer. Note that Init() must be called before the buffer can
  // be used.
  HashJoinRowBuffer(pack_rows::TableCollection tables,
                    std::vector<HashJoinCondition> join_conditions,
                    size_t max_mem_available_bytes);

  ~HashJoinRowBuffer();

  // Initialize the HashJoinRowBuffer so it is ready to store rows. This
  // function can be called multiple times; subsequent calls will only clear the
  // buffer of existing rows.
  bool Init();

  /// Store the row that is currently lying in the tables' record buffers.
  /// The hash map key is extracted from the join conditions that the row buffer
  /// holds.
  ///
  /// @param thd the thread handler
  /// @param reject_duplicate_keys If true, reject rows with duplicate keys.
  ///        If a row is rejected, the function will still return ROW_STORED.
  ///
  /// @retval ROW_STORED the row was stored.
  /// @retval BUFFER_FULL the row was stored, and the buffer is full.
  /// @retval FATAL_ERROR an unrecoverable error occurred (most likely,
  ///         malloc failed). It is the caller's responsibility to call
  ///         my_error().
  StoreRowResult StoreRow(THD *thd, bool reject_duplicate_keys);

  size_t size() const;

  bool empty() const { return size() == 0; }

  std::optional<LinkedImmutableString> find(Key key) const;

  std::optional<LinkedImmutableString> first_row() const;

  LinkedImmutableString LastRowStored() const {
    assert(Initialized());
    return m_last_row_stored;
  }

  bool Initialized() const { return m_hash_map != nullptr; }

  bool contains(const Key &key) const { return find(key).has_value(); }

 private:
  // The type of hash map in which the rows are stored.
  class HashMap;

  const std::vector<HashJoinCondition> m_join_conditions;

  // A row can consist of parts from different tables. This structure tells us
  // which tables are involved.
  const pack_rows::TableCollection m_tables;

  // The MEM_ROOT on which all of the hash table keys and values are allocated.
  // The actual hash map is on the regular heap.
  MEM_ROOT m_mem_root;

  // A MEM_ROOT used only for storing the final row (possibly both key and
  // value). The code assumes fairly deeply that inserting a row never fails, so
  // when m_mem_root goes full (we set a capacity on it to ensure that the last
  // allocated block does not get too big), we allocate the very last row on
  // this MEM_ROOT and then signal fullness so that we can start spilling to
  // disk.
  MEM_ROOT m_overflow_mem_root;

  // The hash table where the rows are stored.
  std::unique_ptr<HashMap> m_hash_map;

  // A buffer we can use when we are constructing a join key from a join
  // condition. In order to avoid reallocating memory, the buffer never shrinks.
  String m_buffer;
  size_t m_row_size_upper_bound;

  // The maximum size of the buffer, given in bytes.
  const size_t m_max_mem_available;

  // The last row that was stored in the hash table, or nullptr if the hash
  // table is empty. We may have to put this row back into the tables' record
  // buffers if we have a child iterator that expects the record buffers to
  // contain the last row returned by the storage engine (the probe phase of
  // hash join may put any row in the hash table in the tables' record buffer).
  // See HashJoinIterator::BuildHashTable() for an example of this.
  LinkedImmutableString m_last_row_stored;

  // Fetch the relevant fields from each table, and pack them into m_mem_root
  // as a LinkedImmutableString where the “next” pointer points to “next_ptr”.
  // If that does not work (capacity reached), pack into m_overflow_mem_root
  // instead and set “full” to true. If _that_ does not work (fatally out
  // of memory), returns nullptr. Otherwise, returns a pointer to the newly
  // packed string.
  LinkedImmutableString StoreLinkedImmutableStringFromTableBuffers(
      LinkedImmutableString next_ptr, bool *full);
};

}  // namespace hash_join_buffer

/// External interface to the corresponding member in HashJoinRowBuffer
LinkedImmutableString StoreLinkedImmutableStringFromTableBuffers(
    MEM_ROOT *mem_root, MEM_ROOT *overflow_mem_root,
    pack_rows::TableCollection tables, LinkedImmutableString next_ptr,
    size_t row_size_upper_bound, bool *full);

#endif  // SQL_ITERATORS_HASH_JOIN_BUFFER_H_
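
The main-arena/overflow-arena fallback described for StoreLinkedImmutableStringFromTableBuffers can be sketched in isolation as follows. ToyArena and StoreWithOverflow are hypothetical; MEM_ROOT has a different interface and block-based allocation, so this only illustrates the control flow under simplified assumptions: try the capacity-limited main arena, fall back to the overflow arena and set full, and return nullptr only if both fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical fixed-capacity arena; MEM_ROOT itself works differently.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : capacity_(capacity) {}

  // Returns nullptr when the allocation would exceed the capacity.
  char *Alloc(std::size_t n) {
    if (used_ + n > capacity_) return nullptr;
    blocks_.push_back(std::make_unique<char[]>(n));
    used_ += n;
    return blocks_.back().get();
  }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::vector<std::unique_ptr<char[]>> blocks_;
};

// Copy `row` into main_arena. If that arena is at capacity, copy it into
// overflow_arena instead and set *full to true. If both allocations fail,
// return nullptr (and leave *full false).
inline const char *StoreWithOverflow(ToyArena *main_arena,
                                     ToyArena *overflow_arena,
                                     std::string_view row, bool *full) {
  *full = false;
  char *dst = main_arena->Alloc(row.size());
  if (dst == nullptr) {
    dst = overflow_arena->Alloc(row.size());
    if (dst == nullptr) return nullptr;
    *full = true;
  }
  std::memcpy(dst, row.data(), row.size());
  return dst;
}
```

The point of the second arena in this sketch is that the row which hits the capacity limit is still stored successfully, so the caller can report fullness without also having to handle a failed insert.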