MySQL  8.0.27
Source Code Documentation
hash_join_buffer.h
Go to the documentation of this file.
1 #ifndef SQL_HASH_JOIN_BUFFER_H_
2 #define SQL_HASH_JOIN_BUFFER_H_
3 
4 /* Copyright (c) 2019, 2021, Oracle and/or its affiliates.
5 
6  This program is free software; you can redistribute it and/or modify
7  it under the terms of the GNU General Public License, version 2.0,
8  as published by the Free Software Foundation.
9 
10  This program is also distributed with certain software (including
11  but not limited to OpenSSL) that is licensed under separate terms,
12  as designated in a particular file or component or in included license
13  documentation. The authors of MySQL hereby grant you an additional
14  permission to link the program and your derivative works with the
15  separately licensed software that they have included with MySQL.
16 
17  This program is distributed in the hope that it will be useful,
18  but WITHOUT ANY WARRANTY; without even the implied warranty of
19  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20  GNU General Public License, version 2.0, for more details.
21 
22  You should have received a copy of the GNU General Public License
23  along with this program; if not, write to the Free Software
24  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
25 
26 /// @file
27 ///
28 /// This file contains the HashJoinRowBuffer class and related
29 /// functions/classes.
30 ///
31 /// A HashJoinBuffer is a row buffer that can hold a certain amount of rows.
32 /// The rows are stored in a hash table, which allows for constant-time lookup.
33 /// The HashJoinBuffer maintains its own internal MEM_ROOT, where all of the
34 /// data is allocated.
35 ///
36 /// The HashJoinBuffer contains an operand with rows from one or more tables,
37 /// keyed on the value we join on. Consider the following trivial example:
38 ///
39 /// SELECT t1.data FROM t1 JOIN t2 ON (t1.key = t2.key);
40 ///
41 /// Let us say that the table "t2" is stored in a HashJoinBuffer. In this case,
42 /// the hash table key will be the value found in "t2.key", since that is the
43 /// join condition that belongs to t2. If we have multiple equalities, they
44 /// will be concatenated together in order to form the hash table key. The hash
45 /// table key is a std::string_view.
46 ///
47 /// In order to store a row, we use the function StoreFromTableBuffers. See the
48 /// comments attached to the function for more details.
49 ///
50 /// The amount of memory a HashJoinBuffer instance can use is limited by the
51 /// system variable "join_buffer_size". However, note that we check whether we
52 /// have exceeded the memory limit _after_ we have inserted data into the row
53 /// buffer. As such, we will probably use a little bit more memory than
54 /// specified by join_buffer_size.
55 ///
56 /// The primary use case for these classes is, as the name implies,
57 /// for implementing hash join.
58 
59 #include <stddef.h>
60 #include <stdint.h>
61 #include <memory>
62 #include <vector>
63 
64 #include "extra/robin-hood-hashing/robin_hood.h"
65 #include "field_types.h"
66 #include "map_helpers.h"
67 #include "my_alloc.h"
68 #include "my_inttypes.h"
69 #include "my_table_map.h"
70 #include "prealloced_array.h"
71 #include "sql/immutable_string.h"
72 #include "sql/item_cmpfunc.h"
73 #include "sql/pack_rows.h"
74 #include "sql/table.h"
75 #include "sql_string.h"
76 
77 class Field;
78 class QEP_TAB;
79 
80 namespace hash_join_buffer {
81 
82 /// The key type for the hash structure in HashJoinRowBuffer.
83 ///
84 /// A key consists of the value from one or more columns, taken from the join
85 /// condition(s) in the query. E.g., if the join condition is
86 /// (t1.col1 = t2.col1 AND t1.col2 = t2.col2), the key is (col1, col2), with the
87 /// two key parts concatenated together.
88 ///
89 /// What the data actually contains depends on the comparison context for the
90 /// join condition. For instance, if the join condition is between a string
91 /// column and an integer column, the comparison will be done in a string
92 /// context, and thus the integers will be converted to strings before storing.
93 /// So the data we store in the key are in some cases converted, so that we can
94 /// hash and compare them byte-by-byte (i.e. decimals), while other types are
95 /// already comparable byte-by-byte (i.e. integers), and thus stored as-is.
96 ///
97 /// Note that the key data can come from items as well as fields if the join
98 /// condition is an expression. E.g. if the join condition is
99 /// UPPER(t1.col1) = UPPER(t2.col1), the join key data will come from an Item
100 /// instead of a Field.
101 ///
102 /// The Key class never takes ownership of the data. As such, the user must
103 /// ensure that the data has the proper lifetime. When storing rows in the row
104 /// buffer, the data must have the same lifetime as the row buffer itself.
105 /// When using the Key class for lookups in the row buffer, the same lifetime is
106 /// not needed; the key object is only needed when the lookup is done.
107 using Key = std::string_view;
108 
109 class KeyEquals {
110  public:
111  // This is a marker from C++17 that signals to the container that
112  // operator() can be called with arguments of which one of the types
113  // differs from the container's key type (ImmutableStringWithLength),
114  // and thus enables map.find(Key). The type itself does not matter.
115  using is_transparent = void;
116 
117  bool operator()(const Key &str1,
118  const ImmutableStringWithLength &other) const {
119  return str1 == other.Decode();
120  }
121 
123  const ImmutableStringWithLength &str2) const {
124  return str1 == str2;
125  }
126 };
127 
128 // A row in the hash join buffer is the same as the Key class.
129 using BufferRow = Key;
130 
131 class KeyHasher {
132  public:
133  // This is a marker from C++17 that signals to the container that
134  // operator() can be called with an argument that differs from the
135  // container's key type (ImmutableStringWithLength), and thus enables
136  // map.find(Key). The type itself does not matter.
137  using is_transparent = void;
138 
140  return robin_hood::hash_bytes(key.data(), key.size());
141  }
142 
144  std::string_view decoded = key.Decode();
145  return robin_hood::hash_bytes(decoded.data(), decoded.size());
146  }
147 };
148 
149 // A convenience form of LoadIntoTableBuffers() that also verifies the end
150 // pointer for us.
152  BufferRow row);
153 
154 // A convenience form of the above that also decodes the LinkedImmutableString
155 // for us.
158 
160 
162  public:
163  // Construct the buffer. Note that Init() must be called before the buffer can
164  // be used.
166  std::vector<HashJoinCondition> join_conditions,
167  size_t max_mem_available_bytes);
168 
169  // Initialize the HashJoinRowBuffer so it is ready to store rows. This
170  // function can be called multiple times; subsequent calls will only clear the
171  // buffer for existing rows.
172  bool Init();
173 
174  /// Store the row that is currently lying in the tables record buffers.
175  /// The hash map key is extracted from the join conditions that the row buffer
176  /// holds.
177  ///
178  /// @param thd the thread handler
179  /// @param reject_duplicate_keys If true, reject rows with duplicate keys.
180  /// If a row is rejected, the function will still return ROW_STORED.
181  /// @param store_rows_with_null_in_condition Whether to store rows where the
182  /// join conditions contains SQL NULL.
183  ///
184  /// @retval ROW_STORED the row was stored.
185  /// @retval BUFFER_FULL the row was stored, and the buffer is full.
186  /// @retval FATAL_ERROR an unrecoverable error occured (most likely,
187  /// malloc failed). It is the callers responsibility to call
188  /// my_error().
189  StoreRowResult StoreRow(THD *thd, bool reject_duplicate_keys,
190  bool store_rows_with_null_in_condition);
191 
192  size_t size() const { return m_hash_map->size(); }
193 
194  bool empty() const { return m_hash_map->empty(); }
195 
196  bool inited() const { return m_hash_map != nullptr; }
197 
198  using hash_map_type = robin_hood::unordered_flat_map<
200 
201  using hash_map_iterator = hash_map_type::const_iterator;
202 
203  hash_map_iterator find(const Key &key) const { return m_hash_map->find(key); }
204 
205  hash_map_iterator begin() const { return m_hash_map->begin(); }
206 
207  hash_map_iterator end() const { return m_hash_map->end(); }
208 
210  assert(Initialized());
211  return m_last_row_stored;
212  }
213 
214  bool Initialized() const { return m_hash_map.get() != nullptr; }
215 
216  bool contains(const Key &key) const { return find(key) != end(); }
217 
218  private:
219  const std::vector<HashJoinCondition> m_join_conditions;
220 
221  // A row can consist of parts from different tables. This structure tells us
222  // which tables that are involved.
224 
225  // The MEM_ROOT on which all of the hash table keys and values are allocated.
226  // The actual hash map is on the regular heap.
228 
229  // A MEM_ROOT used only for storing the final row (possibly both key and
230  // value). The code assumes fairly deeply that inserting a row never fails, so
231  // when m_mem_root goes full (we set a capacity on it to ensure that the last
232  // allocated block does not get too big), we allocate the very last row on
233  // this MEM_ROOT and the signal fullness so that we can start spilling to
234  // disk.
236 
237  // The hash table where the rows are stored.
238  std::unique_ptr<hash_map_type> m_hash_map;
239 
240  // A buffer we can use when we are constructing a join key from a join
241  // condition. In order to avoid reallocating memory, the buffer never shrinks.
244 
245  // The maximum size of the buffer, given in bytes.
246  const size_t m_max_mem_available;
247 
248  // The last row that was stored in the hash table, or nullptr if the hash
249  // table is empty. We may have to put this row back into the tables' record
250  // buffers if we have a child iterator that expects the record buffers to
251  // contain the last row returned by the storage engine (the probe phase of
252  // hash join may put any row in the hash table in the tables' record buffer).
253  // See HashJoinIterator::BuildHashTable() for an example of this.
255 
256  // Fetch the relevant fields from each table, and pack them into m_mem_root
257  // as a LinkedImmutableString where the “next” pointer points to “next_ptr”.
258  // If that does not work (capacity reached), pack into m_overflow_mem_root
259  // instead and set “full” to true. If _that_ does not work (fatally out
260  // of memory), returns nullptr. Otherwise, returns a pointer to the newly
261  // packed string.
263  LinkedImmutableString next_ptr, bool *full);
264 };
265 
266 } // namespace hash_join_buffer
267 
268 #endif // SQL_HASH_JOIN_BUFFER_H_
Definition: field.h:590
The variant with length (ImmutableStringWithLength) stores the length as a Varint128 (similar to prot...
Definition: immutable_string.h:62
std::string_view Decode() const
Definition: immutable_string.h:129
LinkedImmutableString is designed for storing rows (values) in hash join.
Definition: immutable_string.h:172
Definition: sql_executor.h:256
Using this class is fraught with peril, and you need to be very careful when doing so.
Definition: sql_string.h:165
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_class.h:821
Definition: hash_join_buffer.h:161
String m_buffer
Definition: hash_join_buffer.h:242
bool contains(const Key &key) const
Definition: hash_join_buffer.h:216
size_t m_row_size_upper_bound
Definition: hash_join_buffer.h:243
hash_map_type::const_iterator hash_map_iterator
Definition: hash_join_buffer.h:201
StoreRowResult StoreRow(THD *thd, bool reject_duplicate_keys, bool store_rows_with_null_in_condition)
Store the row that is currently lying in the tables record buffers.
Definition: hash_join_buffer.cc:158
hash_map_iterator end() const
Definition: hash_join_buffer.h:207
size_t size() const
Definition: hash_join_buffer.h:192
bool Init()
Definition: hash_join_buffer.cc:129
bool Initialized() const
Definition: hash_join_buffer.h:214
hash_map_iterator begin() const
Definition: hash_join_buffer.h:205
bool empty() const
Definition: hash_join_buffer.h:194
bool inited() const
Definition: hash_join_buffer.h:196
std::unique_ptr< hash_map_type > m_hash_map
Definition: hash_join_buffer.h:238
HashJoinRowBuffer(pack_rows::TableCollection tables, std::vector< HashJoinCondition > join_conditions, size_t max_mem_available_bytes)
Definition: hash_join_buffer.cc:115
LinkedImmutableString m_last_row_stored
Definition: hash_join_buffer.h:254
const std::vector< HashJoinCondition > m_join_conditions
Definition: hash_join_buffer.h:219
const size_t m_max_mem_available
Definition: hash_join_buffer.h:246
LinkedImmutableString StoreLinkedImmutableStringFromTableBuffers(LinkedImmutableString next_ptr, bool *full)
Definition: hash_join_buffer.cc:58
const pack_rows::TableCollection m_tables
Definition: hash_join_buffer.h:223
robin_hood::unordered_flat_map< ImmutableStringWithLength, LinkedImmutableString, KeyHasher, KeyEquals > hash_map_type
Definition: hash_join_buffer.h:199
LinkedImmutableString LastRowStored() const
Definition: hash_join_buffer.h:209
hash_map_iterator find(const Key &key) const
Definition: hash_join_buffer.h:203
MEM_ROOT m_overflow_mem_root
Definition: hash_join_buffer.h:235
MEM_ROOT m_mem_root
Definition: hash_join_buffer.h:227
Definition: hash_join_buffer.h:109
void is_transparent
Definition: hash_join_buffer.h:115
bool operator()(const ImmutableStringWithLength &str1, const ImmutableStringWithLength &str2) const
Definition: hash_join_buffer.h:122
bool operator()(const Key &str1, const ImmutableStringWithLength &other) const
Definition: hash_join_buffer.h:117
Definition: hash_join_buffer.h:131
size_t operator()(hash_join_buffer::Key key) const
Definition: hash_join_buffer.h:139
void is_transparent
Definition: hash_join_buffer.h:137
size_t operator()(ImmutableStringWithLength key) const
Definition: hash_join_buffer.h:143
A structure that contains a list of tables for the hash join operation, and some pre-computed propert...
Definition: pack_rows.h:83
This file contains the field type.
ImmutableString defines a storage format for strings that is designed to be as compact as possible,...
This file follows Google coding style, except for the name MEM_ROOT (which is kept for historical rea...
Some integer typedefs for easier portability.
Definition: hash_join_buffer.cc:55
std::string_view Key
The key type for the hash structure in HashJoinRowBuffer.
Definition: hash_join_buffer.h:107
Key BufferRow
Definition: hash_join_buffer.h:129
void LoadImmutableStringIntoTableBuffers(const TableCollection &tables, LinkedImmutableString row)
Definition: hash_join_buffer.cc:110
void LoadBufferRowIntoTableBuffers(const TableCollection &tables, BufferRow row)
Definition: hash_join_buffer.cc:103
StoreRowResult
Definition: hash_join_buffer.h:159
Generic routines for packing rows (possibly from multiple tables at the same time) into strings,...
required string key
Definition: replication_asynchronous_connection_failover.proto:59
Our own string classes, used pervasively throughout the executor.
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:78