MySQL 8.0.39
Source Code Documentation
item_regexp_func.h
#ifndef SQL_ITEM_REGEXP_FUNC_H_
#define SQL_ITEM_REGEXP_FUNC_H_

/* Copyright (c) 2017, 2024, Oracle and/or its affiliates.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License, version 2.0,
   as published by the Free Software Foundation.

   This program is designed to work with certain software (including
   but not limited to OpenSSL) that is licensed under separate terms,
   as designated in a particular file or component or in included license
   documentation.  The authors of MySQL hereby grant you an additional
   permission to link the program and your derivative works with the
   separately licensed software that they have either included with
   the program or referenced in the documentation.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License, version 2.0, for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA */

/**
  @file item_regexp_func.h

  The function classes for regular expression functions. They have a common
  base class Item_func_regexp, which is also the prefix of their class
  names. After the %Item_func prefix comes the name of the SQL function,
  e.g. Item_func_regexp_instr represents the SQL function `REGEXP_INSTR`.

  Type resolution
  ===============

  The type and name resolution procedure is hooked into by the
  Item_func_regexp class, which implements both
  Item_result_field::resolve_type() and Item::fix_fields().

  Collations
  ==========

  The regular expression library doesn't deal with collations at all, but we
  need them because the 'winning' collation of the pattern and the subject
  strings dictates case-sensitivity. The winning collation is defined by
  coercion rules, and we don't delve into that here. See
  Item_func::agg_arg_charsets_for_comparison(). The call to this function is
  done in resolve_type() as this appears to be an unwritten convention.

  Implementation
  ==============

  All communication with the regular expression library is done through a
  Regexp_facade object, instantiated in Item_func_regexp::fix_fields().

  @todo We now clean up ICU heap memory in Item_func_regexp::cleanup. Should
  it be done more rarely? On session close?
*/

#include <assert.h>
#include <unicode/uregex.h>

#include <optional>
#include <string>

// assert
#include "my_inttypes.h"  // MY_INT32_NUM_DECIMAL_DIGITS
#include "sql/item_cmpfunc.h"
#include "sql/item_strfunc.h"
#include "sql/mysqld.h"  // make_unique_destroy_only
#include "sql/regexp/regexp_facade.h"
#include "sql_string.h"  // String

// GCC bug 80635.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

/**
  Base class for all regular expression function classes. It is responsible
  for creating the Regexp_facade object.
*/
class Item_func_regexp : public Item_func {
 public:
  Item_func_regexp(const POS &pos, PT_item_list *opt_list)
      : Item_func(pos, opt_list) {}

  /**
    Resolves the collation to use for comparison. The type resolution is done
    in the subclass constructors.

    For all regular expression functions, i.e. REGEXP_INSTR, REGEXP_LIKE,
    REGEXP_REPLACE and REGEXP_SUBSTR, the first two arguments have to agree
    on a common collation. This collation is used to control
    case-sensitivity.

    @see fix_fields()
  */
  bool resolve_type(THD *) override;

  /// Decides on the mode for matching, case sensitivity etc.
  bool fix_fields(THD *thd, Item **) override;

  /// The expression for the subject string.
  Item *subject() const { return args[0]; }

  /// The expression for the pattern string.
  Item *pattern() const { return args[1]; }

  /// The value of the `position` argument, or its default if absent.
  std::optional<int> position() const {
    int the_index = pos_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      int value = args[the_index]->val_int();
      /*
        Note: Item::null_value can't be trusted alone here; there are cases
        (for the DATE data type in particular) where we can have it set
        without Item::m_nullable being set! This really should be cleaned up,
        but until that happens, we need to have a more conservative check.
      */
      if (args[the_index]->is_nullable() && args[the_index]->null_value)
        return {};
      else
        return value;
    }
    return 1;
  }

  /// The value of the `occurrence` argument, or its default if absent.
  std::optional<int> occurrence() const {
    int the_index = occ_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      int value = args[the_index]->val_int();
      /*
        Note: Item::null_value can't be trusted alone here; there are cases
        (for the DATE data type in particular) where we can have it set
        without Item::m_nullable being set! This really should be cleaned up,
        but until that happens, we need to have a more conservative check.
      */
      if (args[the_index]->is_nullable() && args[the_index]->null_value)
        return {};
      else
        return value;
    }
    return 0;
  }

  /// The value of the `match_parameter` argument, or an empty string if absent.
  std::optional<std::string> match_parameter() const {
    int the_index = match_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      StringBuffer<5> buf;  // Longer match_parameter doesn't make sense.
      String *s = args[the_index]->val_str(&buf);
      if (s != nullptr)
        return to_string(*s);
      else
        return {};
    }
    return std::string{};
  }

  void cleanup() override;

 protected:
  String *convert_int_to_str(String *str) {
    assert(fixed == 1);
    longlong nr = val_int();
    if (null_value) return nullptr;
    str->set_int(nr, unsigned_flag, collation.collation);
    return str;
  }

  my_decimal *convert_int_to_decimal(my_decimal *value) {
    assert(fixed == 1);
    longlong nr = val_int();
    if (null_value) return nullptr; /* purecov: inspected */
    int2my_decimal(E_DEC_FATAL_ERROR, nr, unsigned_flag, value);
    return value;
  }

  double convert_int_to_real() {
    assert(fixed == 1);
    return val_int();
  }

  double convert_str_to_real() {
    assert(fixed == 1);
    int err_not_used;
    const char *end_not_used;
    String *res = val_str(&str_value);
    if (res == nullptr) return 0.0;
    return my_strntod(res->charset(), res->ptr(), res->length(), &end_not_used,
                      &err_not_used);
  }

  longlong convert_str_to_int() {
    assert(fixed == 1);
    int err;
    String *res = val_str(&str_value);
    if (res == nullptr) return 0;
    return my_strntoll(res->charset(), res->ptr(), res->length(), 10, nullptr,
                       &err);
  }

  /**
    The position in the argument list of 'position'. -1 means that the default
    should be used.
  */
  virtual int pos_arg_pos() const = 0;

  /**
    The position in the argument list of 'occurrence'. -1 means that the default
    should be used.
  */
  virtual int occ_arg_pos() const = 0;

  /// The position in the argument list of match_parameter.
  virtual int match_arg_pos() const = 0;

  bool set_pattern();

  unique_ptr_destroy_only<regexp::Regexp_facade> m_facade;
};

class Item_func_regexp_instr : public Item_func_regexp {
 public:
  Item_func_regexp_instr(const POS &pos, PT_item_list *opt_list)
      : Item_func_regexp(pos, opt_list) {
    set_data_type_longlong();
  }

  Item_result result_type() const override { return INT_RESULT; }

  bool fix_fields(THD *thd, Item **arguments) override;

  String *val_str(String *str) override { return convert_int_to_str(str); }

  double val_real() override { return convert_int_to_real(); }

  longlong val_int() override;

  const char *func_name() const override { return "regexp_instr"; }

  /// The value of the `return_option` argument, or its default if absent.
  std::optional<int> return_option() const {
    int the_index = retopt_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      int value = args[the_index]->val_int();
      if (args[the_index]->null_value)
        return std::optional<int>();
      else
        return value;
    }
    return 0;
  }

  /**
    @{

    Copy-pasted from Item_int_func. Usually, an SQL function returning INTEGER
    just inherits Item_int_func and thus the implementation, but these classes
    need to have Item_func_regexp as base class because of fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_int(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_int(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return 2; }
  int occ_arg_pos() const override { return 3; }
  /// The position in the argument list of `return_option`.
  int retopt_arg_pos() const { return 4; }
  int match_arg_pos() const override { return 5; }

 private:
  bool resolve_type(THD *) final;
};

class Item_func_regexp_like : public Item_func_regexp {
 public:
  Item_func_regexp_like(const POS &pos, PT_item_list *opt_list)
      : Item_func_regexp(pos, opt_list) {
    set_data_type_bool();
  }

  Item_result result_type() const override { return INT_RESULT; }

  String *val_str(String *str) override { return convert_int_to_str(str); }

  double val_real() override { return convert_int_to_real(); }

  longlong val_int() override;

  const char *func_name() const override { return "regexp_like"; }

  bool is_bool_func() const override { return true; }

  /**
    @{

    Copy-pasted from Item_int_func. Usually, an SQL function returning INTEGER
    just inherits Item_int_func and thus the implementation, but these classes
    need to have Item_func_regexp as base class because of fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_int(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_int(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return -1; }
  int occ_arg_pos() const override { return -1; }
  int match_arg_pos() const override { return 2; }

 private:
  bool resolve_type(THD *) final;
};

class Item_func_regexp_replace : public Item_func_regexp {
 public:
  Item_func_regexp_replace(const POS &pos, PT_item_list *item_list)
      : Item_func_regexp(pos, item_list) {}

  Item_result result_type() const override { return STRING_RESULT; }

  bool resolve_type(THD *) final;

  Item *replacement() { return args[2]; }

  longlong val_int() override { return convert_str_to_int(); }

  String *val_str(String *result) override;

  double val_real() override { return convert_str_to_real(); }

  const char *func_name() const override { return "regexp_replace"; }

  /**
    @{

    Copy-pasted from Item_str_func. Usually, an SQL function returning a
    string just inherits Item_str_func and thus the implementation, but these
    classes need to have Item_func_regexp as base class because of
    fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_string(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_string(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return 3; }
  int occ_arg_pos() const override { return 4; }
  int match_arg_pos() const override { return 5; }
};

class Item_func_regexp_substr : public Item_func_regexp {
 public:
  Item_func_regexp_substr(const POS &pos, PT_item_list *item_list)
      : Item_func_regexp(pos, item_list) {}

  Item_result result_type() const override { return STRING_RESULT; }

  bool resolve_type(THD *) final;

  longlong val_int() override { return convert_str_to_int(); }

  String *val_str(String *result) override;

  double val_real() override { return convert_str_to_real(); }

  const char *func_name() const override { return "regexp_substr"; }

  /**
    @{

    Copy-pasted from Item_str_func. Usually, an SQL function returning a
    string just inherits Item_str_func and thus the implementation, but these
    classes need to have Item_func_regexp as base class because of
    fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_string(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_string(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return 2; }
  int occ_arg_pos() const override { return 3; }
  int match_arg_pos() const override { return 4; }
};

class Item_func_icu_version : public Item_static_string_func {
 public:
  explicit Item_func_icu_version(const POS &pos);

  bool itemize(Parse_context *pc, Item **res) override;
};

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif

#endif  // SQL_ITEM_REGEXP_FUNC_H_
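
The position() and occurrence() accessors in this header combine two ideas. Each subclass reports where an optional argument sits in the argument list through a virtual hook (with -1 meaning the function does not take that argument), and the base class returns a std::optional<int> that is empty only when the argument is present but NULL. The following is a minimal, self-contained sketch of that pattern and not MySQL code: RegexpFuncSketch, InstrSketch, LikeSketch and the Arg alias are hypothetical names, and SQL arguments are modelled as std::optional<int> instead of Item pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SQL argument: an empty optional models NULL.
using Arg = std::optional<int>;

class RegexpFuncSketch {
 public:
  explicit RegexpFuncSketch(std::vector<Arg> args) : m_args(std::move(args)) {}
  virtual ~RegexpFuncSketch() = default;

  // Empty result: the argument was supplied but is NULL.
  // Otherwise: the supplied value, or the default when it is absent.
  std::optional<int> position() const {
    return arg_or_default(pos_arg_pos(), 1);
  }
  std::optional<int> occurrence() const {
    return arg_or_default(occ_arg_pos(), 0);
  }

 protected:
  // Index of the argument in the list; -1 means "always use the default".
  virtual int pos_arg_pos() const = 0;
  virtual int occ_arg_pos() const = 0;

 private:
  std::optional<int> arg_or_default(int index, int fallback) const {
    assert(index >= -1);
    if (index != -1 && m_args.size() >= static_cast<std::size_t>(index) + 1)
      return m_args[static_cast<std::size_t>(index)];
    return fallback;
  }

  std::vector<Arg> m_args;
};

// Mirrors the index layout the header gives for REGEXP_INSTR.
class InstrSketch : public RegexpFuncSketch {
 public:
  using RegexpFuncSketch::RegexpFuncSketch;

 protected:
  int pos_arg_pos() const override { return 2; }
  int occ_arg_pos() const override { return 3; }
};

// Mirrors REGEXP_LIKE, which takes neither position nor occurrence.
class LikeSketch : public RegexpFuncSketch {
 public:
  using RegexpFuncSketch::RegexpFuncSketch;

 protected:
  int pos_arg_pos() const override { return -1; }
  int occ_arg_pos() const override { return -1; }
};
```

In this sketch, slots 0 and 1 of the vector stand for the subject and pattern and are never read. Returning an empty optional rather than a sentinel integer keeps "NULL argument" distinct from every valid position, which is presumably why the header uses std::optional here.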
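
The header pulls in sql/mysqld.h for make_unique_destroy_only, and its @todo note discusses when the regular expression state should be cleaned up. A "destroy-only" smart pointer is one whose deleter runs the destructor but does not free the storage, which suits objects placed in memory owned by something else, such as an arena. The sketch below illustrates that general technique under this assumption; it is not MySQL's definition. DestroyOnlySketch, unique_ptr_destroy_only_sketch, FacadeSketch and make_in_buffer are hypothetical names, and a caller-supplied buffer stands in for the arena.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Deleter that only runs the destructor; the storage belongs to someone else.
template <class T>
struct DestroyOnlySketch {
  void operator()(T *p) const {
    assert(p != nullptr);
    p->~T();
  }
};

template <class T>
using unique_ptr_destroy_only_sketch =
    std::unique_ptr<T, DestroyOnlySketch<T>>;

// Hypothetical stand-in for an object holding external resources.
// It counts destructor calls so the behaviour can be observed.
struct FacadeSketch {
  explicit FacadeSketch(int *destroyed) : m_destroyed(destroyed) {}
  ~FacadeSketch() { ++*m_destroyed; }
  int *m_destroyed;
};

// Constructs a FacadeSketch inside caller-owned storage with placement new.
// The returned pointer will destroy the object but never free the storage.
inline unique_ptr_destroy_only_sketch<FacadeSketch> make_in_buffer(
    void *storage, int *destroyed) {
  return unique_ptr_destroy_only_sketch<FacadeSketch>(
      new (storage) FacadeSketch(destroyed));
}
```

The storage passed to make_in_buffer must be suitably aligned and at least sizeof(FacadeSketch) bytes, and it must outlive the returned pointer.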