MySQL 8.4.0
Source Code Documentation
item_regexp_func.h
#ifndef SQL_ITEM_REGEXP_FUNC_H_
#define SQL_ITEM_REGEXP_FUNC_H_

/* Copyright (c) 2017, 2024, Oracle and/or its affiliates.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License, version 2.0,
   as published by the Free Software Foundation.

   This program is designed to work with certain software (including
   but not limited to OpenSSL) that is licensed under separate terms,
   as designated in a particular file or component or in included license
   documentation. The authors of MySQL hereby grant you an additional
   permission to link the program and your derivative works with the
   separately licensed software that they have either included with
   the program or referenced in the documentation.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License, version 2.0, for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */

/**
  @file item_regexp_func.h

  The function classes for regular expression functions. They have a common
  base class Item_func_regexp, which is also the prefix of their class
  names. After the %Item_func prefix comes the name of the SQL function,
  e.g. Item_func_regexp_instr represents the SQL function `REGEXP_INSTR`.

  Type resolution
  ===============

  The type and name resolution procedure is hooked into by the
  Item_func_regexp class, which implements both
  Item_result_field::resolve_type() and Item::fix_fields().

  Collations
  ==========

  The regular expression library doesn't deal with collations at all, but we
  need them because the 'winning' collation of the pattern and the subject
  strings dictates case-sensitivity. The winning collation is defined by
  coercion rules, and we don't delve into that here. See
  Item_func::agg_arg_charsets_for_comparison(). The call to this function is
  done in resolve_type() as this appears to be an unwritten convention.

  Implementation
  ==============

  All communication with the regular expression library is done through a
  Regexp_facade object, instantiated in Item_func_regexp::fix_fields().

  @todo We now clean up ICU heap memory in Item_func_regexp::cleanup. Should
  it be done more rarely? On session close?
*/

#include <assert.h>
#include <unicode/uregex.h>

#include <optional>
#include <string>

// assert
#include "my_inttypes.h"  // MY_INT32_NUM_DECIMAL_DIGITS
#include "mysql/strings/m_ctype.h"
#include "sql/item_cmpfunc.h"
#include "sql/item_strfunc.h"
#include "sql/mysqld.h"  // make_unique_destroy_only
#include "sql/regexp/regexp_facade.h"
#include "sql_string.h"  // String

// GCC bug 80635.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

/**
  Base class for all regular expression function classes. Is responsible for
  creating the Regexp_facade object.
*/
class Item_func_regexp : public Item_func {
 public:
  Item_func_regexp(const POS &pos, PT_item_list *opt_list)
      : Item_func(pos, opt_list) {}

  /**
    Resolves the collation to use for comparison. The type resolution is done
    in the subclass constructors.

    For all regular expression functions, i.e. REGEXP_INSTR, REGEXP_LIKE,
    REGEXP_REPLACE and REGEXP_SUBSTR, the first two arguments have to agree
    on a common collation. This collation is used to control
    case-sensitivity.

    @see fix_fields()
  */
  bool resolve_type(THD *) override;

  /// Decides on the mode for matching, case sensitivity etc.
  bool fix_fields(THD *thd, Item **) override;

  /// The expression for the subject string.
  Item *subject() const { return args[0]; }

  /// The expression for the pattern string.
  Item *pattern() const { return args[1]; }

  /// The value of the `position` argument, or its default if absent.
  std::optional<int> position() const {
    const int the_index = pos_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      const int value = args[the_index]->val_int();
      /*
        Note: Item::null_value can't be trusted alone here; there are cases
        (for the DATE data type in particular) where we can have it set
        without Item::m_nullable being set! This really should be cleaned up,
        but until that happens, we need to have a more conservative check.
      */
      if (args[the_index]->is_nullable() && args[the_index]->null_value)
        return {};
      else
        return value;
    }
    return 1;
  }

  /// The value of the `occurrence` argument, or its default if absent.
  std::optional<int> occurrence() const {
    const int the_index = occ_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      const int value = args[the_index]->val_int();
      /*
        Note: Item::null_value can't be trusted alone here; there are cases
        (for the DATE data type in particular) where we can have it set
        without Item::m_nullable being set! This really should be cleaned up,
        but until that happens, we need to have a more conservative check.
      */
      if (args[the_index]->is_nullable() && args[the_index]->null_value)
        return {};
      else
        return value;
    }
    return 0;
  }

  /// The value of the `match_parameter` argument, or an empty string if absent.
  std::optional<std::string> match_parameter() const {
    const int the_index = match_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      StringBuffer<5> buf;  // Longer match_parameter doesn't make sense.
      String *s = args[the_index]->val_str(&buf);
      if (s != nullptr)
        return to_string(*s);
      else
        return {};
    }
    return std::string{};
  }

  void cleanup() override;

 protected:
  String *convert_int_to_str(String *str) {
    assert(fixed);
    longlong nr = val_int();
    if (null_value) return nullptr;
    str->set_int(nr, unsigned_flag, collation.collation);
    return str;
  }

  my_decimal *convert_int_to_decimal(my_decimal *value) {
    assert(fixed);
    longlong nr = val_int();
    if (null_value) return nullptr; /* purecov: inspected */
    int2my_decimal(E_DEC_FATAL_ERROR, nr, unsigned_flag, value);
    return value;
  }

  double convert_int_to_real() {
    assert(fixed);
    return val_int();
  }

  double convert_str_to_real() {
    assert(fixed);
    int err_not_used;
    const char *end_not_used;
    String *res = val_str(&str_value);
    if (res == nullptr) return 0.0;
    return my_strntod(res->charset(), res->ptr(), res->length(), &end_not_used,
                      &err_not_used);
  }

  longlong convert_str_to_int() {
    assert(fixed);
    int err;
    String *res = val_str(&str_value);
    if (res == nullptr) return 0;
    return my_strntoll(res->charset(), res->ptr(), res->length(), 10, nullptr,
                       &err);
  }

  /**
    The position in the argument list of 'position'. -1 means that the default
    should be used.
  */
  virtual int pos_arg_pos() const = 0;

  /**
    The position in the argument list of 'occurrence'. -1 means that the default
    should be used.
  */
  virtual int occ_arg_pos() const = 0;

  /// The position in the argument list of match_parameter.
  virtual int match_arg_pos() const = 0;

  bool set_pattern();

  unique_ptr_destroy_only<regexp::Regexp_facade> m_facade;
};

class Item_func_regexp_instr : public Item_func_regexp {
 public:
  Item_func_regexp_instr(const POS &pos, PT_item_list *opt_list)
      : Item_func_regexp(pos, opt_list) {
    set_data_type_longlong();
  }

  Item_result result_type() const override { return INT_RESULT; }

  bool fix_fields(THD *thd, Item **arguments) override;

  String *val_str(String *str) override { return convert_int_to_str(str); }

  double val_real() override { return convert_int_to_real(); }

  longlong val_int() override;

  const char *func_name() const override { return "regexp_instr"; }

  /// The value of the `return_option` argument, or its default if absent.
  std::optional<int> return_option() const {
    const int the_index = retopt_arg_pos();
    if (the_index != -1 && arg_count >= static_cast<uint>(the_index) + 1) {
      const int value = args[the_index]->val_int();
      if (args[the_index]->null_value || current_thd->is_error())
        return std::optional<int>();
      else
        return value;
    }
    return 0;
  }

  /**
    @{

    Copy-pasted from Item_int_func. Usually, an SQL function returning INTEGER
    just inherits Item_int_func and thus the implementation, but these classes
    need to have Item_func_regexp as base class because of fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_int(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_int(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return 2; }
  int occ_arg_pos() const override { return 3; }
  /// The position in the argument list of `return_option`.
  int retopt_arg_pos() const { return 4; }
  int match_arg_pos() const override { return 5; }

 private:
  bool resolve_type(THD *) final;
};

class Item_func_regexp_like : public Item_func_regexp {
 public:
  Item_func_regexp_like(const POS &pos, PT_item_list *opt_list)
      : Item_func_regexp(pos, opt_list) {
    set_data_type_bool();
  }

  Item_result result_type() const override { return INT_RESULT; }

  String *val_str(String *str) override { return convert_int_to_str(str); }

  double val_real() override { return convert_int_to_real(); }

  longlong val_int() override;

  const char *func_name() const override { return "regexp_like"; }

  bool is_bool_func() const override { return true; }

  /**
    @{

    Copy-pasted from Item_int_func. Usually, an SQL function returning INTEGER
    just inherits Item_int_func and thus the implementation, but these classes
    need to have Item_func_regexp as base class because of fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_int(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_int(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return -1; }
  int occ_arg_pos() const override { return -1; }
  int match_arg_pos() const override { return 2; }

 private:
  bool resolve_type(THD *) final;
};

class Item_func_regexp_replace : public Item_func_regexp {
 public:
  Item_func_regexp_replace(const POS &pos, PT_item_list *item_list)
      : Item_func_regexp(pos, item_list) {}

  Item_result result_type() const override { return STRING_RESULT; }

  bool resolve_type(THD *) final;

  Item *replacement() { return args[2]; }

  longlong val_int() override { return convert_str_to_int(); }

  String *val_str(String *result) override;

  double val_real() override { return convert_str_to_real(); }

  const char *func_name() const override { return "regexp_replace"; }

  /**
    @{

    Copy-pasted from Item_str_func. Usually, an SQL function returning a
    string just inherits Item_str_func and thus the implementation, but these
    classes need to have Item_func_regexp as base class because of
    fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_string(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_string(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return 3; }
  int occ_arg_pos() const override { return 4; }
  int match_arg_pos() const override { return 5; }
};

class Item_func_regexp_substr : public Item_func_regexp {
 public:
  Item_func_regexp_substr(const POS &pos, PT_item_list *item_list)
      : Item_func_regexp(pos, item_list) {}

  Item_result result_type() const override { return STRING_RESULT; }

  bool resolve_type(THD *) final;

  longlong val_int() override { return convert_str_to_int(); }

  String *val_str(String *result) override;

  double val_real() override { return convert_str_to_real(); }

  const char *func_name() const override { return "regexp_substr"; }

  /**
    @{

    Copy-pasted from Item_str_func. Usually, an SQL function returning a
    string just inherits Item_str_func and thus the implementation, but these
    classes need to have Item_func_regexp as base class because of
    fix_fields().
  */
  bool get_date(MYSQL_TIME *ltime, my_time_flags_t fuzzydate) override {
    return get_date_from_string(ltime, fuzzydate);
  }

  bool get_time(MYSQL_TIME *t) override { return get_time_from_string(t); }
  /// @}

 protected:
  int pos_arg_pos() const override { return 2; }
  int occ_arg_pos() const override { return 3; }
  int match_arg_pos() const override { return 4; }
};

class Item_func_icu_version final : public Item_static_string_func {
  using super = Item_static_string_func;

 public:
  explicit Item_func_icu_version(const POS &pos);

  bool do_itemize(Parse_context *pc, Item **res) override;
};

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif

#endif  // SQL_ITEM_REGEXP_FUNC_H_
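
The accessors position() and occurrence() in the listing share one pattern: each subclass reports the index of an optional argument in its argument list (or -1 if the function does not take it), and the base class returns the argument's value, a default when the argument is absent, or an empty std::optional when it evaluates to SQL NULL. The following is a minimal, self-contained sketch of that pattern only. It is not MySQL code: `Arg`, `RegexpFuncBase`, `InstrLike`, `LikeLike` and `optional_arg` are hypothetical names, and a NULL argument is modelled as an empty `std::optional<int>` rather than through Item::null_value.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SQL argument: an empty optional models NULL.
using Arg = std::optional<int>;

class RegexpFuncBase {
 public:
  explicit RegexpFuncBase(std::vector<Arg> args) : m_args(std::move(args)) {}
  virtual ~RegexpFuncBase() = default;

  // The supplied `position`, 1 if absent, empty if the argument is NULL.
  std::optional<int> position() const { return optional_arg(pos_arg_pos(), 1); }

  // The supplied `occurrence`, 0 if absent, empty if the argument is NULL.
  std::optional<int> occurrence() const {
    return optional_arg(occ_arg_pos(), 0);
  }

 protected:
  // Index of each optional argument, or -1 if the function does not take it.
  virtual int pos_arg_pos() const = 0;
  virtual int occ_arg_pos() const = 0;

 private:
  std::optional<int> optional_arg(int index, int default_value) const {
    if (index != -1 && m_args.size() >= static_cast<std::size_t>(index) + 1)
      return m_args[index];  // may be empty, i.e. NULL
    return default_value;
  }

  std::vector<Arg> m_args;
};

// Uses the same index layout as Item_func_regexp_instr in the listing.
class InstrLike : public RegexpFuncBase {
 public:
  using RegexpFuncBase::RegexpFuncBase;

 protected:
  int pos_arg_pos() const override { return 2; }
  int occ_arg_pos() const override { return 3; }
};

// Uses the same index layout as Item_func_regexp_like: neither is accepted.
class LikeLike : public RegexpFuncBase {
 public:
  using RegexpFuncBase::RegexpFuncBase;

 protected:
  int pos_arg_pos() const override { return -1; }
  int occ_arg_pos() const override { return -1; }
};
```

With only a subject and a pattern supplied, `InstrLike::position()` yields 1 and `occurrence()` yields 0. Keeping the index lookup virtual lets every subclass reuse the same accessor logic even though the argument order differs between the SQL functions.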
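
The base class keeps its Regexp_facade in a unique_ptr_destroy_only, which is a std::unique_ptr whose deleter only runs the destructor. The sketch below illustrates that general idea with placement new into caller-owned storage. It is an assumption-laden illustration, not MySQL's implementation: `DestroyOnly`, `destroy_only_ptr`, `make_in_buffer` and `Facade` are hypothetical names, and the plain byte buffer merely stands in for whatever allocator owns the memory in the server.

```cpp
#include <memory>
#include <new>
#include <utility>

// Hypothetical deleter: runs the destructor but never frees the storage.
template <class T>
struct DestroyOnly {
  void operator()(T *ptr) const { ptr->~T(); }
};

template <class T>
using destroy_only_ptr = std::unique_ptr<T, DestroyOnly<T>>;

// Constructs a T inside storage owned by the caller and wraps it. The
// storage must be suitably sized and aligned for T and must outlive the
// returned pointer.
template <class T, class... Args>
destroy_only_ptr<T> make_in_buffer(void *storage, Args &&...args) {
  return destroy_only_ptr<T>(::new (storage) T(std::forward<Args>(args)...));
}

// Hypothetical resource that tracks how many instances are alive.
struct Facade {
  explicit Facade(int *live) : m_live(live) { ++*m_live; }
  ~Facade() { --*m_live; }
  int *m_live;
};
```

When a `destroy_only_ptr<Facade>` goes out of scope or is reset, the Facade destructor runs and the counter drops back, while the buffer itself is left for its owner to release.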