MySQL 8.4.3
Source Code Documentation
This class exposes high-level regular expression operations to the facade. More...
#include <regexp_engine.h>
Public Member Functions | |
Regexp_engine (const std::u16string &pattern, uint flags, int stack_limit, int time_limit) | |
Compiles the URegularExpression object. More... | |
uint | flags () |
void | Reset (const std::u16string &subject) |
Resets the engine with a new subject string. More... | |
bool | Matches (int start, int occurrence) |
Tries to find match number occurrence in the string, starting on start. More... | |
int | StartOfMatch () |
Returns the start position in the input string of the string where Matches() found a match. More... | |
int | EndOfMatch () |
Returns the position in the input string right after the end of the text where Matches() found a match. More... | |
const std::u16string & | Replace (const std::u16string &replacement, int start, int occurrence) |
Iterates over the subject string, replacing matches. More... | |
std::pair< int, int > | MatchedSubstring () |
The start of the match and its length. More... | |
bool | HasWarning () const |
bool | IsError () const |
bool | CheckError () const |
virtual | ~Regexp_engine () |
size_t | HardLimit () |
The hard limit for growing the replace buffer. More... | |
void | AppendHead (size_t size) |
Fills in the prefix in case we are doing a replace operation starting on a non-first occurrence of the pattern, or a non-first start position. More... | |
void | AppendReplacement (const std::u16string &replacement) |
Tries to write the replacement, growing the buffer if needed. More... | |
void | AppendTail () |
Appends the trailing segment of the subject string, the part after the last match, to the replacement buffer. More... | |
int | SpareCapacity () const |
The spare capacity in the replacement buffer, given in code points. More... | |
Private Member Functions | |
int | TryToAppendReplacement (const std::u16string &replacement) |
Preflight function: If the buffer capacity is adequate, the replacement is appended to the buffer, otherwise nothing is written. More... | |
int | TryToAppendTail () |
Tries to append the part of the subject string after the last match to the buffer. More... | |
Private Attributes | |
URegularExpression * | m_re |
Our handle to ICU's compiled regular expression, owned by instances of this class. More... | |
UErrorCode | m_error_code = U_ZERO_ERROR |
std::u16string | m_current_subject |
std::u16string | m_replace_buffer |
int | m_replace_buffer_pos = 0 |
This is always the next index in m_replace_buffer where ICU can write data. More... | |
Friends | |
class | regexp_engine_unittest::Mock_regexp_engine |
This class exposes high-level regular expression operations to the facade.
It implements the algorithm for search-and-replace and the various matching options.
Search-and-replace writes into a buffer whose initial size is that of the subject string. Each append operation uses ICU's preflight feature to probe the required buffer size, and the buffer can grow up to max_allowed_packet, at which point an error is raised.
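The grow-on-demand behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the MySQL implementation: `TryAppend` and `AppendGrowing` are hypothetical names, the preflight call is simulated with plain `std::u16string` operations instead of ICU, and the error is modelled as a C++ exception purely for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an ICU preflight call: writes `text` at `pos`
// only if it fits, and always returns the number of code units required.
std::size_t TryAppend(std::u16string &buffer, std::size_t pos,
                      const std::u16string &text) {
  const std::size_t spare = buffer.size() - pos;
  if (text.size() <= spare) buffer.replace(pos, text.size(), text);
  return text.size();
}

// Appends `text`, growing the buffer once if the preflight reports overflow.
// Throws if the required size would exceed `hard_limit` (an assumption; the
// documented class reports errors through its own mechanism).
std::size_t AppendGrowing(std::u16string &buffer, std::size_t pos,
                          const std::u16string &text,
                          std::size_t hard_limit) {
  assert(pos <= buffer.size());
  const std::size_t needed = TryAppend(buffer, pos, text);
  if (needed > buffer.size() - pos) {
    const std::size_t new_size = pos + needed;
    if (new_size > hard_limit)
      throw std::length_error("replace buffer would exceed the hard limit");
    buffer.resize(new_size);
    TryAppend(buffer, pos, text);  // The second attempt is sized to fit.
  }
  return pos + needed;  // Next write position.
}
```

Because the preflight reports the exact size needed, a single resize is enough and the buffer never grows further than the data requires.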
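regexp::Regexp_engine::Regexp_engine (const std::u16string &pattern, uint flags, int stack_limit, int time_limit)
inline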
Compiles the URegularExpression object.
If compilation fails, my_error() is called and IsError() returns true. In this case, all subsequent operations are no-ops that report failure. This follows ICU's chaining conventions, see http://icu-project.org/apiref/icu4c/utypes_8h.html.
pattern | The pattern string in ICU's character set. |
flags | ICU flags. |
stack_limit | Sets the amount of heap storage, in bytes, that the match backtracking stack is allowed to allocate. |
time_limit | Gets set on the URegularExpression. Please refer to the ICU API docs for the definition of time limit. |
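The chaining convention mentioned for the constructor can be illustrated with a small sketch. Everything here is hypothetical (`Status`, `ChainedEngine`, and the rule that an empty pattern fails to compile are invented for the example); it only shows the idea that once an error is recorded, later calls return failure without doing any work.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes standing in for ICU's UErrorCode.
enum class Status { kOk, kBadPattern };

// Hypothetical engine illustrating the chaining convention: once an error is
// recorded, every later operation returns failure without doing any work.
class ChainedEngine {
 public:
  explicit ChainedEngine(const std::string &pattern) {
    // Assumption for illustration: an empty pattern fails to compile.
    if (pattern.empty()) m_status = Status::kBadPattern;
  }

  bool IsError() const { return m_status != Status::kOk; }

  bool Matches(const std::string &subject) {
    if (IsError()) return false;  // No-op after an earlier failure.
    assert(m_status == Status::kOk);
    ++m_calls;
    return !subject.empty();  // Placeholder for the real matching work.
  }

  int calls() const { return m_calls; }

 private:
  Status m_status = Status::kOk;
  int m_calls = 0;
};
```

A caller can therefore issue a sequence of operations and check for failure once at the end, rather than after every call.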
virtual regexp::Regexp_engine::~Regexp_engine ()
inline virtual
void regexp::Regexp_engine::AppendHead (size_t size)
Fills in the prefix in case we are doing a replace operation starting on a non-first occurrence of the pattern, or a non-first start position.
AppendReplacement() will fill in the section starting after the previous match or start position, so a prefix must be appended first.
The case to handle here, which ICU does not cover for us, is a search that did not start on the first character or the first match of the regular expression. We have to copy the longest such prefix ourselves.
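As a rough sketch of the prefix step, assuming the prefix is simply the first `size` code units of the subject: `AppendHeadSketch` below is a hypothetical free function, whereas the real member operates on the engine's own subject and replace buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies the first `size` code units of `subject` to the start of `buffer`
// and returns the next write position. Hypothetical free-function version of
// the prefix step, for illustration only.
std::size_t AppendHeadSketch(const std::u16string &subject, std::size_t size,
                             std::u16string &buffer) {
  assert(size <= subject.size());
  if (buffer.size() < size) buffer.resize(size);
  buffer.replace(0, size, subject, 0, size);
  return size;
}
```

After this step the buffer holds the untouched prefix, so subsequent appends only need to cover the text from the start position or previous match onwards.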
void regexp::Regexp_engine::AppendReplacement (const std::u16string &replacement)
Tries to write the replacement, growing the buffer if needed.
replacement | The replacement string. |
void regexp::Regexp_engine::AppendTail ()
Appends the trailing segment of the subject string, the part after the last match, to the replacement buffer.
bool regexp::Regexp_engine::CheckError () const
inline
int regexp::Regexp_engine::EndOfMatch ()
inline
Returns the position in the input string right after the end of the text where Matches() found a match.
uint regexp::Regexp_engine::flags ()
inline
size_t regexp::Regexp_engine::HardLimit ()
inline
The hard limit for growing the replace buffer.
The buffer cannot grow beyond this size, and an error will be thrown if the limit is reached.
bool regexp::Regexp_engine::HasWarning () const
inline
bool regexp::Regexp_engine::IsError () const
inline
std::pair< int, int > regexp::Regexp_engine::MatchedSubstring ()
The start of the match and its length.
bool regexp::Regexp_engine::Matches (int start, int occurrence)
Tries to find match number occurrence in the string, starting on start.
start | Start position, 0-based. |
occurrence | Which occurrence to replace. If zero, replace all occurrences. |
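To make the interplay of `start` and `occurrence` concrete, here is a minimal sketch that uses `std::regex` over `std::string` in place of ICU and UTF-16. `FindOccurrence` is a hypothetical helper, and it assumes `occurrence` is 1-based; its semantics (for example around look-behind at the start position) may differ from ICU's.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Finds match number `occurrence` (1-based) at or after `start` (0-based).
// On success, stores {start, length} of the match relative to `subject`.
bool FindOccurrence(const std::string &subject, const std::regex &re,
                    int start, int occurrence, std::pair<int, int> *match) {
  assert(start >= 0 && occurrence >= 1);
  if (static_cast<std::size_t>(start) > subject.size()) return false;
  auto it = std::sregex_iterator(subject.begin() + start, subject.end(), re);
  const std::sregex_iterator end;
  for (int i = 1; it != end; ++it, ++i) {
    if (i == occurrence) {
      // position() is relative to the search start, so add `start` back.
      *match = {start + static_cast<int>(it->position()),
                static_cast<int>(it->length())};
      return true;
    }
  }
  return false;
}
```

The stored pair has the same shape as the value returned by MatchedSubstring(): the start of the match and its length.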
const std::u16string & regexp::Regexp_engine::Replace (const std::u16string &replacement, int start, int occurrence)
Iterates over the subject string, replacing matches.
replacement | The string to replace matches with. |
start | Start position, 0-based. |
occurrence | Which occurrence to replace. If zero, replace all occurrences. |
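The overall head, replacement, and tail flow of a replace operation might look like the following sketch. It is an illustration under assumptions, not the documented implementation: `ReplaceSketch` is a hypothetical function, it uses `std::regex` on `std::string` instead of ICU, it returns a new string rather than a reference to an internal buffer, and it inserts the replacement text literally without interpreting capture-group references.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Replaces match number `occurrence` found at or after `start`, or every
// match when `occurrence` is zero. Replacement text is inserted literally.
std::string ReplaceSketch(const std::string &subject, const std::regex &re,
                          const std::string &replacement, int start,
                          int occurrence) {
  assert(start >= 0 && static_cast<std::size_t>(start) <= subject.size());
  std::string out;
  auto copied_to = subject.begin();  // End of the text already copied.
  auto it = std::sregex_iterator(subject.begin() + start, subject.end(), re);
  const std::sregex_iterator end;
  for (int i = 1; it != end; ++it, ++i) {
    if (occurrence != 0 && i != occurrence) continue;
    // Head or gap: everything between the last copy point and this match.
    out.append(copied_to, (*it)[0].first);
    out += replacement;
    copied_to = (*it)[0].second;
    if (occurrence != 0) break;
  }
  out.append(copied_to, subject.end());  // Tail after the last match.
  return out;
}
```

Matches that are skipped, either because they lie before `start` or because they are not the requested occurrence, end up in the output unchanged as part of the head or a gap.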
void regexp::Regexp_engine::Reset (const std::u16string &subject)
Resets the engine with a new subject string.
This also clears the replacement buffer, see Replace().
subject | The new string to match the regular expression against. |
int regexp::Regexp_engine::SpareCapacity () const
inline
The spare capacity in the replacement buffer, given in code points.
ICU communicates via a capacity variable, but we like to use an absolute position instead, and we want to keep a single source of truth, so we calculate it when needed and assert that the number is correct.
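A minimal sketch of deriving the spare capacity from an absolute write position, assuming the buffer's current size is its capacity. `ReplaceBufferSketch` and its members are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical holder for a replace buffer and its absolute write position.
struct ReplaceBufferSketch {
  std::u16string buffer;
  int pos = 0;  // Next index in `buffer` where data may be written.

  // The position is the single source of truth; capacity is derived from it.
  int SpareCapacity() const {
    assert(pos >= 0 && static_cast<std::size_t>(pos) <= buffer.size());
    return static_cast<int>(buffer.size()) - pos;
  }
};
```

Storing only the position avoids keeping two counters in sync: the remaining capacity is recomputed at each call site that needs to pass it to a C-style API.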
int regexp::Regexp_engine::StartOfMatch ()
inline
Returns the start position in the input string of the string where Matches() found a match.
int regexp::Regexp_engine::TryToAppendReplacement (const std::u16string &replacement)
private
Preflight function: If the buffer capacity is adequate, the replacement is appended to the buffer, otherwise nothing is written.
Either way, the replacement's full size is returned.
int regexp::Regexp_engine::TryToAppendTail ()
private
Tries to append the part of the subject string after the last match to the buffer.
This is a preflight function: If the buffer capacity is adequate, the tail is appended to the buffer, otherwise nothing is written. Either way, the tail's full size is returned.
friend class regexp_engine_unittest::Mock_regexp_engine
friend
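std::u16string regexp::Regexp_engine::m_current_subject
private
UErrorCode regexp::Regexp_engine::m_error_code = U_ZERO_ERROR
private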
URegularExpression * regexp::Regexp_engine::m_re
private
Our handle to ICU's compiled regular expression, owned by instances of this class.
URegularExpression is a C struct, but this class follows RAII and initializes this pointer in the constructor and cleans it up in the destructor.
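The RAII pattern for a C-style handle can be sketched as below. The C API here (`CHandle`, `c_handle_open`, `c_handle_close`) is invented for the example and merely stands in for a library handle such as URegularExpression; the counter exists only so the cleanup can be observed.

```cpp
#include <cassert>

// Hypothetical C-style API standing in for a library handle; these names
// are invented for the example.
struct CHandle {
  int id;
};
static int g_open_handles = 0;
CHandle *c_handle_open(int id) {
  ++g_open_handles;
  return new CHandle{id};
}
void c_handle_close(CHandle *h) {
  --g_open_handles;
  delete h;
}

// Owns the handle: acquired in the constructor, released in the destructor.
class HandleOwner {
 public:
  explicit HandleOwner(int id) : m_handle(c_handle_open(id)) {}
  ~HandleOwner() { c_handle_close(m_handle); }

  // Copying would lead to a double close, so it is disabled.
  HandleOwner(const HandleOwner &) = delete;
  HandleOwner &operator=(const HandleOwner &) = delete;

  int id() const {
    assert(m_handle != nullptr);
    return m_handle->id;
  }

 private:
  CHandle *m_handle;
};
```

Tying the handle's lifetime to an object means every exit path from a scope, including early returns, releases the underlying resource.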
std::u16string regexp::Regexp_engine::m_replace_buffer
private
int regexp::Regexp_engine::m_replace_buffer_pos = 0
private
This is always the next index in m_replace_buffer where ICU can write data.