MySQL 8.3.0
Source Code Documentation
regexp::Regexp_engine Class Reference

This class exposes high-level regular expression operations to the facade.

#include <regexp_engine.h>

Public Member Functions

 Regexp_engine (const std::u16string &pattern, uint flags, int stack_limit, int time_limit)
 Compiles the URegularExpression object.
 
uint flags ()
 
void Reset (const std::u16string &subject)
 Resets the engine with a new subject string.
 
bool Matches (int start, int occurrence)
 Tries to find match number occurrence in the string, starting on start.
 
int StartOfMatch ()
 Returns the start position in the input string of the text where Matches() found a match.
 
int EndOfMatch ()
 Returns the position in the input string right after the end of the text where Matches() found a match.
 
const std::u16string & Replace (const std::u16string &replacement, int start, int occurrence)
 Iterates over the subject string, replacing matches.
 
std::pair< int, int > MatchedSubstring ()
 The start of the match and its length.
 
bool HasWarning () const
 
bool IsError () const
 
bool CheckError () const
 
virtual ~Regexp_engine ()
 
size_t HardLimit ()
 The hard limit for growing the replace buffer.
 
void AppendHead (size_t size)
 Fills in the prefix in case we are doing a replace operation starting on a non-first occurrence of the pattern, or a non-first start position.
 
void AppendReplacement (const std::u16string &replacement)
 Tries to write the replacement, growing the buffer if needed.
 
void AppendTail ()
 Appends the trailing segment of the subject string, after the last match, to the replace buffer.
 
int SpareCapacity () const
 The spare capacity in the replacement buffer, given in code points.
 

Private Member Functions

int TryToAppendReplacement (const std::u16string &replacement)
 Preflight function: If the buffer capacity is adequate, the replacement is appended to the buffer, otherwise nothing is written.
 
int TryToAppendTail ()
 Tries to append the part of the subject string after the last match to the buffer.
 

Private Attributes

URegularExpression * m_re
 Our handle to ICU's compiled regular expression, owned by instances of this class.
 
UErrorCode m_error_code = U_ZERO_ERROR
 
std::u16string m_current_subject
 
std::u16string m_replace_buffer
 
int m_replace_buffer_pos = 0
 This is always the next index in m_replace_buffer where ICU can write data.
 

Friends

class regexp_engine_unittest::Mock_regexp_engine
 

Detailed Description

This class exposes high-level regular expression operations to the facade.

It implements the algorithm for search-and-replace and the various matching options.

A buffer is used for search-and-replace; its initial size is that of the subject string. ICU's preflight features are used to probe the required buffer size within each append operation, and the buffer can grow up to max_allowed_packet, at which point an error is thrown.
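
The preflight-and-grow strategy described above can be illustrated without ICU. The following is a minimal sketch, not the MySQL implementation: TryAppend, AppendGrowing and the hard_limit parameter are hypothetical names, with hard_limit standing in for max_allowed_packet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an ICU-style preflight call: writes `text` at
// `pos` only if it fits in `buffer`, and always returns the full size needed.
inline std::size_t TryAppend(std::u16string &buffer, std::size_t pos,
                             const std::u16string &text) {
  assert(pos <= buffer.size());
  if (pos + text.size() <= buffer.size())
    std::copy(text.begin(), text.end(), buffer.begin() + pos);
  return text.size();
}

// Appends `text` at `pos`, growing the buffer once if the preflight call
// reports that it does not fit. Throws when `hard_limit` would be exceeded
// (an assumption standing in for max_allowed_packet).
inline std::size_t AppendGrowing(std::u16string &buffer, std::size_t pos,
                                 const std::u16string &text,
                                 std::size_t hard_limit) {
  const std::size_t needed = TryAppend(buffer, pos, text);
  if (pos + needed > buffer.size()) {
    if (pos + needed > hard_limit)
      throw std::length_error("replace buffer would exceed the hard limit");
    buffer.resize(pos + needed);   // grow to exactly the required size
    TryAppend(buffer, pos, text);  // the second attempt now fits
  }
  return pos + needed;  // next write position
}
```

Because the first call writes nothing when the text does not fit, at most one resize is needed per append in this sketch.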

Constructor & Destructor Documentation

◆ Regexp_engine()

regexp::Regexp_engine::Regexp_engine ( const std::u16string &  pattern,
uint  flags,
int  stack_limit,
int  time_limit 
)
inline

Compiles the URegularExpression object.

If compilation fails, my_error() is called and IsError() returns true. In this case, all subsequent operations will be no-ops, reporting failure. This follows ICU's chaining conventions; see http://icu-project.org/apiref/icu4c/utypes_8h.html.
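
The chaining convention can be pictured with a small stand-alone sketch. Everything in it is hypothetical (the Status enum, the ChainedEngine class, and the rule that an empty pattern fails to compile); it only shows how a recorded error turns later calls into no-ops that report failure.

```cpp
#include <cassert>
#include <string>

// Hypothetical status code mimicking the role of ICU's UErrorCode.
enum class Status { kOk, kBadPattern };

// Hypothetical engine: once an error is recorded, later calls do nothing.
class ChainedEngine {
 public:
  explicit ChainedEngine(const std::string &pattern) {
    // Stand-in for compilation: an empty pattern is treated as invalid here.
    if (pattern.empty()) m_status = Status::kBadPattern;
  }

  bool IsError() const { return m_status != Status::kOk; }

  // Returns false without touching any state if an earlier step failed.
  bool Reset(const std::string &subject) {
    if (IsError()) return false;
    m_subject = subject;
    return true;
  }

  const std::string &subject() const { return m_subject; }

 private:
  Status m_status = Status::kOk;
  std::string m_subject;
};
```

A caller can therefore issue several operations in a row and check for an error once at the end.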

Parameters
pattern      The pattern string in ICU's character set.
flags        ICU flags.
stack_limit  Sets the amount of heap storage, in bytes, that the match backtracking stack is allowed to allocate.
time_limit   Gets set on the URegularExpression. Please refer to the ICU API docs for the definition of time limit.

◆ ~Regexp_engine()

virtual regexp::Regexp_engine::~Regexp_engine ( )
inline virtual

Member Function Documentation

◆ AppendHead()

void regexp::Regexp_engine::AppendHead ( size_t  size)

Fills in the prefix in case we are doing a replace operation starting on a non-first occurrence of the pattern, or a non-first start position.

AppendReplacement() will fill in the section starting after the previous match or start position, so a prefix must be appended first.

The part we have to worry about here, the part that ICU doesn't add for us, is the case where the search didn't start on the first character or on the first match of the regular expression. It is the longest such prefix that we have to copy ourselves.
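
As a rough illustration of the prefix copy, the sketch below assumes the caller already knows how many code units precede the region that the replacement step handles. CopyHead is a hypothetical helper, not a member of this class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies the first `size` code units of `subject` into
// `buffer`, clamped to the subject length. Everything from that point on is
// left for the replacement step to fill in.
inline void CopyHead(const std::u16string &subject, std::size_t size,
                     std::u16string &buffer) {
  buffer.append(subject, 0, std::min(size, subject.size()));
}
```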

◆ AppendReplacement()

void regexp::Regexp_engine::AppendReplacement ( const std::u16string &  replacement)

Tries to write the replacement, growing the buffer if needed.

Parameters
replacement  The replacement string.

◆ AppendTail()

void regexp::Regexp_engine::AppendTail ( )

Appends the trailing segment of the subject string, after the last match, to the replace buffer.

◆ CheckError()

bool regexp::Regexp_engine::CheckError ( ) const
inline

◆ EndOfMatch()

int regexp::Regexp_engine::EndOfMatch ( )
inline

Returns the position in the input string right after the end of the text where Matches() found a match.

◆ flags()

uint regexp::Regexp_engine::flags ( )
inline

◆ HardLimit()

size_t regexp::Regexp_engine::HardLimit ( )
inline

The hard limit for growing the replace buffer.

The buffer cannot grow beyond this size, and an error will be thrown if the limit is reached.

◆ HasWarning()

bool regexp::Regexp_engine::HasWarning ( ) const
inline

◆ IsError()

bool regexp::Regexp_engine::IsError ( ) const
inline

◆ MatchedSubstring()

std::pair< int, int > regexp::Regexp_engine::MatchedSubstring ( )

The start of the match and its length.

Returns
The index of the first code point of the match, and the length of the same.

◆ Matches()

bool regexp::Regexp_engine::Matches ( int  start,
int  occurrence 
)

Tries to find match number occurrence in the string, starting on start.

Parameters
start       Start position, 0-based.
occurrence  Which occurrence to replace. If zero, replace all occurrences.

◆ Replace()

const std::u16string & regexp::Regexp_engine::Replace ( const std::u16string &  replacement,
int  start,
int  occurrence 
)

Iterates over the subject string, replacing matches.

Parameters
replacement  The string to replace matches with.
start        Start position, 0-based.
occurrence   Which occurrence to replace. If zero, replace all occurrences.
Returns
Reference to the result of the operation. It is guaranteed to stay intact until a call is made to Reset().
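
The meaning of start and occurrence can be illustrated with a stand-alone sketch. It deliberately swaps in std::regex on std::string for ICU on UTF-16, inserts the replacement literally, and uses a hypothetical free function ReplaceOccurrence, so it shows the parameter semantics only and not this class's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Illustration only: uses std::regex on std::string instead of ICU on UTF-16.
// Replaces match number `occurrence` found at or after `start` (0-based);
// occurrence == 0 replaces every match from `start` onwards.
inline std::string ReplaceOccurrence(const std::string &subject,
                                     const std::regex &pattern,
                                     const std::string &replacement,
                                     std::size_t start, int occurrence) {
  assert(start <= subject.size());
  std::string result = subject.substr(0, start);  // head is copied verbatim
  auto copied_up_to = subject.cbegin() + static_cast<std::ptrdiff_t>(start);
  int seen = 0;
  for (std::sregex_iterator it(copied_up_to, subject.cend(), pattern), end;
       it != end; ++it) {
    ++seen;
    if (occurrence != 0 && seen != occurrence) continue;
    result.append(copied_up_to, (*it)[0].first);  // text before this match
    result += replacement;                        // literal, no $1 expansion
    copied_up_to = (*it)[0].second;
    if (occurrence != 0) break;
  }
  result.append(copied_up_to, subject.cend());  // tail after the last match
  return result;
}
```

With the subject "abcabcabc" and the pattern "b", a start of 0 and an occurrence of 2 gives "abcaXcabc" for the replacement "X"; a start of 2 and an occurrence of 1 gives the same result, because the first "b" lies before the start position.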

◆ Reset()

void regexp::Regexp_engine::Reset ( const std::u16string &  subject)

Resets the engine with a new subject string.

This also clears the replacement buffer, see Replace().

Parameters
subject  The new string to match the regular expression against.

◆ SpareCapacity()

int regexp::Regexp_engine::SpareCapacity ( ) const
inline

The spare capacity in the replacement buffer, given in code points.

ICU communicates via a capacity variable, but we like to use an absolute position instead, and we want to keep a single source of truth, so we calculate it when needed and assert that the number is correct.
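
A minimal sketch of this bookkeeping, using a hypothetical ReplaceBuffer struct rather than the class's own members: only the write position is stored, and the capacity is derived from it and checked on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical miniature of the buffer bookkeeping: only the write position
// is stored, and the spare capacity is derived from it on demand.
struct ReplaceBuffer {
  std::u16string data;
  int pos = 0;  // next index where data may be written

  int SpareCapacity() const {
    // The position must always lie inside the buffer (or at its end).
    assert(pos >= 0 && static_cast<std::size_t>(pos) <= data.size());
    return static_cast<int>(data.size()) - pos;
  }
};
```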

◆ StartOfMatch()

int regexp::Regexp_engine::StartOfMatch ( )
inline

Returns the start position in the input string of the text where Matches() found a match.

◆ TryToAppendReplacement()

int regexp::Regexp_engine::TryToAppendReplacement ( const std::u16string &  replacement)
private

Preflight function: If the buffer capacity is adequate, the replacement is appended to the buffer, otherwise nothing is written.

Either way, the replacement's full size is returned.

◆ TryToAppendTail()

int regexp::Regexp_engine::TryToAppendTail ( )
private

Tries to append the part of the subject string after the last match to the buffer.

This is a preflight function: If the buffer capacity is adequate, the tail is appended to the buffer, otherwise nothing is written. Either way, the tail's full size is returned.

Friends And Related Function Documentation

◆ regexp_engine_unittest::Mock_regexp_engine

friend class regexp_engine_unittest::Mock_regexp_engine
friend

Member Data Documentation

◆ m_current_subject

std::u16string regexp::Regexp_engine::m_current_subject
private

◆ m_error_code

UErrorCode regexp::Regexp_engine::m_error_code = U_ZERO_ERROR
private

◆ m_re

URegularExpression* regexp::Regexp_engine::m_re
private

Our handle to ICU's compiled regular expression, owned by instances of this class.

URegularExpression is a C struct, but this class follows RAII and initializes this pointer in the constructor and cleans it up in the destructor.
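
The RAII pattern for a C handle can be sketched as follows. The C-style API here (CHandle, handle_open, handle_close, LiveHandles) is entirely hypothetical and stands in for a library handle such as URegularExpression; disabling copies is a choice made for this sketch, not something the documentation above states about this class.

```cpp
#include <cassert>

// Hypothetical C-style API standing in for a library handle; none of these
// names come from ICU.
struct CHandle { int id; };
inline int &LiveHandles() { static int count = 0; return count; }
inline CHandle *handle_open(int id) {
  ++LiveHandles();
  return new CHandle{id};
}
inline void handle_close(CHandle *h) {
  if (h != nullptr) {
    --LiveHandles();
    delete h;
  }
}

// RAII owner: acquires the handle in the constructor and releases it in the
// destructor. Copying is disabled so the handle is never closed twice.
class HandleOwner {
 public:
  explicit HandleOwner(int id) : m_handle(handle_open(id)) {}
  ~HandleOwner() { handle_close(m_handle); }
  HandleOwner(const HandleOwner &) = delete;
  HandleOwner &operator=(const HandleOwner &) = delete;

  int id() const { return m_handle->id; }

 private:
  CHandle *m_handle;
};
```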

◆ m_replace_buffer

std::u16string regexp::Regexp_engine::m_replace_buffer
private

◆ m_replace_buffer_pos

int regexp::Regexp_engine::m_replace_buffer_pos = 0
private

This is always the next index in m_replace_buffer where ICU can write data.


The documentation for this class was generated from the following files: