WL#7589: Query Rewrite Plugins

Status: Complete

Description
Requirements
High Level Architecture
Low Level Design

Query Rewrite Framework
=======================

Several parties (e.g., support) have raised interest for a query rewrite
plugin for the MySQL server.  That is, a way for users to specify how certain
queries that arrives at the server should be rewritten before they are
processed by the server.

The main motivation for this worklog is for users to be able to
enforce/prevent a certain query plan by adding hints to a query.  This will
make it possible to improve the performance of queries coming from a database
application that cannot easily be modified.  Especially, this could be used to
work around performance regressions observed when upgrading to a new version
of the server.  This is particularly interesting in the context of the cost
model project which will likely cause some performance regressions.

Another application of a query rewrite plugin that has been proposed is to
detect and intercept sql injections. Many other use cases have also been
mentioned. However, these use cases will not be considered for this work
log. The sole focus will be to enable users to add optimizer hints to incoming
queries.

This work has two main parts:

1. A general query rewrite plugin INTERFACE that enables users to
   deploy their own plugin in order to rewrite their queries.

2. A query rewrite PLUGIN that gives users a way to specify query
   transformations without having to write their own plugin.

Pre Parse Query Rewrite Plugin Interface Functional Requirements
Post Parse Query Rewrite Plugin Interface Functional Requirements
Pre and Post Parse Query Rewrite Plugin Interface Non-Functional Requirements
Rewriter Plugin Functional Requirements
Supplied Post Parse Query Rewrite Plugin Non-Functional Requirements

Pre Parse Query Rewrite Plugin Interface Functional Requirements

F-1: A user shall be able to write a plugin that rewrites query strings before parsing.
F-2: The server must emit a note if a plugin rewrites a query.

Post Parse Query Rewrite Plugin Interface Functional Requirements

F-3: A user shall be able to write a plugin that rewrites query parse trees immediately after parsing.
F-4: It shall be possible for a plugin to activate digest calculation regardless of whether the feature is deactivated in performance_schema.
F-6: There shall be a service allowing a plugin to traverse the parse tree.
F-7: There shall be a service allowing a plugin to obtain a normalized version of the current query. See http://dev.mysql.com/doc/refman/5.6/en/performance-schema-statement-digests.html.
F-8: There shall be a service allowing a plugin to request the server to parse a query string.
F-9: The server must emit a note if a plugin rewrites a query.

Pre and Post Parse Query Rewrite Plugin Interface Non-Functional Requirements

NF-1: All log files will contain the rewritten query, possibly with passwords obfuscated if applicable, except if the server is started with the --log-raw option. In this case the original query will appear in the general query log.

Rewriter Plugin Functional Requirements

F-10: Users will mainly communicate with the plugin by modifying a designated table.
F-11: The plugin will mainly communicate with users by modifying the table mentioned in F-10.
F-12: It shall be possible for a user to deactivate a rule without deleting it.
F-13: Query rewrites are defined by rules, which consist of a pattern and a replacement.
F-14: The rules in the rules table will be loaded into the plugin when the user calls a designated stored procedure. This procedure will commit the current transaction. If loading of rules fails, the procure will signal an error condition.

Supplied Post Parse Query Rewrite Plugin Non-Functional Requirements

NF-3: The time to reject a query as a non-candidate for rewriting must not exceed 5% of total execution time.
NF-4: The plugin must keep an accurate atomic counter for how many queries have been rewritten since loading the plugin.

Pre Parse Query Rewrite Plugin Interface Specification
Post Parse Query Rewrite Plugin Interface Specification
Pre and Post Parse Query Rewrite Plugin Interface Specification
Rewriter Plugin Interface Specification
Future Work

Pre Parse Query Rewrite Plugin Interface Specification

I-1: The interface allows for plugins to rewrite query strings before they are passed to the parser. A plugin will receive the query string and may produce a different string as its output.

Post Parse Query Rewrite Plugin Interface Specification

I-2: The interface allow for plugins to rewrite queries based on their parse trees. A plugin will be given read/write access to the parse tree along with primitives to traverse the tree.

Pre and Post Parse Query Rewrite Plugin Interface Specification

I-3: New SQL Note: When a query is rewritten, a note e.g. "Query '%s' rewritten to '%s' by plugin: %s." is issued. This note does not hide any previous warnings or errors. A somewhat unexpected consequence is that rewritten diagnostics statements, e.g. SHOW WARNINGS will always show a note if the SHOW WARNINGS statement itself is rewritten.

Rewriter Plugin Interface Specification

I-4: The interface to the plugin consists of a table named rewrite_rules in the schema query_rewrites.

I-5: The plugin looks for the columns PATTERN, PATTERN_DATABASE, REPLACEMENT, ENABLED, and MESSAGE. In order for it to function properly, all columns must be present.

I-6: The plugin will make patterns from the PATTERN column, qualifying unqualified table references with PATTERN_DATABASE. These will be rewritten into replacements as specified in the REPLACEMENT column, if the ENABLED column says 'Y'. Rewrites are carried out according to the rewrite procedure.

I-7: Errors encountered while loading the table into memory are reported by updating the MESSAGE column, if it exists. If not, the message is lost.

I-8: Errors encountered during rewrites are also recorded by setting the status variable Rewriter_reload_error to ON.

I-9: The number of queries that have been rewritten since loading the plugin is available in the status variable Rewriter_number_rewritten_queries.

I-10: The rules that are successfully parsed and pass the semantic checks in the parser are loaded into memory by the plugin. The number of rules currently in memory is recorded in the status variable Rewriter_number_loaded_rules.

I-11: The rules table is loaded when the user calls the stored procedure query_rewrite.flush_rewrite_rules().

I-12: The number of times that the rules table has been loaded is available in the status variable Rewriter_number_reloads.

I-13: An SQL script is supplied with the plugin that creates the rules table and stored procedure.

I-14: The table in I-13 has all privileges revoked except for the root user.

Future Work

This plugin is a product in itself that will surely evolve over time. During the last six months several requests for additional features have come in from support and customers. They will not make it into this worklog, but they should most likely be implemented later.

Unique rules

Currently several rules can have the same pattern, and which rule will be applied is undefined. This is likely something that nonadvanced customers will be confused by, since the normalization of rules is not obvious. Enforcing unique rules is just like a UNIQUE constraint on the NORMALIZED_PATTERN column, which is optional and therefore not documented here.

Likely it must be investigated whether the plugin or the rules table should enforce the unique constraint. or both. Either way a helpful message in the MESSAGE column is also needed.

IN lists

Apparently it is a popular request that dynamic in-lists should be handled in rewrite rules. E.g. it should be possible to define a rule such as

SELECT something FROM sometable WHERE somecol IN (...)

SELECT somethingelse FROM someothertable WHERE someothercol IN (...)

In this case the following would happen

the ... symbol matches any list of expressions
The in-list in the query should be copied verbatim the new query.

The problems that need solving in order to overcome this problem are numerous:

There has to be a ... token that is parseable but that currently has no meaning. It should reasonably be something that is syntactically accepted but an error semantically. One option would be to simply extend the parser. Another is to make col IN (?) match any in-list. This makes sense since there can never be an Item representing this expression, as it would be transformed into col = ?.
The parser service would have to handle the in-predicate. Currently this service only handles literals.
The Rewriter plugin would have to know what in-predicates and in-lists are, so that they could be matched.
The function printing normalized queries would have to be changed to print in-lists as IN (...).

Per-rule counter of rewritten queries

This is another request which intuitively makes sense. Simply put, the rules table would have an extra column with a counter that gets incremented each time a rule applies. In practice this would mean a performance bottleneck, however. The rules table is only read when loading rules. An option would be to add a new UDF for writing this information to the table.

Character sets

The problem with different character sets is that they influence digests. Character sets can be set globally or per session. This gets troublesome when rules are loaded as there is a dedicated session loading them. The session picks up the current global character set, so this works transparently, but if the global character set is changed, rules have to be reloaded. What will not work, however, is sessions that have different session-level character sets. This will likely have to be fixed the same way as session-level 'current database' works. I.e. a character set column needs to be added to the rules table which the plugin can read. Then it will need a service to set its rule loading session's character set accordingly when loading the rule. A NULL value in the column would mean use the global character set.

Overview
Declaring plugins in MySQL
Part 1. Pre Parse Query Rewrite Plugin Interface
- Plugin Declaration
- Errors and Warnings
Part 2. Example Pre Parse Query Rewrite Plugin
Part 3. Post Parse Query Rewrite Plugin Interface
Part 4. The Rewriter Plugin

Overview

This document describes the final design of the Query Rewrite Framework.

This work consists of five parts:

A pre parse query rewrite plugin interface
An example implementation of a pre parse query rewrite plugin.
A post parse query rewrite plugin interface
A fully functional post parse query rewrite plugin.
A benchmark measuring the performance of (3) and (4) used in conjunction.

Declaring plugins in MySQL

A plugin is declared by calling the C macro mysql_declare_plugin(NAME). There is a bit of platform-dependent macro magic going on, but in essence a declaration looks like the following:

mysql_declare_plugin(<name>)
{
  <type code>,
  <specific plugin declaration>,
  <plugin name>,
  <author>,
  <description>,
  <license>,
  <init function pointer>,
  <deinit function pointer>,
  <version>,
  <status variables declarations>,
  <system variables declarations>,
  <configuration options>,
  <flags>
}
mysql_declare_plugin_end;

Each plugin type has a specific plugin declaration, which is a C struct.

Part 1. Pre Parse Query Rewrite Plugin Interface

This interface can be used for rewriting a query from the actual query string arriving at the server. A plugin will receive a const char* pointer containing the current query string. If it rewrites the query, it must set the flag FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN.

Plugin Declaration

A pre parse query rewrite plugin declaration consists of the parts:

The interface version
The entry point (function pointer) to the plugin.
A deinit function to free the rewritten query.

The struct is defined as follows:

struct st_mysql_rewrite_pre_parse
{
  int interface_version;
  int (*rewrite)(Mysql_rewrite_pre_parse_param *param);
  int (*deinit)(Mysql_rewrite_pre_parse_param *param);
};

The function rewrite is called with a Mysql_rewrite_pre_parse_param struct, defined as follows:

/// Structure that is passed during each step of a rewriting.
typedef struct Mysql_rewrite_pre_parse_param
{
  /**
    Indicate the status of the current rewrite.
    @see FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN
    @see FLAG_REWRITE_PLUGIN_IS_PREPARED_STATEMENT
  */
  int flags;

  /// The current session.
  MYSQL_THD thd;

  /// Pointer left to the plugin to store any necessary info as needed.
  void* data;

  /// The query potentially to be rewritten.
  const char* query;

  /// Length of query potentially to be rewritten.
  size_t query_length;

  /// The rewritten query, if applicable.
  char* rewritten_query;

  /// Length of the rewritten query, if applicable.
  size_t rewritten_query_length;
} Mysql_rewrite_pre_parse_param;

Errors and Warnings

When a query is rewritten, an SQL note with a new error type ER_QUERY_REWRITTEN is issued.

Part 2. Example Pre Parse Query Rewrite Plugin

An example plugin for the pre parse rewrite plugin interface is supplied. It was originally written by Padraig O'Sullivan under contributor agreement.

Part 3. Post Parse Query Rewrite Plugin Interface

This interface can be used for rewriting a query based on the digest of a query and its parse tree.

Design goals

The number one design goal of the post parse plugin interface is that unaffected queries suffer as little extra overhead as possible. Even if this goal is accomplished at the expense of matching queries being rewritten slower than they would have to, it can still be tolerated.

Plugin declaration

A post parse query rewrite plugin declaration consists of the parts:

The interface version
Whether the plugin is going to ask for a statement digest right after parsing. See section #Statement Digest.
The entry point (function pointer) to the plugin.

The struct is defined as follows:

struct st_mysql_rewrite_post_parse
{
  int interface_version;
  int needs_statement_digest;
  int (*rewrite)(Mysql_rewrite_post_parse_param *param);
};

The function rewrite is called with a Mysql_rewrite_post_parse_param struct, defined as follows:

/// Must be set by a plugin if the query is rewritten.
#define FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN 1

/// Is set by the server if the query is prepared statement.
#define FLAG_REWRITE_PLUGIN_IS_PREPARED_STATEMENT 2

/// Structure that is passed during each step of a rewriting.
typedef struct Mysql_rewrite_post_parse_param
{
  /**
    Indicate the status of the current rewrite.
    @see FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN
    @see FLAG_REWRITE_PLUGIN_IS_PREPARED_STATEMENT
  */
  int flags;

  /// The current session.
  MYSQL_THD thd;

  /// Pointer left to the plugin to store any necessary info as needed.
  void* data;
} Mysql_rewrite_post_parse_param;

Parser Service

Part of this interface is a service that lets any plugin access certain aspects of the parsing stage. The interfaces offered are as follows:

typedef
int (*parse_node_visit_function)(void* item, int type, unsigned char* arg);

Callback function which is invoked for every node in the parse tree during traversal by mysql_parser_visit_tree().

typedef
int (*sql_condition_handler_function)(MYSQL_THD thd,
                                      int sql_errno,
                                      const char* sqlstate,
                                      const char* msg);

This function pointer may be passed as an argument to some functions of the parser service interface. If supplied, it will be called whenever an SQL condition is raised. The function may then choose to flag the condition as handled by returning non-zero. In this case the condition will not end up in the diagnostics area.

MYSQL_THD mysql_parser_current_session(): Returns the session that is associated with the current thread.

MYSQL_THD mysql_parser_open_session(): Opens a new session and associates it with the current thread.

void mysql_parser_close_session(MYSQL_THD thd, MYSQL_THD old_thd): Closes a thread and makes old_thd the current thread's session.

void mysql_parser_set_current_database(MYSQL_THD thd, const MYSQL_LEX_STRING db): Sets the session's current database.

int mysql_parser_parse(MYSQL_THD thd, const MYSQL_LEX_STRING query, unsigned char is_prepared, sql_condition_handler_function handle_condition, void *condition_handler_state): Parses a string and stores the parse tree in a supplied session object. If an SQL condition occurs, the function handle_condition is called. See sql_condition_handler_function for interface.

int mysql_parser_get_statement_type(MYSQL_THD thd): Returns a code representing the statement type.

int mysql_parser_get_statement_digest(MYSQL_THD thd, uchar *digest): Returns the digest computed while parsing the latest string. To be used for matching queries to patterns.

int mysql_parser_get_number_params(MYSQL_THD thd): Returns the number of parameter markers in the latest string that was parsed.

int mysql_parser_extract_prepared_params(MYSQL_THD thd, int *positions): Returns an array of character offset positions of parameter markers within the last query string that was parsed.

int mysql_parser_visit_tree(MYSQL_THD thd, parse_node_visit_function processor, unsigned char* arg): Lets a plugin traverse the parse tree of the latest string that was parsed. A function pointer to a visiting function is supplied.

MYSQL_LEX_STRING mysql_parser_item_string(MYSQL_ITEM item): Returns the string value of an expression. The string is dynamically allocated by the server library and must hence be freed by the server, see below.

void mysql_parser_free_string(MYSQL_LEX_STRING string): Frees a string generated by the above interface.

MYSQL_LEX_STRING mysql_parser_get_query(MYSQL_THD thd): Returns the original string of the latest query that was parsed.

MYSQL_LEX_STRING mysql_parser_get_normalized_query(MYSQL_THD thd): Returns the normalized string of the latest query that was parsed. The normalized query format is explained in http://dev.mysql.com/doc/refman/5.7/en/performance-schema-statement-digests.html.

Normalization is done by performance_schema as well, but it is subject to a (currently) non-configurable maximum length - in parser tokens - so we can't use it (yet). For this reason the parser service has a specialized implementation that allows for arbitrary length strings.

Errors and Warnings

This section will discuss the approach taken toward warnings and errors raised by the interface.

Rewriting rules

When a query is rewritten, an sql note with a new condition of type ER_QUERY_REWRITTEN is issued. This assumes that the plugin has raised the flag FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN.

Redesign of Error Handling

Error handling in MySQL uses stack-able diagnostics areas as specified in the SQL standard. They contain warnings, errors, notes etc, commonly referred to as 'conditions'. The redesign in this section does not aim to change the behavior of non-rewritten statements, merely to enable the rewrite plugin interfaces to insert conditions in the diagnostics area.

What follows below is a very simplified view of how the server currently handles diagnostics areas. Only the details relevant to this work are shown.

The server maintains two diagnostics areas (they will be referred to as DA from now on), a statement DA and a parser DA. The statement DA is the one whose contents are displayed using a <get diagnostics statement> such as SHOW WARNINGS or GET DIAGNOSTICS. For all non-diagnostic statements, the DA is cleared in the beginning of statement execution or, if there were parse conditions, right after parsing. The cleared parser DA is used during parsing (on top of the stack), so as not to interfere with the statement DA, because the latter will contain conditions from the previous statement at this point. Obviously, the server can't know if the statement is a diagnostics statement until after parsing, so the statement DA must be kept intact until this is known. The relevant code flow is shown below, functions opened up and in-lined:

   Server start-up:
     Push statement DA

   Query Processing:
     mysql_parse() {
       parse_sql() {
         clear parser DA
         push parser DA
         parse() {
           ...
           if the statement is a diagnostics statement
             set DA_KEEP_DIAGNOSTICS
           ...
         }
         if there are conditions in parser DA {
           clear statement DA
           copy conditions from parser DA to statement DA
           set DA_KEEP_PARSE_ERROR flag
         }
         pop DA
       }
     }

     mysql_execute_command() {
       ...
       clear DA unless DA_KEEP_PARSE_ERROR or DA_KEEP_DIAGNOSTICS
       ...
     }

We make some observations:

The sole purpose of the parser DA is to keep conditions that arise during parsing from polluting the statement DA until we know what type the current statement is. Its lifetime goes beyond the scope of parse_sql() for efficiency reasons only; a DA has its own mem_root.

The design does not take into account that conditions may be raised during the parsing of diagnostics statements. If that happened, all other conditions would be cleared, and only the conditions about the diagnostics statement itself would reach the client.

The introduction of a framework for query rewrites before and after parsing place quite a few additional requirements on the error handling. We will start by listing the requirements for each interface and conclude by listing the additional requirements that the interfaces place on each other.

Pre parse rewrite interface requirements:

There shall be a note when a pre parse plugin rewrites a query.
This note must not clear any other conditions.

Post parse rewrite interface requirements:

There shall be a note when a post parse plugin rewrites a query.
This note must not clear any other conditions.

Requirement on the inter-operation of the interfaces:

The post parse interface will act on the output of the pre parse interface, and this must be clear in the 'rewritten' notes.

In all other aspects, the error handling will work as it always has.

The error handling will be redesigned to support these requirements. The new design is as follows:

   Server start-up:
     Push statement DA

   Query Processing:
     mysql_parse() {
       invoke_pre_parse_rewrite_plugins() {
         push parser DA
         invoke all pre parse plugins
         pop DA
       }
       parse_sql() {
         push parser DA without clearing it
         parse() {
           ...
           if the statement is a diagnostics statement
             set DA_KEEP_DIAGNOSTICS
           ...
         }
         if there are conditions in parser DA {
           clear statement DA only if this is not a diagnostics statement
           copy conditions from parser DA to statement DA
           set DA_KEEP_PARSE_ERROR flag
           clear the parser DA
         }
         pop DA
       }
       invoke_post_parse_rewrite_plugins() {
         Initialize dedicated plugin DA
         push plugin DA
         invoke all post parse plugins
         if there are conditions in plugin DA {
           if the statement after rewriting is not a diagnostics statement and
              DA_KEEP_PARSE_ERROR is *not* set
             clear statement DA
           copy conditions from plugin DA to statement DA
           set DA_KEEP_PARSE_ERROR flag
         }
         pop DA
       }
     }

     mysql_execute_command() {
       ...
       clear statement DA unless DA_KEEP_PARSE_ERROR or DA_KEEP_DIAGNOSTICS
       ...
     }

The major change is that the clearing of the parser DA is now moved to after copying the conditions to the statement DA. This way, the new design acknowledges that the parser DA has a conceptual life span that ends when conditions are transferred to the statement DA, which simplifies things.

The pre parse interface simply uses the parser DA so that conditions fired by a plugin will appear to have happened during parsing.

The post parse interface uses an idea similar to what parse_sql() does, but it does not reuse the parser DA. The reason is not only clarity but also purely technical: as mentioned before the parser DA is a singleton that is reused for each statement, and furthermore the DA stack is implemented using intrusive pointers. This makes it impossible to push the same DA twice. Hence if we are to have any hope of a plugin invoking the parser, we had better not have the parser DA on the stack. Hence the post parse rewrite plugins will use a separate plugin DA for error reporting. Also the note about queries being rewritten come to this DA first, even though they are raised by the interface rather than any plugin.

Part 4. The Rewriter Plugin

This work will supply an implementation of a post parse rewrite plugin called "Rewriter".

Design Goals

A design goal of the post parse plugins is that unaffected queries suffer as little extra overhead as possible. Even if this causes rewritten queries to be even less efficient, it can still be tolerated.

Syntax of Rewrite Rules

The plugin rewrites queries by applying rewrite rules. A rewrite rule consists of two parts:

A pattern, which a query must match in order to be rewritten.
A replacement, which will take the matched-out portions of the pattern and rewrite them according to the replacement.

The syntax for specifying both patterns and replacement has been chosen to be the same as is currently used for prepared statements. This offers several advantages:

A pattern can be parsed using the normal sql parser, no extra parser is needed.

A replacement can be parsed using the normal sql parser, no extra parser is needed.

We can leverage existing digests currently calculated by performance schema to index the rules for efficient matching.

As future work, the prepared statement framework could be used to prepare statements from replacements and then simply bind its parameter markers to values from matched queries. Investigating this is outside the scope of this work, however.

Matching Rules to Queries

For efficiency reasons, matching rewrite rules to queries is done in three phases with increasing granularity:

Comparison of digests.
Comparison of parse trees.
Comparison of literals.

The first comparison is done as part of hash table lookup in the in-memory rules table. It is obviously susceptible to false positives, a.k.a. hash collisions, but serves a quick-reject test. The second is slightly more elaborate and consist of normalizing the query and the pattern. The normalized format is explained in the manual: http://dev.mysql.com/doc/refman/5.7/en/performance-schema-statement- digests.html.

In the normalized format, all literals are replaced by a '?'. This means that in practice comparing the normalized strings will give the same answer as comparing the structure of two parse trees. This is fortunate, as we have no means of keeping two parse trees 'live' at the same time without creating two different THD objects. The parse tree is far too tangled with the THD object to permit this. We streamline the process by normalizing the pattern as it is loaded into memory, hence we only need to do the following to implement phase 2:

Normalize the current query.
Compare the normalized pattern and the query with strcmp().

What remains in the third phase is simply to compare literals values, which are the leaves in the parse tree. Since we are traversing the parse tree when generating the new query (see the following sections), we will visit all leaves anyway, and we are better off delaying the comparison and do both at the same time.

Execution of Rewrite Rules

Any query that can be parsed as a prepared statement can be used as a pattern or replacement. However, a replacement cannot contain more parameter markers than its corresponding pattern, for reasons which become apparent below.

A pattern can look like the following example

SELECT <select list>
FROM <table expression>
WHERE table1.col1 = ? AND table2.col2 = ?

A valid corresponding replacement can look like

SELECT <select list>
FROM <table expression>
WHERE table1.col1 = ? AND table.col2 = 'x' AND table2.col2 = 'y'

At the start of the execution of a rule, we know that the parse trees (of the rule and of the current query) have the same structure. See previous section.

If a given query matches a pattern, the query is rewritten by:

Finding all the literals in the query's parse tree, and the corresponding literals in the pattern's parse tree. The below ASCII art gives an example:

          Pattern                 Original Query

            AND                       AND
          /     \                   /     \
        EQ      NEQ               EQ      NEQ
      /   \   /     \           /   \    /    \
     A     ? B       ?         A     3  C      5

We have the following pairs of literals in corresponding positions:

A, A
?, 3
B, C
?, 5

For each of these pairs, we check the following:
1. If the pair consists of '?' in the pattern, we collect the literal from the original query. That is, in the above situation we would collect literals 3 and 5. In the case of prepared statements, the query may contain '?' as well, but this is only accepted if the pattern contains '?' as well.
2. If the literals are equal, we have established that the rule still matches the query, and we simply proceed. If they differ (the query literal may not be '?' now) we fail rewriting, since we have found that the pattern does not match the query after all.
Replacing the '?' symbols in the replacement string with the literals obtained in (2). Note that the order in both extraction and injection is strictly left-to-right.

The algorithm is described in pseudo-code in section #Implementation of Query Rewrites.

Defining Rewrite Rules

Rewrite rules are defined by inserting a row into an MySQL table having the columns PATTERN and REPLACEMENT. For ease of use, the table also contains a column ENABLED of type CHAR(1).

The plugins reads the enabled rewrite rules from the table. It will read the columns 'pattern' and 'replacement' if the column 'enabled' is equal to 'Y'. No other assumptions are made about the table.

In-Memory Rules

For efficiency reasons, the rules table will be read into an in-memory hash table using the statement digest as hash keys. A matching digest is a necessary-but-insufficient condition for matching a pattern to a query, and hence constitutes a quick-reject test, in the spirit of the design goal, see section 4.1 Design Goals.

If we find a rule by hash lookup on the digest, the current query is printed in normalized format using the parser service (see section #Parser Service,) and then compared with the rule pattern's normalized format. In case of digest collisions, we iterate through all rules with a matching digest, performing string comparison for each one until a match is found.

The hash table is protected from concurrent modification by a global mutex. There are no performance considerations since this is only meant to be done by a single user and infrequently.

Loading rules into memory

Procedure for Loading

Rewrite rules are loaded into memory by calling the UDF load_rewrite_rules(), declared in the same library as the Rewriter plugin. This UDF will do the following:

Obtain the current session from the Parser Service.
Take a global mutex on its in-memory hash table.
Open a new session using the Parser Service and set it as the thread's current session.
Scan the rules table using a dedicated Rules Table Service (see below), and for each row:
- Parse the pattern.
- Get the pattern's digest.
- Parse the replacement.
- Update the message column, setting it to NULL if there are no errors.
- Unless there were errors, insert the pattern and replacement in the hash table using the digest as hash key.
Commit the transaction.
Update the status variable Rewriter_number_reloads.
Restore the current session.
Release the mutex.

Loading Rules and Character Sets

When the server parses a query, it needs to know which character set the client that sent the query uses. The client's character set can be set in two ways: global and session-local. The variable is called character_set_client, and can be set e.g.

 SET @@global.character_set_client = utf8;

 SET @@session.character_set_client = utf8;

Character sets influence the digest, which means that in order for a rule to match a query, the rule must be parsed (when loaded) with the same character set as the query to be rewritten. Because rules are loaded using a dedicated session, it does not know about session-local settings of character_set_client.

For the global setting, the loading session will copy the value of character_set_client when the session is opened, i.e. in mysql_parser_open_session(), see section #Parser Service. If the global setting is subsquently altered, the rules will cease to match until the rules table is loaded again.

The above can be concluded like so:

Sessions using a character set different than the global character set will never get their queries rewritten.
If the global character set changes, the rules must be reloaded.

Loading Rules and Current Database

Every session can have a "current database". This setting acts like a session variable, which is used to resolve unqualified table names. This would normally not be relevant to query rewrites on a parse tree, but the there are two strong reasons why a query rewrite plugin would be interested in a sessions current database.

Technical reasons.: The current database is resolved in the parser, and the unqualified table names are qualified at parse time. Digests are calculated from the unqualified names, however. There are no traces of this adornment, so the plugin can't detect whether a table name was qualified or unqualified.

Functionality.: It is likely that users want to rewrite queries with unqualified table names. It is also likely that they want this rewrite rules to be sensitive to the session's current database. I.e. they may want to rewrite SELECT * FROM person in database a, but not necessarily in database b.

This is the reason the table rewrite_rules has a pattern_database column. Using the function mysql_parser_set_current_database(), the Rewriter plugin can set current database for each rule before parsing the pattern. Hence we have the following behavior:

Unqualified table names will be resolved in the database from pattern_database. If this column is NULL, there will be a parse error.
Qualified table names will match qualified table names only, because the database name influences the digest.

If it is desired to rewrite a query regardless of whether the table was qualified or not, two rules have to be declared. Unfortunately this leads to combinatorial explosion:

If you want to rewrite all queries SELECT * FROM employee, you need to declare the patterns SELECT * FROM employee, and SELECT * FROM db.employee

If you want to rewrite all queries SELECT * FROM employee, department, you need to declare SELECT * FROM employee, department, SELECT * FROM employee, db.department, SELECT * FROM db.employee, department and SELECT * FROM db.employee, db.department. And so on.

Loading Rules and Transactions

Rules are loaded into the plugin in a dedicated session with its own transaction. This means that depending on the setting of autocommit and isolation level, the plugin may or may not see rules in the table. For this reason, a user should not call the UDF in #Procedure for Loading directly. Instead, the user calls the stored procedure flush_rewrite_rules(). This procedure will

Commit the user's transaction
Call the UDF load_rewrite_rules()
If there is an error message, signal an error
Clear the query cache.

A plugin can neither trigger a COMMIT, SIGNAL nor clear the query cache on its own, so that is why this is done in a stored procedure.

After the transaction is committed, all rules are obviously visible to other sessions. The name has been chosen to be similar to other commands that cause an implicit commit, e.g. FLUSH TABLES.

The reason the query cache is cleared here is explained in #Query Cache.

The benefit of this design is that we need not care about concurrency issues when two sessions are updating the rules table simultaneously. What will happen depends on the autocommit setting and isolation levels, and are well documented in the manual.

The Rules Table Service

The dedicated service for reading the rules table is called rules_table_service, and has a sole reason for existence, namely to allow the Rewriter plugin to access the table query_rewrite.rewrite_rules.

The service offers the following interface:

Cursor open_table(MYSQL_THD thd): Opens the rules table for scanning and returns a Cursor object.

Cursor end(): A past-the-end Cursor. All past-the-end cursors are considered equal when compared with operator ==.

A Cursor is writable and supports the following operations:

bool operator== (const Cursor &other): Equality operator. The only cursors that are equal are past-the-end cursors.

bool operator!= (const Cursor &other): Inequality operator. All cursors are considered different except past-the-end cursors.

Cursor &operator++ (): Advances this Cursor. Any handler errors are ignored.

void set(int colno, const char* str, size_t length): Sets the value of column colno to a string value.

~Cursor(): Closes the table scan if initiated.

The parser service is specified in section #Parser Service.

Implementation of Query Rewrites

When an in-memory rewrite rule is created from a row in the sql table, the following properties are computed and stored:

The pattern is parsed as a prepared statement and the parse tree traversed. A counter is incremented for every literal that appears therein. If the literal is a parameter marker, the current counter value is appended to a list.

The replacement is parsed as a prepared statement and the string positions of parameter markers are recorded and appended to a list. The server already does this for prepared statements, so all the plugin needs to do is keep a copy of the positions.

A query is rewritten by creating a new query, parse it, and continue execution with the new query's parse tree as 'current parse tree'. Creation of a new query is performed using the following algorithm.

tokenNumbers is a list of the token numbers of parameter markers in the pattern's parse tree.

charPositions is a list of the character positions of parameter markers in the replacement's parse tree.

queryParseTree is the parse tree of the query.

replacement is the replacement string.

  buildNewQuery(tokenNumbers, charPositions, queryParseTree, replacement)
  {
    // invariant: startOfChunk contains the start of the next string chunk
    // in 'replacement' that has not yet been appended to the new query string.
    startOfChunk := 0
    newQuery := ""
    foreach endOfChunk in charPositions
      tokenNumber := tokenNumbers.pop()
      newQuery.append replacement[startOfChunk .. endOfChunk - 1]

      patternLiteral := patternParseTree.getLiteralValue(tokenNumber)
      queryLiteral := queryParseTree.getLiteralValue(tokenNumber)
      if patternLiteral = '?' then
        newQuery.append queryLiteral
      else
        fail() unless patternLiteral = queryLiteral

      startOfChunk := endOfChunk + lengthOf('?')

    newQuery.append replacement[startOfChunk..]
    return newQuery
  }

Finally, the new query is parsed by the plugin and FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN is raised.

Status Variables

The Rewriter plugin declares four status variables.

Rewriter_number_rewritten_queries: LONG INTEGER: Number of queries that have been rewritten since the plugin was loaded.

Rewriter_data_error ON/OFF: Value is ON in case an error occurred when loading the rules into memory. The plugin will also have emitted an error in this case.

Rewriter_number_rules_loaded INTEGER: Number of rewrite rules that are currently successfully loaded into memory.

Rewriter_number_rules_reload LONG INTEGER: Number of times that a loading of the rules table into memory has occurred.

System Variables

The plugin declares one system variable.

Rewriter_verbose INT

This variable is for internal debugging use and is subject to change.

Query Cache

Obviously, the query cache will interfere with query rewrite plugins. Still, the cases are relatively easy to handle. The general cases are handled by the query rewrite plugin framework, while the cases specific to the Rewriter plugin are handled by the plugin itself. The table below summarizes the problems and how they are handled.

Problem	How It's Handled By the Framework	How it's Handled By the Rewriter Plugin
A query is in the cache when the plugin is loaded		The install script ends with RESET QUERY CACHE
A rewritten query is put in the cache, then the plugin is uninstalled.	Rewritten queries are never put in the case.
A rewritten query is put in the cache, then the rule is disabled.	Rewritten queries are never put in the case.
A query is in the cache when a rewrite rule matching it is loaded.		The loading procedure ends with RESET QUERY CACHE

Statement Digest

The post parse query rewrite plugin assumes that a statement digest is always available right after parsing. See section 3.2 Plugin declaration.

Using the plugin

The sub-sections within this section will explain how the plugin is used and how errors are handled.

Installation and setup

When installing the plugin using the supplied SQL script, it will create a database, rewrite rules table and a UDF unless they exist. Only the normal warning will be issued if the table already exists.

Errors and Warnings

Missing columns: The plugin makes the following assumptions about the structure of the rules table:

There are columns named PATTERN, PATTERN_DATABASE, REPLACEMENT, ENABLED and MESSAGE.
The MESSAGE column can store a string of at least length 1024.

If some of these assumptions does not hold, the plugin sets the status variable Rewriter_data_error to ON. It will not rewrite queries unless the structure changes.

Rule syntax errors: If there are errors processing pattern or replacement, the plugin will try to write an error message by updating the column MESSAGE in the rule's row, setting ENABLED to 'NO' and set the status variable Rewriter_data_error to ON.

WL#7589: Query Rewrite Plugins

Contents

Pre Parse Query Rewrite Plugin Interface Functional Requirements

Post Parse Query Rewrite Plugin Interface Functional Requirements

Pre and Post Parse Query Rewrite Plugin Interface Non-Functional Requirements

Rewriter Plugin Functional Requirements

Supplied Post Parse Query Rewrite Plugin Non-Functional Requirements

Contents

Pre Parse Query Rewrite Plugin Interface Specification

Post Parse Query Rewrite Plugin Interface Specification

Pre and Post Parse Query Rewrite Plugin Interface Specification

Rewriter Plugin Interface Specification

Future Work

Unique rules

IN lists

Per-rule counter of rewritten queries

Character sets

Contents

Overview

Declaring plugins in MySQL

Part 1. Pre Parse Query Rewrite Plugin Interface

Plugin Declaration

Errors and Warnings

Part 2. Example Pre Parse Query Rewrite Plugin

Part 3. Post Parse Query Rewrite Plugin Interface

Design goals

Plugin declaration

Parser Service

Errors and Warnings

Rewriting rules

Redesign of Error Handling

Part 4. The Rewriter Plugin

Design Goals

Syntax of Rewrite Rules

Matching Rules to Queries

Execution of Rewrite Rules

Defining Rewrite Rules

In-Memory Rules

Loading rules into memory

Procedure for Loading

Loading Rules and Character Sets

Loading Rules and Current Database

Loading Rules and Transactions

The Rules Table Service

Implementation of Query Rewrites

Status Variables

System Variables

Query Cache

Statement Digest

Using the plugin

Installation and setup

Errors and Warnings