WL#12668: Pre-parse plugin for DDL statement clause filtering
Affects: Server-8.0
—
Status: Complete
In various situations when restoring output from a dump file, it is relevant to filter out certain clauses from DDL statements. This is useful, e.g. when restoring a dump file into an environment where encryption will not be applied. In such a case, we would like to filter out the encryption clauses from the CREATE TABLE statements. The purpose of this WL is to design and implement a plugin which does simple string replacement of DDL statements from dump files based on regular expression matching. This will be an audit type plugin catching and handling pre-parse events. It should be noted from the beginning that since the plugin is based on regular expression matching, it is clear that there may be situations where the plugins decides to replace occurrences that should not be replaced, e.g. if the search pattern is matched within a comment.
Functional requirements ----------------------- Note: The regular expressions below are defined using the ECMA script syntax as used by the C++ regex package (http://www.cplusplus.com/reference/regex/ECMAScript/). FR0. DDL statements are expected to be generated by SHOW CREATE TABLE. This means that complicating issues such as leading whitespace, arbitrary comments, etc. can be ignored. FR1. Only statements where the first character in the query string is 'c' or 'C' shall be considered for rewrite. FR2. Statements shall be considered for rewrite if and only if they match the following regular expression case insensitively: "^CREATE\s+TABLE" This also covers FR1 of course, but for performance reasons, the check in FR1 is done before FR2 to weed out statements that do not need rewriting as early as possible. FR3. Statements considered for rewrite shall have their DIRECTORY clauses removed and replaced by a single space (to maintain proper separation of clauses), if matched by the following regular expression case insensitively: "\s*,?\s*(DATA|INDEX)\s+DIRECTORY\s*?=?\s*?'[^']+?'\s*,?\s*" FR4. Statements considered for rewrite shall have their ENCRYPTION clauses removed and replaced by a single space (to maintain proper separation of clauses), if matched by the following regular expression case insensitively: "\s*,?\s*ENCRYPTION\s*?=?\s*?'[NY]?'\s*,?\s*" FR5. Multi line statements shall be handled correctly, i.e., clauses shall be removed as expected regardless of which line they are located in. FR6. Per partition directory clauses shall be handled correctly, i.e., the DATA|INDEX DIRECTORY clause shall be removed both at the table level and for individual partitions. Non-functional requirements --------------------------- NFR1. Query execution time shall not be significantly affected (typically <10% increase in average execution time for load as specified in the "Non functional test" in the QA notes below). NFR2. The implementation of the plugin shall be unit-testable.
Overall plugin design --------------------- The plugin is defined by an audit plugin descriptor and a generic plugin descriptor: /* Audit plugin descriptor. */ static struct st_mysql_audit ddl_rewriter_descriptor = { MYSQL_AUDIT_INTERFACE_VERSION, /* interface version */ NULL, /* release_thd() */ rewrite_ddl, /* event_notify() */ { 0, 0, (unsigned long)MYSQL_AUDIT_PARSE_ALL, } /* class mask */ }; /* Plugin descriptor */ mysql_declare_plugin(audit_log){ MYSQL_AUDIT_PLUGIN, /* plugin type */ &ddl_rewriter_descriptor, /* type specific descriptor*/ "ddl_rewriter", /* plugin name */ "Oracle", /* author */ "Rewrite of DDL statements.", /* description */ PLUGIN_LICENSE_GPL, /* license */ plugin_init, /* plugin initializer */ NULL, /* plugin check uninstall */ NULL, /* plugin deinitializer */ 0x0001, /* version */ NULL, /* status variables */ NULL, /* system variables */ NULL, /* reserved */ 0 /* flags */ } mysql_declare_plugin_end; The plugin initializer will register a performance schema key to track memory usage, as shown below. Performance schema instrumentation to track memory allocation ------------------------------------------------------------- static PSI_memory_key key_memory_ddl_rewriter; static PSI_memory_info all_rewrite_memory[] = { {&key_memory_ddl_rewriter, "ddl_rewriter", 0, 0, PSI_DOCUMENT_ME}}; static int plugin_init(MYSQL_PLUGIN) { const char *category = "rewriter"; int count = static_cast(array_elements(all_rewrite_memory)); mysql_memory_register(category, all_rewrite_memory, count); return 0; /* success */ } Handling rewrite events ----------------------- When notified about a new event, the plugin will first check whether this is an event of the appropriate type with the expected query string, and make an early exit if we don't need further analysis of the event, along the lines of the following pseudo code (note that since this is just pseudo code, e.g. variable types are not necessarily consistent and correct): if (event_class != MYSQL_AUDIT_PARSE_CLASS || event_parse->event_subclass != MYSQL_AUDIT_PARSE_PREPARSE || event_parse->query.length == 0 || (event_parse->query[0] != 'C' && event_parse->query[0] != 'c') || !regex_search(event_parse->query, "^CREATE\s+TABLE", icase)) return 0; We now know this is pre-parsing of a CREATE TABLE statement. So we can go ahead with replacements of DATA|INDEX DIRECTORY = '...' and ENCRYPTION = '...', but note: - Optional preceding and trailing commas. - Optional assignment operator. - Path enclosed in single quotes. - Use of ungreedy patterns to get correct grouping. The regex definition and replacement is done according to the following pseudo code: std::regex directory_option( "\s*,?\s*(DATA|INDEX)\s+DIRECTORY\s*?=?\s*?[\"'][^\"']+?[\"']\s*,?\s*" std::regex::icase); rewritten_query = std::regex_replace(event_parse->query, directory_option, " "); std::regex encryption_option( "\s*,?\s*ENCRYPTION\s*?=?\s*?[\"'][NY]?[\"']\s*,?\s*", std::regex::icase); rewritten_query = std::regex_replace(rewritten_query, encryption_option, " "); Finally, if there was indeed a rewrite, we record the rewritten query in the event that was broadcasted (note that memory allocation with performance schema instrumentation, and string handling, are not included in the pseudo code below): event->rewritten_query->str = rewritten_query.c_str(); event->rewritten_query->length = rewritten_query.length(); *(reinterpret_cast (event_parse->flags)) |= MYSQL_AUDIT_PARSE_REWRITE_PLUGIN_QUERY_REWRITTEN;
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.