WL#4034: Implement INFORMATION SCHEMA as a storage engine
Affects: Server-Prototype Only — Status: Un-Assigned — Priority: Medium
The INFORMATION_SCHEMA is schema/database within MySQL that contains a series of tables which report rows of data describing various types of internal information. Most of these tables can also be displayed using one of the various SHOW commands. It is implemented within the server in many places but mostly in a file called sql_show.cc. There are two general types of schema tables; 1) Server tables in which the column and index metadata is hard coded (as defined in http://en.wikipedia.org/wiki/Hard_coding) so table structure is always in memory and is fixed for a given MySQL version. 2) Storage Engine tables are declared as new information_schema plugins by storage engines. Since these may be compiled separately, the metadata for these tables are not hard-coded within the server, but discovered when the plugin is loaded. The INFORMATION_SCHEMA implementation uses internally "SQL derived tables" (internal temporary tables of the optimizer) as a temporary, intermediate storage for produced contents. This creates extra overhead during execution and is not the right tool for the job because: * In MySQL, there is no generic mechanism implemented for optimization of WHERE clauses against an internal temporary table - prior to introduction of this usage there were no use for this. * The current mechanism is not suited for producing contents on-demand, that is, producing data only when a particular row or column is accessed. * This mechanism creates unnecessary overhead and implementation complexity when providing data that is otherwise fully in-memory. Some examples of INFORMATION SCHEMA tables that are derived from in-memory sources are; CHARACTER_SETS, COLUMN_PRIVILEGES, COLLATIONS, ENGINES, GLOBAL_STATUS, GLOBAL_VARIABLES, PLUGINS, PROCESSLIST, SESSION_STATUS, SESSION_VARIABLES & STATISTICS. Whereas, long term, we would like to use base physical tables of the data dictionary as the underlying storage for strict *INFORMATION_SCHEMA* views (see WL#4039), such as TABLES, COLUMNS, KEY_COLUMN_USAGE, we still need a more efficient and easily programmable way to support views of volatile data (that is; data that is modified by a source external from the declaring object). The proposal of this worklog is to use our storage engine API to implement a special "information schema" storage engine, that will be used to produce the necessary data. This underlying mechanism is more suited for the task because the storage engine API; * Supports condition pushdown * Supports operation costs * Has support for index declaration and can automatically let the optimizer use the needed index. Indexes can be used for; 1) The natural index provided by the internal layout of data, e.g. if this data is stored on disk. (Konstantin explains the meaning of "natural index" and "hash index" here: [mysql intranet archive] /secure/mailarchive/mail.php?folder=4&mail=35344. "Supporting a "natural" index on information_schema.tables would be telling the optimizer that there is a unique key on the table name + db name. With that metadata the optimizer can faster and without hacks employed today resolved WHERE table_name= type queries. This "index" is maintained by the filesystem, and even though it's not utf8_general_ci, which is collation of the column, it's in some known collation.Why does the collation of the index have to match the collation of the column?.") Since currently we do not have an internal table-based data dictionary, it is suggested that this underlying mechanism is used for *all* INFORMATION_SCHEMA tables currently available in the system. In the future, for data that is located in base tables, we may use a different, separate mechanism to access it. The main purpose of this worklog is to improve the performance of INFORMATION_SCHEMA table access. This will add to the performance gain implemented by WL#3732, which was a less intrusive way of limiting table related data materialized to only those databases and tables matching the selection criteria. A good example of a performance problem that should be targetted by this worklog is BUG#43818
A new Storage engine will be created in /storage called infoschema. Written in C++ in a similar manner as performance_schema, it will make use of inheritance and well defined interfaces including the Handler interface. The infoschema storage engine will NOT ever need to be compiled independently of the server it runs against. The data it collects is intrinsically tied to the server that provides it. So it may be necessary for this engine to 'know too much' about its host since it gathers information directly from server memory objects. However, wherever possible, it is to be a goal of this project to separate the transfer of data back and forth from this storage engine to the server into well defined interfaces. It is also a goal to keep the number of new interfaces to a minimum as well. This can be accomplished by allowing the server to do as much of its own data collection as possible, and report that through a minimum number of new function calls from infoschema back to the server. Then infoschema will use the Handler interface to report rows back to the server. The isolation level of data reported will be a sort of 'rolling snapshot'. In other words, a list will initially be collected of the primary keys for rows to report with as little blocking as possible. Then those rows will be reported if they still exist at the time the row is requested. So this is by no means a serializable or repeatable-read isolation. And it is not exactly a read-committed isolation either because new rows that appear while data is being reported will not be found or returned. For example; infoschema.engines will report engines that are in the ready state (PLUGIN_IS_READY). If a new storage engine enters in this state immediately after the call to Handler::rnd_init(), it will not be reported because it was not collected into the list of engines to report by this Handler object. However, if an engine leaves this state immediately after the call to Handler::rnd_init(), it will again not be reported because the call to rnd_next() does not see the PLUGIN_IS_READY flag. So new rows, it is like repeatable read. For updated and deleted rows, it is like read-committed. Schema tables that report information about other tables, views, and triggers will need to use existing DDL locking when it collects data. Some information_schema tables are provided by third-party storage engine plugins as a plugin of type MYSQL_INFORMATION_SCHEMA_PLUGIN. These plugins use an existing limited interface which includes only one function that can be called by the server to collect rows for these tables.It is called fill_table(). With this current API, the INFOSCHEMA storage engine must call this function once, save the rows reported by the third party plugin, and then report them one by one as needed by the optimizer. So for the case of third party plugin schema tables, all data must be materialized once, but only once, per query. A future worklog task could involve expanding the plugin API for these types of plugins.
This low level design is broken up into multiple milestones. The technical issues involved will be added to these milestones once they are understood. Milestone 1: Merge to mysql-next-mr 100% Get the code basically compiling with mysql-next-mr. Create a repository on bk-internal supported by pushbuild. Milestone 2: Initial Bug fixing. 80% Get the merged infoschema storage engine to work for some simple queries. Start identifying problems and fix them one by one. The old code is not to be replaced while the INFOSCHEMA storage engine is being built. The two should work side by side so that changes to information_schema code can be merged in during the development process and the output of the two parallel sources can be compared. Milestone 3: Improve Discovery 90% The existing discovery mechanism was written for NDB cluster so that it could move FRM files around and let a clustered engine 'discover' a table from a newly written FRM file. This process should change so that an FRM file is not needed. Much of this work will feed into WL#5308. The storage engine should provide metadata about the table to code maintained in the server so that a TABLE_SHARE can be created without the necessity of an FRM file. Milestone 4: Variables and Status 100% done Improve and fix the 6 tables associated with collecting status variables from in memory arrays. These tables at VARIABLES, GLOBAL_VARIABLES, SESSION_VARIABLES, STATUS, GLOBAL_STATUS, & SESSION_STATUS. Make the global and session tables inherit the VARIABLE and STATUS classes to reduce and simplify code. Milestone 5: Colations and Character sets 80% Clean up these three simple tables and make sure they work the same as the old code. CHARACTER_SETS, COLLATIONS & COLLATION_CHARACTER_SET_APPLICABILITY Milestone 6: Plugins and Engines 100% Clean up these two similar tables. ENGINES is a subset of PLUGINS and should inherit the plugin class to reduce code. Milestone 7: Tables, etc 50% There are 10 information_schema tables that collect data through sql_show.cc; get_all_tables(). Three of them were previously implemented. This process is the most complex in information_schema and needs to be simplified. Create an infoschema_tables_base class that the others will inherit and reimplement the work in this class. Make sure TABLES, TABLE_NAMES & PARTITIONS work with this refactoring. Milestone 8: Third-Party tables Third party storage engines may register their own information_schema tables through the plugin interface. This API has an init() and fill_table() function to provide the table. Create a generic class in INFOSCHEMA that can be set up to collect and display this information. 100% Milestone 9: Columns Create the table COLUMNS inheriting infochema_tables_base Milestone 10: Key Column Usage Create the table KEY_COLUMN_USAGE inheriting infochema_tables_base Milestone 11: Referential Contrainsts Create the table REFERENTIAL_CONSTRAINTS inheriting infochema_tables_base Milestone 12: Statistics Create the table STATISTICS inheriting infochema_tables_base Milestone 13: Triggers Create the table TRIGGERS inheriting infochema_tables_base Milestone 14: Views Create the table VIEWS inheriting infochema_tables_base Milestone 15: Events Create the table EVENTS Milestone 16: Files Create the table FILES Milestone 17: Parameters Create the table PARAMETERS Milestone 18: Profiling Create the table PROFILING Milestone 19: Routines Create the table ROUTINES Milestone 20: Schema Privileges Create the table SCHEMA_PRIVILEGES Milestone 21: Schemata Create the table SCHEMATA Milestone 22: Table Constraints Create the table TABLE_CONSTRAINTS Milestone 23: Table Privileges Create the table TABLE_PRIVILEGES Milestone 24: Tablespaces Create the table TABLESPACES Milestone 25: User Privileges Create the table USER_PRIVILEGES Milestone 26: Index use by the Optimizer Determine how the Optimizer uses hash indexes and debug the current implementation in INFOSCHEMA. Clean this up and implement indexes in any table where it makes sense or delete all the existing index code. Milestone 27: Condition Pushdown The existing information schema implementation uses a limited condition pushdown for the fields db_name and table_name and only for table related schema tables. This is lost in the infoschema storage engine. So infoschema will collect all tables for all databases on each call. Since the main purpose of this worklog is performance, infoschema must be able to avoid collecting data from plugins and from in-disk directories that are not needed. Milestone 28: Compare Do a final round of testing to insure that all queries against both databases return the same thing. Some field contents are timing sensitive so they cannot be the same. But all other data should be. The order of rows may be different if the query is not sorted. Milestone 29: Redirect SHOW commands Switch all the SHOW commands over to INFOSCHEMA Milestone 30: Redirect INFORMATION_SCHEMA Switch all INFORMATION_SCHEMA references to INFOSCHEMA. In this way, these two names become synonyms for each other. The shorter name is much easier to deal with, but the older name is in use everywhere. Milestone 31: Discover Performance Schema This may be a separate task, or part of WL#5308. Make changes to Performance Schema code to allow it to avoid FRM files by implementing a handler::find_files() function and a handler::discover() Milestone 32: Consolidate Look for commonalities between INFOSCHEMA and PERFORMANCE_SCHEMA and reduce code wherever possible. The attached chart titled Show2Infoschema shows all SHOW commands and all INFORMATION_SCHEMA commands and connects the SHOW commands that map to INFORMATION_SCHEMA tables. It also shows the original function that collects the data. It is attached both as a tabl-delimited txt file and an Excel spreadsheet.
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.