WL#4034: Implement INFORMATION SCHEMA as a storage engine

Affects: Server-Prototype Only   —   Status: Un-Assigned

The INFORMATION_SCHEMA is schema/database within MySQL that contains a series of
tables which report rows of data describing various types of internal
information.  Most of these tables can also be displayed using one of the
various SHOW commands.   It is implemented within the server in many places but
mostly in a file called sql_show.cc. 

There are two general types of schema tables;
  1) Server tables in which the column and index metadata is hard coded
(as defined in http://en.wikipedia.org/wiki/Hard_coding) so table structure is
always in memory and is fixed for a given MySQL version.
  2) Storage Engine tables are declared as new information_schema plugins by
storage engines.  Since these may be compiled separately, the metadata for these
tables are not hard-coded within the server, but discovered when the plugin is
loaded. 

The INFORMATION_SCHEMA implementation uses internally "SQL derived
tables" (internal temporary tables of the optimizer) as a temporary,
intermediate storage for produced contents. This creates extra overhead during
execution and is not the right tool for the job because:

 * In MySQL, there is no generic mechanism implemented for optimization of WHERE
clauses against an internal temporary table - prior to introduction of this
usage there were no use for this. 
 * The current mechanism is not suited for producing contents on-demand, that
is, producing data only when a particular row or column is accessed.
 * This mechanism creates unnecessary overhead and implementation complexity
when providing data that is otherwise fully in-memory.  Some examples of
INFORMATION SCHEMA tables that are derived from in-memory sources are;
CHARACTER_SETS, COLUMN_PRIVILEGES, COLLATIONS, ENGINES,
GLOBAL_STATUS, GLOBAL_VARIABLES, PLUGINS, PROCESSLIST, SESSION_STATUS,
SESSION_VARIABLES & STATISTICS.

Whereas, long term, we would like to use base physical tables of the data
dictionary as the underlying storage for strict *INFORMATION_SCHEMA* views (see
WL#4039), such as TABLES, COLUMNS, KEY_COLUMN_USAGE, we still need a more
efficient and easily programmable way to support views of volatile data (that
is; data that is modified by a source external from the declaring object).

The proposal of this worklog is to use our storage engine API to implement a
special "information schema" storage engine, that will be used to produce the
necessary data.

This underlying mechanism is more suited for the task because the storage engine
API;
 * Supports condition pushdown
 * Supports operation costs
 * Has support for index declaration and can automatically let the optimizer
use the needed index.  Indexes can be used for;
   1) The natural index provided by the internal layout of data, e.g. if this
data is stored on disk. 
(Konstantin explains the meaning of "natural index" and "hash index" here:
[mysql intranet archive] /secure/mailarchive/mail.php?folder=4&mail=35344.
"Supporting a "natural" index on information_schema.tables would be telling
the optimizer that there is a unique key on the table name + db name.
With that metadata the optimizer can faster and without hacks
employed today resolved WHERE table_name= type queries.
This "index" is maintained by the filesystem, and even though it's 
not utf8_general_ci, which is collation of the column, it's in
some known collation.Why does the collation of the index have to
match the collation of the column?.")

Since currently we do not have an internal table-based data dictionary, it is
suggested that this underlying mechanism is used for *all* INFORMATION_SCHEMA
tables currently available in the system.  In the future, for data that is
located in base tables, we may use a different, separate mechanism to access it.

The main purpose of this worklog is to improve the performance of
INFORMATION_SCHEMA table access.  This will add to the performance gain
implemented by WL#3732, which was a less intrusive way of limiting table related
data materialized to only those databases and tables matching the selection
criteria.

A good example of a performance problem that should be targetted by this worklog
is BUG#43818
A new Storage engine will be created in /storage called infoschema. Written in
C++ in a similar manner as performance_schema, it will make use of inheritance
and well defined interfaces including the Handler interface.  

The infoschema storage engine will NOT ever need to be compiled independently of
the server it runs against. The data it collects is intrinsically tied to the
server that provides it.  So it may be necessary for this engine to 'know too
much' about its host since it gathers information directly from server memory
objects.  

However, wherever possible, it is to be a goal of this project to separate the
transfer of data back and forth from this storage engine to the server into well
defined interfaces.  

It is also a goal to keep the number of new interfaces to a minimum as well. 
This can be accomplished by allowing the server to do as much of its own data
collection as possible, and report that through a minimum number of new function
calls from infoschema back to the server.  Then infoschema will use the Handler
interface to report rows back to the server.

The isolation level of data reported will be a sort of 'rolling snapshot'. In
other words, a list will initially be collected of the primary keys for rows to
report with as little blocking as possible.  Then those rows will be reported if
they still exist at the time the row is requested.  So this is by no means a
serializable or repeatable-read isolation.  And it is not exactly a
read-committed isolation either because new rows that appear while data is being
reported will not be found or returned.

For example; infoschema.engines will report engines that are in the ready state
(PLUGIN_IS_READY).  If a new storage engine enters in this state immediately after
the call to Handler::rnd_init(), it will not be reported because it was not
collected into the list of engines to report by this Handler object.  However,
if an engine leaves this state immediately after the call to
Handler::rnd_init(), it will again not be reported because the call to
rnd_next() does not see the PLUGIN_IS_READY flag.  So new rows, it is like
repeatable read.  For updated and deleted rows, it is like read-committed.

Schema tables that report information about other tables, views, and triggers
will need to use existing DDL locking when it collects data.

Some information_schema tables are provided by third-party storage engine
plugins as a plugin of type MYSQL_INFORMATION_SCHEMA_PLUGIN.  These plugins use
an existing limited interface which includes only one function that can be
called by the server to collect rows for these tables.It is called fill_table().
 With this current API, the INFOSCHEMA storage engine must call this function
once, save the rows reported by the third party plugin, and then  report them
one by one as needed by the optimizer. So for the case of third party plugin
schema tables, all data must be materialized once, but only once, per query. A
future worklog task could involve expanding the plugin API for these types of
plugins.

This low level design is broken up into multiple milestones.  The technical
issues involved will be added to these milestones once they are understood.

					
Milestone 1:  Merge to mysql-next-mr   100%
	Get the code basically compiling with mysql-next-mr.  Create a repository on
bk-internal supported by pushbuild.

Milestone 2:  Initial Bug fixing.  80%
	Get the merged infoschema storage engine to work for some simple queries. 
Start identifying problems and fix them one by one.  The old code is not to be
replaced while the INFOSCHEMA storage engine is being built.  The two should
work side by side so that changes to information_schema code can be merged in
during the development process and the output of the two parallel sources can be
compared.

Milestone 3:  Improve Discovery   90%
	The existing discovery mechanism was written for NDB cluster so that it could
move FRM files around and let a clustered engine 'discover' a table from a newly
written FRM file.  This process should change so that an FRM file is not needed.
 Much of this work will feed into WL#5308.  The storage engine should provide
metadata about the table to code maintained in the server so that a TABLE_SHARE
can be created without the necessity of an FRM file.	

Milestone 4:  Variables and Status   100% done
	Improve and fix the 6 tables associated with collecting status variables from
in memory arrays.  These tables at VARIABLES, GLOBAL_VARIABLES,
SESSION_VARIABLES, STATUS, GLOBAL_STATUS, & SESSION_STATUS.  Make the global and
session tables inherit the VARIABLE and STATUS classes to reduce and simplify code.

Milestone 5:  Colations and Character sets   80%
	Clean up these three simple tables and  make sure they work the same as the old
code.  CHARACTER_SETS, COLLATIONS & COLLATION_CHARACTER_SET_APPLICABILITY

Milestone 6:  Plugins and Engines   100%
	Clean up these two similar tables.  ENGINES is a subset of PLUGINS and should
inherit the plugin class to reduce code. 

Milestone 7:  Tables, etc   50%
	There are 10 information_schema tables that collect data through sql_show.cc;
get_all_tables().  Three of them were previously implemented.  This process is
the most complex in information_schema and needs to be simplified.  Create an
infoschema_tables_base class that the others will inherit and reimplement the
work in this class.  Make sure TABLES, TABLE_NAMES & PARTITIONS work with this
refactoring.

Milestone 8:  Third-Party tables
	Third party storage engines may register their own information_schema tables
through the plugin interface.  This API has an init() and fill_table() function
to provide the table.  Create a generic class in INFOSCHEMA that can be set up
to collect and display this information.	100%		

Milestone 9:  Columns
	Create the table COLUMNS inheriting infochema_tables_base

Milestone 10:  Key Column Usage
	Create the table KEY_COLUMN_USAGE inheriting infochema_tables_base

Milestone 11:  Referential Contrainsts
	Create the table REFERENTIAL_CONSTRAINTS inheriting infochema_tables_base

Milestone 12:  Statistics
	Create the table STATISTICS inheriting infochema_tables_base

Milestone 13:  Triggers
	Create the table TRIGGERS inheriting infochema_tables_base

Milestone 14:  Views
	Create the table VIEWS inheriting infochema_tables_base

Milestone 15:  Events
	Create the table EVENTS

Milestone 16:  Files
	Create the table FILES

Milestone 17:  Parameters
	Create the table PARAMETERS

Milestone 18:  Profiling
	Create the table PROFILING

Milestone 19:  Routines
	Create the table ROUTINES

Milestone 20:  Schema Privileges
	Create the table SCHEMA_PRIVILEGES

Milestone 21:  Schemata
	Create the table SCHEMATA

Milestone 22:  Table Constraints
	Create the table TABLE_CONSTRAINTS

Milestone 23:  Table Privileges
	Create the table TABLE_PRIVILEGES

Milestone 24:  Tablespaces
	Create the table TABLESPACES

Milestone 25:  User Privileges
	Create the table USER_PRIVILEGES

Milestone 26:  Index use by the Optimizer
	Determine how the Optimizer uses hash indexes and debug the current
implementation in INFOSCHEMA.  Clean this up and implement indexes in any table
where it makes sense or delete all the existing index code.

Milestone 27:  Condition Pushdown
	The existing information schema implementation uses a limited condition
pushdown for the fields db_name and table_name and only for table related schema
tables.  This is lost in the infoschema storage engine. So infoschema will
collect all tables for all databases on each call.  Since the main purpose of
this worklog is performance, infoschema must be able to avoid collecting data
from plugins and from in-disk directories that are not needed.

Milestone 28:  Compare
	Do a final round of testing to insure that all queries against both databases
return the same thing.  Some field contents are timing sensitive so they cannot
be the same.  But all other data should be.  The order of rows may be different
if the query is not sorted.

Milestone 29:  Redirect SHOW commands
	Switch all the SHOW commands over to INFOSCHEMA

Milestone 30:  Redirect INFORMATION_SCHEMA
	Switch all INFORMATION_SCHEMA references to INFOSCHEMA.  In this way, these two
names become synonyms for each other.  The shorter name is much easier to deal
with, but the older name is in use everywhere.

Milestone 31:  Discover Performance Schema
	This may be a separate task, or part of WL#5308.  Make changes to Performance
Schema code to allow it to avoid FRM files by implementing a
handler::find_files() function and a handler::discover()

Milestone 32:  Consolidate
	Look for commonalities between INFOSCHEMA and PERFORMANCE_SCHEMA and reduce
code wherever possible.


The attached chart titled Show2Infoschema shows all SHOW commands and all
INFORMATION_SCHEMA commands and connects the SHOW commands that map to
INFORMATION_SCHEMA tables.  It also shows the original function that collects
the data.  It is attached both as a tabl-delimited txt file and an Excel
spreadsheet.