WL#4034: Implement INFORMATION SCHEMA as a storage engine

Affects: Server-Prototype Only   —   Status: Un-Assigned

The INFORMATION_SCHEMA is schema/database within MySQL that contains a series of
tables which report rows of data describing various types of internal
information.  Most of these tables can also be displayed using one of the
various SHOW commands.   It is implemented within the server in many places but
mostly in a file called sql_show.cc. 

There are two general types of schema tables;
  1) Server tables in which the column and index metadata is hard coded
(as defined in http://en.wikipedia.org/wiki/Hard_coding) so table structure is
always in memory and is fixed for a given MySQL version.
  2) Storage Engine tables are declared as new information_schema plugins by
storage engines.  Since these may be compiled separately, the metadata for these
tables are not hard-coded within the server, but discovered when the plugin is
loaded. 

The INFORMATION_SCHEMA implementation uses internally "SQL derived
tables" (internal temporary tables of the optimizer) as a temporary,
intermediate storage for produced contents. This creates extra overhead during
execution and is not the right tool for the job because:

 * In MySQL, there is no generic mechanism implemented for optimization of WHERE
clauses against an internal temporary table - prior to introduction of this
usage there were no use for this. 
 * The current mechanism is not suited for producing contents on-demand, that
is, producing data only when a particular row or column is accessed.
 * This mechanism creates unnecessary overhead and implementation complexity
when providing data that is otherwise fully in-memory.  Some examples of
INFORMATION SCHEMA tables that are derived from in-memory sources are;
CHARACTER_SETS, COLUMN_PRIVILEGES, COLLATIONS, ENGINES,
GLOBAL_STATUS, GLOBAL_VARIABLES, PLUGINS, PROCESSLIST, SESSION_STATUS,
SESSION_VARIABLES & STATISTICS.

Whereas, long term, we would like to use base physical tables of the data
dictionary as the underlying storage for strict *INFORMATION_SCHEMA* views (see
WL#4039), such as TABLES, COLUMNS, KEY_COLUMN_USAGE, we still need a more
efficient and easily programmable way to support views of volatile data (that
is; data that is modified by a source external from the declaring object).

The proposal of this worklog is to use our storage engine API to implement a
special "information schema" storage engine, that will be used to produce the
necessary data.

This underlying mechanism is more suited for the task because the storage engine
API;
 * Supports condition pushdown
 * Supports operation costs
 * Has support for index declaration and can automatically let the optimizer
use the needed index.  Indexes can be used for;
   1) The natural index provided by the internal layout of data, e.g. if this
data is stored on disk. 
(Konstantin explains the meaning of "natural index" and "hash index" here:
[mysql intranet archive] /secure/mailarchive/mail.php?folder=4&mail=35344.
"Supporting a "natural" index on information_schema.tables would be telling
the optimizer that there is a unique key on the table name + db name.
With that metadata the optimizer can faster and without hacks
employed today resolved WHERE table_name= type queries.
This "index" is maintained by the filesystem, and even though it's 
not utf8_general_ci, which is collation of the column, it's in
some known collation.Why does the collation of the index have to
match the collation of the column?.")

Since currently we do not have an internal table-based data dictionary, it is
suggested that this underlying mechanism is used for *all* INFORMATION_SCHEMA
tables currently available in the system.  In the future, for data that is
located in base tables, we may use a different, separate mechanism to access it.

The main purpose of this worklog is to improve the performance of
INFORMATION_SCHEMA table access.  This will add to the performance gain
implemented by WL#3732, which was a less intrusive way of limiting table related
data materialized to only those databases and tables matching the selection
criteria.

A good example of a performance problem that should be targetted by this worklog
is BUG#43818