WL#4351: Data Dictionary: relational schema

Affects: Server-6.2   —   Status: Assigned   —   Priority: Medium

From the minutes of an architecture meeting in Stockholm, March 2008:
> Information_Schema
> 
> It was noted that Peter had agreed to list the underlying tables
> and the views.

Peter hereby lists. At this time it is only necessary to list
the base tables, and only with respect to the information
that is new or shocking.

The base tables are analogous to the standard's
DEFINITION_SCHEMA. There is no requirement that users can see
them, they are internal. But INFORMATION_SCHEMA views will just
select from them, so their contents are visible indirectly.

We already have the makings of data dictionary tables.
Peter proposes to  (a) make changes affecting identifiers,
(b) take what's already in mysql.* tables,
(c) for items that are not in mysql.* use information_schema
definitions and materialize.

Name of Data Dictionary Database
--------------------------------

Originally Peter wrote:
"
I have no name for the database where we'll store data dictionary.
Indeed I cannot assume that there is such a thing, because WL#2760
leaves us a choice of storing separate instances of metadata tables
in each database. But if we did have such a thing it would be called
DEFINITION_SCHEMA  /* because the standard has one */
MYSQL              /* because mysql.* is where stuff is now */
DATA_DICTIONARY    /* because that's what we're calling WL#2760 */
'SYS____%'         /* like other DBMSs e.g. DB2 has SYSCAT */
SUPER              /* reminiscent of the original WL#2760 proposal */
DDD                /* abbreviation of DATA DICTIONARY DATABASE. yuk. */
For neutrality in what follows, I'll preface names of new
data-dictionary tables with 'Metadata.'. We'll replace it later.
"
However, Konstantin has informed Peter that the Stockholm decided
that the name is 'mysql', same as the directory.
Peter will remove this section soon.

Changes Affecting Identifiers
-----------------------------

Peter proposed in Stockholm that this is the appropriate time to
change identifiers. He heard no objection there. Of course this
means that Peter is assuming approval of WL#922 "Case-Sensitive
Delimited Identifiers" plus WL#923 "Store regular identifiers
in upper case". He is also assuming a lot about what will be in
WL#922 and WL#923 by the time they're approved.

1. All identifier data types will be [VAR]CHAR(128).
This is the amount that we allow for storage.
This does not mean that maximum lengths automatically
increase from what the MySQL Reference Manual currently says
(http://dev.mysql.com/doc/refman/5.1/en/identifiers.html).

2. All Metadata table and column names will become upper case.
This is already true for information_schema columns but not
mysql.* columns. For example mysql.columns_priv has a column
`Host`; it will change to HOST.
This won't affect searches since they'll be case insensitive.

3. Identifiers are stored using 4-byte Unicode (UTF-8) with a
Private Use Area.

4. Almost all identifier columns will become COLLATE
UTF8_UNICODE_CS. There's currently no such collation.
The CS stands for "case sensitive" but remember that regular
identifiers get stored in upper case.

There will be no flag to show an identifier was delimited.

Base Tables like mysql.*
------------------------

These Metadata. tables have exactly the same contents as 6.0 
mysql.* base tables except for the "Changes Affecting
Identifiers" described earlier, and the addition of CATALOG
VARCHAR(128) which will always be NULL.

COLUMNS_PRIV
DB
EVENT
FUNC
HELP_CATEGORY
HELP_KEYWORD
HELP_RELATION
HELP_TOPIC
HOST
NDB_BINLOG_INDEX
PLUGIN
PROC
PROCS_PRIV
SERVERS
TABLES_PRIV
TIME_ZONE
TIME_ZONE_LEAP_SECOND
TIME_ZONE_NAME
TIME_ZONE_TRANSITION
TIME_ZONE_TRANSITION_TYPE
USER

These mysql.* tables are not metadata and do not become Metadata
tables:
GENERAL_LOG
ONLINE_BACKUP
ONLINE_BACKUP_PROGRESS
SLOW_LOG

Base Tables like information_schema
-----------------------------------

These Metadata. tables have exactly the same contents as 6.0
information_schema viewed tables except for the "Changes
Affecting Identifiers" described earlier.

Until now MYSQL has taken definitions from .frm files and from
file headers. Henceforward it stores the definitions, since
henceforward these are persistent base tables rather than just
views. However the volatile data in TABLES columns, for
example "row_count", is not stored and must be calculated
during retrieval.

FILES
KEY_COLUMN_USAGE
PARAMETERS
PARTITIONS
REFERENTIAL_CONSTRAINTS
STATISTICS
TABLES
TABLESPACES (actually this doesn't exist in INFORMATION_SCHEMA yet)
TRIGGERS
VIEWS

These information_schema tables are not metadata,
or are fixed, or are due to bugs, and do not become Metadata tables:
CHARACTER_SETS
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
ENGINES
FALCON_DATABASE_IO
FALCON_RECORD_CACHE_DETAIL
FALCON_RECORD_CACHE_SUMMARY
FALCON_SERIAL_LOG_INFO
FALCON_SYSTEM_MEMORY_SUMMARY
FALCON_SYNCOBJECTS
FALCON_SYSTEM_MEMORY_DETAIL
FALCON_TABLES
FALCON_TRANSACTIONS
FALCON_TRANSACTION_SUMMARY
FALCON_VERSION
GLOBAL_STATUS
GLOBAL_VARIABLES
PROCESSLIST
PROFILING
SESSION_STATUS
SESSION_VARIABLES

Other objects
-------------

Peter urges reading the list of objects in
WL#3713 Online backup: Selective rpl/backup of metadata
https://intranet.mysql.com/worklog/Backup-BackLog/?tid=3713
He intends to follow the same classification scheme here.
Metadata. has only a minority of the 60 items in the list.

References
----------

dev-private email thread "Minutes of Stockholm meeting re changing WL#2760"
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=23528