WL#2760: Super-Database (real Data Dictionary)

Status: Assigned   —   Priority: Low

In a single file, store information about all databases 
and database objects. This has been called a system 
catalog or system-wide directory or dictionary 
in the past. Super-Database is not a common term but 
I'm hoping it's memorable. The word "super" just means
"over" as in "supertype" or "supersede", rather than
"wonderful" as in comic books.

During Roadmap discussions at the Heidelberg Dev Meeting,
Kostja committed to having an approved HLS for this
functionality by the time MySQL 6.1 enters beta. The
stretch goal is to have an approved LLD at the same time,
or shortly thereafter. 

At the Riga meeting we realized that we cannot
deliver HLS and LLD in the current 6.1 time frame
(by August 2009). We still want it for 6.x.

The roadmap https://inside.mysql.com/wiki/MySQL_Server_Future
currently says WL#2760 is in 6.x and 7.x.
It's really in 6.x. There may be a cluster-related
subtask which we'll split out and put in 7.x.
Trudy explains thus:
"No error. Read the comments and see the spreadsheet above 
all the text -- Kostja committed to having an approved HLS 
for WL#2760 in the 6.x timeframe as well as perhaps an 
approved LLD. Then, for 7.x, we have the implementation scheduled."


Old information in .frm, .MYD, and .MYI files
Other DBMSs
Atomicity, Clustering, Concurrency
Mention of a Data Integration Engine
Effects on other worklog tasks
Affected bugs


What do we call this thing? Proposals were:
Super-Database, or Data Dictionary, or Real Data Dictionary,
or System Catalog sometimes abbreviated SYSCAT or SYS,
or system-wide dictionary, or directory, or master database of
databases, or object list, or definition_schema, or `mysql`.

The Stockholm meeting folk chose: Data Dictionary.
The problem with "data dictionary" is it has multiple meanings.
   * a document describing a database
   * a component of a database describing its structure
   * middleware added to a DBMS's native data dictionary
   And, we've already used "data dictionary" when referring
... But the alternatives weren't as good.


If we have a disk-based storage engine, what is the base location
for all the tables?
Alternatives for location of the data dictionary tables were:
The MySQL datadir directory, or the MySQL datadir/mysql directory,
or a directory depending on a new configuration variable that
defaults to the same as datadir, or the current directory,
or according to an argument in mysql_install_db, or
in a default tablespace for the default storage engine.

In the Stockholm meeting Konstantin put three items on the whiteboard:
1. Where to store metadata?
   In datadir/mysql.
2. What is the database name of the db containing metadata tables?
   The answer is `mysql` not `system` or `syscat'
3. Does each database keep getting its own subdirectory?

Default Storage Engine

The Default Storage Engine is the storage engine
that handles the data dictionary.

Both of the following statements were accepted
during the Stockholm meeting:

1. Our default storage engine is one that stores metadata
in a subdirectory of the datadir called mysql.
2. The default storage engine will be a transactional engine
that we control.


How many data dictionaries are there?

Alternatives were "1" (as with `mysql` database),
"1 per catalog", "1 per schema", "1 per instance",
"1 per cluster", "1 per storage engine","1 per server".
The choise was: one metadata database per server.
You cannot connect to two data dictionaries at once with
"CONNECT TO <SQL-server name> [ USER ...];".


What the user sees is a bunch of tables. But what is
actually on the disk? Proposals were:
a series of SQL statements, or a MyISAM .MYD file,
or several MyISAM .MYD files, or a table in any of
the advanced storage engines. The Stockholm meeting
chose: "it is a table in any of the advanced
storage engines". The Trondheim meeting determined
it must be a transactional storage engine, not
to be named.

Any questions about indexes and primary keys will
be determined when Peter submits the schema description.


All database objects (e.g. persistent base tables,
routines, triggers), and some extra-database objects
(e.g. plugins, tablespaces). A numbered list of
objects is in WL#3713 "Online backup: Selective rpl/backup
of metadata". Minimally, we want to store all the
information that we have currently in information_schema,
and/or all the information that we have in `mysql` database.
(So: not prepared statements, not performance data.)

Also, there can be some new items, or changes to
existing items. These are some proposals.

1. Every object will have a unique numeric identifier.
   It's a bit quicker to search for
   "object_guid = 5555555"
   rather than
   "object_schema = 'x' and object_identifier = 'y'".
   The number can include the server ID.
   The number is unique.
   Konstantin thinks the number is persistent;
   Peter thinks the number may change without notice.
   A replicated database will have a different number set.
   ... Accepted during Stockholm meeting March 2008.

2. Every object will have a non-unique hash of the
   object name. This might speed up sequential searches
   in the cache.
   ... Rejected during Stockholm meeting March 2008.

3. Every character identifier, e.g. catalog_name + schema_name +
   table_name, can be up to 128 characters long.
   This is a maximum in standard SQL.
   See also WL#2284 "Increase the length of a user name".
   And from the chart on page 340 of "SQL Performance Tuning"
   (Gulutzan + Pelzer) (Copyright (c) 2003) we have:
   Vendor         MaxTableNameLength
   ------         ------------------
   IBM              18
   Informix        128
   Ingres           32
   InterBase        32
   Microsoft       128
   MySQL            64
   Oracle           30
   Sybase           30
   ... Accepted during Stockholm meeting March 2008.

4. All character data will be in 3-byte UTF8. (Originally this
   section added "For characters in non-Unicode character sets,
   where two characters might map to the same Unicode character,
   there will be a conversion to a Private Use Area character",
   but that is superseded by discussions about WL#923, we convert
   to UTF8 the same way we convert in other contexts.)
   ... Accepted during Stockholm meeting March 2008.

5. Every character identifier, e.g. catalog_name + schema_name +
   table_name, will be stored and searched according to
   WL#922 Case-Sensitive Delimited Identifiers, and
   WL#923 Store regular identifiers in upper case.
   This might mean there are new collations and flags.
   ... Accepted during Stockholm meeting March 2008.

6. Every object will have a "status" to show whether it's
   invalid, needs re-checking, etc. As in Oracle. See
   ... Rejected during Stockholm meeting March 2008.
   (don't need 'obsolete' flag because we re-prepare)

7. There will no longer be an algorithm for converting table
   names to file names or vice versa, so there has to be a
   lookup. In other words, the base table for "tables" will
   have a column for the file name.
   ... Rejected during Stockholm meeting March 2008.
   (out of respect for Monty, we keep tablename-to-filename mapping)


There will be a program for converting to or from
the 'old' (`mysql` database, .frm files) and `new`

The conversion program, part of "mysql_upgrade", will:
* take every "mysql" table and add to corresponding
  super-database table
* take every .frm in every other datadir subdirectory
  and add to super-database.tables.
Conversions can fail because of changes in case
sensitivity, or because there is ancient debris
in a directory that was never meant to be read.

Alternatives during the conversion period are:
1. Compulsory upgrade.
   When a mysql-6.3 server starts: if there is
   no Super-Database, but there is an old-style "mysql"
   database (`mysql` directory, .frm files):
   Error "You must run the conversion program".
2. Parallel development.
   For mysql-6.3 users will be able to refuse to
   convert. The old-style database is deprecated
   but upgrade will not be compulsory until mysql-7.0.
   Parallel development will mean lots more work for us.
3. Reverse conversion.
   There will also be a program for converting from 'new'
   (Super-Database) to 'old' (`mysql` database, .FRM files).
   This will become difficult if we introduce new features
   at the same time that we introduce Super-Database, for
   example longer identifier names.

During the March 2008 Stockholm meeting, Konstantin wrote
more on the whiteboard, showing 'levels of .frm support':
1. Upgrade-level support, that is, we upgrade and lose all .frm files.
   (This is what's in the 'Conversion' section of WL#2760 now, minus
2. Import an .frm to an existing data dictionary
3. Export data from data dictionary to an .frm
   3a. in old format
   3a. in new extended format
4. The .frm files are there and they are updated online,
   at the same time that the metadata tables are updated.
5. .frm storage engine.
No decision was made, Konstantin said he will do some necessary


It will be convenient if names and structures are the same
in Super-Database and in information_schema, but that's of
minor importance compared to objectives of efficiency.

The information_schema tables will in fact be views.
For example, supposing table names are in Super-Database
table "x", the definition of information_schema.tables is
CREATE VIEW tables AS SELECT table_name FROM "x";
But users will not see this definition.

Peter has agreed to list the underlying tables
and the views.

Old information in .frm, .MYD, and .MYI files

The previous note about .frm files that was already made above
applies here too -- Konstantin will investigate.
Therefore for the moment the information in the section
should be regarded as obsolete.

Each storage engine has files containing some "metadata".
For example, with MyISAM, we'd have to worry about:
* In the .frm files there is information about the name,
the foreign key definition, etc.
* In the .MYI files there is information about used space,
column names, etc.
We don't worry about the data in the .MYD or .MYI files,
since there is no need to change it. But if we want all
metadata to be in Super-Database, then how do we handle
the fact that some utilities expect it to be elsewhere?

The alternatives for .frm files are:
1. All .frm files disappear.
2. All .frm files remain, but they are not the authoritative
   source of information about tables.
3. All .frm files remain and, even if there is a
   Super-Database, the information in the .frm files is
   what matters. In this case, the Super-Database tables
   are (Antony's words) "a bunch of system-managed
   auto-created-at-startup memory tables".

The alternatives for .MYI files are:
1. The storage-engine maintainers have to change MyISAM
   so that everything is in Super-Database and the .MYI
   header ceases to have meaning.
2. The Super-Database base tables do not contain anything
   that's in the .MYI file, so you have to read the .MYI
   file to find out (for example) foreign-key information.

We should settle this:
Maintaining databases by maintaining files, for example
renaming a table by copying its files, is to be discouraged.
There may be some dispute, though.

Quoting Mark:
"You could still keep .frm files and such, but their names
just wouldn't matter. because some people do like the
databases/tables are files kind of thing. ..."

Quoting Ingo:
"This task would probably be much easier if we had a data
dictionary. But MySQL obviously doesn't want it. Instead
Monty rewrites the .FRM files. Sigh."

Quoting Konstantin:
If we would like to retain FRM files, we should implement FRM storage engine
that supports:
 - two-phase commit

We should perhaps store nothing in FRMs except the definitions of the base tables.
This would allow us to bypass FRM processing completely when building prelocking
sets or parsing stored procedures. ACID properties of FRM storage engine are
required in order to ensure atomicity of creation of complex objects: tables
with foreign keys and triggers, when the creator has to update more than one
data source (mysql.triggers table and the frm itself).

Another argument why we should not store anything except table definitions in
FRMs is data normalization.

Our data dictionary should be available as a relational database, and the better
this database is designed the easier it is to work with. Currently we have the
following tables:

The following tables are missing:
base_tables (storage_engine=FRM).

Storing anything except base table information in FRMs will either mean that our
base_tables table is denormalized, or that we will impose much higher
requirements on the design of FRM engine (it will have to support tables of
arbitrary structure).


Alternatives were:

1. We keep binlog the way it is. Changing logs is not part
   of this task.

2. Instead of CREATEs and ALTERs and DROPs, we store
   INSERTs and UPDATEs and DELETEs of metadata rows.
   This means we have true Row Based Replication.

The Stockholm March 2008 meeting was:
"1. we keep binlog the way it is."


Oracle stores information in the "data dictionary" (a term
that Oracle seems to like), which is in the SYSTEM tablespace.
Examples of some tables in the SYSTEM tablespace:
An SQL statement that gets them all: 
  select a.owner,a.object_name,a.object_id,a.object_type,a.status,b.num_rows
  from dba_objects a , dba_tables b
  where (a.owner='SYSTEM' or a.owner='SYS')
  and a.owner=b.owner
  and a.object_type='TABLE'
  and a.object_name=b.table_name
  order by 2
For a list, see
Actually you don't see these tables very often. They're base tables.
Usually you look at views. The USER_* and DBA_* and ALL_* tables are views.
Notice that it's SYSTEM tablespace; PERFSTAT is a different tablespace;
Oracle knows that "persistent" and "transient" don't belong together.

Oracle has a "data dictionary" cache, also called a "row cache" because it's a
cache of rows rather than a cache of buffers. Row Cache contents are the
most recently used rows and the most important columns. Row Cache access control
is via latches. Row Cache access monitoring is via the V$ROWCACHE view.
If an SQL statement refers to an object that is outside the Row Cache:
(1) free space by discarding old cache objects using an LRU algorithm,
(2) read information from the data dictionary table using an SQL statement.

Other DBMSs

SQL Server for global objects uses a single "Resource" database, and
for per-database objects stores user metadata in the appropriate database.
Thus there seems to be a choice: store everything in one set of tables
(which is what I'm thinking that Oracle does), or store system objects in a
main database and user-metadata in the database it's created in (which
is what SQL Server does). See section "Number".

DB2 has a SYSCATSPACE tablespace and inside there are four schemas:
SYSIBM, SYSCAT, SYSFUN, SYSSTAT. Buffers are in a separate pool.


The first (rejected) alternative was:

A notion of dependencies among schema-level objects was introduced
with online backup. E.g. stored procedure P1 depends on view V if V
is used in P1.
See WL#3713 "Online backup: Selective rpl/backup of metadata"
and dev-backup email thread "List of database object dependencies"
If the definition of view V is altered, the dependency between V and P1
shall be tracked and P1 shall be marked invalid until
it's recompiled. If any object, in turn, depends on am invalidated object, it
shall be also invalidated, recursively.
This was rejected during the Stockholm March 2008 meeting.

The second (mostly accepted alternative was:


RENAME object_name TO object_name should automatically transfer all sub-objects
of that object to the new object (indexes, privileges, triggers, foreign keys,
constraints). This may be unnecessary if object references are by "object id"
rather than by "object name" as described in this task's "Content" section.

Should referencing stored procedures and views be updated automatically or
invalidated? See the previous requirement ("OBJECT INVALIDATION").
Possible solutions:
- DROP all the sub-objects but do not create them for the new table
- FORBID rename of a table which has triggers, constraints, foreign keys.
This could depend on RESTRICT|CASCADE keywords but that's not proposed.

This was accepted during the Stockholm March 2008 meeting;
there will be some variation due to other discussions.

DB2 takes this approach:
When renaming a table, the source table must not:
* Be referenced in any existing view definitions or 
  materialized query table definitions
* Be referenced in any triggered SQL statements in existing 
  triggers or be the subject table of an existing trigger
* Be referenced in an SQL function
* Have any check constraints
* Have any generated columns other than the identity column
* Be a parent or dependent table in any referential 
  integrity constraints
* Be the scope of any existing reference column. 
DB2 automatically transfers all the sub-objects to the new object.

Oracle takes this approach:
* Oracle automatically transfers integrity constraints, 
  indexes, and grants on the old object to the new object.
* Oracle invalidates all objects that depend on the renamed 
  object, such as views, synonyms, and stored procedures and 
  functions that refer to a renamed table

See also:
Oracle Database Concepts 11g "Schema Object Dependencies"


We're not doing caching as described in this section,
instead we are doing WL#4298 and WL#4299.
but Mikael's comment will remain.

Caching of grant information will not change in scope of WL#2760.
Also the definition of grant tables won't change.

Ideally there is a complete in-memory copy. But sometimes there
are tens of thousands of objects, of which only a few are used.
Caching possibilities include:

1. Have a special LRU cache, as with Oracle, that contains only rows
   (not pages) and contains only the parts that are necessary all
   the time. For example, when parsing a SELECT statement, you

2. Depend on the caching / buffering available with the storage engine
   that you used when creating Super-Database.

Mikael writes re caching:

The caching solution should handle both the handling of global cached  
objects as well as
local cached objects in the thread object.

It should also handle invalidations in a generic manner.

There should be upcalls from storage engine both to locking level and  
to cache level when
it is about to internally change the meta data. The same approach can  
be used for clustered
engines where the Data Dictionary change is issued from another MySQL  
Server in the cluster.

Atomicity, Clustering, Concurrency

In the Stockholm March 2008 meeting, we agreed
to remove this whole section.

Mention of a Data Integration Engine

In the Stockholm March 2008 meeting, we agreed
to remove this whole section, non-unamimously.

Effects on other worklog tasks

Acceptance of this task (WL#2760) affects other worklog tasks.

WL#923 "Store regular identifiers in upper case"
should be made dependent on WL#2760. The description
in section "Tablename-to-filename encodings versus Directories"
should say that "Proposal E" is preferred.

WL#3125 "New structure of privilege tables"
should be cancelled, although it's on the roadmap for version 7:

WL#3726 "Transactional DDL locking for all metadata objects"
should be revised in places where it refers to

Note from the March 2008 Stockholm meeting:
"With regard to WL#923, we have to get a consistent
description for both WL#923 and WL#2760. Therefore
the LLD of WL#2760 cannot be done until these
interdependent tasks are both revised and accepted.
We are not yet agreed about WL#3726."

Affected bugs
BUG#158 ENUM and SET types does not accept valid cp1251 character
BUG#494 Symbol 0xFF is not allowed in column names
BUG#1939 Wrong case sensitivity for table names when in "ANSI" mode
BUG#4117 Too many columns while creating a table
BUG#7191 REVOKE doens't remove permissions from a logged in user
-- this is a manifestation of dependency tracking problem.
BUG#11715 naming rules for constraints lead to illstructured information_schema
BUG#14985 Removing a Table "referenced" by a View is allowed
BUG#16424 Events: event remains if user is dropped
BUG#16184 DatabaseMetaData.getImportedKeys returns lower case tablename
BUG#17613 Change privileges after object is changed
BUG#19783 triggers are triggered on just single node in clustered table
BUG#20943 Traditional: INSERT accepts invalid date from Default value
- not immediately relevant, but raises the issue of storing SQL_MODE in
BUG#20796 MySQL doesn't support more than 32000 databases
BUG#22034 Events: U and ΓΌ are the same name
BUG#23683 SHOW DATABASES doesn't filter invalid directory names
BUG#22615 MySQL Server incorrectly catogorizes the lost+found directory as a
BUG#25922 InnoDB crash recovery changes: make DDL in MySQL 'atomic'
BUG#30114 crash during RENAME TABLE leads to loss of data
BUG#30115 crash during RENAME TABLE breaks replication
BUG#49437 Cannot do SHOW FIELDS for big view


Oracle Database Concepts, 11g, "The Data Dictionary"

dev-architecture email thread
"Ingress to work on Data Dictionary Reengineering"

dev-private email thread
"Re: suggestions Re: READ & REPLY: code cleanup in 6.1"

dev-private email thread
"Minutes of Stockholm meeting re changing WL#2760"