WL#7284: Implement common code for different DD APIs

Affects: Server-8.0   —   Status: Complete

WL#6380 defines the basic principles and guidelines for implementing the new
Data Dictionary. In other words: *how* the new API should look and what
principles should be followed.

Different API implementations (table API, stored program API, ...) are
based on some [hidden] core API and share common code.

This WL specifies that core API.

In other words: this WL is about *implementing* common code following the
guidelines set in WL#6380.

NF1: No user visible changes.

This HLS is organized in the following chapters:

  - "Basics" providing technical information about DD API.

  - "Main concepts" outlining the main ideas behind the DD API. This chapter
    should be of main interest to DD API users.
    Look for 'DD API USERS' comments there.

  - "Implementation details" which specifies the logic under the hood of
    the DD API. This chapter should be of no interest to DD API users.

  - "For the record" that contains thoughts / ideas / alternatives which
    were tried and rejected.


Foundation libraries

It's proposed to use STL containers and STL strings to quickly
kick-start the new Data Dictionary.

When proper new string tools are available, it should be pretty
straightforward to switch from STL-strings to our custom strings.

STL strings provide a good level of encapsulation, thus it is pretty
hard to abuse them. This is very important for the starting phase.


  - STL strings will have to be replaced either by current-MySQL-strings or
    by new-MySQL-strings before the new DD goes live. It's not clear whether
    STL strings support all the character sets we support in the server.

  - The main problem with the integration of the DD implemented via
    STL-strings with the rest of the server seems to be "the lack of
    ownership". I.e. the current MySQL-strings are basically (char *),
    and might be modified somewhere down the call-stack. So, a simple cast
    from STL-string to (char *) is dangerous.

  - STL containers most likely can still be used in production, as
    the server already uses them in other components.

Source file locations

All Data Dictionary files reside under sql/dd directory.

Within the sql/dd directory there is the following hierarchy:

  - sql/dd/

    Public interface files (except for the object types) are placed here.

  - sql/dd/types/
    Public object type interface files are placed here.

  - sql/dd/impl/

    All the implementation sources are placed under this directory.

  - sql/dd/impl/db/

    "Database Access Layer" (see below) implementation files.

  - sql/dd/impl/types/

    Object type implementation files.

The high-level idea behind the hierarchy is that the implementation files
must be visibly separated from the interface files.

Main concepts

The Life Cycle

The Data Dictionary component must be properly initialized before use, and
properly shut down before the server exits. The Data Dictionary life
cycle (initialization / deinitialization) depends on InnoDB, but it has
nothing to do with InnoDB initialization or crash-recovery.

DD initialization

The DD initialization has the following steps:

  - initialize the Data Dictionary internal in-memory structures;

  - install metadata of the Data Dictionary tables in InnoDB (do the
    bootstrap), or load that metadata in memory;

  - load various server defaults from the Data Dictionary.

DD shutdown

During the DD shutdown the following actions are taken:

  - destroy in memory objects.

The Dictionary interface

The Dictionary interface is the main entry point for accessing the new Data
Dictionary. Its main responsibility is to start a new DD transaction.

The DD user deals with one of the transaction interfaces (see below) to
access or modify DD objects.

Global DD functions

The new DD provides the following global functions in the "dd" namespace to
be used by the rest of the server:

  - dd::init()

    This function performs the Data Dictionary initialization. It is
    intended to be called during the initialization of the MySQL server.

  - dd::shutdown()

    This function properly closes the use of Data Dictionary, freeing up
    allocated resources. The function is intended to be called during the
    shutdown of the server.

  - dd::get_dictionary()

    This is the main entry point of the Data Dictionary. The function
    returns a pointer to the dd::Dictionary interface.

    The function must be called after dd::init() and before dd::shutdown().
    Calling it before dd::init() or after dd::shutdown() results in
    undefined behavior (firing an assert in debug builds, returning NULL in
    release ones).
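
The intended call sequence can be sketched as follows. This is an
illustrative stand-in, not the real implementation: the stub Dictionary
class, the singleton pointer and the bool return convention (false meaning
success, as is common in the server) are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>

namespace dd {

class Dictionary {};          // stand-in for the real interface

static Dictionary *g_dictionary= nullptr;

// Sketch of dd::init(): set up the Dictionary singleton.
bool init()
{
  g_dictionary= new Dictionary();
  return false;               // assumed convention: false means success
}

// Sketch of dd::shutdown(): free the singleton.
bool shutdown()
{
  delete g_dictionary;
  g_dictionary= nullptr;
  return false;
}

// Sketch of dd::get_dictionary(): NULL outside the init()/shutdown() window.
Dictionary *get_dictionary()
{
  return g_dictionary;
}

} // namespace dd
```

The essential property shown here is that get_dictionary() yields a valid
pointer only between init() and shutdown().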

Database access layer

DD API USERS: this section describes a DD-internal component, which is not
exposed through the DD API and is not intended to be used directly by DD users.

Every DD API (table API, stored program API, ...) has to perform basic
operations on the database, such as:

  - lookup a record by a key / index (SELECT)

  - insert a new record (INSERT)

  - update some columns of the previously found record (UPDATE)

  - delete a previously found record (DELETE)

There are the following main entities:

  - MySQL-TABLE wrapper

    This is a class containing (TABLE *) and providing operations to:

    - find a record (return found record object wrapper)

    - prepare a record for update (return prepared record object wrapper)

    - prepare a new record for insert (return new record object wrapper)

    - open a record set (return record-set object wrapper)

    - delete a record

  - Record-set wrapper

    This class basically encapsulates one operation -- advance to the next
    record with the given criteria (primary key, index, ...).

  - Record wrapper

    This class provides convenient operations for storing and retrieving
    primitive data types (numbers, strings, ...) to/from a particular field.

  - Key wrapper

    This is a simple structure, containing all the needed [low-level]
    information to pass to handler API in order to do read / write /
    search / ...

Essentially, these primitives are just convenient wrappers around the
fundamental MySQL objects (such as TABLE) and the MySQL Handler API. They
are needed to provide ORM capabilities.

These entities will reside in a separate directory: sql/dd/impl/raw.
The classes will be named with the 'Raw_' prefix to highlight their
low-level nature.

The 'Raw_' primitives provide a way to access (read/write) a particular
table in the database (e.g. dd.tables). By contrast, object tables
encapsulate the specifics of the Data Dictionary tables, such as the set of
columns, the indexes used and so on. Every DD-table has a corresponding
class among the Object-tables. That class defines the collection of columns
(fields) and provides operations specific to this table, such as:
  - create a key for a lookup
  - find a record by some criteria

NOTE: several alternatives to the naming scheme were discussed, and the
'Raw_' prefix was chosen as the least evil. It is also viewed as a warning
sign, which should prevent a regular DD-user from dealing with such classes.

NOTE: the classes are:
  - Raw_key
  - Raw_record
  - Raw_record_set
  - Raw_new_record
  - Raw_table

Object key

We should provide a uniform way to access rows of various dictionary tables
without exposing details about the physical layout of table indexes.

The Object_key interface provides a single operation to create an Access-key
instance. This interface has different implementations for different table
keys and indexes.

Data Dictionary transactions

Every operation on the Data Dictionary must be performed in the context of a
"Data Dictionary transaction". In other words, a DD API user must have a Data
Dictionary transaction context (i.e. a DD transaction must be explicitly
started by the DD API user at some point) for doing the following:
  - Read metadata for an object (open a table, ...)

  - Modify metadata for an object (alter table, ...)

There are two final transaction interfaces used in Data Dictionary:

  - Transaction_ro -- Read-only DD transaction
    (processing DML and DDL statements);

  - Transaction_rw -- Read-write DD transaction
    (processing DDL statements);

  In other words: read-only DD transactions can not request a DD-object for
  modification.

Transaction_ro and Transaction_rw derive from dd::Transaction interface,
which specifies common operations for RO- and RW-transactions.

The transaction interfaces serve as entry-points to retrieve global
objects, such as schemas (or catalogs), character sets, collations. Also,
these interfaces provide operations to retrieve objects by SE-private ID.

In other words, the DD API user has to explicitly request the Data
Dictionary to begin either a read-only or a read-write DD transaction. As a
result of that request, the DD API user receives a C++ pointer to a C++
object implementing the requested interface / behavior.

The Data Dictionary provides convenient auxiliary classes ("transaction
holders", employing the RAII technique) to make sure DD users don't forget
to end the active DD transaction.
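
A transaction holder along these lines (all names here are hypothetical
stand-ins, not the actual classes) guarantees that the transaction is ended
on every path out of the scope:

```cpp
#include <cassert>

// Minimal stand-in for the DD transaction machinery (hypothetical).
struct Transaction
{
  bool ended= false;
  void end() { ended= true; }
};

// RAII holder: ends the DD transaction when the scope is left,
// no matter how it is left (return, exception, fall-through).
class Transaction_holder
{
public:
  explicit Transaction_holder(Transaction *trx) : m_trx(trx) {}
  ~Transaction_holder() { m_trx->end(); }

  Transaction *trx() const { return m_trx; }

private:
  Transaction *m_trx;

  // Non-copyable: exactly one holder is responsible for ending it.
  Transaction_holder(const Transaction_holder &);
  Transaction_holder &operator=(const Transaction_holder &);
};
```
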

A read-write DD transaction will have its own object cache for the DD objects
modified in that transaction. See WL#8150 for more details about that.

The long-term vision is to have a single DD transaction spanning
several SQL statements. This way it will be possible to achieve
transactional DDL at some point.

Overview of Data Dictionary object classes

Every Data Dictionary entity (table, column, index, foreign key, view,
stored program, ...) is represented in three class families:

  - So called "object" classes;

  - So called "object type" classes;

  - So called "object table" classes.

Object classes

Object classes are probably the most obvious ones. Every DD-entity has a
corresponding "object class", which represents one instance of that entity.

For example:
  - one column is represented by one instance of dd::Column class;
  - one table is represented by one instance of dd::Table class.

DD API USERS: Object classes are part of the DD API and will probably be the
most used classes while dealing with the Data Dictionary.

Object type classes

An object-type-class serves as a meta-class for object-classes. Object type
classes have the '_type' suffix. Object type classes are singletons (there
is only one instance of each object type class). One of the
responsibilities of an object-type-class is to create a new instance of the
corresponding object-class.

For example:
  - all columns (all dd::Column instances) reference one instance of
    dd::Column_type;
  - all tables (all dd::Table instances) reference one instance of
    dd::Table_type.

DD API USERS: Object type classes are part of the DD API.

Object table classes

An object-table-class is an interface to a particular DD-table (a physical
table on disk in the Data Dictionary schema). Object-table-classes
encapsulate all the knowledge about DD-tables, such as column indexes,
index numbers, ways to create keys and so on.

Object-table-classes are named after the corresponding DD-tables. Unlike
the other DD-classes, the object-table-classes reside in the dd::tables
namespace. That's done intentionally to minimize name conflicts.

For example:
  - dd::tables::Columns -- corresponds to dd.columns;
  - dd::tables::Tables  -- corresponds to dd.tables.

DD API USERS: Object table classes are part of the DD API.

This design provides the needed level of flexibility to handle different
object types stored in one table (for example: dd::Table and dd::View
objects are stored in dd.tables).

Kinds of object classes

There are different kinds of DD objects:
  - objects which don't have object-id (dd::Weak_object)
  - objects which do have object-id (dd::Entity_object)
  - objects which are cached by the Data Dictionary (dd::Dictionary_object)


Weak_object is the single parent (top-most) interface for all Data
Dictionary object classes. The name "Weak_object" comes from database
modelling, where entities without a primary ID key are named weak entities:


  A weak entity can not be uniquely identified by its attributes alone;
  therefore, it must use a foreign key in conjunction with its attributes
  to create a primary key.

The C++ class Weak_object is a simple class, which does not contain any
operation that is significant to the DD API user.

An example of such a class is dd::Index_element (the corresponding DD table:
INDEX_COLUMN_USAGE). It encapsulates the information about a column in an
index.


Entity objects are the objects which have an object ID.

Entity objects are represented by the Entity_object interface derived from
Weak_object. Entity_object extends Weak_object by bringing in more
capabilities related to object ID.

The C++ class Entity_object declares the following attributes:
  - Object ID
  - Object name

Example of entity objects:
  - Table
  - Column
  - Index


Dictionary_object denotes objects which can be accessed via the
dd::Dictionary interface.

In other words, Dictionary_object is the base interface for all object
types which are accessed via the dd::Dictionary interface, which means they
can be loaded / stored independently, and they can be referred to from
other objects by Object-ID.

Example of dictionary objects:
  - Charset
  - Collation
  - Schema
  - Table

A dictionary object is an entity object (i.e. if an object can be retrieved
using dd::Dictionary, the object must have an ID). But not all entity
objects can be retrieved independently via dd::Dictionary. This is
elaborated in more detail below.

The C++ class Dictionary_object has the following members:

  - Object type attribute, which returns a reference to a
    Dictionary_object_type-based class (see 'Object type classes' above).

  - Operation to create the key (Object_key interface) identifying this
    particular instance.

Handling object relations

There are two types of references between objects:

  - "weak object reference" -- an object reference by object-id, meaning
    that the object might not be loaded yet;

  - "strong object reference" -- an object reference by C-pointer
    (C-reference), meaning that the object surely exists (is loaded) in
    memory.
Thus, there are "tightly-coupled" and "loosely-coupled" object classes.

Here are current relations and their types:

  Tablespace - Tablespace_file      : tightly-coupled
  Table - Column                    : tightly-coupled
  Table - Index                     : tightly-coupled
  Table - Foreign_key               : tightly-coupled
  Column - Set_element              : tightly-coupled
  Column - Enum_element             : tightly-coupled
  Index - Index_column              : tightly-coupled
  Foreign_key - Foreign_key_column  : tightly-coupled

  Catalog - Schema                  : loosely-coupled
  Catalog - Tablespace              : loosely-coupled
  Schema - Table                    : loosely-coupled
  Schema - Stored_routine           : loosely-coupled
  Table - Tablespace                : loosely-coupled
  Index - Tablespace                : loosely-coupled

  - "tightly-coupled" relations mean aggregation;
  - "loosely-coupled" relations mean association.

  - Foreign_key has actually two table references: the "from" and "to"
    tables. The tightly-coupled relation goes to the "from" reference (a
    table aggregates the foreign keys it has).

    The "to" table reference is named "referenced_table" and
    that reference is implemented using names. In other words, each
    Foreign_key object has 3 attributes specifying the "referenced table"
    catalog name, schema name and table name. That kind of relation is
    neither tightly-coupled nor loosely-coupled as the DD API doesn't
    provide ways to deal with it. DD API users have to "manually" look up
    the "referenced_table" by the corresponding name attributes.

This section will use the words "Parent" and "Child" to specify relations
in general.

For example:

  - in Catalog - Schema
    - Parent: Catalog
    - Child: Schema

  - in Table - Column
    - Parent: Table
    - Child: Column

Resolving weak references

In the persistent DB-layer a reference is represented by the object-id of
the object being referenced. The DD API user should request the referenced
DD-object by ID.

See the "Object_reference" section in the "For the record" chapter below
for more background information.

Mapping of loosely-coupled classes

Background (quick summary):

  - Child has a weak reference to the Parent:

    class Child {
      Object_id m_parent_id;

  - Child has getter/setter allowing to get/modify the reference:

    class Child {
      Object_id parent_id() const;
      void set_parent_id(Object_id);

The proposal to handle loosely-coupled objects:

  - Parent does not have an in-memory collection (cache) of references to
    its children.

  - Instead, the Parent has an operation that queries the database and
    retrieves all the children for the given Parent:

    class Child_iterator {
      Child *next();

    class Parent {
      Child_iterator *fetch_children(THD *thd, Functor *condition);

    fetch_children() basically opens a cursor on a table and fetches the
    rows which match the given condition.

  - As we aim to create a defensive API (resistant to misuse), every
    Parent-class will have an operation to create an object of the
    Child-class:

    class Parent {
      Child *create_child();

    The Parent::create_child() method creates a new instance of Child and
    does the basic initialization. At least, it sets Child::m_parent_id.

    Thus, the DD-user does not have a way to create a Child-object, which
    does not belong to any Parent (which is good).

    There is one limitation: a child can not be created on a non-persistent
    (newly created) parent. In other words, the following code will not
    work:
      Parent *p= xxx->create_parent();

      // 'p' is an in-memory object, it is not stored in the persistent
      // database, p->id() is INVALID_OBJECT_ID.

      Child *c= p->create_child();

      // 'c' will be NULL here. The Parent::create_child() call above
      // fails as 'p' is not persistent yet (p->id() is INVALID_OBJECT_ID).

    The reasoning for this limitation is as follows:

      - Parent -- Children references are uni-directional -- every child
        has a reference to the parent, the parent does not have a reference
        to its children.

      - If a child is created for the non-persistent parent, it's not clear
        how to store these objects:

        - If the child->store() is called, it will fail since the parent
          reference is not established (parent_id is INVALID_OBJECT_ID).

        - All children should be updated after parent->store(), but that's
          impossible since the parent doesn't know its children.

    This limitation might be lifted in future versions, but so far such a
    feature does not seem to be needed.

  - Parent::drop() operation is "cascade", i.e. it drops all the children.

  - The Child::drop() operation just removes the child's association with
    the Parent, meaning that the next fetch_children() will not return the
    dropped child.

  - Any drop() operation drops the corresponding InnoDB records. That
    happens in the scope of the current DD transaction. The changes will be
    visible to other parties once the current DD transaction is committed.

  - The Parent-class also provides the needed lookup-operations, like:

    class Parent {
      Child *lookup_child(THD *thd, const std::string &child_name);
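
The create_child() guard described above can be sketched as follows.
INVALID_OBJECT_ID as a zero sentinel, the member names and the hard-coded
generated id are assumptions of this sketch, not the actual implementation:

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned long long Object_id;
static const Object_id INVALID_OBJECT_ID= 0;  // assumed sentinel value

class Child
{
public:
  explicit Child(Object_id parent_id) : m_parent_id(parent_id) {}
  Object_id parent_id() const { return m_parent_id; }

private:
  Object_id m_parent_id;  // weak reference to the parent
};

class Parent
{
public:
  Parent() : m_id(INVALID_OBJECT_ID) {}

  // create_child() fails (returns NULL) until the parent is persistent,
  // because the child must record a valid parent_id.
  Child *create_child() const
  {
    if (m_id == INVALID_OBJECT_ID)
      return nullptr;
    return new Child(m_id);
  }

  // Sketch of store(): the real store() gets the id from the
  // auto-increment column after the insert; 42 is a placeholder.
  void store() { m_id= 42; }

private:
  Object_id m_id;
};
```
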

Mapping of tightly-coupled classes

In this case the Parent class has a collection of Child-objects.
Something like this:

class Parent {
  std::vector<Child *> m_children;

That collection is loaded and stored when the Parent-object is loaded and
stored.

The Child class (as usual) has a pointer to the parent:

class Child {
  Parent *m_parent;

The problem is that in this case we have bidirectional references, thus
every reference change must be updated on both sides.

For example:

  Tablespace - Tablespace_file

  Let's say, we're moving a tablespace file from one tablespace to the
  other. The following references must be updated:
    - Old_tablespace::m_files
    - New_tablespace::m_files
    - Tablespace_file::m_parent

We need to design an error-proof API that minimizes the chances for the
user to make a mistake (update the reference on one side, and forget to
update it on the other side).

NOTE: as outlined in WL#6380, we aim for iterators, i.e. we will have
iterators instead of index-based lookup/traversal.

See the "Alternatives for mapping tightly-related objects" section in "For
the record" chapter for more information.

The following use cases look appropriate from the DD-user view point:

  a) Iterating through all items:

    Child_iterator *it= parent->children();

    while (true)
    {
      Child *child= it->next();

      if (!child)
        break;

      // ... process 'child' ...
    }

  b) Adding new item:

    Child *child= parent->create_child();

  c) Adding existing item: not supported.

  d) Drop existing item

    Child *c= parent->lookup_child(child_name);
    c->drop();


  e) Change the parent:
    Child *c= parent->lookup_child(child_name);


Handling changes in child-object

The thing is that we need to handle three types of child-object changes:

  - A new child is added

  - An existing child is dropped

  - An existing child is modified

The main idea is that Weak_object::store() must be the only operation that
writes changes to the database storage. In other words: add / drop / update
operations will update the in-memory state only, and until store() for the
"main object" is called, these changes will not hit the persistent database.

The main object for a tightly-coupled relationship is the top parent object.
For example, take the 'Table - Column - Set_element' relation:
until Table::store() is called, no changes to the table columns, or to any
column element, are stored.

The main object for a loosely-coupled relationship is the child object.
For example: a new table in a schema will not be stored in the persistent
database until Table::store() is called.

An alternative could be to propagate changes to the persistent storage on
each add / drop / update operation. It would simplify the development, but
it would also introduce potential problems with implementing online ALTER
TABLE, where the acquisition of the exclusive lock has to be postponed as
long as possible (till the very end of the operation). This alternative
would mean that the exclusive lock had to be taken before the first add /
drop / modify column operation.

NOTE: it was also discussed whether it is possible to hide the store()
operation from the DD-user, and make the user operate in terms of a
DDL-transaction's commit or rollback. That should be possible, but not for
the 1st increment. When we introduce DDL-transactions, we can consider
keeping track of object changes in the transaction context, and "apply"
those changes (call the store() method on the corresponding objects) if
needed.

Postponing writes to the persistent storage means that every DD-object
must be able to tell:
  - if it is new or not;
  - if it has been modified in memory;
  - if it has been dropped.
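
A minimal sketch of such state tracking; the flag and method names here are
assumptions, not the actual implementation:

```cpp
#include <cassert>

// Sketch: a DD-object carries its persistence state so that store()
// can decide between INSERT and UPDATE, and skip clean objects.
class Weak_object_state
{
public:
  Weak_object_state()
    : m_is_new(true), m_is_modified(false), m_is_dropped(false) {}

  bool is_new() const { return m_is_new; }
  bool is_modified() const { return m_is_modified; }
  bool is_dropped() const { return m_is_dropped; }

  void mark_modified() { m_is_modified= true; }
  void mark_dropped() { m_is_dropped= true; }

  // store() persists the object: afterwards it is neither new nor modified.
  void store()
  {
    // INSERT if m_is_new, UPDATE if m_is_modified (persistence elided).
    m_is_new= false;
    m_is_modified= false;
  }

private:
  bool m_is_new;       // not yet present in the persistent database
  bool m_is_modified;  // changed in memory since load/store
  bool m_is_dropped;   // scheduled for deletion
};
```
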

Properties overview

Several object types provide associated properties for every object
instance. In other words, an object can have an associated set of key-value
pairs, e.g., each table has "options" and "se-private-data".

The Properties interface and the corresponding implementation are provided
as a common tool to implement those sets of key-value pairs in different
object types.
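
A simplified stand-in for such a key-value set is sketched below; the real
Properties interface also parses a serialized representation and offers
typed accessors, which are omitted here, and the "key=value;" format is an
assumption of this sketch:

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified sketch of a Properties-like key-value set, as used for
// per-object "options" and "se-private-data".
class Properties
{
public:
  bool exists(const std::string &key) const
  { return m_map.count(key) != 0; }

  void set(const std::string &key, const std::string &value)
  { m_map[key]= value; }

  // Returns an empty string for a missing key (simplification).
  std::string value(const std::string &key) const
  {
    std::map<std::string, std::string>::const_iterator it= m_map.find(key);
    return it == m_map.end() ? std::string() : it->second;
  }

  // Serialize to a "key=value;key=value;" string (assumed format).
  std::string raw_string() const
  {
    std::string out;
    for (std::map<std::string, std::string>::const_iterator it= m_map.begin();
         it != m_map.end(); ++it)
      out+= it->first + "=" + it->second + ";";
    return out;
  }

private:
  std::map<std::string, std::string> m_map;
};
```
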

Handling null-able columns

Many columns in the dd-tables are allowed to contain NULL value (null-able
columns). This section outlines the relation between the DD API and
null-able columns.

First of all, most of the null-able columns are declared null-able because
they are shown in INFORMATION_SCHEMA tables, which are specified by the SQL
Standard. In other words, a column has to be null-able just because it's
shown in an INFORMATION_SCHEMA table, where it's specified to be null-able.
Thus, we can not get rid of null-able columns.

Null-able columns are not an exception; the estimate is that about 30-50%
of all columns will be null-able.

The proposed idea, on the high level, is that generally the DD API must not
expose column null-ability to the DD user. Only in rare cases will
null-ability be reflected in the API terms.

The thing is that in most cases there are strict rules when a column must
be set to NULL and when it must have a meaningful value. For example,
dd.table.partition_expression must be NULL for non-partitioned tables. An
alternative is to push the responsibility of enforcing those rules to the
DD-user side. That alternative seems to be error-prone, since the code
would be duplicated and not well maintained. It's better to have a single
place (the DD API implementation) where those rules are enforced.

In the rare cases when we do need to expose column null-ability, it's
proposed to stick to the following convention:

  - Let's say we have 'xxx' column, which is mapped to the 'xxx' attribute;

  - The ordinary operations are (getter/setter):

    const T &xxx() const;
    void set_xxx(const T &);

    where 'T' is the column/attribute type.

    It's proposed to use the same style for null-able and non-null-able
    attributes (i.e. use C++ references in both cases).

  - In addition to the operations above, it's proposed to have the
    following operations:

    bool has_xxx() const; -- returns 'false' if 'xxx' is NULL;
                          -- in that case, the result of xxx() is undefined.

    void set_xxx_null();  -- indicates that 'xxx' must be set to NULL;
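
Applied to a hypothetical null-able 'comment' attribute, the convention
could look like this (the class and member names are purely illustrative):

```cpp
#include <cassert>
#include <string>

// Sketch of the null-ability convention for a hypothetical null-able
// 'comment' attribute ('xxx' == 'comment').
class Table_sketch
{
public:
  Table_sketch() : m_comment_is_null(true) {}

  // Ordinary getter: result is undefined while the column is NULL.
  const std::string &comment() const { return m_comment; }

  void set_comment(const std::string &c)
  {
    m_comment= c;
    m_comment_is_null= false;
  }

  // has_xxx(): 'false' means the column is NULL.
  bool has_comment() const { return !m_comment_is_null; }

  // set_xxx_null(): the column must be stored as NULL.
  void set_comment_null() { m_comment_is_null= true; }

private:
  std::string m_comment;
  bool m_comment_is_null;
};
```
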

Handling SQL enums

WL#6379 defines several columns in the DD-tables to be of the SQL ENUM
type. The SQL ENUM type is required there so that the columns can be shown
in the INFORMATION_SCHEMA tables.
SQL-ENUM and C-enum types have a few important differences:

  - 0 has a special meaning for SQL-ENUM and can not be used to identify a
    regular constant;

  - SQL-ENUM can not have gaps in the numeric values.

That means there is no way to represent a C-enum like the following:

  enum enum_field_types
  {
    MYSQL_TYPE_DECIMAL,         -- The numeric value is 0.
    MYSQL_TYPE_TIME2,           -- Here is a "gap" in numeric values
    MYSQL_TYPE_NEWDECIMAL=246,  -- between N and 246.
  };

So, here is the problem discussed in this section: not every C-enum can be
fully represented by an SQL-ENUM.

There are the following alternatives to solve the problem:

  1. do not use SQL-ENUM, use SQL-INT to represent C-enum values.

    That's not suitable because INFORMATION_SCHEMA needs to show SQL-ENUM,
    and we don't want to add extra overhead by converting functions.

  2. use both SQL-INT and SQL-ENUM (two columns) to store a single
     attribute.

    That means duplication and possible inconsistency.

  3. have a mapping between C-enum values and SQL-ENUM values.

    That adds some overhead and additional logic into DD API.

Another related problem is that in a few cases enum values (both C-enum and
SQL-ENUM) are part of some API:

  - C-enum enum_field_types values are exposed by the client-server
    protocol and prepared statements.

  - dd.columns.type of SQL-ENUM is exposed via INFORMATION_SCHEMA, and what
    is even worse, several C-enum values must be shown as a single SQL-ENUM
    value.

    For example, the MYSQL_TYPE_DECIMAL and MYSQL_TYPE_NEWDECIMAL
    C-enum values must be represented by the DECIMAL SQL-ENUM value.

Sometimes that problem occurs for legacy elements, which are going to be
deprecated (MYSQL_TYPE_DECIMAL in the example above). However, it's also
anticipated that the enumeration values will evolve in the future (new
items might be added as new types/algorithms/... are added, others might be
deprecated and removed). So, it's expected that the situation we have today
will also be common in the future.

When a C-enum is not exposed to the users (is not a part of some interface /
contract), it should be possible to change the numerical values of its
items.
However, given the nature of C-enums, changing them is not a trivial task;
it is especially hard to get rid of the items with the 0 enum value. The
problem is that C-enum variables can be used in 'if' statements in the form
of "if (x)" (i.e. just comparing against zero). Another side of this
problem is that those variables can be initialized within the containing
structures by zeroing the whole structure out.
Thus, changing the numerical values of the C-enums is not a complex, but a
tedious task, which should be carefully verified and tested. The proposal
is to identify which enums can be converged eventually, but not to rush to
converge them at this point in time.

It is proposed to use (2) or (3) depending on the attribute.
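
Alternative (3) boils down to a translation function between C-enum values
and the 1-based SQL-ENUM ordinals. Below is a sketch over a tiny excerpt of
enum_field_types; the SQL-ENUM ordinals and the stub names are assumptions,
and it shows how both DECIMAL variants collapse into one SQL-ENUM value:

```cpp
#include <cassert>

// A tiny excerpt of the C-enum (values as in the client-server protocol).
enum enum_field_types_stub
{
  MYSQL_TYPE_DECIMAL= 0,
  MYSQL_TYPE_LONG= 3,
  MYSQL_TYPE_NEWDECIMAL= 246
};

// Hypothetical SQL-ENUM ordinals: 1-based, no gaps, 0 is reserved.
enum enum_column_type_stub
{
  CT_DECIMAL= 1,
  CT_LONG= 2
};

// C-enum -> SQL-ENUM ordinal. Several C-enum values may map to one
// SQL-ENUM value (both DECIMAL variants map to CT_DECIMAL).
int to_sql_enum(enum_field_types_stub t)
{
  switch (t)
  {
  case MYSQL_TYPE_DECIMAL:
  case MYSQL_TYPE_NEWDECIMAL:
    return CT_DECIMAL;
  case MYSQL_TYPE_LONG:
    return CT_LONG;
  }
  return 0; // unreachable for valid input
}
```
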

Implementation details

Object_type interface

Every DD-object class has a special meta-class that implements the
Object_type interface.

The idea behind introducing the object-type class is that the following
responsibilities are associated with the object type, not with the object
instance:
  - Create an instance of DD-object of a given object type;

  - Know which DD tables should be opened to store/restore an object.

Every DD-object class contains a static constant reference to the instance
of its type.

For example:

  class Table {
    static const Object_type &TYPE;

This reference is initialized in the implementation code.
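
A minimal sketch of the pattern with stand-in classes (the real Object_type
also knows which DD tables to open, which is omitted here):

```cpp
#include <cassert>
#include <cstddef>

class Weak_object_stub
{
public:
  virtual ~Weak_object_stub() {}
};

// Meta-class sketch: one singleton instance per object class,
// responsible for creating instances of that class.
class Object_type_stub
{
public:
  virtual Weak_object_stub *create_object() const= 0;
  virtual ~Object_type_stub() {}
};

class Table_stub : public Weak_object_stub
{
public:
  static const Object_type_stub &TYPE;  // reference to the singleton
};

class Table_type_stub : public Object_type_stub
{
public:
  virtual Weak_object_stub *create_object() const
  { return new Table_stub(); }
};

// "Implementation file": define the singleton and bind the reference.
static const Table_type_stub table_type_instance;
const Object_type_stub &Table_stub::TYPE= table_type_instance;
```
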

Transaction interfaces

As mentioned above, there are three transaction interfaces:

  - Transaction -- the base interface

  - Transaction_ro -- the interface of a read-only DD transaction (used
    mostly in DML handling);

  - Transaction_rw -- the interface of a read-write DD transaction (used
    for processing DDL statements).

There are the following sets of operations.

Create global objects

  virtual Schema *create_schema() = 0;

  virtual Tablespace *create_tablespace() = 0;

Get a single object

Retrieve an individual object. It's proposed to use the "get_" prefix to
designate such operations.
These operations might use the internal dictionary cache. However, the
cache interactions must be transparent to the user. The cache only speeds
things up, but does not affect the semantics of the operation. In other
words, the DD-user must get the same results with the cache turned on and
off.

If a "get"-operation fails, a NULL pointer is returned.

The operation might fail mainly for the following reasons:

  - the requested object does not exist;

  - an internal error happened, which prevented the DD from completing the
    operation. This kind of error is expected to be very rare.
    Internal errors cause the THD::error flag to be set and trigger an
    error message to be put into the error log.

Fetch a single object or a collection of objects

The "fetch_" prefix designates operations, which involve an unconditional
round trip to the persistent database.

"fetch"-operations can return a single object as well as a collection of
objects.

If a "fetch"-operation returns a single object, it returns a NULL-pointer
in case of failure. It is the DD user's responsibility to handle that.

If a "fetch"-operation returns a collection of objects, it returns a
pointer to an iterator over that collection. The pointer is NULL in case of
failure.

"Fetch" operations basically do the following steps:
  1. lookup an object-id by some secondary key/index (object-name,
    se-private-id, ...);
  2. "get" an object by that object-id.

The 2nd step can involve the object cache.
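
The two-step scheme with a trivial cache can be sketched like this; the
global maps stand in for the secondary index, the persistent records and
the object cache, and all names are assumptions of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

typedef unsigned long long Object_id;

struct Object { Object_id id; };

// Stand-in persistent storage: name -> id (the secondary index),
// plus id -> record (the DD table).
static std::map<std::string, Object_id> g_name_index;
static std::map<Object_id, Object> g_records;

// Trivial object cache for the "get" step.
static std::map<Object_id, Object *> g_cache;
static int g_cache_misses= 0;

// Step 2: "get" by object-id, possibly served from the cache.
Object *get_object(Object_id id)
{
  std::map<Object_id, Object *>::iterator it= g_cache.find(id);
  if (it != g_cache.end())
    return it->second;
  ++g_cache_misses;
  if (g_records.count(id) == 0)
    return nullptr;
  Object *o= new Object(g_records[id]);
  g_cache[id]= o;
  return o;
}

// Steps 1 + 2: "fetch" by name -- an unconditional index lookup,
// then a cache-aware "get".
Object *fetch_object(const std::string &name)
{
  std::map<std::string, Object_id>::iterator it= g_name_index.find(name);
  if (it == g_name_index.end())
    return nullptr;
  return get_object(it->second);
}
```
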

"Fetch" operations that deal with collections do a full prefetch and return
an iterator over the in-memory collection of loaded objects. Basically,
what happens under the hood is:

  - go directly to the persistent database and open the necessary table.

  - find matching records.

  - read 'object-id' from the matched record.

  - "get" an object by that object-id (i.e. find it in the object cache, or
    restore from the persistent database) and put the object into the
    resulting collection.

  - close the table.

  - return an iterator over the collection.

There are the following issues with the operation semantics:

  - If a new object matching the fetch-operation criteria has been
    created in-memory, but not yet stored in the database, the fetch
    operation will skip it.

    For example:

      // Let's say, there are 3 objects matching the criteria.

      // Now add two new children.

      Child *c4= parent->create_child();

      Child *c5= parent->create_child();

      // As the new children have not been committed to the database,
      // a call printing the fetched children will still print 1, 2, 3.

      // Store new children.

      c4->store();
      c5->store();

      // Now, fetch_children() sees the new children,
      // so the output will be 1, 2, 3, 4, 5.

  - If an existing object matching the fetch-operation criteria has
    been modified in-memory, but not yet stored in the database, the
    fetch operation will return the *modified* object.

    For example (again, illustrative calls):

      // Let's say, there are 3 objects matching the criteria.
      // Let's assume every object has an object-id and a symbol
      // attribute.
      // So the function will print 1a, 2a, 3a.

      // Now, let's modify the 2nd child.
      Child *c2= parent->fetch_child(2);
      c2->set_symbol('b');

      // fetch_children() will return the modified c2.
      // So the function will print 1a, 2b, 3a.
      print_children(parent->fetch_children());

These issues are not a problem because of metadata locking system. The
thing is that MDL will prevent modification of children while accessing
parents. The MDL system and the Data Dictionary are on different levels,
but the Data Dictionary relies on the proper MDL locking.

Release object

When the user is done with the DD-object, the user must indicate that to
the Dictionary by calling the release-object operation.

After calling release_object(), the object pointer might still be valid,
but the user must not use it.

See WL#8150 for more details.

Store object

This operation stores a new or existing object into the database and also
puts it into the cache if needed.

Under the hood this operation goes through the object cache in order to
maintain the cache's consistency.

When a newly created object is stored, it's assigned a newly generated
object-id after insert (the value of last-inserted-id from auto-increment
column). So, the storing sequence of a newly created DD-object looks like
the following:
  1. [Create a new DD object]
  2. Insert a new InnoDB record into the corresponding InnoDB table;
  3. Get the last-inserted-id and assign it as the object ID for the
  DD-object. That happens in memory, so the DD-object is now in the
  updated state in memory.
  4. Put the DD-object into the appropriate cache.

NOTE: this operation will be hidden behind the DD transaction interface.
All object modifications will be stored on a transaction commit.
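The storing sequence above can be sketched as a small model. `Fake_store`,
its `store()` method and the auto-increment counter are illustrative
stand-ins for the InnoDB insert and last-inserted-id mechanics, not the
real DD code.

```cpp
#include <map>

typedef unsigned long Object_id;

struct Dd_object {
  Object_id id;  // 0 means "not yet stored"
  Dd_object() : id(0) {}
};

struct Fake_store {
  Object_id last_inserted_id;            // simulated auto-increment column
  std::map<Object_id, Dd_object *> cache;

  Fake_store() : last_inserted_id(0) {}

  // Steps 2-4: "insert" the record, read last-inserted-id, assign it to
  // the in-memory object, and put the object into the cache.
  void store(Dd_object *obj) {
    if (obj->id == 0)                 // a newly created object
      obj->id = ++last_inserted_id;   // value of last-inserted-id
    cache[obj->id] = obj;
  }
};
```

After store() the in-memory object carries the database-generated ID, which
is exactly the "updated state in memory" the steps describe.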

Drop object

This operation attempts to drop an existing object in the database, and
releases the object (i.e. it becomes unavailable to the user).

Calling the drop operation for a newly created object does not make much
sense (newly created objects exist only in memory), but it's a valid
scenario. The user-visible effect in this case is just that the object is
released and becomes unavailable to the user.

It's been discussed whether the drop operation should release the object
being dropped when the database essentially failed to drop it, and the
decision is to release the object unconditionally. The rationale is as
follows:
  - It is consistent for the DD-user (the object passed to drop_object() is
    always released);

  - The fact that the database failed to drop an object means that the
    object has already been dropped in the database. How that might happen
    is a different question (we rely on MDL). The thing is that the object
    in memory is inconsistent and it's good to force the DD-user to
    re-retrieve it. 

NOTE: this operation most likely will be moved to the Dictionary_object
interface.

Informational operations

There might be additional operations, such as:

  const Collation &server_collation() const

Support InnoDB specific use cases

As mentioned in WL#6380, we fully rely on the MDL subsystem.
That means that the Dictionary itself does not check whether the object
requested by the user can be accessed or modified. It is assumed that the
user did all the needed interactions with MDL to lock objects. There will
be assertions to ensure that a proper MDL-lock has been taken.

NOTE: there are two different MDL-lock types involved in the DD API:

  1. MDL-locks on the dd-tables;

    This is just the nature of accessing (opening & locking) tables within
    the MySQL server. The Data Dictionary doesn't need these locks itself,
    but the server infrastructure does.

  2. MDL-locks on the objects, requested from the dd-tables.

The DD API takes care of (1), but relies on the user to take care of (2).

InnoDB has special needs -- during the recovery, InnoDB needs to retrieve
objects for which MDL-locks are not held.

The idea is to provide a special interface for InnoDB to be able to
retrieve unlocked objects. This interface must not be used by regular
DD users.

The thing is that InnoDB cannot take the needed MDL-lock, simply because
InnoDB does not know the object-name. All that InnoDB knows is the
se-private-id. If it could, we would not provide such a special interface,
but would force InnoDB to use the common MDL-interface. But it cannot.

Thus, the DD API must provide an operation to: 
  1. lookup object-row by se-private-id (or any other index tuple)
  2. get the object-name from the row
  3. acquire MDL-lock on object name
  4. lookup object based on se_private_id (or on object-name) again
  5. check if the object name has changed. This step makes sure that there
      were no rename/drop operations between (2) and (3).
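The lookup-and-recheck sequence can be sketched like this. The MDL
acquisition itself is elided; `Se_index`, `Table_row` and
`fetch_by_se_private_id()` are hypothetical names modeling the five steps,
not the real interface.

```cpp
#include <map>
#include <string>

struct Table_row {
  std::string name;
};

// Simulated DD-table index: se-private-id -> row.
typedef std::map<unsigned long, Table_row> Se_index;

// Returns true and fills *name on success; returns false if the object
// vanished or was renamed between reading the name and locking it.
bool fetch_by_se_private_id(const Se_index &index, unsigned long se_id,
                            std::string *name) {
  // Steps 1-2: look up the row by se-private-id and read the object name.
  Se_index::const_iterator it = index.find(se_id);
  if (it == index.end()) return false;
  std::string locked_name = it->second.name;

  // Step 3: acquire an MDL lock on locked_name (omitted in this sketch).

  // Step 4: look the object up again.
  it = index.find(se_id);
  if (it == index.end()) return false;

  // Step 5: verify the name did not change (no rename/drop in between).
  if (it->second.name != locked_name) return false;

  *name = locked_name;
  return true;
}
```

The second lookup is what makes the sequence safe: a concurrent rename or
drop between steps (2) and (3) is detected and the caller can retry.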

Implementation notes

From the DD implementation view point, the differences between
Transaction_ro and Transaction_rw are as follows. These are implementation
details and hidden from DD API users.

  - Read-only transactions use InnoDB attachable transactions.

  - Read-only transactions commit the InnoDB attachable transaction in the
    end;

  - Read-write transactions use the user (outer) InnoDB transaction so that
    all changes are committed at once;

  - Read-only transactions share the object cache (use the global object
    cache);

  - Read-write transactions use their own object cache to store modified
    objects.
See WL#8150 for more details regarding the transaction implementation.

Vision of the future

The vision is that a DD-transaction will have a long scope including the
execution of a few SQL statements. That in particular means that:

  - a DD-transaction can not be seen "just" as a collection of
    THD-attributes to be saved/reset/restored. In fact, the THD-attributes
    must be reset/restored for a short time when the DD is dealing with the
    persistent storage (InnoDB) (reading / writing / dropping records).
    It's important that for the rest of time, the original values of the
    THD attributes are in effect.

  - there will be a need to upgrade a read-only DD transaction to a
    read-write DD transaction: the user transaction might start with a
    plain SELECT followed by an ALTER TABLE. That in turn means that the
    InnoDB attachable transaction must be upgraded to a regular read-write
    transaction. This is an unexplored question now.


The Properties interface provides:

  - convenient access to reading and modifying key-value pairs

  - an interface with the following core functions:

    - size()          : Return the number of key=value pairs
    - exists(key)     : Return true if the key exists
    - remove(key)     : Remove the key=value pair
    - value(key)      : Return the value for the given key
    - get(key, &value): Return (by parameter) the value
    - set(key, value) : Set/replace the value for the given key
    - iterators       : Iterate over all key=value pairs

  - conversion functions (X in {int64, uint64, int32, uint32, bool}):

    - to_X(string, &value): Convert from string to X, return by parameter
    - from_X(value)       : Convert from X to string, return string
    - set_X(key, value)   : Convert from string to X, set value for key
    - get_X(key, &value)  : Get value for key, convert to X, return by parameter

  - access to the raw string, which stores all those key-value pairs in
    some internal format (see below) to store that string in the DB;

  - a way to parse the raw string into a set of key-value pairs in order to
    load properties from the DB.

NOTE: size_t is not among the supported types of the Properties interface.
However, it might be re-considered later if needed.
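A minimal map-backed sketch of the core operations listed above
(size/exists/remove/value/set) can look as follows. `Simple_properties` is
an illustrative stand-in, not the real Properties implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>

class Simple_properties {
  std::map<std::string, std::string> m_map;

public:
  // Return the number of key=value pairs.
  size_t size() const { return m_map.size(); }

  // Return true if the key exists.
  bool exists(const std::string &key) const { return m_map.count(key) != 0; }

  // Remove the key=value pair.
  void remove(const std::string &key) { m_map.erase(key); }

  // Return the value for the given key (creates an empty value if absent).
  const std::string &value(const std::string &key) { return m_map[key]; }

  // Set/replace the value for the given key.
  void set(const std::string &key, const std::string &val) {
    m_map[key] = val;
  }
};
```

Usage follows the interface description directly: set a few pairs, query
them, and remove one.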

The internal format of the raw string is a semicolon-separated list of
key=value pairs. The list must conform to the following informal grammar
(the nonterminal names are reconstructed from the description below):

  properties ::= kv_pair
               | kv_pair ";" properties
  kv_pair    ::= key "=" value
  key        ::= char | char key
  value      ::= "" | char value
  char       ::= any byte except "\", "=", ";"
               | "\\" | "\=" | "\;"

Thus, keys and values are separated by "=", and key=value pairs are separated by
";". A key must contain one or more characters, while a value can be an empty
string. The "=" and ";" may be part of a key or value, but then they must be
escaped by "\". Hence, "\" itself must also be escaped. The key=value list
supports UTF-8 strings. If other character sets are needed, the strings must be
converted to UTF-8 first.
Internally, within the Properties implementation, the only characters we look
for are the digits '0'..'9', the sign '-' (these are actually handled in the
MySQL number conversion function), the semicolon ';', backslash '\' and
equality '='.
All of these are single byte characters in UTF-8. Single byte characters in
UTF-8 start with a 0 bit (0xxxxxxx), while characters that are represented as
multi byte sequences in UTF-8 have bytes starting with at least one 1 bit
(1xxxxxxx, see http://en.wikipedia.org/wiki/UTF-8). Thus, a byte which is part
of a multi byte character will never be equal to a byte which is by itself a
single byte character, i.e., there will never be a byte which is part of a multi
byte character that equals any of the characters we search for internally in the
Properties implementation. Thus, UTF-8 strings are handled even though the
iteration is byte by byte rather than character by character. This means that
string parsing, unescaping and escaping works, which again means that both keys
and values can be UTF-8 strings. 
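The escaping rule described above can be sketched as a small function:
every "\", "=" or ";" inside a key or value is prefixed with "\" before
serialization. This is an illustrative sketch, not the real escaping code.

```cpp
#include <string>

// Escape the three special characters of the key=value list format.
// Byte-by-byte iteration is safe for UTF-8, as explained above: no byte
// of a multi-byte character can equal '\\', '=' or ';'.
std::string escape(const std::string &in) {
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    char c = in[i];
    if (c == '\\' || c == '=' || c == ';')
      out += '\\';  // prefix the special character with a backslash
    out += c;
  }
  return out;
}
```

Unescaping is the mirror image: a backslash consumes the following byte
literally.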

The Properties interface also defines a set of auxiliary functions to parse
string values into int32, int64, uint32, uint64, boolean types. The functions
are static, and can be invoked through the interface name. The functions are
defined as, e.g.: 

  bool to_int64(const std::string &string, int64 *value);

Here, the string argument is supposed to be a string containing the decimal
value of the number. 

  int64 x;
  bool status= Properties::to_int64("12345", &x);

The boolean conversion function is defined along the same lines. The string
argument is supposed to be "true" or a number != "0" (=> true) or "false" or "0"
(=> false). 

  bool maybe;
  bool status= Properties::to_bool("true", &maybe);

There are also conversion functions converting the other direction, i.e., from
primitive types to string: 

  std::string from_int64(int64 value);

These functions are also wrapped into variants of the set and get functions of
the Properties interface in order to convert a value between a primitive type
and a string and set or get this value for a given key, e.g.: 

  bool set_int64(const std::string &key, int64 value);
  bool get_int64(const std::string &key, int64 *value);

List of SQL enums in DD API


      Corresponding C-enum: none


    - default_partitioning ENUM(NO, YES, NUMBER)

    - subpartition_type ENUM(HASH, KEY_51, KEY_55, LINEAR_HASH,
      LINEAR_KEY_51, LINEAR_KEY_55)

    - default_subpartitioning ENUM(NO, NAMES, NUMBER)


    - check_option ENUM(NONE, LOCAL, CASCADED)
    - is_updatable ENUM(NO, YES)
    - security_type ENUM(INVOKER, DEFINER)


    - type ENUM(...)
      Corresponding C-enum: enum_field_types


    - order ENUM(ASC, DESC)

    - match_option ENUM(NONE, PARTIAL, FULL)

    - action_timing ENUM(BEFORE,AFTER)

    - result_data_type ENUM(...)
    - sql_data_access ENUM(CONTAINS_SQL, NO_SQL, READS_SQL_DATA,
      MODIFIES_SQL_DATA)
    - security_type ENUM(INVOKER, DEFINER)

    - mode ENUM(IN, OUT, INOUT)
    - data_type ENUM(...)

    - on_completion ENUM(DROP, PRESERVE)

Single record lookup

In this specification, "single record lookup" means that an InnoDB
record/row must be read no more than once to retrieve/restore a DD object.
The thing is that the Data Dictionary doesn't know the exact type of a
DD-object being restored.

For example, both views and tables are stored in the same table. When a DD
user requests a table-or-view by ID, the Data Dictionary does not know what
it is before it reads the record and analyzes the type field.

There are different keys used to look up the needed DD-object in a DD-table:
  - by object-id;
  - by name;
  - by SE-private-id;
  - by composite key.

We must have single record lookup in all these cases.

Usage of size_t

The Data Dictionary API must use the 'size_t' type for all attributes which
essentially correspond to any "length" (such as string length). Length is
usually represented by the 'size_t' type in C/C++ libraries. Thus if we
chose any other data type, we would need a type conversion/cast every time
a length-attribute is used. Those conversions/casts would be required on the
DD user side. Using the 'size_t' type in the DD API moves those
conversions/casts into the API implementation.
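For illustration, a length accessor typed as size_t plugs directly into
standard library calls without casts on the caller's side. `Column_sketch`
and `name_length()` are hypothetical names, not the real DD API.

```cpp
#include <string>

class Column_sketch {
  std::string m_name;

public:
  explicit Column_sketch(const std::string &name) : m_name(name) {}

  // size_t matches std::string::size(), so no conversion is needed
  // either here or at the call site.
  size_t name_length() const { return m_name.size(); }
};
```

Had the accessor returned, say, uint32, every use with standard containers
or strings would need a cast on the DD user side.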

That does *not* mean that any unsigned numeric type must be represented as
'size_t'.
For the record

Alternatives for mapping tightly-related objects

There are several alternatives:

  1. Expose internal containers

    The idea is like this:

    class Parent {
      typedef std::vector<Child *> Child_seq;
      Child_seq m_children;

      Child_seq *children()
      { return &m_children; }
    };

    This approach is simple and straightforward, but it is too low-level
    for the DD-user. It is vulnerable to misuse. Also, it would be good to
    have the possibility to change the actual container implementation
    internally without affecting DD-users much.

    It's been decided to reject this approach, so it will not be discussed
    further.

  2. "Microsoft" approach

    This approach is widely used within object systems designed by
    Microsoft.

    The idea is that a new entity is introduced, which provides all the
    operations for consistent reference management.

    So, we had: Parent, Child. Now, we're introducing Child_collection:

    class Parent {
    public:
      Child_collection *children();
    };

    The Child_collection interface may be like the following:

    class Child_collection {
    public:
      void add(Child *);
      bool drop(Child *);
      Child_iterator *iterator();
    };

    So, the use case scenarios are like this:

    a) Iterating through all items:

      Child_iterator *it= parent->children()->iterator();

      while (true)
      {
        Child *child= it->next();

        if (!child)
          break;

        // use the child object
      }

    b) Adding new item:

      Child *child= parent->children()->add_new();

    c) Adding existing item (not sure if that's needed at all):

      Child *child= ... // create a new object somehow
      parent->children()->add(child);

    d) Drop existing item:

      parent->children()->drop(child);

      drop() returns an error status if 'child' is not in the collection.

    Of course, Child_collection will be a new type for every Child-class
    (i.e. A_collection, B_collection, ...).

    This approach has the following benefits:

      - The logic to manage collections is outsourced to a separate class and
        the Parent is not polluted with various methods.

      - Most likely it should be possible to come up with a base-collection
        class, that does most of the functionality. Then inherit individual
        collection types from it to add type-specificity.

        Actually, it should be possible to write something like:

        class Parent {
          typedef Base_collection<Child> Child_collection;
          Child_collection m_children;

          Child_collection *children()
          { return &m_children; }
        };

      - If there is a need for a new operation, there is a class for it,
        which means:

        - no further Parent pollution of Child-specific methods;

        - it will be more self-explanatory for the DD-user to get the list
          of collection operations.

    The downside is a vague fear of too many classes. Also, it might be a
    bit tricky to implement collection operations with custom semantics --
    for example, if the add() operation must do some non-trivial checks, or
    if more than one container should be consistently maintained (e.g. a
    vector and a map).
  3. "Java" approach

    The idea is that the Parent has all the operations to maintain items
    within the collection.

    class Parent {
      Child_iterator *children();
      void add_child(Child *);
      bool drop_child(Child *);
      Child *find_child(...);
    };

    a) Iterating through all items:

      Child_iterator *it= parent->children();

      while (true)
      {
        Child *child= it->next();

        if (!child)
          break;

        // use the child object
      }

    b) Adding new item:

      Child *child= parent->add_child();

    c) Adding existing item (not sure if that's needed at all):

      Child *child= ... // create a new object somehow
      parent->add_child(child);

    d) Drop existing item:

      parent->drop_child(child);

      drop_child() returns an error status if 'child' is not in the
      collection.

    The problem here is the pollution of the parent class with
    child-operations. However, implementing custom semantics should not be
    a problem.

Overall, the approaches above are probably designed for cases where a
fine-grained API is needed. In our situation, we want to provide a more
high-level API to relieve the DD-user of the burden of setting references,
and (what's more important) to reduce the chance of doing something wrong.

From the implementation view point, it makes sense to have a collection
class in any way. It might be directly exposed as in (2), or used to
implement the parent operations as in (3).
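The Base_collection idea from alternative (2) can be sketched as a small
template: a generic collection with per-child types created via typedef.
All names here are hypothetical, following the fragments above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T>
class Base_collection {
  std::vector<T *> m_items;

public:
  void add(T *item) { m_items.push_back(item); }

  // Returns false (an error status) if 'item' is not in the collection,
  // matching the drop() semantics described for alternative (2).
  bool drop(T *item) {
    typename std::vector<T *>::iterator it =
        std::find(m_items.begin(), m_items.end(), item);
    if (it == m_items.end()) return false;
    m_items.erase(it);
    return true;
  }

  size_t size() const { return m_items.size(); }
};

struct Child { int id; };

// Per-child collection type, as in the typedef inside Parent above.
typedef Base_collection<Child> Child_collection;
```

This is the shared base the text envisions; type-specific collections would
inherit from it (or instantiate it) to add custom checks in add()/drop().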

Catalog support

The final design of the Data Dictionary depends on the answers on the
following two questions:

  - What is the relation between Data Dictionary and Catalog?
    In other words: is Data Dictionary global, or per-catalog?

  - What is the relation between Catalog and Tablespace?

    Tentative answer: a Catalog "has" 1 or more Tablespaces; a Tablespace
    can be contained by only one Catalog. However, there is a chance this
    answer will be revisited, as the Catalog WL is not completed.

It's been agreed to postpone decisions on these items because:

  - There will be no fully functional Catalog support in 5.7;

  - 5.7 will have one default catalog as it is in 5.1 / 5.5 / 5.6;

  - There are many questions about the way Catalogs should be supported in
    the server;

  - The Catalog support is not in the 5.8 plans.

Thus, in order to finish the Data Dictionary design for 5.7 some
assumptions have to be made.

It is proposed to design Data Dictionary relying on the current level of
Catalog support (the only one default catalog), and outline what should be
modified when Catalog support will be added.

An alternative to this is to design the Data Dictionary under the
assumption of a per-catalog Data Dictionary, as that seems to cause fewer
changes in the future. However, it seems that currently a Catalog interface
would only complicate things. Moreover, changes will be needed in the
future anyway, as Catalogs are not properly designed now, so we can only
try to predict what will be in the final design. Introducing some Catalog
interface right now and changing it later would mean user confusion.
So, it's proposed to reject this alternative.

Thus, to sum up, the proposal is to not introduce Catalog interface at this
point in time.

NOTE: even though the Catalog interface will not exist in the DD API, it's
still proposed to keep the catalog-related columns in the DD-tables (as
specified by WL#6379) in order to minimize future differences in the
DD-table definitions.
NOTE: since the dd.schemata table has the catalog_id column, its value must
be maintained. One record with the default catalog information will be
inserted into the dd.catalogs table. The Dictionary implementation will be
aware of the default catalog id and name.

Let's now see how the DD API can evolve in the future. The Dictionary
interface has the following logical groups of operations:

  1. Retrieve truly global read-only objects (individual object by name or
    collections of objects).

    There are two types of such objects: Charset and Collation. These
    objects can not be changed via Data Dictionary.

  2. Retrieve global objects, such as Tablespace or Schema (individual
    object by name or collections of objects).

  3. Retrieve non-global object(s) by some criteria.

      - get a table by SE-private-ID
      - get all indexes by SE-private-table-ID

The following modifications will be done for the global Data Dictionary:

  - Catalog interface will be added

  - Schema will be moved into the scope of Catalog, so there will be no
    Dictionary::get_schema(), but instead Catalog::fetch_schemas().

  - Operations to deal with Catalogs will be added to the Dictionary.

The following modifications will be done for the per-catalog Data
Dictionary:
  - Catalog interface will be added

  - (2) and (3) will be moved from Dictionary to Catalog

  - Schema will be moved into the scope of Catalog. So, there will be no
    Schema-related operations in the Dictionary interface; the Catalog
    interface will have them.

  - Operations to deal with Catalogs will be added to the Dictionary.

Handling DROP without restoring objects

Let's say we're going to drop a table. That means, we should drop all its
columns, indexes, foreign keys, partitions and maybe something else.

There are 2 alternatives:

  1. Object-oriented approach: restore a dd::Table instance and call drop()
  on it.
  The drawback here is that we're doing extra work when handling just a
  DROP TABLE statement: we don't actually need dd::Table and all its
  children to be restored. We need a C++ object just to call drop() on it
  and destroy it afterwards.

  2. RDBMS-approach: delete a matching record from the main table
  (dd.tables) and do cascade delete of all matching records in dependent
  tables using foreign keys.

  I.e. we just need to execute an SQL-statement like
    DELETE FROM dd.tables WHERE ...
  and be done.

  The thing is that there are no CASCADE foreign keys, so it is the
  responsibility of the DD to go through all the needed records and delete
  them. For example, in case of DROP TABLE the Data Dictionary must delete
  records for all table columns, indexes, etc.

The alternatives have been considered and it's been decided that (2) is an
optimization of 'DROP TABLE', which we should not bother with.


Weak object references

The following approach for handling weak object references was considered.
We wanted to encapsulate a weak object reference by a C++ class, which had
two members:

  - Object ID

  - C-pointer to the DD-object in case it had been already loaded into
    the cache.

Depending on the state of the object, a reference might be in the following
states:
  - NULL C-pointer, invalid object id -- uninitialized reference.
    That could happen when a new C++ object is instantiated in memory.

  - Valid C-pointer, invalid object id.
    That could happen when a new tree of C++ objects had been created and
    initialized, but not stored to the DD-tables, so the object-id was not
    yet assigned.

  - NULL C-pointer, valid object id.
    That happened when the object had just been retrieved and the
    referenced object had not yet been restored in memory.

  - Valid C-pointer, valid object id -- a fully resolved reference.

The main problem with that approach is maintaining a consistent C-pointer
in the reference class, as the DD-object might get evicted from the cache.
Consider the following scenario:

  - A user requested a Table and its Schema. The reference-object was
    initialized: the Table references the Schema by C-pointer.

  - A user released both objects, so they were "back to" the cache and
    marked as unused.

  - The cache then decided to evict the Schema object, but keep the Table
    object. The C-pointer in the reference-object became invalid.

In order to solve that, each reference object must be aware of the object
management that the cache is doing. We didn't want to go down that road.
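The rejected design can be sketched as follows, with the eviction problem
made explicit. `Schema_ref`, `Schema_sketch` and `on_cache_eviction()` are
illustrative names; this is a model of the idea, not real code.

```cpp
#include <cstddef>

typedef unsigned long Object_id;
const Object_id INVALID_OBJECT_ID = 0;

struct Schema_sketch { Object_id id; };

// A weak reference: an object-id plus a cached C-pointer. The four
// states described above correspond to the valid/invalid combinations
// of the two members.
struct Schema_ref {
  Object_id m_id;        // invalid until the object is stored
  Schema_sketch *m_ptr;  // NULL until the object is loaded into the cache

  Schema_ref() : m_id(INVALID_OBJECT_ID), m_ptr(NULL) {}

  bool is_uninitialized() const {
    return m_id == INVALID_OBJECT_ID && m_ptr == NULL;
  }

  // The problematic transition: cache eviction invalidates the pointer
  // while the object-id stays valid, so the reference must somehow be
  // notified and later re-resolved.
  void on_cache_eviction() { m_ptr = NULL; }
};
```

The on_cache_eviction() hook is exactly the coupling to the cache's object
management that made the design unattractive.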