MySQL Blog Archive
MySQL 8.0: Data Dictionary Architecture and Design

This blog post elaborates on the architecture and design of the transactional data dictionary that will be part of MySQL 8.0. Some parts of the architecture described here will be implemented in later versions. For background, see MySQL 8.0 Data Dictionary: Background and motivation.

The MySQL Data Dictionary Schema
Figure: The Transactional Data Dictionary in 8.0 has a simplified and uniform handling of dictionary data

Dictionary tables and system tables store the data and metadata needed by the MySQL server. The dictionary tables are designed based on the SQL standard. The “system tables” hold metadata or data in the mysql schema. The dictionary tables are designed to be extensible, so you will not find any “future-looking” fields in the table definitions. Please see WL#6379 for details on the schema definition of the data dictionary tables.

Upgrades from 8.0 onward

The data dictionary will have a version table. This will enable automatic upgrades of the data dictionary tables from 8.0 onward.
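A version table lets the server compare the dictionary version recorded on disk with the version it expects and then run the missing upgrade steps in order. The Python sketch below illustrates only that idea; the `UPGRADE_STEPS` mapping, its step descriptions, and the `upgrade_dictionary` function are hypothetical and are not MySQL's actual upgrade code.

```python
# Hypothetical sketch: automatic dictionary upgrade driven by a version table.
# Version numbers and step descriptions are illustrative, not MySQL's.

# Each entry upgrades the dictionary from version N to version N + 1.
UPGRADE_STEPS = {
    1: "add column 'options' to the tables dictionary table",
    2: "add an index to the columns dictionary table",
}


def upgrade_dictionary(version_table: dict, target_version: int) -> list:
    """Apply upgrade steps until the stored version reaches target_version.

    version_table stands in for the persisted version table; it holds the
    dictionary version recorded on disk under the key "version".
    Returns the steps that were applied, in order.
    """
    current = version_table["version"]
    if current > target_version:
        raise RuntimeError("dictionary is newer than this server")
    applied = []
    while current < target_version:
        step = UPGRADE_STEPS[current]        # KeyError means a step is missing
        applied.append(step)                 # a real server would run the DDL here
        current += 1
        version_table["version"] = current   # record progress after each step
    return applied


stored = {"version": 1}
print(upgrade_dictionary(stored, 3))
print(stored)
```

Recording the new version after every step means an interrupted upgrade can resume from the last completed step instead of starting over.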

I_S as views over Data Dictionary Tables

INFORMATION_SCHEMA is now implemented as views over the dictionary tables. Queries against it require no extra disk accesses and no temporary tables, and they are subject to character set and collation handling similar to that of user tables.
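To illustrate the idea of INFORMATION_SCHEMA as views, the sketch below uses Python's built-in `sqlite3` module as a stand-in for InnoDB. The dictionary table names (`dd_schemata`, `dd_tables`), their columns, and the `is_tables` view are hypothetical and far simpler than the real MySQL dictionary tables; the point is only that the view's rows come straight from the dictionary tables when the view is queried, with nothing materialized in between.

```python
import sqlite3

# In-memory SQLite database standing in for the transactional dictionary store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified dictionary tables.
    CREATE TABLE dd_schemata (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dd_tables (
        id INTEGER PRIMARY KEY,
        schema_id INTEGER NOT NULL REFERENCES dd_schemata(id),
        name TEXT NOT NULL,
        engine TEXT NOT NULL
    );
    -- An INFORMATION_SCHEMA-style view defined over the dictionary tables.
    CREATE VIEW is_tables AS
        SELECT s.name AS TABLE_SCHEMA, t.name AS TABLE_NAME, t.engine AS ENGINE
        FROM dd_tables t JOIN dd_schemata s ON t.schema_id = s.id;
""")
conn.execute("INSERT INTO dd_schemata VALUES (1, 'shop')")
conn.executemany(
    "INSERT INTO dd_tables VALUES (?, 1, ?, 'InnoDB')",
    [(1, "customers"), (2, "orders")],
)

# Querying the view reads the dictionary tables directly.
for row in conn.execute("SELECT * FROM is_tables ORDER BY TABLE_NAME"):
    print(row)
```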

Figure: The ecosystem of INFORMATION_SCHEMA
An API for the Data Dictionary

We will implement a uniform API for the data dictionary. This API will be used by internal server code, by plugin service API code, and by storage engines through the SE API. It will be introduced with minimal intrusion into the server code that accesses the data dictionary, so that code can be refactored piecewise.
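A minimal sketch of what a single dictionary API might look like, assuming a hypothetical `DictionaryStore` interface with one in-memory backend. The class, method, and function names are invented for illustration and are not the MySQL server's C++ API; the point is that server code, plugins, and storage engines all go through the same entry points and never touch the storage directly.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class TableDef:
    """Hypothetical, simplified dictionary object for a table."""
    schema: str
    name: str
    columns: tuple = ()


class DictionaryStore(ABC):
    """The one interface every component uses to reach dictionary data."""

    @abstractmethod
    def acquire_table(self, schema: str, name: str) -> TableDef | None: ...

    @abstractmethod
    def store_table(self, table: TableDef) -> None: ...


class InMemoryStore(DictionaryStore):
    """Illustrative backend; a real one would read and write dictionary tables."""

    def __init__(self) -> None:
        self._tables: dict[tuple[str, str], TableDef] = {}

    def acquire_table(self, schema, name):
        return self._tables.get((schema, name))

    def store_table(self, table):
        self._tables[(table.schema, table.name)] = table


# Different callers depend only on the DictionaryStore interface.
def server_show_columns(dd: DictionaryStore, schema: str, name: str) -> tuple:
    table = dd.acquire_table(schema, name)
    return table.columns if table else ()


def storage_engine_open(dd: DictionaryStore, schema: str, name: str) -> bool:
    return dd.acquire_table(schema, name) is not None


dd = InMemoryStore()
dd.store_table(TableDef("shop", "orders", ("id", "customer_id", "total")))
print(server_show_columns(dd, "shop", "orders"))
print(storage_engine_open(dd, "shop", "missing"))
```

Because callers depend only on the interface, the backend can later be swapped (for example, for one backed by dictionary tables) without touching the calling code, which is what makes piecewise refactoring possible.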

The new Data Dictionary cache

There are many caches in MySQL, and these caches are not always hidden behind APIs. A new cache implementation aims to replace many of them, and this new cache is hidden behind the data dictionary API. For now, we have not replaced any of the old caches; instead, we have enhanced them to use the new cache. Going forward, we will refactor the old caches, create proper APIs for them, and adapt the callers' code. This will simplify the callers' code and move all the cache logic behind the API.
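The sketch below shows the general pattern of a cache that sits behind a dictionary API: callers ask for an object by key and never see whether it came from the cache or from storage. `CachedDictionary`, its `loader` callback, the LRU eviction, and the invalidate-on-DDL policy are assumptions made for illustration; a real dictionary cache must also deal with concurrency and object lifetime, which this sketch omits.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class CachedDictionary:
    """Hypothetical read-through LRU cache hiding dictionary storage from callers."""

    def __init__(self, loader: Callable[[Hashable], object], capacity: int = 128):
        self._loader = loader            # fetches an object from persistent storage
        self._capacity = capacity
        self._entries: OrderedDict = OrderedDict()
        self.misses = 0                  # number of lookups that went to storage

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry
        return value

    def invalidate(self, key):
        """Drop a stale entry, e.g. after DDL changed the stored object."""
        self._entries.pop(key, None)


storage = {("shop", "orders"): "orders v1"}
cache = CachedDictionary(lambda key: storage[key], capacity=2)
cache.get(("shop", "orders"))
cache.get(("shop", "orders"))
print(cache.misses)                          # the first lookup missed, the second hit
storage[("shop", "orders")] = "orders v2"    # simulate DDL changing the definition
cache.invalidate(("shop", "orders"))
print(cache.get(("shop", "orders")))
```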

The new SE API for atomic and crashsafe DDL

We want to provide atomic and crashsafe DDL. This requires changes both to the MySQL server DDL code and to the InnoDB code, since InnoDB is where the dictionary tables are stored. The MySQL server code will remove all implicit commits and implement clear atomic semantics for DDL statements. To enable this, the dictionary tables must be stored in a transactional storage engine, and we will use InnoDB.

With the new SE API, we are able to implement crashsafe DDL, because the data dictionary is stored in InnoDB, which is inherently transactional.
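Why transactional storage matters can be illustrated with Python's `sqlite3` module standing in for InnoDB: if every dictionary row written by one DDL statement goes into a single transaction, a failure partway through leaves no partial definition behind. The `dd_tables` and `dd_columns` tables and the `create_table` function are hypothetical, and a raised exception stands in for a crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dd_tables (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE dd_columns (table_id INTEGER NOT NULL, name TEXT NOT NULL);
""")


def create_table(conn, name, columns, fail_after_table_row=False):
    """Write all dictionary rows for one CREATE TABLE in a single transaction."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute("INSERT INTO dd_tables (name) VALUES (?)", (name,))
        if fail_after_table_row:
            raise RuntimeError("simulated crash in the middle of DDL")
        conn.executemany(
            "INSERT INTO dd_columns VALUES (?, ?)",
            [(cur.lastrowid, col) for col in columns],
        )


create_table(conn, "customers", ["id", "name"])
try:
    create_table(conn, "orders", ["id", "total"], fail_after_table_row=True)
except RuntimeError:
    pass

# Only the committed DDL is visible; no half-created "orders" table remains.
print(conn.execute("SELECT name FROM dd_tables").fetchall())
```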

Serialized Dictionary Information and changes to the IMPORT statement

Many users have shown love for the ability to copy table data and .FRM files into the data directory and have the MySQL server automatically pick up these tables. This capability has also been used for disaster recovery, where .FRM “blackbelts” have been able to reconstruct the metadata in the .FRM file. In MySQL 8.0 we provide Serialized Dictionary Information (SDI) for dictionary objects. For InnoDB tablespaces, this information is appended to the tablespace, so the metadata and data are bundled together. For storage engines that do not support this functionality, a .SDI file will be written.

Figure: SDI. Note that the arrow is unidirectional, which indicates that the SDI is a copy.

For InnoDB tablespaces, a tool will be provided to read the SDI. The SDI is in JSON format, so users keep the same ability to modify the metadata for disaster recovery that they had with .FRM files.
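Because the SDI is JSON, it can be read and edited with ordinary tools. The sketch below serializes a simplified table definition to JSON, edits it by hand as one might when repairing a damaged definition, and reads it back. The field names (`object_type`, `object`, `schema`, `name`, `engine`, `columns`) and the `to_sdi`/`from_sdi` helpers are hypothetical; the real SDI format carries much more information and should be inspected with the tool provided for that purpose (in released MySQL 8.0 versions, the `ibd2sdi` utility).

```python
import json


def to_sdi(table: dict) -> str:
    """Serialize a (hypothetical, simplified) table definition as SDI-style JSON."""
    return json.dumps({"object_type": "Table", "object": table}, indent=2, sort_keys=True)


def from_sdi(text: str) -> dict:
    """Parse SDI-style JSON back into a table definition, checking its type."""
    doc = json.loads(text)
    if doc.get("object_type") != "Table":
        raise ValueError("not a table SDI")
    return doc["object"]


table = {
    "schema": "shop",
    "name": "orders",
    "engine": "InnoDB",
    "columns": [{"name": "id", "type": "INT"}, {"name": "total", "type": "DECIMAL(10,2)"}],
}
sdi_text = to_sdi(table)

# Hand edit of the serialized metadata, then read it back.
doc = json.loads(sdi_text)
doc["object"]["name"] = "orders_recovered"
recovered = from_sdi(json.dumps(doc))
print(recovered["name"], [c["name"] for c in recovered["columns"]])
```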

One “source of truth”

We will have a single global data dictionary, and InnoDB will populate its own dictionary cache from it. This removes the class of problems previously known as the “split brain” problem, where the server's .FRM files and InnoDB's internal data dictionary could get out of sync.
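The difference between two independently updated dictionaries and a single source of truth can be sketched as follows. Plain Python dicts stand in for the server's and InnoDB's metadata, an exception stands in for a crash, and `InnoDBCache` (including its invalidate-on-DDL behaviour) is a hypothetical class; none of this is actual MySQL code.

```python
class SimulatedCrash(Exception):
    pass


# Before: two dictionaries, each updated separately, can disagree after a crash.
def ddl_with_two_dictionaries(server_dict, innodb_dict, table, columns, crash=False):
    server_dict[table] = columns          # e.g. the .FRM file is written
    if crash:
        raise SimulatedCrash("crash between the two writes")
    innodb_dict[table] = columns          # InnoDB's own dictionary is updated


# After: InnoDB only caches what it reads from the one global dictionary.
class InnoDBCache:
    def __init__(self, global_dict):
        self._global = global_dict
        self._cache = {}

    def get(self, table):
        if table not in self._cache:
            self._cache[table] = self._global[table]  # populate from the source of truth
        return self._cache[table]

    def invalidate(self, table):
        self._cache.pop(table, None)


server, innodb = {}, {}
try:
    ddl_with_two_dictionaries(server, innodb, "orders", ("id", "total"), crash=True)
except SimulatedCrash:
    pass
print("split brain:", server.get("orders") != innodb.get("orders"))

global_dd = {"orders": ("id", "total")}
cache = InnoDBCache(global_dd)
cache.get("orders")                               # cache now holds the old definition
global_dd["orders"] = ("id", "total", "status")   # DDL updates the global dictionary
cache.invalidate("orders")                        # ...and invalidates the cached copy
print(cache.get("orders") == global_dd["orders"])
```

Since the cache holds no data of its own that is not derived from the global dictionary, dropping a cached entry can never lose information; the next lookup simply reloads it.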