Just as you use a database like MySQL to store your application data, MySQL must also store its meta data (schema names, table definitions etc) somewhere. Traditionally this meta data storage has been split between many different locations (.FRM, .PAR, .OPT, .TRN and .TRG files). This has gradually become a bottleneck in various contexts.
To grow the MySQL server even further and provide users with more features, more reliability and better INFORMATION_SCHEMA performance, we need a new transactional data dictionary. Let us investigate some of the motivation for creating a new transactional data dictionary in MySQL 8.0.
- Our INFORMATION_SCHEMA implementation suffers, and has been subject to years of complaints. A file based data dictionary complicates the implementation and has proven non-performant. See MySQL 8.0: Improvements to Information_schema for more details.
- Dictionaries out of synch. Before MySQL 8.0, the data dictionary is a “split brain”, where the “server” and InnoDB have their own separate data dictionary, where some information duplicated. Information that is duplicated in the MySQL server dictionary and the InnoDB dictionary might get out of synch, and we need one common “source of truth” for dictionary information.
- Lack of uniformity. The non-uniformity of the data dictionary (storing in MyISAM tables, .FRM, .PAR, .OPT, .TRN and .TRG files) makes it very hard to maintain. Having so many different access patterns, and no uniform API, makes the code for handling dictionary information bloated, and makes it harder to use for programmers that need to the access meta data of the MySQL server.
- No atomic DDL. Storing the data dictionary in non-transactional tables and files, means that DDLs are unsafe for replication (they are not transactional, not even atomic). If a compound DDL fails we still need to replicate it and hope that it fails with the same error. This is a best effort approach and there is a lot of logic coded to handle this . It is hard to maintain, slows down progress and bloats the replication codebase. The data dictionary is stored partly in non-transactional tables. These are not safe for replication building resilient HA systems on top of MySQL. For instance, some dictionary tables need to be manipulated using regular DML, which causes problems for GTIDs.
- Crash recovery. Since the DDL statements are not atomic, it is challenging to recover after crashing in the middle of a DDL execution, and is especially problematic for replication.
- Lack of extensibility. We have grand plans to add new features to MySQL in future releases. Having an extensible meta data framework lowers the barrier to entry in adding new objects that require persistence, making them more viable.
- Improved upgrades. As I will detail in my next post, The data dictionary will have a version table. This will enable automatic upgrade from 8.0 and forward on data dictionary tables.
So we clearly see the need for a transactional data dictionary, so we can continue to add valuable new features and improvements the MySQL server.