Data nodes are the distributed, sharded storage core of MySQL NDB Cluster. Their data is usually accessed through MySQL Servers (also called SQL nodes in NDB parlance). Each MySQL server has its own transactional Data Dictionary (DD), which stores all the metadata describing tables, databases, tablespaces, logfile groups, foreign keys, and other objects for use by that MySQL server. The MySQL server DD, introduced in version 8.0, has enabled improvements such as atomic and crash-safe DDL and the INFORMATION_SCHEMA implementation, among other things. At the storage engine level, NDB has its own distributed data dictionary describing all of its schema objects, which can be modified directly using the native NdbApi.
From an NDB Cluster perspective, the NDB Dictionary is the source of truth, while each MySQL server’s DD is equivalent to a cached copy whose overlapping contents need to be kept in sync with the NDB Dictionary. This synchronization is achieved by the ndbcluster storage engine plugin through the following three mechanisms:
- Schema Synchronization: This occurs every time a MySQL server reconnects to the Cluster. The schema synchronization mechanism ensures that the DD of the MySQL server is updated with any NDB metadata changes that might have occurred while the MySQL server was not connected to the Cluster. It is important to note that there are no changes made to metadata in the NDB Dictionary in this phase with the NDB Dictionary remaining read-only until the synchronization concludes.
- Schema Distribution: While a MySQL server is connected to the Cluster, we rely on the schema distribution mechanism to ensure that all connected MySQL servers remain in synchronized states. This is done by ensuring that all DDL changes involving NDB metadata are distributed across all connected MySQL servers.
- User-triggered Synchronization: Unlike the first two mechanisms, which are executed automatically in the background, this requires the user to take action and trigger a synchronization of metadata. In NDB Cluster 7.x versions, this is useful after the ndb_restore utility is used to restore metadata in the NDB Dictionary. Such changes then have to be reflected in the DD of the MySQL server, which requires the user to trigger a synchronization manually. This can be done on a larger scale by issuing a SHOW TABLES query, or on a per-table basis through the “table discovery” mechanism. Table discovery can be triggered by any statement that opens the table, such as SELECT or SHOW CREATE TABLE.
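The "source of truth versus cached copy" relationship behind these three mechanisms can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how the ndbcluster plugin is implemented: the `synchronize` function, the dictionaries keyed by table name, and the integer "version" values are all hypothetical, and the server-side dictionary here holds only the NDB entries that overlap with the NDB Dictionary.

```python
# Toy model only: the function name, the dict layout and the integer
# "versions" are hypothetical and do not mirror ndbcluster internals.

def synchronize(ndb_dictionary, server_dd):
    """Bring server_dd (the cached copy) in line with ndb_dictionary.

    ndb_dictionary is treated as the source of truth and is only read,
    never modified. server_dd is updated in place.
    """
    changes = {"installed": [], "updated": [], "removed": []}

    # Install tables the server has not seen, refresh stale definitions.
    for name, version in ndb_dictionary.items():
        if name not in server_dd:
            changes["installed"].append(name)
        elif server_dd[name] != version:
            changes["updated"].append(name)
        server_dd[name] = version

    # Drop cached entries that no longer exist in the NDB Dictionary.
    for name in list(server_dd):
        if name not in ndb_dictionary:
            del server_dd[name]
            changes["removed"].append(name)

    return changes


# Example: the server was disconnected while t1 was altered,
# t2 was created and t3 was dropped.
ndb_dictionary = {"db1.t1": 2, "db1.t2": 1}
server_dd = {"db1.t1": 1, "db1.t3": 1}
changes = synchronize(ndb_dictionary, server_dd)
```

After the call, `server_dd` matches `ndb_dictionary`, and `changes` records which entries were installed, updated, or removed. The three mechanisms differ in when and by whom such a reconciliation is triggered, not in which side is authoritative.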
In MySQL 8.0, the MySQL Server data dictionary was reimplemented to store schema information in InnoDB tables and to use InnoDB transactions, giving data dictionary DDL operations transactional behaviour. For NDB, the introduction of the transactional DD in MySQL 8.0 involved large changes to the internal workings of schema synchronization and distribution, including improvements to the respective protocols. Most of this work happens automatically in the background and has little or no impact on the user. User-triggered synchronization, on the other hand, is by its nature user-facing, so we took the chance to review its behaviour and ended up changing how it works entirely in NDB Cluster 8.0 (which is now GA!).
In NDB Cluster 7.x versions, issuing a SHOW TABLES command performs the equivalent of a schema synchronization, comparing the contents of the data directory with those of the NDB Dictionary and correcting any mismatch detected. This is less than ideal for the following reasons:
- Usability: The user is expected to issue an additional query after restoring metadata to the NDB Dictionary to ensure that the metadata is also visible in the MySQL server. This can become tedious with larger configurations since it has to be done on every MySQL server connected to the Cluster.
- Global locks: The synchronization requires acquiring and holding global locks, which prevent other metadata changes from occurring while it runs.
- Additional work done by SHOW TABLES: SHOW TABLES is meant to be a simple read query but instead performs additional metadata changes and uses more resources than one would expect.
- Design concern: The user thread performs synchronization, which is primarily the responsibility of the NDB Event Handling component.
This functionality in NDB Cluster 7.x versions relied on the presence of .frm files which have been removed with the advent of the MySQL server DD in MySQL 8.0. This gave us the chance to wipe the slate clean in NDB Cluster 8.0 and look at how to approach the problem again. Read the follow-up post for more details about Automatic Schema Synchronization in NDB Cluster 8.0!