MySQL :: NDB Cluster Internals :: 5.21 System Restart Handling in Phase 4

The master sets the latest GCI as the restart GCI, and then synchronizes its system file to all other nodes involved in the system restart.

The next step is to synchronize the schema of all the nodes in the system restart. This is performed in 15 passes. The problem we are trying to solve here occurs when a schema object was created while the node was up but was dropped while the node was down, and possibly a new object was even created with the same schema ID while that node was unavailable. To handle this situation, it is necessary first to re-create all objects that are supposed to exist from the viewpoint of the starting node. After this, any objects that were dropped by other nodes in the cluster while this node was “dead” are dropped; this also applies to any tables that were dropped during the outage. Finally, any tables that were created by other nodes while the starting node was unavailable are re-created on the starting node. All these operations are local to the starting node. As part of this process, it is also necessary to ensure that all tables that need to be re-created have been created locally and that the proper data structures have been set up for them in all kernel blocks.
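The ordering of the three steps described above (re-create, drop, create) can be illustrated with a minimal sketch. This is not the NDB implementation: it collapses the 15 passes into three loops, and the `Schema` mapping, the version numbers, and the function name `plan_schema_sync` are assumptions made only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical model: schema ID -> (object name, object version).
Schema = Dict[int, Tuple[str, int]]


def plan_schema_sync(own: Schema, master: Schema) -> List[Tuple[str, int, str]]:
    """Return the local actions a starting node would take, in order."""
    plan: List[Tuple[str, int, str]] = []
    # Step 1: re-create every object that exists from the starting node's viewpoint.
    for obj_id in sorted(own):
        plan.append(("recreate", obj_id, own[obj_id][0]))
    # Step 2: drop objects that were dropped (or replaced) while the node was down.
    for obj_id in sorted(own):
        if master.get(obj_id) != own[obj_id]:
            plan.append(("drop", obj_id, own[obj_id][0]))
    # Step 3: create objects that appeared while the node was down, including
    # new objects that reuse the schema ID of a dropped one.
    for obj_id in sorted(master):
        if own.get(obj_id) != master[obj_id]:
            plan.append(("create", obj_id, master[obj_id][0]))
    return plan


# Schema ID 2 was dropped and reused for a new table during the outage.
own = {1: ("t1", 1), 2: ("t2", 1)}
master = {1: ("t1", 1), 2: ("t3", 2), 3: ("t4", 1)}
plan = plan_schema_sync(own, master)
```

In this example, schema ID 2 is first re-created as the old table, then dropped, and only then created as the new table, which is the case that the ordering is meant to handle.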

After performing the procedure described previously for the master node, the new schema file is sent to all other participants in the system restart, and they perform the same synchronization.

All fragments involved in the restart must have proper parameters as derived from DBDIH. This causes a number of START_FRAGREQ signals to be sent from DBDIH to DBLQH. This also starts the restoration of the fragments, which are restored one by one and one record at a time: the restore data is read from disk and, in parallel, applied to main memory. This restores only the main memory parts of the tables.

Once all fragments have been restored, a START_RECREQ message is sent to all nodes in the starting cluster, and then all undo logs for any Disk Data parts of the tables are applied.

After applying the undo logs in LGMAN, it is necessary to perform some restore work in TSMAN that requires scanning the extent headers of the tablespaces.

Next, it is necessary to prepare for execution of the redo log, which can be performed in up to four phases. For each fragment, execution of redo logs from several different nodes may be required. This is handled by executing the redo logs in different phases for a specific fragment, as decided in DBDIH when sending the START_FRAGREQ signal. An EXEC_FRAGREQ signal is sent for each phase and fragment that requires execution in that phase. After these signals are sent, an EXEC_SRREQ signal is sent to all nodes to tell them that they can start executing the redo log.
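The relationship between fragments, phases, and EXEC_FRAGREQ signals can be sketched as follows. The `LogPart` record, its field names, and the function `plan_exec_fragreq` are hypothetical and exist only for this illustration. The sketch assumes that the per-fragment list of log nodes decided by DBDIH is already known and that its i-th entry is executed in phase i.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

MAX_LOG_PHASES = 4  # the text states that up to four phases are possible


class LogPart(NamedTuple):
    node_id: int    # node whose redo log is executed (hypothetical field)
    start_gci: int  # first GCI to execute from that node's log
    last_gci: int   # last GCI to execute from that node's log


def plan_exec_fragreq(
    fragments: Dict[int, List[LogPart]],
) -> Dict[int, List[Tuple[int, int, int, int]]]:
    """Group one EXEC_FRAGREQ-like record per (phase, fragment) by phase."""
    by_phase: Dict[int, List[Tuple[int, int, int, int]]] = defaultdict(list)
    for frag_id in sorted(fragments):
        parts = fragments[frag_id]
        if len(parts) > MAX_LOG_PHASES:
            raise ValueError(f"fragment {frag_id} needs more than {MAX_LOG_PHASES} phases")
        for phase, part in enumerate(parts):
            by_phase[phase].append((frag_id, part.node_id, part.start_gci, part.last_gci))
    return dict(by_phase)


# Fragment 0 needs logs from two nodes; fragment 1 needs only one.
fragments = {
    0: [LogPart(1, 10, 20), LogPart(2, 21, 30)],
    1: [LogPart(2, 10, 30)],
}
requests = plan_exec_fragreq(fragments)
```

Here phase 0 carries a request for both fragments, while phase 1 carries a request only for fragment 0, because fragment 1 does not require execution in that phase.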

Note

Before starting execution of the first redo log, it is necessary to make sure that the setup which was started earlier (in Phase 4) by DBLQH has finished, or to wait until it does before continuing.

Prior to executing the redo log, it is necessary to calculate where to start reading and where the end of the redo log is expected to be reached. The end of the redo log should be found when the last GCI to restore has been reached.

After completing the execution of the redo logs, all redo log pages that have been written beyond the last GCI to be restored are invalidated. Given the cyclic nature of the redo logs, this could carry the invalidation into new redo log files past the last one executed.
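The effect of the cyclic layout on invalidation can be shown with a toy model. This sketch treats the redo log as a single ring of pages, each tagged with the GCI it was written for, and stops at the first page that is not beyond the restart GCI. The page representation, the stopping rule, and the function name `invalidate_after` are all assumptions for illustration, not a description of the DBLQH code.

```python
from typing import List, Optional


def invalidate_after(pages: List[Optional[int]], end_pos: int, last_gci: int) -> List[int]:
    """Invalidate pages written beyond last_gci, walking forward from end_pos.

    pages[i] holds the GCI of page i, or None if the page is invalid.
    Returns the positions that were invalidated, in the order visited.
    """
    n = len(pages)
    invalidated: List[int] = []
    pos = (end_pos + 1) % n  # first page after the last one executed
    while pos != end_pos:    # never travel more than one full lap
        gci = pages[pos]
        if gci is None or gci <= last_gci:
            break            # reached a page that is not beyond the restart GCI
        pages[pos] = None    # mark the page invalid
        invalidated.append(pos)
        pos = (pos + 1) % n  # wrap around the end of the ring
    return invalidated


# Execution ended at position 4 (GCI 10); GCIs 11 to 13 were written after it
# and wrap around to the start of the ring.
pages = [12, 13, 8, 9, 10, 11]
invalidated = invalidate_after(pages, end_pos=4, last_gci=10)
```

In this example the walk starts at position 5 and continues into positions 0 and 1, mirroring how invalidation can run past the last file executed.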

After the completion of the previous step, DBLQH reports this back to DBDIH using a START_RECCONF message.

When the master has received this message back from all starting nodes, it sends an NDB_STARTCONF signal back to NDBCNTR.
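The "wait for all starting nodes" step amounts to tracking outstanding confirmations. The class below is a hypothetical helper written for this illustration; it does not correspond to any structure in the NDB source.

```python
from typing import Iterable, Set


class StartRecTracker:
    """Hypothetical helper: tracks which starting nodes still owe a START_RECCONF."""

    def __init__(self, starting_nodes: Iterable[int]) -> None:
        self.pending: Set[int] = set(starting_nodes)
        self.done = False

    def on_start_recconf(self, node_id: int) -> bool:
        """Record one confirmation; return True exactly once, when the last one arrives."""
        self.pending.discard(node_id)  # duplicates and unknown nodes are ignored
        if not self.pending and not self.done:
            self.done = True
            return True  # at this point the master would send NDB_STARTCONF to NDBCNTR
        return False


tracker = StartRecTracker([1, 2, 3])
# Confirmations may arrive in any order; a repeated one must not trigger twice.
replies = [tracker.on_start_recconf(n) for n in (2, 1, 2, 3, 3)]
```

Only the confirmation that empties the pending set returns `True`, so the completion signal is sent a single time.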

The NDB_STARTCONF message signals the end of STTOR phase 4 to NDBCNTR, which is the only block involved to any significant degree in this phase.