This consists of the following steps:
The master sets the latest GCI as the restart GCI, and then synchronizes its system file to all other nodes involved in the system restart.
The next step is to synchronize the schema of all the nodes in the system restart. This is performed in 15 passes. The problem we are trying to solve here occurs when a schema object was created while the node was up but was dropped while the node was down, and possibly a new object was even created with the same schema ID while that node was unavailable. To handle this situation, it is necessary first to re-create all objects that are supposed to exist from the viewpoint of the starting node. After this, any objects (including tables) that were dropped by other nodes in the cluster while this node was “dead” are dropped. Finally, any tables that were created by other nodes while the starting node was unavailable are re-created on the starting node. All these operations are local to the starting node. As part of this process, it is also necessary to ensure that all tables that need to be re-created have been created locally and that the proper data structures have been set up for them in all kernel blocks.
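The three-step reconciliation described above can be sketched as follows. This is a minimal illustration only, not the NDB implementation: the `SchemaEntry` type, its `version` field (assumed to change whenever a schema ID is reused), and the `plan_schema_sync` function are all hypothetical names introduced here, and the sketch does not model the 15 passes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SchemaEntry:
    """Hypothetical schema record; not an NDB data structure."""
    name: str
    version: int  # assumption: differs when a schema ID is reused


def plan_schema_sync(
    local: Dict[int, SchemaEntry],
    master: Dict[int, SchemaEntry],
) -> Tuple[List[int], List[int], List[int]]:
    """Return (recreate, drop, create) lists of schema IDs, in step order."""
    # Step 1: re-create everything the starting node believes exists.
    recreate = sorted(local)
    # Step 2: drop objects that no longer exist, or whose ID now
    # refers to a different object in the master's schema.
    drop = sorted(i for i, e in local.items() if master.get(i) != e)
    # Step 3: create objects the starting node has not seen,
    # including new objects that reuse an old schema ID.
    create = sorted(i for i, e in master.items() if local.get(i) != e)
    return recreate, drop, create


local = {1: SchemaEntry("t1", 1), 2: SchemaEntry("t2", 1), 3: SchemaEntry("t3", 1)}
master = {1: SchemaEntry("t1", 1), 3: SchemaEntry("t9", 2), 4: SchemaEntry("t4", 1)}
recreate, drop, create = plan_schema_sync(local, master)
```

In this example, schema ID 3 appears in both the drop and the create list, which corresponds to the case where an object was dropped and a new one was created with the same schema ID while the node was down.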
After the procedure just described has been performed for the master node, the new schema file is sent to all other participants in the system restart, and they perform the same synchronization.
All fragments involved in the restart must have proper
parameters as derived from DBDIH. This
causes a number of START_FRAGREQ signals
to be sent from DBDIH to
DBLQH. This also starts the restoration
of the fragments, which are restored one by one and one
record at a time: the restore data is read from disk and,
in parallel, the data already read is applied to main
memory. This restores only the main
memory parts of the tables.
Once all fragments have been restored, a
START_RECREQ message is sent to all nodes
in the starting cluster, and then all undo logs for any Disk
Data parts of the tables are applied.
After applying the undo logs, it
is necessary to perform some restore work in
TSMAN that requires scanning the extent
headers of the tablespaces.
Next, it is necessary to prepare for execution of the redo
log, which can be performed in up to four phases. For
each fragment, execution of redo logs from several different
nodes may be required. This is handled by executing the redo
logs in different phases for a specific fragment, as decided by
DBDIH when sending the
START_FRAGREQ signal. An
EXEC_FRAGREQ signal is sent for each
phase and fragment that requires execution in this phase.
After these signals are sent, an
EXEC_SRREQ signal is sent to all nodes to
tell them that they can start executing the redo log.
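The phase assignment described above can be sketched as follows. This is an illustrative model under stated assumptions, not the DBDIH code: the input layout (an ordered list of `(node_id, first_gci, last_gci)` redo log sources per fragment) and the function name `plan_redo_phases` are hypothetical. The only facts taken from the text are the limit of four phases and that one EXEC_FRAGREQ is sent per phase and fragment needing execution in that phase.

```python
from collections import defaultdict

MAX_PHASES = 4  # from the text: redo execution uses up to four phases


def plan_redo_phases(fragment_sources):
    """Group redo log work by phase.

    fragment_sources maps a fragment ID to an ordered list of
    (node_id, first_gci, last_gci) tuples (hypothetical layout).
    Returns {phase: [(frag_id, node_id, first_gci, last_gci), ...]},
    where each list entry stands for one EXEC_FRAGREQ.
    """
    phases = defaultdict(list)
    for frag_id, sources in sorted(fragment_sources.items()):
        if len(sources) > MAX_PHASES:
            raise ValueError(f"fragment {frag_id} needs more than {MAX_PHASES} phases")
        # The n-th redo log source of a fragment is executed in phase n.
        for phase, (node_id, first_gci, last_gci) in enumerate(sources, start=1):
            phases[phase].append((frag_id, node_id, first_gci, last_gci))
    return dict(phases)


plan = plan_redo_phases({
    0: [(1, 10, 20)],               # one node's redo log covers all GCIs
    1: [(1, 10, 15), (2, 16, 20)],  # two nodes' redo logs are needed
})
```

Here fragment 1 needs redo logs from two nodes, so it appears in phases 1 and 2, while fragment 0 appears only in phase 1.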
Before starting execution of the first redo log, it is
necessary to make sure that the setup which was started
earlier (in Phase 4) has
finished, or to wait until it does before continuing.
Prior to executing the redo log, it is necessary to calculate where to start reading and where the end of the redo log is expected to be. The end of the redo log is reached when the last GCI to restore has been reached.
After completing the execution of the redo logs, all redo log pages that have been written beyond the last GCI to be restored are invalidated. Given the cyclic nature of the redo logs, this could carry the invalidation into new redo log files past the last one executed.
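The wrap-around behavior of this invalidation can be sketched with a simplified model. This is an assumption-laden illustration and not the actual redo log format: each redo log file is modeled as a list of pages, each page carries a single GCI (or `None` once invalidated), and the function name `invalidate_beyond` is hypothetical.

```python
def invalidate_beyond(files, start_file, start_page, last_gci):
    """Invalidate pages written beyond last_gci, wrapping across files.

    files is a list of lists of page GCIs (hypothetical model).
    Scanning starts at (start_file, start_page), the first page after
    the last one executed, and stops at the first page that was not
    written beyond last_gci. Returns the invalidated positions.
    """
    invalidated = []
    f, p = start_file, start_page
    total_pages = sum(len(pages) for pages in files)
    for _ in range(total_pages):  # never scan more than one full cycle
        gci = files[f][p]
        if gci is None or gci <= last_gci:
            break
        files[f][p] = None  # mark the page as invalid
        invalidated.append((f, p))
        p += 1
        if p == len(files[f]):
            # Cyclic log: continue at the start of the next file.
            f, p = (f + 1) % len(files), 0
    return invalidated


# Oldest data is in file 1; the log has wrapped around into file 0.
files = [[7, 8], [5, 6], [6, 7]]
invalidated = invalidate_beyond(files, start_file=2, start_page=1, last_gci=6)
```

With a last GCI of 6, execution ends in file 2, yet the invalidation continues into file 0, which is the situation the text describes as carrying the invalidation into files past the last one executed.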
After the completion of the previous step,
DBLQH reports this back to
DBDIH using a confirmation message.
When the master has received this message back from all
starting nodes, it sends an
NDB_STARTCONF signal back to
NDBCNTR. The
NDB_STARTCONF message signals the end
of STTOR phase 4 to
NDBCNTR, which is the only block involved
to any significant degree in this phase.