5.4 STTOR Phase 1

This is one of the phases in which most kernel blocks participate (see the table in Section 5.3, “STTOR Phase 0”). Most of these blocks are involved primarily in the initialization of data; for example, this is all that DBTC does.

Many blocks initialize references to other blocks in Phase 1. DBLQH initializes block references to DBTUP, and DBACC initializes block references to DBTUP and DBLQH. DBTUP initializes references to the DBLQH, TSMAN, and LGMAN blocks.

NDBCNTR initializes some variables and sets up block references to DBTUP, DBLQH, DBACC, DBTC, DBDIH, and DBDICT; these are needed in the special start phase handling of these blocks using NDB_STTOR signals, where the bulk of the node startup process actually takes place.

If the cluster is configured to lock pages (that is, if the LockPagesInMainMemory configuration parameter has been set), CMVMI handles this locking.

The QMGR block calls the initData() method (defined in storage/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp), whose output is handled by all other blocks in the READ_CONFIG_REQ phase (see Section 5.1, “Initialization Phase (Phase -1)”). Following these initializations, QMGR sends the DIH_RESTARTREQ signal to DBDIH, which determines whether a proper system file exists; if it does, an initial start is not being performed. After this signal has been received, the node is integrated among the other data nodes in the cluster; data nodes enter the cluster one at a time. The first one to enter becomes the master. Whenever the master dies, the new master is always the node that has been running for the longest time of those remaining.
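The master succession rule can be illustrated with a minimal sketch. It assumes that the order in which nodes entered the cluster is recorded somewhere; the names `pick_master` and `join_order` are hypothetical and are not taken from the NDB source.

```python
def pick_master(join_order, alive):
    """Return the surviving node that entered the cluster earliest.

    join_order: node IDs in the order they entered the cluster
                (hypothetical bookkeeping, assumed for this sketch).
    alive:      set of node IDs that are still running.
    """
    for node_id in join_order:
        if node_id in alive:
            return node_id
    return None  # no data node is left


# Nodes 3, 1, and 2 entered in that order, so node 3 is the first master.
order = [3, 1, 2]
first_master = pick_master(order, {1, 2, 3})

# After node 3 fails, node 1 has been running longest among the survivors.
next_master = pick_master(order, {1, 2})
```

Note that the rule depends on how long a node has been running, not on its node ID, which is why node 1 rather than node 2 would take over here only because it joined earlier.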

QMGR sets up timers to ensure that inclusion in the cluster does not take longer than what the cluster's configuration is set to permit (see Controlling Timeouts, Intervals, and Disk Paging for the relevant configuration parameters), after which communication to all other data nodes is established. At this point, a CM_REGREQ signal is sent to all data nodes. Only the president of the cluster responds to this signal; the president permits one node at a time to enter the cluster. If no node responds within 3 seconds then the president becomes the master. If several nodes start up simultaneously, then the node with the lowest node ID becomes president. The president sends CM_REGCONF in response to this signal, but also sends a CM_ADD signal to all nodes that are currently alive.
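As a rough illustration of the election rule for simultaneous starts, and of the president admitting one node at a time, consider the following sketch. The function names are hypothetical, and the assumption that nodes are admitted in the order their CM_REGREQ signals arrive is made only for the example; the text does not specify the admission order.

```python
from collections import deque


def elect_president(starting_ids):
    """Among nodes starting simultaneously, the lowest node ID wins."""
    if not starting_ids:
        raise ValueError("at least one starting node is required")
    return min(starting_ids)


def admission_sequence(president, regreq_arrivals):
    """Return the order in which nodes become cluster members.

    regreq_arrivals: node IDs in the order their CM_REGREQ reached the
    president (an assumption of this sketch, not taken from the source).
    """
    queue = deque(n for n in regreq_arrivals if n != president)
    members = [president]
    while queue:
        # Only one node is in the process of entering at any time.
        members.append(queue.popleft())
    return members


starting = [4, 2, 7]
president = elect_president(starting)
members = admission_sequence(president, [7, 4])
```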

Next, the starting node sends a CM_NODEINFOREQ signal to all current live data nodes. When these nodes receive that signal they send a NODE_VERSION_REP signal to all API nodes that have connected to them. Each data node also sends a CM_ACKADD to the president to inform the president that it has heard the CM_NODEINFOREQ signal from the new node. Finally, each of the current data nodes sends the CM_NODEINFOCONF signal in response to the starting node. When the starting node has received all these signals, it also sends the CM_ACKADD signal to the president.

When the president has received all of the expected CM_ACKADD signals, it knows that all data nodes (including the newest one to start) have replied to the CM_NODEINFOREQ signal. When the president receives the final CM_ACKADD, it sends a CM_ADD signal to all current data nodes (that is, except for the node that just started). Upon receiving this signal, the existing data nodes enable communication with the new node; they begin sending heartbeats to it and add it to the list of neighbors used by the heartbeat protocol.
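The acknowledgment counting performed by the president in this step might be sketched as follows. This is a simplified model, not the QMGR implementation: the class and method names are hypothetical, and it assumes the president simply tracks the set of nodes from which a CM_ACKADD is still outstanding.

```python
class PresidentAckTracker:
    """Model of the president waiting for CM_ACKADD signals (hypothetical)."""

    def __init__(self, live_nodes, starting_node):
        # Existing data nodes; the starting node is never a CM_ADD recipient here.
        self.live_nodes = set(live_nodes) - {starting_node}
        # Every existing node and the starting node must acknowledge.
        self.pending = self.live_nodes | {starting_node}
        self.broadcast_done = False

    def on_cm_ackadd(self, sender):
        """Record one CM_ACKADD.

        Returns the list of nodes that should receive CM_ADD once the
        final acknowledgment arrives, and an empty list otherwise.
        """
        if self.broadcast_done:
            return []
        self.pending.discard(sender)
        if self.pending:
            return []
        self.broadcast_done = True
        return sorted(self.live_nodes)


tracker = PresidentAckTracker(live_nodes=[1, 2], starting_node=3)
after_first = tracker.on_cm_ackadd(1)
after_second = tracker.on_cm_ackadd(3)
after_final = tracker.on_cm_ackadd(2)
```

In this model the CM_ADD recipients are produced exactly once, when the last outstanding acknowledgment is received, and the starting node is excluded from that list.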

The start struct is reset, so that it can handle new starting nodes, and then each data node sends a CM_ACKADD to the president, which then sends a CM_ADD to the starting node after all such CM_ACKADD signals have been received. The new node then opens all of its communication channels to the data nodes that were already connected to the cluster; it also sets up its own heartbeat structures and starts sending heartbeats. It also sends a CM_ACKADD message in response to the president.

The signalling between the starting data node, the already live data nodes, the president, and any API nodes attached to the cluster during this phase is shown in the following diagram:

Exchange of signals in cluster STTOR start phase 1

As a final step, QMGR also starts the timer handling for which it is responsible. This means that it generates a signal to blocks that have requested it. This signal is sent 100 times per second, even if any one instance of the signal is delayed.
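One way to keep a fixed signal rate despite delays is to schedule each tick relative to the previous due time rather than to the time the tick was actually handled. The sketch below illustrates that idea only; whether QMGR catches up on delayed signals in exactly this way is an assumption, and the class name is hypothetical. The rate of 100 signals per second is taken from the text.

```python
TICKS_PER_SECOND = 100               # rate stated in the text
TICK_MS = 1000 // TICKS_PER_SECOND   # 10 ms between signals


class FixedRateTicker:
    """Emit ticks at a fixed rate, catching up after a delay (hypothetical)."""

    def __init__(self, start_ms):
        self.next_due_ms = start_ms + TICK_MS

    def poll(self, now_ms):
        """Return how many timer signals are due at time now_ms."""
        due = 0
        while self.next_due_ms <= now_ms:
            due += 1
            # Advance from the scheduled time, not from now_ms, so that
            # a late poll does not lower the long-run rate.
            self.next_due_ms += TICK_MS
        return due


ticker = FixedRateTicker(start_ms=0)
on_time = ticker.poll(10)          # the tick due at 10 ms
after_delay = ticker.poll(45)      # the ticks due at 20, 30, and 40 ms
rest_of_second = ticker.poll(1000)
total = on_time + after_delay + rest_of_second
```

Over the first second the ticker still produces 100 signals in total, even though one poll arrived 25 ms late.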

The BACKUP kernel block also begins sending a signal periodically. This is to ensure that excessive amounts of data are not written to disk, and that data writes are kept within the limits of what has been specified in the cluster configuration file during and after restarts. The DBUTIL block initializes the transaction identity, and DBTUX creates a reference to the DBTUP block, while PGMAN initializes pointers to the LGMAN and DBTUP blocks. The RESTORE kernel block creates references to the DBLQH and DBTUP blocks to enable quick access to those blocks when needed.
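The purpose of the periodic BACKUP signal can be pictured as a per-period write budget. The following sketch is an illustration under stated assumptions, not a description of the BACKUP block's code: the class, its methods, and the idea that the periodic signal resets a byte counter are all hypothetical, and the limit shown is an arbitrary example value rather than a real configuration parameter.

```python
class DiskWriteBudget:
    """Per-period cap on bytes written to disk (hypothetical model)."""

    def __init__(self, max_bytes_per_period):
        self.max_bytes_per_period = max_bytes_per_period
        self.written_this_period = 0

    def on_periodic_signal(self):
        """Start a new period; called each time the periodic signal arrives."""
        self.written_this_period = 0

    def try_write(self, nbytes):
        """Account for a write if it fits in the current period's budget."""
        if self.written_this_period + nbytes > self.max_bytes_per_period:
            return False  # caller must wait for the next period
        self.written_this_period += nbytes
        return True


budget = DiskWriteBudget(max_bytes_per_period=1000)  # example limit only
first = budget.try_write(600)
second = budget.try_write(600)    # would exceed the budget for this period
budget.on_periodic_signal()
third = budget.try_write(600)     # fits again after the period rolls over
```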