ndb_restore now reports the specific
NDBerror number and message when it is unable to load a table descriptor from a backup
.ctlfile. This can happen when attempting to restore a backup taken from a later version of the NDB Cluster software to a cluster running an earlier version—for example, when the backup includes a table using a character set which is unknown to the version of ndb_restore being used to restore it. (Bug #30184265)
References: See also: Bug #29929996.
NDB Disk Data: When a data node failed following creation and population of an
NDBtable having columns on disk, but prior to execution of a local checkpoint, it was possible to lose row data from the tablespace. (Bug #29506869)
Once a data node is started, 95% of its configured
DataMemoryshould be available for normal data, with 5% to spare for use in critical situations. During the node startup process, all of its configured
DataMemoryis usable for data, in order to minimize the risk that restoring the node data fails due to running out of data memory due to some dynamic memory structure using more pages for the same data than when the node was stopped. For example, a hash table grows differently during a restart than it did previously, since the order of inserts to the table differs from the historical order.
The issue raised in this bug report occurred when a check that the data memory used plus the spare data memory did not exceed the value set for
DataMemoryfailed at the point where the spare memory was reserved. This happened as the state of the data node transitioned from starting to started, when reserving spare pages. After calculating the number of reserved pages to be used for spare memory, and then the number of shared pages (that is, pages from shared global memory) to be used for this, the number of reserved pages already allocated was not taken into consideration. (Bug #30205182)
References: See also: Bug #29616383.
When executing a global schema lock (GSL),
NDBused a single
Ndb_table_guardobject for successive retires when attempting to obtain a table object reference; it was not possible for this to succeed after failing on the first attempt, since
Ndb_table_guardassumes that the underlying object pointer is determined once only—at initialisation—with the previously retrieved pointer being returned from a cached reference thereafter.
This resulted in infinite waits to obtain the GSL, causing the binlog injector thread to hang so that mysqld considered all
NDBtables to be read-only. To avoid this problem,
NDBnow uses a fresh instance of
Ndb_table_guardfor each such retry. (Bug #30120858)
References: This issue is a regression of: Bug #30086352.
When starting, a data node's local sysfile was not updated between the first completed local checkpoint and start phase 50. (Bug #30086352)
BACKUPblock, the assumption was made that the first record in
c_backupswas the local checkpoint record, which is not always the case. Now
NDBloops through the records in
c_backupsto find the (correct) LCP record instead. (Bug #30080194)
During node takeover for the master it was possible to end in the state
LCP_STATUS_IDLEwhile the remaining data nodes were reporting their state as
LCP_TAB_SAVED. This led to failure of the node when attempting to handle reception of a
LCP_COMPLETE_REPsignal since this is not expected when idle. Now in such cases local checkpoint handling is done in a manner that ensures that this node finishes in the proper state (
LCP_TAB_SAVED). (Bug #30032863)
Restoring tables for which
MAX_ROWSwas used to alter partitioning from a backup made from NDB 7.4 to a cluster running NDB 7.6 did not work correctly. This is fixed by ensuring that the upgrade code handling
PartitionBalancesupplies a valid table specification to the
NDBdictionary. (Bug #29955656)
During upgrade of an NDB Cluster when half of the data nodes were running NDB 7.6 while the remainder were running NDB 8.0, attempting to shut down those nodes which were running NDB 7.6 led to failure of one node with the error CHECK FAILEDNODEPTR.P->DBLQHFAI. (Bug #29912988, Bug #30141203)
When performing a local checkpoint (LCP), a table's schema version was intermittently read as 0, which caused
NDBLCP handling to treat the table as though it were being dropped. This could effect rebuilding of indexes offline by ndb_restore while the table was in the
TABLE_READ_ONLYstate. Now the function reading the schema version (
getCreateSchemaVersion()) no longer not changes it while the table is read-only. (Bug #29910397)
NDBindex statistics are calculated based on the topology of one fragment of an ordered index; the fragment chosen in any particular index is decided at index creation time, both when the index is originally created, and when a node or system restart has recreated the index locally. This calculation is based in part on the number of fragments in the index, which can change when a table is reorganized. This means that, the next time that the node is restarted, this node may choose a different fragment, so that no fragments, one fragment, or two fragments are used to generate index statistics, resulting in errors from
This issue is solved by modifying the online table reorganization to recalculate the chosen fragment immediately, so that all nodes are aligned before and after any subsequent restart. (Bug #29534647)
During a restart when the data nodes had started but not yet elected a president, the management server received a node ID already in use error, which resulted in excessive retries and logging. This is fixed by introducing a new error 1705 Not ready for connection allocation yet for this case.
During a restart when the data nodes had not yet completed node failure handling, a spurious Failed to allocate nodeID error was returned. This is fixed by adding a check to detect an incomplete node start and to return error 1703 Node failure handling not completed instead.
As part of this fix, the frequency of retries has been reduced for not ready to alloc nodeID errors, an error insert has been added to simulate a slow restart for testing purposes, and log messages have been reworded to indicate that the relevant node ID allocation errors are minor and only temporary. (Bug #27484514)
The process of selecting the transaction coordinator checked for “live” data nodes but not necessarily for those that were actually available. (Bug #27160203)