- ndb_restore now reports the specific NDB error number and message when it is unable to load a table descriptor from a backup .ctl file. This can happen when attempting to restore a backup taken from a later version of the NDB Cluster software to a cluster running an earlier version; for example, when the backup includes a table using a character set which is unknown to the version of ndb_restore being used to restore it. (Bug #30184265)
- The output from DUMP 1000 in the ndb_mgm client has been extended to provide information regarding total data page usage. (Bug #29841454) References: See also: Bug #29929996.
- NDB Disk Data: When a data node failed following creation and population of an NDB table having columns on disk, but prior to execution of a local checkpoint, it was possible to lose row data from the tablespace. (Bug #29506869)

- MySQL NDB ClusterJ: If ClusterJ was deployed as a separate module of a multi-module web application, the exception java.lang.IllegalArgumentException: non-public interface is not defined by the given loader was thrown when the application tried to create a new instance of a domain object. This was because ClusterJ always tries to create a proxy class from which the domain object can be instantiated; the proxy class is an implementation of the domain interface and of the protected DomainTypeHandlerImpl::Finalizable interface. In this case the class loaders of the two interfaces differed, since the interfaces belonged to different modules running on the web server, so that the exception above was thrown when ClusterJ tried to create the proxy class using the class loader of the domain object interface. This fix makes the Finalizable interface public so that the class loader of the web application can access it even when it belongs to a different module from that of the domain interface (see the sketch following these ClusterJ items). (Bug #29895213)

- MySQL NDB ClusterJ: ClusterJ sometimes failed with a segmentation fault after reconnecting to an NDB Cluster. This was due to ClusterJ reusing old database metadata objects from the old connection. With the fix, those objects are discarded before a reconnection to the cluster. (Bug #29891983)
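The following minimal Java sketch relates to the ClusterJ class-loader fix (Bug #29895213) above. It is not ClusterJ code; the class and parameter names are invented for illustration. It shows the JDK dynamic-proxy rule behind the reported exception: Proxy.newProxyInstance() requires every non-public proxy interface to be defined by the class loader it is given, so combining a public domain interface with a non-public helper interface owned by a different module's class loader fails with exactly the message quoted above.

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Proxy;

    public class ProxyLoaderSketch {
        // Builds a proxy implementing both the domain interface and a helper
        // interface, using the domain interface's class loader (as described in
        // the item above). If helperIface is non-public and was defined by a
        // different class loader, Proxy.newProxyInstance() throws
        // IllegalArgumentException: non-public interface is not defined by the given loader.
        static Object newDomainProxy(Class<?> domainIface,
                                     Class<?> helperIface,
                                     InvocationHandler handler) {
            ClassLoader loader = domainIface.getClassLoader();
            return Proxy.newProxyInstance(loader,
                                          new Class<?>[] { domainIface, helperIface },
                                          handler);
        }
    }

Making the helper interface public, as the fix does, removes the requirement that it be defined by that particular loader.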
- Once a data node has started, 95% of its configured DataMemory should be available for normal data, with 5% held in reserve for use in critical situations. During the node startup process, all of its configured DataMemory is usable for data, in order to minimize the risk that restoring the node's data fails due to running out of data memory because some dynamic memory structure uses more pages for the same data than it did when the node was stopped. For example, a hash table grows differently during a restart than it did originally, since the order of inserts to the table differs from the historical order.

  The issue raised in this bug report occurred when the check that data memory used plus spare data memory does not exceed the configured DataMemory failed at the point where the spare memory was reserved. This happened as the state of the data node transitioned from starting to started, when reserving spare pages: after calculating the number of reserved pages to be used for spare memory, and then the number of shared pages (that is, pages from shared global memory) to be used for this purpose, the number of reserved pages already allocated was not taken into consideration. (Bug #30205182) References: See also: Bug #29616383.
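To make the accounting above concrete, here is a minimal sketch, assuming hypothetical names and a literal 5% spare fraction; the actual data node code is C++ and is not reproduced here. The point is only that pages already reserved must count toward the spare requirement when checking that used plus spare data fits within the configured DataMemory.

    final class SpareDataMemorySketch {
        // Illustration only; names are hypothetical, not NDB symbols.
        // Returns the number of additional pages to reserve as spare when the
        // node transitions from "starting" to "started".
        static long sparePagesToReserve(long totalPages, long alreadyReservedPages) {
            long sparePagesWanted = totalPages / 20; // 5% of configured DataMemory
            // The fix described above: account for spare pages that have already
            // been allocated rather than demanding the full 5% on top of them.
            return Math.max(0, sparePagesWanted - alreadyReservedPages);
        }

        // The check that failed: data in use plus spare must fit in DataMemory.
        static boolean fitsInDataMemory(long totalPages, long usedPages, long sparePages) {
            return usedPages + sparePages <= totalPages;
        }
    }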
- When executing a global schema lock (GSL), NDB used a single Ndb_table_guard object for successive retries when attempting to obtain a table object reference; it was not possible for this to succeed after failing on the first attempt, since Ndb_table_guard assumes that the underlying object pointer is determined once only, at initialization, with the previously retrieved pointer being returned from a cached reference thereafter. This resulted in infinite waits to obtain the GSL, causing the binlog injector thread to hang, so that mysqld considered all NDB tables to be read-only. To avoid this problem, NDB now uses a fresh instance of Ndb_table_guard for each such retry (see the sketch below). (Bug #30120858) References: This issue is a regression of: Bug #30086352.
- When starting, a data node's local sysfile was not updated between the first completed local checkpoint and start phase 50. (Bug #30086352)
- In the BACKUP block, the assumption was made that the first record in c_backups was the local checkpoint record, which is not always the case. Now NDB loops through the records in c_backups to find the correct LCP record instead. (Bug #30080194)

- During node takeover for the master, it was possible to end up in the state LCP_STATUS_IDLE while the remaining data nodes were reporting their state as LCP_TAB_SAVED. This led to failure of the node when it attempted to handle reception of an LCP_COMPLETE_REP signal, since this is not expected while idle. Now in such cases local checkpoint handling is done in a manner that ensures that this node finishes in the proper state (LCP_TAB_SAVED). (Bug #30032863)

- Restoring tables for which MAX_ROWS had been used to alter partitioning, from a backup made from NDB 7.4 to a cluster running NDB 7.6, did not work correctly. This is fixed by ensuring that the upgrade code handling PartitionBalance supplies a valid table specification to the NDB dictionary. (Bug #29955656)

- During an upgrade of an NDB Cluster in which half of the data nodes were running NDB 7.6 while the remainder were running NDB 8.0, attempting to shut down the nodes running NDB 7.6 led to failure of one node with the error CHECK FAILEDNODEPTR.P->DBLQHFAI. (Bug #29912988, Bug #30141203)
- When performing a local checkpoint (LCP), a table's schema version was intermittently read as 0, which caused NDB LCP handling to treat the table as though it were being dropped. This could affect rebuilding of indexes offline by ndb_restore while the table was in the TABLE_READ_ONLY state. Now the function reading the schema version (getCreateSchemaVersion()) no longer changes it while the table is read-only. (Bug #29910397)
- NDB index statistics are calculated based on the topology of one fragment of an ordered index; the fragment to be used in any particular index is decided at index creation time, both when the index is originally created and when a node or system restart recreates the index locally. This calculation depends in part on the number of fragments in the index, which can change when a table is reorganized. This means that the next time the node is restarted, it may choose a different fragment, so that zero, one, or two fragments end up being used to generate index statistics, resulting in errors from ANALYZE TABLE.

  This issue is solved by modifying online table reorganization to recalculate the chosen fragment immediately, so that all nodes are aligned before and after any subsequent restart. (Bug #29534647)
- During a restart when the data nodes had started but not yet elected a president, the management server received a node ID already in use error, which resulted in excessive retries and logging. This is fixed by introducing a new error, 1705 Not ready for connection allocation yet, for this case.

  During a restart when the data nodes had not yet completed node failure handling, a spurious Failed to allocate nodeID error was returned. This is fixed by adding a check to detect an incomplete node start and to return error 1703 Node failure handling not completed instead.

  As part of this fix, the frequency of retries has been reduced for not ready to alloc nodeID errors, an error insert has been added to simulate a slow restart for testing purposes, and log messages have been reworded to indicate that the relevant node ID allocation errors are minor and only temporary. (Bug #27484514)
- The process of selecting the transaction coordinator checked for “live” data nodes but not necessarily for those that were actually available. (Bug #27160203)