-
During a node restart, the SUMA block of the node that is starting must get a copy of the subscriptions (events with subscribers) and subscribers (NdbEventOperation instances which are executing) from a node already running. Before the copy is complete, nodes which are still starting ignore any user-level SUB_START or SUB_STOP requests; after the copy is done, they can participate in such requests. While the copy operation is in progress, user-level SUB_START and SUB_STOP requests are blocked using a DICT lock.
An issue was found whereby a starting node could participate in SUB_START and SUB_STOP requests after the lock was requested, but before it was granted, which resulted in unsuccessful SUB_START and SUB_STOP requests. This fix ensures that the nodes cannot participate in these requests until after the DICT lock has actually been granted. (Bug #31302657)
-
Statistics generated by NDB for use in tracking internal objects allocated and deciding when to release them were not calculated correctly, with the result that the threshold for resource usage was 50% higher than intended. This fix corrects the issue, and should allow for reduced memory usage. (Bug #31127237)
-
The Dojo toolkit included with NDB Cluster and used by the Auto-Installer was upgraded to version 1.15.3. (Bug #31029110)
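The ordering rule behind the SUMA fix (Bug #31302657) can be modeled as a small state machine. This is a minimal Python sketch, not NDB source. The phase names, the StartingSuma class, and its fixed flag are hypothetical. The sketch also does not model the DICT lock blocking new user-level requests during the copy; it only shows when a starting node switches from ignoring SUB_START/SUB_STOP to handling them.

```python
from enum import Enum, auto


class StartPhase(Enum):
    """Hypothetical start phases of a starting node's SUMA block."""
    WAITING = auto()         # DICT lock not yet requested
    LOCK_REQUESTED = auto()  # DICT lock requested, not yet granted
    LOCK_GRANTED = auto()    # DICT lock held; subscription copy may run
    COPY_DONE = auto()       # subscriptions and subscribers copied


class StartingSuma:
    """Models only when a starting node may join SUB_START/SUB_STOP handling."""

    def __init__(self, fixed: bool = True) -> None:
        self.phase = StartPhase.WAITING
        self.fixed = fixed  # False reproduces the pre-fix behavior

    def can_participate(self) -> bool:
        allowed = {StartPhase.LOCK_GRANTED, StartPhase.COPY_DONE}
        if not self.fixed:
            # Pre-fix: merely having requested the lock was treated as enough.
            allowed.add(StartPhase.LOCK_REQUESTED)
        return self.phase in allowed

    def handle_user_request(self, kind: str) -> str:
        if kind not in ("SUB_START", "SUB_STOP"):
            raise ValueError(f"unexpected request: {kind}")
        return "handled" if self.can_participate() else "ignored"


if __name__ == "__main__":
    node = StartingSuma()
    node.phase = StartPhase.LOCK_REQUESTED
    print(node.handle_user_request("SUB_START"))  # ignored
    node.phase = StartPhase.LOCK_GRANTED
    print(node.handle_user_request("SUB_START"))  # handled
```

With fixed=False, a node in LOCK_REQUESTED answers "handled". That is the window the fix closes.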
-
A packed version 1 configuration file returned by ndb_mgmd could contain duplicate entries following an upgrade to NDB 8.0, which made the file incompatible with clients using version 1. This occurred because the code handling backwards compatibility assumed that the entries in each section were already sorted when merging them with the default section. To fix this, we now make sure that this sort is performed prior to merging. (Bug #31020183)
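Bug #31020183 shows a typical failure of a linear merge that assumes sorted input. The sketch below is a hypothetical illustration, not ndb_mgmd code. It assumes configuration entries can be modeled as (parameter id, value) pairs and that a section entry overrides the default with the same id. The names merge_with_defaults and merge_sorted_first are invented for this example.

```python
from operator import itemgetter
from typing import List, Tuple

# Hypothetical representation: (parameter id, value) pairs.
Entry = Tuple[int, str]


def merge_with_defaults(section: List[Entry], defaults: List[Entry]) -> List[Entry]:
    """Linear merge; correct only if BOTH lists are sorted by parameter id.

    An entry in `section` overrides the default with the same id.
    """
    out: List[Entry] = []
    i = j = 0
    while i < len(section) and j < len(defaults):
        s_id, d_id = section[i][0], defaults[j][0]
        if s_id < d_id:
            out.append(section[i])
            i += 1
        elif d_id < s_id:
            out.append(defaults[j])
            j += 1
        else:  # same id: keep the section's value, skip the default
            out.append(section[i])
            i += 1
            j += 1
    out.extend(section[i:])
    out.extend(defaults[j:])
    return out


def merge_sorted_first(section: List[Entry], defaults: List[Entry]) -> List[Entry]:
    """The fix in spirit: sort both inputs by id before merging."""
    by_id = itemgetter(0)
    return merge_with_defaults(sorted(section, key=by_id), sorted(defaults, key=by_id))


if __name__ == "__main__":
    section = [(3, "x"), (1, "a")]  # unsorted section entries
    defaults = [(1, "d1"), (2, "d2"), (3, "d3")]
    print(merge_with_defaults(section, defaults))  # [(1, 'd1'), (2, 'd2'), (3, 'x'), (1, 'a')]
    print(merge_sorted_first(section, defaults))   # [(1, 'a'), (2, 'd2'), (3, 'x')]
```

With the unsorted section, parameter id 1 appears twice in the output, once from the defaults and once from the section. Sorting first removes the duplicate.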
-
When executing any of the SHUTDOWN, ALL STOP, or ALL RESTART management commands, it is possible for different nodes to attempt to stop on different global checkpoint index (GCI) boundaries. If they succeed in doing so, a subsequent system restart is slower than normal because any nodes having an earlier stop GCI must undergo takeover as part of the process. When nodes failing on the first GCI boundary cause surviving nodes to be nonviable, surviving nodes suffer an arbitration failure; this has the positive effect of causing such nodes to halt at the correct GCI, but can give rise to spurious errors or similar problems.
To avoid such issues, extra synchronization is now performed during a planned shutdown to reduce the likelihood that different data nodes attempt to shut down at different GCIs, and thus the likelihood of unnecessary node takeovers during system restarts. (Bug #31008713)
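Under the rule stated above, any node with an earlier stop GCI must undergo takeover during the next system restart. This minimal sketch assumes each node's stop GCI is known as an integer and reads "earlier" as earlier than the highest stop GCI among the nodes. nodes_needing_takeover is a hypothetical helper, not an NDB function.

```python
from typing import Dict, Set


def nodes_needing_takeover(stop_gci: Dict[int, int]) -> Set[int]:
    """Return the data nodes whose stop GCI is earlier than the latest one.

    Hypothetical reading of the note: every node that stopped on an earlier
    GCI boundary than the latest must undergo takeover during system restart.
    """
    if not stop_gci:
        return set()
    latest = max(stop_gci.values())
    return {node for node, gci in stop_gci.items() if gci < latest}


if __name__ == "__main__":
    # Unsynchronized shutdown: node 3 stopped one GCI early.
    print(nodes_needing_takeover({1: 100, 2: 100, 3: 99}))  # {3}
    # Synchronized shutdown: all nodes stop on the same GCI boundary.
    print(nodes_needing_takeover({1: 100, 2: 100, 3: 100}))  # set()
```

The second call shows why extra synchronization at shutdown helps. When every node stops on the same GCI boundary, no node needs takeover.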
-
The master node in a backup shut down unexpectedly on receiving duplicate replies to a DEFINE_BACKUP_REQ signal. These occurred when a data node other than the master errored out during the backup, and the backup master handled the situation by sending itself a DEFINE_BACKUP_REF signal on behalf of the missing node. This resulted in two replies being received from the same node (a CONF signal from the problem node prior to shutting down and the REF signal from the master on behalf of this node), even though the master expected only one reply per node. This scenario was also encountered for START_BACKUP_REQ and STOP_BACKUP_REQ signals.
This is fixed in such cases by allowing duplicate replies when the error is the result of an unplanned node shutdown. (Bug #30589827)
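The duplicate-reply handling can be sketched as a per-node reply tracker on the backup master. This is a hypothetical Python model, not NDB backup code. The class name, the allow_failure_duplicates flag, and the choice to keep the first reply and drop the duplicate are all assumptions. The pre-fix shutdown is represented by a RuntimeError.

```python
from typing import Dict, Iterable, Set


class BackupReplyTracker:
    """Hypothetical model of a backup master collecting one reply per node
    for a request such as DEFINE_BACKUP_REQ."""

    def __init__(self, participants: Iterable[int], allow_failure_duplicates: bool = True) -> None:
        self.waiting: Set[int] = set(participants)
        self.replies: Dict[int, str] = {}
        self.failed: Set[int] = set()
        self.allow_failure_duplicates = allow_failure_duplicates

    def on_node_failure(self, node: int) -> None:
        # The master answers on the failed node's behalf with a REF.
        self.failed.add(node)
        self.on_reply(node, "REF")

    def on_reply(self, node: int, kind: str) -> None:
        if node in self.replies:
            if self.allow_failure_duplicates and node in self.failed:
                return  # tolerated: keep the first reply, drop the duplicate
            raise RuntimeError(f"duplicate reply from node {node}")
        self.waiting.discard(node)
        self.replies[node] = kind

    def complete(self) -> bool:
        return not self.waiting


if __name__ == "__main__":
    tracker = BackupReplyTracker([2, 3])
    tracker.on_reply(2, "CONF")
    tracker.on_reply(3, "CONF")   # node 3 replied before failing
    tracker.on_node_failure(3)    # master's REF on behalf of node 3 is tolerated
    print(tracker.complete(), tracker.replies)  # True {2: 'CONF', 3: 'CONF'}
```

With allow_failure_duplicates=False, the same sequence raises RuntimeError, which mirrors the unexpected shutdown described above.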
-
A BLOB value is stored by NDB in multiple parts; when reading such a value, one read operation is executed per part. If a part is not found, the read fails with a row not found error, which indicates a corrupted BLOB, since a BLOB should never have any missing parts. A problem can arise because this error is reported as the overall result of the read operation, which means that mysqld sees no error and reports zero rows returned.
This issue is fixed by adding a check specifically for the case in which a blob part is not found. Now, when this occurs, the row not found error is overwritten with a corrupted blob error, which causes the originating SELECT statement to fail as expected. Users of the NDB API should be aware that, despite this change, the NdbBlob::getValue() method continues to report the error as row not found in such cases. (Bug #28590428)
-
Incorrect handling of operations on fragment replicas during node restarts could result in a forced shutdown, or in content diverging between fragment replicas, when primary keys with nonbinary (case-sensitive) equality conditions were used. (Bug #98526, Bug #30884622)
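The BLOB fix (Bug #28590428) follows a simple pattern: one read per part, with a missing part reported as corruption rather than as "row not found". This is a hypothetical Python sketch of the SQL-level behavior only. RowNotFound, CorruptedBlob, read_part, and read_blob are invented names, and parts are modeled as a dict keyed by part number. As noted above, NdbBlob::getValue() itself still reports row not found.

```python
from typing import Dict


class RowNotFound(Exception):
    """Stand-in for a per-part read failing with 'row not found'."""


class CorruptedBlob(Exception):
    """Stand-in for the 'corrupted blob' error reported after the fix."""


def read_part(parts: Dict[int, bytes], part_no: int) -> bytes:
    # One read operation per part.
    if part_no not in parts:
        raise RowNotFound(f"part {part_no}")
    return parts[part_no]


def read_blob(parts: Dict[int, bytes], num_parts: int) -> bytes:
    chunks = []
    for part_no in range(num_parts):
        try:
            chunks.append(read_part(parts, part_no))
        except RowNotFound as exc:
            # A BLOB should never have missing parts, so surface this as
            # corruption instead of letting 'row not found' mean "no rows".
            raise CorruptedBlob(f"blob part {part_no} is missing") from exc
    return b"".join(chunks)


if __name__ == "__main__":
    print(read_blob({0: b"ab", 1: b"cd"}, 2))  # b'abcd'
    try:
        read_blob({0: b"ab"}, 2)
    except CorruptedBlob as err:
        print("error:", err)  # error: blob part 1 is missing
```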