Important Change: The maximum failure time calculation used to ensure that normal node failure handling mechanisms are given time to handle survivable cluster failures (before global checkpoint watchdog mechanisms start to kill nodes due to GCP delays) was excessively conservative, and neglected to consider that there can be at most NoOfReplicas node failures before the cluster can no longer survive. Now the value of NoOfReplicas is properly taken into account when performing this calculation.
This fix adds the TimeBetweenGlobalCheckpointsTimeout data node configuration parameter, which makes the minimum timeout between global checkpoints settable by the user. This timeout was previously fixed internally at 120000 milliseconds, which is now the default value for this parameter. (Bug #20069617, Bug #20069624)
References: See also: Bug #19858151, Bug #20128256, Bug #20135976.
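For illustration, a minimal config.ini fragment setting this parameter might look like the following sketch. It assumes the parameter is placed in the [ndbd default] section so that it applies to all data nodes, and it simply restates the default value given above rather than recommending a setting.

```ini
[ndbd default]
# Minimum timeout between global checkpoints, in milliseconds.
# 120000 is the default described above; shown only as an example.
TimeBetweenGlobalCheckpointsTimeout=120000
```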
NDB Cluster APIs: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.
A number of root causes, listed here, were discovered that led to this problem:
Result rows were unpacked to full NdbRecord format before they were stored in the buffer. If only some but not all columns of a table were selected, the buffer contained empty space (essentially wasted).
These issues became more evident in NDB 7.2 and later MySQL NDB Cluster release series. This was because the buffer size is scaled by BatchSize, and the default value for this parameter was increased fourfold (from 64 to 256) beginning with MySQL NDB Cluster 7.2.1.
This fix causes result rows to be buffered using the packed format instead of the unpacked format; a buffered scan result row is now not unpacked until it becomes the current row. In addition, BatchSize and MaxScanBatchSize are now used as limiting factors when calculating the required buffer size.
Also as part of this fix, refactoring has been done to separate handling of buffered (packed) from handling of unbuffered result sets, and to remove code that had been unused since NDB 7.0 or earlier. The NdbRecord class declaration has also been cleaned up by removing a number of unused or redundant member variables. (Bug #73781, Bug #75599, Bug #19631350, Bug #20408733)
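The effect described in this entry can be sketched with a simplified model. The following Python is not the NDB API or its actual sizing formula; the function names, the column widths, and the way the byte cap is applied are all assumptions made for illustration. It only shows why a preallocated buffer grows with row size, batch size, and fragment count, and why storing packed rows and applying a byte limit reduces it.

```python
def unpacked_row_bytes(column_widths):
    """Full-record layout: every table column occupies space, selected or not."""
    return sum(column_widths.values())


def packed_row_bytes(column_widths, selected):
    """Packed layout: only the selected columns occupy space."""
    return sum(column_widths[name] for name in selected)


def scan_buffer_bytes(row_bytes, batch_rows, fragments, byte_cap=None):
    """Hypothetical estimate: rows per batch * row size * fragments.

    byte_cap stands in for the limiting role the entry attributes to
    MaxScanBatchSize; how NDB really applies that limit is not modelled.
    """
    total = row_bytes * batch_rows * fragments
    if byte_cap is not None:
        total = min(total, byte_cap)
    return total


# Hypothetical table: column name -> width in bytes.
columns = {"id": 4, "name": 64, "payload": 1024, "created": 8}
selected = ["id", "created"]

full = unpacked_row_bytes(columns)            # 1100 bytes per row
packed = packed_row_bytes(columns, selected)  # 12 bytes per row

# Raising the batch size from 64 to 256 rows quadruples the estimate.
print(scan_buffer_bytes(full, 64, 8))     # 563200
print(scan_buffer_bytes(full, 256, 8))    # 2252800
# Packed rows shrink it when only a few columns are selected.
print(scan_buffer_bytes(packed, 256, 8))  # 24576
```

In this model the estimate is linear in the number of fragments, which matches the linear scaling reported above.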
In the event of a node failure during an initial node restart followed by another node start, the restart of the affected node could hang with a START_INFOREQ that occurred while invalidation of local checkpoints was still ongoing. (Bug #20546157, Bug #75916)
References: See also: Bug #34702.
It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process.
In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator.
Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)
DROP DATABASE failed to remove the database when the database directory contained a .ndb file which had no corresponding table in NDB. Now, when executing DROP DATABASE, NDB performs a check specifically for leftover .ndb files, and deletes any that it finds. (Bug #20480035)
References: See also: Bug #44529.
When performing a restart, it was sometimes possible to find a log end marker which had been written by a previous restart, and that should have been invalidated. Now when searching for the last page to invalidate, the same search algorithm is used as when searching for the last page of the log to read. (Bug #76207, Bug #20665205)
During a node restart, if there was no global checkpoint completed between the START_LCP_REQ for a local checkpoint and its LCP_COMPLETE_REP, it was possible for a comparison of the LCP ID sent in the LCP_COMPLETE_REP signal with the internal value SYSFILE->latestLCP_ID to fail. (Bug #76113, Bug #20631645)
When sending LCP_FRAG_ORD signals as part of master takeover, it is possible that the master is not synchronized with complete accuracy in real time, so that some signals must be dropped. During this time, the master can send an LCP_FRAG_ORD signal with its lastFragmentFlag set even after the local checkpoint has been completed. This enhancement causes this flag to persist until the start of the next local checkpoint, which causes these signals to be dropped as well.
This change affects ndbd only; the issue described did not occur with ndbmtd. (Bug #75964, Bug #20567730)
When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory. (Bug #75930, Bug #20553247)
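The general hazard behind this kind of bug can be shown with a small sketch. This is not NDB transporter code, and the entry does not say how the fix was implemented; the function names and the buffer contents are hypothetical. The Python below only illustrates why an element-by-element copy within one buffer goes wrong when the source and destination ranges overlap, and how taking a snapshot of the source first avoids it.

```python
def naive_forward_copy(buf, src, dst, n):
    """Copy n bytes one at a time, lowest index first.

    Unsafe when dst > src and the ranges overlap: later reads see
    bytes that earlier iterations have already overwritten.
    """
    for i in range(n):
        buf[dst + i] = buf[src + i]


def overlap_safe_copy(buf, src, dst, n):
    """Snapshot the source range before writing, so overlap is harmless."""
    snapshot = bytes(buf[src:src + n])
    buf[dst:dst + n] = snapshot


corrupted = bytearray(b"ABCDEFGH")
naive_forward_copy(corrupted, 0, 2, 4)
print(corrupted)  # bytearray(b'ABABABGH'): "CD" was overwritten before it was read

intact = bytearray(b"ABCDEFGH")
overlap_safe_copy(intact, 0, 2, 4)
print(intact)     # bytearray(b'ABABCDGH'): the original "ABCD" arrives unchanged
```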
NDB node takeover code made the assumption that there would be only one takeover record when starting a takeover, based on the further assumption that the master node could never perform copying of fragments. However, this is not the case in a system restart, where a master node can have stale data and so need to perform such copying to bring itself up to date. (Bug #75919, Bug #20546899)