-
Important Change: The maximum failure time calculation used to ensure that normal node failure handling mechanisms are given time to handle survivable cluster failures (before global checkpoint watchdog mechanisms start to kill nodes due to GCP delays) was excessively conservative, and neglected to consider that there can be at most number_of_data_nodes / NoOfReplicas node failures before the cluster can no longer survive. (For example, with 8 data nodes and NoOfReplicas=2, at most 8/2 = 4 node failures are survivable.) Now the value of NoOfReplicas is properly taken into account when performing this calculation.

This fix adds the TimeBetweenGlobalCheckpointsTimeout data node configuration parameter, which makes the minimum timeout between global checkpoints settable by the user; a configuration sketch follows this item. This timeout was previously fixed internally at 120000 milliseconds, which is now the default value for this parameter. (Bug #20069617, Bug #20069624)

References: See also: Bug #19858151, Bug #20128256, Bug #20135976.
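As a minimal sketch of how this might be set in config.ini (the section shown and the NoOfReplicas value are illustrative assumptions; 120000 is the documented default):

    [ndbd default]
    NoOfReplicas=2
    # Settable as of this fix; previously fixed internally at 120000 ms,
    # which remains the default value.
    TimeBetweenGlobalCheckpointsTimeout=120000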
-
NDB Cluster APIs: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.

A number of root causes, listed here, were discovered that led to this problem:

- Result rows were unpacked to full NdbRecord format before they were stored in the buffer. If only some but not all columns of a table were selected, the buffer contained empty space (essentially wasted).
- Due to the buffer format being unpacked, VARCHAR and VARBINARY columns always had to be allocated for the maximum size defined for such columns.
- BatchByteSize and MaxScanBatchSize values were not taken into consideration as limiting factors when calculating the maximum buffer size.

These issues became more evident in NDB 7.2 and later MySQL NDB Cluster release series, due to the fact that the buffer size is scaled by BatchSize, and that the default value for this parameter was increased fourfold (from 64 to 256) beginning with MySQL NDB Cluster 7.2.1.

This fix causes result rows to be buffered using the packed format instead of the unpacked format; a buffered scan result row is now not unpacked until it becomes the current row. In addition, BatchByteSize and MaxScanBatchSize are now used as limiting factors when calculating the required buffer size. (An API sketch illustrating the relevant client-side settings follows this item.)

Also as part of this fix, refactoring has been done to separate the handling of buffered (packed) result sets from that of unbuffered result sets, and to remove code that had been unused since NDB 7.0 or earlier. The NdbRecord class declaration has also been cleaned up by removing a number of unused or redundant member variables. (Bug #73781, Bug #75599, Bug #19631350, Bug #20408733)
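For illustration, here is a minimal NDB API sketch (error handling is abbreviated; the connect string, database test, table t1, and the choice of columns are assumptions) showing the client-side factors involved: a result mask restricting the scan to selected columns, and ScanOptions::SO_BATCH constraining the batch size by which the result buffer is scaled:

    #include <NdbApi.hpp>
    #include <cstdio>

    int main() {
      ndb_init();
      Ndb_cluster_connection con("localhost:1186");  // assumed connect string
      if (con.connect(4, 5, 1) != 0 || con.wait_until_ready(30, 0) != 0)
        return 1;

      Ndb ndb(&con, "test");                         // assumed database
      if (ndb.init() != 0)
        return 1;

      const NdbDictionary::Table* tab = ndb.getDictionary()->getTable("t1");
      if (tab == nullptr)
        return 1;

      // Retrieve only the columns with attribute ids 0 and 1; the mask has
      // one bit per column, ordered by attribute id.
      const unsigned char mask[1] = { (1 << 0) | (1 << 1) };

      // Request a smaller per-fragment batch than the configured BatchSize.
      NdbScanOperation::ScanOptions opts;
      opts.optionsPresent = NdbScanOperation::ScanOptions::SO_BATCH;
      opts.batch = 64;

      NdbTransaction* trans = ndb.startTransaction();
      if (trans == nullptr)
        return 1;
      NdbScanOperation* scan =
          trans->scanTable(tab->getDefaultRecord(), NdbOperation::LM_Read,
                           mask, &opts, sizeof(opts));
      if (scan == nullptr || trans->execute(NdbTransaction::NoCommit) != 0)
        return 1;

      const char* row;
      int rows = 0;
      while (scan->nextResult(&row, true, false) == 0)
        rows++;  // with this fix, a buffered row is unpacked only here

      std::printf("scanned %d rows\n", rows);
      ndb.closeTransaction(trans);
      ndb_end(0);
      return 0;
    }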
-
In the event of a node failure during an initial node restart followed by another node start, the restart of the affected node could hang with a START_INFOREQ that occurred while invalidation of local checkpoints was still ongoing. (Bug #20546157, Bug #75916)

References: See also: Bug #34702.
-
It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process.
In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator.
Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)
-
DROP DATABASE failed to remove the database when the database directory contained a .ndb file which had no corresponding table in NDB. Now, when executing DROP DATABASE, NDB performs a check specifically for leftover .ndb files, and deletes any that it finds. (Bug #20480035)

References: See also: Bug #44529.
-
When performing a restart, it was sometimes possible to find a log end marker which had been written by a previous restart, and that should have been invalidated. Now when searching for the last page to invalidate, the same search algorithm is used as when searching for the last page of the log to read. (Bug #76207, Bug #20665205)
-
During a node restart, if there was no global checkpoint completed between the START_LCP_REQ for a local checkpoint and its LCP_COMPLETE_REP, it was possible for a comparison of the LCP ID sent in the LCP_COMPLETE_REP signal with the internal value SYSFILE->latestLCP_ID to fail. (Bug #76113, Bug #20631645)
-
When sending LCP_FRAG_ORD signals as part of master takeover, it is possible that the master is not synchronized with complete accuracy in real time, so that some signals must be dropped. During this time, the master can send an LCP_FRAG_ORD signal with its lastFragmentFlag set even after the local checkpoint has been completed. This enhancement causes this flag to persist until the start of the next local checkpoint, which causes these late signals to be dropped as well; a schematic sketch follows this item.

This change affects ndbd only; the issue described did not occur with ndbmtd. (Bug #75964, Bug #20567730)
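As a schematic sketch of the dropping logic described here (illustrative pseudologic under assumed names; not the actual NDB source):

    #include <cassert>

    // Illustrative state only; not the actual NDB data structures.
    struct LcpState {
      bool lastFragmentSeen = false;  // persists until the next LCP starts
    };

    // Returns true when an incoming LCP_FRAG_ORD should be dropped.
    bool shouldDropLcpFragOrd(LcpState& s, bool lastFragmentFlag) {
      if (s.lastFragmentSeen)
        return true;                // LCP already completed; drop stale signal
      if (lastFragmentFlag)
        s.lastFragmentSeen = true;  // remember completion past the LCP's end
      return false;
    }

    void onLcpStart(LcpState& s) {
      s.lastFragmentSeen = false;   // reset only when the next LCP begins
    }

    int main() {
      LcpState s;
      assert(!shouldDropLcpFragOrd(s, false));  // ordinary fragment order
      assert(!shouldDropLcpFragOrd(s, true));   // final fragment order
      assert(shouldDropLcpFragOrd(s, false));   // late signal: dropped
      onLcpStart(s);
      assert(!shouldDropLcpFragOrd(s, false));  // next LCP proceeds normally
      return 0;
    }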
-
When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory (a minimal illustration of this hazard follows this item). (Bug #75930, Bug #20553247)
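The underlying hazard is the classic overlapping copy: copying data back into the same buffer with memcpy() is undefined behavior when source and destination overlap, whereas memmove() handles overlap correctly. A minimal standalone illustration of this general pattern (not the actual transporter code):

    #include <cstring>
    #include <cstdio>

    int main() {
      // A single buffer acting as both source and destination, as when
      // short signal data is copied back into the same signal.
      char signal[16] = "hdrpayload";   // 3-byte prefix followed by data

      // Compact the data to the front of the buffer. The source range
      // [3, 11) overlaps the destination range [0, 8), so memcpy() would
      // be undefined behavior here; memmove() is safe for overlapping
      // ranges.
      std::memmove(signal, signal + 3, 8);  // 7 data bytes plus the NUL

      std::printf("%s\n", signal);          // prints "payload"
      return 0;
    }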
-
NDB node takeover code made the assumption that there would be only one takeover record when starting a takeover, based on the further assumption that the master node could never perform copying of fragments. However, this is not the case in a system restart, where a master node can have stale data and so may need to perform such copying to bring itself up to date. (Bug #75919, Bug #20546899)