Important Change: The maximum failure time calculation used to ensure that normal node failure handling mechanisms are given time to handle survivable cluster failures (before global checkpoint watchdog mechanisms start to kill nodes due to GCP delays) was excessively conservative, and neglected to consider that there can be at most NoOfReplicas node failures before the cluster can no longer survive. Now the value of NoOfReplicas is properly taken into account when performing this calculation.
This fix adds the TimeBetweenGlobalCheckpointsTimeout data node configuration parameter, which makes the minimum timeout between global checkpoints settable by the user. This timeout was previously fixed internally at 120000 milliseconds, which is now the default value for this parameter. (Bug #20069617, Bug #20069624)
References: See also: Bug #19858151, Bug #20128256, Bug #20135976.
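As a sketch only, the new parameter can be set in the [ndbd default] section of the cluster's config.ini file; the value 180000 used here is an arbitrary illustration, not a recommendation:

    [ndbd default]
    # Raise the minimum timeout between global checkpoints from the
    # default of 120000 ms; 180000 here is purely illustrative.
    TimeBetweenGlobalCheckpointsTimeout=180000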
NDB Cluster APIs: When a transaction is started from a cluster connection, Index schema objects may be passed to this transaction for use. If these schema objects have been acquired from a different connection (Ndb_cluster_connection object), they can be deleted at any point by the deletion or disconnection of the owning connection. This can leave a connection with invalid schema objects, which causes an NDB API application to fail when these are dereferenced.
To avoid this problem, if your application uses multiple connections, you can now set a check to detect sharing of schema objects between connections when passing a schema object to a transaction, using the NdbTransaction::setSchemaObjectOwnerChecks() method added in this release. When this check is enabled, schema objects having the same names are acquired from the transaction's own connection and compared with the schema objects passed to the transaction; a mismatch causes the application to fail with an error. (Bug #19785977)
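A minimal sketch of enabling the check, assuming the method takes a single boolean argument; error handling is omitted for brevity:

    #include <NdbApi.hpp>

    // Sketch: turn on schema-object owner checking for one transaction.
    // If tab was acquired through a different Ndb_cluster_connection,
    // operations using it fail with an error instead of dereferencing a
    // potentially deleted object.
    void runCheckedTransaction(Ndb *ndb, const NdbDictionary::Table *tab)
    {
      NdbTransaction *trans = ndb->startTransaction();
      trans->setSchemaObjectOwnerChecks(true); // assumed bool signature

      NdbOperation *op = trans->getNdbOperation(tab);
      // ... define and execute the operation as usual ...
      (void)op;
      ndb->closeTransaction(trans);
    }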
NDB Cluster APIs: The increase in the default number of hashmap buckets (DefaultHashMapSize API node configuration parameter) from 240 to 3840 in MySQL NDB Cluster 7.2.11 increased the size of the internal DictHashMapInfo::HashMap type considerably. This type was allocated on the stack in some getTable() calls, which could lead to stack overflow issues for NDB API users.
To avoid this problem, the hashmap is now dynamically allocated from the heap. (Bug #19306793)
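The general shape of such a fix can be sketched in a few lines; HashMapInfo and its size below are hypothetical stand-ins, not the actual NDB internals:

    #include <memory>

    // Hypothetical type whose size grew with the bucket count; at 3840
    // buckets an instance no longer fits comfortably on a thread stack.
    struct HashMapInfo {
      unsigned short buckets[3840];
    };

    void lookupBefore() {
      HashMapInfo info;   // large automatic object: risks stack overflow
      (void)info;
    }

    void lookupAfter() {
      // Heap allocation avoids the stack-size limit; the unique_ptr
      // releases the object when the function returns.
      auto info = std::make_unique<HashMapInfo>();
      (void)info;
    }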
NDB Cluster APIs: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.
A number of root causes were discovered that led to this problem, among them the following:

  * Result rows were unpacked to full NdbRecord format before they were stored in the buffer. If only some but not all columns of a table were selected, the buffer contained empty space (essentially wasted).

  * The BatchByteSize and MaxScanBatchSize limits were not taken into account when calculating the required buffer size.
These issues became more evident in NDB 7.2 and later MySQL NDB Cluster release series. This was due to the fact that the buffer size is scaled by BatchSize, and that the default value for this parameter was increased fourfold (from 64 to 256) beginning with MySQL NDB Cluster 7.2.1.
This fix causes result rows to be buffered using the packed format instead of the unpacked format; a buffered scan result row is now not unpacked until it becomes the current row. In addition, BatchByteSize and MaxScanBatchSize are now used as limiting factors when calculating the required buffer size.
Also as part of this fix, refactoring has been done to separate the handling of buffered (packed) result sets from that of unbuffered result sets, and to remove code that had been unused since NDB 7.0 or earlier. The NdbRecord class declaration has also been cleaned up by removing a number of unused or redundant member variables. (Bug #73781, Bug #75599, Bug #19631350, Bug #20408733)
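The space difference between the two formats can be illustrated with a toy packed buffer; the layout below is purely illustrative and is not the actual NdbRecord or NDB wire format:

    #include <cstdint>
    #include <vector>

    // Toy packed buffer: store only the bytes of the selected columns,
    // each prefixed by its actual length. An unpacked buffer would
    // instead reserve rows * full_record_width bytes up front, wasting
    // space for unselected columns and short variable-size values.
    struct PackedRowBuffer {
      std::vector<uint8_t> bytes;

      void appendColumn(const void *data, uint16_t len) {
        const uint8_t *p = static_cast<const uint8_t *>(data);
        bytes.insert(bytes.end(),
                     reinterpret_cast<uint8_t *>(&len),
                     reinterpret_cast<uint8_t *>(&len) + sizeof len);
        bytes.insert(bytes.end(), p, p + len);
      }
    };
    // A row kept in this form is expanded to full record format only
    // when it becomes the current row, mirroring the unpack-on-demand
    // approach described above.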
It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process.
In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator.
Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)
The values of the Ndb_last_commit_epoch_server and Ndb_last_commit_epoch_session status variables were incorrectly reported on some platforms. To correct this problem, these values are now stored internally as long long rather than long. (Bug #20372169)
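The portability trap is straightforward to demonstrate; this standalone sketch is not the server code itself:

    #include <cstdio>

    int main() {
      // A 64-bit epoch value (an NDB epoch keeps the GCI in its upper
      // 32 bits, so real values routinely exceed 32-bit range).
      long long epoch = (4711LL << 32) | 7;

      // On LP64 platforms long is 64 bits and nothing is lost, but
      // where long is 32 bits (32-bit builds, Windows LLP64) the
      // assignment silently truncates the upper half.
      long truncated = (long)epoch;

      printf("long long: %lld  long: %ld\n", epoch, truncated);
      return 0;
    }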
When a data node fails or is being restarted, the remaining nodes in the same nodegroup resend to subscribers any data which they determine has not already been sent by the failed node. Normally, when a data node (actually, the SUMA kernel block) has sent all data belonging to an epoch for which it is responsible, it sends a SUB_GCP_COMPLETE_REP signal, together with a count, to all subscribers, each of which responds with a SUB_GCP_COMPLETE_ACK. When SUMA receives this acknowledgment from all subscribers, it reports this to the other nodes in the same nodegroup, so that they know that there is no need to resend this data in case of a subsequent node failure. If a node failed after all subscribers had sent this acknowledgement but before all the other nodes in the same nodegroup had received it from the failing node, data for some epochs could be sent (and reported as complete) twice, which could lead to an unplanned shutdown.
The fix for this issue adds to the count reported by SUB_GCP_COMPLETE_ACK a list of identifiers which the receiver can use to keep track of which buckets are completed, and to ignore any duplicate reported for an already completed bucket. (Bug #17579998)
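The receiver-side bookkeeping this implies can be sketched as follows; the names and types are hypothetical, not the actual SUMA code:

    #include <cstdint>
    #include <set>

    // Hypothetical per-epoch completion state: remember which buckets
    // have already been reported complete, so that a duplicate report
    // for the same bucket can be recognized and ignored.
    struct EpochCompletion {
      std::set<uint32_t> completedBuckets;

      // Returns true if the bucket is newly completed; false means a
      // duplicate that must not be counted toward completion again.
      bool markComplete(uint32_t bucketId) {
        return completedBuckets.insert(bucketId).second;
      }
    };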
When performing a restart, it was sometimes possible to find a log end marker which had been written by a previous restart and should have been invalidated. Now, when searching for the last page to invalidate, the same search algorithm is used as when searching for the last page of the log to read. (Bug #76207, Bug #20665205)
When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory. (Bug #75930, Bug #20553247)
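Copying between overlapping regions with memcpy() is undefined behavior in C and C++; memmove() is the usual remedy, as in this small standalone sketch:

    #include <cstdio>
    #include <cstring>

    int main() {
      char signal[16] = "abcdefghij";

      // Shift the payload two bytes toward the front of the same
      // buffer. Source and destination overlap, so memcpy() would be
      // undefined behavior here; memmove() copies as if through a
      // temporary buffer and is safe.
      memmove(signal, signal + 2, 8);
      signal[8] = '\0';

      printf("%s\n", signal); // prints "cdefghij"
      return 0;
    }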
When a bulk delete operation was committed early (to avoid an additional round trip while also returning the number of affected rows) but then failed with a timeout error, the SQL node did not verify that the transaction had actually reached the Committed state. (Bug #74494, Bug #20092754)
References: See also: Bug #19873609.