Important Change: When restoring to a cluster using data node IDs different from those in the original cluster, ndb_restore tried to open files corresponding to node ID 0. To keep this from happening, the --nodeid and --backupid options (neither of which has a default value) are both now explicitly required when invoking ndb_restore. (Bug #28813708)
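The new requirement can be pictured as option parsing with no fallbacks: both IDs must be given on every invocation. A minimal sketch follows; the pairing of --nodeid with --backupid as the two required options, and all values, are illustrative assumptions, not the actual ndb_restore implementation.

```python
import argparse

# Hypothetical wrapper mirroring the new behavior: neither option has a
# default, so argparse rejects any invocation that omits either one.
def parse_restore_args(argv):
    parser = argparse.ArgumentParser(prog="ndb_restore")
    parser.add_argument("--nodeid", type=int, required=True,
                        help="node ID used in the backup (no default)")
    parser.add_argument("--backupid", type=int, required=True,
                        help="ID of the backup to restore (no default)")
    return parser.parse_args(argv)

args = parse_restore_args(["--nodeid", "3", "--backupid", "12"])
print(args.nodeid, args.backupid)   # 3 12
```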
NDB Disk Data: When a log file group had more than 18 undo logs, it was not possible to restart the cluster. (Bug #251155785)
References: See also: Bug #28922609.
NDB Cluster APIs: When the SUMA kernel block sends a TE_ALTER event, it does not keep track of when all fragments of the event have been sent. When NDB receives the event, it buffers the fragments and processes the event once all fragments have arrived. An issue could arise with very large table definitions, where the time between transmission and reception could span multiple epochs; during this time, SUMA could send a SUB_GCP_COMPLETE_REP signal to indicate that it had sent all data for an epoch, even though fragments of a TE_ALTER event might still be waiting on the data node to be sent. Reception of the SUB_GCP_COMPLETE_REP signal leads to closing the buffers for that epoch, so when the TE_ALTER event finally arrives, NDB assumes that it is a duplicate from an earlier epoch and silently discards it.
We fix the problem by making sure that the SUMA kernel block never sends a SUB_GCP_COMPLETE_REP signal for any epoch in which there are unsent fragments for a TE_ALTER event.
This issue could have an impact on NDB API applications making use of TE_ALTER events. (SQL nodes do not make any use of TE_ALTER events, so they and applications using them were not affected.) (Bug #28836474)
Where a data node was restarted after a configuration change whose result was a decrease in the sum of MaxNoOfTables, MaxNoOfOrderedIndexes, and MaxNoOfUniqueHashIndexes, it sometimes failed with a misleading error message which suggested both a temporary error and a bug, neither of which was the case.
The failure itself is expected, being due to the fact that there is at least one table object with an ID greater than the (new) sum of the parameters just mentioned, and that this table cannot be restored since the maximum ID allowed is limited by that sum. The error message has been changed to reflect this, and now indicates that this is a permanent error due to a problematic configuration. (Bug #28884880)
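The arithmetic behind the expected failure can be sketched as a simple bounds check. The parameter names other than MaxNoOfUniqueHashIndexes, and all values, are assumptions for illustration only.

```python
# Hedged sketch: a table object cannot be restored if its ID is not
# below the sum of the schema-object limits in the configuration.
def max_table_objects(config):
    return (config["MaxNoOfTables"]
            + config["MaxNoOfOrderedIndexes"]
            + config["MaxNoOfUniqueHashIndexes"])

def can_restore(table_id, config):
    return table_id < max_table_objects(config)

old = {"MaxNoOfTables": 128, "MaxNoOfOrderedIndexes": 128,
       "MaxNoOfUniqueHashIndexes": 64}           # sum = 320
new = dict(old, MaxNoOfOrderedIndexes=32)        # sum = 224 after the decrease
print(can_restore(300, old), can_restore(300, new))  # True False
```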
When a local checkpoint (LCP) was complete on all data nodes except one, and this node failed, NDB did not continue with the steps required to finish the LCP. This led to the following issues:
No new LCPs could be started.
Redo and Undo logs were not trimmed and so grew excessively large, causing an increase in times for recovery from disk. This led to write service failure, which eventually led to cluster shutdown when the head of the redo log met the tail. This placed a limit on cluster uptime.
Node restarts were no longer possible, due to the fact that a data node restart requires that the node's state be made durable on disk before it can provide redundancy when joining the cluster. For a cluster with two data nodes and two fragment replicas, this meant that a restart of the entire cluster (system restart) was required to fix the issue (this was not necessary for a cluster with two fragment replicas and four or more data nodes). (Bug #28728485, Bug #28698831)
References: See also: Bug #11757421.
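The corrected completion logic can be sketched as follows: a failed participant must be dropped from the set of nodes the LCP is still waiting on, so the checkpoint finishes instead of stalling forever. Names and structure here are hypothetical, not the NDB code.

```python
# Sketch of LCP completion tracking across participating data nodes.
class LcpRound:
    def __init__(self, nodes):
        self.waiting_on = set(nodes)
        self.finished = False

    def report_done(self, node):
        self.waiting_on.discard(node)
        self._check()

    def report_failed(self, node):
        # The fix in miniature: a failed node must not keep the LCP open.
        self.waiting_on.discard(node)
        self._check()

    def _check(self):
        if not self.waiting_on:
            self.finished = True

lcp = LcpRound([1, 2, 3, 4])
for n in (1, 2, 3):
    lcp.report_done(n)
print(lcp.finished)      # False: still waiting on node 4
lcp.report_failed(4)
print(lcp.finished)      # True: LCP completes despite the failure
```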
ANALYZE TABLE on an NDB table with an index longer than the supported maximum length caused data nodes to fail. (Bug #28714864)
It was possible in certain cases for nodes to hang during an initial restart. (Bug #28698831)
References: See also: Bug #27622643.
The output of ndb_config --query-all now shows that configuration changes for the ThreadConfig and MaxNoOfExecutionThreads data node parameters require initial system restarts (restart="system" initial="true"). (Bug #28494286)
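One way to use the new attributes is to scan the --query-all output for parameters that need an initial system restart. The sample string below is a hypothetical fragment shaped like the attributes quoted above, not verbatim ndb_config output.

```python
import re

# Hypothetical --query-all fragment; only the attribute shape matters.
sample = ('MaxNoOfExecutionThreads restart="system" initial="true" '
          'HeartbeatIntervalDbDb restart="node" initial="false"')

# Collect parameters whose changes require an initial system restart.
needs_initial_system_restart = re.findall(
    r'(\w+) restart="system" initial="true"', sample)
print(needs_initial_system_restart)   # ['MaxNoOfExecutionThreads']
```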
API nodes should observe that a node is moving through SL_STOPPING phases (graceful stop) and stop using the node for new transactions, which minimizes potential disruption in the later phases of the node shutdown process. API nodes were only informed of node state changes via periodic heartbeat signals, and so might not be able to avoid interacting with the node shutting down. This generated unnecessary failures when the heartbeat interval was long. Now when a data node is being gracefully stopped, all API nodes are notified directly, allowing them to experience minimal disruption. (Bug #28380808)
When scanning a row using an ACC scan, or when performing a read using the primary key, it is possible to start a read of the row and hit a real-time break during which it is necessary to wait for the page to become available in memory. If the row is deleted in the interim, its checksum is invalidated; when the page request returns later, the attempt to read the row fails due to the invalid checksum.
This problem is solved by introducing a new tuple header DELETE_WAIT flag, which is checked before starting any row scan or primary key read operations on a row whose disk data pages are not yet available, and which is cleared when the row is finally committed. (Bug #27584165, Bug #93035, Bug #28868412)
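The flag's role can be sketched as a guard consulted before any read touches the row. The constant value, class, and function names below are hypothetical; only the set-on-delete / clear-on-commit / check-before-read pattern comes from the entry above.

```python
# Sketch of a tuple-header flag guarding reads against in-flight deletes.
DELETE_WAIT = 0x1

class Row:
    def __init__(self):
        self.header = 0
        self.checksum_valid = True

def begin_delete(row):
    row.header |= DELETE_WAIT      # set when the delete starts ...
    row.checksum_valid = False     # ... which also invalidates the checksum

def commit_delete(row):
    row.header &= ~DELETE_WAIT     # cleared when the delete commits

def try_read(row):
    if row.header & DELETE_WAIT:
        return None                # skip the row instead of reading it
    assert row.checksum_valid      # without the flag, this check would fire
    return "row data"

row = Row()
begin_delete(row)
print(try_read(row))   # None: the guarded read skips the row cleanly
```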
When tables with BLOB columns were dropped and then re-created with a different number of BLOB columns, the event definitions used for monitoring table changes could become inconsistent in certain error situations involving communication errors, when the expected cleanup of the corresponding events was not performed. In particular, when the new versions of the tables had more BLOB columns than the original tables, some events could be missing. (Bug #27072756)
When running a cluster with 4 or more data nodes under very high loads, data nodes could sometimes fail with Error 899 Rowid already allocated. (Bug #25960230)
mysqld shut down unexpectedly when a purge of the binary log was requested before the server had completely started, and it was thus not yet ready to delete rows from the ndb_binlog_index table. Now when this occurs, requests for any needed purges of the ndb_binlog_index table are saved in a queue and held for execution until the server has completely started. (Bug #25817834)
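The queued-purge behavior can be sketched as follows: requests arriving before startup completes are parked and replayed, in order, once the server is ready. Class and method names are hypothetical.

```python
from collections import deque

# Sketch of deferring binlog purge requests until startup completes.
class BinlogPurger:
    def __init__(self):
        self.ready = False
        self.pending = deque()
        self.purged = []

    def request_purge(self, up_to_file):
        if not self.ready:
            self.pending.append(up_to_file)   # too early: hold the request
        else:
            self.purged.append(up_to_file)

    def on_server_started(self):
        self.ready = True
        while self.pending:                   # replay held requests in order
            self.purged.append(self.pending.popleft())

p = BinlogPurger()
p.request_purge("binlog.000003")   # arrives during startup
print(p.purged)                    # []: nothing purged yet
p.on_server_started()
print(p.purged)                    # ['binlog.000003']
```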
When starting, a data node copies metadata, while a local checkpoint updates metadata. To avoid any conflict, any ongoing LCP activity is paused while metadata is being copied. An issue arose when a local checkpoint was paused on a given node, and another node that was also restarting checked for a complete LCP on this node; the check actually caused the LCP to be completed before copying of metadata was complete and so ended the pause prematurely. Now in such cases, the LCP completion check waits to complete a paused LCP until copying of metadata is finished and the pause ends as expected, within the LCP in which it began. (Bug #24827685)
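The corrected ordering above can be sketched as a small state check: a completion check against a node whose LCP is paused for metadata copying must wait until the copy finishes, rather than forcing the LCP to complete and ending the pause early. Names are hypothetical.

```python
# Sketch of LCP pause handling during metadata copy on a restarting node.
class NodeLcp:
    def __init__(self):
        self.paused_for_copy = False
        self.complete = False

    def start_metadata_copy(self):
        self.paused_for_copy = True    # LCP activity paused while copying

    def finish_metadata_copy(self):
        self.paused_for_copy = False

    def completion_check(self):
        if self.paused_for_copy:
            return "wait"              # the fix: do not complete a paused LCP
        self.complete = True
        return "complete"

node = NodeLcp()
node.start_metadata_copy()
print(node.completion_check())   # 'wait' while the copy is in progress
node.finish_metadata_copy()
print(node.completion_check())   # 'complete' once the pause has ended
```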
Asynchronous disconnection of mysqld from the cluster caused any subsequent attempt to start an NDB API transaction to fail. If this occurred during a bulk delete operation, the SQL layer called HA::end_bulk_delete(), whose implementation by ha_ndbcluster assumed that a transaction had been started, and could fail if this was not the case. This problem is fixed by checking that the transaction pointer used by this method is set before referencing it. (Bug #20116393)
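The fix is the familiar guard-before-dereference pattern, sketched here in Python rather than the actual C++ handler code; the class and return values are illustrative only.

```python
# Sketch: guard the transaction handle instead of assuming it was set.
class BulkDelete:
    def __init__(self, connected=True):
        # After an asynchronous disconnect, no transaction could be
        # started, so the handle is left unset (None).
        self.trans = object() if connected else None

    def end_bulk_delete(self):
        if self.trans is None:
            return "error: no NDB transaction"   # fail cleanly, no dereference
        return "flushed pending deletes"

print(BulkDelete(connected=False).end_bulk_delete())
```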
NdbScanFilter did not always handle NULL according to the SQL standard, which could result in non-qualifying rows being sent to the MySQL server to be filtered there, which would otherwise not have been necessary. (Bug #92407, Bug #28643463)
References: See also: Bug #93977, Bug #29231709.
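The standard's behavior is three-valued: any comparison involving NULL yields UNKNOWN, and a filter must treat UNKNOWN as non-qualifying. A minimal sketch of that semantics (function names are illustrative, not the NdbScanFilter API):

```python
# SQL-standard three-valued comparison: None models SQL NULL.
def sql_less_than(a, b):
    if a is None or b is None:
        return None          # UNKNOWN, not True or False
    return a < b

def row_qualifies(value, bound):
    # UNKNOWN must filter the row out, same as False.
    return sql_less_than(value, bound) is True

rows = [3, None, 7]
print([v for v in rows if row_qualifies(v, 5)])  # [3]: the NULL row is dropped
```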
NDB attempted to use condition pushdown on greater-than (>) and less-than (<) comparisons with ENUM column values, but this could cause rows to be omitted from the result. Now such comparisons are no longer pushed down. Comparisons for equality (=) and inequality (<>) with ENUM values are not affected by this change, and conditions including these comparisons can still be pushed down. (Bug #92321, Bug #28610217)
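Why ordering comparisons on ENUM are hazardous while equality is safe can be shown with a small sketch: ENUM values order by their position in the column definition, not alphabetically, so an engine comparing the stored strings can disagree with the server. The enum definition below is an invented example.

```python
# Hypothetical ENUM('small','medium','large'): order is definition order.
enum_def = ["small", "medium", "large"]
pos = {v: i for i, v in enumerate(enum_def)}

def enum_less(a, b):      # correct for ENUM: compare definition positions
    return pos[a] < pos[b]

def string_less(a, b):    # wrong for ENUM: lexicographic comparison
    return a < b

# The two orders disagree, which is how pushed-down >/< could drop rows;
# equality gives the same answer either way, so it remains safe to push.
print(enum_less("medium", "small"), string_less("medium", "small"))
# False True
```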