Important Change: When restoring to a cluster using data node IDs different from those in the original cluster, ndb_restore tried to open files corresponding to node ID 0. To keep this from happening, the --nodeid and --backupid options, neither of which has a default value, are both now explicitly required when invoking ndb_restore. (Bug #28813708)
Packaging; MySQL NDB ClusterJ: libndbclient was missing from builds on some platforms. (Bug #28997603)
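Because ndb_restore now requires both --nodeid and --backupid, a wrapper script can check for them before invoking the tool. The following is a minimal sketch only: the helper name build_restore_command and the example backup path are hypothetical, and --backup-path and --restore-data are further ndb_restore options that the note above does not mention.

```python
def build_restore_command(node_id, backup_id, backup_path, restore_data=True):
    """Build an ndb_restore argument list (hypothetical helper).

    Neither --nodeid nor --backupid has a default value, so this helper
    refuses to build a command line when either one is missing.
    """
    if node_id is None:
        raise ValueError("--nodeid is required")
    if backup_id is None:
        raise ValueError("--backupid is required")

    cmd = [
        "ndb_restore",
        f"--nodeid={node_id}",
        f"--backupid={backup_id}",
        f"--backup-path={backup_path}",
    ]
    if restore_data:
        cmd.append("--restore-data")
    return cmd


if __name__ == "__main__":
    # Example values are placeholders, not taken from any real cluster.
    print(" ".join(build_restore_command(1, 5, "/path/to/BACKUP/BACKUP-5")))
```

The returned list can be passed to subprocess.run() once the values have been checked against the actual backup being restored.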
NDB Disk Data: When a log file group had more than 18 undo logs, it was not possible to restart the cluster. (Bug #251155785)
References: See also: Bug #28922609.
NDB Cluster APIs: When the NDB kernel's SUMA block sends a TE_ALTER event, it does not keep track of when all fragments of the event are sent. When NDB receives the event, it buffers the fragments, and processes the event when all fragments have arrived. An issue could possibly arise for very large table definitions, when the time between transmission and reception could span multiple epochs; during this time, SUMA could send a SUB_GCP_COMPLETE_REP signal to indicate that it has sent all data for an epoch, even though in this case that is not entirely true since there may be fragments of a TE_ALTER event still waiting on the data node to be sent. Reception of the SUB_GCP_COMPLETE_REP leads to closing the buffers for that epoch. Thus, when TE_ALTER finally arrives, NDB assumes that it is a duplicate from an earlier epoch, and silently discards it.
We fix the problem by making sure that the SUMA kernel block never sends a SUB_GCP_COMPLETE_REP for any epoch in which there are unsent fragments for a SUB_TABLE_DATA signal.
This issue could have an impact on NDB API applications making use of TE_ALTER events. (SQL nodes do not make any use of TE_ALTER events and so they and applications using them were not affected.) (Bug #28836474)
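The fragment and epoch ordering problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not NDB kernel code: the Receiver and Sender classes, their method names, and the hold_completion switch are all hypothetical, and the model keeps only one fragmented event per epoch.

```python
from collections import deque


class Receiver:
    """Toy stand-in for the API-side event buffer (not NDB code)."""

    def __init__(self):
        self.closed_epochs = set()
        self.buffers = {}    # epoch -> fragments received so far
        self.delivered = []  # (epoch, reassembled event)
        self.discarded = []  # fragments dropped as apparent duplicates

    def on_fragment(self, epoch, fragment, last):
        if epoch in self.closed_epochs:
            # The epoch is already closed, so the fragment looks like a
            # duplicate from an earlier epoch and is silently dropped.
            self.discarded.append((epoch, fragment))
            return
        self.buffers.setdefault(epoch, []).append(fragment)
        if last:
            self.delivered.append((epoch, "".join(self.buffers.pop(epoch))))

    def on_epoch_complete(self, epoch):
        self.closed_epochs.add(epoch)
        self.buffers.pop(epoch, None)


class Sender:
    """Toy stand-in for the sending block (not NDB code)."""

    def __init__(self, receiver, hold_completion):
        self.receiver = receiver
        self.hold_completion = hold_completion  # True models the fix
        self.unsent = {}                        # epoch -> deque of fragments
        self.pending_complete = set()

    def queue_event(self, epoch, fragments):
        queue = self.unsent.setdefault(epoch, deque())
        for i, fragment in enumerate(fragments):
            queue.append((fragment, i == len(fragments) - 1))

    def send_one_fragment(self, epoch):
        queue = self.unsent.get(epoch)
        if not queue:
            return
        fragment, last = queue.popleft()
        self.receiver.on_fragment(epoch, fragment, last)
        if not queue and epoch in self.pending_complete:
            # All fragments are out; the deferred completion can go now.
            self.pending_complete.discard(epoch)
            self.receiver.on_epoch_complete(epoch)

    def complete_epoch(self, epoch):
        if self.hold_completion and self.unsent.get(epoch):
            # Defer the completion signal while fragments remain unsent.
            self.pending_complete.add(epoch)
            return
        self.receiver.on_epoch_complete(epoch)


def run(hold_completion):
    receiver = Receiver()
    sender = Sender(receiver, hold_completion)
    sender.queue_event(7, ["ALT", "ER_", "TBL"])
    sender.send_one_fragment(7)  # first fragment arrives
    sender.complete_epoch(7)     # completion is attempted too early
    sender.send_one_fragment(7)
    sender.send_one_fragment(7)
    return receiver


if __name__ == "__main__":
    print("without hold:", run(False).delivered, run(False).discarded)
    print("with hold:   ", run(True).delivered, run(True).discarded)
```

With hold_completion disabled, the last two fragments arrive after the epoch is closed and are discarded; with it enabled, the completion is deferred and the event is reassembled.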
Where a data node was restarted after a configuration change whose result was a decrease in the sum of MaxNoOfTables, MaxNoOfOrderedIndexes, and MaxNoOfUniqueHashIndexes, it sometimes failed with a misleading error message which suggested both a temporary error and a bug, neither of which was the case.
The failure itself is expected, being due to the fact that there is at least one table object with an ID greater than the (new) sum of the parameters just mentioned, and that this table cannot be restored since the maximum value for the ID allowed is limited by that sum. The error message has been changed to reflect this, and now indicates that this is a permanent error due to a problem configuration. (Bug #28884880)
When a local checkpoint (LCP) was complete on all data nodes except one, and this node failed, NDB did not continue with the steps required to finish the LCP. This led to the following issues:
No new LCPs could be started.
Redo and Undo logs were not trimmed and so grew excessively large, causing an increase in times for recovery from disk. This led to write service failure, which eventually led to cluster shutdown when the head of the redo log met the tail. This placed a limit on cluster uptime.
Node restarts were no longer possible, due to the fact that a data node restart requires that the node's state be made durable on disk before it can provide redundancy when joining the cluster. For a cluster with two data nodes and two fragment replicas, this meant that a restart of the entire cluster (system restart) was required to fix the issue (this was not necessary for a cluster with two fragment replicas and four or more data nodes). (Bug #28728485, Bug #28698831)
References: See also: Bug #11757421.
Running ANALYZE TABLE on an NDB table with an index longer than the supported maximum length caused data nodes to fail. (Bug #28714864)
It was possible in certain cases for nodes to hang during an initial restart. (Bug #28698831)
References: See also: Bug #27622643.
The output of ndb_config --configinfo --xml --query-all now shows that configuration changes for the ThreadConfig and MaxNoOfExecutionThreads data node parameters require system initial restarts (restart="system" initial="true"). (Bug #28494286)
API nodes should observe that a node is moving through SL_STOPPING phases (graceful stop) and stop using the node for new transactions, which minimizes potential disruption in the later phases of the node shutdown process. API nodes were only informed of node state changes via periodic heartbeat signals, and so might not be able to avoid interacting with the node shutting down. This generated unnecessary failures when the heartbeat interval was long. Now when a data node is being gracefully stopped, all API nodes are notified directly, allowing them to experience minimal disruption. (Bug #28380808)
Executing SELECT * FROM INFORMATION_SCHEMA.TABLES caused SQL nodes to restart in some cases. (Bug #27613173)
When scanning a row using a TUP scan or ACC scan, or when performing a read using the primary key, it is possible to start a read of the row and hit a real-time break during which it is necessary to wait for the page to become available in memory. When the page request returns later, an attempt to read the row fails due to an invalid checksum; this is because, when the row is deleted, its checksum is invalidated.
This problem is solved by introducing a new tuple header DELETE_WAIT flag, which is checked before starting any row scan or PK read operations on the row where disk data pages are not yet available, and cleared when the row is finally committed. (Bug #27584165, Bug #93035, Bug #28868412)
When tables with BLOB columns were dropped and then re-created with a different number of BLOB columns, the event definitions for monitoring table changes could become inconsistent in certain error situations involving communication errors, when the expected cleanup of the corresponding events was not performed. In particular, when the new versions of the tables had more BLOB columns than the original tables, some events could be missing. (Bug #27072756)
When running a cluster with 4 or more data nodes under very high loads, data nodes could sometimes fail with Error 899 Rowid already allocated. (Bug #25960230)
mysqld shut down unexpectedly when a purge of the binary log was requested before the server had completely started, and it was thus not yet ready to delete rows from the ndb_binlog_index table. Now when this occurs, requests for any needed purges of the ndb_binlog_index table are saved in a queue and held for execution when the server has completely started. (Bug #25817834)
When starting, a data node copies metadata, while a local checkpoint updates metadata. To avoid any conflict, any ongoing LCP activity is paused while metadata is being copied. An issue arose when a local checkpoint was paused on a given node, and another node that was also restarting checked for a complete LCP on this node; the check actually caused the LCP to be completed before copying of metadata was complete and so ended the pause prematurely. Now in such cases, the LCP completion check waits to complete a paused LCP until copying of metadata is finished and the pause ends as expected, within the LCP in which it began. (Bug #24827685)
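The queued handling of early ndb_binlog_index purge requests described above can be pictured with a small sketch. It is an illustration of the deferral pattern only, not the mysqld implementation: the PurgeQueue class, its method names, and the delete_rows callback are hypothetical.

```python
class PurgeQueue:
    """Hold purge requests until the server has completely started."""

    def __init__(self, delete_rows):
        # delete_rows is a hypothetical callback that removes the
        # ndb_binlog_index rows belonging to one binary log file.
        self._delete_rows = delete_rows
        self._started = False
        self._pending = []

    def request_purge(self, binlog_file):
        """Purge now if possible; otherwise queue. Returns True if purged."""
        if not self._started:
            self._pending.append(binlog_file)
            return False
        self._delete_rows(binlog_file)
        return True

    def server_started(self):
        """Mark startup as complete and run the queued purges in order."""
        self._started = True
        while self._pending:
            self._delete_rows(self._pending.pop(0))


if __name__ == "__main__":
    purged = []
    queue = PurgeQueue(purged.append)
    queue.request_purge("binlog.000001")  # too early, so it is queued
    queue.server_started()                # queued purge runs here
    queue.request_purge("binlog.000002")  # runs immediately
    print(purged)
```

Queuing rather than rejecting early requests means no purge is lost, while the table is never touched before it is ready.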
Asynchronous disconnection of mysqld from the cluster caused any subsequent attempt to start an NDB API transaction to fail. If this occurred during a bulk delete operation, the SQL layer called HA::end_bulk_delete(), whose implementation by ha_ndbcluster assumed that a transaction had been started, and could fail if this was not the case. This problem is fixed by checking that the transaction pointer used by this method is set before referencing it. (Bug #20116393)
NdbScanFilter did not always handle NULL according to the SQL standard, which could result in sending non-qualifying rows to be filtered (otherwise not necessary) by the MySQL server. (Bug #92407, Bug #28643463)
References: See also: Bug #93977, Bug #29231709.
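For reference, the SQL standard's treatment of NULL in comparisons, which the NdbScanFilter fix above concerns, can be sketched as follows. This sketch does not use the NdbScanFilter API and does not reproduce the specific cases that were mishandled, which the note does not list; the function names are hypothetical, and Python's None stands in for both NULL and the UNKNOWN truth value.

```python
import operator

# Comparison operators covered by this sketch.
OPS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def sql_compare(op, left, right):
    """Return True, False, or None (UNKNOWN) under three-valued logic."""
    if left is None or right is None:
        # Any comparison involving NULL is UNKNOWN, even <>.
        return None
    return OPS[op](left, right)


def qualifies(op, column_value, constant):
    """A row passes a filter only when the predicate is TRUE."""
    return sql_compare(op, column_value, constant) is True


if __name__ == "__main__":
    values = [1, None, 3]
    print([v for v in values if qualifies("<>", v, 1)])
```

A filter that instead treated UNKNOWN as a match would pass the NULL row along, leaving it to be rejected later by the MySQL server.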
NDB attempted to use condition pushdown on greater-than (>) and less-than (<) comparisons with ENUM column values but this could cause rows to be omitted in the result. Now such comparisons are no longer pushed down. Comparisons for equality (=) and inequality (<> / !=) with ENUM values are not affected by this change, and conditions including these comparisons can still be pushed down. (Bug #92321, Bug #28610217)