This section contains unified change history highlights for all NDB Cluster releases based on version 7.3 of the NDBCLUSTER storage engine through MySQL NDB Cluster 7.3.17.
For an overview of features that were added in MySQL NDB Cluster 7.3, see What is New in NDB Cluster 7.3.
- Changes in MySQL NDB Cluster 7.3.16 (5.6.35-ndb-7.3.16)
- Changes in MySQL NDB Cluster 7.3.15 (5.6.34-ndb-7.3.15)
- Changes in MySQL NDB Cluster 7.3.14 (5.6.31-ndb-7.3.14)
- Changes in MySQL NDB Cluster 7.3.13 (5.6.29-ndb-7.3.13)
- Changes in MySQL NDB Cluster 7.3.12 (5.6.28-ndb-7.3.12)
- Changes in MySQL NDB Cluster 7.3.11 (5.6.27-ndb-7.3.11)
- Changes in MySQL NDB Cluster 7.3.10 (5.6.25-ndb-7.3.10)
- Changes in MySQL NDB Cluster 7.3.9 (5.6.24-ndb-7.3.9)
- Changes in MySQL NDB Cluster 7.3.8 (5.6.22-ndb-7.3.8)
- Changes in MySQL NDB Cluster 7.3.7 (5.6.21-ndb-7.3.7)
- Changes in MySQL NDB Cluster 7.3.6 (5.6.19-ndb-7.3.6)
- Changes in MySQL NDB Cluster 7.3.5 (5.6.17-ndb-7.3.5)
- Changes in MySQL NDB Cluster 7.3.4 (5.6.15-ndb-7.3.4)
- Changes in MySQL NDB Cluster 7.3.3 (5.6.14-ndb-7.3.3)
- Changes in MySQL NDB Cluster 7.3.2 (5.6.11-ndb-7.3.2)
- Changes in MySQL NDB Cluster 7.3.1 (5.6.10-ndb-7.3.1)
ndb_restore did not restore tables having more than 341 columns correctly. This was because the buffer used to hold table metadata read from .ctl files was too small, so that in such cases only part of the table descriptor could be read from it. This issue is fixed by increasing the size of the buffer used by ndb_restore for file reads. (Bug #25182956)
References: See also: Bug #25302901.
The rand() function was used to produce a unique table ID and table version needed to identify a schema operation distributed between multiple SQL nodes, relying on the assumption that rand() would never produce the same numbers on two different instances of mysqld. It was later determined that this is not the case, and that in fact it is very likely for the same random numbers to be produced on all SQL nodes.
This fix removes the use of rand() for producing a unique table ID or version, and instead uses a sequence in combination with the node ID of the coordinator. This guarantees uniqueness until the counter for the sequence wraps, which should be sufficient for this purpose.
The effects of this duplication could be observed as timeouts in the log (for example NDB create db: waiting max 119 sec for distributing) when restarting multiple mysqld processes simultaneously or nearly so, or when issuing the same DROP DATABASE statement on multiple SQL nodes. (Bug #24926009)
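The uniqueness argument for the new identifiers can be illustrated with a minimal C++ sketch. It assumes a hypothetical layout (coordinator node ID in the high 8 bits, a 24-bit sequence in the low bits); the class name and bit split are illustrative only and are not the encoding NDB actually uses. Two coordinators can never produce the same value, and a single coordinator repeats a value only after its sequence wraps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical generator for schema-operation identifiers: combines the
// coordinator's node ID with a local sequence instead of calling rand().
// The 8-bit / 24-bit split is an assumption made for this sketch.
class SchemaOpIdGenerator {
 public:
  explicit SchemaOpIdGenerator(std::uint8_t coordinatorNodeId)
      : nodeId_(coordinatorNodeId) {}

  std::uint32_t next() {
    // Advance the 24-bit sequence; values repeat only after it wraps.
    seq_ = (seq_ + 1) & 0x00FFFFFFu;
    return (static_cast<std::uint32_t>(nodeId_) << 24) | seq_;
  }

 private:
  std::uint8_t nodeId_;
  std::uint32_t seq_ = 0;
};
```

Because the node ID occupies its own bits, two SQL nodes acting as coordinators cannot collide even if their sequences are in lockstep, which is exactly the situation that made identical rand() output a problem.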
Long message buffer exhaustion when firing immediate triggers could result in row ID leaks; this could later result in persistent RowId already allocated errors (NDB Error 899). (Bug #23723110)
References: See also: Bug #19506859, Bug #13927679.
When a parent NDB table in a foreign key relationship was updated, the update cascaded to a child table as expected, but the change was not cascaded to a child table of this child table (that is, to a grandchild of the original parent). This can be illustrated using the tables generated by the following CREATE TABLE statements:
CREATE TABLE parent( id INT PRIMARY KEY AUTO_INCREMENT, col1 INT UNIQUE, col2 INT ) ENGINE NDB;
CREATE TABLE child( ref1 INT UNIQUE, FOREIGN KEY fk1(ref1) REFERENCES parent(col1) ON UPDATE CASCADE ) ENGINE NDB;
CREATE TABLE grandchild( ref2 INT, FOREIGN KEY fk2(ref2) REFERENCES child(ref1) ON UPDATE CASCADE ) ENGINE NDB;
Table child is a child of table parent; table grandchild is a child of table child, and a grandchild of parent. In this scenario, a change to column col1 of parent cascaded to ref1 in table child, but it was not always propagated in turn to ref2 in table grandchild. (Bug #83743, Bug #25063506)
Removed an invalid assertion to the effect that all cascading child scans are closed at the time API connection records are released following an abort of the main transaction. The assertion was invalid because closing of scans in such cases is by design asynchronous with respect to the main transaction, which means that subscans may well take some time to close after the main transaction is closed. (Bug #23709284)
A number of potential buffer overflow issues were found and fixed in the NDB codebase. (Bug #23152979)
When a data node has insufficient redo buffer during a system restart, it does not participate in the restart until after the other nodes have started. After this, it performs a takeover of its fragments from the nodes in its node group that have already started; during this time, the cluster is already running and user activity is possible, including DML and DDL operations.
During a system restart, table creation is handled differently in the DIH kernel block than normally, as this creation actually consists of reloading table definition data from disk on the master node. Thus, DIH assumed that any table creation that occurred before all nodes had restarted must be related to the restart and thus always on the master node. However, during the takeover, table creation can occur on non-master nodes due to user activity; when this happened, the cluster underwent a forced shutdown.
Now, during system restarts, an extra check determines whether the executing node is the master node, and this information is used to decide whether the table creation is part of the restart proper or is taking place during a subsequent takeover. (Bug #23028418)
When restoring a backup taken from a database containing tables that had foreign keys, ndb_restore disabled the foreign keys for data, but not for the logs. (Bug #83155, Bug #24736950)
Several object constructors and similar functions in the NDB codebase did not always perform sanity checks when creating new instances. These checks are now performed under such circumstances. (Bug #77408, Bug #21286722)
An internal call to malloc() was not checked for NULL. The function call was replaced with a direct write. (Bug #77375, Bug #21271194)
NDB Cluster APIs: Reuse of transaction IDs could occur when Ndb objects were created and deleted concurrently. As part of this fix, the NDB API methods unlock_ndb_objects are now declared as const. (Bug #23709232)
Incompatible Change: When the data nodes are only partially connected to the API nodes, a node used for a pushdown join may get its request from a transaction coordinator on a different node, without (yet) being connected to the API node itself. In such cases, the NodeInfo object for the requesting API node contained no valid information about the software version of the API node, which caused the DBSPJ block to assume (incorrectly), when aborting, that the API node used NDB version 7.2.4 or earlier. This in turn required a backward compatibility mode to be used during query abort, which sent a node failure error instead of the real error causing the abort.
Now, whenever the NDB software version of the API node is not yet available, the API node version is assumed to be greater than 7.2.4. (Bug #23049170)
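The decision described above can be sketched as follows. This is a minimal model under stated assumptions: the version packing `(major << 16) | (minor << 8) | build` and the function names are hypothetical and serve only to make versions comparable here; an empty `std::optional` stands for "version not yet known".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical version packing, used only so versions can be compared here.
constexpr std::uint32_t makeVersion(std::uint32_t major, std::uint32_t minor,
                                    std::uint32_t build) {
  return (major << 16) | (minor << 8) | build;
}

// Last API node version that needs the backward-compatible abort path.
constexpr std::uint32_t kLastLegacyAbortVersion = makeVersion(7, 2, 4);

// Decides whether a query abort must fall back to reporting a node failure.
bool useLegacyAbortPath(std::optional<std::uint32_t> apiNodeVersion) {
  if (!apiNodeVersion) {
    // Behavior after the fix: an unknown version is treated as newer than 7.2.4.
    return false;
  }
  return *apiNodeVersion <= kLastLegacyAbortVersion;
}
```

Before the fix, the missing version effectively fell into the `<= 7.2.4` branch; after it, only a known old version selects the compatibility path, so the real abort error is reported.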
The reserved send buffer for the loopback transporter, introduced in MySQL Cluster NDB 7.4.8 and used by API and management nodes for administrative signals, was calculated incorrectly. (Bug #23093656, Bug #22016081)
References: This issue is a regression of: Bug #21664515.
During a node restart, re-creation of internal triggers used for verifying the referential integrity of foreign keys was not reliable, because it was possible that not all distributed TC and LDM instances agreed on all trigger identities. To fix this problem, an extra step is added to the node restart sequence, during which the trigger identities are determined by querying the current master node. (Bug #23068914)
References: See also: Bug #23221573.
Following the forced shutdown of one of the two data nodes in a cluster where NoOfReplicas=2, the other data node shut down as well, due to arbitration failure. (Bug #23006431)
ClusterMgr is an internal component of NDB API and ndb_mgmd processes, part of TransporterFacade (which in turn is a wrapper around the transporter registry), and shared with data nodes. This component is responsible for a number of tasks including connection setup requests; sending and monitoring of heartbeats; provision of node state information; handling of cluster disconnects and reconnects; and forwarding of cluster state indicators.
ClusterMgr maintains a count of live nodes which is incremented on receiving a report of a node having connected (reportConnected() method call), and decremented on receiving a report that a node has disconnected (reportDisconnected() method call) from TransporterRegistry. This count is checked within reportDisconnected() to verify that it is greater than zero.
The issue addressed here arose when node connections were very brief due to send buffer exhaustion (among other potential causes) and the check just described failed. This occurred because, when a node did not fully connect, it was still possible for the connection attempt to trigger a reportDisconnected() call even though the connection had not yet been reported to ClusterMgr; thus, the pairing of reportConnected() and reportDisconnected() calls was not guaranteed. This could cause the count of connected nodes to be set to zero even though some nodes were in fact still connected, causing node crashes with debug builds of MySQL Cluster, and potential errors or other adverse effects with release builds.
To fix this issue, ClusterMgr::reportDisconnected() now verifies that a disconnected node had actually finished connecting completely before checking and decrementing the number of connected nodes. (Bug #21683144, Bug #22016081)
References: See also: Bug #21664515, Bug #21651400.
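The corrected bookkeeping can be modelled with a small C++ sketch. This is not the real ClusterMgr code; the class name, the 256-node limit, and the per-node bitset are assumptions chosen to show the idea that a disconnect report only decrements the live-node count for a node whose connection was actually reported.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model of the fixed live-node bookkeeping (not ClusterMgr itself).
class LiveNodeCounter {
 public:
  static constexpr std::size_t kMaxNodes = 256;

  void reportConnected(std::size_t nodeId) {
    if (!connected_.test(nodeId)) {
      connected_.set(nodeId);
      ++liveNodes_;
    }
  }

  void reportDisconnected(std::size_t nodeId) {
    // The fix: a node that never finished connecting is ignored, so a
    // half-open connection cannot drive the count below the true value.
    if (!connected_.test(nodeId)) return;
    assert(liveNodes_ > 0);
    connected_.reset(nodeId);
    --liveNodes_;
  }

  std::size_t liveNodes() const { return liveNodes_; }

 private:
  std::bitset<kMaxNodes> connected_;
  std::size_t liveNodes_ = 0;
};
```

Tracking connection state per node, rather than relying on connect and disconnect calls always arriving in pairs, is what makes an unpaired disconnect harmless.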
To reduce the possibility that a node's loopback transporter becomes disconnected from the transporter registry by reportError() due to send buffer exhaustion (implemented by the fix for Bug #21651400), a portion of the send buffer is now reserved for the use of this transporter. (Bug #21664515, Bug #22016081)
References: See also: Bug #21651400, Bug #21683144.
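The reservation can be pictured as a page pool that ordinary transporters cannot drain completely. The sketch below is hypothetical (class name, page granularity, and a fixed reserve size are assumptions); it only shows how holding back part of a pool keeps the loopback transporter able to allocate after other transporters have exhausted their share.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical send buffer pool in which a fixed number of pages is held back
// for the loopback transporter; ordinary transporters cannot dip into it.
class SendBufferPool {
 public:
  SendBufferPool(std::size_t totalPages, std::size_t loopbackReserve)
      : freePages_(totalPages), loopbackReserve_(loopbackReserve) {}

  // Tries to take one page; returns false when none is available to this caller.
  bool allocatePage(bool forLoopback) {
    const std::size_t usable =
        forLoopback ? freePages_
                    : (freePages_ > loopbackReserve_ ? freePages_ - loopbackReserve_ : 0);
    if (usable == 0) return false;
    --freePages_;
    return true;
  }

  void releasePage() { ++freePages_; }

 private:
  std::size_t freePages_;
  std::size_t loopbackReserve_;
};
```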
The loopback transporter is similar to the TCP transporter, but is used by a node to send signals to itself as part of many internal operations. Like the TCP transporter, it could be disconnected due to certain conditions including send buffer exhaustion, but this could result in blocking of TransporterFacade and so cause multiple issues within an ndb_mgmd or API node process. To prevent this, a node whose loopback transporter becomes disconnected is now simply shut down, rather than allowing the node process to hang. (Bug #21651400, Bug #22016081)
References: See also: Bug #21683144, Bug #21664515.
NDB Cluster APIs: Deletion of Ndb objects used a disproportionately high amount of CPU. (Bug #22986823)
During node failure handling, the request structure used to drive the cleanup operation was not maintained correctly when the request was executed. This led to inconsistencies that were harmless during normal operation, but these could lead to assertion failures during node failure handling, with subsequent failure of additional nodes. (Bug #22643129)
The previous fix for a lack of mutex protection for the internal TransporterFacade::deliver_signal() function was found to be incomplete in some cases. (Bug #22615274)
References: This issue is a regression of: Bug #77225, Bug #21185585.
When setup of the binary log as an atomic operation on one SQL node failed, this could trigger a state in other SQL nodes in which they appeared to detect the SQL node participating in schema change distribution, although it had not yet completed binary log setup. This could in turn cause a deadlock on the global metadata lock when the SQL node that was still retrying binary log setup needed this lock, while another mysqld had taken the lock for itself as part of a schema change operation. In such cases, the second SQL node waited for the first one to act on its schema distribution changes, which it was not yet able to do. (Bug #22494024)
Duplicate key errors could occur when ndb_restore was run on a backup containing a unique index. This was because, during restoration of data, the database can pass through one or more inconsistent states prior to completion, and such an inconsistent state may contain duplicate values for a column which has a unique index. (If the restoration of data is preceded by a run with --disable-indexes and followed by one with --rebuild-indexes, these errors are avoided.)
Added a check for unique indexes in the backup, which is performed only when restoring data and which does not process tables that have explicitly been excluded. For each unique index found, a warning is now printed. (Bug #22329365)
Restoration of metadata with ndb_restore -m occasionally failed with the error message Failed to create index... when creating a unique index. While diagnosing this problem, it was found that the internal error PREPARE_SEIZE_ERROR (a temporary error) was reported as an unknown error. Now in such cases, ndb_restore retries the creation of the unique index, and PREPARE_SEIZE_ERROR is reported as NDB Error 748 Busy during read of event table. (Bug #21178339)
References: See also: Bug #22989944.
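Retrying only when an error is classified as temporary is the general pattern behind this fix. The sketch below is a generic illustration, not ndb_restore's code: the `Outcome` enum and function name are hypothetical stand-ins (the real NDB API classifies errors through `NdbError`), and a production client would also back off between attempts.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome codes; the real NDB API classifies errors through NdbError.
enum class Outcome { Ok, TemporaryError, PermanentError };

// Retries `op` while it reports a temporary error, making at most maxAttempts calls.
Outcome retryOnTemporaryError(const std::function<Outcome()>& op, int maxAttempts) {
  Outcome result = Outcome::TemporaryError;
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    result = op();
    if (result != Outcome::TemporaryError) {
      break;  // success or a permanent error: retrying will not help
    }
    // A production client would normally sleep or back off here.
  }
  return result;
}
```

This is why correct classification matters: while PREPARE_SEIZE_ERROR was reported as an unknown error, a caller following this pattern had no basis for retrying.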
Optimization of signal sending by buffering signals and sending them periodically, or when the buffer became full, could cause SUB_GCP_COMPLETE_ACK signals to be excessively delayed. Such signals are sent for each node and epoch, with a minimum interval of TimeBetweenEpochs; if they are not received in time, the SUMA buffers can overflow as a result. The overflow caused API nodes to be disconnected, leading to current transactions being aborted due to node failure. This condition made it difficult for long transactions (such as altering a very large table) to be completed. Now in such cases, the ACK signal is sent without being delayed. (Bug #18753341)
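A minimal model of "batch by default, but never delay the ACK" is sketched below. The class is hypothetical, the periodic timer flush is omitted, and flushing queued signals before an urgent one (to preserve ordering) is a design choice of this sketch rather than documented NDB behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sender: ordinary signals are batched and flushed when the batch
// fills up (a periodic timer flush is omitted here), while urgent signals such
// as an epoch ACK are never left waiting in the batch.
class BatchingSender {
 public:
  explicit BatchingSender(std::size_t batchSize) : batchSize_(batchSize) {}

  void send(const std::string& signal, bool urgent) {
    if (urgent) {
      flush();                  // keep ordering: earlier signals go out first
      wire_.push_back(signal);  // then the urgent signal, with no added delay
      return;
    }
    pending_.push_back(signal);
    if (pending_.size() >= batchSize_) flush();
  }

  void flush() {
    wire_.insert(wire_.end(), pending_.begin(), pending_.end());
    pending_.clear();
  }

  // Signals that have actually been "transmitted", in order.
  const std::vector<std::string>& wire() const { return wire_; }

 private:
  std::size_t batchSize_;
  std::vector<std::string> pending_;
  std::vector<std::string> wire_;
};
```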
NDB Cluster APIs: Executing a transaction with an NdbIndexOperation based on an obsolete unique index caused the data node process to fail. Now the index is checked in such cases, and if it cannot be used the transaction fails with an appropriate error. (Bug #79494, Bug #22299443)
Important Change: A fix made in MySQL Cluster NDB 7.3.11 and MySQL Cluster NDB 7.4.8 caused ndb_restore to perform unique key checks even when operating in modes which do not restore data, such as when using the program's
That change in behavior caused existing valid backup routines to fail; to keep this issue from affecting this and future releases, the previous fix has been reverted. This means that the requirement added in those versions that ndb_restore be run with --rebuild-indexes when used on tables containing unique indexes is also lifted. (Bug #22345748)
References: See also: Bug #22329365. Reverted patches: Bug #57782, Bug #11764893.
Important Note: If an NDB table having a foreign key was dropped while one of the data nodes was stopped, the data node later failed when trying to restart. (Bug #18554390)
In debug builds, a WAIT_EVENT while polling caused excessive logging to stdout. (Bug #22203672)
When executing a schema operation such as CREATE TABLE on a MySQL Cluster with multiple SQL nodes, it was possible for the SQL node on which the operation was performed to time out while waiting for an acknowledgement from the others. This could occur when different SQL nodes had different settings for --ndb-log-update-as-write, or other mysqld options affecting binary logging by NDB.
This happened because, in order to distribute schema changes between them, all SQL nodes subscribe to changes in the ndb_schema system table, and all SQL nodes are made aware of each other's subscriptions by subscribing to TE_UNSUBSCRIBE events. The names of events to subscribe to are constructed from the table names, adding REPLF$ as a prefix. REPLF$ is used when full binary logging is specified for the table. The issue described previously arose because different values for the options mentioned could lead to different events being subscribed to by different SQL nodes, meaning that the SQL nodes were not necessarily all aware of each other, so that the code that handled waiting for schema distribution to complete did not work as designed.
To fix this issue, MySQL Cluster now treats the ndb_schema table as a special case and enforces full binary logging at all times for this table, independent of any settings for mysqld binary logging options. (Bug #22174287, Bug #79188)
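The special case can be expressed as a small decision function. This is a hypothetical sketch, not the ndbcluster binlog code: `optionsRequestFull` stands for whatever the local mysqld binary logging options would otherwise select, and the function only decides whether the full-logging (REPLF$) event is used.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: should the binlog subscription for db.table use the
// full-logging (REPLF$) event? `optionsRequestFull` models the choice the
// local mysqld binary logging options would otherwise make.
bool useFullLoggingEvent(const std::string& db, const std::string& table,
                         bool optionsRequestFull) {
  // After the fix, mysql.ndb_schema is special-cased so that every SQL node
  // subscribes to the same event regardless of its local options.
  if (db == "mysql" && table == "ndb_schema") {
    return true;
  }
  return optionsRequestFull;
}
```

Because the result for ndb_schema no longer depends on per-node options, all SQL nodes subscribe to the same event and therefore see each other's subscriptions.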
Attempting to create an NDB table having more than the maximum supported combined width for all BIT columns (4096) caused data node failure when these columns were defined with COLUMN_FORMAT DYNAMIC. (Bug #21889267)
Creating a table with the maximum supported number of columns (512), all using COLUMN_FORMAT DYNAMIC, led to data node failures. (Bug #21863798)
When using the management client command STOP -f to force a node shutdown even when it triggered a complete shutdown of the cluster, it was possible to lose data when a sufficient number of nodes were shut down, triggering a cluster shutdown, and the timing was such that SUMA handovers had been made to nodes already in the process of shutting down. (Bug #17772138)
The NdbEventBuffer::set_total_buckets() method calculated the number of remaining buckets incorrectly. This caused any incomplete epoch to be prematurely completed when the SUB_START_CONF signal arrived out of order. Any events belonging to this epoch arriving later were then ignored, and so effectively lost, which resulted in schema changes not being distributed correctly among SQL nodes. (Bug #79635, Bug #22363510)
Compilation of MySQL Cluster failed on SUSE Linux Enterprise Server 12. (Bug #79429, Bug #22292329)
Schema events were appended to the binary log out of order relative to non-schema events. This was caused by the fact that the binary log injector did not properly handle the case where schema events and non-schema events were from different epochs.
This fix modifies the handling of events from the two schema and non-schema event streams such that events are now always handled one epoch at a time, starting with events from the oldest available epoch, without regard to the event stream in which they occur. (Bug #79077, Bug #22135584, Bug #20456664)
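Handling "one epoch at a time, oldest first, regardless of stream" is essentially a merge of two epoch-ordered streams. The sketch below is a hypothetical model (the struct and function names are invented); emitting schema events before data events within the same epoch is a tie-break assumed for this sketch, not documented NDB behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical event record: the epoch it belongs to plus a label.
struct BinlogEvent {
  std::uint64_t epoch;
  std::string label;
};

// Merges the schema and data event streams (each already ordered by epoch)
// one epoch at a time, oldest epoch first, regardless of which stream holds it.
// Within one epoch, schema events are emitted first (an assumed tie-break).
std::vector<BinlogEvent> mergeByEpoch(const std::vector<BinlogEvent>& schemaEvents,
                                      const std::vector<BinlogEvent>& dataEvents) {
  std::vector<BinlogEvent> out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < schemaEvents.size() || j < dataEvents.size()) {
    // Find the oldest epoch at the head of either stream.
    std::uint64_t epoch = std::numeric_limits<std::uint64_t>::max();
    if (i < schemaEvents.size()) epoch = schemaEvents[i].epoch;
    if (j < dataEvents.size() && dataEvents[j].epoch < epoch) epoch = dataEvents[j].epoch;
    // Emit every event of that epoch from both streams before moving on.
    while (i < schemaEvents.size() && schemaEvents[i].epoch == epoch) out.push_back(schemaEvents[i++]);
    while (j < dataEvents.size() && dataEvents[j].epoch == epoch) out.push_back(dataEvents[j++]);
  }
  return out;
}
```

For example, a schema event in epoch 2 is written after data from epoch 1 and before data from epoch 3, which is the ordering the fix is meant to guarantee.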
NDB failed during a node restart due to the status of the current local checkpoint being set but not as active, even though it could have other states under such conditions. (Bug #78780, Bug #21973758)
The value set for the ThreadConfig parameter was not calculated correctly, causing the spin to continue for longer than actually specified. (Bug #78525, Bug #21886476)
NDB Cluster APIs: The binary log injector did not work correctly with TE_INCONSISTENT event type handling by Ndb::nextEvent(). (Bug #22135541)
References: See also: Bug #20646496.
NDB Cluster APIs: pollEvents2() were slow to receive events, being dependent on other client threads or blocks to perform polling of transporters on their behalf. This fix allows a client thread to perform its own transporter polling when it has to wait in either of these methods.
Introduction of transporter polling also revealed a problem with missing mutex protection in the ndbcluster_binlog handler, which has been added as part of this fix. (Bug #79311, Bug #20957068, Bug #22224571)
Important Change: When ndb_restore was run without --rebuild-indexes on a table having a unique index, it was possible for rows to be restored in an order that resulted in duplicate values, causing it to fail with duplicate key errors. Running ndb_restore on such a table now requires using at least one of these options; failing to do so now results in an error. (Bug #57782, Bug #11764893)
References: See also: Bug #22329365, Bug #22345748.
Backup block states were reported incorrectly during backups. (Bug #21360188)
References: See also: Bug #20204854, Bug #21372136.
When a data node is known by other nodes in the cluster to have been alive at a given global checkpoint, but its sysfile reports a lower GCI, the higher GCI is used to determine which global checkpoint the data node can recreate. This caused problems when the data node being started had a clean file system (GCI = 0), or when it was more than one global checkpoint behind the other nodes.
Now in such cases a higher GCI known by other nodes is used only when it is at most one GCI ahead. (Bug #19633824)
References: See also: Bug #20334650, Bug #21899993. This issue is a regression of: Bug #29167.
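The new rule ("use the higher GCI only when it is at most one GCI ahead") can be written as a one-line decision. The function below is a hypothetical illustration of the rule as stated above, not DIH code; in particular, any separate handling NDB may apply to a clean file system is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the GCI a starting node should aim to restore, given the
// GCI in its own sysfile and a (possibly higher) GCI at which other nodes knew
// it to be alive. The higher value is trusted only when it is at most one ahead.
std::uint32_t restorableGci(std::uint32_t sysfileGci, std::uint32_t gciKnownByOthers) {
  if (gciKnownByOthers > sysfileGci && gciKnownByOthers - sysfileGci <= 1) {
    return gciKnownByOthers;
  }
  return sysfileGci;
}
```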
When restoring a specific database or databases with the --exclude-databases option, ndb_restore attempted to apply foreign keys on tables in databases which were not among those being restored. (Bug #18560951)
After restoring the database schema from backup using ndb_restore, auto-discovery of restored tables in transactions having multiple statements did not work correctly, resulting in Deadlock found when trying to get lock; try restarting transaction errors.
This issue was encountered both in the mysql client, as well as when such transactions were executed by application programs using Connector/J and possibly other MySQL APIs.
Prior to upgrading, this issue can be worked around by executing SELECT TABLE_NAME, TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE = 'NDBCLUSTER' on all SQL nodes following the restore operation, before executing any other statements. (Bug #18075170)
Trying to create an NDB table with a composite foreign key referencing a composite primary key of the parent table failed when one of the columns in the composite foreign key was the table's primary key and in addition this column also had a unique key. (Bug #78150, Bug #21664899)
When attempting to enable index statistics, creation of the required system tables, events and event subscriptions often fails when multiple mysqld processes using index statistics are started concurrently in conjunction with starting, restarting, or stopping the cluster, or with node failure handling. This is normally recoverable, since the affected mysqld process or processes can (and do) retry these operations shortly thereafter. For this reason, such failures are no longer logged as warnings, but merely as informational events. (Bug #77760, Bug #21462846)
Adding a unique key to an NDB table failed when the table already had a foreign key. Prior to upgrading, you can work around this issue by creating the unique key first, then adding the foreign key afterwards, using a separate ALTER TABLE statement. (Bug #77457, Bug #20309828)
NDB Cluster APIs: While executing dropEvent(), if the coordinator DBDICT failed after the subscription manager (SUMA block) had removed all subscriptions but before the coordinator had deleted the event from the system table, the dropped event remained in the table, causing any subsequent drop or create event with the same name to fail with NDB error 1419 Subscription already dropped or error 746 Event name already exists. This occurred even when calling dropEvent() with a nonzero force argument.
Now in such cases, error 1419 is ignored, and DBDICT deletes the event from the table. (Bug #21554676)
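Ignoring error 1419 makes the drop idempotent: a retry after an interrupted drop can still finish removing the stored event. The sketch below models this with hypothetical in-memory sets standing in for SUMA's subscriptions and the event system table; only the error number comes from the text above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Error code quoted in the text above: "Subscription already dropped".
constexpr int kSubscriptionAlreadyDropped = 1419;

// Hypothetical stand-ins for SUMA's subscriptions and the event system table.
struct EventCatalog {
  std::set<std::string> subscriptions;
  std::set<std::string> eventTable;
};

int removeSubscriptions(EventCatalog& catalog, const std::string& name) {
  return catalog.subscriptions.erase(name) > 0 ? 0 : kSubscriptionAlreadyDropped;
}

// Drop that tolerates a previous, partially completed drop.
int dropEventSketch(EventCatalog& catalog, const std::string& name) {
  const int err = removeSubscriptions(catalog, name);
  if (err != 0 && err != kSubscriptionAlreadyDropped) {
    return err;  // any other failure is still reported (not produced by this toy model)
  }
  catalog.eventTable.erase(name);  // always finish removing the stored event
  return 0;
}
```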
NDB Cluster APIs: The internal value representing the latest global checkpoint was not always updated when a completed epoch of event buffers was inserted into the event queue. This caused subsequent calls to pollEvents2() to fail when trying to obtain the correct GCI for the events available in the event buffers. This could also result in later calls to nextEvent2() seeing events that had not yet been discovered. (Bug #78129, Bug #21651536)
Functionality Added or Changed
A number of improvements, listed here, have been made with regard to handling issues that could arise when an overload occurred due to a great number of inserts being performed during a local checkpoint (LCP):
Failures sometimes occurred during restart processing when trying to execute the undo log, due to a problem with finding the end of the log. This happened when there remained unwritten pages at the end of the first undo file when writing to the second undo file, which caused the undo log to be executed in reverse order, so that old or even nonexistent log records were executed.
This is fixed by ensuring that execution of the undo log begins with the proper end of the log, and, if started earlier, that any unwritten or faulty pages are ignored.
It was possible to fail during an LCP, or when performing a COPY_FRAGREQ, due to running out of operation records. We fix this by making sure that LCPs and COPY_FRAG use resources reserved for operation records, as was already the case with scan records. In addition, old code for ACC operations that was no longer required but that could lead to failures was removed.
When an LCP was performed while loading a table, it was possible to hit a livelock during LCP scans, due to the fact that each record that was inserted into new pages after the LCP had started had its LCP_SKIP flag set. Such records were discarded as intended by the LCP scan, but when inserts occurred faster than the LCP scan could discard records, the scan appeared to hang. As part of this issue, the scan failed to report any progress to the LCP watchdog, which killed the process after 70 seconds of livelock. This issue was observed when performing on the order of 250000 inserts per second over an extended period of time (120 seconds or more), using a single LDM.
This part of the fix makes a number of changes, listed here:
We now ensure that pages created after the LCP has started are not included in LCP scans; we also ensure that no records inserted into those pages have their LCP_SKIP flag set.
Handling of the scan protocol is changed such that a certain amount of progress is made by the LCP regardless of load; we now report progress to the LCP watchdog so that we avoid failure in the event that an LCP is making progress but not writing any records.
We now take steps to guarantee that LCP scans proceed more quickly than inserts can occur, by ensuring that this scanning activity is prioritized, and thus that the LCP is in fact (eventually) completed.
In addition, scanning is made more efficient, by prefetching tuples; this helps avoid stalls while fetching memory in the CPU.
Row checksums for preventing data corruption now include the tuple header bits.
(Bug #76373, Bug #20727343, Bug #76741, Bug #69994, Bug #20903880, Bug #76742, Bug #20904721, Bug #76883, Bug #20980229)
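Excluding pages created after the LCP starts is what bounds the scan. The following hypothetical sketch (struct, function, and hook names are invented; one `int` stands in for a page) fixes the page count when the scan begins, so pages appended by concurrent inserts can never extend it, however fast they arrive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fragment: each int stands in for one page of rows.
struct Fragment {
  std::vector<int> pages;
};

// Scans only the pages that existed when the LCP started. `concurrentInserts`
// is called after each page and simulates inserts that allocate new pages;
// because the bound is fixed up front, those pages cannot extend the scan.
template <typename InsertHook>
std::size_t lcpScan(Fragment& fragment, InsertHook concurrentInserts) {
  const std::size_t pagesAtLcpStart = fragment.pages.size();
  std::size_t pagesWritten = 0;
  for (std::size_t page = 0; page < pagesAtLcpStart; ++page) {
    ++pagesWritten;               // checkpoint this page
    concurrentInserts(fragment);  // load keeps arriving during the scan
  }
  return pagesWritten;
}
```

In this model the scan writes exactly the pages present at LCP start, even when each step adds more pages than it consumes, which is the livelock scenario described above.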
The behavior of Ndb::pollEvents() has also been modified such that it now returns NDB_FAILURE_GCI (equal to ~(Uint64) 0) when a cluster failure has been detected. (Bug #18753887)
After restoring the database metadata (but not any data) by running ndb_restore -m, SQL nodes would hang while trying to SELECT from a table in the database to which the metadata was restored. In such cases the attempt to query the table now fails as expected, since the table does not actually exist until ndb_restore is executed with -r. (Bug #21184102)
References: See also: Bug #16890703.
When a great many threads opened and closed blocks in the NDB API in rapid succession, the internal close_clnt() function synchronizing the closing of the blocks waited an insufficiently long time for a self-signal indicating potential additional signals needing to be processed. This led to excessive CPU usage by ndb_mgmd, and prevented other threads from opening or closing other blocks. This issue is fixed by changing the function polling call to wait on a specific condition to be woken up (that is, when a signal has in fact been executed). (Bug #21141495)
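Waiting on a condition instead of sleeping briefly and re-checking is the standard condition-variable pattern. The latch below is a hypothetical illustration of that pattern using the C++ standard library, not the actual close_clnt() code: the waiting thread blocks until another thread reports that the pending signal has been executed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical latch: the closing thread blocks until another thread reports
// that the pending self-signal has actually been executed, rather than
// sleeping for a short fixed interval and re-checking in a loop.
class SignalExecutedLatch {
 public:
  void markExecuted() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      executed_ = true;
    }
    cv_.notify_all();
  }

  void waitUntilExecuted() {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate form re-checks the flag, so spurious wakeups are harmless.
    cv_.wait(lock, [this] { return executed_; });
  }

  bool isExecuted() {
    std::lock_guard<std::mutex> lock(mutex_);
    return executed_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool executed_ = false;
};
```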
Previously, multiple send threads could be invoked for handling sends to the same node; these threads then competed for the same send lock. While the send lock blocked the additional send threads, work threads could be passed to other nodes.
This issue is fixed by ensuring that new send threads are not activated while there is already an active send thread assigned to the same node. In addition, a node already having an active send thread assigned to it is no longer visible to other, already active, send threads; that is, such a node is no longer added to the node list when a send thread is currently assigned to it. (Bug #20954804, Bug #76821)
Queueing of pending operations when the redo log was overloaded (DefaultOperationRedoProblemAction API node configuration parameter) could lead to timeouts when data nodes ran out of redo log space (P_TAIL_PROBLEM errors). Now when the redo log is full, the node aborts requests instead of queuing them. (Bug #20782580)
References: See also: Bug #20481140.
NDB statistics queries could be delayed by the error delay set for ndb_index_stat_option (default 60 seconds) when the index that was queried had been marked with an internal error. The same underlying issue could also cause ANALYZE TABLE to hang when executed against an NDB table having multiple indexes where an internal error occurred on one or more, but not all, of the indexes.
Now in such cases, any existing statistics are returned immediately, without waiting for any additional statistics to be discovered. (Bug #20553313, Bug #20707694, Bug #76325)
The multi-threaded scheduler sends to remote nodes either directly from each worker thread or from dedicated send threads, depending on the cluster's configuration. This send might transmit all, part, or none of the available data from the send buffers. While pending send data remained, the worker or send threads continued trying to send in a loop. The actual size of the data sent in the most recent attempt to perform a send is now tracked, and used to detect lack of send progress by the send or worker threads. When no progress has been made, and there is no other work outstanding, the scheduler takes a 1 millisecond pause to free up the CPU for use by other threads. (Bug #18390321)
References: See also: Bug #20929176, Bug #20954804.
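The progress check can be reduced to a small decision made after each send attempt while data is still pending. The sketch below is hypothetical (the enum and function names are invented); it encodes only the rule stated above: keep sending while bytes are moving, switch to other work if there is any, and otherwise pause for 1 millisecond.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical decision made after each send attempt while data is still pending.
enum class NextStep { KeepSending, DoOtherWork, Pause };

NextStep afterSendAttempt(std::size_t bytesSentLastAttempt, bool otherWorkPending) {
  if (bytesSentLastAttempt > 0) {
    return NextStep::KeepSending;  // progress was made: try again right away
  }
  if (otherWorkPending) {
    return NextStep::DoOtherWork;  // stalled, but something else is runnable
  }
  return NextStep::Pause;          // stalled and idle: stop spinning
}

// Yields the CPU for the 1 millisecond pause described above.
void pauseAfterStall() {
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
```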
In some cases, attempting to restore a table that was previously backed up failed with a File Not Found error due to a missing table fragment file. This occurred as a result of the NDB kernel BACKUP block receiving a Busy error while trying to obtain the table description, due to other traffic from external clients, and not retrying the operation.
The fix for this issue creates two separate queues for such requests, one for internal clients such as the BACKUP block or ndb_restore and one for external clients such as API nodes, and prioritizes the internal queue.
Note that it has always been the case that external client applications using the NDB API (including MySQL applications running against an SQL node) are expected to handle Busy errors by retrying transactions at a later time; this expectation is not changed by the fix for this issue. (Bug #17878183)
References: See also: Bug #17916243.
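A two-queue dispatcher that always serves internal requests first can be sketched as follows. The class and request strings are hypothetical. Note that strict priority like this could starve external clients under sustained internal load; the text above does not say how NDB balances the two queues.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>

// Hypothetical request dispatcher with one queue for internal clients (for
// example the BACKUP block or ndb_restore) and one for external clients (API
// nodes). Internal requests are always served first.
class TableInfoRequestQueues {
 public:
  void push(const std::string& request, bool fromInternalClient) {
    (fromInternalClient ? internal_ : external_).push_back(request);
  }

  std::optional<std::string> next() {
    std::deque<std::string>& queue = !internal_.empty() ? internal_ : external_;
    if (queue.empty()) return std::nullopt;
    std::string request = queue.front();
    queue.pop_front();
    return request;
  }

 private:
  std::deque<std::string> internal_;
  std::deque<std::string> external_;
};
```

External clients are still expected to retry on Busy errors; the separate queue only keeps internal requests such as those from the BACKUP block from being crowded out by external traffic.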
In some cases, the DBDICT block failed to handle repeated GET_TABINFOREQ signals after the first one, leading to possible node failures and restarts. This could be observed after setting a sufficiently high value for MaxNoOfExecutionThreads and a low value for LcpScanProgressTimeout. (Bug #77433, Bug #21297221)
Client lookup for delivery of API signals to the correct client by the internal TransporterFacade::deliver_signal() function had no mutex protection, which could cause issues such as timeouts encountered during testing, when other clients connected to the same TransporterFacade. (Bug #77225, Bug #21185585)
It was possible to end up with a lock on the send buffer mutex when send buffers became a limiting resource, due either to insufficient send buffer resource configuration, problems with slow or failing communications such that all send buffers became exhausted, or slow receivers failing to consume what was sent. In this situation worker threads failed to allocate send buffer memory for signals, and attempted to force a send in order to free up space, while at the same time the send thread was busy trying to send to the same node or nodes. All of these threads competed for the send buffer mutex, which resulted in the lock already described, reported by the watchdog as Stuck in Send. This fix is made in two parts, listed here:
The send thread no longer holds the global send thread mutex while getting the send buffer mutex; it now releases the global mutex prior to locking the send buffer mutex. This keeps worker threads from getting stuck in send in such cases.
Locking of the send buffer mutex done by the send threads now uses a try-lock. If the try-lock fails, the node to make the send to is reinserted at the end of the list of send nodes in order to be retried later. This removes the Stuck in Send condition for the send threads.
(Bug #77081, Bug #21109605)
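The two-part fix described above can be sketched in Python as a minimal, single-process illustration, not NDB's C++ send thread. `SendBuffer`, `send_pass`, and the `wire` list are hypothetical names, and `threading.Lock.acquire(blocking=False)` stands in for the kernel's try-lock.

```python
import threading
from collections import deque

class SendBuffer:
    """Hypothetical per-node send buffer protected by its own mutex."""
    def __init__(self):
        self.lock = threading.Lock()
        self.pending = []

def send_pass(send_nodes, global_lock, buffers, wire):
    """One pass of a send thread over the nodes queued for sending."""
    with global_lock:
        count = len(send_nodes)
    for _ in range(count):
        # The global mutex guards only the node list and is released
        # before the send buffer mutex is touched (part 1 of the fix).
        with global_lock:
            if not send_nodes:
                break
            node = send_nodes.popleft()
        buf = buffers[node]
        # Try-lock instead of a blocking lock (part 2 of the fix).
        if buf.lock.acquire(blocking=False):
            try:
                wire.extend((node, signal) for signal in buf.pending)
                buf.pending.clear()
            finally:
                buf.lock.release()
        else:
            # Buffer busy (for example, a worker thread is forcing a send):
            # requeue the node at the end of the list and retry later.
            with global_lock:
                send_nodes.append(node)
```

Because the send thread never waits on a busy send buffer mutex, a worker thread holding that mutex while forcing a send cannot stall it.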
NDB Cluster APIs: Added the Column::getSizeInBytesForRecord() method, which returns the size required for a column by an NdbRecord, depending on the column's type (text/blob, or other). (Bug #21067283)
NDB Cluster APIs: Creation and destruction of Ndb_cluster_connection objects by multiple threads could make use of the same application lock, which in some cases led to failures in the global dictionary cache. To alleviate this problem, the creation and destruction of several internal NDB API objects have been serialized. (Bug #20636124)
NDB Cluster APIs: A number of timeouts were not handled correctly in the NDB API.
NDB Cluster APIs: When an Ndb object created prior to a failure of the cluster was reused, the event queue of this object could still contain data node events originating from before the failure. These events could reference “old” epochs (from before the failure occurred), which in turn could violate the assumption made by the nextEvent() method that epoch numbers always increase. This issue is addressed by explicitly clearing the event queue in such cases. (Bug #18411034)
References: See also: Bug #20888668.
Important Change: The maximum failure time calculation used to ensure that normal node failure handling mechanisms are given time to handle survivable cluster failures (before global checkpoint watchdog mechanisms start to kill nodes due to GCP delays) was excessively conservative, and neglected to consider that there can be at most NoOfReplicas node failures before the cluster can no longer survive. Now the value of NoOfReplicas is properly taken into account when performing this calculation.
This fix adds the TimeBetweenGlobalCheckpointsTimeout data node configuration parameter, which makes the minimum timeout between global checkpoints settable by the user. This timeout was previously fixed internally at 120000 milliseconds, which is now the default value for this parameter. (Bug #20069617, Bug #20069624)
References: See also: Bug #19858151, Bug #20128256, Bug #20135976.
It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process.
In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator.
Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)
The values of the Ndb_last_commit_epoch_session status variables were incorrectly reported on some platforms. To correct this problem, these values are now stored internally as long long, rather than long. (Bug #20372169)
When a data node fails or is being restarted, the remaining nodes in the same nodegroup resend to subscribers any data which they determine has not already been sent by the failed node. Normally, when a data node (actually, the SUMA kernel block) has sent all data belonging to an epoch for which it is responsible, it sends a SUB_GCP_COMPLETE_REP signal, together with a count, to all subscribers, each of which responds with an acknowledgement signal. When SUMA receives this acknowledgment from all subscribers, it reports this to the other nodes in the same nodegroup so that they know that there is no need to resend this data in case of a subsequent node failure. If a node failed after all subscribers sent this acknowledgement but before all the other nodes in the same nodegroup received it from the failing node, data for some epochs could be sent (and reported as complete) twice, which could lead to an unplanned shutdown.
The fix for this issue adds to the count reported by SUB_GCP_COMPLETE_ACK a list of identifiers which the receiver can use to keep track of which buckets are completed and to ignore any duplicate report for an already completed bucket. (Bug #17579998)
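A minimal sketch of the duplicate suppression this enables on the receiving side, assuming each completion report carries an epoch and a list of bucket identifiers. `BucketTracker` and `on_complete_report` are hypothetical names; the actual bookkeeping lives in the SUMA block and is not described in detail here.

```python
class BucketTracker:
    """Hypothetical record of completed buckets, kept per epoch."""
    def __init__(self):
        self.completed = {}  # epoch -> set of completed bucket ids

    def on_complete_report(self, epoch, bucket_ids):
        """Return only the buckets newly completed by this report.

        Buckets already recorded for the epoch are treated as duplicate
        reports (for example, a resend after a node failure) and ignored.
        """
        done = self.completed.setdefault(epoch, set())
        new = []
        for bucket in bucket_ids:
            if bucket not in done:
                done.add(bucket)
                new.append(bucket)
        return new
```

Without the identifiers, a bare count cannot tell a resent completion apart from a new one, which is how an epoch could be reported complete twice.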
When performing a restart, it was sometimes possible to find a log end marker which had been written by a previous restart, and that should have been invalidated. Now when searching for the last page to invalidate, the same search algorithm is used as when searching for the last page of the log to read. (Bug #76207, Bug #20665205)
When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory. (Bug #75930, Bug #20553247)
When a bulk delete operation was committed early to avoid an additional round trip, while also returning the number of affected rows, but failed with a timeout error, an SQL node performed no verification that the transaction was in the Committed state. (Bug #74494, Bug #20092754)
References: See also: Bug #19873609.
NDB Cluster APIs: When a transaction is started from a cluster connection, Index schema objects may be passed to this transaction for use. If these schema objects have been acquired from a different connection (Ndb_cluster_connection object), they can be deleted at any point by the deletion or disconnection of the owning connection. This can leave a connection with invalid schema objects, which causes an NDB API application to fail when these are dereferenced.
To avoid this problem, if your application uses multiple connections, you can now set a check to detect sharing of schema objects between connections when passing a schema object to a transaction, using the NdbTransaction::setSchemaObjectOwnerChecks() method added in this release. When this check is enabled, the schema objects having the same names are acquired from the connection and compared to the schema objects passed to the transaction. Failure to match causes the application to fail with an error. (Bug #19785977)
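A sketch of what the owner check compares, using hypothetical Python classes in place of the NDB API (`Connection`, `SchemaObject`, and `Transaction` are illustrative only). With checks enabled, the transaction looks up an object of the same name through its own connection and rejects the passed object unless it is that same instance.

```python
class SchemaObject:
    """Hypothetical schema object (table or index) owned by one connection."""
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner

class Connection:
    """Hypothetical connection with its own dictionary of schema objects."""
    def __init__(self):
        self._objects = {}

    def get_object(self, name):
        # Each connection hands out its own instance per name.
        if name not in self._objects:
            self._objects[name] = SchemaObject(name, self)
        return self._objects[name]

class Transaction:
    def __init__(self, connection, owner_checks=False):
        self.connection = connection
        self.owner_checks = owner_checks

    def use(self, obj):
        if self.owner_checks:
            # Acquire the same-named object from this connection and compare.
            if self.connection.get_object(obj.name) is not obj:
                raise RuntimeError(
                    f"schema object {obj.name!r} belongs to another connection")
        return obj
```

The check is opt-in in the sketch as in the note, since the extra lookup costs something on every use.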
NDB Cluster APIs: The increase in the default number of hashmap buckets (DefaultHashMapSize API node configuration parameter) from 240 to 3480 in MySQL Cluster NDB 7.2.11 increased the size of the internal DictHashMapInfo::HashMap type considerably. This type was allocated on the stack in some getTable() calls, which could lead to stack overflow issues for NDB API users.
To avoid this problem, the hashmap is now dynamically allocated from the heap. (Bug #19306793)
NDB Cluster APIs: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.
A number of root causes, listed here, were discovered that led to this problem:
Result rows were unpacked to full NdbRecord format before they were stored in the buffer. If only some but not all columns of a table were selected, the buffer contained empty space (essentially wasted).
These issues became more evident in NDB 7.2 and later MySQL Cluster release series. This was due to the fact that buffer size is scaled by BatchSize, and that the default value for this parameter was increased fourfold (from 64 to 256) beginning with MySQL Cluster NDB 7.2.1.
This fix causes result rows to be buffered using the packed format instead of the unpacked format; a buffered scan result row is now not unpacked until it becomes the current row. In addition, MaxScanBatchSize are now used as limiting factors when calculating the required buffer size.
Also as part of this fix, refactoring has been done to separate handling of buffered (packed) from handling of unbuffered result sets, and to remove code that had been unused since NDB 7.0 or earlier. The NdbRecord class declaration has also been cleaned up by removing a number of unused or redundant member variables. (Bug #73781, Bug #75599, Bug #19631350, Bug #20408733)
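A sketch of deferred unpacking: rows stay in packed form in the buffer, and only the row that becomes current is decoded. Python's `struct` module and the two-column layout in `ROW_FORMAT` are assumptions standing in for NDB's own packed row format, which these notes do not describe; `PackedResultBuffer` is a hypothetical name.

```python
import struct

# Assumed layout of the selected columns only: an int32 and an int64.
ROW_FORMAT = struct.Struct("<iq")

class PackedResultBuffer:
    """Buffer scan rows in packed form; unpack only the current row."""
    def __init__(self):
        self._rows = []
        self._pos = -1

    def add_packed(self, raw):
        # Store bytes as received; no space is reserved for unselected columns.
        self._rows.append(raw)

    def next_result(self):
        """Advance to the next row and unpack it, or return None at the end."""
        self._pos += 1
        if self._pos >= len(self._rows):
            return None
        return ROW_FORMAT.unpack(self._rows[self._pos])

    def buffered_bytes(self):
        return sum(len(r) for r in self._rows)
```

The buffer grows with the size of the selected columns rather than with the width of a full record, which is the saving the fix targets.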
Functionality Added or Changed
Performance: Recent improvements made to the multithreaded scheduler were intended to optimize the cache behavior of its internal data structures, with members of these structures placed such that those local to a given thread do not overflow into a cache line which can be accessed by another thread. Where required, extra padding bytes are inserted to isolate cache lines owned (or shared) by other threads, thus avoiding invalidation of the entire cache line if another thread writes into a cache line not entirely owned by itself. This optimization improved MT Scheduler performance by several percent.
It has since been found that the optimization just described depends on the global instance of struct thr_repository starting at a cache line aligned base address, as well as on the compiler not rearranging or adding extra padding to the scheduler struct; it was also found that these prerequisites were not guaranteed (or even checked). Thus this cache line optimization previously worked only when g_thr_repository (that is, the global instance) happened to be cache line aligned by accident. In addition, on 64-bit platforms, the compiler added extra padding words in struct thr_safe_pool such that attempts to pad it to a cache line aligned size failed.
The current fix ensures that g_thr_repository is constructed on a cache line aligned address, and the constructors have been modified to verify cache line aligned addresses where these are assumed by design.
Results from internal testing show improvements in MT Scheduler read performance of up to 10% in some cases, following these changes. (Bug #18352514)
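The alignment arithmetic involved can be sketched as below. This assumes a 64-byte cache line, which is common on current x86-64 hardware but is platform-specific; `CACHE_LINE`, `align_up`, and `is_aligned` are illustrative names, not part of the NDB source.

```python
CACHE_LINE = 64  # assumed cache line size in bytes; platform-specific

def align_up(addr, alignment=CACHE_LINE):
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (addr + alignment - 1) & ~(alignment - 1)

def is_aligned(addr, alignment=CACHE_LINE):
    """The kind of check a constructor can run on an assumed-aligned base."""
    return addr % alignment == 0
```

Padding a struct to `align_up(size)` only helps if its base address is aligned too, which is why the fix both places the global instance on an aligned address and verifies it.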
NDB Cluster APIs: Two new example programs, demonstrating reads and writes of VARBINARY column values, have been added to storage/ndb/ndbapi-examples in the MySQL Cluster source tree. For more information about these programs, including source code listings, see NDB API Simple Array Example, and NDB API Simple Array Example Using Adapter.
The global checkpoint commit and save protocols can be delayed by various causes, including slow disk I/O. The DIH master node monitors the progress of both of these protocols, and can enforce a maximum lag time during which the protocols are stalled by killing the node responsible for the lag when it reaches this maximum. This DIH master GCP monitor mechanism did not perform its task more than once per master node; that is, it failed to continue monitoring after detecting and handling a GCP stop. (Bug #20128256)
References: See also: Bug #19858151, Bug #20069617, Bug #20062754.
When running mysql_upgrade on a MySQL Cluster SQL node, the expected drop of the performance_schema database on this node was instead performed on all SQL nodes connected to the cluster. (Bug #20032861)
A number of problems relating to the fired triggers pool have been fixed, including the following issues:
- When the fired triggers pool was exhausted, NDB returned Error 218 (Out of LongMessageBuffer). A new error code 221 is added to cover this case.
- An additional, separate case in which Error 218 was wrongly reported now returns the correct error.
- Setting low values for MaxNoOfFiredTriggers led to an error when no memory was allocated if there was only one hash bucket.
- An aborted transaction now releases any fired trigger records it held. Previously, these records were held until its ApiConnectRecord was reused by another transaction.
In addition, for the Fired Triggers pool in the internal ndbinfo.ndb$pools table, the high value always equalled the total, due to the fact that all records were momentarily seized when initializing them. Now the high value shows the maximum following completion of initialization.
Online reorganization when using ndbmtd data nodes and with binary logging by mysqld enabled could sometimes lead to failures in the DBLQH kernel blocks, or in silent data corruption. (Bug #19903481)
References: See also: Bug #19912988.
The local checkpoint scan fragment watchdog and the global checkpoint monitor can each exclude a node when it is too slow to participate in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which, if the shutdown was delayed (for whatever reason), could prolong the duration of the GCP or LCP stall for other, unaffected nodes.
To minimize this time, an isolation mechanism has been added to both protocols whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This allows the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocols, even if the failed node takes an excessively long time to shut down. (Bug #19858151)
References: See also: Bug #20128256, Bug #20069617, Bug #20062754.
A watchdog failure resulted from a hang while freeing a disk page in TUP_COMMITREQ, due to use of an uninitialized block variable. (Bug #19815044, Bug #74380)
Multiple threads crashing led to multiple sets of trace files being printed and possibly to deadlocks. (Bug #19724313)
When a client retried, against a new master, a schema transaction that had failed previously against the previous master while the latter was restarting, the lock obtained by this transaction on the new master prevented the previous master from progressing past start phase 3 until the client was terminated and the resources held by it were cleaned up. (Bug #19712569, Bug #74154)
When using the NDB storage engine, the maximum possible length of a database or table name is 63 characters, but this limit was not always strictly enforced. This meant that a statement using a name having 64 characters, such as DROP DATABASE or ALTER TABLE RENAME, could cause the SQL node on which it was executed to fail. Now such statements fail with an appropriate error message. (Bug #19550973)
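A sketch of the kind of up-front length check described, assuming the 63-character limit is applied to the name as given. `check_ndb_name` and `MAX_NDB_NAME_LENGTH` are hypothetical; this is not the server's actual validation code.

```python
MAX_NDB_NAME_LENGTH = 63  # limit stated for NDB database and table names

def check_ndb_name(name):
    """Reject over-long names with an error instead of passing them on."""
    if len(name) > MAX_NDB_NAME_LENGTH:
        raise ValueError(
            f"name is {len(name)} characters; NDB allows at most "
            f"{MAX_NDB_NAME_LENGTH}")
    return name
```

Failing early with a clear error is the behavior the fix introduces; before it, a 64-character name could reach code that assumed the limit and bring down the SQL node.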
When a new data node started, API nodes were allowed to attempt to register themselves with the data node for executing transactions before the data node was ready. This forced the API node to wait an extra heartbeat interval before trying again.
To address this issue, a number of HA_ERR_NO_CONNECTION errors (Error 4009) that could be issued during this time have been changed to Cluster temporarily unavailable errors (Error 4035), which should allow API nodes to use new data nodes more quickly than before. As part of this fix, some errors which were incorrectly categorized have been moved into the correct categories, and some errors which are no longer used have been removed. (Bug #19524096, Bug #73758)
When executing very large pushdown joins involving one or more indexes each defined over several columns, it was possible in some cases for the DBSPJ block (see The DBSPJ Block) in the NDB kernel to generate SCAN_FRAGREQ signals that were excessively large. This caused data nodes to fail when these could not be handled correctly, due to a hard limit in the kernel on the size of such signals (32K). This fix bypasses that limitation by breaking up SCAN_FRAGREQ data that is too large for one such signal, and sending the SCAN_FRAGREQ as a chunked or fragmented signal instead. (Bug #19390895)
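The fragmentation idea can be sketched as splitting a payload into numbered chunks that each fit under the per-signal limit, then reassembling them on the receiving side. The limit is taken here as 32 * 1024 bytes, which is an assumption about units; `fragment` and `reassemble` are hypothetical names, and real NDB fragmented signals carry their own headers.

```python
MAX_SIGNAL_BYTES = 32 * 1024  # assumed byte interpretation of the 32K limit

def fragment(payload, limit=MAX_SIGNAL_BYTES):
    """Split payload into (fragment_no, is_last, chunk) tuples <= limit."""
    if not payload:
        return [(0, True, b"")]
    chunks = [payload[i:i + limit] for i in range(0, len(payload), limit)]
    return [(n, n == len(chunks) - 1, c) for n, c in enumerate(chunks)]

def reassemble(fragments):
    """Receiver side: rebuild the payload once all fragments are present."""
    ordered = sorted(fragments, key=lambda f: f[0])
    numbers = [f[0] for f in ordered]
    if not ordered or not ordered[-1][1] or numbers != list(range(len(ordered))):
        raise ValueError("incomplete fragmented signal")
    return b"".join(chunk for _, _, chunk in ordered)
```

Each piece stays under the hard limit, so the limit itself does not have to change.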
ndb_index_stat sometimes failed when used against a table containing unique indexes. (Bug #18715165)
Queries against tables containing a CHAR(0) column failed with ERROR 1296 (HY000): Got error 4547 'RecordSpecification has overlapping offsets' from NDBCLUSTER. (Bug #14798022)
In the NDB kernel, it was possible for a TransporterFacade object to reset a buffer while the data contained by the buffer was being sent, which could lead to a race condition. (Bug #75041, Bug #20112981)
mysql_upgrade failed to drop and recreate the ndbinfo database and its tables as expected. (Bug #74863, Bug #20031425)
Due to a lack of memory barriers, MySQL Cluster programs such as ndbmtd did not compile on POWER platforms. (Bug #74782, Bug #20007248)
In some cases, when run against a table having an AFTER DELETE trigger, a DELETE statement that matched no rows still caused the trigger to execute. (Bug #74751, Bug #19992856)
A basic requirement of the NDB storage engine's design is that the transporter registry not attempt to receive data (TransporterRegistry::performReceive()) from and update the connection status (TransporterRegistry::update_connections()) of the same set of transporters concurrently, due to the fact that the updates perform final cleanup and reinitialization of buffers used when receiving data. Changing the contents of these buffers while reading or writing to them could lead to "garbage" or inconsistent signals being read or written.
During the course of work done previously to improve the implementation of the transporter facade, a mutex intended to protect against the concurrent use of the update_connections() methods on the same transporter was inadvertently removed. This fix adds a watchdog check for concurrent usage. In addition, performReceive() calls are now serialized together while polling the transporters. (Bug #74011, Bug #19661543)
ndb_restore failed while restoring a table which contained both a built-in conversion on the primary key and a staging conversion on a
During staging, a BLOB table is created with a primary key column of the target type. However, a conversion function was not provided to convert the primary key values before loading them into the staging blob table, which resulted in corrupted primary key values in the staging BLOB table. While moving data from the staging table to the target table, the BLOB read failed because it could not find the primary key in the BLOB table. Now, BLOB tables are checked to see whether there are conversions on primary keys of their main tables. This check is done after all the main tables are processed, so that conversion functions and parameters have already been set for the main tables. Any conversion functions and parameters used for the primary key in the main table are now duplicated in the BLOB table. (Bug #73966, Bug #19642978)
Corrupted messages to data nodes sometimes went undetected, causing a bad signal to be delivered to a block which aborted the data node. This failure in combination with disconnecting nodes could in turn cause the entire cluster to shut down.
To keep this from happening, additional checks are now made when unpacking signals received over TCP, including checks for byte order, compression flag (which must not be used), and the length of the next message in the receive buffer (if there is one).
Whenever two consecutive unpacked messages fail the checks just described, the current message is assumed to be corrupted. In this case, the transporter is marked as having bad data and no more unpacking of messages occurs until the transporter is reconnected. In addition, an entry is written to the cluster log containing the error as well as a hex dump of the corrupted message. (Bug #73843, Bug #19582925)
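The detection policy can be sketched as a small state machine. The message representation (a dict with `byte_order`, `compressed`, `length`, and `raw` keys) and the class and method names are hypothetical. The notes do not say how a single failing message is handled, so this sketch simply drops it without marking the transporter bad.

```python
class TransporterReceiver:
    """Hypothetical receiver applying per-message sanity checks."""
    def __init__(self, max_message_len):
        self.max_message_len = max_message_len
        self.bad_data = False
        self.consecutive_failures = 0
        self.log = []

    def check(self, msg):
        return (msg["byte_order"] == "native"        # byte order check
                and not msg["compressed"]            # compression must not be used
                and 0 < msg["length"] <= self.max_message_len)

    def unpack(self, msg):
        if self.bad_data:
            return None  # no more unpacking until the transporter reconnects
        if self.check(msg):
            self.consecutive_failures = 0
            return msg
        self.consecutive_failures += 1
        if self.consecutive_failures >= 2:
            # Two consecutive failures: treat the message as corrupted,
            # mark the transporter bad, and log a hex dump.
            self.bad_data = True
            self.log.append("corrupted message: " + msg["raw"].hex())
        return None

    def reconnect(self):
        self.bad_data = False
        self.consecutive_failures = 0
```

Stopping at the transporter level keeps a bad signal from being delivered to a kernel block, which is what previously aborted the data node.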
Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)
NDB Disk Data: An update on many rows of a large Disk Data table could in some rare cases lead to node failure. In the event that such problems are observed with very large transactions on Disk Data tables, you can now increase the number of page entries allocated for disk page buffer memory by raising the value of the DiskPageBufferEntries data node configuration parameter added in this release. (Bug #19958804)
NDB Disk Data: In some cases, during DICT master takeover, the new master could crash while attempting to roll forward an ongoing schema transaction. (Bug #19875663, Bug #74510)
NDB Disk Data: When a node acting as a DICT master fails, the arbitrator selects another node to take over in place of the failed node. During the takeover procedure, which includes cleaning up any schema transactions which are still open when the master failed, the disposition of the uncommitted schema transaction is decided. Normally this transaction is rolled back, but if it has completed a sufficient portion of a commit request, the new master finishes processing the commit. Until the fate of the transaction has been decided, no new TRANS_END_REQ messages from clients can be processed. In addition, since multiple concurrent schema transactions are not supported, takeover cleanup must be completed before any new transactions can be started.
A similar restriction applies to any schema operations which are performed in the scope of an open schema transaction. The counter used to coordinate schema operations across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results.
The scenarios just described were handled previously using a pseudo-random delay when recovering from a node failure. Now we check whether the new master has rolled forward or back any schema transactions remaining after the failure of the previous master, and avoid starting new schema transactions or performing operations using old transactions until takeover processing has cleaned up after the abandoned transaction. (Bug #19874809, Bug #74503)
NDB Disk Data: When a node acting as DICT master fails, it is still possible to request that any open schema transaction be either committed or aborted by sending this request to the new DICT master. In this event, the new master takes over the schema transaction and reports back on whether the commit or abort request succeeded. In certain cases, it was possible for the new master to be misidentified; that is, the request was sent to the wrong node, which responded with an error that was interpreted by the client application as an aborted schema transaction, even in cases where the transaction could have been successfully committed, had the correct node been contacted. (Bug #74521, Bug #19880747)
NDB Cluster APIs: It was possible to delete an Ndb_cluster_connection object while there remained instances of Ndb using references to it. Now the Ndb_cluster_connection destructor waits for all related Ndb objects to be released before completing. (Bug #19999242)
References: See also: Bug #19846392.
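The waiting destructor can be sketched with a user count guarded by a condition variable. `ClusterConnection`, `attach`, `detach`, and `close` are hypothetical stand-ins for Ndb_cluster_connection and the Ndb objects that reference it; the optional timeout exists only so the example can show both outcomes.

```python
import threading

class ClusterConnection:
    """Hypothetical connection that tracks the objects still using it."""
    def __init__(self):
        self._cond = threading.Condition()
        self._users = 0

    def attach(self):
        # Called when a dependent object (like an Ndb instance) is created.
        with self._cond:
            self._users += 1

    def detach(self):
        # Called when a dependent object is released.
        with self._cond:
            self._users -= 1
            if self._users == 0:
                self._cond.notify_all()

    def close(self, timeout=None):
        """Block until every dependent object has been released.

        Returns True once the count reaches zero, or False on timeout.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._users == 0, timeout=timeout)
```

Waiting rather than deleting immediately means no dependent object can be left holding a reference to freed connection state.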
NDB Cluster APIs: The buffer allocated by an NdbScanOperation for receiving scanned rows was not released until the NdbTransaction owning the scan operation was closed. This could lead to excessive memory usage in an application where multiple scans were created within the same transaction, even if these scans were closed at the end of their lifecycle, unless NdbScanOperation::close() was invoked with the releaseOp argument equal to true. Now the buffer is released whenever the cursor navigating the result set is closed with NdbScanOperation::close(), regardless of the value of this argument. (Bug #75128, Bug #20166585)
Functionality Added or Changed
After adding new data nodes to the configuration file of a MySQL Cluster having many API nodes, but prior to starting any of the data node processes, API nodes tried to connect to these “missing” data nodes several times per second, placing extra loads on management nodes and the network. To reduce unnecessary traffic caused in this way, it is now possible to control the amount of time that an API node waits between attempts to connect to data nodes which fail to respond; this is implemented in two new API node configuration parameters
Time elapsed during node connection attempts is not taken into account when applying these parameters, both of which are given in milliseconds with approximately 100 ms resolution. As long as the API node is not connected to any data nodes as described previously, the value of the StartConnectBackoffMaxTime parameter is applied; otherwise,
In a MySQL Cluster with many unstarted data nodes, the values of these parameters can be raised to circumvent connection attempts to data nodes which have not yet begun to function in the cluster, as well as to moderate high traffic to management nodes.
For more information about the behavior of these parameters, see Defining SQL and Other API Nodes in an NDB Cluster. (Bug #17257842)
Added the --exclude-missing-tables option for ndb_restore. When enabled, the option causes tables present in the backup but not in the target database to be ignored. (Bug #57566, Bug #11764704)
When assembling error messages of the form Incorrect state for node node_state, written when the transporter failed to connect, the node state was used in place of the node ID in a number of instances, which resulted in errors of this type for which the node state was reported incorrectly. (Bug #19559313, Bug #73801)
In some cases, transporter receive buffers were reset by one thread while being read by another. This happened when a race condition occurred between a thread receiving data and another thread initiating disconnect of the transporter (disconnection clears this buffer). Concurrency logic has now been implemented to keep this race from taking place. (Bug #19552283, Bug #73790)
The failure of a data node could in some situations cause a set of API nodes to fail as well due to the sending of a CLOSE_COMREQ signal that was sometimes not completely initialized. (Bug #19513967)
A more detailed error report is printed in the event of a critical failure in one of the sendSignal*() methods, prior to crashing the process, as was already implemented for sendSignal(), but was missing from the more specialized sendSignalNoRelease() method. Having a crash of this type correctly reported can help with identifying configuration or hardware issues in some cases. (Bug #19414511)
References: See also: Bug #19390895.
ndb_restore failed to restore the cluster's metadata when there were more than approximately 17 K data objects. (Bug #19202654)
The fix for a previous issue with the handling of multiple node failures required determining the number of TC instances the failed node was running, then taking them over. The mechanism to determine this number sometimes provided an invalid result which caused the number of TC instances in the failed node to be set to an excessively high value. This in turn caused redundant takeover attempts, which wasted time and had a negative impact on the processing of other node failures and of global checkpoints. (Bug #19193927)
References: This issue is a regression of: Bug #18069334.
Parallel transactions performing reads immediately preceding a delete on the same tuple could cause the NDB kernel to crash. This was more likely to occur when separate TC threads were specified using the ThreadConfig configuration parameter. (Bug #19031389)
Attribute promotion between different TEXT types (any of LONGTEXT) by ndb_restore was not handled properly in some cases. In addition, TEXT values are now truncated according to the limits set by mysqld (for example, values converted to TINYTEXT from another type are truncated to 256 bytes). In the case of columns using a multibyte character set, the value is truncated to the end of the last well-formed character.
Also as a result of this fix, conversion to a TEXT column of any size that uses a different character set from the original is now disallowed. (Bug #18875137)
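Truncating at the end of the last well-formed character can be sketched for UTF-8, assuming UTF-8 as the multibyte character set. The byte limit is a parameter here rather than the exact limit mysqld applies to each TEXT type, and `truncate_utf8` is a hypothetical helper.

```python
def truncate_utf8(value, max_bytes):
    """Encode value as UTF-8 and cut it to at most max_bytes, ending on
    a complete character rather than in the middle of a multibyte one."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    cut = encoded[:max_bytes]
    # Only the tail of a prefix of valid UTF-8 can be incomplete;
    # errors="ignore" drops that partial sequence.
    return cut.decode("utf-8", errors="ignore").encode("utf-8")
```

A plain byte slice could end in the middle of a character and produce an invalid value; decoding with `errors="ignore"` backs off to the last complete one.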
The NDB optimized node recovery mechanism attempts to transfer only relevant page changes to a starting node in order to speed the recovery process; this is done by having the starting node indicate the index of the last global checkpoint (GCI) in which it participated, so that the node that was already running copies only data for rows which have changed since that GCI. Every row has a GCI metacolumn which facilitates this; for a deleted row, the slot formerly storing this row's data contains a GCI value, and for deleted pages, every row on the missing page is considered changed and thus needs to be sent.
When these changes are received by the starting node, this node performs a lookup for the page and index to determine what they contain. This lookup could cause a real underlying page to be mapped against the logical page ID, even when this page contained no data.
One way in which this issue could manifest itself occurred after cluster DataMemory usage approached maximum, and deletion of many rows followed by a rolling restart of the data nodes was performed with the expectation that this would free memory, but in fact it was possible in this scenario for memory not to be freed and in some cases for memory usage actually to increase to its maximum.
This fix solves these issues by ensuring that a real physical page is mapped to a logical ID during node recovery only when this page contains actual data which needs to be stored. (Bug #18683398, Bug #18731008)
When a data node sent a MISSING_DATA signal due to a buffer overflow and no event data had yet been sent for the current epoch, the dummy event list created to handle this inconsistency was not deleted after the information in the dummy event list was transferred to the completed list. (Bug #18410939)
Incorrect calculation of the next autoincrement value following a manual insertion towards the end of a cached range could result in duplicate values sometimes being used. This issue could manifest itself when using certain combinations of values for
This issue has been fixed by modifying the calculation to make sure that the next value from the cache as computed by NDB is of the form auto_increment_offset + (N * auto_increment_increment). This avoids any rounding up by the MySQL Server of the returned value, which could result in duplicate entries when the rounded-up value fell outside the range of values cached by NDB. (Bug #17893872)
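MySQL generates AUTO_INCREMENT values in the series auto_increment_offset + N * auto_increment_increment. The sketch below rounds a candidate value up to the next member of that series, which is the kind of value the corrected calculation hands back so that the server has nothing left to round. `next_in_series` is a hypothetical name, and the NDB value cache itself is not modeled.

```python
def next_in_series(candidate, offset, increment):
    """Smallest value >= candidate of the form offset + N * increment, N >= 0."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    if candidate <= offset:
        return offset
    n = -(-(candidate - offset) // increment)  # ceiling division
    return offset + n * increment
```

For example, with an offset of 5 and an increment of 10, a candidate of 12 becomes 15. If that step happened only after NDB had already picked a value from its cache, the rounded result could fall outside the cached range, which is the duplicate-value case the fix closes.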
Using the --help option with ndb_print_file caused the program to segfault. (Bug #17069285)
For multithreaded data nodes, some threads do not communicate often, with the result that very old signals can remain at the top of the signal buffers. When performing a thread trace, the signal dumper calculated the latest signal ID from what it found in the signal buffers, which meant that these old signals could be erroneously counted as the newest ones. Now the signal ID counter is kept as part of the thread state, and it is this value that is used when dumping signals for trace files. (Bug #73842, Bug #19582807)
NDB Cluster APIs: The fix for Bug #16723708 stopped the ndb_logevent_get_next() function from casting a log event's enum type, but this change interfered with existing applications, and so the function's original behavior is now reinstated. A new MGM API function exhibiting the corrected behavior, ndb_logevent_get_next2(), has been added in this release to take the place of the reverted function, for use in applications that do not require backward compatibility. In all other respects apart from this, the new function is identical with its predecessor. (Bug #18354165)
References: Reverted patches: Bug #16723708.
NDB Cluster APIs: NDB API scans leaked
nextResult() was called when an operation resulted in an error. This leak locked up the corresponding connection objects in the DBTC kernel block until the connection was closed. (Bug #17730825, Bug #20170731)
Functionality Added or Changed
NDB Cluster APIs: Added, as an aid to debugging, the ability to specify a human-readable name for a given Ndb object and later to retrieve it. These operations are implemented, respectively, as the
To make tracing of event handling between a user application and NDB easier, you can use the reference (from getReference()) followed by the name (if provided) in printouts; the reference ties together the application Ndb object, the event buffer, and the SUMA block. (Bug #18419907)
NDB Cluster APIs: When two tables had different foreign keys with the same name, ndb_restore considered this a name conflict and failed to restore the schema. As a result of this fix, a slash character (/) is now expressly disallowed in foreign key names, and the naming format fk_name is now enforced by the NDB API. (Bug #18824753)
Processing a NODE_FAILREP signal that contained an invalid node ID could cause a data node to fail. (Bug #18993037, Bug #73015)
References: This issue is a regression of: Bug #16007980.
When building out of source, some files were written to the source directory instead of the build directory. These included the manifest.mf files used for creating ClusterJ jars and the pom.xml file used by mvn_install_ndbjtie.sh. In addition, ndbinfo.sql was written to the build directory, but marked as output to the source directory in CMakeLists.txt. (Bug #18889568, Bug #72843)
When the binary log injector thread commits an epoch to the binary log and this causes the log file to reach maximum size, it may need to rotate the binary log. The rotation is not performed until either all the committed transactions from all client threads are flushed to the binary log, or a maximum of 30 seconds has elapsed. In the case where all transactions were committed prior to the 30-second wait, it was possible for committed transactions from multiple client threads to belong to newer epochs than the latest epoch committed by the injector thread, causing the thread to deadlock with itself, and causing an unnecessary 30-second delay before breaking the deadlock. (Bug #18845822)
Adding a foreign key failed with NDB Error 208 if the parent index was the parent table's primary key, the primary key was not on the table's initial attributes, and the child table was not empty. (Bug #18825966)
When an NDB table served as both the parent table and a child table for 2 different foreign keys having the same name, dropping the foreign key on the child table could cause the foreign key on the parent table to be dropped instead, leading to a situation in which it was impossible to drop the remaining foreign key. This situation can be modelled using the following CREATE TABLE statements:
CREATE TABLE parent (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=NDB;

CREATE TABLE child (
    id INT NOT NULL,
    parent_id INT,
    PRIMARY KEY (id),
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id) REFERENCES parent(id)
) ENGINE=NDB;

CREATE TABLE grandchild (
    id INT,
    parent_id INT,
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id) REFERENCES child(id)
) ENGINE=NDB;
With the tables created as just shown, the issue occurred when executing the statement ALTER TABLE child DROP FOREIGN KEY parent_id, because it was possible in some cases for NDB to drop the foreign key from the grandchild table instead. When this happened, any subsequent attempt to drop the foreign key from either the child or the grandchild table failed. (Bug #18662582)
It was possible for a data node restart to become stuck indefinitely in start phase 101 (see Summary of NDB Cluster Start Phases) when there were connection problems between the node being restarted and one or more subscribing API nodes.
To help prevent this from happening, a new data node configuration parameter, RestartSubscriberConnectTimeout, has been introduced; it can be used to control how long a data node restart can stall in start phase 101 before giving up and attempting to restart again. The default is 12000 ms. (Bug #18599198)
Executing ALTER TABLE ... REORGANIZE PARTITION after increasing the number of data nodes in the cluster from 4 to 16 led to a crash of the data nodes. This issue was shown to be a regression caused by a previous fix which added a new dump handler using a dump code that was already in use (7019), which caused the command to execute two different handlers with different semantics. The new handler was assigned a new DUMP code (7024). (Bug #18550318)
References: This issue is a regression of: Bug #14220269.
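The collision described above, in which one dump code ends up bound to two handlers, can be illustrated with a minimal C++ sketch. It assumes a hypothetical DumpRegistry keyed by dump code that rejects a second registration for a code already in use; the NDB kernel dispatches DUMP commands differently, and only the codes 7019 and 7024 come from the text.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of DUMP handlers keyed by dump code (not NDB source).
class DumpRegistry {
 public:
  using Handler = std::function<std::string()>;

  // Registers a handler; returns false if the code is already in use, so a
  // reused code is rejected instead of silently gaining a second meaning.
  bool add(int code, Handler handler) {
    return handlers_.emplace(code, std::move(handler)).second;
  }

  // Runs the single handler bound to `code`.
  std::string run(int code) const {
    auto it = handlers_.find(code);
    if (it == handlers_.end()) return "unknown dump code";
    return it->second();
  }

 private:
  std::map<int, Handler> handlers_;
};
```

With this design the conflicting registration fails at startup, which is the point at which the duplicate code could have been caught before being reassigned to 7024.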
Following a long series of inserts, when running with a relatively small redo log and an insufficiently large value for MaxNoOfConcurrentTransactions, there remained transactions that were blocked by the lack of redo log and were thus not aborted in the correct state (waiting for prepare log to be sent to disk, or LOG_QUEUED state). This caused the redo log to remain blocked until unblocked by the completion of a local checkpoint. This could lead to a deadlock, when the blocked aborts in turn blocked global checkpoints, and the blocked GCPs in turn blocked LCPs. To prevent this situation from arising, we now abort immediately when we reach the LOG_QUEUED state in the abort state handler. (Bug #18533982)
ndbmtd supports multiple parallel receiver threads, each of which performs signal reception for a subset of the remote node connections (transporters), with the mapping of remote nodes to receiver threads decided at node startup. Connection control is managed by the multi-instance TRPMAN block, which is organized as a proxy and workers, and each receiver thread has a TRPMAN worker running locally.
The QMGR block sends signals to TRPMAN to enable and disable communications with remote nodes. These signals are sent to the TRPMAN proxy, which forwards them to the workers. The workers themselves decide whether to act on signals, based on the set of remote nodes they manage.
The current issue arises because the mechanism used by the TRPMAN workers for determining which connections they are responsible for was implemented in such a way that each worker thought it was responsible for all connections. This resulted in the CLOSE_COMREQ being processed multiple times.
The fix keeps TRPMAN instances (receiver threads) executing CLOSE_COMREQ requests. In addition, the correct TRPMAN instance is now chosen when routing from this instance for a specific remote connection. (Bug #18518037)
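The ownership check that each worker performs can be sketched in C++. This is a minimal sketch assuming a hypothetical modulo rule for mapping remote node IDs to receiver threads; the actual mapping ndbmtd decides at startup is not described here. With a correct check exactly one worker acts on a forwarded close request, whereas the bug corresponds to the check returning true for every worker, so the request was handled once per worker.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mapping of remote node IDs to receiver threads, fixed at
// startup. The modulo rule is an assumption made for illustration only.
int receiver_for_node(int node_id, int num_receivers) {
  assert(num_receivers > 0);
  return node_id % num_receivers;
}

// Ownership check performed by each TRPMAN-style worker before acting on a
// request that concerns `node_id`.
bool worker_owns_node(int worker, int node_id, int num_receivers) {
  return receiver_for_node(node_id, num_receivers) == worker;
}

// Returns the worker instances that would act on a CLOSE_COMREQ that the
// proxy forwards to every worker.
std::vector<int> workers_acting_on_close(int node_id, int num_receivers) {
  std::vector<int> acting;
  for (int worker = 0; worker < num_receivers; ++worker) {
    if (worker_owns_node(worker, node_id, num_receivers)) acting.push_back(worker);
  }
  return acting;
}
```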
During data node failure handling, the transaction coordinator performing takeover gathers all known state information for any failed TC instance transactions, determines whether each transaction has been committed or aborted, and informs any involved API nodes so that they can report this accurately to their clients. The TC instance provides this information by sending TCKEY_FAILCONF signals to the API nodes as appropriate to each affected transaction.
In the event that this TC instance does not have a direct connection to the API node, it attempts to deliver the signal by routing it through another data node in the same node group as the failing TC, and sends a GSN_TCKEY_FAILREFCONF_R signal to TC block instance 0 in that data node. A problem arose in the case of multiple transaction coordinators, when this TC instance did not have a signal handler for such signals, which led it to fail.
This issue has been corrected by adding a handler to the TC proxy block which in such cases forwards the signal to one of the local TC worker instances, which in turn attempts to forward the signal on to the API node. (Bug #18455971)
When running with a very slow main thread, and one or more transaction coordinator threads, on different CPUs, it was possible to encounter a timeout when sending a DIH_SCAN_GET_NODESREQ signal, which could lead to a crash of the data node. Now in such cases the timeout is avoided. (Bug #18449222)
Failure of multiple nodes while using ndbmtd with multiple TC threads was not handled gracefully under a moderate amount of traffic, which could in some cases lead to an unplanned shutdown of the cluster. (Bug #18069334)
A local checkpoint (LCP) is tracked using a global LCP state (c_lcpState), and each NDB table has a status indicator which indicates the LCP status of that table (tabLcpStatus). If the global LCP state is LCP_STATUS_IDLE, then all the tables should have an LCP status of TLS_COMPLETED.
When an LCP starts, the global LCP status is LCP_INIT_TABLES and the thread starts setting all the tables to TLS_ACTIVE. If any tables are not ready for LCP, the LCP initialization procedure continues with CONTINUEB signals until all tables have become available and been marked TLS_ACTIVE. When this initialization is complete, the global LCP status is set to LCP_STATUS_ACTIVE.
This bug occurred when the following conditions were met:
An LCP was in the LCP_INIT_TABLES state, and some but not all tables had been set to TLS_ACTIVE.
The master node failed before the global LCP state changed to LCP_STATUS_ACTIVE; that is, before the LCP could finish processing all tables.
The NODE_FAILREP signal resulting from the node failure was processed before the final CONTINUEB signal from the LCP initialization process, so that the node failure was processed while the LCP remained in the LCP_INIT_TABLES state.
Following master node failure and selection of a new one, the new master queries the remaining nodes with a MASTER_LCPREQ signal to determine the state of the LCP. At this point, since the LCP status was LCP_INIT_TABLES, the LCP status was reset to LCP_STATUS_IDLE. However, the LCP status of the tables was not modified, so there remained tables with TLS_ACTIVE. Afterwards, the failed node is removed from the LCP. If the LCP status of a given table is TLS_ACTIVE, there is a check that the global LCP status is not LCP_STATUS_IDLE; this check failed and caused the data node to fail.
Now the MASTER_LCPREQ handler ensures that the tabLcpStatus for all tables is updated to TLS_COMPLETED when the global LCP status is changed to LCP_STATUS_IDLE. (Bug #18044717)
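The state interaction above can be modelled in a few lines of C++. This is a simplified, hypothetical model: the enumerators reuse the state names from the text, but the struct and method names are illustrative and do not reflect the actual kernel source. It shows that resetting only the global state leaves a table in TLS_ACTIVE while the LCP is idle, which violates the check described above, and that also resetting each table's status restores the invariant.

```cpp
#include <cassert>
#include <vector>

// Enumerators mirror the state names used in the text; the rest is a model.
enum class GlobalLcpState { LCP_STATUS_IDLE, LCP_INIT_TABLES, LCP_STATUS_ACTIVE };
enum class TabLcpStatus { TLS_ACTIVE, TLS_COMPLETED };

struct LcpModel {
  GlobalLcpState c_lcpState = GlobalLcpState::LCP_STATUS_IDLE;
  std::vector<TabLcpStatus> tabLcpStatus;

  // Check performed when removing a failed node from the LCP:
  // a table still marked TLS_ACTIVE implies the LCP is not idle.
  bool invariant_holds() const {
    for (TabLcpStatus s : tabLcpStatus) {
      if (s == TabLcpStatus::TLS_ACTIVE &&
          c_lcpState == GlobalLcpState::LCP_STATUS_IDLE) {
        return false;
      }
    }
    return true;
  }

  // Models MASTER_LCPREQ handling on the new master. With the fix, resetting
  // the global state to idle also marks every table TLS_COMPLETED.
  void handle_master_lcpreq(bool with_fix) {
    if (c_lcpState == GlobalLcpState::LCP_INIT_TABLES) {
      c_lcpState = GlobalLcpState::LCP_STATUS_IDLE;
      if (with_fix) {
        for (TabLcpStatus& s : tabLcpStatus) s = TabLcpStatus::TLS_COMPLETED;
      }
    }
  }
};
```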
When performing a copying ALTER TABLE operation, mysqld creates a new copy of the table to be altered. This intermediate table, which is given a name bearing the prefix #sql-, has an updated schema but contains no data. mysqld then copies the data from the original table to this intermediate table, drops the original table, and finally renames the intermediate table with the name of the original table.
mysqld regards such a table as a temporary table and does not include it in the output from SHOW TABLES; mysqldump also ignores an intermediate table. However, NDB sees no difference between such an intermediate table and any other table. This difference in how intermediate tables are viewed by mysqld (and MySQL client programs) and by the NDB storage engine can give rise to problems when performing a backup and restore if an intermediate table existed in NDB, possibly left over from a failed ALTER TABLE that used copying. If a schema backup is performed using mysqldump and the mysql client, this table is not included. However, in the case where a data backup was done using the ndb_mgm client's BACKUP command, the intermediate table was included, and was also included by ndb_restore, which then failed due to attempting to load data into a table which was not defined in the backed-up schema.
To prevent such failures from occurring, ndb_restore now by default ignores intermediate tables created during ALTER TABLE operations (that is, tables whose names begin with the prefix #sql-). A new option --exclude-intermediate-sql-tables is added that makes it possible to override the new behavior. The option's default value is TRUE; to cause ndb_restore to revert to the old behavior and to attempt to restore intermediate tables, set this option to FALSE. (Bug #17882305)
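The default selection rule can be sketched in C++. This is a minimal sketch assuming table names are compared as bare names without any database prefix; the function names are hypothetical and not part of ndb_restore. The exclude_intermediate flag plays the role of --exclude-intermediate-sql-tables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if the name looks like an intermediate table left by a copying
// ALTER TABLE, i.e. it begins with the prefix "#sql-".
bool is_intermediate_table(const std::string& name) {
  return name.rfind("#sql-", 0) == 0;  // match only at position 0
}

// Selects the tables to restore. Intermediate tables are skipped unless
// exclude_intermediate is false (the equivalent of setting the option to FALSE).
std::vector<std::string> tables_to_restore(const std::vector<std::string>& backup_tables,
                                           bool exclude_intermediate = true) {
  std::vector<std::string> selected;
  for (const std::string& name : backup_tables) {
    if (exclude_intermediate && is_intermediate_table(name)) continue;
    selected.push_back(name);
  }
  return selected;
}
```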
The logging of insert failures has been improved. This is intended to help diagnose occasional issues seen when writing to the mysql.ndb_binlog_index table. (Bug #17461625)
The DEFINER column in the INFORMATION_SCHEMA.VIEWS table contained erroneous values for views contained in the ndbinfo information database. This could be seen in the result of a query such as SELECT TABLE_NAME, DEFINER FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA='ndbinfo'. (Bug #17018500)
Using a CHAR column that used the UTF8 character set as a table's primary key column led to node failure when restarting data nodes. Attempting to restore a table with such a primary key also caused ndb_restore to fail. (Bug #16895311, Bug #68893)
The --order (-o) option for the ndb_select_all utility worked only when specified as the last option, and did not work with an equals sign.
As part of this fix, the program's --help output was also aligned with the --order option's correct behavior. (Bug #64426, Bug #16374870)
NDB Disk Data: Setting the undo buffer size used by InitialLogFileGroup to a value greater than that set by SharedGlobalMemory prevented data nodes from starting; the data nodes failed with Error 1504 Out of logbuffer memory. While the failure itself is expected behavior, the error message did not provide sufficient information to diagnose the actual source of the problem; now in such cases, a more specific error message Out of logbuffer memory (specify smaller undo_buffer_size or increase SharedGlobalMemory) is supplied. (Bug #11762867, Bug #55515)
NDB Cluster APIs: When an NDB data node indicates a buffer overflow via an empty epoch, the event buffer places an inconsistent data event in the event queue. When this was consumed, it was not removed from the event queue as expected, causing subsequent nextEvent() calls to return 0. This caused event consumption to stall because the inconsistency remained flagged forever, while event data accumulated in the queue.
Event data belonging to an empty inconsistent epoch can be found either at the beginning of the event queue or somewhere in the middle. pollEvents() returns 0 for the first case. This fix handles the second case: the nextEvent() call now dequeues the inconsistent event before it returns. In order to benefit from this fix, user applications must call nextEvent() even when pollEvents() returns 0. (Bug #18716991)
NDB Cluster APIs: The pollEvents() method returned 1 even when called with a wait time equal to 0 and there were no events waiting in the queue. Now in such cases it returns 0 as expected. (Bug #18703871)
Functionality Added or Changed
Handling of LongMessageBuffer shortages and statistics has been improved as follows:
The default value of LongMessageBuffer has been increased from 4 MB to 64 MB.
When this resource is exhausted, a suitable informative message is now printed in the data node log describing possible causes of the problem and suggesting possible solutions.
LongMessageBuffer usage information is now shown in the ndbinfo.memoryusage table. See the description of this table for an example and additional information.
Important Change: The server system variables
ndb_index_stat_freq, which had been deprecated in a previous MySQL Cluster release series, have now been removed. (Bug #11746486, Bug #26673)
When an ALTER TABLE statement changed table schemas without causing a change in the table's partitioning, the new table definition did not copy the hash map from the old definition, but used the current default hash map instead. However, the table data was not reorganized according to the new hash map, which made some rows inaccessible using a primary key lookup if the two hash maps had incompatible definitions.
To keep this situation from occurring, any ALTER TABLE that entails a hash map change now triggers a reorganization of the table. In addition, when copying a table definition in such cases, the hash map is now also copied. (Bug #18436558)
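The following C++ sketch models, under simplifying assumptions, why rows became unreachable. A hash map is treated as a plain vector of partition IDs indexed by the key hash modulo its size; NDB's real hash maps and distribution logic are more involved, and the type and function names here are hypothetical. A row stored under one map but looked up under an incompatible one is routed to the wrong partition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model: a row whose key hashes to h is stored in partition
// map[h % map.size()].
using HashMap = std::vector<int>;

int partition_for(uint32_t key_hash, const HashMap& map) {
  assert(!map.empty());
  return map[key_hash % map.size()];
}

// A primary key lookup finds the row only if the map used for the lookup
// routes its hash to the partition where the row was actually stored.
bool row_reachable(uint32_t key_hash, const HashMap& stored_with,
                   const HashMap& lookup_with) {
  return partition_for(key_hash, stored_with) == partition_for(key_hash, lookup_with);
}
```

Either copying the old map into the new definition or reorganizing the data to match the new map makes the two sides agree again, which is what the fix does in the two cases it covers.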
When certain queries generated signals having more than 18 data words prior to a node failure, such signals were not written correctly in the trace file. (Bug #18419554)
Checking of timeouts is handled by the signal TIME_SIGNAL. Previously, this signal was generated by the QMGR kernel block in the main thread, and sent to the QMGR, DBLQH, and DBTC blocks (see NDB Kernel Blocks) as needed to check (respectively) heartbeats, disk writes, and transaction timeouts. In ndbmtd (as opposed to ndbd), these blocks all execute in different threads. This meant that if, for example, QMGR was actively working and some other thread was put to sleep, the previously sleeping thread received a large number of TIME_SIGNAL messages simultaneously when it was woken up again, with the effect that effective times moved very quickly in DBLQH as well as in DBTC. In DBLQH, this had no noticeable adverse effects, but this was not the case in DBTC; the latter block could not work on transactions even though time was still advancing, leading to a situation in which many operations appeared to time out because the transaction coordinator (TC) thread was comparatively slow in answering requests.
In addition, when the TC thread slept for longer than 1500 milliseconds, the data node crashed due to detecting that the timeout handling loop had not yet stopped. To rectify this problem, the generation of the TIME_SIGNAL has been moved into the local threads instead of QMGR; this provides for better control over how quickly TIME_SIGNAL messages are allowed to arrive. (Bug #18417623)
After dropping an NDB table, neither the cluster log nor the output of the REPORT MemoryUsage command showed that the IndexMemory used by that table had been freed, even though the memory had in fact been deallocated. This issue was introduced in MySQL Cluster NDB 7.3.2. (Bug #18296810)
ndb_show_tables sometimes failed with the error message Unable to connect to management server and immediately terminated, without providing the underlying reason for the failure. To provide more useful information in such cases, this program now also prints the most recent error from the Ndb_cluster_connection object used to instantiate the connection. (Bug #18276327)
The -DWITH_NDBMTD=0 build option did not function correctly, which could cause the build to fail on platforms such as ARM and Raspberry Pi, which do not define the memory barrier functions required to compile ndbmtd. (Bug #18267919)
References: See also: Bug #16620938.
The block threads managed by the multi-threading scheduler communicate by placing signals in an out queue or job buffer which is set up between all block threads. This queue has a fixed maximum size, such that when it is filled up, the worker thread must wait for the consumer to drain the queue. In a highly loaded system, multiple threads could end up in a circular wait lock due to full out buffers, such that they were preventing each other from performing any useful work. This condition eventually led to the data node being declared dead and killed by the watchdog timer.
To fix this problem, we detect situations in which a circular wait lock is about to begin, and cause buffers which are otherwise held in reserve to become available for signal processing by queues which are highly loaded. (Bug #18229003)
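The reserve mechanism can be illustrated with a single-threaded C++ sketch. It assumes a hypothetical OutQueue with a fixed capacity, of which a few slots are withheld until the caller indicates that a circular wait is imminent; how ndbmtd actually detects that condition, and the synchronization a real multithreaded queue needs, are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical bounded out-queue between two block threads (single-threaded
// model). `reserve` slots are usable only when a circular wait is imminent.
class OutQueue {
 public:
  OutQueue(std::size_t capacity, std::size_t reserve)
      : capacity_(capacity), reserve_(reserve) {
    assert(reserve_ < capacity_);
  }

  // Returns false when the producer would have to wait for the consumer.
  bool try_push(int signal, bool circular_wait_imminent) {
    std::size_t limit = circular_wait_imminent ? capacity_ : capacity_ - reserve_;
    if (queue_.size() >= limit) return false;
    queue_.push_back(signal);
    return true;
  }

  // Removes the oldest signal, if any.
  bool try_pop(int& signal) {
    if (queue_.empty()) return false;
    signal = queue_.front();
    queue_.pop_front();
    return true;
  }

 private:
  std::size_t capacity_;
  std::size_t reserve_;
  std::deque<int> queue_;
};
```

Keeping the reserve out of normal use means a heavily loaded thread can still make progress and drain its own input, breaking the cycle in which every thread waits on another's full buffer.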
An issue found when compiling the MySQL Cluster software for Solaris platforms could lead to problems when using ThreadConfig on such systems. (Bug #18181656)
The ndb_mgm client START BACKUP command (see Commands in the NDB Cluster Management Client) could experience occasional random failures when a ping was received prior to an expected BackupCompleted event. Now the connection established by this command is not checked until it has been properly set up. (Bug #18165088)
When creating a table with a foreign key referencing an index in another table, it sometimes appeared possible to create the foreign key even if the order of the columns in the indexes did not match, due to the fact that an appropriate error was not always returned internally. This fix improves the error used internally to work in most cases; however, it is still possible for this situation to occur in the event that the parent index is a unique index. (Bug #18094360)
Dropping a nonexistent foreign key on an NDB table (using, for example, ALTER TABLE) appeared to succeed. Now in such cases, the statement fails with a relevant error message, as expected. (Bug #17232212)
Data nodes running ndbmtd could stall while performing an online upgrade of a MySQL Cluster containing a great many tables from a version prior to NDB 7.2.5 to version 7.2.5 or later. (Bug #16693068)
NDB Cluster APIs: When an NDB API client application received a signal with an invalid block or signal number, NDB provided only a very brief error message that did not accurately convey the nature of the problem. Now in such cases, appropriate printouts are provided when a bad signal or message is detected. In addition, the message length is now checked to make certain that it matches the size of the embedded signal. (Bug #18426180)
NDB Cluster APIs: Refactoring that was performed in MySQL Cluster NDB 7.3.4 inadvertently introduced a dependency in Ndb.hpp on a file that is not included in the distribution, which caused NDB API applications to fail to compile. The dependency has been removed. (Bug #18293112, Bug #71803)
References: This issue is a regression of: Bug #17647637.
NDB Cluster APIs: An NDB API application sends a scan query to a data node; the scan is processed by the transaction coordinator (TC). The TC forwards an LQHKEYREQ request to the appropriate LDM, and aborts the transaction if it does not receive an LQHKEYCONF response within the specified time limit. After the transaction is successfully aborted, the TC sends a TCROLLBACKREP to the NDB API client, and the NDB API client processes this message by cleaning up any Ndb objects associated with the transaction.
The client receives the data which it has requested in the form of TRANSID_AI signals, which are buffered for sending at the data node and may be delivered after a delay. On receiving such a signal, NDB checks the transaction state and ID: if these are as expected, it processes the signal using the Ndb objects associated with that transaction.
The current bug occurs when all the following conditions are fulfilled:
The transaction coordinator aborts a transaction due to delays and sends a TCROLLBACKREP signal to the client, while at the same time a TRANSID_AI which has been buffered for delivery at an LDM is delivered to the same client.
The NDB API client considers the transaction complete on receipt of a TCROLLBACKREP signal, and immediately closes the transaction.
The client has a separate receiver thread running concurrently with the thread that is engaged in closing the transaction.
The arrival of the late TRANSID_AI interleaves with the closing of the user thread's transaction such that TRANSID_AI processing passes normal checks before closeTransaction() resets the transaction state and invalidates the receiver.
When these conditions are all met, the receiver thread proceeds to continue working on the TRANSID_AI signal using the invalidated receiver. Since the receiver is already invalidated, its usage results in a node failure.
The Ndb object cleanup done for TCROLLBACKREP now includes invalidation of the transaction ID, so that, for a given transaction, any signal which is received after the TCROLLBACKREP arrives does not pass the transaction ID check and is silently dropped. This fix is also implemented for the TCKEY_FAILREF signal.
See also Operations and Signals, for additional information about NDB messaging. (Bug #18196562)
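The effect of invalidating the transaction ID can be shown with a minimal C++ sketch. It assumes a hypothetical TransactionState and uses 0 as the invalid transaction ID; neither comes from the NDB API. In the real race the check and the invalidation run on different threads, so an implementation must also make them mutually exclusive (for example under a lock); this sketch leaves that out and only shows the ordering rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical receiver-side state; 0 is assumed to mean "no valid transaction".
constexpr uint64_t kInvalidTransId = 0;

struct TransactionState {
  uint64_t trans_id = kInvalidTransId;
  int rows_received = 0;

  // Cleanup on TCROLLBACKREP (or TCKEY_FAILREF): invalidate the ID so that any
  // late TRANSID_AI for this transaction fails the check in on_transid_ai().
  void on_rollback_rep() { trans_id = kInvalidTransId; }

  // Returns true if the TRANSID_AI was processed, false if it was dropped.
  bool on_transid_ai(uint64_t signal_trans_id) {
    if (trans_id == kInvalidTransId || signal_trans_id != trans_id) return false;
    ++rows_received;
    return true;
  }
};
```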
NDB Cluster APIs: The example ndbapi-examples/ndbapi_blob_ndbrecord/main.cpp included an internal header file (ndb_global.h) not found in the MySQL Cluster binary distribution. The example now uses string.h instead of this file. (Bug #18096866, Bug #71409)
NDB Cluster APIs: When Dictionary::dropTable() attempted (as a normal part of its internal operations) to drop an index used by a foreign key constraint, the drop failed. Now in such cases, invoking dropTable() causes all foreign keys on the table to be dropped, whether this table acts as a parent table, child table, or both.
This issue did not affect dropping of indexes using SQL statements. (Bug #18069680)
References: See also: Bug #17591531.
NDB Cluster APIs: ndb_restore could sometimes report Error 701 System busy with other schema operation unnecessarily when restoring in parallel. (Bug #17916243)
Packaging: Compilation of ndbmtd failed on Solaris 10 and 11 for 32-bit
x86, and the binary was not included in the binary distributions for these platforms. (Bug #16620938)
NDB Disk Data: When using Disk Data tables and ndbmtd data nodes, it was possible for the undo buffer to become overloaded, leading to a crash of the data nodes. This issue was more likely to be encountered when using Disk Data columns whose size was approximately 8K or larger. (Bug #16766493)
NDB Cluster APIs: UINT_MAX64 was treated as a signed value by Visual Studio 2010. To prevent this from happening, the value is now explicitly defined as unsigned. (Bug #17947674)
References: See also: Bug #17647637.
Interrupting a drop of a foreign key could cause the underlying table to become corrupt. (Bug #18041636)
Monotonic timers on several platforms can experience issues which might result in the monotonic clock doing small jumps back in time. This is due to imperfect synchronization of clocks between multiple CPU cores and does not normally have an adverse effect on the scheduler and watchdog mechanisms; so we handle some of these cases by making backtick protection less strict, although we continue to ensure that the backtick is less than 10 milliseconds. This fix also removes several checks for backticks which are thereby made redundant. (Bug #17973819)
Under certain specific circumstances, in a cluster having two SQL nodes, one of these could hang, and could not be accessed again even after killing the mysqld process and restarting it. (Bug #17875885, Bug #18080104)
References: See also: Bug #17934985.
Poor support or lack of support on some platforms for monotonic timers caused issues with delayed signal handling by the job scheduler for the multithreaded data node. Variances (timer leaps) on such platforms are now handled by the multithreaded data node process in the same way that they are handled by the single-threaded version. (Bug #17857442)
References: See also: Bug #17475425, Bug #17647637.
In some cases, with ndb_join_pushdown enabled, it was possible to obtain from a valid query the error Got error 290 'Corrupt key in TC, unable to xfrm' from NDBCLUSTER even though the data was not actually corrupted.
It was determined that a NULL value in a VARCHAR column could be used to construct a lookup key, but since NULL is never equal to any other value, such a lookup could simply have been eliminated instead. This NULL lookup in turn led to the spurious error message.
This fix takes advantage of the fact that a key lookup with NULL never finds any matching rows, and so NDB does not try to perform the lookup that would have led to the error. (Bug #17845161)
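The short-circuit can be sketched in C++. This assumes NULL is represented by an empty std::optional and the index is a std::map; neither reflects how NDB encodes keys internally, and the function name is hypothetical. The sketch requires C++17.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical key lookup in which a NULL key value is an empty optional.
// Because NULL never compares equal to anything, the lookup is skipped
// entirely instead of building a key from a NULL value.
std::optional<int> lookup(const std::map<std::string, int>& index,
                          const std::optional<std::string>& key) {
  if (!key) return std::nullopt;  // NULL key: no row can match
  auto it = index.find(*key);
  if (it == index.end()) return std::nullopt;
  return it->second;
}
```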
It was theoretically possible in certain cases for a number of output functions internal to the NDB code to supply an uninitialized buffer as output. Now in such cases, a newline character is printed instead. (Bug #17775602, Bug #17775772)
Use of the localtime() function in NDB multithreading code led to otherwise nondeterministic failures in ndbmtd. This fix replaces this function, which on many platforms uses a buffer shared among multiple threads, with localtime_r(), which writes its result into a buffer supplied by the caller. (Bug #17750252)
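The difference between the two functions can be shown with a short C++ sketch using the POSIX localtime_r(), which fills a caller-owned struct tm instead of returning a pointer to shared static storage. The helper name local_year is hypothetical; on Windows, localtime_s() plays a similar role.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <time.h>

// Thread-safe conversion: the result lives in a local struct tm, so other
// threads calling the same function cannot overwrite it mid-use.
std::string local_year(std::time_t t) {
  std::tm result{};
  if (localtime_r(&t, &result) == nullptr) return "";
  char buf[8] = {};
  std::strftime(buf, sizeof buf, "%Y", &result);
  return buf;
}
```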
When using single-threaded (ndbd) data nodes with RealTimeScheduler enabled, the CPU did not, as intended, temporarily lower its scheduling priority to normal every 10 milliseconds to give other, non-realtime threads a chance to run. (Bug #17739131)
During arbitrator selection, QMGR (see The QMGR Block) runs through a series of states, the first few of which are (in order) START. A check for an arbitration selection timeout occurred in the FIND state, even though the corresponding timer was not set until PREP2 states. Attempting to read the resulting uninitialized timestamp value could lead to false Could not find an arbitrator, cluster is not partition-safe warnings.
This fix moves the setting of the timer for arbitration timeout to the INIT state, so that the value later read during FIND is always initialized. (Bug #17738720)
Timers used in timing scheduler events in the NDB kernel have been refactored, in part to ensure that they are monotonic on all platforms. In particular, on Windows, event intervals were previously calculated using values obtained from GetSystemTimeAsFileTime(), which reads directly from the system time (“wall clock”), and which may arbitrarily be reset backward or forward, leading to false watchdog or heartbeat alarms, or even node shutdown. Lack of timer monotonicity could also cause slow disk writes during backups and global checkpoints. To fix this issue, the Windows implementation now uses a monotonic timer instead of GetSystemTimeAsFileTime(). In the event that a monotonic timer is not found on startup of the data nodes, a warning is logged.
In addition, on all platforms, a check is now performed at compile time for available system monotonic timers, and the build fails if one cannot be found; note that CLOCK_HIGHRES is now supported as an alternative for CLOCK_MONOTONIC if the latter is not available. (Bug #17647637)
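A compile-time clock selection in the spirit of the check described above can be sketched in C++, assuming a POSIX system with clock_gettime(). It follows the same preference order (CLOCK_MONOTONIC, then CLOCK_HIGHRES, else a build failure), but it is not the check NDB's build system actually performs; the macro and function names are hypothetical, and Windows is not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Prefer CLOCK_MONOTONIC, fall back to CLOCK_HIGHRES (Solaris), and fail the
// build if neither is available.
#if defined(CLOCK_MONOTONIC)
#define SKETCH_MONOTONIC_CLOCK CLOCK_MONOTONIC
#elif defined(CLOCK_HIGHRES)
#define SKETCH_MONOTONIC_CLOCK CLOCK_HIGHRES
#else
#error "No monotonic clock available on this platform"
#endif

// Current monotonic time in nanoseconds. Unlike wall-clock time, this value
// does not move when the system time is reset.
uint64_t monotonic_now_ns() {
  timespec ts{};
  int rc = clock_gettime(SKETCH_MONOTONIC_CLOCK, &ts);
  assert(rc == 0);
  (void)rc;  // keep NDEBUG builds free of unused-variable warnings
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ULL +
         static_cast<uint64_t>(ts.tv_nsec);
}
```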
The global checkpoint lag watchdog tracked the number of times a check for GCP lag was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the GCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called.
In addition, any backticks (rare in any case) are now handled by taking the backward time as the new current time and calculating the elapsed time for this round as 0. Finally, any ill effects of a forward leap, which possibly could expire the watchdog timer immediately, are reduced by never calculating an elapsed time longer than the requested delay time for the watchdog timer. (Bug #17647469)
References: See also: Bug #17842035.
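The elapsed-time rules above can be sketched as a small C++ class that is given the current time on each call. The class name and millisecond interface are assumptions for illustration, not the NDB implementation: a backward jump resets the reference point and counts as zero elapsed time, and a forward leap is never credited as more than the requested delay.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical watchdog timer that reads the clock on every call.
class WatchdogTimer {
 public:
  WatchdogTimer(uint64_t start_ms, uint64_t delay_ms)
      : last_ms_(start_ms), delay_ms_(delay_ms) {}

  // Returns the elapsed time credited for this round.
  uint64_t tick(uint64_t now_ms) {
    uint64_t elapsed = 0;
    if (now_ms >= last_ms_) {
      elapsed = now_ms - last_ms_;
      if (elapsed > delay_ms_) elapsed = delay_ms_;  // limit forward leaps
    }
    // On a backtick the backward time simply becomes the new current time.
    last_ms_ = now_ms;
    total_ms_ += elapsed;
    return elapsed;
  }

  uint64_t total_ms() const { return total_ms_; }

 private:
  uint64_t last_ms_;
  uint64_t delay_ms_;
  uint64_t total_ms_ = 0;
};
```

For example, with a 100 ms delay, readings of 1050, 1020, 1070, and 9000 after a start at 1000 are credited as 50, 0, 50, and 100 ms respectively.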
The length of the interval (intended to be 10 seconds) between warnings for GCP_COMMIT when the GCP progress watchdog did not detect progress in a global checkpoint was not always calculated correctly. (Bug #17647213)
Trying to drop an index used by a foreign key constraint caused data node failure. Now in such cases, the statement used to perform the drop fails. (Bug #17591531)
In certain rare cases on commit of a transaction, an Ndb object was released before the transaction coordinator (DBTC kernel block) sent the expected COMMIT_CONF signal; NDB failed to send a COMMIT_ACK signal in response, which caused a memory leak in the NDB kernel that could later lead to node failure.
Now an Ndb object is not released until the COMMIT_CONF signal has actually been received. (Bug #16944817)
Losing its connections to the management node or data nodes while a query against the ndbinfo.memoryusage table was in progress caused the SQL node where the query was issued to fail. (Bug #14483440, Bug #16810415)
The ndbd_redo_log_reader utility now supports a --help option. Using this option causes the program to print basic usage information, and then to exit. (Bug #11749591, Bug #36805)
NDB Cluster APIs: It was possible for an Ndb object to receive signals for handling before it was initialized, leading to thread interleaving and possible data node failure when executing a call to Ndb::init(). To guard against this happening, a check is now made, when the Ndb object starts to receive signals, that it is properly initialized before any signals are actually handled. (Bug #17719439)
NDB Cluster APIs: Compilation of example NDB API program files failed due to missing include directives. (Bug #17672846, Bug #70759)
NDB Cluster APIs: An application, having opened two distinct instances of Ndb_cluster_connection, attempted to use the second connection object to send signals to itself, but these signals were blocked until the destructor was explicitly called for that connection object. (Bug #17626525)
References: This issue is a regression of: Bug #16595838.
Functionality Added or Changed
The length of time a management node waits for a heartbeat message from another management node is now configurable using the HeartbeatIntervalMgmdMgmd management node configuration parameter added in this release. The connection is considered dead after 3 missed heartbeats. The default value is 1500 milliseconds, or a timeout of approximately 6000 ms. (Bug #17807768, Bug #16426805)
The MySQL Cluster Auto-Installer now generates a my.cnf file for each mysqld in the cluster before starting it. For more information, see Using the NDB Cluster Auto-Installer. (Bug #16994782)
Performance: In a number of cases found in various locations in the MySQL Cluster codebase, unnecessary iterations were performed; this was caused by failing to break out of a repeating control structure after a test condition had been met. This community-contributed fix removes the unneeded repetitions by supplying the missing breaks. (Bug #16904243, Bug #69392, Bug #16904338, Bug #69394, Bug #16778417, Bug #69171, Bug #16778494, Bug #69172, Bug #16798410, Bug #69207, Bug #16801489, Bug #69215, Bug #16904266, Bug #69393)
Packaging: Portions of the documentation specific to MySQL Cluster and the NDB storage engine were not included when installing from RPMs. (Bug #16303451)
NDB Disk Data: NDB error 899 RowId already allocated was raised due to a RowId “leak” which occurred under either of the following sets of circumstances:
- Insertion of a row into an in-memory table was rejected after an ordered index update failed due to insufficient
- Insertion of a row into a Disk Data table was rejected due to lack of sufficient table space.
References: See also: Bug #22494024, Bug #13990924.
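The note does not describe the internal mechanism, so the following is only a hypothetical Python model of the general hazard: a RowId is reserved before a later step (such as an ordered index update) rejects the insert, and if the reservation is not handed back, the ID stays marked as in use with no row behind it. The class RowIdAllocator and the function insert_row are invented for illustration, and allocate_at merely mimics the error 899 symptom.

```python
class RowIdAllocator:
    """Hypothetical RowId bookkeeping; not the actual NDB implementation."""

    def __init__(self):
        self.in_use = set()
        self.next_id = 0

    def allocate(self):
        row_id = self.next_id
        self.next_id += 1
        self.in_use.add(row_id)
        return row_id

    def allocate_at(self, row_id):
        # Mimics the symptom: asking for an ID that is still marked in use fails.
        if row_id in self.in_use:
            raise RuntimeError("899: RowId already allocated")
        self.in_use.add(row_id)

    def release(self, row_id):
        self.in_use.discard(row_id)


def insert_row(alloc, later_step_ok, release_on_reject):
    """Reserve a RowId, then run a later step that may reject the insert."""
    row_id = alloc.allocate()
    if not later_step_ok:
        if release_on_reject:
            alloc.release(row_id)  # undo the reservation
        return None
    return row_id


leaky = RowIdAllocator()
insert_row(leaky, later_step_ok=False, release_on_reject=False)
print(leaky.in_use)  # {0}: no row exists, but RowId 0 is still marked in use

fixed = RowIdAllocator()
insert_row(fixed, later_step_ok=False, release_on_reject=True)
print(fixed.in_use)  # set()
fixed.allocate_at(0)  # succeeds, because the rejected insert released RowId 0
```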
ndb_restore could abort during the last stages of a restore using attribute promotion or demotion into an existing table. This could happen if a converted attribute was nullable and the backup had been run on an active database. (Bug #17275798)
It was not possible to start MySQL Cluster processes created by the Auto-Installer on a Windows host running freeSSHd. (Bug #17269626)
The DBUTIL data node block is now less strict about the order in which it receives certain messages from other nodes. (Bug #17052422)
ALTER ONLINE TABLE ... REORGANIZE PARTITION failed when run against a table having or using a reference to a foreign key. (Bug #17036744, Bug #69619)
TUPKEYREQ signals are used to read data from the tuple manager block (DBTUP), and are used for all types of data access, especially for scans which read many rows. A TUPKEYREQ specifies a series of 'columns' to be read, which can be either single columns in a specific table, or pseudocolumns, two of which, including READ_PACKED, are aliases to read all columns in a table, or some subset of these columns. Pseudocolumns are used by modern NDB API applications as they require less space in the TUPKEYREQ to specify columns to be read, and can return the data in a more compact (packed) format.
This fix moves the creation and initialization of on-stack Signal objects to only those pseudocolumn reads which need to EXECUTE_DIRECT to other block instances, rather than for every read. In addition, the size of an on-stack signal is now varied to suit the requirements of each pseudocolumn, so that only reads of the INDEX_STAT pseudocolumn now require initialization (and 3KB memory each time this is performed). (Bug #17009502)
A race condition could sometimes occur when trying to lock receive threads to cores. (Bug #17009393)
Results from joins using an ORDER BY ... DESC clause were not sorted properly; the DESC keyword in such cases was effectively ignored. (Bug #16999886, Bug #69528)
The Windows error ERROR_FILE_EXISTS was not recognized by NDB, which treated it as an unknown error. (Bug #16970960)
RealTimeScheduler did not work correctly with data nodes running ndbmtd. (Bug #16961971)
File system errors occurring during a local checkpoint could sometimes cause an LCP to hang with no obvious cause when they were not handled correctly. Now, such errors always cause the node to fail. Note that the LQH block always shuts down the node when a local checkpoint fails; the change here is to make the likely node failure occur more quickly and to make the original file system error more visible. (Bug #16961443)
Maintenance and checking of parent batch completion in the SPJ block of the NDB kernel was reimplemented. Among other improvements, the completion state of all ancestor nodes in the tree is now preserved. (Bug #16925513)
Dropping a column, which was not itself a foreign key, from an NDB table having foreign keys failed with ER_TABLE_DEF_CHANGED. (Bug #16912989)
The LCP fragment scan watchdog periodically checks for lack of progress in a fragment scan performed as part of a local checkpoint, and shuts down the node if there is no progress after a given amount of time has elapsed. This interval, formerly hard-coded as 60 seconds, can now be configured using the LcpScanProgressTimeout data node configuration parameter added in this release.
This configuration parameter sets the maximum time the local checkpoint can be stalled before the LCP fragment scan watchdog shuts down the node. The default is 60 seconds, which provides backward compatibility with previous releases.
You can disable the LCP fragment scan watchdog by setting this parameter to 0. (Bug #16630410)
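As a rough illustration of the behavior described above, the following Python sketch models a watchdog that is polled periodically with the number of rows scanned so far. The class LcpScanWatchdog and its methods are hypothetical; the constructor argument merely stands in for LcpScanProgressTimeout, including the rule that 0 disables the check. How often the real watchdog polls, and exactly how it measures progress, are assumptions not stated in this note.

```python
class LcpScanWatchdog:
    """Hypothetical model of an LCP fragment scan watchdog; not NDB source code."""

    def __init__(self, progress_timeout_s=60):
        # Stands in for LcpScanProgressTimeout; 0 disables the watchdog.
        self.timeout_s = progress_timeout_s
        self.last_rows = None
        self.last_progress_s = None

    def check(self, now_s, rows_scanned):
        """Called periodically; returns True if the node should be shut down."""
        if self.timeout_s == 0:
            return False  # watchdog disabled
        if rows_scanned != self.last_rows:
            # Progress since the last check (or first check): restart the stall timer.
            self.last_rows = rows_scanned
            self.last_progress_s = now_s
            return False
        # No progress: shut down only once the stall exceeds the timeout.
        return now_s - self.last_progress_s > self.timeout_s


wd = LcpScanWatchdog(progress_timeout_s=60)
print(wd.check(0, 100))    # False: first observation
print(wd.check(30, 100))   # False: stalled for 30 s only
print(wd.check(61, 100))   # True: stalled for more than 60 s
print(LcpScanWatchdog(0).check(10_000, 0))  # False: disabled
```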
Added the ndb_error_reporter options --connection-timeout, which makes it possible to set a timeout for connecting to nodes; --dry-scp, which disables scp connections to remote hosts; and --skip-nodegroup, which skips all nodes in a given node group. (Bug #16602002)
References: See also: Bug #11752792, Bug #44082.
When specifying a backup ID id with START BACKUP, if id had already been used for a backup ID, an error caused by the duplicate ID occurred as expected, but following this, the START BACKUP command never completed. (Bug #16593604, Bug #68854)
ndb_mgm treated backup IDs provided to ABORT BACKUP commands as signed values, so that backup IDs greater than 2^31 wrapped around to negative values. This issue also affected out-of-range backup IDs, which wrapped around to negative values instead of causing errors as expected in such cases. The backup ID is now treated as an unsigned value, and ndb_mgm now performs proper range checking for backup ID values greater than MAX_BACKUPS (2^32). (Bug #16585497, Bug #68798)
When trying to specify a backup ID greater than the maximum allowed, the value was silently truncated. (Bug #16585455, Bug #68796)
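The difference between reading a backup ID as signed or unsigned, and between rejecting and silently truncating an out-of-range value, can be sketched in Python. The helper names are hypothetical, and the upper bound UINT32_MAX is an assumption standing in for the MAX_BACKUPS check mentioned above; this sketch only rejects values that do not fit in an unsigned 32-bit integer.

```python
import struct

UINT32_MAX = 2**32 - 1  # assumption: backup IDs are held in unsigned 32-bit fields


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as signed (the old, buggy reading)."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def parse_backup_id(text):
    """Parse a backup ID as unsigned, rejecting (not truncating) bad values."""
    value = int(text)
    if not 0 <= value <= UINT32_MAX:
        raise ValueError(f"backup ID {value} is out of range")
    return value


print(as_signed_32(2**31))            # -2147483648: wrapped to a negative value
print(parse_backup_id("2147483648"))  # 2147483648: accepted as unsigned
try:
    parse_backup_id(str(2**32))
except ValueError as err:
    print(err)                        # backup ID 4294967296 is out of range
```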
The unexpected shutdown of another data node as a starting data node received its node ID caused the latter to hang in Start Phase 1. (Bug #16007980)
References: See also: Bug #18993037.
The NDB receive thread waited unnecessarily for additional job buffers to become available when receiving data. This caused the receive mutex to be held during this wait, which could result in a busy wait when the receive thread was running with real-time priority.
This fix also handles the case where a negative return value from the initial check of the job buffer by the receive thread prevented further execution of data reception, which could possibly lead to communication blockage or underutilization of the configured ReceiveBufferMemory. (Bug #15907515)
When the available job buffers for a given thread fell below the critical threshold, the internal multi-threading job scheduler waited for job buffers for incoming rather than outgoing signals to become available, which meant that the scheduler waited the maximum timeout (1 millisecond) before resuming execution. (Bug #15907122)
Setting lower_case_table_names to 1 or 2 on Windows systems caused ALTER TABLE ... ADD FOREIGN KEY statements against tables with names containing uppercase letters to fail with Error 155, No such table: '(null)'. (Bug #14826778, Bug #67354)
Under some circumstances, a race occurred in which the wrong watchdog state could be reported. A new state name Packing Send Buffers is added for watchdog state number 11, previously reported as Unknown place. As part of this fix, the state numbers for states without names are now always reported in such cases. (Bug #14824490)
When a node fails, the Distribution Handler (DBDIH kernel block) takes steps together with the Transaction Coordinator (DBTC) to make sure that all ongoing transactions involving the failed node are taken over by a surviving node and either committed or aborted. Transactions taken over which are then committed belong in the epoch that is current at the time the node failure occurs, so the surviving nodes must keep this epoch available until the transaction takeover is complete. This is needed to maintain ordering between epochs.
A problem was encountered in the mechanism intended to keep the current epoch open which led to a race condition between this mechanism and that normally used to declare the end of an epoch. This could cause the current epoch to be closed prematurely, leading to failure of one or more surviving data nodes. (Bug #14623333, Bug #16990394)
Exhaustion of LongMessageBuffer memory under heavy load could cause data nodes running ndbmtd to fail. (Bug #14488185)
When using dynamic listening ports for accepting connections from API nodes, the port numbers were reported to the management server serially. This required a round trip for each API node, causing the time required for data nodes to connect to the management server to grow linearly with the number of API nodes. To correct this problem, each data node now reports all dynamic ports at once. (Bug #12593774)
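Why the old scheme scaled linearly can be shown with a simple cost model in Python. Everything here is hypothetical: the message format, the node IDs, the port numbers, and the 2 ms round-trip time are invented for illustration, and the real data node to management server protocol is not modeled. The point is only that one message per API node costs one round trip each, while a single batched report costs one round trip in total.

```python
RTT_MS = 2.0  # hypothetical round-trip time to the management server


def serial_reports(ports):
    """Old scheme (modeled): one message, and one round trip, per API node."""
    return [f"{node_id}:{port}" for node_id, port in sorted(ports.items())]


def batched_report(ports):
    """New scheme (modeled): every dynamic port in a single message."""
    if not ports:
        return []
    return [",".join(f"{node_id}:{port}" for node_id, port in sorted(ports.items()))]


# 100 hypothetical API nodes (IDs 50-149), each with a made-up port number.
ports = {node_id: 30000 + node_id for node_id in range(50, 150)}
print(len(serial_reports(ports)) * RTT_MS)  # 200.0
print(len(batched_report(ports)) * RTT_MS)  # 2.0
```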
ndb_error_reporter did not support the --help option. (Bug #11756666, Bug #48606)
References: See also: Bug #11752792, Bug #44082.
Formerly, the node used as the coordinator or leader for distributed decision making between nodes (also known as the DICT manager; see The DBDICT Block) was indicated in the output of the ndb_mgm client SHOW command as the “master” node, although this node has no relationship to a master server in MySQL Replication. (It should also be noted that it is not necessary to know which node is the leader except when debugging NDBCLUSTER source code.) To avoid possible confusion, this label has been removed, and the leader node is now indicated in SHOW command output using an asterisk (*) character. (Bug #11746263, Bug #24880)
The matrix of values used for thread configuration when applying the setting of the MaxNoOfExecutionThreads configuration parameter has been improved to align with support for greater numbers of LDM threads. See Multi-Threading Configuration Parameters (ndbmtd), for more information about the changes. (Bug #75220, Bug #20215689)
Program execution failed to break out of a loop after meeting a desired condition in a number of internal methods, performing unneeded work in all cases where this occurred. (Bug #69610, Bug #69611, Bug #69736, Bug #17030606, Bug #17030614, Bug #17160263)
ABORT BACKUP in the ndb_mgm client (see Commands in the NDB Cluster Management Client) took an excessive amount of time to return (approximately as long as the backup would have required to complete, had it not been aborted), and failed to remove the files that had been generated by the aborted backup. (Bug #68853, Bug #17719439)
Note that converted character data is not checked to conform to any character set.
When performing such promotions, the only other sort of type conversion that can be performed at the same time is between character types and binary types.
NDB Cluster APIs: The Event::setTable() method now supports a pointer or a reference to a table as its required argument. If a null table pointer is used, the method now returns -1 to make it clear that this is what has occurred. (Bug #16329082)
Packaging: The MySQL Cluster installer for Windows provided a nonfunctional option to install debug symbols (contained in *.pdb files). This option has been removed from the installer.
Note: You can obtain the *.pdb debug files for a given MySQL Cluster release from the Windows .zip archive for the same release, such as
(Bug #16748308, Bug #69112)
mysql_upgrade failed when upgrading from MySQL Cluster NDB 7.1.26 to MySQL Cluster NDB 7.2.13 when it attempted to invoke a stored procedure before the mysql.proc table had been upgraded. (Bug #16933405)
References: This issue is a regression of: Bug #16226274.
The planned or unplanned shutdown of one or more data nodes while reading table data from the ndbinfo database caused a memory leak. (Bug #16932989)
The way in which DBDIH updated table checkpoint information following a node failure could lead to a data node failure. (Bug #16904469)
In certain cases, when starting a new SQL node, mysqld failed with Error 1427 Api node died, when SUB_START_REQ reached node. (Bug #16840741)
Failure to use container classes specific to NDB during node failure handling could cause leakage of commit-ack markers, which could later lead to resource shortages or additional node crashes. (Bug #16834416)
Use of an uninitialized variable employed in connection with error handling in the DBLQH kernel block could sometimes lead to a data node crash or other stability issues for no apparent reason. (Bug #16834333)
A race condition in the time between the reception of an execNODE_FAILREP signal by the QMGR kernel block and its reception by the DBTC kernel blocks could lead to data node crashes during shutdown. (Bug #16834242)
Using the CLUSTERLOG command (see Commands in the NDB Cluster Management Client) caused ndb_mgm to crash on Solaris SPARC systems. (Bug #16834030)
On Solaris SPARC platforms, batched key access execution of some joins could fail due to invalid memory access. (Bug #16818575)
When NDB tables had foreign key references to each other, it was necessary to drop the tables in the same order in which they were created. (Bug #16817928)
The duplicate weedout algorithm introduced in MySQL 5.6 evaluates semi-joins (such as subqueries using IN) by first performing a normal join between the outer and inner table, which may create duplicates of rows from the outer (and inner) table, and then removing any duplicate result rows from the outer table by comparing their primary key values. Problems could arise when VARCHAR values were copied using their maximum length, resulting in a binary key image which contained garbage past the actual lengths of the VARCHAR values, which meant that multiple instances of the same key were not binary-identical as assumed by the MySQL server.
To fix this problem, NDB now zero-pads such values to the maximum length of the column so that copies of the same key are treated as identical by the weedout process. (Bug #16744050)
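The effect of zero-padding can be seen in a small Python model of building a fixed-size key image in a reused buffer. The function key_image and the buffer contents are hypothetical; real VARCHAR key images also carry a length prefix, which this sketch leaves out so that only the bytes past the actual length are compared.

```python
def key_image(value, max_len, scratch, zero_pad):
    """Return a max_len-byte key image for a VARCHAR value.

    scratch is a reused buffer that may still hold bytes from an earlier row.
    A real key image also carries a length prefix; it is omitted here.
    """
    scratch[: len(value)] = value
    if zero_pad:
        # The fix described above: clear everything past the actual length.
        scratch[len(value) : max_len] = bytes(max_len - len(value))
    return bytes(scratch[:max_len])


# The same key value, built in two buffers with different leftover contents.
a = key_image(b"abc", 8, bytearray(b"previous-row"), zero_pad=False)
b = key_image(b"abc", 8, bytearray(b"other-stuff!"), zero_pad=False)
print(a, b, a == b)  # b'abcvious' b'abcer-st' False

a = key_image(b"abc", 8, bytearray(b"previous-row"), zero_pad=True)
b = key_image(b"abc", 8, bytearray(b"other-stuff!"), zero_pad=True)
print(a == b)  # True: both are b'abc' followed by five zero bytes
```

Without padding, two copies of the same key differ byte for byte, so a binary comparison treats them as distinct rows and duplicates survive the weedout step.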
DROP DATABASE failed to work correctly when executed against a database containing NDB tables joined by foreign key constraints (and all such tables being contained within this database), leaving these tables in place while dropping the remaining tables in the database and reporting failure. (Bug #16692652, Bug #69008)
optimizer_switch system variable, pushed joins could return too many rows. (Bug #16664035)
A variable used by the batched key access implementation was not initialized by NDB as expected. This could cause a “batch full” condition to be reported after only a single row had been batched, effectively disabling batching altogether and leading to an excessive number of round trips between mysqld and NDB. (Bug #16485658)
When started with --initial and an invalid configuration file option (-f), ndb_mgmd removed the old configuration cache before verifying the configuration file. Now in such cases, ndb_mgmd first checks for the file, and continues with removing the configuration cache only if the configuration file is found and is valid. (Bug #16299289)
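The corrected ordering (validate first, discard the cache only afterwards) can be sketched in Python. The function start_initial, the cache directory, and the file names are hypothetical, and Python's configparser is used only as a stand-in for ndb_mgmd's real configuration parser, which performs far more checks than this.

```python
import configparser
import os
import shutil
import tempfile


def start_initial(config_path, cache_dir):
    """Hypothetical --initial handling: validate first, clear the cache second."""
    if not os.path.isfile(config_path):
        raise FileNotFoundError(config_path)
    parser = configparser.ConfigParser(strict=False)
    parser.read(config_path)  # raises configparser.Error on malformed input
    if not parser.sections():
        raise ValueError(f"{config_path} defines no sections")
    # Only now, with a usable configuration in hand, discard the old cache.
    shutil.rmtree(cache_dir, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "mgmd-cache")
    os.mkdir(cache)
    try:
        start_initial(os.path.join(tmp, "missing.ini"), cache)
    except FileNotFoundError:
        pass
    cache_kept = os.path.isdir(cache)  # True: a bad -f value leaves the cache alone

    good = os.path.join(tmp, "config.ini")
    with open(good, "w") as f:
        f.write("[ndbd default]\nNoOfReplicas=2\n")
    start_initial(good, cache)
    cache_removed = not os.path.isdir(cache)  # True
```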
Creating more than 32 hash maps caused data nodes to fail. Usually, new hash maps are created only when performing reorganization after data nodes have been added or when explicit partitioning is used, such as when creating a table with the MAX_ROWS option, or using PARTITION BY KEY() PARTITIONS. (Bug #14710311)
Setting foreign_key_checks = 0 had no effect on the handling of NDB tables. Now, doing so causes such checks of foreign key constraints to be suspended; that is, it has the same effect on NDB tables as it has on InnoDB tables. (Bug #14095855, Bug #16286309)
References: See also: Bug #16286164.
NDB Disk Data: The statements ALTER LOGFILE GROUP, and ALTER TABLESPACE failed with a syntax error when INITIAL_SIZE was specified using letter abbreviations such as G. In addition, CREATE LOGFILE GROUP failed when UNDO_BUFFER_SIZE, or both options were specified using letter abbreviations. (Bug #13116514, Bug #16104705, Bug #62858)
NDB Cluster APIs: For each log event retrieved using the MGM API, the log event category (ndb_mgm_event_category) was simply cast to an enum type, which resulted in invalid category values. Now an offset is added to the category following the cast to ensure that the value does not fall out of the allowed range.
Note: This change was reverted by the fix for Bug #18354165. See the MySQL Cluster API Developer documentation for ndb_logevent_get_next(), for more information.
References: See also: Bug #18354165.
NDB Cluster APIs: The Ndb::computeHash() API method performs a malloc() if no buffer is provided for it to use. However, it was assumed that the memory thus returned would always be suitably aligned, which is not always the case. Now when malloc() provides a buffer to this method, the buffer is aligned after it is allocated, and before it is used. (Bug #16484617)
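A common way to align a buffer after allocation is to over-allocate by alignment - 1 bytes and round the start address up. The Python sketch below demonstrates that general technique with ctypes; it is not the actual Ndb::computeHash() code (which is C++), and the helper names align_up and aligned_buffer are hypothetical.

```python
import ctypes


def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def aligned_buffer(size, alignment):
    """Over-allocate, then return (raw_buffer, offset) of an aligned region.

    Requesting alignment - 1 extra bytes guarantees that an aligned start
    address with room for `size` bytes exists inside the raw allocation.
    """
    raw = ctypes.create_string_buffer(size + alignment - 1)
    base = ctypes.addressof(raw)
    offset = align_up(base, alignment) - base
    return raw, offset  # keep `raw` alive for as long as the region is used


print(align_up(13, 8), align_up(16, 8))  # 16 16
raw, offset = aligned_buffer(64, 16)
print((ctypes.addressof(raw) + offset) % 16)  # 0
```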
Based on MySQL Server 5.6
Important Change: MySQL Cluster SQL nodes are now based on MySQL Server 5.6. For information about feature additions and other changes made in MySQL 5.6, see What Is New in MySQL 5.6.
The mysqld binary provided with MySQL Cluster NDB 7.3.1 is based on MySQL Server 5.6.10, and includes all MySQL Server 5.6 feature enhancements and bug fixes found in that release; see Changes in MySQL 5.6.10 (2013-02-05, General Availability), for information about these.
MySQL Cluster GUI Configuration Wizard
Important Change: The MySQL Cluster distribution now includes a browser-based graphical configuration wizard that assists the user in configuring and deploying a MySQL Cluster. This deployment can consist of an arbitrary number of nodes (within certain limits) on the user machine only, or include nodes distributed on a local network. The wizard can be launched from the command line (using the ndb_setup utility now included in the binary distribution) or a desktop file browser.
For more information about this tool, see The NDB Cluster Auto-Installer.
Support for Foreign Key Constraints
Important Change: MySQL Cluster now supports foreign key constraints between NDB tables, including support for SET NULL, and NO ACTION reference options for UPDATE actions. (MySQL currently does not support
MySQL generally requires that all child and parent tables in foreign key relationships employ the same storage engine; thus, to use foreign keys with MySQL Cluster tables, the child and parent table must each use the NDB storage engine. (It is not possible, for example, for a foreign key on an NDB table to reference an index of an
Note that MySQL Cluster tables that are explicitly partitioned by LINEAR KEY may contain foreign key references or be referenced by foreign keys (or both). This is unlike the case with InnoDB tables that are user partitioned, which may not have any foreign key relationships.
You can create an NDB table having a foreign key reference on another NDB table using CREATE TABLE ... [CONSTRAINT] FOREIGN KEY ... REFERENCES. A child table's foreign key definitions can be seen in the output of SHOW CREATE TABLE; you can also obtain information about foreign keys by querying the
ndb adapter, which uses the NDB API to provide high-performance native access to MySQL Cluster; and the mysql-js adapter, which uses the MySQL Server and the node-mysql driver available from https://github.com/felixge/node-mysql/.
The node-mysql driver is also required for the
Functionality Added or Changed
Important Change: The behavior of and values used for the TCP_SND_BUF_SIZE TCP configuration parameters have been improved. Formerly, the default values for these parameters were 70080 and 71540, respectively, with the minimum for each of them being 1; it was later found that these defaults could lead to excessive timeouts in some circumstances. Now, the default and recommended value is 0 for both TCP_SND_BUF_SIZE, which allows the operating system or platform to choose the send or receive buffer size for TCP sockets. (Bug #14554519)
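The meaning of the value 0 can be illustrated with Python's socket module: the socket options SO_SNDBUF and SO_RCVBUF are set only when a nonzero size is configured, and otherwise the operating system's own choice is left in place. The function apply_buffer_sizes is hypothetical; it mirrors the semantics described for TCP_SND_BUF_SIZE and its receive-side counterpart TCP_RCV_BUF_SIZE, not the actual NDB transporter code.

```python
import socket


def apply_buffer_sizes(sock, rcv_buf_size=0, snd_buf_size=0):
    """Set SO_RCVBUF/SO_SNDBUF only for nonzero sizes; 0 leaves the OS default."""
    if rcv_buf_size > 0:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcv_buf_size)
    if snd_buf_size > 0:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, snd_buf_size)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    os_default = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    apply_buffer_sizes(s)  # both sizes 0: the OS keeps its own choice
    unchanged = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF) == os_default
print(unchanged)  # True
```

Note that some platforms adjust explicitly requested sizes (Linux, for example, doubles the value passed to SO_SNDBUF), which is one reason deferring to the platform can be the safer default.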
References: See also: Bug #14168828.
NDB Cluster APIs: Added DUMP code 2514, which provides information about counts of transaction objects per API node. For more information, see DUMP 2514. See also Commands in the NDB Cluster Management Client. (Bug #15878085)
When ndb_restore fails to find a table, it now includes in the error output an NDB API error code giving the reason for the failure. (Bug #16329067)
Data node logs now provide tracking information about arbitrations, including which nodes have assumed the arbitrator role and at what times. (Bug #11761263, Bug #53736)
API: mysqld failed to respond when mysql_shutdown() was invoked from a C application, or mysqladmin shutdown was run from the command line. (Bug #14849574)
When an update of an NDB table changes the primary key (or part of the primary key), the operation is executed as a delete plus an insert. In some cases, the initial read operation did not retrieve all column values required by the insert, so that another read was required. This fix ensures that all required column values are included in the first read in such cases, which saves the overhead of an additional read operation. (Bug #16614114)
Pushed joins executed when optimizer_switch='batched_key_access=on' was also in use returned incorrect results. (Bug #16437431)
Selecting from the INFORMATION_SCHEMA.KEY_COLUMN_USAGE table while using tables with foreign keys caused mysqld to crash. (Bug #16246874, Bug #68224)
Including a table as a part of a pushed join should be rejected if there are outer joined tables in between the table to be included and the tables with which it is joined; however, the check for any such outer joined tables compared the join type against the root of the pushed query, rather than against the common ancestor of the tables being joined. (Bug #16199028)
References: See also: Bug #16198866.
Some queries were handled differently with ndb_join_pushdown enabled, due to the fact that outer join conditions were not always pruned correctly from joins before they were pushed down. (Bug #16198866)
References: See also: Bug #16199028.
Attempting to perform additional operations such as ADD COLUMN as part of an ALTER [ONLINE | OFFLINE] TABLE ... RENAME ... statement is not supported, and now fails with an ER_NOT_SUPPORTED_YET error. (Bug #16021021)
Purging the binary logs could sometimes cause mysqld to crash. (Bug #15854719)