Important Change; Packaging: Naming and organization of the RPMs provided for MySQL NDB Cluster have been changed to align with those released for the MySQL server. All MySQL NDB Cluster RPM package names are now prefixed with mysql-cluster. Data nodes are now installed using the data-node package; management nodes are now installed from the management-server package; and SQL nodes require the common packages. Important: SQL nodes must use the mysql-cluster version of these RPMs; the versions released for the standard MySQL server do not provide support for the NDB storage engine. All client programs, including both the mysql client and the ndb_mgm management client, are now included in the
For more information, see Installing NDB Cluster from RPM.
Important Change: Added the
FOR_RA_BY_LDM_X_4, which can be used to set the number of partitions used by each local data manager to two, three, and four partitions, respectively, in addition to
FOR_RA_BY_LDM, which sets this number to one.
Use of MAX_ROWS for setting the number of partitions used by NDB tables is now deprecated and subject to removal in a future MySQL NDB Cluster version.
For more information, see Setting NDB_TABLE Options. (Bug #81759, Bug #23544301)
Added the --print-sql-log option for the ndb_restore program included with the MySQL NDB Cluster distribution. This option causes the program to log SQL statements to
Note that each table being restored in this fashion must have an explicitly defined primary key; the hidden primary key implemented by the NDB storage engine is not sufficient for this purpose. (Bug #13511949)
For fully replicated tables, ndb_desc shows only nodes holding main fragment replicas for partitions; nodes with copy fragment replicas only are ignored. To make this information available in the mysql client, several new tables have been introduced in the ndbinfo information database. These tables are listed here, with brief descriptions:
dict_obj_info provides the names and types of database (DICT) objects in NDB, such as tables and indexes, as well as information about parent objects where applicable
NDB table distribution status information
table_fragments provides information about the distribution of
table_info provides information about logging, checkpointing, storage, and other options in force for each
table_replicas provides information about fragment replicas
For more information, see ndbinfo: The NDB Cluster Information Database. (Bug #81762, Bug #23547643)
Important Change: The default value of the --ndb-default-column-format server option has been changed to FIXED. This has been done for backwards compatibility. Only the default has been changed; setting this option to DYNAMIC continues to cause DYNAMIC to be used for COLUMN_FORMAT unless overridden. (Bug #24487363)
Important Change: Event buffer status reporting has been improved by altering the semantics for calculating lag or slippage. Rather than defining this lag as the number of epochs behind, lag is now taken as the number of epochs completely buffered in the event buffer, but not yet consumed by the binlog injector thread. As part of this work, the default value for the ndb_report_thresh_binlog_epoch_slip system variable has been increased from 3 to 10. For more information, see the description of this variable in the documentation, as well as Event Buffer Reporting in the Cluster Log. (Bug #22916457)
References: See also: Bug #22901309.
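The revised lag definition can be illustrated with a minimal Python sketch. The function and parameter names here are hypothetical and are not part of NDB; only the default threshold of 10 comes from the text above. The sketch also assumes that a report becomes due once the slip reaches the threshold, which the note does not state explicitly.

```python
def epoch_slip(buffered_epochs, last_consumed_epoch):
    """Count epochs that are completely buffered but not yet consumed.

    buffered_epochs: iterable of epoch numbers fully present in the buffer.
    last_consumed_epoch: newest epoch already handled by the injector thread.
    """
    return sum(1 for epoch in buffered_epochs if epoch > last_consumed_epoch)


def should_report(buffered_epochs, last_consumed_epoch, threshold=10):
    # Assumption: report when the slip is at least the threshold value.
    return epoch_slip(buffered_epochs, last_consumed_epoch) >= threshold
```

For example, with epochs 100 through 111 buffered and epoch 101 consumed, ten epochs are waiting, so this sketch would report under the new default but would already have reported under the old default of 3.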
NDB Cluster APIs: Reuse of transaction IDs could occur when Ndb objects were created and deleted concurrently. As part of this fix, the NDB API methods unlock_ndb_objects are now declared as const. (Bug #23709232)
NDB Cluster APIs: When the management server was restarted while running an MGM API application that continuously monitored events, subsequent events were not reported to the application, with timeouts being returned indefinitely instead of an error.
This occurred because sockets for event listeners were not closed when restarting mgmd. This is fixed by ensuring that event listener sockets are closed when the management server shuts down, causing applications using functions such as ndb_logevent_get_next() to receive a read error following the restart. (Bug #19474782)
NDB Cluster APIs: To process incoming signals, a thread which wants to act as a receiver must acquire polling rights from the transporter layer. This can be requested and assigned to a separate receiver thread, or each client thread can take the receiver role when it is waiting for a result.
When the thread acting as poll owner receives a sufficient amount of data, it releases locks on any other clients taken while delivering signals to them. This could make them runnable again, and the operating system scheduler could decide that it was time to wake them up, which happened at the expense of the poll owner threads, which were in turn excluded from the CPU while still holding polling rights on it. After this fix, polling rights are released by a thread before unlocking and signalling other threads. This makes polling rights available for other threads that are actively executing on this CPU.
This change increases concurrency when polling receiver data, which should also reduce latency for clients waiting to be woken up. (Bug #83129, Bug #24716756)
NDB Cluster APIs: libmysqlclient exported conflicting symbols, resulting in a segmentation fault in debug builds on Linux. To fix this issue, the conflicting symbols in libndbclient.so are no longer publicly visible. Due to this change, the version number for libndbclient.so has been raised from 6.0.0 to 6.1.0. (Bug #83093, Bug #24707521)
References: See also: Bug #80352, Bug #22722555.
NDB Cluster APIs: When NDB schema object ownership checks are enabled by a given NdbTransaction, objects used by this transaction are checked to make sure that they belong to the NdbDictionary owned by this connection. An attempt to create an NdbIndexScanOperation on a table or index not belonging to the same connection fails.
This fix corrects a resource leak which occurred when the operation object to be created was allocated before checking schema object ownership and subsequently not released when the object creation failed. (Bug #81949, Bug #23623978)
References: See also: Bug #81945, Bug #23623251.
NDB Cluster APIs: NDB API objects are allocated in the context of an Ndb object, or of an NdbTransaction object which is itself owned by an Ndb object. When a given Ndb object is destroyed, all remaining NdbTransaction objects are terminated, and all NDB API objects related to this Ndb object should be released at this time as well. It was found that, when unclosed NdbTransaction objects remained at the time their parent Ndb object was destroyed, objects allocated from those NdbTransaction objects could leak. (However, the NdbTransaction objects themselves did not leak.)
While it is advisable to close an NdbTransaction explicitly as soon as its lifetime ends, the destruction of the parent Ndb object should be sufficient to release whatever objects are dependent on it. Now in cases such as described previously, the Ndb destructor checks to ensure that all objects derived from a given Ndb instance are truly released. (Bug #81945, Bug #23623251)
NDB Cluster APIs: The term “fragment count type” has been superseded by “partition balance”. This change affects NDB tables as well as the NDB API. In NDB_TABLE table option syntax, the FRAGMENT_COUNT_TYPE keyword is replaced with PARTITION_BALANCE. In the NDB API, setFragmentCountType() has been renamed in line with the new term, and getFragmentCountTypeString() is renamed to getPartitionBalanceString(). In addition, Object::FragmentCountType has been renamed to PartitionBalance, and the names of its enumerated values have been updated to be consistent with the new nomenclature.
For more information on how these changes affect NDB API applications, see the indicated Object member descriptions. For more information on the SQL-level changes made as part of this fix, see Setting NDB_TABLE Options. (Bug #81761, Bug #23547525)
References: See also: Bug #83147, Bug #24733331.
NDB Cluster APIs: In some of the NDB API example programs included with the MySQL NDB Cluster distribution, ndb_end() was called prior to calling the Ndb_cluster_connection destructor. This caused a segmentation fault in debug builds on all platforms. The example programs affected have also been extensively revised and refactored. See NDB API Examples for more information. (Bug #80352, Bug #22722555)
References: See also: Bug #83093, Bug #24707521.
If more than 4096 seconds elapsed while calculating an internal NdbDuration::microSec() value, this could cause an assert warning that the calculation would overflow. We fix this to avoid any overflow or precision loss when converting from the internal “tick” format to microseconds and nanoseconds, by performing the calculation in two parts corresponding to seconds and fractions of a second. (Bug #24695026)
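The two-part conversion described above can be sketched in Python as follows. This is an illustration only, not the NDB implementation: the tick frequency of one billion ticks per second and the function names are assumptions, and Python integers do not overflow, so each function also returns its largest intermediate value for comparison against the signed 64-bit limit.

```python
INT64_MAX = 2**63 - 1
TICKS_PER_SEC = 1_000_000_000  # hypothetical tick frequency (assumption)


def to_micros_naive(ticks):
    """Multiply first, then divide; the product can exceed 64 bits."""
    intermediate = ticks * 1_000_000
    return intermediate // TICKS_PER_SEC, intermediate


def to_micros_split(ticks):
    """Convert whole seconds and the fractional remainder separately."""
    secs, frac = divmod(ticks, TICKS_PER_SEC)
    whole = secs * 1_000_000                    # microseconds from full seconds
    part = frac * 1_000_000 // TICKS_PER_SEC    # microseconds from the fraction
    return whole + part, max(whole, frac * 1_000_000)
```

Both functions return the same number of microseconds, because the whole-second contribution is an exact integer; the difference is that the split version keeps every intermediate value far below the 64-bit limit, even for long durations.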
The serial commit protocol, which commits each operation at each replica individually and serially, and is used by the DBTC kernel block (see The DBTC Block) for takeover and when a transaction is judged to have timed out during the COMPLETE phase, had no support for LATE_COMMIT, which is required for the FULLY_REPLICATED protocols. (Bug #24681305)
In some cases, ALTER TABLE ... REORGANIZE PARTITION could lead to an unplanned shutdown of the cluster. This was due to the fact that, for fully replicated tables, the log part ID was assumed to be the same as the partition ID. This worked when FOR_RA_BY_LDM was used, but not necessarily for the other partition balancing types. (Bug #24610551)
Using ALGORITHM=INPLACE when changing any of a table's NDB_TABLE properties (see Setting NDB_TABLE Options) caused the server to fail. (Bug #24584741)
Case-insensitivity of keywords in NDB_TABLE comments was not honored. (Bug #24577931)
A number of dependencies between the binlog injector thread and the NDB utility thread, a recurring source of synchronization and other problems, were removed. The principal changes are listed here:
Moved the setup of binlog injector structures from the utility thread to the injector thread itself.
Removed sharing of some utility and injector thread structures between these threads.
Moved stopping of the utility thread from the injector thread into a common block in which other such threads are stopped.
Removed a number of hacks required by the previous design.
Removed some injector mutex locking and injector condition signaling which were made obsolete by the changes already listed.
References: See also: Bug #22204186.
A late commit ACK signal used for READ_BACKUP tables caused the associated ApiConnectionRecord to have an invalid state. (Bug #24459817)
References: See also: Bug #24444861.
Added missing error information for a failure occurring when tables on disk became full. (Bug #24425373)
When ndbmtd crashed, the resulting error log incorrectly specified the name of the trace for thread 0, appending the nonexistent suffix _t0 to the file name. (Bug #24353408)
Passing a nonexistent node ID to CREATE NODEGROUP led to random data node failures. (Bug #23748958)
DROP TABLE followed by a node shutdown and subsequent master takeover, with the containing local checkpoint not yet complete prior to the takeover, caused the LCP to be ignored and, in some cases, the data node to fail. (Bug #23735996)
References: See also: Bug #23288252.
Removed an invalid assertion to the effect that all cascading child scans are closed at the time API connection records are released following an abort of the main transaction. The assertion was invalid because closing of scans in such cases is by design asynchronous with respect to the main transaction, which means that subscans may well take some time to close after the main transaction is closed. (Bug #23709284)
Although arguments to the DUMP command are 32-bit integers, ndb_mgmd used a buffer of only 10 bytes when processing them. (Bug #23708039)
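The arithmetic behind this problem can be checked with a short Python sketch. It assumes, hypothetically, that each argument is handled as decimal text in a NUL-terminated C string; the note itself does not describe the buffer's exact use, and the helper name is illustrative.

```python
def bytes_needed(value):
    """Bytes for the decimal text of value plus one terminating NUL byte."""
    return len(str(value)) + 1
```

Under that assumption, the largest unsigned 32-bit value (4294967295) needs 11 bytes and the smallest signed 32-bit value (-2147483648) needs 12, so a 10-byte buffer is too small for either extreme.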
The READ_BACKUP setting was not honored when performing scans on BLOB tables. (Bug #23703536)
The READ_BACKUP setting was not applied to unique indexes. (Bug #23702848)
DBSPJ read primary fragment replicas for tables with READ_BACKUP (see Setting NDB_TABLE Options), even when a local fragment was available. (Bug #23633848)
ALL REPORT MemoryUsage produced incorrect output when fully replicated tables were in use. (Bug #23539805)
Ordered indexes did not inherit READ_BACKUP (see Setting NDB_TABLE Options) from an indexed table, which meant that ordered index scans continued to be routed only to primary fragment replicas and never to backup fragment replicas. DBDICT now sets this property on ordered indexes from the table property when it distributes this information to instances of DBSPJ. (Bug #23522027)
Updates to a table containing a virtual column could cause the binary logging thread to fail. (Bug #23514050)
A number of potential buffer overflow issues were found and fixed in the NDB codebase. (Bug #23152979)
During an online upgrade from a MySQL NDB Cluster 7.3 release to an NDB 7.4 (or later) release, the failures of several data nodes running the lower version during local checkpoints (LCPs), and just prior to upgrading these nodes, led to additional node failures following the upgrade. This was due to lingering elements of the EMPTY_LCP protocol initiated by the older nodes as part of an LCP-plus-restart sequence, and which is no longer used in NDB 7.4 and later due to LCP optimizations implemented in those versions. (Bug #23129433)
The SIGNAL_DROPPED_REP handler invoked in response to long message buffer exhaustion was defined in the SPJ kernel block, but not actually used. This meant that the default handler from SimulatedBlock was used instead in such cases, which shut down the data node. (Bug #23048816)
References: See also: Bug #23251145, Bug #23251423.
When a data node has insufficient redo buffer during a system restart, it does not participate in the restart until after the other nodes have started. After this, it performs a takeover of its fragments from the nodes in its node group that have already started; during this time, the cluster is already running and user activity is possible, including DML and DDL operations.
During a system restart, table creation is handled differently in the DIH kernel block than normally, as this creation actually consists of reloading table definition data from disk on the master node. Thus, DIH assumed that any table creation that occurred before all nodes had restarted must be related to the restart and thus always on the master node. However, during the takeover, table creation can occur on non-master nodes due to user activity; when this happened, the cluster underwent a forced shutdown.
Now an extra check is made during system restarts to detect in such cases whether the executing node is the master node, and use that information to determine whether the table creation is part of the restart proper, or is taking place during a subsequent takeover. (Bug #23028418)
ndb_restore set the MAX_ROWS attribute for a table for which it had not been set prior to taking the backup. (Bug #22904640)
Whenever data nodes are added to or dropped from the cluster, the NDB kernel's Event API is notified of this using a SUB_GCP_COMPLETE_REP signal with either the ADD (add) flag or SUB (drop) flag set, as well as the number of nodes to add or drop; this allows NDB to maintain a correct count of SUB_GCP_COMPLETE_REP signals pending for every incomplete bucket. In addition to handling the bucket for the epoch associated with the addition or removal, it must also compensate for any later incomplete buckets associated with later epochs. Although it was possible to complete such buckets out of order, there was no handling of these, leading to a stall in event reception.
This fix adds detection and handling of such out of order bucket completion. (Bug #20402364)
References: See also: Bug #82424, Bug #24399450.
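The bookkeeping described in this item can be illustrated with a small Python model. This is a sketch under stated assumptions and not the NDB Event API code: the class and method names are hypothetical, and the policy of holding a bucket that completes early until all earlier buckets are complete is an assumed way of handling out-of-order completion.

```python
class EpochBuckets:
    """Toy model of per-epoch buckets awaiting completion signals."""

    def __init__(self):
        self.pending = {}    # epoch -> number of signals still expected
        self.delivered = []  # epochs handed on, always in epoch order

    def open(self, epoch, expected):
        self.pending[epoch] = expected

    def adjust_from(self, epoch, delta):
        """Nodes added (delta > 0) or dropped (delta < 0): compensate the
        bucket for this epoch and every later incomplete bucket."""
        for e in self.pending:
            if e >= epoch:
                self.pending[e] += delta
        self._flush()

    def signal(self, epoch):
        self.pending[epoch] -= 1
        self._flush()

    def _flush(self):
        # Deliver completed buckets strictly in epoch order; a bucket that
        # completed early waits here until all earlier ones are done.
        for e in sorted(self.pending):
            if self.pending[e] > 0:
                break
            del self.pending[e]
            self.delivered.append(e)
```

In this model, a later bucket that reaches zero before an earlier one is simply retained, and both are delivered together once the earlier bucket completes, so reception does not stall.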
When performing online reorganization of tables, unique indexes were not included in the reorganization. (Bug #13714258)
Under very high loads, neither the receive thread nor any user thread had sufficient capacity to handle poll ownership properly. This meant that, as the load and the number of active threads increased, it became more difficult to sustain throughput. This fix attempts to increase the priority of the receive thread and retains poll ownership if successful.
This fix requires sufficient permissions to be enabled. On Linux systems, this means ensuring that either the data node binary or the user it runs as has permission to change the nice level. (Bug #83217, Bug #24761073)
When restoring a backup taken from a database containing tables that had foreign keys, ndb_restore disabled the foreign keys for data, but not for the logs. (Bug #83155, Bug #24736950)
Local reads of unique index and blob tables did not work correctly for fully replicated tables using more than one node group. (Bug #83016, Bug #24675602)
The effects of an ALTER TABLE statement changing a table to use READ_BACKUP were not preserved after a restart of the cluster. (Bug #82812, Bug #24570439)
PARTITION_BALANCE did not work with fully replicated tables. (Bug #82801, Bug #24565265)
READ_BACKUP settings were not propagated to internal blob tables. (Bug #82788, Bug #24558232)
The count displayed by the c_exec column in the ndbinfo.threadstat table was incomplete. (Bug #82635, Bug #24482218)
The internal function ndbcluster_binlog_wait(), which provides a way to make sure that all events originating from a given thread arrive in the binary log, is used by SHOW BINLOG EVENTS as well as when resetting the binary log. This function waits on an injector condition while the latest global epoch handled by NDB is more recent than the epoch last committed in this session, which implies that this condition must be signalled whenever the binary log thread completes and updates a new latest global epoch. Inspection of the code revealed that this condition signalling was missing, and that, instead of being awakened whenever a new latest global epoch completes (~100ms), client threads waited for the maximum timeout (1 second).
This fix adds the missing injector condition signalling, while also changing it to a condition broadcast to make sure that all client threads are alerted. (Bug #82630, Bug #24481551)
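The difference a broadcast makes can be sketched with Python's standard threading module. This is an analogy only, not the server code: the EpochTracker class and its methods are hypothetical, and the sketch assumes a waiter is satisfied once the handled epoch reaches the epoch it is waiting for.

```python
import threading


class EpochTracker:
    """Toy stand-in for the injector condition and latest handled epoch."""

    def __init__(self):
        self._cond = threading.Condition()
        self._latest_epoch = 0

    def advance(self, epoch):
        # Called by the (simulated) binary log thread for each new epoch.
        with self._cond:
            self._latest_epoch = epoch
            self._cond.notify_all()  # broadcast: wake every waiting client

    def wait_for(self, epoch, timeout=1.0):
        # Returns True once the epoch has been handled, False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._latest_epoch >= epoch, timeout
            )
```

Without the notify_all() call, each waiter in this sketch would return only after its full timeout, which mirrors the one-second waits described above; with it, all waiters return as soon as the epoch advances.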
Fully replicated internal foreign key or unique index triggers could fire multiple times, which led to aborted transactions for an insert or a delete operation. This happened due to redundant deferred constraint triggers firing during pre-commit. Now in such cases, we ensure that only triggers specific to unique indexes are fired in this stage. (Bug #82570, Bug #24454378)
Backups potentially could fail when using fully replicated tables due to their high usage (and subsequent exhaustion) of internal trigger resources. To compensate for this, the amount of memory reserved in the NDB kernel for internal triggers has been increased, and is now based in part on the maximum number of tables. (Bug #82569, Bug #24454262)
References: See also: Bug #23539733.
In the NDB kernel, an incorrect check of state led in some cases to failure handling when no failure had actually occurred. (Bug #82568, Bug #24454093)
References: See also: Bug #23539733.
When returning from LQHKEYREQ with failure in LQHKEYREF in an internal trigger operation, no check was made as to whether the trigger was fully replicated, so that those triggers that were fully replicated were never handled. (Bug #82566, Bug #24453949)
References: See also: Bug #23539733.
When READ_BACKUP had not previously been set, then was set to 1 as part of an ALTER TABLE ... ALGORITHM=INPLACE statement, the change was not propagated to internal unique index tables or BLOB tables. (Bug #82491, Bug #24424459)
Distribution of MySQL privileges was incomplete due to the failure of the mysql_cluster_move_privileges() procedure to convert one of the privilege tables to NDB. The root cause of this was an ALTER TABLE ... ENGINE NDB statement which sometimes failed when this table contained illegal TIMESTAMP values. (Bug #82464, Bug #24430209)
The internal variable m_max_warning_level was not initialized in storage/ndb/src/kernel/blocks/thrman.cpp. This sometimes led to node failures during a restart when the uninitialized value was treated as 0. (Bug #82053, Bug #23717703)
Usually, when performing a system restart, all nodes are restored from redo logs and local checkpoints (LCPs), but in some cases some node might require a copy phase before it is finished with the system restart. When this happens, the node in question waits for all other nodes to start up completely before performing the copy phase. Notwithstanding the fact that it is thus possible to begin a local checkpoint before reaching start phase 4 in the DBDIH block, LCP status was initialized to IDLE in all cases, which could lead to a node failure. Now, when performing this variant of a system restart, the LCP status is no longer initialized. (Bug #82050, Bug #23717479)
After adding a new node group online and executing ALTER TABLE ... ALGORITHM=INPLACE REORGANIZE PARTITION, partition IDs were not set correctly for new fragments.
In a related change done as part of fixing this issue, ndb_desc -p now displays rows relating to partitions in order of partition ID. (Bug #82037, Bug #23710999)
When executing STOP BACKUP, it is sometimes possible that a few bytes are written to the backup data file before the backup process actually terminates. When using O_DIRECT, this resulted in the wrong error code being returned. Now in such cases, nothing is written to O_DIRECT files unless the alignment is correct. (Bug #82017, Bug #23701911)
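The kind of alignment check implied here can be sketched in Python. This is not the NDB backup code: the function name is hypothetical, and the block size of 512 bytes is an assumption, since the actual requirement for O_DIRECT depends on the file system and device.

```python
def can_write_direct(offset, length, block_size=512):
    """Return True only if both the file offset and the write length are
    whole multiples of block_size (assumed alignment unit)."""
    return length > 0 and offset % block_size == 0 and length % block_size == 0
```

Under this check, a stray write of a few bytes at shutdown is rejected before it reaches the file, rather than failing in the operating system with a less informative error.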
When transaction coordinator (TC) connection records were used up, it was possible to handle scans only for local checkpoints and backups, so that operations coming from the DBUTIL block (used for ALTER TABLE ... REORGANIZE PARTITION and other operations that reorganize metadata) were unnecessarily blocked. In addition, such operations were not always retried when TC records were exhausted. To fix this issue, a number of operation records are now earmarked for DBUTIL usage, as well as for LCP and backup usage so that these operations are also not negatively impacted by operations coming from DBUTIL.
For more information, see The DBUTIL Block. (Bug #81992, Bug #23642198)
Operations performing multiple updates of the same row within the same transaction could sometimes lead to corruption of lengths of page entries. (Bug #81938, Bug #23619031)
During a node restart, a fragment can be restored using information obtained from local checkpoints (LCPs); up to 2 restorable LCPs are retained at any given time. When an LCP is reported to the DIH kernel block as completed, but the node fails before the last global checkpoint index written into this LCP has actually completed, the latest LCP is not restorable. Although it should be possible to use the older LCP, it was instead assumed that no LCP existed for the fragment, which slowed the restart process. Now in such cases, the older, restorable LCP is used, which should help decrease long node restart times. (Bug #81894, Bug #23602217)
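The selection rule described above can be sketched in Python. The function name and the tuple layout are hypothetical, and the sketch assumes an LCP is restorable exactly when the last global checkpoint index (GCI) written into it is not newer than the last GCI known to have completed.

```python
def pick_restorable_lcp(lcps, last_completed_gci):
    """Return the ID of the newest restorable LCP, or None if there is none.

    lcps: list of (lcp_id, max_gci_written) pairs, at most two per fragment.
    last_completed_gci: newest global checkpoint known to have completed.
    """
    restorable = [
        (lcp_id, gci) for lcp_id, gci in lcps if gci <= last_completed_gci
    ]
    if not restorable:
        return None
    return max(restorable)[0]  # newest LCP ID among the restorable ones
```

With LCPs 7 and 8 retained and LCP 8 referring to a GCI that never completed, this sketch falls back to LCP 7 instead of reporting that no LCP exists.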
NDB no longer retries a global schema lock if this has failed due to a timeout (default 3000ms) and there is the potential for this lock request to participate in a metadata lock-global schema lock deadlock. Now in such cases it selects itself as a “victim”, and returns the decision to the requestor of the metadata lock, which then handles the request as a failed lock request (preferable to remaining deadlocked indefinitely), or, where a deadlock handler exists, retries the metadata lock-global schema lock. (Bug #81775, Bug #23553267)
Two issues were found in the implementation of hash maps (used by NDB for mapping a table row's hash value to a partition) for fully replicated tables:
Hash maps were selected based on the number of fragments rather than the number of partitions. This was previously undetected due to the fact that, for other kinds of tables, these values are always the same.
The hash map was employed as a partition-to-partition map, using the table row's hash value modulo the partition count as input.
This fix addresses both of the problems just described. (Bug #81757, Bug #23544220)
References: See also: Bug #81761, Bug #23547525, Bug #23553996.
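The intended role of the hash map can be illustrated with a minimal Python sketch. This is not the NDB implementation: the function names and the map size of 240 used in the example are hypothetical, and indexing the map by the row hash modulo the map size is an assumption about the corrected behavior.

```python
def build_hash_map(partition_count, map_size):
    """Build a map with map_size slots, each naming a partition ID.

    The map is sized and filled from the number of partitions, not the
    number of fragments (which is larger for fully replicated tables).
    """
    return [slot % partition_count for slot in range(map_size)]


def partition_for(row_hash, hash_map):
    # Index by the hash value modulo the map size, then read the partition;
    # the input is a row hash, not a partition number.
    return hash_map[row_hash % len(hash_map)]
```

For example, with 4 partitions and a 240-slot map, a row hash of 1234 selects slot 34, which this sketch maps to partition 2.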
Using mysqld with both --initialize and --ndbcluster led to problems later when attempting to use mysql_upgrade. When running with --initialize, the server does not require NDB support, and having it enabled can lead to issues with ndbinfo tables. To prevent this from happening, using the --initialize option now causes mysqld to ignore the --ndbcluster option if the latter is also specified.
This issue affects upgrades from MySQL NDB Cluster 7.5.2 or 7.5.3 only. In cases where such upgrades fail for the reasons outlined previously, you can work around the issue by deleting all .frm files in the data/ndbinfo directory following a rolling restart of the entire cluster, then running mysql_upgrade. (Bug #81689, Bug #23518923)
References: See also: Bug #82724, Bug #24521927.
While a mysqld was waiting to connect to the management server during initialization of the NDB handler, it was not possible to shut down the mysqld. If the mysqld was not able to make the connection, it could become stuck at this point. This was due to an internal wait condition in the utility and index statistics threads that could go unmet indefinitely. This condition has been augmented with a maximum timeout of 1 second, which makes it more likely that these threads terminate themselves properly in such cases.
In addition, the connection thread waiting for the management server connection performed 2 sleeps in the case just described, instead of 1 sleep, as intended. (Bug #81585, Bug #23343673)
ALTER TABLE ... ALGORITHM=INPLACE on a fully replicated table did not copy the associated trigger ID, leading to a failure in the DBDICT kernel block. (Bug #81544, Bug #23330359)
The list of deferred tree node lookup requests created when preparing to abort a DBSPJ request was not cleared when this was complete, which could lead to deferred operations being started even after the DBSPJ request aborted. (Bug #81355, Bug #23251423)
References: See also: Bug #23048816.
Error and abort handling in Dbspj::execTRANSID_AI() was implemented such that its abort() method was called before processing of the incoming signal was complete. Since this method sends signals to the LDM, this partly overwrote the contents of the signal which was later required by execTRANSID_AI(). This could result in aborted DBSPJ requests cleaning up their allocated resources too early, or not at all. (Bug #81353, Bug #23251145)
References: See also: Bug #23048816.
The read backup feature added in MySQL NDB Cluster 7.5.2 that makes it possible to read from backup replicas was not used for reads with lock, or for reads of BLOB tables or unique key tables where locks were upgraded to reads with lock. Now the SCAN_TABREQ signals use a flag to convey information about such locks, making it possible to read from a backup replica when a read lock was upgraded due to being the read of the base table for a BLOB table, or due to being the read for a unique key. (Bug #80861, Bug #23001841)
Primary replicas of partitioned tables were not distributed evenly among node groups and local data managers.
As part of the fix for this issue, the maximum number of node groups supported for a single MySQL NDB Cluster, which was previously not determined, is now set at 48 (MAX_NDB_NODE_GROUPS). (Bug #80845, Bug #22996305)
Several object constructors and similar functions in the NDB codebase did not always perform sanity checks when creating new instances. These checks are now performed under such circumstances. (Bug #77408, Bug #21286722)
An internal call to malloc() was not checked for NULL. The function call was replaced with a direct write. (Bug #77375, Bug #21271194)