This section contains unified change history highlights for all
MySQL Cluster releases based on version 7.1 of the
NDB storage engine through MySQL
Cluster NDB 7.1.37. Included are all changelog
entries in the categories MySQL Cluster,
Disk Data, and Cluster API.
For an overview of features that were added in MySQL Cluster NDB 7.1, see What is New in MySQL Cluster NDB 7.1.
- Changes in MySQL Cluster NDB 7.1.35 (5.1.73-ndb-7.1.35)
- Changes in MySQL Cluster NDB 7.1.34 (5.1.73-ndb-7.1.34)
- Changes in MySQL Cluster NDB 7.1.33 (5.1.73-ndb-7.1.33)
- Changes in MySQL Cluster NDB 7.1.32 (5.1.73-ndb-7.1.32)
- Changes in MySQL Cluster NDB 7.1.31 (5.1.73-ndb-7.1.31)
- Changes in MySQL Cluster NDB 7.1.30 (5.1.73-ndb-7.1.30)
- Changes in MySQL Cluster NDB 7.1.29 (5.1.72-ndb-7.1.29)
- Changes in MySQL Cluster NDB 7.1.28 (5.1.70-ndb-7.1.28)
- Changes in MySQL Cluster NDB 7.1.27 (5.1.69-ndb-7.1.27)
- Changes in MySQL Cluster NDB 7.1.26 (5.1.67-ndb-7.1.26)
- Changes in MySQL Cluster NDB 7.1.25 (5.1.66-ndb-7.1.25)
- Changes in MySQL Cluster NDB 7.1.24 (5.1.63-ndb-7.1.24)
- Changes in MySQL Cluster NDB 7.1.23 (5.1.63-ndb-7.1.23)
- Changes in MySQL Cluster NDB 7.1.22 (5.1.61-ndb-7.1.22)
- Changes in MySQL Cluster NDB 7.1.21 (5.1.61-ndb-7.1.21)
- Changes in MySQL Cluster NDB 7.1.20 (5.1.61-ndb-7.1.20)
- Changes in MySQL Cluster NDB 7.1.19 (5.1.56-ndb-7.1.19)
- Changes in MySQL Cluster NDB 7.1.18 (5.1.56-ndb-7.1.18)
- Changes in MySQL Cluster NDB 7.1.17 (5.1.56-ndb-7.1.17)
- Changes in MySQL Cluster NDB 7.1.16 (5.1.56-ndb-7.1.16)
- Changes in MySQL Cluster NDB 7.1.15a (5.1.56-ndb-7.1.15a)
- Changes in MySQL Cluster NDB 7.1.15 (5.1.56-ndb-7.1.15)
- Changes in MySQL Cluster NDB 7.1.14 (5.1.56-ndb-7.1.14)
- Changes in MySQL Cluster NDB 7.1.13 (5.1.56-ndb-7.1.13)
- Changes in MySQL Cluster NDB 7.1.12 (5.1.51-ndb-7.1.12)
- Changes in MySQL Cluster NDB 7.1.11 (5.1.51-ndb-7.1.11)
- Changes in MySQL Cluster NDB 7.1.10 (5.1.51-ndb-7.1.10)
- Changes in MySQL Cluster NDB 7.1.9a (5.1.51-ndb-7.1.9a)
- Changes in MySQL Cluster NDB 7.1.9 (5.1.51-ndb-7.1.9)
- Changes in MySQL Cluster NDB 7.1.8 (5.1.47-ndb-7.1.8)
- Changes in MySQL Cluster NDB 7.1.7 (5.1.47-ndb-7.1.7)
- Changes in MySQL Cluster NDB 7.1.6 (5.1.47-ndb-7.1.6)
- Changes in MySQL Cluster NDB 7.1.5 (5.1.47-ndb-7.1.5)
- Changes in MySQL Cluster NDB 7.1.4b (5.1.44-ndb-7.1.4b)
- Changes in MySQL Cluster NDB 7.1.4a (5.1.44-ndb-7.1.4a)
- Changes in MySQL Cluster NDB 7.1.4 (5.1.44-ndb-7.1.4)
- Changes in MySQL Cluster NDB 7.1.3 (5.1.44-ndb-7.1.3)
- Changes in MySQL Cluster NDB 7.1.2 (5.1.41-ndb-7.1.2)
- Changes in MySQL Cluster NDB 7.1.1 (5.1.41-ndb-7.1.1)
- Changes in MySQL Cluster NDB 7.1.0 (5.1.39-ndb-7.1.0)
This is a commercial release for supported customers only. The last Community release of MySQL Cluster 7.1 was MySQL Cluster NDB 7.1.34, and no further Community releases of MySQL Cluster NDB 7.1 are planned. Users of the Community Edition of MySQL Cluster NDB 7.1 should upgrade as soon as possible to the latest release series. Currently, this is MySQL Cluster NDB 7.4.
It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process.
In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator.
Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)
When a data node fails or is being restarted, the remaining nodes in the same nodegroup resend to subscribers any data which they determine has not already been sent by the failed node. Normally, when a data node (actually, the SUMA kernel block) has sent all data belonging to an epoch for which it is responsible, it sends a SUB_GCP_COMPLETE_REP signal, together with a count, to all subscribers, each of which responds with a SUB_GCP_COMPLETE_ACK. When SUMA receives this acknowledgment from all subscribers, it reports this to the other nodes in the same nodegroup so that they know that there is no need to resend this data in case of a subsequent node failure. If a node failed before all subscribers sent this acknowledgement but before all the other nodes in the same nodegroup received it from the failing node, data for some epochs could be sent (and reported as complete) twice, which could lead to an unplanned shutdown.
The fix for this issue adds to the count reported by SUB_GCP_COMPLETE_ACK a list of identifiers which the receiver can use to keep track of which buckets are completed and to ignore any duplicate reported for an already completed bucket. (Bug #17579998)
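The idea of ignoring duplicate completion reports can be illustrated with a minimal sketch. This is not NDB code: the class and method names are hypothetical, and the only behavior taken from the note above is that a report for an already completed bucket is dropped.

```python
class EpochTracker:
    """Hypothetical receiver-side record of completed buckets for one epoch."""

    def __init__(self, expected_buckets):
        self.pending = set(expected_buckets)  # buckets not yet reported complete
        self.completed = set()                # buckets already reported complete

    def report_complete(self, bucket_ids):
        """Apply a completion report and return only the newly completed buckets."""
        newly_completed = []
        for bucket in bucket_ids:
            if bucket in self.completed:
                continue  # duplicate report for a completed bucket: ignore it
            self.completed.add(bucket)
            self.pending.discard(bucket)
            newly_completed.append(bucket)
        return newly_completed

    def epoch_done(self):
        """True once every expected bucket has been reported complete."""
        return not self.pending
```

With identifiers available, a second report for the same bucket changes nothing, so an epoch cannot be counted as complete twice.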
When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory. (Bug #75930, Bug #20553247)
Cluster API: The increase in the default number of hashmap buckets (DefaultHashMapSize API node configuration parameter) from 240 to 3480 in MySQL Cluster NDB 7.2.11 increased the size of the internal DictHashMapInfo::HashMap type considerably. This type was allocated on the stack in some getTable() calls which could lead to stack overflow issues for NDB API users.
To avoid this problem, the hashmap is now dynamically allocated from the heap. (Bug #19306793)
Cluster API: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.
A number of root causes, listed here, were discovered that led to this problem:
Result rows were unpacked to full NdbRecord format before they were stored in the buffer. If only some but not all columns of a table were selected, the buffer contained empty space (essentially wasted).
These issues became more evident in NDB 7.2 and later MySQL Cluster release series. This was due to the fact that buffer size is scaled by BatchSize, and that the default value for this parameter was increased fourfold (from 64 to 256) beginning with MySQL Cluster NDB 7.2.1.
This fix causes result rows to be buffered using the packed format instead of the unpacked format; a buffered scan result row is now not unpacked until it becomes the current row. In addition, BatchByteSize and MaxScanBatchSize are now used as limiting factors when calculating the required buffer size.
Also as part of this fix, refactoring has been done to separate handling of buffered (packed) from handling of unbuffered result sets, and to remove code that had been unused since NDB 7.0 or earlier. The NdbRecord class declaration has also been cleaned up by removing a number of unused or redundant member variables. (Bug #73781, Bug #75599, Bug #19631350, Bug #20408733)
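To see why packing rows and applying byte limits shrinks the preallocated buffer, consider a rough arithmetic sketch. The formulas below are an assumed simplification, not the calculation NDB performs; `batch_byte_limit` and `scan_batch_limit` are hypothetical stand-ins for per-fragment and per-scan byte caps such as MaxScanBatchSize.

```python
def unpacked_buffer_bytes(batch_rows, max_row_bytes, fragments):
    """Assumed old model: every buffered row reserves the full unpacked row size."""
    return batch_rows * max_row_bytes * fragments


def packed_buffer_bytes(batch_rows, packed_row_bytes, fragments,
                        batch_byte_limit, scan_batch_limit):
    """Assumed new model: rows stay packed and byte limits cap the total."""
    # Cap what a single fragment may contribute (hypothetical per-fragment limit).
    per_fragment = min(batch_rows * packed_row_bytes, batch_byte_limit)
    # Cap the sum over all fragments (hypothetical per-scan limit).
    return min(per_fragment * fragments, scan_batch_limit)
```

In this toy model the unpacked size grows linearly with both batch size and fragment count, while the packed size stops growing once either byte limit is reached.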
This is the final Community release for MySQL Cluster NDB 7.1, and no further Community releases of MySQL Cluster NDB 7.1 are planned. Users of the Community Edition of MySQL Cluster NDB 7.1 should upgrade as soon as possible to the latest release series. Currently, this is MySQL Cluster NDB 7.4.
Online reorganization when using ndbmtd data nodes and with binary logging by mysqld enabled could sometimes lead to failures in the DBLQH kernel blocks, or in silent data corruption. (Bug #19903481)
References: See also: Bug #19912988.
A watchdog failure resulted from a hang while freeing a disk page in
TUP_COMMITREQ, due to use of an uninitialized block variable. (Bug #19815044, Bug #74380)
Multiple threads crashing led to multiple sets of trace files being printed and possibly to deadlocks. (Bug #19724313)
When a client retried against a new master a schema transaction that failed previously against the previous master while the latter was restarting, the lock obtained by this transaction on the new master prevented the previous master from progressing past start phase 3 until the client was terminated, and resources held by it were cleaned up. (Bug #19712569, Bug #74154)
When a new data node started, API nodes were allowed to attempt to register themselves with the data node for executing transactions before the data node was ready. This forced the API node to wait an extra heartbeat interval before trying again.
To address this issue, a number of HA_ERR_NO_CONNECTION errors (Error 4009) that could be issued during this time have been changed to Cluster temporarily unavailable errors (Error 4035), which should allow API nodes to use new data nodes more quickly than before. As part of this fix, some errors which were incorrectly categorised have been moved into the correct categories, and some errors which are no longer used have been removed. (Bug #19524096, Bug #73758)
Queries against tables containing a CHAR(0) column failed with ERROR 1296 (HY000): Got error 4547 'RecordSpecification has overlapping offsets' from NDBCLUSTER. (Bug #14798022)
When a bulk delete operation was committed early to avoid an additional round trip, while also returning the number of affected rows, but failed with a timeout error, an SQL node performed no verification that the transaction was in the Committed state. (Bug #74494, Bug #20092754)
References: See also: Bug #19873609.
ndb_restore failed while restoring a table which contained both a built-in conversion on the primary key and a staging conversion on a
During staging, a BLOB table is created with a primary key column of the target type. However, a conversion function was not provided to convert the primary key values before loading them into the staging BLOB table, which resulted in corrupted primary key values in the staging BLOB table. While moving data from the staging table to the target table, the BLOB read failed because it could not find the primary key in the BLOB table.
Now BLOB tables are checked to see whether there are conversions on primary keys of their main tables. This check is done after all the main tables are processed, so that conversion functions and parameters have already been set for the main tables. Any conversion functions and parameters used for the primary key in the main table are now duplicated in the BLOB table. (Bug #73966, Bug #19642978)
Corrupted messages to data nodes sometimes went undetected, causing a bad signal to be delivered to a block which aborted the data node. This failure in combination with disconnecting nodes could in turn cause the entire cluster to shut down.
To keep this from happening, additional checks are now made when unpacking signals received over TCP, including checks for byte order, compression flag (which must not be used), and the length of the next message in the receive buffer (if there is one).
Whenever two consecutive unpacked messages fail the checks just described, the current message is assumed to be corrupted. In this case, the transporter is marked as having bad data and no more unpacking of messages occurs until the transporter is reconnected. In addition, an entry is written to the cluster log containing the error as well as a hex dump of the corrupted message. (Bug #73843, Bug #19582925)
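The "two consecutive failures" rule can be sketched as a small state machine. This is an illustration only: the class, its fields, and the reduction of the header checks to a single boolean are assumptions, and the real transporter code is not shown in these notes.

```python
class ReceiveChecker:
    """Hypothetical per-transporter guard applied before unpacking a message."""

    def __init__(self):
        self.consecutive_failures = 0
        self.bad_data = False  # once set, nothing is unpacked until reconnect

    def accept(self, header_ok):
        """Return True if the message may be unpacked.

        header_ok stands in for the byte order, compression flag,
        and length checks described above.
        """
        if self.bad_data:
            return False
        if header_ok:
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures >= 2:
            self.bad_data = True  # assume corruption; stop unpacking
        return False

    def reconnect(self):
        """Reconnecting the transporter clears the bad-data mark."""
        self.consecutive_failures = 0
        self.bad_data = False
```

Requiring two failures in a row before marking the transporter bad means a single suspicious header does not by itself take the connection out of service.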
Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)
Disk Data: In some cases, during DICT master takeover, the new master could crash while attempting to roll forward an ongoing schema transaction. (Bug #19875663, Bug #74510)
Disk Data: When a node acting as a DICT master fails, the arbitrator selects another node to take over in place of the failed node. During the takeover procedure, which includes cleaning up any schema transactions which are still open when the master failed, the disposition of the uncommitted schema transaction is decided. Normally this transaction is rolled back, but if it has completed a sufficient portion of a commit request, the new master finishes processing the commit. Until the fate of the transaction has been decided, no new TRANS_END_REQ messages from clients can be processed. In addition, since multiple concurrent schema transactions are not supported, takeover cleanup must be completed before any new transactions can be started.
A similar restriction applies to any schema operations which are performed in the scope of an open schema transaction. The counter used to coordinate schema operation across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results.
The scenarios just described were handled previously using a pseudo-random delay when recovering from a node failure. Now we check before the new master has rolled forward or backwards any schema transactions remaining after the failure of the previous master and avoid starting new schema transactions or performing operations using old transactions until takeover processing has cleaned up after the abandoned transaction. (Bug #19874809, Bug #74503)
Disk Data: When a node acting as DICT master fails, it is still possible to request that any open schema transaction be either committed or aborted by sending this request to the new DICT master. In this event, the new master takes over the schema transaction and reports back on whether the commit or abort request succeeded. In certain cases, it was possible for the new master to be misidentified; that is, the request was sent to the wrong node, which responded with an error that was interpreted by the client application as an aborted schema transaction, even in cases where the transaction could have been successfully committed, had the correct node been contacted. (Bug #74521, Bug #19880747)
Cluster API: The buffer allocated by an NdbScanOperation for receiving scanned rows was not released until the NdbTransaction owning the scan operation was closed. This could lead to excessive memory usage in an application where multiple scans were created within the same transaction, even if these scans were closed at the end of their lifecycle, unless NdbScanOperation::close() was invoked with the releaseOp argument equal to true. Now the buffer is released whenever the cursor navigating the result set is closed with NdbScanOperation::close(), regardless of the value of this argument. (Bug #75128, Bug #20166585)
Functionality Added or Changed
Added the --exclude-missing-tables option for ndb_restore. When enabled, the option causes tables present in the backup but not in the target database to be ignored. (Bug #57566, Bug #11764704)
When assembling error messages of the form Incorrect state for node
node_state, written when the transporter failed to connect, the node state was used in place of the node ID in a number of instances, which resulted in errors of this type for which the node state was reported incorrectly. (Bug #19559313, Bug #73801)
In some cases, transporter receive buffers were reset by one thread while being read by another. This happened when a race condition occurred between a thread receiving data and another thread initiating disconnect of the transporter (disconnection clears this buffer). Concurrency logic has now been implemented to keep this race from taking place. (Bug #19552283, Bug #73790)
A more detailed error report is printed in the event of a critical failure in one of the sendSignal*() methods, prior to crashing the process, as was already implemented for sendSignal(), but was missing from the more specialized sendSignalNoRelease() method. Having a crash of this type correctly reported can help with identifying configuration or hardware issues in some cases. (Bug #19414511)
References: See also: Bug #19390895.
ndb_restore failed to restore the cluster's metadata when there were more than approximately 17 K data objects. (Bug #19202654)
Parallel transactions performing reads immediately preceding a delete on the same tuple could cause the NDB kernel to crash. This was more likely to occur when separate TC threads were specified using the ThreadConfig configuration parameter. (Bug #19031389)
Incorrect calculation of the next autoincrement value following a manual insertion towards the end of a cached range could result in duplicate values sometimes being used. This issue could manifest itself when using certain combinations of values for
This issue has been fixed by modifying the calculation to make sure that the next value from the cache as computed by NDB is of the form auto_increment_offset + (N * auto_increment_increment). This avoids any rounding up by the MySQL Server of the returned value, which could result in duplicate entries when the rounded-up value fell outside the range of values cached by NDB. (Bug #17893872)
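MySQL generates auto-increment values as the offset plus a whole multiple of the increment. A minimal sketch of rounding a candidate value up to the next value of that form follows; the function name and the way the candidate is obtained are hypothetical, and this is not the code used by NDB.

```python
def next_autoinc(candidate, offset, increment):
    """Smallest value >= candidate of the form offset + N * increment, N >= 0."""
    if candidate <= offset:
        return offset
    # Ceiling division without floating point: how many increments are needed.
    n = -(-(candidate - offset) // increment)
    return offset + n * increment
```

If the storage engine already returns a value of this form, the server has nothing left to round up, so the value handed out stays inside the cached range.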
Using the --help option with ndb_print_file caused the program to segfault. (Bug #17069285)
For multithreaded data nodes, some threads do not communicate often, with the result that very old signals can remain at the top of the signal buffers. When performing a thread trace, the signal dumper calculated the latest signal ID from what it found in the signal buffers, which meant that these old signals could be erroneously counted as the newest ones. Now the signal ID counter is kept as part of the thread state, and it is this value that is used when dumping signals for trace files. (Bug #73842, Bug #19582807)
Cluster API: The fix for Bug #16723708 stopped the ndb_logevent_get_next() function from casting a log event's enum type, but this change interfered with existing applications, and so the function's original behavior is now reinstated. A new MGM API function exhibiting the corrected behavior, ndb_logevent_get_next2(), has been added in this release to take the place of the reverted function, for use in applications that do not require backward compatibility. In all other respects, the new function is identical with its predecessor. (Bug #18354165)
References: Reverted patches: Bug #16723708.
Functionality Added or Changed
Cluster API: Added as an aid to debugging the ability to specify a human-readable name for a given Ndb object and later to retrieve it. These operations are implemented, respectively, as the
To make tracing of event handling between a user application and NDB easier, you can use the reference (from getReference()) followed by the name (if provided) in printouts; the reference ties together the application Ndb object, the event buffer, and the SUMA block. (Bug #18419907)
Processing a NODE_FAILREP signal that contained an invalid node ID could cause a data node to fail. (Bug #18993037, Bug #73015)
References: This issue is a regression of: Bug #16007980.
Attribute promotion between different TEXT types (any of TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT) by ndb_restore was not handled properly in some cases. In addition, TEXT values are now truncated according to the limits set by mysqld (for example, values converted to TINYTEXT from another type are truncated to 256 bytes). In the case of columns using a multibyte character set, the value is truncated to the end of the last well-formed character.
Also as a result of this fix, conversion to a TEXT column of any size that uses a different character set from the original is now disallowed. (Bug #18875137)
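Truncating to the end of the last well-formed character can be sketched for UTF-8 as follows. This is an illustration under the assumption that the input is valid UTF-8; the function name is hypothetical and ndb_restore's own implementation is not shown here.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 data to at most limit bytes without splitting a character."""
    if len(data) <= limit:
        return data
    cut = limit
    # A byte of the form 10xxxxxx continues a character begun earlier, so
    # cutting just before it would split that character; step back to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]
```

The result can be shorter than the limit by up to three bytes, which is the price of never leaving a partial character at the end.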
ALTER TABLE ... REORGANIZE PARTITION after increasing the number of data nodes in the cluster from 4 to 16 led to a crash of the data nodes. This issue was shown to be a regression caused by a previous fix which added a new dump handler using a dump code that was already in use (7019), which caused the command to execute two different handlers with different semantics. The new handler was assigned a new DUMP code (7024). (Bug #18550318)
References: This issue is a regression of: Bug #14220269.
Following a long series of inserts, when running with a relatively small redo log and an insufficiently large value for MaxNoOfConcurrentTransactions, there remained transactions that were blocked by the lack of redo log and were thus not aborted in the correct state (waiting for prepare log to be sent to disk, or LOG_QUEUED state). This caused the redo log to remain blocked until unblocked by a completion of a local checkpoint. This could lead to a deadlock, when the blocked aborts in turn blocked global checkpoints, and blocked GCPs block LCPs. To prevent this situation from arising, we now abort immediately when we reach the LOG_QUEUED state in the abort state handler. (Bug #18533982)
ndbmtd supports multiple parallel receiver threads, each of which performs signal reception for a subset of the remote node connections (transporters) with the mapping of remote nodes to receiver threads decided at node startup. Connection control is managed by the multi-instance TRPMAN block, which is organized as a proxy and workers, and each receiver thread has a TRPMAN worker running locally.
The QMGR block sends signals to TRPMAN to enable and disable communications with remote nodes. These signals are sent to the TRPMAN proxy, which forwards them to the workers. The workers themselves decide whether to act on signals, based on the set of remote nodes they manage.
The current issue arises because the mechanism used by the TRPMAN workers for determining which connections they are responsible for was implemented in such a way that each worker thought it was responsible for all connections. This resulted in the CLOSE_COMREQ being processed multiple times.
The fix keeps TRPMAN instances (receiver threads) executing CLOSE_COMREQ requests. In addition, the correct TRPMAN instance is now chosen when routing from this instance for a specific remote connection. (Bug #18518037)
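The proxy and worker arrangement described above can be sketched in a few lines. The classes below are hypothetical models, not NDB kernel code; they only show the intended rule that a worker acts on a close request solely for the remote nodes it manages.

```python
class TrpmanWorker:
    """Hypothetical worker owning a fixed subset of remote node connections."""

    def __init__(self, owned_nodes):
        self.owned_nodes = set(owned_nodes)
        self.closed = []  # nodes this worker actually closed, in order

    def handle_close(self, node_ids):
        for node in node_ids:
            if node in self.owned_nodes:  # skip connections owned by other workers
                self.closed.append(node)


class TrpmanProxy:
    """Hypothetical proxy that forwards every request to all workers."""

    def __init__(self, workers):
        self.workers = list(workers)

    def close_com(self, node_ids):
        for worker in self.workers:
            worker.handle_close(node_ids)
```

If every worker believed it owned all connections, each forwarded request would be carried out once per worker, which is the duplicate processing the note describes.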
A local checkpoint (LCP) is tracked using a global LCP state (c_lcpState), and each NDB table has a status indicator which indicates the LCP status of that table (tabLcpStatus). If the global LCP state is LCP_STATUS_IDLE, then all the tables should have an LCP status of TLS_COMPLETED.
When an LCP starts, the global LCP status is LCP_INIT_TABLES and the thread starts setting all the NDB tables to TLS_ACTIVE. If any tables are not ready for LCP, the LCP initialization procedure continues with CONTINUEB signals until all tables have become available and been marked TLS_ACTIVE. When this initialization is complete, the global LCP status is set to LCP_STATUS_ACTIVE.
This bug occurred when the following conditions were met:
- An LCP was in the LCP_INIT_TABLES state, and some but not all tables had been set to TLS_ACTIVE.
- The master node failed before the global LCP state changed to LCP_STATUS_ACTIVE; that is, before the LCP could finish processing all tables.
- The NODE_FAILREP signal resulting from the node failure was processed before the final CONTINUEB signal from the LCP initialization process, so that the node failure was processed while the LCP remained in the LCP_INIT_TABLES state.
Following master node failure and selection of a new one, the new master queries the remaining nodes with a MASTER_LCPREQ signal to determine the state of the LCP. At this point, since the LCP status was LCP_INIT_TABLES, the LCP status was reset to LCP_STATUS_IDLE. However, the LCP status of the tables was not modified, so there remained tables with TLS_ACTIVE. Afterwards, the failed node is removed from the LCP. If the LCP status of a given table is TLS_ACTIVE, there is a check that the global LCP status is not LCP_STATUS_IDLE; this check failed and caused the data node to fail.
Now the MASTER_LCPREQ handler ensures that the tabLcpStatus for all tables is updated to TLS_COMPLETED when the global LCP status is changed to LCP_STATUS_IDLE. (Bug #18044717)
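The invariant behind this fix (an idle global state implies no table is still marked active) can be modeled with a short sketch. The class below is a hypothetical simplification using the state names from the note as plain strings; it is not the data node's implementation.

```python
class LcpState:
    """Toy model of the global LCP state and the per-table LCP status."""

    def __init__(self, table_ids):
        self.global_status = "LCP_STATUS_IDLE"
        self.tab_lcp_status = {t: "TLS_COMPLETED" for t in table_ids}

    def begin_init(self, ready_tables):
        """Start an LCP; only the tables that are ready get marked active."""
        self.global_status = "LCP_INIT_TABLES"
        for table in ready_tables:
            self.tab_lcp_status[table] = "TLS_ACTIVE"

    def handle_master_lcpreq(self):
        """Reset an interrupted initialization, including the per-table status."""
        if self.global_status == "LCP_INIT_TABLES":
            self.global_status = "LCP_STATUS_IDLE"
            for table in self.tab_lcp_status:
                self.tab_lcp_status[table] = "TLS_COMPLETED"

    def invariant_holds(self):
        """When idle, every table must be TLS_COMPLETED."""
        if self.global_status != "LCP_STATUS_IDLE":
            return True
        return all(s == "TLS_COMPLETED" for s in self.tab_lcp_status.values())
```

Omitting the loop in `handle_master_lcpreq()` reproduces the failing combination: an idle global state with some tables left in TLS_ACTIVE.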
The logging of insert failures has been improved. This is intended to help diagnose occasional issues seen when writing to the mysql.ndb_binlog_index table. (Bug #17461625)
Employing a CHAR column that used the UTF8 character set as a table's primary key column led to node failure when restarting data nodes. Attempting to restore a table with such a primary key also caused ndb_restore to fail. (Bug #16895311, Bug #68893)
The --order (-o) option for the ndb_select_all utility worked only when specified as the last option, and did not work with an equals sign.
As part of this fix, the program's --help output was also aligned with the --order option's correct behavior. (Bug #64426, Bug #16374870)
Cluster API: When an NDB data node indicates a buffer overflow via an empty epoch, the event buffer places an inconsistent data event in the event queue. When this was consumed, it was not removed from the event queue as expected, causing subsequent nextEvent() calls to return 0. This caused event consumption to stall because the inconsistency remained flagged forever, while event data accumulated in the queue.
Event data belonging to an empty inconsistent epoch can be found either at the beginning or somewhere in the middle. pollEvents() returns 0 for the first case. This fix handles the second case: calling nextEvent() dequeues the inconsistent event before it returns. In order to benefit from this fix, user applications must call nextEvent() even when pollEvents() returns 0. (Bug #18716991)
Cluster API: The pollEvents() method returned 1, even when called with a wait time equal to 0, and there were no events waiting in the queue. Now in such cases it returns 0 as expected. (Bug #18703871)
Functionality Added or Changed
Handling of LongMessageBuffer shortages and statistics has been improved as follows:
- The default value of LongMessageBuffer has been increased from 4 MB to 64 MB.
- When this resource is exhausted, a suitable informative message is now printed in the data node log describing possible causes of the problem and suggesting possible solutions.
- LongMessageBuffer usage information is now shown in the ndbinfo.memoryusage table. See the description of this table for an example and additional information.
Important Change: The server system variables
ndb_index_stat_freq, which had been deprecated in a previous MySQL Cluster release series, have now been removed. (Bug #11746486, Bug #26673)
When an ALTER TABLE statement changed table schemas without causing a change in the table's partitioning, the new table definition did not copy the hash map from the old definition, but used the current default hash map instead. However, the table data was not reorganized according to the new hash map, which made some rows inaccessible using a primary key lookup if the two hash maps had incompatible definitions.
To keep this situation from occurring, any ALTER TABLE that entails a hash map change now triggers a reorganization of the table. In addition, when copying a table definition in such cases, the hash map is now also copied. (Bug #18436558)
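Why an incompatible hash map hides rows can be shown with a toy model. The lookup rule used here (hash modulo the number of buckets, then bucket to fragment) is an assumption made for illustration, and the map contents are invented; NDB's actual hashing is not described in these notes.

```python
def fragment_for(pk_hash, hash_map):
    """Toy lookup: choose a bucket by modulo, then read the fragment it points to."""
    return hash_map[pk_hash % len(hash_map)]


def misplaced_keys(pk_hashes, old_map, new_map):
    """Keys whose rows were stored under old_map but are looked up under new_map."""
    return [h for h in pk_hashes
            if fragment_for(h, old_map) != fragment_for(h, new_map)]


# Hypothetical maps over the same four fragments, with different bucket counts.
OLD_MAP = [0, 1, 2, 3, 0, 1]
NEW_MAP = [0, 1, 2, 3]
```

Any key returned by `misplaced_keys()` would be searched for in a fragment that does not hold its row, which is why a hash map change has to be accompanied by a reorganization of the data.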
When certain queries generated signals having more than 18 data words prior to a node failure, such signals were not written correctly in the trace file. (Bug #18419554)
After dropping an NDB table, neither the cluster log nor the output of the REPORT MemoryUsage command showed that the IndexMemory used by that table had been freed, even though the memory had in fact been deallocated. This issue was introduced in MySQL Cluster NDB 7.1.28. (Bug #18296810)
ndb_show_tables sometimes failed with the error message Unable to connect to management server and immediately terminated, without providing the underlying reason for the failure. To provide more useful information in such cases, this program now also prints the most recent error from the Ndb_cluster_connection object used to instantiate the connection. (Bug #18276327)
The block threads managed by the multi-threading scheduler communicate by placing signals in an out queue or job buffer which is set up between all block threads. This queue has a fixed maximum size, such that when it is filled up, the worker thread must wait for the consumer to drain the queue. In a highly loaded system, multiple threads could end up in a circular wait lock due to full out buffers, such that they were preventing each other from performing any useful work. This condition eventually led to the data node being declared dead and killed by the watchdog timer.
To fix this problem, we detect situations in which a circular wait lock is about to begin, and cause buffers which are otherwise held in reserve to become available for signal processing by queues which are highly loaded. (Bug #18229003)
The ndb_mgm client START BACKUP command (see Commands in the MySQL Cluster Management Client) could experience occasional random failures when a ping was received prior to an expected BackupCompleted event. Now the connection established by this command is not checked until it has been properly set up. (Bug #18165088)
When performing a copying ALTER TABLE operation, mysqld creates a new copy of the table to be altered. This intermediate table, which is given a name bearing the prefix #sql-, has an updated schema but contains no data. mysqld then copies the data from the original table to this intermediate table, drops the original table, and finally renames the intermediate table with the name of the original table.
mysqld regards such a table as a temporary table and does not include it in the output from SHOW TABLES; mysqldump also ignores an intermediate table. However, NDB sees no difference between such an intermediate table and any other table. This difference in how intermediate tables are viewed by mysqld (and MySQL client programs) and by the NDB storage engine can give rise to problems when performing a backup and restore if an intermediate table existed in NDB, possibly left over from a failed ALTER TABLE that used copying. If a schema backup is performed using mysqldump and the mysql client, this table is not included. However, in the case where a data backup was done using the ndb_mgm client's BACKUP command, the intermediate table was included, and was also included by ndb_restore, which then failed due to attempting to load data into a table which was not defined in the backed up schema.
To prevent such failures from occurring, ndb_restore now by default ignores intermediate tables created during ALTER TABLE operations (that is, tables whose names begin with the prefix #sql-). A new option --exclude-intermediate-sql-tables is added that makes it possible to override the new behavior. The option's default value is TRUE; to cause ndb_restore to revert to the old behavior and to attempt to restore intermediate tables, set this option to FALSE. (Bug #17882305)
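The selection rule described above amounts to a prefix filter on table names. The following sketch is illustrative only; the function and constant names are hypothetical, and the single flag mirrors the documented default of TRUE.

```python
INTERMEDIATE_PREFIX = "#sql-"


def tables_to_restore(backup_tables, exclude_intermediate=True):
    """Return the table names a restore would process under the described rule."""
    if not exclude_intermediate:
        return list(backup_tables)  # old behavior: attempt to restore everything
    return [name for name in backup_tables
            if not name.startswith(INTERMEDIATE_PREFIX)]
```

For example, a backup containing `t1` and a leftover `#sql-1a2b_3` yields only `t1` by default, and both names when the flag is set to false.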
Data nodes running ndbmtd could stall while performing an online upgrade of a MySQL Cluster containing a great many tables from a version prior to NDB 7.1.20 to version 7.1.20 or later. (Bug #16693068)
Cluster API: When an NDB API client application received a signal with an invalid block or signal number, NDB provided only a very brief error message that did not accurately convey the nature of the problem. Now in such cases, appropriate printouts are provided when a bad signal or message is detected. In addition, the message length is now checked to make certain that it matches the size of the embedded signal. (Bug #18426180)
Cluster API: Refactoring that was performed in MySQL Cluster NDB 7.1.30 inadvertently introduced a dependency in Ndb.hpp on a file that is not included in the distribution, which caused NDB API applications to fail to compile. The dependency has been removed. (Bug #18293112, Bug #71803)
References: This issue is a regression of: Bug #17647637.
Cluster API: An NDB API application sends a scan query to a data node; the scan is processed by the transaction coordinator (TC). The TC forwards a
LQHKEYREQrequest to the appropriate LDM, and aborts the transaction if it does not receive a
LQHKEYCONFresponse within the specified time limit. After the transaction is successfully aborted, the TC sends a
TCROLLBACKREPto the NDBAPI client, and the NDB API client processes this message by cleaning up any
Ndbobjects associated with the transaction.
The client receives the data which it has requested in the form of TRANSID_AI signals, which are buffered for sending at the data node and may be delivered after a delay. On receiving such a signal, NDB checks the transaction state and ID: if these are as expected, it processes the signal using the Ndb objects associated with that transaction.
The current bug occurs when all the following conditions are fulfilled:
The transaction coordinator aborts a transaction due to delays and sends a TCROLLBACKREP signal to the client, while at the same time a TRANSID_AI which has been buffered for delivery at an LDM is delivered to the same client.
The NDB API client considers the transaction complete on receipt of a TCROLLBACKREP signal, and immediately closes the transaction.
The client has a separate receiver thread running concurrently with the thread that is engaged in closing the transaction.
The arrival of the late TRANSID_AI interleaves with the closing of the user thread's transaction such that TRANSID_AI processing passes normal checks before closeTransaction() resets the transaction state and invalidates the receiver.
When these conditions are all met, the receiver thread proceeds to continue working on the TRANSID_AI signal using the invalidated receiver. Since the receiver is already invalidated, its usage results in a node failure.
Now the Ndb object cleanup done for TCROLLBACKREP includes invalidation of the transaction ID, so that, for a given transaction, any signal which is received after the TCROLLBACKREP arrives does not pass the transaction ID check and is silently dropped. This fix is also implemented for TCKEY_FAILREF signals.
See also Operations and Signals, for additional information about NDB messaging. (Bug #18196562)
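The effect of invalidating the transaction ID during cleanup can be sketched with a toy model. This is an illustration in Python under stated assumptions, not NDB API code: the Transaction class and its method names are hypothetical, and only the behavior (a signal arriving after the rollback report fails the ID check and is dropped) is taken from the entry above.

```python
class Transaction:
    """Toy model of a client-side transaction; not the real Ndb classes."""

    INVALID_ID = None  # hypothetical marker for an invalidated transaction ID

    def __init__(self, trans_id):
        self.trans_id = trans_id
        self.rows = []

    def on_rollback_rep(self):
        # Cleanup for a rollback report: invalidate the ID so that any
        # signal arriving afterwards fails the ID check.
        self.trans_id = self.INVALID_ID

    def on_transid_ai(self, signal_trans_id, row):
        # A late or foreign signal does not pass the ID check and is
        # silently dropped instead of touching the receiver state.
        if self.trans_id is self.INVALID_ID or signal_trans_id != self.trans_id:
            return False
        self.rows.append(row)
        return True


txn = Transaction(42)
print(txn.on_transid_ai(42, "row-1"))  # accepted while the transaction is live
txn.on_rollback_rep()
print(txn.on_transid_ai(42, "row-2"))  # dropped after the rollback report
```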
Cluster API: ndb_restore could sometimes report Error 701 System busy with other schema operation unnecessarily when restoring in parallel. (Bug #17916243)
Packaging: Compilation of ndbmtd failed on Solaris 10 and 11 for 32-bit x86, and the binary was not included in the binary distributions for these platforms. (Bug #16620938)
Disk Data: When using Disk Data tables and ndbmtd data nodes, it was possible for the undo buffer to become overloaded, leading to a crash of the data nodes. This issue was more likely to be encountered when using Disk Data columns whose size was approximately 8K or larger. (Bug #16766493)
UINT_MAX64 was treated as a signed value by Visual Studio 2010. To prevent this from happening, the value is now explicitly defined as unsigned. (Bug #17947674)
References: See also: Bug #17647637.
Monotonic timers on several platforms can experience issues which might result in the monotonic clock doing small jumps back in time. This is due to imperfect synchronization of clocks between multiple CPU cores and does not normally have an adverse effect on the scheduler and watchdog mechanisms; so we handle some of these cases by making backtick protection less strict, although we continue to ensure that the backtick is less than 10 milliseconds. This fix also removes several checks for backticks which are thereby made redundant. (Bug #17973819)
Poor support or lack of support on some platforms for monotonic timers caused issues with delayed signal handling by the job scheduler for the multithreaded data node. Variances (timer leaps) on such platforms are now handled by the multithreaded data node process in the same way that they are by the single-threaded version. (Bug #17857442)
References: See also: Bug #17475425, Bug #17647637.
When using single-threaded (ndbd) data nodes with RealTimeScheduler enabled, the CPU did not, as intended, temporarily lower its scheduling priority to normal every 10 milliseconds to give other, non-realtime threads a chance to run. (Bug #17739131)
Timers used in timing scheduler events in the NDB kernel have been refactored, in part to ensure that they are monotonic on all platforms. In particular, on Windows, event intervals were previously calculated using values obtained from GetSystemTimeAsFileTime(), which reads directly from the system time (“wall clock”), and which may arbitrarily be reset backward or forward, leading to false watchdog or heartbeat alarms, or even node shutdown. Lack of timer monotonicity could also cause slow disk writes during backups and global checkpoints. To fix this issue, the Windows implementation now uses QueryPerformanceCounter() instead of GetSystemTimeAsFileTime(). In the event that a monotonic timer is not found on startup of the data nodes, a warning is logged.
In addition, on all platforms, a check is now performed at compile time for available system monotonic timers, and the build fails if one cannot be found; note that CLOCK_HIGHRES is now supported as an alternative for CLOCK_MONOTONIC if the latter is not available. (Bug #17647637)
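The difference between wall-clock and monotonic timing can be illustrated outside of NDB. The following Python sketch is not the kernel's timer code; it only shows the general technique of measuring an interval with a monotonic clock (time.monotonic() in the Python standard library), which is not affected by adjustments to the system time. The helper name measure_interval is hypothetical.

```python
import time


def measure_interval(work):
    """Time a callable using the monotonic clock (hypothetical helper).

    A wall-clock source such as time.time() can jump when the system time
    is reset; time.monotonic() is not affected by such updates.
    """
    start = time.monotonic()
    work()
    return time.monotonic() - start


elapsed = measure_interval(lambda: time.sleep(0.01))
print(f"elapsed: {elapsed:.3f}s")
```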
The global checkpoint lag watchdog tracked the number of times a check for GCP lag was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the GCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called.
In addition, any backticks (rare in any case) are now handled by taking the backward time as the new current time and calculating the elapsed time for this round as 0. Finally, any ill effects of a forward leap, which possibly could expire the watchdog timer immediately, are reduced by never calculating an elapsed time longer than the requested delay time for the watchdog timer. (Bug #17647469)
References: See also: Bug #17842035.
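The two rules described for the refactored watchdog (a backward step counts as 0 elapsed time, and a forward leap never counts for more than the requested delay) can be sketched as below. This is a minimal Python illustration, not the NDB implementation; the ElapsedTracker class, its fields, and the use of millisecond integers are assumptions made for the example.

```python
class ElapsedTracker:
    """Accumulates elapsed time from clock readings passed in by the caller."""

    def __init__(self, now_ms, max_step_ms):
        self.last_ms = now_ms
        self.max_step_ms = max_step_ms  # requested delay for the watchdog timer
        self.elapsed_ms = 0

    def update(self, now_ms):
        step = now_ms - self.last_ms
        if step < 0:
            # Backtick: take the earlier time as the new current time and
            # count 0 elapsed time for this round.
            step = 0
        elif step > self.max_step_ms:
            # Forward leap: never count more than the requested delay.
            step = self.max_step_ms
        self.last_ms = now_ms
        self.elapsed_ms += step
        return self.elapsed_ms


tracker = ElapsedTracker(now_ms=1000, max_step_ms=100)
print(tracker.update(1050))  # normal step of 50 ms
print(tracker.update(1040))  # backtick: nothing added
print(tracker.update(5000))  # forward leap: clamped to 100 ms
```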
In certain rare cases on commit of a transaction, an Ndb object was released before the transaction coordinator (DBTC kernel block) sent the expected COMMIT_CONF signal; NDB failed to send a COMMIT_ACK signal in response, which caused a memory leak in the NDB kernel that could later lead to node failure. Now an Ndb object is not released until the COMMIT_CONF signal has actually been received. (Bug #16944817)
After restoring the database metadata (but not any data) by running ndb_restore with the -m option, SQL nodes would hang while trying to SELECT from a table in the database to which the metadata was restored. In such cases the attempt to query the table now fails as expected, since the table does not actually exist until ndb_restore is executed with the -r option. (Bug #16890703)
References: See also: Bug #21184102.
The ndbd_redo_log_reader utility now supports a --help option. Using this option causes the program to print basic usage information, and then to exit. (Bug #11749591, Bug #36805)
Cluster API: It was possible for an Ndb object to receive signals for handling before it was initialized, leading to thread interleaving and possible data node failure when executing a call to Ndb::init(). To guard against this happening, a check is now made when it is starting to receive signals that the Ndb object is properly initialized before any signals are actually handled. (Bug #17719439)
Functionality Added or Changed
The length of time a management node waits for a heartbeat message from another management node is now configurable using the HeartbeatIntervalMgmdMgmd management node configuration parameter added in this release. The connection is considered dead after 3 missed heartbeats. The default value is 1500 milliseconds, or a timeout of approximately 6000 ms. (Bug #17807768, Bug #16426805)
ndb_restore could abort during the last stages of a restore using attribute promotion or demotion into an existing table. This could happen if a converted attribute was nullable and the backup had been run on an active database. (Bug #17275798)
The DBUTIL data node block is now less strict about the order in which it receives certain messages from other nodes. (Bug #17052422)
The Windows error ERROR_FILE_EXISTS was not recognized by NDB, which treated it as an unknown error. (Bug #16970960)
RealTimeScheduler did not work correctly with data nodes running ndbmtd. (Bug #16961971)
Maintenance and checking of parent batch completion in the SPJ block of the NDB kernel was reimplemented. Among other improvements, the completion state of all ancestor nodes in the tree is now preserved. (Bug #16925513)
The LCP fragment scan watchdog periodically checks for lack of progress in a fragment scan performed as part of a local checkpoint, and shuts down the node if there is no progress after a given amount of time has elapsed. This interval, formerly hard-coded as 60 seconds, can now be configured using the LcpScanProgressTimeout data node configuration parameter added in this release.
This configuration parameter sets the maximum time the local checkpoint can be stalled before the LCP fragment scan watchdog shuts down the node. The default is 60 seconds, which provides backward compatibility with previous releases.
You can disable the LCP fragment scan watchdog by setting this parameter to 0. (Bug #16630410)
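The documented semantics of the timeout value (60 seconds by default, 0 to disable the watchdog) can be summarized in a few lines. The following Python sketch is hypothetical and is not taken from the NDB sources; in particular, whether the real check triggers at exactly the timeout or only after it is an assumption here.

```python
def lcp_scan_stalled(seconds_without_progress, lcp_scan_progress_timeout=60):
    """Return True if a watchdog with this timeout would act (hypothetical).

    A timeout of 0 disables the check entirely, as described above. The
    use of >= at the boundary is an assumption made for this sketch.
    """
    if lcp_scan_progress_timeout == 0:
        return False
    return seconds_without_progress >= lcp_scan_progress_timeout


print(lcp_scan_stalled(30))       # still within the default 60 seconds
print(lcp_scan_stalled(90))       # stalled longer than the default
print(lcp_scan_stalled(90, 0))    # watchdog disabled
```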
Added the ndb_error_reporter options --connection-timeout, which makes it possible to set a timeout for connecting to nodes, --dry-scp, which disables scp connections to remote hosts, and --skip-nodegroup, which skips all nodes in a given node group. (Bug #16602002)
References: See also: Bug #11752792, Bug #44082.
The NDB receive thread waited unnecessarily for additional job buffers to become available when receiving data. This caused the receive mutex to be held during this wait, which could result in a busy wait when the receive thread was running with real-time priority.
This fix also handles the case where a negative return value from the initial check of the job buffer by the receive thread prevented further execution of data reception, which could possibly lead to communication blockage or configured ReceiveBufferMemory underutilization. (Bug #15907515)
When the available job buffers for a given thread fell below the critical threshold, the internal multi-threading job scheduler waited for job buffers for incoming rather than outgoing signals to become available, which meant that the scheduler waited the maximum timeout (1 millisecond) before resuming execution. (Bug #15907122)
Under some circumstances, a race occurred where the wrong watchdog state could be reported. A new state name Packing Send Buffers is added for watchdog state number 11, previously reported as Unknown place. As part of this fix, the state numbers for states without names are now always reported in such cases. (Bug #14824490)
When a node fails, the Distribution Handler (DBDIH kernel block) takes steps together with the Transaction Coordinator (DBTC) to make sure that all ongoing transactions involving the failed node are taken over by a surviving node and either committed or aborted. Transactions taken over which are then committed belong in the epoch that is current at the time the node failure occurs, so the surviving nodes must keep this epoch available until the transaction takeover is complete. This is needed to maintain ordering between epochs.
A problem was encountered in the mechanism intended to keep the current epoch open which led to a race condition between this mechanism and that normally used to declare the end of an epoch. This could cause the current epoch to be closed prematurely, leading to failure of one or more surviving data nodes. (Bug #14623333, Bug #16990394)
ndb_error_reporter did not support the --help option. (Bug #11756666, Bug #48606)
References: See also: Bug #11752792, Bug #44082.
When START BACKUP WAIT STARTED was run from the command line using ndb_mgm -e, the client did not exit until the backup completed. (Bug #11752837, Bug #44146)
Formerly, the node used as the coordinator or leader for distributed decision making between nodes (also known as the DICT manager; see The DBDICT Block) was indicated in the output of the ndb_mgm client SHOW command as the “master” node, although this node has no relationship to a master server in MySQL Replication. (It should also be noted that it is not necessary to know which node is the leader except when debugging NDBCLUSTER source code.) To avoid possible confusion, this label has been removed, and the leader node is now indicated in SHOW command output using an asterisk (*) character. (Bug #11746263, Bug #24880)
Program execution failed to break out of a loop after meeting a desired condition in a number of internal methods, performing unneeded work in all cases where this occurred. (Bug #69610, Bug #69611, Bug #69736, Bug #17030606, Bug #17030614, Bug #17160263)
ABORT BACKUP in the ndb_mgm client (see Commands in the MySQL Cluster Management Client) took an excessive amount of time to return (approximately as long as the backup would have required to complete, had it not been aborted), and failed to remove the files that had been generated by the aborted backup. (Bug #68853, Bug #17719439)
Note that converted character data is not checked to conform to any character set.
When performing such promotions, the only other sort of type conversion that can be performed at the same time is between character types and binary types.
Cluster API: The Event::setTable() method now supports a pointer or a reference to a table as its required argument. If a null table pointer is used, the method now returns -1 to make it clear that this is what has occurred. (Bug #16329082)
Functionality Added or Changed
Added the ExtraSendBufferMemory parameter for management nodes and API nodes. (Formerly, this parameter was available only for configuring data nodes.) See ExtraSendBufferMemory (management nodes), and ExtraSendBufferMemory (API nodes), for more information. (Bug #14555359)
Performance: In a number of cases found in various locations in the MySQL Cluster codebase, unnecessary iterations were performed; this was caused by failing to break out of a repeating control structure after a test condition had been met. This community-contributed fix removes the unneeded repetitions by supplying the missing breaks. (Bug #16904243, Bug #69392, Bug #16904338, Bug #69394, Bug #16778417, Bug #69171, Bug #16778494, Bug #69172, Bug #16798410, Bug #69207, Bug #16801489, Bug #69215, Bug #16904266, Bug #69393)
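The kind of change described here, adding a break once the test condition has been met, can be illustrated generically. The following Python sketch is not code from the MySQL Cluster codebase; the function find_first_match is hypothetical and counts comparisons only to make the saved work visible.

```python
def find_first_match(items, predicate):
    """Return (index, comparisons) for the first match, or (-1, len(items))."""
    comparisons = 0
    found = -1
    for i, item in enumerate(items):
        comparisons += 1
        if predicate(item):
            found = i
            break  # without this break, the loop would scan the remaining items
    return found, comparisons


print(find_first_match([3, 8, 5, 8], lambda x: x == 8))
```

With the break in place, the search over four items stops after two comparisons; without it, all four items would be examined even though the result is already known.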
File system errors occurring during a local checkpoint could sometimes cause an LCP to hang with no obvious cause when they were not handled correctly. Now in such cases, such errors always cause the node to fail. Note that the LQH block always shuts down the node when a local checkpoint fails; the change here is to make likely node failure occur more quickly and to make the original file system error more visible. (Bug #16961443)
The planned or unplanned shutdown of one or more data nodes while reading table data from the ndbinfo database caused a memory leak. (Bug #16932989)
DBDIH was updating table checkpoint information subsequent to a node failure could lead to a data node failure. (Bug #16904469)
In certain cases, when starting a new SQL node, mysqld failed with Error 1427 Api node died, when SUB_START_REQ reached node. (Bug #16840741)
Failure to use container classes specific to NDB during node failure handling could cause leakage of commit-ack markers, which could later lead to resource shortages or additional node crashes. (Bug #16834416)
Use of an uninitialized variable employed in connection with error handling in the DBLQH kernel block could sometimes lead to a data node crash or other stability issues for no apparent reason. (Bug #16834333)
A race condition in the time between the reception of an execNODE_FAILREP signal by the QMGR kernel block and its reception by the DBTC kernel blocks could lead to data node crashes during shutdown. (Bug #16834242)
The CLUSTERLOG command (see Commands in the MySQL Cluster Management Client) caused ndb_mgm to crash on Solaris SPARC systems. (Bug #16834030)
START BACKUP, if id had already been used for a backup ID, an error caused by the duplicate ID occurred as expected, but following this, the START BACKUP command never completed. (Bug #16593604, Bug #68854)
ndb_mgm treated backup IDs provided to ABORT BACKUP commands as signed values, so that backup IDs greater than 2^31 wrapped around to negative values. This issue also affected out-of-range backup IDs, which wrapped around to negative values instead of causing errors as expected in such cases. The backup ID is now treated as an unsigned value, and ndb_mgm now performs proper range checking for backup ID values greater than MAX_BACKUPS (2^32). (Bug #16585497, Bug #68798)
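The wraparound described in this entry can be reproduced with a 32-bit signed reinterpretation. The Python sketch below is only an illustration and not ndb_mgm code: as_signed_32 and parse_backup_id are hypothetical helpers, the MAX_BACKUPS value is the one quoted above, and the lower bound of 1 is an assumption made for the example.

```python
import ctypes

MAX_BACKUPS = 2**32  # upper bound quoted in the changelog entry


def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a signed integer (old behavior)."""
    return ctypes.c_int32(value).value


def parse_backup_id(value):
    """Hypothetical range check that treats the backup ID as unsigned.

    The lower bound of 1 is an assumption for this sketch.
    """
    if value < 1 or value > MAX_BACKUPS:
        raise ValueError(f"backup ID out of range: {value}")
    return value


print(as_signed_32(2**31 - 1))   # largest value that stays positive
print(as_signed_32(2**31))       # wraps around to a negative value
print(parse_backup_id(2**31))    # accepted when treated as unsigned
```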
When trying to specify a backup ID greater than the maximum allowed, the value was silently truncated. (Bug #16585455, Bug #68796)
The unexpected shutdown of another data node as a starting data node received its node ID caused the latter to hang in Start Phase 1. (Bug #16007980)
References: See also: Bug #18993037.
Creating more than 32 hash maps caused data nodes to fail. Usually new hash maps are created only when performing reorganization after data nodes have been added or when explicit partitioning is used, such as when creating a table with the MAX_ROWS option, or using PARTITION BY KEY() PARTITIONS. (Bug #14710311)
When performing an INSERT ... ON DUPLICATE KEY UPDATE on an NDB table where the row to be inserted already existed and was locked by another transaction, the error message returned from the INSERT following the timeout was Transaction already aborted instead of the expected Lock wait timeout exceeded. (Bug #14065831, Bug #65130)
When using dynamic listening ports for accepting connections from API nodes, the port numbers were reported to the management server serially. This required a round trip for each API node, causing the time required for data nodes to connect to the management server to grow linearly with the number of API nodes. To correct this problem, each data node now reports all dynamic ports at once. (Bug #12593774)
Cluster API: For each log event retrieved using the MGM API, the log event category (ndb_mgm_event_category) was simply cast to an enum type, which resulted in invalid category values. Now an offset is added to the category following the cast to ensure that the value does not fall out of the allowed range.
Note: This change was reverted by the fix for Bug #18354165. See the MySQL Cluster API Developer documentation for ndb_logevent_get_next(), for more information.
References: See also: Bug #18354165.
Functionality Added or Changed
Cluster API: Added DUMP code 2514, which provides information about counts of transaction objects per API node. For more information, see DUMP 2514. See also Commands in the MySQL Cluster Management Client. (Bug #15878085)
When ndb_restore fails to find a table, it now includes in the error output an NDB API error code giving the reason for the failure. (Bug #16329067)
Following an upgrade to MySQL Cluster NDB 7.2.7 or later, it was not possible to downgrade online again to any previous version, due to a change in that version in the default size (number of buckets) for NDB table hash maps. The fix for this issue makes the size configurable, with the addition of the DefaultHashMapSize configuration parameter.
To retain compatibility with an older release that does not support large hash maps, you can set this parameter in the cluster's config.ini file to the value used in older releases (240) before performing an upgrade, so that the data nodes continue to use smaller hash maps that are compatible with the older release. You can also now employ this parameter in MySQL Cluster NDB 7.0 and MySQL Cluster NDB 7.1 to enable larger hash maps prior to upgrading to MySQL Cluster NDB 7.2. For more information, see the description of the DefaultHashMapSize parameter. (Bug #14800539)
References: See also: Bug #14645319.
Important Change; Cluster API: When checking, as part of evaluating an if predicate, which error codes should be propagated to the application, any error code less than 6000 caused the current row to be skipped, even those codes that should have caused the query to be aborted. In addition, a scan that aborted due to an error from DBTUP when no rows had been sent to the API caused DBLQH to send a SCAN_FRAGCONF signal rather than a SCAN_FRAGREF signal to DBTC. This caused DBTC to time out waiting for a SCAN_FRAGREF signal that was never sent, and the scan was never closed.
As part of this fix, the default ErrorCode value used by NdbInterpretedCode::interpret_exit_nok() has been changed from 899 (Rowid already allocated) to 626 (Tuple did not exist). The old value continues to be supported for backward compatibility. User-defined values in the range 6000-6999 (inclusive) are also now supported. You should also keep in mind that the result of using any other ErrorCode value not mentioned here is not defined or guaranteed.
The NDB Error-Reporting Utility (ndb_error_reporter) failed to include the cluster nodes' log files in the archive it produced when the FILE option was set for the parameter LogDestination. (Bug #16765651)
References: See also: Bug #11752792, Bug #44082.
A WHERE condition that contained a boolean test of the result of an IN subselect was not evaluated correctly. (Bug #16678033)
In some cases a data node could stop with an exit code but no error message other than (null) was logged. (This could occur when using ndbd or ndbmtd for the data node process.) Now in such cases the appropriate error message is used instead (see ndbd Error Messages). (Bug #16614114)
When using tables having more than 64 fragments in a MySQL Cluster where multiple TC threads were configured (on data nodes running ndbmtd, using
KeyInfo memory could be freed prematurely, before scans relying on these objects could be completed, leading to a crash of the data node. (Bug #16402744)
References: See also: Bug #13799800. This issue is a regression of: Bug #14143553.
When started with --initial and an invalid -f option, ndb_mgmd removed the old configuration cache before verifying the configuration file. Now in such cases, ndb_mgmd first checks for the file, and continues with removing the configuration cache only if the configuration file is found and is valid. (Bug #16299289)
A DUMP 2304 command during a data node restart could cause the data node to crash with a Pointer too large error. (Bug #16284258)
Improved handling of lagging row change event subscribers by setting the size of the GCP pool to the value of MaxBufferedEpochs. This fix also introduces a new MaxBufferedEpochBytes data node configuration parameter, which makes it possible to set a total number of bytes per node to be reserved for buffering epochs. In addition, a new DUMP code (8013) has been added which causes a list of lagging subscribers for each node to be printed to the cluster log (see DUMP 8013). (Bug #16203623)
Data nodes could fail during a system restart when the host ran short of memory, due to signals of the wrong types (TRANSID_AI_R) being sent to the DBSPJ kernel block. (Bug #16187976)
Attempting to perform additional operations such as ADD COLUMN as part of an ALTER [ONLINE | OFFLINE] TABLE ... RENAME ... statement is not supported, and now fails with an ER_NOT_SUPPORTED_YET error. (Bug #16021021)
Purging the binary logs could sometimes cause mysqld to crash. (Bug #15854719)
Due to a known issue in the MySQL Server, it is possible to drop the PERFORMANCE_SCHEMA database. (Bug #15831748) In addition, when executed on a MySQL Server acting as a MySQL Cluster SQL node, DROP DATABASE caused this database to be dropped on all SQL nodes in the cluster. Now, when executing a distributed drop of a database, NDB does not delete tables that are local only. This prevents MySQL system databases from being dropped in such cases. (Bug #14798043)
References: See also: Bug #15831748.
An error message in src/mgmsrv/MgmtSrvr.cpp was corrected. (Bug #14548052, Bug #66518)
A DUMP 1000 command (see DUMP 1000) that contained extra or malformed arguments could lead to data node failures. (Bug #14537622)
Exhaustion of LongMessageBuffer memory under heavy load could cause data nodes running ndbmtd to fail. (Bug #14488185)
The help text for ndb_select_count did not include any information about using table names. (Bug #11755737, Bug #47551)
The ndb_mgm client HELP command did not show the complete syntax for the
Cluster API: The Ndb::computeHash() API method performs a malloc() if no buffer is provided for it to use. However, it was assumed that the memory thus returned would always be suitably aligned, which is not always the case. Now when malloc() provides a buffer to this method, the buffer is aligned after it is allocated, and before it is used. (Bug #16484617)
Functionality Added or Changed
Added several new columns to the transporters table and counters for the counters table of the ndbinfo information database. The information provided may help in troubleshooting of transport overloads and problems with send buffer memory allocation. For more information, see the descriptions of these tables. (Bug #15935206)
To provide information which can help in assessing the current state of arbitration in a MySQL Cluster as well as in diagnosing and correcting arbitration problems, 3 new tables, including arbitrator_validity_summary, have been added to the ndbinfo information database. (Bug #13336549)
When an NDB table grew to contain approximately one million rows or more per partition, it became possible to insert rows having duplicate primary or unique keys into it. In addition, primary key lookups began to fail, even when matching rows could be found in the table by other means.
This issue was introduced in MySQL Cluster NDB 7.0.36, MySQL Cluster NDB 7.1.26, and MySQL Cluster NDB 7.2.9. Signs that you may have been affected include the following:
Rows left over that should have been deleted
Rows unchanged that should have been updated
Rows with duplicate unique keys due to inserts or updates (which should have been rejected) that failed to find an existing row and thus (wrongly) inserted a new one
This issue does not affect simple scans, so you can see all rows in a given table using SELECT * FROM and similar queries that do not depend on a primary or unique key.
Upgrading to or downgrading from an affected release can be troublesome if there are rows with duplicate primary or unique keys in the table; such rows should be merged, but the best means of doing so is application dependent.
In addition, since the key operations themselves are faulty, a merge can be difficult to achieve without taking the MySQL Cluster offline, and it may be necessary to dump, purge, process, and reload the data. Depending on the circumstances, you may want or need to process the dump with an external application, or merely to reload the dump while ignoring duplicates if the result is acceptable.
Another possibility is to copy the data into another table without the original table's unique key constraints or primary key (recall that CREATE TABLE t2 SELECT * FROM t1 does not by default copy t1's primary or unique key definitions to t2). Following this, you can remove the duplicates from the copy, then add back the unique constraints and primary key definitions. Once the copy is in the desired state, you can either drop the original table and rename the copy, or make a new dump (which can be loaded later) from the copy. (Bug #16023068, Bug #67928)
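The copy-and-deduplicate approach can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in, purely to show the sequence of steps; it is not MySQL or NDB, the SQLite syntax differs (CREATE TABLE ... AS SELECT, and the implicit rowid), and the table and column names are made up. Which of several duplicate rows to keep is application dependent; keeping the first one copied is an arbitrary choice made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# t1 stands in for the affected table. It is created without a key so that
# the duplicate id values can be reproduced in this sketch.
cur.execute("CREATE TABLE t1 (id INTEGER, val TEXT)")
cur.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b-dup"), (3, "c")],
)

# Step 1: copy the data; the copy carries no key definitions.
cur.execute("CREATE TABLE t2 AS SELECT * FROM t1")

# Step 2: remove duplicates from the copy, keeping one row per id.
cur.execute(
    "DELETE FROM t2 WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM t2 GROUP BY id)"
)

# Step 3: add the unique constraint back once the duplicates are gone.
cur.execute("CREATE UNIQUE INDEX t2_id ON t2 (id)")
conn.commit()

rows = cur.execute("SELECT id, val FROM t2 ORDER BY id").fetchall()
print(rows)
```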
The management client command ALL REPORT BackupStatus failed with an error when used with data nodes having multiple LQH worker threads (ndbmtd data nodes). The issue did not affect the
form of this command. (Bug #15908907)
The multi-threaded job scheduler could be suspended prematurely when there were insufficient free job buffers to allow the threads to continue. The general rule in the job thread is that any queued messages should be sent before the thread is allowed to suspend itself, which guarantees that no other threads or API clients are kept waiting for operations which have already completed. However, the number of messages in the queue was specified incorrectly, leading to increased latency in delivering signals, sluggish response, or otherwise suboptimal performance. (Bug #15908684)
The setting for the DefaultOperationRedoProblemAction API node configuration parameter was ignored, and the default value used instead. (Bug #15855588)
Node failure during the dropping of a table could lead to the node hanging when attempting to restart.
When this happened, the NDB internal dictionary (DBDICT) lock taken by the drop table operation was held indefinitely, and the logical global schema lock taken by the SQL node from which the drop operation originated was held until the NDB internal operation timed out. To aid in debugging such occurrences, a new dump code (DUMP DictDumpLockQueue), which dumps the contents of the DICT lock queue, has been added in the ndb_mgm client. (Bug #14787522)
Job buffers act as the internal queues for work requests (signals) between block threads in ndbmtd and could be exhausted if too many signals are sent to a block thread.
Performing pushed joins in the DBSPJ kernel block can execute multiple branches of the query tree in parallel, which means that the number of signals being sent can increase as more branches are executed. If DBSPJ execution cannot be completed before the job buffers are filled, the data node can fail.
This problem could be identified by multiple instances of the message sleeploop 10!! in the cluster out log, possibly followed by job buffer full. If the job buffers overflowed more gradually, there could also be failures due to error 1205 (Lock wait timeout exceeded), shutdowns initiated by the watchdog timer, or other timeout related errors. These were due to the slowdown caused by the 'sleeploop'.
Normally up to a 1:4 fanout ratio between consumed and produced signals is permitted. However, since there can be a potentially unlimited number of rows returned from the scan (and multiple scans of this type executing in parallel), any ratio greater than 1:1 in such cases makes it possible to overflow the job buffers.
The fix for this issue defers any lookup child which otherwise would have been executed in parallel with another, to resume when its parallel child completes one of its own requests. This restricts the fanout ratio for bushy scan-lookup joins to 1:1. (Bug #14709490)
References: See also: Bug #14648712.
During an online upgrade, certain SQL statements could cause the server to hang, resulting in the error Got error 4012 'Request ndbd time-out, maybe due to high load or communication problems' from NDBCLUSTER. (Bug #14702377)
The recently added LCP fragment scan watchdog occasionally reported problems with LCP fragment scans having very high table id, fragment id, and row count values.
This was due to the watchdog not accounting for the time spent draining the backup buffer used to buffer rows before writing to the fragment checkpoint file.
Now, in the final stage of an LCP fragment scan, the watchdog switches from monitoring rows scanned to monitoring the buffer size in bytes. The buffer size should decrease as data is written to the file, after which the file should be promptly closed. (Bug #14680057)
Under certain rare circumstances, MySQL Cluster data nodes could crash in conjunction with a configuration change on the data nodes from a single-threaded to a multi-threaded transaction coordinator (using the ThreadConfig configuration parameter for ndbmtd). The problem occurred when a mysqld that had been started prior to the change was shut down following the rolling restart of the data nodes required to effect the configuration change. (Bug #14609774)
Functionality Added or Changed
Added 3 new columns to the transporters table in the ndbinfo database. These columns, which include bytes_received, help to provide an overview of data transfer across the transporter links in a MySQL Cluster. This information can be useful in verifying system balance, partitioning, and front-end server load balancing; it may also be of help when diagnosing network problems arising from link saturation, hardware faults, or other causes. (Bug #14685458)
Data node logs now provide tracking information about arbitrations, including which nodes have assumed the arbitrator role and at what times. (Bug #11761263, Bug #53736)
A slow filesystem during local checkpointing could exert undue pressure on DBDIH kernel block file page buffers, which in turn could lead to a data node crash when these were exhausted. This fix limits the number of table definition updates that DBDIH can issue concurrently. (Bug #14828998)
The management server process, when started with
--config-cache=FALSE, could sometimes hang during shutdown. (Bug #14730537)
The output from ndb_config
--configinfo now contains the same information as that from ndb_config
--xml, including explicit indicators for parameters that do not require restarting a data node with
--initial to take effect. In addition,
ndb_config indicated incorrectly that the
LogLevelCheckpoint data node configuration parameter requires an initial node restart to take effect, when in fact it does not; this error was also present in the MySQL Cluster documentation, where it has also been corrected. (Bug #14671934)
ALTER TABLE with other DML statements on the same NDB table returned Got error -1 'Unknown error code' from NDBCLUSTER. (Bug #14578595)
CPU consumption peaked several seconds after the forced termination of an NDB client application. This occurred because the DBTC kernel block waited in a busy loop for any open transactions owned by the disconnected API client to be terminated, and did not break between checks for the correct state. (Bug #14550056)
Receiver threads could wait unnecessarily to process incomplete signals, greatly reducing performance of ndbmtd. (Bug #14525521)
On platforms where epoll was not available, setting multiple receiver threads with the
ThreadConfig parameter caused ndbmtd to fail. (Bug #14524939)
Added the --connect-retries and --connect-delay startup options for ndbd and ndbmtd.
--connect-retries (default 12) controls how many times the data node tries to connect to a management server before giving up; setting it to -1 means that the data node never stops trying to make contact.
--connect-delay sets the number of seconds to wait between retries; the default is 5. (Bug #14329309, Bug #66550)
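The retry behavior described for these two options can be sketched roughly as follows. This is only an illustration in Python, not the data node's actual implementation: the function name `connect_with_retries` and the `try_connect` callback are hypothetical, and the sketch assumes that the retry count is the total number of connection attempts.

```python
import time
from typing import Callable


def connect_with_retries(try_connect: Callable[[], bool],
                         retries: int = 12,
                         delay: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call try_connect() until it succeeds or the attempts are used up.

    retries: number of attempts (assumption); -1 means never give up.
    delay:   seconds to wait between attempts.
    """
    attempts = 0
    while retries < 0 or attempts < retries:
        if try_connect():
            return True
        attempts += 1
        # Wait only if another attempt is still going to be made.
        if retries < 0 or attempts < retries:
            sleep(delay)
    return False
```

Passing `sleep` as a parameter makes it possible to exercise the loop without actually waiting, for example by recording the requested delays in a list.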
Following a failed
ALTER TABLE ... REORGANIZE PARTITION statement, a subsequent execution of this statement after adding new data nodes caused a failure in the
DBDIH kernel block which led to an unplanned shutdown of the cluster.
DUMP code 7019 was added as part of this fix. It can be used to obtain diagnostic information relating to a failed data node. See DUMP 7019, for more information. (Bug #14220269)
References: See also: Bug #18550318.
It was possible in some cases for two transactions to try to drop tables at the same time. If the master node failed while one of these operations was still pending, this could lead either to additional node failures (and cluster shutdown) or to new dictionary operations being blocked. This issue is addressed by ensuring that the master will reject requests to start or stop a transaction while there are outstanding dictionary takeover requests. In addition, table-drop operations now correctly signal when complete, as the
DBDICT kernel block could not confirm node takeovers while such operations were still marked as pending completion. (Bug #14190114)
The DBSPJ kernel block had no information about which tables or indexes actually existed, or which had been modified or dropped, since execution of a given query began. Thus,
DBSPJ might submit dictionary requests for nonexistent tables or versions of tables, which could cause a crash in the
This fix introduces a simplified dictionary into the
DBSPJ kernel block such that
DBSPJ can now check reliably for the existence of a particular table or version of a table on which it is about to request an operation. (Bug #14103195)
Previously, it was possible to store a maximum of 46137488 rows in a single MySQL Cluster partition. This limitation has now been removed. (Bug #13844405, Bug #14000373)
References: See also: Bug #13436216.
When using ndbmtd and performing joins, data nodes could fail where ndbmtd processes were configured to use a large number of local query handler threads (as set by the
ThreadConfig configuration parameter), the tables accessed by the join had a large number of partitions, or both. (Bug #13799800, Bug #14143553)
When reloading the redo log during a node or system restart, and with
NoOfFragmentLogFiles greater than or equal to 42, it was possible for metadata to be read for the wrong file (or files). Thus, the node or nodes involved could try to reload the wrong set of data. (Bug #14389746)
Important Change: When
FILE was used for the value of the
LogDestination parameter without also specifying the
filename, the log file name defaulted to
logger.log. Now in such cases, the name defaults to
ndb_. (Bug #11764570, Bug #57417)
If the Transaction Coordinator aborted a transaction in the “prepared” state, this could cause a resource leak. (Bug #14208924)
When attempting to connect using a socket with a timeout, it was possible (if the timeout was exceeded) for the socket not to be set back to blocking. (Bug #14107173)
An error handling routine in the local query handler (
DBLQH) used the wrong code path, which could corrupt the transaction ID hash, causing the data node process to fail. This could in some cases possibly lead to failures of other data nodes in the same node group when the failed node attempted to restart. (Bug #14083116)
When a fragment scan occurring as part of a local checkpoint (LCP) stopped progressing, this kept the entire LCP from completing, which could result in redo log exhaustion, write service outage, inability to recover nodes, and longer system recovery times. To help keep this from occurring, MySQL Cluster now implements an LCP watchdog mechanism, which monitors the fragment scans making up the LCP and takes action if the LCP is observed to be delinquent.
This is intended to guard against any scan related system-level I/O errors or other issues causing problems with LCP and thus having a negative impact on write service and recovery times. Each node independently monitors the progress of local fragment scans occurring as part of an LCP. If no progress is made for 20 seconds, warning logs are generated every 10 seconds thereafter for up to 1 minute. At this point, if no progress has been made, the fragment scan is considered to have hung, and the node is restarted to enable the LCP to continue.
In addition, a new ndbd exit code NDBD_EXIT_LCP_SCAN_WATCHDOG_FAIL is added to identify when this occurs. See LQH Errors, for more information. (Bug #14075825)
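The timing rules given above for the LCP fragment scan watchdog can be pictured with a small Python sketch. It is not the NDB kernel's code. The function name `lcp_scan_watchdog_state` is hypothetical, and two readings of the text are assumed: that the first warning is issued at the 20-second mark, and that the 1-minute limit is measured from the last observed progress.

```python
from typing import Tuple


def lcp_scan_watchdog_state(idle_seconds: int,
                            warn_after: int = 20,
                            warn_every: int = 10,
                            fail_after: int = 60) -> Tuple[str, int]:
    """Classify a fragment scan by seconds elapsed without progress.

    Returns (state, warnings_issued), where state is one of
    "ok", "warn", or "restart".
    """
    if idle_seconds < warn_after:
        return ("ok", 0)
    # Warnings are issued at warn_after, then every warn_every seconds,
    # stopping once the fail_after limit is reached.
    capped = min(idle_seconds, fail_after - 1)
    warnings = 1 + (capped - warn_after) // warn_every
    if idle_seconds >= fail_after:
        # The scan is considered hung; the node would be restarted.
        return ("restart", warnings)
    return ("warn", warnings)
```

Any progress by the scan would reset `idle_seconds` to 0, returning the watchdog to the "ok" state.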
In some circumstances, transactions could be lost during an online upgrade. (Bug #13834481)
When an NDB table was created during a data node restart, the operation was rolled back in the
NDB engine, but not on the SQL node where it was executed. This was due to the table
.FRM files not being cleaned up following the operation that was rolled back by
NDB. Now in such cases these files are removed. (Bug #13824846)
Attempting to add both a column and an index on that column in the same online
ALTER TABLE statement caused mysqld to fail. Although this issue affected only the mysqld shipped with MySQL Cluster, the table named in the
ALTER TABLE could use any storage engine for which online operations are supported. (Bug #12755722)
Cluster API: When an NDB API application called
NdbScanOperation::nextResult() again after the previous call had returned end-of-file (return code 1), a transaction object was leaked. Now when this happens, NDB returns error code 4210 (Ndb sent more info than length specified); previously in such cases, -1 was returned. In addition, the extra transaction object associated with the scan is freed, by returning it to the transaction coordinator's idle list. (Bug #11748194)
DUMP 2303 in the ndb_mgm client now includes the status of the single fragment scan record reserved for a local checkpoint. (Bug #13986128)
A shortage of scan fragment records in
DBTC resulted in a leak of concurrent scan table records and key operation records. (Bug #13966723)
Important Change: The
ALTER ONLINE TABLE ... REORGANIZE PARTITION statement can be used to create new table partitions after new empty nodes have been added to a MySQL Cluster. Usually, the number of partitions to create is determined automatically, such that, if no new partitions are required, then none are created. This behavior can be overridden by creating the original table using the
MAX_ROWS option, which indicates that extra partitions should be created to store a large number of rows. However, in this case
ALTER ONLINE TABLE ... REORGANIZE PARTITION simply uses the
MAX_ROWS value specified in the original
CREATE TABLE statement to determine the number of partitions required; since this value remains constant, so does the number of partitions, and so no new ones are created. This means that the table is not rebalanced, and the new data nodes remain empty.
To solve this problem, support is added for
ALTER ONLINE TABLE ... MAX_ROWS=newvalue, where
newvalue is greater than the value used with
MAX_ROWS in the original
CREATE TABLE statement. This larger
MAX_ROWS value implies that more partitions are required; these are allocated on the new data nodes, which restores the balanced distribution of the table data.
ALTER ONLINE TABLE failed when a
DEFAULT option was used. (Bug #13830980)
In some cases, restarting data nodes spent a very long time in Start Phase 101, when API nodes must connect to the starting node (using
NdbEventOperation), when the API nodes trying to connect failed in a live-lock scenario. This connection process uses a handshake during which a small number of messages are exchanged, with a timeout used to detect failures during the handshake.
Prior to this fix, this timeout was set such that, if one API node encountered the timeout, all other nodes connecting would do the same. The fix also decreases this timeout. This issue (and the effects of the fix) are most likely to be observed on relatively large configurations having 10 or more data nodes and 200 or more API nodes. (Bug #13825163)
ndbmtd failed to restart when the size of a table definition exceeded 32K.
(The size of a table definition is dependent upon a number of factors, but in general the 32K limit is encountered when a table has 250 to 300 columns.) (Bug #13824773)
An initial start using ndbmtd could sometimes hang. This was due to a state which occurred when several threads tried to flush a socket buffer to a remote node. In such cases, to minimize flushing of socket buffers, only one thread actually performs the send, on behalf of all threads. However, it was possible in certain cases for there to be data in the socket buffer waiting to be sent with no thread ever being chosen to perform the send. (Bug #13809781)
When trying to use ndb_size.pl
port to connect to a MySQL server running on a nonstandard port, the
port argument was ignored. (Bug #13364905, Bug #62635)
Important Change: A number of changes have been made in the configuration of transporter send buffers.
The data node configuration parameter
ReservedSendBufferMemory is now deprecated, and thus subject to removal in a future MySQL Cluster release.
ReservedSendBufferMemory has been non-functional since it was introduced and remains so.
TotalSendBufferMemory now works correctly with data nodes using ndbmtd.
A new data node configuration parameter
ExtraSendBufferMemory is introduced. Its purpose is to control how much additional memory can be allocated to the send buffer over and above that specified by
SendBufferMemory. The default setting (0) allows up to 16MB to be allocated automatically.
(Bug #13633845, Bug #11760629, Bug #53053)
A data node crashed when more than 16G fixed-size memory was allocated by
DBTUP to one fragment (because the
DBACC kernel block was not prepared to accept values greater than 32 bits from it, leading to an overflow). Now in such cases, the data node returns Error 889 Table fragment fixed data reference has reached maximum possible value.... When this happens, you can work around the problem by increasing the number of partitions used by the table (such as by using the
CREATE TABLE). (Bug #13637411)
References: See also: Bug #11747870, Bug #34348.
Several instances in the NDB code affecting the operation of multi-threaded data nodes, where
SendBufferMemory was associated with a specific thread for an unnecessarily long time, have been identified and fixed, by minimizing the time that any of these buffers can be held exclusively by a given thread (send buffer memory being critical to operation of the entire node). (Bug #13618181)
Queries using LIKE ... ESCAPE on
NDB tables failed when pushed down to the data nodes. Such queries are no longer pushed down, regardless of the value of
engine_condition_pushdown. (Bug #13604447, Bug #61064)
To avoid TCP transporter overload, an overload flag is kept in the NDB kernel for each data node; this flag is used to abort key requests if needed, yielding error 1218 Send Buffers overloaded in NDB kernel in such cases. Scans can also put significant pressure on transporters, especially where scans with a high degree of parallelism are executed in a configuration with relatively small send buffers. However, in these cases, overload flags were not checked, which could lead to node failures due to send buffer exhaustion. Now, overload flags are checked by scans, and in cases where returning sufficient rows to match the batch size (
--ndb-batch-size server option) would cause an overload, the number of rows is limited to what can be accommodated by the send buffer.
See also Configuring MySQL Cluster Send Buffer Parameters. (Bug #13602508)
References: See also: Bug #13608135.
A node failure and recovery while performing a scan on more than 32 partitions led to additional node failures during node takeover. (Bug #13528976)
The --skip-config-cache option now causes ndb_mgmd to skip checking for the configuration directory, and thus to skip creating it in the event that it does not exist. (Bug #13428853)
Accessing a table having a
BLOB column but no primary key following a restart of the SQL node failed with Error 1 (Unknown error code). (Bug #13563280)
At the beginning of a local checkpoint, each data node marks its local tables with a “to be checkpointed” flag. A failure of the master node during this process could cause either the LCP to hang, or one or more data nodes to be forcibly shut down. (Bug #13436481)
A node failure while an
ANALYZE TABLE statement was executing resulted in a hung connection (and the user was not informed of any error that would cause this to happen). (Bug #13416603)
References: See also: Bug #13407848.
Added the MinFreePct data node configuration parameter, which specifies a percentage of data node resources to hold in reserve for restarts. The resources monitored are
IndexMemory, and any per-table
MAX_ROWS settings (see CREATE TABLE Syntax). The default value of
MinFreePct is 5, which means that 5% of each of these resources is now set aside for restarts. (Bug #13436216)
Because the log event buffer used internally by data nodes was circular, periodic events such as statistics events caused it to be overwritten too quickly. Now the buffer is partitioned by log event category, and its default size has been increased from 4K to 8K. (Bug #13394771)
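The effect of partitioning the log event buffer by category can be illustrated with a short Python sketch. This is a simplified model rather than the data node's implementation: the class name `PartitionedEventBuffer` and the category names are hypothetical, and capacity is counted in events instead of bytes.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class PartitionedEventBuffer:
    """One small circular buffer per log event category."""

    def __init__(self, per_category_capacity: int) -> None:
        self._capacity = per_category_capacity
        self._buffers: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=self._capacity))

    def add(self, category: str, event: str) -> None:
        # A full deque drops its oldest entry, but only within
        # the same category.
        self._buffers[category].append(event)

    def events(self, category: str) -> List[str]:
        return list(self._buffers.get(category, ()))


# A single shared circular buffer loses the rare event ...
shared: Deque[str] = deque(maxlen=4)
shared.append("error: node failure")
for i in range(100):
    shared.append(f"statistics #{i}")

# ... while the partitioned buffer keeps it.
partitioned = PartitionedEventBuffer(per_category_capacity=4)
partitioned.add("error", "error: node failure")
for i in range(100):
    partitioned.add("statistics", f"statistics #{i}")
```

With the shared buffer, frequent periodic events push out everything else; with one buffer per category, they can only displace older events of their own kind.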
BatchByteSize configuration parameters, used to control the maximum sizes of result batches, are defined as integers. However, the values used to store these were incorrectly interpreted as numbers of bytes in the NDB kernel. This caused the
DBLQH kernel block to fail to detect when the specified
In addition, the
DBSPJ kernel block could miscalculate statistics for adaptive parallelism. (Bug #13355055)
Previously, simultaneously forcing the shutdown of multiple data nodes using
SHUTDOWN -F in the ndb_mgm management client could cause the entire cluster to fail. Now in such cases, any such nodes are forced to abort immediately. (Bug #12928429)
A SubscriberNodeIdUndefined error was previously unhandled, resulting in a data node crash, but is now handled by NDB Error 1429, Subscriber node undefined in SubStartReq. (Bug #12598496)
Functionality Added or Changed
Added the CrashOnCorruptedTuple data node configuration parameter. When enabled, this parameter causes data nodes to handle corrupted tuples in a fail-fast manner; in other words, whenever the data node detects a corrupted tuple, it forcibly shuts down if
CrashOnCorruptedTuple is enabled. For backward compatibility, this parameter is disabled by default. (Bug #12598636)
Added the ThreadConfig data node configuration parameter to enable control of multiple threads and CPUs when using ndbmtd, by assigning threads of one or more specified types to execute on one or more CPUs. This can provide more precise and flexible control over multiple threads than can be obtained using the
LockExecuteThreadToCPU parameter. (Bug #11795581)
Added the ndbinfo_select_all utility.
When adding data nodes online, if the SQL nodes were not restarted before starting the new data nodes, the next query to be executed crashed the SQL node on which it was run. (Bug #13715216, Bug #62847)
References: This issue is a regression of: Bug #13117187.
When a failure of multiple data nodes during a local checkpoint (LCP) that took a long time to complete included the node designated as master, any new data nodes attempting to start before all ongoing LCPs were completed later crashed. This was due to the fact that node takeover by the new master cannot be completed until there are no pending local checkpoints. Long-running LCPs such as those which triggered this issue can occur when fragment sizes are sufficiently large (see MySQL Cluster Nodes, Node Groups, Replicas, and Partitions, for more information). Now in such cases, data nodes (other than the new master) are kept from restarting until the takeover is complete. (Bug #13323589)
When deleting from multiple tables using a unique key in the
WHERE condition, the wrong rows were deleted. In addition,
UPDATE triggers failed when rows were changed by deleting from or updating multiple tables. (Bug #12718336, Bug #61705, Bug #12728221)
Shutting down a mysqld while under load caused the spurious error messages Opening ndb_binlog_index: killed and Unable to lock table ndb_binlog_index to be written in the cluster log. (Bug #11930428)
Cluster API: When more than 32KB of data must be sent in a single signal using the NDB API, the data is split across 2 or more signals each of which is smaller than 32KB, and these are then reassembled back into the original, full-length signal by the receiver. Such fragmented signals are used for some scan requests, as well as for SPJ
QueryOperation requests. However, extra (spurious) signals could sometimes be sent when using fragmented signals, causing errors on the receiver; these implementation artifacts have now been eliminated. (Bug #13087016)
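The general idea of splitting an oversized payload and reassembling it on the receiving side can be sketched in Python as shown here. This only illustrates the principle; it does not reflect the NDB signal format, and the names `split_signal` and `reassemble` are hypothetical.

```python
from typing import List

MAX_SIGNAL_BYTES = 32 * 1024  # the 32KB limit mentioned in the text


def split_signal(payload: bytes, limit: int = MAX_SIGNAL_BYTES) -> List[bytes]:
    """Split payload into fragments only when it exceeds the limit.

    Every fragment of a split payload is strictly smaller than limit.
    """
    if len(payload) <= limit:
        return [payload]
    chunk = limit - 1
    return [payload[i:i + chunk] for i in range(0, len(payload), chunk)]


def reassemble(fragments: List[bytes]) -> bytes:
    """Join fragments, in order, back into the original payload."""
    return b"".join(fragments)
```

A correct sender emits exactly the fragments returned by the split step and nothing more, so that the receiver's reassembled data matches the original length.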
Functionality Added or Changed
It is now possible to filter the output from ndb_config so that it displays only system, data node, or connection parameters and values, using one of the options
--connections, respectively. In addition, it is now possible to specify from which data node the configuration data is obtained, using the
--config_from_node option that is added in this release.
For more information, see ndb_config — Extract MySQL Cluster Configuration Information. (Bug #11766870)
Incompatible Change; Cluster API: Restarting a machine hosting data nodes, SQL nodes, or both, caused such nodes when restarting to time out while trying to obtain node IDs.
As part of the fix for this issue, the behavior and default values for the NDB API
Ndb_cluster_connection::connect() method have been improved. Due to these changes, the version number for the included NDB client library (
libndbclient.so) has been increased from 4.0.0 to 5.0.0. For NDB API applications, this means that as part of any upgrade, you must do both of the following:
Review and possibly modify any NDB API code that uses the
connect() method, in order to take into account its changed default retry handling.
Recompile any NDB API applications using the new version of the client library.
Also in connection with this issue, the default value for each of the two mysqld options --ndb-wait-connected and
--ndb-wait-setup has been increased to 30 seconds (from 0 and 15, respectively). In addition, a hard-coded 30-second delay was removed, so that the value of
--ndb-wait-connected is now handled correctly in all cases. (Bug #12543299)
When replicating DML statements with
IGNORE between clusters, the number of operations that failed due to nonexistent keys was expected to be no greater than the number of defined operations of any single type. Because the slave SQL thread defines operations of multiple types in batches together, code which relied on this assumption could cause mysqld to fail. (Bug #12859831)
The maximum effective value for the
OverloadLimit configuration parameter was limited by the value of
SendBufferMemory. Now the value set for
OverloadLimit is used correctly, up to this parameter's stated maximum (4G). (Bug #12712109)
AUTO_INCREMENT values were not set correctly for
INSERT IGNORE statements affecting
NDB tables. This could lead such statements to fail with Got error 4350 'Transaction already aborted' from NDBCLUSTER when inserting multiple rows containing duplicate values. (Bug #11755237, Bug #46985)
When failure handling of an API node takes longer than 300 seconds, extra debug information is included in the resulting output. In cases where the API node's node ID was greater than 48, these extra debug messages could lead to a crash, and confusing output otherwise. This was due to an attempt to provide information specific to data nodes for API nodes as well. (Bug #62208)
In rare cases, a series of node restarts and crashes during restarts could lead to errors while reading the redo log. (Bug #62206)
Functionality Added or Changed
Added the MaxDMLOperationsPerTransaction data node configuration parameter, which can be used to limit the number of DML operations used by a transaction; if the transaction requires more than this many DML operations, the transaction is aborted. (Bug #12589613)
When global checkpoint indexes were written with no intervening end-of-file or megabyte border markers, this could sometimes lead to a situation in which the end of the redo log was mistakenly regarded as being between these GCIs, so that if the restart of a data node took place before the start of the next redo log was overwritten, the node encountered an Error while reading the REDO log. (Bug #12653993, Bug #61500)
References: See also: Bug #56961.
Restarting a mysqld during a rolling upgrade with data nodes running a mix of old and new versions of the MySQL Cluster software caused the mysqld to run in read-only mode. (Bug #12651364, Bug #61498)
Error reporting has been improved for cases in which API nodes are unable to connect due to apparent unavailability of node IDs. (Bug #12598398)
Error messages for Failed to convert connection transporter registration problems were inspecific. (Bug #12589691)
Under certain rare circumstances, a data node process could fail with Signal 11 during a restart. This was due to uninitialized variables in the
QMGR kernel block. (Bug #12586190)
Multiple management servers were unable to detect one another until all nodes had fully started. As part of the fix for this issue, two new status values
CONNECTED can be reported for management nodes in the output of the ndb_mgm client
SHOW command (see Commands in the MySQL Cluster Management Client). Two corresponding status values
NDB_MGM_NODE_STATUS_CONNECTED are also added to the list of possible values for an
ndb_mgm_node_status data structure in the MGM API. (Bug #12352191, Bug #48301)
Handling of the
MaxNoOfAttributes configuration parameters was not consistent in all parts of the
NDB kernel, and were only strictly enforced by the
SUMA kernel blocks. This could lead to problems when tables could be created but not replicated. Now these parameters are treated by
DBDICT as suggested maximums rather than hard limits, as they are elsewhere in the
NDB kernel. (Bug #61684)
It was not possible to shut down a management node while one or more data nodes were stopped (for whatever reason). This issue was a regression introduced in MySQL Cluster NDB 7.0.24 and MySQL Cluster NDB 7.1.13. (Bug #61607)
References: See also: Bug #61147.
Cluster API: Applications that included the header file
ndb_logevent.h could not be built using the Microsoft Visual Studio C compiler or the Oracle (Sun) Studio C compiler due to empty struct definitions. (Bug #12678971)
Cluster API: Within a transaction, after creating, executing, and closing a scan, calling
NdbTransaction::refresh() after creating and executing but not closing a second scan caused the application to crash. (Bug #12646659)
The Ndb_getinaddr() function has been rewritten to use
my_gethostbyname_r() (which is removed in a later version of the MySQL Server). (Bug #12542120)
mysql_upgrade failed when performing an online upgrade from MySQL Cluster NDB 7.1.8 or an earlier release to MySQL Cluster NDB 7.1.9 or later in which the SQL nodes were upgraded before the data nodes. This issue could occur during any online upgrade or downgrade where one or more
ndbinfo tables had more, fewer, or differing columns between the two versions, and when the data nodes were not upgraded before the SQL nodes.
For more information, see Upgrade and downgrade compatibility: MySQL Cluster NDB 7.x. (Bug #11885602, Bug #12581895, Bug #12581954)
Two unused test files in
storage/ndb/test/sql contained incorrect versions of the GNU Lesser General Public License. The files and the directory containing them have been removed. (Bug #11810156)
References: See also: Bug #11810224.
Error 1302 gave the wrong error message (Out of backup record). This has been corrected to A backup is already running. (Bug #11793592)
When using two management servers, issuing in an ndb_mgm client connected to one management server a
STOP command for stopping the other management server caused Error 2002 (Stop failed ... Send to process or receive failed.: Permanent error: Application error), even though the
STOP command actually succeeded, and the second ndb_mgmd was shut down. (Bug #61147)
In ndbmtd, a node connection event is detected by a
CMVMI thread which sends a
CONNECT_REP signal to the
QMGR kernel block. In a few isolated circumstances, a signal might be transferred to
QMGR directly by the
NDB transporter before the
CONNECT_REP signal actually arrived. This resulted in reports in the error log with status
Temporary error, restart node, and the message Internal program error. (Bug #61025)
Under heavy loads with many concurrent inserts, temporary failures in transactions could occur (and were misreported as being due to
NDB Error 899 Rowid already allocated). As part of the fix for this issue,
NDB Error 899 has been reclassified as an internal error, rather than as a temporary transaction error. (Bug #56051, Bug #11763354)
Disk Data: Accounting for
MaxNoOfOpenFiles was incorrect with regard to data files in MySQL Cluster Disk Data tablespaces. This could lead to a crash when
MaxNoOfOpenFiles was exceeded. (Bug #12581213)
Functionality Added or Changed
It is now possible to add data nodes online to a running MySQL Cluster without performing a rolling restart of the cluster or starting data node processes with the
--nowait-nodes option. This can be done by setting
Nodegroup = 65536 in the
config.ini file for any data nodes that should be started at a later time, when first starting the cluster. (It was possible to set
NodeGroup to this value previously, but the management server failed to start.)
As part of this fix, a new data node configuration parameter
StartNoNodeGroupTimeout has been added. When the management server sees that there are data nodes with no node group (that is, nodes for which
Nodegroup = 65536), it waits
StartNoNodeGroupTimeout milliseconds before treating these nodes as though they were listed with the
--nowait-nodes option, and proceeds to start.
For more information, see Adding MySQL Cluster Data Nodes Online. (Bug #11766167, Bug #59213)
A config_generation column has been added to the
nodes table of the
ndbinfo database. By checking this column, it is now possible to determine which version or versions of the MySQL Cluster configuration file are in effect on the data nodes. This information can be especially useful when performing a rolling restart of the cluster to update its configuration.
Cluster API: A unique index operation is executed in two steps: a lookup on an index table, and an operation on the base table. When the operation on the base table failed, while being executed in a batch with other operations that succeeded, this could lead to a hanging execute, eventually timing out with Error 4012 (Request ndbd time-out, maybe due to high load or communication problems). (Bug #12315582)
A memory leak in
LGMAN that leaked 8 bytes of log buffer memory per 32K written was introduced in MySQL Cluster NDB 7.0.9, affecting all MySQL Cluster NDB 7.1 releases as well as MySQL Cluster NDB 7.0.9 and later MySQL Cluster NDB 7.0 releases. (For example, when 128MB log buffer memory was used, it was exhausted after writing 512GB to the undo log.) This led to a GCP stop and data node failure. (Bug #60946)
References: This issue is a regression of: Bug #47966.
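The 128MB and 512GB figures quoted for the LGMAN leak follow directly from the leak rate of 8 bytes per 32K written. A brief Python check of that arithmetic, using binary units and a hypothetical helper name, might look like this:

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def bytes_written_until_exhaustion(buffer_bytes: int,
                                   leak_bytes: int = 8,
                                   per_bytes_written: int = 32 * KIB) -> int:
    """Amount of data written before the leak consumes the whole buffer."""
    leaks_until_empty = buffer_bytes // leak_bytes
    return leaks_until_empty * per_bytes_written


written = bytes_written_until_exhaustion(128 * MIB)
print(written // GIB, "GiB written before a 128 MiB buffer is exhausted")
```

A 128MB buffer holds 16M leaks of 8 bytes each, and each leak corresponds to 32K written, which gives 512GB.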
When using ndbmtd, a MySQL Cluster configured with 32 data nodes failed to start correctly. (Bug #60943)
When performing a TUP scan with locks in parallel, and with a highly concurrent load of inserts and deletions, the scan could sometimes fail to notice that a record had moved while waiting to acquire a lock on it, and so read the wrong record. During node recovery, this could lead to a crash of a node that was copying data to the node being started, and a possible forced shutdown of the cluster.
Cluster API: Performing interpreted operations using a unique index did not work correctly, because the interpret bit was kept when sending the lookup to the index table.
Functionality Added or Changed
Improved scaling of ordered index scans performance by removing a hard-coded limit (
MAX_PARALLEL_INDEX_SCANS_PER_FRAG) and making the number of
TUX scans per fragment configurable by adding the
MaxParallelScansPerFragment data node configuration parameter. (Bug #11769048)
Important Change: Formerly, the
--ndb-cluster-connection-pool server option set a status variable as well as a system variable. The status variable has been removed as redundant. (Bug #60119)
A scan with a pushed condition (filter) using the
CommittedRead lock mode could hang for a short interval when it was aborted just as it had decided to send a batch. (Bug #11932525)
When aborting a multi-read range scan exactly as it was changing ranges in the local query handler, LQH could fail to detect it, leaving the scan hanging. (Bug #11929643)
Schema distribution did not take place for tables converted from another storage engine to NDB using
ALTER TABLE; this meant that such tables were not always visible to all SQL nodes attached to the cluster. (Bug #11894966)
A GCI value inserted by ndb_restore
ndb_apply_status table was actually 1 less than the correct value. (Bug #11885852)
Disk Data: Limits imposed by the size of
SharedGlobalMemory were not always enforced consistently with regard to Disk Data undo buffers and log files. This could sometimes cause a
CREATE LOGFILE GROUP or
ALTER LOGFILE GROUP statement to fail for no apparent reason, or cause the log file group specified by
InitialLogFileGroup not to be created when starting the cluster. (Bug #57317)
Functionality Added or Changed
Disk Data: The
INFORMATION_SCHEMA.TABLES table now provides disk usage as well as memory usage information for Disk Data tables. Also,
INFORMATION_SCHEMA.PARTITIONS formerly did not show any statistics for
NDB tables. Now the
DATA_FREE columns contain correct information for the table's partitions.
The --rewrite-database option is added for ndb_restore, which makes it possible to restore to a database having a different name from that of the database in the backup.
For more information, see ndb_restore — Restore a MySQL Cluster Backup. (Bug #54327)
For additional information about type conversions currently supported by MySQL Cluster for attribute promotion and demotion, see Replication of Columns Having Different Data Types.
Made it possible to enable multi-threaded building of ordered indexes during initial restarts, using the new TwoPassInitialNodeRestartCopy data node configuration parameter.
The NDB kernel now implements a number of statistical counters relating to actions performed by or affecting Ndb objects, such as starting, closing, or aborting transactions; primary key and unique key operations; table, range, and pruned scans; blocked threads waiting for various operations to complete; and data and events sent and received by NDBCLUSTER. These NDB API counters are incremented inside the NDB kernel whenever NDB API calls are made or data is sent to or received by the data nodes. mysqld exposes these counters as system status variables; their values can be read in the output of SHOW STATUS, or by querying the GLOBAL_STATUS table in the INFORMATION_SCHEMA database. By comparing the values of these status variables prior to and following the execution of SQL statements that act on NDB tables, you can observe the corresponding actions taken on the NDB API level, which can be beneficial for monitoring and performance tuning of MySQL Cluster.
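The before-and-after comparison described above can be sketched as follows. This is a minimal illustration only: the counter names and values are made up, and the two dictionaries stand in for snapshots you would obtain yourself from SHOW STATUS or the GLOBAL_STATUS table.

```python
def counter_deltas(before, after):
    """Return {name: increase} for every counter whose value changed."""
    deltas = {}
    for name, new_value in after.items():
        change = new_value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas


# Made-up snapshots taken before and after running a statement.
# The names below are placeholders, not real status variable names.
before = {
    "example_trans_start_count": 10,
    "example_pk_op_count": 40,
    "example_scan_count": 7,
}
after = {
    "example_trans_start_count": 11,
    "example_pk_op_count": 43,
    "example_scan_count": 7,
}

deltas = counter_deltas(before, after)
print(deltas)
```

Only counters that moved are reported, which keeps the output focused on what the statement actually did at the NDB API level.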
This issue affects all previous MySQL Cluster NDB 7.1 releases. (Bug #60045)
--rebuild-indexes caused multi-threaded index building to occur on the master node only. (Bug #59920)
Successive queries on the counters table from the same SQL node returned unchanging results. To fix this issue, and to prevent similar issues from occurring in the future, ndbinfo tables are now excluded from the query cache. (Bug #59831)
When a CREATE TABLE statement failed due to NDB error 1224 (Too many fragments), it was not possible to create the table afterward unless either it had no ordered indexes, or a DROP TABLE statement was issued first, even if the subsequent CREATE TABLE was valid and should otherwise have succeeded. (Bug #59756)
References: See also: Bug #59751.
When attempting to create a table on a MySQL Cluster with many standby data nodes (setting
config.ini for the nodes that should wait, starting the nodes that should start immediately with the --nowait-nodes option, and using the MAX_ROWS option), mysqld miscalculated the number of fragments to use. This caused the CREATE TABLE to fail.
Note
The CREATE TABLE failure caused by this issue in turn prevented any further attempts to create the table, even if the table structure was simplified or changed in such a way that the attempt should have succeeded. This “ghosting” issue is handled in Bug #59756.
References: See also: Bug #59756.
NDB sometimes treated a simple (not unique) ordered index as unique. (Bug #59519)
The logic used in determining whether to collapse a range to a simple equality was faulty. In certain cases, this could cause NDB to treat a range as if it were a primary key lookup when determining the query plan to be used. Although this did not affect the actual result returned by the query, it could in such cases result in inefficient execution of queries due to the use of an inappropriate query plan. (Bug #59517)
When a query used multiple references to or instances of the same physical tables, NDB failed to recognize these multiple instances as different tables; in such a case, NDB could incorrectly push down to the data nodes a condition referring to these other instances, even though the condition should have been rejected as unpushable, leading to invalid results. (Bug #58791)
Cluster API: When calling NdbEventOperation::execute() during a node restart, it was possible to get a spurious error 711 (System busy with node restart, schema operations not allowed when a node is starting). (Bug #59723)
Cluster API: When an NDB API client application was waiting for more scan results after calling NdbScanOperation::nextResult(), the calling thread sometimes woke up even if no new batches for any fragment had arrived, which was unnecessary, and which could have a negative impact on the application's performance. (Bug #52298)
Functionality Added or Changed
Important Change: The following changes have been made with regard to the TimeBetweenEpochsTimeout data node configuration parameter:
The maximum possible value for this parameter has been increased from 32000 milliseconds to 256000 milliseconds.
Setting this parameter to zero now has the effect of disabling GCP stops caused by save timeouts, commit timeouts, or both.
The current value of this parameter and a warning are written to the cluster log whenever a GCP save takes longer than 1 minute or a GCP commit takes longer than 10 seconds.
For more information, see Disk Data and GCP Stop errors. (Bug #58383)
Added the --skip-broken-objects option for ndb_restore. This option causes ndb_restore to ignore tables corrupted due to missing blob parts tables, and to continue reading from the backup file and restoring the remaining tables. (Bug #54613)
References: See also: Bug #51652.
Made it possible to exercise more direct control over handling of timeouts occurring when trying to flush redo logs to disk using two new data node configuration parameters
RedoOverCommitLimit, as well as the new API node configuration parameter
DefaultOperationRedoProblemAction, all added in this release. Now, when such timeouts occur more than a specified number of times for the flushing of a given redo log, any transactions that were to be written are instead aborted, and the operations contained in those transactions can be either re-tried or themselves aborted.
For more information, see Redo log over-commit handling.
Cluster API: It is now possible to stop or restart a node even while other nodes are starting, using the MGM API
ndb_mgm_restart4() function, respectively, with the force parameter set to 1. (Bug #58451)
References: See also: Bug #58319.
Cluster API: In some circumstances, very large BLOB read and write operations in MySQL Cluster applications can cause excessive resource usage and even exhaustion of memory. To fix this issue and to provide increased stability when performing such operations, it is now possible to set limits on the volume of BLOB data to be read or written within a given transaction in such a way that when these limits are exceeded, the current transaction implicitly executes any accumulated operations. This avoids an excessive buildup of pending data which can result in resource exhaustion in the NDB kernel. The limits on the amount of data to be read and on the amount of data to be written before this execution takes place can be configured separately. (In other words, it is now possible in MySQL Cluster to specify read batching and write batching that is specific to BLOB data.) These limits can be configured either on the NDB API level, or in the MySQL Server.
On the NDB API level, four new methods are added to the
setMaxPendingBlobReadBytes() can be used to get and to set, respectively, the maximum amount of BLOB data to be read that accumulates before this implicit execution is triggered.
setMaxPendingBlobWriteBytes() can be used to get and to set, respectively, the maximum volume of BLOB data to be written that accumulates before implicit execution occurs.
For the MySQL server, two new options are added. The --ndb-blob-read-batch-bytes option sets a limit on the amount of pending BLOB data to be read before triggering implicit execution, and the --ndb-blob-write-batch-bytes option controls the amount of pending BLOB data to be written. These limits can also be set using the mysqld configuration file, or read and set within the mysql client and other MySQL client applications using the corresponding server system variables. (Bug #59113)
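The batching behavior described above can be pictured with a small toy model. It is a sketch under stated assumptions, not NDB code: the class name, its attributes, and the "strictly greater than the limit" trigger are all hypothetical, chosen only to show pending bytes accumulating until a limit forces an implicit execution.

```python
class PendingBlobBatcher:
    """Toy model: accumulate pending BLOB bytes, flush once a limit is exceeded."""

    def __init__(self, max_pending_bytes):
        self.max_pending_bytes = max_pending_bytes
        self.pending_bytes = 0
        self.implicit_executions = 0

    def add_operation(self, nbytes):
        """Queue one BLOB operation of nbytes; flush if the limit is exceeded."""
        self.pending_bytes += nbytes
        # Assumption: execution is triggered when the total exceeds the limit.
        if self.pending_bytes > self.max_pending_bytes:
            self.implicit_executions += 1
            self.pending_bytes = 0


# Ten 16 KiB operations against a 64 KiB limit (both values are illustrative).
batcher = PendingBlobBatcher(max_pending_bytes=64 * 1024)
for _ in range(10):
    batcher.add_operation(16 * 1024)

print(batcher.implicit_executions, batcher.pending_bytes)
```

In this model, a smaller limit means more frequent implicit executions and less pending data held at any one time; a larger limit means the reverse.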
Two related problems could occur with read-committed scans made in parallel with transactions combining multiple (concurrent) operations:
When committing a multiple-operation transaction that contained concurrent insert and update operations on the same record, the commit arrived first for the insert and then for the update. If a read-committed scan arrived between these operations, it could thus read incorrect data; in addition, if the scan read variable-size data, it could cause the data node to fail.
When rolling back a multiple-operation transaction having concurrent delete and insert operations on the same record, the abort arrived first for the delete operation, and then for the insert. If a read-committed scan arrived between the delete and the insert, it could incorrectly assume that the record should not be returned (in other words, the scan treated the insert as though it had not yet been committed).
On Windows platforms, issuing a SHUTDOWN command in the ndb_mgm client caused management processes that had been started with the --nodaemon option to exit abnormally. (Bug #59437)
A row insert or update followed by a delete operation on the same row within the same transaction could in some cases lead to a buffer overflow. (Bug #59242)
References: See also: Bug #56524. This issue is a regression of: Bug #35208.
Data nodes configured with very large amounts (multiple gigabytes) of DiskPageBufferMemory failed during startup with NDB error 2334 (Job buffer congestion). (Bug #58945)
References: See also: Bug #47984.
The FAIL_REP signal, used inside the NDB kernel to declare that a node has failed, now includes the node ID of the node that detected the failure. This information can be useful in debugging. (Bug #58904)
When executing a full table scan caused by a
in combination with a join,
NDB failed to close the scan. (Bug #58750)
References: See also: Bug #57481.
In some circumstances, an SQL trigger on an NDB table could read stale data. (Bug #58538)
During a node takeover, it was possible in some circumstances for one of the remaining nodes to send an extra transaction confirmation (LQH_TRANSCONF) signal to the DBTC kernel block, conceivably leading to a crash of the data node trying to take over as the new transaction coordinator. (Bug #58453)
A query having multiple predicates joined by
WHERE clause and which used the sort_union access method (as shown using EXPLAIN) could return duplicate rows. (Bug #58280)
Trying to drop an index while it was being used to perform scan updates caused data nodes to crash. (Bug #58277, Bug #57057)
When handling failures of multiple data nodes, an error in the construction of internal signals could cause the cluster's remaining nodes to crash. This issue was most likely to affect clusters with large numbers of data nodes. (Bug #58240)
strcasecmp were declared in ndb_global.h but never defined or used. The declarations have been removed. (Bug #58204)
The number of rows affected by a statement that used a WHERE clause having an IN condition with a value list containing a great many elements, and that deleted or updated enough rows such that NDB processed them in batches, was not computed or reported correctly. (Bug #58040)
MySQL Cluster failed to compile correctly on FreeBSD 8.1 due to misplaced #include statements. (Bug #58034)
A query using BETWEEN as part of a pushed-down WHERE condition could cause mysqld to hang or crash. (Bug #57735)
Data nodes no longer allocated all memory prior to being ready to exchange heartbeat and other messages with management nodes, as in NDB 6.3 and earlier versions of MySQL Cluster. This caused problems when data nodes configured with large amounts of memory failed to show as connected or showed as being in the wrong start phase in the ndb_mgm client even after making their initial connections to and fetching their configuration data from the management server. With this fix, data nodes now allocate all memory as they did in earlier MySQL Cluster versions. (Bug #57568)
In some circumstances, it was possible for mysqld to begin a new multi-range read scan without having closed a previous one. This could lead to exhaustion of all scan operation objects, transaction objects, or lock objects (or some combination of these) in
NDB, causing queries to fail with such errors as Lock wait timeout exceeded or Connect failure - out of connection objects. (Bug #57481)
References: See also: Bug #58750.
NULL on a table with a unique index created with
column always returned an empty result. (Bug #57032)
With engine_condition_pushdown enabled, a query using
ENUM column of an NDB table failed to return any results. This issue is resolved by disabling engine_condition_pushdown when performing such queries. (Bug #53360)
When a slash character (/) was used as part of the name of an index on an NDB table, attempting to execute a TRUNCATE TABLE statement on the table failed with the error Index not found, and the table was rendered unusable. (Bug #38914)
Partitioning; Disk Data: When using multi-threaded data nodes, an NDB table created with a very large value for the MAX_ROWS option could (if this table was dropped and a new table with fewer partitions, but having the same table ID, was created) cause ndbmtd to crash when performing a system restart. This was because the server attempted to examine each partition whether or not it actually existed.
This issue is the same as that reported in Bug #45154, except that the current issue is specific to ndbmtd instead of ndbd. (Bug #58638)
References: See also: Bug #45154.
Disk Data: In certain cases, a race condition could occur when DROP LOGFILE GROUP removed the logfile group while a read or write of one of the affected files was in progress, which in turn could lead to a crash of the data node. (Bug #59502)
Disk Data: A race condition could sometimes be created when DROP TABLESPACE was run concurrently with a local checkpoint; this could in turn lead to a crash of the data node. (Bug #59501)
Disk Data: What should have been an online drop of a multi-column index was actually performed offline. (Bug #55618)
Disk Data: When at least one data node was not running, queries against the INFORMATION_SCHEMA.FILES table took an excessive length of time to complete because the MySQL server waited for responses from any stopped nodes to time out. Now, in such cases, MySQL does not attempt to contact nodes which are not known to be running. (Bug #54199)
Cluster API: It was not possible to obtain the status of nodes accurately after an attempt to stop a data node using ndb_mgm_stop() failed without returning an error. (Bug #58319)
Cluster API: Attempting to read the same value (using getValue()) more than 9000 times within the same transaction caused the transaction to hang when executed. Now when more reads are performed in this way than can be accommodated in a single transaction, the call to execute() fails with a suitable error. (Bug #58110)
Important Note: Issuing an ALL DUMP command during a rolling upgrade to MySQL Cluster NDB 7.1.9 caused the cluster to crash. (Bug #58256)
InnoDB; Packaging: The InnoDB plugin was not included in MySQL Cluster RPM packages. (Bug #58283)
References: See also: Bug #54912.
Functionality Added or Changed
Important Change; InnoDB: Building the MySQL Server with the InnoDB plugin is now supported when building MySQL Cluster. For more information, see MySQL Cluster Installation and Upgrades. (Bug #54912)
References: See also: Bug #58283.
Important Change: ndbd now bypasses use of Non-Uniform Memory Access support on Linux hosts by default. If your system supports NUMA, you can enable it and override ndbd use of interleaving by setting the Numa data node configuration parameter, which is added in this release. See Defining Data Nodes: Realtime Performance Parameters, for more information. (Bug #57807)
Important Change: The Id configuration parameter used with MySQL Cluster management, data, and API nodes (including SQL nodes) is now deprecated, and the NodeId parameter (long available as a synonym for Id when configuring these types of nodes) should be used instead. Id continues to be supported for reasons of backward compatibility, but now generates a warning when used with these types of nodes, and is subject to removal in a future release of MySQL Cluster.
This change affects the name of the configuration parameter only, establishing a clear preference for
[api] sections of the MySQL Cluster global configuration (config.ini) file. The behavior of unique identifiers for management, data, and SQL and API nodes in MySQL Cluster has not otherwise been altered.
The Id parameter as used in the [computer] section of the MySQL Cluster global configuration file is not affected by this change.
The diskpagebuffer table, providing statistics on disk page buffer usage by Disk Data tables, is added to the ndbinfo information database. These statistics can be used to monitor performance of reads and writes on Disk Data tables, and to assist in the tuning of related parameters such as
Packaging: MySQL Cluster RPM distributions did not include a shared-compat RPM for the MySQL Server, which meant that MySQL applications depending on libmysqlclient.so.15 (MySQL 5.0 and earlier) no longer worked. (Bug #38596)
On Windows, the angel process which monitors and (when necessary) restarts the data node process failed to spawn a new worker in some circumstances where the arguments vector contained extra items placed at its beginning. This could occur when the path to ndbd.exe or ndbmtd.exe contained one or more spaces. (Bug #57949)
The disconnection of an API or management node due to missed heartbeats led to a race condition which could cause data nodes to crash. (Bug #57946)
The method for calculating table schema versions used by schema transactions did not follow the established rules for recording schemas used in the P0.SchemaLog file. (Bug #57897)
References: See also: Bug #57896.
The LQHKEYREQ request message used by the local query handler when checking the major schema version of a table, being only 16 bits wide, could cause this check to fail with an Invalid schema version error (NDB error code 1227). This issue occurred after creating and dropping (and re-creating) the same table 65537 times, then trying to insert rows into the table. (Bug #57896)
References: See also: Bug #57897.
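The effect of carrying a version number in a 16-bit field can be illustrated with a few lines of arithmetic. This is a simplified sketch, not the NDB check itself: the function names are hypothetical, and it assumes only that the value in the request is truncated to 16 bits while the stored version is not.

```python
SCHEMA_VERSION_FIELD_BITS = 16  # field width stated in the entry above


def truncate_to_field(version, bits=SCHEMA_VERSION_FIELD_BITS):
    """Keep only the low-order bits that fit in the request field."""
    return version & ((1 << bits) - 1)


def naive_version_check(stored_version, bits=SCHEMA_VERSION_FIELD_BITS):
    """Compare the truncated value from the request with the full stored value."""
    return truncate_to_field(stored_version, bits) == stored_version


# Versions that fit in 16 bits pass; the first one that does not fit fails.
print(naive_version_check(65535))
print(naive_version_check(65536))
print(truncate_to_field(65537))
```

Once the stored version no longer fits in the field, the truncated and full values differ, which is the kind of mismatch that surfaces as an invalid schema version error.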
Data nodes compiled with gcc 4.5 or higher crashed during startup. (Bug #57761)
Transient errors during a local checkpoint were not retried, leading to a crash of the data node. Now when such errors occur, they are retried up to 10 times if necessary. (Bug #57650)
ndb_restore now retries failed transactions when replaying log entries, just as it does when restoring data. (Bug #57618)
The SUMA kernel block has a 10-element ring buffer for storing out-of-order SUB_GCP_COMPLETE_REP signals received from the local query handlers when global checkpoints are completed. In some cases, exceeding the ring buffer capacity on all nodes of a node group at the same time caused the node group to fail with an assertion. (Bug #57563)
During a GCP takeover, it was possible for one of the data nodes not to receive a SUB_GCP_COMPLETE_REP signal, with the result that it would report itself as GCP_COMMITTING while the other data nodes reported GCP_PREPARING. (Bug #57522)
A WHERE clause of the form
when selecting from an NDB table having a primary key on multiple columns could result in Error 4259 Invalid set of range scan bounds if range2 started exactly where range1 ended and the primary key definition declared the columns in a different order relative to the order in the table's column list. (Such a query should simply return all rows in the table, since any expression
is always true.)
CREATE TABLE t (a, b, PRIMARY KEY (b, a)) ENGINE NDB;
This issue could then be triggered by a query such as this one:
SELECT * FROM t WHERE b < 8 OR b >= 8;
In addition, the order of the ranges in the WHERE clause was significant; the issue was not triggered, for example, by the query SELECT * FROM t WHERE b <= 8 OR b > 8. (Bug #57396)
A number of cluster log warning messages relating to deprecated configuration parameters contained spelling, formatting, and other errors. (Bug #57381)
CREATE TABLE was ignored, which meant that it was not possible to enable multi-threaded building of indexes. (Bug #57360)
A GCP stop is detected using 2 parameters which determine the maximum time that a global checkpoint or epoch can go unchanged; one of these controls this timeout for GCPs and one controls the timeout for epochs. Suppose the cluster is configured such that TimeBetweenEpochsTimeout is 100 ms but HeartbeatIntervalDbDb is 1500 ms. A node failure can be signalled after 4 missed heartbeats (in this case, 6000 ms). However, this would exceed TimeBetweenEpochsTimeout, causing false detection of a GCP stop. To prevent this from happening, the configured value for TimeBetweenEpochsTimeout is automatically adjusted, based on the values of
The current issue arose when the automatic adjustment routine did not correctly take into consideration the fact that, during cascading node failures, several intervals of length 4 * (HeartbeatIntervalDbDb + ArbitrationTimeout) may elapse before all node failures have internally been resolved. This could cause false GCP stop detection in the event of a cascading node failure. (Bug #57322)
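The arithmetic in this entry can be worked through directly. The sketch below only restates the figures given above; the ArbitrationTimeout value and the number of cascading intervals are illustrative assumptions, and the function names are hypothetical.

```python
MISSED_HEARTBEATS_BEFORE_FAILURE = 4  # figure taken from the entry above


def failure_detection_ms(heartbeat_interval_ms):
    """Time until a node failure can be signalled."""
    return MISSED_HEARTBEATS_BEFORE_FAILURE * heartbeat_interval_ms


def cascade_window_ms(heartbeat_interval_ms, arbitration_timeout_ms, intervals):
    """Time for `intervals` successive failure-resolution intervals."""
    one_interval = MISSED_HEARTBEATS_BEFORE_FAILURE * (
        heartbeat_interval_ms + arbitration_timeout_ms
    )
    return intervals * one_interval


time_between_epochs_timeout_ms = 100  # example value from the entry
heartbeat_interval_ms = 1500          # example value from the entry

detection = failure_detection_ms(heartbeat_interval_ms)
false_stop_possible = detection > time_between_epochs_timeout_ms

# Assumed for illustration: ArbitrationTimeout of 1000 ms, 3 cascading intervals.
window = cascade_window_ms(heartbeat_interval_ms, 1000, 3)

print(detection, false_stop_possible, window)
```

With the example values, failure detection takes 6000 ms, far longer than the 100 ms epoch timeout, and a cascade stretches the window further, which is why the adjusted timeout has to account for more than a single interval.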
DROP NODEGROUP commands could cause mysqld processes to crash. (Bug #57164)
NDB table having a VARCHAR column as its primary key failed to return all matching rows. (Bug #56853)
Aborting a native NDB backup in the ndb_mgm client using the ABORT BACKUP command did not work correctly when using ndbmtd, in some cases leading to a crash of the cluster. (Bug #56285)
When a data node angel process failed to fork off a new worker process (to replace one that had failed), the failure was not handled. This meant that the angel process either transformed itself into a worker process, or itself failed. In the first case, the data node continued to run, but there was no longer any angel to restart it in the event of failure, even with StopOnError set to 0. (Bug #53456)
Disk Data: When performing online DDL on Disk Data tables, scans and moving of the relevant tuples were done in more or less random order. This fix causes these scans to be done in the order of the tuples, which should improve performance of such operations due to the more sequential ordering of the scans. (Bug #57848)
References: See also: Bug #57827.
Cluster API: An application dropping a table at the same time that another application tried to set up a replication event on the same table could lead to a crash of the data node. The same issue could sometimes cause NdbEventOperation::execute() to hang. (Bug #57886)
Cluster API: An NDB API client program under load could abort with an assertion error in
TransporterFacade::remove_from_cond_wait_queue. (Bug #51775)
References: See also: Bug #32708.
Functionality Added or Changed
References: See also: Bug #34325, Bug #11747863.
It is now possible using the ndb_mgm management client or the MGM API to force a data node shutdown or restart even if this would force the shutdown or restart of the entire cluster.
In the management client, this is implemented through the addition of the -f (force) option to the
RESTART commands. For more information, see Commands in the MySQL Cluster Management Client.
Cluster API: The MGM API function ndb_mgm_get_version(), which was previously internal, has now been moved to the public API. This function can be used to get NDB storage engine and other version information from the management server. (Bug #51310)
References: See also: Bug #51273.
At startup, an ndbd or ndbmtd process creates directories for its file system without checking to see whether they already exist. Portability code added in MySQL Cluster NDB 7.0.18 and MySQL Cluster NDB 7.1.7 did not account for this fact, printing a spurious error message when a directory to be created already existed. This unneeded printout has been removed. (Bug #57087)
A data node can be shut down having completed and synchronized a given GCI x, while having written a great many log records belonging to the next GCI x + 1, as part of normal operations. However, when the node subsequently starts, completes, and synchronizes GCI x + 1, the log records from the original start must not be read. To make sure that this does not happen, the REDO log reader finds the last GCI to restore, scans forward from that point, and erases any log records that were not (and should never be) used.
The current issue occurred because this scan stopped immediately as soon as it encountered an empty page. This was problematic because the REDO log is divided into several files; thus, it could be that there were log records in the beginning of the next file, even if the end of the previous file was empty. These log records were never invalidated; following a start or restart, they could be reused, leading to a corrupt REDO log. (Bug #56961)
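The difference between stopping at the first empty page and continuing into the next file can be shown with a toy model. This is one plausible reading of the entry, not the actual NDB algorithm: the log is modeled as a list of files, each a list of pages, where an empty list is an empty page, and both function names are hypothetical.

```python
def pages_found_stop_at_first_empty(files):
    """Old behavior (as described): stop at the first empty page anywhere."""
    found = []
    for file_index, pages in enumerate(files):
        for page_index, page in enumerate(pages):
            if not page:
                return found
            found.append((file_index, page_index))
    return found


def pages_found_checking_next_file(files):
    """Assumed fix: an empty page ends the current file only; the scan
    stops once the next file also begins with an empty page."""
    found = []
    for file_index, pages in enumerate(files):
        if not pages or not pages[0]:
            break  # next file starts empty: nothing further to invalidate
        for page_index, page in enumerate(pages):
            if not page:
                break  # rest of this file is empty; look at the next file
            found.append((file_index, page_index))
    return found


# Records at the start of file 0 and file 1; the end of each file is empty.
redo_files = [[["rec-a"], []], [["rec-b"], []], [[], []]]

old_result = pages_found_stop_at_first_empty(redo_files)
new_result = pages_found_checking_next_file(redo_files)
print(old_result, new_result)
```

In this model the old scan never reaches the record at the start of the second file, so that record would survive to be reused after a restart.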
An error in program flow in ndbd.cpp could result in data node shutdown routines being called multiple times. (Bug #56890)
Under certain rare conditions, attempting to start more than one ndb_mgmd process simultaneously using the --reload option caused a race condition such that none of the ndb_mgmd processes could start. (Bug #56844)
DROP TABLE operations among several SQL nodes attached to a MySQL Cluster, the LOCK_OPEN lock normally protecting mysqld's internal table list is released so that other queries or DML statements are not blocked. However, to make sure that other DDL is not executed simultaneously, a global schema lock (implemented as a row-level lock by NDB) is used, such that all operations that can modify the state of the mysqld internal table list also need to acquire this global schema lock. The SHOW TABLE STATUS statement did not acquire this lock. (Bug #56841)
In certain cases, DROP DATABASE could leave behind a cached table object, which caused problems with subsequent DDL operations. (Bug #56840)
Memory pages used for DataMemory, once assigned to ordered indexes, were never freed, even after any rows that belonged to the corresponding indexes had been deleted. (Bug #56829)
MySQL Cluster stores, for each row in each NDB table, a Global Checkpoint Index (GCI) which identifies the last committed transaction that modified the row. As such, a GCI can be thought of as a coarse-grained row version.
Due to changes in the format used by NDB to store local checkpoints (LCPs) in MySQL Cluster NDB 6.3.11, it could happen that, following cluster shutdown and subsequent recovery, the GCI values for some rows could be changed unnecessarily; this could possibly, over the course of many node or system restarts (or both), lead to an inconsistent database. (Bug #56770)
When multiple SQL nodes were connected to the cluster and one of them stopped in the middle of a DDL operation, the mysqld process issuing the DDL timed out with the error distributing tbl_name timed out. Ignoring. (Bug #56763)
An ALTER TABLE ... ADD COLUMN operation that changed the table schema such that the number of 32-bit words used for the bitmask allocated to each DML operation increased during a transaction in DML which was performed prior to DDL which was followed by either another DML operation or (if using replication) a commit, led to data node failure.
This was because the data node did not take into account that the bitmask for the before-image was smaller than the current bitmask, which caused the node to crash. (Bug #56524)
References: This issue is a regression of: Bug #35208.
On Windows, a data node refused to start in some cases unless the ndbd.exe executable was invoked using an absolute rather than a relative path. (Bug #56257)
The text file cluster_change_hist.txt containing old MySQL Cluster changelog information was no longer being maintained, and so has been removed from the tree. (Bug #56116)
The failure of a data node during some scans could cause other data nodes to fail. (Bug #54945)
Exhausting the number of available commit-ack markers (controlled by the MaxNoOfConcurrentTransactions parameter) led to a data node crash. (Bug #54944)
When running a
TEXT columns, memory was allocated for the columns but was not freed until the end of the
SELECT. This could cause problems with excessive memory usage when dumping (using for example mysqldump) tables with such columns and having many rows, large column values, or both. (Bug #52313)
References: See also: Bug #56488, Bug #50310.
Cluster API: The MGM API functions
ndb_mgm_restart() set the error code and message without first checking whether the management server handle was NULL, which could lead to fatal errors in MGM API applications that depended on these functions. (Bug #57089)
Cluster API: The MGM API function ndb_mgm_get_version() did not set the error message before returning with an error. With this fix, it is now possible to call ndb_mgm_get_latest_error() after a failed call to this function such that ndb_mgm_get_latest_error() returns an error number and error message, as expected of MGM API calls. (Bug #57088)
Functionality Added or Changed
Important Change: More finely grained control over restart-on-failure behavior is provided with two new data node configuration parameters: MaxStartFailRetries limits the total number of retries made before giving up on starting the data node; StartFailRetryDelay sets the number of seconds between retry attempts.
These parameters are used only if StopOnError is set to 0.
For more information, see Defining MySQL Cluster Data Nodes. (Bug #54341)
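How a retry limit and a retry delay interact can be sketched with a small loop. This is an illustration only, not the angel process logic: the function and its arguments are hypothetical, and it assumes that the first start attempt does not count as a retry.

```python
import time


def start_with_retries(try_start, max_start_fail_retries,
                       start_fail_retry_delay, sleep=time.sleep):
    """Attempt a start, retrying up to max_start_fail_retries times with
    start_fail_retry_delay seconds between attempts. Returns True on success."""
    retries = 0
    while True:
        if try_start():
            return True
        if retries >= max_start_fail_retries:
            return False  # give up: retry budget exhausted
        retries += 1
        sleep(start_fail_retry_delay)


# Demo with a start routine that always fails and a recording fake sleep,
# so the example finishes immediately.
attempts = []
delays = []


def always_fails():
    attempts.append(1)
    return False


started = start_with_retries(always_fails, 3, 5, sleep=delays.append)
print(started, len(attempts), delays)
```

Passing the sleep function as an argument keeps the sketch testable without waiting for real delays.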
Important Change: It is no longer possible to make a dump of the ndbinfo database using mysqldump. (Bug #54316)
ndb_restore always reported 0 for the GCPStop (end point of the backup). Now it provides useful binary log position and epoch information. (Bug #56298)
The LockExecuteThreadToCPU configuration parameter was not handled correctly for CPU ID values greater than 255. (Bug #56185)
Following a failure of the master data node, the new master sometimes experienced a race condition which caused the node to terminate with a GcpStop error. (Bug #56044)
Trying to create a table having a
DEFAULT '' failed with the error Illegal null attribute. (An empty default is permitted and ignored by
NDB should do the same.) (Bug #55121)
--nodaemon logged to the console in addition to the configured log destination. (Bug #54779)
The warning MaxNoOfExecutionThreads (#) > LockExecuteThreadToCPU count (#), this could cause contention could be logged when running ndbd, even though the condition described can occur only when using ndbmtd. (Bug #54342)
Startup messages previously written by ndb_mgmd to stdout are now written to the cluster log instead when LogDestination is set. (Bug #47595)
The graceful shutdown of a data node could sometimes cause transactions to be aborted unnecessarily. (Bug #18538)
References: See also: Bug #55641.
Functionality Added or Changed
Added the --server-id-bits option for mysqld and mysqlbinlog.
For mysqld, the --server-id-bits option indicates the number of least significant bits within the 32-bit server ID which actually identify the server. Indicating that the server ID uses fewer than 32 bits permits the remaining bits to be used for other purposes by NDB API applications using the Event API and
For mysqlbinlog, the --server-id-bits option tells mysqlbinlog how to interpret the server IDs in the binary log when the binary log was written by a mysqld having its server_id_bits set to less than the maximum (32). (Bug #52305)
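The split between the identifying low-order bits and the remaining bits of a 32-bit server ID can be shown with plain bit masking. This is a sketch of the arithmetic only; the function name and the sample values are hypothetical, and it does not reproduce how mysqld or mysqlbinlog handle the value internally.

```python
def split_server_id(server_id, server_id_bits):
    """Split a 32-bit server ID into (identifying part, remaining upper bits)."""
    if not 0 < server_id_bits <= 32:
        raise ValueError("server_id_bits must be between 1 and 32")
    mask = (1 << server_id_bits) - 1
    identifying = server_id & mask
    remaining = (server_id & 0xFFFFFFFF) >> server_id_bits
    return identifying, remaining


# Illustrative value: low 8 bits identify the server, upper bits carry
# application-defined data.
identifying, remaining = split_server_id(0x0500000A, 8)
print(identifying, remaining)

# With all 32 bits used for identification, nothing is left over.
print(split_server_id(0x0500000A, 32))
```

A reader of the binary log needs the same bit count that the writer used; otherwise the application-defined upper bits would be misread as part of the server ID.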
Important Change; Cluster API: The poll and select calls made by the MGM API were not interrupt-safe; that is, a signal caught by the process while waiting for an event on one or more sockets returned error -1 with errno set to EINTR. This caused problems with MGM API functions such as
To fix this problem, the internal ndb_socket_poller::poll() function has been made EINTR-safe.
The old version of this function has been retained as poll_unsafe(), for use by those parts of NDB that do not need the EINTR-safe version of the function. (Bug #55906)
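The general idea behind an EINTR-safe wrapper is to treat an interrupted call as "try again" rather than as a failure. The sketch below shows that pattern in isolation; it is not the NDB implementation, the helper names are hypothetical, and it ignores the question of how much of a timeout remains after an interruption.

```python
def poll_retrying_on_eintr(poll_once):
    """Call poll_once() until it completes without being interrupted."""
    while True:
        try:
            return poll_once()
        except InterruptedError:
            # Interrupted by a signal (EINTR): not a real failure, so retry.
            continue


# Stand-in for a real poll call: interrupted twice, then reports 3 ready sockets.
outcomes = [InterruptedError(), InterruptedError(), 3]


def fake_poll():
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


ready = poll_retrying_on_eintr(fake_poll)
print(ready)
```

Any other exception still propagates to the caller, so genuine errors are not masked by the retry loop.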
The TCP configuration parameters
HostName2 were not displayed in the output of ndb_config --configinfo. (Bug #55839)
When another data node failed, a given data node DBTC kernel block could time out while waiting for DBDIH to signal commits of pending transactions, leading to a crash. Now in such cases the timeout generates a printout, and the data node continues to operate. (Bug #55715)
Starting ndb_mgmd with
--config-cache=0 caused it to leak memory. (Bug #55205)
The configure.js option WITHOUT_DYNAMIC_PLUGINS=TRUE was ignored when building MySQL Cluster for Windows using CMake. Among the effects of this issue was that CMake attempted to build the InnoDB storage engine as a plugin (.DLL file) even though the InnoDB Plugin is not currently supported by MySQL Cluster. (Bug #54913)
It was possible for a DROP DATABASE statement to remove NDB hidden blob tables without removing the parent tables, with the result that the tables, although hidden to MySQL clients, were still visible in the output of ndb_show_tables but could not be dropped using ndb_drop_table. (Bug #54788)
An excessive number of timeout warnings (normally used only for debugging) were written to the data node logs. (Bug #53987)
Disk Data: As an optimization when inserting a row to an empty page, the page is not read, but rather simply initialized. However, this optimization was performed in all cases when an empty row was inserted, even though it should have been done only if it was the first time that the page had been used by a table or fragment. This is because, if the page had been in use, and then all records had been released from it, the page still needed to be read to learn its log sequence number (LSN).
This caused problems only if the page had been flushed using an incorrect LSN and the data node failed before any local checkpoint was completed—which would remove any need to apply the undo log, hence the incorrect LSN was ignored.
The user-visible result of the incorrect LSN was that it caused the data node to fail during a restart. It was perhaps also possible (although not conclusively proven) that this issue could lead to incorrect data. (Bug #54986)
Functionality Added or Changed
Restrictions on some types of mismatches in column definitions when restoring data using ndb_restore have been relaxed. These include the following types of mismatches:
Different default values
Different distribution key settings
Now, when one of these types of mismatches in column definitions is encountered, ndb_restore no longer stops with an error; instead, it accepts the data and inserts it into the target table, while issuing a warning to the user.
For more information, see ndb_restore — Restore a MySQL Cluster Backup. (Bug #54423)
References: See also: Bug #53810, Bug #54178, Bug #54242, Bug #54279.
It is now possible to install management node and data node processes as Windows services. (See Installing MySQL Cluster Processes as Windows Services, for more information.) In addition, data node processes on Windows are now maintained by angel processes, just as they are on other platforms supported by MySQL Cluster.
The disconnection of all API nodes (including SQL nodes) during an
ALTER TABLE caused a memory leak. (Bug #54685)
If a node shutdown (either in isolation or as part of a system shutdown) occurred directly following a local checkpoint, it was possible that this local checkpoint would not be used when restoring the cluster. (Bug #54611)
The setting for
BuildIndexThreads was ignored by ndbmtd, which made it impossible to use more than 4 cores for rebuilding indexes. (Bug #54521)
When adding multiple new node groups to a MySQL Cluster, it was necessary for each new node group to add only the nodes to be assigned to the new node group, create that node group using
CREATE NODEGROUP, then repeat this process for each new node group to be added to the cluster. The fix for this issue makes it possible to add all of the new nodes at one time, and then issue several
CREATE NODEGROUP commands in succession. (Bug #54497)
When performing an online alter table where 2 or more SQL nodes connected to the cluster were generating binary logs, an incorrect message could be sent from the data nodes, causing mysqld processes to crash. This problem was often difficult to detect, because restarting SQL node or data node processes could clear the error, and because the crash in mysqld did not occur until several minutes after the erroneous message was sent and received. (Bug #54168)
A table having the maximum number of attributes permitted could not be backed up using the ndb_mgm client.
Note
The maximum number of attributes supported per table is not the same for all MySQL Cluster releases. See Limits Associated with Database Objects in MySQL Cluster, to determine the maximum that applies in the release which you are using.
The presence of duplicate
[tcp] sections in the config.ini file caused the management server to crash. Now in such cases, ndb_mgmd fails gracefully with an appropriate error message. (Bug #49400)
Cluster API: When using the NDB API, it was possible to rename a table with the same name as that of an existing table.
Note
This issue did not affect table renames executed using SQL on MySQL servers acting as MySQL Cluster API nodes.
Cluster API: An excessive number of client connections, such that more than 1024 file descriptors, sockets, or both were open, caused NDB API applications to crash. (Bug #34303)
Functionality Added or Changed
Important Change: Commercial binary releases of MySQL Cluster NDB 7.1 now include support for the
InnoDB storage engine. (Bug #52945)
References: Reverted patches: Bug #31989.
Cluster API: The value of an internal constant used in the implementation of the
NdbScanOperation classes caused MySQL Cluster NDB 7.0 NDB API applications compiled against MySQL Cluster NDB 7.0.14 or earlier to fail when run with MySQL Cluster 7.0.15, and MySQL Cluster NDB 7.1 NDB API applications compiled against MySQL Cluster NDB 7.1.3 or earlier to break when used with MySQL Cluster 7.1.4. (Bug #54516)
When using mysqldump to back up and restore schema information while using ndb_restore for restoring only the data, restoring to MySQL Cluster NDB 7.1.4 from an older version failed on tables having columns with default values. This was because versions of MySQL Cluster prior to MySQL Cluster NDB 7.1.4 did not have native support for default values.
In addition, the MySQL Server supports
TIMESTAMP columns having dynamic default values, such as
DEFAULT CURRENT_TIMESTAMP; however, the current implementation of
NDB-native default values permits only a constant default value.
To fix this issue, the handling of TIMESTAMP columns is reverted to its pre-NDB-7.1.4 behavior (obtaining the default value from mysqld rather than NDBCLUSTER) except where a TIMESTAMP column uses a constant default, as in the case of a column declared as TIMESTAMP DEFAULT 0 or TIMESTAMP DEFAULT 20100607174832. (Bug #54242)
Functionality Added or Changed
Important Change: The maximum number of attributes (columns plus indexes) per table has increased to 512.
A --wait-nodes option has been added for ndb_waiter. When this option is used, the program waits only for the nodes having the listed IDs to reach the desired state. For more information, see ndb_waiter — Wait for MySQL Cluster to Reach a Given Status. (Bug #52323)
As part of this change, new methods relating to default values have been added to the
Table classes in the NDB API. For more information, see Column::getDefaultValue(), Column::setDefaultValue(), and Table::hasDefaultValues(). (Bug #30529)
Added the MySQL Cluster management server option
--config-cache, which makes it possible to enable and disable configuration caching. This option is turned on by default; to disable configuration caching, start ndb_mgmd with
--config-cache=0, or with
--skip-config-cache. See ndb_mgmd — The MySQL Cluster Management Server Daemon, for more information.
Added the --skip-unknown-objects option for ndb_restore. This option causes ndb_restore to ignore any schema objects which it does not recognize. Currently, this is useful chiefly for restoring native backups made from a cluster running MySQL Cluster NDB 7.0 to a cluster running MySQL Cluster NDB 6.3.
Incompatible Change; Cluster API: The default behavior of the NDB API Event API has changed as follows:
Previously, when creating an
Event, DDL operations (alter and drop operations on tables) were automatically reported on any event operation that used this event, but as a result of this change, this is no longer the case. Instead, you must now invoke the event's
setReport() method, with the new
ER_DDL, to get this behavior.
For existing NDB API applications where you wish to retain the old behavior, you must update the code as indicated previously, then recompile, following an upgrade. Otherwise, DDL operations are no longer reported after upgrading
NDB tables until creation of a table failed due to NDB error 905 Out of attribute records (increase MaxNoOfAttributes), then increasing MaxNoOfAttributes and restarting all management node and data node processes, attempting to drop and re-create one of the tables failed with the error Out of table records..., even when sufficient table records were available. (Bug #53944)
References: See also: Bug #52055. This issue is a regression of: Bug #44294.
Creating a Disk Data table, dropping it, then creating an in-memory table and performing a restart, could cause data node processes to fail with errors in the
DBTUP kernel block if the new table's internal ID was the same as that of the old Disk Data table. This could occur because undo log handling during the restart did not check that the table having this ID was now in-memory only. (Bug #53935)
A table created while
ndb_table_no_logging was enabled was not always stored to disk, which could lead to a data node crash with Error opening DIH schema files for table. (Bug #53934)
An internal buffer allocator used by
NDB has the form
alloc(and attempts to allocate
wanted pages, but is permitted to allocate a smaller number of pages, between
minimum. However, this allocator could sometimes allocate fewer than
minimum pages, causing problems with multi-threaded building of ordered indexes. (Bug #53580)
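The contract described for this allocator can be sketched as follows. This is a toy model, not NDB's allocator: the function name `alloc_pages`, the `free_pages` argument, and the convention of returning 0 on failure are all assumptions made for illustration.

```python
def alloc_pages(free_pages, wanted, minimum):
    """Grant up to `wanted` pages, but never a nonzero amount below `minimum`."""
    if minimum > wanted:
        raise ValueError("minimum must not exceed wanted")
    # Try for the full request, limited by what is actually free.
    granted = min(wanted, free_pages)
    if granted < minimum:
        # Refuse outright rather than hand back fewer pages than the minimum.
        return 0
    return granted
```

Under this contract a caller sizing its work by the minimum can rely on every successful allocation being at least that large, which is the property the bug violated.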
When MySQL Cluster is compiled with support for epoll but this functionality is not available at runtime, it tries to fall back to the select() function in its place. However, an extra ndbout_c() call in the transporter registry code caused ndbd to fail instead. (Bug #53482)
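A runtime fallback of this kind can be sketched with Python's standard `select` module. This only illustrates the pattern of probing for epoll and falling back to select; the function name `make_poller` is hypothetical, and it says nothing about how the NDB transporter registry is written.

```python
import select


def make_poller():
    """Return ("epoll", poller) when epoll works at runtime, else ("select", None)."""
    if hasattr(select, "epoll"):
        try:
            return "epoll", select.epoll()
        except OSError:
            # The interface exists at build time but is unusable at runtime.
            pass
    # Fall back to plain select(); callers then use select.select() directly.
    return "select", None


kind, poller = make_poller()
if poller is not None:
    poller.close()
```

The key point is that a failed probe leads to the fallback path rather than to a fatal error.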
The value set for the ndb_mgmd option
--ndb-nodeid was not verified prior to use as being within the permitted range (1 to 255, inclusive), leading to a crash of the management server. (Bug #53412)
NDB truncated a column declared as DECIMAL(65,0) to a length of 64. Now such a column is accepted and handled correctly. In cases where the maximum length (65) is exceeded, NDB now raises an error instead of truncating. (Bug #53352)
When an NDB log handler failed, the memory allocated to it was freed twice. (Bug #53200)
DataMemory higher than 4G on 32-bit platforms caused ndbd to crash, instead of failing gracefully with an error. (Bug #52536, Bug #50928)
When creating an index,
NDB failed to check whether the internal ID allocated to the index was within the permissible range, leading to an assertion. This issue could manifest itself as a data node failure with NDB error 707 (No more table metadata records (increase MaxNoOfTables)), when creating tables in rapid succession (for example, by a script, or when importing from mysqldump), even with a relatively high value for MaxNoOfTables and a relatively low number of tables. (Bug #52055)
ndb_restore did not raise any errors if hashmap creation failed during execution. (Bug #51434)
Specifying the node ID as part of the
--ndb-connectstring option to mysqld was not handled correctly.
The fix for this issue includes the following changes:
Multiple occurrences of any of the mysqld options
--ndb-nodeid are now handled in the same way as with other MySQL server options, in that the value set in the last occurrence of the option is the value that is used by mysqld.
If --ndb-nodeid is used, its value overrides that of any nodeid setting used in --ndb-connectstring. For example, starting mysqld with --ndb-connectstring=nodeid=1,10.100.1.100 --ndb-nodeid=3 now produces the same result as starting it with
The 1024-character limit on the length of the connection string is removed, and
--ndb-connectstring is now handled in this regard in the same way as other mysqld options.
In the NDB API, a new constructor for
Ndb_cluster_connection is added which takes as its arguments a connection string and the node ID to force the API node to use.
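The precedence rules listed above (last occurrence wins; --ndb-nodeid overrides a nodeid given in the connection string) can be sketched as follows. The function `effective_node_id` is hypothetical and the parsing is deliberately simplified: it assumes `option=value` arguments and comma-separated connection string items, which is not a full description of what mysqld accepts.

```python
def effective_node_id(argv):
    """Return the node ID chosen from a list of option=value arguments, or None."""
    node_id = None        # from --ndb-nodeid; the last occurrence wins
    connectstring = None  # from --ndb-connectstring; the last occurrence wins
    for arg in argv:
        name, _, value = arg.partition("=")
        if name == "--ndb-nodeid":
            node_id = int(value)
        elif name == "--ndb-connectstring":
            connectstring = value
    # Consult the connection string only when --ndb-nodeid was not given.
    if node_id is None and connectstring:
        for part in connectstring.split(","):
            if part.startswith("nodeid="):
                node_id = int(part[len("nodeid="):])
    return node_id


print(effective_node_id(
    ["--ndb-connectstring=nodeid=1,10.100.1.100", "--ndb-nodeid=3"]))
```

For the example from the text, the sketch selects node ID 3, because the explicit option takes precedence over `nodeid=1` in the connection string.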
NDB did not distinguish correctly between table names differing only by lettercase when
lower_case_table_names was set to 0. (Bug #33158)
ndb_mgm -e "ALL STATUS" erroneously reported that data nodes remained in start phase 0 until they had actually started.
Functionality Added or Changed
max column has been renamed to
max) columns now display values in bytes rather than memory pages.
Added the columns
total_pages to show the amount of a resource used and total amount available in pages.
The size of the memory pages used for calculating data memory (
total_pages columns) is now 32K rather than 16K.
For more information, see The ndbinfo memoryusage Table.
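The relationship between page counts and byte values for data memory can be sketched with simple arithmetic. The helper names below are hypothetical, and the 32K page size is taken from the note above as applying to data memory only.

```python
DATA_MEMORY_PAGE_BYTES = 32 * 1024  # 32K data memory pages, per the note above


def pages_to_bytes(pages, page_bytes=DATA_MEMORY_PAGE_BYTES):
    """Convert a page count to bytes for a given page size."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * page_bytes


def usage_percent(used_pages, total_pages):
    """Percentage of pages in use; the page size cancels out."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * used_pages / total_pages
```

For instance, 10 data memory pages correspond to 327680 bytes at 32K per page, whereas the same count would have been reported as 163840 bytes under a 16K page size.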
Important Change: The experimental
pools table has been removed from the
ndbinfo database. Information useful to MySQL Cluster administration that was contained in this table should be available from other
Important Note: MySQL Cluster 7.1 is now supported for production use on Windows platforms.
Important
Some limitations specific to Windows remain; the most important of these are given in the following list:
There is not yet any Windows installer for MySQL Cluster; you must extract, place, configure, and start the necessary MySQL Cluster executables manually.
MySQL Cluster processes cannot yet be installed as Windows services. This means that each process executable must be run from a command prompt, and cannot be backgrounded. If you close the command prompt window in which you started the process, the process terminates.
There is as yet no “angel” process for data nodes; if a data node process quits, it must be restarted manually.
ndb_error_reporter is not yet available on Windows.
The multi-threaded data node process (ndbmtd) is not yet included in the binary distribution. However, it should be built automatically if you build MySQL Cluster from source.
As with MySQL Cluster on other supported platforms, you cannot build MySQL Cluster for Windows from the MySQL Server 5.1 sources; you must use the source code from the MySQL Cluster NDB 7.1 tree.
If a node or cluster failure occurred while mysqld was scanning the
ndb.ndb_schema table (which it does when attempting to connect to the cluster), insufficient error handling could lead to a crash by mysqld in certain cases. This could happen in a MySQL Cluster with a great many tables, when trying to restart data nodes while one or more mysqld processes were restarting. (Bug #52325)
In MySQL Cluster NDB 7.0 and later, DDL operations are performed within schema transactions; the NDB kernel code for starting a schema transaction checks that all data nodes are at the same version before permitting a schema transaction to start. However, when a version mismatch was detected, the client was not actually informed of this problem, which caused the client to hang. (Bug #52228)
After running a mixed series of node and system restarts, a system restart could hang or fail altogether. This was caused by setting the value of the newest completed global checkpoint too low for a data node performing a node restart, which led to the node reporting incorrect GCI intervals for its first local checkpoint. (Bug #52217)
When performing a complex mix of node restarts and system restarts, the node that was elected as master sometimes required optimized node recovery due to missing
REDO information. When this happened, the node crashed with Failure to recreate object ... during restart, error 721 (because the DBDICT restart code was run twice). Now when this occurs, node takeover is executed immediately, rather than being made to wait until the remaining data nodes have started. (Bug #52135)
References: See also: Bug #48436.
The internal variable
ndb_new_handler, which is no longer used, has been removed. (Bug #51858)
ha_ndbcluster.cc was not compiled with the same SAFE_MUTEX flags as the MySQL Server. (Bug #51857)
When debug compiling MySQL Cluster on Windows, the mysys library was not compiled with -DSAFEMALLOC and -DSAFE_MUTEX, due to the fact that my_socket.c was misnamed as my_socket.cc. (Bug #51856)
Some values shown in the
memoryusage table did not match corresponding values shown by the ndb_mgm client ALL REPORT MEMORYUSAGE command. (Bug #51735)
The redo log protects itself from being filled up by periodically checking how much space remains free. If insufficient redo log space is available, it sets the state
TAIL_PROBLEM which results in transactions being aborted with error code 410 (out of redo log). However, this state was not set following a node restart, which meant that if a data node had insufficient redo log space following a node restart, it could crash a short time later with Fatal error due to end of REDO log. Now, this space is checked during node restarts. (Bug #51723)
Restoring a MySQL Cluster backup between platforms having different endianness failed when also restoring metadata and the backup contained a hashmap not already present in the database being restored to. This issue was discovered when trying to restore a backup made on Solaris/SPARC to a MySQL Cluster running on Solaris/x86, but could conceivably occur in other cases where the endianness of the platform on which the backup was taken differed from that of the platform being restored to. (Bug #51432)
A mysqld, when attempting to access the
ndbinfo database, crashed if it could not contact the management server. (Bug #51067)
The mysql client
system command did not work properly. This issue was only known to affect the version of the mysql client that was included with MySQL Cluster NDB 7.0 and MySQL Cluster NDB 7.1 releases. (Bug #48574)
Packaging; Cluster API: The file
META-INF/services/org.apache.openjpa.lib.conf.ProductDerivationwas missing from the
clusterjpa JAR file. This could cause setting
ndb” to be rejected. (Bug #52106)
References: See also: Bug #14192154.
Disk Data: Inserts of blob column values into a MySQL Cluster Disk Data table that exhausted the tablespace resulted in misleading no such tuple error messages rather than the expected error tablespace full.
This issue appeared similar to Bug #48113, but had a different underlying cause. (Bug #52201)
References: See also: Bug #48113.
Disk Data: DDL operations on Disk Data tables having a relatively small
UNDO_BUFFER_SIZE could fail unexpectedly.
Cluster API: A number of issues were corrected in the NDB API coding examples found in the
storage/ndb/ndbapi-examples directory in the MySQL Cluster source tree. These included possible endless recursion in ndbapi_scan.cpp as well as problems running some of the examples on systems using Windows or Mac OS X due to the lettercase used for some table names. (Bug #30552, Bug #30737)
Functionality Added or Changed
Cluster API: It is now possible to determine, using the ndb_desc utility or the NDB API, which data nodes contain replicas of which partitions. For ndb_desc, a new
--extra-node-info option is added to cause this information to be included in its output. A new method Table::getFragmentNodes() is added to the NDB API for obtaining this information programmatically. (Bug #51184)
Numeric codes used in management server status update messages in the cluster logs have been replaced with text descriptions. (Bug #49627)
References: See also: Bug #44248.
A new configuration parameter
HeartbeatThreadPriority makes it possible to select between a first-in, first-out or round-robin scheduling policy for management node and API node heartbeat threads, as well as to set the priority of these threads. See Defining a MySQL Cluster Management Server, or Defining SQL and Other API Nodes in a MySQL Cluster, for more information. (Bug #49617)
Start phases are now written to the data node logs. (Bug #49158)
DUMP commands returned output to all ndb_mgm clients connected to the same MySQL Cluster. Now, these commands return their output only to the ndb_mgm client that actually issued the command. (Bug #40865)
Disk Data: The ndb_desc utility can now show the extent space and free extent space for subordinate
TEXT columns (stored in hidden BLOB tables by NDB). A --blob-info option has been added for this program that causes ndb_desc to generate a report for each subordinate BLOB table. For more information, see ndb_desc — Describe NDB Tables. (Bug #50599)
Important Change: The
DATA_MEMORY column of the memoryusage table was renamed to memory_type. (Bug #50926)
When deciding how to divide the REDO log, the
DBDIH kernel block saved more than was needed to restore the previous local checkpoint, which could cause REDO log space to be exhausted prematurely (NDB error 410). (Bug #51547)
DML operations can fail with
NDB error 1220 (REDO log files overloaded...) if the opening and closing of REDO log files takes too much time. If this occurred as a GCI marker was being written in the REDO log while REDO log file 0 was being opened or closed, the error could persist until a GCP stop was encountered. This issue could be triggered when there was insufficient REDO log space (for example, with configuration parameter settings NoOfFragmentLogFiles = 6 and FragmentLogFileSize = 6M) with a load including a very high number of updates. (Bug #51512)
References: See also: Bug #20904.
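To see why the example settings above leave little REDO log space, the total can be estimated with a short calculation. This sketch assumes that REDO log files are allocated in sets of 4, so that the total is NoOfFragmentLogFiles × 4 × FragmentLogFileSize; the helper names `parse_size` and `redo_log_bytes` are hypothetical.

```python
FILES_PER_SET = 4  # assumption: REDO log files are allocated in sets of 4

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Parse a size such as '6M' or '16384' into bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def redo_log_bytes(no_of_fragment_log_files, fragment_log_file_size):
    """Estimate total REDO log space for one data node."""
    return (no_of_fragment_log_files * FILES_PER_SET
            * parse_size(fragment_log_file_size))


small = redo_log_bytes(6, "6M")     # settings from the example above
larger = redo_log_bytes(16, "16M")  # a larger configuration for comparison
```

Under this assumption the example settings give 144M of REDO log in total, compared with 1024M for 16 sets of 16M files.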
An attempted online upgrade from a MySQL Cluster NDB 6.3 or 7.0 release to a MySQL Cluster NDB 7.1 release failed, as the first upgraded data node rejected the remaining data nodes as using incompatible versions. (Bug #51429)
A side effect of the ndb_restore
--rebuild-indexes options is to change the schema versions of indexes. When a mysqld later tried to drop a table that had been restored from backup using one or both of these options, the server failed to detect these changed indexes. This caused the table to be dropped, but the indexes to be left behind, leading to problems with subsequent backup and restore operations. (Bug #51374)
The output of the ndb_mgm client
REPORT BACKUPSTATUS command could sometimes contain errors due to uninitialized data. (Bug #51316)
IndexMemory greater than 2GB could cause data nodes to crash while starting. (Bug #51256)
ndb_restore crashed while trying to restore a corrupted backup, due to missing error handling. (Bug #51223)
The ndb_restore message
Successfully created index `PRIMARY`... was directed to
stdout. (Bug #51037)
An initial restart of a data node configured with a large amount of memory could fail with a Pointer too large error. (Bug #51027)
References: This issue is a regression of: Bug #47818.
NoOfReplicas equal to 1 or 2, if data nodes from one node group were restarted 256 times and applications were running traffic such that it would encounter NDB error 1204 (Temporary failure, distribution changed), the live node in the node group would crash, causing the cluster to crash as well. The crash occurred only when the error was encountered on the 256th restart; having the error on any previous or subsequent restart did not cause any problems. (Bug #50930)
A GROUP BY query against NDB tables sometimes did not use any indexes unless the query included a FORCE INDEX option. With this fix, indexes are used by such queries (where otherwise possible) even when FORCE INDEX is not specified. (Bug #50736)
transporters table showed the status of a disconnected node as
DISCONNECTED. (Bug #50654)
ndbmtd started on a single-core machine could sometimes fail with a Job Buffer Full error when
MaxNoOfExecutionThreads was set greater than
LockExecuteThreadToCPU. Now a warning is logged when this occurs. (Bug #50582)
The following issues were fixed in the ndb_mgm client
Issuing a command in the ndb_mgm client after it had lost its connection to the management server could cause the client to crash. (Bug #49219)
Replication of a MySQL Cluster using multi-threaded data nodes could fail with forced shutdown of some data nodes due to the fact that ndbmtd exhausted
LongMessageBuffer much more quickly than ndbd. After this fix, passing of replication data between the
SUMA NDB kernel blocks is done using
Until you can upgrade, you may be able to work around this issue by increasing the
LongMessageBuffer setting; doubling the default should be sufficient in most cases. (Bug #46914)
Information about several management client commands was missing from (that is, truncated in) the output of the
HELP command. (Bug #46114)
When the MemReportFrequency configuration parameter was set in config.ini, the ndb_mgm client REPORT MEMORYUSAGE command printed its output multiple times. (Bug #37632)
ndb_mgm -e "... REPORT ..." did not write any output to
The fix for this issue also prevents the cluster log from being flooded with
DataMemory usage reaches 100%, and ensures that when the usage is decreased, an appropriate message is written to the cluster log. (Bug #31542, Bug #44183, Bug #49782)
Disk Data: The error message returned after attempting to execute ALTER LOGFILE GROUP on a nonexistent logfile group did not indicate the reason for the failure. (Bug #51111)
Disk Data: For a Disk Data tablespace whose extent size was not equal to a whole multiple of 32K, the value of the
FREE_EXTENTS column in the INFORMATION_SCHEMA.FILES table was smaller than the value of
As part of this fix, the implicit rounding of
NDBCLUSTER(see CREATE TABLESPACE Syntax) is now done explicitly, and the rounded values are used for calculating
INFORMATION_SCHEMA.FILES column values and other purposes. (Bug #49709)
References: See also: Bug #31712.
Disk Data: Once all data files associated with a given tablespace had been dropped, there was no way for MySQL client applications (including the mysql client) to tell that the tablespace still existed. To remedy this problem,
INFORMATION_SCHEMA.FILES now holds an additional row for each tablespace. (Previously, only the data files in each tablespace were shown.) This row shows
FILE_NAME column. (Bug #31782)
Disk Data: It was possible to issue a
ALTER TABLESPACE statement in which INITIAL_SIZE was less than EXTENT_SIZE. (In such cases, INFORMATION_SCHEMA.FILES erroneously reported the value of the
1 and that of the
0.) Now when either of these statements is issued such that INITIAL_SIZE is less than EXTENT_SIZE, the statement fails with an appropriate error message. (Bug #31712)
References: See also: Bug #49709.
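The check introduced by this fix amounts to comparing the two sizes before the statement is accepted. The following is a minimal sketch of such a check, with sizes given in bytes; the function name `validate_sizes` and the wording of the message are hypothetical and do not reproduce the server's actual error text.

```python
def validate_sizes(initial_size, extent_size):
    """Return None when the sizes are acceptable, else an explanatory message."""
    if initial_size <= 0 or extent_size <= 0:
        return "sizes must be positive"
    if initial_size < extent_size:
        # A data file smaller than one extent could not hold any extent.
        return "INITIAL_SIZE (%d) is less than EXTENT_SIZE (%d)" % (
            initial_size, extent_size)
    return None
```

A data file of 1M with 32K extents passes the check, while a 16K data file with 32K extents is rejected.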
Cluster API: An issue internal to ndb_mgm could cause problems when trying to start a large number of data nodes at the same time. (Bug #51273)
References: See also: Bug #51310.
Cluster API: When reading blob data with lock mode
LM_SimpleRead, the lock was not upgraded as expected. (Bug #51034)
Functionality Added or Changed
The ndbinfo database is added to provide MySQL Cluster metadata in real time. The tables making up this database contain information about memory, buffer, and other resource usage, as well as configuration parameters and settings, event counts, and other useful data. Access to ndbinfo is done by executing standard SQL queries on its tables using the mysql command-line client or other MySQL client application. No special setup procedures are required; ndbinfo is created automatically and visible in the output of SHOW DATABASES when the MySQL Server is connected to a MySQL Cluster.
For more information, see The ndbinfo MySQL Cluster Information Database.
Cluster API: ClusterJ 1.0 and ClusterJPA 1.0 are now available for programming Java applications with MySQL Cluster. ClusterJ is a Java connector providing an object-relational API for performing high-speed operations such as primary key lookups on a MySQL Cluster database, but does not require the use of the MySQL Server or JDBC (Connector/J). ClusterJ uses a new library NdbJTie which enables direct access from Java to the NDB API and thus to the
NDBCLUSTER storage engine. ClusterJPA is a new implementation of OpenJPA, and can use either a JDBC connection to a MySQL Cluster SQL node (MySQL Server) or a direct connection to MySQL Cluster using NdbJTie, depending on availability and operational performance.
ClusterJ, ClusterJPA, and NdbJTie require Java 1.5 or 1.6, and MySQL Cluster NDB 7.0 or later.
All necessary libraries and other files for ClusterJ, ClusterJPA, and NdbJTie can be found in the MySQL Cluster NDB 7.1.1 or later distribution.
When a primary key lookup on an NDB table containing one or more BLOB columns was executed in a transaction, a shared lock on any blob tables used by the NDB table was held for the duration of the transaction. (This did not occur for indexed or non-indexed
Now in such cases, the lock is released after all BLOB data has been read. (Bug #49190)
This version was for testing and internal use only, and not officially released.
Functionality Added or Changed
Important Change: The default value of the
DiskIOThreadPool data node configuration parameter has changed from 8 to 2.
Incompatible Change; Cluster API: Several NDB API methods were declared as
const, but did not return an
lvalue, which caused compiler warnings when using gcc 4.3 or newer to perform the build. The methods affected are
NdbOperation::getType(). (Bug #44840)
Important Change: The
--with-ndb-port-base option for configure has been removed. It is now handled as an unknown and invalid option if you attempt to use it when configuring a build of MySQL Cluster. (Bug #47941)
References: See also: Bug #38502.
mysqld could sometimes crash during a commit while trying to handle NDB Error 4028 Node failure caused abort of transaction. (Bug #38577)