MySQL NDB Cluster 8.0.29 is a new release of NDB 8.0, based on MySQL Server 8.0 and including features in version 8.0 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster 8.0.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.29 (see Changes in MySQL 8.0.29 (2022-04-26, General Availability)).
This release is no longer available for download. It was removed due to a critical issue that could cause data in InnoDB tables having added columns to be interpreted incorrectly. Please upgrade to MySQL Cluster 8.0.30 instead.
NDB could not be built using GCC 11 due to an array out of bounds error. (Bug #33459671)
Removed a number of -Wstringop-truncation warnings raised when compiling NDB with GCC 9, as well as suppression of such warnings. Also removed unneeded includes from the header file ndb_global.h. (Bug #32233543)
Eight new tables providing NDB dictionary information about database objects have been added to the ndbinfo information database. This makes it possible to obtain a great deal of information of this type by issuing queries in the mysql client, without the need to use ndb_desc, ndb_select_all, and similar utilities. (It is still necessary to use ndb_desc to obtain fragment distribution information.) These tables are listed here, together with the NDB objects about which they provide information:
blobs: Blob tables
dictionary_columns: Table columns
dictionary_tables: Tables
events: Event subscriptions
files: Files used by disk data tables
foreign_keys: Foreign keys
hash_maps: Hash maps
index_columns: Table indexes
An additional change in ndbinfo is that only files and hash_maps are defined as views; the remaining six tables listed previously are in fact base tables, even though they are not named using the ndb$ prefix. As a result, these tables are not hidden as other ndbinfo base tables are.
For more information, see the descriptions of the tables in ndbinfo: The NDB Cluster Information Database. (WL #11968)
ndbcluster plugin threads can now be seen in the Performance Schema. The threads and setup_threads tables show all three of these threads: the binary logging thread (ndb_binlog thread), the index statistics thread (ndb_index_stat thread), and the metadata thread (ndb_metadata thread).
This makes it possible to obtain the thread IDs and thread OS IDs of these threads for use in queries on these and other Performance Schema tables.
For more information and examples, see ndbcluster Plugin Threads. (WL #15000)
NDB Cluster APIs: The NDB API now implements a List::clear() method which clears all data from a list. This makes it simpler to reuse an existing list with the Dictionary methods listEvents(), listIndexes(), and listObjects().
In addition, the List destructor has been modified such that it now calls clear() before attempting the removal of any elements or attributes from the list being destroyed. (Bug #33676070)
The client receive thread was enabled only when under high load, where the criterion for determining "high load" was that the number of clients waiting in the poll queue (the receive queue) was greater than min_active_clients_recv_thread (default: 8).
This was a poor metric for determining high load, since a single client, such as the binary log injector thread handling incoming replication events, could experience high load on its own as well. The same was true of a pushed join query (in which very large batches of incoming TRANSID_AI signals are received).
We change the receive thread such that it now sleeps in the poll queue rather than being deactivated completely, so that it is now always available for handling incoming signals, even when the client is not under high load. (Bug #33752914)
It is now possible to restore the ndb_apply_status table from an NDB backup, using ndb_restore with the --with-apply-status option added in this release. In some cases, this information can be useful when setting up new replication links.
--with-apply-status restores all rows of the ndb_apply_status table except for the row for which the server_id value is 0; use --restore-epoch to restore this row.
To use the --with-apply-status option, you must also supply --restore-data when invoking ndb_restore.
For more information, see the description of the --with-apply-status option in the Reference Manual, as well as ndb_apply_status Table. (Bug #32604161, Bug #33594652)
Previously, when a user query attempted to open an NDB table with a missing (or broken) index, the MySQL server raised NDB error 4243 Index not found. Now when such an attempt is made, it is handled as described here:
If the query does not make use of the problematic index, the query succeeds with no errors or warnings.
If the query attempts to use the missing or broken index, the query is rejected with a warning from NDB (Index idx is not available in NDB. Use "ALTER TABLE tbl ALTER INDEX idx INVISIBLE" to prevent MySQL from attempting to access it, or use "ndb_restore --rebuild-indexes" to rebuild it), and an error (ER_NOT_KEYFILE).
The rationale for this change is that constraint violations or missing data sometimes make it impossible to restore an index on an NDB table, in which case, running ndb_restore with --disable-indexes restores the data without the index. With this change, once the data is restored from backup, it is possible to use SQL to fix any corrupt data and rebuild the index. (Bug #28584066, WL #14867)
Important Change: The maximum value supported for the --ndb-batch-size server option has been increased from 31536000 to 2147483648 (2 GB). (Bug #21040523)
Performance: When profiling multithreaded data nodes (ndbmtd) performing a transaction including a large number of inserts, it was found that more than 50% of CPU time was spent in the internal method Dblqh::findTransaction(). It was found that, when there were many operations belonging to uncommitted transactions in the hash list searched by this method, the hash buckets overfilled, the result being that an excessive number of CPU cycles were consumed searching through the hash buckets.
To address this problem, we fix the number of hash buckets at 4095, and scale the size of a hash bucket relative to the maximum number of operations, so that only relatively few items should now be placed in the same bucket. (Bug #33803541)
References: See also: Bug #33803487.
Performance: When inserting a great many rows into an empty or small table in the same transaction, the rate at which rows were inserted quickly declined to less than 50% of the initial rate; subsequently, it was found that roughly 50% of all CPU time was spent in Dbacc::getElement(), and the root cause was identified as the timing of resizing the structures used for storing elements by DBACC, growing with the insertion of more rows in the same transaction, and shrinking following a commit.
We fix this issue by checking for a need to resize immediately following the insertion or deletion of an element. This also handles the subsequent rejection of an insert. (Bug #33803487)
References: See also: Bug #33803541.
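As an illustration only, the following Python sketch shows the general idea of checking for a resize immediately after every insertion or deletion, so that bucket chains stay short even within a single large transaction. The class name, thresholds, and bucket layout are hypothetical; DBACC is implemented in C++ and uses its own structures, which are not reproduced here.

```python
class ResizingHashTable:
    """Toy chained hash table that re-checks its size after each change."""

    MIN_BUCKETS = 8
    MAX_LOAD = 2.0   # assumed threshold: grow above this average chain length
    MIN_LOAD = 0.5   # assumed threshold: shrink below this average chain length

    def __init__(self):
        self._buckets = [[] for _ in range(self.MIN_BUCKETS)]
        self._count = 0

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)   # overwrite, element count unchanged
                return
        bucket.append((key, value))
        self._count += 1
        self._resize_if_needed()           # checked right after the insert

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                self._count -= 1
                self._resize_if_needed()   # checked right after the delete
                return True
        return False

    def get(self, key, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

    def bucket_count(self):
        return len(self._buckets)

    def _resize_if_needed(self):
        size = len(self._buckets)
        load = self._count / size
        if load > self.MAX_LOAD:
            self._rehash(size * 2)
        elif load < self.MIN_LOAD and size > self.MIN_BUCKETS:
            self._rehash(size // 2)

    def _rehash(self, new_size):
        items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_size)]
        for key, value in items:
            self._buckets[hash(key) % new_size].append((key, value))
```

With this arrangement the table grows while rows are still being inserted, rather than only at some later point such as a commit, and shrinks again as elements are removed.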
Performance: A considerable amount of time was being spent searching the event buffer data hash (using the internal method EventBufData_hash::search()), due to the following issues:
The number of buckets proved to be too low under high load, when the hash bucket list could become very large.
The hash buckets were implemented using a linked list. Traversing a long linked list can be highly inefficient.
We fix these problems by using a vector (std::vector) rather than a linked list, and by making the array containing the set of hash buckets expandable. (Bug #33796754)
Performance: The internal function computeXorChecksum() was implemented such that great care was taken to aid the compiler in generating optimal code, but it was found that it consumed excessive CPU resources, and did not perform as well as a simpler implementation. This function is now reimplemented with a loop summing up XOR results over an array, which appears to result in better optimization with both GCC and Clang compilers. (Bug #33757412)
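The "simple loop" form of an XOR checksum can be sketched as follows. This is a minimal Python illustration assuming 32-bit words; the function name, parameters, and masking are assumptions made for the example and do not reproduce the actual C++ computeXorChecksum().

```python
def compute_xor_checksum(words, initial=0):
    """XOR-fold a sequence of integers into a single 32-bit checksum."""
    checksum = initial & 0xFFFFFFFF
    for word in words:
        # Each word is truncated to 32 bits before being folded in.
        checksum ^= word & 0xFFFFFFFF
    return checksum


data = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F]
checksum = compute_xor_checksum(data)
# Folding the checksum back in yields zero, which is how a receiver
# could verify a buffer that carries its own checksum word.
residual = compute_xor_checksum(data + [checksum])
```

A plain loop of this shape leaves the compiler free to vectorize or unroll as it sees fit, which is consistent with the observation above that the simpler implementation optimized better.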
Microsoft Windows: The CompressedLCP data node configuration parameter had no effect on Windows platforms.
Note: When upgrading to this release, Windows users should verify the setting for CompressedLCP; if it was previously enabled, you may experience an increase in CPU usage by I/O threads following the upgrade, when under load, when restoring data as part of a node restart, or in both cases. If this behavior is not desired, disable CompressedLCP.
(Bug #33727690)
Microsoft Windows: The internal function Win32AsyncFile::rmrfReq() did not always check for both ERROR_FILE_NOT_FOUND and ERROR_PATH_NOT_FOUND when either condition was likely. (Bug #33727647)
Microsoft Windows: Corrected several minor issues that occurred with file handling on Windows platforms. (Bug #33727629)
NDB Replication: When performing certain schema operations on an NDB table, including those involving a copying ALTER TABLE, the epoch column in the mysql.ndb_apply_status table on the replica was updated to 0, although this should happen only for transactions originating from storage engines other than NDBCLUSTER.
To fix this, we now update (only) the binary log position when writing a row into ndb_apply_status from the same server ID as the previous one, but do not overwrite the current epoch when applying schema operations. (Bug #14139386)
NDB Cluster APIs: Hash key generation using the internal API method NdbBlob::getBlobKeyHash() ignored the most significant byte of the key. This unnecessarily caused uneven distribution in the NDB API blob hash list, resulting in an increased need for comparing key values, and thus more CPU usage. (Bug #33803583)
References: See also: Bug #33783274.
NDB Cluster APIs: Removed an unnecessary assertion that could be hit when iterating through the list returned by Dictionary::listEvents(). (Bug #33630835)
Builds on Ubuntu 21.10 using GCC 11 stopped with -Werror=maybe-uninitialized. (Bug #33976268)
In certain cases, NDB did not handle node IDs of data nodes correctly. (Bug #33916404)
In some cases, NDB did not validate all node IDs of data nodes correctly. (Bug #33896409)
In some cases, array indexes were not handled correctly. (Bug #33896389, Bug #33896399, Bug #33916134)
In some cases, integers were not handled correctly. (Bug #33896356)
As part of work done in NDB 8.0.23 to implement the AutomaticThreadConfig configuration parameter, the maximum numbers of LQH and TC threads supported by ndbmtd were raised from 129 each to 332 and 160, respectively. This adversely affected the performance of execSEND_PACKED() methods implemented by several NDB kernel blocks, which complete sending of packed signals when the scheduler is about to suspend execution of the current block thread. This was due to continuing simply to iterate over the arrays of such threads despite the arrays' increased size. We fix this by using a bitmask to track the thread states alongside the full arrays. (Bug #33856371)
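The bitmask technique can be sketched roughly as below. This Python example is an assumption-laden illustration, not the NDB kernel code: the class, method names, and buffer representation are hypothetical, and an arbitrary-precision integer stands in for whatever bitmask type the C++ implementation uses.

```python
class PackedSignalTracker:
    """Per-thread buffers plus a bitmask marking which ones are non-empty."""

    def __init__(self, num_threads):
        self.buffers = [[] for _ in range(num_threads)]
        self.pending_mask = 0   # bit i is set when buffers[i] holds signals

    def pack(self, thread_no, signal):
        self.buffers[thread_no].append(signal)
        self.pending_mask |= 1 << thread_no

    def send_packed(self):
        """Flush only the buffers whose bit is set, lowest thread first."""
        sent = []
        mask = self.pending_mask
        while mask:
            lowest = mask & -mask                 # isolate the lowest set bit
            thread_no = lowest.bit_length() - 1   # convert bit to an index
            sent.append((thread_no, self.buffers[thread_no]))
            self.buffers[thread_no] = []
            mask ^= lowest                        # clear the bit just handled
        self.pending_mask = 0
        return sent


tracker = PackedSignalTracker(332)
tracker.pack(5, "a")
tracker.pack(200, "b")
tracker.pack(5, "c")
flushed = tracker.send_packed()
```

The work done in send_packed() is proportional to the number of threads that actually have packed signals waiting, rather than to the full size of the array.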
When operating on blob columns, NDB must add extra operations to read and write the blob head column and blob part rows. These operations are added to the tail of the transaction's operation list automatically when the transaction is executed.
To insert a new operation prior to a given operation, it was necessary to traverse the operation list from the beginning until the desired operation was found, with a cost proportional to the length $L$ of the list of preceding operations. This is approximately $L^2 / 2$ in total, increasing as more operations are added to the list; when a large number of operations modifying blobs were defined in a batch, this traversal cost was paid for each operation. This had a noticeable impact on performance when reading and writing blobs.
We fix this by using list splicing in NdbTransaction::execute() to eliminate unnecessary traversals of this sort when defining blob operations. (Bug #33797931)
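The quadratic growth described here can be checked with a small count. The sketch below assumes, purely for illustration, that placing each new operation requires walking past every operation already in the list; the function name is hypothetical and nothing here is taken from the NDB API source.

```python
def traversal_steps(num_ops: int) -> int:
    """Total list nodes visited when the i-th operation is placed by
    walking past the i operations already present (i = 0 .. num_ops - 1)."""
    return sum(range(num_ops))


# Compare the exact count with the L^2 / 2 approximation.
for length in (10, 100, 1000):
    exact = traversal_steps(length)
    approx = length * length / 2
    print(length, exact, approx)
```

The exact total is $L(L-1)/2$, which approaches $L^2 / 2$ as the batch grows, whereas placing each operation in constant time (as splicing allows) costs on the order of $L$ steps for the whole batch.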
The block thread scheduler makes frequent calls to update_sched_config() to update its scheduling strategy. That involves checking the fill degree of the job buffer queues used to send signals between the nodes' internal block threads. When these queues are about to fill up, the thread scheduler assigns a smaller value to max_signals for the next round, in order to reduce the pressure on the job buffers. When the minimum free threshold has been reached, the scheduler yields the CPU while waiting for the consumer threads to free some job buffer slots.
The fix in NDB 8.0.18 for a previous issue introduced a mechanism whereby the main thread was allowed to continue executing even when this lower threshold had been reached; in some cases the main thread consumed all job buffers, including those held in reserve, leading to an unplanned shutdown of the data node due to resource exhaustion. (Bug #33792362, Bug #33872577)
References: This issue is a regression of: Bug #29887068.
Setting up a cluster with one LDM thread and one query thread using the ThreadConfig parameter (for example, ThreadConfig=ldm={cpubind=1},query={cpubind=2}) led to unplanned shutdowns of data nodes.
This was due to internal thread variables being assigned the wrong values when there were no main or request threads explicitly assigned. Now we make sure in such cases that these are assigned the thread number of the first receive thread, as expected. (Bug #33791270)
NdbEventBuffer hash key generation for non-character data reused the same 256 hash keys; in addition, strings of zero length were ignored when calculating hash keys. (Bug #33783274)
The collection of NDB API statistics based on the EventBytesRecvdCount event counter incurred excessive overhead. Now this counter is updated using a value which is aggregated as the event buffer is filled, rather than traversing all of the event buffer data in a separate function call.
For more information, see NDB API Statistics Counters and Variables. (Bug #33778923)
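The difference between the two ways of maintaining such a counter can be sketched as follows. This is a toy Python model under stated assumptions: the class and its members are hypothetical and only illustrate aggregating a byte count while the buffer is filled, as opposed to recomputing it by walking all buffered data.

```python
class EventBufferSketch:
    """Toy event buffer that keeps a running total of received bytes."""

    def __init__(self):
        self._events = []          # buffered event payloads
        self.bytes_received = 0    # aggregated as the buffer is filled

    def append(self, payload: bytes) -> None:
        self._events.append(payload)
        self.bytes_received += len(payload)   # constant work per event

    def bytes_received_by_traversal(self) -> int:
        # The costlier alternative: visit every buffered event again.
        return sum(len(payload) for payload in self._events)


buf = EventBufferSketch()
for payload in (b"abc", b"", b"0123456789"):
    buf.append(payload)
```

Both approaches give the same total, but reading bytes_received costs nothing extra at collection time, while the traversal grows with the amount of buffered data.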
The internal method THRConfig::reorganize_ldm_bindings() behaved unexpectedly, in some cases changing thread bindings after AutomaticThreadConfig had already bound the threads to the correct CPUs. We fix this by removing the method, no longer using it when parsing configuration data or adding threads. (Bug #33764260)
The receiver thread ID was hard-coded in the internal method TransporterFacade::raise_thread_prio() such that it always acted to raise the priority of the receiver thread, even when called from the send thread. (Bug #33752983)
A fix in NDB 8.0.28 addressed an issue with the code used by various NDB components, including Ndb_index_stat, that checked whether the data nodes were up and running. In clusters with multiple SQL nodes, this resulted in an increase in the frequency of race conditions between index statistics threads trying to create a table event on the ndb_index_stat_head table; that is, it was possible for two SQL nodes to try to create the event at the same time, with the losing SQL node raising Error 746 Event name already exists. Due to this error, the binary logging thread ended up waiting for the index statistics thread to signal that its own setup was complete, and so the second SQL node timed out with Could not create index stat system tables after --ndb-wait-setup seconds. (Bug #33728909)
References: This issue is a regression of: Bug #32019119.
On a write error, the message printed by ndbxfrm referenced the source file rather than the destination file. (Bug #33727551)
A complex nested join was rejected with the error FirstInner/Upper has to be an ancestor or a sibling, which is thrown by the internal NdbQueryOperation interface used to define a pushed join in the SPJ API, indicating that the join-nest dependencies for the interface were not properly defined.
The query showing the issue had the join nest structure t2, t1, (t3, (t5, t4)). Neither of the join conditions on t5 or t4 had any references or explicit dependencies on table t3, but each had an implicit dependency on t3 in virtue of being in a nest within the same nest as t3.
When preparing a pushed join, NDB tracks all required table dependencies between tables and join-nests by adding them to the m_ancestor bitmask for each table. For nest level dependencies, they should all be added to the first table in the relevant nest. When the relevant dependencies for a specific table are calculated, they include the set of all tables being explicitly referred to in the join condition, plus any implicit dependencies due to the join nests the table is a member of, limited by the uppermost table referred to in the join condition.
For this particular join query we did not properly take into account that there might not be any references to tables in the closest upper nest (the nest starting with t3); in such cases we are dependent on all nests up to the nest containing the uppermost table referenced. We fix the issue by introducing a while-loop in which we add ancestor nest dependencies until we reach this uppermost table. (Bug #33670002)
When the transient memory pool (TransientPool) used internally by NDB grew above 256 MB, subsequent attempts to shrink the pool caused an error which eventually led to an unplanned shutdown of the data node. (Bug #33647601)
A check is now made that the connection to NDB has been set up before querying for partition statistics. (Bug #33643512)
When the ordered index PRIMARY was not created for the ndb_sql_metadata table, application of stored grants could not proceed due to the missing index.
We fix this by protecting creation of utility tables (including ndb_sql_metadata) by wrapping the associated CREATE TABLE statement with a schema transaction, thus handling rejection of the statement by rollback. In addition, in the event the newly-created table is not created correctly, it is dropped. These changes avoid leaving behind a table that is only partially created, so that the next attempt to create the utility table starts from the beginning of the process. (Bug #33634453)
Removed -Wmaybe-uninitialized warnings which occurred when compiling NDB Cluster with GCC 11.2. (Bug #33611915)
NDB accepted an arbitrary (and invalid) string of characters following a numeric parameter value in the config.ini global configuration. For example, it was possible to use either OverloadLimit=10 "M12L" or OverloadLimit=10 M (which contains a space) and have it interpreted as OverloadLimit=10M.
It was also possible to use a bare letter suffix in place of an expected numeric value, such as OverloadLimit=M, and have it interpreted as zero. This happened as well with an arbitrary string whose first letter was one of the MySQL standard modifiers K, M, or G; thus, OverloadLimit=MAX_UINT also had the effect of setting OverloadLimit to zero.
Now, only one of the suffixes K, M, or G is accepted with a numeric parameter value, and it must follow the numeric value immediately, with no intervening whitespace characters or quotation marks. In other words, to set OverloadLimit to 10 megabytes, you must use one of OverloadLimit=10000000, OverloadLimit=10M, or OverloadLimit=10000K.
Note: To maintain availability, you should check your config.ini file for any settings that do not conform to the rule enforced as a result of this change and correct them prior to upgrading. Otherwise, the cluster may not be able to start afterwards, until you rectify the issue.
(Bug #33589961)
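The stricter rule can be approximated with a short validator, which may be handy for checking config.ini values before an upgrade. This Python sketch is not the NDB parser: the function name is hypothetical, only the uppercase suffixes named above are accepted, and the multipliers are assumed to be powers of 1024, as is usual for MySQL option suffixes.

```python
import re

# Assumed multipliers (powers of 1024); not taken from the NDB source.
_SUFFIX_MULTIPLIER = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# One or more digits, optionally followed immediately by a single suffix.
_VALUE_RE = re.compile(r"([0-9]+)([KMG]?)")


def parse_numeric_value(text: str) -> int:
    """Parse values such as '10', '10K', '10M'; reject anything else."""
    match = _VALUE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid numeric parameter value: {text!r}")
    digits, suffix = match.groups()
    return int(digits) * _SUFFIX_MULTIPLIER[suffix]


def is_valid_numeric_value(text: str) -> bool:
    """Convenience wrapper returning True or False instead of raising."""
    try:
        parse_numeric_value(text)
    except ValueError:
        return False
    return True
```

Under this sketch, values such as `10 M`, `10 "M12L"`, `M`, and `MAX_UINT` are all rejected, mirroring the examples given above, while `10M` and `10000K` are accepted.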
Enabling AutomaticThreadConfig with fewer than 8 CPUs available led to unplanned shutdowns of data nodes. (Bug #33588734)
Removed the unused source files buddy.cpp and buddy.hpp from storage/ndb/src/common/transporter/. (Bug #33575155)
The NDB stored grants mechanism now sets the session variable print_identified_with_as_hex to true, so that password hashes stored in the ndb_sql_metadata table are formatted as hexadecimal values rather than being formatted as strings. (Bug #33542052)
Binary log thread event handling includes optional high-verbosity logging, which, when enabled and the connection to NDB is lost, produces an excess of log messages like these:
datetime 2 [Note] [MY-010866] [Server] NDB Binlog: cluster failure for epoch 55/0.
datetime 2 [Note] [MY-010866] [Server] NDB Binlog: cluster failure for epoch 55/0.
Such repeated log messages, not being of much help in diagnosing errors, have been removed. This leaves a similar log message in such cases, from the handling of schema distribution event operation teardown. (Bug #33492244)
Historically, a number of different methods have been used to enforce compile-time checks of various interdependencies and assumptions in the NDB codebase in a portable way. Since the standard static_assert() function is now always available, the NDB_STATIC_ASSERT and STATIC_ASSERT macros have been replaced with direct usage of static_assert(). (Bug #33466577)
When the internal AbstractQueryPlan interface determined the access type to be used for a specific table, it tried to work around an optimizer problem where the ref access type was specified for a table and later turned out to be accessible by eq_ref. The workaround introduced a new issue by sometimes determining eq_ref access for a table actually needing ref access; in addition, the prior fix did not take into account UNIQUE USING HASH indexes, which need either eq_ref or full table scan access, even when the MySQL Optimizer regards it as a ref access.
We fix this by first removing the workaround (which had been made obsolete by the proper fix for the previous issue), and then by introducing the setting of eq_ref or full_table_scan access for hash indexes. (Bug #33451256)
References: This issue is a regression of: Bug #28965762.
When a pushed join is prepared but not executed, the Ndb_pushed_queries_dropped status variable is incremented. In addition, NDB now emits a warning Prepared pushed join could not be executed... which is passed to ER_GET_ERRMSG. (Bug #33449000)
The deprecated -r option for ndbd has been removed. This change also removes extraneous text from the output of ndbd --help. (Bug #33362935)
References: See also: Bug #31565810.
ndb_import sometimes could not correctly parse a .csv file containing Windows/DOS-style (\r\n) line endings. (Bug #32006725)
The ndb_import tool handled only the hidden primary key which is defined by NDB when a table does not have an explicit primary key. This caused an error when inserting a row containing NULL for an auto-increment primary key column, even though the same row was accepted by LOAD DATA INFILE.
We fix this by adding support for importing a table with one or more instances of NULL in an auto-increment primary key column. This includes a check that a table has no more than one auto-increment column; if this column is nullable, it is redefined by ndb_import as NOT NULL, and any occurrence of NULL in this column is replaced by a generated auto-increment value before inserting the row into NDB. (Bug #30799495)
When a node failure is detected, surviving nodes in the same nodegroup as this node attempt to resend any buffered change data to event subscribers. In cases in which there were no outstanding epoch deliveries, that is, the list of unacknowledged GCIs was empty, the surviving nodes made the incorrect assumption that this list would never be empty. (Bug #30509416)
If the SQL node terminated prior to completion of a copying ALTER TABLE of the parent table for a foreign key, there remained an extraneous temporary table with (additional, temporary) foreign keys on all child tables. One consequence of this issue was that it was not possible to restore a backup made using mysqldump --no-data.
To fix this, NDB now performs cleanup of temporary tables whenever a mysqld process connects (or reconnects) to the cluster. (Bug #24935788, Bug #29892252)
An unplanned data node shutdown occurred following a bus error on Mac OS X for ARM. We fix this by moving the call to NdbCondition_Signal() (in AsyncIoThread.cpp) such that it executes prior to NdbMutex_Unlock() (that is, while the mutex is still held), so that the condition being signalled is not lost during execution. (Bug #105522, Bug #33559219)
In DblqhMain.cpp, a missing return in the internal execSCAN_FRAGREQ() function led to an unplanned shutdown of the data node when inserting a nonfatal error. In addition, the condition !seize_op_rec(tcConnectptr) present in the same function was never actually checked. (Bug #105051, Bug #33401830, Bug #33671869)
It was possible to set any of MaxNoOfFiredTriggers, MaxNoOfLocalScans, and MaxNoOfLocalOperations concurrently with TransactionMemory, although this is not allowed.
In addition, it was not possible to set any of MaxNoOfConcurrentTransactions, MaxNoOfConcurrentOperations, or MaxNoOfConcurrentScans concurrently with TransactionMemory, although there is no reason to prevent this.
In both cases, the concurrent settings behavior now matches the documentation for the TransactionMemory parameter. (Bug #102509, Bug #32474988)
When a redo log part is unable to accept an operation's log entry immediately, the operation (a prepare, commit, or abort) is queued, or (prepare only) optionally aborted. By default operations are queued.
This mechanism was modified in 8.0.23 as part of decoupling local data managers and redo log parts, and introduced a regression whereby it was possible for queued operations to remain in the queued state until all activity on the log part quiesced. When this occurred, operations could remain queued until DBTC declared them timed out, and aborted them. (Bug #102502, Bug #32478380)