MySQL NDB Cluster 8.0.29 is a new release of NDB 8.0, based on
MySQL Server 8.0 and including features in version 8.0 of the
NDB storage engine, as well as fixing
recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster 8.0.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.29 (see Changes in MySQL 8.0.29 (2022-04-26, General Availability)).
This release is no longer available for download. It was
removed due to a critical issue that could cause data in
InnoDB tables having added
columns to be interpreted incorrectly. Please upgrade to MySQL
Cluster 8.0.30 instead.
NDB could not be built using GCC 11 due to an array out of bounds error. (Bug #33459671)
-
Removed a number of -Wstringop-truncation warnings raised when compiling NDB with GCC 9, as well as the suppression of such warnings. Also removed unneeded includes from the header file ndb_global.h. (Bug #32233543)
-
Eight new tables providing
NDB dictionary information about database objects have been added to the ndbinfo information database. This makes it possible to obtain a great deal of information of this type by issuing queries in the mysql client, without the need to use ndb_desc, ndb_select_all, and similar utilities. (It is still necessary to use ndb_desc to obtain fragment distribution information.) These tables are listed here, together with the NDB objects about which they provide information:

blobs: Blob tables
dictionary_columns: Table columns
dictionary_tables: Tables
events: Event subscriptions
files: Files used by Disk Data tables
foreign_keys: Foreign keys
hash_maps: Hash maps
index_columns: Table indexes
An additional change in
ndbinfo is that only files and hash_maps are defined as views; the remaining six tables listed previously are in fact base tables, even though they are not named using the ndb$ prefix. As a result, these tables are not hidden as other ndbinfo base tables are. For more information, see the descriptions of the tables in ndbinfo: The NDB Cluster Information Database. (WL #11968)
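For example, queries such as the following can be issued from the mysql client to see the dictionary information now exposed (a minimal sketch; the columns actually returned by each table are described in the ndbinfo documentation):

    -- Tables known to the NDB dictionary
    SELECT * FROM ndbinfo.dictionary_tables;

    -- Column definitions for those tables
    SELECT * FROM ndbinfo.dictionary_columns;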
-
ndbcluster plugin threads can now be seen in the Performance Schema. The threads and setup_threads tables show all three of these threads: the binary logging thread (ndb_binlog thread), the index statistics thread (ndb_index_stat thread), and the metadata thread (ndb_metadata thread). This makes it possible to obtain the thread IDs and thread OS IDs of these threads for use in queries on these and other Performance Schema tables.
For more information and examples, see ndbcluster Plugin Threads. (WL #15000)
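For example, a query along the following lines lists the ndbcluster plugin threads together with their thread IDs and OS thread IDs (the LIKE pattern is illustrative; the exact thread names are given in ndbcluster Plugin Threads):

    SELECT thread_id, thread_os_id, name
    FROM performance_schema.threads
    WHERE name LIKE '%ndb%';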
-
NDB Cluster APIs: The NDB API now implements a
List::clear() method which clears all data from a list. This makes it simpler to reuse an existing list with the Dictionary methods listEvents(), listIndexes(), and listObjects(). In addition, the
List destructor has been modified such that it now calls clear() before attempting the removal of any elements or attributes from the list being destroyed. (Bug #33676070)
-
The client receive thread was enabled only when under high load, where the criterion for determining “high load” was that the number of clients waiting in the poll queue (the receive queue) was greater than
min_active_clients_recv_thread (default: 8). This was a poor metric for determining high load, since a single client, such as the binary log injector thread handling incoming replication events, could experience high load on its own as well. The same was true of a pushed join query (in which very large batches of incoming
TRANSID_AI signals are received). We change the receive thread such that it now sleeps in the poll queue rather than being deactivated completely, so that it is always available for handling incoming signals, even when the client is not under high load. (Bug #33752914)
-
It is now possible to restore the
ndb_apply_status table from an NDB backup, using ndb_restore with the --with-apply-status option added in this release. In some cases, this information can be useful in setting up new replication links. --with-apply-status restores all rows of the ndb_apply_status table except for the row for which the server_id value is 0; use --restore-epoch to restore this row. To use the
--with-apply-status option, you must also supply --restore-data when invoking ndb_restore. For more information, see the description of the
--with-apply-status option in the Reference Manual, as well as ndb_apply_status Table. (Bug #32604161, Bug #33594652)
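For example, after restoring a backup with both options, the restored contents can be checked from any SQL node (a simple sketch; ndb_apply_status also contains binary log file and position columns not shown here):

    SELECT server_id, epoch
    FROM mysql.ndb_apply_status
    ORDER BY server_id;
-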
Previously, when a user query attempted to open an
NDB table with a missing (or broken) index, the MySQL server raised NDB error 4243 Index not found. Now when such an attempt is made, it is handled as described here:

If the query does not make use of the problematic index, the query succeeds with no errors or warnings.
If the query attempts to use the missing or broken index, the query is rejected with a warning from
NDB (Index idx is not available in NDB. Use "ALTER TABLE tbl ALTER INDEX idx INVISIBLE" to prevent MySQL from attempting to access it, or use "ndb_restore --rebuild-indexes" to rebuild it), and an error (ER_NOT_KEYFILE).
The rationale for this change is that constraint violations or missing data sometimes make it impossible to restore an index on an
NDB table, in which case running ndb_restore with --disable-indexes restores the data without the index. With this change, once the data is restored from backup, it is possible to use SQL to fix any corrupt data and rebuild the index. (Bug #28584066, WL #14867)
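For example, given a table t1 with a broken index ix1 (both names are placeholders), the index can be hidden from the optimizer until it has been rebuilt, then made visible again:

    -- Keep MySQL from trying to use the broken index
    ALTER TABLE t1 ALTER INDEX ix1 INVISIBLE;

    -- After rebuilding it, for example with ndb_restore --rebuild-indexes
    ALTER TABLE t1 ALTER INDEX ix1 VISIBLE;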
Important Change: The maximum value supported for the
--ndb-batch-size server option has been increased from 31536000 to 2147483648 (2 GB). (Bug #21040523)
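The option's value is also visible as the ndb_batch_size system variable; the value currently in effect can be checked from an SQL node, for example:

    SELECT @@global.ndb_batch_size;
-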
Performance: When profiling multithreaded data nodes (ndbmtd) performing a transaction including a large number of inserts, it was found that more than 50% of CPU time was spent in the internal method
Dblqh::findTransaction(). When there were many operations belonging to uncommitted transactions in the hash list searched by this method, the hash buckets overfilled, with the result that an excessive number of CPU cycles were consumed searching through the hash buckets. To address this problem, we fix the number of hash buckets at 4095, and scale the size of a hash bucket relative to the maximum number of operations, so that only relatively few items should now be placed in the same bucket. (Bug #33803541)
References: See also: Bug #33803487.
-
Performance: When inserting a great many rows into an empty or small table in the same transaction, the rate at which rows were inserted quickly declined to less than 50% of the initial rate; subsequently, it was found that roughly 50% of all CPU time was spent in
Dbacc::getElement(), and the root cause was identified as the timing of resizing the structures used by DBACC for storing elements, which grow as more rows are inserted in the same transaction and shrink following a commit. We fix this issue by checking for a need to resize immediately following the insertion or deletion of an element. This also handles the subsequent rejection of an insert. (Bug #33803487)
References: See also: Bug #33803541.
-
Performance: A considerable amount of time was being spent searching the event buffer data hash (using the internal method
EventBufData_hash::search()), due to the following issues:

The number of buckets proved to be too low under high load, when the hash bucket list could become very large.
The hash buckets were implemented using a linked list. Traversing a long linked list can be highly inefficient.
We fix these problems by using a vector (
std::vector) rather than a linked list, and by making the array containing the set of hash buckets expandable. (Bug #33796754)
-
Performance: The internal function computeXorChecksum() was implemented with great care taken to help the compiler generate optimal code, but it was found to consume excessive CPU resources and not to perform as well as a simpler implementation. This function is now reimplemented with a loop summing XOR results over an array, which appears to be optimized better by both GCC and Clang. (Bug #33757412)
-
Microsoft Windows: The
CompressedLCP data node configuration parameter had no effect on Windows platforms.

Note: When upgrading to this release, Windows users should verify the setting for
CompressedLCP; if it was previously enabled, you may experience an increase in CPU usage by I/O threads following the upgrade, when under load, when restoring data as part of a node restart, or in both cases. If this behavior is not desired, disable CompressedLCP. (Bug #33727690)
Microsoft Windows: The internal function
Win32AsyncFile::rmrfReq() did not always check for both ERROR_FILE_NOT_FOUND and ERROR_PATH_NOT_FOUND when either condition was likely. (Bug #33727647)
-
Microsoft Windows: Corrected several minor issues that occurred with file handling on Windows platforms. (Bug #33727629)
-
NDB Replication: When performing certain schema operations on an
NDB table, including those involving a copying ALTER TABLE, the epoch column in the mysql.ndb_apply_status table on the replica was updated to 0, although this should happen only for transactions originating from storage engines other than NDBCLUSTER. To fix this, we now update (only) the binary log position when writing a row into
ndb_apply_status from the same server ID as the previous one, but do not overwrite the current epoch when applying schema operations. (Bug #14139386)
-
NDB Cluster APIs: Hash key generation using the internal API method
NdbBlob::getBlobKeyHash() ignored the most significant byte of the key. This unnecessarily caused uneven distribution in the NDB API blob hash list, resulting in an increased need for comparing key values, and thus more CPU usage. (Bug #33803583)
References: See also: Bug #33783274.
NDB Cluster APIs: Removed an unnecessary assertion that could be hit when iterating through the list returned by
Dictionary::listEvents(). (Bug #33630835)
-
Builds on Ubuntu 21.10 using GCC 11 stopped with -Werror=maybe-uninitialized. (Bug #33976268)
In certain cases,
NDB did not handle node IDs of data nodes correctly. (Bug #33916404)
-
In some cases,
NDB did not validate all node IDs of data nodes correctly. (Bug #33896409)
-
In some cases, array indexes were not handled correctly. (Bug #33896389, Bug #33896399, Bug #33916134)
In some cases, integers were not handled correctly. (Bug #33896356)
As part of work done in NDB 8.0.23 to implement the
AutomaticThreadConfig configuration parameter, the maximum numbers of LQH and TC threads supported by ndbmtd were raised from 129 each to 332 and 160, respectively. This adversely affected the performance of the execSEND_PACKED() methods implemented by several NDB kernel blocks, which complete sending of packed signals when the scheduler is about to suspend execution of the current block thread. This was due to continuing simply to iterate over the arrays of such threads despite the arrays' increased size. We fix this by using a bitmask to track the thread states alongside the full arrays. (Bug #33856371)
-
When operating on blob columns,
NDB must add extra operations to read and write the blob head column and blob part rows. These operations are added to the tail of the transaction's operation list automatically when the transaction is executed. To insert a new operation prior to a given operation, it was necessary to traverse the operation list from the beginning until the desired operation was found, with a cost proportional to the length L of the list of preceding operations. The total cost is approximately L²/2, increasing as more operations are added to the list; when a large number of operations modifying blobs were defined in a batch, this traversal cost was paid for each operation. This had a noticeable impact on performance when reading and writing blobs. We fix this by using list splicing in NdbTransaction::execute() to eliminate unnecessary traversals of this sort when defining blob operations. (Bug #33797931)
-
The block thread scheduler makes frequent calls to
update_sched_config() to update its scheduling strategy. That involves checking the fill degree of the job buffer queues used to send signals between the nodes' internal block threads. When these queues are about to fill up, the thread scheduler assigns a smaller value to max_signals for the next round, in order to reduce the pressure on the job buffers. When the minimum free threshold has been reached, the scheduler yields the CPU while waiting for the consumer threads to free some job buffer slots. The fix in NDB 8.0.18 for a previous issue introduced a mechanism whereby the main thread was allowed to continue executing even when this lower threshold had been reached; in some cases the main thread consumed all job buffers, including those held in reserve, leading to an unplanned shutdown of the data node due to resource exhaustion. (Bug #33792362, Bug #33872577)
References: This issue is a regression of: Bug #29887068.
-
Setting up a cluster with one LDM thread and one query thread using the
ThreadConfig parameter (for example, ThreadConfig=ldm={cpubind=1},query={cpubind=2}) led to unplanned shutdowns of data nodes. This was due to internal thread variables being assigned the wrong values when there were no main or request threads explicitly assigned. Now we make sure in such cases that these are assigned the thread number of the first receive thread, as expected. (Bug #33791270)
NdbEventBuffer hash key generation for non-character data reused the same 256 hash keys; in addition, strings of zero length were ignored when calculating hash keys. (Bug #33783274)
-
The collection of NDB API statistics based on the
EventBytesRecvdCount event counter incurred excessive overhead. Now this counter is updated using a value which is aggregated as the event buffer is filled, rather than by traversing all of the event buffer data in a separate function call. For more information, see NDB API Statistics Counters and Variables. (Bug #33778923)
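Assuming the counter is surfaced through the Ndb_api_event_bytes_count family of status variables, as described in NDB API Statistics Counters and Variables, its current values can be inspected from an SQL node like this:

    SHOW GLOBAL STATUS LIKE 'Ndb_api_event_bytes%';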
The internal method
THRConfig::reorganize_ldm_bindings() behaved unexpectedly, in some cases changing thread bindings after AutomaticThreadConfig had already bound the threads to the correct CPUs. We fix this by removing the method, which is no longer used when parsing configuration data or adding threads. (Bug #33764260)
-
The receiver thread ID was hard-coded in the internal method
TransporterFacade::raise_thread_prio() such that it always acted to raise the priority of the receiver thread, even when called from the send thread. (Bug #33752983)
-
A fix in NDB 8.0.28 addressed an issue with the code used by various
NDB components, including Ndb_index_stat, that checked whether the data nodes were up and running. In clusters with multiple SQL nodes, this resulted in an increase in the frequency of race conditions between index statistics threads trying to create a table event on the ndb_index_stat_head table; that is, it was possible for two SQL nodes to try to create the event at the same time, with the losing SQL node raising Error 746 Event name already exists. Due to this error, the binary logging thread ended up waiting for the index statistics thread to signal that its own setup was complete, and so the second SQL node timed out with Could not create index stat system tables after --ndb-wait-setup seconds. (Bug #33728909)
References: This issue is a regression of: Bug #32019119.
On a write error, the message printed by ndbxfrm referenced the source file rather than the destination file. (Bug #33727551)
-
A complex nested join was rejected with the error FirstInner/Upper has to be an ancestor or a sibling, which is thrown by the internal
NdbQueryOperation interface used to define a pushed join in the SPJ API, indicating that the join-nest dependencies for the interface were not properly defined. The query showing the issue had the join nest structure
t2, t1, (t3, (t5, t4)). Neither of the join conditions on t5 or t4 had any references or explicit dependencies on table t3, but each had an implicit dependency on t3 by virtue of being in a nest within the same nest as t3 (see the illustrative query following this item). When preparing a pushed join,
NDB tracks all required table dependencies between tables and join nests by adding them to the m_ancestor bitmask for each table. For nest-level dependencies, they should all be added to the first table in the relevant nest. When the relevant dependencies for a specific table are calculated, they include the set of all tables explicitly referred to in the join condition, plus any implicit dependencies due to the join nests the table is a member of, limited by the uppermost table referred to in the join condition. For this particular join query we did not properly take into account that there might not be any references to tables in the closest upper nest (the nest starting with
t3); in such cases we are dependent on all nests up to the nest containing the uppermost table referenced. We fix the issue by introducing a while loop in which we add ancestor nest dependencies until we reach this uppermost table. (Bug #33670002)
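The original query is not reproduced in this note; a query of roughly the following shape (all table and column names are invented for illustration) yields the nest structure described above, with the join conditions on t5 and t4 referring only to tables outside the nest begun by t3:

    SELECT COUNT(*)
    FROM t2
    JOIN t1 ON t1.a = t2.a
    LEFT JOIN (
        t3
        LEFT JOIN (t5 JOIN t4 ON t4.b = t5.b) ON t5.c = t1.c
    ) ON t3.d = t2.d;
-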
When the transient memory pool (TransientPool) used internally by NDB grew above 256 MB, subsequent attempts to shrink the pool caused an error which eventually led to an unplanned shutdown of the data node. (Bug #33647601)
-
Check that the connection to
NDB has been set up before querying about statistics for partitions. (Bug #33643512)
-
When the ordered index
PRIMARY was not created for the ndb_sql_metadata table, application of stored grants could not proceed due to the missing index. We fix this by protecting creation of utility tables (including ndb_sql_metadata) by wrapping the associated
CREATE TABLE statement with a schema transaction, thus handling rejection of the statement by rollback. In addition, in the event the newly created table is not created correctly, it is dropped. These changes avoid leaving behind a table that is only partially created, so that the next attempt to create the utility table starts from the beginning of the process. (Bug #33634453)
-
Removed
-Wmaybe-uninitialized warnings which occurred when compiling NDB Cluster with GCC 11.2. (Bug #33611915)
-
NDB accepted an arbitrary (and invalid) string of characters following a numeric parameter value in the config.ini global configuration file. For example, it was possible to use either OverloadLimit=10 "M12L" or OverloadLimit=10 M (which contains a space) and have it interpreted as OverloadLimit=10M. It was also possible to use a bare letter suffix in place of an expected numeric value, such as
OverloadLimit=M, and have it interpreted as zero. This happened as well with an arbitrary string whose first letter was one of the MySQL standard modifiers K, M, or G; thus, OverloadLimit=MAX_UINT also had the effect of setting OverloadLimit to zero. Now, only one of the suffixes
K, M, or G is accepted with a numeric parameter value, and it must follow the numeric value immediately, with no intervening whitespace characters or quotation marks. In other words, to set OverloadLimit to 10 megabytes, you must use one of OverloadLimit=10000000, OverloadLimit=10M, or OverloadLimit=10000K.

Note: To maintain availability, you should check your
config.ini file for any settings that do not conform to the rule enforced as a result of this change, and correct them prior to upgrading. Otherwise, the cluster may not be able to start afterwards, until you rectify the issue. (Bug #33589961)
Enabling
AutomaticThreadConfig with fewer than 8 CPUs available led to unplanned shutdowns of data nodes. (Bug #33588734)
-
Removed the unused source files
buddy.cpp and buddy.hpp from storage/ndb/src/common/transporter/. (Bug #33575155)
-
The
NDB stored grants mechanism now sets the session variable print_identified_with_as_hex to true, so that password hashes stored in the ndb_sql_metadata table are formatted as hexadecimal values rather than as strings. (Bug #33542052)
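The effect of this variable can also be observed directly; with it enabled, SHOW CREATE USER displays a hash containing unprintable characters as a hexadecimal literal (the account name below is a placeholder):

    SET SESSION print_identified_with_as_hex = ON;
    SHOW CREATE USER 'ndb_user'@'localhost';
-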
Binary log thread event handling includes optional high-verbosity logging, which, when enabled and the connection to
NDB is lost, produces an excess of log messages like these:

datetime 2 [Note] [MY-010866] [Server] NDB Binlog: cluster failure for epoch 55/0.
datetime 2 [Note] [MY-010866] [Server] NDB Binlog: cluster failure for epoch 55/0.

Such repeated log messages, not being of much help in diagnosing errors, have been removed. This leaves a similar log message in such cases, from the handling of schema distribution event operation teardown. (Bug #33492244)
Historically, a number of different methods have been used to enforce compile-time checks of various interdependencies and assumptions in the
NDB codebase in a portable way. Since the standard static_assert() function is now always available, the NDB_STATIC_ASSERT and STATIC_ASSERT macros have been replaced with direct use of static_assert(). (Bug #33466577)
When the internal
AbstractQueryPlan interface determined the access type to be used for a specific table, it tried to work around an optimizer problem where the ref access type was specified for a table which later turned out to be accessible by eq_ref. The workaround introduced a new issue by sometimes determining eq_ref access for a table actually needing ref access; in addition, the prior fix did not take into account UNIQUE USING HASH indexes, which need either eq_ref or full table scan access, even when the MySQL optimizer regards them as ref accesses. We fix this by first removing the workaround (which had been made obsolete by the proper fix for the previous issue), and then by introducing the setting of
eq_ref or full_table_scan access for hash indexes. (Bug #33451256)
References: This issue is a regression of: Bug #28965762.
When a pushed join is prepared but not executed, the
Ndb_pushed_queries_dropped status variable is incremented. Now, in addition to this, NDB emits a warning (Prepared pushed join could not be executed...) which is passed to ER_GET_ERRMSG. (Bug #33449000)
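The counter, together with its companion Ndb_pushed_queries_defined and Ndb_pushed_queries_executed status variables, can be checked as follows:

    SHOW GLOBAL STATUS LIKE 'Ndb_pushed_queries%';
-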
The deprecated
-r option for ndbd has been removed. In addition, this change also removes extraneous text from the output of ndbd --help. (Bug #33362935)
References: See also: Bug #31565810.
ndb_import sometimes could not parse correctly a
.csv file containing Windows/DOS-style (\r\n) linefeeds. (Bug #32006725)
-
The ndb_import tool handled only the hidden primary key which is defined by
NDB when a table does not have an explicit primary key. This caused an error when inserting a row containing NULL for an auto-increment primary key column, even though the same row was accepted by LOAD DATA INFILE. We fix this by adding support for importing a table with one or more instances of
NULL in an auto-increment primary key column. This includes a check that a table has no more than one auto-increment column; if this column is nullable, it is redefined by ndb_import as NOT NULL, and any occurrence of NULL in this column is replaced by a generated auto-increment value before inserting the row into NDB. (Bug #30799495)
-
When a node failure is detected, surviving nodes in the same node group as this node attempt to resend any buffered change data to event subscribers. In cases in which there were no outstanding epoch deliveries, that is, the list of unacknowledged GCIs was empty, the surviving nodes made the incorrect assumption that this list would never be empty. (Bug #30509416)
-
When executing a copying
ALTER TABLE of the parent table for a foreign key, if the SQL node terminated prior to completion, there remained an extraneous temporary table with (additional, temporary) foreign keys on all child tables. One consequence of this issue was that it was not possible to restore a backup made using mysqldump --no-data. To fix this,
NDB now performs cleanup of temporary tables whenever a mysqld process connects (or reconnects) to the cluster. (Bug #24935788, Bug #29892252)
-
An unplanned data node shutdown occurred following a bus error on Mac OS X for ARM. We fix this by moving the call to
NdbCondition_Signal() (in AsyncIoThread.cpp) such that it executes prior to NdbMutex_Unlock(), that is, while the mutex is still held, so that the condition being signalled is not lost during execution. (Bug #105522, Bug #33559219)
-
In
DblqhMain.cpp, a missing return in the internal execSCAN_FRAGREQ() function led to an unplanned shutdown of the data node when inserting a nonfatal error. In addition, the condition !seize_op_rec(tcConnectptr) present in the same function was never actually checked. (Bug #105051, Bug #33401830, Bug #33671869)
-
It was possible to set any of
MaxNoOfFiredTriggers, MaxNoOfLocalScans, and MaxNoOfLocalOperations concurrently with TransactionMemory, although this is not allowed. In addition, it was not possible to set any of
MaxNoOfConcurrentTransactions, MaxNoOfConcurrentOperations, or MaxNoOfConcurrentScans concurrently with TransactionMemory, although there is no reason to prevent this. In both cases, the concurrent settings behavior now matches the documentation for the
TransactionMemory parameter. (Bug #102509, Bug #32474988)
-
When a redo log part is unable to accept an operation's log entry immediately, the operation (a prepare, commit, or abort) is queued, or (prepare only) optionally aborted. By default operations are queued.
This mechanism was modified in 8.0.23 as part of decoupling local data managers and redo log parts, and introduced a regression whereby it was possible for queued operations to remain in the queued state until all activity on the log part quiesced. When this occurred, operations could remain queued until
DBTC declared them timed out, and aborted them. (Bug #102502, Bug #32478380)