MySQL NDB Cluster 8.0.23 is a new release of NDB 8.0, based on MySQL Server 8.0 and including features in version 8.0 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster 8.0.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.23 (see Changes in MySQL 8.0.23 (2021-01-18, General Availability)).
Important Change: As part of the terminology changes begun in MySQL 8.0.21 and NDB 8.0.21, the ndb_slave_conflict_role system variable is now deprecated, and is being replaced with ndb_conflict_role.
In addition, a number of status variables have been deprecated and are being replaced, as shown in the following table:
Table 1 Deprecated NDB status variables and their replacements
Also as part of this work, the ndbinfo.table_distribution_status table's tab_copy_status column values ADD_TABLE_MASTER and ADD_TABLE_SLAVE are deprecated, and replaced by, respectively, ADD_TABLE_COORDINATOR and ADD_TABLE_PARTICIPANT.
Finally, the --help output of some NDB utility programs such as ndb_restore has been updated. (Bug #31571031)
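The renames called out in this note can be used to scan SQL scripts or option files for names that should be updated. The following is a minimal sketch only: the mapping holds just the identifiers named in the text above (the status variables from Table 1 are not included), and the function name find_deprecated_names is hypothetical and not part of any NDB tool.

```python
import re

# Old name -> replacement, taken only from the identifiers named in this note.
RENAMES = {
    "ndb_slave_conflict_role": "ndb_conflict_role",
    "ADD_TABLE_MASTER": "ADD_TABLE_COORDINATOR",
    "ADD_TABLE_SLAVE": "ADD_TABLE_PARTICIPANT",
}


def find_deprecated_names(text):
    """Return (old, new) pairs for each deprecated name found in text.

    Hypothetical helper; matching is case-sensitive for simplicity.
    """
    found = []
    for old, new in RENAMES.items():
        # \b keeps a name from matching inside a longer identifier.
        if re.search(r"\b" + re.escape(old) + r"\b", text):
            found.append((old, new))
    return found


print(find_deprecated_names("SET GLOBAL ndb_slave_conflict_role = 'PRIMARY';"))
```

A script like this only reports candidates; each hit still needs to be reviewed by hand before it is changed.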
NDB Client Programs: Effective with this release, the MySQL NDB Cluster Auto-Installer (ndb_setup.py) has been removed from the NDB Cluster binary and source distributions, and is no longer supported. (Bug #32084831)
References: See also: Bug #31888835.
ndbmemcache: ndbmemcache, which was deprecated in the previous release of NDB Cluster, has now been removed from NDB Cluster, and is no longer supported. (Bug #32106576)
As part of work previously done in NDB 8.0, the metadata check performed as part of auto-synchronization between the representation of an NDB table in the NDB dictionary and its counterpart in the MySQL data dictionary has been extended to include, in addition to table-level properties, the properties of columns, indexes, and foreign keys. (This check is also made by a debug MySQL server when performing a CREATE TABLE statement, and when opening an NDB table.)
As part of this work, any mismatches found between an object's properties in the NDB dictionary and the MySQL data dictionary are now written to the MySQL error log. The error log message includes the name of the property, its value in the NDB dictionary, and its value in the MySQL data dictionary. If the object is a column, index, or foreign key, the object's type is also indicated in the message. (WL #13412)
The ThreadConfig parameter has been extended with two new thread types, query threads and recovery threads, intended for scaleout of LDM threads. The number of query threads must be a multiple of the number of LDM threads, up to a maximum of 3 times the number of LDM threads.
It is also now possible when setting ThreadConfig to combine the main and rep threads into a single thread by setting either or both of these arguments to 0.
When one of these arguments is set to 0 but the other remains set to 1, the resulting combined thread is named main_rep. When both are set to 0, they are combined with the recv thread (assuming that recv is set to 1), and this combined thread is named main_rep_recv. These thread names are those shown when checking the threads table in the ndbinfo information database.
In addition, the maximums for a number of existing thread types have been increased. The new maximums are: LDM threads: 332; TC threads: 128; receive threads: 64; send threads: 64; main threads: 2. (The maximums for query threads and recovery threads are 332 each.) Maximums for other thread types remain unchanged from previous NDB Cluster releases.
Another change related to this work causes NDB to employ mutexes for protecting job buffers when more than 32 block threads are in use. This may cause a slight decrease in performance (roughly 1 to 2 percent), but also results in a decrease in the amount of memory used by very large configurations. For example, a setup with 64 threads which used 2 GB of job buffer memory previously should now require only about 1 GB instead. In our testing this has resulted in an overall improvement (on the order of 5 percent) in the execution of very complex queries.
For more information, see the descriptions of the arguments to the ThreadConfig parameter discussed previously, and of the ndbinfo.threads table. (WL #12532, WL #13219, WL #13338)
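The ThreadConfig rules stated above (the query-thread multiple, the new maximums, and the naming of combined threads) can be checked mechanically before a configuration is deployed. The sketch below is illustrative only: the dictionary keys (ldm, query, recover, and so on) and both function names are assumptions made for this example, and the code does not parse a real ThreadConfig string or reproduce the validation that NDB itself performs.

```python
# Maximums quoted in this release note. The key names are assumptions
# chosen for this sketch, not necessarily the ThreadConfig argument names.
MAX_THREADS = {
    "ldm": 332, "query": 332, "recover": 332,
    "tc": 128, "recv": 64, "send": 64, "main": 2,
}


def check_thread_counts(counts):
    """Return a list of problems found; an empty list means none."""
    problems = []
    for kind, limit in MAX_THREADS.items():
        if counts.get(kind, 0) > limit:
            problems.append(f"{kind}: {counts[kind]} exceeds maximum {limit}")
    ldm = counts.get("ldm", 0)
    query = counts.get("query", 0)
    if query:
        if ldm == 0 or query % ldm != 0:
            problems.append("query threads must be a multiple of LDM threads")
        elif query > 3 * ldm:
            problems.append("query threads may be at most 3 times LDM threads")
    return problems


def combined_thread_name(main, rep, recv=1):
    """Name of the combined thread per the note, or None if not combined
    (or if the case is not covered by the note)."""
    if main == 0 and rep == 0:
        return "main_rep_recv" if recv == 1 else None
    if {main, rep} == {0, 1}:
        return "main_rep"
    return None


print(check_thread_counts({"ldm": 4, "query": 8, "tc": 2}))
print(combined_thread_name(main=0, rep=1))
```

Cases that the note does not describe, such as both main and rep set to 0 with recv greater than 1, deliberately return None here rather than guess at the behavior.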
This release adds the possibility of configuring the threads for multithreaded data nodes (ndbmtd) automatically by implementing a new data node configuration parameter AutomaticThreadConfig. When set to 1, NDB sets up the thread assignments automatically, based on the number of processors available to applications. If the system does not limit the number of processors, you can do this by setting NumCPUs to the desired number. Automatic thread configuration makes it unnecessary to set any values for ThreadConfig or MaxNoOfExecutionThreads in config.ini; if AutomaticThreadConfig is enabled, settings for either of these parameters are not used.
As part of this work, a set of tables providing information about hardware and CPU availability and usage by NDB data nodes have been added to the ndbinfo information database. These tables, along with a brief description of the information provided by each, are listed here:
cpudata: CPU usage during the past second
cpudata_1sec: CPU usage per second over the past 20 seconds
cpudata_20sec: CPU usage per 20-second interval over the past 400 seconds
cpudata_50ms: CPU usage per 50-millisecond interval during the past second
cpuinfo: The CPU on which the data node executes
hwinfo: The hardware on the host where the data node resides
Not all of the tables listed are available on all platforms supported by NDB Cluster:
The cpudata, cpudata_1sec, cpudata_20sec, and cpudata_50ms tables are available only on Linux and Solaris operating systems.
The cpuinfo table is not available on FreeBSD or macOS.
(WL #13980)
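As an illustration of the AutomaticThreadConfig and NumCPUs settings described above, the following sketch generates a config.ini fragment. It assumes the parameters are placed in an [ndbd default] section; the function name and the value 8 for NumCPUs are purely illustrative, and the fragment should be adapted to the actual cluster layout rather than used as-is.

```python
import configparser
import io


def automatic_thread_config_fragment(num_cpus=None):
    """Build a config.ini fragment enabling automatic thread configuration.

    Hypothetical helper for illustration; NumCPUs is emitted only when
    a value is given.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the capitalization of parameter names
    settings = {"AutomaticThreadConfig": "1"}
    if num_cpus is not None:
        settings["NumCPUs"] = str(num_cpus)
    parser["ndbd default"] = settings
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(automatic_thread_config_fragment(num_cpus=8))
```

Because settings for ThreadConfig and MaxNoOfExecutionThreads are not used when AutomaticThreadConfig is enabled, the fragment intentionally omits both.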
Added statistical information in the DBLQH block, which is used to track the use of key lookups and scans, as well as queries from DBTC and DBSPJ. Rather than checking the size of job buffer queues to decide how many rows to scan, NDB can now detect situations in which the load is high but there is no actual need to decrease the number of rows scanned per realtime break; this makes it possible to scan more rows when there is no CPU congestion. This helps improve performance and realtime behaviour when handling high loads. (WL #14081)
A new method for handling table partitions and fragments is introduced, such that the number of local data managers (LDMs) for a given data node can be determined independently of the number of redo log parts, and that the number of LDMs can now be highly variable. NDB employs this method when the ClassicFragmentation data node configuration parameter, implemented as part of this work, is set to false. When this is done, the number of LDMs is no longer used to determine how many partitions to create for a table per data node; instead, the PartitionsPerNode parameter, also introduced in this release, now determines this number, which is now used for calculating how many fragments a table should have.
When ClassicFragmentation has its default value true, then the traditional method of using the number of LDMs is used to determine how many fragments a table should have.
For more information, see Multi-Threading Configuration Parameters (ndbmtd). (WL #13930, WL #14107)
macOS: Removed a number of compiler warnings which occurred when building NDB for Mac OS X. (Bug #31726693)
Microsoft Windows: Removed a compiler warning C4146: unary minus operator applied to unsigned type, result still unsigned from Visual Studio 2013 found in storage\ndb\src\kernel\blocks\dbacc\dbaccmain.cpp. (Bug #23130016)
Solaris: Due to a source-level error, atomic_swap_32() was supposed to be specified but was not actually used for Solaris builds of NDB Cluster. (Bug #31765608)
NDB Replication: After issuing RESET REPLICA ALL / RESET SLAVE ALL, NDB failed to detect that the replica had restarted. (Bug #31515760)
NDB Cluster APIs: Removed redundant usage of strlen() in the implementation of NdbDictionary and related internal classes in the NDB API. (Bug #100936, Bug #31930362)
MySQL NDB ClusterJ: When a DomainTypeHandler was instantiated by a SessionFactory, it was stored locally in a static map, typeToHandlerMap. If multiple, distinct SessionFactories for separate connections to the data nodes were obtained by a ClusterJ application, the static typeToHandlerMap would be shared by all those factories. When one of the SessionFactories was closed, the connections it created were closed and any tables opened by the connections were cleared from the NDB API global cache. However, the typeToHandlerMap was not cleared, and through it the other SessionFactories kept accessing the DomainTypeHandlers of tables that had already been cleared. These obsolete DomainTypeHandlers contained invalid NdbTable references, and any ndbapi calls using those table references ended up with errors.
This patch fixes the issue by making the typeToHandlerMap and the related proxyInterfacesToDomainClassMap maps local to a SessionFactory, so that they are cleared when the SessionFactory is closed. (Bug #31710047)
MySQL NDB ClusterJ: Setting com.mysql.clusterj.connection.pool.size=0 made connections to an NDB Cluster fail. With this fix, setting com.mysql.clusterj.connection.pool.size=0 disables connection pooling as expected, so that every request for a SessionFactory results in the creation of a new factory and separate connections to the cluster can be created using the same connection string. (Bug #21370745, Bug #31721416)
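The typeToHandlerMap issue (Bug #31710047) is an instance of a general pattern: a cache shared at class level outlives the object that populated it. The sketch below shows that pattern in a language-neutral way; it is not ClusterJ code, and the class names, the handler_for method, and the string handlers are all hypothetical stand-ins.

```python
class SharedCacheFactory:
    """Pre-fix shape: one handler map shared by every factory instance."""
    _handlers = {}  # class-level, so all instances see the same dict

    def __init__(self, name):
        self.name = name

    def handler_for(self, table):
        # The first factory to ask for a table decides which handler is cached.
        return self._handlers.setdefault(table, f"{table}@{self.name}")

    def close(self):
        pass  # the shared map is left untouched, so stale entries survive


class LocalCacheFactory:
    """Post-fix shape: each factory owns its map and clears it on close."""

    def __init__(self, name):
        self.name = name
        self._handlers = {}

    def handler_for(self, table):
        return self._handlers.setdefault(table, f"{table}@{self.name}")

    def close(self):
        self._handlers.clear()


a, b = SharedCacheFactory("a"), SharedCacheFactory("b")
a.handler_for("t1")
a.close()
print(b.handler_for("t1"))  # still the handler created through factory "a"

c, d = LocalCacheFactory("c"), LocalCacheFactory("d")
c.handler_for("t1")
c.close()
print(d.handler_for("t1"))  # a handler created through factory "d"
```

In the shared variant the second factory receives a handler tied to resources the first factory has already released, which mirrors the invalid table references described in the bug.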
When calling disk_page_abort_prealloc(), the callback from this internal function was ignored, and so removal of the operation record for the LQHKEYREQ signal proceeded without waiting. This left the table subject to removal before the callback had completed, leading to a failure in PGMAN when the page was retrieved from disk.
To avoid this, we add an extra usage count for the table especially for this page cache miss; this count is decremented as soon as the page cache miss returns. This means that we guarantee that the table is still present when returning from the disk read. (Bug #32146931)
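The extra usage count described above follows the familiar reference-counting guard: removal of an object is deferred while any outstanding operation still holds a count on it. The following is a minimal sketch of that idea under simplified assumptions; the TableEntry class and its methods are hypothetical and do not reflect the actual NDB kernel data structures.

```python
class TableEntry:
    """Hypothetical table record whose removal waits for pending users."""

    def __init__(self):
        self.usage_count = 0
        self.drop_requested = False
        self.dropped = False

    def acquire(self):
        # Taken when a page cache miss starts a disk read for this table.
        self.usage_count += 1

    def release(self):
        # Called as soon as the page cache miss returns.
        self.usage_count -= 1
        self._drop_if_idle()

    def request_drop(self):
        self.drop_requested = True
        self._drop_if_idle()

    def _drop_if_idle(self):
        # The table is removed only once nothing is still using it.
        if self.drop_requested and self.usage_count == 0:
            self.dropped = True


table = TableEntry()
table.acquire()        # disk read in flight
table.request_drop()   # removal requested while the read is pending
print(table.dropped)   # the table is still present
table.release()        # disk read returns
print(table.dropped)   # the deferred removal has now happened
```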
When a table was created, it was possible for a fragment of the table to be checkpointed too early during the next local checkpoint. This meant that Prepare Phase LCP writes were still being performed when the LCP completed, which could lead to problems with subsequent ALTER TABLE statements on the table just created. Now we wait for any potential Prepare Phase LCP writes to finish before the LCP is considered complete. (Bug #32130918)
Using the maximum size of an index key supported by index statistics (3056 bytes) caused buffer issues in data nodes. (Bug #32094904)
References: See also: Bug #25038373.
NDB now prefers CLOCK_MONOTONIC, which on Linux is adjusted by frequency changes but is not updated during suspend. On macOS, NDB instead uses CLOCK_UPTIME_RAW, which is the same, except that it is not affected by any adjustments.
In addition, when initializing NdbCondition the monotonic clock to use is taken directly from NdbTick, rather than re-executing the same preprocessor logic used by NdbTick. (Bug #32073826)
ndb_restore terminated unexpectedly when run with the --decrypt option on big-endian systems. (Bug #32068854)
When the data node receive thread found that the job buffer was too full to receive, nothing was done to ensure that, the next time it checked, it resumed receiving from the transporter at the same point at which it stopped previously. (Bug #32046097)
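The clock selection described above for Bug #32073826 (CLOCK_UPTIME_RAW on macOS, CLOCK_MONOTONIC elsewhere, with the choice made once and then reused) can be illustrated as follows. This is only a sketch of the selection idea, not NDB's C++ implementation; the function names are hypothetical, and the fallback to time.monotonic() for platforms lacking clock_gettime() is an addition made for this example.

```python
import sys
import time


def pick_monotonic_clock():
    """Choose the clock once, mirroring the preference order in the note."""
    if sys.platform == "darwin" and hasattr(time, "CLOCK_UPTIME_RAW"):
        return "CLOCK_UPTIME_RAW", time.CLOCK_UPTIME_RAW
    if hasattr(time, "clock_gettime") and hasattr(time, "CLOCK_MONOTONIC"):
        return "CLOCK_MONOTONIC", time.CLOCK_MONOTONIC
    return "time.monotonic", None  # fallback assumed for this sketch


# Single source of truth: later users read this instead of re-deciding.
CLOCK_NAME, CLOCK_ID = pick_monotonic_clock()


def tick():
    """Current time in seconds on the selected clock."""
    if CLOCK_ID is None:
        return time.monotonic()
    return time.clock_gettime(CLOCK_ID)


def deadline_after(seconds):
    """Deadline for a timed wait, computed on the same clock as tick()."""
    return tick() + seconds


print(CLOCK_NAME)
```

Keeping the selection in one place means that a timed wait and the code measuring elapsed time cannot end up on different clocks.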
The metadata check failed during auto-synchronization of tables restored using the ndb_restore tool. This was a timing issue relating to indexes, and was found in the following two scenarios encountered when a table had been selected for auto-synchronization:
When the indexes had not yet been created in the NDB dictionary
When the indexes had been created, but were not yet usable
(Bug #32004637)
Optimized sending of packed signals by registering the kernel blocks affected and the sending functions which need to be called for each one in a data structure rather than looking up this information each time. (Bug #31936941)
When two data definition language statements—one on a database and another on a table in the same schema—were run in parallel, it was possible for a deadlock to occur. The DDL statement affecting the database acquired the global schema lock first, but before it could acquire a metadata lock on the database, the statement affecting the table acquired an intention-exclusive metadata lock on the schema. The table DDL statement was thus waiting for the global schema lock to upgrade its metadata lock on the table to an exclusive lock, while the database DDL statement waited for an exclusive metadata lock on the database, leading to a deadlock.
A similar type of deadlock involving tablespaces and tables was already known to occur; NDB already detected and resolved that issue. The current fix extends that logic to handle databases and tables as well, to resolve the problem. (Bug #31875229)
Clang 8 raised a warning due to an uninitialized variable. (Bug #31864792)
An empty page acquired for an insert did not receive a log sequence number. This is necessary in case the page was used previously and thus required undo log execution before being used again. (Bug #31859717)
No reason was provided when rejecting an attempt to perform an in-place ALTER TABLE ... ADD PARTITION statement on a fully replicated table. (Bug #31809290)
When the master node had recorded a more recent GCI than a node starting up which had performed an unsuccessful restart, subsequent restarts of the latter could not be performed because it could not restore the stated GCI. (Bug #31804713)
When using 3 or 4 fragment replicas, it is possible to add more than one node at a time, which means that DBLQH and DBDIH can have distribution keys based on numbers of fragment replicas that differ by up to 3 (that is, MAX_REPLICAS - 1), rather than by only 1. (Bug #31784934)
It was possible in DBLQH for an ABORT signal to arrive from DBTC before it received an LQHKEYREF signal from the next local query handler. Now in such cases, the out-of-order ABORT signal is ignored. (Bug #31782578)
NDB did not handle correctly the case when an ALTER TABLE ... COMMENT="..." statement did not specify ALGORITHM=COPY. (Bug #31776392)
It was possible in some cases to miss the end point of undo logging for a fragment. (Bug #31774459)
ndb_print_sys_file did not work correctly with version 2 of the sysfile format that was introduced in NDB 8.0.18. (Bug #31726653)
References: See also: Bug #31828452.
DBLQH could not handle the case in which identical operation records having the same transaction ID came from different transaction coordinators. This led to locked rows persisting after a node failure, which kept node recovery from completing. (Bug #31726568)
It is possible for DBDIH to receive a local checkpoint having a given ID to restore while a later LCP is actually used instead, but when performing a partial LCP in such cases, the DIH block was not fully synchronized with the ID of the LCP used. (Bug #31726514)
In most cases, when searching a hash index, the row is used to read the primary key, but when the row has not yet been committed the primary key may be read from the copy row. If the row has been deleted, it can no longer be used to read the primary key. Previously in such cases, the primary key was treated as a NULL, but this could lead to making a comparison using uninitialised data.
Now when this occurs, the comparison is made only if the row has not been deleted; otherwise the row is checked for among the operations in the serial queue. If no operation has the primary key, then any comparison can be reported as not equal, since no entry in the parallel queue can reinsert the row. This needs to be checked because, if an entry in the serial queue is an insert, then the primary key from this operation must be identified as such to preclude inserting the same primary key twice. (Bug #31688797)
As with writing redo log records, when the file currently used for writing global checkpoint records becomes full, writing switches to the next file. This switch is not supposed to occur until the new file is actually ready to receive the records, but no check was made to ensure that this was the case. This could lead to an unplanned data node shutdown when restoring data from a backup using ndb_restore. (Bug #31585833)
Release of shared global memory when it is no longer required by the DBSPJ block now occurs more quickly than previously. (Bug #31321518)
References: See also: Bug #31231286.
Stopping 3 nodes out of 4 in a single node group using kill -9 caused an unplanned cluster shutdown. To keep this from happening under such conditions, NDB now ensures that any node group that has not had any node failures is viewed by arbitration checks as fully viable. (Bug #31245543)
Multi-threaded index builds could sometimes attempt to use an internal function disallowed to them. (Bug #30587462)
While adding new data nodes to the cluster, and while the management node was restarting with an updated configuration file, some data nodes terminated unexpectedly with the error virtual void TCP_Transporter::resetBuffers(): Assertion `!isConnected()' failed. (Bug #30088051)
It was not possible to execute TRUNCATE TABLE or DROP TABLE for the parent table of a foreign key with foreign_key_checks set to 0. (Bug #97501, Bug #30509759)
Optimized the internal NdbReceiver::unpackNdbRecord() method, which is used to convert rows retrieved from the data nodes from packed wire format to the NDB API row format. Prior to the change, roughly 13% of CPU usage for executing a join occurred within this method; this was reduced to approximately 8%. (Bug #95007, Bug #29640755)