MySQL NDB Cluster 8.0.33 is a new release of NDB 8.0, based on
MySQL Server 8.0 and including features in version 8.0 of the
NDB storage engine, as well as fixing
recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.33 (see Changes in MySQL 8.0.33 (2023-04-18, General Availability)).
NDB Replication: NDB Cluster replication now supports the MySQL multithreaded applier (MTA) on replica servers. This makes it possible for binary log transactions to be applied in parallel on the replica, increasing peak replication throughput. To enable this on the replica, it is necessary to perform the following steps:
Set the --ndb-log-transaction-dependency option, added in this release, to ON. This must be done on startup of the source mysqld.
Set the binlog_transaction_dependency_tracking server system variable to WRITESET, also on the source, which causes transaction dependencies to be determined at the source. This can be done at runtime.
Make sure the replica uses multiple worker threads; this is determined by the value of the replica_parallel_workers server system variable, which NDB now honors (previously, NDB effectively ignored any value set for this variable). The default is 4, and can be changed on the replica at runtime.
You can adjust the size of the buffer used to store the transaction dependency history on the source using the --binlog-transaction-dependency-history-size option. The source should also have
Additionally, on the replica,
For more information about the MySQL replication applier, see Replication Threads. For more information about NDB Cluster replication and the multithreaded applier, see NDB Cluster Replication Using the Multithreaded Applier. (Bug #27960, Bug #11746675, Bug #35164090, Bug #34229520, WL #14885, WL #15145, WL #15455, WL #15457)
NDB Cluster APIs: The
MySQL NDB ClusterJ: Performance has been improved for accessing tables using a single-column partition key when the column is of type
VARCHAR. (Bug #35027961)
Beginning with this release, ndb_restore implements the --timestamp-printouts option, which causes all error, info, and debug node log messages to be prefixed with timestamps. (Bug #34110068)
Microsoft Windows: Two memory leaks found by code inspection were removed from NDB process handles on Windows platforms. (Bug #34872901)
Microsoft Windows: On Windows platforms, the data node angel process did not detect whether a child data node process exited normally. We fix this by keeping an open process handle to the child and using this when probing for the child's exit. (Bug #34853213)
NDB Replication: When using a multithreaded applier, the
end_pos columns of the ndb.apply_status table (see ndb_apply_status Table) did not contain the correct position information. (Bug #34806344)
NDB Cluster APIs; MySQL NDB ClusterJ: MySQL ClusterJ uses a scratch buffer for primary key hash calculations. This buffer was limited to 10000 bytes, which proved too small in some cases. Now we malloc() the buffer if its size is not sufficient.
This also fixes an issue with the
computeHash() in the NDB API: Previously, if either of these methods was passed a temporary buffer of insufficient size, the method failed. Now in such cases a temporary buffer is allocated.
Our thanks to Mikael Ronström for this contribution. (Bug #103814, Bug #32959894)
NDB Cluster APIs: When dropping an event operation (
NdbEventOperation) in the NDB API, it was sometimes possible for the dropped event operation to remain visible to the application after instructing the data nodes to stop sending events related to this event operation, but before all pending buffered events were consumed and discarded. This could be observed in certain cases when performing an online alter operation, such as
RENAME COLUMN, along with concurrent writes to the affected table.
Further analysis showed that the dropped events were accessible when iterating through event operations with
Ndb::getGCIEventOperations(). Now, this method skips dropped events when called iteratively. (Bug #34809944)
NDB Cluster APIs:
ER_UPDATED for an event opened from NDB, instead of returning the flags actually used by the report object. (Bug #34667384)
Before a new NDB table definition can be stored in the data dictionary, any existing definition must be removed. Table definitions have two unique values, the table name and the NDB Cluster se_private_id. During installation of a new table definition, we check whether there is any existing definition with the same table name and, if so, remove it. Then we check whether the table removed and the one being installed have the same se_private_id; if they do not, any definition that is occupying this se_private_id is considered stale, and removed as well.
Problems arose when no existing definition was found by the search using the table's name, since no definition was dropped even if one occupied se_private_id, leading to a duplicate key error when attempting to store the new table. The internal store_table() function attempted to clear the diagnostics area, remove the stale definition occupying se_private_id, and try to store the table once again, but the diagnostics area was not actually cleared; the error was thus leaked and presented to the user.
To fix this, we remove any stale table definition, regardless of any action taken (or not) by
store_table(). (Bug #35089015)
Fixed the following two issues in the output of ndb_restore:
The backup file format version was shown for both the backup file format version and the version of the cluster which produced the backup.
To reduce confusion between the version of the file format and the version of the cluster which produced the backup, the backup file format version is now shown using hexadecimal notation.
References: This issue is a regression of: Bug #34110068.
Removed a memory leak in the DBDICT kernel block caused when an internal foreign key definition record was not released when no longer needed. This could be triggered by either of the following events:
Drop of a foreign key constraint on an
Rejection of an attempt to create a foreign key constraint on an
Such records use the DISK_RECORDS memory resource; you can check this on a running cluster by executing SELECT node_id, used FROM ndbinfo.resources WHERE resource_name='DISK_RECORDS' in the mysql client. This resource uses SharedGlobalMemory, exhaustion of which could lead not only to the rejection of attempts to create foreign keys, but of queries making use of joins as well, since the DBSPJ block also uses shared global memory by way of QUERY_MEMORY. (Bug #35064142)
When attempting a copying alter operation with
--ndb-allow-copying-alter-table = OFF, the reason for rejection of the statement was not always made clear to the user. (Bug #35059079)
When a transaction coordinator is starting fragment scans with many fragments to scan, it may take a realtime break (RTB) during the process to ensure fair CPU access for other requests. When the requesting API disconnected and API failure handling for the scan state occurred before the RTB continuation returned, continuation processing could not proceed because the scan state had been removed.
We fix this by adding appropriate checks on the scan state as part of the continuation process. (Bug #35037683)
Sender and receiver signal IDs were printed in trace logs as signed values even though they are actually unsigned 32-bit numbers. This could result in confusion when the top bit was set, as it caused such numbers to be shown as negatives, counting upwards from -MAX_32_BIT_SIGNED_INT. (Bug #35037396)
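The effect described above can be reproduced with a short sketch. This is only an illustration of how one 32-bit pattern reads differently as signed and unsigned; the helper name as_signed_32 and the sample signal ID are hypothetical and are not taken from the NDB sources.

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed two's-complement integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


# Hypothetical signal ID with the top bit set.
signal_id = 0x80000001

print("unsigned:", signal_id)
print("signed:  ", as_signed_32(signal_id))
```

Values below 2^31 read the same either way; only IDs with the top bit set appear as negative numbers when treated as signed.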
A fiber used by the DICT block monitors all indexes, and triggers index statistics calculations if requested by DBTUX index fragment monitoring; these calculations are performed using a schema transaction. When the DICT fiber attempted but failed to seize a transaction handle for requesting a schema transaction to be started, the fiber exited, so that no more automated index statistics updates could be performed without a node failure. (Bug #34992370)
References: See also: Bug #34007422.
Schema objects in NDB use composite versioning, comprising major and minor subversions. When a schema object is first created, its major and minor versions are set; when an existing schema object is altered in place, its minor subversion is incremented.
At restart time each data node checks schema objects as part of recovery; for foreign key objects, the versions of referenced parent and child tables (and indexes, for foreign key references not to or from a table's primary key) are checked for consistency. The table version of this check compares only major subversions, allowing tables to evolve, but the index version also compares minor subversions; this resulted in a failure at restart time when an index had been altered.
We fix this by comparing only major subversions for indexes in such cases. (Bug #34976028)
References: See also: Bug #21363253.
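The difference between the two consistency checks can be sketched as follows. The bit layout used here (major subversion in the low 24 bits of a 32-bit version, minor subversion in the high 8 bits) is an assumption made for illustration, and all function names are hypothetical.

```python
# Assumed layout for illustration only: low 24 bits = major, high 8 bits = minor.
MAJOR_MASK = 0x00FFFFFF


def major(version: int) -> int:
    return version & MAJOR_MASK


def minor(version: int) -> int:
    return (version >> 24) & 0xFF


def alter_in_place(version: int) -> int:
    """An in-place alter increments only the minor subversion."""
    return major(version) | (((minor(version) + 1) & 0xFF) << 24)


def full_match(recorded: int, current: int) -> bool:
    """Strict check comparing major and minor subversions."""
    return recorded == current


def major_match(recorded: int, current: int) -> bool:
    """Relaxed check comparing major subversions only."""
    return major(recorded) == major(current)


recorded = 5                        # version stored with the foreign key
current = alter_in_place(recorded)  # index altered in place afterwards

print(full_match(recorded, current))   # strict check rejects the altered index
print(major_match(recorded, current))  # relaxed check accepts it
```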
ndb_import sometimes silently ignored hint failure for tables having large VARCHAR primary keys. For hinting which transaction coordinator to use, ndb_import can use the row's partitioning key, using a 4092 byte buffer to compute the hash for the key.
This was problematic when the key included a VARCHAR column using UTF8, since the hash buffer may require up to 24 bytes for each character of the column's maximum length, depending on the column's collation; the hash computation failed but the calling code in ndb_import did not check for this, and continued using an undefined hash value which yielded an undefined hint.
This did not lead to any functional problems, but was not optimal, and the user was not notified of it.
We fix this by ensuring that ndb_import always uses sufficient buffer for handling character columns (regardless of their collations) in the key, and adding a check in ndb_import for any failures in hash computation and reporting these to the user. (Bug #34917498)
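The arithmetic behind the problem can be sketched using only the two figures given above (a 4092 byte buffer and a worst case of 24 bytes per character). The sketch assumes a key consisting of a single VARCHAR column and ignores any per-column overhead such as length bytes; the function names are hypothetical.

```python
HASH_BUFFER_BYTES = 4092   # fixed buffer size cited above
MAX_BYTES_PER_CHAR = 24    # worst-case expansion per character cited above


def worst_case_hash_bytes(max_chars: int) -> int:
    """Worst-case buffer space needed to hash a VARCHAR(max_chars) key column."""
    return max_chars * MAX_BYTES_PER_CHAR


def fits_fixed_buffer(max_chars: int) -> bool:
    return worst_case_hash_bytes(max_chars) <= HASH_BUFFER_BYTES


for n in (100, 170, 171, 255):
    print(n, worst_case_hash_bytes(n), fits_fixed_buffer(n))
```

Under these assumptions, a column declared with more than 170 characters could exceed the fixed buffer in the worst case, which is why a check on the hash computation result is needed in addition to a larger buffer.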
When the ndbcluster plugin creates the ndb_schema table, the plugin inserts a row containing metadata, which is needed to keep track of this NDB Cluster instance, and which is stored as a set of key-value pairs in a row in this table.
The ndb_schema table is hidden from MySQL and so cannot be queried using SQL, but contains a UUID generated by the same MySQL server that creates the ndb_schema table; the same UUID is also stored as metadata in the data dictionary of each MySQL Server when the ndb_schema table is installed on it.
When a mysqld connects (or reconnects) to NDB, it compares the UUID in its own data dictionary with the UUID stored in NDB in order to detect whether it is reconnecting to the same cluster; if not, the entire contents of the data dictionary are scrapped in order to make it faster and easier to install all tables fresh from
One such case occurs when all NDB data nodes have been restarted with --initial, thus removing all data and tables. Another happens when the ndb_schema table has been restored from a backup without restoring any of its data, since this means that the row for the ndb_schema table would be missing.
To deal with these types of situations, we now make sure that, when synchronization has completed, there is always a row in the NDB dictionary with a UUID matching the UUID stored in the MySQL server data dictionary. (Bug #34876468)
When running an NDB Cluster with multiple management servers, termination of the ndb_mgmd processes required an excessive amount of time when shutting down the cluster. (Bug #34872372)
A schema distribution timeout was detected by the schema distribution coordinator after dropping and re-creating the mysql.ndb_schema table when any nodes that were subscribed beforehand had not yet resubscribed when the next schema operation began. This was due to a stale list of subscribers being left behind in the schema distribution data; these subscribers were assumed by the coordinator to be participants in subsequent schema operations.
We fix this issue by clearing the list of known subscribers whenever the mysql.ndb_schema table is dropped. (Bug #34843412)
When requesting a new global checkpoint (GCP) from the data nodes, such as by the NDB Cluster handler in mysqld to speed up delivery of schema distribution events and responses, the request was sent 100 times. While the DBDIH block attempted to merge these duplicate requests into one, it was possible on occasion to trigger more than one immediate GCP. (Bug #34836471)
When the DBSPJ block receives a query for execution, it sets up its own internal plan for how to do so. This plan is based on the query plan provided by the optimizer, with adaptations made to provide the most efficient execution of the query, both in terms of elapsed time and of total resources used.
Query plans received by DBSPJ often contain star joins, in which several child tables depend on a common parent, as in the query shown here:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k INNER JOIN t AS t3 ON t3.k = t1.k;
In such cases DBSPJ could submit key-range lookups to t3 in parallel (but does not do so). An inner join also has the property that each inner joined row requires a match from the other tables in the same join nest, else the row is eliminated from the result set. Thus, by using the key-range lookups, we may retrieve rows from one such lookup which have no matches in the other, which effort is ultimately wasted. Instead, DBSPJ sets up a sequential plan for such a query.
It was found that this worked as intended for queries having only inner joins, but if any of the tables are left-joined, we did not take complete advantage of the preceding inner joined tables before issuing the outer joined tables. Suppose the previous query is modified to include a left join, like this:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k LEFT JOIN t AS t3 ON t3.k = t1.k;
Using the following query against the ndbinfo.counters table, it is possible to observe how many rows are returned for each query before and after query execution:
SELECT counter_name, SUM(val) FROM ndbinfo.counters WHERE block_name="DBSPJ" AND counter_name = "SCAN_ROWS_RETURNED";
It was thus determined that requests on t3 were submitted in parallel. Now in such cases, we wait for the inner join to complete before issuing the left join, so that unmatched rows from t1 can be eliminated from the outer join on t3. This results in less work to be performed by the data nodes, and reduces the volume handled by the transporter as well. (Bug #34782276)
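The saving can be illustrated with a toy model. This is not how DBSPJ is implemented; it only counts how many lookups against the left-joined table would be issued under each strategy, using made-up key sets, and the function name t3_lookups is hypothetical.

```python
def t3_lookups(t1_keys, t2_matching_keys, wait_for_inner_join):
    """Count lookups issued against the left-joined table in a toy model.

    t1_keys: keys produced by the parent table.
    t2_matching_keys: parent keys that have a match in the inner-joined table.
    """
    if wait_for_inner_join:
        # Only parent rows that survived the inner join are looked up.
        return sum(1 for k in t1_keys if k in t2_matching_keys)
    # Parallel submission: every parent row triggers a lookup.
    return len(t1_keys)


t1_keys = list(range(10))   # made-up parent rows
t2_matching_keys = {2, 5}   # made-up inner join matches

print("parallel:  ", t3_lookups(t1_keys, t2_matching_keys, False))
print("sequential:", t3_lookups(t1_keys, t2_matching_keys, True))
```

In this model, the fewer parent rows survive the inner join, the larger the reduction in lookups against the outer-joined table.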
SPJ handling of a sorted result was found to suffer a significant performance impact compared to the same result set when not sorted. Further investigation showed that most of the additional performance overhead for sorted results lay in the implementation for sorted result retrieval, which required an excessive number of SCAN_NEXTREQ round trips between the client and DBSPJ on the data nodes. (Bug #34768353)
DBSPJ now implements the firstMatch optimization for semijoins and antijoins, such as those found in NOT EXISTS subqueries. (Bug #34768191)
SCAN_NEXTREQ signals to the data nodes, it tries to determine the optimum number of fragments to scan in parallel without starting more parallel scans than needed to fill the available batch buffers, thus avoiding any need to send additional SCAN_NEXTREQ signals to complete the scan of each fragment.
The DBSPJ block's statistics module calculates and samples the parallelism which was optimal for fragment scans just completed, for each completed SCAN_FRAGREQ, providing a mean and standard deviation of the sampled parallelism. This makes it possible to calculate a lower 95th percentile of the parallelism (and batch size) which makes it possible to complete a SCAN_FRAGREQ without needing additional
It was found that the parallelism statistics seemed unable to provide a stable parallelism estimate and that the standard deviation was unexpectedly high. This often led to the parallelism estimate being a negative number (always rounded up to 1).
The flaw in the statistics calculation was found to be an underlying assumption that each sampled SCAN_FRAGREQ contained the same number of key ranges to be scanned, which is not necessarily the case. Typically, a full batch of rows was returned for the first SCAN_FRAGREQ, and relatively few rows for the final SCAN_NEXTREQ returning the remaining rows; this resulted in wide variation in parallelism samples which made the statistics obtained from them unreliable.
We fix this by basing the statistics on the number of keys actually sent in the SCAN_FRAGREQ, and counting the rows returned from this request. From these it is possible to calculate and sample record-per-key statistics, which in turn make it possible to calculate the number of fragments which can be scanned without overflowing the batch buffers. (Bug #34768106)
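How widely varying samples can push a lower-percentile estimate below zero is shown in the following sketch. It assumes a normal approximation in which the lower bound is the mean minus 1.645 standard deviations; the actual formula used by DBSPJ is not given here, and the sample values and function names are hypothetical.

```python
import math
import statistics

# Assumed one-sided 95% z-score under a normal approximation.
Z_95 = 1.645


def lower_bound(samples):
    """Mean minus Z_95 sample standard deviations."""
    return statistics.mean(samples) - Z_95 * statistics.stdev(samples)


def parallelism_estimate(samples):
    """Clamp the lower bound so that at least one fragment is always scanned."""
    return max(1, math.floor(lower_bound(samples)))


# Hypothetical samples: a mix of full first batches and small final batches.
noisy = [1, 1, 16, 1, 16, 1]
# Hypothetical samples with little variation.
stable = [8, 8, 9, 8, 9, 8]

print(lower_bound(noisy), parallelism_estimate(noisy))
print(lower_bound(stable), parallelism_estimate(stable))
```

With the noisy samples the standard deviation exceeds the mean, so the raw bound is negative and the estimate collapses to 1; with the stable samples the estimate stays close to the observed values.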
It was possible in certain cases that both the NDB binary logging thread and metadata synchronization attempted to synchronize the ndb_apply_status table, which led to a race condition. We fix this by making sure that the ndb_apply_status table is monitored and created (or re-created) by the binary logging thread only. (Bug #34750992)
While starting a schema operation, the client is responsible for detecting timeouts until the coordinator has received the first schema event; from that point, any schema operation timeout should be detected by the coordinator. A problem occurred while the client was checking the timeout; it mistakenly set the state indicating that timeout had occurred, which caused the coordinator to ignore the first schema event taking longer than approximately one second to receive (that is, to write the send event plus handle in the binary logging thread). This had the effect that, in these cases, the coordinator was not involved in the schema operation.
We fix this by changing the schema distribution timeout checking to be atomic, and letting it be performed by either the client or the coordinator. In addition, we remove the state variable used for keeping track of events received by the coordinator, and rely on the list of participants instead. (Bug #34741743)
An SQL node did not start up correctly after restoring data with ndb_restore, such that, when it was otherwise ready to accept connections, the binary log injector thread never became ready. It was found that, when a mysqld was started after a data node initial restore from which new table IDs were generated, the utility table's (ndb_*) MySQL data dictionary definition might not match the NDB dictionary definition.
The existing mysqld definition is dropped by name, thus removing the unique ndbcluster-key in the MySQL data dictionary, but the new table ID could also already be occupied by another (stale) definition. The resulting mismatch prevented setup of the binary log.
To fix this problem we now explicitly drop any
ndbcluster-definitions that might clash in such cases with the table being installed. (Bug #34733051)
After receiving a SIGTERM signal, ndb_mgmd did not wait for all threads to shut down before exiting. (Bug #33522783)
References: See also: Bug #32446105.
When multiple operations are pending on a single row, it is not possible to commit an operation which is run concurrently with an operation which is pending abort. This could lead to data node shutdown during the commit operation in
DBACC, which could manifest when a single transaction contained more than
In addition, a transaction containing insert operations is rolled back if a statement that uses a locking scan on the prepared insert fails due to too many DML operations. This could lead to an unplanned data node shutdown during tuple deallocation due to a missing reference to the expected
We solve this issue by allowing commit of a scan operation in such cases, in order to release locks previously acquired during the transaction. We also add a new special case for this scenario, so that the deallocation is performed in a single phase, and
DBLQH to deallocate immediately; in
execTUP_DEALLOCREQ() is now able to handle this immediate deallocation request. (Bug #32491105)
References: See also: Bug #28893633, Bug #32997832.
Cluster nodes sometimes reported Failed to convert connection to transporter warnings in logs, even when this was not really necessary. (Bug #14784707)
When started with no connection string on the command line, ndb_waiter printed Connecting to mgmsrv at (null). Now in such cases, it prints Connecting to management server at nodeid=0,localhost:1186 if no other default host is specified.
The --help option and other ndb_waiter program output were also improved. (Bug #12380163)
NdbSpin_Init() calculated the wrong number of loops in NdbSpin, and contained logic errors. (Bug #108448, Bug #32497174, Bug #32594825)
References: See also: Bug #31765660, Bug #32413458, Bug #102506, Bug #32478388.