MySQL NDB Cluster 8.0.28 is a new release of NDB 8.0, based on MySQL Server 8.0 and including features in version 8.0 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster 8.0.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.28 (see Changes in MySQL 8.0.28 (2022-01-18, General Availability)).
- Added the ndbinfo index_stats table, which provides very basic information about NDB index statistics. It is intended primarily for use in our internal testing, but may be helpful in conjunction with ndb_index_stat and other tools. (Bug #32906654)
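  As a minimal illustration, the new table can be inspected from any connected SQL node like any other ndbinfo table (DESCRIBE is used here first, since the column list is not reproduced in this note):

    mysql> DESCRIBE ndbinfo.index_stats;
    mysql> SELECT * FROM ndbinfo.index_stats;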
- Previously, ndb_import always tried to import data into a table whose name was derived from the name of the CSV file being read. This release adds a --table option (short form: -t) for this program, which overrides this behavior and specifies the name of the target table directly. (Bug #30832382)
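  A brief sketch of the difference (the database name db1, file name data.csv, and host name mgmhost are placeholders):

    $> ndb_import db1 data.csv --ndb-connectstring=mgmhost              # table name derived from the CSV file name
    $> ndb_import db1 data.csv --table=t1 --ndb-connectstring=mgmhost   # table name given explicitly as t1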
- Important Change: The deprecated data node option --connect-delay has been removed. This option was a synonym for --connect-retry-delay, which was not honored in all cases; this issue has been fixed, and the option now works correctly. In addition, the short form -r for this option has been deprecated, and you should expect it to be removed in a future release. (Bug #31565810)
  References: See also: Bug #33362935.
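  For example, a data node can be started with the retry delay spelled out in its long form (host name and values are placeholders):

    $> ndbmtd --ndb-connectstring=mgmhost:1186 --connect-retry-delay=5 --connect-retries=12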
- Microsoft Windows: On Windows, added missing debug and test suite binaries for MySQL Server (commercial) and MySQL NDB Cluster (commercial and community). (Bug #32713189)
- NDB Replication: The mysqld option --slave-skip-errors can be used to allow the replication applier SQL thread to skip over certain numbered errors automatically. This is not recommended in production because it allows replicas to diverge, since whole transactions in the binary log are not applied; for NDBCLUSTER, with its epoch transactions, this results in entire epochs of changes not being applied, likely leading to inconsistent data.
  Ndb also checks the sequence of epochs applied, and stops the replica applier with an error if there is a sequence problem. Where --slave-skip-errors is in use and an error is skipped, a whole epoch transaction is skipped; this is detected on the next attempt to apply an epoch transaction, which stops the replica applier SQL thread.
  A new option --ndb-applier-allow-skip-epoch is added in this release to allow users to ignore wholly skipped epoch transactions, so that they can use the --slave-skip-errors option as with other MySQL storage engines. This is intended for use in testing, and not in a production setting; use of these options is entirely at your own risk.
  When mysqld is started with the new option (together with --slave-skip-errors), detection of a missing epoch generates a warning, but the replica applier SQL thread continues applying. (Bug #33398973)
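  A minimal sketch of starting a replica SQL node with both options for testing purposes (error code 1062 is an arbitrary example, and other required options are omitted):

    $> mysqld --ndbcluster --ndb-connectstring=mgmhost --slave-skip-errors=1062 --ndb-applier-allow-skip-epoch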
- NDB Replication: The log_name column of the ndb_apply_status table was created as VARBINARY, despite being defined as VARCHAR using the latin1 character set, causing hex-decoded output when querying the table using some tools.
  We fix this by detecting the faulty column type in ndb_apply_status and reinstalling the table definition into the data dictionary while connecting to NDB, when mysqld checks the layout of this table. (Bug #33380726)
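  One generic way to confirm the column type on a replica (not specific to this fix) is to inspect the table definition directly:

    mysql> SHOW CREATE TABLE mysql.ndb_apply_status\G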
- NDB Cluster APIs: Several new basic example C++ NDB API programs have been added to the distribution, under storage/ndb/ndbapi-examples/ndbapi_basic/ in the source tree. These are shorter and should be easier to understand for newcomers to the NDB API than the existing API examples. They also follow recent C++ standards and practices. These examples have also been added to the NDB API documentation; see Basic NDB API Examples, for more information. (Bug #33378579, Bug #33517296)
- NDB Cluster APIs: It is no longer possible to use the DIVERIFYREQ signal asynchronously. (Bug #33161562)
- Timing of wait for scans log output during online reorganization was not performed correctly. As part of this fix, we change the timing to generate one message every 10 seconds rather than scaling indefinitely, so as to supply regular updates. (Bug #35523977)
- Added missing values checks in ndbd and ndbmtd. (Bug #33661024)
- Online table reorganization increases the number of fragments of a table, and moves rows between them. This is done in the following steps:
  1. Copy rows to new fragments
  2. Update distribution information (hashmap count and total fragments)
  3. Wait for scan activity using the old distribution to stop
  4. Delete rows which have moved out of existing partitions
  5. Remove the reference to the old hashmap
  6. Wait for scan activity started since step 2 to stop
  Due to a counting error, it was possible for the reorganization to hang in step 6; the scan reference count was not decremented, and thus never reached zero as expected. (Bug #33523991)
- A UNIQUE index created with USING HASH does not support ordered or range access operations, but rather only those operations in which the full key is specified, returning at most a single row. Even so, for such an index on an NDB table, range access was still used on the index. (Bug #33466554, Bug #33474345)
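  For example, given a definition like the following (a generic sketch; the table and column names are placeholders), only an equality lookup on the complete key can use the hash index, while a range condition cannot:

    mysql> CREATE TABLE t1 (
             a INT PRIMARY KEY,
             b INT,
             UNIQUE KEY ub (b) USING HASH
           ) ENGINE=NDBCLUSTER;
    mysql> EXPLAIN SELECT * FROM t1 WHERE b = 10;   -- full key specified: hash index usable
    mysql> EXPLAIN SELECT * FROM t1 WHERE b > 10;   -- range condition: hash index not usable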
- The same pushed join on NDB tables returned an incorrect result when the batched_key_access optimizer switch was enabled.
  This issue arose as follows: When the batched key access (BKA) algorithm is used to join two tables, a set of batched keys is first collected from one of the tables, and a multi-range read (MRR) operation is constructed against the other. A set of bounds (ranges) is specified on the MRR, using the batched keys to construct each bound.
  When result rows are returned, it is necessary to identify which range each returned row comes from; this is used to identify the outer table row to perform the BKA join with. When the MRR operation in question was the root of a pushed join operation, SPJ was unable to retrieve this identifier (RANGE_NO). We fix this by implementing the missing SPJ API functionality for returning such a RANGE_NO from a pushed join query. (Bug #33416308)
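  As a rough illustration (the table and column names are placeholders), BKA can be enabled for a join between two NDB tables as follows; prior to this fix, such a query could return incorrect results when the join was pushed down:

    mysql> SET optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';
    mysql> EXPLAIN FORMAT=TREE
           SELECT *
           FROM t1 JOIN t2 ON t2.k = t1.ref_k;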
- Each query against the ndbinfo.index_stats table leaked an NdbRecord. We fix this by changing the context so that it owns the NdbRecord object which it creates, releasing the NdbRecord when going out of scope, and by supporting the creation of one and only one record per context. (Bug #33408123)
- A problem with concurrency occurred when updating cached table statistics with changed rows: when several threads updated the same table, the threads competed for the NDB_SHARE mutex in order to update the cached row count.
  We fix this by reimplementing the storage of changed rows using an atomic counter, rather than taking the mutex and updating the shared value directly, which reduces the need to serialize the threads. In addition, we now append the number of changed rows to the row count only when removing the statistics from the cache, and provide a separate mutex protecting only the cached statistics. (Bug #33384978)
  References: See also: Bug #32169848.
- If the schema distribution client detected a timeout before freeing the schema object when the coordinator received the schema event, the coordinator processed the stale schema event instead of returning.
  The coordinator did not know whether a schema distribution timeout had been detected by the client, and started processing the schema event as soon as the schema object was valid. To fix this, we now record the state of the schema object, changing it both when the client detects a schema distribution timeout and when the schema event is received by the coordinator, so that the coordinator and the client are both aware of this and remain synchronized. (Bug #33318201)
- The MySQL optimizer uses two different methods, handler::read_cost() and Cost_model::page_read_cost(), to estimate the cost of different access methods, but the cost values returned by these were not always comparable; in some cases this led to the wrong index being chosen and longer execution times for affected queries. To fix this for NDB, we override the optimizer's page_read_cost() method with one specific to NDBCLUSTER. It was also found while working on this issue that the NDB handler did not implement the read_time() method used by read_cost(); this method is now implemented by ha_ndbcluster, so the optimizer can now properly take into account the cost difference for NDB when using a unique key as opposed to an ordered index (range scan). (Bug #33317872)
- When opening NDB tables for queries, index statistics are retrieved to help the optimizer select the optimal query plan. Each client accessing the statistics acquired the global index statistics mutex both before and after accessing them; this caused mutex contention affecting query performance, whether the queries were operating on the same tables or on different ones.
  We fix this by protecting the count of index statistics references with an atomic counter. The problem was clearly visible when benchmarking with more than 32 clients, when throughput did not increase with additional clients; with this fix, throughput continues to scale with up to 64 clients. (Bug #33317320)
- In certain cases, an event's category was not properly detected. (Bug #33304814)
- It was not possible to add new data nodes running ndbd to an existing cluster with data nodes running ndbd. (Bug #33193393)
- For a user granted the NDB_STORED_USER privilege, the password_last_changed column in the mysql.user table was updated each time the SQL node was restarted. (Bug #33172887)
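  For reference, the privilege and the column concerned can be inspected as follows (the user name and host are placeholders):

    mysql> GRANT NDB_STORED_USER ON *.* TO 'appuser'@'%';
    mysql> SELECT user, host, password_last_changed FROM mysql.user WHERE user = 'appuser';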
- DBDICT did not always perform table name checks correctly. (Bug #33161548)
- Added a number of missing ID and other values checks in ndbd and ndbmtd. (Bug #33161486, Bug #33162047)
- Added a number of missing ID and other values checks in ndbd and ndbmtd. (Bug #33161259, Bug #33161362)
- SET_LOGLEVELORD signals were not always handled correctly. (Bug #33161246)
- DUMP 11001 did not always handle all of its arguments correctly. (Bug #33157513)
- File names were not always verified correctly. (Bug #33157475)
- Added a number of missing checks in the data nodes. (Bug #32983723, Bug #33157488, Bug #33161451, Bug #33161477, Bug #33162082)
- Added a number of missing ID and other values checks in ndbd and ndbmtd. (Bug #32983700, Bug #32893708, Bug #32957478, Bug #32983256, Bug #32983339, Bug #32983489, Bug #32983517, Bug #33157527, Bug #33157531, Bug #33161271, Bug #33161298, Bug #33161314, Bug #33161331, Bug #33161372, Bug #33161462, Bug #33161511, Bug #33161519, Bug #33161537, Bug #33161570, Bug #33162059, Bug #33162065, Bug #33162074, Bug #33162082, Bug #33162092, Bug #33162098, Bug #33304819)
- The management server did not always handle events of the wrong size correctly. (Bug #32957547)
- When ndb_mgmd is started without the --config-file option, the user is expected to provide the connection string for another management server in the same cluster, so that the management server being started can obtain configuration information from the other. If the host address in the connection string could not be resolved, the ndb_mgmd being started hung indefinitely while trying to establish a connection.
  This issue occurred because a failure to connect was treated as a temporary error, which led to the ndb_mgmd retrying the connection, failing again, and so on, repeatedly. We fix this by treating a failure in host name resolution by ndb_mgmd as a permanent error, and exiting immediately. (Bug #32901321)
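  A minimal sketch of this startup mode (host name and node ID are placeholders), in which the management server being started fetches its configuration from an existing management server:

    $> ndb_mgmd --ndb-connectstring=mgm1.example.com:1186 --ndb-nodeid=50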
- The order of parameters used in the argument to ndb_import --csvopt is now handled consistently, with the rightmost parameter always taking precedence. This also applies to duplicate instances of a parameter. (Bug #32822757)
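  For example (a sketch assuming the single-character n and r parameters accepted by --csvopt, which select \n and \r\n row terminators, respectively), the rightmost of two conflicting parameters now always wins:

    $> ndb_import db1 data.csv --csvopt=rn    # n is rightmost: rows terminated by \n
    $> ndb_import db1 data.csv --csvopt=nr    # r is rightmost: rows terminated by \r\n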
- In some cases, issues with the redo log while restoring a backup led to an unplanned shutdown of the data node. To fix this, when the redo log file is not available for writes, we now include the correct wait code and waiting log part in the CONTINUEB signal before sending it. (Bug #32733659)
  References: See also: Bug #31585833.
- The binary logging thread sometimes attempted to start before all data nodes were ready, which led to excess logging of unnecessary warnings and errors. (Bug #32019919)
- Instituted a number of value checks in the internal Ndb_table_guard::getTable() method. This fixes a known issue in which an SQL node underwent an unplanned shutdown while executing ALTER TABLE on an NDB table, and potentially additional issues. (Bug #30232826)
- Replaced a misleading error message and otherwise improved the behavior of ndb_mgmd when the HostName could not be resolved. (Bug #28960182)
- A query used by MySQL Enterprise Monitor to monitor memory use in NDB Cluster became markedly less performant as the number of NDB tables increased. We fix this as follows:
  - Row counts for virtual ndbinfo tables have been made available to the MySQL optimizer
  - Size estimates are now provided for all ndbinfo tables
  - Primary keys have been added to most internal ndbinfo tables
  Following these improvements, the performance of queries against ndbinfo tables should be comparable to queries against equivalent MyISAM tables. (Bug #28658625)
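  As an illustration of the kind of memory-monitoring query affected (a generic sketch; ndbinfo.memoryusage is just one of the ndbinfo tables such monitoring might read):

    mysql> SELECT node_id, memory_type, used, total
           FROM ndbinfo.memoryusage;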
- Following improvements in LDM performance made in NDB 8.0.23, an UPDATE_FRAG_DIST_KEY_ORD signal was never sent when needed to a data node using node ID 1. When the cluster was run with 3 or 4 replicas and another node in the same node group was restarted, this could result in SQL statements being rejected with MySQL error 1297 (ER_GET_TEMPORARY_ERRMSG) and, subsequently, SHOW WARNINGS reporting NDB error 1204.
  Note: Prior to upgrading to this release, you can work around the issue by restarting data node 1 whenever any other node in the same node group has been restarted.
  (Bug #105098, Bug #33460188)
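  The workaround can be applied from the management client; for example, assuming the affected data node has node ID 1:

    ndb_mgm> 1 RESTART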
- Following the rolling restart of a data node performed as part of an upgrade from NDB 7.6 to NDB 8.0, the data node underwent a forced shutdown. We fix this by allowing LQHKEYREQ signals to be sent to both the DBLQH and the DBSPJ kernel blocks. (Bug #105010, Bug #33387443)
- When the AutomaticThreadConfig parameter was enabled, NumCPUs was always shown as 0 in the data node log. In addition, when this parameter is in use, thread CPU bindings are now made correctly, and the data node log shows the actual CPU binding for each thread. (Bug #102503, Bug #32474961)
- ndb_blob_tool --help did not return the expected output. (Bug #98158, Bug #30733508)
- NDB did not close any pending schema transactions when returning an error from internal system table creation and drop functions.