MySQL :: MySQL NDB Cluster 8.0 Release Notes :: Changes in MySQL NDB Cluster 8.0.28 (2022-01-18)

Changes in MySQL NDB Cluster 8.0.28 (2022-01-18)

MySQL NDB Cluster 8.0.28 is a new release of NDB 8.0, based on MySQL Server 8.0 and including features in version 8.0 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.

Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster 8.0.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.28 (see Changes in MySQL 8.0.28 (2022-01-18)).

Compilation Notes

NDB did not compile using GCC 11 on Ubuntu 21.10. (Bug #33424843)

Functionality Added or Changed

Added the ndbinfo index_stats table, which provides very basic information about NDB index statistics. It is intended primarily for use in our internal testing, but may be helpful in conjunction with ndb_index_stat and other tools. (Bug #32906654)
Previously, ndb_import always tried to import data into a table whose name was derived from the name of the CSV file being read. This release adds a --table option (short form: -t) for this program, which overrides this behavior and specifies the name of the target table directly. (Bug #30832382)
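The naming behavior described above can be modeled with a short Python sketch. The function target_table() and its parameters are hypothetical and are not part of ndb_import. The sketch assumes that the default table name is the base name of the CSV file without its directory or extension, and that a name supplied with --table (or -t) simply replaces it.

```python
from pathlib import Path
from typing import Optional


def target_table(csv_path: str, table: Optional[str] = None) -> str:
    """Return the table to import into (hypothetical helper, for illustration)."""
    if table is not None:
        # Models --table / -t: an explicit name overrides the file name.
        return table
    # Assumed default: base name of the CSV file, minus its extension.
    return Path(csv_path).stem


print(target_table("/data/dumps/city.csv"))             # city
print(target_table("/data/dumps/city.csv", "city_v2"))  # city_v2
```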

Bugs Fixed

Important Change: The deprecated data node option --connect-delay has been removed. This option was a synonym for --connect-retry-delay, which was not honored in all cases; this issue has been fixed, and the option now works correctly. In addition, the short form -r for this option has been deprecated, and you should expect it to be removed in a future release. (Bug #31565810)

References: See also: Bug #33362935.
Microsoft Windows: On Windows, added missing debug and test suite binaries for MySQL Server (commercial) and MySQL NDB Cluster (commercial and community). (Bug #32713189)
NDB Replication: The mysqld option --slave-skip-errors can be used to allow the replication applier SQL thread to skip over certain numbered errors automatically. This is not recommended in production because it allows replicas to diverge since whole transactions in the binary log are not applied; for NDBCLUSTER with its epoch transactions, this results in entire epochs of changes not being applied, likely leading to inconsistent data.

NDB also checks the sequence of epochs applied, and stops the replica applier with an error if there is a sequence problem. Where --slave-skip-errors is in use, and an error is skipped, this results in a whole epoch transaction being skipped; this is detected on any subsequent attempt to apply an epoch transaction, which results in the replica applier SQL thread being stopped.

A new option --ndb-applier-allow-skip-epoch is added in this release to allow users to ignore wholly skipped epoch transactions, so that they can use the --slave-skip-errors option as with other MySQL storage engines. This is intended for use in testing, and not in a production setting. Use of these options is entirely at your own risk.

When mysqld is started with the new option (together with --slave-skip-errors), detection of a missing epoch generates a warning, but the replica applier SQL thread continues applying. (Bug #33398973)
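The control flow described above can be sketched in Python as follows. This is a toy model only: the class ApplierSketch, its methods, and the way a skipped epoch is recognized (an epoch that was begun but never committed) are assumptions made for illustration, and do not reflect how NDB detects the gap internally. The sketch shows only the difference in outcome: an error that stops the applier by default, and a warning when skipping is allowed.

```python
import logging

log = logging.getLogger("applier_sketch")


class SkippedEpochError(Exception):
    """An epoch transaction was started but never committed."""


class ApplierSketch:
    """Toy model of the control flow only; not NDB's actual detection logic."""

    def __init__(self, allow_skip_epoch: bool = False):
        # Models --ndb-applier-allow-skip-epoch (off by default).
        self.allow_skip_epoch = allow_skip_epoch
        self.pending = None   # epoch begun but not yet committed
        self.applied = []     # epochs committed so far
        self.warnings = []    # warnings recorded for skipped epochs

    def begin_epoch(self, epoch: int) -> None:
        if self.pending is not None:
            # The previous epoch never committed, so it was skipped as a whole.
            message = f"epoch {self.pending} was skipped before epoch {epoch}"
            if not self.allow_skip_epoch:
                raise SkippedEpochError(message)  # the applier stops
            log.warning(message)                  # the applier continues
            self.warnings.append(message)
        self.pending = epoch

    def commit_epoch(self) -> None:
        self.applied.append(self.pending)
        self.pending = None


def replay(applier: ApplierSketch) -> list:
    """Apply epochs 1..3, where epoch 2 fails and its error is skipped."""
    applier.begin_epoch(1)
    applier.commit_epoch()
    applier.begin_epoch(2)   # error skipped here: epoch 2 is never committed
    applier.begin_epoch(3)   # the gap is noticed on the next epoch
    applier.commit_epoch()
    return applier.applied


lenient = ApplierSketch(allow_skip_epoch=True)
print(replay(lenient))  # [1, 3]
```

With the default setting, the same replay raises SkippedEpochError at the start of epoch 3 instead of continuing.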
NDB Replication: The log_name column of the ndb_apply_status table was created as VARBINARY, despite being defined as VARCHAR, using the latin1 character set, causing hex-decoded output when querying the table using some tools.

We fix this by detecting the faulty column type in ndb_apply_status and reinstalling the table definition into the data dictionary while connecting to NDB, when mysqld checks the layout of this table. (Bug #33380726)
NDB Cluster APIs: Several new basic example C++ NDB API programs have been added to the distribution, under storage/ndb/ndbapi-examples/ndbapi_basic/ in the source tree. These are shorter and should be easier to understand for newcomers to the NDB API than the existing API examples. They also follow recent C++ standards and practices. These examples have also been added to the NDB API documentation; see Basic NDB API Examples, for more information. (Bug #33378579, Bug #33517296)
NDB Cluster APIs: It is no longer possible to use the DIVERIFYREQ signal asynchronously. (Bug #33161562)
Timing of wait for scans log output during online reorganization was not performed correctly. As part of this fix, we change timing to generate one message every 10 seconds rather than scaling indefinitely, so as to supply regular updates. (Bug #35523977)
Added missing values checks in ndbd and ndbmtd. (Bug #33661024)
Online table reorganization increases the number of fragments of a table, and moves rows between them. This is done in the following steps:
1. Copy rows to new fragments
2. Update distribution information (hashmap count and total fragments)
3. Wait for scan activity using old distribution to stop
4. Delete rows which have moved out of existing partitions
5. Remove reference to old hashmap
6. Wait for scan activity started since step 2 to stop
Due to a counting error, it was possible for the reorganization to hang in step 6; the scan reference count was not decremented, and thus never reached zero as expected. (Bug #33523991)
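The kind of counting error described above can be illustrated with a small Python sketch of a reference count that a waiter blocks on. The class ScanRefCount is hypothetical and is not NDB code; it assumes only that each scan increments a count when it starts and that the final step waits for that count to reach zero. Decrementing in the exit handler guarantees that the count drops on every exit path, including errors.

```python
import threading


class ScanRefCount:
    """Counts scans still using a given distribution (toy model)."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self):
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        # Decrement on every exit path. If this were skipped on some path,
        # the count would never reach zero and the waiter would hang.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()
        return False  # do not swallow exceptions

    def wait_for_zero(self, timeout: float) -> bool:
        """Return True if no scans remain, False if the wait timed out."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


scans = ScanRefCount()
try:
    with scans:
        raise RuntimeError("scan aborted")
except RuntimeError:
    pass
print(scans.wait_for_zero(timeout=1.0))  # True
```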
A UNIQUE index created with USING HASH does not support ordered or range access operations, but rather only those operations in which the full key is specified, returning at most a single row. Even so, for such an index on an NDB table, range access was still used on the index. (Bug #33466554, Bug #33474345)
A pushed join on NDB tables returned an incorrect result when the batched_key_access optimizer switch was enabled.

This issue arose as follows: When the batched key access (BKA) algorithm is used to join two tables, a set of batched keys is first collected from one of the tables; a multirange read (MRR) operation is then constructed against the other. A set of bounds (ranges) is specified on the MRR, using the batched keys to construct each bound.

When result rows are returned it is necessary to identify which range each returned row comes from. This is used to identify the outer table row to perform the BKA join with. When the MRR operation in question was a root of a pushed join operation, SPJ was unable to retrieve this identifier (RANGE_NO). We fix this by implementing the missing SPJ API functionality for returning such a RANGE_NO from a pushed join query. (Bug #33416308)
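The role of the range number can be seen in a minimal Python sketch of a batched key access join. The functions mrr_lookup() and bka_join(), the row layout, and the batch size are all hypothetical; this is not the SPJ or NDB API implementation, only an illustration of why each returned row must carry the number of the range that produced it.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]


def mrr_lookup(inner_rows: List[Row], bounds: List[object]) -> Iterator[Tuple[int, Row]]:
    """Toy multirange read: yield (range_no, row) for rows matching bounds[range_no]."""
    for range_no, key in enumerate(bounds):
        for row in inner_rows:
            if row["k"] == key:
                yield range_no, row


def bka_join(outer_rows: List[Row], inner_rows: List[Row], batch_size: int = 2):
    result = []
    for start in range(0, len(outer_rows), batch_size):
        batch = outer_rows[start:start + batch_size]
        # One bound per batched key, in the same order as the batch.
        bounds = [outer["k"] for outer in batch]
        for range_no, inner in mrr_lookup(inner_rows, bounds):
            # The range number identifies which outer row to join with.
            outer = batch[range_no]
            result.append((outer["id"], inner["v"]))
    return result


outer = [{"id": 1, "k": "a"}, {"id": 2, "k": "b"}, {"id": 3, "k": "a"}]
inner = [{"k": "a", "v": 10}, {"k": "b", "v": 20}]
print(bka_join(outer, inner))  # [(1, 10), (2, 20), (3, 10)]
```

Without the range number, an inner row with key "a" could not be attributed to outer row 1 rather than outer row 3 when both fall in the same batch.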
Each query against the ndbinfo.index_stats table leaked an NdbRecord. We fix this by changing the context so that it owns the NdbRecord object which it creates and releases the NdbRecord when going out of scope, and by supporting the creation of one and only one record per context. (Bug #33408123)
A problem with concurrency occurred when updating cached table statistics with changed rows: when several threads were updating the same table, the threads competed for the NDB_SHARE mutex in order to update the cached row count.

We fix this by reimplementing the storage of changed rows using an atomic counter rather than trying to take the mutex and update the actual shared value, which reduces the need to serialize the threads. In addition, we now append the number of changed rows to the row count only when removing the statistics from the cache and provide a separate mutex protecting only the cached statistics. (Bug #33384978)

References: See also: Bug #32169848.
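The approach described above (accumulating changed rows separately and folding them into the row count only when the statistics leave the cache) can be sketched in Python. The class CachedTableStats and its methods are hypothetical. Python has no built-in atomic integer, so a small dedicated lock stands in for the atomic counter here; this is an assumption of the sketch, not a description of the NDB code.

```python
import threading


class CachedTableStats:
    """Toy model: changed rows are tracked apart from the cached row count."""

    def __init__(self, row_count: int):
        self.row_count = row_count
        # Separate lock protecting only the cached statistics.
        self.stats_lock = threading.Lock()
        self._changed = 0
        # Stand-in for an atomic counter (assumption of this sketch).
        self._changed_lock = threading.Lock()

    def add_changed_rows(self, delta: int) -> None:
        # Writers touch only the counter, never the shared row count.
        with self._changed_lock:
            self._changed += delta

    def evict(self) -> int:
        """Fold the pending changes into the row count on removal from the cache."""
        with self._changed_lock:
            delta, self._changed = self._changed, 0
        with self.stats_lock:
            self.row_count += delta
            return self.row_count


def worker(stats: CachedTableStats) -> None:
    for _ in range(1000):
        stats.add_changed_rows(1)


stats = CachedTableStats(row_count=100)
threads = [threading.Thread(target=worker, args=(stats,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats.evict())  # 4100
```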
If the schema distribution client detected a timeout before freeing the schema object when the coordinator received the schema event, the coordinator processed the stale schema event instead of returning.

The coordinator did not know whether a schema distribution timeout was detected by the client, and started processing the schema event as soon as the schema object was valid. To fix this, we indicate the state of the schema object and change its state when the client detects the schema distribution timeout and when the schema event is received by the coordinator, so that both the coordinator and the client are aware of this, and remain synchronized. (Bug #33318201)
The MySQL Optimizer uses two different methods, handler::read_cost() and Cost_model::page_read_cost(), to estimate the cost for different access methods, but the cost values returned by these were not always comparable; in some cases this led to the wrong index being chosen and longer execution time for affected queries. To fix this for NDB, we override the optimizer's page_read_cost() method with one specific to NDBCLUSTER. It was also found while working on this issue that the NDB handler did not implement the read_time() method, used by read_cost(); this method is now implemented by ha_ndbcluster, and thus the optimizer can now properly take into account the cost difference for NDB when using a unique key as opposed to an ordered index (range scan). (Bug #33317872)
When opening NDB tables for queries, the index statistics are retrieved to help the optimizer select the optimal query plan. Each client accessing the stats acquires the global index statistics mutex both before and after accessing the statistics. This causes mutex contention affecting query performance, whether the queries are operating on the same tables or on different ones.

We fix this by protecting the count of index statistics references with an atomic counter. The problem was clearly visible when benchmarking with more than 32 clients, when throughput did not increase with additional clients. With this fix, the throughput continues to scale with up to 64 clients. (Bug #33317320)
In certain cases, an event's category was not properly detected. (Bug #33304814)
It was not possible to add new data nodes running ndbmtd to an existing cluster with data nodes running ndbd. (Bug #33193393)
For a user granted the NDB_STORED_USER privilege, the password_last_changed column in the mysql.user table was updated each time the SQL node was restarted. (Bug #33172887)
DBDICT did not always perform table name checks correctly. (Bug #33161548)
Added a number of missing ID and other values checks in ndbd and ndbmtd. (Bug #33161486, Bug #33162047)
Added a number of missing ID and other values checks in ndbd and ndbmtd. (Bug #33161259, Bug #33161362)
SET_LOGLEVELORD signals were not always handled correctly. (Bug #33161246)
DUMP 11001 did not always handle all of its arguments correctly. (Bug #33157513)
File names were not always verified correctly. (Bug #33157475)
Added a number of missing checks in the data nodes. (Bug #32983723, Bug #33157488, Bug #33161451, Bug #33161477, Bug #33162082)
Added a number of missing ID and other values checks in ndbd and ndbmtd. (Bug #32983700, Bug #32893708, Bug #32957478, Bug #32983256, Bug #32983339, Bug #32983489, Bug #32983517, Bug #33157527, Bug #33157531, Bug #33161271, Bug #33161298, Bug #33161314, Bug #33161331, Bug #33161372, Bug #33161462, Bug #33161511, Bug #33161519, Bug #33161537, Bug #33161570, Bug #33162059, Bug #33162065, Bug #33162074, Bug #33162082, Bug #33162092, Bug #33162098, Bug #33304819)
The management server did not always handle events of the wrong size correctly. (Bug #32957547)
When ndb_mgmd is started without the --config-file option, the user is expected to provide the connection string for another management server in the same cluster, so that the management server being started can obtain configuration information from the other. If the host address in the connection string could not be resolved, then the ndb_mgmd being started hung indefinitely while trying to establish a connection.

This issue occurred because a failure to connect was treated as a temporary error, which led to the ndb_mgmd retrying the connection, which subsequently failed, and so on, repeatedly. We fix this by treating a failure in host name resolution by ndb_mgmd as a permanent error, and immediately exiting. (Bug #32901321)
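The distinction between a temporary and a permanent error can be sketched in Python as follows. The function connect_with_retry(), the exception PermanentConnectError, and the injected resolve and connect callables are hypothetical and are not part of ndb_mgmd. The sketch assumes only what the text states: a failed connection attempt may be retried, whereas a host name that cannot be resolved causes an immediate exit.

```python
import socket
import time


class PermanentConnectError(Exception):
    """Retrying cannot help, for example when the host name does not resolve."""


def connect_with_retry(host, connect, resolve=socket.gethostbyname,
                       retries=3, delay=0.0):
    try:
        address = resolve(host)
    except socket.gaierror as err:
        # Name resolution failed: treat as permanent and give up at once.
        raise PermanentConnectError(f"cannot resolve {host!r}") from err

    last_error = None
    for _ in range(retries):
        try:
            return connect(address)
        except ConnectionError as err:
            # A failed connection attempt is temporary: wait and retry.
            last_error = err
            time.sleep(delay)
    raise ConnectionError(f"gave up after {retries} attempts") from last_error


# Usage with stand-in callables, so that no network access is needed.
def unresolvable(host):
    raise socket.gaierror("name or service not known")


try:
    connect_with_retry("mgmd.invalid", connect=lambda addr: addr,
                       resolve=unresolvable)
except PermanentConnectError as err:
    print("exiting:", err)
```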
The order of parameters used in the argument to ndb_import --csvopt is now handled consistently, with the rightmost parameter always taking precedence. This also applies to duplicate instances of a parameter. (Bug #32822757)
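The precedence rule described above can be sketched in Python. The function parse_csvopt() and the PARAMS table are hypothetical. The meanings assigned to the letters below are assumptions made for illustration; consult the ndb_import documentation for the parameters that --csvopt actually accepts. The sketch shows only the rule itself: parameters are applied from left to right, so the rightmost one affecting a given setting wins.

```python
# Assumed mapping from parameter letter to (setting, value), for illustration.
PARAMS = {
    "c": ("fields_terminated_by", ","),
    "q": ("fields_optionally_enclosed_by", '"'),
    "n": ("lines_terminated_by", "\n"),
    "r": ("lines_terminated_by", "\r"),
}


def parse_csvopt(opts: str) -> dict:
    """Apply parameters left to right; later ones overwrite earlier ones."""
    settings = {}
    for letter in opts:
        setting, value = PARAMS[letter]
        settings[setting] = value  # the rightmost parameter takes precedence
    return settings


print(repr(parse_csvopt("cnr")["lines_terminated_by"]))  # '\r'
print(repr(parse_csvopt("crn")["lines_terminated_by"]))  # '\n'
```

Duplicate parameters follow the same rule: in "nrn" the final n is the one that takes effect.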
In some cases, issues with the redo log while restoring a backup led to an unplanned shutdown of the data node. To fix this, when the redo log file is not available for writes, we now include the correct wait code and waiting log part in the CONTINUEB signal before sending it. (Bug #32733659)

References: See also: Bug #31585833.
The binary logging thread sometimes attempted to start before all data nodes were ready, which led to excess logging of unnecessary warnings and errors. (Bug #32019919)
Instituted a number of value checks in the internal Ndb_table_guard::getTable() method. This fixes a known issue in which an SQL node underwent an unplanned shutdown while executing ALTER TABLE on an NDB table, and potentially additional issues. (Bug #30232826)
Replaced a misleading error message and otherwise improved the behavior of ndb_mgmd when the HostName could not be resolved. (Bug #28960182)
A query used by MySQL Enterprise Monitor to monitor memory use in NDB Cluster became markedly less performant as the number of NDB tables increased. We fix this as follows:
- Row counts for virtual ndbinfo tables have been made available to the MySQL optimizer
- Size estimates are now provided for all ndbinfo tables
- Primary keys have been added to most internal ndbinfo tables
Following these improvements, the performance of queries against ndbinfo tables should be comparable to queries against equivalent MyISAM tables. (Bug #28658625)
Following improvements in LDM performance made in NDB 8.0.23, an UPDATE_FRAG_DIST_KEY_ORD signal was never sent, when needed, to a data node using node ID 1. When running the cluster with 3 or 4 replicas and another node in the same node group restarted, this could result in SQL statements being rejected with MySQL error 1297 ER_GET_TEMPORARY_ERRMSG and, subsequently, SHOW WARNINGS reporting NDB error 1204.

Note

Prior to upgrading to this release, you can work around the issue by restarting data node 1 whenever any other node in the same node group has been restarted.

(Bug #105098, Bug #33460188)
Following the rolling restart of a data node performed as part of an upgrade from NDB 7.6 to NDB 8.0, the data node underwent a forced shutdown. We fix this by allowing LQHKEYREQ signals to be sent to both the DBLQH and the DBSPJ kernel blocks. (Bug #105010, Bug #33387443)
When the AutomaticThreadConfig parameter was enabled, NumCPUs was always shown as 0 in the data node log. In addition, when this parameter is in use, thread CPU bindings are now made correctly, and the data node log shows the actual CPU binding for each thread. (Bug #102503, Bug #32474961)
ndb_blob_tool --help did not return the expected output. (Bug #98158, Bug #30733508)
NDB did not close any pending schema transactions when returning an error from internal system table creation and drop functions.
