MySQL NDB Cluster 8.0.32 is a new release of NDB 8.0, based on MySQL Server 8.0 and including features in version 8.0 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.32 (see Changes in MySQL 8.0.32 (2023-01-17, General Availability)).
Added the --config-binary-file option for ndb_config, which enables this program to read configuration information from the management server's binary configuration cache. This can be useful, for example, in determining whether the current version of the config.ini file has actually been read by the management server and applied to the cluster. See the description of the option in the MySQL NDB Cluster documentation for more information and examples. (Bug #34773752)
Packaging: The man page for ndbxfrm was not present following installation. (Bug #34520046)
Solaris; NDB Client Programs: ndb_top was not built for Solaris platforms. (Bug #34186837)
MySQL NDB ClusterJ: ClusterJ could not be built on Ubuntu 22.10 with GCC 12.2. (Bug #34666985)
In some contexts, a data node process may be sent SIGCHLD by other processes. Previously, the data node process bound a signal handler treating this signal as an error, which could cause the process to shut down unexpectedly when run in the foreground in a Kubernetes environment (and possibly under other conditions as well). This occurred despite the fact that a data node process never starts child processes itself, and thus there is no need to take action in such cases.
To fix this, the handler has been modified to use SIG_IGN, which should result in cleanup of any child processes.
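The effect of ignoring SIGCHLD can be illustrated with a minimal sketch in Python (this is not the data node's own C++ code). It assumes a POSIX system such as Linux, where setting the disposition of SIGCHLD to SIG_IGN lets the kernel reap exited children automatically; the helper names `ignore_sigchld` and `child_is_auto_reaped` are hypothetical.

```python
import os
import signal


def ignore_sigchld():
    """Ignore SIGCHLD and return the previous handler so it can be restored."""
    return signal.signal(signal.SIGCHLD, signal.SIG_IGN)


def child_is_auto_reaped():
    """Fork a short-lived child and report whether the kernel reaped it.

    With SIGCHLD ignored, the exited child never becomes a zombie, so
    waitpid() finds nothing to collect and fails with ECHILD.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    try:
        os.waitpid(pid, 0)
    except ChildProcessError:
        return True  # nothing left to wait for: the child was cleaned up
    return False  # waitpid() collected the child itself
```

Calling `ignore_sigchld()` once at startup is enough; no handler code runs when the signal arrives, so there is nothing that could treat it as an error.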
The running node from a node group scans each fragment (CopyFrag) and sends the rows to the starting peer in order to synchronize it. If a row from the fragment is locked exclusively by a user transaction, it blocks the scan from reading the fragment, causing the CopyFrag to stall.
If the starting node fails during the CopyFrag phase, then normal node failure handling takes place. The coordinator node's transaction coordinator (TC) performs TC takeover of the user transactions from the TCs on the failed node. Since the scan that aids copying the fragment data over to the starting node is considered internal only, it is not a candidate for takeover; thus the takeover TC marks the CopyFrag scan as closed at the next opportunity, and waits until it is closed.
The current issue arose when the CopyFrag scan was in the waiting for row lock state, and the closing of the marked scan was not performed. This led to TC takeover stalling while waiting for the close, causing unfinished node failure handling, and eventually a GCP stall potentially affecting redo logging, local checkpoints, and NDB Replication.
We fix this by closing the marked CopyFrag scan whenever a node failure occurs while the CopyFrag is waiting for a row lock. (Bug #34823988)
References: See also: Bug #35037327.
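The stall and its fix can be pictured with a toy state model. This is a rough sketch under simplifying assumptions, not NDB's implementation: the class, the state names, and the `close_while_waiting` flag are all hypothetical, and a running scan is assumed to close immediately once marked.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ScanState(Enum):
    RUNNING = auto()
    WAITING_FOR_ROW_LOCK = auto()
    CLOSED = auto()


@dataclass
class CopyFragScan:
    state: ScanState = ScanState.RUNNING
    marked_for_close: bool = False

    def on_node_failure(self, close_while_waiting: bool) -> None:
        """The takeover TC marks the scan; it closes at the next opportunity."""
        self.marked_for_close = True
        if self.state is ScanState.RUNNING:
            # Simplification: a running scan is closed right away.
            self.state = ScanState.CLOSED
        elif self.state is ScanState.WAITING_FOR_ROW_LOCK and close_while_waiting:
            # Fixed behaviour: also close a scan that is blocked on a row lock.
            self.state = ScanState.CLOSED


def takeover_can_finish(scan: CopyFragScan) -> bool:
    """TC takeover waits until the marked scan is actually closed."""
    return scan.state is ScanState.CLOSED
```

With `close_while_waiting=False` (the old behaviour in this model), a scan blocked on a row lock stays marked but never closes, so `takeover_can_finish` remains false.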
In certain cases, invalid signal data was not handled correctly. (Bug #34787608)
Sending of fragmented signals to virtual (V_QUERY) blocks is not supported, since the different signal fragments may end up in different block instances. When DBSPJ sends a SCAN_FRAGREQ signal that may end up using V_QUERY, it checks whether the signal is fragmented and in that case changes the receiver to an instance of DBLQH. The function SimulatedBlock::sendBatchedFragmentedSignal() is intended to use the same check to decide whether to fragment a given signal, but did not, with the result that signals which were not expected to be fragmented were fragmented, sent using V_QUERY, and in that case likely to fail when received.
We fix this problem by making the size check in SimulatedBlock::sendFirstFragment(), used by sendBatchedFragmentedSignal(), match the checks performed in DBSPJ. (Bug #34776970)
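The essence of the fix is that the code choosing the receiver and the code deciding whether to fragment must apply the same size check. A minimal sketch of that idea follows; the limit `MAX_UNFRAGMENTED_WORDS` and all function names are hypothetical and do not reflect NDB's actual values or API.

```python
MAX_UNFRAGMENTED_WORDS = 8192  # hypothetical limit, not NDB's real value


def needs_fragmentation(signal_words: int, limit: int = MAX_UNFRAGMENTED_WORDS) -> bool:
    """Single shared predicate: does this signal have to be fragmented?"""
    return signal_words > limit


def choose_receiver(signal_words: int, limit: int = MAX_UNFRAGMENTED_WORDS) -> str:
    """Use the virtual block only for signals that fit in one piece."""
    return "DBLQH" if needs_fragmentation(signal_words, limit) else "V_QUERY"


def plan_send(signal_words: int, limit: int = MAX_UNFRAGMENTED_WORDS):
    """Return (receiver, fragmented), using the same predicate for both decisions."""
    receiver = choose_receiver(signal_words, limit)
    fragmented = needs_fragmentation(signal_words, limit)
    # Invariant from the text: fragmented signals never go to V_QUERY.
    assert not (fragmented and receiver == "V_QUERY")
    return receiver, fragmented
```

Because both decisions call `needs_fragmentation`, they cannot disagree at the boundary, which is the kind of mismatch described above.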
When SPJ sends SCAN_FRAGREQ requests to the local data managers, it usually scans only a subset of the fragments in parallel based on recsPrKeys statistics, if these are available, or just makes a guess if no statistics are available.
SPJ contains logic which may take advantage of the result collected from the first round of fragments scanned; parallelism statistics are collected after SCAN_FRAGCONF replies are received, and first-match elimination may eliminate keys needed to scan in subsequent rounds.
Scanning local fragments is expected to have less overhead than scanning remote fragments, so it is preferable to err on the side of scan-parallelism for the local fragments. To take advantage of this, two rounds are now made over the fragments, the first one allowing SCAN_FRAGREQ signals to be sent to local fragments only, the second allowing such signals to be sent to any fragment expecting it. (Bug #34768216)
References: See also: Bug #34768191.
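The two-round selection can be sketched as follows. This is an illustrative model only: the function name, the `(fragment_id, owner_node)` representation, and the fixed `parallelism` budget are assumptions, not details taken from the SPJ code.

```python
def pick_fragments(fragments, local_node, parallelism):
    """Choose which waiting fragments receive a SCAN_FRAGREQ in this batch.

    fragments   -- list of (fragment_id, owner_node) still expecting a request
    local_node  -- node id of the node running this SPJ instance
    parallelism -- maximum number of fragments to scan at once (assumed budget)
    """
    chosen = []

    # Round 1: only fragments stored on the local node are eligible.
    for frag_id, node in fragments:
        if len(chosen) >= parallelism:
            break
        if node == local_node:
            chosen.append(frag_id)

    # Round 2: any fragment still expecting a request is eligible.
    for frag_id, _node in fragments:
        if len(chosen) >= parallelism:
            break
        if frag_id not in chosen:
            chosen.append(frag_id)

    return chosen
```

For example, with fragments `[(0, 2), (1, 1), (2, 2), (3, 1), (4, 3)]`, local node 1, and a budget of 3, the local fragments 1 and 3 are picked first and the remaining slot goes to fragment 0.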
When pushing a join to the data nodes, the query request is distributed to the SPJ blocks of all data nodes having local fragments for the first table (the SPJ root) in the pushed query. Each SPJ block retrieves qualifying rows from the local fragments of this root table, then uses the retrieved rows to generate a request to its joined child tables. If no qualifying rows are retrieved from the local fragments of the root, SPJ has no further work to perform.
This implies that for a pushed join in which the root returns few rows, there are likely to be idling SPJ workers not taking full advantage of the available parallelism. Now for such queries we do not include very small tables in the pushed join, so that, if the next table in the join plan is larger, we start with that one instead. (Bug #34723413)
The safety check for a copying ALTER TABLE operation uses the sum of per-fragment commit count values to determine whether any writes have been committed to a given table over a period of time. Different replicas of the same fragment do not necessarily have the same commit count over time, since a fragment replica's commit count is reset during node restart.
Read primary tables always route read requests to a table's primary fragment replicas. Read backup and fully replicated tables optimize reads by allowing CommittedRead operations to be routed to backup fragment replicas. This results in the set of commit counts read not always being stable for Read backup and fully replicated tables, which can cause false positive failures for the copying ALTER TABLE safety check.
This is solved by performing the copying ALTER TABLE safety check using a locking scan. Locked reads are routed to the same set of primary (main) fragments every time, which causes these counts to be stable. (Bug #34654470)
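A small worked example shows why the sum is only stable when every read goes to the same replicas. The data layout and function names here are hypothetical; the numbers are invented purely for illustration, with one backup replica's count reset to 0 as after a node restart.

```python
# Commit counts per fragment replica (illustrative values only).
# The backup replica of the second fragment was reset by a node restart.
FRAGMENTS = [
    {"primary": 120, "backup": 120},
    {"primary": 75, "backup": 0},
]


def sum_commit_counts(fragments, choose_replica):
    """Sum one commit count per fragment, read from the chosen replica."""
    return sum(frag[choose_replica(frag)] for frag in fragments)


def primary_only(_frag):
    """Locking scan: always routed to the primary fragment replica."""
    return "primary"


def backup_only(_frag):
    """One possible routing of a CommittedRead on a Read backup table."""
    return "backup"


def looks_unchanged(fragments, first_read, second_read):
    """Safety check: the table is assumed unmodified if both sums agree."""
    return (sum_commit_counts(fragments, first_read)
            == sum_commit_counts(fragments, second_read))
```

Here two primary-only reads agree (195 both times), whereas a primary read followed by a backup read gives 195 and 120, a false positive even though no write was committed in between.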
Following execution of DROP NODEGROUP in the management client, attempting to create or alter an NDB table specifying an explicit number of partitions or using MAX_ROWS was rejected with Got error 771 'Given NODEGROUP doesn't exist in this cluster' from NDB. (Bug #34649576)
TYPE_NOTE_TIME_TRUNCATED were treated as errors instead of being ignored, as was the case prior to NDB 8.0.27. This stopped building of interpreted code for pushed conditions, with the condition being returned to the server.
We fix this by reverting the handling of these status types to ignoring them, as was done previously. (Bug #34644930)
When reorganizing a table with ALTER TABLE ... REORGANIZE PARTITION following addition of new data nodes to the cluster, fragments were not redistributed properly when the ClassicFragmentation configuration parameter was set to OFF. (Bug #34640773)
Fixed an uninitialized padding variable in src/common/util/ndb_zlib.cpp. (Bug #34639073)
When the NDB_STORED_USER privilege was granted to a user with an empty password, the user's password on each of the other SQL nodes was expired. (Bug #34626727)
In a cluster with multiple management nodes, when one management node connected and later disconnected, any remaining management nodes were not aware of this node and were eventually forced to shut down when stopped nodes reconnected; this happened whenever the cluster still had live data nodes.
On investigation it was found that node disconnection handling was done in the ConfigManager but the expected NF_COMPLETEREP signal never actually arrived. We solve this by handling disconnecting management nodes when the NODE_FAILREP signal arrives, rather than waiting for NF_COMPLETEREP. (Bug #34582919)
The --diff-default option and related options for ndb_config did not produce any usable output. (Bug #34549189)
References: This issue is a regression of: Bug #32233543.
Encrypted backups created on a system with one endianness could not be restored on a system with the other; for example, encrypted backups taken on an x86 system could not be restored on a SPARC system, nor the reverse. (Bug #34446917)
A query using a pushed join with an IN subquery did not return the expected result when the BatchSize SQL node parameter was set to a very small value such as 1. (Bug #34231718)
When defining a binary log transaction, the transaction is kept in an in-memory binary log cache before it is flushed to the binary log file. If a binary log transaction exceeds the size of the cache, it is written to a temporary file which is set up early in the initialization of the binary log thread. This write introduces extra disk I/O in the binary log injector path. The number of disk writes performed globally by the binary log injector can be found by checking the value of the Binlog_cache_disk_use system status variable, but otherwise, the NDB handler's binary log injector thread had no way to observe this.
Since Binlog_cache_disk_use is accessible by the binary log injector, it can be checked both before and after the transaction is committed to see whether there were any changes to its value. If any cache spills have taken place, this is reflected by the difference of the two values, and the binary log injector thread can report it. (Bug #33960014)
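The before-and-after comparison can be sketched as below. This is an assumed model rather than the injector's actual code: `FakeBinlog` stands in for the server, and `count_cache_spills` is a hypothetical helper; only the counter name mirrors the Binlog_cache_disk_use status variable.

```python
class FakeBinlog:
    """Stand-in for the server: counts transactions that overflow the cache."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.binlog_cache_disk_use = 0  # mirrors Binlog_cache_disk_use

    def commit(self, transaction_bytes):
        # A transaction larger than the in-memory cache spills to a temp file.
        if transaction_bytes > self.cache_size:
            self.binlog_cache_disk_use += 1


def count_cache_spills(read_disk_use, commit):
    """Return how many cache spills happened while `commit` ran."""
    before = read_disk_use()
    commit()
    after = read_disk_use()
    return after - before


binlog = FakeBinlog(cache_size=32768)
spills = count_cache_spills(
    lambda: binlog.binlog_cache_disk_use,
    lambda: binlog.commit(transaction_bytes=100000),
)
if spills > 0:
    print(f"binary log cache spilled to disk {spills} time(s)")
```

Passing the reader and the commit as callables keeps the helper independent of how the counter is actually obtained.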
When closing a file using compressed or encrypted format after reading the entire file, verify its checksum. (Bug #32550145)
When reorganizing a table with ALTER TABLE ... REORGANIZE PARTITION following addition of new data nodes to the cluster, unique hash indexes were not redistributed properly. (Bug #30049013)
For a partial local checkpoint, each fragment LCP must be able to determine the precise state of the fragment at the start of the LCP and the precise difference in the fragment between the start of the current LCP and the start of the previous one. This is tracked using row header information and page header information; in cases where physical pages are removed this is also tracked in logical page map information.
This issue arose when a page included in the current LCP was released, before the LCP scan reached it, due to the commit or rollback of some operation on the fragment which also released the last used storage on the page.
Since the released page could not be found by the scan, the release itself set the LCP_SCANNED_BIT of the page map entry it was mapped into, in order to indicate that the page was already handled from the point of view of the current LCP, causing subsequent allocation and release of the pages mapped to the entry during the LCP to be ignored. The state of the entry at the start of the LCP was also set as allocated in the page map entry.
These settings are cleared only when the next LCP is prepared. Any page release associated with the page map entry before the clearance would violate the requirement that the bit is not set; we resolve this issue by removing the (incorrect) requirement. (Bug #23539857)
A data node could hit an overly strict assertion when the thread liveness watchdog triggered while the node was already shutting down. We fix the issue by relaxing this assertion in such cases. (Bug #22159697)
Removed a leak of long message buffer memory that occurred each time an index was scanned for updating index statistics. (Bug #108043, Bug #34568135)
Backup::get_total_memory(), used to calculate proposed disk write speeds for checkpoints, wrongly considered DataMemory that may not have been used in the calculation of memory used by LDMs.
We fix this by obtaining the total DataMemory used by the LDM threads instead, as reported by DBTUP. (Bug #106907, Bug #34035805)
Fixed an uninitialized variable in Suma.cpp. (Bug #106081, Bug #33764143)