MySQL NDB Cluster 8.0.24 is a new release of NDB 8.0, based on
MySQL Server 8.0 and including features in version 8.0 of the
NDB storage engine, as well as fixing
recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in NDB Cluster.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.24 (see Changes in MySQL 8.0.24 (2021-04-20, General Availability)).
NDB Cluster APIs: The version of NDB has been upgraded to 12.20.1. (Bug #32356419)
ndbinfo Information Database: Added the dict_obj_tree table to the ndbinfo information database. This table provides information about NDB database objects similar to what is shown by the dict_obj_info table, but presents it in a hierarchical or tree-like fashion that simplifies seeing relationships between objects such as: tables and indexes; tablespaces and data files; log file groups and undo log files.
An example of such a view of a table t1, having a primary key on column a and a unique key on column b, is shown here:
mysql> SELECT indented_name FROM ndbinfo.dict_obj_tree
    -> WHERE root_name = 'test/def/t1';
+----------------------------+
| indented_name              |
+----------------------------+
| test/def/t1                |
|   -> sys/def/13/b          |
|     -> NDB$INDEX_15_CUSTOM |
|   -> sys/def/13/b$unique   |
|     -> NDB$INDEX_16_UI     |
|   -> sys/def/13/PRIMARY    |
|     -> NDB$INDEX_14_CUSTOM |
+----------------------------+
7 rows in set (0.15 sec)
For additional information and examples, see The ndbinfo dict_obj_tree Table. (Bug #32198754)
ndbinfo Information Database: Added the backup_id table to the ndbinfo information database. This table contains a single column (id) and a single row, in which the column value is the backup ID of the most recent backup of the cluster taken with the ndb_mgm client. If no NDB backups can be found, the value is 0.
Selecting from this table replaces the process of obtaining this information by using the ndb_select_all utility to dump the contents of the internal SYSTAB_0 table, which is error-prone and can require an excessively long time to complete. (Bug #32073640)
Added the status variable Ndb_config_generation, which shows the generation number of the current configuration being used by the cluster. This can be used as an indicator to determine whether the configuration of the cluster has changed. (Bug #32247424)
NDB Cluster now uses the MySQL host_application_signal component service to perform shutdown of SQL nodes. (Bug #30535835, Bug #32004109)
NDB has implemented the following two improvements in calculation of index statistics:
Previously, index statistics were collected from a single fragment only; this is changed such that additional fragments are used for these.
The algorithm used for very small tables, such as those having very few rows where results are discarded, has been improved, so that estimates for such tables should be more accurate than previously.
See NDB API Statistics Counters and Variables for more information.
A number of NDB Cluster programs now support input of the password for encrypting or decrypting an NDB backup from standard input. Changes relating to each program affected are listed here:
For ndb_restore, the --backup-password-from-stdin option introduced in this release enables input of the password in a secure fashion, similar to how it is done by the mysql client's --password option. Use this option together with the
Two options for ndbxfrm,
--decrypt-password-from-stdin, which are also introduced in this release, cause similar behavior when using this program, respectively, to encrypt or to decrypt a backup file.
In addition, you can cause ndb_mgm to use encryption whenever it creates a backup by starting it with --encrypt-backup. In this case, the user is prompted for a password when invoking START BACKUP if none is supplied. This option can also be specified in the [ndb_mgm] section of the
Also, the behavior and syntax of the ndb_mgm management client START BACKUP command are changed slightly, such that it is now possible to use the ENCRYPT option without also specifying PASSWORD. When the user does this, the management client prompts for a password.
For more information, see the descriptions of the NDB Cluster programs and program options just mentioned, as well as Online Backup of NDB Cluster.
mysql-cluster-commercial-server-debug RPM packages were dependent on
instead of mysql-cluster-community-server and mysql-cluster-commercial-server. (Bug #32683923)
Packaging: RPM upgrades from NDB 7.6.15 to 8.0.22 did not succeed due to a file having been moved from the server RPM to the client-plugins RPM. (Bug #32208337)
Linux: On Linux systems, NDB interpreted memory sizes obtained from /proc/meminfo as being supplied in bytes rather than kilobytes. (Bug #102505, Bug #32474829)
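The unit mix-up described above is easy to illustrate outside of NDB. The following Python sketch is not NDB's code; `meminfo_to_bytes` and `SAMPLE` are hypothetical names. It parses /proc/meminfo-style text and scales every value carrying the `kB` suffix to bytes, assuming (as the Linux kernel does for this file) that `kB` means 1024 bytes.

```python
def meminfo_to_bytes(text):
    """Parse /proc/meminfo-style text into a dict of name -> size in bytes.

    Values followed by the "kB" unit are multiplied by 1024; values
    without a unit (plain counters) are returned unchanged.
    """
    sizes = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024  # reported in kilobytes, not bytes
        sizes[name.strip()] = value
    return sizes


# Hypothetical sample in the format used by /proc/meminfo.
SAMPLE = "MemTotal:       16384256 kB\nHugePages_Total:       0\n"
```

Reading the same `MemTotal` line without the scaling step would understate the available memory by a factor of 1024.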
Microsoft Windows: Removed several warnings which were generated when building NDB Cluster on Windows using Microsoft Visual Studio 2019. (Bug #32107056)
NDB failed to start correctly on Windows when initializing the ndb_init(), with the error Failed to find CPU in CPU group.
This issue was due to how Windows works with regard to assigning processes to CPUs: when there are more than 64 logical CPUs on a machine, Windows divides them into different processor groups during boot. Each processor group can at most hold 64 CPUs; by default, a process can be assigned to only one processor group. The function
std::thread::hardware_concurrency() was used to get the maximum number of logical CPUs on the machine, but on Windows, this function returns only the maximum number of logical CPUs present in the processor group with which the current process is affiliated. This value is used to allocate memory for an array that holds hardware information about each CPU on the machine. Since the array held valid memory for CPUs from only one processor group, any attempt to store and retrieve hardware information about a CPU in a different processor group led to out-of-bounds array reads and writes, causing memory corruption and ultimately process failures.
Fixed by using GetActiveProcessorCount() instead of the hardware_concurrency() function referenced previously. (Bug #101347, Bug #32074703)
Solaris: While preparing NDBFS for handling of encrypted backups, activation of O_DIRECT was suspended until after initialization of files was completed. This caused initialization of redo log files to require an excessive amount of time on systems using hard disk drives with
directio is used instead of
directio prior to initialization of files caused a notable increase in time required when using hard disk drives with
Now we ensure that, on systems having O_DIRECT, this is activated before initialization of files, and that, on Solaris, directio continues to be activated after initialization of files. (Bug #32187942)
NDB Cluster APIs: Several NDB API coding examples included in the source did not release all resources allocated. (Bug #31987735)
NDB Cluster APIs: Some internal dictionary objects in NDB used an internal name format which depends on the database name of the Ndb object. This dependency has been made more explicit where necessary and otherwise removed.
Users of the NDB API should be aware that the Dictionary::listObjects() still works in such a way that specifying it as false causes the objects in the list it returns to use fully qualified names. (Bug #31924949)
ndbinfo Information Database: The system variables ndbinfo_table_prefix are intended to be read-only. It was found that it was possible to set mysqld command-line options corresponding to either or both of these; doing so caused the ndbinfo database to malfunction. This fix ensures that it is no longer possible to set either of these variables in the mysql client or from the command line. (Bug #23583256)
In some cases, a query affecting a user with the NDB_STORED_USER privilege could be printed to the MySQL server log without being rewritten. Now such queries are omitted or rewritten to remove any text following the keyword IDENTIFIED. (Bug #32541096)
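The rewriting rule stated above (drop everything after the IDENTIFIED keyword) can be sketched in a few lines of Python. This is only an illustration of the idea, not the server's implementation; `rewrite_for_log` is a hypothetical helper, and the keyword match is deliberately naive (it does not parse SQL, so it would also match the word inside a quoted string).

```python
import re

# Case-insensitive match on the whole word IDENTIFIED.
_IDENTIFIED = re.compile(r"\bIDENTIFIED\b", re.IGNORECASE)


def rewrite_for_log(statement):
    """Return the statement with any text after IDENTIFIED removed.

    Statements that do not contain the keyword are returned unchanged.
    """
    match = _IDENTIFIED.search(statement)
    if match is None:
        return statement
    # Keep everything up to and including the keyword itself.
    return statement[:match.end()]
```

For example, `rewrite_for_log("ALTER USER 'u'@'%' IDENTIFIED BY 'secret'")` keeps the account name but discards the credential clause.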
The value set for the SpinMethod data node configuration parameter was ignored. (Bug #32478388)
The compile-time debug flag DEBUG_FRAGMENT_LOCK was enabled by default. This caused increased resource usage by DBLQH, even for release builds.
This is fixed by disabling DEBUG_FRAGMENT_LOCK by default. (Bug #32459625)
When started on a port which was already in use, ndb_mgmd did not throw any errors since the use of SO_REUSEADDR on Windows platforms allowed multiple sockets to bind to the same address and port.
To take care of this issue, we replace
SO_EXCLUSIVEADDRUSE, which prevents re-use of a port that is already in use. (Bug #32433002)
Encountering an error in detection of an initial system restart of the cluster caused the SQL node to exit prematurely. (Bug #32424580)
The values reported for the from arguments in job buffer full issues were reversed. (Bug #32413686)
Under some situations, when trying to measure the time of a CPU pause, an elapsed time of zero could result. In addition, computing the average for a very fast spin (for example, 100 loops taking less than 100ns) could yield zero nanoseconds. In both cases, this caused the spin calibration algorithm to throw an arithmetic exception due to division by zero.
We fix both issues by modifying the algorithm so that it ignores zero values when computing mean spin time. (Bug #32413458)
References: See also: Bug #32497174.
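A minimal sketch of the zero-ignoring mean described in this fix follows. It is written in Python purely for illustration and does not reflect the actual calibration code; the function names and the loops-per-microsecond derivation are assumptions made for the example.

```python
def mean_spin_time_ns(samples_ns):
    """Mean of the measured spin times, ignoring zero-valued samples.

    Returns None when every sample is zero, so that callers never
    divide by a zero mean.
    """
    nonzero = [s for s in samples_ns if s > 0]
    if not nonzero:
        return None
    return sum(nonzero) / len(nonzero)


def loops_per_microsecond(loops_per_sample, samples_ns):
    """Hypothetical calibration step that divides by the mean spin time."""
    mean_ns = mean_spin_time_ns(samples_ns)
    if mean_ns is None:
        return None  # no usable measurement; caller keeps its default
    return loops_per_sample * 1000 / mean_ns
```

Returning `None` rather than a number when no sample is usable makes the missing-measurement case explicit instead of letting it surface as a division error.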
Table and database names were not formatted correctly in the messages written to the mysqld error log when the internal method Ndb_rep_tab_reader::scan_candidates() found ambiguous matches for a given database, table, or server ID in the ndb_replication table. (Bug #32393245)
Some queries with nested pushed joins were not processed correctly. (Bug #32354817)
When ndb_mgmd allocates a node ID, it reads through the configuration to find a suitable ID, causing a mutex to be held while performing hostname lookups. Because network address resolution can require large amounts of time, it is not considered good practice to hold such a mutex or lock while performing network operations.
This issue is fixed by building a list of configured nodes while holding the mutex, then using the list to perform hostname matching and other logic. (Bug #32294679)
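The pattern behind this fix (copy the data needed while holding the lock, then do the slow network work after releasing it) can be sketched as follows. The Python below is an illustration only; `NodeConfig`, `resolve`, and `find_node_id` are hypothetical names and do not correspond to ndb_mgmd internals.

```python
import socket
import threading


class NodeConfig:
    """Configured nodes (node ID -> host name) guarded by a mutex."""

    def __init__(self, nodes):
        self._lock = threading.Lock()
        self._nodes = dict(nodes)

    def snapshot(self):
        # Hold the lock only long enough to copy the configured nodes.
        with self._lock:
            return list(self._nodes.items())


def resolve(hostname):
    """Resolve a host name to a set of addresses; empty set on failure."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except socket.gaierror:
        return set()


def find_node_id(config, client_ip, resolver=resolve):
    """Match a client address against configured hosts.

    Host name lookups run on the snapshot, after the lock is released.
    """
    for node_id, hostname in config.snapshot():
        if client_ip in resolver(hostname):
            return node_id
    return None
```

Passing the resolver as a parameter keeps the matching logic testable without real DNS lookups.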
The schema distribution participant failed to start a global checkpoint after writing a reply to the ndb_schema_result table, which caused an unnecessary delay before the coordinator received events from the participant notifying it of the result. (Bug #32284873)
The global DNS cache used in ndb_mgmd caused stale lookups when restarting a node on a new machine with a new IP address, which meant that the node could not allocate a node ID.
This issue is addressed by the following changes:
Node ID allocation no longer depends on
DnsCache now uses local scope only
ndb_restore generated a core file when started with unknown or invalid arguments. (Bug #32257374)
Auto-synchronization detected the presence of mock foreign key tables in the NDB dictionary and attempted to re-create them in the MySQL server's data dictionary, although these should remain internal to the NDB Dictionary and not be exposed to the MySQL server. To fix this issue, we now ensure that the NDB Cluster auto-synchronization mechanism ignores any such mock tables. (Bug #32245636)
Improved resource usage associated with handling of cluster configuration data. (Bug #32224672)
Removed left-over debugging printouts from ndb_mgmd showing a client's version number upon connection. (Bug #32210216)
References: This issue is a regression of: Bug #30599413.
The backup abort protocol for handling of node failures did not function correctly for single-threaded data nodes (ndbd). (Bug #32207193)
While retrieving sorted results from a pushed-down join using ORDER BY with the index access method (and without filesort), an SQL node sometimes unexpectedly terminated. (Bug #32203548)
Logging of redo log initialization showed log part indexes rather than log part numbers. (Bug #32200635)
Signal data was overwritten (and lost) due to use of extended signal memory as temporary storage. Now in such cases, extended signal memory is not used in this fashion. (Bug #32195561)
= 1, the default number of partitions per node (shown in ndb_desc output as PartitionCount) was calculated using the lowest number of LDM threads employed by any single live node, and this calculation was done only once, even after data nodes left or joined the cluster, possibly with a new configuration changing the LDM thread count and thus the default partition count. Now in such cases, we make sure the default number of partitions per node is recalculated each time data nodes join or leave the cluster.
This is not an issue in NDB 8.0.23 and later, when ClassicFragmentation is set to 0. (Bug #32183985)
The internal function Ndb_ReloadHWInfo() is responsible for updating hardware information for all the CPUs on the host. For the Linux ARM platform, which does not have Level 3 cache information, this assigned a socket ID for the L3 cache ID but failed to record the value for the global variable num_shared_l3_caches, which is needed when creating lists of CPUs connected to a shared L3 cache. (Bug #32180383)
When trying to run two management nodes on the same host and using the same port number, it was not always obvious to users why they did not start. Now in such cases, in addition to writing a message to the error log, an error message Same port number is specified for management nodes node_id2 (or) they both are using the default port number on same host host_name is also written to the console, making the source of the issue more immediately apparent. (Bug #32175157)
The management server returned the wrong status for host name matching when some of the host names in the configuration did not resolve and a client trying to allocate a node ID connected from a host whose host name resolved to a loopback address. The error returned was Could not alloc node id at <host>:<port>: Connection with id X done from wrong host ip 127.0.0.1, expected <unresolvable_host> (lookup failed).
This caused the connecting client to fail the node ID allocation.
This issue is fixed by rewriting the internal match_hostname() function so that it contains all logic for how the requesting client address should match the configured hostnames, and so that it first checks whether the configured host name can be resolved; if not, it now returns a special error so that the client receives an error indicating that node ID allocation can be retried. The new error is Could not alloc node id at <host>:<port>: No configured host found of node type <type> for connection from ip 127.0.0.1. Some hostnames are currently unresolvable. Can be retried. (Bug #32136993)
The internal function ndb_socket_create_dual_stack() did not close a newly created socket when a call to ndb_setsockopt() was unsuccessful. (Bug #32105957)
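The kind of leak described here, and the usual remedy, can be shown with a small Python sketch. It is an analogy rather than the NDB code: `create_configured_socket` is a hypothetical helper that takes the socket factory and the option-setting step as parameters, and closes the socket whenever the option-setting step fails.

```python
def create_configured_socket(make_socket, configure):
    """Create a socket and apply options, closing it if configuration fails.

    make_socket: callable returning a new socket-like object.
    configure:   callable applying socket options; raises OSError on failure.
    """
    sock = make_socket()
    try:
        configure(sock)
    except OSError:
        # Without this close, the newly created descriptor would leak.
        sock.close()
        raise
    return sock
```

With real sockets, `make_socket` could be `lambda: socket.socket(socket.AF_INET6, socket.SOCK_STREAM)` and `configure` could clear `IPV6_V6ONLY` through `setsockopt()` to request dual-stack behavior.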
The local checkpoint (LCP) mechanism was changed in NDB 7.6 such that it also detected idle fragments—that is, fragments which had not changed since the last LCP and thus required no on-disk metadata update. The LCP mechanism could then immediately proceed to handle the next fragment. When there were a great many such idle fragments, the CPU consumption required merely to loop through these became highly significant, causing latency spikes in user transactions.
A 1 ms delay was already inserted between each such idle fragment being handled. Testing later showed this to be too short an interval, and that we are normally not in as great a hurry to complete these idle fragments as we previously believed.
This fix extends the idle fragment delay time to 20 ms if there are no redo alerts indicating an urgent need to complete the LCP. In case of a low redo alert state we wait 5 ms instead, and for a higher alert state we fall back to the 1 ms delay. (Bug #32068551)
References: See also: Bug #31655158, Bug #31613158.
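The delay selection described in this fix amounts to a small lookup on the redo alert state. The Python sketch below restates it; the `RedoAlert` enumeration and its member names are assumptions made for illustration, and only the three delay values (20 ms, 5 ms, 1 ms) come from the text above.

```python
from enum import IntEnum


class RedoAlert(IntEnum):
    """Hypothetical redo alert levels, ordered by urgency."""
    NONE = 0
    LOW = 1
    HIGH = 2
    CRITICAL = 3


def idle_fragment_delay_ms(alert):
    """Delay inserted between idle fragments during a local checkpoint."""
    if alert == RedoAlert.NONE:
        return 20  # no urgency to complete the LCP
    if alert == RedoAlert.LOW:
        return 5
    return 1  # any higher alert state falls back to the old 1 ms delay
```

The effect is that an idle cluster spends far less CPU looping over unchanged fragments, while a cluster under redo log pressure still completes its LCP quickly.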
NDB table was created, it was invalidated in the global dictionary cache, but this was unnecessary. Furthermore, having a table which exists in the global dictionary cache is actually an advantage for subsequent uses of the new table, since it can be found in the table cache without performing a round trip to NDB. (Bug #32047456)
Two problems occurred when NDB closed a table:
NDB failed to detect when the close was done from FLUSH TABLES, which meant that the NDB table definitions in the global dictionary cache were not invalidated.
When the close was done by a thread which had not used NDB earlier (for example, when RESET MASTER closed instances of ha_ndbcluster held in the table definition cache), a new Thd_ndb object was allocated, even though there is a fallback to the global Ndb object in case the allocation fails, which never occurs in such cases, so it is less wasteful simply to use the global object already provided.
(Bug #32018394, Bug #32357856)
Removed a large number of compiler warnings relating to unused function arguments in
NdbDictionaryImpl. (Bug #31960757)
Unnecessary casts were performed when checking internal error codes. (Bug #31930166)
NDB continued to use file system paths for determining the names of tables to open or perform DDL on, in spite of the fact that it no longer actually uses files for these operations. This required unnecessary translation between character sets, handling the MySQL-specific file system encoding, and parsing. In addition, results of these operations were stored in buffers of fixed size, each instance of which used several hundred bytes of memory unnecessarily. Since the database and table names to use are already available to NDB through other means, this translation could be (and has been) removed in most cases. (Bug #31846478)
Generation of internal statistics relating to NDB object counts was found to lead to an increase in transaction latency at very high rates of transactions per second, brought about by returning an excessive number of freed NDB objects. (Bug #31790329)
NDB behaved unpredictably in response to an attempt to change permissions on a distributed user (that is, a user having the NDB_STORED_USER privilege) during a binary log thread shutdown and restart. We address this issue by ensuring that the user gets a clear warning Could not distribute ACL change to other MySQL servers whenever distribution does not succeed. This fix also improves a number of mysqld log messages. (Bug #31680765)
ndb_restore encountered intermittent errors while replaying backup logs which deleted blob values; this was due to deletion of blob parts when a main table row containing one or more blob values was deleted. This is fixed by modifying ndb_restore to use the asynchronous API for blob deletes, which does not trigger blob part deletes when a blob main table row is deleted (unlike the synchronous API), so that a delete log event for the main table deletes only the row from the main table. (Bug #31546136)
Upgrading to NDB Cluster 8.0 from a prior release includes an upgrade in the schema distribution mechanism, as part of which the ndb_schema table is dropped and recreated in a way which causes all MySQL Servers connected to the cluster to restart their binary log injector threads, causing a gap event to be written to the binary log. Since the thread restart happens at the same time on all MySQL Servers, no binary log spans the time during which the schema distribution functionality upgrade was performed, which breaks NDB Cluster Replication.
This issue is fixed by adding support for gracefully reconstituting the schema distribution tables while allowing the injector thread to continue processing changes from the cluster. This is implemented by handling the DDL event notification for DROP TABLE to turn off support for schema distribution temporarily, and to start regular checks to re-create the tables. When the tables have been successfully created again, the regular checks are turned off and support for schema distribution is turned back on.
In addition, the minimum version required to perform the schema distribution upgrade is raised to 8.0.24, which prevents automatic triggering of the schema distribution upgrade until all connected API nodes support the new upgrade procedure. (Bug #30877233)
References: See also: Bug #30876990.
When a table creation schema transaction is prepared, the table is in TS_CREATING state, and is changed to TS_ACTIVE state when the schema transaction commits on the DBDIH block. In the case where the node acting as DBDIH coordinator fails while the schema transaction is committing, another node starts taking over for the coordinator. The following actions are taken when handling this node failure:
DBDICT rolls the table creation schema transaction forward and commits, resulting in the table involved changing to
DBDIH starts removing the failed node from tables by moving active table replicas on the failed node from a list of stored fragment replicas to another list.
These actions are performed asynchronously many times and, when interleaved, may cause a race condition. As a result, the replica list in which the replica of a failed node resides becomes nondeterministic and may differ between the recovering node (that is, the new coordinator) and other DIH participant nodes. This difference violated a requirement for knowing in which list the failed node's replicas can be found during recovery of the failed node on the other participants.
To fix this, moving active table replicas now covers not only tables in TS_ACTIVE state, but those in TS_CREATING (prepared) state as well, since the prepared schema transaction is always rolled forward.
In addition, the state of a table creation schema transaction which is being aborted is now changed from
TS_DROPPING, to avoid any race condition there. (Bug #30521812)
SNAPSHOTSTART WAIT STARTED could return control to the user prior to the backup's restore point from the user's point of view; that is, the Backup started notification was sent before waiting for the synchronising global checkpoint (GCP) boundary. This meant that transactions committed after receiving the notification might be included in the restored data.
To fix this problem, START BACKUP now sends a notification to the client that the backup has been started only after the GCP has truly started. (Bug #29344262)
Fixed a number of issues uncovered when trying to build NDB with GCC 6. (Bug #25038373)
Calculation of the redo alert state based on redo log usage was overly aggressive, and thus incorrect, when using more than 1 log part per LDM.