MySQL NDB Cluster 8.0.24 is a new release of NDB 8.0, based on MySQL Server 8.0 and including features in version 8.0 of the NDB storage engine, as well as fixing recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in MySQL NDB Cluster 8.0.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.24 (see Changes in MySQL 8.0.24 (2021-04-20, General Availability)).
NDB Cluster APIs: The version of Node.js used by NDB has been upgraded to 12.20.1. (Bug #32356419)
ndbinfo Information Database: Added the dict_obj_tree table to the ndbinfo information database. This table provides information about NDB database objects similar to what is shown by the dict_obj_info table, but presents it in a hierarchical or tree-like fashion that simplifies seeing relationships between objects such as: tables and indexes; tablespaces and data files; log file groups and undo log files.
An example of such a view of a table t1, having a primary key on column a and a unique key on column b, is shown here:

mysql> SELECT indented_name FROM ndbinfo.dict_obj_tree
    -> WHERE root_name = 'test/def/t1';
+----------------------------+
| indented_name              |
+----------------------------+
| test/def/t1                |
|   -> sys/def/13/b          |
|     -> NDB$INDEX_15_CUSTOM |
|   -> sys/def/13/b$unique   |
|     -> NDB$INDEX_16_UI     |
|   -> sys/def/13/PRIMARY    |
|     -> NDB$INDEX_14_CUSTOM |
+----------------------------+
7 rows in set (0.15 sec)

For additional information and examples, see The ndbinfo dict_obj_tree Table. (Bug #32198754)
ndbinfo Information Database: Added the backup_id table to the ndbinfo information database. This table contains a single column (id) and a single row, in which the column value is the backup ID of the most recent backup of the cluster taken with the ndb_mgm client. If no NDB backups can be found, the value is 0.
Selecting from this table replaces the process of obtaining this information by using the ndb_select_all utility to dump the contents of the internal SYSTAB_0 table, which is error-prone and can require an excessively long time to complete. (Bug #32073640)
Added the status variable Ndb_config_generation, which shows the generation number of the current configuration being used by the cluster. This can be used as an indicator to determine whether the configuration of the cluster has changed. (Bug #32247424)
NDB Cluster now uses the MySQL host_application_signal component service to perform shutdown of SQL nodes. (Bug #30535835, Bug #32004109)
NDB has implemented the following two improvements in calculation of index statistics:
Previously, index statistics were collected from a single fragment only; this is changed such that additional fragments are used for these.
The algorithm used for very small tables, such as those having very few rows where results are discarded, has been improved, so that estimates for such tables should be more accurate than previously.
See NDB API Statistics Counters and Variables, for more information. (WL #13144)
A number of NDB Cluster programs now support input of the password for encrypting or decrypting an NDB backup from standard input. Changes relating to each program affected are listed here:
For ndb_restore, the --backup-password-from-stdin option introduced in this release enables input of the password in a secure fashion, similar to how it is done by the mysql client's --password option. Use this option together with the --decrypt option.
ndb_print_backup_file now also supports --backup-password-from-stdin as the long form of the existing -P option.
For ndb_mgm, --backup-password-from-stdin is supported together with --execute "START BACKUP [options]" for starting an encrypted cluster backup from the system shell, and has the same effect.
Two options for ndbxfrm, --encrypt-password-from-stdin and --decrypt-password-from-stdin, which are also introduced in this release, cause similar behavior when using this program, respectively, to encrypt or to decrypt a backup file.
In addition, you can cause ndb_mgm to use encryption whenever it creates a backup by starting it with --encrypt-backup. In this case, the user is prompted for a password when invoking START BACKUP if none is supplied. This option can also be specified in the [ndb_mgm] section of the my.cnf file.
Also, the behavior and syntax of the ndb_mgm management client START BACKUP command are changed slightly, such that it is now possible to use the ENCRYPT option without also specifying PASSWORD. Now when the user does this, the management client prompts the user for a password.
For more information, see the descriptions of the NDB Cluster programs and program options just mentioned, as well as Online Backup of NDB Cluster. (WL #14259)
Packaging: The mysql-cluster-community-server-debug and mysql-cluster-commercial-server-debug RPM packages were dependent on mysql-community-server and mysql-commercial-server, respectively, instead of mysql-cluster-community-server and mysql-cluster-commercial-server. (Bug #32683923)
Packaging: RPM upgrades from NDB 7.6.15 to 8.0.22 did not succeed due to a file having been moved from the server RPM to the client-plugins RPM. (Bug #32208337)
Linux: On Linux systems, NDB interpreted memory sizes obtained from /proc/meminfo as being supplied in bytes rather than kilobytes. (Bug #102505, Bug #32474829)
Microsoft Windows: Removed several warnings which were generated when building NDB Cluster on Windows using Microsoft Visual Studio 2019. (Bug #32107056)
Microsoft Windows: NDB failed to start correctly on Windows when initializing the NDB library with ndb_init(), with the error Failed to find CPU in CPU group.
This issue was due to how Windows works with regard to assigning processes to CPUs: when there are more than 64 logical CPUs on a machine, Windows divides them into different processor groups during boot. Each processor group can hold at most 64 CPUs; by default, a process can be assigned to only one processor group. The function std::thread::hardware_concurrency() was used to get the maximum number of logical CPUs on the machine, but on Windows, this function returns only the maximum number of logical CPUs present in the processor group with which the current process is affiliated. This value was used to allocate memory for an array that holds hardware information about each CPU on the machine. Since the array held valid memory for CPUs from only one processor group, any attempt to store and retrieve hardware information about a CPU in a different processor group led to out-of-bounds array reads and writes, causing memory corruption and, ultimately, process failures.
Fixed by using GetActiveProcessorCount() instead of the hardware_concurrency() function referenced previously. (Bug #101347, Bug #32074703)
Solaris: While preparing NDBFS for handling of encrypted backups, activation of O_DIRECT was suspended until after initialization of files was completed. This caused initialization of redo log files to require an excessive amount of time on systems using hard disk drives with ext3 file systems.
On Solaris, directio is used instead of O_DIRECT; activating directio prior to initialization of files caused a notable increase in time required when using hard disk drives with UFS file systems.
Now we ensure that, on systems having O_DIRECT, this is activated before initialization of files, and that, on Solaris, directio continues to be activated after initialization of files. (Bug #32187942)
NDB Cluster APIs: Several NDB API coding examples included in the source did not release all resources allocated. (Bug #31987735)
NDB Cluster APIs: Some internal dictionary objects in NDB used an internal name format which depends on the database name of the Ndb object. This dependency has been made more explicit where necessary and otherwise removed.
Users of the NDB API should be aware that the fullyQualified argument to Dictionary::listObjects() still works in such a way that specifying it as false causes the objects in the list it returns to use fully qualified names. (Bug #31924949)
ndbinfo Information Database: The system variables ndbinfo_database and ndbinfo_table_prefix are intended to be read-only. It was found that it was possible to set mysqld command-line options corresponding to either or both of these; doing so caused the ndbinfo database to malfunction. This fix ensures that it is no longer possible to set either of these variables in the mysql client or from the command line. (Bug #23583256)
In some cases, a query affecting a user with the NDB_STORED_USER privilege could be printed to the MySQL server log without being rewritten. Now such queries are omitted or rewritten to remove any text following the keyword IDENTIFIED. (Bug #32541096)
The value set for the SpinMethod data node configuration parameter was ignored. (Bug #32478388)
The compile-time debug flag DEBUG_FRAGMENT_LOCK was enabled by default. This caused increased resource usage by DBLQH, even for release builds.
This is fixed by disabling DEBUG_FRAGMENT_LOCK by default. (Bug #32459625)
ndb_mgmd now exits gracefully in the event of a SIGTERM just as it does following a management client SHUTDOWN command. (Bug #32446105)
When started on a port which was already in use, ndb_mgmd did not throw any errors since the use of SO_REUSEADDR on Windows platforms allowed multiple sockets to bind to the same address and port.
To take care of this issue, we replace SO_REUSEADDRPORT with SO_EXCLUSIVEADDRUSE, which prevents re-use of a port that is already in use. (Bug #32433002)
Encountering an error in detection of an initial system restart of the cluster caused the SQL node to exit prematurely. (Bug #32424580)
The values reported for the to and from arguments in job buffer full issues were reversed. (Bug #32413686)
In some situations, when trying to measure the time of a CPU pause, an elapsed time of zero could result. In addition, computing the average for a very fast spin (for example, 100 loops taking less than 100ns) could yield zero nanoseconds. In both cases, this caused the spin calibration algorithm to throw an arithmetic exception due to division by zero.
We fix both issues by modifying the algorithm so that it ignores zero values when computing mean spin time. (Bug #32413458)
References: See also: Bug #32497174.
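The idea behind the spin calibration fix can be illustrated with a short Python sketch that skips zero-valued measurements before averaging. The function name and the choice to return None when no usable sample exists are assumptions made for illustration; the actual calibration code in the data node is not shown in these notes and is likely to differ.

```python
def mean_spin_time_ns(samples_ns):
    """Return the mean of the non-zero samples, or None if there are none.

    Hypothetical helper: zero readings are treated as "too fast to measure"
    and ignored, so they can never end up as a divisor later on.
    """
    nonzero = [s for s in samples_ns if s > 0]
    if not nonzero:
        # Nothing measurable; the caller must handle this case explicitly
        # instead of dividing by a zero mean.
        return None
    return sum(nonzero) / len(nonzero)


# Two of the four made-up readings are zero and are skipped.
average = mean_spin_time_ns([0, 120, 80, 0])
```

Returning a sentinel rather than 0 keeps the "no data" case distinct from a genuine measurement.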
Table and database names were not formatted correctly in the messages written to the mysqld error log when the internal method Ndb_rep_tab_reader::scan_candidates() found ambiguous matches for a given database, table, or server ID in the ndb_replication table. (Bug #32393245)
Some queries with nested pushed joins were not processed correctly. (Bug #32354817)
When ndb_mgmd allocates a node ID, it reads through the configuration to find a suitable ID, causing a mutex to be held while performing hostname lookups. Because network address resolution can require large amounts of time, it is not considered good practice to hold such a mutex or lock while performing network operations.
This issue is fixed by building a list of configured nodes while holding the mutex, then using the list to perform hostname matching and other logic. (Bug #32294679)
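The pattern described here (copy the needed data while holding the mutex, then perform slow lookups without it) can be sketched in Python as follows. All names, the node list, and the example host names are hypothetical; this illustrates the general technique and is not the ndb_mgmd code.

```python
import socket
import threading

# Hypothetical shared configuration guarded by a lock.
config_lock = threading.Lock()
configured_nodes = [
    {"node_id": 1, "hostname": "mgm1.example"},
    {"node_id": 2, "hostname": "data1.example"},
]


def system_resolve(hostname):
    """Resolve a host name to a list of IPv4 addresses (may block)."""
    return socket.gethostbyname_ex(hostname)[2]


def find_node_id(client_ip, resolve=system_resolve):
    """Return the node ID whose configured host resolves to client_ip."""
    # Hold the lock only long enough to copy the configured nodes.
    with config_lock:
        snapshot = [dict(node) for node in configured_nodes]

    # Potentially slow name resolution happens with the lock released.
    for node in snapshot:
        try:
            addresses = resolve(node["hostname"])
        except OSError:
            continue  # unresolvable host name; try the next node
        if client_ip in addresses:
            return node["node_id"]
    return None
```

Passing the resolver as a parameter is a design choice made for this sketch so that the matching logic can be exercised without network access.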
The schema distribution participant failed to start a global checkpoint after writing a reply to the ndb_schema_result table, which caused an unnecessary delay before the coordinator received events from the participant notifying it of the result. (Bug #32284873)
The global DNS cache used in ndb_mgmd caused stale lookups when restarting a node on a new machine with a new IP address, which meant that the node could not allocate a node ID.
This issue is addressed by the following changes:
Node ID allocation no longer depends on LocalDnsCache
DnsCache now uses local scope only
(Bug #32264914)
ndb_restore generated a core file when started with unknown or invalid arguments. (Bug #32257374)
Auto-synchronization detected the presence of mock foreign key tables in the NDB dictionary and attempted to re-create them in the MySQL server's data dictionary, although these should remain internal to the NDB Dictionary and not be exposed to the MySQL server. To fix this issue, we now ensure that the NDB Cluster auto-synchronization mechanism ignores any such mock tables. (Bug #32245636)
Improved resource usage associated with handling of cluster configuration data. (Bug #32224672)
Removed left-over debugging printouts from ndb_mgmd showing a client's version number upon connection. (Bug #32210216)
References: This issue is a regression of: Bug #30599413.
The backup abort protocol for handling of node failures did not function correctly for single-threaded data nodes (ndbd). (Bug #32207193)
While retrieving sorted results from a pushed-down join using ORDER BY with the index access method (and without filesort), an SQL node sometimes unexpectedly terminated. (Bug #32203548)
Logging of redo log initialization showed log part indexes rather than log part numbers. (Bug #32200635)
Signal data was overwritten (and lost) due to use of extended signal memory as temporary storage. Now in such cases, extended signal memory is not used in this fashion. (Bug #32195561)
When ClassicFragmentation = 1, the default number of partitions per node (shown in ndb_desc output as PartitionCount) is calculated using the lowest number of LDM threads employed by any single live node; this calculation was done only once, even after data nodes left or joined the cluster, possibly with a new configuration changing the LDM thread count and thus the default partition count. Now in such cases, we make sure the default number of partitions per node is recalculated each time data nodes join or leave the cluster.
This is not an issue in NDB 8.0.23 and later, when ClassicFragmentation is set to 0. (Bug #32183985)
The internal function Ndb_ReloadHWInfo() is responsible for updating hardware information for all the CPUs on the host. For the Linux ARM platform, which does not have Level 3 cache information, this assigned a socket ID for the L3 cache ID but failed to record the value for the global variable num_shared_l3_caches, which is needed when creating lists of CPUs connected to a shared L3 cache. (Bug #32180383)
When trying to run two management nodes on the same host and using the same port number, it was not always obvious to users why they did not start. Now in such cases, in addition to writing a message to the error log, an error message Same port number is specified for management nodes node_id1 and node_id2 (or) they both are using the default port number on same host host_name is also written to the console, making the source of the issue more immediately apparent. (Bug #32175157)
Added a --cluster-config-suffix option for ndb_mgmd and ndb_config, for use in internal testing to override a defaults group suffix. (Bug #32157276)
The management server returned the wrong status for host name matching when some of the host names in the configuration did not resolve and a client trying to allocate a node ID connected from a host whose host name resolved to a loopback address. The error returned was Could not alloc node id at <host>:<port>: Connection with id X done from wrong host ip 127.0.0.1, expected <unresolvable_host> (lookup failed).
This caused the connecting client to fail the node ID allocation.
This issue is fixed by rewriting the internal match_hostname() function so that it contains all logic for how the requesting client address should match the configured hostnames, and so that it first checks whether the configured host name can be resolved; if not, it now returns a special error so that the client receives an error indicating that node ID allocation can be retried. The new error is Could not alloc node id at <host>:<port>: No configured host found of node type <type> for connection from ip 127.0.0.1. Some hostnames are currently unresolvable. Can be retried. (Bug #32136993)
The internal function ndb_socket_create_dual_stack() did not close a newly created socket when a call to ndb_setsockopt() was unsuccessful. (Bug #32105957)
The local checkpoint (LCP) mechanism was changed in NDB 7.6 such that it also detected idle fragments—that is, fragments which had not changed since the last LCP and thus required no on-disk metadata update. The LCP mechanism could then immediately proceed to handle the next fragment. When there were a great many such idle fragments, the CPU consumption required merely to loop through these became highly significant, causing latency spikes in user transactions.
A 1 ms delay was already inserted between each such idle fragment being handled. Testing later showed this to be too short an interval, and that we are normally not in as great a hurry to complete these idle fragments as we previously believed.
This fix extends the idle fragment delay time to 20 ms if there are no redo alerts indicating an urgent need to complete the LCP. In case of a low redo alert state we wait 5 ms instead, and for a higher alert state we fall back to the 1 ms delay. (Bug #32068551)
References: See also: Bug #31655158, Bug #31613158.
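The delay selection described for idle fragments can be summarized in a small Python sketch. The enumeration and function names are hypothetical; only the three delay values (20 ms, 5 ms, and 1 ms) come from the description above.

```python
from enum import Enum


class RedoAlert(Enum):
    """Hypothetical alert levels; the real state names are not given here."""
    NONE = 0
    LOW = 1
    HIGH = 2


def idle_fragment_delay_ms(alert):
    """Return the delay between idle fragments for a given redo alert state."""
    if alert is RedoAlert.NONE:
        return 20  # no urgency to complete the LCP
    if alert is RedoAlert.LOW:
        return 5   # low redo alert state
    return 1       # any higher alert state falls back to the old 1 ms delay
```

For example, idle_fragment_delay_ms(RedoAlert.LOW) returns 5, while any state above LOW yields the original 1 ms delay.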
When an NDB table was created, it was invalidated in the global dictionary cache, but this was unnecessary. Furthermore, having a table which exists in the global dictionary cache is actually an advantage for subsequent uses of the new table, since it can be found in the table cache without performing a round trip to NDB. (Bug #32047456)
No clear error message was provided when an ndb_mgmd process tried to start using the PortNumber of a port that was already in use. (Bug #32045786)
Two problems occurred when NDB closed a table:
NDB failed to detect when the close was done from FLUSH TABLES, which meant that the NDB table definitions in the global dictionary cache were not invalidated.
When the close was done by a thread which had not used NDB earlier (for example, when FLUSH TABLES or RESET MASTER closed instances of ha_ndbcluster held in the table definition cache), a new Thd_ndb object was allocated, even though there is a fallback to the global Ndb object in case the allocation fails, which never occurs in such cases, so it is less wasteful simply to use the global object already provided.
(Bug #32018394, Bug #32357856)
Removed a large number of compiler warnings relating to unused function arguments in NdbDictionaryImpl. (Bug #31960757)
Unnecessary casts were performed when checking internal error codes. (Bug #31930166)
NDB continued to use file system paths for determining the names of tables to open or perform DDL on, in spite of the fact that it no longer actually uses files for these operations. This required unnecessary translation between character sets, handling the MySQL-specific file system encoding, and parsing. In addition, results of these operations were stored in buffers of fixed size, each instance of which used several hundred bytes of memory unnecessarily. Since the database and table names to use are already available to NDB through other means, this translation could be (and has been) removed in most cases. (Bug #31846478)
Generation of internal statistics relating to NDB object counts was found to lead to an increase in transaction latency at very high rates of transactions per second, brought about by returning an excessive number of freed NDB objects. (Bug #31790329)
NDB behaved unpredictably in response to an attempt to change permissions on a distributed user (that is, a user having the NDB_STORED_USER privilege) during a binary log thread shutdown and restart. We address this issue by ensuring that the user gets a clear warning Could not distribute ACL change to other MySQL servers whenever distribution does not succeed. This fix also improves a number of mysqld log messages. (Bug #31680765)
ndb_restore encountered intermittent errors while replaying backup logs which deleted blob values; this was due to deletion of blob parts when a main table row containing one or more blob values was deleted. This is fixed by modifying ndb_restore to use the asynchronous API for blob deletes, which does not trigger blob part deletes when a blob main table row is deleted (unlike the synchronous API), so that a delete log event for the main table deletes only the row from the main table. (Bug #31546136)
When a table creation schema transaction is prepared, the table is in TS_CREATING state, and is changed to TS_ACTIVE state when the schema transaction commits on the DBDIH block. In the case where the node acting as DBDIH coordinator fails while the schema transaction is committing, another node starts taking over for the coordinator. The following actions are taken when handling this node failure:
DBDICT rolls the table creation schema transaction forward and commits, resulting in the table involved changing to TS_ACTIVE state.
DBDIH starts removing the failed node from tables by moving active table replicas on the failed node from a list of stored fragment replicas to another list.
These actions are performed asynchronously many times, and, when interleaved, may cause a race condition. As a result, the replica list in which the replica of a failed node resides becomes nondeterministic and may differ between the recovering node (that is, the new coordinator) and other DIH participant nodes. This difference violated a requirement for knowing in which list the failed node's replicas can be found during recovery of the failed node on the other participants.
To fix this, moving active table replicas now covers not only tables in TS_ACTIVE state, but those in TS_CREATING (prepared) state as well, since the prepared schema transaction is always rolled forward.
In addition, the state of a table creation schema transaction which is being aborted is now changed from TS_CREATING or TS_IDLE to TS_DROPPING, to avoid any race condition there. (Bug #30521812)
START BACKUP SNAPSHOTSTART WAIT STARTED could return control to the user prior to the backup's restore point from the user point of view; that is, the Backup started notification was sent before waiting for the synchronising global checkpoint (GCP) boundary. This meant that transactions committed after receiving the notification might be included in the restored data.
To fix this problem, START BACKUP now sends a notification to the client that the backup has been started only after the GCP has truly started. (Bug #29344262)
Upgrading to NDB Cluster 8.0 from a prior release includes an upgrade in the schema distribution mechanism, as part of which the ndb_schema table is dropped and recreated in a way which causes all MySQL Servers connected to the cluster to restart their binary log injector threads, causing a gap event to be written to the binary log. Since the thread restart happens at the same time on all MySQL Servers, no binary log spans the time during which the schema distribution functionality upgrade was performed, which breaks NDB Cluster Replication.
This issue is fixed by adding support for gracefully reconstituting the schema distribution tables while allowing the injector thread to continue processing changes from the cluster. This is implemented by handling the DDL event notification for DROP TABLE to turn off support for schema distribution temporarily, and to start regular checks to re-create the tables. When the tables have been successfully created again, the regular checks are turned off and support for schema distribution is turned back on.
NDB also now detects automatically when the ndb_apply_status table has been dropped and re-creates it. The drop and re-creation leaves a gap event in the binary log, which in a replication setup causes the replica MySQL Server to stop applying changes from the source until the replication channel is restarted (see ndb_apply_status Table).
In addition, the minimum version required to perform the schema distribution upgrade is raised to 8.0.24, which prevents automatic triggering of the schema distribution upgrade until all connected API nodes support the new upgrade procedure.
For more information, see NDB Cluster Replication Schema and Tables. (Bug #27697409, Bug #30877233)
References: See also: Bug #30876990.
Fixed a number of issues uncovered when trying to build NDB with GCC 6. (Bug #25038373)
Calculation of the redo alert state based on redo log usage was overly aggressive, and thus incorrect, when using more than 1 log part per LDM.