MySQL NDB Cluster 8.0.19 is a new release of NDB 8.0, based on
MySQL Server 8.0 and including features in version 8.0 of the
NDB storage engine, as well as fixing
recently discovered bugs in previous NDB Cluster releases.
Obtaining NDB Cluster 8.0. NDB Cluster 8.0 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
For an overview of changes made in NDB Cluster 8.0, see What is New in NDB Cluster.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 8.0 through MySQL 8.0.19 (see Changes in MySQL 8.0.19 (2020-01-13, General Availability)).
Important Change: The default value for the ndb_autoincrement_prefetch_sz server system variable has been increased to 512. (Bug #30316314)
NDB now supports more than 2 fragment replicas (up to a maximum of 4). Setting NoOfReplicas=4 is now fully covered in our internal testing and thus supported for use in production. (Bug #97479, Bug #97579, Bug #25261716, Bug #30501414, Bug #30528105)
Important Change: Added the TransactionMemory data node configuration parameter, which simplifies configuration of data node memory allocation for transaction operations. This is part of ongoing work on pooling of transactional and Local Data Manager (LDM) memory.
Certain parameters are incompatible with TransactionMemory and cannot be set in the config.ini configuration file if this parameter has been set.
If you attempt to set any of these incompatible parameters concurrently with
TransactionMemory, the cluster management server cannot start.
For more information, see the description of the TransactionMemory parameter and Parameters incompatible with TransactionMemory. See also Data Node Memory Management, for information about how memory resources are allocated by NDB Cluster data nodes. (Bug #96995, Bug #30344471)
Important Change: The maximum or default values for several NDB Cluster data node configuration parameters have been changed in this release. These changes are listed here:
The maximum value for DataMemory is increased from 1 terabyte to 16 TB.
The maximum value for DiskPageBufferMemory is also increased from 1 TB to 16 TB.
The default value for StringMemory is decreased to 5 percent. Previously, this was 25 percent.
The default value for LcpScanProgressTimeout is increased from 60 seconds to 180 seconds.
Performance: Read from any fragment replica, which greatly improves the performance of table reads at a very low cost to table write performance, is now enabled by default for all NDB tables. This means both that the default value for the ndb_read_backup system variable is now ON, and that the value of READ_BACKUP is 1 when creating a new NDB table. (Previously, the default values were OFF and 0, respectively.)
NDB Disk Data: The latency of checkpoints for Disk Data files has been reduced when using non-volatile memory devices such as solid-state drives (especially those using NVMe for data transfer), separate physical drives for Disk Data files, or both. As part of this work, two new data node configuration parameters, listed here, have been introduced:
MaxDiskDataLatency sets a maximum on allowed latency for disk access, aborting transactions exceeding this amount of time to complete.
DiskDataUsingSameDisk makes it possible to take advantage of keeping Disk Data files on separate disks by increasing the rate at which Disk Data checkpoints can be made.
This release also adds three new tables to the ndbinfo database. These tables, listed here, can assist with performance monitoring of Disk Data checkpointing:
diskstat provides information about Disk Data tablespace reads, writes, and page requests during the previous 1 second.
diskstats_1sec provides information similar to that given by the diskstat table, but does so for each of the last 20 seconds.
pgman_time_track_stats reports on the latency of disk operations affecting Disk Data tablespaces.
For additional information, see Disk Data latency parameters.
Added the ndb_metadata_sync server system variable, which simplifies knowing when metadata synchronization has completed successfully. Setting this variable to true triggers immediate synchronization of all changes between the NDB dictionary and the MySQL data dictionary without regard to any values set for ndb_metadata_check_interval. When synchronization has completed, its value is automatically reset to false. (Bug #30406657)
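The set-then-poll behavior described for ndb_metadata_sync can be sketched as follows. This is a minimal illustration, not an official client: it assumes a DB-API 2.0 style cursor (for example one obtained from MySQL Connector/Python) on a connection with sufficient privileges to set global variables, and the function name, polling interval, and timeout are all hypothetical choices.

```python
import time


def trigger_metadata_sync(cursor, poll_interval=0.5, timeout=60.0):
    """Request an immediate metadata sync, then wait for the server to reset the flag.

    `cursor` is assumed to be a DB-API 2.0 cursor connected to an SQL node.
    Returns True if the variable was seen reset within `timeout` seconds.
    """
    # Setting the variable triggers synchronization regardless of
    # ndb_metadata_check_interval.
    cursor.execute("SET GLOBAL ndb_metadata_sync = ON")

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        cursor.execute("SELECT @@GLOBAL.ndb_metadata_sync")
        (value,) = cursor.fetchone()
        if int(value) == 0:
            # The server resets the variable once synchronization completes.
            return True
        time.sleep(poll_interval)
    return False
```

A return value of False only means the flag was still set when the (arbitrary) timeout expired; it does not by itself indicate that synchronization failed.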
Added the DedicatedNode parameter for data nodes, API nodes, and management nodes. When set to true, this parameter prevents the management server from handing out this node's node ID to any node that does not request it specifically. Intended primarily for testing, this parameter may be useful in cases in which multiple management servers are running on the same host, and using the host name alone is not sufficient for distinguishing among processes of the same type. (Bug #91406, Bug #28239197)
A stack trace is now written to the data node log on abnormal termination of a data node.
Automatic synchronization of metadata from the MySQL data dictionary to NDB now includes databases containing NDB tables. With this enhancement, if a table exists in NDB, and the table and the database it belongs to do not exist on a given SQL node, it is no longer necessary to create the database manually. Instead, the database, along with all NDB tables belonging to this database, should be created on the SQL node automatically.
Incompatible Change: ndb_restore no longer restores shared users and grants to the mysql.ndb_sql_metadata table by default. A new command-line option --include-stored-grants is added to override this behavior and enable restoring of shared user and grant data and metadata.
As part of this fix, ndb_restore can now also correctly handle an ordered index on a system table. (Bug #30237657)
References: See also: Bug #29534239, Bug #30459246.
Incompatible Change: The minimum value for the RedoOverCommitCounter data node configuration parameter has been increased from 0 to 1. The minimum value for the RedoOverCommitLimit data node configuration parameter has also been increased from 0 to 1.
You should check the cluster global configuration file and make any necessary adjustments to values set for these parameters before upgrading. (Bug #29752703)
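One way to perform the pre-upgrade check mentioned above is a small script that scans the text of the global configuration file for either parameter set below the new minimum. The sketch below is an illustration only: the function name is hypothetical, it assumes `name=value` assignments with `#` comments, treats parameter names as case-insensitive, and ignores values that are not plain integers.

```python
def find_low_redo_overcommit(config_text,
                             params=("RedoOverCommitCounter", "RedoOverCommitLimit"),
                             minimum=1):
    """Return (line_number, parameter, value) for each setting below `minimum`."""
    wanted = {p.lower(): p for p in params}
    problems = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        # Drop anything after a '#' comment marker, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # section headers, blank lines, comment-only lines
        name, value = (part.strip() for part in line.split("=", 1))
        canonical = wanted.get(name.lower())
        if canonical is None:
            continue
        try:
            number = int(value)
        except ValueError:
            continue  # not a plain integer; leave for manual review
        if number < minimum:
            problems.append((lineno, canonical, number))
    return problems


if __name__ == "__main__":
    sample = "[ndbd default]\nNoOfReplicas=2\nRedoOverCommitCounter=0\n"
    for lineno, name, value in find_low_redo_overcommit(sample):
        print(f"line {lineno}: {name}={value} is below the new minimum of 1")
```

In practice the text would be read from the cluster's actual config.ini; the inline sample here exists only to keep the sketch self-contained.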
macOS: On macOS, SQL nodes sometimes shut down unexpectedly during the binary log setup phase when starting the cluster. This occurred when there existed schemas whose names used uppercase letters and lower_case_table_names was set to 2. This caused acquisition of metadata locks to be attempted using keys having the incorrect lettercase, and, subsequently, these locks to fail. (Bug #30192373)
Microsoft Windows; NDB Disk Data: On Windows, restarting a data node other than the master when using Disk Data tables led to a failure in
TSMAN. (Bug #97436, Bug #30484272)
Solaris: When debugging, ndbmtd consumed all available swap space on Solaris 11.4 SRU 12 and later. (Bug #30446577)
Solaris: The byte order used for numeric values stored in the mysql.ndb_sql_metadata table was incorrect on Solaris/Sparc. This could be seen when using ndb_select_all or ndb_restore.
NDB Disk Data: After dropping a disk data table on one SQL node, trying to execute a query against INFORMATION_SCHEMA.FILES on a different SQL node stalled at Waiting for tablespace metadata lock. (Bug #30152258)
References: See also: Bug #29871406.
NDB Disk Data: ALTER TABLESPACE ... ADD DATAFILE could sometimes hang while trying to acquire a metadata lock. (Bug #29871406)
NDB Disk Data: Compatibility code for the Version 1 disk format used prior to the introduction of the Version 2 format in NDB 7.6 turned out not to be necessary, and is no longer used.
Work done in NDB 8.0.18 to allow more nodes introduced long signal variants of several signals taking a bitmask as one of their arguments, and we started using these new long signal variants even if the previous (still supported) short variants would have been sufficient. This introduced several new opportunities for hitting out of LongMessageBuffer errors.
To avoid this, now in such cases we use the short signal variants wherever possible. Some of the signals affected include
START_LCP_REQ. (Bug #30708009)
References: See also: Bug #30707970.
The fix made in NDB 8.0.18 for an issue in which a transaction was committed prematurely aborted the transaction if the table definition had changed midway, but failed in testing to free memory allocated by
getExtraMetadata(). Now this memory is properly freed before aborting the transaction. (Bug #30576983)
References: This issue is a regression of: Bug #29911440.
Excessive allocation of attribute buffer when initializing data in DBTC led to preallocation of API connection records failing due to unexpectedly running out of memory. (Bug #30570264)
Failure of a transaction during execution of an ALTER TABLE ... ALGORITHM=COPY statement following the rename of the new table to the name of the original table but before dropping the original table caused mysqld to exit prematurely. (Bug #30548209)
Non-MSI builds on Windows using -DWITH_NDBCLUSTER did not succeed unless the WiX toolkit was installed. (Bug #30536837)
The allowed_values output from ndb_config for the Arbitration data node configuration parameter in NDB 8.0.18 was not consistent with that obtained in previous releases. (Bug #30529220)
References: See also: Bug #30505003.
An ndbrequire() introduced when implementing partial local checkpoints assumed that m_participatingLQH must be clear when receiving START_LCP_REQ, which is not necessarily true when a failure happens for the master after sending START_LCP_REQ and before handling any START_LCP_CONF signals. (Bug #30523457)
A local checkpoint sometimes hung when the master node failed while sending an LCP_COMPLETE_REP signal and it was sent to some nodes, but not all of them. (Bug #30520818)
The management server did not handle all cases of NODE_FAILREP correctly. (Bug #30520066)
With SharedGlobalMemory set to 0, some resources did not meet required minimums. (Bug #30411835)
When writing the schema operation into the ndb_schema table failed, the states in the NDB_SCHEMA object were not cleared, which led to the SQL node shutting down when it tried to free the object. (Bug #30402362)
References: See also: Bug #30371590.
When synchronizing extent pages it was possible for the current local checkpoint (LCP) to stall indefinitely if a CONTINUEB signal for handling the LCP was still outstanding when receiving the FSWRITECONF signal for the last page written in the extent synchronization page. The LCP could also be restarted if another page was written from the data pages. It was also possible that this issue caused PREP_LCP pages to be written at times when they should not have been. (Bug #30397083)
If a transaction was aborted while getting a page from the disk page buffer and the disk system was overloaded, the transaction hung indefinitely. This could also cause restarts to hang and node failure handling to fail. (Bug #30397083, Bug #30360681)
References: See also: Bug #30152258.
Data node failures with the error Another node failed during system restart... occurred during a partial restart. (Bug #30368622)
Automatic synchronization could potentially trigger an increase in the number of locks being taken on a particular metadata object at a given time, such as when a synchronization attempt coincided with a DDL or DML statement involving the same metadata object; competing locks could lead to the NDB deadlock detection logic penalizing the user action rather than the background synchronization. We fix this by changing all exclusive metadata lock acquisition attempts during auto-synchronization so that they use a timeout of 0 (rather than the 10 seconds previously allowed), which avoids deadlock detection and gives priority to the user action. (Bug #30358470)
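The effect of a zero timeout can be illustrated with an ordinary try-lock. The sketch below is not NDB code: Python's `threading.Lock` stands in for an exclusive metadata lock, and the function name is hypothetical. It only shows why a background task that never waits cannot end up competing with a user statement for the lock.

```python
import threading


def try_background_sync(metadata_lock, do_sync):
    """Run `do_sync` only if the lock can be taken without waiting.

    A non-blocking acquire models a timeout of 0: if a user statement
    already holds the lock, the background task backs off at once and
    leaves the work for a later round of change detection.
    """
    if not metadata_lock.acquire(blocking=False):
        return False  # user action has priority; retry later
    try:
        do_sync()
        return True
    finally:
        metadata_lock.release()
```

With the previous 10-second wait, the background task would instead have queued behind the user statement, which is the situation in which deadlock detection could end up penalizing the user action.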
If a SYNC_EXTENT_PAGES_REQ signal was received by PGMAN while dropping a log file group as part of a partial local checkpoint, and thus dropping the page locked by this block for processing next, the LCP terminated due to trying to access the page after it had already been dropped. (Bug #30305315)
The wrong number of bytes was reported in the cluster log for a completed local checkpoint. (Bug #30274618)
References: See also: Bug #29942998.
A mysqld trying to connect to the cluster, and thus trying to acquire the global schema lock (GSL) during setup, ignored the setting for ndb-wait-setup and hung indefinitely when the GSL had already been acquired by another mysqld, such as when it was executing an ALTER TABLE statement. (Bug #30242141)
When a table containing a self-referential foreign key (in other words, a foreign key referencing another column of the same table) was altered using the COPY algorithm, the foreign key definition was removed. (Bug #30233405)
In MySQL 8.0, names of foreign keys not explicitly provided by the user are generated automatically in the SQL layer and stored in the data dictionary. Such names align with those generated by the InnoDB storage engine in MySQL 5.7. NDB 8.0.18 introduced a change in behavior by NDB such that it also uses the generated names, but in some cases, such as when tables were renamed, NDB still generated and used its own format for such names internally rather than those generated by the SQL layer and stored in the data dictionary, which led to a number of issues.
NDB now also renames the foreign keys in such cases, using the names provided by the MySQL server, to align fully with those used by InnoDB. (Bug #30210839)
References: See also: Bug #96508, Bug #30171959.
When a table referenced by a foreign key was renamed, participating SQL nodes did not properly update the foreign key definitions for the referencing table in their data dictionaries during schema distribution. (Bug #30191068)
Data node handling of failures of other data nodes could sometimes not be synchronized properly, such that two or more data nodes could see different nodes as the master node. (Bug #30188414)
Some scan operations failed due to the presence of an old assert in DbtupBuffer.cpp that checked whether API nodes were using a version of the software previous to NDB 6.4. This was no longer necessary or correct, and has been removed. (Bug #30188411)
When executing a global schema lock (GSL), NDB used a single Ndb_table_guard object for successive retries when attempting to obtain a table object reference; it was not possible for this to succeed after failing on the first attempt, since Ndb_table_guard assumes that the underlying object pointer is determined once only (at initialization), with the previously retrieved pointer being returned from a cached reference thereafter.
This resulted in infinite waits to obtain the GSL, causing the binlog injector thread to hang so that mysqld considered all NDB tables to be read-only. To avoid this problem, NDB now uses a fresh instance of Ndb_table_guard for each such retry. (Bug #30120858)
References: This issue is a regression of: Bug #30086352.
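The retry problem described in this entry can be reproduced in miniature. The sketch below is purely illustrative: `CachingGuard` is a hypothetical stand-in for a guard object that resolves its reference once at construction, and it is not the actual Ndb_table_guard implementation.

```python
class CachingGuard:
    """Hypothetical guard: resolves its reference once, then returns the cached result."""

    def __init__(self, lookup):
        # Determined once only, at initialization, even if the lookup failed.
        self._ref = lookup()

    def get(self):
        return self._ref


def acquire_with_shared_guard(lookup, retries):
    """Old pattern: one guard reused for every retry, so a first failure is permanent."""
    guard = CachingGuard(lookup)
    for _ in range(retries):
        ref = guard.get()
        if ref is not None:
            return ref
    return None


def acquire_with_fresh_guard(lookup, retries):
    """Fixed pattern: a new guard per retry, so each attempt performs a real lookup."""
    for _ in range(retries):
        ref = CachingGuard(lookup).get()
        if ref is not None:
            return ref
    return None
```

When the lookup fails once and then succeeds, the shared-guard variant keeps returning the cached failure no matter how many retries are allowed, while the fresh-guard variant succeeds on the second attempt.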
When upgrading an SQL node to NDB 8.0 from a previous release series, the .frm file whose contents are read and then installed in the data dictionary does not contain any information about foreign keys. This meant that foreign key information was not installed in the SQL node's data dictionary. This is fixed by using the foreign key information available in the NDB data dictionary to update the local MySQL data dictionary during table metadata upgrade. (Bug #30071043)
Restoring tables with the --disable-indexes option resulted in the wrong table definition being installed in the MySQL data dictionary. This is because the serialized dictionary information (SDI) packed into the NDB dictionary's table definition is used to create the table object; the SDI definition is updated only when the DDL change is done through the MySQL server. Installation of the wrong table definition meant that the table could not be opened until the indexes were re-created in the NDB dictionary.
This is fixed by extending auto-synchronization such that it compares the SDI to the NDB dictionary table information and fails in cases in which the column definitions do not match. Mismatches involving indexes only are treated as temporary errors, with the table in question being detected again during the next round of change detection. (Bug #30000202, Bug #30414514)
Restoring tables for which MAX_ROWS was used to alter partitioning from a backup made from NDB 7.4 to a cluster running NDB 7.6 did not work correctly. This is fixed by ensuring that the upgrade code handling PartitionBalance supplies a valid table specification to the NDB dictionary. (Bug #29955656)
The number of data bytes for the summary event written in the cluster log when a backup completed was truncated to 32 bits, so that there was a significant mismatch between the number of log records and the number of data records printed in the log for this event. (Bug #29942998)
References: See also: Bug #29192097.
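The effect of truncating a byte count to 32 bits is easy to see with a small calculation. The following sketch uses an arbitrary 5 GiB figure chosen for illustration; it is not taken from the bug report.

```python
def truncate_to_uint32(value):
    """Keep only the low 32 bits, as happens when a 64-bit count is stored in 32 bits."""
    return value & 0xFFFFFFFF


data_bytes = 5 * 1024**3                    # 5 GiB actually written
reported = truncate_to_uint32(data_bytes)   # what a 32-bit field would hold

print(f"actual:   {data_bytes} bytes")
print(f"reported: {reported} bytes")        # only 1 GiB survives the truncation
```

Any count of 4 GiB or more wraps around in this way, which is why the reported figure could be far smaller than the amount of data actually backed up.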
When an SQL node connected to NDB, it did not know whether it had previously connected to that cluster, and thus could not determine whether its data dictionary information was merely out of date, or completely invalid. This issue is solved by adding a unique schema version identifier (schema UUID) to NDB as well as to the ndb_schema table object in the data dictionary. Now, whenever a mysqld connects to a cluster as an SQL node, it can compare the schema UUID stored in its data dictionary against that which is stored in the ndb_schema table, and so know whether it is connecting for the first time. If so, the SQL node removes any entries that may be in its data dictionary. (Bug #29894166)
References: See also: Bug #27543602.
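The comparison described in this entry amounts to a simple identity check on connect. The sketch below models it with plain Python dictionaries; the data layout, the function name, and the step that records the cluster's UUID locally afterwards are all assumptions made for illustration, not a description of the server's internals.

```python
def reconcile_on_connect(local_dd, cluster_schema_uuid):
    """Compare the locally stored schema UUID with the cluster's.

    `local_dd` is a hypothetical dict with keys 'schema_uuid' and 'tables'.
    Returns True if this looks like a first connection to the cluster.
    """
    first_connection = local_dd.get("schema_uuid") != cluster_schema_uuid
    if first_connection:
        # Local entries cannot be trusted for this cluster: remove them.
        local_dd["tables"] = {}
        # Assumption: remember the cluster's UUID for subsequent connections.
        local_dd["schema_uuid"] = cluster_schema_uuid
    return first_connection
```

On a later reconnect with a matching UUID, nothing is removed, and any differences are left to ordinary metadata synchronization.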
Improved log messages generated by table discovery and table metadata upgrades. (Bug #29894127)
Using 2 LDM threads on a 2-node cluster with 10 threads per node could result in a partition imbalance, such that one of the LDM threads on each node was the primary for zero fragments. Trying to restore a multi-threaded backup from this cluster failed because the datafile for one LDM contained only the 12-byte data file header, which ndb_restore was unable to read. The same problem could occur in other cases, such as when taking a backup immediately after adding an empty node online.
It was found that this occurred when ODirect was enabled for an EOF backup data file write whose size was less than 512 bytes and the backup was in the STOPPING state. This normally occurs only for an aborted backup, but could also happen for a successful backup for which an LDM had no fragments. We fix the issue by introducing an additional check to ensure that writes are skipped only if the backup actually contains an error which should cause it to abort. (Bug #29892660)
References: See also: Bug #30371389.
ndb_restore failed in testing on 32-bit platforms. This issue is fixed by increasing the size of the thread stack used by this tool from 64 KB to 128 KB. (Bug #29699887)
References: See also: Bug #30406046.
An unplanned shutdown of the cluster occurred due to an error in DBTUP while deleting rows from a table following an online upgrade. (Bug #29616383)
In some cases the SignalSender class, used as part of the implementation of ndb_mgmd and ndbinfo, buffered excessive numbers of unneeded API_REGCONF signals, leading to unnecessary consumption of memory. (Bug #29520353)
References: See also: Bug #20075747, Bug #29474136.
The setting for the BackupLogBufferSize configuration parameter was not honored. (Bug #29415012)
Running mysqld with the --upgrade=FORCE option produced the following messages:
[Warning] Table 'mysql.ndb_apply_status' requires repair.
[ERROR] Table 'mysql.ndb_apply_status' repair failed.
This was because --upgrade=FORCE causes a bootstrap system thread to run CHECK TABLE FOR UPGRADE, but ha_ndbcluster::open() refused to open the table before schema synchronization had completed, which eventually led to the reported conditions. (Bug #29305977)
References: See also: Bug #29205142.
When using explicit SHM connections, with ShmSize set to a value larger than the system's available shared memory, mysqld hung indefinitely on startup and produced no useful error messages. (Bug #28875553)
The maximum global checkpoint (GCP) commit lag and GCP save timeout are recalculated whenever a node shuts down, to take into account the change in number of data nodes. This could lead to the unintentional shutdown of a viable node when the threshold decreased below the previous value. (Bug #27664092)
References: See also: Bug #26364729.
A transaction which inserts a child row may run concurrently with a transaction which deletes the parent row for that child. One of the transactions should be aborted in this case, lest an orphaned child row result.
Before committing an insert on a child row, a read of the parent row is triggered to confirm that the parent exists. Similarly, before committing a delete on a parent row, a read or scan is performed to confirm that no child rows exist. When insert and delete transactions were run concurrently, their prepare and commit operations could interact in such a way that both transactions committed. This occurred because the triggered reads were performed using LM_CommittedRead locks (see NdbOperation::LockMode), which are not strong enough to prevent such error scenarios.
This problem is fixed by using the stronger LM_SimpleRead lock mode for both triggered reads. The use of LM_SimpleRead locks ensures that at least one transaction aborts in every possible scenario involving transactions which concurrently insert into child rows and delete from parent rows. (Bug #22180583)
Failure handling in schema synchronization involves pushing warnings and errors to the binary logging thread. Schema synchronization is also retried in case of certain failures, which could lead to an accumulation of warnings in the thread. Now such warnings and errors are cleared following each attempt at schema synchronization. (Bug #2991036)
An INCL_NODECONF signal from any local blocks should be ignored when a node has failed, except in order to reset c_nodeStartSlave.nodeId. (Bug #96550, Bug #30187779)
When returning Error 1022, NDB did not print the name of the affected table. (Bug #74218, Bug #19763093)
References: See also: Bug #29700174.