Important Change: The default value for the ndb_autoincrement_prefetch_sz server system variable has been increased to 512. (Bug #30316314)
Important Change: NDB now supports more than 2 fragment replicas (up to a maximum of 4). Setting NoOfReplicas=3 or NoOfReplicas=4 is now fully covered in our internal testing and thus supported for use in production. (Bug #97479, Bug #97579, Bug #25261716, Bug #30501414, Bug #30528105, WL #8426)
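Because NoOfReplicas values of 3 and 4 are now supported, a pre-deployment script might confirm that a config.ini stays within the supported range. The following Python sketch reads the [ndbd default] section with the standard configparser module. The function name, the assumed default of 2 when the parameter is absent, and the use of configparser (rather than any NDB tool) are illustrative assumptions, not part of NDB.

```python
import configparser

# Range described above: up to 4 fragment replicas; 1 is the minimum.
MIN_REPLICAS, MAX_REPLICAS = 1, 4
DEFAULT_REPLICAS = 2  # assumed default when NoOfReplicas is not set


def check_no_of_replicas(config_text: str) -> int:
    """Return NoOfReplicas from [ndbd default], raising if it is out of range."""
    # strict=False merges repeated sections such as several [ndbd] blocks,
    # which a real config.ini normally contains. Section names are matched
    # case-sensitively here.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(config_text)
    # configparser lowercases option names, so the option lookup is case-insensitive.
    replicas = parser.getint("ndbd default", "NoOfReplicas", fallback=DEFAULT_REPLICAS)
    if not MIN_REPLICAS <= replicas <= MAX_REPLICAS:
        raise ValueError(
            f"NoOfReplicas={replicas} is outside {MIN_REPLICAS}..{MAX_REPLICAS}"
        )
    return replicas
```

For example, a configuration containing NoOfReplicas=3 under [ndbd default] passes, while NoOfReplicas=5 raises ValueError.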
Important Change: Added the TransactionMemory data node configuration parameter, which simplifies configuration of data node memory allocation for transaction operations. This is part of ongoing work on pooling of transactional and Local Data Manager (LDM) memory.
Several parameters are incompatible with TransactionMemory and cannot be set in the config.ini configuration file if this parameter has been set. If you attempt to set any of these incompatible parameters concurrently with TransactionMemory, the cluster management server cannot start.
For more information, see the description of the TransactionMemory parameter and Parameters incompatible with TransactionMemory. See also Data Node Memory Management, for information about how memory resources are allocated by NDB Cluster data nodes. (Bug #96995, Bug #30344471, WL #12687)
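As a sketch of the kind of check the management server now enforces, the Python function below reports which incompatible parameters appear alongside TransactionMemory in the data node sections of a config.ini. The list of incompatible parameters is not reproduced here, so the function takes it as an argument. The function name and section handling are hypothetical, and this is not how ndb_mgmd itself performs the check.

```python
import configparser

DATA_NODE_SECTIONS = ("ndbd default", "ndbd")


def find_transaction_memory_conflicts(config_text: str, incompatible: set) -> list:
    """Return the incompatible parameters that are set alongside TransactionMemory.

    `incompatible` is supplied by the caller; see "Parameters incompatible
    with TransactionMemory" for the actual names.
    """
    parser = configparser.ConfigParser(strict=False)  # tolerate repeated [ndbd]
    parser.read_string(config_text)
    # Collect every option set for data nodes; configparser lowercases names.
    data_node_options = set()
    for section in DATA_NODE_SECTIONS:
        if parser.has_section(section):
            data_node_options.update(parser.options(section))
    if "transactionmemory" not in data_node_options:
        return []  # TransactionMemory not set, so nothing can conflict with it
    return sorted(name for name in incompatible if name.lower() in data_node_options)
```

A non-empty result corresponds to a configuration with which the management server would refuse to start.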
Important Change: The maximum or default values for several NDB Cluster data node configuration parameters have been changed in this release. These changes are listed here:
- The maximum value for DataMemory is increased from 1 TB to 16 TB.
- The maximum value for DiskPageBufferMemory is also increased from 1 TB to 16 TB.
- The default value for StringMemory is decreased to 5 percent. Previously, this was 25 percent.
- The default value for LcpScanProgressTimeout is increased from 60 seconds to 180 seconds.
(WL #13382)
Performance: Read from any fragment replica, which greatly improves the performance of table reads at a very low cost to table write performance, is now enabled by default for all NDB tables. This means both that the default value for the ndb_read_backup system variable is now ON, and that the value of the NDB_TABLE comment option READ_BACKUP is 1 when creating a new NDB table. (Previously, the default values were OFF and 0, respectively.)
For more information, see Setting NDB Comment Options, as well as the description of the ndb_read_backup system variable. (WL #13383)
NDB Disk Data: The latency of checkpoints for Disk Data files has been reduced when using non-volatile memory devices such as solid-state drives (especially those using NVMe for data transfer), separate physical drives for Disk Data files, or both. As part of this work, two new data node configuration parameters, listed here, have been introduced:
- MaxDiskDataLatency sets a maximum on allowed latency for disk access, aborting transactions that exceed this amount of time to complete.
- DiskDataUsingSameDisk makes it possible to take advantage of keeping Disk Data files on separate disks by increasing the rate at which Disk Data checkpoints can be made.
This release also adds three new tables to the ndbinfo database. These tables, listed here, can assist with performance monitoring of Disk Data checkpointing:
- diskstat provides information about Disk Data tablespace reads, writes, and page requests during the previous 1 second.
- diskstats_1sec provides information similar to that given by the diskstat table, but does so for each of the last 20 seconds.
- pgman_time_track_stats reports on the latency of disk operations affecting Disk Data tablespaces.
For additional information, see Disk Data latency parameters. (WL #12924)
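The relationship between the diskstat and diskstats_1sec tables can be pictured as a fixed-size window of per-second samples. This Python sketch is a conceptual model only: DiskStatSample, DiskStatHistory, and their fields are hypothetical and do not reflect the actual ndbinfo column names or how the data node collects these statistics.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DiskStatSample:
    # Hypothetical per-second counters, loosely modeled on the kinds of
    # values diskstat is described as reporting (reads, writes, page requests).
    reads: int
    writes: int
    page_requests: int


class DiskStatHistory:
    """Keep the most recent `window` one-second samples (20 by default)."""

    def __init__(self, window: int = 20):
        self._samples = deque(maxlen=window)

    def record_second(self, sample: DiskStatSample) -> None:
        # Appending past maxlen silently discards the oldest sample.
        self._samples.append(sample)

    def latest(self) -> DiskStatSample:
        """Counterpart of a diskstat-style view: only the previous second."""
        return self._samples[-1]

    def history(self) -> list:
        """Counterpart of a diskstats_1sec-style view: newest sample first."""
        return list(reversed(self._samples))
```

After 25 one-second samples have been recorded, history() holds only the newest 20, which mirrors the "last 20 seconds" scope described for diskstats_1sec.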
Added the ndb_metadata_sync server system variable, which simplifies knowing when metadata synchronization has completed successfully. Setting this variable to true triggers immediate synchronization of all changes between the NDB dictionary and the MySQL data dictionary without regard to any values set for ndb_metadata_check or ndb_metadata_check_interval. When synchronization has completed, its value is automatically reset to false. (Bug #30406657)
Added the DedicatedNode parameter for data nodes, API nodes, and management nodes. When set to true, this parameter prevents the management server from handing out this node's node ID to any node that does not request it specifically. Intended primarily for testing, this parameter may be useful in cases in which multiple management servers are running on the same host, and using the host name alone is not sufficient for distinguishing among processes of the same type. (Bug #91406, Bug #28239197)
A stack trace is now written to the data node log on abnormal termination of a data node. (WL #13166)
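The DedicatedNode rule for node ID allocation (a dedicated ID is handed out only to a node that requests it specifically) can be modeled as a simple allocation loop. In this Python sketch, NodeSlot and allocate_node_id are hypothetical names, and the lowest-free-ID policy for non-dedicated slots is an assumption rather than a description of the management server's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeSlot:
    node_id: int
    node_type: str          # e.g. "ndbd", "mysqld", "ndb_mgmd"
    dedicated: bool = False
    in_use: bool = False


def allocate_node_id(slots, node_type: str, requested_id: Optional[int] = None) -> Optional[int]:
    """Hand out a free node ID of the given type, or None if none is available.

    A dedicated slot is handed out only when its ID is requested explicitly.
    """
    for slot in sorted(slots, key=lambda s: s.node_id):
        if slot.in_use or slot.node_type != node_type:
            continue
        if requested_id is not None:
            if slot.node_id != requested_id:
                continue
        elif slot.dedicated:
            # Not requested by ID, so this dedicated slot is skipped.
            continue
        slot.in_use = True
        return slot.node_id
    return None
```

With slot 50 marked dedicated and slot 51 not, an unqualified request receives 51, and 50 is assigned only to a request that names it.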
Automatic synchronization of metadata from NDB to the MySQL data dictionary now includes databases containing NDB tables. With this enhancement, if a table exists in NDB, and the table and the database it belongs to do not exist on a given SQL node, it is no longer necessary to create the database manually. Instead, the database, along with all NDB tables belonging to this database, should be created on the SQL node automatically. (WL #13490)
Incompatible Change: ndb_restore no longer restores shared users and grants to the mysql.ndb_sql_metadata table by default. A new command-line option, --include-stored-grants, is added to override this behavior and enable restoring of shared user and grant data and metadata.
As part of this fix, ndb_restore can now also correctly handle an ordered index on a system table. (Bug #30237657)
References: See also: Bug #29534239, Bug #30459246.
Incompatible Change: The minimum value for the RedoOverCommitCounter data node configuration parameter has been increased from 0 to 1. The minimum value for the RedoOverCommitLimit data node configuration parameter has also been increased from 0 to 1.
You should check the cluster global configuration file and make any necessary adjustments to values set for these parameters before upgrading. (Bug #29752703)
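Because existing configurations may still set either parameter to 0, a pre-upgrade check along the following lines can flag values that must be raised. This Python sketch runs configparser over the text of the global configuration file. The function and constant names are hypothetical, and only the [ndbd default] and [ndbd] sections are inspected.

```python
import configparser

# New minimums from these release notes (previously 0 for both).
NEW_MINIMUMS = {"RedoOverCommitCounter": 1, "RedoOverCommitLimit": 1}
DATA_NODE_SECTIONS = ("ndbd default", "ndbd")


def find_below_new_minimums(config_text: str) -> list:
    """Return (section, parameter, value) triples that must be raised before upgrading."""
    # strict=False merges repeated [ndbd] sections, so only the last value
    # set for a given parameter in those sections is seen.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(config_text)
    problems = []
    for section in DATA_NODE_SECTIONS:
        if not parser.has_section(section):
            continue
        for name, minimum in NEW_MINIMUMS.items():
            value = parser.getint(section, name, fallback=None)  # None if unset
            if value is not None and value < minimum:
                problems.append((section, name, value))
    return problems
```

An empty list means no explicit setting in those sections falls below the new minimum of 1.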
macOS: On macOS, SQL nodes sometimes shut down unexpectedly during the binary log setup phase when starting the cluster. This occurred when there were schemas whose names contained uppercase letters and lower_case_table_names was set to 2, which caused metadata locks to be requested using keys with the incorrect lettercase, so that acquiring these locks failed. (Bug #30192373)
Microsoft Windows; NDB Disk Data: On Windows, restarting a data node other than the master when using Disk Data tables led to a failure in TSMAN. (Bug #97436, Bug #30484272)
Solaris: When debugging, ndbmtd consumed all available swap space on Solaris 11.4 SRU 12 and later. (Bug #30446577)
Solaris: The byte order used for numeric values stored in the mysql.ndb_sql_metadata table was incorrect on Solaris/Sparc. This could be seen when using ndb_select_all or ndb_restore --print. (Bug #30265016)
NDB Disk Data: After dropping a Disk Data table on one SQL node, trying to execute a query against INFORMATION_SCHEMA.FILES on a different SQL node stalled at Waiting for tablespace metadata lock. (Bug #30152258)
References: See also: Bug #29871406.
NDB Disk Data: ALTER TABLESPACE ... ADD DATAFILE could sometimes hang while trying to acquire a metadata lock. (Bug #29871406)
NDB Disk Data: Compatibility code for the Version 1 disk format used prior to the introduction of the Version 2 format in NDB 7.6 turned out not to be necessary, and is no longer used.
Work done in NDB 8.0.18 to allow more nodes introduced long signal variants of several signals that take a bitmask as one of their arguments, and we started using these new long signal variants even when the previous (still supported) short variants would have been sufficient. This introduced several new opportunities for hitting out of LongMessageBuffer errors.
To avoid this, we now use the short signal variants in such cases wherever possible. Some of the signals affected include CM_REGCONF, CM_REGREF, FAIL_REP, NODE_FAILREP, ISOLATE_ORD, COPY_GCIREQ, START_RECREQ, NDB_STARTCONF, and START_LCP_REQ. (Bug #30708009)
References: See also: Bug #30707970.
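The idea behind preferring short signal variants can be sketched as follows: encode the participating nodes as a bitmask, and use the short form whenever the highest set bit fits. The actual bitmask widths of the short and long signal formats are not given in these notes, so short_bits is a hypothetical parameter, and node_bitmask and choose_signal_variant are illustrative names only.

```python
def node_bitmask(node_ids) -> int:
    """Encode a collection of node IDs as an integer bitmask (bit n set for node n)."""
    mask = 0
    for n in node_ids:
        mask |= 1 << n
    return mask


def choose_signal_variant(node_ids, short_bits: int) -> str:
    """Return "short" if the bitmask fits in short_bits bits, else "long".

    short_bits is a hypothetical capacity for the short signal's bitmask.
    """
    mask = node_bitmask(node_ids)
    # bit_length() is the index of the highest set bit plus one (0 for no nodes).
    return "short" if mask.bit_length() <= short_bits else "long"
```

Under this model, the long variant is used only when some node ID is too large for the short bitmask, which avoids needlessly consuming LongMessageBuffer space.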
The fix made in NDB 8.0.18 for an issue in which a transaction was committed prematurely aborts the transaction if the table definition has changed midway, but testing showed that it failed to free memory allocated by getExtraMetadata() when doing so. Now this memory is properly freed before aborting the transaction. (Bug #30576983)
References: This issue is a regression of: Bug #29911440.
Excessive allocation of attribute buffers when initializing data in DBTC caused preallocation of API connection records to fail because memory ran out unexpectedly. (Bug #30570264)
Improved error handling in the case where NDB attempted to update a local user that had the NDB_STORED_USER privilege but could not be found in the ndb_sql_metadata table. (Bug #30556487)
If a transaction failed during execution of an ALTER TABLE ... ALGORITHM=COPY statement after the new table had been renamed to the name of the original table but before the original table had been dropped, mysqld exited prematurely. (Bug #30548209)
Non-MSI builds on Windows using -DWITH_NDBCLUSTER did not succeed unless the WiX toolkit was installed. (Bug #30536837)
The allowed_values output from ndb_config --xml --configinfo for the Arbitration data node configuration parameter in NDB 8.0.18 was not consistent with that obtained in previous releases. (Bug #30529220)
References: See also: Bug #30505003.
A faulty ndbrequire() introduced when implementing partial local checkpoints assumed that m_participatingLQH must be clear when receiving START_LCP_REQ, which is not necessarily true when the master fails after sending START_LCP_REQ and before handling any START_LCP_CONF signals. (Bug #30523457)
A local checkpoint sometimes hung when the master node failed while sending an LCP_COMPLETE_REP signal, so that the signal reached some nodes but not all of them. (Bug #30520818)
The management server did not handle all cases of NODE_FAILREP correctly. (Bug #30520066)
With SharedGlobalMemory set to 0, some resources did not meet required minimums. (Bug #30411835)
Execution of ndb_restore --rebuild-indexes together with the --rewrite-database and --exclude-missing-tables options did not create indexes for any tables in the target database. (Bug #30411122)
When writing the schema operation into the ndb_schema table failed, the states in the NDB_SCHEMA object were not cleared, which led to the SQL node shutting down when it tried to free the object. (Bug #30402362)
References: See also: Bug #30371590.
When synchronizing extent pages, it was possible for the current local checkpoint (LCP) to stall indefinitely if a CONTINUEB signal for handling the LCP was still outstanding when receiving the FSWRITECONF signal for the last page written in the extent synchronization page. The LCP could also be restarted if another page was written from the data pages. It was also possible that this issue caused PREP_LCP pages to be written at times when they should not have been. (Bug #30397083)
If a transaction was aborted while getting a page from the disk page buffer and the disk system was overloaded, the transaction hung indefinitely. This could also cause restarts to hang and node failure handling to fail. (Bug #30397083, Bug #30360681)
References: See also: Bug #30152258.
Data node failures with the error Another node failed during system restart... occurred during a partial restart. (Bug #30368622)
Automatic synchronization could potentially trigger an increase in the number of locks being taken on a particular metadata object at a given time, such as when a synchronization attempt coincided with a DDL or DML statement involving the same metadata object; competing locks could lead to the NDB deadlock detection logic penalizing the user action rather than the background synchronization. We fix this by changing all exclusive metadata lock acquisition attempts during auto-synchronization so that they use a timeout of 0 (rather than the 10 seconds previously allowed), which avoids deadlock detection and gives priority to the user action. (Bug #30358470)
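A zero-timeout lock attempt means that the background synchronization gives up at once instead of waiting, leaving the lock to the user statement. The Python sketch below uses threading.Lock.acquire(blocking=False) as a stand-in for an exclusive metadata lock request with a timeout of 0. try_background_sync is a hypothetical name, and this is not the MySQL metadata locking API.

```python
import threading


def try_background_sync(lock: threading.Lock, sync) -> bool:
    """Attempt a background metadata sync without waiting for the lock.

    Returns True if the sync ran, False if the lock was busy.
    """
    # Equivalent of a zero timeout: fail immediately if the lock is held.
    if not lock.acquire(blocking=False):
        return False  # the user action keeps the lock; retry in a later round
    try:
        sync()
        return True
    finally:
        lock.release()
```

Because the background attempt never queues behind the user statement, the two never wait on each other, and there is no wait cycle for deadlock detection to resolve against the user.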
If a SYNC_EXTENT_PAGES_REQ signal was received by PGMAN while dropping a log file group as part of a partial local checkpoint, and thus dropping the page locked by this block for processing next, the LCP terminated due to trying to access the page after it had already been dropped. (Bug #30305315)
The wrong number of bytes was reported in the cluster log for a completed local checkpoint. (Bug #30274618)
References: See also: Bug #29942998.
Added the new ndb_mgm client debugging commands DUMP 2356 and DUMP 2357. (Bug #30265415)
Executing ndb_drop_table using the --help option caused this program to terminate prematurely, and without producing any help output. (Bug #30259264)
A mysqld trying to connect to the cluster, and thus trying to acquire the global schema lock (GSL) during setup, ignored the setting for ndb-wait-setup and hung indefinitely when the GSL had already been acquired by another mysqld, such as when it was executing an ALTER TABLE statement. (Bug #30242141)
When a table containing a self-referential foreign key (in other words, a foreign key referencing another column of the same table) was altered using the COPY algorithm, the foreign key definition was removed. (Bug #30233405)
In MySQL 8.0, names of foreign keys not explicitly provided by the user are generated automatically in the SQL layer and stored in the data dictionary. Such names are of the form table_name_ibfk_#, where table_name is the name of the table and # is a number, which aligns with the names generated by the InnoDB storage engine in MySQL 5.7. NDB 8.0.18 changed NDB so that it also uses the generated names, but in some cases, such as when tables were renamed, NDB still generated and used its own format for such names internally rather than those generated by the SQL layer and stored in the data dictionary, which led to the following issues:
- Discrepancies in SHOW CREATE TABLE output and the contents of INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
- Improper metadata locking for foreign keys
- Confusing names for foreign keys in error messages
Now NDB also renames the foreign keys in such cases, using the names provided by the MySQL server, to align fully with those used by InnoDB. (Bug #30210839)
References: See also: Bug #96508, Bug #30171959.
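The generated-name convention and the rename behavior described above can be illustrated in Python. The numbering rule used here (one more than the highest existing suffix for the table, starting at 1) and both function names are assumptions for illustration; they are not taken from the MySQL server source.

```python
import re


def next_fk_name(table_name: str, existing_names) -> str:
    """Generate the next table_name_ibfk_# name (assumed numbering rule)."""
    pattern = re.compile(re.escape(table_name) + r"_ibfk_(\d+)")
    numbers = [int(m.group(1)) for name in existing_names
               if (m := pattern.fullmatch(name))]
    return f"{table_name}_ibfk_{max(numbers, default=0) + 1}"


def rename_generated_fks(old_table: str, new_table: str, fk_names) -> list:
    """After a table rename, rename generated FK names; keep user-supplied names."""
    pattern = re.compile(re.escape(old_table) + r"_ibfk_(\d+)")
    renamed = []
    for name in fk_names:
        m = pattern.fullmatch(name)
        renamed.append(f"{new_table}_ibfk_{m.group(1)}" if m else name)
    return renamed
```

For example, renaming t1 to t2 turns t1_ibfk_1 into t2_ibfk_1, while an explicitly named foreign key such as my_fk keeps its name.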
When a table referenced by a foreign key was renamed, participating SQL nodes did not properly update the foreign key definitions for the referencing table in their data dictionaries during schema distribution. (Bug #30191068)
Data node handling of failures of other data nodes could sometimes not be synchronized properly, such that two or more data nodes could see different nodes as the master node. (Bug #30188414)
Some scan operations failed due to the presence of an old assert in DbtupBuffer.cpp that checked whether API nodes were using a version of the software previous to NDB 6.4. This assert was no longer necessary or correct, and has been removed. (Bug #30188411)
When acquiring the global schema lock (GSL), NDB used a single Ndb_table_guard object for successive retries when attempting to obtain a table object reference; it was not possible for this to succeed after failing on the first attempt, since Ndb_table_guard assumes that the underlying object pointer is determined once only (at initialization), with the previously retrieved pointer being returned from a cached reference thereafter.
This resulted in infinite waits to obtain the GSL, causing the binlog injector thread to hang so that mysqld considered all NDB tables to be read-only. To avoid this problem, NDB now uses a fresh instance of Ndb_table_guard for each such retry. (Bug #30120858)
References: This issue is a regression of: Bug #30086352.
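The difference between reusing one guard and creating a fresh one per retry can be shown with a small Python model. TableGuard is a hypothetical stand-in that, like Ndb_table_guard as described above, performs its lookup only once at construction. The lookup callable and both retry functions are illustrative.

```python
class TableGuard:
    """Hypothetical guard that resolves its table object once, at construction."""

    def __init__(self, lookup):
        self._table = lookup()  # later calls return this cached result, even None

    def get_table(self):
        return self._table


def acquire_with_shared_guard(lookup, retries: int):
    guard = TableGuard(lookup)           # resolved once, before the retry loop
    for _ in range(retries):
        if guard.get_table() is not None:
            return guard.get_table()
    return None                          # a first-attempt failure is permanent


def acquire_with_fresh_guards(lookup, retries: int):
    for _ in range(retries):
        guard = TableGuard(lookup)       # new lookup on every retry
        if guard.get_table() is not None:
            return guard.get_table()
    return None
```

With a lookup that fails only on its first call, acquire_with_shared_guard returns None no matter how many retries are allowed, whereas acquire_with_fresh_guards succeeds on the second attempt.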
When upgrading an SQL node to NDB 8.0 from a previous release series, the .frm file whose contents are read and then installed in the data dictionary does not contain any information about foreign keys. This meant that foreign key information was not installed in the SQL node's data dictionary. This is fixed by using the foreign key information available in the NDB data dictionary to update the local MySQL data dictionary during table metadata upgrade. (Bug #30071043)
Restoring tables with the --disable-indexes option resulted in the wrong table definition being installed in the MySQL data dictionary. This is because the serialized dictionary information (SDI) packed into the NDB dictionary's table definition is used to create the table object; the SDI definition is updated only when the DDL change is done through the MySQL server. Installation of the wrong table definition meant that the table could not be opened until the indexes were re-created in the NDB dictionary using --rebuild-indexes.
This is fixed by extending auto-synchronization such that it compares the SDI to the NDB dictionary table information and fails in cases in which the column definitions do not match. Mismatches involving indexes only are treated as temporary errors, with the table in question being detected again during the next round of change detection. (Bug #30000202, Bug #30414514)
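The comparison rule added to auto-synchronization can be summarized as a three-way classification. In this Python sketch, the column and index representations, the SyncResult enum, and compare_table_definitions are hypothetical. The only behavior taken from the notes is that column mismatches fail, while index-only mismatches are treated as temporary and retried in the next detection round.

```python
from enum import Enum


class SyncResult(Enum):
    OK = "ok"
    FAILED = "failed"          # column definitions differ
    RETRY_LATER = "retry"      # only indexes differ: temporary error


def compare_table_definitions(sdi_columns, sdi_indexes, ndb_columns, ndb_indexes) -> SyncResult:
    """Classify an SDI versus NDB dictionary comparison."""
    if sdi_columns != ndb_columns:
        return SyncResult.FAILED
    if sdi_indexes != ndb_indexes:
        return SyncResult.RETRY_LATER  # detected again in the next round
    return SyncResult.OK
```

Checking columns before indexes matters: a table whose columns and indexes both differ is reported as a failure rather than being retried indefinitely.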
Restoring tables for which MAX_ROWS was used to alter partitioning from a backup made from NDB 7.4 to a cluster running NDB 7.6 did not work correctly. This is fixed by ensuring that the upgrade code handling PartitionBalance supplies a valid table specification to the NDB dictionary. (Bug #29955656)
The number of data bytes for the summary event written in the cluster log when a backup completed was truncated to 32 bits, so that there was a significant mismatch between the number of log records and the number of data records printed in the log for this event. (Bug #29942998)
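To see why a 32-bit truncation produces such a large mismatch, consider what happens to a byte count that exceeds 2^32 (4,294,967,296). This Python sketch models the truncation with a bit mask; the 5 GB figure is an arbitrary example value.

```python
MASK_32 = 0xFFFFFFFF


def truncate_to_32_bits(value: int) -> int:
    """Keep only the low 32 bits, as storing into an unsigned 32-bit field does."""
    return value & MASK_32


total_bytes = 5_000_000_000           # about 5 GB of backup data
reported = truncate_to_32_bits(total_bytes)
print(reported)                       # 705032704
print(total_bytes - reported)         # 4294967296, exactly 2**32 lost
```

Any count below 2^32 passes through unchanged, which is why the problem shows up only for larger backups.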
mysqld sometimes aborted during a long ALTER TABLE operation that timed out. (Bug #29894768)
References: See also: Bug #29192097.
When an SQL node connected to NDB, it did not know whether it had previously connected to that cluster, and thus could not determine whether its data dictionary information was merely out of date, or completely invalid. This issue is solved by adding a unique schema version identifier (schema UUID) to the ndb_schema table in NDB as well as to the ndb_schema table object in the data dictionary. Now, whenever a mysqld connects to a cluster as an SQL node, it can compare the schema UUID stored in its data dictionary against that which is stored in the ndb_schema table, and so know whether it is connecting for the first time. If so, the SQL node removes any entries that may be in its data dictionary. (Bug #29894166)
References: See also: Bug #27543602.
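The schema UUID check amounts to a comparison performed when the SQL node connects. In this Python sketch, LocalDictionary and connect_sql_node are hypothetical models of the SQL node's state, not MySQL server code. A differing or missing UUID is treated as a first connection, and the stale entries are discarded.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocalDictionary:
    """Hypothetical model of an SQL node's data dictionary state."""
    schema_uuid: Optional[str] = None
    tables: Dict[str, str] = field(default_factory=dict)


def connect_sql_node(local: LocalDictionary, cluster_uuid: str) -> bool:
    """Compare schema UUIDs; return True if this is a first connection."""
    if local.schema_uuid == cluster_uuid:
        return False           # known cluster: entries are at most out of date
    local.tables.clear()       # unknown cluster: existing entries are invalid
    local.schema_uuid = cluster_uuid
    return True
```

On a later reconnection to the same cluster, the UUIDs match and the existing entries are kept, so they only need to be brought up to date.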
Improved log messages generated by table discovery and table metadata upgrades. (Bug #29894127)
Using 2 LDM threads on a 2-node cluster with 10 threads per node could result in a partition imbalance, such that one of the LDM threads on each node was the primary for zero fragments. Trying to restore a multi-threaded backup from this cluster failed because the data file for one LDM contained only the 12-byte data file header, which ndb_restore was unable to read. The same problem could occur in other cases, such as when taking a backup immediately after adding an empty node online.
It was found that this occurred when ODirect was enabled for an EOF backup data file write whose size was less than 512 bytes and the backup was in the STOPPING state. This normally occurs only for an aborted backup, but could also happen for a successful backup for which an LDM had no fragments. We fix the issue by introducing an additional check to ensure that writes are skipped only if the backup actually contains an error which should cause it to abort. (Bug #29892660)
References: See also: Bug #30371389.
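The corrected condition for skipping a small EOF write can be expressed as a predicate. This Python sketch is a simplified, hypothetical model: should_skip_eof_write and its parameters are illustrative names, and only the 512-byte threshold, the ODirect setting, the STOPPING state, and the added error check come from the notes above.

```python
ODIRECT_MIN_WRITE = 512  # writes smaller than this were the problem case


def should_skip_eof_write(odirect: bool, write_size: int, state: str,
                          backup_has_error: bool) -> bool:
    """Return True if a backup data file's EOF write should be skipped."""
    small_odirect_eof = (odirect and write_size < ODIRECT_MIN_WRITE
                         and state == "STOPPING")
    # The added check: skip only when the backup really is aborting with an
    # error, so a successful backup from an LDM with no fragments is still written.
    return small_odirect_eof and backup_has_error
```

Under this model, a successful backup in which one LDM wrote only a 12-byte header no longer loses its final write.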
For NDB tables, ALTER TABLE ... ALTER INDEX did not work with ALGORITHM=INPLACE. (Bug #29700197)
ndb_restore failed in testing on 32-bit platforms. This issue is fixed by increasing the size of the thread stack used by this tool from 64 KB to 128 KB. (Bug #29699887)
References: See also: Bug #30406046.
An unplanned shutdown of the cluster occurred due to an error in DBTUP while deleting rows from a table following an online upgrade. (Bug #29616383)
In some cases the SignalSender class, used as part of the implementation of ndb_mgmd and ndbinfo, buffered excessive numbers of unneeded SUB_GCP_COMPLETE_REP and API_REGCONF signals, leading to unnecessary consumption of memory. (Bug #29520353)
References: See also: Bug #20075747, Bug #29474136.
The setting for the BackupLogBufferSize configuration parameter was not honored. (Bug #29415012)
When mysqld was run with the --upgrade=FORCE option, it reported the following issues:
[Warning] Table 'mysql.ndb_apply_status' requires repair.
[ERROR] Table 'mysql.ndb_apply_status' repair failed.
This was because --upgrade=FORCE causes a bootstrap system thread to run CHECK TABLE FOR UPGRADE, but ha_ndbcluster::open() refused to open the table before schema synchronization had completed, which eventually led to the reported conditions. (Bug #29305977)
References: See also: Bug #29205142.
When using explicit SHM connections, with ShmSize set to a value larger than the system's available shared memory, mysqld hung indefinitely on startup and produced no useful error messages. (Bug #28875553)
The maximum global checkpoint (GCP) commit lag and GCP save timeout are recalculated whenever a node shuts down, to take into account the change in number of data nodes. This could lead to the unintentional shutdown of a viable node when the threshold decreased below the previous value. (Bug #27664092)
References: See also: Bug #26364729.
A transaction which inserts a child row may run concurrently with a transaction which deletes the parent row for that child. One of the transactions should be aborted in this case, lest an orphaned child row result.
Before committing an insert on a child row, a read of the parent row is triggered to confirm that the parent exists. Similarly, before committing a delete on a parent row, a read or scan is performed to confirm that no child rows exist. When insert and delete transactions were run concurrently, their prepare and commit operations could interact in such a way that both transactions committed. This occurred because the triggered reads were performed using LM_CommittedRead locks (see NdbOperation::LockMode), which are not strong enough to prevent such error scenarios.
This problem is fixed by using the stronger LM_SimpleRead lock mode for both triggered reads. The use of LM_SimpleRead rather than LM_CommittedRead locks ensures that at least one transaction aborts in every possible scenario involving transactions which concurrently insert into child rows and delete from parent rows. (Bug #22180583)
Concurrent SELECT and ALTER TABLE statements on the same SQL node could sometimes block one another while waiting for locks to be released. (Bug #17812505, Bug #30383887)
Failure handling in schema synchronization involves pushing warnings and errors to the binary logging thread. Schema synchronization is also retried in case of certain failures, which could lead to an accumulation of warnings in the thread. Now such warnings and errors are cleared following each attempt at schema synchronization. (Bug #2991036)
An INCL_NODECONF signal from any local blocks should be ignored when a node has failed, except in order to reset c_nodeStartSlave.nodeId. (Bug #96550, Bug #30187779)
When returning Error 1022, NDB did not print the name of the affected table. (Bug #74218, Bug #19763093)
References: See also: Bug #29700174.