-
To provide protection against unauthorized recovery of data from backups, this release adds support for NDB native encrypted backup using AES-256-CBC. Encrypted backup files are protected by a user-supplied password. NDB does not save this password; this needs to be done by the user or application. To create an encrypted backup, use ENCRYPT PASSWORD=password with the ndb_mgm client START BACKUP command (in addition to any other options which may be required). You can also initiate an encrypted backup in applications by calling the MGM API ndb_mgm_start_backup4() function.
To restore from an encrypted backup, use ndb_restore with both of the options --decrypt and --backup-password=password. ndb_print_backup_file can also read encrypted files using the -P option added in this release.
The encryption password used with this feature can be any string of up to 256 characters from the range of printable ASCII characters other than !, ', ", $, %, \, and ^. When a password is supplied for encryption or decryption, it must be quoted using either single or double quotation marks. It is possible to specify an empty password using '' or "", but this is not recommended.
You can encrypt existing backup files using the ndbxfrm utility, which is added to the NDB Cluster distribution in this release; this program can also decrypt encrypted backup files. ndbxfrm also compresses and decompresses NDB Cluster backup files. The compression method is the same as that used by NDB Cluster for creating compressed backups when CompressedBackup is enabled.
It is also possible to require encrypted backups using RequireEncryptedBackup. When this parameter is enabled (by setting it equal to 1), the management client rejects any attempt to perform a backup that is not encrypted.
For more information, see Using The NDB Cluster Management Client to Create a Backup, as well as ndbxfrm — Compress, Decompress, Encrypt, and Decrypt Files Created by NDB Cluster. (WL #13474, WL #13499, WL #13548)
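The following is a minimal sketch of starting an encrypted backup through the MGM API. The connection string and password are placeholders, and the exact argument list of ndb_mgm_start_backup4() is assumed here to follow the pattern of ndb_mgm_start_backup3() with the encryption password and its length appended; check the MGM API documentation for this release before relying on it.

    #include <mgmapi.h>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        NdbMgmHandle handle = ndb_mgm_create_handle();
        ndb_mgm_set_connectstring(handle, "mgmhost:1186");      // placeholder connection string
        if (ndb_mgm_connect(handle, 3, 5, 1) == -1) {           // 3 retries, 5 seconds apart, verbose
            fprintf(stderr, "connect failed: %s\n", ndb_mgm_get_latest_error_msg(handle));
            ndb_mgm_destroy_handle(&handle);
            return 1;
        }

        unsigned int backup_id = 0;
        struct ndb_mgm_reply reply;
        const char* password = "backup_password";               // placeholder; NDB does not store it

        // Assumed signature: 2 = wait until the backup completes, 0 = let the cluster assign the ID.
        if (ndb_mgm_start_backup4(handle, 2, &backup_id, &reply, 0,
                                  password, (unsigned int) strlen(password)) == -1) {
            fprintf(stderr, "backup failed: %s\n", ndb_mgm_get_latest_error_msg(handle));
        } else {
            printf("encrypted backup %u completed\n", backup_id);
        }

        ndb_mgm_disconnect(handle);
        ndb_mgm_destroy_handle(&handle);
        return 0;
    }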
NDB Client Programs: Effective with this release, the MySQL NDB Cluster Auto-Installer (ndb_setup.py) has been deprecated and is subject to removal in a future version of NDB Cluster. (Bug #31888835)
ndbmemcache: ndbmemcache is deprecated in this release of NDB Cluster, and is scheduled for removal in the next release. (Bug #31876970)
Important Change: The Ndb_metadata_blacklist_size status variable was renamed as Ndb_metadata_excluded_count. (Bug #31465469)
-
Packaging: Made the following improvements to the server-minimal RPM for NDB Cluster and the NDB Cluster Docker image:
Added ndb_import and other helpful utilities.
Included NDB utilities are now linked dynamically.
The NDB Cluster Auto-Installer is no longer included.
ndbmemcache is no longer included.
(Bug #31838832)
Added the CMake option NDB_UTILS_LINK_DYNAMIC, to allow dynamic linking of NDB utilities with ndbclient. (Bug #31668306)
-
IPv6 addressing is now supported for connections to management and data nodes, including connections between management and data nodes with SQL nodes. For IPv6 addressing to work, the operating platform and network on which the cluster is deployed must support IPv6. Hostname resolution to IPv6 addresses must be provided by the operating platform (this is the same as when using IPv4 addressing).
Mixing IPv4 and IPv6 addresses in the same cluster is not recommended, but this can be made to work in either of the following cases, provided that --bind-address is not used with ndb_mgmd:
Management node configured with IPv6, data nodes configured with IPv4: This works if the data nodes are started with --ndb-connectstring set to the IPv4 address of the management nodes.
Management node configured with IPv4, data nodes configured with IPv6: This works if the data nodes are started with --ndb-connectstring set to the IPv6 address of the management node.
When upgrading from an NDB version that does not support IPv6 addressing to a version that does so, it is necessary that the network already support both IPv4 and IPv6. The software upgrade must be performed first; after this, you can update the IPv4 addresses used in the config.ini configuration file with the desired IPv6 addresses. Finally, in order for the configuration changes to take effect, perform a system restart of the cluster. (WL #12963)
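A minimal config.ini sketch using IPv6 addresses might look like the following; the node IDs, addresses (taken from the 2001:db8:: documentation range), and data directory are placeholders for illustration only.

    [ndb_mgmd]
    NodeId=1
    HostName=2001:db8::10          # management node reachable over IPv6

    [ndbd]
    NodeId=2
    HostName=2001:db8::21          # data node 1
    DataDir=/var/lib/mysql-cluster

    [ndbd]
    NodeId=3
    HostName=2001:db8::22          # data node 2
    DataDir=/var/lib/mysql-cluster

    [mysqld]
    NodeId=4                       # SQL node; HostName may be omitted to allow any host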
Important Change; NDB Cluster APIs: The NDB Cluster adapter for Node.js was built against an obsolete version of the runtime. Now it is built using Node.js 12.18.3, and only that version or a later version of Node.js is supported by NDB. (Bug #31783049)
-
Important Change: In order to synchronize excluded metadata objects, it was previously necessary to correct the underlying issue, if any, and then trigger the synchronization of the objects again. This could be achieved through discovery of individual tables, which does not scale well with an increase in the number of tables and SQL nodes. It could also be done by reconnecting the SQL node to the cluster, but doing so incurs extra overhead.
To fix this issue, the list of database objects excluded due to synchronization failure is now cleared when ndb_metadata_sync is enabled by the user. This makes all such objects eligible for synchronization in the subsequent detection run, which simplifies retrying the synchronization of all excluded objects.
This fix also removes the validation of objects to be retried which formerly took place at the beginning of each detection run. Since these objects are of interest only while ndb_metadata_sync is enabled, the list of objects to be retried is cleared when this variable is disabled, signalling that all changes have been synchronized. (Bug #31569436)
Packaging: The Dojo library included with NDB Cluster has been upgraded to version 1.15.4. (Bug #31559518)
NDB Disk Data: ndbmtd sometimes terminated unexpectedly when it could not complete a lookup for a log file group during a restore operation. (Bug #31284086)
NDB Disk Data: While upgrading a cluster having 3 or 4 replicas after creating sufficient disk data objects to fill up the tablespace, and while performing inserts on the disk data tables, trying to stop some data nodes caused others to exit improperly. (Bug #30922322)
NDB Cluster APIs: In certain cases, the Table::getColumn() method returned the wrong Column object. This could happen when the full name of one table column was a prefix of the name of another, or when the names of two columns had the same hash value. (Bug #31774685) (See the sketch following the next item.)
NDB Cluster APIs: It was possible to make invalid sequences of NDB API method calls using blobs. This was because some method calls implicitly cause transaction execution inline in order to deal with blob parts and other issues, and such a method could be invoked while user-defined blob operations were still pending, causing those user-defined operations not to be handled correctly. Now in such cases, NDB raises a new error, 4558 Pending blob operations must be executed before this call. (Bug #27772916)
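The following minimal sketch illustrates the kind of dictionary lookup affected by the Table::getColumn() fix above. The table and column names are hypothetical, error handling is reduced to null checks, and an initialized Ndb object is assumed.

    #include <NdbApi.hpp>

    // Look up a column ("idx_long") whose name has another column's name ("idx")
    // as a prefix; before this fix, getColumn() could return the wrong Column
    // object when names collided by prefix or by hash value.
    const NdbDictionary::Column* lookup_column(Ndb* ndb)
    {
        NdbDictionary::Dictionary* dict = ndb->getDictionary();
        const NdbDictionary::Table* tab = dict->getTable("t1");   // hypothetical table
        if (tab == nullptr)
            return nullptr;
        return tab->getColumn("idx_long");                        // must not match "idx"
    }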
ndb_restore --remap-column did not handle columns containing NULL values correctly. Now any offset specified by the mapping function used with this option is not applied to NULL, so that NULL is preserved as expected. (Bug #31966676)
-
The ndb_print_backup_file utility did not respect byte order for row data. This tool now performs byte swapping on row page information to ensure the same results on both big-endian and little-endian platforms. (Bug #31831438)
References: See also: Bug #32470157.
-
In some cases following an upgrade from a version of NDB Cluster previous to 8.0.18 to a later one, writing the sysfile (see NDB Cluster Data Node File System Directory) and reading back from it did not work correctly. This could occur when explicit node group assignments to data nodes had been made (using the NodeGroup parameter); it was possible for node group assignments to change spontaneously, and even possible for node groups not referenced in the configuration file to be added. This was due to issues with version 2 of the sysfile format introduced in NDB 8.0.18. (Bug #31828452, Bug #31820201)
References: See also: Bug #31726653.
After encountering a data node in the configuration file which used NodeGroup=65536, the management server stopped assigning data nodes lacking an explicit NodeGroup setting to node groups. (Bug #31825181)
Data nodes in certain cases experienced fatal memory corruption in the PGMAN kernel block due to an invalid assumption that pages were 32KB aligned, when in fact they are normally aligned to the system page size (4096 or 8192 bytes, depending on platform). (Bug #31768450, Bug #31773234)
Fixed a misspelled define introduced in NDB 8.0.20 which made an internal function used to control adaptive spinning non-operational. (Bug #31765660)
When executing undo log records during undo log recovery, it was possible, when a page cache miss occurred, for the previous undo log record to be used multiple times. (Bug #31750627)
-
When an SQL node or cluster shutdown occurred during schema distribution while the coordinator was still waiting for the participants, the schema distribution was aborted halfway, but any rows in ndb_schema_result related to this schema operation were not cleared. This left open the possibility that these rows might conflict with a future reply from a participant if a DDL operation having the same schema operation ID originated from a client using the same node ID.
To keep this from happening, we now clear all such rows in ndb_schema_result during NDB binary log setup. This assures that there are no DDL distributions in progress and that any rows remaining in the ndb_schema_result table are already obsolete. (Bug #31601674)
Help output from the MySQL Cluster Auto-Installer displayed incorrect version information. (Bug #31589404)
In certain rare circumstances, NDB missed checking for completion of a local checkpoint, leaving it uncompleted, which meant that subsequent local checkpoints could not be executed. (Bug #31577633)
-
A data definition statement can sometimes involve reading or writing of multiple rows (or both) from tables; NDBCLUSTER starts an NdbTransaction to perform these operations. When such a statement was rolled back, NDBCLUSTER attempted to roll back the schema change before rolling back the NdbTransaction and closing it; this led to the rollback hanging indefinitely while the cluster waited for the NdbTransaction object to close before it was able to roll back the schema change.
Now in such cases, NDBCLUSTER rolls back the schema change only after rolling back and closing any open NdbTransaction associated with the change. (Bug #31546868)
Adding a new user was not always synchronized correctly to all SQL nodes when the NDB_STORED_USER privilege was granted to the new user. (Bug #31486931)
In some cases, QMGR returned conflicting NDB engine and MySQL server version information, which could lead to unplanned management node shutdown. (Bug #31471959)
SUMA on a node that is starting up should not send a DICT_UNLOCK_ORD signal to the DICT block on the master node until all SUMA_HANDOVER_REQ signals sent have had SUMA_HANDOVER_CONF signals sent in response, and every switchover bucket set up on receiving a SUMA_HANDOVER_CONF has completed switchover. In certain rare cases using NoOfReplicas > 2, and in which the delay between global checkpoints was unusually short, it was possible for some switchover buckets to be ready for handover before others, and for handover to proceed even though this was the case. (Bug #31459930)
-
Attribute ID mapping needs to be performed when reading data from an NDB table using indexes or a primary key whose column order differs from that of the table. For unique indexes, a cached attribute ID map is created when the table is opened and is then used for each subsequent read, but for primary key reads, the map was built for every read. This is changed so that an attribute ID map for the primary key is built and cached when opening the table, and used whenever required for any subsequent reads. (Bug #31452597)
References: See also: Bug #24444899.
During different phases of the restore process, ndb_restore used different numbers of retries for temporary errors as well as different sleep times between retries. This is fixed by implementing consistent retry counts and sleep times across all restore phases. (Bug #31372923)
Removed warnings generated when compiling NDBCLUSTER with Clang 10. (Bug #31344788)
-
The SPJ block contains a load throttling mechanism used when generating LQHKEYREQ signals. When these were generated from parent rows from a scan, and this scan had a bushy topology with multiple children performing key lookups, it was possible to overload the job queues with too many LQHKEYREQ signals, causing node shutdowns due to full job buffers. This problem was originally fixed by Bug #14709490. Further investigation of this issue showed that job buffer full errors could occur even if the SPJ query was not bushy. Due to the increase in the internal batch size for SPJ workers in NDB 7.6.4, made as part of work done to implement the use of multiple fragments when sending SCAN_FRAGREQ signals to the SPJ block, even a simple query could fill up the job buffers when a relatively small number of such queries were run in parallel.
To fix this problem, we no longer send any further LQHKEYREQ signals once the number of outstanding signals in a given request exceeds 256. Instead, the parent row from which the LQHKEYREQ is produced is buffered, and the correlation ID of this row is stored in the collection of operations to be resumed later. (Bug #31343524)
References: This issue is a regression of: Bug #14709490.
-
MaxDiskWriteSpeedOwnRestart was not honored as an upper bound for local checkpoint writes during a node restart. (Bug #31337487)
References: See also: Bug #29943227.
Under certain rare circumstances, DROP TABLE of an NDB table triggered an assert. (Bug #31336431)
-
During a node restart, the SUMA block of the node that is starting must get a copy of the subscriptions (events with subscribers) and subscribers (NdbEventOperation instances which are executing) from a node already running. Before the copy is complete, nodes which are still starting ignore any user-level SUB_START or SUB_STOP requests; after the copy is done, they can participate in such requests. While the copy operation is in progress, user-level SUB_START and SUB_STOP requests are blocked using a DICT lock.
An issue was found whereby a starting node could participate in SUB_START and SUB_STOP requests after the lock was requested but before it was granted, which resulted in unsuccessful SUB_START and SUB_STOP requests. This fix ensures that such nodes cannot participate in these requests until after the DICT lock has actually been granted. (Bug #31302657)
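For context, the user-level SUB_START and SUB_STOP requests mentioned here are issued on behalf of NDB API event subscriptions. The following sketch, which assumes an initialized Ndb object and an event already created with Dictionary::createEvent() under the hypothetical name shown, indicates where such requests originate: executing an NdbEventOperation roughly corresponds to a SUB_START, and dropping it to a SUB_STOP.

    #include <NdbApi.hpp>
    #include <cstdio>

    int watch_events(Ndb* ndb)
    {
        // "REPL$test/t1" is a placeholder event name; the event must already exist.
        NdbEventOperation* op = ndb->createEventOperation("REPL$test/t1");
        if (op == nullptr) return -1;
        NdbRecAttr* val = op->getValue("c1");   // column of interest
        if (op->execute() != 0) return -1;      // subscription starts here (SUB_START)
        while (ndb->pollEvents(1000) > 0) {     // wait up to 1 second for change events
            while (ndb->nextEvent() != nullptr)
                printf("c1 changed: %d\n", val->int32_value());
        }
        ndb->dropEventOperation(op);            // subscription stops here (SUB_STOP)
        return 0;
    }
-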
Backups errored out with FsErrInvalidParameters when the filesystem was running with O_DIRECT and a data file write was not aligned with the 512-byte block size used by O_DIRECT writes. If the total fragment size in the data file is not aligned with the O_DIRECT block size, NDB pads the last write to the required size; but when there were no fragments to write, BACKUP wrote only the header and footer to the data file, and since the header and footer are less than 512 bytes, the write was not aligned to the O_DIRECT block size, leading to this issue.
This is fixed by padding out the generic footer to 512 bytes if necessary, using an EMPTY_ENTRY, when closing the data file. (Bug #31180508)
When employing an execution strategy which requires it to buffer received key rows for later use, DBSPJ now manages the buffer memory allocation tree node by tree node, resulting in a significant drop in CPU usage by the DBSPJ block. (Bug #31174015)
DBSPJ now uses linear memory instead of segmented memory for storing and handling TRANSID_AI signals, which saves approximately 10% of the CPU previously consumed. Due to this change, it is now possible for DBSPJ to accept TRANSID_AI signals in the short signal format; this is more efficient than the long signal format, which requires segmented memory. (Bug #31173582, Bug #31173766)
Altering the table comment of a fully replicated table using ALGORITHM=INPLACE led to an assertion. (Bug #31139313)
-
A local data manager (LDM) has a mechanism for ensuring that a fragment scan does not continue indefinitely when it finds too few rows to fill the available batch size in a reasonable amount of time (such as when a ScanFilter evaluates to false for most of the scanned rows). When this time limit, set in DBLQH as 10 ms, has expired, any rows found up to that point are returned, independent of whether the specified batch size has been filled or not. This acts as a keep-alive mechanism between data and API nodes, and also avoids holding any locks taken during the scan for too long.
A side effect of this was that returning result row batches to the DBSPJ block which were filled well below the expected limit could cause performance issues. This was due not only to poor utilization of the space reserved for batches, requiring more NEXTREQ round trips, but also to the fact that it caused DBSPJ internal parallelism statistics to become unreliable.
Since the DBSPJ block never requests locks when performing scans, overly long locks are not a problem for SPJ requests. Thus it is considered safe to let scans requested by DBSPJ continue for longer than the 10 ms allowed previously, and the limit set in DBLQH has been increased to 100 ms. (Bug #31124065)
-
For a pushed join, the output from EXPLAIN FORMAT=TREE did not indicate whether the table access was an index range scan returning multiple rows, or a single-row lookup on a primary or unique key.
This fix also provides a minor optimization, such that the handler interface is not accessed more than once in an attempt to return more than a single row if the access type is known to be Unique. (Bug #31123930)
-
A previous change (made in NDB 8.0.20) made it possible for a pushed join on tables allowing READ_BACKUP to place two SPJ workers on the data node local to the DBTC block while placing no SPJ workers on some other node. This occasional imbalance is intentional, since the SPJ workload (and any imbalance it introduces) is normally quite low compared to the gains from enabling more local reads of the backup fragments. As an unintended side effect of the same change, these two colocated SPJ workers might scan the same subset of fragments in parallel; this broke an assumption in the DBSPJ block that only a single SPJ worker is instantiated on each data node, on which the logic for ensuring that each SPJ worker starts its scans from a different fragment depends.
To fix this problem, the starting fragment for each SPJ worker is now calculated based on the root fragment ID from which the worker starts, which is unique among all SPJ workers even when some of them reside on the same node. (Bug #31113005)
References: See also: Bug #30639165.
-
When upgrading a cluster from NDB 8.0.17 or earlier to 8.0.18 or later, data nodes not yet upgraded could shut down unexpectedly following upgrade of the management server (or management servers) to the new software version. This occurred when a management client STOP command was sent to one or more of the data nodes still running the old version and the new master node (also running the old version of the NDB software) subsequently underwent an unplanned shutdown.
It was found that this occurred due to setting the signal length and number of signal sections incorrectly when sending a GSN_STOP_REQ (one of a number of signals whose length has been increased in NDB 8.0 as part of work done to support greater numbers of data nodes) to the new master. This happened due to the use of stale data retained from sending a GSN_STOP_REQ to the previous master node. To prevent this from happening, ndb_mgmd now sets the signal length and number of sections explicitly each time, prior to sending a GSN_STOP_REQ signal. (Bug #31019990)
In some cases, when failures occurred while replaying logs and restoring tuples, ndb_restore terminated instead of returning an error. In addition, the number of retries to be attempted for some operations was determined by hard-coded values. (Bug #30928114)
-
During schema distribution, if the client was killed after a DDL operation had already been logged in the ndb_schema table, but before the participants could reply, the client simply marked all participants as failed in the NDB_SCHEMA_OBJECT and returned. Since the distribution protocol was already in progress, the coordinator continued to wait for the participants, received their ndb_schema_result inserts, and processed them; meanwhile, the client was free to send another DDL operation. If one was executed and its distribution begun before the coordinator could finish processing the previous schema change, this triggered an assertion, since there should be only one distribution of a schema operation active at any given time.
In addition, when the client returned after detecting that its thread had been killed, it also released the global schema lock (GSL); this could also lead to undefined issues, since a participant could make changes under the assumption that the GSL was still being held by the coordinator.
In such cases, the client should not return after the DDL operation has been logged in the ndb_schema table; from this point, the coordinator has control, and the client should wait for it to make a decision. Now the coordinator aborts the distribution only in the event of a server or cluster shutdown, and otherwise waits for all participants either to reply, or to time out and mark the schema operation as completed. (Bug #30684839)
When, during a restart, a data node received a GCP_SAVEREQ signal prior to beginning start phase 9, and thus needed to perform a global checkpoint index write to a local data manager's local checkpoint control file, it did not record information from the DIH block originating with the node that sent the signal as part of the data written. This meant that, later in start phase 9, when attempting to send a GCP_SAVECONF signal in response to the GCP_SAVEREQ, this information was not available, which meant the response could not be sent, resulting in an unplanned shutdown of the data node. (Bug #30187949)
-
Setting EnableRedoControl to false did not fully disable MaxDiskWriteSpeed, MaxDiskWriteSpeedOtherNodeRestart, and MaxDiskWriteSpeedOwnRestart as expected. (Bug #29943227)
References: See also: Bug #31337487.
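For reference, the parameters involved are data node parameters that are typically set together in the [ndbd default] section of config.ini, as in the following sketch; the values shown are placeholders only.

    [ndbd default]
    EnableRedoControl=false
    MaxDiskWriteSpeed=20M
    MaxDiskWriteSpeedOwnRestart=50M
    MaxDiskWriteSpeedOtherNodeRestart=50M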
-
A BLOB value is stored by NDB in multiple parts; when reading such a value, one read operation is executed per part. If a part is not found, the read fails with a row not found error, which indicates a corrupted BLOB, since a BLOB should never have any missing parts. A problem could arise because this error was reported as the overall result of the read operation, which meant that mysqld saw no error and reported zero rows returned.
This issue is fixed by adding a check specifically for the case in which a blob part is not found. Now, when this occurs, NDB overwrites the row not found error with corrupted blob, which causes the originating SELECT statement to fail as expected. Users of the NDB API should be aware that, despite this change, the NdbBlob::getValue() method continues to report the error as row not found in such cases. (Bug #28590428)
Data nodes did not start when the RealtimeScheduler configuration parameter was set to 1. This was due to the fact that index builds during startup are performed by temporarily diverting some I/O threads for use as index building threads, and these threads inherited the realtime properties of the I/O threads. This caused a conflict (treated as a fatal error) when index build thread specifications were checked to ensure that they were not realtime threads. This is fixed by making sure that index build threads are not treated as realtime threads regardless of any settings applying to their host I/O threads, as was actually intended in their design. (Bug #27533538)
Using an in-place ALTER TABLE to drop an index could lead to the unplanned shutdown of an SQL node. (Bug #24444899)
As the final step when executing ALTER TABLE ... ALGORITHM=INPLACE, NDBCLUSTER performed a read of the table metadata from the NDB dictionary, requiring an extra round trip between the SQL nodes and data nodes; this both slowed down execution of the statement unnecessarily and provided an avenue for errors which NDBCLUSTER was not prepared to handle correctly. This issue is fixed by removing the read of NDB table metadata during the final phase of executing an in-place ALTER TABLE statement. (Bug #99898, Bug #31497026)
A memory leak could occur when preparing an NDB table for an in-place ALTER TABLE. (Bug #99739, Bug #31419144)
Added the AllowUnresolvedHostNames configuration parameter. When set to true, this parameter overrides the fatal error normally raised when ndb_mgmd cannot connect to a given host name, allowing startup to continue and generating only a warning instead. To be effective, the parameter must be set in the [tcp default] section of the cluster global configuration file. (WL #13860)
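As an illustration of where this parameter belongs, a config.ini sketch follows; the host names and node IDs are placeholders.

    [tcp default]
    AllowUnresolvedHostNames=true    # warn, instead of failing, when a host name cannot be resolved

    [ndb_mgmd]
    NodeId=1
    HostName=mgmhost.example.com

    [ndbd]
    NodeId=2
    HostName=datahost1.example.com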