When performing an NDB backup, the ndbinfo.logbuffers table now displays information regarding buffer usage by the backup process on each data node. This is implemented as rows reflecting two new log types in addition to REDO and DD-UNDO. One of these rows has the log type BACKUP-DATA, which shows the amount of data buffer used during backup to copy fragments to backup files. The other row has the log type BACKUP-LOG, which displays the amount of log buffer used during the backup to record changes made after the backup has started. One each of these log_type rows is shown in the logbuffers table for each data node in the cluster. Rows having these two log types are present in the table only while an NDB backup is currently in progress. (Bug #25822988)
Added the --logbuffer-size option for ndbd and ndbmtd, for use in debugging with a large number of log messages. This controls the size of the data node log buffer; the default (32K) is intended for normal operations. (Bug #89679, Bug #27550943)
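As an illustration of how the BACKUP-DATA and BACKUP-LOG rows in ndbinfo.logbuffers might be consumed, the following minimal Python sketch computes buffer usage per data node from rows that have already been fetched. The column names in the query (node_id, log_type, total, used) are assumed from the ndbinfo.logbuffers table and should be checked against your version; the function name is hypothetical and not part of any NDB tool.

```python
# Query that could be run while a backup is in progress. The column names
# are an assumption; verify them against ndbinfo.logbuffers on your system.
LOGBUFFERS_QUERY = (
    "SELECT node_id, log_type, total, used "
    "FROM ndbinfo.logbuffers "
    "WHERE log_type IN ('BACKUP-DATA', 'BACKUP-LOG')"
)

BACKUP_LOG_TYPES = ("BACKUP-DATA", "BACKUP-LOG")


def backup_buffer_usage(rows):
    """Return {(node_id, log_type): percent_used} for backup rows only.

    rows is an iterable of (node_id, log_type, total, used) tuples,
    for example the result of running LOGBUFFERS_QUERY.
    """
    usage = {}
    for node_id, log_type, total, used in rows:
        if log_type not in BACKUP_LOG_TYPES:
            continue  # skip REDO and DD-UNDO rows
        percent = 100.0 * used / total if total else 0.0
        usage[(node_id, log_type)] = round(percent, 1)
    return usage
```

Because these rows exist only while a backup is running, an empty result simply means that no backup is currently in progress.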
The previously experimental shared memory (SHM) transporter is now supported in production. SHM works by transporting signals through writing them into memory, rather than on a socket. NDB already attempts to use SHM automatically between a data node and an API node sharing the same host. To enable explicit shared memory connections, set the UseShm configuration parameter to 1 for the relevant data node. When explicitly defining shared memory as the connection method, it is also necessary that both the data node and the API node are identified by HostName.
Additional tuning parameters such as ShmSize, ShmSpintime, and SendBufferMemory can be employed to improve performance of the SHM transporter. Configuration of SHM is otherwise similar to that of the TCP transporter. The SigNum parameter is no longer used, and any settings made for it are now ignored. NDB Cluster Shared Memory Connections provides more information about these parameters.
In addition, as part of this work, NDB code relating to support for the legacy SCI transporter, which had long been unsupported, has been removed. See www.dolphinics.com for information about support for legacy SCI hardware or information about the newer Dolphin Express hardware. (WL #7512)
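The explicit SHM setup described above can be sketched as a small Python helper that renders a config.ini fragment. This is only an illustration: the node IDs and host name are hypothetical placeholders, the helper function is not part of NDB Cluster, and the fragment shows only UseShm and HostName, with no tuning parameters.

```python
def shm_config_fragment(data_node_id, api_node_id, host):
    """Render a config.ini fragment enabling explicit SHM on one data node.

    Both nodes are identified by HostName and share the same host,
    since shared memory connects processes on a single machine.
    """
    lines = [
        "[ndbd]",
        f"NodeId={data_node_id}",
        f"HostName={host}",
        "UseShm=1",  # explicit shared memory connections for this data node
        "",
        "[api]",
        f"NodeId={api_node_id}",
        f"HostName={host}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node IDs and host name, for illustration only.
    print(shm_config_fragment(1, 51, "host1.example.com"))
```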
The SPJ kernel block now takes into account whether the join request it is evaluating uses at least some of its tables in inner joins. This means that SPJ can eliminate requests for rows or ranges as soon as it becomes known that a preceding request did not return any results for a parent row. This saves both the data nodes and the SPJ block from having to handle requests and result rows which never take part in a result row from an inner join.
Note: When upgrading from NDB 7.6.5 or earlier, you should be aware that this optimization depends on both API client and data node functionality, and so is not available until all of these have been upgraded.
(WL #11164)
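The idea behind the SPJ inner-join optimization can be shown with a toy model in Python. This is not the SPJ implementation; the tables are plain dictionaries and the function name is hypothetical. It only shows why a later child request can be skipped once an earlier one has returned nothing for the same parent row.

```python
def inner_join_lookups(parent_keys, child_a, child_b):
    """Inner-join each parent key against two child tables.

    Returns (result_rows, lookups_issued). A lookup against child_b is
    issued only if the preceding lookup against child_a found a row,
    because otherwise the parent row cannot appear in the inner join.
    """
    results = []
    lookups = 0
    for key in parent_keys:
        lookups += 1
        row_a = child_a.get(key)
        if row_a is None:
            continue  # no match: the request to child_b is eliminated
        lookups += 1
        row_b = child_b.get(key)
        if row_b is None:
            continue
        results.append((key, row_a, row_b))
    return results, lookups
```

With three parent keys and two child tables, a naive plan issues six lookups; in this model a parent row with no match in the first child costs only one.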
The poll receiver which NDB uses to read from sockets, execute messages from the sockets, and wake up other threads now offloads wakeup of other threads to a new thread that wakes up the other threads on request, and otherwise simply sleeps. This improves the scalability of a single cluster connection by keeping the receive thread from becoming overburdened by tasks including wakeup of other threads. (WL #9663)
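The division of labor between the receive thread and the new wakeup thread can be sketched in Python with a queue and per-client events. This is a simplified model under stated assumptions, not NDB code: all names are hypothetical, and "executing" a message is reduced to storing its payload.

```python
import queue
import threading


def run_receiver_with_waker(messages):
    """Process (client, payload) messages, offloading wakeups to a helper.

    Returns (delivered, woken): the payload stored for each client and
    whether each client's event was set by the waker thread.
    """
    wake_requests = queue.Queue()
    events = {client: threading.Event() for client, _ in messages}
    delivered = {}

    def waker():
        # Sleeps in get() until the receiver asks for a wakeup.
        while True:
            client = wake_requests.get()
            if client is None:  # sentinel: no more work
                break
            events[client].set()

    waker_thread = threading.Thread(target=waker)
    waker_thread.start()

    # Receiver loop: execute the message, then hand off the wakeup
    # instead of performing it on the receive path.
    for client, payload in messages:
        delivered[client] = payload
        wake_requests.put(client)

    wake_requests.put(None)
    waker_thread.join()
    woken = {client: event.is_set() for client, event in events.items()}
    return delivered, woken
```

The receiver only enqueues a request per message, so the cost of waking waiting threads no longer sits on the receive path.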
Important Change; NDB Client Programs: ndb_top ignored short forms of command-line options, and did not in all cases handle malformed long options correctly. As part of the fix for these issues, the following changes have been made to command-line options used with ndb_top to bring them more into line with those used with other NDB Cluster and MySQL programs:
The --passwd option is removed, and replaced by --password (short form -p).
The short form -t for the --port option has been replaced by -P.
The short form -x for the --text option has been replaced by -t.
(Bug #26907833)
References: See also: Bug #88236, Bug #20733646.
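Scripts that invoke ndb_top with the old option names need to be updated. The following Python sketch rewrites an argument list according to the three changes listed above; the helper is hypothetical and handles only those three options, leaving everything else untouched.

```python
def migrate_ndb_top_args(args):
    """Translate pre-change ndb_top options to their new spellings.

    Each argument is mapped exactly once, so an old -x that becomes -t
    is not then mistaken for the old -t (port) and turned into -P.
    """
    migrated = []
    for arg in args:
        if arg == "--passwd" or arg.startswith("--passwd="):
            # --passwd was removed in favor of --password
            migrated.append("--password" + arg[len("--passwd"):])
        elif arg.startswith("-t") and not arg.startswith("--"):
            # old short form of --port, possibly with an attached value
            migrated.append("-P" + arg[2:])
        elif arg == "-x":
            # old short form of --text
            migrated.append("-t")
        else:
            migrated.append(arg)
    return migrated
```

Doing the translation in a single pass matters because -t changes meaning: it used to select the port and now selects text output.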
NDB Cluster APIs: A previous fix for an issue, in which the failure of multiple data nodes during a partial restart could cause API nodes to fail, did not properly check the validity of the associated NdbReceiver object before proceeding. Now in such cases an invalid object triggers handling for invalid signals, rather than a node failure. (Bug #25902137)
References: This issue is a regression of: Bug #25092498.
NDB Cluster APIs: Incorrect results, usually an empty result set, were returned when setBound() was used to specify a NULL bound. This issue appears to have been caused by a problem in gcc, limited to cases using the old version of this method (which does not employ NdbRecord), and is fixed by rewriting the problematic internal logic in the old implementation. (Bug #89468, Bug #27461752)
NDB Cluster APIs: Released NDB API objects are kept in one or more Ndb_free_list structures for later reuse. Each list also keeps track of all objects seized from it, and makes sure that these are eventually released back to it. In the event that the internal function NdbScanOperation::init() failed, it was possible for an NdbApiSignal already allocated by the NdbOperation to be leaked. Now in such cases, NdbScanOperation::release() is called to release any objects allocated by the failed NdbScanOperation before it is returned to the free list.
This fix also handles a similar issue with NdbOperation::init(), where a failed call could also leak a signal. (Bug #89249, Bug #27389894)
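The pattern behind this fix can be modeled in a few lines of Python. The classes below are hypothetical stand-ins, not the NDB API types: a free list counts the objects seized from it, and a failed init() releases whatever the operation had already allocated before the operation itself goes back to its list.

```python
class FreeList:
    """Keeps released objects for reuse and counts objects still seized."""

    def __init__(self, factory):
        self._factory = factory
        self._free = []
        self.outstanding = 0

    def seize(self):
        self.outstanding += 1
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        self.outstanding -= 1
        self._free.append(obj)


class Signal:
    """Stand-in for a signal object owned by an operation."""


class ScanOperation:
    """Stand-in for an operation that allocates a signal during init()."""

    def __init__(self):
        self.signal = None

    def init(self, signals, fail):
        self.signal = signals.seize()  # allocated before the failure point
        return not fail

    def release(self, signals):
        if self.signal is not None:
            signals.release(self.signal)
            self.signal = None


def get_scan_operation(operations, signals, fail=False):
    """Seize and initialize an operation; clean up fully if init() fails."""
    op = operations.seize()
    if not op.init(signals, fail):
        op.release(signals)      # give back the already-allocated signal
        operations.release(op)   # only then return the operation itself
        return None
    return op
```

Without the call to op.release() on the failure path, the signal's outstanding count would never return to zero, which is the kind of leak described above.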
NDB Client Programs: ndb_top did not support a number of options common to most NDB programs. The following options are now supported:
In addition, ndb_top now supports a --socket option (short form -S) for specifying a socket file to use for the connection. (Bug #86614, Bug #26236298)
MySQL NDB ClusterJ: ClusterJ quit unexpectedly as there was no error handling in the scanIndex() function of the ClusterTransactionImpl class for a null returned to it internally by the scanIndex() method of the ndbTransaction class. (Bug #27297681, Bug #88989)
In some circumstances, when a transaction was aborted in the DBTC block, there remained links to trigger records from operation records which were not yet reference-counted, but when such an operation record was released the trigger reference count was still decremented. (Bug #27629680)
An NDB online backup consists of data, which is fuzzy, and a redo and undo log. To restore to a consistent state it is necessary to ensure that the log contains all of the changes spanning the capture of the fuzzy data portion and beyond to a consistent snapshot point. This is achieved by waiting for a GCI boundary to be passed after the capture of data is complete, but before stopping change logging and recording the stop GCI in the backup's metadata.
At restore time, the log is replayed up to the stop GCI, restoring the system to the state it had at the consistent stop GCI. A problem arose when, under load, it was possible to select a GCI boundary which occurred too early and did not span all the data captured. This could lead to inconsistencies when restoring the backup; these could be noticed as broken constraints or corrupted BLOB entries.
Now the stop GCI is chosen so that it spans the entire duration of the fuzzy data capture process, so that the backup log always contains all data within a given stop GCI. (Bug #27497461)
References: See also: Bug #27566346.
For NDB tables, when a foreign key was added or dropped as a part of a DDL statement, the foreign key metadata for all parent tables referenced should be reloaded in the handler on all SQL nodes connected to the cluster, but this was done only on the mysqld on which the statement was executed. Due to this, any subsequent queries relying on foreign key metadata from the corresponding parent tables could return inconsistent results. (Bug #27439587)
References: See also: Bug #82989, Bug #24666177.
ANALYZE TABLE used excessive amounts of CPU on large, low-cardinality tables. (Bug #27438963)
Queries using very large lists with IN were not handled correctly, which could lead to data node failures. (Bug #27397802)
References: See also: Bug #28728603.
A data node overload could in some situations lead to an unplanned shutdown of the data node, which led to all data nodes disconnecting from the management and API nodes.
This was due to a situation in which API_FAILREQ was not the last received signal prior to the node failure.
As part of this fix, the transaction coordinator's handling of SCAN_TABREQ signals for an ApiConnectRecord in an incorrect state was also improved. (Bug #27381901)
References: See also: Bug #47039, Bug #11755287.
In a two-node cluster, when the node having the lowest ID was started using --nostart, API clients could not connect, failing with Could not alloc node id at HOST port PORT_NO: No free node id found for mysqld(API). (Bug #27225212)
Changing MaxNoOfExecutionThreads without an initial system restart led to an unplanned data node shutdown. (Bug #27169282)
References: This issue is a regression of: Bug #26908347, Bug #26968613.
Race conditions sometimes occurred during asynchronous disconnection and reconnection of the transporter while other threads concurrently inserted signal data into the send buffers, leading to an unplanned shutdown of the cluster.
As part of the work fixing this issue, the internal templating function used by the Transporter Registry when it prepares a send is refactored to use likely-or-unlikely logic to speed up execution, and to remove a number of duplicate checks for NULL. (Bug #24444908, Bug #25128512)
References: See also: Bug #20112700.
ndb_restore sometimes logged data file and log file progress values much greater than 100%. (Bug #20989106)
The internal function BitmaskImpl::setRange() set one bit fewer than specified. (Bug #90648, Bug #27931995)
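For reference, the intended behavior of a range-setting operation on a bitmask stored as 32-bit words can be sketched in Python as follows. This is an independent illustration of the expected result, not the NDB implementation; the function name and the word layout (bit 0 is the least significant bit of word 0) are assumptions.

```python
WORD_BITS = 32


def set_range(words, start, length):
    """Set exactly `length` bits, beginning at bit `start`, in place.

    words is a list of integers, each holding 32 bits of the mask.
    """
    for bit in range(start, start + length):
        words[bit // WORD_BITS] |= 1 << (bit % WORD_BITS)
    return words


def count_bits(words):
    """Count the set bits across all words of the mask."""
    return sum(bin(word).count("1") for word in words)
```

A simple check for the off-by-one described above is that count_bits() on an initially empty mask equals the requested length after a single set_range() call, including when the range crosses a word boundary.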
It was not possible to create an NDB table using PARTITION_BALANCE set to FOR_RA_BY_LDM_X_2, FOR_RA_BY_LDM_X_3, or FOR_RA_BY_LDM_X_4. (Bug #89811, Bug #27602352)
References: This issue is a regression of: Bug #81759, Bug #23544301.
Adding a [tcp] or [shm] section to the global configuration file for a cluster with multiple data nodes caused default TCP connections to be lost to the node using the single section. (Bug #89627, Bug #27532407)
As a result of the reuse of code intended for send threads when performing an assist send, all of the local release send buffers were released to the global pool, which caused the intended level of the local send buffer pool to be ignored. Now send threads and assisting worker threads follow their own policies for maintaining their local buffer pools. (Bug #89119, Bug #27349118)
When sending priority A signals, we now ensure that the number of pending signals is explicitly initialized. (Bug #88986, Bug #27294856)
In a MySQL Cluster with one MySQL Server configured to write a binary log, a failure occurred when creating and using an NDB table with non-stored generated columns. The problem arose only when the product was built with debugging support. (Bug #86084, Bug #25957586)
ndb_restore --print-data --hex did not print trailing 0s of LONGVARBINARY values. (Bug #65560, Bug #14198580)
When the internal function ha_ndbcluster::copy_fk_for_offline_alter() checked dependent objects on a table from which it was supposed to drop a foreign key, it did not perform any filtering for foreign keys, making it possible for it to attempt retrieval of an index or trigger instead, leading to a spurious Error 723 (No such table).