When performing an NDB backup, the
ndbinfo.logbuffers table now displays information regarding buffer usage by the backup process on each data node. This is implemented as rows reflecting two new log types in addition to
DD-UNDO. One of these rows has the log type
BACKUP-DATA, which shows the amount of data buffer used during backup to copy fragments to backup files. The other row has the log type
BACKUP-LOG, which displays the amount of log buffer used during the backup to record changes made after the backup has started. One each of these
log_type rows is shown in the
logbuffers table for each data node in the cluster. Rows having these two log types are present in the table only while an NDB backup is currently in progress. (Bug #25822988)
Added the --logbuffer-size option for ndbd and ndbmtd, for use in debugging with a large number of log messages. This controls the size of the data node log buffer; the default (32K) is intended for normal operations. (Bug #89679, Bug #27550943)
The previously experimental shared memory (SHM) transporter is now supported in production. SHM works by writing signals into memory rather than sending them over a socket. NDB already attempts to use SHM automatically between a data node and an API node sharing the same host. To enable explicit shared memory connections, set the
UseShm configuration parameter to
1 for the relevant data node. When explicitly defining shared memory as the connection method, it is also necessary that the data node is identified by
HostName and the API node by HostName as well.
Additional tuning parameters such as
SendBufferMemory can be employed to improve performance of the SHM transporter. Configuration of SHM is otherwise similar to that of the TCP transporter. The
SigNum parameter is no longer used, and any settings made for it are now ignored. See NDB Cluster Shared Memory Connections for more information about these parameters.
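A minimal config.ini sketch of such a setup might look as follows (the host name, node IDs, and SendBufferMemory value here are hypothetical examples, not values from this note):

```ini
# Data node and API node share a host, so SHM can be used between them.
[ndbd]
NodeId=2
HostName=198.51.100.10
UseShm=1

[mysqld]
NodeId=10
HostName=198.51.100.10

# Optional explicit SHM connection with tuning:
[shm]
NodeId1=2
NodeId2=10
SendBufferMemory=4M
```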
In addition, as part of this work,
NDB code relating to support for the legacy SCI transporter, which had long been unsupported, has been removed. See
www.dolphinics.com for information about support for legacy SCI hardware and about the newer Dolphin Express hardware.
The SPJ kernel block now takes into account, when evaluating a join request, whether at least some of the tables are used in inner joins. This means that
SPJ can eliminate requests for rows or ranges as soon as it becomes known that a preceding request returned no results for a parent row. This saves both the data nodes and the
SPJ block from having to handle requests and result rows that can never take part in a result row from an inner join.
Note
When upgrading from NDB 7.6.5 or earlier, you should be aware that this optimization depends on both API client and data node functionality, and so is not available until all of these have been upgraded.
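The effect of this optimization can be sketched with a simplified Python model (table names and indexes here are hypothetical; the real pruning logic lives in the SPJ block on the data nodes):

```python
def pruned_inner_join(t1_rows, t2_index, t3_index):
    """Model of the pruning: for an inner join, once a lookup into t2
    returns no rows for a parent row from t1, no request against t3 is
    issued for that branch, since it cannot contribute a result row."""
    results = []
    requests = 0
    for r1 in t1_rows:
        requests += 1                      # lookup into t2
        t2_rows = t2_index.get(r1, [])
        if not t2_rows:
            continue                       # branch eliminated: skip t3 entirely
        for r2 in t2_rows:
            requests += 1                  # lookup into t3
            for r3 in t3_index.get(r2, []):
                results.append((r1, r2, r3))
    return results, requests

rows, requests = pruned_inner_join(
    ["a", "b"],              # two parent rows; "b" has no match in t2
    {"a": ["a1"]},
    {"a1": ["a1x"]},
)
print(rows, requests)  # [('a', 'a1', 'a1x')] 3
```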
The poll receiver which
NDB uses to read from sockets, execute messages from the sockets, and wake up other threads now offloads the wakeup work to a dedicated thread, which wakes the other threads on request and otherwise simply sleeps. This improves the scalability of a single cluster connection by keeping the receive thread from becoming overburdened by tasks such as waking other threads.
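The division of labor can be sketched in Python (hypothetical class, not the NDB implementation): the receive thread only enqueues a wakeup request and returns to polling immediately, while the dedicated thread does the actual waking.

```python
import threading
import queue

class WakeupThread(threading.Thread):
    """Dedicated wakeup thread: sleeps until a wakeup is requested,
    then signals the waiting client thread."""
    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()

    def run(self):
        while True:
            event = self.requests.get()   # sleep until a request arrives
            if event is None:
                break                     # shutdown sentinel
            event.set()                   # wake the waiting client thread

    def request_wakeup(self, event):
        # Cheap for the caller: the receive thread does no waking itself.
        self.requests.put(event)

waker = WakeupThread()
waker.start()
client_ready = threading.Event()
waker.request_wakeup(client_ready)        # what the receive thread would do
client_ready.wait(timeout=5)
print(client_ready.is_set())  # True
waker.request_wakeup(None)                # shut the wakeup thread down
```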
NDB Cluster APIs: A previous fix for an issue, in which the failure of multiple data nodes during a partial restart could cause API nodes to fail, did not properly check the validity of the associated
NdbReceiver object before proceeding. Now in such cases an invalid object triggers handling for invalid signals, rather than a node failure. (Bug #25902137)
References: This issue is a regression of: Bug #25092498.
NDB Cluster APIs: Incorrect results, usually an empty result set, were returned when
setBound() was used to specify a
NULL bound. This issue appears to have been caused by a problem in gcc, limited to cases using the old version of this method (which does not employ
NdbRecord), and is fixed by rewriting the problematic internal logic in the old implementation. (Bug #89468, Bug #27461752)
NDB Cluster APIs: Released NDB API objects are kept in one or more
Ndb_free_list structures for later reuse. Each list also keeps track of all objects seized from it, and makes sure that these are eventually released back to it. In the event that the internal function
NdbScanOperation::init() failed, it was possible for an
NdbApiSignal already allocated by the
NdbOperation to be leaked. Now in such cases,
NdbScanOperation::release() is called to release any objects allocated by the failed
NdbScanOperation before it is returned to the free list.
This fix also handles a similar issue with
NdbOperation::init(), where a failed call could also leak a signal. (Bug #89249, Bug #27389894)
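The fixed pattern can be modeled in Python (hypothetical simplified classes; the real objects are the Ndb_free_list structures, NdbScanOperation, and NdbApiSignal):

```python
class ScanOp:
    """Stand-in for NdbScanOperation: init() allocates a signal before it
    can fail, which is exactly what used to leak."""
    def __init__(self):
        self.signals = []

    def init(self, fail=False):
        self.signals.append("NdbApiSignal")  # allocated up front
        return not fail                      # failure happens after allocation

    def release(self):
        self.signals.clear()                 # free everything we allocated

class FreeList:
    """Stand-in for a free list: hands out objects and takes them back."""
    def __init__(self):
        self.free = []

    def seize(self):
        return self.free.pop() if self.free else ScanOp()

    def put_back(self, op):
        self.free.append(op)

def seize_and_init(pool, fail=False):
    op = pool.seize()
    if not op.init(fail=fail):
        op.release()        # the fix: drop allocated signals on failure...
        pool.put_back(op)   # ...before the object returns to the free list
        return None
    return op

pool = FreeList()
seize_and_init(pool, fail=True)   # failed init: object recycled cleanly
reused = pool.seize()
print(reused.signals)  # [] -- nothing leaked into the reused object
```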
In some circumstances, when a transaction was aborted in the
DBTC block, there remained links to trigger records from operation records that were not yet reference-counted; when such an operation record was released, however, the trigger reference count was still decremented. (Bug #27629680)
NDB online backup consists of data, which is fuzzy, and a redo and undo log. To restore to a consistent state, it is necessary to ensure that the log contains all of the changes spanning the capture of the fuzzy data portion and beyond to a consistent snapshot point. This is achieved by waiting for a GCI boundary to be passed after the capture of data is complete, but before stopping change logging and recording the stop GCI in the backup's metadata.
At restore time, the log is replayed up to the stop GCI, restoring the system to the state it had at the consistent stop GCI. A problem arose when, under load, it was possible to select a GCI boundary which occurred too early and did not span all the data captured. This could lead to inconsistencies when restoring the backup; these could be noticed as broken constraints or corrupted data.
Now the stop GCI is chosen so that it spans the entire duration of the fuzzy data capture process, so that the backup log always contains all data within a given stop GCI. (Bug #27497461)
References: See also: Bug #27566346.
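The corrected selection rule can be sketched as follows (a deliberately simplified model; actual GCI tracking is internal to the data nodes):

```python
def choose_stop_gci(gcis_seen_during_data_capture, current_gci):
    """Simplified model of the fixed rule: the stop GCI must lie on a
    boundary reached after the fuzzy data capture finished, so the change
    log is guaranteed to span every GCI in which captured data may have
    been modified."""
    max_data_gci = max(gcis_seen_during_data_capture)
    # Wait for the next GCI boundary past both the capture and "now":
    return max(max_data_gci, current_gci) + 1

# Under load, data capture may run past the GCI that was current when it
# started; the chosen stop GCI still covers the whole capture:
print(choose_stop_gci([100, 101, 104], current_gci=102))  # 105
```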
For NDB tables, when a foreign key was added or dropped as part of a DDL statement, the foreign key metadata for all parent tables referenced should be reloaded in the handler on all SQL nodes connected to the cluster, but this was done only on the mysqld on which the statement was executed. Due to this, any subsequent queries relying on foreign key metadata from the corresponding parent tables could return inconsistent results. (Bug #27439587)
References: See also: Bug #82989, Bug #24666177.
ANALYZE TABLE used excessive amounts of CPU on large, low-cardinality tables. (Bug #27438963)
Queries using very large lists with
IN were not handled correctly, which could lead to data node failures. (Bug #27397802)
References: See also: Bug #28728603.
A data node overload could in some situations lead to an unplanned shutdown of the data node, which in turn led to all data nodes disconnecting from the management and API nodes.
This was due to a situation in which
API_FAILREQ was not the last received signal prior to the node failure.
As part of this fix, the transaction coordinator's handling of
SCAN_TABREQ signals for an
ApiConnectRecord in an incorrect state was also improved. (Bug #27381901)
References: See also: Bug #47039, Bug #11755287.
In a two-node cluster, when the node having the lowest ID was started using
--nostart, API clients could not connect, failing with Could not alloc node id at HOST port PORT_NO: No free node id found for mysqld(API). (Bug #27225212)
Changing MaxNoOfExecutionThreads without an initial system restart led to an unplanned data node shutdown. (Bug #27169282)
References: This issue is a regression of: Bug #26908347, Bug #26968613.
Race conditions sometimes occurred during asynchronous disconnection and reconnection of the transporter while other threads concurrently inserted signal data into the send buffers, leading to an unplanned shutdown of the cluster.
As part of the work fixing this issue, the internal templating function used by the Transporter Registry when it prepares a send has been refactored to use likely/unlikely logic to speed up execution, and to remove a number of duplicate checks for NULL. (Bug #24444908, Bug #25128512)
References: See also: Bug #20112700.
ndb_restore sometimes logged data file and log file progress values much greater than 100%. (Bug #20989106)
The internal function
BitmaskImpl::setRange() set one bit fewer than specified. (Bug #90648, Bug #27931995)
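The off-by-one can be illustrated with a small Python model of a bitmask (illustrative only; the real code operates on word arrays in C++):

```python
def set_range(mask, first, count):
    """Set exactly `count` bits starting at bit `first` (correct behavior)."""
    for pos in range(first, first + count):
        mask |= 1 << pos
    return mask

def set_range_buggy(mask, first, count):
    """The bug was equivalent to iterating only count - 1 times."""
    for pos in range(first, first + count - 1):  # one bit fewer than specified
        mask |= 1 << pos
    return mask

print(bin(set_range(0, 2, 3)))        # 0b11100 -- bits 2, 3, 4 set
print(bin(set_range_buggy(0, 2, 3)))  # 0b1100  -- last bit missing
```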
It was not possible to create an NDB table using the PARTITION_BALANCE setting
FOR_RA_BY_LDM_X_4. (Bug #89811, Bug #27602352)
References: This issue is a regression of: Bug #81759, Bug #23544301.
Adding a [shm] section to the global configuration file for a cluster with multiple data nodes caused default TCP connections to be lost to the node using the single section. (Bug #89627, Bug #27532407)
As a result of the reuse of code intended for send threads when performing an assist send, all of the local release send buffers were released to the global pool, which caused the intended level of the local send buffer pool to be ignored. Now send threads and assisting worker threads follow their own policies for maintaining their local buffer pools. (Bug #89119, Bug #27349118)
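The two policies can be sketched in Python (hypothetical function names; the real pools hold send buffer pages):

```python
def release_surplus(local_pool, global_pool, target_level):
    """Fixed policy for an assisting worker thread: keep the local pool at
    its intended level and return only the surplus to the global pool."""
    while len(local_pool) > target_level:
        global_pool.append(local_pool.pop())

def release_all(local_pool, global_pool):
    """Send-thread policy; reusing it for assist sends drained the worker's
    local pool entirely, which was the bug."""
    while local_pool:
        global_pool.append(local_pool.pop())

local, global_ = list(range(10)), []
release_surplus(local, global_, target_level=4)
print(len(local), len(global_))  # 4 6
```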
When sending priority A signals, we now ensure that the number of pending signals is explicitly initialized. (Bug #88986, Bug #27294856)
In a MySQL Cluster with one MySQL Server configured to write a binary log, a failure occurred when creating and using an
NDB table with non-stored generated columns. The problem arose only when the product was built with debugging support. (Bug #86084, Bug #25957586)
When the internal function
ha_ndbcluster::copy_fk_for_offline_alter() checked dependent objects on a table from which it was supposed to drop a foreign key, it did not perform any filtering for foreign keys, making it possible for it to attempt retrieval of an index or trigger instead, leading to a spurious Error 723 (No such table).