Incompatible Change; NDB Disk Data: Due to changes in disk file formats, it is necessary to perform an --initial restart of each data node when upgrading to or downgrading from this release.
Important Change; NDB Disk Data: NDB Cluster has improved node restart times and overall performance with larger data sets by implementing partial local checkpoints (LCPs). Prior to this release, an LCP always made a copy of the entire database. NDB now supports LCPs that write individual records, so it is no longer strictly necessary for an LCP to write the entire database. Since, at recovery, it remains necessary to restore the database fully, the strategy is to save one fourth of all records at each LCP, as well as to write the records that have changed since the last LCP.
Two data node configuration parameters relating to this change are introduced in this release: EnablePartialLcp (default true, or enabled) enables partial LCPs. When partial LCPs are enabled, RecoveryWork controls the percentage of space given over to LCPs; it increases with the amount of work which must be performed on LCPs during restarts as opposed to that performed during normal operations. Raising this value causes LCPs during normal operations to require writing fewer records and so decreases the usual workload. Raising this value also means that restarts can take longer.
Important: Upgrading to NDB 7.6.4 or downgrading from this release requires purging then re-creating the NDB data node file system, which means that an initial restart of each data node is needed. An initial node restart still requires a complete LCP; a partial LCP is not used for this purpose.
A rolling restart or system restart is a normal part of an NDB software upgrade. When such a restart is performed as part of an upgrade to NDB 7.6.4 or later, any existing LCP files are checked for the presence of the LCP sysfile, indicating that the existing data node file system was written using NDB 7.6.4 or later. If such a node file system exists, but does not contain the sysfile, and if any data nodes are restarted without the --initial option, NDB causes the restart to fail with an appropriate error message. This detection can be performed only as part of an upgrade; it is not possible to do so as part of a downgrade to NDB 7.6.3 or earlier from a later release.
Exception: If there are no data node files (that is, in the event of a “clean” start or restart), using --initial is not required for a software upgrade, since this is already equivalent to an initial restart. (This aspect of restarts is unchanged from previous releases of NDB Cluster.)
In addition, the default value for StartPartitionedTimeout is changed from 60000 to 0.
This release also deprecates the data node configuration parameters BackupDataBufferSize, BackupWriteSize, and BackupMaxWriteSize; these are now subject to removal in a future NDB Cluster version. (Bug #27308632, WL #8069, WL #10302, WL #10993)
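As a minimal, illustrative sketch only (the RecoveryWork value shown is an example, not a recommendation; see the parameter descriptions for defaults and permitted ranges), these parameters are set in the data node defaults section of config.ini:
[ndbd default]
# Partial LCPs are enabled by default in NDB 7.6.4 and later
EnablePartialLcp=true
# Example value only: raising RecoveryWork means less LCP work during
# normal operation, at the cost of longer restarts
RecoveryWork=60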
Important Change: Added the ndb_perror utility for obtaining information about NDB Cluster error codes. This tool replaces perror --ndb; the --ndb option for perror is now deprecated and raises a warning when used; the option is subject to removal in a future NDB version. See ndb_perror — Obtain NDB Error Message Information, for more information. (Bug #81703, Bug #81704, Bug #23523869, Bug #23523926)
References: See also: Bug #26966826, Bug #88086.
NDB Client Programs: NDB Cluster Auto-Installer node configuration parameters as supported in the UI and accompanying documentation were in some cases hard coded to an arbitrary value, or were missing altogether. Configuration parameters, their default values, and the documentation have been better aligned with those found in release versions of the NDB Cluster software.
One necessary addition to this task was implementing the mechanism which the Auto-Installer now provides for setting parameters that take discrete values. For example, the value of the data node parameter Arbitration must now be one of Default, Disabled, or WaitExternal.
The Auto-Installer also now gets and uses the amount of disk space available to NDB on each host for deriving reasonable default values for configuration parameters which depend on this value.
See The NDB Cluster Auto-Installer (NDB 7.5) (NO LONGER SUPPORTED), for more information. (WL #10340, WL #10408, WL #10449)
NDB Client Programs: Secure connection support in the MySQL NDB Cluster Auto-Installer has been updated or improved in this release as follows:
Added a mechanism for setting SSH membership on a per-host basis.
Updated the Paramiko Python module to the most recent available version (2.6.1).
Provided a place in the GUI for encrypted private key passwords, and discontinued use of hardcoded passwords.
Related enhancements implemented in the current release include the following:
Discontinued use of cookies as a persistent store for NDB Cluster configuration information; these were not secure and came with a hard upper limit on storage. Now the Auto-Installer uses an encrypted file for this purpose.
In order to secure data transfer between the web browser front end and the back end web server, the default communications protocol has been switched from HTTP to HTTPS.
See The NDB Cluster Auto-Installer (NDB 7.5) (NO LONGER SUPPORTED), for more information. (WL #10426, WL #11128, WL #11289)
MySQL NDB ClusterJ: ClusterJ now supports CPU binding for receive threads through the setRecvThreadCPUids() and getRecvThreadCPUids() methods. Also, the receive thread activation threshold can be set and get with the setRecvThreadActivationThreshold() and getRecvThreadActivationThreshold() methods. (WL #10815)
It is now possible to specify a set of cores to be used for I/O threads performing offline multithreaded builds of ordered indexes, as opposed to normal I/O duties such as file I/O, compression, or decompression. “Offline” in this context refers to building of ordered indexes performed when the parent table is not being written to; such building takes place when an NDB cluster performs a node or system restart, or as part of restoring a cluster from backup using ndb_restore --rebuild-indexes.
In addition, the default behaviour for offline index build work is modified to use all cores available to ndbmtd, rather than limiting itself to the core reserved for the I/O thread. Doing so can improve restart and restore times, performance, availability, and the user experience.
This enhancement is implemented as follows:
The default value for BuildIndexThreads is changed from 0 to 128. This means that offline ordered index builds are now multithreaded by default.
The default value for TwoPassInitialNodeRestartCopy is changed from false to true. This means that an initial node restart first copies all data from a “live” node to the node that is starting (without creating any indexes), builds ordered indexes offline, and then again synchronizes its data with the live node; that is, it synchronizes twice and builds indexes offline between the two synchronizations. This causes an initial node restart to behave more like the normal restart of a node, and reduces the time required for building indexes.
A new thread type (idxbld) is defined for the ThreadConfig configuration parameter, to allow locking of offline index build threads to specific CPUs.
In addition, NDB now distinguishes the thread types that are accessible to ThreadConfig by the following two criteria:
Whether the thread is an execution thread. Threads of types main, ldm, recv, rep, tc, and send are execution threads; thread types io, watchdog, and idxbld are not.
Whether the allocation of the thread to a given task is permanent or temporary. Currently all thread types except idxbld are permanent.
A configuration sketch follows this list. For additional information, see the descriptions of the parameters in the Manual. (Bug #25835748, Bug #26928111)
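The following config.ini sketch is illustrative only (CPU numbers and thread counts are arbitrary examples, and the new defaults are shown explicitly for clarity):
[ndbd default]
# 128 is now the default, shown here only to make the setting visible
BuildIndexThreads=128
TwoPassInitialNodeRestartCopy=true
# The new idxbld thread type can be locked to specific CPUs; it is not an
# execution thread, and its CPU usage is temporary (during index builds)
ThreadConfig=main={count=1},ldm={count=4,cpubind=1,2,3,4},recv={count=1},rep={count=1},io={count=1},idxbld={cpuset=5,6}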
Added the ODirectSyncFlag configuration parameter for data nodes. When enabled, the data node treats all completed filesystem writes to the redo log as though they had been performed using fsync.
Note: This parameter has no effect if at least one of the following conditions is true: ODirect is not enabled, or InitFragmentLogFiles is set to SPARSE.
(Bug #25428560)
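A minimal sketch of a config.ini setting for this parameter (shown together with the settings it depends on, as noted above):
[ndbd default]
# ODirectSyncFlag takes effect only when ODirect is enabled and
# InitFragmentLogFiles is not SPARSE
ODirect=1
InitFragmentLogFiles=FULL
ODirectSyncFlag=1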
Added the ndbinfo.error_messages table, which provides information about NDB Cluster errors, including error codes, status types, brief descriptions, and classifications. This makes it possible to obtain error information using SQL in the mysql client (or other MySQL client program), like this:
mysql> SELECT * FROM ndbinfo.error_messages WHERE error_code='321';
+------------+----------------------+-----------------+----------------------+
| error_code | error_description    | error_status    | error_classification |
+------------+----------------------+-----------------+----------------------+
|        321 | Invalid nodegroup id | Permanent error | Application error    |
+------------+----------------------+-----------------+----------------------+
1 row in set (0.00 sec)
The query just shown provides equivalent information to that obtained by issuing ndb_perror 321 or (now deprecated) perror --ndb 321 on the command line. (Bug #86295, Bug #26048272)
ThreadConfig now has an additional nosend parameter that can be used to prevent a main, ldm, rep, or tc thread from assisting the send threads, by setting this parameter to 1 for the given thread. By default, nosend is 0. It cannot be used with threads other than those of the types just listed. (WL #11554)
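For example (thread counts here are arbitrary and purely illustrative), the main thread can be kept from assisting the send threads like this:
[ndbd default]
# nosend=1 prevents the main thread from assisting the send threads;
# nosend may be set only for main, ldm, rep, and tc threads
ThreadConfig=main={count=1,nosend=1},ldm={count=4},tc={count=2},recv={count=1},rep={count=1},send={count=2}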
When executing a scan as a pushed join, all instances of DBSPJ were involved in the execution of a single query; some of these received multiple requests from the same query. This situation is improved by enabling a single SPJ request to handle a set of root fragments to be scanned, such that only a single SPJ request is sent to each DBSPJ instance on each node. Since batch sizes are allocated per fragment, the multi-fragment scan can obtain a larger total batch size, allowing for some scheduling optimizations to be done within DBSPJ, which can scan a single fragment at a time (giving it the total batch size allocation), scan all fragments in parallel using smaller sub-batches, or some combination of the two.
Since the effect of this change is generally to require fewer SPJ requests and instances, performance of pushed-down joins should be improved in many cases. (WL #10234)
As part of work ongoing to optimize bulk DDL performance by ndbmtd, it is now possible to obtain performance improvements by increasing the batch size for the bulk data parts of DDL operations which process all of the data in a fragment or set of fragments using a scan. Batch sizes are now made configurable for unique index builds, foreign key builds, and online reorganization, by setting the respective data node configuration parameters listed here:
MaxFKBuildBatchSize: Maximum scan batch size used for building foreign keys.
MaxReorgBuildBatchSize: Maximum scan batch size used for reorganization of table partitions.
MaxUIBuildBatchSize: Maximum scan batch size used for building unique keys.
For each of the parameters just listed, the default value is 64, the minimum is 16, and the maximum is 512.
Increasing the appropriate batch size or sizes can help amortize inter-thread and inter-node latencies and make use of more parallel resources (local and remote) to help scale DDL performance. (WL #11158)
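A hypothetical config.ini sketch (the values are examples within the documented range, not tuning recommendations):
[ndbd default]
# Larger scan batches can amortize inter-thread and inter-node latency
# during unique index builds, foreign key builds, and reorganization
MaxUIBuildBatchSize=128
MaxFKBuildBatchSize=128
MaxReorgBuildBatchSize=128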
Formerly, the data node LGMAN kernel block processed undo log records serially; now this is done in parallel. The rep thread, which hands off undo records to local data handler (LDM) threads, waited for an LDM to finish applying a record before fetching the next one; now the rep thread no longer waits, but proceeds immediately to the next record and LDM.
There are no user-visible changes in functionality directly associated with this work; this performance enhancement is part of the work being done in NDB 7.6 to improve undo log handling for partial local checkpoints. (WL #8478)
When applying an undo log, the table ID and fragment ID are obtained from the page ID. This was done by reading the page from PGMAN using an extra PGMAN worker thread, but when applying the undo log it was necessary to read the page again. This became very inefficient when using O_DIRECT (see ODirect), since the page was not cached in the OS kernel.
Mapping from page ID to table ID and fragment ID is now done using information the extent header contains about the table IDs and fragment IDs of the pages used in a given extent. Since the extent pages are always present in the page cache, no extra disk reads are required to perform the mapping, and the information can be read using existing TSMAN data structures. (WL #10194)
Added the NODELOG DEBUG command in the ndb_mgm client to provide runtime control over data node debug logging. NODELOG DEBUG ON causes a data node to write extra debugging information to its node log, the same as if the node had been started with --verbose. NODELOG DEBUG OFF disables the extra logging. (WL #11216)
Added the LocationDomainId configuration parameter for management, data, and API nodes. When using NDB Cluster in a cloud environment, you can set this parameter to assign a node to a given availability domain or availability zone. This can improve performance in the following ways:
If requested data is not found on the same node, reads can be directed to another node in the same availability domain.
Communication between nodes in different availability domains is guaranteed to use NDB transporters' WAN support without any further manual intervention.
The transporter's group number can be based on which availability domain is used, so that SQL and other API nodes also communicate with local data nodes in the same availability domain whenever possible.
The arbitrator can be selected from an availability domain in which no data nodes are present, or, if no such availability domain can be found, from a third availability domain.
This parameter takes an integer value between 0 and 16, with 0 being the default; using 0 is the same as leaving LocationDomainId unset. (WL #10172)
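A minimal config.ini sketch (host names and domain IDs are purely illustrative) assigning nodes to availability domains:
# Assign each node to the availability domain in which its host runs
[ndb_mgmd]
HostName=mgmhost-1
LocationDomainId=1
[ndbd]
HostName=datahost-1
LocationDomainId=1
[ndbd]
HostName=datahost-2
LocationDomainId=2
[mysqld]
HostName=sqlhost-1
LocationDomainId=1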
Important Change: The --passwd option for ndb_top is now deprecated. It is removed (and replaced with --password) in NDB 7.6.5. (Bug #88236, Bug #20733646)
References: See also: Bug #86615, Bug #26236320, Bug #26907833.
NDB Disk Data: An ALTER TABLE that switched the table storage format between MEMORY and DISK was always performed in place for all columns. This is not correct in the case of a column whose storage format is inherited from the table; such a column's storage type was not changed.
For example, this statement creates a table t1 whose column c2 uses in-memory storage since the table does so implicitly:
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT) ENGINE NDB;
The ALTER TABLE statement shown here is expected to cause c2 to be stored on disk, but failed to do so:
ALTER TABLE t1 STORAGE DISK TABLESPACE ts1;
Similarly, an on-disk column that inherited its storage format from the table to which it belonged did not have the format changed by ALTER TABLE ... STORAGE MEMORY.
These two cases are now performed as a copying alter, and the storage format of the affected column is now changed. (Bug #26764270)
ndbinfo Information Database: Counts of committed rows and committed operations per fragment used by some tables in ndbinfo were taken from the DBACC block, but due to the fact that commit signals can arrive out of order, transient counter values could be negative. This could happen if, for example, a transaction contained several interleaved insert and delete operations on the same row; in such cases, commit signals for delete operations could arrive before those for the corresponding insert operations, leading to a failure in DBACC.
This issue is fixed by using the counts of committed rows kept in DBTUP, which do not have this problem. (Bug #88087, Bug #26968613)
Errors in parsing NDB_TABLE modifiers could cause memory leaks. (Bug #26724559)
Added DUMP code 7027 to facilitate testing of issues relating to local checkpoints. For more information, see DUMP 7027. (Bug #26661468)
A previous fix intended to improve logging of node failure handling in the transaction coordinator included logging of transactions that could occur in normal operation, which made the resulting logs needlessly verbose. Such normal transactions are no longer written to the log in such cases. (Bug #26568782)
References: This issue is a regression of: Bug #26364729.
Due to a configuration file error, CPU locking capability was not available on builds for Linux platforms. (Bug #26378589)
Some DUMP codes used for the LGMAN kernel block were incorrectly assigned numbers in the range used for codes belonging to DBTUX. These have now been assigned symbolic constants and numbers in the proper range (10001, 10002, and 10003). (Bug #26365433)
Node failure handling in the DBTC kernel block consists of a number of tasks which execute concurrently, and all of which must complete before TC node failure handling is complete. This fix extends logging coverage to record when each task completes, and which tasks remain, and includes the following improvements:
Handling of interactions between GCP and node failure handling, in which TC takeover causes a GCP participant stall at the master TC to allow it to extend the current GCI with any transactions that were taken over; the stall can begin and end in different GCP protocol states. Logging coverage is extended to cover all scenarios. Debug logging is now more consistent and understandable to users.
Logging done by the QMGR block as it monitors the duration of node failure handling is done more frequently. A warning log is now generated every 30 seconds (instead of 1 minute), and this now includes DBDIH block debug information (formerly this was written separately, and less often).
To reduce space used, the log prefix “DBTC instance number:” is shortened to “DBTC number:”.
A new error code is added to assist testing.
(Bug #26364729)
During a restart, DBLQH loads redo log part metadata for each redo log part it manages, from one or more redo log files. Since each file has a limited capacity for metadata, the number of files which must be consulted depends on the size of the redo log part. These files are opened, read, and closed sequentially, but the closing of one file occurs concurrently with the opening of the next.
In cases where closing of the file was slow, it was possible for more than 4 files per redo log part to be open concurrently; since these files were opened using the OM_WRITE_BUFFER option, more than 4 chunks of write buffer were allocated per part in such cases. The write buffer pool is not unlimited; if all redo log parts were in a similar state, the pool was exhausted, causing the data node to shut down.
This issue is resolved by avoiding the use of OM_WRITE_BUFFER during metadata reload, so that any transient opening of more than 4 redo log files per log file part no longer leads to failure of the data node. (Bug #25965370)
Following TRUNCATE TABLE on an NDB table, its AUTO_INCREMENT ID was not reset on an SQL node not performing binary logging. (Bug #14845851)
A join entirely within the materialized part of a semijoin was not pushed even if it could have been. In addition, EXPLAIN provided no information about why the join was not pushed. (Bug #88224, Bug #27022925)
References: See also: Bug #27067538.
When the duplicate weedout algorithm was used for evaluating a semijoin, the result had missing rows. (Bug #88117, Bug #26984919)
References: See also: Bug #87992, Bug #26926666.
A table used in a loose scan could be used as a child in a pushed join query, leading to possibly incorrect results. (Bug #87992, Bug #26926666)
When representing a materialized semijoin in the query plan, the MySQL Optimizer inserted extra QEP_TAB and JOIN_TAB objects to represent access to the materialized subquery result. The join pushdown analyzer did not properly set up its internal data structures for these, leaving them uninitialized instead. This meant that later usage of any item objects referencing the materialized semijoin accessed an uninitialized tableno column when accessing a 64-bit tableno bitmask, possibly referring to a point beyond its end, leading to an unplanned shutdown of the SQL node. (Bug #87971, Bug #26919289)
In some cases, a SCAN_FRAGCONF signal was received after a SCAN_FRAGREQ with a close flag had already been sent, clearing the timer. When this occurred, the next SCAN_FRAGREF to arrive caused time tracking to fail. Now in such cases, a check for a cleared timer is performed prior to processing the SCAN_FRAGREF message. (Bug #87942, Bug #26908347)
While deleting an element in Dbacc, or moving it during hash table expansion or reduction, the method used (getLastAndRemove()) could return a reference to a removed element on a released page, which could later be referenced from the functions calling it. This was due to a change brought about by the implementation of dynamic index memory in NDB 7.6.2; previously, the page had always belonged to a single Dbacc instance, so accessing it was safe. This was no longer the case following the change; a page released in Dbacc could be placed directly into the global page pool where any other thread could then allocate it.
Now we make sure that newly released pages in Dbacc are kept within the current Dbacc instance and not given over directly to the global page pool. In addition, the reference to a released page has been removed; the affected internal method now returns the last element by value, rather than by reference. (Bug #87932, Bug #26906640)
References: See also: Bug #87987, Bug #26925595.
The DBTC kernel block could receive a TCRELEASEREQ signal in a state for which it was unprepared. Now in such cases it responds with a TCRELEASECONF message, and subsequently behaves just as if the API connection had failed. (Bug #87838, Bug #26847666)
References: See also: Bug #20981491.
When a data node was configured for locking threads to CPUs, it failed during startup with Failed to lock tid.
This was a side effect of a fix for a previous issue, which disabled CPU locking based on the version of the available glibc. The specific glibc issue being guarded against is encountered only in response to an internal NDB API call (Ndb_UnlockCPU()) not used by data nodes (and which can be accessed only through internal API calls). The current fix enables CPU locking for data nodes and disables it only for the relevant API calls when an affected glibc version is used. (Bug #87683, Bug #26758939)
References: This issue is a regression of: Bug #86892, Bug #26378589.
ndb_top failed to build on platforms where the ncurses library did not define stdscr. Now these platforms require the tinfo library to be included. (Bug #87185, Bug #26524441)
On completion of a local checkpoint, every node sends a LCP_COMPLETE_REP signal to every other node in the cluster; a node does not consider the LCP complete until it has been notified that all other nodes have sent this signal. Due to a minor flaw in the LCP protocol, if this message was delayed from a node other than the master, it was possible to start the next LCP before one or more nodes had completed the one ongoing; this caused problems with LCP_COMPLETE_REP signals from previous LCPs becoming mixed up with such signals from the current LCP, which in turn led to node failures.
To fix this problem, we now ensure that the previous LCP is complete before responding to any TCGETOPSIZEREQ signal initiating a new LCP. (Bug #87184, Bug #26524096)
NDB Cluster did not compile successfully when the build used WITH_UNIT_TESTS=OFF. (Bug #86881, Bug #26375985)
Recent improvements in local checkpoint handling that use OM_CREATE to open files did not work correctly on Windows platforms, where the system tried to create a new file and failed if it already existed. (Bug #86776, Bug #26321303)
A potential hundredfold signal fan-out when sending a START_FRAG_REQ signal could lead to a node failure due to a job buffer full error in start phase 5 while trying to perform a local checkpoint during a restart. (Bug #86675, Bug #26263397)
References: See also: Bug #26288247, Bug #26279522.
Compilation of NDB Cluster failed when using -DWITHOUT_SERVER=1 to build only the client libraries. (Bug #85524, Bug #25741111)
The NDBFS block's OM_SYNC flag is intended to make sure that all FSWRITEREQ signals used for a given file are synchronized, but was ignored by platforms that do not support O_SYNC, meaning that this feature did not behave properly on those platforms. Now the synchronization flag is used on those platforms that do not support O_SYNC. (Bug #76975, Bug #21049554)