MySQL Cluster NDB 7.1.32 is a new release of MySQL Cluster,
incorporating new features in the
NDBCLUSTER storage engine and
fixing recently discovered bugs in previous MySQL Cluster NDB
Obtaining MySQL Cluster NDB 7.1. The latest MySQL Cluster NDB 7.1 binaries for supported platforms can be obtained from http://dev.mysql.com/downloads/cluster/. Source code for the latest MySQL Cluster NDB 7.1 release can be obtained from the same location. You can also access the MySQL Cluster NDB 7.1 development source tree at https://code.launchpad.net/~mysql/mysql-server/mysql-cluster-7.1.
This release also incorporates all bugfixes and changes made in previous MySQL Cluster releases, as well as all bugfixes and feature changes which were added in mainline MySQL 5.1 through MySQL 5.1.73 (see Changes in MySQL 5.1.73 (2013-12-03)).
Functionality Added or Changed
Added as an aid to debugging the ability to specify a
human-readable name for a given
Ndb object and later to
retrieve it. These operations are implemented, respectively, as
To make tracing of event handling between a user application and
NDB easier, you can use the reference (from
getReference() followed by
the name (if provided) in printouts; the reference ties together
Ndb object, the event buffer,
NDB storage engine's
Processing a NODE_FAILREP signal that contained an invalid node ID could cause a data node to fail. (Bug #18993037, Bug #73015)
References: This bug is a regression of Bug #16007980.
ndbmtd supports multiple parallel receiver
threads, each of which performs signal reception for a subset of
the remote node connections (transporters) with the mapping of
remote_nodes to receiver threads decided at node startup.
Connection control is managed by the multi-instance
TRPMAN block, which is organized as a proxy
and workers, and each receiver thread has a
TRPMAN worker running locally.
QMGR block sends signals to
TRPMAN to enable and disable communications
with remote nodes. These signals are sent to the
TRPMAN proxy, which forwards them to the
workers. The workers themselves decide whether to act on
signals, based on the set of remote nodes they manage.
The current isuue arises because the mechanism used by the
TRPMAN workers for determining which
connections they are responsible for was implemented in such a
way that each worker thought it was responsible for all
connections. This resulted in the
CLOSE_COMREQ being processed multiple times.
The fix keeps
TRPMAN instances (receiver
CLOSE_COMREQ requests. In addition, the
TRPMAN instance is now chosen when
routing from this instance for a specific remote connection.
TABLE ... REORGANIZE PARTITION after increasing the
number of data nodes in the cluster from 4 to 16 led to a crash
of the data nodes. This issue was shown to be a regression
caused by previous fix which added a new dump handler using a
dump code that was already in use (7019), which caused the
command to execute two different handlers with different
semantics. The new handler was assigned a new
DUMP code (7024).
References: This bug is a regression of Bug #14220269.
A local checkpoint (LCP) is tracked using a global LCP state
c_lcpState), and each
NDB table has a status indicator
which indicates the LCP status of that table
tabLcpStatus). If the global LCP state is
LCP_STATUS_IDLE, then all the tables should
have an LCP status of
When an LCP starts, the global LCP status is
LCP_INIT_TABLES and the thread starts setting
NDB tables to
TLS_ACTIVE. If any tables are not ready for
LCP, the LCP initialization procedure continues with
CONTINUEB signals until all tables have
become available and been marked
When this initialization is complete, the global LCP status is
This bug occurred when the following conditions were met:
An LCP was in the
and some but not all tables had been set to
The master node failed before the global LCP state changed
LCP_STATUS_ACTIVE; that is, before the
LCP could finish processing all tables.
NODE_FAILREP signal resulting from
the node failure was processed before the final
CONTINUEB signal from the LCP
initialization process, so that the node failure was
processed while the LCP remained in the
Following master node failure and selection of a new one, the
new master queries the remaining nodes with a
MASTER_LCPREQ signal to determine the state
of the LCP. At this point, since the LCP status was
LCP_INIT_TABLES, the LCP status was reset to
LCP_STATUS_IDLE. However, the LCP status of
the tables was not modified, so there remained tables with
TLS_ACTIVE. Afterwards, the failed node is
removed from the LCP. If the LCP status of a given table is
TLS_ACTIVE, there is a check that the global
LCP status is not
LCP_STATUS_IDLE; this check
failed and caused the data node to fail.
MASTER_LCPREQ handler ensures that
tabLcpStatus for all tables is updated to
TLS_COMPLETED when the global LCP status is
The logging of insert failures has been improved. This is
intended to help diagnose occasional issues seen when writing to
CHAR column that used
UTF8 character set as a table's
primary key column led to node failure when restarting data
nodes. Attempting to restore a table with such a primary key
also caused ndb_restore to fail.
(Bug #16895311, Bug #68893)
NDB$EPOCH_TRANS, conflicts between
DELETE operations were handled like conflicts
between updates, with the primary rejecting the transaction and
dependents, and realigning the secondary. This meant that their
behavior with regard to subsequent operations on any affected
row or rows depended on whether they were in the same epoch or a
different one: within the same epoch, they were considered
conflicting events; in different epochs, they were not
considered in conflict.
This fix brings the handling of conflicts between deletes by
NDB$EPOCH_TRANS with that performed when
NDB$EPOCH for conflict detection and
resolution, and extends testing with
NDB$EPOCH_TRANS to include
“delete-delete” conflicts, and encapsulate the
expected result, with transactional conflict handling modified
so that a conflict between
alone is not sufficient to cause a
transaction to be considered in conflict.
NDB data node indicates a
buffer overflow via an empty epoch, the event buffer places an
inconsistent data event in the event queue. When this was
consumed, it was not removed from the event queue as expected,
nextEvent() calls to return
0. This caused event consumption to stall because the
inconsistency remained flagged forever, while event data
accumulated in the queue.
Event data belonging to an empty inconsistent epoch can be found
either at the beginning or somewhere in the middle.
pollEvents() returns 0 for
the first case. This fix handles the second case: calling
nextEvent() call dequeues the inconsistent
event before it returns. In order to benefit from this fix, user
applications must call
nextEvent() even when
pollEvents() returns 0.
returned 1, even when called with a wait time equal to 0, and
there were no events waiting in the queue. Now in such cases it
returns 0 as expected.