Packaging; Solaris: Compilation of ndbmtd failed on Solaris 10 and 11 for 32-bit x86, and the binary was not included in the binary distributions for these platforms. (Bug #16620938)
Microsoft Windows: Timers used in timing scheduler events in the NDB kernel have been refactored, in part to ensure that they are monotonic on all platforms. In particular, on Windows, event intervals were previously calculated using values obtained from GetSystemTimeAsFileTime(), which reads directly from the system time (“wall clock”), and which may arbitrarily be reset backward or forward, leading to false watchdog or heartbeat alarms, or even node shutdown. Lack of timer monotonicity could also cause slow disk writes during backups and global checkpoints. To fix this issue, the Windows implementation now uses QueryPerformanceCounter() instead of GetSystemTimeAsFileTime(). In the event that a monotonic timer is not found on startup of the data nodes, a warning is logged.
In addition, on all platforms, a check is now performed at compile time for available system monotonic timers, and the build fails if one cannot be found; note that CLOCK_HIGHRES is now supported as an alternative to CLOCK_MONOTONIC if the latter is not available. (Bug #17647637)
NDB Disk Data: When using Disk Data tables and ndbmtd data nodes, it was possible for the undo buffer to become overloaded, leading to a crash of the data nodes. This issue was more likely to be encountered when using Disk Data columns whose size was approximately 8K or larger. (Bug #16766493)
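The clock preference described for Bug #17647637 (CLOCK_MONOTONIC first, CLOCK_HIGHRES as an alternative, a warning if neither is found) can be sketched roughly as follows. This is an illustration in Python, not NDB code: pick_monotonic_clock is a hypothetical helper, and Python's time module stands in for the platform clock APIs.

```python
import time
import warnings


def pick_monotonic_clock():
    """Return (name, read_fn) for the best available clock.

    Hypothetical helper: prefers CLOCK_MONOTONIC, then CLOCK_HIGHRES,
    then the interpreter's own monotonic clock, and only as a last
    resort the wall clock (with a warning).
    """
    if hasattr(time, "clock_gettime"):
        for name in ("CLOCK_MONOTONIC", "CLOCK_HIGHRES"):
            clock_id = getattr(time, name, None)
            if clock_id is None:
                continue  # constant not defined on this platform
            try:
                time.clock_gettime(clock_id)  # probe that the clock works
            except OSError:
                continue
            return name, (lambda cid=clock_id: time.clock_gettime(cid))
    if time.get_clock_info("monotonic").monotonic:
        return "time.monotonic", time.monotonic
    warnings.warn("no monotonic timer found; falling back to wall clock")
    return "time.time", time.time


name, now = pick_monotonic_clock()
start = now()
elapsed = now() - start  # never negative when the clock is monotonic
```

Intervals computed from such a clock are unaffected by an administrator or a time service resetting the system time, which is the property the wall clock lacks.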
NDB Cluster APIs: UINT_MAX64 was treated as a signed value by Visual Studio 2010. To prevent this from happening, the value is now explicitly defined as unsigned. (Bug #17947674)
References: See also: Bug #17647637.
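To show why the signedness of such a constant matters, the sketch below reads one 64-bit pattern both ways. It assumes that UINT_MAX64 is the all-ones 64-bit value, which this note does not state, and it uses Python's ctypes fixed-width integers as a stand-in for the C++ types involved.

```python
import ctypes

# Assumption: UINT_MAX64 is the all-ones 64-bit pattern.
ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF

as_unsigned = ctypes.c_uint64(ALL_ONES_64).value  # largest 64-bit unsigned value
as_signed = ctypes.c_int64(ALL_ONES_64).value     # same bits, read as signed

# A "maximum" that is read as signed compares below every ordinary value.
deadline = 5000
unsigned_ok = deadline < as_unsigned
signed_ok = deadline < as_signed
```

With the signed reading, a comparison against the supposed maximum gives the opposite result, which is the kind of error an explicitly unsigned definition rules out.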
NDB Cluster APIs: It was possible for an Ndb object to receive signals for handling before it was initialized, leading to thread interleaving and possible data node failure when executing a call to Ndb::init(). To guard against this happening, a check is now made, when the Ndb object starts to receive signals, that it is properly initialized before any signals are actually handled. (Bug #17719439)
NDB Cluster APIs: Compilation of example NDB API program files failed due to missing include directives. (Bug #17672846, Bug #70759)
NDB Cluster APIs: An application, having opened two distinct instances of Ndb_cluster_connection, attempted to use the second connection object to send signals to itself, but these signals were blocked until the destructor was explicitly called for that connection object. (Bug #17626525)
References: This issue is a regression of: Bug #16595838.
Interrupting a drop of a foreign key could cause the underlying table to become corrupt. (Bug #18041636)
Monotonic timers on several platforms can experience issues which might result in the monotonic clock making small jumps back in time (“backticks”). This is due to imperfect synchronization of clocks between multiple CPU cores and does not normally have an adverse effect on the scheduler and watchdog mechanisms, so some of these cases are now handled by making backtick protection less strict, although a check is still made that the backtick is less than 10 milliseconds. This fix also removes several checks for backticks which are thereby made redundant. (Bug #17973819)
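A relaxed backtick check of the kind described above might look like the following sketch. The function name, the choice to count a tolerated backtick as zero elapsed time, and the exception raised for a larger jump are assumptions made for illustration; only the 10 millisecond bound comes from the note.

```python
MAX_BACKTICK_MS = 10  # bound on a tolerated backward jump, per the note above


def checked_elapsed_ms(prev_ms, now_ms):
    """Return elapsed milliseconds, tolerating small backward jumps.

    Hypothetical helper: a backtick under MAX_BACKTICK_MS counts as no
    time having passed; a larger one is reported as an error.
    """
    delta = now_ms - prev_ms
    if delta >= 0:
        return delta  # normal forward progress
    if -delta < MAX_BACKTICK_MS:
        return 0  # small backtick between CPU cores: ignore it
    raise ValueError(f"clock moved back {-delta} ms")
```

For example, checked_elapsed_ms(100, 95) returns 0, while checked_elapsed_ms(100, 80) raises.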
Under certain specific circumstances, in a cluster having two SQL nodes, one of these could hang, and could not be accessed again even after killing the mysqld process and restarting it. (Bug #17875885, Bug #18080104)
References: See also: Bug #17934985.
Poor support or lack of support on some platforms for monotonic timers caused issues with delayed signal handling by the job scheduler for the multithreaded data node. Variances (timer leaps) on such platforms are now handled by the multithreaded data node process in the same way that they are by the single-threaded version. (Bug #17857442)
References: See also: Bug #17475425, Bug #17647637.
In some cases, with ndb_join_pushdown enabled, it was possible to obtain from a valid query the error Got error 290 'Corrupt key in TC, unable to xfrm' from NDBCLUSTER even though the data was not actually corrupted.
It was determined that a NULL in a VARCHAR column could be used to construct a lookup key, but since NULL is never equal to any other value, such a lookup could simply have been eliminated instead. This NULL lookup in turn led to the spurious error message.
This fix takes advantage of the fact that a key lookup with NULL never finds any matching rows, and so NDB does not try to perform the lookup that would have led to the error. (Bug #17845161)
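The idea behind the fix, skipping a key lookup whenever any part of the key is NULL, can be sketched as follows. KeyIndex and pushed_lookup are hypothetical names for this illustration, with None standing in for SQL NULL; they do not correspond to actual NDB interfaces.

```python
class KeyIndex:
    """Toy stand-in for a key lookup target; counts lookups performed."""

    def __init__(self, rows_by_key):
        self.rows_by_key = rows_by_key
        self.lookups = 0

    def lookup(self, key):
        self.lookups += 1
        return self.rows_by_key.get(key, [])


def pushed_lookup(index, key):
    """Return matching rows, never issuing a lookup for a NULL key part."""
    if any(part is None for part in key):
        return []  # NULL equals nothing, so no row can match
    return index.lookup(key)


index = KeyIndex({(1, "a"): ["row-1"]})
hit = pushed_lookup(index, (1, "a"))    # ordinary lookup
miss = pushed_lookup(index, (1, None))  # eliminated, index never touched
```

Because the NULL case returns early, the code path that builds and transforms the key is never reached for it.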
The local checkpoint lag watchdog tracked the number of times a check for LCP timeout was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the LCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called. (Bug #17842035)
References: See also: Bug #17647469.
It was theoretically possible in certain cases for a number of output functions internal to the NDB code to supply an uninitialized buffer as output. Now in such cases, a newline character is printed instead. (Bug #17775602, Bug #17775772)
Use of the localtime() function in NDB multithreading code led to otherwise nondeterministic failures in ndbmtd. This fix replaces this function, which on many platforms uses a buffer shared among multiple threads, with localtime_r(), which writes its result to a buffer supplied by the caller. (Bug #17750252)
When using single-threaded (ndbd) data nodes with RealTimeScheduler enabled, the CPU did not, as intended, temporarily lower its scheduling priority to normal every 10 milliseconds to give other, non-realtime threads a chance to run. (Bug #17739131)
During arbitrator selection, QMGR (see The QMGR Block) runs through a series of states, the first few of which are (in order) NULL, INIT, FIND, PREP1, PREP2, and START. A check for an arbitration selection timeout occurred in the FIND state, even though the corresponding timer was not set until QMGR reached the PREP1 and PREP2 states. Attempting to read the resulting uninitialized timestamp value could lead to false Could not find an arbitrator, cluster is not partition-safe warnings.
This fix moves the setting of the timer for arbitration timeout to the INIT state, so that the value later read during FIND is always initialized. (Bug #17738720)
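The ordering problem can be illustrated with a small state machine in which the timeout timer is set on entering INIT, as after the fix. The class and method names are hypothetical and the timeout value is arbitrary; only the state names and the placement of the timer come from the note.

```python
from enum import Enum


class ArbitState(Enum):
    NULL = 0
    INIT = 1
    FIND = 2
    PREP1 = 3
    PREP2 = 4
    START = 5


class ArbitSelection:
    """Toy model of arbitrator selection with a timeout timer."""

    def __init__(self, timeout_ms):
        self.state = ArbitState.NULL
        self.timeout_ms = timeout_ms
        self.timer_start_ms = None  # not yet set

    def enter(self, state, now_ms):
        if state is ArbitState.INIT:
            # After the fix: the timer is set here, before FIND reads it.
            self.timer_start_ms = now_ms
        self.state = state

    def timed_out(self, now_ms):
        if self.timer_start_ms is None:
            # Before the fix, FIND could reach this point.
            raise RuntimeError("timer read before it was set")
        return now_ms - self.timer_start_ms > self.timeout_ms


sel = ArbitSelection(timeout_ms=1000)
sel.enter(ArbitState.INIT, now_ms=0)
sel.enter(ArbitState.FIND, now_ms=10)
```

In C++ the unset timestamp would simply hold an arbitrary value rather than raise, which is how the false warning could arise.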
The global checkpoint lag watchdog tracked the number of times a check for GCP lag was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the GCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called.
In addition, any backticks (rare in any case) are now handled by taking the backward time as the new current time and calculating the elapsed time for this round as 0. Finally, any ill effects of a forward leap, which possibly could expire the watchdog timer immediately, are reduced by never calculating an elapsed time longer than the requested delay time for the watchdog timer. (Bug #17647469)
References: See also: Bug #17842035.
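The elapsed-time rules described for the GCP watchdog (a backtick counts as zero elapsed time, and a forward leap never counts for more than the requested delay) can be sketched as below. The Watchdog class and its field names are hypothetical; times are plain millisecond integers so that the behavior is easy to follow.

```python
class Watchdog:
    """Toy watchdog that tracks its own time instead of counting calls."""

    def __init__(self, now_ms):
        self.last_ms = now_ms  # time of the previous check
        self.elapsed_ms = 0    # total time credited so far

    def tick(self, now_ms, delay_ms):
        """Credit the time since the last check and return that amount."""
        delta = now_ms - self.last_ms
        if delta < 0:
            delta = 0          # backtick: this round counts as 0
        elif delta > delay_ms:
            delta = delay_ms   # forward leap: cap at the requested delay
        self.last_ms = now_ms  # the (possibly earlier) time becomes current
        self.elapsed_ms += delta
        return delta


w = Watchdog(now_ms=1000)
normal = w.tick(now_ms=1100, delay_ms=100)   # ordinary round
back = w.tick(now_ms=1050, delay_ms=100)     # clock stepped backward
leap = w.tick(now_ms=99999, delay_ms=100)    # clock leapt forward
```

Capping each round at the requested delay means a single large forward leap cannot by itself expire the watchdog.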
The length of the interval (intended to be 10 seconds) between warnings for GCP_COMMIT when the GCP progress watchdog did not detect progress in a global checkpoint was not always calculated correctly. (Bug #17647213)
Trying to drop an index used by a foreign key constraint caused data node failure. Now in such cases, the statement used to perform the drop fails. (Bug #17591531)
In certain rare cases on commit of a transaction, an Ndb object was released before the transaction coordinator (DBTC kernel block) sent the expected COMMIT_CONF signal; NDB failed to send a COMMIT_ACK signal in response, which caused a memory leak in the NDB kernel that could later lead to node failure.
Now an Ndb object is not released until the COMMIT_CONF signal has actually been received. (Bug #16944817)
Losing its connections to the management node or data nodes while a query against the ndbinfo.memoryusage table was in progress caused the SQL node where the query was issued to fail. (Bug #14483440, Bug #16810415)
The ndbd_redo_log_reader utility now supports a --help option. Using this option causes the program to print basic usage information, and then to exit. (Bug #11749591, Bug #36805)