Packaging; Solaris: Compilation of ndbmtd failed on Solaris 10 and 11 for 32-bit
x86, and the binary was not included in the binary distributions for these platforms. (Bug #16620938)
Microsoft Windows: Timers used in timing scheduler events in the NDB kernel have been refactored, in part to ensure that they are monotonic on all platforms. In particular, on Windows, event intervals were previously calculated using values obtained from GetSystemTimeAsFileTime(), which reads directly from the system time (“wall clock”), and which may arbitrarily be reset backward or forward, leading to false watchdog or heartbeat alarms, or even node shutdown. Lack of timer monotonicity could also cause slow disk writes during backups and global checkpoints. To fix this issue, the Windows implementation now uses
QueryPerformanceCounter() instead of GetSystemTimeAsFileTime(). In the event that a monotonic timer is not found on startup of the data nodes, a warning is logged.
In addition, on all platforms, a check is now performed at compile time for available system monotonic timers, and the build fails if one cannot be found; note that CLOCK_HIGHRES is now supported as an alternative for CLOCK_MONOTONIC if the latter is not available. (Bug #17647637)
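The reason for preferring a monotonic clock when measuring intervals can be illustrated outside NDB. The following is a minimal Python sketch, not the NDB implementation; the helper name measure_interval is hypothetical. It relies on time.monotonic(), which Python documents as a clock that cannot go backward and is unaffected by system clock updates.

```python
import time


def measure_interval(work, clock=time.monotonic):
    """Return the elapsed seconds that `work()` took, measured with `clock`.

    Hypothetical helper for illustration only. With the default
    time.monotonic, the result is never negative, even if the system
    ("wall clock") time is reset while `work` runs.
    """
    start = clock()
    work()
    return clock() - start


# Python reports whether a given clock is monotonic on this platform.
monotonic_info = time.get_clock_info("monotonic")

# Time a short sleep; the measured interval cannot be negative.
elapsed = measure_interval(lambda: time.sleep(0.01))
```

Passing clock=time.time instead would measure the same interval against the wall clock, which is the kind of measurement that a backward or forward reset can distort.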
NDB Disk Data: When using Disk Data tables and ndbmtd data nodes, it was possible for the undo buffer to become overloaded, leading to a crash of the data nodes. This issue was more likely to be encountered when using Disk Data columns whose size was approximately 8K or larger. (Bug #16766493)
NDB Cluster APIs: UINT_MAX64 was treated as a signed value by Visual Studio 2010. To prevent this from happening, the value is now explicitly defined as unsigned. (Bug #17947674)
References: See also: Bug #17647637.
NDB Cluster APIs: It was possible for an Ndb object to receive signals for handling before it was initialized, leading to thread interleaving and possible data node failure when executing a call to Ndb::init(). To guard against this happening, a check is now made, when the Ndb object starts to receive signals, that it is properly initialized before any signals are actually handled. (Bug #17719439)
NDB Cluster APIs: Compilation of example NDB API program files failed due to missing include directives. (Bug #17672846, Bug #70759)
NDB Cluster APIs: An application, having opened two distinct instances of
Ndb_cluster_connection, attempted to use the second connection object to send signals to itself, but these signals were blocked until the destructor was explicitly called for that connection object. (Bug #17626525)
References: This issue is a regression of: Bug #16595838.
Interrupting a drop of a foreign key could cause the underlying table to become corrupt. (Bug #18041636)
Monotonic timers on several platforms can experience issues which might result in the monotonic clock doing small jumps back in time. This is due to imperfect synchronization of clocks between multiple CPU cores and does not normally have an adverse effect on the scheduler and watchdog mechanisms; so we handle some of these cases by making backtick protection less strict, although we continue to ensure that the backtick is less than 10 milliseconds. This fix also removes several checks for backticks which are thereby made redundant. (Bug #17973819)
Under certain specific circumstances, in a cluster having two SQL nodes, one of these could hang, and could not be accessed again even after killing the mysqld process and restarting it. (Bug #17875885, Bug #18080104)
References: See also: Bug #17934985.
Poor support or lack of support on some platforms for monotonic timers caused issues with delayed signal handling by the job scheduler for the multithreaded data node. Variances (timer leaps) on such platforms are now handled by the multithreaded data node process in the same way that they are by the single-threaded version. (Bug #17857442)
References: See also: Bug #17475425, Bug #17647637.
In some cases, with ndb_join_pushdown enabled, it was possible to obtain from a valid query the error Got error 290 'Corrupt key in TC, unable to xfrm' from NDBCLUSTER even though the data was not actually corrupted.
It was determined that a NULL in a VARCHAR column could be used to construct a lookup key, but since NULL is never equal to any other value, such a lookup could simply have been eliminated instead. This NULL lookup in turn led to the spurious error message.
This fix takes advantage of the fact that a key lookup with NULL never finds any matching rows, and so NDB does not try to perform the lookup that would have led to the error. (Bug #17845161)
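The idea of eliminating a lookup whose key contains NULL can be sketched in a few lines. This is an illustrative Python sketch only, not NDB code: the function lookup, the dictionary used as an index, and the use of None to stand in for SQL NULL are all assumptions made for the example.

```python
def lookup(index, key):
    """Look up `key` (a tuple of column values) in `index` (a dict).

    Hypothetical sketch: None stands in for SQL NULL. Because NULL is
    never equal to any value, a key containing NULL cannot match any
    row, so the lookup is skipped rather than executed.
    """
    if any(part is None for part in key):
        return []  # eliminated: no lookup is attempted
    return index.get(key, [])


# A toy index mapping two-part keys to lists of matching rows.
index = {("a", 1): ["row1"], ("b", 2): ["row2", "row3"]}
```

With this index, lookup(index, ("a", None)) returns an empty list without consulting the index at all, which mirrors the way the fix avoids constructing the key that triggered the error.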
The local checkpoint lag watchdog tracked the number of times a check for LCP timeout was performed using the system scheduler, and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the LCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called. (Bug #17842035)
References: See also: Bug #17647469.
It was theoretically possible in certain cases for a number of output functions internal to the NDB code to supply an uninitialized buffer as output. Now in such cases, a newline character is printed instead. (Bug #17775602, Bug #17775772)
Use of the localtime() function in NDB multithreading code led to otherwise nondeterministic failures in ndbmtd. This fix replaces this function, which on many platforms uses a buffer shared among multiple threads, with localtime_r(), which can have allocated to it a buffer of its own. (Bug #17750252)
When using single-threaded (ndbd) data nodes with RealTimeScheduler enabled, the CPU did not, as intended, temporarily lower its scheduling priority to normal every 10 milliseconds to give other, non-realtime threads a chance to run. (Bug #17739131)
During arbitrator selection, QMGR (see The QMGR Block) runs through a series of states, the first few of which are (in order) NULL, INIT, FIND, PREP1, PREP2, and START. A check for an arbitration selection timeout occurred in the FIND state, even though the corresponding timer was not set until the PREP1 and PREP2 states. Attempting to read the resulting uninitialized timestamp value could lead to false Could not find an arbitrator, cluster is not partition-safe warnings.
This fix moves the setting of the timer for arbitration timeout to the INIT state, so that the value later read during FIND is always initialized. (Bug #17738720)
The global checkpoint lag watchdog tracked the number of times a check for GCP lag was performed using the system scheduler, and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the GCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called.
In addition, any backticks (rare in any case) are now handled by taking the backward time as the new current time and calculating the elapsed time for this round as 0. Finally, any ill effects of a forward leap, which possibly could expire the watchdog timer immediately, are reduced by never calculating an elapsed time longer than the requested delay time for the watchdog timer. (Bug #17647469)
References: See also: Bug #17842035.
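The handling of backticks and forward leaps described for the GCP watchdog can be expressed as a small function. What follows is a hedged Python sketch of that rule, not the NDB source; the name elapsed_this_round and its parameters are hypothetical, and all times are assumed to be in the same unit (for example, milliseconds).

```python
def elapsed_this_round(last_time, now, max_delay):
    """Return (elapsed, new_last_time) for one watchdog round.

    Hypothetical sketch of the rule described in the text:
    - backtick (now < last_time): take `now` as the new current time
      and count the elapsed time for this round as 0;
    - forward leap: never report an elapsed time longer than
      `max_delay`, the requested delay for the watchdog timer.
    """
    if now < last_time:
        return 0, now
    return min(now - last_time, max_delay), now
```

For example, with a requested delay of 100, a normal round from 1000 to 1040 yields an elapsed time of 40, a backtick from 1000 to 990 yields 0, and a leap from 1000 to 5000 is capped at 100, so a single leap cannot by itself account for more than one delay period.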
The length of the interval (intended to be 10 seconds) between warnings for GCP_COMMIT when the GCP progress watchdog did not detect progress in a global checkpoint was not always calculated correctly. (Bug #17647213)
Trying to drop an index used by a foreign key constraint caused data node failure. Now in such cases, the statement used to perform the drop fails. (Bug #17591531)
In certain rare cases on commit of a transaction, an Ndb object was released before the transaction coordinator (DBTC kernel block) sent the expected COMMIT_CONF signal; NDB failed to send a COMMIT_ACK signal in response, which caused a memory leak in the NDB kernel that could later lead to node failure.
Now an Ndb object is not released until the COMMIT_CONF signal has actually been received. (Bug #16944817)
Losing its connections to the management node or data nodes while a query against the ndbinfo.memoryusage table was in progress caused the SQL node where the query was issued to fail. (Bug #14483440, Bug #16810415)
The ndbd_redo_log_reader utility now supports a --help option. Using this option causes the program to print basic usage information, and then to exit. (Bug #11749591, Bug #36805)