Performance: Recent improvements made to the multithreaded scheduler were intended to optimize the cache behavior of its internal data structures, with members of these structures placed such that those local to a given thread do not overflow into a cache line which can be accessed by another thread. Where required, extra padding bytes are inserted to isolate cache lines owned (or shared) by other threads, thus avoiding invalidation of the entire cache line if another thread writes into a cache line not entirely owned by itself. This optimization improved MT Scheduler performance by several percent.
It has since been found that the optimization just described depends on the global instance of struct thr_repository starting at a cache line aligned base address, as well as on the compiler not rearranging or adding extra padding to the scheduler struct; it was also found that these prerequisites were neither guaranteed nor checked. Thus this cache line optimization previously worked only when g_thr_repository (that is, the global instance) happened to be cache line aligned by accident. In addition, on 64-bit platforms, the compiler added extra padding words in struct thr_safe_pool, such that attempts to pad it to a cache line aligned size failed.
The current fix ensures that g_thr_repository is constructed on a cache line aligned address, and the constructors have been modified to verify cache line aligned addresses where these are assumed by design.
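The layout requirement described above can be illustrated with standard C++ alone. The following is a minimal sketch, assuming 64-byte cache lines. PerThreadSlot, Repository, g_repository, and is_cache_aligned are hypothetical names and are not taken from the NDB sources. The sketch shows the two kinds of check the fix calls for: compile-time assertions that the padded size is what the design expects, and a constructor that verifies the alignment it depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. The real value is platform specific.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical per-thread slot. alignas() places every instance at the start
// of a cache line and rounds sizeof() up to a multiple of the alignment, so
// no two slots ever share a line.
struct alignas(kCacheLineSize) PerThreadSlot {
  std::uint64_t counter;
  void* head;
};

// Compile-time checks of the layout the design relies on, instead of
// trusting the compiler not to insert padding of its own.
static_assert(alignof(PerThreadSlot) == kCacheLineSize,
              "PerThreadSlot must be cache line aligned");
static_assert(sizeof(PerThreadSlot) % kCacheLineSize == 0,
              "PerThreadSlot must fill whole cache lines");

inline bool is_cache_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kCacheLineSize == 0;
}

// Hypothetical repository holding one slot per thread.
struct alignas(kCacheLineSize) Repository {
  PerThreadSlot slots[4];

  Repository() : slots{} {
    // Runtime check in the constructor: fail loudly if this object was not
    // placed on a cache line boundary.
    assert(is_cache_aligned(this));
  }
};

// Global instance. Because the type itself is over-aligned, a compiler that
// supports this alignment places the object on a cache line boundary.
Repository g_repository;
```

Putting alignas() on the type, rather than relying on where the linker happens to place a global, is what removes the dependence on accidental alignment. Support for extended alignments on static objects is implementation-defined, although mainstream compilers provide it.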
Results from internal testing show improvements in MT Scheduler read performance of up to 10% in some cases, following these changes. (Bug #18352514)
NDB Cluster APIs: Two new example programs, demonstrating reads and writes of VARBINARY column values, have been added to storage/ndb/ndbapi-examples in the MySQL NDB Cluster source tree. For more information about these programs, including source code listings, see NDB API Simple Array Example, and NDB API Simple Array Example Using Adapter.
NDB Disk Data: An update on many rows of a large Disk Data table could in some rare cases lead to node failure. In the event that such problems are observed with very large transactions on Disk Data tables, you can now increase the number of page entries allocated for disk page buffer memory by raising the value of the DiskPageBufferEntries data node configuration parameter added in this release. (Bug #19958804)
NDB Disk Data: In some cases, during DICT master takeover, the new master could crash while attempting to roll forward an ongoing schema transaction. (Bug #19875663, Bug #74510)
NDB Disk Data: When a node acting as a DICT master fails, the arbitrator selects another node to take over in place of the failed node. During the takeover procedure, which includes cleaning up any schema transactions that were still open when the master failed, the disposition of the uncommitted schema transaction is decided. Normally this transaction is rolled back, but if it has completed a sufficient portion of a commit request, the new master finishes processing the commit. Until the fate of the transaction has been decided, no new TRANS_END_REQ messages from clients can be processed. In addition, since multiple concurrent schema transactions are not supported, takeover cleanup must be completed before any new transactions can be started.
A similar restriction applies to any schema operations that are performed in the scope of an open schema transaction. The counter used to coordinate schema operations across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results.
The scenarios just described were previously handled using a pseudo-random delay when recovering from a node failure. Now we check whether the new master has rolled forward or backward any schema transactions remaining after the failure of the previous master, and avoid starting new schema transactions or performing operations using old transactions until takeover processing has cleaned up after the abandoned transaction. (Bug #19874809, Bug #74503)
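The ordering rule can be modeled as a small state machine. This is a hypothetical sketch; SchemaTxGate and its methods are illustrative names, not NDB code. It shows that both new schema transactions and operations on an existing one are refused while takeover cleanup is still in progress, and that only one schema transaction may be open at a time.

```cpp
#include <cassert>

// Hypothetical model of the DICT master rule described above.
class SchemaTxGate {
 public:
  enum class Result { Ok, TakeoverInProgress, Busy, NoTransaction };

  // The new master found a schema transaction abandoned by the failed master.
  void begin_takeover() {
    takeover_in_progress_ = true;
    tx_open_ = true;  // the abandoned transaction still occupies the only slot
  }

  // The abandoned transaction has been rolled forward or backward.
  void finish_takeover() {
    takeover_in_progress_ = false;
    tx_open_ = false;
  }

  // No transaction may start during takeover, and only one may be open.
  Result begin_transaction() {
    if (takeover_in_progress_) return Result::TakeoverInProgress;
    if (tx_open_) return Result::Busy;
    tx_open_ = true;
    return Result::Ok;
  }

  // A schema operation would share the coordination counter with takeover
  // processing, so it is also refused until cleanup completes.
  Result run_operation() const {
    if (takeover_in_progress_) return Result::TakeoverInProgress;
    return tx_open_ ? Result::Ok : Result::NoTransaction;
  }

  void end_transaction() { tx_open_ = false; }

 private:
  bool takeover_in_progress_ = false;
  bool tx_open_ = false;
};
```

Returning an explicit "try again" result, rather than sleeping for a random interval, makes the wait last exactly as long as the takeover does.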
NDB Disk Data: When a node acting as DICT master fails, it is still possible to request that any open schema transaction be either committed or aborted by sending this request to the new DICT master. In this event, the new master takes over the schema transaction and reports back on whether the commit or abort request succeeded. In certain cases, it was possible for the new master to be misidentified; that is, the request was sent to the wrong node, which responded with an error that was interpreted by the client application as an aborted schema transaction, even in cases where the transaction could have been successfully committed, had the correct node been contacted. (Bug #74521, Bug #19880747)
NDB Cluster APIs: It was possible to delete an Ndb_cluster_connection object while there remained instances of Ndb using references to it. Now the Ndb_cluster_connection destructor waits for all related Ndb objects to be released before completing. (Bug #19999242)
References: See also: Bug #19846392.
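The destructor behavior can be sketched with a reference count guarded by a mutex and a condition variable. SharedConnection and DependentHandle are hypothetical stand-ins for Ndb_cluster_connection and Ndb; this is an assumption about how such a wait could be implemented, not the actual NDB API code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical shared connection. Dependent handles register themselves;
// the destructor blocks until all of them are gone.
class SharedConnection {
 public:
  SharedConnection() = default;
  SharedConnection(const SharedConnection&) = delete;
  SharedConnection& operator=(const SharedConnection&) = delete;

  ~SharedConnection() {
    std::unique_lock<std::mutex> lock(mutex_);
    // Wait, possibly for another thread's release(), until no users remain.
    released_.wait(lock, [this] { return users_ == 0; });
  }

  void acquire() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++users_;
  }

  void release() {
    std::lock_guard<std::mutex> lock(mutex_);
    --users_;
    // Notify while holding the lock, so the waiting destructor cannot
    // destroy the condition variable before this call has finished with it.
    released_.notify_all();
  }

  int users() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return users_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable released_;
  int users_ = 0;
};

// Hypothetical dependent object, playing the role of an Ndb object.
class DependentHandle {
 public:
  explicit DependentHandle(SharedConnection& conn) : conn_(conn) { conn_.acquire(); }
  ~DependentHandle() { conn_.release(); }
  DependentHandle(const DependentHandle&) = delete;
  DependentHandle& operator=(const DependentHandle&) = delete;

 private:
  SharedConnection& conn_;
};
```

Because the destructor blocks, destroying the connection on the same thread that still owns a live handle would wait forever; in that situation the handles must be released first.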
NDB Cluster APIs: The buffer allocated by an NdbScanOperation for receiving scanned rows was not released until the NdbTransaction owning the scan operation was closed. This could lead to excessive memory usage in an application where multiple scans were created within the same transaction, even if these scans were closed at the end of their lifecycle, unless NdbScanOperation::close() was invoked with the releaseOp argument equal to true. Now the buffer is released whenever the cursor navigating the result set is closed with NdbScanOperation::close(), regardless of the value of this argument. (Bug #75128, Bug #20166585)
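The following is a minimal model of the buffer ownership change, assuming a transaction that keeps every scan it creates until the transaction itself is closed. ScanModel and TransactionModel are hypothetical types, not NdbScanOperation or NdbTransaction; the close() parameter here only mirrors the role of the releaseOp argument described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical scan that owns a receive buffer for scanned rows.
class ScanModel {
 public:
  explicit ScanModel(std::size_t buffer_bytes)
      : buffer_(new char[buffer_bytes]), size_(buffer_bytes) {}

  // Fixed behavior: closing the cursor always frees the row buffer.
  // release_op only records whether the scan object itself would be
  // returned to the transaction immediately.
  void close(bool release_op) {
    buffer_.reset();
    size_ = 0;
    op_released_ = release_op;
  }

  std::size_t buffer_bytes() const { return size_; }
  bool op_released() const { return op_released_; }

 private:
  std::unique_ptr<char[]> buffer_;
  std::size_t size_;
  bool op_released_ = false;
};

// Hypothetical transaction that keeps its scans until it is closed.
class TransactionModel {
 public:
  ScanModel& new_scan(std::size_t buffer_bytes) {
    scans_.push_back(std::make_unique<ScanModel>(buffer_bytes));
    return *scans_.back();
  }

  // Receive-buffer memory still held by this transaction's scans.
  std::size_t buffered_bytes() const {
    std::size_t total = 0;
    for (const auto& scan : scans_) total += scan->buffer_bytes();
    return total;
  }

 private:
  std::vector<std::unique_ptr<ScanModel>> scans_;
};
```

With the old behavior, buffered_bytes() would grow by one buffer for every scan opened in the transaction. In this model it returns to zero after each close(false).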
The global checkpoint commit and save protocols can be delayed by various causes, including slow disk I/O. The DIH master node monitors the progress of both of these protocols, and can enforce a maximum lag time during which the protocols are stalled by killing the node responsible for the lag when it reaches this maximum. This DIH master GCP monitor mechanism did not perform its task more than once per master node; that is, it failed to continue monitoring after detecting and handling a GCP stop. (Bug #20128256)
References: See also: Bug #19858151, Bug #20069617, Bug #20062754.
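The difference between a one-shot monitor and one that keeps running can be sketched as a tick-driven counter. GcpMonitor and its methods are hypothetical, and measuring lag in ticks is an assumption made for illustration. The point is that the lag counter is reset after a stop is handled, so the next stall is also detected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical master-side monitor, driven by periodic ticks.
class GcpMonitor {
 public:
  explicit GcpMonitor(std::uint32_t max_lag_ticks) : max_lag_(max_lag_ticks) {}

  // Called whenever the monitored protocol makes progress.
  void on_progress() { lag_ = 0; }

  // Called periodically. Returns true when the lag limit is reached and the
  // lagging node should be killed. The counter is then reset so monitoring
  // continues; a one-shot monitor would instead stop checking at this point.
  bool on_tick() {
    if (++lag_ < max_lag_) return false;
    lag_ = 0;
    ++stops_handled_;
    return true;
  }

  std::uint32_t stops_handled() const { return stops_handled_; }

 private:
  std::uint32_t max_lag_;
  std::uint32_t lag_ = 0;
  std::uint32_t stops_handled_ = 0;
};
```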
When running mysql_upgrade on a MySQL NDB Cluster SQL node, the expected drop of the performance_schema database on this node was instead performed on all SQL nodes connected to the cluster. (Bug #20032861)
A number of problems relating to the fired triggers pool have been fixed, including the following issues:
When the fired triggers pool was exhausted, NDB returned Error 218 (Out of LongMessageBuffer). A new error code 221 is added to cover this case.
An additional, separate case in which Error 218 was wrongly reported now returns the correct error.
Setting low values for MaxNoOfFiredTriggers led to an error when no memory was allocated if there was only one hash bucket.
An aborted transaction now releases any fired trigger records it held. Previously, these records were held until its ApiConnectRecord was reused by another transaction.
In addition, for the Fired Triggers pool in the internal ndbinfo.ndb$pools table, the high value always equalled the total, due to the fact that all records were momentarily seized when initializing them. Now the high value shows the maximum following completion of initialization.
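The high-water-mark correction can be shown with a simple free-list pool. RecordPool is a hypothetical class; only the rule that the high value is measured after initialization comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record pool reporting used, total, and high counters.
class RecordPool {
 public:
  explicit RecordPool(std::size_t total) : total_(total) {
    for (std::size_t i = 0; i < total; ++i) free_.push_back(i);

    // Initialization momentarily seizes every record (for example, to
    // construct it in place) and then releases them all again.
    std::vector<std::size_t> seized;
    std::size_t idx = 0;
    while (seize(idx)) seized.push_back(idx);
    for (std::size_t i : seized) release(i);

    // The fix: restart the high-water mark once initialization is done,
    // so it no longer always equals the total.
    high_ = used_;
  }

  bool seize(std::size_t& out) {
    if (free_.empty()) return false;
    out = free_.back();
    free_.pop_back();
    if (++used_ > high_) high_ = used_;
    return true;
  }

  void release(std::size_t idx) {
    free_.push_back(idx);
    --used_;
  }

  std::size_t used() const { return used_; }
  std::size_t high() const { return high_; }
  std::size_t total() const { return total_; }

 private:
  std::size_t total_;
  std::size_t used_ = 0;
  std::size_t high_ = 0;
  std::vector<std::size_t> free_;
};
```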
Online reorganization when using ndbmtd data nodes and with binary logging by mysqld enabled could sometimes lead to failures in the DBLQH kernel blocks, or to silent data corruption. (Bug #19903481)
References: See also: Bug #19912988.
The local checkpoint scan fragment watchdog and the global checkpoint monitor can each exclude a node when it is too slow to participate in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which, if the shutdown was delayed (for whatever reason), could prolong the duration of the GCP or LCP stall for other, unaffected nodes.
To minimize this time, an isolation mechanism has been added to both protocols whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This allows the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocols, even if the failed node takes an excessively long time to shut down. (Bug #19858151)
References: See also: Bug #20128256, Bug #20069617, Bug #20062754.
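The isolation step amounts to a deadline on the graceful shutdown. This sketch assumes that each surviving node records when the failing node was asked to stop and polls against a fixed grace period. IsolationTimer and the grace value used in the test are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timer kept by a surviving node for one failing node.
class IsolationTimer {
 public:
  using Clock = std::chrono::steady_clock;

  enum class Action { Wait, ForceDisconnect, Done };

  IsolationTimer(Clock::time_point asked_at, std::chrono::milliseconds grace)
      : deadline_(asked_at + grace) {}

  // Either the failing node has already gone away on its own (Done), or the
  // grace period has run out and survivors cut it off so that failure
  // handling can start (ForceDisconnect); otherwise keep waiting.
  Action poll(Clock::time_point now, bool node_still_connected) const {
    if (!node_still_connected) return Action::Done;
    return now >= deadline_ ? Action::ForceDisconnect : Action::Wait;
  }

 private:
  Clock::time_point deadline_;
};
```

Passing the current time in, rather than reading the clock inside poll(), keeps the decision deterministic and easy to test.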
A watchdog failure resulted from a hang while freeing a disk page in TUP_COMMITREQ, due to use of an uninitialized block variable. (Bug #19815044, Bug #74380)
Multiple threads crashing led to multiple sets of trace files being printed and possibly to deadlocks. (Bug #19724313)
When a client retried, against a new master, a schema transaction that had previously failed against the previous master while the latter was restarting, the lock obtained by this transaction on the new master prevented the previous master from progressing past start phase 3 until the client was terminated and the resources held by it were cleaned up. (Bug #19712569, Bug #74154)
When using the NDB storage engine, the maximum possible length of a database or table name is 63 characters, but this limit was not always strictly enforced. This meant that a statement using a name having 64 characters, such as DROP DATABASE or ALTER TABLE RENAME, could cause the SQL node on which it was executed to fail. Now such statements fail with an appropriate error message. (Bug #19550973)
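A length check of the kind described can be as simple as the following sketch. ndb_name_length_ok and kNdbMaxNameLength are hypothetical names. The check counts bytes, which matches a character count only for single-byte character sets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Limit stated in the text: 63 characters for NDB database and table names.
constexpr std::size_t kNdbMaxNameLength = 63;

// Hypothetical helper: reject an over-long name up front, so the statement
// fails with an error instead of being passed on.
// Assumption: size() counts bytes, not multibyte characters.
inline bool ndb_name_length_ok(const std::string& name) {
  return name.size() <= kNdbMaxNameLength;
}
```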
When a new data node started, API nodes were allowed to attempt to register themselves with the data node for executing transactions before the data node was ready. This forced the API node to wait an extra heartbeat interval before trying again.
To address this issue, a number of HA_ERR_NO_CONNECTION errors (Error 4009) that could be issued during this time have been changed to Cluster temporarily unavailable errors (Error 4035), which should allow API nodes to use new data nodes more quickly than before. As part of this fix, some errors which were incorrectly categorised have been moved into the correct categories, and some errors which are no longer used have been removed. (Bug #19524096, Bug #73758)
When executing very large pushdown joins involving one or more indexes each defined over several columns, it was possible in some cases for the DBSPJ block (see The DBSPJ Block) in the NDB kernel to generate SCAN_FRAGREQ signals that were excessively large. This caused data nodes to fail when these could not be handled correctly, due to a hard limit in the kernel on the size of such signals (32K). This fix bypasses that limitation by breaking up SCAN_FRAGREQ data that is too large for one such signal, and sending the SCAN_FRAGREQ as a chunked or fragmented signal instead. (Bug #19390895)
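Breaking an oversized payload into fragments and reassembling it can be sketched as follows. The Fragment layout, split_into_fragments, and reassemble are hypothetical. In particular, treating the 32K limit as 32 * 1024 bytes of payload is an assumption, since the text above does not state the unit or any header overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: 32K treated as 32 * 1024 bytes of payload per signal.
constexpr std::size_t kMaxSignalBytes = 32 * 1024;

// Hypothetical fragment of a long signal.
struct Fragment {
  std::uint32_t seq;               // position within the fragmented signal
  bool last;                       // set on the final fragment
  std::vector<std::uint8_t> data;  // this fragment's share of the payload
};

// Split a payload into pieces no larger than max_bytes. A payload that
// already fits yields a single fragment marked as last.
std::vector<Fragment> split_into_fragments(const std::vector<std::uint8_t>& payload,
                                           std::size_t max_bytes = kMaxSignalBytes) {
  assert(max_bytes > 0);  // a zero limit would never make progress
  std::vector<Fragment> out;
  std::size_t offset = 0;
  std::uint32_t seq = 0;
  do {
    const std::size_t n = std::min(max_bytes, payload.size() - offset);
    const auto first = payload.begin() + static_cast<std::ptrdiff_t>(offset);
    Fragment frag{seq++, false,
                  std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(n))};
    offset += n;
    frag.last = (offset == payload.size());
    out.push_back(std::move(frag));
  } while (offset < payload.size());
  return out;
}

// Receiver side: concatenate fragments in order to rebuild the payload.
std::vector<std::uint8_t> reassemble(const std::vector<Fragment>& frags) {
  std::vector<std::uint8_t> out;
  for (const auto& frag : frags) out.insert(out.end(), frag.data.begin(), frag.data.end());
  return out;
}
```

For example, a 70000-byte payload is split into three fragments of 32768, 32768, and 4464 bytes.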
ndb_index_stat sometimes failed when used against a table containing unique indexes. (Bug #18715165)
Queries against tables containing CHAR(0) columns failed with ERROR 1296 (HY000): Got error 4547 'RecordSpecification has overlapping offsets' from NDBCLUSTER. (Bug #14798022)
In the NDB kernel, it was possible for a TransporterFacade object to reset a buffer while the data contained by the buffer was being sent, which could lead to a race condition. (Bug #75041, Bug #20112981)
mysql_upgrade failed to drop and recreate the ndbinfo database and its tables as expected. (Bug #74863, Bug #20031425)
Due to a lack of memory barriers, MySQL NDB Cluster programs such as ndbmtd did not compile on POWER platforms. (Bug #74782, Bug #20007248)
In some cases, when run against a table having an AFTER DELETE trigger, a DELETE statement that matched no rows still caused the trigger to execute. (Bug #74751, Bug #19992856)
A basic requirement of the NDB storage engine's design is that the transporter registry not attempt to receive data (TransporterRegistry::performReceive()) from and update the connection status (TransporterRegistry::update_connections()) of the same set of transporters concurrently, due to the fact that the updates perform final cleanup and reinitialization of buffers used when receiving data. Changing the contents of these buffers while reading or writing to them could lead to "garbage" or inconsistent signals being read or written.
During the course of work done previously to improve the implementation of the transporter facade, a mutex intended to protect against the concurrent use of the update_connections() methods on the same transporter was inadvertently removed. This fix adds a watchdog check for concurrent usage. In addition, performReceive() calls are now serialized together while polling the transporters. (Bug #74011, Bug #19661543)
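A watchdog-style check for overlapping use can be built from a single atomic state variable. TransporterGuard is a hypothetical class, not part of TransporterRegistry. It only illustrates detecting a receive and a connection update attempted on the same transporter at the same time.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard for one transporter. Receiving and updating the
// connection status must never overlap, because the update reinitializes
// the receive buffers. A failed enter means another activity is already
// running, which a watchdog would report as a violation.
class TransporterGuard {
 public:
  bool enter_receive() { return try_enter(Activity::Receive); }
  bool enter_update() { return try_enter(Activity::Update); }
  void leave() { state_.store(Activity::Idle); }

 private:
  enum class Activity { Idle, Receive, Update };

  bool try_enter(Activity next) {
    Activity expected = Activity::Idle;
    // Atomically move from Idle to `next`; fails if anything else is active.
    return state_.compare_exchange_strong(expected, next);
  }

  std::atomic<Activity> state_{Activity::Idle};
};
```

A real implementation would more likely block on a lock than refuse entry; the check-and-report form shown here corresponds to the watchdog check described above, not to the serialization itself.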
ndb_restore failed while restoring a table which contained both a built-in conversion on the primary key and a staging conversion on a
During staging, a BLOB table is created with a primary key column of the target type. However, a conversion function was not provided to convert the primary key values before loading them into the staging blob table, which resulted in corrupted primary key values in the staging BLOB table. While moving data from the staging table to the target table, the BLOB read failed because it could not find the primary key in the
BLOB tables are checked to see whether there are conversions on primary keys of their main tables. This check is done after all the main tables are processed, so that conversion functions and parameters have already been set for the main tables. Any conversion functions and parameters used for the primary key in the main table are now duplicated in the BLOB table. (Bug #73966, Bug #19642978)
Corrupted messages to data nodes sometimes went undetected, causing a bad signal to be delivered to a block which aborted the data node. This failure in combination with disconnecting nodes could in turn cause the entire cluster to shut down.
To keep this from happening, additional checks are now made when unpacking signals received over TCP, including checks for byte order, compression flag (which must not be used), and the length of the next message in the receive buffer (if there is one).
Whenever two consecutive unpacked messages fail the checks just described, the current message is assumed to be corrupted. In this case, the transporter is marked as having bad data and no more unpacking of messages occurs until the transporter is reconnected. In addition, an entry is written to the cluster log containing the error as well as a hex dump of the corrupted message. (Bug #73843, Bug #19582925)
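The two-consecutive-failures rule can be modeled per transporter. UnpackChecker and HeaderInfo are hypothetical. The maximum message length is an assumed parameter, and so is the choice to withhold a single failing message rather than deliver it; the text above only specifies what happens after two consecutive failures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded header fields used by the checks.
struct HeaderInfo {
  bool byte_order_ok;          // byte order matches what is expected
  bool compressed;             // compression flag, which must not be set
  std::uint32_t length_words;  // length of the next message
};

class UnpackChecker {
 public:
  explicit UnpackChecker(std::uint32_t max_len) : max_len_(max_len) {}

  // Returns true if the message may be delivered to its block.
  bool check(const HeaderInfo& h) {
    if (bad_data_) return false;  // nothing is unpacked until reconnect
    const bool ok = h.byte_order_ok && !h.compressed &&
                    h.length_words > 0 && h.length_words <= max_len_;
    if (ok) {
      consecutive_failures_ = 0;
      return true;
    }
    // Second consecutive failure: mark the transporter as having bad data.
    // This is where a cluster log entry with a hex dump would be written.
    if (++consecutive_failures_ >= 2) bad_data_ = true;
    return false;
  }

  void on_reconnect() {
    bad_data_ = false;
    consecutive_failures_ = 0;
  }

  bool bad_data() const { return bad_data_; }

 private:
  std::uint32_t max_len_;
  unsigned consecutive_failures_ = 0;
  bool bad_data_ = false;
};
```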
Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)