Important Change; NDB Cluster APIs: This release introduces an epoch-driven Event API for the NDB API that supersedes the earlier GCI-based model. The new version of this API also simplifies error detection and handling, and monitoring of event buffer memory usage has been improved.
New event handling methods for NdbEventOperation added by this change include getEventType2(), getEpoch(), isEmptyEpoch(), and isErrorEpoch(); the older GCI-based event methods, including the hasError() and clearError() methods, are deprecated beginning with the same release.
Some (but not all) of the new methods act as replacements for deprecated methods; not all of the deprecated methods map to new ones. The Event Class provides information as to which old methods correspond to new ones.
Error handling using the new API is no longer done using the dedicated hasError() and clearError() methods, which are now deprecated as previously noted. To support this change, TableEvent now supports the values TE_EMPTY (empty epoch), TE_INCONSISTENT (inconsistent epoch), and TE_OUT_OF_MEMORY (insufficient event buffer memory).
Event buffer memory management has also been improved with the introduction of the get_eventbuffer_free_percent(), set_eventbuffer_free_percent(), and get_event_buffer_memory_usage() methods, as well as a new NDB API error Free percent out of range (error code 4123). Memory buffer usage can now be represented in applications using the EventBufferMemoryUsage data structure, and checked from MySQL client applications by reading the ndb_eventbuffer_free_percent system variable.
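The following is a minimal, hypothetical sketch of an event loop written against the new epoch-driven API; it assumes an Ndb object (ndb) whose event operations have already been created and executed, and the handling shown for each event type is illustrative only:

    #include <NdbApi.hpp>

    void event_loop(Ndb *ndb)
    {
      while (true)
      {
        // Wait up to 1000 ms for event data to become queued
        if (ndb->pollEvents2(1000) <= 0)
          continue;

        NdbEventOperation *op;
        while ((op = ndb->nextEvent2()) != NULL)
        {
          switch (op->getEventType2())
          {
          case NdbDictionary::Event::TE_INCONSISTENT:
            // Inconsistent epoch: event data may have been lost;
            // the application should resynchronize
            break;
          case NdbDictionary::Event::TE_OUT_OF_MEMORY:
            // Event buffer memory was exhausted
            break;
          default:
            // Data and other events: application-specific handling
            break;
          }
        }

        // Check buffer usage with the new EventBufferMemoryUsage structure
        Ndb::EventBufferMemoryUsage usage;
        ndb->get_event_buffer_memory_usage(usage);
        if (usage.usage_percent > 90)
        {
          // The application might log a warning or throttle here
        }
      }
    }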
NDB Cluster APIs: Two new example programs, demonstrating reads and writes of VARBINARY column values, have been added to storage/ndb/ndbapi-examples in the MySQL NDB Cluster source tree. For more information about these programs, including source code listings, see NDB API Simple Array Example, and NDB API Simple Array Example Using Adapter.
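As a brief, hypothetical illustration of the technique those examples demonstrate: a VARBINARY value passed to NdbOperation::setValue() must be length-prefixed, using one length byte for columns declared up to 255 bytes (longer columns use a two-byte prefix). The helper below, including its names, is a placeholder rather than part of the examples:

    #include <cstring>
    #include <NdbApi.hpp>

    // Write a VARBINARY value to a column declared no longer than 255 bytes
    void set_varbinary(NdbOperation *op, const char *column,
                       const unsigned char *data, unsigned len)
    {
      unsigned char buf[1 + 255];
      buf[0] = (unsigned char) len;     // single-byte length prefix
      memcpy(buf + 1, data, len);
      op->setValue(column, (const char *) buf);
    }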
Additional logging is now performed of internal states occurring during system restarts, such as waiting for node ID allocation and for master takeover of global and local checkpoints. (Bug #74316, Bug #19795029)
Added the operations_per_fragment table to the ndbinfo information database. Using this table, you can now obtain counts of operations performed on a given fragment (or fragment replica). Such operations include reads, writes, updates, and deletes, the scan and index operations performed while executing them, and operations refused, as well as information relating to rows scanned on and returned from a given fragment replica. This table also provides information about interpreted programs used as attribute values, and values returned by them.
Added the MaxParallelCopyInstances data node configuration parameter. In cases where the parallelism used during the restart copy phase (normally the number of LDMs, up to a maximum of 16) is excessive and leads to system overload, this parameter can be used to override the default behavior by reducing the degree of parallelism employed.
NDB Disk Data: An update on many rows of a large Disk Data table could in some rare cases lead to node failure. In the event that such problems are observed with very large transactions on Disk Data tables, you can now increase the number of page entries allocated for disk page buffer memory by raising the value of the DiskPageBufferEntries data node configuration parameter added in this release. (Bug #19958804)
NDB Disk Data: In some cases, during DICT master takeover, the new master could crash while attempting to roll forward an ongoing schema transaction. (Bug #19875663, Bug #74510)
NDB Cluster APIs: It was possible to delete an Ndb_cluster_connection object while there remained instances of Ndb using references to it. Now the Ndb_cluster_connection destructor waits for all related Ndb objects to be released before completing. (Bug #19999242)
References: See also: Bug #19846392.
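A minimal, hypothetical sketch of the teardown order involved (the connection string and database name are placeholders); releasing all Ndb instances before deleting the Ndb_cluster_connection remains the well-defined pattern, with the destructor now waiting for any that remain:

    #include <NdbApi.hpp>

    int main()
    {
      ndb_init();
      Ndb_cluster_connection *conn =
          new Ndb_cluster_connection("localhost:1186");
      if (conn->connect() != 0)
        return 1;

      Ndb *ndb = new Ndb(conn, "test");
      ndb->init();

      // ... application work ...

      delete ndb;   // release all Ndb objects first
      delete conn;  // destructor no longer races with live Ndb objects
      ndb_end(0);
      return 0;
    }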
The global checkpoint commit and save protocols can be delayed by various causes, including slow disk I/O. The DIH master node monitors the progress of both of these protocols, and can enforce a maximum lag time during which the protocols are stalled by killing the node responsible for the lag when it reaches this maximum. This DIH master GCP monitor mechanism did not perform its task more than once per master node; that is, it failed to continue monitoring after detecting and handling a GCP stop. (Bug #20128256)
References: See also: Bug #19858151, Bug #20069617, Bug #20062754.
When running mysql_upgrade on a MySQL NDB Cluster SQL node, the expected drop of the performance_schema database on this node was instead performed on all SQL nodes connected to the cluster. (Bug #20032861)
The warning shown when an ALTER TABLE ALGORITHM=INPLACE ... ADD COLUMN statement automatically changes a column's COLUMN_FORMAT from FIXED to DYNAMIC now includes the name of the column whose format was changed. (Bug #20009152, Bug #74795)
The local checkpoint scan fragment watchdog and the global checkpoint monitor can each exclude a node when it is too slow in participating in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which, in cases where this was delayed (for whatever reason), could prolong the duration of the GCP or LCP stall for other, unaffected nodes.
To minimize this time, an isolation mechanism has been added to both protocols, whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This allows the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocols, even if the failed node takes an excessively long time to shut down. (Bug #19858151)
References: See also: Bug #20128256, Bug #20069617, Bug #20062754.
The matrix of values used for thread configuration when applying the setting of the
MaxNoOfExecutionThreads configuration parameter has been improved to align with support for greater numbers of LDM threads. See Multi-Threading Configuration Parameters (ndbmtd), for more information about the changes. (Bug #75220, Bug #20215689)
When a new node failed after connecting to the president but not to any other live node, then reconnected and started again, a live node that did not see the original connection retained old state information. This caused the live node to send redundant signals to the president, causing it to fail. (Bug #75218, Bug #20215395)
In the NDB kernel, it was possible for a TransporterFacade object to reset a buffer while the data contained by the buffer was being sent, which could lead to a race condition. (Bug #75041, Bug #20112981)
mysql_upgrade failed to drop and recreate the ndbinfo database and its tables as expected. (Bug #74863, Bug #20031425)
Due to a lack of memory barriers, MySQL NDB Cluster programs such as ndbmtd did not compile on POWER platforms. (Bug #74782, Bug #20007248)
In spite of the presence of a number of protection mechanisms against overloading signal buffers, it was still possible to do so in some cases. This fix adds block-level support in the NDB kernel (in SimulatedBlock) to make signal buffer overload protection more reliable than when implementing such protection on a case-by-case basis. (Bug #74639, Bug #19928269)
Copying of metadata during local checkpoints caused node restart times to be highly variable, which could make it difficult to diagnose problems with restarts. The fix for this issue introduces signals (including
PAUSE_NOT_IN_LCP_COPY_META_DATA) to pause LCP execution and flush LCP reports, making it possible to block LCP reporting at times when LCPs during restarts become stalled in this fashion. (Bug #74594, Bug #19898269)
When a data node was restarted from its angel process (that is, following a node failure), it could be allocated a new node ID before failure handling was actually completed for the failed node. (Bug #74564, Bug #19891507)
In NDB version 7.4, node failure handling can require completing checkpoints on up to 64 fragments. (This checkpointing is performed by the DBLQH kernel block.) The requirement for master takeover to wait for completion of all such checkpoints could in such cases lead to excessive time being required for takeover to complete.
To address this issue, the DBLQH kernel block can now report that it is ready for master takeover before it has completed any ongoing fragment checkpoints, and can continue processing these while the system completes the master takeover. (Bug #74320, Bug #19795217)
Local checkpoints were sometimes started earlier than necessary during node restarts, while the node was still waiting for copying of the data distribution and data dictionary to complete. (Bug #74319, Bug #19795152)
The check used to determine whether a node was restarting, and thus when to accelerate local checkpoints, sometimes reported a false positive. (Bug #74318, Bug #19795108)
Values in different columns of the ndbinfo disk_write_speed_aggregate_node table were reported using differing multiples of bytes. Now all of these columns display values in bytes.
In addition, this fix corrects an error made when calculating the standard deviations used in the std_dev_redo_speed_last_60sec and other standard deviation columns of the ndbinfo.disk_write_speed_aggregate table. (Bug #74317, Bug #19795072)
Recursion in the internal method
Dblqh::finishScanrec() led to an attempt to create two list iterators with the same head. This regression was introduced during work done to optimize scans for version 7.4 of the NDB storage engine. (Bug #73667, Bug #19480197)
Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)