MySQL :: MySQL NDB Cluster 8.0 Release Notes :: Changes in MySQL NDB Cluster 8.0.16 (2019-04-25, Development Milestone)

MySQL NDB Cluster 8.0 Release Notes / Release Series Changelogs: MySQL NDB Cluster 8.0 / Changes in MySQL NDB Cluster 8.0.16 (2019-04-25, Development Milestone)

Changes in MySQL NDB Cluster 8.0.16 (2019-04-25, Development Milestone)

Deprecation and Removal Notes

Incompatible Change: Distribution of privileges amongst MySQL servers connected to NDB Cluster, as implemented in NDB 7.6 and earlier, does not function in NDB 8.0, and most code supporting these has now been removed. When a mysqld detects such tables in NDB, it creates shadow tables local to itself using the InnoDB storage engine; these shadow tables are created on each MySQL server connected to an NDB cluster. Privilege tables using the NDB storage engine are not employed for access control; once all connected MySQL servers are upgraded, the privilege tables in NDB can be removed safely using ndb_drop_table.

For compatibility reasons, ndb_restore --restore-privilege-tables can still be used to restore distributed privilege tables present in a backup taken from a previous release of NDB Cluster to a cluster running NDB 8.0. These tables are handled as described in the preceeding paragraph.

For additional information regarding upgrades from previous NDB Cluster release series to NDB 8.0, see Upgrading and Downgrading NDB Cluster. (WL #12507, WL #12511)

SQL Syntax Notes

Incompatible Change: For consistency with InnoDB, the NDB storage engine now uses a generated constraint name if the CONSTRAINT symbol clause is not specified, or the CONSTRAINT keyword is specified without a symbol. In previous NDB releases, NDB used the FOREIGN KEY index_name value.

This change described above may introduce incompatibilities for applications that depend on the previous foreign key constraint naming behavior. (Bug #29173134)

Functionality Added or Changed

Packaging: A Docker image for this release can be obtained from https://hub.docker.com/r/mysql/mysql-cluster/. (Bug #96084, Bug #30010921)
Allocation of resources in the transaction corrdinator (see The DBTC Block) is now performed using dynamic memory pools. This means that resource allocation determined by data node configuration parameters such as those discussed in Transaction parameters and Transaction temporary storage is now limited so as not to exceed the total resources available to the transaction coordinator.

As part of this work, several new data node parameters controlling transactional resources in DBTC, listed here, have also been added. For more information about these new parameters, see Transaction resource allocation parameters. (Bug #29164271, Bug #29194843, WL #9756, WL #12523)

References: See also: Bug #29131828.
NDB backups can now be performed in a parallel fashion on individual data nodes using multiple local data managers (LDMs). (Previously, backups were done in parallel across data nodes, but were always serial within data node processes.) No special syntax is required for the START BACKUP command in the ndb_mgm client to enable this feature, but all data nodes must be using multiple LDMs. This means that data nodes must be running ndbmtd and they must be configured to use multiple LDMs prior to taking the backup (see Multi-Threading Configuration Parameters (ndbmtd)).

The EnableMultithreadedBackup data node parameter introduced in this release is enabled (set to 1) by default. You can disable multi-threaded backups and force the creation of single-threaded backups by setting this parameter to 0 on all data nodes or in the [ndbd default] section of the cluster's global configuration file (config.ini).

ndb_restore also now detects a multi-threaded backup and automatically attempts to restore it in parallel. It is also possible to restore backups taken in parallel to a previous version of NDB Cluster by slightly modifying the usual restore procedure.

For more information about taking and restoring NDB Cluster backups that were created using parallelism on the data nodes, see Taking an NDB Backup with Parallel Data Nodes, and Restoring from a backup taken in parallel. (Bug #28563639, Bug #28993400, WL #8517)
The compile-cluster script included in the NDB source distribution no longer supports in-source builds. (WL #12303)
Building with CMake3 is now supported by the compile-cluster script included in the NDB source distribution. (WL #12303)
As part of its automatic synchronization mechanism, NDB now implements a metadata change monitor thread for detecting changes made to metadata for data objects such as tables, tablespaces, and log file groups with the MySQL data dictionary. This thread runs in the background, checking every 60 seconds for inconsistencies between the NDB dictionary and the MySQL data dictionary.

The monitor polling interval can be adjusted by setting the value of the ndb_metadata_check_interval system variable, and can be disabled altogether by setting ndb_metadata_check to OFF. The number of times that inconsistencies have been detected since mysqld was last started is shown as the status variable, Ndb_metadata_detected_count. (WL #11913)
Condition pushdown is no longer limited to predicate terms referring to column values from the same table to which the condition was being pushed; column values from tables earlier in the query plan can now also be referred to from pushed conditions. This lets the data nodes filter out more rows (in parallel), leaving less work to be performed by a single mysqld process, which is expected to provide significant improvements in query performance.

For more information, see Engine Condition Pushdown Optimization. (WL #12686)

Bugs Fixed

Important Change; NDB Disk Data: mysqldump terminated unexpectedly when attempting to dump NDB disk data tables. The underlying reason for this was that mysqldump expected to find information relating to undo log buffers in the EXTRA column of the INFORMATION_SCHEMA.FILES table but this information had been removed in NDB 8.0.13. This information is now restored to the EXTRA column. (Bug #28800252)
Important Change: When restoring to a cluster using data node IDs different from those in the original cluster, ndb_restore tried to open files corresponding to node ID 0. To keep this from happening, the --nodeid and --backupid options—neither of which has a default value—are both now explicitly required when invoking ndb_restore. (Bug #28813708)
Important Change: Starting with this release, the default value of the ndb_log_bin system variable is now FALSE. (Bug #27135706)
Packaging; MySQL NDB ClusterJ: libndbclient was missing from builds on some platforms. (Bug #28997603)
NDB Disk Data: When a log file group had more than 18 undo logs, it was not possible to restart the cluster. (Bug #251155785)

References: See also: Bug #28922609.
NDB Disk Data: Concurrent CREATE TABLE statements using tablespaces caused deadlocks between metadata locks. This occurred when Ndb_metadata_change_monitor acquired exclusive metadata locks on tablespaces and logfile groups after detecting metadata changes, due to the fact that each exclusive metadata lock in turn acquired a global schema lock. This fix attempts to solve that issue by downgrading the locks taken by Ndb_metadata_change_monitor to MDL_SHARED_READ. (Bug #29175268)

References: See also: Bug #29394407.
NDB Disk Data: The error message returned when validation of MaxNoOfOpenFiles in relation to InitialNoOfOpenFiles failed has been improved to make the nature of the problem clearer to users. (Bug #28943749)
NDB Disk Data: Schema distribution of ALTER TABLESPACE and ALTER LOGFILE GROUP statements failed on a participant MySQL server if the referenced tablespace or log file group did not exist in its data dictionary. Now in such cases, the effects of the statement are distributed successfully regardless of any initial mismatch between MySQL servers. (Bug #28866336)
NDB Disk Data: Repeated execution of ALTER TABLESPACE ... ADD DATAFILE against the same tablespace caused data nodes to hang and left them, after being killed manually, unable to restart. (Bug #22605467)
NDB Cluster APIs: NDB now identifies short-lived transactions not needing the reduction of lock contention provided by NdbBlob::close() and no longer invokes this method in cases (such as when autocommit is enabled) in which unlocking merely causes extra work and round trips to be performed prior to committing or aborting the transaction. (Bug #29305592)

References: See also: Bug #49190, Bug #11757181.
NDB Cluster APIs: When the most recently failed operation was released, the pointer to it held by NdbTransaction became invalid and when accessed led to failure of the NDB API application. (Bug #29275244)
NDB Cluster APIs: When the NDB kernel's SUMA block sends a TE_ALTER event, it does not keep track of when all fragments of the event are sent. When NDB receives the event, it buffers the fragments, and processes the event when all fragments have arrived. An issue could possibly arise for very large table definitions, when the time between transmission and reception could span multiple epochs; during this time, SUMA could send a SUB_GCP_COMPLETE_REP signal to indicate that it has sent all data for an epoch, even though in this case that is not entirely true since there may be fragments of a TE_ALTER event still waiting on the data node to be sent. Reception of the SUB_GCP_COMPLETE_REP leads to closing the buffers for that epoch. Thus, when TE_ALTER finally arrives, NDB assumes that it is a duplicate from an earlier epoch, and silently discards it.

We fix the problem by making sure that the SUMA kernel block never sends a SUB_GCP_COMPLETE_REP for any epoch in which there are unsent fragments for a SUB_TABLE_DATA signal.

This issue could have an impact on NDB API applications making use of TE_ALTER events. (SQL nodes do not make any use of TE_ALTER events and so they and applications using them were not affected.) (Bug #28836474)
When a pushed join executing in the DBSPJ block had to store correlation IDs during query execution, memory for these was allocated for the lifetime of the entire query execution, even though these specific correlation IDs are required only when producing the most recent batch in the result set. Subsequent batches require additional correlation IDs to be stored and allocated; thus, if the query took sufficiently long to complete, this led to exhaustion of query memory (error 20008). Now in such cases, memory is allocated only for the lifetime of the current result batch, and is freed and made available for re-use following completion of the batch. (Bug #29336777)

References: See also: Bug #26995027.
When comparing or hashing a fixed-length string that used a NO_PAD collation, any trailing padding characters (typically spaces) were sent to the hashing and comparison functions such that they became significant, even though they were not supposed to be. Now any such trailing spaces are trimmed from a fixed-length string whenever a NO_PAD collation is specified.

Note

Since NO_PAD collations were introduced as part of UCA-9.0 collations in MySQL 8.0, there should be no impact relating to this fix on upgrades to NDB 8.0 from previous GA releases of NDB Cluster.

(Bug #29322313)
When a NOT IN or NOT BETWEEN predicate was evaluated as a pushed condition, NULL values were not eliminated by the condition as specified in the SQL standard. (Bug #29232744)

References: See also: Bug #28672214.
Internally, NDB treats NULL as less than any other value, and predicates of the form column < value or column <= value are checked for possible nulls. Predicates of the form value > column or value >= column were not checked, which could lead to errors. Now in such cases, these predicates are rewritten so that the column comes first, so that they are also checked for the presence of NULL. (Bug #29231709)

References: See also: Bug #92407, Bug #28643463.
After folding of constants was implemented in the MySQL Optimizer, a condition containing a DATE or DATETIME literal could no longer be pushed down by NDB. (Bug #29161281)
When a join condition made a comparison between a column of a temporal data type such as DATE or DATETIME and a constant of the same type, the predicate was pushed if the condition was expressed in the form column operator constant, but not when in inverted order (as constant inverse_operator column). (Bug #29058732)
When processing a pushed condition, NDB did not detect errors or warnings thrown when a literal value being compared was outside the range of the data type it was being compared with,and thus truncated. This could lead to excess or missing rows in the result. (Bug #29054626)
If an EQ_REF or REF key in the child of a pushed join referred to any columns of a table not a member of the pushed join, this table was not an NDB table (because its format was of nonnative endianness), and the data type of the column being joined on was stored in an endian-sensitive format, then the key generated was generated, likely resulting in the return of an (invalid) empty join result.

Since only big endian platforms may store tables in nonnative (little endian) formats, this issue was expected only on such platforms, most notably SPARC, and not on x86 platforms. (Bug #29010641)
API and data nodes running NDB 7.6 and later could not use an existing parsed configuration from an earlier release series due to being overly strict with regard to having values defined for configuration parameters new to the later release, which placed a restriction on possible upgrade paths. Now NDB 7.6 and later are less strict about having all new parameters specified explicitly in the configuration which they are served, and use hard-coded default values in such cases. (Bug #28993400)
NDB 7.6 SQL nodes hung when trying to connect to an NDB 8.0 cluster. (Bug #28985685)
The schema distribution data maintained in the NDB binary logging thread keeping track of the number of subscribers to the NDB schema table always allocated some memory structures for 256 data nodes regardless of the actual number of nodes. Now NDB allocates only as many of these structures as are actually needed. (Bug #28949523)
Added DUMP 406 (NdbfsDumpRequests) to provide NDB file system information to global checkpoint and local checkpoint stall reports in the node logs. (Bug #28922609)
When a joined table was eliminated early as not pushable, it could not be referred to in any subsequent join conditions from other tables without eliminating those conditions from consideration even if those conditions were otherwise pushable. (Bug #28898811)
When starting or restarting an SQL node and connecting to a cluster where NDB was already started, NDB reported Error 4009 Cluster Failure because it could not acquire a global schema lock. This was because the MySQL Server as part of initialization acquires exclusive metadata locks in order to modify internal data structures, and the ndbcluster plugin acquires the global schema lock. If the connection to NDB was not yet properly set up during mysqld initialization, mysqld received a warning from ndbcluster when the latter failed to acquire global schema lock, and printed it to the log file, causing an unexpected error in the log. This is fixed by not pushing any warnings to background threads when failure to acquire a global schema lock occurs and pushing the NDB error as a warning instead. (Bug #28898544)
A race condition between the DBACC and DBLQH kernel blocks occurred when different operations in a transaction on the same row were concurrently being prepared and aborted. This could result in DBTUP attempting to prepare an operation when a preceding operation had been aborted, which was unexpected and could thus lead to undefined behavior including potential data node failures. To solve this issue, DBACC and DBLQH now check that all dependencies are still valid before attempting to prepare an operation.

Note

This fix also supersedes a previous one made for a related issue which was originally reported as Bug #28500861.

(Bug #28893633)
Where a data node was restarted after a configuration change whose result was a decrease in the sum of MaxNoOfTables, MaxNoOfOrderedIndexes, and MaxNoOfUniqueHashIndexes, it sometimes failed with a misleading error message which suggested both a temporary error and a bug, neither of which was the case.

The failure itself is expected, being due to the fact that there is at least one table object with an ID greater than the (new) sum of the parameters just mentioned, and that this table cannot be restored since the maximum value for the ID allowed is limited by that sum. The error message has been changed to reflect this, and now indicates that this is a permanent error due to a problem configuration. (Bug #28884880)
The ndbinfo.cpustat table reported inaccurate information regarding send threads. (Bug #28884157)
Execution of an LCP_COMPLETE_REP signal from the master while the LCP status was IDLE led to an assertion. (Bug #28871889)
NDB now provides on-the-fly .frm file translation during discovery of tables created in versions of the software that did not support the MySQL Data Dictionary. Previously, such translation of tables that had old-style metadata was supported only during schema synchronization during MySQL server startup, but not subsequently, which led to errors when NDB tables having old-style metadata, created by ndb_restore and other such tools after mysqld had been started, were accessed using SHOW CREATE TABLE or SELECT; these tables were usable only after restarting mysqld. With this fix, the restart is no longer required. (Bug #28841009)
An in-place upgrade to an NDB 8.0 release from an earlier relase did not remove .ndb files, even though these are no longer used in NDB 8.0. (Bug #28832816)
Removed storage/ndb/demos and the demonstration scripts and support files it contained from the source tree. These were obsolete and unmaintained, and did not function with any current version of NDB Cluster.

Also removed storage/ndb/include/newtonapi, which included files relating to an obsolete and unmaintained API not supported in any release of NDB Cluster, as well as references elsewhere to these files. (Bug #28808766)
There was no version compatibility table for NDB 8.x; this meant that API nodes running NDB 8.0.13 or 7.6.x could not connect to data nodes running NDB 8.0.14. This issue manifested itself for NDB API users as a failure in wait_until_ready(). (Bug #28776365)

References: See also: Bug #18886034, Bug #18874849.
Issuing a STOP command in the ndb_mgm client caused ndbmtd processes which had recently been added to the cluster to hang in Phase 4 during shutdown. (Bug #28772867)
A fix for a previous issue disabled the usage of pushed conditions for lookup type (eq_ref) operations in pushed joins. It was thought at the time that not pushing a lookup condition would not have any measurable impact on performance, since only a single row could be eliminated if the condition failed. The solution implemented at that time did not take into account the possibility that, in a pushed join, a lookup operation could be a parent operation for other lookups, and even scan operations, which meant that eliminating a single row could actually result in an entire branch being eliminated in error. (Bug #28728603)

References: This issue is a regression of: Bug #27397802.
When a local checkpoint (LCP) was complete on all data nodes except one, and this node failed, NDB did not continue with the steps required to finish the LCP. This led to the following issues:

No new LCPs could be started.

Redo and Undo logs were not trimmed and so grew excessively large, causing an increase in times for recovery from disk. This led to write service failure, which eventually led to cluster shutdown when the head of the redo log met the tail. This placed a limit on cluster uptime.

Node restarts were no longer possible, due to the fact that a data node restart requires that the node's state be made durable on disk before it can provide redundancy when joining the cluster. For a cluster with two data nodes and two fragment replicas, this meant that a restart of the entire cluster (system restart) was required to fix the issue (this was not necessary for a cluster with two fragment replicas and four or more data nodes). (Bug #28728485, Bug #28698831)

References: See also: Bug #11757421.
The pushability of a condition to NDB was limited in that all predicates joined by a logical AND within a given condition had to be pushable to NDB in order for the entire condition to be pushed. In some cases this severely restricted the pushability of conditions. This fix breaks up the condition into its components, and evaluates the pushability of each predicate; if some of the predicates cannot be pushed, they are returned as a remainder condition which can be evaluated by the MySQL server. (Bug #28728007)
Running ANALYZE TABLE on an NDB table with an index having longer than the supported maximum length caused data nodes to fail. (Bug #28714864)
It was possible in certain cases for nodes to hang during an initial restart. (Bug #28698831)

References: See also: Bug #27622643.
When a condition was pushed to a storage engine, it was re-evaluated by the server, in spite of the fact that only rows matching the pushed condition should ever be returned to the server in such cases. (Bug #28672214)
In some cases, one and sometimes more data nodes underwent an unplanned shutdown while running ndb_restore. This occurred most often, but was not always restircted to, when restoring to a cluster having a different number of data nodes from the cluster on which the original backup had been taken.

The root cause of this issue was exhaustion of the pool of SafeCounter objects, used by the DBDICT kernel block as part of executing schema transactions, and taken from a per-block-instance pool shared with protocols used for NDB event setup and subscription processing. The concurrency of event setup and subscription processing is such that the SafeCounter pool can be exhausted; event and subscription processing can handle pool exhaustion, but schema transaction processing could not, which could result in the node shutdown experienced during restoration.

This problem is solved by giving DBDICT schema transactions an isolated pool of reserved SafeCounters which cannot be exhausted by concurrent NDB event activity. (Bug #28595915)
When a backup aborted due to buffer exhaustion, synchronization of the signal queues prior to the expected drop of triggers for insert, update, and delete operations resulted in abort signals being processed before the STOP_BACKUP phase could continue. The abort changed the backup status to ABORT_BACKUP_ORD, which led to an unplanned shutdown of the data node since resuming STOP_BACKUP requires that the state be STOP_BACKUP_REQ. Now the backup status is not set to STOP_BACKUP_REQ (requesting the backup to continue) until after signal queue synchronization is complete. (Bug #28563639)
The output of ndb_config --configinfo --xml --query-all now shows that configuration changes for the ThreadConfig and MaxNoOfExecutionThreads data node parameters require system initial restarts (restart="system" initial="true"). (Bug #28494286)
After a commit failed due to an error, mysqld shut down unexpectedly while trying to get the name of the table involved. This was due to an issue in the internal function ndbcluster_print_error(). (Bug #28435082)
API nodes should observe that a node is moving through SL_STOPPING phases (graceful stop) and stop using the node for new transactions, which minimizes potential disruption in the later phases of the node shutdown process. API nodes were only informed of node state changes via periodic heartbeat signals, and so might not be able to avoid interacting with the node shutting down. This generated unnecessary failures when the heartbeat interval was long. Now when a data node is being gracefully stopped, all API nodes are notified directly, allowing them to experience minimal disruption. (Bug #28380808)
ndb_config --diff-default failed when trying to read a parameter whose default value was the empty string (""). (Bug #27972537)
ndb_restore did not restore autoincrement values correctly when one or more staging tables were in use. As part of this fix, we also in such cases block applying of the SYSTAB_0 backup log, whose content continued to be applied directly based on the table ID, which could ovewrite the autoincrement values stored in SYSTAB_0 for unrelated tables. (Bug #27917769, Bug #27831990)

References: See also: Bug #27832033.
ndb_restore employed a mechanism for restoring autoincrement values which was not atomic, and thus could yield incorrect autoincrement values being restored when multiple instances of ndb_restore were used in parallel. (Bug #27832033)

References: See also: Bug #27917769, Bug #27831990.
Executing SELECT * FROM INFORMATION_SCHEMA.TABLES caused SQL nodes to restart in some cases. (Bug #27613173)
When tables with BLOB columns were dropped and then re-created with a different number of BLOB columns the event definitions for monitoring table changes could become inconsistent in certain error situations involving communication errors when the expected cleanup of the corresponding events was not performed. In particular, when the new versions of the tables had more BLOB columns than the original tables, some events could be missing. (Bug #27072756)
When query memory was exhausted in the DBSPJ kernel block while storing correlation IDs for deferred operations, the query was aborted with error status 20000 Query aborted due to out of query memory. (Bug #26995027)

References: See also: Bug #86537.
When running a cluster with 4 or more data nodes under very high loads, data nodes could sometimes fail with Error 899 Rowid already allocated. (Bug #25960230)
mysqld shut down unexpectedly when a purge of the binary log was requested before the server had completely started, and it was thus not yet ready to delete rows from the ndb_binlog_index table. Now when this occurs, requests for any needed purges of the ndb_binlog_index table are saved in a queue and held for execution when the server has completely started. (Bug #25817834)
MaxBufferedEpochs is used on data nodes to avoid excessive buffering of row changes due to lagging NDB event API subscribers; when epoch acknowledgements from one or more subscribers lag by this number of epochs, an asynchronous disconnection is triggered, allowing the data node to release the buffer space used for subscriptions. Since this disconnection is asynchronous, it may be the case that it has not completed before additional new epochs are completed on the data node, resulting in new epochs not being able to seize GCP completion records, generating warnings such as those shown here:
```
    [ndbd] ERROR    -- c_gcp_list.seize() failed...

    ...

    [ndbd] WARNING  -- ACK wo/ gcp record...
```
And leading to the following warning:
```
    Disconnecting node %u because it has exceeded MaxBufferedEpochs
    (100 > 100), epoch ....
```
This fix performs the following modifications:
- Modifies the size of the GCP completion record pool to ensure that there is always some extra headroom to account for the asynchronous nature of the disconnect processing previously described, thus avoiding c_gcp_list seize failures.
- Modifies the wording of the MaxBufferedEpochs warning to avoid the contradictory phrase “100 > 100”.
(Bug #20344149)
Asynchronous disconnection of mysqld from the cluster caused any subsequent attempt to start an NDB API transaction to fail. If this occurred during a bulk delete operation, the SQL layer called HA::end_bulk_delete(), whose implementation by ha_ndbcluster assumed that a transaction had been started, and could fail if this was not the case. This problem is fixed by checking that the transaction pointer used by this method is set before referencing it. (Bug #20116393)
Removed warnings raised when compiling NDB with Clang 6. (Bug #93634, Bug #29112560)
When executing the redo log in debug mode it was possible for a data node to fail when deallocating a row. (Bug #93273, Bug #28955797)
An NDB table having both a foreign key on another NDB table using ON DELETE CASCADE and one or more TEXT or BLOB columns leaked memory.

As part of this fix, ON DELETE CASCADE is no longer supported for foreign keys on NDB tables when the child table contains a column that uses any of the BLOB or TEXT types. (Bug #89511, Bug #27484882)

PREV HOME UP NEXT