- NDB Cluster APIs: The Node.js library used to build the MySQL NoSQL Connector for JavaScript has been upgraded to version 18.12.1. (Bug #35095122)
- MySQL NDB ClusterJ: Performance has been improved for accessing tables using a single-column partition key when the column is of type CHAR or VARCHAR. (Bug #35027961)
- Beginning with this release, ndb_restore implements the --timestamp-printouts option, which causes all error, info, and debug node log messages to be prefixed with timestamps. (Bug #34110068)
- Microsoft Windows: Two memory leaks found by code inspection were removed from NDB process handles on Windows platforms. (Bug #34872901)
- Microsoft Windows: On Windows platforms, the data node angel process did not detect whether a child data node process exited normally. We fix this by keeping an open process handle to the child and using this when probing for the child's exit. (Bug #34853213)
- NDB Cluster APIs; MySQL NDB ClusterJ: MySQL ClusterJ uses a scratch buffer for primary key hash calculations that was limited to 10000 bytes, which proved too small in some cases. Now we malloc() the buffer if its size is not sufficient.
This also fixes an issue with the Ndb object methods startTransaction() and computeHash() in the NDB API: previously, if either of these methods was passed a temporary buffer of insufficient size, the method failed. Now in such cases a temporary buffer is allocated.
Our thanks to Mikael Ronström for this contribution. (Bug #103814, Bug #32959894)
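As an illustration only (placeholder names and sizes; not code from the fix), the sketch below shows the affected NDB API call pattern: a caller-supplied scratch buffer passed to Ndb::computeHash(). With this change, a temporary buffer is allocated internally when the supplied one is too small.
    // Sketch only: compute the distribution hash for a single-column primary
    // key, passing an explicit caller-supplied scratch (xfrm) buffer.
    #include <NdbApi.hpp>

    int compute_pk_hash(const NdbDictionary::Table *tab,
                        const void *pk_value, unsigned pk_len,
                        Uint32 *hash_out)
    {
      Ndb::Key_part_ptr key_parts[2];
      key_parts[0].ptr = pk_value;   // value of the single key column
      key_parts[0].len = pk_len;     // its length in bytes
      key_parts[1].ptr = NULL;       // the key part list is NULL-terminated
      key_parts[1].len = 0;

      Uint64 scratch[512];           // word-aligned scratch buffer
                                     // (size is illustrative)
      // Returns nonzero on failure; with this fix a larger temporary buffer
      // is allocated internally if the one supplied here is insufficient.
      return Ndb::computeHash(hash_out, tab, key_parts,
                              scratch, sizeof(scratch));
    }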
- NDB Cluster APIs: When dropping an event operation (NdbEventOperation) in the NDB API, it was sometimes possible for the dropped event operation to remain visible to the application after the data nodes had been instructed to stop sending events related to it, but before all pending buffered events were consumed and discarded. This could be observed in certain cases when performing an online alter operation, such as ADD COLUMN or RENAME COLUMN, together with concurrent writes to the affected table.
Further analysis showed that the dropped events were accessible when iterating through event operations with Ndb::getGCIEventOperations(). Now this method skips dropped events when called iteratively. (Bug #34809944)
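For context, the sketch below (illustrative only; the ndb object and its event subscriptions are assumed to be set up already) shows the iteration pattern affected by this change.
    // Sketch only: after nextEvent() reports data for the current epoch,
    // iterate over the event operations that contributed to it.
    Uint32 iter = 0;
    Uint32 event_types = 0;
    const NdbEventOperation *op;
    while ((op = ndb->getGCIEventOperations(&iter, &event_types)) != NULL)
    {
      // Inspect op and event_types here; event operations that have already
      // been dropped are now skipped by this iteration.
    }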
- NDB Cluster APIs: Event::getReport() always returned an error for an event opened from NDB, instead of returning the flags actually used by the report object. (Bug #34667384)
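A minimal sketch of the affected call follows (placeholder event name; error handling and cleanup of the returned object omitted), assuming an existing Ndb object ndb:
    // Sketch only: fetch an event definition from NDB and read back the
    // report flags set earlier with Event::setReport().
    NdbDictionary::Dictionary *dict = ndb->getDictionary();
    const NdbDictionary::Event *ev = dict->getEvent("my_event");
    if (ev != NULL)
    {
      int flags = ev->getReport();   // previously failed for events opened
                                     // from NDB; now returns the actual flags
      (void) flags;                  // e.g. test for ER_UPDATED, ER_SUBSCRIBE
    }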
- Before a new NDB table definition can be stored in the data dictionary, any existing definition must be removed. Table definitions have two unique values: the table name and the NDB Cluster se_private_id. During installation of a new table definition, we check whether there is any existing definition with the same table name and, if so, remove it. Then we check whether the removed table and the one being installed have the same se_private_id; if they do not, any definition occupying this se_private_id is considered stale, and is removed as well.
Problems arose when no existing definition was found by the search using the table's name, since no definition was dropped even if one occupied the se_private_id, leading to a duplicate key error when attempting to store the new table. The internal store_table() function attempted to clear the diagnostics area, remove the stale definition occupying the se_private_id, and try to store the new definition once again, but the diagnostics area was not actually cleared, so the error leaked through and was presented to the user.
To fix this, we remove any stale table definition, regardless of any action taken (or not) by store_table(). (Bug #35089015)
- Fixed the following two issues in the output of ndb_restore:
  - The backup file format version was shown for both the backup file format version and the version of the cluster which produced the backup.
  - To reduce confusion between the version of the file format and the version of the cluster which produced the backup, the backup file format version is now shown using hexadecimal notation.
(Bug #35079426)
References: This issue is a regression of: Bug #34110068.
- Removed a memory leak in the DBDICT kernel block, caused when an internal foreign key definition record was not released when no longer needed. This could be triggered by either of the following events:
  - Drop of a foreign key constraint on an NDB table
  - Rejection of an attempt to create a foreign key constraint on an NDB table
Such records use the DISK_RECORDS memory resource; you can check this on a running cluster by executing SELECT node_id, used FROM ndbinfo.resources WHERE resource_name='DISK_RECORDS' in the mysql client. This resource uses SharedGlobalMemory, exhaustion of which could lead not only to the rejection of attempts to create foreign keys, but also of queries making use of joins, since the DBSPJ block also uses shared global memory by way of QUERY_MEMORY. (Bug #35064142)
- When attempting a copying alter operation with --ndb-allow-copying-alter-table = OFF, the reason for rejection of the statement was not always made clear to the user. (Bug #35059079)
- When a transaction coordinator is starting fragment scans on many fragments, it may take a realtime break (RTB) during the process to ensure fair CPU access for other requests. When the requesting API disconnected, and API failure handling for the scan state occurred before the RTB continuation returned, continuation processing could not proceed because the scan state had been removed.
We fix this by adding appropriate checks on the scan state as part of the continuation process. (Bug #35037683)
- Sender and receiver signal IDs were printed in trace logs as signed values even though they are actually unsigned 32-bit numbers. This could result in confusion when the top bit was set, as it caused such numbers to be shown as negative values counting upwards from -MAX_32_BIT_SIGNED_INT. (Bug #35037396)
- A fiber used by the DICT block monitors all indexes and triggers index statistics calculations if requested by DBTUX index fragment monitoring; these calculations are performed using a schema transaction. When the DICT fiber attempted to seize a transaction handle for requesting that a schema transaction be started but failed to do so, the fiber exited, so that no more automated index statistics updates could be performed without a node failure. (Bug #34992370)
References: See also: Bug #34007422.
- Schema objects in NDB use composite versioning, comprising major and minor subversions. When a schema object is first created, its major and minor versions are set; when an existing schema object is altered in place, its minor subversion is incremented.
At restart time, each data node checks schema objects as part of recovery; for foreign key objects, the versions of referenced parent and child tables (and of indexes, for foreign key references not to or from a table's primary key) are checked for consistency. For tables, this check compares only major subversions, allowing tables to evolve, but for indexes it also compared minor subversions; this resulted in a failure at restart time when an index had been altered.
We fix this by comparing only major subversions for indexes in such cases. (Bug #34976028)
References: See also: Bug #21363253.
- ndb_import sometimes silently ignored hint failure for tables having large VARCHAR primary keys. To hint which transaction coordinator to use, ndb_import can use the row's partitioning key, employing a 4092-byte buffer to compute the hash for the key.
This was problematic when the key included a VARCHAR column using UTF8, since the hash buffer may require up to 24 bytes for each character of the column's maximum length, depending on the column's collation; the hash computation failed, but the calling code in ndb_import did not check for this, and continued using an undefined hash value which yielded an undefined hint. This did not lead to any functional problems, but was not optimal, and the user was not notified of it.
We fix this by ensuring that ndb_import always uses a buffer large enough to handle character columns in the key (regardless of their collations), and by adding a check in ndb_import for any failure in hash computation, which is now reported to the user. (Bug #34917498)
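To illustrate the arithmetic with a hypothetical key column (only the 4092-byte buffer and the 24-byte-per-character factor come from the description above):
    // Sketch only: worst-case hash (xfrm) buffer requirement for a
    // hypothetical VARCHAR(200) UTF8 primary key column.
    const unsigned max_chars           = 200;  // column's maximum characters
    const unsigned xfrm_bytes_per_char = 24;   // collation-dependent upper bound
    const unsigned needed  = max_chars * xfrm_bytes_per_char;  // 4800 bytes
    const unsigned old_buf = 4092;             // fixed buffer formerly used
    // needed (4800) exceeds old_buf (4092), so the hash computation could
    // fail; ndb_import previously did not check for this failure.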
- When the ndbcluster plugin creates the ndb_schema table, it inserts a row containing metadata needed to keep track of this NDB Cluster instance, stored in the table as a set of key-value pairs.
The ndb_schema table is hidden from MySQL and so cannot be queried using SQL, but it contains a UUID generated by the same MySQL server that creates the ndb_schema table; the same UUID is also stored as metadata in the data dictionary of each MySQL server on which the ndb_schema table is installed.
When a mysqld connects (or reconnects) to NDB, it compares the UUID in its own data dictionary with the UUID stored in NDB in order to detect whether it is reconnecting to the same cluster; if not, the entire contents of the data dictionary are discarded, making it faster and easier to install all tables fresh from NDB.
One such case occurs when all NDB data nodes have been restarted with --initial, thus removing all data and tables. Another happens when the ndb_schema table has been restored from a backup without restoring any of its data, since this means that the row for the ndb_schema table would be missing.
To deal with these types of situations, we now make sure that, when synchronization has completed, there is always a row in the NDB dictionary with a UUID matching the UUID stored in the MySQL server data dictionary. (Bug #34876468)
- When running an NDB Cluster with multiple management servers, termination of the ndb_mgmd processes required an excessive amount of time when shutting down the cluster. (Bug #34872372)
- Schema distribution timeout was detected by the schema distribution coordinator after dropping and re-creating the mysql.ndb_schema table whenever any nodes that had been subscribed beforehand had not yet resubscribed by the time the next schema operation began. This was due to a stale list of subscribers being left behind in the schema distribution data; these subscribers were assumed by the coordinator to be participants in subsequent schema operations.
We fix this issue by clearing the list of known subscribers whenever the mysql.ndb_schema table is dropped. (Bug #34843412)
- When requesting a new global checkpoint (GCP) from the data nodes, as is done by the NDB Cluster handler in mysqld to speed up delivery of schema distribution events and responses, the request was sent 100 times. While the DBDIH block attempted to merge these duplicate requests into one, it was possible on occasion to trigger more than one immediate GCP. (Bug #34836471)
- When the DBSPJ block receives a query for execution, it sets up its own internal plan for how to execute it. This plan is based on the query plan provided by the optimizer, with adaptations made to provide the most efficient execution of the query, both in terms of elapsed time and of total resources used.
Query plans received by DBSPJ often contain star joins, in which several child tables depend on a common parent, as in the query shown here:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k INNER JOIN t AS t3 ON t3.k = t1.k;
In such cases, DBSPJ could submit key-range lookups to t2 and t3 in parallel (but does not do so). An inner join has the property that each inner-joined row requires a match from the other tables in the same join nest, else the row is eliminated from the result set. Thus, with parallel key-range lookups, we might retrieve rows from one lookup that have no matches in the other, and that effort is ultimately wasted. Instead, DBSPJ sets up a sequential plan for such a query.
It was found that this worked as intended for queries having only inner joins, but when any of the tables were left-joined, we did not take complete advantage of the preceding inner-joined tables before issuing requests for the outer-joined tables. Suppose the previous query is modified to include a left join, like this:
SELECT STRAIGHT_JOIN * FROM t AS t1 INNER JOIN t AS t2 ON t2.a = t1.k LEFT JOIN t AS t3 ON t3.k = t1.k;
By running the following query against the ndbinfo.counters table before and after query execution, it is possible to observe how many rows are returned for each query:
SELECT counter_name, SUM(val) FROM ndbinfo.counters WHERE block_name="DBSPJ" AND counter_name = "SCAN_ROWS_RETURNED";
It was thus determined that requests on t2 and t3 were submitted in parallel. Now in such cases we wait for the inner join to complete before issuing the left join, so that unmatched rows from t1 can be eliminated from the outer join on t1 and t3. This results in less work for the data nodes, and reduces the volume of data handled by the transporter as well. (Bug #34782276)
- SPJ handling of a sorted result was found to suffer a significant performance penalty compared to retrieval of the same result set unsorted. Further investigation showed that most of the additional overhead for sorted results lay in the implementation of sorted result retrieval, which required an excessive number of SCAN_NEXTREQ round trips between the client and DBSPJ on the data nodes. (Bug #34768353)
- DBSPJ now implements the firstMatch optimization for semijoins and antijoins, such as those found in EXISTS and NOT EXISTS subqueries. (Bug #34768191)
- When the DBSPJ block sends SCAN_FRAGREQ and SCAN_NEXTREQ signals to the data nodes, it tries to determine the optimum number of fragments to scan in parallel without starting more parallel scans than are needed to fill the available batch buffers, thus avoiding any need to send additional SCAN_NEXTREQ signals to complete the scan of each fragment.
The DBSPJ block's statistics module calculates and samples, for each completed SCAN_FRAGREQ, the parallelism that would have been optimal for the fragment scans just completed, providing a mean and standard deviation of the sampled parallelism. From these it is possible to calculate a lower 95th percentile of the parallelism (and batch size) allowing a SCAN_FRAGREQ to complete without the need for additional SCAN_NEXTREQ signals.
It was found that the parallelism statistics seemed unable to provide a stable parallelism estimate, and that the standard deviation was unexpectedly high. This often led to the parallelism estimate being a negative number (always rounded up to 1).
The flaw in the statistics calculation was found to be an underlying assumption that each sampled SCAN_FRAGREQ contained the same number of key ranges to be scanned, which is not necessarily the case: typically there is a full batch of rows for the first SCAN_FRAGREQ and relatively few rows for the final SCAN_NEXTREQ returning the remaining rows. This resulted in wide variation among parallelism samples, which made the statistics obtained from them unreliable.
We fix this by basing the statistics on the number of keys actually sent in the SCAN_FRAGREQ and counting the rows returned from this request. From this, records-per-key statistics can be calculated and sampled, making it possible to determine the number of fragments that can be scanned without overflowing the batch buffers. (Bug #34768106)
- It was possible in certain cases that both the NDB binary logging thread and metadata synchronization attempted to synchronize the ndb_apply_status table, which led to a race condition. We fix this by making sure that the ndb_apply_status table is monitored and created (or re-created) by the binary logging thread only. (Bug #34750992)
- While starting a schema operation, the client is responsible for detecting timeouts until the coordinator has received the first schema event; from that point, any schema operation timeout should be detected by the coordinator. A problem occurred while the client was checking for a timeout: it mistakenly set the state indicating that a timeout had occurred, which caused the coordinator to ignore the first schema event whenever it took longer than approximately one second to receive (that is, to write and send the event, plus handle it in the binary logging thread). This had the effect that, in such cases, the coordinator was not involved in the schema operation.
We fix this by changing the schema distribution timeout check to be atomic, and letting it be performed by either the client or the coordinator. In addition, we remove the state variable used for keeping track of events received by the coordinator, and rely on the list of participants instead. (Bug #34741743)
- An SQL node did not start up correctly after restoring data with ndb_restore, such that, when it was otherwise ready to accept connections, the binary log injector thread never became ready. It was found that, when a mysqld was started after a data node initial restore in which new table IDs were generated, a utility table's (ndb_*) MySQL data dictionary definition might not match its NDB dictionary definition.
The existing mysqld definition is dropped by name, thus removing the unique ndbcluster-ID key in the MySQL data dictionary, but the new table ID could also already be occupied by another (stale) definition. The resulting mismatch prevented setup of the binary log.
To fix this problem, we now explicitly drop any ndbcluster-ID definitions that might clash in such cases with the table being installed. (Bug #34733051)
- After receiving a SIGTERM signal, ndb_mgmd did not wait for all threads to shut down before exiting. (Bug #33522783)
References: See also: Bug #32446105.
- When multiple operations are pending on a single row, it is not possible to commit an operation that runs concurrently with an operation pending abort. This could lead to a data node shutdown during the commit operation in DBACC, which could manifest when a single transaction contained more than MaxDMLOperationsPerTransaction DML operations.
In addition, a transaction containing insert operations is rolled back if a statement that uses a locking scan on the prepared insert fails due to too many DML operations. This could lead to an unplanned data node shutdown during tuple deallocation, due to a missing reference to the expected DBLQH deallocation operation.
We solve this issue by allowing commit of a scan operation in such cases, in order to release locks previously acquired during the transaction. We also add a new special case for this scenario, so that the deallocation is performed in a single phase and DBACC tells DBLQH to deallocate immediately; in DBLQH, execTUP_DEALLOCREQ() is now able to handle this immediate deallocation request. (Bug #32491105)
References: See also: Bug #28893633, Bug #32997832.
- Cluster nodes sometimes reported "Failed to convert connection to transporter" warnings in the logs, even when this was not actually necessary. (Bug #14784707)
- When started with no connection string on the command line, ndb_waiter printed "Connecting to mgmsrv at (null)". Now in such cases it prints "Connecting to management server at nodeid=0,localhost:1186" when no other default host is specified.
The --help option and other ndb_waiter program output have also been improved. (Bug #12380163)
- NdbSpin_Init() calculated the wrong number of loops in NdbSpin, and contained logic errors. (Bug #108448, Bug #32497174, Bug #32594825)
References: See also: Bug #31765660, Bug #32413458, Bug #102506, Bug #32478388.