Beginning with NDB 8.0.33, NDB replication supports the use of the generic MySQL Server Multithreaded Applier mechanism (MTA), which allows independent binary log transactions to be applied in parallel on a replica, increasing peak replication throughput.
The MySQL Server MTA implementation delegates the processing of separate binary log transactions to a pool of worker threads (whose size is configurable), and coordinates the worker threads to ensure that transaction dependencies encoded in the binary log are respected, and that commit ordering is maintained if required (see Section 17.2.3, “Replication Threads”). To use this functionality with NDB Cluster, it is necessary that the following three conditions be met:
Binary log transaction dependencies are determined at the source.
For this to be true, the
binlog_transaction_dependency_trackingserver system variable must be set to
WRITESETon the source. This is supported by NDB 8.0.33 and later. (The default is
Writeset maintenance work in
NDBis performed by the MySQL binary log injector thread as part of preparing and committing each epoch transaction to the binary log. This requires extra resources, and may reduce peak throughput.
Transaction dependencies are encoded into the binary log.
The replica is configured to use multiple worker threads.
NDB 8.0.33 and later supports setting
replica_parallel_workersto nonzero values to control the number of worker threads on the replica. The default is 4.
Source mysqld configuration for the
NDB MTA must include the following explicit
LOGICAL_CLOCK (the default value;
DATABASE is not supported).
In addition, it is recommended that you set the amount of memory
used to track binary log transaction writesets on the source
E is the average epoch size (as the
number of operations per epoch) and
is the maximum expected parallelism. See
Writeset Tracking Memory Usage, for
ON. This is also the default value.
Transaction dependencies are detected using analysis of each transaction's writeset, that is, the set of rows (table, key values) written by the transaction. Where two transactions modify the same row they are considered to be dependent, and must be applied in order (in other words, serially) to avoid deadlocks or incorrect results. Where a table has secondary unique keys, these values are also added to the transaction's writeset to detect the case where there are transaction dependencies implied by different transactions affecting the same unique key value, and so requiring ordering. Where dependencies cannot be efficiently determined, mysqld falls back to considering transactions dependent for reasons of safety.
Transaction dependencies are encoded in the binary log by the
source mysqld. Dependencies are encoded in an
ANONYMOUS_GTID event using a scheme called
'Logical clock'. (See
Section 220.127.116.11, “Replication Mode Concepts”.)
The writeset implementation employed by MySQL (and NDB Cluster) uses hash-based conflict detection based on matching 64-bit row hashes of relevant table and index values. This detects reliably when the same key is seen twice, but can also produce false positives if different table and index values hash to the same 64-bit value; this may result in artificial dependencies which can reduce the available parallelism.
Transaction dependencies are forced by any of the following:
Binary log rotation or encountering binary log file boundaries
Writeset history size limitations
Writes which reference parent foreign keys in the target table
More specifically, transactions which perform inserts, updates, and deletes on foreign key parent tables are serialized relative to all preceding and following transactions, and not just to those transactions affecting tables involved in a constraint relationship. Conversely, transactions performing inserts, updates and deletes on foreign key child tables (referencing) are not especially serialized with regard to one another.
The MySQL MTA implementation attempts to apply independent binary
log transactions in parallel.
NDB records all
changes occurring in all user transactions committing in an epoch
default 100 milliseconds), in one binary log transaction, referred
to as an epoch transaction. Therefore, for two consecutive epoch
transactions to be independent, and possible to apply in parallel,
it is required that no row is modified in both epochs. If any
single row is modified in both epochs, then they are dependent,
and are applied serially, which can limit the expolitable
Epoch transactions are considered independent based on the set of
rows modified on the source cluster in the epoch, but not
including the generated
WRITE_ROW events that convey epoch metadata.
This avoids every epoch transaction being trivially dependent on
the preceding epoch, but does require that the binlog is applied
at the replica with the commit order preserved. This also implies
that an NDB binary log with writeset dependencies is not suitable
for use by a replica database using a different MySQL storage
It may be possible or desirable to modify application transaction behavior to avoid patterns of repeated modifications to the same rows, in separate transactions over a short time period, to increase exploitable apply parallelism.
The amount of memory used to track binary log transaction
writesets can be set using the
server system variable, which defaults to 25000 row hashes.
If an average binary log transaction modifies
N rows, then to be able to identify
independent (parallelizable) transactions up to a parallelism
P, we need
be at least
. (The maximum is 1000000.)
The finite size of the history results in a finite maximum dependency length that can be reliably determined, giving a finite parallelism that can be expressed. Any row not found in the history may be dependent on the last transaction purged from the history.
Writeset history does not act like a sliding window over the last
N transactions; rather, it is a finite
buffer which is allowed to fill up completely, then its contents
entirely discarded when it becomes full. This means that the
history size follows a sawtooth pattern over time, and therefore
the maximum detectable dependency length also follows a sawtooth
pattern over time, such that independent transactions may still be
marked as dependent if the writeset history buffer has been reset
between their being processed.
In this scheme, each transaction in a binary log file is annotated
sequence_number (1, 2, 3, ...), and as
well as the sequence number of the most recent binary log
transaction that it depends on, to which we refer as
Within a given binary log file, the first transaction has
sequence_number 1 and
Where a binary log transaction depends on its immediate predecessor, its application is serialized. If the dependency is on an earlier transaction then it may be possible to apply the transaction in parallel with the preceding independent transactions.
The content of
ANONYMOUS_GTID events, including
last_committed (and thus the transaction
dependencies), can be seen using mysqlbinlog.
ANONYMOUS_GTID events generated on the
source are handled separately from the compressed transaction
payload with bulk
COMMIT events, allowing dependencies to be
determined prior to decompression. This means that the replica
coordinator thread can delegate transaction payload decompression
to a worker thread, providing automatic parallel decompression of
independent transactions on the replica.
Secondary unique columns. Tables with secondary unique columns (that is, unique keys other than the primary key) have all columns sent to the source so that unique-key related conflicts can be detected.
Where the current binary logging mode does not include all
columns, but only changed columns
this can increase the volume of data sent from data nodes to SQL
The impact depends on both the rate of modification (update or delete) of rows in such tables and the volume of data in columns which are not actually modified.
Replicating NDB to InnoDB.
NDB binary log injector transaction
dependency tracking intentionally ignores the inter-transaction
dependencies created by generated
mysql.ndb_apply_status metadata events, which
are handled separately as part of the commit of the epoch
transaction on the replica applier. For replication to
InnoDB, there is no special
handling; this may result in reduced performance or other issues
when using an
InnoDB multithreaded applier to
NDB MTA binary log.