Beginning with NDB 8.0.33, NDB replication supports the use of the generic MySQL Server Multithreaded Applier mechanism (MTA), which allows independent binary log transactions to be applied in parallel on a replica, increasing peak replication throughput.
Requirements
The MySQL Server MTA implementation delegates the processing of separate binary log transactions to a pool of worker threads (whose size is configurable), and coordinates the worker threads to ensure that transaction dependencies encoded in the binary log are respected, and that commit ordering is maintained if required (see Section 19.2.3, “Replication Threads”). To use this functionality with NDB Cluster, it is necessary that the following three conditions be met:
Binary log transaction dependencies are determined at the source.
For this to be true, the
binlog_transaction_dependency_tracking
server system variable must be set toWRITESET
on the source. This is supported by NDB 8.0.33 and later. (The default isCOMMIT_ORDER
.)Writeset maintenance work in
NDB
is performed by the MySQL binary log injector thread as part of preparing and committing each epoch transaction to the binary log. This requires extra resources, and may reduce peak throughput.Transaction dependencies are encoded into the binary log.
NDB 8.0.33 and later supports the
--ndb-log-transaction-dependency
startup option for mysqld; set this option toON
to enable writing ofNDB
transaction dependencies into the binary log.The replica is configured to use multiple worker threads.
NDB 8.0.33 and later supports setting
replica_parallel_workers
to nonzero values to control the number of worker threads on the replica. The default is 4.
MTA Configuration: Source
Source mysqld configuration for the
NDB
MTA must include the following explicit
settings:
binlog_transaction_dependency_tracking
must be set toWRITESET
.The replication source mysqld must be started with
--ndb-log-transaction-dependency=ON
.
If set, replica_parallel_type
must be LOGICAL_CLOCK
(the default value).
NDB
does not support
replica_parallel_type=DATABASE
.
In addition, it is recommended that you set the amount of memory
used to track binary log transaction writesets on the source
(binlog_transaction_dependency_history_size
)
to
, where
E
*
P
E
is the average epoch size (as the
number of operations per epoch) and P
is the maximum expected parallelism. See
Writeset Tracking Memory Usage, for
more information.
MTA Configuration: Replica
Replica mysqld configuration for the
NDB
MTA requires that
replica_parallel_workers
is
greater than 1. The recommended starting value when first enabling
MTA is 4, which is the default.
In addition,
replica_preserve_commit_order
must be ON
. This is also the default value.
Transaction Dependency and Writeset Handling
Transaction dependencies are detected using analysis of each transaction's writeset, that is, the set of rows (table, key values) written by the transaction. Where two transactions modify the same row they are considered to be dependent, and must be applied in order (in other words, serially) to avoid deadlocks or incorrect results. Where a table has secondary unique keys, these values are also added to the transaction's writeset to detect the case where there are transaction dependencies implied by different transactions affecting the same unique key value, and so requiring ordering. Where dependencies cannot be efficiently determined, mysqld falls back to considering transactions dependent for reasons of safety.
Transaction dependencies are encoded in the binary log by the
source mysqld. Dependencies are encoded in an
ANONYMOUS_GTID
event using a scheme called
'Logical clock'. (See
Section 19.1.4.1, “Replication Mode Concepts”.)
The writeset implementation employed by MySQL (and NDB Cluster) uses hash-based conflict detection based on matching 64-bit row hashes of relevant table and index values. This detects reliably when the same key is seen twice, but can also produce false positives if different table and index values hash to the same 64-bit value; this may result in artificial dependencies which can reduce the available parallelism.
Transaction dependencies are forced by any of the following:
DDL statements
Binary log rotation or encountering binary log file boundaries
Writeset history size limitations
Writes which reference parent foreign keys in the target table
More specifically, transactions which perform inserts, updates, and deletes on foreign key parent tables are serialized relative to all preceding and following transactions, and not just to those transactions affecting tables involved in a constraint relationship. Conversely, transactions performing inserts, updates and deletes on foreign key child tables (referencing) are not especially serialized with regard to one another.
The MySQL MTA implementation attempts to apply independent binary
log transactions in parallel. NDB
records all
changes occurring in all user transactions committing in an epoch
(TimeBetweenEpochs
,
default 100 milliseconds), in one binary log transaction, referred
to as an epoch transaction. Therefore, for two consecutive epoch
transactions to be independent, and possible to apply in parallel,
it is required that no row is modified in both epochs. If any
single row is modified in both epochs, then they are dependent,
and are applied serially, which can limit the expolitable
parallelism available.
Epoch transactions are considered independent based on the set of
rows modified on the source cluster in the epoch, but not
including the generated mysql.ndb_apply_status
WRITE_ROW
events that convey epoch metadata.
This avoids every epoch transaction being trivially dependent on
the preceding epoch, but does require that the binlog is applied
at the replica with the commit order preserved. This also implies
that an NDB binary log with writeset dependencies is not suitable
for use by a replica database using a different MySQL storage
engine.
It may be possible or desirable to modify application transaction behavior to avoid patterns of repeated modifications to the same rows, in separate transactions over a short time period, to increase exploitable apply parallelism.
Writeset Tracking Memory Usage
The amount of memory used to track binary log transaction
writesets can be set using the
binlog_transaction_dependency_history_size
server system variable, which defaults to 25000 row hashes.
If an average binary log transaction modifies
N
rows, then to be able to identify
independent (parallelizable) transactions up to a parallelism
level of P
, we need
binlog_transaction_dependency_history_size
to
be at least
. (The maximum is 1000000.)
N
*
P
The finite size of the history results in a finite maximum dependency length that can be reliably determined, giving a finite parallelism that can be expressed. Any row not found in the history may be dependent on the last transaction purged from the history.
Writeset history does not act like a sliding window over the last
N
transactions; rather, it is a finite
buffer which is allowed to fill up completely, then its contents
entirely discarded when it becomes full. This means that the
history size follows a sawtooth pattern over time, and therefore
the maximum detectable dependency length also follows a sawtooth
pattern over time, such that independent transactions may still be
marked as dependent if the writeset history buffer has been reset
between their being processed.
In this scheme, each transaction in a binary log file is annotated
with a sequence_number
(1, 2, 3, ...), and as
well as the sequence number of the most recent binary log
transaction that it depends on, to which we refer as
last_committed
.
Within a given binary log file, the first transaction has
sequence_number
1 and
last_committed
0.
Where a binary log transaction depends on its immediate predecessor, its application is serialized. If the dependency is on an earlier transaction then it may be possible to apply the transaction in parallel with the preceding independent transactions.
The content of ANONYMOUS_GTID
events, including
sequence_number
and
last_committed
(and thus the transaction
dependencies), can be seen using mysqlbinlog.
The ANONYMOUS_GTID
events generated on the
source are handled separately from the compressed transaction
payload with bulk BEGIN
,
TABLE_MAP*
, WRITE_ROW*
,
UPDATE_ROW*
, DELETE_ROW*
,
and COMMIT
events, allowing dependencies to be
determined prior to decompression. This means that the replica
coordinator thread can delegate transaction payload decompression
to a worker thread, providing automatic parallel decompression of
independent transactions on the replica.
Known Limitations
Secondary unique columns. Tables with secondary unique columns (that is, unique keys other than the primary key) have all columns sent to the source so that unique-key related conflicts can be detected.
Where the current binary logging mode does not include all
columns, but only changed columns
(--ndb-log-updated-only=OFF
,
--ndb-log-update-minimal=ON
,
--ndb-log-update-as-write=OFF
),
this can increase the volume of data sent from data nodes to SQL
nodes.
The impact depends on both the rate of modification (update or delete) of rows in such tables and the volume of data in columns which are not actually modified.
Replicating NDB to InnoDB.
NDB
binary log injector transaction
dependency tracking intentionally ignores the inter-transaction
dependencies created by generated
mysql.ndb_apply_status
metadata events, which
are handled separately as part of the commit of the epoch
transaction on the replica applier. For replication to
InnoDB
, there is no special
handling; this may result in reduced performance or other issues
when using an InnoDB
multithreaded applier to
consume an NDB
MTA binary log.