When using a replication setup involving multiple sources (including circular replication), it is possible that different sources may try to update the same row on the replica with different data. Conflict resolution in NDB Cluster Replication provides a means of resolving such conflicts by permitting a user-defined resolution column to be used to determine whether or not an update on a given source should be applied on the replica.
Some types of conflict resolution supported by NDB Cluster
NDB$MAX_DELETE_WIN()) implement this
user-defined column as a “timestamp” column (although
its type cannot be
explained later in this section). These types of conflict
resolution are always applied a row-by-row basis rather than a
transactional basis. The epoch-based conflict resolution functions
NDB$EPOCH_TRANS() compare the order in which
epochs are replicated (and thus these functions are
transactional). Different methods can be used to compare
resolution column values on the replica when conflicts occur, as
explained later in this section; the method used can be set on a
You should also keep in mind that it is the application's responsibility to ensure that the resolution column is correctly populated with relevant values, so that the resolution function can make the appropriate choice when determining whether to apply an update.
Preparations for conflict resolution must be made on both the source and the replica. These tasks are described in the following list:
On the source writing the binary logs, you must determine which columns are sent (all columns or only those that have been updated). This is done for the MySQL Server as a whole by applying the mysqld startup option
--ndb-log-updated-only(described later in this section) or on a per-table basis by entries in the
mysql.ndb_replicationtable (see ndb_replication Table).Note
If you are replicating tables with very large columns (such as
--ndb-log-updated-onlycan also be useful for reducing the size of the binary logs and avoiding possible replication failures due to exceeding
See Section 18.104.22.168, “Replication and max_allowed_packet”, for more information about this issue.
On the replica, you must determine which type of conflict resolution to apply (“latest timestamp wins”, “same timestamp wins”, “primary wins”, “primary wins, complete transaction”, or none). This is done using the
mysql.ndb_replicationsystem table, on a per-table basis (see ndb_replication Table).
NDB 7.4.1 and later also supports read conflict detection, that is, detecting conflicts between reads of a given row in one cluster and updates or deletes of the same row in another cluster. This requires exclusive read locks obtained by setting
ndb_log_exclusive_readsequal to 1 on the replica. All rows read by a conflicting read are logged in the exceptions table. For more information, see Read conflict detection and resolution.
When using the functions
NDB$MAX_DELETE_WIN() for timestamp-based
conflict resolution, we often refer to the column used for
determining updates as a “timestamp” column.
However, the data type of this column is never
TIMESTAMP; instead, its data type
“timestamp” column should also be
NDB$EPOCH_TRANS() functions discussed later
in this section work by comparing the relative order of
replication epochs applied on a primary and secondary NDB
Cluster, and do not make use of timestamps.
We can see update operations in terms of “before”
and “after” images—that is, the states of the
table before and after the update is applied. Normally, when
updating a table with a primary key, the “before”
image is not of great interest; however, when we need to
determine on a per-update basis whether or not to use the
updated values on a replica, we need to make sure that both
images are written to the source's binary log. This is done
for mysqld, as described later in this
Whether logging of complete rows or of updated columns only is done is decided when the MySQL server is started, and cannot be changed online; you must either restart mysqld, or start a new mysqld instance with different logging options.
Log complete rows (option set to
Log only column data that has been updated—that is, column data whose value has been set, regardless of whether or not this value was actually changed. This is the default behavior (option set to
It is usually sufficient—and more efficient—to log
updated columns only; however, if you need to log full rows, you
can do so by setting
Logging Changed Data as Updates.
The setting of the MySQL Server's
option determines whether logging is performed with or without
the “before” image. Because conflict resolution
is done in the MySQL Server's update handler, it is
necessary to control logging performed by the replication
source such that updates are updates and not writes; that is,
such that updates are treated as changes in existing rows
rather than the writing of new rows, even though these replace
existing rows. This option is turned on by default; in other
words, updates are treated as writes. That is, updates are by
default written as
write_row events in the
binary log, rather than as
To disable the option, start the source
--ndb-log-update-as-write=OFF. You must do this
when replicating from NDB tables to tables using a different
storage engine; see
Replication from NDB to other storage engines, and
Replication from NDB to a nontransactional storage engine,
for more information.
NBT_UPDATED_FULL_MINIMAL can be used with
NDB$EPOCH_TRANS(), because these do not
require “before” values of columns which are not
primary keys. Conflict resolution algorithms requiring the old
values, such as
NDB$OLD(), do not work correctly with these
The next few paragraphs provide detailed information about the functions which can be used for conflict detection and resolution with NDB Replication.
If the value of
column_name is the
same on both the source and the replica, then the update is
applied; otherwise, the update is not applied on the replica
and an exception is written to the log. This is illustrated by
the following pseudocode:
if (source_old_column_value == replica_current_column_value) apply_update(); else log_exception();
The column value from the source's “before” image is used by this function.
NDB$MAX(column_name). If the “timestamp” column value for a given row coming from the source is higher than that on the replica, it is applied; otherwise it is not applied on the replica. This is illustrated by the following pseudocode:
if (source_new_column_value > replica_current_column_value) apply_update();
This function can be used for “greatest timestamp wins” conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was most recently updated is the version that persists.
The column value from the sources's “after” image is used by this function.
This is a variation on
NDB$MAX(). Due to
the fact that no timestamp is available for a delete
operation, a delete using
NDB$MAX() is in
fact processed as
NDB$OLD, but for some use
cases, this is not optimal. For
NDB$MAX_DELETE_WIN(), if the
“timestamp” column value for a given row adding
or updating an existing row coming from the source is higher
than that on the replica, it is applied. However, delete
operations are treated as always having the higher value. This
is illustrated by the following pseudocode:
if ( (source_new_column_value > replica_current_column_value) || operation.type == "delete") apply_update();
This function can be used for “greatest timestamp, delete wins” conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was deleted or (otherwise) most recently updated is the version that persists.
NDB$MAX(), the column value from
the source's “after” image is the value used
by this function.
NDB$EPOCH() function tracks the order
in which replicated epochs are applied on a replica cluster
relative to changes originating on the replica. This relative
ordering is used to determine whether changes originating on
the replica are concurrent with any changes that originate
locally, and are therefore potentially in conflict.
Most of what follows in the description of
NDB$EPOCH() also applies to
NDB$EPOCH_TRANS(). Any exceptions are noted
in the text.
NDB$EPOCH() is asymmetric, operating on one
NDB Cluster in a bidirectional replication configuration
(sometimes referred to as “active-active”
replication). We refer here to cluster on which it operates as
the primary, and the other as the secondary. The replica on the
primary is responsible for detecting and handling conflicts,
while the replica on the secondary is not involved in any
conflict detection or handling.
When the replica on the primary detects conflicts, it injects events into its own binary log to compensate for these; this ensures that the secondary NDB Cluster eventually realigns itself with the primary and so keeps the primary and secondary from diverging. This compensation and realignment mechanism requires that the primary NDB Cluster always wins any conflicts with the secondary—that is, that the primary's changes are always used rather than those from the secondary in event of a conflict. This “primary always wins” rule has the following implications:
Operations that change data, once committed on the primary, are fully persistent and are not undone or rolled back by conflict detection and resolution.
Data read from the primary is fully consistent. Any changes committed on the Primary (locally or from the replica) are not reverted later.
Operations that change data on the secondary may later be reverted if the primary determines that they are in conflict.
Individual rows read on the secondary are self-consistent at all times, each row always reflecting either a state committed by the secondary, or one committed by the primary.
Sets of rows read on the secondary may not necessarily be consistent at a given single point in time. For
NDB$EPOCH_TRANS(), this is a transient state; for
NDB$EPOCH(), it can be a persistent state.
Assuming a period of sufficient length without any conflicts, all data on the secondary NDB Cluster (eventually) becomes consistent with the primary's data.
NDB$EPOCH_TRANS() do not require any user
schema modifications, or application changes to provide conflict
detection. However, careful thought must be given to the schema
used, and the access patterns used, to verify that the complete
system behaves within specified limits.
Each of the
NDB$EPOCH_TRANS() functions can take an
optional parameter; this is the number of bits to use to
represent the lower 32 bits of the epoch, and should be set to
no less than the value calculated as shown here:
CEIL( LOG2( TimeBetweenGlobalCheckpoints / TimeBetweenEpochs ), 1)
For the default values of these configuration parameters (2000
and 100 milliseconds, respectively), this gives a value of 5
bits, so the default value (6) should be sufficient, unless
other values are used for
both. A value that is too small can result in false positives,
while one that is too large could lead to excessive wasted space
in the database.
NDB$EPOCH_TRANS() insert entries for
conflicting rows into the relevant exceptions tables, provided
that these tables have been defined according to the same
exceptions table schema rules as described elsewhere in this
NDB$OLD(column_name)). You must
create any exceptions table before creating the data table with
which it is to be used.
As with the other conflict detection functions discussed in this
NDB$EPOCH_TRANS() are activated by including
relevant entries in the
table (see ndb_replication Table).
The roles of the primary and secondary NDB Clusters in this
scenario are fully determined by
mysql.ndb_replication table entries.
Because the conflict detection algorithms employed by
NDB$EPOCH_TRANS() are asymmetric, you must
use different values for the
entries of the primary and secondary replicas.
Prior to NDB 7.3.6, conflicts between
operations were handled like those for
operations, and within the same epoch were considered in
conflict. In NDB 7.3.6 and later, a conflict between
DELETE operations alone is not sufficient to
trigger a conflict using
NDB$EPOCH_TRANS(), and the relative placement
within epochs does not matter. (Bug #18459944)
For more information, see Limitations on NDB$EPOCH().
NDB$EPOCH_TRANS() extends the
NDB$EPOCH() function. Conflicts are
detected and handled in the same way using the “primary
wins all” rule (see
with the extra condition that any other rows updated in the
same transaction in which the conflict occurred are also
regarded as being in conflict. In other words, where
NDB$EPOCH() realigns individual conflicting
rows on the secondary,
realigns conflicting transactions.
In addition, any transactions which are detectably dependent on a conflicting transaction are also regarded as being in conflict, these dependencies being determined by the contents of the secondary cluster's binary log. Since the binary log contains only data modification operations (inserts, updates, and deletes), only overlapping data modifications are used to determine dependencies between transactions.
NDB$EPOCH_TRANS() is subject to the same
conditions and limitations as
and in addition requires that Version 2 binary log row events
equal to 0), which adds a storage overhead of 2 bytes per event
in the binary log. In addition, all transaction IDs must be
recorded in the secondary's binary log
option), which adds a further variable overhead (up to 13 bytes
NDB$EPOCH2() function, added in NDB
7.4.2, is similar to
NDB$EPOCH2() provides for
delete-delete handling with a bidirectional replication
topology. In this scenario, primary and secondary roles are
assigned to the two sources by setting the
system variable to the appropriate value on each source
(usually one each of
SECONDARY). When this is done,
modifications made by the secondary are reflected by the
primary back to the secondary which then conditionally applies
In NDB 7.4.2 and later,
Conflicts are detected and handled in the same way, and
assigning primary and secondary roles to the replicating
clusters, but with the extra condition that any other rows
updated in the same transaction in which the conflict occurred
are also regarded as being in conflict. That is,
NDB$EPOCH2() realigns individual
conflicting rows on the secondary, while
NDB$EPOCH_TRANS() realigns conflicting
NDB$EPOCH_TRANS() use metadata that is
specified per row, per last modified epoch, to determine on the
primary whether an incoming replicated row change from the
secondary is concurrent with a locally committed change;
concurrent changes are regarded as conflicting, with subesequent
exceptions table updates and realignment of the secondary. A
problem arises when a row is deleted on the primary so there is
no longer any last-modified epoch available to determine whether
any replicated operations conflict, which means that conflicting
delete operationss are not detected. This can result in
divergence, an example being a delete on one cluster which is
concurrent with a delete and insert on the other; this why
delete operations can be routed to only one cluster when using
NDB$EPOCH2() bypasses the issue just
described—storing information about deleted rows on the
PRIMARY—by ignoring any delete-delete conflict, and by
avoiding any potential resultant divergence as well. This is
accomplished by reflecting any operation successfully applied on
and replicated from the secondary back to the secondary. On its
return to the secondary, it can be used to reapply an operation
on the secondary which was deleted by an operation originating
from the primary.
NDB$EPOCH2(), you should keep in
mind that the secondary applies the delete from the primary,
removing the new row until it is restored by a reflected
operation. In theory, the subsequent insert or update on the
secondary conflicts with the delete from the primary, but in
this case, we choose to ignore this and allow the secondary to
“win”, in the interest of preventing divergence
between the clusters. In other words, after a delete, the
primary does not detect conflicts, and instead adopts the
secondary's following changes immediately. Because of this,
the secondary's state can revisit multiple previous
committed states as it progresses to a final (stable) state, and
some of these may be visible.
You should also be aware that reflecting all operations from the secondary back to the primary increases the size of the primary's logbinary log, as well as demands on bandwidth, CPU usage, and disk I/O.
Application of reflected operations on the secondary depends on
the state of the target row on the secondary. Whether or not
reflected changes are applied on the secondary can be tracked by
status variables, both added in NDB 7.4.2. The number of changes
applied is simply the difference between these two values (note
is always greater than or equal to
Events are applied if and only if both of the following conditions are true:
The existence of the row—that is, whether or not it exists—is in accordance with the type of event. For delete and update operations, the row must already exist. For insert operations, the row must not exist.
The row was last modified by the primary. It is possible that the modification was accomplished through the execution of a reflected operation.
If both of these conditions are not met, the reflected operation is discarded by the secondary.
Conflicts are detected using NDB Cluster epoch boundaries, with granularity proportional to
TimeBetweenEpochs(default: 100 milliseconds). The minimum conflict window is the minimum time during which concurrent updates to the same data on both clusters always report a conflict. This is always a nonzero length of time, and is roughly proportional to
2 * (latency + queueing + TimeBetweenEpochs). This implies that—assuming the default for
TimeBetweenEpochsand ignoring any latency between clusters (as well as any queuing delays)—the minimum conflict window size is approximately 200 milliseconds. This minimum window should be considered when looking at expected application “race” patterns.
Additional storage is required for tables using the
NDB$EPOCH_TRANS()functions; from 1 to 32 bits extra space per row is required, depending on the value passed to the function.
Conflicts between delete operations may result in divergence between the primary and secondary. When a row is deleted on both clusters concurrently, the conflict can be detected, but is not recorded, since the row is deleted. This means that further conflicts during the propagation of any subsequent realignment operations are not detected, which can lead to divergence.
Deletes should be externally serialized, or routed to one cluster only. Alternatively, a separate row should be updated transactionally with such deletes and any inserts that follow them, so that conflicts can be tracked across row deletes. This may require changes in applications.
Only two NDB Clusters in a birectional “active-active” configuration are currently supported when using
NDB$EPOCH_TRANS()for conflict detection.
To use the
NDB$OLD() conflict resolution
function, it is also necessary to create an exceptions table
corresponding to each
NDB table for
which this type of conflict resolution is to be employed. This
is also true when using
NDB$EPOCH_TRANS(). The name of this table is
that of the table for which conflict resolution is to be
applied, with the string
$EX appended. (For
example, if the name of the original table is
mytable, the name of the corresponding
exceptions table name should be
Prior to NDB 7.4.1, the syntax for creating the exceptions table
is as shown here:
CREATE TABLE original_table$EX ( server_id INT UNSIGNED, source_server_id INT UNSIGNED, source_epoch BIGINT UNSIGNED, count INT UNSIGNED, original_table_pk_columns, [additional_columns,] PRIMARY KEY(server_id, source_server_id, source_epoch, count) ) ENGINE=NDB;
NDB 7.4.1 and later supports an extended exceptions table definition that includes optional columns providing information about an exception's type, cause, and originating transaction. In these versions, the syntax for creating the exceptions table is as shown here:
CREATE TABLE original_table$EX ( [NDB$]server_id INT UNSIGNED, [NDB$]source_server_id INT UNSIGNED, [NDB$]source_epoch BIGINT UNSIGNED, [NDB$]count INT UNSIGNED, [NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW', 'READ_ROW') NOT NULL,] [NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,] [NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,] original_table_pk_columns, [orig_table_column|orig_table_column$OLD|orig_table_column$NEW,] [additional_columns,] PRIMARY KEY([NDB$]server_id, [NDB$]source_server_id, [NDB$]source_epoch, [NDB$]count) ) ENGINE=NDB;
The first four columns are required. The names of the first four
columns and the columns matching the original table's
primary key columns are not critical; however, we suggest for
reasons of clarity and consistency, that you use the names shown
here for the
columns, and that you use the same names as in the original
table for the columns matching those in the original
table's primary key.
Starting with NDB 7.4.1, if the exceptions table uses one or
more of the optional columns
NDB$ORIG_TRANSID discussed later in this
section, then each of the required columns must also be named
using the prefix
NDB$. If desired, you can
NDB$ prefix to name the required
columns even if you do not define any optional columns, but in
this case, all four of the required columns must be named using
Following these columns, the columns making up the original table's primary key should be copied in the order in which they are used to define the primary key of the original table. The data types for the columns duplicating the primary key columns of the original table should be the same as (or larger than) those of the original columns. In NDB Cluster 7.3 and earlier, the exceptions table's primary key must be reproduced column for column. Beginning with NDB 7.4.1, a subset of the primary key columns may be used instead.
Regardless of the NDB Cluster version employed, the exceptions
table must use the
engine. (An example that uses
an exceptions table is shown later in this section.)
Additional columns may optionally be defined following the
copied primary key columns, but not before any of them; any such
extra columns cannot be
NOT NULL. In NDB
7.4.1 and later, support is provided for three additional,
predefined optional columns
NDB$ORIG_TRANSID, which are described in the
next few paragraphs.
NDB$OP_TYPE ENUM('WRITE_ROW', 'UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW', 'READ_ROW') NOT NULL
DELETE_ROW operation types represent
operations are operations generated by conflict resolution in
compensating transactions sent back to the originating cluster
from the cluster that detected the conflict.
READ_ROW operations are user-initiated read
tracking operations defined with exclusive row locks.
NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL
ROW_DOES_NOT_EXIST can be reported as the
ROW_ALREADY_EXISTS can be reported for
DATA_IN_CONFLICT is reported when a row-based
conflict function detects a conflict;
TRANS_IN_CONFLICT is reported when a
transactional conflict function rejects all of the operations
belonging to a complete transaction.
NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL
NDB$ORIG_TRANSID is a 64-bit value generated
NDB. This value can be used to correlate
multiple exceptions table entries belonging to the same
conflicting transaction from the same or different exceptions
In NDB 7.4.1 and later, additional reference columns which are
not part of the original table's primary key can be named
references old values in update and delete operations—that
is, operations containing
used to reference new values in insert and update
operations—in other words, operations using
UPDATE_ROW events, or both types of events.
Where a conflicting operation does not supply a value for a
given reference column that is not a primary key, the exceptions
table row contains either
NULL, or a defined
default value for that column.
mysql.ndb_replication table is read
when a data table is set up for replication, so the row
corresponding to a table to be replicated must be inserted
before the table to be replicated is
Several status variables can be used to monitor conflict
detection. You can see how many rows have been found in conflict
NDB$EPOCH() since this replica was last
restarted from the current value of the
provides the number of rows that have been found directly in
both added in NDB 7.4.2, show the number of rows found in
NDB$EPOCH2_TRANS(), respectively. The number
of rows actually realigned, including those affected due to
their membership in or dependency on the same transactions as
other conflicting rows, is given by
Another server status variable
Ndb_conflict_fn_max provides a
count of the number of times that a row was not applied on the
current SQL node due to “greatest timestamp wins”
conflict resolution since the last time that
mysqld was started.
The number of times that a row was not applied as the result of
“same timestamp wins” conflict resolution on a
given mysqld since the last time it was
restarted is given by the global status variable
addition to incrementing
primary key of the row that was not used is inserted into an
exceptions table, as
explained elsewhere in this section.
The following examples assume that you have already a working NDB Cluster replication setup, as described in Section 18.6.5, “Preparing the NDB Cluster for Replication”, and Section 18.6.6, “Starting NDB Cluster Replication (Single Replication Channel)”.
On the source, perform this
INSERT INTO mysql.ndb_replication VALUES ('test', 't1', 0, NULL, 'NDB$MAX(mycol)');
Inserting a 0 into the
server_idindicates that all SQL nodes accessing this table should use conflict resolution. If you want to use conflict resolution on a specific mysqld only, use the actual server ID.
binlog_typecolumn has the same effect as inserting 0 (
NBT_DEFAULT); the server default is used.
CREATE TABLE test.t1 ( columns mycol INT UNSIGNED, columns ) ENGINE=NDB;
Now, when updates are performed on this table, conflict resolution is applied, and the version of the row having the greatest value for
mycolis written to the replica.
binlog_type options—such as
NBT_UPDATED_ONLY_USE_UPDATE should be used
to control logging on the source using the
ndb_replication table rather than by using
NDB table such as the
one defined here is being replicated, and you wish to enable
“same timestamp wins” conflict resolution for
updates to this table:
CREATE TABLE test.t2 ( a INT UNSIGNED NOT NULL, b CHAR(25) NOT NULL, columns, mycol INT UNSIGNED NOT NULL, columns, PRIMARY KEY pk (a, b) ) ENGINE=NDB;
The following steps are required, in the order shown:
First—and prior to creating
test.t2—you must insert a row into the
mysql.ndb_replicationtable, as shown here:
INSERT INTO mysql.ndb_replication VALUES ('test', 't2', 0, NULL, 'NDB$OLD(mycol)');
Possible values for the
binlog_typecolumn are shown earlier in this section. The value
'NDB$OLD(mycol)'should be inserted into the
Create an appropriate exceptions table for
test.t2. The table creation statement shown here includes all required columns; any additional columns must be declared following these columns, and before the definition of the table's primary key.
CREATE TABLE test.t2$EX ( server_id INT UNSIGNED, source_server_id INT UNSIGNED, source_epoch BIGINT UNSIGNED, count INT UNSIGNED, a INT UNSIGNED NOT NULL, b CHAR(25) NOT NULL, [additional_columns,] PRIMARY KEY(server_id, source_server_id, source_epoch, count) ) ENGINE=NDB;
In NDB 7.4.1 and later, we can include additional columns for information about the type, cause, and originating transaction ID for a given conflict. We are also not required to supply matching columns for all primary key columns in the original table. In these versions, you can create the exceptions table like this:
CREATE TABLE test.t2$EX ( NDB$server_id INT UNSIGNED, NDB$source_server_id INT UNSIGNED, NDB$source_epoch BIGINT UNSIGNED, NDB$count INT UNSIGNED, a INT UNSIGNED NOT NULL, NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW', 'READ_ROW') NOT NULL, NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL, NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL, [additional_columns,] PRIMARY KEY(NDB$server_id, NDB$source_server_id, NDB$source_epoch, NDB$count) ) ENGINE=NDB;Note
NDB$prefix is required for the four required columns since we included at least one of the columns
NDB$ORIG_TRANSIDin the table definition.
Create the table
test.t2as shown previously.
These steps must be followed for every table for which you wish
to perform conflict resolution using
NDB$OLD(). For each such table, there must be
a corresponding row in
and there must be an exceptions table in the same database as
the table being replicated.
Read conflict detection and resolution.
NDB 7.4.1 and later supports tracking of read operations,
which makes it possible in circular replication setups to
manage conflicts between reads of a given row in one cluster
and updates or deletes of the same row in another. This
department tables to model a scenario in
which an employee is moved from one department to another on
the source cluster (which we refer to hereafter as cluster
A) while the replica cluster (hereafter
B) updates the employee count of the
employee's former department in an interleaved
The data tables have been created using the following SQL statements:
# Employee table CREATE TABLE employee ( id INT PRIMARY KEY, name VARCHAR(2000), dept INT NOT NULL ) ENGINE=NDB; # Department table CREATE TABLE department ( id INT PRIMARY KEY, name VARCHAR(2000), members INT ) ENGINE=NDB;
The contents of the two tables include the rows shown in the
(partial) output of the following
mysql> SELECT id, name, dept FROM employee; +---------------+------+ | id | name | dept | +------+--------+------+ ... | 998 | Mike | 3 | | 999 | Joe | 3 | | 1000 | Mary | 3 | ... +------+--------+------+ mysql> SELECT id, name, members FROM department; +-----+-------------+---------+ | id | name | members | +-----+-------------+---------+ ... | 3 | Old project | 24 | ... +-----+-------------+---------+
We assume that we are already using an exceptions table that includes the four required columns (and these are used for this table's primary key), the optional columns for operation type and cause, and the original table's primary key column, created using the SQL statement shown here:
CREATE TABLE employee$EX ( NDB$server_id INT UNSIGNED, NDB$source_server_id INT UNSIGNED, NDB$source_epoch BIGINT UNSIGNED, NDB$count INT UNSIGNED, NDB$OP_TYPE ENUM( 'WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 'REFRESH_ROW','READ_ROW') NOT NULL, NDB$CFT_CAUSE ENUM( 'ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL, id INT NOT NULL, PRIMARY KEY(NDB$server_id, NDB$source_server_id, NDB$source_epoch, NDB$count) ) ENGINE=NDB;
Suppose there occur the two simultaneous transactions on the two clusters. On cluster A, we create a new department, then move employee number 999 into that department, using the following SQL statements:
BEGIN; INSERT INTO department VALUES (4, "New project", 1); UPDATE employee SET dept = 4 WHERE id = 999; COMMIT;
At the same time, on cluster B, another
transaction reads from
employee, as shown
BEGIN; SELECT name FROM employee WHERE id = 999; UPDATE department SET members = members - 1 WHERE id = 3; commit;
The conflicting transactions are not normally detected by the
conflict resolution mechanism, since the conflict is between a
SELECT) and an update operation.
Beginning with NDB 7.4.1, you can circumvent this issue by
= 1 on the replica cluster. Acquiring
exclusive read locks in this way causes any rows read on the
source to be flagged as needing conflict resolution on the
replica cluster. If we enable exclusive reads in this way prior
to the logging of these transactions, the read on cluster
B is tracked and sent to cluster
A for resolution; the conflict on the
employee row is subsequently detected and the transaction on
cluster B is aborted.
The conflict is registered in the exceptions table (on cluster
A) as a
Conflict Resolution Exceptions Table, for a
description of operation types), as shown here:
mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX; +-------+-------------+-------------------+ | id | NDB$OP_TYPE | NDB$CFT_CAUSE | +-------+-------------+-------------------+ ... | 999 | READ_ROW | TRANS_IN_CONFLICT | +-------+-------------+-------------------+
Any existing rows found in the read operation are flagged. This means that multiple rows resulting from the same conflict may be logged in the exception table, as shown by examining the effects a conflict between an update on cluster A and a read of multiple rows on cluster B from the same table in simultaneous transactions. The transaction executed on cluster A is shown here:
BEGIN; INSERT INTO department VALUES (4, "New project", 0); UPDATE employee SET dept = 4 WHERE dept = 3; SELECT COUNT(*) INTO @count FROM employee WHERE dept = 4; UPDATE department SET members = @count WHERE id = 4; COMMIT;
Concurrently a transaction containing the statements shown here runs on cluster B:
SET ndb_log_exclusive_reads = 1; # Must be set if not already enabled ... BEGIN; SELECT COUNT(*) INTO @count FROM employee WHERE dept = 3 FOR UPDATE; UPDATE department SET members = @count WHERE id = 3; COMMIT;
In this case, all three rows matching the
WHERE condition in the second
SELECT are read, and are
thus flagged in the exceptions table, as shown here:
mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX; +-------+-------------+-------------------+ | id | NDB$OP_TYPE | NDB$CFT_CAUSE | +-------+-------------+-------------------+ ... | 998 | READ_ROW | TRANS_IN_CONFLICT | | 999 | READ_ROW | TRANS_IN_CONFLICT | | 1000 | READ_ROW | TRANS_IN_CONFLICT | ... +-------+-------------+-------------------+
Read tracking is performed on the basis of existing rows only. A read based on a given condition track conflicts only of any rows that are found and not of any rows that are inserted in an interleaved transaction. This is similar to how exclusive row locking is performed in a single instance of NDB Cluster.