When using a replication setup involving multiple masters (including circular replication), it is possible that different masters may try to update the same row on the slave with different data. Conflict resolution in MySQL Cluster Replication provides a means of resolving such conflicts by permitting a user-defined resolution column to be used to determine whether or not an update on a given master should be applied on the slave.
Some types of conflict resolution supported by MySQL Cluster
NDB$MAX_DELETE_WIN()) implement this
user-defined column as a “timestamp” column (although
its type cannot be
explained later in this section). These types of conflict
resolution are always applied a row-by-row basis rather than a
transactional basis. The epoch-based conflict resolution functions
NDB$EPOCH_TRANS() compare the order in which
epochs are replicated (and thus these functions are
transactional). Different methods can be used to compare
resolution column values on the slave when conflicts occur, as
explained later in this section; the method used can be set on a
You should also keep in mind that it is the application's responsibility to ensure that the resolution column is correctly populated with relevant values, so that the resolution function can make the appropriate choice when determining whether to apply an update.
Requirements. Preparations for conflict resolution must be made on both the master and the slave. These tasks are described in the following list:
On the master writing the binary logs, you must determine
which columns are sent (all columns or only those that have
been updated). This is done for the MySQL Server as a whole by
applying the mysqld startup option
(described later in this section) or on a per-table basis by
entries in the
(see The ndb_replication system table).
If you are replicating tables with very large columns (such
also be useful for reducing the size of the master and slave
binary logs and avoiding possible replication failures due
See Section 188.8.131.52, “Replication and max_allowed_packet”, for more information about this issue.
On the slave, you must determine which type of conflict
resolution to apply (“latest timestamp wins”,
“same timestamp wins”, “primary
wins”, “primary wins, complete
transaction”, or none). This is done using the
mysql.ndb_replication system table, on a
per-table basis (see
The ndb_replication system table).
When using the functions
NDB$MAX_DELETE_WIN() for timestamp-based
conflict resolution, we often refer to the column used for
determining updates as a “timestamp” column. However,
the data type of this column is never
TIMESTAMP; instead, its data type
“timestamp” column should also be
NDB$EPOCH_TRANS() functions discussed later in
this section work by comparing the relative order of replication
epochs applied on a primary and secondary MySQL Cluster, and do
not make use of timestamps.
Master column control.
We can see update operations in terms of “before”
and “after” images—that is, the states of the
table before and after the update is applied. Normally, when
updating a table with a primary key, the “before”
image is not of great interest; however, when we need to
determine on a per-update basis whether or not to use the
updated values on a replication slave, we need to make sure that
both images are written to the master's binary log. This is
done with the
for mysqld, as described later in this
Whether logging of complete rows or of updated columns only is done is decided when the MySQL server is started, and cannot be changed online; you must either restart mysqld, or start a new mysqld instance with different logging options.
|System Variable Name|
Log complete rows
Log only column data that has been updated—that is, column data whose value has been set, regardless of whether or not this value was actually changed. This is the default behavior.
It is usually sufficient—and more efficient—to log
updated columns only; however, if you need to log full rows, you
can do so by setting
|System Variable Name|
The setting of the MySQL Server's
determines whether logging is performed with or without the
“before” image. Because conflict resolution is done
in the MySQL Server's update handler, it is necessary to
control logging on the master such that updates are updates and
not writes; that is, such that updates are treated as changes in
existing rows rather than the writing of new rows (even though
these replace existing rows). This option is turned on by default;
in other words, updates are treated as writes. (That is, updates
are by default written as
write_row events in
the binary log, rather than as
events.) To turn off the option, start the master
Conflict resolution control.
Conflict resolution is usually enabled on the server where
conflicts can occur. Like logging method selection, it is
enabled by entries in the
The ndb_replication system table.
To enable conflict resolution, it is necessary to create an
ndb_replication table in the
mysql system database on the master, the
slave, or both, depending on the conflict resolution type and
method to be employed. This table is used to control logging and
conflict resolution functions on a per-table basis, and has one
row per table involved in replication.
ndb_replication is created and filled with
control information on the server where the conflict is to be
resolved. In a simple master-slave setup where data can also be
changed locally on the slave this will typically be the slave.
In a more complex master-master (2-way) replication schema this
will usually be all of the masters involved. Each row in
mysql.ndb_replication corresponds to a table
being replicated, and specifies how to log and resolve conflicts
(that is, which conflict resolution function, if any, to use)
for that table. The definition of the
mysql.ndb_replication table is shown here:
CREATE TABLE mysql.ndb_replication ( db VARBINARY(63), table_name VARBINARY(63), server_id INT UNSIGNED, binlog_type INT UNSIGNED, conflict_fn VARBINARY(128), PRIMARY KEY USING HASH (db, table_name, server_id) ) ENGINE=NDB PARTITION BY KEY(db,table_name);
The columns in this table are described in the next few paragraphs.
The name of the database containing the table to be replicated.
You may employ either or both of the wildcards
% as part of the
database name. Matching is similar to what is implemented for
The name of the table to be replicated. The table name may
include either or both of the wildcards
%. Matching is similar to what is implemented
server_id. The unique server ID of the MySQL instance (SQL node) where the table resides.
binlog_type. The type of binary logging to be employed. This is determined as shown in the following table:
|0||Use server default|
|1||Do not log this table in the binary log|
|2||Only updated attributes are logged|
|3||Log full row, even if not updated (MySQL server default behavior)|
|4||(For generating |
|6||Use updated attributes, even if values are unchanged|
|7||Use full row, even if values are unchanged|
conflict_fn. The conflict resolution function to be applied. This function must be specified as one of those shown in the following list:
These functions are described in the next few paragraphs.
If the value of
column_name is the
same on both the master and the slave, then the update is
applied; otherwise, the update is not applied on the slave and
an exception is written to the log. This is illustrated by the
slave_current_column_value) apply_update(); else log_exception();
The column value from the master's “before” image is used by this function.
NDB$MAX(column_name). If the “timestamp” column value for a given row coming from the master is higher than that on the slave, it is applied; otherwise it is not applied on the slave. This is illustrated by the following pseudocode:
This function can be used for “greatest timestamp wins” conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was most recently updated is the version that persists.
The column value from the master's “after” image is used by this function.
This is a variation on
NDB$MAX(). Due to the
fact that no timestamp is available for a delete operation, a
NDB$MAX() is in fact processed
NDB$OLD. However, for some use cases, this
is not optimal. For
the “timestamp” column value for a given row adding
or updating an existing row coming from the master is higher
than that on the slave, it is applied. However, delete
operations are treated as always having the higher value. This
is illustrated in the following pseudocode:
if ( (
operation.type== "delete") apply_update();
This function can be used for “greatest timestamp, delete wins” conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was deleted or (otherwise) most recently updated is the version that persists.
NDB$MAX(), the column value from the
master's “after” image is the value used by
NDB$EPOCH() and NDB$EPOCH_TRANS().
NDB$EPOCH() function tracks the order in
which replicated epochs are applied on a slave MySQL Cluster
relative to changes originating on the slave. This relative
ordering is used to determine whether changes originating on the
slave are concurrent with any changes that originate locally,
and are therefore potentially in conflict.
Most of what follows in the description of
NDB$EPOCH() also applies to
NDB$EPOCH_TRANS(). Any exceptions are noted in
NDB$EPOCH() is asymmetric, operating on one
MySQL Cluster in a two-cluster circular replication configuration
(sometimes referred to as “active-active”
replication). We refer here to cluster on which it operates as the
primary, and the other as the secondary. The slave on the primary
is responsible for detecting and handling conflicts, while the
slave on the secondary is not involved in any conflict detection
When the slave on the primary detects conflicts, it injects events into its own binary log to compensate for these; this ensures that the secondary MySQL Cluster eventually realigns itself with the primary and so keeps the primary and secondary from diverging. This compensation and realignment mechanism requires that the primary MySQL Cluster always wins any conflicts with the secondary—that is, that the primary's changes are always used rather than those from the secondary in event of a conflict. This “primary always wins” rule has the following implications:
Operations that change data, once committed on the primary, are fully persistent and will not be undone or rolled back by conflict detection and resolution.
Data read from the primary is fully consistent. Any changes committed on the Primary (locally or from the slave) will not be reverted later.
Operations that change data on the secondary may later be reverted if the primary determines that they are in conflict.
Individual rows read on the secondary are self-consistent at all times, each row always reflecting either a state committed by the secondary, or one committed by the primary.
Sets of rows read on the secondary may not necessarily be
consistent at a given single point in time. For
NDB$EPOCH_TRANS(), this is a transient
NDB$EPOCH(), it can be a
Assuming a period of sufficient length without any conflicts, all data on the secondary MySQL Cluster (eventually) becomes consistent with the primary's data.
NDB$EPOCH_TRANS() do not require any user
schema modifications, or application changes to provide conflict
detection. However, careful thought must be given to the schema
used, and the access patterns used, to verify that the complete
system behaves within specified limits.
Each of the
NDB$EPOCH_TRANS() functions can take an
optional parameter; this is the number of bits to use to represent
the lower 32 bits of the epoch, and should be set to no less than
For the default values of these configuration parameters (2000 and
100 milliseconds, respectively), this gives a value of 5 bits, so
the default value (6) should be sufficient, unless other values
are used for
both. A value that is too small can result in false positives,
while one that is too large could lead to excessive wasted space
in the database.
NDB$EPOCH_TRANS() insert entries for
conflicting rows into the relevant exceptions tables, provided
that these tables have been defined according to the same
exception table schema rules as described elsewhere in this
section (see NDB$OLD(column_name)).
Note that you need to create any exception table before creating
the table with which it is to be used.
As with the other conflict detection functions discussed in this
NDB$EPOCH_TRANS() are activated by including
relevant entries in the
table (see The ndb_replication system table).
The roles of the primary and secondary MySQL Clusters in this
scenario are fully determined by
mysql.ndb_replication table entries.
Because the conflict detection algorithms employed by
NDB$EPOCH_TRANS() are asymmetric, you must use
different values for the primary slave's and secondary
Prior to MySQL Cluster NDB 7.3.6, conflict between
DELETE operations were handled like those for
UPDATE operations, and within the same epoch
were considered in conflict. In MySQL Cluster NDB 7.3.6 and later,
a conflict between
DELETE operations alone is
not sufficient to trigger a conflict using
NDB$EPOCH_TRANS(), and the relative placement
within epochs does not matter. (Bug "18454499)
NDB$EPOCH() and NDB$EPOCH_TRANS() status variables.
Several status variables can be used to monitor
NDB$EPOCH_TRANS() conflict detection. You can
see how many rows have been found in conflict by
NDB$EPOCH() since this slave was last
restarted from the current value of the
provides the number of rows that have been found directly in
NDB$EPOCH_TRANS(); the number of
rows actually realigned, including those affected due to their
membership in or dependency on the same transactions as other
conflicting rows, is given by
For more information, see Section 184.108.40.206, “MySQL Cluster Status Variables”.
Conflicts are detected using MySQL Cluster epoch boundaries,
with granularity proportional to
(default: 100 milliseconds). The minimum conflict window is
the minimum time during which concurrent updates to the same
data on both clusters always report a conflict. This is always
a nonzero length of time, and is roughly proportional to
2 * (latency + queueing +
TimeBetweenEpochs). This implies that—assuming
the default for
ignoring any latency between clusters (as well as any queuing
delays)—the minimum conflict window size is
approximately 200 milliseconds. This minimum window should be
considered when looking at expected application
Additional storage is required for tables using the
NDB$EPOCH_TRANS() functions; from 1 to 32
bits extra space per row is required, depending on the value
passed to the function.
Conflicts between delete operations may result in divergence between the primary and secondary. When a row is deleted on both clusters concurrently, the conflict can be detected, but is not recorded, since the row is deleted. This means that further conflicts during the propagation of any subsequent realignment operations will not be detected, which can lead to divergence.
Deletes should be externally serialized, or routed to one cluster only. Alternatively, a separate row should be updated transactionally with such deletes and any inserts that follow them, so that conflicts can be tracked across row deletes. This may require changes in applications.
Only two MySQL Clusters in a circular
“active-active” configuration are currently
supported when using
NDB$EPOCH_TRANS() for conflict detection.
NDB$EPOCH_TRANS() extends the
NDB$EPOCH() function. Conflicts are detected
and handled in the same way using the “primary wins
all” rule (see
NDB$EPOCH() and NDB$EPOCH_TRANS()) but with
the extra condition that any other rows updated in the same
transaction in which the conflict occurred are also regarded as
being in conflict. In other words, where
NDB$EPOCH() realigns individual conflicting
rows on the secondary,
realigns conflicting transactions.
In addition, any transactions which are detectably dependent on a conflicting transaction are also regarded as being in conflict, these dependencies being determined by the contents of the secondary cluster's binary log. Since the binary log contains only data modification operations (inserts, updates, and deletes), only overlapping data modifications are used to determine dependencies between transactions.
NDB$EPOCH_TRANS() is subject to the same
conditions and limitations as
in addition requires that Version 2 binary log row events are used
to 0), which adds a storage overhead of 2 bytes per event in the
binary log. In addition, all transaction IDs must be recorded in
the secondary's binary log
which adds a further variable overhead (up to 13 bytes per row).
NULL. Indicates that conflict resolution is not to be used for the corresponding table.
A server status variable
Ndb_conflict_fn_max provides a
count of the number of times that a row was not applied on the
current SQL node due to “greatest timestamp wins”
conflict resolution since the last time that
mysqld was started.
The number of times that a row was not applied as the result of
“same timestamp wins” conflict resolution on a given
mysqld since the last time it was restarted is
given by the global status variable
Ndb_conflict_fn_old. In addition
Ndb_conflict_fn_old, the primary
key of the row that was not used is inserted into an
exceptions table, as explained later in
To use the
NDB$OLD() conflict resolution
function, it is also necessary to create an exceptions table
corresponding to each
NDB table for
which this type of conflict resolution is to be employed. This
is also true when using
NDB$EPOCH_TRANS(). The name of this table is
that of the table for which conflict resolution is to be
applied, with the string
$EX appended. (For
example, if the name of the original table is
mytable, the name of the corresponding
exception table name should be
This table is created as follows:
original_table$EX ( server_id INT UNSIGNED, master_server_id INT UNSIGNED, master_epoch BIGINT UNSIGNED, count INT UNSIGNED,
additional_columns,] PRIMARY KEY(server_id, master_server_id, master_epoch, count) ) ENGINE=NDB;
The first four columns are required. Following these columns, the columns making up the original table's primary key should be copied in the order in which they are used to define the primary key of the original table.
The names of the first four columns and the columns matching the
original table's primary key columns are not critical; however,
we suggest for reasons of clarity and consistency, that you use
the names shown here for the
columns, and that you use the same names as in the original
table for the columns matching those in the original table's
The data types for the columns duplicating the primary key columns of the original table should be the same as for (or larger than) the original columns.
Additional columns may optionally be defined following these
columns, but not before any of them; any such extra columns cannot
NOT NULL. The exception table's primary
key must be defined as shown. The exception table must use the
NDB storage engine. An example that
NDB$OLD() with an exceptions table is
shown later in this section.
mysql.ndb_replication table is read when
a data table is set up for replication, so the row corresponding
to a table to be replicated must be inserted into
before the table to be replicated is
Examples. The following examples assume that you have already a working MySQL Cluster replication setup, as described in Section 18.6.5, “Preparing the MySQL Cluster for Replication”, and Section 18.6.6, “Starting MySQL Cluster Replication (Single Replication Channel)”.
Suppose you wish to enable “greatest timestamp
wins” conflict resolution on table
test.t1, using column
mycol as the “timestamp”.
This can be done using the following steps:
On the master, perform this
INSERT INTO mysql.ndb_replication VALUES ('test', 't1', 0, NULL, 'NDB$MAX(mycol)');
Inserting a 0 into the
server_id indicates that
all SQL nodes accessing this table should use conflict
resolution. If you want to use conflict resolution on a
specific mysqld only, use the actual
NULL into the
binlog_type column has the same effect
as inserting 0 (
server default is used.
CREATE TABLE test.t1 (
columnsmycol INT UNSIGNED,
Now, when updates are done on this table, conflict
resolution is applied, and the version of the row having
the greatest value for
mycol is written
to the slave.
binlog_type options—such as
NBT_UPDATED_ONLY_USE_UPDATE should be
used to control logging on the master using the
ndb_replication table rather than by
using command-line options.
NDB table such as
the one defined here is being replicated, and you wish to
enable “same timestamp wins” conflict
resolution for updates to this table:
CREATE TABLE test.t2 ( a INT UNSIGNED NOT NULL, b CHAR(25) NOT NULL,
columns, mycol INT UNSIGNED NOT NULL,
columns, PRIMARY KEY pk (a, b) ) ENGINE=NDB;
The following steps are required, in the order shown:
First—and prior to creating
test.t2—you must insert a row
table, as shown here:
INSERT INTO mysql.ndb_replication VALUES ('test', 't2', 0, NULL, 'NDB$OLD(mycol)');
Possible values for the
column are shown earlier in this section. The value
'NDB$OLD(mycol)' should be inserted
Create an appropriate exceptions table for
test.t2. The table creation statement
shown here includes all required columns; any additional
columns must be declared following these columns, and
before the definition of the table's primary key.
CREATE TABLE test.t2$EX ( server_id SMALLINT UNSIGNED, master_server_id INT UNSIGNED, master_epoch BIGINT UNSIGNED, count BIGINT UNSIGNED, a INT UNSIGNED NOT NULL, b CHAR(25) NOT NULL, [
additional_columns,] PRIMARY KEY(server_id, master_server_id, master_epoch, count) ) ENGINE=NDB;
Create the table
test.t2 as shown
These steps must be followed for every table for which you
wish to perform conflict resolution using
NDB$OLD(). For each such table, there must
be a corresponding row in
mysql.ndb_replication, and there must be an
exceptions table in the same database as the table being