If replication stops due to an issue with an event in a replicated transaction, you can resume replication by skipping the failed transaction on the slave. Before skipping a transaction, ensure that the replication slave's I/O thread is stopped as well as the SQL thread.
First you need to identify the replicated event that caused the
error. Details of the error and the last successfully applied
transaction are recorded in the Performance Schema table
You can use mysqlbinlog to retrieve and
display the events that were logged around the time of the
error. For instructions to do this, see
Section 7.5, “Point-in-Time (Incremental) Recovery”. Alternatively, you can
SHOW RELAYLOG EVENTS on the
SHOW BINLOG EVENTS on
Before skipping the transaction and restarting the slave, check these points:
Is the transaction that stopped replication from an unknown or untrusted source? If so, investigate the cause in case there are any security considerations that indicate the slave should not be restarted.
Does the transaction that stopped replication need to be applied on the slave? If so, either make the appropriate corrections and reapply the transaction, or manually reconcile the data on the slave.
Did the transaction that stopped replication need to be applied on the master? If not, undo the transaction manually on the server where it originally took place.
To skip the transaction, choose one of the following methods as appropriate:
When GTIDs are in use (
ON), see Section 220.127.116.11.1, “Skipping Transactions With GTIDs” .
When GTIDs are not in use or are being phased in (
ON_PERMISSIVE), see Section 18.104.22.168.2, “Skipping Transactions Without GTIDs”.
To restart replication after skipping the transaction, issue
START SLAVE, with the
FOR CHANNEL clause if the slave is a
multi-source replication slave.
When GTIDs are in use
ON), the GTID for a committed transaction
is persisted on the slave even if the content of the
transaction is filtered out. This feature prevents a slave
from retrieving previously filtered transactions when it
reconnects to the master using GTID auto-positioning. It can
also be used to skip a transaction on the slave, by committing
an empty transaction in place of the failing transaction.
If the failing transaction generated an error in a worker
thread, you can obtain its GTID directly from the
LAST_SEEN_TRANSACTION field in the
Performance Schema table
To see what the transaction is, issue
SHOW RELAYLOG EVENTS on the
SHOW BINLOG EVENTS on
the master, and search the output for a transaction preceded
by that GTID.
When you have assessed the failing transaction for any other appropriate actions as described previously (such as security considerations), to skip it, commit an empty transaction on the slave that has the same GTID as the failing transaction. For example:
SET GTID_NEXT='aaa-bbb-ccc-ddd:N'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC';
The presence of this empty transaction on the slave means that
when you issue a
statement to restart replication, the slave uses the auto-skip
function to ignore the failing transaction, because it sees a
transaction with that GTID has already been applied. If the
slave is a multi-source replication slave, you do not need to
specify the channel name when you commit the empty
transaction, but you do need to specify the channel name when
Note that if binary logging is in use on this slave, the empty transaction will enter the replication stream if the slave becomes a master or primary in the future. If you need to avoid this possibility, consider flushing and purging the slave's binary logs, as in this example:
FLUSH LOGS; PURGE BINARY LOGS TO 'binlog.000146';
The GTID of the empty transaction is persisted, but the transaction itself is removed by purging the binary log files.
To skip failing transactions when GTIDs are not in use or are
being phased in (
ON_PERMISSIVE), you can skip a specified
number of events by issuing a
sql_slave_skip_counter statement. Alternatively, you
can skip past an event or events by issuing a
CHANGE MASTER TO statement to
move the master log position forward.
When you use these methods, it is important to understand that you are not necessarily skipping a complete transaction, as is always the case with the GTID-based method described previously. These non-GTID-based methods are not aware of transactions as such, but instead operate on events. The binary log is organized as a sequence of groups known as event groups, and each event group consists of a sequence of events.
For transactional tables, an event group corresponds to a transaction.
For nontransactional tables, an event group corresponds to a single SQL statement.
A single transaction can contain changes to both transactional and nontransactional tables.
When you use a
sql_slave_skip_counter statement to skip events and
the resulting position is in the middle of an event group, the
slave continues to skip events until it reaches the end of the
group. Execution then starts with the next event group. The
CHANGE MASTER TO statement does
not have this function, so you must be careful to identify the
correct location to restart replication at the beginning of an
event group. However, using
TO means you do not have to count the events that
need to be skipped, as you do with a
sql_slave_skip_counter, and instead you can just
specify the location to restart.
When you have assessed the failing transaction for any other
appropriate actions as described previously (such as
security considerations), count the number of events that
you need to skip. One event normally corresponds to one SQL
statement in the binary log, but note that statements that
LAST_INSERT_ID() count as two events in
the binary log. When binary log transaction compression is
in use, a compressed transaction payload
Transaction_payload_event) is counted as
a single counter value, so all the events inside it are
skipped as a unit.
If you want to skip the complete transaction, you can count
the events to the end of the transaction, or you can just
skip the relevant event group. Remember that with
SET GLOBAL sql_slave_skip_counter, the
slave continues to skip to the end of an event group. Make
sure you do not skip too far forward and go into the next
event group or transaction, as this will then also be
SET statement as follows, where
N is the number of events from
the master to skip:
SET GLOBAL sql_slave_skip_counter = N
This statement cannot be issued if
gtid_mode=ON is set, or if
the slave threads are running.
SET GLOBAL sql_slave_skip_counter
statement has no immediate effect. When you issue the
START SLAVE statement for the
next time following this
the new value for the system variable
applied, and the events are skipped. That
START SLAVE statement also
automatically sets the value of the system variable back to
0. If the slave is a multi-source replication slave, when
you issue that
FOR CHANNEL clause is
required. Make sure that you name the correct channel,
otherwise events will be skipped on the wrong channel.
When you have assessed the failing transaction for any other appropriate actions as described previously (such as security considerations), identify the coordinates (file and position) in the master's binary log that represent a suitable position to restart replication. This can be the start of the event group following the event that caused the issue, or the start of the next transaction. The slave I/O thread will begin reading from the master at these coordinates the next time the thread starts, skipping the failing event. Make sure that you have identified the position accurately, because this statement does not take event groups into account.
CHANGE MASTER TO
statement as follows, where
master_log_name is the binary log
file that contains the restart position, and
master_log_pos is the number
representing the restart position as stated in the binary
CHANGE MASTER TO MASTER_LOG_FILE='master_log_name', MASTER_LOG_POS=master_log_pos;
If the slave is a multi-source replication slave, you must
FOR CHANNEL clause to name the
appropriate channel on the
MASTER TO statement.
This statement cannot be issued if
MASTER_AUTO_POSITION=1 is set, or if the
slave threads are running. If you need to use this method of
skipping a transaction when
MASTER_AUTO_POSITION=1 is normally set,
you can change the setting to
MASTER_AUTO_POSITION=1 while issuing the
statement, then change it back again afterwards. For
CHANGE MASTER TO MASTER_AUTO_POSITION=0, MASTER_LOG_FILE='binlog.000145', MASTER_LOG_POS=235; CHANGE MASTER TO MASTER_AUTO_POSITION=1;