WL#4783: Extend mysqlbinlog to support remote backup of binary logs

Affects: Server-5.6 — Status: Complete

Description
High Level Architecture
Low Level Design

MySQL needs a way of creating real-time backups.  The current method of doing
this is to create a slave with using the blackhole engine as can be seen at the
following URL:
http://dev.mysql.com/doc/refman/5.1/en/blackhole-storage-engine.html

This method works rather well but has the disadvantage of requiring a whole
MySQL instance to run to make the real-time backups, this can be a burden on
resources and difficult to setup if you are trying to backup many MySQL servers.

We already have mysqlbinlog to talk to remote MySQL servers to retrieve binlog
data and translate it into SQL statements so in theory it should be easy enough
to create a raw mode for the output.

Solution
mysqlbinlog has been modified to have several new parameters, key to this is the
new --raw flag which writes the data out in raw binlog format rather than
translating it.  The other parameter changes are designed to add more
functionality to the --raw flag so that it can easily be used in various backup
scenarios.

Implementation
First and foremost the --raw bool flag has been added.  This can only be used
when using the --read-from-remote-server parameter (-R).  This parameter
retrieves the data from the mysql server and stores it in binlog files locally.
 The binlog files are named after the filenames retrieved from the server but
can be prefixed with the --result-file parameter.  For example with a
--result-file=blue- you will get files like:

blue-mysqld-bin.000001
blue-mysqld-bin.000002
...

It is recommended that the --raw parameter be run with the --to-last-log or
--stop-never parameters (although this is not a requirement).  Without
--stop-never it will retrieve all the logs until the end of the last log and
stop.  If, however, you set --stop-never it will retrieve continuously
non-stop (unless the connection ends or mysqlbinlog is terminated).  mysqlbinlog
has been modified so that a kill or CTRL-C on it will safely close files and
cleanup before exiting.  With --stop-never mysqlbinlog a non-zero slave serverID
is needed, so there is --stop-never-slave-server-id to set this, it defaults to
65535 but it can be changed if this would conflict with any real slaveIDs.

Usage
A. Continuous backups
If you wish to backup a server right up until the point where a server exits the
following method can be used to do this:

1. Run mysqlbinlog --raw --read-from-remote-server --host=mysqlserver.address
--to-last-log --stop-never mysqld-bin.000001
2. Grab a mysqldump --all-databases --master-data=2 --host=mysqlserver.address
3. Repeat #2 every hour (or however long you want the snapshots to be)

You could also use --result-file to separate multiple servers
with a prefix or directory respectively.  mysqlbinlog will only stop upon an
when the connection ends or CTRL-C / kill.

To restore this data you can reload the last known good dump file and use the
positions recorded in that dump file restore all everything up to the point
where the failure occurred by processing the retrieved binlog files with
mysqlbinlog, so for example:

1. mysql < dump-200902230800.sql
2. mysqlbinlog --start-position=123456 binlog.000005 | mysql

B. Snapshots with point-in-time
If you wish to create snapshots with point-in-time between the snapshots you can
use the following method:

mysql> FLUSH TABLES WITH READ LOCK;  -- To prepare for backup
mysql> SHOW BINARY LOGS; -- To get the binary log number
mysql> FLUSH LOGS;  -- To get a good binlog for the backup
bash> mysqldump --host=my-host.com --port=4711 --all-databases
>backup-20090213/mydb.sql
/* Here we know that the previous backup was done before server-bin.000047 */
bash> mysqlbinlog --raw --read-from-remote-server --host=my-host.com --port=4711
--start-position=1234567 --to-last-log \
      --result-file=backup-20090213/ server-bin.000047 
mysql> UNLOCK TABLES;
bash> tar -jcf backup-20090213.tar.bz2 backup-20090213

This will create an empty binlog as the final log due to the FLUSH LOGS, so this
can be deleted or ignored.

* mysqlbinlog now safely exits with CTRL-C or kill (closes files properly)
* --raw added which will retrieve raw binlog files from the remote server and
save them using the filenames in the replication headers
* --stop-never added which waits for new data on a remote connection instead
of quitting at the end of the last log.
** Due to the way replication is designed, this needs a non-zero serverID is set
(0 kills the connection at the end of the last log).  So:
* --stop-never-slave-server-id added which sets the serverID used for
--stop-never (default 65535)
* --result-file modified so that when used with --raw mode it is a prefix for
the output files

* --raw requires --read-from-remote-server
* --raw with --database will give a warning because --database will be ignored
* --stop-never-slave-server-id cannot be 0
* --stop-position and --stop-datetime are ignored when --raw is used and a
warning is shown