WL#1040: Dealing safely with "disk full" when writing to the binlog

Affects: Server-4.1   —   Status: Un-Assigned   —   Priority: Medium

A customer complained about the binlog being corrupted because the partition it
was on became full.
We want to deal gracefully with such errors.
Presently, in case of a binlog write error, the master prints nothing (from what
I have seen) and simply goes on to try to write the next statements.
[It prints something (to .err) only when it cannot create a binary log, and then
it stops binlogging.]

Nothing is reported to the user. If the problem was temporary (imagine a huge
temporary file), the next statements may well be written successfully, so the
binlog will have a gap that the user will not notice.

Note that other logs, like the general query log, do print an error if they fail
to write, which is a little better. However, the server will still try to write
to this log, so there could still be a gap.

As the binary log (this stands for the relay log too) is the most critical log
MySQL has, it should not have the worst error handling ;)

Note that an error is reported when mysqld cannot flush the binlog cache to the
binlog (so people using transactions are luckier than others here).

In case of write error in the binlog, the partially written statement will cause
an error on the slave (the slave threads will stop).


A model can be how MyISAM handles disk full ("How MySQL Handles a Full Disk" in
the manual); it's just a flag MY_WAIT_IF_FULL to pass to my_write().

So there are several solutions:
1) if disk full, do like MyISAM: wait until not full (because maybe
replication&backup is more important than uptime); that's a MY_WAIT_IF_FULL to
pass to my_write() in the relevant places. This will automatically write some
error messages. The thread can be killed (i.e. the wait can be aborted) by the
user, like for MyISAM; but in that case the binlog should be closed (to avoid gaps).
2) or if disk full, do not wait (because maybe uptime is more important than
replication&backup), just go ahead but CLOSE the binary log so that it is not
used anymore (to avoid gaps).

There is a consensus that 1) should be default and 2) an option.

1) is now WL#2335.
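
To make the difference between 1) and 2) concrete, here is a minimal sketch in
C. It is not the server code: log_sink, log_append, the ON_FULL_* policy names
and the demo_* helpers are all hypothetical, and the write and wait steps are
injected as function pointers only so that the behaviour can be exercised
without a real full disk. In the server, the waiting in 1) would come from
passing MY_WAIT_IF_FULL to my_write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* All names here are hypothetical; this is not the server's API. */
typedef enum { ON_FULL_WAIT, ON_FULL_CLOSE } full_policy;

typedef struct log_sink {
    int open;                 /* 1 while the log still accepts events */
    full_policy policy;       /* solution 1) or solution 2) */
    /* Injected I/O, so the policy can be tried without a real full disk. */
    ssize_t (*do_write)(struct log_sink *, const void *, size_t);
    int (*wait_for_space)(struct log_sink *);  /* nonzero = thread killed */
    int full_left;            /* demo state: failures still to simulate */
    size_t written;           /* demo state: bytes accepted so far */
} log_sink;

/* Append one event. Returns 0 on success, -1 if the log is (now) closed. */
int log_append(log_sink *log, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(log != NULL && buf != NULL);
    if (!log->open)
        return -1;
    while (left > 0) {
        ssize_t n = log->do_write(log, p, left);
        if (n > 0) {          /* full or partial write: advance */
            p += n;
            left -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == ENOSPC && log->policy == ON_FULL_WAIT &&
            log->wait_for_space(log) == 0)
            continue;         /* space may be back: retry the same bytes */
        log->open = 0;        /* never write again, so no gap can appear */
        return -1;
    }
    return 0;
}

/* Demo stand-ins for a disk that is full for the first few attempts. */
static ssize_t demo_write(log_sink *log, const void *buf, size_t len)
{
    (void)buf;
    if (log->full_left > 0) {
        log->full_left--;
        errno = ENOSPC;
        return -1;
    }
    log->written += len;
    return (ssize_t)len;
}

static int demo_wait(log_sink *log) { (void)log; return 0; }
static int demo_wait_killed(log_sink *log) { (void)log; return 1; }
```

In both the "killed while waiting" case and the ON_FULL_CLOSE case the log is
closed for good, which avoids gaps but can still leave a partially written
event at the end of the file.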


And what if the thread gets killed in 1), or if 2) was used; i.e. what if the
binlog ends up with an incomplete event? It would be nice if when the master
gets a write error to binlog, it seeks back to the last good pos (so it needs to
tell() the file before writing to it, slowdown...) and truncates the file (TODO:
check if all exotic OSes we support can do that, I remember seeing a comment
somewhere that some don't :( ).
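
A minimal sketch of the "remember the last good pos, truncate on error" idea,
using plain POSIX calls (lseek, write, ftruncate) rather than the server's own
file layer. The function names are hypothetical, open_scratch_log is only a
demo helper, and the sketch assumes a platform where ftruncate() works on the
log file, which is exactly the open question in the TODO above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers; not the server's actual functions. */

/* Cut the log back to the end of the last complete event. */
int log_rollback(int fd, off_t good_pos)
{
    if (ftruncate(fd, good_pos) != 0)
        return -1;                      /* this file cannot be truncated */
    return lseek(fd, good_pos, SEEK_SET) == (off_t)-1 ? -1 : 0;
}

/* Append one event; on a write error, remove whatever part of it reached
   the file. Returns 0 on success, -1 if the event was not added and the
   file is unchanged, -2 if the rollback itself failed (the tail may then
   hold an incomplete event). */
int log_append_or_rollback(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;
    off_t good_pos;

    assert(buf != NULL);
    good_pos = lseek(fd, 0, SEEK_END);  /* the extra "tell" before writing */
    if (good_pos == (off_t)-1)
        return -1;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted: just retry */
        if (n <= 0)                     /* e.g. ENOSPC after a partial write */
            return log_rollback(fd, good_pos) == 0 ? -1 : -2;
        p += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Demo helper: an anonymous scratch file standing in for the binlog. */
int open_scratch_log(void)
{
    FILE *f = tmpfile();
    return f ? fileno(f) : -1;
}
```

The cost is one extra lseek() per event, which is the slowdown mentioned above.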
A substitute for truncation could be, when you get a write error:
- try to write 5 zeros after the last good event
- when a thread later attempts to read the event at this position, if it sees
  * 0 and EOF: seek to end of binlog (and switch to next binlog)
  * 00 and EOF, 000 and EOF, 0000 and EOF: same
  * 00000 (whatever follows): this is for sure not a good event (fifth 0 means
event type 0 i.e. "unknown") so seek to the end.
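
The reader-side rule above could be sketched like this. classify_tail and the
tail_kind values are hypothetical names; the only fact taken from the text is
that the fifth byte of an event is its type and that type 0 means "unknown".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the bytes found at an event boundary. */
typedef enum {
    TAIL_EOF,              /* nothing left: clean end of this binlog */
    TAIL_SKIP_TO_END,      /* zero marker: seek to end, switch to next binlog */
    TAIL_CANDIDATE_EVENT   /* not the marker: parse and validate as usual */
} tail_kind;

#define ZERO_MARKER_LEN 5  /* 4 timestamp bytes + the type byte */

/* p points at the position where an event should start; avail is the number
   of bytes between that position and end of file. */
tail_kind classify_tail(const unsigned char *p, size_t avail)
{
    size_t n = avail < ZERO_MARKER_LEN ? avail : ZERO_MARKER_LEN;
    size_t i;

    assert(p != NULL || avail == 0);
    if (avail == 0)
        return TAIL_EOF;
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return TAIL_CANDIDATE_EVENT;
    /* Either 1 to 4 zeros and then EOF (the marker itself was cut short),
       or 5 zeros, i.e. event type 0, whatever follows. */
    return TAIL_SKIP_TO_END;
}
```

Four zeros followed by a non-zero fifth byte are still treated as a possible
event (timestamp 0 with a valid type), so the usual validation applies there.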

BUT this is not enough, because the binlog corruption may come from a brutal
shutdown of the master (no time to detect the problem and truncate the file).
So you have to detect it AT RESTART. A way could be to have known separators
between events.
Also it is possible to detect problems (we have the event len in the event) and
act accordingly (presently we say "this event is corrupted" and stop, maybe we
could seek to the end instead?).
This is somewhat related to a wish from an important customer: even if the
binlog is written well, there could be a problem during the commit in InnoDB, so
rollback will occur at restart, so we want to cut the binlog (remove the rolled
back queries), so we need to durably keep track of the last good position (we
thought of storing it safely in the InnoDB datafile).
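
As an illustration of the restart-time check based on the event length, here is
a rough sketch that walks a binlog image held in memory and returns the offset
just past the last event that looks complete. The layout constants are
assumptions made for the example (a 19-byte header with the type at offset 4
and a 4-byte little-endian event length at offset 9); last_good_pos is a
hypothetical name, and a real implementation would read from the file and
apply further sanity checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED header layout, for illustration only. */
#define HDR_LEN  ((size_t)19)  /* bytes in an event header */
#define TYPE_OFF ((size_t)4)   /* offset of the event type byte */
#define LEN_OFF  ((size_t)9)   /* offset of the 4-byte event length */

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk events from `start` and return the offset just past the last event
   that fits entirely inside the file. If the result is smaller than `size`,
   the tail is incomplete or corrupted and could be cut or skipped. */
size_t last_good_pos(const unsigned char *log, size_t size, size_t start)
{
    size_t pos = start;

    assert(log != NULL && start <= size);
    while (size - pos >= HDR_LEN) {
        uint32_t len = read_le32(log + pos + LEN_OFF);

        if (log[pos + TYPE_OFF] == 0)
            break;                      /* type 0 = unknown: not a good event */
        if (len < HDR_LEN || len > size - pos)
            break;                      /* length is absurd or runs past EOF */
        pos += len;
    }
    return pos;
}
```

The returned offset is the kind of "last good position" that the master could
truncate to, or seek past, instead of stopping with "this event is corrupted".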


 1. BUG#45449: SQL thread crashes on disk full
 2. BUG#32228: A disk full makes binary log corrupt.