WL#1040: Dealing safely with "disk full" when writing to the binlog
Affects: Server-4.1
Status: Un-Assigned
Priority: Medium
A customer complained that the binlog was corrupted because the partition it was on became full. We want to deal gracefully with such errors.

Presently, in case of a binlog write error, the master prints nothing (from what I have seen) and simply tries to write the next statements. [It prints something (to the .err file) only when it cannot create a binary log, and then it stops binlogging.] Nothing is reported to the user, and if the problem was temporary (imagine a huge temporary file that is later deleted), the next statements may be written successfully, leaving a gap in the binlog that the user will not notice.

Note that other logs, like the general query log, do print an error if they fail to write, which is a little better. However, the server keeps writing to such a log afterwards, so there could still be a gap. As the binary log (this applies to the relay log too) is the most critical log MySQL has, it should not have the worst error handling ;)

Note that an error is reported when mysqld cannot flush the binlog cache to the binlog (so people using transactions are luckier than others here).

In case of a write error in the binlog, the partially written statement will cause an error on the slave (the slave threads will stop).

FIRST SUBTASK

A model can be how MyISAM handles a full disk ("How MySQL Handles a Full Disk" in the manual): it is just a flag, MY_WAIT_IF_FULL, to pass to my_write(). So there are two possible solutions:

1) If the disk is full, do like MyISAM: wait until it is no longer full (because replication and backup may matter more than uptime). That means passing MY_WAIT_IF_FULL to my_write() in the relevant places, which will automatically write some error messages. As with MyISAM, the user can kill the thread (i.e. abort the wait); but in that case the binlog should be closed (to avoid gaps).

2) If the disk is full, do not wait (because uptime may matter more than replication and backup): go ahead, but CLOSE the binary log so that it is not used anymore (to avoid gaps).
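Both solutions above can be sketched in one C function. This is a minimal sketch, not the mysys code: it calls POSIX write() directly instead of my_write() with MY_WAIT_IF_FULL, and the names binlog_append, struct binlog and enum full_policy, the polling interval and the killed flag (standing in for KILL) are all hypothetical. Under WAIT_IF_FULL (solution 1) an ENOSPC error makes the thread sleep and retry until space is freed or the thread is killed; a kill, any other error, or ENOSPC under CLOSE_IF_FULL (solution 2) closes the binlog so that no later event is written after a gap.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy switch: solution 1 (wait) or solution 2 (close). */
enum full_policy { WAIT_IF_FULL, CLOSE_IF_FULL };

/* Hypothetical binlog handle; fd < 0 means binlogging has been stopped. */
struct binlog {
    int fd;
};

static void binlog_close(struct binlog *log, const char *why)
{
    fprintf(stderr, "binlog: %s; closing the binary log\n", why);
    close(log->fd);
    log->fd = -1;               /* later events are never written: no gap */
}

/* Append one event.  Returns true only if every byte was written.
   `killed` is a hypothetical stand-in for the server's KILL flag. */
bool binlog_append(struct binlog *log, const void *buf, size_t len,
                   enum full_policy policy, unsigned retry_secs,
                   const volatile bool *killed)
{
    const char *p = buf;
    bool warned = false;

    if (log->fd < 0)
        return false;           /* already closed by an earlier error */

    while (len > 0) {
        ssize_t n = write(log->fd, p, len);
        if (n > 0) {            /* partial or full write: advance */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno == ENOSPC && policy == WAIT_IF_FULL) {
            if (*killed) {      /* the user aborted the wait */
                binlog_close(log, "wait for free space was killed");
                return false;
            }
            if (!warned) {      /* report once instead of on every retry */
                fprintf(stderr, "binlog: disk full, waiting for free space\n");
                warned = true;
            }
            sleep(retry_secs);
            continue;
        }
        /* CLOSE_IF_FULL, or any other error: stop binlogging. */
        binlog_close(log, n < 0 ? strerror(errno) : "write made no progress");
        return false;
    }
    return true;
}
```

When the binlog is closed after a partial write, the bytes that did reach the file are still there; removing them is the topic of the SECOND SUBTASK.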
There is a consensus that 1) should be the default and 2) an option. 1) is now WL#2335.

SECOND SUBTASK

And what if the thread gets killed in 1), or if 2) was used, i.e. what if the binlog ends up with an incomplete event? It would be nice if, on a binlog write error, the master could seek back to the last good position (so it needs to tell() the file before writing to it, which is a slowdown...) and truncate the file (TODO: check whether all the exotic OSes we support can do that; I remember seeing a comment somewhere that some cannot :( ).

A substitute for truncation could be, when you get a write error:
- try to write 5 zeros after the last good event;
- when a thread later attempts to read the event at this position, if it sees
  * 0 and EOF: seek to the end of the binlog (and switch to the next binlog);
  * 00 and EOF, 000 and EOF, 0000 and EOF: same;
  * 00000 (whatever follows): this is certainly not a good event (the first four bytes of an event are its timestamp and the fifth is its type code, so a fifth 0 means event type 0, i.e. "unknown"), so seek to the end.

BUT this is not enough, because the binlog corruption may come from a brutal shutdown of the master (no time to detect the problem and truncate the file). So the problem has to be detected AT RESTART. One way could be to have known separators between events. It is also possible to detect problems using the event length stored in each event, and act accordingly (presently we say "this event is corrupted" and stop; maybe we could seek to the end instead?).

This is somewhat related to a wish from an important customer: even if the binlog is written correctly, there could be a problem during the commit in InnoDB, so a rollback will occur at restart. We then want to cut the binlog (remove the rolled-back queries), so we need to durably keep track of the last good position (we thought of storing it safely in the InnoDB data file).

RELATED REFERENCES
1. BUG#45449: SQL thread crashes on disk full
2. BUG#32228: A disk full makes binary log corrupt.
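The write side of the SECOND SUBTASK (remember the last good position, cut the file back on error, and fall back to the five-zero marker where truncation is not available) might look like the following minimal C sketch. It assumes a POSIX system providing lseek(), ftruncate() and pwrite(); append_event and the APPEND_* result codes are hypothetical names.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome codes. */
enum append_result {
    APPEND_OK,          /* whole event written                          */
    APPEND_TRUNCATED,   /* failed; file cut back to the last good pos   */
    APPEND_MARKED,      /* failed; 5 zero bytes written at that pos     */
    APPEND_FAILED       /* failed; the partial event may still be there */
};

/* End-of-log marker: 4 zero timestamp bytes + a zero (unknown) type code. */
#define END_MARKER_LEN 5

enum append_result append_event(int fd, const void *buf, size_t len)
{
    /* The "tell()" before writing: the offset after the last good event. */
    off_t good_pos = lseek(fd, 0, SEEK_CUR);
    static const unsigned char marker[END_MARKER_LEN] = { 0 };
    const char *p = buf;

    if (good_pos < 0)
        return APPEND_FAILED;   /* cannot roll back later: do not write */

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        /* Write failed: first try to remove the partial event... */
        if (ftruncate(fd, good_pos) == 0) {
            lseek(fd, good_pos, SEEK_SET);
            return APPEND_TRUNCATED;
        }
        /* ...and where truncation is not possible, mark the end of log. */
        if (pwrite(fd, marker, END_MARKER_LEN, good_pos) == END_MARKER_LEN)
            return APPEND_MARKED;
        fprintf(stderr, "binlog: partial event left at offset %lld\n",
                (long long)good_pos);
        return APPEND_FAILED;
    }
    return APPEND_OK;
}
```

Asking the OS for the position with lseek() before every event is the slowdown the text worries about; an implementation could instead keep the current offset in memory. After any failure the binlog should still be closed, as decided in the FIRST SUBTASK.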
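On the read side, the zero-marker rules and the restart-time check based on the event length from the SECOND SUBTASK can be sketched together. The sketch assumes the common event header layout of the v3/v4 binlog formats (4-byte timestamp, 1-byte type code, 4-byte server id, then a 4-byte little-endian event length at offset 9, 19 bytes in all) and a buffer holding the binlog contents; classify_tail and last_good_offset are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed v3/v4 common header: 19 bytes, event length at offset 9. */
#define EVENT_HEADER_LEN 19
#define EVENT_LEN_OFFSET 9
#define END_MARKER_LEN   5

enum tail_kind { TAIL_EOF, TAIL_END_MARKER, TAIL_EVENT };

/* buf holds min(remaining, 5) bytes from the current position, where
   remaining is the number of bytes left before EOF. */
enum tail_kind classify_tail(const unsigned char *buf, size_t remaining)
{
    size_t n = remaining < END_MARKER_LEN ? remaining : END_MARKER_LEN;

    if (remaining == 0)
        return TAIL_EOF;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return TAIL_EVENT;
    /* 1-4 zeros then EOF, or 5 zeros: type code 0 is "unknown", so this
       is not a good event; the reader should skip to the end. */
    return TAIL_END_MARKER;
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Restart-time check: follow each event's own length from `start`
   (normally 4, just past the magic number) and return the offset just
   past the last complete event.  The zero marker, a cut-short header,
   an impossible length or an incomplete event all stop the walk. */
size_t last_good_offset(const unsigned char *buf, size_t size, size_t start)
{
    size_t pos = start;

    while (pos < size) {
        if (classify_tail(buf + pos, size - pos) != TAIL_EVENT)
            break;                      /* zero marker reached */
        if (size - pos < EVENT_HEADER_LEN)
            break;                      /* header cut short */
        uint32_t len = read_le32(buf + pos + EVENT_LEN_OFFSET);
        if (len < EVENT_HEADER_LEN || len > size - pos)
            break;                      /* impossible or incomplete event */
        pos += len;
    }
    return pos;
}
```

The returned offset is where a recovering server could truncate the binlog or, as the text suggests, treat the rest as the end of the log instead of stopping with "this event is corrupted".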
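For the customer wish in the SECOND SUBTASK, the restart step itself could be as small as the following C sketch. It assumes the committed position has already been read back from wherever it was stored durably (the text suggests the InnoDB data file; how it gets there is not shown), and cut_binlog_to_committed is a hypothetical name. It refuses to guess when the recorded position lies beyond the end of the binlog, and it calls fsync() so that the cut survives another crash.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical restart hook.  committed_pos is the binlog offset stored
   durably together with the last committed transaction; anything after
   it belongs to transactions that recovery rolled back. */
bool cut_binlog_to_committed(int fd, off_t committed_pos)
{
    off_t size = lseek(fd, 0, SEEK_END);

    if (size < 0)
        return false;
    if (committed_pos < 0 || committed_pos > size) {
        /* The recorded position does not fit this file: do not guess. */
        fprintf(stderr, "binlog: recorded position %lld does not fit "
                "a binlog of %lld bytes\n",
                (long long)committed_pos, (long long)size);
        return false;
    }
    if (committed_pos == size)
        return true;            /* nothing was rolled back */
    if (ftruncate(fd, committed_pos) != 0)
        return false;
    return fsync(fd) == 0;      /* make the cut itself durable */
}
```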
Copyright (c) 2000, 2017, Oracle Corporation and/or its affiliates. All rights reserved.