WL#1790: Detect any write error when writing to the binlog
Affects: Server-7.1
Status: Assigned
PART I (already partly done)

Go through all code where mysql_bin_log.write() is called and verify that we act on it when the function returns 1 (ideally by rolling back). Go through the methods in log.cc, especially MYSQL_LOG::write(), which writes to the binlog, and verify that every error is detected there (for example, it is not clear that we detect a failure in MYSQL_LOG::new_file()). Fix the code so that all errors are detected. This task is important because the binlog is required for point-in-time recovery: if the binlog gets corrupted _silently_, that recovery can no longer be trusted.

PART II

When a binlog write error occurs (other than "disk full", see below), we can choose between several ways to continue:

- Make mysqld stop as soon as possible (just call kill_mysql()). This accepts downtime in exchange for safety.
- Close the binlog and carry on without it (no more logging). This gives maximum uptime at the expense of safety.
- Let --log-bin accept multiple locations (like tmpdir). Start with the first location; when it has a problem, switch to the second one, and so on. As soon as a directory is put into use, create a fresh .index file there, listing the binlogs in that directory. When mysqld starts, scan the directories in order until finding the first one that contains no .index file; the directory before it is the one to use for binary logging. Print all relevant messages to the .err log. When the last location has a problem, stop logging and continue running without it.

"Close the binlog and start a new one" is not a good option: the new binlog would be created in the same location as the failing one, so mysqld would probably be unable to create it or to write its name to the .index file, and would hit the same error.

Exception: if the disk is full or a quota is exceeded (WL#1040) and there is only one binlog location, the binlog write should wait and retry. If there is more than one location, it makes more sense not to wait and to switch immediately to the next one.
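A minimal C++ sketch of the PART I rule, assuming a simplified commit path in which a transaction's cached events are written to the binlog as one unit: the write result is never ignored, and a failure rolls the transaction back and returns an error. FakeBinlog, FakeTransaction and commit_with_binlog() are hypothetical stand-ins, not server classes; the same kind of check would apply to the result of MYSQL_LOG::new_file().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the binlog. Like the write() described above,
// write() returns true (1) on error and false (0) on success.
struct FakeBinlog {
  bool fail_next_write = false;  // test hook that simulates an I/O error
  std::vector<std::string> groups;

  bool write(const std::string &group) {
    if (fail_next_write) {
      fail_next_write = false;
      return true;
    }
    groups.push_back(group);
    return false;
  }
};

// Hypothetical transaction whose events are cached until commit.
struct FakeTransaction {
  std::string cached_events;
  bool committed = false;
  bool rolled_back = false;
  void commit() { committed = true; }
  void rollback() {
    rolled_back = true;
    cached_events.clear();
  }
};

// Commit path that always checks the binlog write result: on failure the
// transaction is rolled back so the engines and the binlog stay in step,
// and a non-zero code is returned for the client.
int commit_with_binlog(FakeBinlog &binlog, FakeTransaction &trx) {
  if (binlog.write(trx.cached_events)) {
    trx.rollback();
    return 1;
  }
  trx.commit();
  return 0;
}
```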
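The PART II options and the disk-full exception can be folded into one decision function. This is a hypothetical illustration, not server code: Policy, Action and on_binlog_write_error() are invented names, and "disk full or quota exceeded" is assumed to mean errno ENOSPC or EDQUOT.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// The three options listed above, plus a "wait" outcome for the exception.
enum class Policy { AbortServer, StopLogging, NextLocation };
enum class Action { WaitAndRetry, AbortServer, StopLogging, SwitchLocation };

// Assumption: these are the errno values meaning "disk full" / "quota exceeded".
bool is_disk_full(int err) {
#ifdef EDQUOT
  if (err == EDQUOT) return true;
#endif
  return err == ENOSPC;
}

// current: index of the location in use; n_locations: number of configured
// --log-bin locations (1 unless the NextLocation policy is used).
Action on_binlog_write_error(int err, Policy policy, std::size_t current,
                             std::size_t n_locations) {
  // Single location and a full disk: wait for space and retry (WL#1040).
  if (is_disk_full(err) && n_locations <= 1) return Action::WaitAndRetry;
  switch (policy) {
    case Policy::AbortServer:
      return Action::AbortServer;  // the caller would stop mysqld, e.g. kill_mysql()
    case Policy::StopLogging:
      return Action::StopLogging;
    case Policy::NextLocation:
      // Switch immediately, even on disk full; after the last one, stop logging.
      return current + 1 < n_locations ? Action::SwitchLocation
                                       : Action::StopLogging;
  }
  return Action::StopLogging;  // not reached; keeps compilers quiet
}
```

With only one location, a full disk therefore stalls the write instead of dropping binlog events, while every other error goes straight to the configured policy.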
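The startup rule for multiple --log-bin locations might be sketched as follows. pick_start_location() and its has_index_file predicate are hypothetical (a real server would look for the .index file in each directory), and the behavior when the first directory has no .index, or when every directory has one, is an assumption the worklog does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Scan the locations in order, stop at the first one without a .index
// file, and resume logging in the directory before it.
// Assumptions: if the very first directory has no .index (nothing used
// yet), start there; if every directory has one, use the last directory.
std::size_t pick_start_location(
    const std::vector<std::string> &dirs,
    const std::function<bool(const std::string &)> &has_index_file) {
  for (std::size_t i = 0; i < dirs.size(); ++i) {
    if (!has_index_file(dirs[i])) {
      return i == 0 ? 0 : i - 1;
    }
  }
  return dirs.empty() ? 0 : dirs.size() - 1;
}
```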
Write errors other than "disk full" and "quota exceeded" are more serious and are unlikely to fix themselves within a few seconds.

REFERENCES
==========
- BUG#45449
- BUG#37148
- BUG#46166
- BUG#51014
- BUG#51019
- http://lists.mysql.com/internals/37894
What are the standard actions when writing the binary log fails?

Default actions:
[X] Return an error if it is a DDL statement (most DDL is non-transactional and cannot be rolled back).
[X] Return an error to the user if the statement uses non-transactional engines.
[X] Roll back all operations on transactional engines and return an error if the statement is part of a transaction.
[X] If an IO_CACHE error happens, close the IO_CACHE, create a new one, and try to write again.
[X] If certain kinds of error happen, try to create a new binary log file and write again.

Extra actions:
[X] Return an error and stop binary logging permanently if the error is listed in the --stop-binlog-if-error option.
[ ] Return an error and stop the server if the error is listed in the --stop-server-if-error option.

What kinds of error are there?
[X] IO_CACHE errors.
[X] Errors from the system write API (write(), or WriteFile() on Windows).
[X] Errors from the system sync API (fsync(), fdatasync(), or FlushFileBuffers() on Windows).

How to test it?
[X] Use DBUG_EXECUTE_IF to inject failures at chosen execution points.
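A sketch of the default recovery ladder above, under simplifying assumptions: one retry with a fresh IO_CACHE, one retry in a new binlog file, and then an error for every statement kind, with a rollback only for transactional work. WriteError, BinlogOps and write_with_recovery() are hypothetical names, and --stop-binlog-if-error handling is left out.

```cpp
#include <cassert>
#include <functional>

// Hypothetical classes of binlog write failure (illustrative names).
enum class WriteError { None, IoCache, File };
enum class StmtKind { Ddl, NonTransactional, Transactional };

struct Result {
  bool ok = false;           // the event reached the binlog
  bool rolled_back = false;  // transactional changes were undone
  int attempts = 0;          // write attempts made
};

// Hooks standing in for server internals; all hypothetical.
struct BinlogOps {
  std::function<WriteError()> write;   // write the pending event(s)
  std::function<bool()> reinit_cache;  // close and recreate the IO_CACHE
  std::function<bool()> new_file;      // rotate to a new binlog file
};

Result write_with_recovery(const BinlogOps &ops, StmtKind kind) {
  Result r;
  WriteError err = ops.write();
  ++r.attempts;
  // IO_CACHE error: retry once with a fresh cache.
  if (err == WriteError::IoCache && ops.reinit_cache()) {
    err = ops.write();
    ++r.attempts;
  }
  // File-level error: retry once in a new binlog file.
  if (err == WriteError::File && ops.new_file()) {
    err = ops.write();
    ++r.attempts;
  }
  if (err == WriteError::None) {
    r.ok = true;
    return r;
  }
  // Still failing: the caller returns an error in every case. Only
  // transactional work can be rolled back; DDL and non-transactional
  // changes cannot be undone.
  r.rolled_back = (kind == StmtKind::Transactional);
  return r;
}
```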
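The system-level error sources listed above can be checked as in this POSIX-only sketch (the Windows WriteFile()/FlushFileBuffers() path is not shown). It loops over partial writes, retries on EINTR, and reports any write() or fsync() failure through its errno value. open_binlog_for_append() and write_and_sync() are hypothetical helpers, not the server's IO_CACHE code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open (or create) a binlog file for appending.
int open_binlog_for_append(const char *path) {
  return ::open(path, O_WRONLY | O_CREAT | O_APPEND, 0640);
}

// Writes the whole buffer, then forces it to disk. Returns 0 on success or
// the errno of the first failing system call, so the caller can decide
// whether to retry, roll back, or stop logging.
int write_and_sync(int fd, const char *buf, std::size_t len) {
  while (len > 0) {
    ssize_t n = ::write(fd, buf, len);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return errno;                  // e.g. ENOSPC, EIO, EBADF
    }
    if (n == 0) return EIO;  // no progress; treated as an I/O error (assumption)
    buf += n;
    len -= static_cast<std::size_t>(n);
  }
  // A successful write() only reaches the OS cache; fsync() can still
  // fail (for example with EIO), so its result must be checked too.
  if (::fsync(fd) != 0) return errno;
  return 0;
}
```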
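For testing, DBUG_EXECUTE_IF lets a debug build take a failure branch when a keyword is enabled. The self-contained sketch below imitates that idea with a hypothetical SKETCH_EXECUTE_IF macro and a global keyword set; it is not the real DBUG macro, and the keyword names fault_binlog_write and fault_binlog_sync are made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the debug keyword set that the DBUG framework
// keeps in a debug build of mysqld.
static std::set<std::string> g_debug_keywords;

// Imitation of the DBUG_EXECUTE_IF idea: run `code` only when `keyword`
// is enabled.
#define SKETCH_EXECUTE_IF(keyword, code)   \
  do {                                     \
    if (g_debug_keywords.count(keyword)) { \
      code;                                \
    }                                      \
  } while (0)

// Appends an event to an in-memory "file". Returns 0 on success, 1 for an
// injected write failure, 2 for an injected sync failure.
int binlog_write_event(std::string &file, const std::string &event) {
  SKETCH_EXECUTE_IF("fault_binlog_write", return 1);
  file += event;
  SKETCH_EXECUTE_IF("fault_binlog_sync", return 2);
  return 0;
}
```

An injected sync failure leaves the event in the file but still reports an error, which is the case a test should use to confirm that the caller does not treat the write as durable.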
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.