WL#5493: Binlog crash-safe when master crashed
Affects: Server-5.6
—
Status: Complete
RATIONALE
=========
* Make the binlog itself crash-safe, so that its content can be
  recovered, committed, or rolled back together with the changes to the
  (transactional) tables.

CONTEXT
=======
It's related to BUG#53530, BUG#53527 and WL#5440.
In the vicinity: WL#1790.
Look at BUG#52620 too. /Alfranio

When pushing this WL, the test for WL#5440 should be pushed as well.

COMMENTS
========
By Yoshinori Matsunobu (12/08/2010):

The following failure scenarios 1-4 should be considered.

+----------------------+-----+-----+-----+-----+-----+
|                      |  0  |  1  |  2  |  3  |  4  |
+----------------------+-----+-----+-----+-----+-----+
| Binlog Event Header  |  o  |  x  | xx  |  o  |  o  |
| Binlog Data          |  o  |  -  |  -  |  x  | xx  |
+----------------------+-----+-----+-----+-----+-----+

o:  correctly written (normal behavior)
x:  invalid (partly written)
xx: wrongly written (written with 0x33, etc.)

0: No error (normal behavior)
1: Failed to read header
2: Read header, but the value (i.e. data_len) was incorrect
3: Succeeded in reading header, but failed to read body
4: Succeeded in reading header, but body was incorrect

If 1-3 happens, the current Log_event::read_log_event() returns 0, so the
failure is easy to handle inside MYSQL_BIN_LOG::recover() without changing
the binlog format.

If 4 happens, Log_event::read_log_event() returns the invalid event, so a
slave that receives the event will fail. My understanding is that Andrei's
WL#2540 (Binlog Checksum) will fix this issue.

Currently three implementation plans can be considered.

1) Extending Binlog File Header
   Write "actual binary log size" info into the binlog file header. During
   crash recovery, check the actual size and trim the binlog if the file is
   larger than that value. After recovery, the binlog file size equals the
   actual size maintained in the binlog header.

   Cons 1: Replication compatibility is broken, because additional info
           needs to be added to the binlog header.
   Cons 2: Performance will suffer, because each event requires writes to
           two locations. When I tested, appending 256 bytes could be done
           15000 times/sec, but (writing 8 bytes + appending 256 bytes)
           could be done only 10000 times/sec.
   Cons 3: sync-binlog should be 1. Otherwise the binlog size info inside
           the header is no longer trustworthy. For example, lots of
           "successfully written" events might be trimmed if the binlog
           size info is not written to disk for a long time.
   Cons 4: An extra write per event is not actually necessary. Instead, we
           should flush the binlog cache as usual and sync the file, then
           do the extra write and sync again. The extra activity happens
           every time the binlog cache is flushed, not every time an event
           is written.

2) Writing binlog event size info into the binlog event header *after* the
   binlog data is written (from Mats)
   Use zero for the length (or event type) and write this single byte after
   the entire event has been written. The file can then be scanned for the
   first zero length (or type), and the length of the file is decided based
   on that.

3) Using binlog checksum functionality (WL#2540)
   Use checksums on each event, find the first event that does not have a
   correct checksum, and decide the length of the file based on that.

PROPOSED SOLUTION
=================
When the MySQL server crashes in the middle of writing the binlog, trim the
crashed binlog file to the last valid transaction or (non-transactional)
event, based on the valid binlog size (valid_pos). In addition, use a temp
file to make the binlog index file crash-safe.
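
The idea behind plan 3 (scan forward and stop at the first record that is truncated or fails its checksum) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the XOR checksum, and the names toy_checksum, append_record and find_valid_pos are hypothetical and are not the real binlog event format or server code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy record layout (NOT the real binlog event format):
//   [1 byte payload length][payload bytes][1 byte checksum]
// The checksum is the XOR of all payload bytes.
static std::uint8_t toy_checksum(const std::uint8_t* data, std::size_t len) {
  std::uint8_t sum = 0;
  for (std::size_t i = 0; i < len; ++i) sum ^= data[i];
  return sum;
}

// Append one well-formed record to the in-memory log.
void append_record(std::vector<std::uint8_t>& log,
                   const std::vector<std::uint8_t>& payload) {
  assert(payload.size() <= 255);  // length must fit in one byte
  log.push_back(static_cast<std::uint8_t>(payload.size()));
  log.insert(log.end(), payload.begin(), payload.end());
  log.push_back(toy_checksum(payload.data(), payload.size()));
}

// Return the offset just past the last record that is both complete and
// has a matching checksum. Everything after this offset would be trimmed.
std::size_t find_valid_pos(const std::vector<std::uint8_t>& log) {
  std::size_t pos = 0;
  while (pos < log.size()) {
    const std::size_t len = log[pos];
    const std::size_t end = pos + 1 + len + 1;  // header + payload + checksum
    if (end > log.size()) break;                // partly written record
    if (toy_checksum(&log[pos + 1], len) != log[pos + 1 + len]) break;
    pos = end;
  }
  return pos;
}
```

A recovery routine would then truncate the file to the offset returned by find_valid_pos. The sketch works at the level of single records; stopping at a transaction boundary instead needs the additional BEGIN/COMMIT bookkeeping that this worklog describes separately.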
HOW TO GET THE POSITION OF THE LAST VALID EVENT
===============================================
Introduce an "actual_size" variable that records the position of the last
valid transaction or (non-transactional) event. The position is found by
locating the first transaction or (non-transactional) event with an
incorrect checksum, or, when the checksum option is disabled, by checking
the return value of Log_event::read_log_event().

HOW TO TRIM THE CRASHED BINLOG FILE
====================================
Truncate the crashed binlog file to the position of the last valid
transaction or (non-transactional) event.

CLEAR LOG_EVENT_BINLOG_IN_USE_F FLAG
=====================================
During recovery, after the crashed binlog has been trimmed, clear the
LOG_EVENT_BINLOG_IN_USE_F flag in its header.

HOW TO GUARANTEE BINLOG INDEX TO BE CRASH SAFE
===============================================
When appending data to the binlog index file or purging it, first write the
updated data to a temp file, and then rename the temp file to the binlog
index file. This guarantees crash safety for the index file.
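
The temp-file-then-rename pattern for the index file can be sketched with the C++17 standard library. This is a minimal sketch under stated assumptions, not the server implementation: the function name append_index_entry and the ".tmp" suffix are hypothetical, and a production version would also need to sync the temp file (and its directory) to disk before and after the rename.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Append `entry` (one log file name) to the index file by building a
// complete new copy in a temp file and renaming it over the old index.
// A crash before the rename leaves the old index untouched.
bool append_index_entry(const fs::path& index, const std::string& entry) {
  assert(entry.find('\n') == std::string::npos);  // one name per line
  fs::path tmp = index;
  tmp += ".tmp";  // hypothetical temp-file name
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    std::ifstream in(index, std::ios::binary);
    // Copy the existing content, if any. Streaming an empty buffer would
    // set failbit on `out`, so skip the copy for a missing or empty index.
    if (in && in.peek() != std::ifstream::traits_type::eof())
      out << in.rdbuf();
    out << entry << '\n';
    out.flush();
    if (!out) return false;
  }
  std::error_code ec;
  fs::rename(tmp, index, ec);  // replaces the old index in one step
  return !ec;
}
```

The design choice is that readers only ever see either the old index or the complete new one, never a partially appended file.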
The checksum part will be embedded after WL#2540 is pushed into the main
tree.

RECORD VALID POSITION FOR CRASHED BINLOG FILE
==============================================
When an incorrect event is encountered, record the position of the last
"BEGIN" Query_log_event as the valid position.
When no incorrect event is encountered, record the position of the last
event as the valid position; in that case every "BEGIN" before it is paired
with a "COMMIT" or "XID COMMIT" event.

  while ((ev= Log_event::read_log_event(log, 0, fdle)) && ev->is_valid())
  {
    /*
      Record the valid position for a crashed binlog file that
      contains incorrect events.
    */
    if (ev->get_type_code() == QUERY_EVENT &&
        !strcmp(((Query_log_event*)ev)->query, "BEGIN"))
      *valid_pos= last_valid_pos;
    last_valid_pos= my_b_tell(log);

    if (ev->get_type_code() == XID_EVENT)
    {
      ...
    }
    delete ev;
  }

  /*
    Record the valid position for a crashed binlog file that does
    not contain incorrect events.
  */
  if (!log->error)
    *valid_pos= last_valid_pos;

TRIM THE CRASHED BINLOG FILE
=============================
Truncate the crashed binlog file to the valid position.

  /* Change binlog file size to valid_pos */
  if (valid_pos < binlog_size)
  {
    if (my_chsize(file, valid_pos, 0, MYF(MY_WME)))
    {
      sql_print_error("Failed to trim the crashed binlog file "
                      "when master server is recovering it.");
      mysql_file_close(file, MYF(MY_WME));
      return -1;
    }
    else
    {
      sql_print_information("Crashed binlog file %s size is %llu, "
                            "but recovered up to %llu. Binlog trimmed "
                            "to %llu bytes.",
                            log_name, binlog_size, valid_pos, valid_pos);
    }
  }

CLEAR LOG_EVENT_BINLOG_IN_USE_F
================================
Clear the LOG_EVENT_BINLOG_IN_USE_F flag for the crashed binlog file by
writing the cleared flag to its header.

  /* Clear LOG_EVENT_BINLOG_IN_USE_F */
  my_off_t offset= BIN_LOG_HEADER_SIZE + FLAGS_OFFSET;
  uchar flags= 0;
  if (mysql_file_pwrite(file, &flags, 1, offset, MYF(0)) != 1)
  {
    sql_print_error("Failed to clear LOG_EVENT_BINLOG_IN_USE_F "
                    "for the crashed binlog file when master "
                    "server is recovering it.");
    mysql_file_close(file, MYF(MY_WME));
    return -1;
  }

GUARANTEE CRASH SAFE WHEN APPENDING BINLOG FILE NAME TO BINLOG INDEX
====================================================================
First copy the entire content of the index file to the temp file, then
append the log file name to the temp file. Finally, move the temp file to
the index file.

GUARANTEE CRASH SAFE WHEN PURGING BINLOG INDEX
==============================================
First copy the content of the index file, starting from the
index_file_start_offset recorded in log_info, to the temp file, then move
the temp file to the index file.
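
The purge step differs from the append step only in that the copy starts at an offset instead of at the beginning of the file. A minimal sketch, assuming the offset is already known (in the server it would come from log_info): the function name purge_index_prefix and the ".tmp" suffix are hypothetical, and syncing to disk is again left out.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ios>
#include <system_error>

namespace fs = std::filesystem;

// Drop the first `start_offset` bytes of the index file: copy the rest
// into a temp file, then rename the temp file over the index.
bool purge_index_prefix(const fs::path& index, std::uintmax_t start_offset) {
  assert(!index.empty());
  std::error_code ec;
  const std::uintmax_t size = fs::file_size(index, ec);
  if (ec || start_offset > size) return false;  // missing file or bad offset

  fs::path tmp = index;
  tmp += ".tmp";  // hypothetical temp-file name
  {
    std::ifstream in(index, std::ios::binary);
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;
    in.seekg(static_cast<std::streamoff>(start_offset));
    // Streaming an empty remainder would set failbit on `out`, so only
    // copy when something is left after the offset.
    if (start_offset < size) out << in.rdbuf();
    out.flush();
    if (!in || !out) return false;
  }
  fs::rename(tmp, index, ec);  // the old index stays intact until this point
  return !ec;
}
```

If the process dies while the temp file is being written, the original index still lists every log file, so the purge can simply be retried.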
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.