MySQL 9.0.0
Source Code Documentation
Format of redo log

Overview

Redo log contains multiple log files, each of which has the same format. Consecutive files contain data for consecutive ranges of lsn values: when a file ends at end_lsn, the next log file begins at end_lsn. There is a fixed number of log files, and they are re-used in a circular manner. That is, the first log file is the successor of the last log file.

Note
A single big file would remain fully cached on some file systems, even if only a small fragment of the file is being modified. Hence multiple log files are used, so that evictions are always possible. Keep in mind, though, that log files are used in a circular manner (lsn modulo the total size of the redo log files, where the size is calculated excluding the log file headers).
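The modulo rule in the note can be illustrated with a short C++ sketch. It assumes that all log files have the same size, that every file reserves the same number of header bytes, and that base_lsn (the lsn stored right after the header of the first file) is known. RedoLayout, FilePosition and lsn_to_position are hypothetical names for illustration, not InnoDB code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a ring of equally sized redo log files.
struct RedoLayout {
  uint64_t file_count;   // number of log files in the ring
  uint64_t file_size;    // size of each file in bytes, header included
  uint64_t header_size;  // header bytes per file (not part of the lsn sequence)
  uint64_t base_lsn;     // lsn stored right after the header of file 0
};

struct FilePosition {
  uint64_t file_index;  // which file of the ring holds the byte
  uint64_t offset;      // byte offset inside that file
};

// Maps lsn (assumed >= base_lsn) to a file and an offset, skipping headers.
FilePosition lsn_to_position(const RedoLayout &layout, uint64_t lsn) {
  const uint64_t data_per_file = layout.file_size - layout.header_size;
  const uint64_t ring_capacity = data_per_file * layout.file_count;
  // Modulo the data capacity: after the last file, the first one follows.
  const uint64_t pos = (lsn - layout.base_lsn) % ring_capacity;
  return {pos / data_per_file, layout.header_size + pos % data_per_file};
}
```

For example, with two 4096-byte files and 2048-byte headers, an lsn that is 4096 bytes past base_lsn wraps back to offset 2048 of the first file.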

The log file names are #ib_redo0, #ib_redo1, ..., and the files are stored in the #innodb_redo subdirectory, which is located inside the directory specified by innodb_log_group_home_dir (or inside the datadir if that option is not specified).

Whenever a new log file is created, it is first created with the _tmp suffix in its name. Once the file is prepared, it is renamed (the suffix is removed from the name).
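A sketch of how the file names and the _tmp step fit together, assuming the names #ib_redo0, #ib_redo1, ... inside a #innodb_redo subdirectory. redo_file_path and final_name are hypothetical helpers; only the path strings are computed here.

```cpp
#include <cassert>
#include <string>

const std::string kTmpSuffix = "_tmp";

// Hypothetical helper: path of redo file number file_id in redo_dir,
// optionally with the _tmp suffix used while the file is being prepared.
std::string redo_file_path(const std::string &redo_dir, unsigned file_id,
                           bool tmp) {
  std::string path = redo_dir + "/#ib_redo" + std::to_string(file_id);
  if (tmp) path += kTmpSuffix;
  return path;
}

// Hypothetical helper: name the prepared file is renamed to (suffix removed).
std::string final_name(const std::string &tmp_path) {
  const std::string::size_type n = kTmpSuffix.size();
  if (tmp_path.size() >= n &&
      tmp_path.compare(tmp_path.size() - n, n, kTmpSuffix) == 0) {
    return tmp_path.substr(0, tmp_path.size() - n);
  }
  return tmp_path;  // already a final name
}
```

The rename itself could be done with std::filesystem::rename(tmp, final_name(tmp)); the server performs it through its own OS-abstraction layer, which this sketch does not reproduce. Renaming only after preparation means a file without the suffix is never a half-prepared one.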

When a new data directory is being initialized, all log files that are created have the LOG_HEADER_FLAG_NOT_INITIALIZED flag enabled in the log_flags field of the header. After the data directory is initialized, this flag is disabled (the file header of the newest log file is then re-flushed).
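The flag lifecycle can be sketched as plain bit manipulation on log_flags. The bit position kNotInitializedBit is a hypothetical value chosen for illustration (the real LOG_HEADER_FLAG_NOT_INITIALIZED value is defined in the InnoDB sources), and LogFileHeader models only the log_flags field.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bit position, for illustration only.
constexpr uint32_t kNotInitializedBit = 3;

uint32_t set_flag(uint32_t flags, uint32_t bit) { return flags | (uint32_t{1} << bit); }
uint32_t clear_flag(uint32_t flags, uint32_t bit) { return flags & ~(uint32_t{1} << bit); }
bool has_flag(uint32_t flags, uint32_t bit) { return (flags & (uint32_t{1} << bit)) != 0; }

// Simplified file header: only log_flags is modelled.
struct LogFileHeader {
  uint32_t log_flags = 0;
};

// Every file created while the data directory is initialized gets the flag.
LogFileHeader header_for_new_file_during_init() {
  LogFileHeader h;
  h.log_flags = set_flag(h.log_flags, kNotInitializedBit);
  return h;
}

// After initialization only the newest file's header (assumed here to be the
// last element) has the flag cleared; re-flushing it to disk is not shown.
void finish_initialization(std::vector<LogFileHeader> &files) {
  if (!files.empty()) {
    files.back().log_flags = clear_flag(files.back().log_flags, kNotInitializedBit);
  }
}
```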

The file header contains the log_uuid field. It is a randomly chosen value, assigned when the data directory is initialized. It is used to detect a situation in which the user mixed log files from different data directories.
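A minimal sketch of the mixed-files check, assuming the log_uuid values have already been read from every file header and fit in 32 bits (the type is an assumption). first_foreign_file is a hypothetical helper that reports the first file whose uuid differs from the first file's uuid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the index of the first file whose log_uuid differs from that of
// file 0, or std::nullopt if all files come from the same data directory.
std::optional<std::size_t> first_foreign_file(const std::vector<uint32_t> &uuids) {
  for (std::size_t i = 1; i < uuids.size(); ++i) {
    if (uuids[i] != uuids[0]) return i;
  }
  return std::nullopt;
}
```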

The file header also contains start_lsn, which is the lsn at which the first log block within that file starts.
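Because bytes of the file header are not part of the lsn sequence, start_lsn is enough to locate any lsn stored in the file. A sketch, assuming the first log block starts right after a header of 2048 bytes (kFileHeaderSize is an assumed value for LOG_FILE_HDR_SIZE) and that the file holds lsn values in [start_lsn, end_lsn):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint64_t kFileHeaderSize = 2048;  // assumed LOG_FILE_HDR_SIZE

// Returns the file offset at which lsn is stored, or std::nullopt when lsn
// lies outside [start_lsn, end_lsn) of this file.
std::optional<uint64_t> lsn_to_file_offset(uint64_t start_lsn,
                                           uint64_t file_size, uint64_t lsn) {
  // Header bytes carry no lsn, so the file holds file_size - header bytes.
  const uint64_t end_lsn = start_lsn + (file_size - kFileHeaderSize);
  if (lsn < start_lsn || lsn >= end_lsn) return std::nullopt;
  // The first log block starts right after the header.
  return kFileHeaderSize + (lsn - start_lsn);
}
```

The end_lsn computed here is the lsn at which the next file begins, matching the rule that consecutive files cover consecutive lsn ranges.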

Log file format

Header of log file

A log file starts with a header of LOG_FILE_HDR_SIZE bytes. It contains:

  • Initial block of OS_FILE_LOG_BLOCK_SIZE (512) bytes, which has:
    • Binding of an offset within the file to an lsn value.

      This binding allows any lsn value represented within the file to be mapped to the corresponding offset within the file.

    • Format of the redo log (this field remains the same as in earlier versions).
    • Checksum of the block.
  • Two checkpoint blocks - LOG_CHECKPOINT_1 and LOG_CHECKPOINT_2.

    Each checkpoint block contains OS_FILE_LOG_BLOCK_SIZE bytes:

    • checkpoint_lsn - lsn to start recovery at.

      Note
      In versions earlier than 8.0, checkpoint_lsn pointed directly to the beginning of the first log record group that should be recovered (although the related page could still have been flushed). However, since 8.0 this value might point to some byte inside a log record. In such a case, recovery is supposed to skip the group of log records that contains the checkpoint lsn (and start at the beginning of the next group). The beginning of the next group cannot be determined easily. There are two cases:
      • the block with checkpoint_lsn contains no beginning of a group at all (first_rec_group = 0): then we search forward for the first block that has a non-zero first_rec_group, which gives the start of the next group,
      • the block with checkpoint_lsn has one or more groups of records starting inside it: then we start parsing at the first group that starts in the block and keep parsing consecutive groups until we have passed checkpoint_lsn; we do not apply these groups of records (we must not, because of file renames); after we have passed checkpoint_lsn, the next group that starts is the one at which recovery should start; this group may begin in the next block (if no more groups start after checkpoint_lsn within the block).
    • checkpoint_no - checkpoint number; each time a checkpoint is written, the next checkpoint number is assigned.
    • log.buf_size - size of the log buffer when the checkpoint write was started.

      It remains a mystery why we need this field. It is neither used by recovery nor required by MEB. Some say it could be useful for external auto-configuration tools, to detect which MySQL configuration should be used.

      Note
      Note that the size of the log buffer could be decreased at runtime, after the checkpoint is written (which was not possible when this field was introduced).

    There are two checkpoint headers because they are updated alternately. If a crash happens in the middle of such an update, the other header remains valid (this is the same reason for which the doublewrite buffer is used for pages).

    Remarks
    Each log file has its own header. Checkpoints defined in checkpoint headers always refer to lsn values within that file. During recovery, the file with the newest checkpoint must be found.
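The choice between the two alternately written checkpoint headers, and between files, can be sketched as follows. CheckpointHeader is a simplified, hypothetical view of a checkpoint block after it has been read; valid stands for "the block's checksum verified", which is not shown. The sketch picks the valid header with the highest checkpoint_no, and overwrites the invalid or older header next, so that the newest valid checkpoint survives a crash during the write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of one checkpoint block.
struct CheckpointHeader {
  bool valid;  // e.g. false when the block checksum did not match
  uint64_t checkpoint_no;
  uint64_t checkpoint_lsn;
};

// Newest valid header among the two alternately written ones, if any.
std::optional<CheckpointHeader> newest_checkpoint(const CheckpointHeader &a,
                                                  const CheckpointHeader &b) {
  if (a.valid && b.valid) return a.checkpoint_no >= b.checkpoint_no ? a : b;
  if (a.valid) return a;
  if (b.valid) return b;
  return std::nullopt;
}

// Index (0 or 1) of the header to overwrite next: the invalid or older one.
int slot_to_overwrite(const CheckpointHeader &a, const CheckpointHeader &b) {
  if (!a.valid) return 0;
  if (!b.valid) return 1;
  return a.checkpoint_no <= b.checkpoint_no ? 0 : 1;
}

// Index of the file whose newest checkpoint has the highest checkpoint_no.
std::optional<std::size_t> file_with_newest_checkpoint(
    const std::vector<std::array<CheckpointHeader, 2>> &files) {
  std::optional<std::size_t> best;
  uint64_t best_no = 0;
  for (std::size_t i = 0; i < files.size(); ++i) {
    const auto cp = newest_checkpoint(files[i][0], files[i][1]);
    if (cp && (!best || cp->checkpoint_no > best_no)) {
      best = i;
      best_no = cp->checkpoint_no;
    }
  }
  return best;
}
```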

Log blocks

The header is followed by consecutive log blocks. Each log block has the same format and consists of OS_FILE_LOG_BLOCK_SIZE (512) bytes. These bytes are enumerated by lsn values.

Note
Bytes used by headers of log files are NOT included in lsn sequence.
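Since block bytes are enumerated by lsn, the block containing a given lsn and the position inside that block follow from division by OS_FILE_LOG_BLOCK_SIZE. A sketch with hypothetical helper names; block_hdr_no follows the description of hdr_no literally (lsn divided by the block size, wrapped at 1G), and the real InnoDB helper is not reproduced here and may differ, e.g. by a constant offset.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kBlockSize = 512;                // OS_FILE_LOG_BLOCK_SIZE
constexpr uint64_t kHdrNoWrap = uint64_t{1} << 30;  // hdr_no wraps at 1G

// lsn of the first byte (header included) of the block containing lsn.
uint64_t block_start_lsn(uint64_t lsn) { return lsn - lsn % kBlockSize; }

// Position of lsn inside its block; header and trailer bytes are counted.
uint64_t offset_in_block(uint64_t lsn) { return lsn % kBlockSize; }

// Block number per the hdr_no description: lsn / block size, wrapped at 1G.
uint32_t block_hdr_no(uint64_t lsn) {
  return static_cast<uint32_t>((lsn / kBlockSize) % kHdrNoWrap);
}
```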

Each log block contains:

  • header - LOG_BLOCK_HDR_SIZE bytes (12):
    • hdr_no

      This is a block number. Consecutive blocks have consecutive numbers. Hence this is basically lsn divided by OS_FILE_LOG_BLOCK_SIZE. However, it is also wrapped at 1G (due to the limited size of the field). It would be possible to wrap it at 2G (only a single flush bit is reserved, as the highest bit), but for historical reasons it is 1G.

    • data_len

      Number of bytes within the log block. Possible values:

      • 0 - this is an empty block (recovery ends here).
      • OS_FILE_LOG_BLOCK_SIZE - this is a full block.
      • value within [LOG_BLOCK_HDR_SIZE, OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_TRL_SIZE), which means that this is the last block and that it is incomplete.

        This value can then be considered an offset pointing to the end of the data within the block. It includes the LOG_BLOCK_HDR_SIZE bytes of the header.

    • first_rec_group

      Offset within the log block to the beginning of the first group of log records that starts within the block or 0 if none starts. This offset includes LOG_BLOCK_HDR_SIZE bytes of the header.

    • epoch_no

      Log epoch number. It is set by the log writer thread just before a write of the block starts. For details, see LOG_BLOCK_HDR_EPOCH_NO.

      It could be used during recovery to detect that an old block of the redo log (tail) has been read because of the wrapped log files.

  • data part - bytes up to offset data_len within the block.

    If the block is incomplete, the actual data bytes are followed by 0x00 bytes.

    Note
    Bytes within this fragment of the block are enumerated by the sn sequence (whereas bytes of the header and trailer are NOT). This is the only difference between the sn and lsn sequences (lsn also enumerates the bytes of the header and trailer).
  • trailer - LOG_BLOCK_TRL_SIZE bytes (4):
    • checksum

      The algorithm used for the checksum depends on the configuration. Note that there is a potential problem if a crash happens just after switching to "checksums enabled": during recovery, some log blocks would have checksum = LOG_NO_CHECKSUM_MAGIC and some would have a valid checksum. Recovery with checksums enabled would then report problems for the blocks without a valid checksum, and the user would have to disable checksums for the recovery.
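The relation between the sn and lsn sequences can be written as two small conversions. With a 12-byte header and a 4-byte trailer, each 512-byte block carries 496 data bytes, so every 496 sn values advance lsn by one full block. This is a sketch of the arithmetic described above, with hypothetical names; InnoDB has its own helpers, which are not reproduced here. lsn_to_sn maps an lsn that falls inside a header to the sn of that block's first data byte, and one inside a trailer to the sn of the next block's first data byte.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kBlockSize = 512;  // OS_FILE_LOG_BLOCK_SIZE
constexpr uint64_t kHdrSize = 12;     // LOG_BLOCK_HDR_SIZE
constexpr uint64_t kTrlSize = 4;      // LOG_BLOCK_TRL_SIZE
constexpr uint64_t kDataSize = kBlockSize - kHdrSize - kTrlSize;  // 496

// sn counts only data bytes; lsn also counts header and trailer bytes.
uint64_t sn_to_lsn(uint64_t sn) {
  return sn / kDataSize * kBlockSize + sn % kDataSize + kHdrSize;
}

uint64_t lsn_to_sn(uint64_t lsn) {
  const uint64_t base = lsn / kBlockSize * kDataSize;
  const uint64_t diff = lsn % kBlockSize;
  if (diff < kHdrSize) return base;                            // inside header
  if (diff >= kBlockSize - kTrlSize) return base + kDataSize;  // inside trailer
  return base + diff - kHdrSize;
}
```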

Remarks
All fields except first_rec_group are updated by the log writer thread just before writing the block.
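A sketch of reading a block header, classifying a block by data_len, and the forward scan for a non-zero first_rec_group used in the first checkpoint_lsn case. The byte layout is an assumption for illustration: big-endian hdr_no at offset 0 (4 bytes, flush bit as the highest bit), data_len at 4 (2 bytes), first_rec_group at 6 (2 bytes) and epoch_no at 8 (4 bytes). Any other bits the real format may reserve are ignored, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t kBlockSize = 512, kHdrSize = 12, kTrlSize = 4;
constexpr uint32_t kFlushBitMask = 0x80000000;  // assumed: highest bit of hdr_no

uint32_t read_be32(const uint8_t *p) {
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
         (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}
uint16_t read_be16(const uint8_t *p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

struct BlockHeader {
  uint32_t hdr_no;
  bool flush_bit;
  uint16_t data_len;
  uint16_t first_rec_group;
  uint32_t epoch_no;
};

// Decodes the 12-byte header at the start of a block (assumed layout).
BlockHeader parse_header(const uint8_t *block) {
  const uint32_t raw_no = read_be32(block);
  return {raw_no & ~kFlushBitMask, (raw_no & kFlushBitMask) != 0,
          read_be16(block + 4), read_be16(block + 6), read_be32(block + 8)};
}

enum class BlockKind { Empty, Full, LastIncomplete, Invalid };

// Classifies a block by data_len, following the list of possible values.
BlockKind classify(uint16_t data_len) {
  if (data_len == 0) return BlockKind::Empty;
  if (data_len == kBlockSize) return BlockKind::Full;
  if (data_len >= kHdrSize && data_len < kBlockSize - kTrlSize)
    return BlockKind::LastIncomplete;
  return BlockKind::Invalid;
}

// First block at or after `from` in which a record group starts.
std::optional<std::size_t> next_group_block(const std::vector<BlockHeader> &blocks,
                                            std::size_t from) {
  for (std::size_t i = from; i < blocks.size(); ++i) {
    if (blocks[i].first_rec_group != 0) return i;
  }
  return std::nullopt;
}
```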