There are several versions of the binary log file format:
v1: Used in MySQL 3.23
v3: Used in MySQL 4.0.2 through 4.1
v4: Used in MySQL 5.0 and up
A v2 format was used briefly (in early MySQL 4.0.x versions), but it is obsolete and no longer supported.
Programs that process the binary log must be able to account for each of the supported binary log formats. This section describes how the server tells the formats apart to identify which one a given binary log file uses. mysqlbinlog uses the same principles.
Important constants:
START_EVENT_V3 = 1
FORMAT_DESCRIPTION_EVENT = 15
EVENT_TYPE_OFFSET = 4
EVENT_LEN_OFFSET = 9
ST_SERVER_VER_LEN = 50
A binary log file begins with a 4-byte magic number followed by an initial descriptor event that identifies the format of the file.
In v1 and v3, this event is called a "start event."
In v4, it is called a "format description event."
Elsewhere you may see either term used generically to refer collectively to both types of event. This discussion uses "descriptor event" as the collective term.
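As a rough illustration of this file layout, the following Python sketch checks the magic number and decodes the first four header fields of the event that follows it. The magic value b"\xfebin" and the little-endian byte order are assumptions (this section gives only the length of the magic number), and the function name is hypothetical.

```python
import struct

BINLOG_MAGIC = b"\xfebin"  # assumed value; this section states only that it is 4 bytes
COMMON_HEADER_LEN = 13     # timestamp (4) + type_code (1) + server_id (4) + event_length (4)


def read_first_event_header(data: bytes) -> dict:
    """Check the magic number, then decode the header fields shared by all formats."""
    if data[:4] != BINLOG_MAGIC:
        raise ValueError("missing binary log magic number")
    header = data[4:4 + COMMON_HEADER_LEN]
    if len(header) < COMMON_HEADER_LEN:
        raise ValueError("file too short to contain an event header")
    # "<" selects little-endian with no padding (byte order is an assumption).
    timestamp, type_code, server_id, event_length = struct.unpack("<IBII", header)
    return {
        "timestamp": timestamp,
        "type_code": type_code,
        "server_id": server_id,
        "event_length": event_length,
    }
```

Only the first 13 bytes of the event are decoded because those fields sit at the same positions in v1, v3, and v4.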
The header and data parts of the descriptor event for each binary log format version are shown below. The diagrams use the same conventions as those described earlier in Event Structure.
v1 start event (size = 69 bytes):
+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = START_EVENT_V3 = 1
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | = 69
+=====================================+
| event  | binlog_version   13 : 2    | = 1
| data   +----------------------------+
|        | server_version   15 : 50   |
|        +----------------------------+
|        | create_timestamp 65 : 4    |
+=====================================+
v3 start event (size = 75 bytes):
+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = START_EVENT_V3 = 1
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | = 75
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
+=====================================+
| event  | binlog_version   19 : 2    | = 3
| data   +----------------------------+
|        | server_version   21 : 50   |
|        +----------------------------+
|        | create_timestamp 71 : 4    |
+=====================================+
v4 format description event (size ≥ 91 bytes; the size is 76 + the number of event types):
+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = FORMAT_DESCRIPTION_EVENT = 15
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | >= 91
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
+=====================================+
| event  | binlog_version   19 : 2    | = 4
| data   +----------------------------+
|        | server_version   21 : 50   |
|        +----------------------------+
|        | create_timestamp 71 : 4    |
|        +----------------------------+
|        | header_length    75 : 1    |
|        +----------------------------+
|        | post-header      76 : n    | = array of n bytes, one byte per event
|        | lengths for all            |   type that the server knows about
|        | event types                |
+=====================================+
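The three layouts can be written down as Python struct formats, which gives a quick check that the field widths add up to the stated sizes. This is only a sketch: the names are hypothetical, and little-endian byte order is an assumption not stated in this section.

```python
import struct

# "<" = little-endian, no alignment padding (byte order is an assumption).
# I = 4-byte unsigned, H = 2-byte unsigned, B = 1 byte, 50s = 50-byte string.
V1_START_EVENT = struct.Struct("<IBIIH50sI")     # 13-byte header + 56 bytes of data
V3_START_EVENT = struct.Struct("<IBIIIHH50sI")   # 19-byte header + 56 bytes of data
V4_FDE_FIXED = struct.Struct("<IBIIIHH50sIB")    # through header_length; the
                                                 # post-header length array follows


def unpack_v3_start_event(event: bytes) -> dict:
    """Decode every field of a v3 start event into a dictionary."""
    names = ("timestamp", "type_code", "server_id", "event_length",
             "next_position", "flags", "binlog_version", "server_version",
             "create_timestamp")
    return dict(zip(names, V3_START_EVENT.unpack_from(event)))
```

The struct sizes come out as 69, 75, and 76 bytes, matching the v1 size, the v3 size, and the fixed part of the v4 event (76 + the number of event types gives the full v4 size).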
In all binary log versions, the event data for the descriptor event begins with a common set of fields:
binlog_version
The binary log version number (1, 3, or 4).
server_version
The server version as a string.
create_timestamp
The creation timestamp, if nonzero, is the time in seconds when this event was created; it indicates the moment when the binary log was created. This field is actually of no value: If nonzero, it is redundant because it has the same value that is in the header timestamp.
Note: In practice, the creation timestamp field should be considered reserved for future use and programs should not rely on its value. This field may be commandeered in the future to serve another purpose.
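A minimal sketch of decoding these common fields follows. The caller passes the header size (13 for v1, 19 for v3 and v4), because that determines where the data part begins. The function name is hypothetical; little-endian byte order and NUL padding of the version string are assumptions.

```python
import struct

ST_SERVER_VER_LEN = 50


def parse_common_descriptor_data(event: bytes, header_size: int):
    """Return (binlog_version, server_version, create_timestamp).

    header_size is 13 for a v1 start event and 19 for v3/v4.
    """
    binlog_version, raw_version, create_timestamp = struct.unpack_from(
        "<H%dsI" % ST_SERVER_VER_LEN, event, header_size)
    # Assumption: the 50-byte field is padded with NUL bytes after the string.
    server_version = raw_version.split(b"\0", 1)[0].decode("ascii", "replace")
    return binlog_version, server_version, create_timestamp
```

In line with the note above, a caller would normally ignore the returned create_timestamp.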
The v4 format descriptor event data contains two additional fields that enable interpretation of other types of events:
header_length
The length of the event header. This value includes the
extra_headers field, so header_length - 19
yields the size of the extra_headers field.
Currently in v4, the header length (at offset 75) is 19, which
means that in other events, no extra headers will follow the
flags field. If in the future the header length
is some value x > 19, then x-19 extra header bytes will appear
in other events in the extra_headers field
following the flags field.
Note: The
FORMAT_DESCRIPTION_EVENT itself contains no
extra_headers field. Suppose that the FDE did
have an extra_headers field after the
flags field. That would introduce this problem:
The value of x is given in the
header_length field, which occurs in a
position later than where the extra_headers
field would be.
Until you know the value of x, you cannot know the exact
offset of the header_length field.
In other words, you would need to know x to find the
header_length field, but you cannot know x
until you read the header_length field. (A
circular dependency.) This means that the event extensibility
mechanism afforded by the FDE does not apply to the FDE itself,
which therefore is not itself extensible.
post-header lengths
The lengths for the fixed data part of each event. This is an
array that provides post-header lengths for all events beginning
with START_EVENT_V3 (type code 1). The array
includes no length for UNKNOWN_EVENT (type code
0).
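The two v4-specific fields might be read as in the following sketch. It assumes little-endian integers and that nothing follows the post-header length array inside the event, as in the layout shown in this section. The function and constant names are hypothetical.

```python
import struct

EVENT_LEN_OFFSET = 9
HEADER_LENGTH_OFFSET = 75
POST_HEADER_LENGTHS_OFFSET = 76
BASE_HEADER_LEN = 19  # header size without any extra_headers bytes


def parse_fde_extension_fields(fde: bytes):
    """Return (header_length, extra_headers_size, post_header_lengths)."""
    (event_length,) = struct.unpack_from("<I", fde, EVENT_LEN_OFFSET)
    if event_length < POST_HEADER_LENGTHS_OFFSET or len(fde) < event_length:
        raise ValueError("truncated format description event")
    header_length = fde[HEADER_LENGTH_OFFSET]
    if header_length < BASE_HEADER_LEN:
        raise ValueError("header_length smaller than 19")
    extra_headers_size = header_length - BASE_HEADER_LEN
    lengths = fde[POST_HEADER_LENGTHS_OFFSET:event_length]
    # The array starts at START_EVENT_V3 (type code 1); there is no entry
    # for UNKNOWN_EVENT (type code 0), so numbering starts at 1.
    post_header_lengths = {code: n for code, n in enumerate(lengths, start=1)}
    return header_length, extra_headers_size, post_header_lengths
```

Keying the lengths by type code avoids the off-by-one that a plain array index would invite, since array position i describes type code i + 1.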
The information in this section describes how to determine the format in which any given binary log file is written.
Some important points about descriptor event formats:
The v1 header fields are common to all formats. (v3 and v4
headers begin with the v1 header fields, and add
next_position and
flags fields.)
The v3 and v4 headers contain the same fields. The data part for v3 and v4 differs, such that the v4 data part enables extensions to the format without having to modify the header.
It would be possible to ascertain the binary log version
simply by reading the two binlog_version
bytes, were it not for the fact that these bytes occur at a
different position in v1 compared to v3/v4 (position 13
versus 19). Therefore, it's necessary to determine whether
the first event in a file represents a v1-format start
event.
To determine the version of a binary log file, use the following procedure:
1) The file begins with a 4-byte magic number. Skip over that to get to the first event in the file (which in most cases is a start event or format description event).
2) From the first event, read two values:
The 1-byte type code at position
EVENT_TYPE_OFFSET (4) within the event.
The 4-byte event length at position
EVENT_LEN_OFFSET (9) within the event.
3) If the type code is not START_EVENT_V3 or
FORMAT_DESCRIPTION_EVENT, the file format is
v3. (See Exceptional Condition 1 later in this section.)
4) If the type code is START_EVENT_V3 (1),
check the event length. If the length is less than 75, the file
format is v1, and v3 otherwise. Why the value 75? Because that
is the length of a v3 start event:
header (19 bytes)
binlog version (2 bytes)
server version (ST_SERVER_VER_LEN = 50
bytes)
timestamp (4 bytes)
Summing those lengths yields 19 + 2 + 50 + 4 = 75.
Therefore, if the event is shorter than 75 bytes, it must come from a v1 file, whose 69-byte start event is shorter than the 75-byte start event of a v3 file.
5) If the type code is
FORMAT_DESCRIPTION_EVENT (15), the file
format is v4.
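The procedure above can be sketched in Python as follows. The function operates on the first bytes of a file held in memory; its name is hypothetical, and little-endian byte order is an assumption. The magic number is skipped without being validated, as in step 1.

```python
import struct

START_EVENT_V3 = 1
FORMAT_DESCRIPTION_EVENT = 15
EVENT_TYPE_OFFSET = 4
EVENT_LEN_OFFSET = 9
ST_SERVER_VER_LEN = 50
MAGIC_LEN = 4
# header (19) + binlog version (2) + server version (50) + timestamp (4) = 75
V3_START_EVENT_LEN = 19 + 2 + ST_SERVER_VER_LEN + 4


def detect_binlog_version(file_head: bytes) -> int:
    """Return 1, 3, or 4 for the binary log whose first bytes are given."""
    # Step 1: skip the magic number to reach the first event.
    event = file_head[MAGIC_LEN:]
    if len(event) < EVENT_LEN_OFFSET + 4:
        raise ValueError("not enough data to read the first event header")
    # Step 2: read the type code and the event length.
    type_code = event[EVENT_TYPE_OFFSET]
    (event_length,) = struct.unpack_from("<I", event, EVENT_LEN_OFFSET)
    # Step 3: any other first event implies v3 (Exceptional Condition 1).
    if type_code not in (START_EVENT_V3, FORMAT_DESCRIPTION_EVENT):
        return 3
    # Step 4: a start event shorter than 75 bytes can only be v1.
    if type_code == START_EVENT_V3:
        return 1 if event_length < V3_START_EVENT_LEN else 3
    # Step 5: a format description event means v4.
    return 4
```

Only 17 bytes of the file are needed (the magic number plus the first 13 bytes of the event), so a caller could pass a short prefix rather than the whole file.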
The preceding steps describe the general binary log format-recognition principles. However, there are some exceptional conditions that must be accounted for:
Exceptional Condition 1: In MySQL 4.0 and 4.1, the initial event
in a binary log file might not be a start event. This occurs
because the server writes the start event only to the first
binary log file that it creates after startup. For subsequent
files, the server writes an event of type
ROTATE_EVENT to the end of the current log
file, closes it, and then begins the next file without writing a
start event to it. If a log file begins with an event that is
not START_EVENT_V3 or
FORMAT_DESCRIPTION_EVENT, it can be assumed
to be a v3 file because this behavior occurs only in MySQL 4.0
and 4.1, and all servers in those versions use v3 format.
Exceptional Condition 2: In MySQL 5.1 and 5.2, several early versions wrote binary log files using v4 format, but using different event numbers from those currently used in v4. Therefore, when the FDE is read and discovered to be v4, it is also necessary to read the server version, which is a string that occurs at position 21. If the version is one of those in the set of affected versions, event renumbering occurs such that events read from the file are mapped onto the current v4 event numbering.
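The server version check for Exceptional Condition 2 might look like the sketch below. This section does not list the affected versions, so the function takes them as a parameter; the function name, the exact-match comparison, and the NUL padding of the version string are all assumptions made for illustration, not a description of the server's actual test.

```python
SERVER_VERSION_OFFSET = 21
ST_SERVER_VER_LEN = 50


def needs_event_renumbering(fde: bytes, affected_versions) -> bool:
    """Report whether the FDE's server version is in the caller-supplied set.

    affected_versions is a collection of version strings that the caller
    must supply; this sketch does not know which versions are affected.
    """
    raw = fde[SERVER_VERSION_OFFSET:SERVER_VERSION_OFFSET + ST_SERVER_VER_LEN]
    # Assumption: the field is NUL-padded after the version string.
    version = raw.split(b"\0", 1)[0].decode("ascii", "replace")
    return version in affected_versions
```

A reader that gets True back would then map event type codes from the file onto the current v4 numbering before interpreting them.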
To enable any future binary log formats to be correctly understood, the following conventions must hold:
a) The binary log file must start with a descriptor event
b) The descriptor event must start with a v3 header (19 bytes)
c) The 2 bytes following the header (at position 19) must contain the binary log format version number
With respect to the current formats, only a) holds for v1.
However, as indicated earlier, v1-format files can be recognized
from the initial event in the file, by a type code of
START_EVENT_V3 and an event length less than
75.
The v4 format description event is designed so that it can handle future format updates. A new format with the same layout of event packets as in v4 but with additional fields in the header and post-headers can use this format description event to correctly describe itself. Actually, it is (theoretically) possible to have different "flavors" of v4 format that have different (larger) header lengths and even a different number of events.
The current code is written to handle this possibility. That is, any code that parses a binary log and discovers that it is v4 uses the header lengths as given by the format description event (thus potentially different lengths from the values hard-wired in the server code).
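To illustrate what "uses the header lengths as given by the format description event" means for a reader, the following sketch splits an event into header, post-header, and remaining data using lengths taken from the FDE rather than constants. It is not the server's code; the function name and argument shapes are hypothetical.

```python
FORMAT_DESCRIPTION_EVENT = 15
EVENT_TYPE_OFFSET = 4
FDE_HEADER_LEN = 19  # the FDE's own header never carries extra_headers


def split_event(event: bytes, header_length: int, post_header_lengths: bytes):
    """Split an event into (header, post_header, remaining_data).

    header_length and post_header_lengths are the values read from the
    file's format description event.
    """
    type_code = event[EVENT_TYPE_OFFSET]
    if not 1 <= type_code <= len(post_header_lengths):
        raise ValueError("event type %d not described by the FDE" % type_code)
    # The FDE itself always has a 19-byte header; other events use the
    # header length announced by the FDE.
    if type_code == FORMAT_DESCRIPTION_EVENT:
        header_size = FDE_HEADER_LEN
    else:
        header_size = header_length
    # The array has no entry for type code 0, hence the "- 1".
    post_size = post_header_lengths[type_code - 1]
    header = event[:header_size]
    post_header = event[header_size:header_size + post_size]
    remaining = event[header_size + post_size:]
    return header, post_header, remaining
```

A file announcing a header length of 21, for example, would yield 21-byte headers for ordinary events from the same function without any code change.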
Note: Although headers of events in v4 format can be longer than
19 bytes, the format description event is an exception. Its
header is always 19 bytes long to meet the preceding backward
compatibility requirements. That is, the
FORMAT_DESCRIPTION_EVENT does not include an
extra_headers field.
