MySQL 8.0.2 introduced a small yet powerful per transaction metadata information containing the transaction length in binary logs.
MySQL binary logs are being used for many other things than MySQL replication or backup/recovery: replicate to Hadoop; replicate to other databases, such as Oracle; capture data change (CDC) and extract-transform-load (ETL); record change notification for cache invalidation; change tracking for differential backups; etc.
A quick search about “binlog” into Github listed entries about projects parsing MySQL binary log event streams in many programming languages: Ruby, PHP, Python, NodeJS, Java, Golang, etc.
Identifying that the stream reached the end of a transaction
Parsing transactions when reading from a binary log event stream (either from a file or directly from a replication connection to a MySQL server) requires a good knowledge of how transactions can be represented in order to correctly identify the transactions boundaries. This is a use case for tools wanting to give some acknowledgment when receiving (and processing) a full transaction: the tool is in the beginning of a transaction in a binary log event stream and will take an action when reaching the end of the transaction.
A transaction can be represented in many ways in a binary log event stream.
A DDL will start with a GTID event, followed by zero or more pre-statement events, followed by a single QUERY event. Pre-statement events are INTVAR, RAND and USER_VAR events.
A DML will start with a GTID event, followed by a QUERY(BEGIN) event, followed by sets of either QUERY events (with their own pre-statement events) or TABLE_MAP and ROWS events, followed by a QUERY(COMMIT|ROLLBACK) or a XID event.
A XA DML will start with a GTID event, followed by a QUERY(XA START) event, followed by sets of either QUERY events (with their own pre-statement events) or TABLE_MAP and ROWS events, followed by a QUERY(XA END) event, followed by a XA_PREPARE event.
Not putting all the cases here, the problem with this logic is that we need a parser to detect that a given event was the last one of a transaction.
MySQL 5.7 introduced a Transaction_boundary_parser to perform this parsing as accurately as possible. It solved the problem, but it requires processing (at least the header) of all events belonging to the transaction.
A tool could try to be “smart” by assuming that once it receives a new GTID event it means that the previous event was the last of the previous transaction, but this will relate a transaction acknowledge to receiving more that the transaction content itself (how to handle the last transaction of a stream?).
Since MySQL 8.0.2 the GTID event carry a new field: the transaction_length
. Its value represents the full transaction size in bytes, from the beginning of the GTID event itself up to the end of the last event of the transaction.
The transaction_length
field can use 1 to 9 bytes to be stored, depending on the transaction size. See the table below:
Transaction Size (in bytes) |
transaction_length field size |
---|---|
< 251 | 1 |
>= 251 and < 65536 | 3 |
>= 65536 and < 16777216 | 4 |
>= 16777216 | 9 |
With this information, a tool reading from a binary log event stream can rely on GTID’s transaction_length
field to define the amount of bytes it should receive until reaching the end of the transaction.
Extracting a given transaction
Searching for specific transactions in a file to extract it would also require a good knowledge of how transactions can be represented, and would also require to iterate over the file, event by event, to determine the transaction boundaries: a tool will first find the beginning of the transaction and then should follow the same pattern as described in Identifying that the stream reached the end of a transaction.
Until MySQL 8.0.2, the logic to extract a given transaction from a binary log file had to iterate over all binary log events until finding the GTID of the desired transaction, and then, had to continue iterating but now parsing the events until finding the end of the transaction.
Since MySQL 8.0.2, the logic is more straightforward (and more efficient too). A tool can jump between GTIDs without reading intermediate events by using the GTID transaction_length
field. And once reaching the desired GTID, the tool can copy the transaction_length
bytes from the beginning of the transaction to extract the whole transaction.
mysqlbinlog output in MySQL 8.0
Let’s take a look at a binary log dump of MySQL 8.0.2:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
|
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #171018 22:31:36 server id 1 end_log_pos 124 CRC32 0xad2659f5 Start: binlog v 4, server v 8.0.3-rc-log created 171018 22:31:36 at startup ROLLBACK/*!*/; BINLOG ' OMjnWQ8BAAAAeAAAAHwAAAAAAAQAOC4wLjMtcmMtbG9nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAA4yOdZEwANAAgAAAAABAAEAAAAYAAEGggAAAAICAgCAAAACgoKKioAEjQA CgH1WSat '/*!*/; # at 124 #171018 22:31:36 server id 1 end_log_pos 155 CRC32 0x6cbd7706 Previous-GTIDs # [empty] ... # at 547 #171018 22:31:42 server id 1 end_log_pos 622 CRC32 0xe08cb684 GTID last_committed=2 sequence_number=3 rbr_only=yes original_committed_timestamp=1508362302617078 immediate_commit_timestamp=1508362302617078 transaction_length=285 /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; # original_commit_timestamp=1508362302617078 (2017-10-18 22:31:42.617078 WEST) # immediate_commit_timestamp=1508362302617078 (2017-10-18 22:31:42.617078 WEST) /*!80001 SET @@session.original_commit_timestamp=1508362302617078*//*!*/; SET @@SESSION.GTID_NEXT= 'b258feab-b44b-11e7-9839-e4b318a30e85:3'/*!*/; # at 622 #171018 22:31:42 server id 1 end_log_pos 694 CRC32 0x9ece2d81 Query thread_id=7 exec_time=0 error_code=0 SET TIMESTAMP=1508362302/*!*/; BEGIN /*!*/; # at 694 #171018 22:31:42 server id 1 end_log_pos 749 CRC32 0x46a7fa42 Table_map: `test`.`t1` mapped to number 67 # at 749 #171018 22:31:42 server id 1 end_log_pos 801 CRC32 0xc5f325fe Write_rows: table id 67 flags: STMT_END_F BINLOG ' PsjnWRMBAAAANwAAAO0CAAAAAEMAAAAAAAEABHRlc3QAAnQxAAID/AECAgEBAAID/P8AQvqnRg== PsjnWR4BAAAANAAAACEDAAAAAEMAAAAAAAEAAgAC/wABAAAACgBhYWFhYWFhYWFh/iXzxQ== '/*!*/; ### INSERT INTO `test`.`t1` ### SET ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2='aaaaaaaaaa' /* BLOB/TEXT meta=2 nullable=1 is_null=0 */ # at 801 #171018 22:31:42 server id 1 end_log_pos 832 CRC32 0x0a56f85e Xid = 9 COMMIT/*!*/; # at 832 #171018 22:31:42 server id 1 end_log_pos 907 CRC32 0x3512a41f GTID last_committed=3 sequence_number=4 rbr_only=yes original_committed_timestamp=1508362302626108 immediate_commit_timestamp=1508362302626108 transaction_length=295 |
We notice that the GTID event has more information than in 5.7. Two new fields (original and immediate commit timestamps) were introduced by the work described in “New monitoring replication features and more!”. We notice also the transaction_length
field (with 285 bytes for the transaction ‘:3’). Yes! This is the new GTID event transaction_length
field!!!
Note: the MySQL 8 mysqbinlog
client tool will display 0 as transaction length for the GTID events without the information (i.e. from MySQL 5.7 binary log files).
Conclusion
Tools working with MySQL 8.0 binary log events streams relying on new GTID’s transaction_length
field can accomplish their work more efficiently without the labor of parsing binary log events just to know the proper transaction boundaries.
Using MySQL 8 binary logs, a tool that needs to find transaction boundaries can now just read the length from the GTID event. Before, the tool had to read each event of the transaction and maintain a state machine to parse the transaction structure.
Using MySQL 8 binary logs, a tool that seeks to a GTID now just needs to read each GTID event, and can quickly jump directly to the next GTID event using the length information. Before, the tool had to read the header of each event within the transactions.