MySQL is not only a relational database, but can also be used as a sch
emaless/NOSQL document store, or a mix of both. This is realized by the JSON datatype, and is useful for any data that is hard to fit in the ”tabular” format of a traditional table. For instance, the inventory of a shop for gardening equipment might contain rows for various types of gloves, as well as rows for wheelbarrows. The glove rows might have size, material, and waterproofness properties, whereas rows for wheelbarrows might have capacity properties measured in both litres and kilograms. So not only the values but also the mere existence of those metadata fields differs between rows in the table. Such fields are conveniently stored inside a JSON column.

In MySQL 8.0.3, we made the replication of JSON much more efficient. The new feature, called Partial JSON Replication, ensures that an UPDATE statement that only modifies parts of a JSON document will not write the full document to the binary log, but rather only those parts that were actually changed. Thus, the bigger your JSON document is, and the smaller part you modify, the more space will you save by using this feature. This reduces the disk usage for binary logs and relay logs, as well as the amount of data that needs to be sent over the network. Therefore, the throughput of the entire system can improve so that you get more transactions per second for JSON-heavy workloads.
Requirements
The feature works when the UPDATE for the JSON column consists of the following JSON functions: JSON_SET, JSON_REPLACE, JSON_REMOVE. It also works for combinations of these functions. Here is an example:
|
1
2
3
4
5
|
-- Change capacity to 25 kg, and add a color attribute. UPDATE inventory SET data = JSON_SET(JSON_REPLACE(data, '$.capacity_kg', 25), '$.color', 'red') WHERE name = 'Wheelbarrow model X'; |
These JSON functions operate on a JSON document. The JSON document must be taken from the same column and row that is being updated. In other words, this does not work for copying values between different columns.
This is implemented for row-based replication (binlog_format=row); for statement-based replication such statements were already replicated efficiently.
Enabling the feature
The feature is disabled by default, in order to keep backward compatibility with any application that relies on having the full document in the binary log. To enable the feature, set this configuration option:
|
1 |
--binlog-row-value-options=PARTIAL_JSON |
You can of course set this dynamically, too, using SET GLOBAL binlog_row_value_options = PARTIAL_JSON (or SET SESSION or SET PERSIST depending on your needs). In all cases it requires the SYSTEM_VARIABLES_ADMIN or SUPER privilege.
In addition, note that by default all columns are included in row updates. An update consists of both a before_image (the row before the update) and an after_image (the row after the update). Partial JSON only applies to the after image, so this option alone can only remove half the full documents. To get rid of the other half, consider using this option together with --binlog-row-image=MINIMAL (or SET GLOBAL binlog_row_image = MINIMAL, or SET SESSION, or SET PERSIST). This ensures that full documents are not sent even in the before_image.
More human-readable mysqlbinlog
As part of this work, we also made mysqlbinlog print both full JSON documents and partial JSON updates correctly, when using the -v option to decode row events. This makes the output more human-readable (applying the binlog on a server always worked). Here is an example:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
$ mysqlbinlog -v /path/to/binlog.000016 […] # at 732 (0x2dc) #171010 20:10:11 server id 1 end_log_pos 797 CRC32 0x1965bb0f Table_map: `shop`.`inventory` mapped to number 87 # at 797 (0x31d) #171010 20:10:11 server id 1 end_log_pos 910 CRC32 0xdd58ff86 Write_rows: table id 87 flags: STMT_END_F BINLOG ' Aw3dWRMBAAAAQQAAAB0DAAAAAFcAAAAAAAEABHNob3AACWludmVudG9yeQAEA/wD9QICBA4BAQAC A/z/AA+7ZRk= Aw3dWR4BAAAAcQAAAI4DAAAAAFcAAAAAAAEAAgAE/wAvAAAAEwBXaGVlbGJhcnJvdyBtb2RlbCBY HAAAACwAAAAAAgArABIACgAcAAsADCcABcgAY2FwYWNpdHlfbGNhcGFjaXR5X2tnAzE0MIb/WN0= '/*!*/; ### INSERT INTO `shop`.`inventory` ### SET ### @1=47 ### @2='Wheelbarrow model X' ### @3=28 ### @4='{"capacity_l": "140", "capacity_kg": 200}' […] # at 1455 (0x5af) #171010 20:10:11 server id 1 end_log_pos 1520 CRC32 0x26f53e42 Table_map: `shop`.`inventory` mapped to number 87 # at 1520 (0x5f0) #171010 20:10:11 server id 1 end_log_pos 1602 CRC32 0x1936a04f Update_rows_partial: table id 87 flags: STMT_END_F BINLOG ' Aw3dWRMBAAAAQQAAAPAFAAAAAFcAAAAAAAEABHNob3AACWludmVudG9yeQAEA/wD9QICBA4BAQAC A/z/AEI+9SY= Aw3dWScBAAAAUgAAAEIGAAAAAFcAAAAAAAEAAgAEAQgALwAAAAEBACIAAAAADSQuY2FwYWNpdHlf a2cDBRkAAQckLmNvbG9yBQwDcmVkT6A2GQ== '/*!*/; ### UPDATE `shop`.`inventory` ### WHERE ### @1=47 ### SET ### @4=JSON_INSERT( ### JSON_REPLACE(@4, '$.capacity_kg', 25), ### '$.color', 'red') |
Low-level format
If you are into the bits and bytes of the binary log, here is a quick overview of the format (most users do not need to know this):
- When
binlog_row_value_options=PARTIAL_JSON, we use a new event type code,PARTIAL_UPDATE_ROWS_EVENT, with the numeric value 39. - At the heart of any row update in the binary log are the
before_imageand theafter_image. Thebefore_imagecontains values for columns that the applier can use to find the row in the table (e.g. the primary key), and theafter_imagecontains new values that will be written to the row. For events with the type codePARTIAL_UPDATE_ROWS_EVENT, we additionally store a so-calledshared_image, between thebefore_imageand theafter_image. Theshared_imagehas one of the following structures:

or

wherevalue_optionsis a single byte with the value 0 or 1, andpartial_columnsis a bitmap having one bit for each JSON column. Whenvalue_options=0, all JSON columns are stored in full format. Whenvalue_options=1, all JSON columns included in the before_image are still stored in full format, whereas JSON columns included in the after_image may be stored in either partial format or full format. The nth bit ofpartial_columnsis 1 if the nth JSON column in the after_image is stored in partial format, and 0 if stored in full format. (We count bits starting at the least significant bit of the first byte.) - The format for JSON columns stored in full format is the same as in 5.7 and earlier:

wherelengthis the length of data in bytes in 32 bit little endian format, anddatais a binary encoding of JSON. - The format for JSON columns in partial format is:

wherelengthis total the length of all thediffifields, in bytes in 32 bit little endian format, and eachdiffifield has one of the following forms:

or

where:-
operationis a 1 byte constant. It is 0 when the object atpathwill be replaced bydata; 1 whendatawill be inserted atpathwithout replacing existing objects; and 2 when the object atpathwill be removed. -
path_lengthis the length in bytes ofpath, anddata_lengthis the length in bytes ofdata. These use a variable-length integer encoding which occupies 1, 3, 4, or 9 bytes (see the source code inmysys/pack.cc). -
pathis the JSON path, as a string in utf8mb4 format (no null termination). -
datais the JSON document that will be inserted, in binary JSON format.
-
In case the partial format would take more space than the full document, it will fall back on storing the full document instead.
See also WL#2955 for details.
Summary
When you update only a small part of a JSON document using JSON_SET, JSON_REMOVE, or JSON_REPLACE, we now replicate only that small part, and not the remaining, unchanged parts of the document, if you have previously set binlog_row_value_options=PARTIAL_JSON. This saves disk space.