MySQL is not only a relational database, but can also be used as a schemaless/NOSQL document store, or a mix of both. This is realized by the JSON datatype, and is useful for any data that is hard to fit in the ”tabular” format of a traditional table. For instance, the inventory of a shop for gardening equipment might contain rows for various types of gloves, as well as rows for wheelbarrows. The glove rows might have size, material, and waterproofness properties, whereas rows for wheelbarrows might have capacity properties measured in both litres and kilograms. So not only the values but also the mere existence of those metadata fields differs between rows in the table. Such fields are conveniently stored inside a JSON column.
In MySQL 8.0.3, we made the replication of JSON much more efficient. The new feature, called Partial JSON Replication, ensures that an UPDATE
statement that only modifies parts of a JSON document will not write the full document to the binary log, but rather only those parts that were actually changed. Thus, the bigger your JSON document is, and the smaller part you modify, the more space will you save by using this feature. This reduces the disk usage for binary logs and relay logs, as well as the amount of data that needs to be sent over the network. Therefore, the throughput of the entire system can improve so that you get more transactions per second for JSON-heavy workloads.
Requirements
The feature works when the UPDATE
for the JSON column consists of the following JSON functions: JSON_SET
, JSON_REPLACE
, JSON_REMOVE
. It also works for combinations of these functions. Here is an example:
1
2
3
4
5
|
-- Change capacity to 25 kg, and add a color attribute. UPDATE inventory SET data = JSON_SET(JSON_REPLACE(data, '$.capacity_kg', 25), '$.color', 'red') WHERE name = 'Wheelbarrow model X'; |
These JSON functions operate on a JSON document. The JSON document must be taken from the same column and row that is being updated. In other words, this does not work for copying values between different columns.
This is implemented for row-based replication (binlog_format=row
); for statement-based replication such statements were already replicated efficiently.
Enabling the feature
The feature is disabled by default, in order to keep backward compatibility with any application that relies on having the full document in the binary log. To enable the feature, set this configuration option:
1 |
--binlog-row-value-options=PARTIAL_JSON |
You can of course set this dynamically, too, using SET GLOBAL binlog_row_value_options = PARTIAL_JSON
(or SET SESSION
or SET PERSIST
depending on your needs). In all cases it requires the SYSTEM_VARIABLES_ADMIN
or SUPER
privilege.
In addition, note that by default all columns are included in row updates. An update consists of both a before_image (the row before the update) and an after_image (the row after the update). Partial JSON only applies to the after image, so this option alone can only remove half the full documents. To get rid of the other half, consider using this option together with --binlog-row-image=MINIMAL
(or SET GLOBAL binlog_row_image = MINIMAL
, or SET SESSION
, or SET PERSIST
). This ensures that full documents are not sent even in the before_image.
More human-readable mysqlbinlog
As part of this work, we also made mysqlbinlog
print both full JSON documents and partial JSON updates correctly, when using the -v
option to decode row events. This makes the output more human-readable (applying the binlog on a server always worked). Here is an example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
$ mysqlbinlog -v /path/to/binlog.000016 […] # at 732 (0x2dc) #171010 20:10:11 server id 1 end_log_pos 797 CRC32 0x1965bb0f Table_map: `shop`.`inventory` mapped to number 87 # at 797 (0x31d) #171010 20:10:11 server id 1 end_log_pos 910 CRC32 0xdd58ff86 Write_rows: table id 87 flags: STMT_END_F BINLOG ' Aw3dWRMBAAAAQQAAAB0DAAAAAFcAAAAAAAEABHNob3AACWludmVudG9yeQAEA/wD9QICBA4BAQAC A/z/AA+7ZRk= Aw3dWR4BAAAAcQAAAI4DAAAAAFcAAAAAAAEAAgAE/wAvAAAAEwBXaGVlbGJhcnJvdyBtb2RlbCBY HAAAACwAAAAAAgArABIACgAcAAsADCcABcgAY2FwYWNpdHlfbGNhcGFjaXR5X2tnAzE0MIb/WN0= '/*!*/; ### INSERT INTO `shop`.`inventory` ### SET ### @1=47 ### @2='Wheelbarrow model X' ### @3=28 ### @4='{"capacity_l": "140", "capacity_kg": 200}' […] # at 1455 (0x5af) #171010 20:10:11 server id 1 end_log_pos 1520 CRC32 0x26f53e42 Table_map: `shop`.`inventory` mapped to number 87 # at 1520 (0x5f0) #171010 20:10:11 server id 1 end_log_pos 1602 CRC32 0x1936a04f Update_rows_partial: table id 87 flags: STMT_END_F BINLOG ' Aw3dWRMBAAAAQQAAAPAFAAAAAFcAAAAAAAEABHNob3AACWludmVudG9yeQAEA/wD9QICBA4BAQAC A/z/AEI+9SY= Aw3dWScBAAAAUgAAAEIGAAAAAFcAAAAAAAEAAgAEAQgALwAAAAEBACIAAAAADSQuY2FwYWNpdHlf a2cDBRkAAQckLmNvbG9yBQwDcmVkT6A2GQ== '/*!*/; ### UPDATE `shop`.`inventory` ### WHERE ### @1=47 ### SET ### @4=JSON_INSERT( ### JSON_REPLACE(@4, '$.capacity_kg', 25), ### '$.color', 'red') |
Low-level format
If you are into the bits and bytes of the binary log, here is a quick overview of the format (most users do not need to know this):
- When
binlog_row_value_options=PARTIAL_JSON
, we use a new event type code,PARTIAL_UPDATE_ROWS_EVENT
, with the numeric value 39. - At the heart of any row update in the binary log are the
before_image
and theafter_image
. Thebefore_image
contains values for columns that the applier can use to find the row in the table (e.g. the primary key), and theafter_image
contains new values that will be written to the row. For events with the type codePARTIAL_UPDATE_ROWS_EVENT
, we additionally store a so-calledshared_image
, between thebefore_image
and theafter_image
. Theshared_image
has one of the following structures:
or
wherevalue_options
is a single byte with the value 0 or 1, andpartial_columns
is a bitmap having one bit for each JSON column. Whenvalue_options
=0, all JSON columns are stored in full format. Whenvalue_options
=1, all JSON columns included in the before_image are still stored in full format, whereas JSON columns included in the after_image may be stored in either partial format or full format. The nth bit ofpartial_columns
is 1 if the nth JSON column in the after_image is stored in partial format, and 0 if stored in full format. (We count bits starting at the least significant bit of the first byte.) - The format for JSON columns stored in full format is the same as in 5.7 and earlier:
wherelength
is the length of data in bytes in 32 bit little endian format, anddata
is a binary encoding of JSON. - The format for JSON columns in partial format is:
wherelength
is total the length of all thediffi
fields, in bytes in 32 bit little endian format, and eachdiffi
field has one of the following forms:
or
where:-
operation
is a 1 byte constant. It is 0 when the object atpath
will be replaced bydata
; 1 whendata
will be inserted atpath
without replacing existing objects; and 2 when the object atpath
will be removed. -
path_length
is the length in bytes ofpath
, anddata_length
is the length in bytes ofdata
. These use a variable-length integer encoding which occupies 1, 3, 4, or 9 bytes (see the source code inmysys/pack.cc
). -
path
is the JSON path, as a string in utf8mb4 format (no null termination). -
data
is the JSON document that will be inserted, in binary JSON format.
-
In case the partial format would take more space than the full document, it will fall back on storing the full document instead.
See also WL#2955 for details.
Summary
When you update only a small part of a JSON document using JSON_SET
, JSON_REMOVE
, or JSON_REPLACE
, we now replicate only that small part, and not the remaining, unchanged parts of the document, if you have previously set binlog_row_value_options=PARTIAL_JSON
. This saves disk space.