WL#2955: RBR replication of partial JSON updates

Affects: Server-8.0   —   Status: Complete

==== Executive Summary ====

MySQL shall replicate small updates to big JSON documents more
space-efficiently.  More precisely, when using row-based replication
(RBR), we will write only the modified parts of JSON documents to the
binary log, instead of the whole JSON document.
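
To give a rough feel for the saving, the sketch below compares the
bytes needed to log a whole document with the bytes needed to log only
the modified part.  It is an illustration only: compact JSON text
stands in for the real storage format, and the (op, path, value) diff
layout and both function names are assumptions made for this example,
not the binary log format specified by this worklog.

```python
import json


def full_image_size(doc):
    """Bytes to log the whole document (compact JSON text as a stand-in)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def partial_image_size(path, new_value):
    """Bytes to log only one modified part.

    The (op, path, value) layout is hypothetical, chosen for illustration.
    """
    diff = {"op": "replace", "path": path, "value": new_value}
    return len(json.dumps(diff, separators=(",", ":")).encode("utf-8"))


# A big document in which only one small field changes.
doc = {"id": 1, "status": "new", "payload": "x" * 10_000}

full = full_image_size(doc)
partial = partial_image_size("$.status", "done")

print("full image bytes:   ", full)
print("partial image bytes:", partial)
```

The size of the partial image depends only on the path and the new
value, so the gap widens as the unchanged part of the document grows.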

==== User stories ====

U1. As a DBA, I do partial updates to JSON values, and I don't use
    replication. Then I enable (row-based) replication. That causes
    performance to drop and disk usage to grow. The reason is that
    the full JSON document is written to the binary log and replicated
    to slaves.  (Without replication, the server did not write full
    JSON documents, because InnoDB supports partial JSON updates.)

    So enabling replication may cause a performance degradation.
    The effect will be reduced when we replicate partial JSON documents
    and the user sets binlog_row_image=MINIMAL.

U2. As a DBA, I do partial updates to JSON documents, and I use
    statement-based replication. Then I switch to row-based
    replication. Then performance drops and the binary logs grow. The
    reason is that the full JSON document is written to the binary log
    and replicated to slaves. (Such writes of full JSON documents were
    not done by SBR.)

    So switching from SBR to RBR may cause a performance degradation.
    The effect will be reduced if we replicate partial JSON documents
    and the user sets binlog_row_image=MINIMAL.

U3. As a DBA, I use row-based replication, and I don't use JSON. Then
    I start to use JSON and do partial updates. Then my slave starts to
    fall behind / lag. The reason is that the slave needs to hand the
    full JSON document to the engine, as well as handle the full JSON
    document a couple of extra times in the replication pipeline. (The
    master makes the partial update more efficiently.)

    So doing partial JSON updates can cause slave lag.
    The effect will be reduced if we replicate partial JSON documents
    and the user sets binlog_row_image=MINIMAL.

U4. As a DBA, I would like to use both replication and partial updates
    to JSON documents, but I cannot do that in 5.7 because it is too
    slow. I upgrade to 8.0 because it is supposed to optimize partial
    JSON updates. I am disappointed because this does not help
    much. (The slave is equally slow, and the master is not much
    faster because it writes full JSON documents to the binary log.)

    So the InnoDB optimization in 8.0 did not help much.  The positive
    effect will be more visible if we replicate partial JSON documents.

U5. As a power user, I have various scripts that mine the binary log,
    extract the after-image, and do interesting things. I do not want
    to update my scripts, and/or my scripts can only work if the full
    after-image is there.

    So partial JSON replication should be optional.

U6. Both the optimizer and NDB are interested in using binary diff
    algorithms to implement partial blob replication for blob updates.
    This will make blob updates more efficient in InnoDB and NDB,
    respectively (for roughly the same reasons as explained in U1-U4).

    So the user interface and the binary log format should be extensible
    so that we can add support for other data types in the future.