WL#11328: InnoDB: Optimizing Small Changes to BLOBs

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

This is a follow up worklog for WL#8960.  In WL#8960, the minimum change to a
BLOB is a single LOB page.  Even if only a few bytes are modified, minimum one
LOB page will be modified.  So, there is scope for improvement for small changes
to the BLOBs.

The solution provided by WL#8960 is a general solution suitable for all sizes of
BLOBs and for all size of changes (from few bytes to even 1GB of changes).  But
it is less efficient for small changes done to the BLOBs.  Also, in the case of
WL#8960, the old versions of BLOBs are stored in the BLOB itself.  Undo log
format is not changed.

To optimize small changes to the BLOBs, we plan to do the regular undo logging.
 For this the undo log format needs to be changed.  When a BLOB is
modified/updated, then we need to store the old and new portion of the BLOB in
the undo log record.  Currently there is a restriction that the undo log record
must fit within an undo log page.  We need to perform our operation within that
constraint. 
This WL is for performance improvement. There are no functional requirements. 
The performance must improve for the targeted use case.  For other scenarios,
the performance remains unaffected.  
Design #1: Extending the UNDO RECORD for update operation
---------------------------------------------------------

Changing the undo log record format.  In the current undo log record format,
there is 1 byte of type_cmpl flag.  In this flag 1 bit is available for use.  I
am introducing TRX_UNDO_MODIFY_BLOB to represent the bit.

In all the older versions of datafiles, this bit will be 0.  If this bit is set
to 1, then I am introducing one more 1 byte flag next to the type_cmpl flag.
This new flag can be used for future extensions.  Having the
TRX_UNDO_MODIFY_BLOB bit set also means that the undo log record has provisions
to represent partial update of BLOBs.

Refer this diagram for undo record format of update operation:

https://github.com/jeremycole/innodb_diagrams/blob/master/images/InnoDB_Structures/Undo%20Record%20for%20Update.png

Design #2: Define Small Change to BLOB
--------------------------------------

The goal of this worklog is to improve the performance of small changes to
BLOBs.  What do we mean by small changes?  Our undo log record has limitations
in its length.  It cannot be bigger than the undo page size (or more correctly
the payload of the undo page).  So bigger changes to BLOBs cannot be stored in
the undo log.  I am defining a const ref_t::LOB_SMALL_CHANGE_THRESHOLD to
identify small changes to BLOBs.  If the length of modification (in bytes) is
less than or equal to this threshold, then it is considered as a small change
to BLOB.  Currently I have set the threshold to 100 bytes.  But it should be
trivial to change it to higher values.

Design #3: Accessing older versions of LOB
------------------------------------------

[Note - When an LOB is modified with undo logging (the optimization by
WL#11328), then the LOB version number stored in the LOB will not be
changed.]

If the small changes to the BLOB are stored in the undo log, then building
previous versions of clustered index record (along with the correct version
of the LOB) needs to be modified.  Let us look at it here:

Current Algorithm for Looking at Versions of clust_rec:
[Note: Refer to function trx_undo_prev_version_build()]:

1. Let clust_rec point to the latest clustered index record.
2. Using rollptr obtain the undo log record.
3. Construct the update vector from undo log record.
4. Using clust_rec and update vector, build older version of clustered
   index record.
5. Let clust_rec point to this version of clustered index record.
6. Check if clust_rec is the version needed.
   If yes, goto (7), otherwise goto (2).
7. Now fetch the BLOBs for clust_rec.

This will be changed to the following:

OPTION: Fetching Many Versions of LOB
-------------------------------------

1. Let clust_rec point to the latest clustered index record, and
   let ext point to the list of BLOBs in clust_rec.
2. Using rollptr obtain the undo log record.
3. Construct the update vector from undo log record.
4. Using clust_rec and update vector, build older version of clustered
   index record.  If the LOB version is the same, then check if it is
   modified in update vector, and apply the update vector on the LOB.
   If LOB version is different, then discard ext and fetch the LOB of
   this version.
5. Let clust_rec point to this version of clustered index record, and
   let ext point to this version of BLOBs.
6. Check if clust_rec is the version needed.
   If yes, goto (7), otherwise goto (2).
7. Now correct version of LOBs are pointed to by ext.

As can be seen, step 4 can result in fetching each version of LOB.  If we go
this route, it is better to have some limit on LOB size for applying the
"small change optimization".

OPTION: Accumulating the Update Vector (My Preference)
------------------------------------------------------

1. Let clust_rec point to the latest clustered index record.
2. Using rollptr obtain the undo log record.
3. Construct the update vector from undo log record. 
   Save the update vector (in a queue) related to BLOBs for later use.
4. Using clust_rec and update vector, build older version of clustered
   index record.
5. Let clust_rec point to this version of clustered index record.
6. Check if clust_rec is the version needed.
   If yes, goto (7), otherwise goto (2).
7. Now fetch the BLOBs for clust_rec. Apply the update vectors matching
   the LOB version from the queue.

Design #4: Information to be stored in undo log
-----------------------------------------------

The modified data in one LOB is given in a Binary_diff_vector (vector
of Binary_diff objects).  This needs to be serialized and written to
the undo log.  Also, for each Binary_diff one or two LOB index entries
can be modified.  They also need to be saved.  Tentatively the following
can be the information stored in undo log.

1.  1 byte of flags
2.  Count of number of Binary_diff (1 byte)
  i.  LOB first page number.
  ii. LOB version number.
  iii.  Last trx id.
  iv. Last trx undo no. 
  a.  Offset where to change (4 bytes)
  b.  Length of modification (1 byte)
  c.  The old LOB data contents (max 100 bytes)
  d.  Number of LOB index entries modified (1 byte)
  e.  modifier trx id (6 bytes)
  f.  modifier trx undo number (4 bytes)
  g.  modifier trx id (6 bytes)
  h.  modifier trx undo number (4 bytes)

Increase in undo log record size for 100 bytes change -> 128 bytes per LOB.
Increase in undo log record size for 1 bytes change -> 29 bytes per LOB.