WL#11328: InnoDB: Optimizing Small Changes to BLOBs
Affects: Server-8.0
—
Status: Complete
This is a follow up worklog for WL#8960. In WL#8960, the minimum change to a BLOB is a single LOB page. Even if only a few bytes are modified, minimum one LOB page will be modified. So, there is scope for improvement for small changes to the BLOBs. The solution provided by WL#8960 is a general solution suitable for all sizes of BLOBs and for all size of changes (from few bytes to even 1GB of changes). But it is less efficient for small changes done to the BLOBs. Also, in the case of WL#8960, the old versions of BLOBs are stored in the BLOB itself. Undo log format is not changed. To optimize small changes to the BLOBs, we plan to do the regular undo logging. For this the undo log format needs to be changed. When a BLOB is modified/updated, then we need to store the old and new portion of the BLOB in the undo log record. Currently there is a restriction that the undo log record must fit within an undo log page. We need to perform our operation within that constraint.
This WL is for performance improvement. There are no functional requirements. The performance must improve for the targeted use case. For other scenarios, the performance remains unaffected.
Design #1: Extending the UNDO RECORD for update operation --------------------------------------------------------- Changing the undo log record format. In the current undo log record format, there is 1 byte of type_cmpl flag. In this flag 1 bit is available for use. I am introducing TRX_UNDO_MODIFY_BLOB to represent the bit. In all the older versions of datafiles, this bit will be 0. If this bit is set to 1, then I am introducing one more 1 byte flag next to the type_cmpl flag. This new flag can be used for future extensions. Having the TRX_UNDO_MODIFY_BLOB bit set also means that the undo log record has provisions to represent partial update of BLOBs. Refer this diagram for undo record format of update operation: https://github.com/jeremycole/innodb_diagrams/blob/master/images/InnoDB_Structures/Undo%20Record%20for%20Update.png Design #2: Define Small Change to BLOB -------------------------------------- The goal of this worklog is to improve the performance of small changes to BLOBs. What do we mean by small changes? Our undo log record has limitations in its length. It cannot be bigger than the undo page size (or more correctly the payload of the undo page). So bigger changes to BLOBs cannot be stored in the undo log. I am defining a const ref_t::LOB_SMALL_CHANGE_THRESHOLD to identify small changes to BLOBs. If the length of modification (in bytes) is less than or equal to this threshold, then it is considered as a small change to BLOB. Currently I have set the threshold to 100 bytes. But it should be trivial to change it to higher values. Design #3: Accessing older versions of LOB ------------------------------------------ [Note - When an LOB is modified with undo logging (the optimization by WL#11328), then the LOB version number stored in the LOB will not be changed.] If the small changes to the BLOB are stored in the undo log, then building previous versions of clustered index record (along with the correct version of the LOB) needs to be modified. Let us look at it here: Current Algorithm for Looking at Versions of clust_rec: [Note: Refer to function trx_undo_prev_version_build()]: 1. Let clust_rec point to the latest clustered index record. 2. Using rollptr obtain the undo log record. 3. Construct the update vector from undo log record. 4. Using clust_rec and update vector, build older version of clustered index record. 5. Let clust_rec point to this version of clustered index record. 6. Check if clust_rec is the version needed. If yes, goto (7), otherwise goto (2). 7. Now fetch the BLOBs for clust_rec. This will be changed to the following: OPTION: Fetching Many Versions of LOB ------------------------------------- 1. Let clust_rec point to the latest clustered index record, and let ext point to the list of BLOBs in clust_rec. 2. Using rollptr obtain the undo log record. 3. Construct the update vector from undo log record. 4. Using clust_rec and update vector, build older version of clustered index record. If the LOB version is the same, then check if it is modified in update vector, and apply the update vector on the LOB. If LOB version is different, then discard ext and fetch the LOB of this version. 5. Let clust_rec point to this version of clustered index record, and let ext point to this version of BLOBs. 6. Check if clust_rec is the version needed. If yes, goto (7), otherwise goto (2). 7. Now correct version of LOBs are pointed to by ext. As can be seen, step 4 can result in fetching each version of LOB. If we go this route, it is better to have some limit on LOB size for applying the "small change optimization". OPTION: Accumulating the Update Vector (My Preference) ------------------------------------------------------ 1. Let clust_rec point to the latest clustered index record. 2. Using rollptr obtain the undo log record. 3. Construct the update vector from undo log record. Save the update vector (in a queue) related to BLOBs for later use. 4. Using clust_rec and update vector, build older version of clustered index record. 5. Let clust_rec point to this version of clustered index record. 6. Check if clust_rec is the version needed. If yes, goto (7), otherwise goto (2). 7. Now fetch the BLOBs for clust_rec. Apply the update vectors matching the LOB version from the queue. Design #4: Information to be stored in undo log ----------------------------------------------- The modified data in one LOB is given in a Binary_diff_vector (vector of Binary_diff objects). This needs to be serialized and written to the undo log. Also, for each Binary_diff one or two LOB index entries can be modified. They also need to be saved. Tentatively the following can be the information stored in undo log. 1. 1 byte of flags 2. Count of number of Binary_diff (1 byte) i. LOB first page number. ii. LOB version number. iii. Last trx id. iv. Last trx undo no. a. Offset where to change (4 bytes) b. Length of modification (1 byte) c. The old LOB data contents (max 100 bytes) d. Number of LOB index entries modified (1 byte) e. modifier trx id (6 bytes) f. modifier trx undo number (4 bytes) g. modifier trx id (6 bytes) h. modifier trx undo number (4 bytes) Increase in undo log record size for 100 bytes change -> 128 bytes per LOB. Increase in undo log record size for 1 bytes change -> 29 bytes per LOB.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.