WL#3708: NdbRecord for blobs
Affects: Server-6.0
—
Status: Complete
Extend WL#2223 (NdbRecord) to also handle BLOBs.

Start by understanding the existing BLOB code in the API, to learn how to:
 - Get an API that is both convenient and efficient.
 - Get an implementation that re-uses as much of the existing BLOB code as possible / reasonable.

The initial time estimate is for doing this preliminary design phase only.
The idea is to keep the actual blob operations using the old NdbRecAttr method, but to have a clean way of obtaining the blob handle using NdbRecord.

When a primary key (or unique hash index key) NdbRecord operation is created, check whether any of the columns to be included are of blob type. If so, a blob handle is created (one for each blob column). The pointer to the blob handle is stored inside the result row, so the application programmer must reserve space for a pointer there.

The main reason for storing the blob handle in-row is to have easy access to it in NdbReceiver when receiving the ATTRINFO signal, so that the place to store the received blob head (size + inline part) can be found. But it also provides a way for the application to obtain the blob handle pointer for its own use; alternatively, we could make the getBlobHandle() method also work in the NdbRecord case (or both, not yet decided).

There will be a special case in the code in several places for blobs and NdbRecord, since the data is no longer always stored simply inline in the row with (offset, size). The issue is similar for mysqld-format bitfields, where the overflow part (any trailing fractional byte) is stored separately from the main part. We will try to minimize the performance impact of this by using a fast path / slow path approach. The special cases will be in the slow path, and there will be a column flag in NdbRecord for each. The code will be something like:

  if (!(column->flag & (BLOB_TYPE | SPLIT_BIT_TYPE))) {
    /* Fast path. */
  } else {
    /* Slow path. */
    if (column->flag & BLOB_TYPE) {
      /* ... */
    } else if (column->flag & SPLIT_BIT_TYPE) {
      /* ... */
    }
  }

Primary key and unique key operations:
 1. Will implement an NdbRecord variant of NdbBlob::atPrepare().
 2. The NdbRecord variant can read the key directly from the key row, instead of using getKeyFromTCREQ() and unpackKeyValue().

How to obtain blob handles for NdbRecord scans?
 1. When blobs are included in the result columns for scans, we need to request keyinfo.
 2. Need to upgrade lock mode LM_CommittedRead -> LM_Read (so as to read consistent blob parts).
 3. We cannot pass the blob head to the NdbBlob handle in the receiver, as we have only one blob handle but parallelism * batchsize rows. Instead we store the blob head(s) after the proper row data (like keyinfo20), and pass them to the blob handle in nextResult().
 4. Implement an NdbRecord variant of NdbBlob::atNextResult(), using NdbRecord instead of getKeyFromKEYINFO20().
 5. For scans, blob handles are created at prepare time, and nextResult() will call the NdbRecord variant of NdbBlob::atNextResult(), similar to the NdbRecAttr implementation.

Actual blob operations, reading and writing, will use the existing blob API, with the blob handles obtained from the NdbRecord methods. The reads and writes will internally create new NdbRecAttr operations, executed in the same way as for the NdbRecAttr methods.

Interface to mysqld / ha_ndbcluster.cc:

The internal row format for mysqld reserves N+sizeof(char *) bytes of space in the row for a BLOB or TEXT field, where 1<=N<=4 depending on the maximum blob length. So there is already sufficient space reserved for an NdbBlob * for the NDB API to use. After retrieving a row, the NdbBlob * must of course be overwritten with the proper mysqld length/data pointer before returning from the handler (using Field_blob::set_ptr()).

For update, there is no need for any space in the row for blobs. We will not attempt to do any actual blob data update in the NdbRecord operation (not even the inline blob head); that will be done in a separate operation using the existing NdbRecAttr-based implementation. [The existing code already uses a separate operation for the inline blob head for UPDATE; only for WRITE does it piggy-back the blob head update on the same operation.]

There are no issues with blobs in index keys, as NDB does not support indexes on blobs.
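The in-row handle idea above can be sketched in a few lines. This is a minimal standalone model, not the real NDB API: NdbBlob here is a stub, and the helper names (storeBlobHandle, readBlobHandle) are hypothetical. It shows the contract the design imposes on the application: reserve sizeof(NdbBlob *) at the blob column's offset, let the API store the handle pointer there, and read it back out (before overwriting it, as mysqld does via Field_blob::set_ptr()).

```cpp
#include <cstddef>
#include <cstring>
#include <cstdint>

// Stub standing in for the real NdbBlob handle class.
struct NdbBlob {
  uint64_t theLength = 0;
  bool theNullFlag = true;
};

// The API stores the handle pointer into the reserved in-row slot
// when the operation is defined. memcpy avoids alignment assumptions
// about the row buffer.
void storeBlobHandle(char *row, size_t offset, NdbBlob *handle) {
  std::memcpy(row + offset, &handle, sizeof(handle));
}

// NdbReceiver (or the application) later reads the handle pointer
// back out of the same slot.
NdbBlob *readBlobHandle(const char *row, size_t offset) {
  NdbBlob *handle;
  std::memcpy(&handle, row + offset, sizeof(handle));
  return handle;
}
```

This also makes clear why the receiver can find the blob handle cheaply from the row alone: the handle location is just (row base + column offset), the same arithmetic used for ordinary inline columns.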
Detailed implementation plan. These are the changes to make to the existing implementation:

NdbRecord:
  Add a column flag IsBlob and an NdbRecord flag RecHasBlob. Add another NdbRecord flag RecHasAllBlobs.

NdbTransaction::setupRecordOp():
  If the NdbRecord RecHasBlob flag is set, loop over all columns, creating NdbBlob handles as needed (possibly zero if all blobs are masked out). Call NdbBlob::atPrepareNdbRecord() to set up the blob handle. Store the handle in the row (read request only). For a delete request, create blob handles for _all_ blobs in the row, regardless of the mask, and give an error for delete unless RecHasAllBlobs is set.

NdbScanOperation::takeOverScanOpNdbRecord():
  Same as for NdbTransaction::setupRecordOp() (hopefully sharing common code).

NdbTransaction::scanTable(), NdbTransaction::scanIndex():
  For any blob column, create a blob handle, calling NdbBlob::atPrepareNdbRecordScan(). Also request keyinfo, add ATTRINFO to read the blob head, and allocate extra space in the row buffer for storing all blob heads after the row data proper.

NdbScanOperation::nextResult():
  Call atNextResultNdbRecord(), passing KEYINFO20.

NdbOperation::prepareSendNdbRecord():
  For write() and insert(), must fetch the blob head data from a new NdbBlob method (handle obtained from the row) and write it to ATTRINFO. (For update, this is not needed, as NdbBlob injects new operations for this.)

NdbReceiver::execTRANSID_AI():
  When receiving a blob column (== blob head), fetch the blob handle from the destination row, and call a new method in NdbBlob to receive the blob head data and set theNullFlag and theLength (replacing getHeadFromRecAttr()). For scans, instead store the blob head after the row data for later processing in nextResult().

NdbScanOperation::updateCurrentTuple() et al.:
  Check whether anything special is needed.

NdbOperation::getBlobHandle():
  For an NdbRecord operation, fail unless the blob handle is found in the list (do not create a new handle).
NdbBlob::atPrepare():
  Create new methods atPrepareNdbRecord() and atPrepareNdbRecordScan(). atPrepareNdbRecord() receives keyinfo from the NdbRecord and key row instead of using getKeyFromTCREQ(), filling both the packed and the unpacked buffer, and does not call getHeadInlineValue(). atPrepareNdbRecordScan() does not call getHeadInlineValue() either. Split out the common code into a fourth method.

NdbBlob::preExecute():
  For NdbRecord, do not call setHeadInlineValue() (handled in prepareSendNdbRecord()).

NdbBlob::postExecute():
  Do not call getHeadFromRecAttr() for ReadOp in the NdbRecord case (the receiver handles it).

NdbBlob::atNextResult():
  Add a new atNextResultNdbRecord(), which takes KEYINFO20 as a parameter instead of calling getKeyFromKEYINFO20(), and which does not call getHeadFromRecAttr().

NdbBlob::setValue():
  Do not call setHeadInlineValue() in the NdbRecord case; it will be handled in prepareSendNdbRecord().
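The scan-side buffer layout implied above (blob heads appended after the row data proper, consumed in nextResult()) can be sketched as follows. This is a hypothetical model, not the real wire or buffer format: BlobHead, BlobHandleStub and consumeBlobHead() are invented names, and only theNullFlag/theLength come from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical blob head: NULL flag plus total blob length. In the real
// format the inline part of the blob would follow the head.
struct BlobHead {
  uint8_t nullFlag;
  uint64_t length;
};

// Stub for the single NdbBlob handle shared across all batched rows.
struct BlobHandleStub {
  bool theNullFlag = true;
  uint64_t theLength = 0;
};

// On TRANSID_AI the receiver appends each blob head after the row data
// proper (like keyinfo20). nextResult() later locates head number
// headIndex at a fixed offset past the row and pushes it into the one
// blob handle, which is why this cannot happen inside the receiver:
// there is one handle but parallelism * batchsize buffered rows.
void consumeBlobHead(const char *rowBuffer, size_t rowSize,
                     unsigned headIndex, BlobHandleStub *handle) {
  BlobHead head;
  std::memcpy(&head, rowBuffer + rowSize + headIndex * sizeof(BlobHead),
              sizeof(head));
  handle->theNullFlag = head.nullFlag != 0;
  handle->theLength = head.length;
}
```

The extra space for the heads is what scanTable()/scanIndex() must account for when sizing the row buffers.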
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.