WL#3708: NdbRecord for blobs
Affects: Server-6.0
—
Status: Complete
Extend WL#2223 (NdbRecord) to also handle BLOBs.

Start by understanding the existing BLOB code in the API, to learn how to:
 - Get an API that is both convenient and efficient.
 - Get an implementation that re-uses as much of the existing BLOB code as possible / reasonable.

The initial time estimate is for doing this preliminary design phase only.
The idea is to keep the actual blob operations using the old NdbRecAttr method, but to have a clean way of obtaining the blob handle using NdbRecord.

When a primary key (or unique hash index key) NdbRecord operation is created, check whether any of the columns to be included are of blob type. If so, a blob handle is created (one for each blob column). The pointer to the blob handle is stored inside the result row, so the application programmer must reserve space for a pointer there.

The main reason for storing the blob handle in-row is to have easy access to it in NdbReceiver when receiving the ATTRINFO signal, so that the place to store the received blob head (size + inline part) can be found. But it also provides a way for the application to obtain the blob handle pointer for its own use; alternatively, we could make the getBlobHandle() method also work in the NdbRecord case (or both, not yet decided).

There will be a special case in the code in several places for blobs and NdbRecord, since the data is no longer always stored simply inline in the row with (offset, size). The issue is similar for mysqld-format bitfields, where the overflow part (any trailing fractional byte) is stored separately from the main part. We will try to minimize the performance impact of this by using a fast path / slow path approach. The special cases will be in the slow path, and there will be a column flag in NdbRecord for each. The code will be something like:

  if (!(column->flag & (BLOB_TYPE | SPLIT_BIT_TYPE))) {
    /* Fast path. */
  } else {
    /* Slow path. */
    if (column->flag & BLOB_TYPE) {
      /* ... */
    } else if (column->flag & SPLIT_BIT_TYPE) {
      /* ... */
    }
  }

Primary key and unique key operations:
 1. Will implement an NdbRecord variant of NdbBlob::atPrepare().
 2. The NdbRecord variant can read the key directly from the key row, instead of using getKeyFromTCREQ() and unpackKeyValue().

How to obtain blob handles for NdbRecord scans?
 1. When blobs are included in the result columns for scans, we need to request keyinfo.
 2. Need to upgrade lock mode LM_CommittedRead -> LM_Read (so as to read consistent blob parts).
 3. We cannot pass the blob head to the NdbBlob handle in the receiver, as we have only one blob handle but parallelism * batchsize rows. Instead we store the blob head(s) after the proper row data (like keyinfo20), and pass them to the blob handle in nextResult().
 4. Implement an NdbRecord variant of NdbBlob::atNextResult(), using NdbRecord instead of getKeyFromKEYINFO20().
 5. For scans, blob handles are created at prepare time, and nextResult() will call the NdbRecord variant of NdbBlob::atNextResult(), similar to the NdbRecAttr implementation.

Actual blob operations, reading and writing, will use the existing blob API, with the blob handles obtained from the NdbRecord methods. The reads and writes will internally create new NdbRecAttr operations, executed in the same way as for the NdbRecAttr methods.

Interface to mysqld / ha_ndbcluster.cc:

The internal row format for mysqld reserves N+sizeof(char *) bytes of space in the row for a BLOB or TEXT field, where 1<=N<=4 depending on the maximum blob length. So there is already sufficient space reserved for an NdbBlob * for the NDB API to use. After retrieving a row, the NdbBlob * must of course be overwritten with the proper mysqld length/data pointer before returning from the handler (using Field_blob::set_ptr()).

For update, there is no need for any space in the row for blobs. We will not attempt to do any actual blob data update in the NdbRecord operation (not even the inline blob head); that will be done in a separate operation using the existing NdbRecAttr-based implementation. [The existing code already uses a separate operation for the inline blob head for UPDATE; only for WRITE does it piggy-back the blob head update on the same operation.]

There are no issues with blobs in index keys, as NDB does not support indexes on blobs.
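The in-row handle idea above can be sketched in a few lines. This is a minimal standalone model, not the real NDB API: NdbBlob here is a stub, and the helper names (storeBlobHandle, readBlobHandle) are hypothetical. It shows the contract the design imposes on the application: reserve sizeof(NdbBlob *) at the blob column's offset, let the API store the handle pointer there, and read it back out (before overwriting it, as mysqld does via Field_blob::set_ptr()).

```cpp
#include <cstddef>
#include <cstring>
#include <cstdint>

// Stub standing in for the real NdbBlob handle class.
struct NdbBlob {
  uint64_t theLength = 0;
  bool theNullFlag = true;
};

// The API stores the handle pointer into the reserved in-row slot
// when the operation is defined. memcpy avoids alignment assumptions
// about the row buffer.
void storeBlobHandle(char *row, size_t offset, NdbBlob *handle) {
  std::memcpy(row + offset, &handle, sizeof(handle));
}

// NdbReceiver (or the application) later reads the handle pointer
// back out of the same slot.
NdbBlob *readBlobHandle(const char *row, size_t offset) {
  NdbBlob *handle;
  std::memcpy(&handle, row + offset, sizeof(handle));
  return handle;
}
```

This also makes clear why the receiver can find the blob handle cheaply from the row alone: the handle location is just (row base + column offset), the same arithmetic used for ordinary inline columns.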
Detailed implementation plan. These are the changes to make to the existing implementation:

NdbRecord:
  Add a column flag IsBlob and an NdbRecord flag RecHasBlob. Add another NdbRecord flag RecHasAllBlobs.

NdbTransaction::setupRecordOp():
  If the NdbRecord RecHasBlob flag is set, loop over all columns, creating NdbBlob handles as needed (possibly zero if all blobs are masked out). Call NdbBlob::atPrepareNdbRecord() to set up the blob handle. Store the handle in the row (read request only). For a delete request, create blob handles for _all_ blobs in the row, regardless of the mask, and give an error for delete unless RecHasAllBlobs is set.

NdbScanOperation::takeOverScanOpNdbRecord():
  Same as for NdbTransaction::setupRecordOp() (hopefully sharing common code).

NdbTransaction::scanTable(), NdbTransaction::scanIndex():
  For any blob column, create a blob handle, calling NdbBlob::atPrepareNdbRecordScan(). Also request keyinfo, add ATTRINFO to read the blob head, and allocate extra space in the row buffer for storing all blob heads after the row data proper.

NdbScanOperation::nextResult():
  Call atNextResultNdbRecord(), passing KEYINFO20.

NdbOperation::prepareSendNdbRecord():
  For write() and insert(), must fetch the blob head data from a new NdbBlob method (handle obtained from the row) and write it to ATTRINFO. (For update, this is not needed, as NdbBlob injects new operations for this.)

NdbReceiver::execTRANSID_AI():
  When receiving a blob column (== blob head), fetch the blob handle from the destination row, and call a new method in NdbBlob to receive the blob head data and set theNullFlag and theLength (replacing getHeadFromRecAttr()). For scans, instead store the blob head after the row data for later processing in nextResult().

NdbScanOperation::updateCurrentTuple() et al.:
  Check whether anything special is needed.

NdbOperation::getBlobHandle():
  For an NdbRecord operation, fail unless the blob handle is found in the list (do not create a new handle).
NdbBlob::atPrepare():
  Create new methods atPrepareNdbRecord() and atPrepareNdbRecordScan(). atPrepareNdbRecord() receives keyinfo from the NdbRecord and key row instead of using getKeyFromTCREQ(), filling both the packed and the unpacked buffer, and does not call getHeadInlineValue(). atPrepareNdbRecordScan() does not call getHeadInlineValue() either. Split out the common code into a fourth method.

NdbBlob::preExecute():
  For NdbRecord, do not call setHeadInlineValue() (handled in prepareSendNdbRecord()).

NdbBlob::postExecute():
  Do not call getHeadFromRecAttr() for ReadOp in the NdbRecord case (the receiver handles it).

NdbBlob::atNextResult():
  Add a new atNextResultNdbRecord(), which takes KEYINFO20 as a parameter instead of calling getKeyFromKEYINFO20(), and which does not call getHeadFromRecAttr().

NdbBlob::setValue():
  Do not call setHeadInlineValue() in the NdbRecord case; it will be handled in prepareSendNdbRecord().
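The scan-side buffer layout implied above (blob heads appended after the row data proper, consumed in nextResult()) can be sketched as follows. This is a hypothetical model, not the real wire or buffer format: BlobHead, BlobHandleStub and consumeBlobHead() are invented names, and only theNullFlag/theLength come from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical blob head: NULL flag plus total blob length. In the real
// format the inline part of the blob would follow the head.
struct BlobHead {
  uint8_t nullFlag;
  uint64_t length;
};

// Stub for the single NdbBlob handle shared across all batched rows.
struct BlobHandleStub {
  bool theNullFlag = true;
  uint64_t theLength = 0;
};

// On TRANSID_AI the receiver appends each blob head after the row data
// proper (like keyinfo20). nextResult() later locates head number
// headIndex at a fixed offset past the row and pushes it into the one
// blob handle, which is why this cannot happen inside the receiver:
// there is one handle but parallelism * batchsize buffered rows.
void consumeBlobHead(const char *rowBuffer, size_t rowSize,
                     unsigned headIndex, BlobHandleStub *handle) {
  BlobHead head;
  std::memcpy(&head, rowBuffer + rowSize + headIndex * sizeof(BlobHead),
              sizeof(head));
  handle->theNullFlag = head.nullFlag != 0;
  handle->theLength = head.length;
}
```

The extra space for the heads is what scanTable()/scanIndex() must account for when sizing the row buffers.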
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.