WL#3583: BLOB Locator API

Affects: Benchmarks-3.0 — Status: Un-Assigned

Description
High Level Architecture

Using BLOB Locators is a means of working with BLOBs that many other RDBMSs
support. This is a convenient and natural way to work with BLOBs in an
application, similar to working with normal files.

A BLOB Locator is similar to a FILE Handle, and in some cases the term BLOB
Handle is used.

A BLOB Locator is some kind of BINARY datatype that provides access to a single
BLOB in the database. Fetching a BLOB Locator will retreive just this handle,
not the BLOB itself. The locator is subsequenly used to provide access to the BLOB.

A BLOB Locator can, in addition to provide a means to access the BLOB, also
indicate attributes of the BLOB and/or the access to it. The reason for this is
that BLOBs are often treated differently from other data in the database, for
example in the case of transactions, for example a BLOB Locator might well be
transactional, whereas the BLOB is not. This minimize the size of the
transaction and still provides the full benefit of using BLOBs. Also, in some
cases, BLOB are transactional in a different context than the locator that
references this data.

Finally, as Locators are handled together with the data, but the BLOB itself is
not necessarily so, a BLOB might have multiple references or more references
than the data (i.e. if a row that reference a BLOB is DELETEd, but the BLOB is
currently in use by someone, then the row is delted, but the reference count in
the BLOB stays at 1, allowing any ongoing operations to complete before commencing).

The way I foresee this to be implemented is this:
- The existing C-API is complemented with the BLOB_LOCATOR datatype. From the C
API side, this is binary value of considerable length (64 bytes or so).
- The prepared statement API is extended to be able to describe a BLOB Locator
for both in and output.
- A separate set of API functions for BLOB Locators is added.
- Information functions are added to check if:
-- Locators can be transactional / non-transactional or both.
-- Ability to set transactional state of BLOB Locators.
-- Ability to check for Locator support. Initially I expect the API to be
supported only by selected Storage Handles. Eventually I guess that we can
emulate there in the cases where it is not supported.

C API:
- mysql_get_server_blob_flags() - Returns flags telling if BLOB locators are
supported at all and other information about BLOB support on the server side.

C Prepared Statement API:
- BLOB_LOCATOR datatype.- Allows one to bind to a BLOB locator.
- mysql_stmt_bl_flags() - Get the BLOB capabilities for a statement. This is
necessary, as the capabilities of the different tables in a statement might be
different, and the combination might limit these capabilities for an individual
table.

C BLOB Locator API:
- mysql_bl_table_flags() - Get the BLOB relevant flags for a specific table,
like transactional capabilities and if BLOB support is in the engine or
"virtual" in the server (not yet supported).
- mysql_bl_create() - Create a BLOB locator
- mysql_bl_open() - Open a BLOB using a BLOB_LOCATOR binary and a set of flags
signifying the transactional state.
- mysql_bl_open_string() - Open a BLOB using a BLOB_LOCATOR represented as a
string from the normal C API. Returns a BLOB_LOCATOR.
- mysql_bl_seek() - Obvious use.
- mysql_bl_close() - Obvious use.
- mysql_bl_tell() - Obvious use.
- mysql_bl_read() - Obvious use.
- mysql_bl_write() - Obvious use.
- mysql_bl_flush() - Like a "commit" for a transactional BLOB.
- mysql_bl_get_refcount() - Number of references to the BLOB. Usually returns 1
or 2 with normal storarge engines, where BLOBs are handled just like normal
data. I.e. 1 for the current API and 1 for the referencing row. If a new BLOB,
the 1 an existing one will return 2, but any value can potentially be returned
if BLOB handling is separate from normal row handling.
- mysql_bl_commit() / mysql_bl_rollback() - These are a bit less obvious than
the ones above, but as we might have a case where transactions are handled
differently with BLOBs than other data, it might be a good idea to allocate
space for them. An example of where it might be a good to have BLOB and non-BLOB
transaction handlling different is in the case of multi-versioning. For BLOBs,
you might want to have a simple locking scheme, whereas you want MVCC with the
rest of the data.

Storage engine API:
The following flags are added, any or all of these might be supported:
- HTON_BLOB_LOCATOR - Engines that support BLOB Locators (supporting this in the
Stoarge Engine is what makes this really useful and cool) 
- A separate BLOB handlerton that is pointed to by the Stoarge Engine
handlerton. The reason for this is that the same BLOB handling code might be
shared between Storage Engines (which should be a pretty cool concept, and which
simplifies development as BLOB Locator code gets sepatared from the Storage Engine).
- The following flags for the BLOB handlerton are added:
* HTON_BLOB_TRANSACTIONAL - Flag that blobs may be transactional, but separate
from the normal transactional context.
* HTON_BLOB_NONTRANSACTIONAL - Flag that BLOBs are non-transactional.
* HTON_BLOB_DATA_TRANSACTIONAL - Flags that BLOBs are handled in the same
transaction as the data. A COMMIT means an implicit mysql_bl_close() on the BLOB.
- The BLOB handlerton has an init() method.
- The BLOB handlerton also contains basically the same methods that corresponds
to the BLOB Locator interface above.
- The BLOB handlerton might also optionally define commit() and rollback()
methods. These are then used when the is an implicit commit / rollback in the
enclosing Storage Engine.
- The BLOB handlerton might also have a few more methods, like a speratate
recovery function, backup function etc. This needs some more thinking, but that
is probably not necessary in a first implementation.