WL#4305: storage-engine private data area per physical table

Affects: Server-5.6 — Status: Complete

Description
High Level Architecture
Low Level Design

Most, if not all, storage engines need to store some data related to a physical
table, that is, once per table rather than once per table instance (one physical
table may be opened many times, yielding many handler instances of the same
table). The handler class provides a storage area for engine-private data per
table instance, but to store data common to all instances of the same physical
table, a storage engine until recently had to maintain a hash keyed by table
name. Examples of this approach are (in 5.1): ARCHIVE_SHARE, st_blackhole_share,
EXAMPLE_SHARE, TINA_SHARE, FEDERATED_SHARE, INNOBASE_SHARE, NDB_SHARE. The
partitioning engine doesn't use a PARTITION_SHARE structure, but has its fields
added directly to TABLE_SHARE.

This is code duplication (copied from one storage engine to another; ha_berkeley
started using this approach). It is also functionality duplication, as the SQL
level already has a shared structure like this: TABLE_SHARE. Storage engines
should be able to benefit from it.

The THD structure has a dedicated pointer to storage engine private data:
ha_data. TABLE_SHARE should have one too, so that storage engines can use it.

Recently, as a bugfix for BUG#33479, TABLE_SHARE::ha_data was implemented.

The goal of this WorkLog task is to make use of it: partitioning data should be
moved from TABLE_SHARE to PARTITION_SHARE. Archive, blackhole, example, tina,
innodb, and ndb should drop the code that maintains the hash by name and
store the structure in TABLE_SHARE::ha_data instead.

The scope of this worklog has been reduced to Archive, Example and the general
partitioning storage engine, ha_partition, after the InnoDB implementation was
reverted (see bug#13838761).

Perhaps MyISAM, HEAP and MERGE could do the same (although these three
maintain the shared data differently, not copy-pasted from the BDB engine, so
changing them would not be trivial).

NOTE: Currently the ha_data introduced in BUG#33479 is used by
ha_partitioning.cc, but this use should probably move to a specific
auto_increment area or a specific partitioning area, since otherwise the
handlers of a partitioned table would not be able to use it.

BUG#49177 could also be fixed by attaching the MYISAM_SHARE to the TABLE_SHARE
instead of using a linear search in test_if_reopen. The same problem also seems
to affect Heap in hp_find_named_heap.

This also solves BUG#55878 / bug#11763195.

NOTE: this worklog was pushed to trunk (5.6.5) but later disabled for InnoDB
because of bug#13838761. The remaining work is now WL#6324.

By adding to the TABLE_SHARE a pointer to a shared data area, shared by all
handler instances used by a single table, we can remove the need for each
storage engine to have its own mechanism (HASH or LIST) for shared data.

A base class, Handler_share, is created so that it can be inherited by all
handlers that need to share data between instances for the same
table/partition. The TABLE_SHARE will have a new member for storing it:
Handler_share *TABLE_SHARE::ha_share.

The handler class will have a pointer to the TABLE_SHARE::ha_share (or, for
partitions, see below) and a function to set it: Handler_share **ha_share and
handler::set_ha_share_storage. This way all handlers of the same table/partition
will have direct access to the same shared data, without needing a separate HASH
or LIST.
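
The arrangement described above can be illustrated with a minimal, standalone
sketch. It is not the server code: Table_share_sketch, Handler_sketch and
Example_share_sketch are hypothetical stand-ins for TABLE_SHARE, handler and an
engine share, and locking is left out on purpose to keep the pointer layout
visible. Only Handler_share, ha_share and set_ha_share_storage come from this
worklog.

```cpp
#include <cassert>

// Base class as proposed in this worklog.
class Handler_share {
public:
  Handler_share() {}
  virtual ~Handler_share() {}
};

// Hypothetical stand-in for TABLE_SHARE: owns the one share per physical table.
struct Table_share_sketch {
  Handler_share *ha_share = nullptr;
  ~Table_share_sketch() { delete ha_share; }
};

// Hypothetical engine share holding per-table data.
struct Example_share_sketch : public Handler_share {
  int use_count = 0;
};

// Hypothetical stand-in for a handler instance (one per opened table instance).
class Handler_sketch {
  Handler_share **ha_share = nullptr;  // points at Table_share_sketch::ha_share

public:
  bool set_ha_share_storage(Handler_share **arg_ha_share) {
    assert(!ha_share);  // storage may only be attached once
    ha_share = arg_ha_share;
    return false;
  }

  // Create the share on first use, then reuse it (no locking in this sketch).
  Example_share_sketch *get_share() {
    assert(ha_share);
    if (!*ha_share) *ha_share = new Example_share_sketch;
    return static_cast<Example_share_sketch *>(*ha_share);
  }
};

// Two handler instances of the same table update the same share object.
int open_twice_and_count() {
  Table_share_sketch table_share;
  Handler_sketch first, second;
  first.set_ha_share_storage(&table_share.ha_share);
  second.set_ha_share_storage(&table_share.ha_share);
  first.get_share()->use_count++;
  second.get_share()->use_count++;
  return first.get_share()->use_count;
}
```

Because both instances hold the address of the same pointer, neither needs a
lookup by table name to find the data created by the other.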

For partitioned tables, the Partition_share (inherited from Handler_share) needs
to allocate Handler_share pointers where the individual partitions can store
pointers to their shares (the same as TABLE_SHARE::ha_share, but for
partitions). These will be allocated when the Partition_share is created. They
will be set through handler::set_ha_share_storage when
ha_partition::set_ha_share_storage is called.

The Handler_share will be destroyed when the TABLE_SHARE is destroyed. When a
Partition_share is destroyed, it will destroy every partition's Handler_share.
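
A rough sketch of that ownership chain follows, assuming a std::vector for the
per-partition pointer storage (the worklog does not say how the pointers are
allocated). Partition_share_sketch, Part_engine_share_sketch and storage_for
are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Handler_share {
public:
  Handler_share() {}
  virtual ~Handler_share() {}
};

// Hypothetical share of an underlying partition's engine; it updates a
// caller-supplied counter so that destruction can be observed.
struct Part_engine_share_sketch : public Handler_share {
  int *live;
  explicit Part_engine_share_sketch(int *live_counter) : live(live_counter) {
    ++*live;
  }
  ~Part_engine_share_sketch() override { --*live; }
};

// Hypothetical Partition_share: one Handler_share slot per partition.
class Partition_share_sketch : public Handler_share {
  std::vector<Handler_share *> part_shares;

public:
  explicit Partition_share_sketch(std::size_t num_parts)
      : part_shares(num_parts, nullptr) {}
  Partition_share_sketch(const Partition_share_sketch &) = delete;
  Partition_share_sketch &operator=(const Partition_share_sketch &) = delete;

  // Destroying the partition share destroys every partition's share.
  ~Partition_share_sketch() override {
    for (Handler_share *share : part_shares) delete share;
  }

  // What would be handed to the partition's handler::set_ha_share_storage.
  Handler_share **storage_for(std::size_t part_id) {
    return &part_shares.at(part_id);
  }
};

// Returns how many per-partition shares are still alive after the table-level
// share has been freed.
int live_part_shares_after_free() {
  int live = 0;
  Handler_share *table_ha_share = new Partition_share_sketch(3);
  Partition_share_sketch *part_share =
      static_cast<Partition_share_sketch *>(table_ha_share);
  *part_share->storage_for(0) = new Part_engine_share_sketch(&live);
  *part_share->storage_for(2) = new Part_engine_share_sketch(&live);
  assert(live == 2);
  delete table_ha_share;  // what freeing the TABLE_SHARE would trigger
  return live;
}
```

Slots that were never filled stay null, so partitions that were never opened
cost nothing at destruction time.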

This means that the table->file always points to the handler which uses the
table_share->ha_share.

If a handler is created with a non-NULL TABLE_SHARE, one must also set its
ha_share storage by calling handler::set_ha_share_storage as the first call
after get_new_handler returns.

To protect the shared pointer storage, one must use the TABLE_SHARE::LOCK_ha_data
mutex (or another engine-specific mutex). The pointer may be set only once and
must never be changed afterwards. It is therefore safe to check whether it is
set without taking the mutex; if it is not set, one must take the mutex and
re-check before setting it.
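
The check, lock, re-check sequence can be sketched in portable C++ as below.
This is an illustration under assumptions, not the server implementation:
std::mutex stands in for TABLE_SHARE::LOCK_ha_data, and the pointer is wrapped
in std::atomic because standard C++ requires that for a read done outside the
mutex. Share_slot_sketch, Counting_share_sketch and get_or_create_share are
hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class Handler_share {
public:
  Handler_share() {}
  virtual ~Handler_share() {}
};

// Hypothetical engine share.
struct Counting_share_sketch : public Handler_share {};

// Hypothetical stand-in for the relevant part of TABLE_SHARE.
struct Share_slot_sketch {
  std::mutex LOCK_ha_data;                         // protects writes to ha_share
  std::atomic<Handler_share *> ha_share{nullptr};  // set once, never changed
  int created = 0;                                 // written only under the mutex
  ~Share_slot_sketch() { delete ha_share.load(); }
};

Handler_share *get_or_create_share(Share_slot_sketch &slot) {
  // Unlocked check: safe because a non-null pointer never changes again.
  Handler_share *share = slot.ha_share.load(std::memory_order_acquire);
  if (share) return share;

  std::lock_guard<std::mutex> guard(slot.LOCK_ha_data);
  // Re-check: another thread may have set it while we waited for the mutex.
  share = slot.ha_share.load(std::memory_order_relaxed);
  if (!share) {
    share = new Counting_share_sketch;
    ++slot.created;
    slot.ha_share.store(share, std::memory_order_release);
  }
  return share;
}

// Two lookups against the same slot create exactly one share object.
int shares_created_after_two_lookups() {
  Share_slot_sketch slot;
  Handler_share *first = get_or_create_share(slot);
  Handler_share *second = get_or_create_share(slot);
  assert(first == second);
  return slot.created;
}
```

The second lookup returns on the unlocked fast path, which is the benefit of
the set-once rule.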

The allocated memory for the Handler_share must live at least until the
TABLE_SHARE is destroyed.

Also note that even though the shared resources are freed only when the
TABLE_SHARE is freed, no handler may keep any files open once all instances of
the handler are closed (that is, each instance either has not received a call
to open(), or close() has been called after open()).

New base class (added in handler.h):
/** Base class to be used by handlers different shares */
class Handler_share
{
public:
  Handler_share() {}
  virtual ~Handler_share() {}
};


Additions to the TABLE_SHARE struct:
  /** Main handler's share */
  Handler_share *ha_share;


Additions to the handler class:
protected:
  Handler_share **ha_share;
public:
  virtual bool set_ha_share_storage(Handler_share **arg_ha_share)
  {
    DBUG_ASSERT(!ha_share);
    ha_share= arg_ha_share;
    return false;
  }
protected:
  /** Helper functions to decrease duplicate code */
  Handler_share *get_ha_share_ptr();
  void set_ha_share_ptr(Handler_share *arg_ha_share);
  void lock_shared_ha_data();
  void unlock_shared_ha_data();


Implementation:

TABLE_SHARE::ha_share must be written only while holding
TABLE_SHARE::LOCK_ha_data (except in free_table_share, which is guaranteed to
have an exclusive lock on the TABLE_SHARE). Protecting the data that
TABLE_SHARE::ha_share points to is up to the storage engine.

During ALTER ... PARTITION the altered (new/temporary) partitions may also need
handler::ha_share's, so the partitioning engine must provide these. It is,
however, OK to have them allocated within the thread's mem_root since they will
be destroyed before the thread ends, because the TABLE_SHARE will be freed and
reopened at the end of the ALTER.

Note that the fix for BUG#53676/BUG#53770/BUG#51042 must be changed to not open
a temporary TABLE_SHARE instance for the intermediate table, but to use the
original TABLE_SHARE and make sure it is destroyed after the statement finishes
(both on success and on error).

Engines that should use the new TABLE_SHARE::ha_share (and are included in this
worklog), with the implementation based on the ha_example engine:
InnoDB, Partitioning, Archive, Example.

Engines excluded from this worklog:
CSV, Blackhole, NDB, Merge, Federated, Perfschema, Heap/Memory, MyISAM.
Heap/Memory needs the HP_SHARE even after all tables are closed/flushed. It
could benefit from using the TABLE_SHARE::ha_share to speed up the search
for an already opened instance of the table, instead of using the function
hp_find_named_heap which is based on linear search. But that is outside of this
worklog and could be reported as a new bug.
MyISAM: Does not use the handler class when opening/closing the table. BUG#49177
could be based on this worklog, but is not a part of it.



Example of use in Archive:

Archive_share *ha_archive::get_share(const char *table_name, int *rc)
{
  Handler_share *tmp_ha_share;
  Archive_share *tmp_share;

  DBUG_ENTER("ha_archive::get_share");

  if ((tmp_ha_share= get_ha_share_ptr()))
    tmp_share= static_cast<Archive_share*>(tmp_ha_share);
  else
  {
    char *tmp_name;
    azio_stream archive_tmp;

    tmp_share= new Archive_share;

    if (!tmp_share)
    {
      *rc= HA_ERR_OUT_OF_MEM;
      goto err;
    }
    DBUG_PRINT("ha_archive", ("new Archive_share: %p",
                              tmp_share));

    [Archive specific code... ]

    set_ha_share_ptr(static_cast<Handler_share*>(tmp_share));
  }
  mysql_mutex_lock(&tmp_share->mutex);

 [Archive specific code... ]

  DBUG_RETURN(tmp_share);

err:
  unlock_shared_ha_data();

  DBUG_ASSERT(*rc);

  DBUG_RETURN(NULL);
}