WL#3196: Generic table space API

Affects: Connector/.NET-5.2   —   Status: In-Design

The generic table space interface allows storage of multiple tables in a single
OS-level file or raw partition.  It aims to be as transparent as possible, and
should coexist with the possibility of storing tables in individual files.

Features
--------

·    Multiple tables per space.

·    Multiple table spaces in a database.

·    The user can choose the table space  on  a  table-by-table
     basis.

·    Where  the  underlying operating system supports it, table
     spaces can be placed on raw partitions.  No  code  changes 
     should be needed to do this.

·    It  should  be possible to store tables from different en-
     gines in the same table space.  This needs  consideration.
     
Advantages
----------

We  expect  a number of advantages from storing tables in table
space instead of in individual files:

·    Many fewer open files.  Under UNIX and similar operating
     systems, each open table currently requires two file
     descriptors.  With table spaces, a single kernel file
     descriptor can handle multiple tables.

·    Better  storage  efficiency.  General purpose file systems
     offer many features not needed for storing the fixed-sized
     large  blocks  used  in databases.  Some of these features
     add overhead that is unnecessary for a database table.

·    The option of optimizing layout  for  performance.   Since
     the  storage allocation is under the direct control of the
     server, it could be allocated to ensure locality of refer-
     ence.  In a RAID system, the allocator could use knowledge
     of the RAID layout to determine optimum allocation.

Constraints
-----------

·    As far as possible, the use  of  table  spaces  should  be
     transparent.   In  particular, this means that little code
     should be rewritten.

·    All blocks in a specific table space must be of  the  same
     size.

·    It must be possible to represent a table space with a file
     in an existing file system.

Table description files
-----------------------

The current proposal does not support storage of table descrip-
tion files (.frm files) in table space.  It may be possible  to
add  this  at  a later point, but currently there is a "chicken
and egg" problem: the .frm file is needed to locate  the  table
space file.

Functional interface
--------------------

The  current proposal is to integrate the table spaces in mysys
in the my_open, my_write, my_read and my_close  functions.   As
far  as  possible,  it should be transparent to the caller that
the table is stored in table space and not in individual files.

Changes in interface
--------------------

The  main  change  in  the interface is the manner in which the
file descriptors are used.  Currently this is the UNIX file de-
scriptor  or  similar,  a small positive number returned by the
operating system:         

+---------------------------------------+
|                                       |
|           file descriptor             |
+---------------------------------------+
31                                    0

The proposed change places a non-zero table space ID in the
high-order bits of the descriptor:

+--------------+------------------------+
|              |                        |
|table space ID|    file descriptor     |
+--------------+------------------------+
31                                    0

There are a number of considerations:
     
·    The table space ID is non-zero, whereas a normal kernel
     file descriptor leaves these bits zero, so the interface
     for normal files remains unchanged.

·    For table spaces, the functions my_read, my_write and
     friends must identify file descriptors referring to a
     table space and act accordingly.

·    This approach assumes that the number of bits required to
     represent file descriptors allocated by the kernel is
     significantly less than 32.  The exact number of bits is
     difficult to determine; currently it seems unlikely that
     any system will have more than 65536 files open at the
     same time, so the implementation might use the first 16
     bits to identify the table space and the second 16 bits
     to identify the file (or the table within the space).  If
     this proves to be a limitation, it should be possible to
     provide for a different split between table space ID and
     table within the space.
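
The 16/16 split described above can be sketched as follows.  This
is an illustration only, not the proposed implementation: the
names ts_make_fd, ts_id_of, ts_index_of and ts_is_table_space_fd
are hypothetical, File is a local stand-in for the mysys type, and
the sketch additionally assumes that the sign bit is kept clear so
that a composite descriptor can never be confused with a negative
error return.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t File;                 /* stand-in for the mysys File type */

#define TS_FILE_BITS  16              /* assumed: low 16 bits = file/table */
#define TS_FILE_MASK  ((1u << TS_FILE_BITS) - 1u)
#define TS_ID_MAX     0x7FFFu         /* assumed: keep bit 31 (sign) clear */

/* Combine a non-zero table space ID and an index into one descriptor. */
static File ts_make_fd(uint32_t ts_id, uint32_t index)
{
  assert(ts_id >= 1 && ts_id <= TS_ID_MAX);
  assert(index <= TS_FILE_MASK);
  return (File) ((ts_id << TS_FILE_BITS) | index);
}

/* Extract the table space ID (0 for a plain kernel descriptor). */
static uint32_t ts_id_of(File fd)
{
  return (uint32_t) fd >> TS_FILE_BITS;
}

/* Extract the low part: kernel descriptor or table index. */
static uint32_t ts_index_of(File fd)
{
  return (uint32_t) fd & TS_FILE_MASK;
}

/* The test that my_read, my_write and friends would apply first. */
static int ts_is_table_space_fd(File fd)
{
  return fd >= 0 && ts_id_of(fd) != 0;
}
```

For example, ts_make_fd(3, 42) yields 0x0003002A, while a plain
kernel descriptor such as 5 has a zero ID and is therefore routed
to the existing code path.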

New function
------------

Currently new tables are created and existing tables are opened
by calling my_open  with  appropriate  parameters.   For  table
spaces,  a  function  my_open_table  will  be  provided  with a
similar interface:  

File my_open(const char *FileName, int Flags, myf MyFlags)
                                /* Path-name of file */
                                /* Read | write .. */
                                /* Special flags */


File my_open_table(const char *TableSpaceName, const char *FileName, int Flags,
                   myf MyFlags)
                                /* Path-name of table file */
                                /* SQL-visible name of table */
                                /* Read | write .. */
                                /* Special flags */
                                
my_open does not need to be modified; it returns a kernel  file
descriptor  as  before.   my_open_table  performs the following
steps:

·    Ensure that the  specified  table  space  is  open.   This
     requires keeping a list of open table spaces.

·    Locate the FileName within the table space.

·    Return  a  "file  descriptor" derived from the table space
     and the table itself.  The first component  could  be  the
     index of the table space in the list of open table spaces,
     and the second could be related to  the  location  of  the
     table   within  the  table  space.   This  could  make  it
     unnecessary to maintain any further information in  memory
     about the individual tables.
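
The three steps above can be sketched as follows.  This is a
minimal in-memory illustration under stated assumptions, not the
proposed implementation: all names prefixed sk_ are hypothetical,
the Flags and MyFlags parameters are omitted, the table directory
is a plain array standing in for whatever structure the table
space file would really contain, and sk_add_table exists only to
populate that stand-in directory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t File;               /* stand-in for the mysys File type */

#define SK_MAX_SPACES  8            /* illustrative limits only */
#define SK_MAX_TABLES  16
#define SK_NAME_LEN    64
#define SK_FILE_BITS   16           /* assumed 16/16 descriptor split */

struct sk_space {
  char name[SK_NAME_LEN];                   /* empty string: free slot */
  char tables[SK_MAX_TABLES][SK_NAME_LEN];  /* stand-in directory */
  int  ntables;
};

static struct sk_space sk_spaces[SK_MAX_SPACES]; /* list of open spaces */

static int sk_name_ok(const char *s)
{
  size_t len = strlen(s);
  return len > 0 && len < SK_NAME_LEN;
}

/* Step 1: return the slot of the space, "opening" it if necessary. */
static int sk_ensure_space_open(const char *space_name)
{
  int i, free_slot = -1;
  if (!sk_name_ok(space_name))
    return -1;
  for (i = 0; i < SK_MAX_SPACES; i++) {
    if (sk_spaces[i].name[0] == '\0') {
      if (free_slot < 0)
        free_slot = i;
    } else if (strcmp(sk_spaces[i].name, space_name) == 0)
      return i;
  }
  if (free_slot >= 0) {             /* real code would open the file here */
    strcpy(sk_spaces[free_slot].name, space_name);
    sk_spaces[free_slot].ntables = 0;
  }
  return free_slot;
}

/* Demo helper: register a table name in the stand-in directory. */
static int sk_add_table(const char *space_name, const char *file_name)
{
  int slot = sk_ensure_space_open(space_name);
  struct sk_space *sp;
  if (slot < 0 || !sk_name_ok(file_name))
    return -1;
  sp = &sk_spaces[slot];
  if (sp->ntables >= SK_MAX_TABLES)
    return -1;
  strcpy(sp->tables[sp->ntables], file_name);
  return sp->ntables++;
}

/* Steps 1 to 3 combined; returns -1 if the table is not found. */
static File sk_open_table(const char *space_name, const char *file_name)
{
  int slot, i;
  assert(space_name != NULL && file_name != NULL);
  slot = sk_ensure_space_open(space_name);            /* step 1 */
  if (slot < 0)
    return -1;
  for (i = 0; i < sk_spaces[slot].ntables; i++)       /* step 2 */
    if (strcmp(sk_spaces[slot].tables[i], file_name) == 0)
      return (File) (((uint32_t) (slot + 1) << SK_FILE_BITS)
                     | (uint32_t) i);                 /* step 3 */
  return -1;
}
```

The slot index is offset by one so that the table space ID is
never zero.  Because both components are derived directly from
the list position and the directory position, no per-table state
is kept beyond the directory itself, which is the property the
last step hopes for.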
     
Utility programs
----------------

A number of a priori objections have been raised against table
spaces.  One of the biggest is that tables stored in individual
UNIX files can be copied with ordinary UNIX commands.  This is
no longer possible in the same way when tables are stored in
table spaces.

This document does not address the issue of loss of consistency
when using  this  method;  it  is  practised,  and  most  users
probably  understand  the  dangers.   On  the other hand, it is
relatively simple to write a program that extracts tables  from
a table space and converts them into individual file pairs.  It
would also be a good idea to have the  converse  functionality,
to copy file pairs into a table space.  Neither program appears
to be complicated.

New functionality
-----------------

Function my_open_table()

Functions requiring modifications
---------------------------------

my_register_filename() should register both file and table names.

At  the  current stage of this draft, it is possible that other
functions also require modification.