WL#3196: Generic table space API
Affects: Connector/.NET-5.2
—
Status: In-Design
The generic table space interface allows storage of multiple tables
in a single OS-level file or raw partition. It aims to be as
transparent as possible, and should coexist with the option of
storing tables in individual files.

Features
--------

· Multiple tables per space.
· Multiple table spaces in a database.
· The user can choose the table space on a table-by-table basis.
· Where the underlying operating system supports it, table spaces
  can be placed on raw partitions. No code changes should be needed
  to do this.
· It should be possible to store tables from different engines in
  the same table space. This needs consideration.

Advantages
----------

We expect a number of advantages from storing tables in table spaces
instead of in individual files:

· Far fewer open files. Under UNIX and similar operating systems,
  each open table currently requires two file descriptors. With
  table spaces, a single kernel file descriptor can handle multiple
  tables.
· Better storage efficiency. General-purpose file systems offer many
  features not needed for storing the fixed-size large blocks used
  in databases. Some of these features add overhead that is
  unnecessary for a database table.
· The option of optimizing layout for performance. Since storage
  allocation is under the direct control of the server, space could
  be allocated to ensure locality of reference. In a RAID system,
  the allocator could use knowledge of the RAID layout to determine
  the optimum allocation.

Constraints
-----------

· As far as possible, the use of table spaces should be transparent.
  In particular, this means that little code should be rewritten.
· All blocks in a specific table space must be of the same size.
· It must be possible to represent a table space with a file in an
  existing file system.

Table description files
-----------------------

The current proposal does not support storage of table description
files (.frm files) in table space.
It may be possible to add this at a later point, but currently there is a "chicken and egg" problem: the .frm file is needed to locate the table space file.
Functional interface
--------------------

The current proposal is to integrate the table spaces in mysys in
the my_open, my_write, my_read and my_close functions. As far as
possible, it should be transparent to the caller that the table is
stored in table space and not in individual files.

Changes in interface
--------------------

The main change in the interface is the manner in which the file
descriptors are used. Currently this is the UNIX file descriptor or
similar, a small positive number returned by the operating system:

  +---------------------------------------+
  |                                       |
  |            file descriptor            |
  +---------------------------------------+
   31                                    0

The proposed change places a non-zero table space ID in the
high-order bits of the descriptor:

  +--------------+------------------------+
  |              |                        |
  |table space ID|    file descriptor     |
  +--------------+------------------------+
   31                                    0

There are a number of considerations:

· The table space ID is always non-zero for a table space, so a
  descriptor whose high-order bits are zero is an ordinary kernel
  file descriptor and the interface for normal files remains
  unchanged.
· For table spaces, the functions my_read, my_write and friends must
  identify file descriptors referring to a table space and act
  accordingly.
· This approach assumes that the number of bits required to
  represent file descriptors allocated by the kernel is
  significantly less than 32. The exact number of bits is difficult
  to determine; currently it seems unlikely that any system will
  have more than 65536 files open at the same time, so the
  implementation might use the high-order 16 bits to identify the
  table space and the low-order 16 bits to identify the file. If
  this proves to be a limitation, it should be possible to provide
  for a different split between table space ID and table within the
  space.
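The descriptor layout described above can be illustrated with a small C sketch. It is not part of the proposal: the names ts_make_fd, ts_id_of, ts_local_of and ts_is_table_space_fd are hypothetical, File is a stand-in for the mysys type, and the 16/16 split is the one suggested in the text. One further assumption is made here: the table space ID is limited to 15 bits so that a combined descriptor never becomes negative and cannot be confused with an error return such as -1.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t File;            /* stand-in for the mysys File type */

#define TS_FD_BITS  16                        /* assumed 16/16 split */
#define TS_FD_MASK  ((1u << TS_FD_BITS) - 1u) /* low-order 16 bits */
#define TS_ID_MAX   0x7FFFu                   /* keeps the sign bit clear */

/* Combine a non-zero table space ID with a per-space identifier. */
File ts_make_fd(unsigned ts_id, unsigned local_id)
{
    assert(ts_id >= 1 && ts_id <= TS_ID_MAX);
    assert(local_id <= TS_FD_MASK);
    return (File)((ts_id << TS_FD_BITS) | local_id);
}

/* High-order part: zero for an ordinary kernel descriptor. */
unsigned ts_id_of(File fd)
{
    return (uint32_t)fd >> TS_FD_BITS;
}

/* Low-order part: the identifier within the table space. */
unsigned ts_local_of(File fd)
{
    return (uint32_t)fd & TS_FD_MASK;
}

/* The test that my_read, my_write and friends would apply first:
   non-negative descriptor with a non-zero table space ID. */
int ts_is_table_space_fd(File fd)
{
    return fd >= 0 && ts_id_of(fd) != 0;
}
```

With these helpers, ts_make_fd(3, 42) yields a descriptor whose high-order half is 3 and whose low-order half is 42, while an ordinary descriptor such as 7 is reported as not belonging to a table space and would take the existing code path.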
New function
------------

Currently new tables are created and existing tables are opened by
calling my_open with appropriate parameters. For table spaces, a
function my_open_table will be provided with a similar interface:

  File my_open(const char *FileName,   /* Path-name of file */
               int Flags,              /* Read | write .. */
               myf MyFlags)            /* Special flags */

  File my_open_table(const char *TableSpaceName, /* Path-name of table file */
                     const char *FileName,       /* SQL-visible name of table */
                     int Flags,                  /* Read | write .. */
                     myf MyFlags)                /* Special flags */

my_open does not need to be modified; it returns a kernel file
descriptor as before. my_open_table performs the following steps:

· Ensure that the specified table space is open. This requires
  keeping a list of open table spaces.
· Locate the FileName within the table space.
· Return a "file descriptor" derived from the table space and the
  table itself. The first component could be the index of the table
  space in the list of open table spaces, and the second could be
  related to the location of the table within the table space. This
  could make it unnecessary to maintain any further information in
  memory about the individual tables.

Utility programs
----------------

A number of a priori objections have been raised against table
spaces. One of the biggest is that tables stored in individual UNIX
files can be copied with ordinary UNIX commands. This is no longer
possible in this form when tables are stored in table spaces. This
document does not address the loss of consistency that this method
can cause; it is practised, and most users probably understand the
dangers.

On the other hand, it is relatively simple to write a program that
extracts tables from a table space and converts them into individual
file pairs. It would also be a good idea to have the converse
functionality, to copy file pairs into a table space. Neither
program appears to be complicated.
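The three steps listed for my_open_table under "New function" can be sketched in C as follows. This is an in-memory illustration under stated assumptions, not the proposed implementation: the table space directory is modelled as an array of names instead of an on-disk structure, the limits MAX_SPACES and MAX_TABLES are arbitrary, and ts_ensure_open, ts_locate, ts_add_table and my_open_table_sketch are hypothetical names. File and myf are stand-ins for the mysys types, and the Flags and MyFlags arguments are accepted but ignored.

```c
#include <assert.h>
#include <string.h>

typedef int File;            /* stand-in for the mysys File type */
typedef unsigned long myf;   /* stand-in for the mysys flag type */

#define MAX_SPACES 8         /* arbitrary limits for this sketch */
#define MAX_TABLES 16
#define NAME_LEN   64
#define FD_BITS    16        /* assumed 16/16 descriptor split */

struct table_space {
    char name[NAME_LEN];               /* path of the table space */
    char tables[MAX_TABLES][NAME_LEN]; /* stand-in for its directory */
    int  n_tables;
};

static struct table_space open_spaces[MAX_SPACES];
static int n_open_spaces = 0;

/* Step 1: ensure the table space is in the list of open table
   spaces. Returns a 1-based ID, or 0 on failure. */
int ts_ensure_open(const char *space_name)
{
    int i;
    assert(space_name != NULL);
    if (strlen(space_name) >= NAME_LEN)
        return 0;
    for (i = 0; i < n_open_spaces; i++)
        if (strcmp(open_spaces[i].name, space_name) == 0)
            return i + 1;
    if (n_open_spaces == MAX_SPACES)
        return 0;
    /* A real implementation would open the file or partition here. */
    strcpy(open_spaces[n_open_spaces].name, space_name);
    open_spaces[n_open_spaces].n_tables = 0;
    return ++n_open_spaces;
}

/* Step 2: locate a table by name. Returns its slot, or -1. */
int ts_locate(const struct table_space *ts, const char *file_name)
{
    int i;
    for (i = 0; i < ts->n_tables; i++)
        if (strcmp(ts->tables[i], file_name) == 0)
            return i;
    return -1;
}

/* Hypothetical helper standing in for table creation. */
int ts_add_table(const char *space_name, const char *file_name)
{
    int id = ts_ensure_open(space_name);
    struct table_space *ts;
    if (id == 0 || strlen(file_name) >= NAME_LEN)
        return -1;
    ts = &open_spaces[id - 1];
    if (ts->n_tables == MAX_TABLES)
        return -1;
    strcpy(ts->tables[ts->n_tables], file_name);
    return ts->n_tables++;
}

/* Step 3: derive the "file descriptor" from both components. */
File my_open_table_sketch(const char *TableSpaceName,
                          const char *FileName, int Flags, myf MyFlags)
{
    int id, slot;
    (void) Flags;            /* ignored in this sketch */
    (void) MyFlags;
    id = ts_ensure_open(TableSpaceName);
    if (id == 0)
        return -1;
    slot = ts_locate(&open_spaces[id - 1], FileName);
    if (slot < 0)
        return -1;
    return (File)((id << FD_BITS) | slot);
}
```

Because the returned value is built only from the position of the table space in the list and the slot of the table in its directory, no per-table state is kept in memory, which is the property the third step aims for. Whether a slot number is a suitable "location" for a real on-disk layout is left open here.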
New functionality
-----------------

Function my_open_table()

Functions requiring modifications
---------------------------------

my_register_filename() should register both file and table names.
At the current stage of this draft, it is possible that other
functions also require modification.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.