A user defined table and its corresponding index data, in InnoDB, is stored in files that have an extension .ibd. There are two types of tablespaces, general (or shared) tablespace and file-per-table. For shared tablespaces, data from many different tables and their corresponding indexes may reside in a single .ibd file. Whereas, for file-per-table tablespaces, the data of a single table and its indexes reside in one .ibd file.
This blog post will talk about the space management in these .ibd files in details.
These files generally reside in data directory. Let’s try to create a table test.t1.
CREATE TABLE test.t1 (c INT) engine=InnoDB;
$ cd <PATH_TO_DATA_DIR>/test
Above is a file-per-table tablespcae i.e. table and index data related to table t1 will reside in this file.
For a file-per-table tablespace the tablespace name is the same as that of the file/table name i.e. for table t1 above, tablespace name will be t1. Had it been a general (or shared) tablespace created with name my_tablespace, then the tablespace name would have been my_tablespace..
These tablespaces are identified with a unique ID which is called tablesapce ID.
Tablespace file is made of number of fixed size pages. There are different type of pages to serve different purpose. We will look all of them in details in coming sections. Here just keep in mind that tablespace file is a collection of many fixed size pages.
An extent is a collection of contiguous pages within tablespace. Extent size is 1 MB. Thus if page size is 16Kb then there could be 64 pages in an Extent.
Thus if we look again at tablespace file, it’s a collection of extents.
There is no separate storage for meta-data information of tablespace. It is stored in the same file in header page (always page 0). Let’s look at those in details now.
FREE FRAGS LIST:
Base node pointer of the linked-list of extents which have pages to be allocated ‘individually’. This list contains the Extents which have at least one free page to be allocated.
FULL FRAGS LIST:
Base node pointer of the linked-list of extents which have pages to be allocated ‘individually’. This list contains the Extents which have no free page left to be allocated.
Base node pointer of the linked-list for Extents which are free to be allocated. An Extent from this list could be allocated to a File Segment (described later) or could be allocated to FREE FRAG LIST.
Extent descriptors entries (described later) for first set of Extents in tablespace.
EXTENT DESCRIPTOR PAGE (XDES PAGE)
An extent is a collection of pages. We need to store some metadata information related to pages which belongs to an Extent. To store this information, we have ‘extent descriptor page’.
For 16K page size, one XDES entry (described later) is of size 40 Bytes and is to give metadata information about 64 pages. For ease of implementation, number of pages covered by one XDES page entries is equal to page size. Based on this, it is easy to find out number of XDES entries in XDES page for each page size.
|XDES Entry Size
|Pages covered in an XDES page
|XDES entries in an XDES page
Refer here for very detailed explanation about extent descriptors by Annamalai.
As the tablespaces grows (i.e. more data is added), more extents (more pages) will be allocated. Once the number of total extents is greater than what an XDES page can track a new XDES page is allocated which will be used to track the next set of extents.
NOTE: For first set of extents, header page is used to store XDES entries.
Following figure depicts an EXTENT Descriptor Page and Individual XDES Entries.
NOTE: previous and next pointers in LIST NODE, points to Previous/Next Extents in the list:
- FULL, NOT_FULL and FREE list, If this Extent belongs to a File Segment.
- FREE_FRAG, FULL_FRAG, FREE, otherwise.
One extent size = 1 MB
One page size = 16 KB
Total pages in one extent = 64 Pages
Total XDES entries in one XDES page = 256
Total Extents could be covered in one XDES page = 256
Total pages could be covered with one XDES Page = 16384
So once tablespace size exceeds 16384 pages, we need to allocate a new XDES page to keep metadata of further extents (to be allocated).
These are the pages which keep information about File Segments (FSEG). So before going into INDOE pages entry let’s understand File Segments.
A File Segment is a logical unit which is a collection of Pages and Extents. Following figure depicts the high level logical (not physical) view of a File Segment.
- An array of single pages (32 entries) which are allocated to this segment.
NOT FULL LIST
- Base node pointer to a linked-list of extents assigned to this segment and have at least one free page.
- Base node pointer to a linked-list of extents assigned to this segment and have no free page.
- Base node pointer to a linked-list of extents assigned to this segment and have all free pages.
NOTE 1: Pages in FRAG ARRAY belong to an Extent which is part of FRAG_FULL/FRAG_FREE list (i.e. have File Segment id 0) which are maintained at Tablespace Header Page. As described before, pages in these extents are shared by many segments. As depicted below:
NOTE 2: When an extent becomes free (i.e. no more used pages) it DOES NOT move to FREE list of File Segment. Instead, it moves to FREE list maintained at tablespace level.
Q: Why do we need File Segment?
A: It’s to make the page management easy. So once we delete the File Segment, we know which all extents and pages are to be freed.
As a table grows it will allocate individual pages in each file segment until the fragment array becomes full, and then switch to allocating 1 extent at a time, and eventually to allocating 4 extents at a time.
INODE PAGES (revisit)
Now, let’s come back to INODE pages. An INODE page keep entries for File Segments i.e. each entry in an INODE page represents one File Segment. Figure X (where it describes the high level view of File Segment) described above is actually an entry in INODE page and is called “INODE File Segment entry”. Let’s look at it again:
Q: How are File Segments used in index?
A: Each index in InnoDB, which is represented by a B+ tree, uses two File Segments.
- Leaf Page Segment: to store Leaf pages in B Tree.
- Non Leaf Page Segment: to store non-leaf (intermediate) pages in B Tree.
On a page, FSEG HEADER is the place where these two File Segments INODE entries information is stored. From these entries, we consult the INODE page to go the respective File Segment information.
NOTE: For an index, only root page’s FSEG HEADER is used to store these info because there is only one LEAF and one NON-LEAF Segment for an index. For rest of the pages in B Tree the FSEG HEADER is 0 filled.
A very nice illustrative example given by Jeremy Cole in his blog:
“For example in a newly created table, the only page that exists will be the root page, which is also a leaf page, but is present in the “internal” file segment (so that it doesn’t have to be moved later). The “leaf” file segment INODE lists and fragment array will all be empty. The “internal” file segment INODE lists will all be empty, and the single root page will be in the fragment array.”
Source : https://blog.jcole.us/2013/01/04/page-management-in-innodb-space-files/
So how does it all work when we CREATE/DROP an index?
As described above, as soon as an index is created (i.e. at least root page is created), two file segments will be allocated for the index. One for Leaf pages, which will have no page as of now and one for Non-Leaf Pages which will have only one single page allocated which is root page.
Now as and when index size grows, i.e. B Tree grows,
- New pages are allocated in FRAG ARRAY.
- Once demands crosses 32 pages, an extent is allocated to segment and is moved to FREE List.
- Once pages of this new extent are used, this extent is moved to NOT FULL List.
- Once all pages of this extent are used it’s moved to FULL List and a new extent is allocated (same as step 3).
Once we drop the index, from the root page, we would know the two File Segments. We go ahead and mark all the extents in those two File Segments as free.
Thank you for using MySQL !