Within the MySQL data directory, the InnoDB storage engine creates two types of files — the data files and the redo log files. Each data file (or ibd file) belongs to exactly one tablespace. Each tablespace is given a unique identifier called the space_id. One tablespace can have 1 or more data files. If a tablespace has more than one data file, then the data files have a specific order or sequence. The data files can be thought of as being concatenated to each other in that specific order.
The data file is made up of a series of equal sized pages. Each page in the data file is given a unique number identifier called the page number (page_no). The first page of the first ibd file is given the page_no of 0. The page number of the first page of the second ibd file of the tablespace is one more than the page number of the last page of the first ibd file and so on. So the page number is unique within a tablespace even if it has multiple data files. Given the combination of “space_id, page_no”, a single page in any data file can be uniquely identified within the given data directory.
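To make the page numbering concrete, here is a minimal sketch that maps a tablespace-wide page number to a data file and a byte offset within it. The names `page_location_t` and `locate_page` are hypothetical (they are not InnoDB functions), and the sketch assumes you already know the size of each data file in pages, listed in tablespace order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Hypothetical helper type: where a page lives on disk. */
struct page_location_t {
    std::size_t   file_index;  /* index into the ordered list of data files */
    std::uint64_t byte_offset; /* offset of the page within that file */
};

/* Map a tablespace-wide page_no to a data file and a byte offset.
   file_sizes_in_pages lists the data files in their tablespace order. */
inline page_location_t locate_page(
    const std::vector<std::uint32_t>& file_sizes_in_pages,
    std::uint32_t page_no,
    std::uint32_t page_size)
{
    std::uint32_t first_page_of_file = 0;

    for (std::size_t i = 0; i < file_sizes_in_pages.size(); ++i) {
        const std::uint32_t n_pages = file_sizes_in_pages[i];
        const std::uint64_t page_in_file = page_no - first_page_of_file;

        if (page_in_file < n_pages) {
            return {i, page_in_file * page_size};
        }
        /* Page numbering continues in the next file. */
        first_page_of_file += n_pages;
    }

    assert(false && "page_no is beyond the end of the tablespace");
    return {file_sizes_in_pages.size(), 0};
}
```

For example, with two files of 768 and 512 pages, page number 768 is the first page of the second file.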
The data files contain user data and system data. The user data is what the database applications and users will store in the database. The system data is what InnoDB creates and maintains in order to implement its storage engine features (for example: the page headers, tablespace headers, segment information, and undo logs). Each individual page is also categorized into a given page type based on how that particular page is used. One such page type is an extent descriptor (xdes) page. In this post we’ll examine the details of an extent descriptor page; we’ll look at both its structure and its contents.
An Extent
An InnoDB data file or ibd file is a sequence of equal-sized pages (in one of the supported page sizes). These pages are grouped into extents and segments. One extent is a contiguous sequence of pages in an ibd file. The number of pages belonging to an extent depends on the page size. The following table provides the relationship between page size and extent size (in pages and megabytes). The extent size is accessed by using the macro `FSP_EXTENT_SIZE` in the source code.
Page Size | Extent Size in Pages | Extent Size in MB |
---|---|---|
4KB | 256 | 1MB |
8KB | 128 | 1MB |
16KB | 64 | 1MB |
32KB | 64 | 2MB |
64KB | 64 | 4MB |
An extent is the basic unit of file space allocation in InnoDB. A segment is made up of a sequence of extents. A segment can contain non-contiguous extents. Refer to Data Organization in InnoDB for more details.
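The relationship in the table above can be written down as a small function. This is only a sketch that reproduces the table; `extent_size_in_pages` and `one_mb` are hypothetical names, and the function assumes it is given one of the five supported page sizes in bytes.

```cpp
#include <cstdint>

constexpr std::uint32_t one_mb = 1024 * 1024;

/* Number of pages in one extent for a given page size in bytes:
   1MB extents for pages up to 16KB, 2MB for 32KB, 4MB for 64KB. */
constexpr std::uint32_t extent_size_in_pages(std::uint32_t page_size)
{
    return page_size <= 16384 ? one_mb / page_size
         : page_size <= 32768 ? 2 * one_mb / page_size
                              : 4 * one_mb / page_size;
}

/* The default 16KB page size gives 64 pages per extent. */
static_assert(extent_size_in_pages(16384) == 64, "16KB pages: 64 pages");
```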
An Extent Descriptor
An extent descriptor (xdes) provides information about an extent. It contains the following pieces of information:
- The identifier of the segment to which the extent belongs.
- The location of the previous extent descriptor within the same segment.
- The location of the next extent descriptor within the same segment.
- The state of the extent.
- The allocation bitmap of the pages within the extent. The size of this bitmap depends on the number of pages in the extent. For each page, 2 bits (`XDES_BITS_PER_PAGE`) are used. So if the extent has 64 pages, then the size of this allocation bitmap per extent is 16 bytes.
For each page, the allocation bitmap has 2 bits: `XDES_FREE_BIT` and `XDES_CLEAN_BIT`. The `XDES_CLEAN_BIT` is currently unused. The `XDES_FREE_BIT` tells you whether the page is free or currently in use. So the extent descriptor is what tells us whether each page is free or used within that extent. Thus the extent descriptor helps to locate free pages within an extent.
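As an illustration of how such a bitmap could be read, here is a minimal sketch. It rests on assumptions that you should verify against your InnoDB source version: that `XDES_FREE_BIT` is bit 0 of each page's 2-bit slot, that a set bit means "free", and that bits are numbered from the least significant bit of each byte. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t bits_per_page = 2; /* XDES_BITS_PER_PAGE */
constexpr std::size_t free_bit = 0;      /* assumed value of XDES_FREE_BIT */

/* Returns true if the page at position page_in_extent is marked free.
   Assumes a set free bit means "free" and LSB-first bit numbering. */
inline bool xdes_page_is_free(const unsigned char* bitmap,
                              std::size_t page_in_extent,
                              std::size_t pages_per_extent)
{
    assert(page_in_extent < pages_per_extent);
    const std::size_t bit = page_in_extent * bits_per_page + free_bit;
    return ((bitmap[bit / 8] >> (bit % 8)) & 1u) != 0;
}

/* Count the free pages of one extent by scanning its bitmap. */
inline std::size_t xdes_count_free_pages(const unsigned char* bitmap,
                                         std::size_t pages_per_extent)
{
    std::size_t n_free = 0;
    for (std::size_t i = 0; i < pages_per_extent; ++i) {
        if (xdes_page_is_free(bitmap, i, pages_per_extent)) {
            ++n_free;
        }
    }
    return n_free;
}
```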
The extent descriptor also tells us the state of an extent. The extent can be in any of the following states:
- `XDES_FREE`: The extent is in the free list and the space can be used.
- `XDES_FSEG`: The extent belongs to a segment.
- `XDES_FREE_FRAG`: The extent belongs to the free fragment list.
- `XDES_FULL_FRAG`: The extent belongs to the full fragment list.
When an extent is yet to be used by any segment, it will be in the `XDES_FREE` state, meaning that it is “free space”. When the whole extent is allocated to a segment, then it will go into the `XDES_FSEG` state. There are situations in which InnoDB will decide that individual pages within a particular extent can be allocated to different segments. Such an extent will then be put in the free fragment list using the `XDES_FREE_FRAG` state. When a fragmented extent is completely used (no free pages), then it will go into the `XDES_FULL_FRAG` state.
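When printing a descriptor it is handy to turn the numeric state into a name. The sketch below assumes the numeric values 1 to 4 for the four states, which is how I recall them from the InnoDB source; please treat them as an assumption and check them against your version. The enum name and `xdes_state_name` and `xdes_is_fragmented` are hypothetical helpers.

```cpp
#include <cstdint>

/* State values as assumed here; verify against your InnoDB source. */
enum xdes_state_t : std::uint32_t {
    XDES_FREE      = 1, /* extent is in the free list */
    XDES_FREE_FRAG = 2, /* extent is in the free fragment list */
    XDES_FULL_FRAG = 3, /* extent is in the full fragment list */
    XDES_FSEG      = 4  /* extent belongs to a segment */
};

/* Human-readable name of a state value read from an xdes. */
constexpr const char* xdes_state_name(std::uint32_t state)
{
    return state == XDES_FREE      ? "XDES_FREE"
         : state == XDES_FREE_FRAG ? "XDES_FREE_FRAG"
         : state == XDES_FULL_FRAG ? "XDES_FULL_FRAG"
         : state == XDES_FSEG      ? "XDES_FSEG"
                                   : "unknown";
}

/* True if the pages of the extent are handed out individually. */
constexpr bool xdes_is_fragmented(std::uint32_t state)
{
    return state == XDES_FREE_FRAG || state == XDES_FULL_FRAG;
}
```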
The Structure of an Extent Descriptor
The following C++ code snippet is provided as a means to inspect the structure of an extent descriptor (xdes). All other structures needed to define an xdes are also provided in this code snippet. Note the use of the “packed” attribute for the structures. It is necessary to avoid unwanted padding. (This also means that you should be aware of data structure alignment issues.) Also, note that the data stored in the file will be in network byte order, thus any necessary byte order conversions must be done while reading and writing to the file. The complete compilable sample program is given in the appendix. Here’s a snippet:
```cpp
/* The segment identifier */
struct segid_t {
    uint32_t high;
    uint32_t low;
};

/* The file address */
struct fil_addr_t {
    uint32_t page;    /* page number within a space */
    uint16_t boffset; /* byte offset within the page */
} __attribute__((__packed__));

/* Number of bits used per page in the allocation bitmap */
#define XDES_BITS_PER_PAGE 2

/* Given number of bits, calculate bytes */
#define UT_BITS_IN_BYTES(b) (((b) + 7) / 8)

/* Macro to calculate the bitmap size */
#define XDES_BITMAP_SIZE \
    UT_BITS_IN_BYTES(FSP_EXTENT_SIZE * XDES_BITS_PER_PAGE)

/* Extent Descriptor */
struct xdes_t {
    /* The identifier of the segment to which this extent belongs */
    segid_t segid;

    /* The list node data structure for the descriptors */
    fil_addr_t prev;

    /* The list node data structure for the descriptors */
    fil_addr_t next;

    /* contains state information of the extent */
    uint32_t state;

    /* Descriptor bitmap of the pages in the extent */
    unsigned char bitmap[XDES_BITMAP_SIZE];
} __attribute__((__packed__));
```
An Extent Descriptor Page
Now let’s discuss the contents of an extent descriptor page. An extent descriptor page primarily contains an array of extent descriptors. The primary purpose of this page type is to maintain the allocation information of the pages within the extents that it describes. Each page of the tablespace whose page number is a multiple of the page size (the page size in bytes, taken as a plain number) is an extent descriptor (xdes) page. If the page size is 4K, then the page numbers 0, 4096, 8192, 12288 and so on are the xdes pages. The first extent descriptor page is referred to by using the macro `FSP_XDES_OFFSET` within the InnoDB source code. The page type of an extent descriptor page is denoted with the `FIL_PAGE_TYPE_XDES` type (page 0, which also holds the tablespace header, carries the `FIL_PAGE_TYPE_FSP_HDR` type instead). One xdes page will contain an array of extent descriptors, and each extent descriptor will provide allocation information of one extent. The extent descriptor page contains the following items:
- The file page header (38 bytes).
- The file space header (112 bytes).
- An array of extent descriptors (variable, depending on the page size).
- The file page trailer (8 bytes).
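Putting the layout above together, the descriptor for any given page can be located with a little arithmetic. The following is a minimal sketch with hypothetical function names. It assumes the descriptors are stored back to back right after the two headers, and that each xdes page describes the extents from its own page number up to the next xdes page.

```cpp
#include <cstdint>

constexpr std::uint32_t page_header_size  = 38;  /* file page header */
constexpr std::uint32_t space_header_size = 112; /* file space header */

/* Page number of the xdes page that describes page_no. */
constexpr std::uint32_t xdes_page_no(std::uint32_t page_no,
                                     std::uint32_t page_size)
{
    return page_no - page_no % page_size;
}

/* Index of the descriptor, within its xdes page, covering page_no. */
constexpr std::uint32_t xdes_index(std::uint32_t page_no,
                                   std::uint32_t page_size,
                                   std::uint32_t extent_pages)
{
    return (page_no % page_size) / extent_pages;
}

/* Byte offset of the descriptor with the given index inside the
   xdes page; xdes_size is the size of one descriptor in bytes. */
constexpr std::uint32_t xdes_offset(std::uint32_t index,
                                    std::uint32_t xdes_size)
{
    return page_header_size + space_header_size + index * xdes_size;
}
```

With 16KB pages (64 pages per extent, 40 bytes per descriptor), page 100 is described by the descriptor with index 1 on page 0, which starts at byte offset 190.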
The following table provides information about the number of extent descriptors that will be stored in an extent descriptor page, for each given page size.
Page Size (in KB) | Extent Size in Pages | Extent Descriptor Page Numbers | Allocation Bitmap Size in One Extent Descriptor | Size of 1 Extent Descriptor | Number of Extent Descriptors stored in an xdes page |
---|---|---|---|---|---|
4KB | 256 | 0, 4096, 8192, 12288, ... | 64 bytes | 88 bytes | 16 |
8KB | 128 | 0, 8192, 16384, 24576, ... | 32 bytes | 56 bytes | 64 |
16KB (default) | 64 | 0, 16384, 32768, 49152, ... | 16 bytes | 40 bytes | 256 |
32KB | 64 | 0, 32768, 65536, ... | 16 bytes | 40 bytes | 512 |
64KB | 64 | 0, 65536, 131072, ... | 16 bytes | 40 bytes | 1024 |
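The numbers in this table follow from a few lines of arithmetic, sketched below with hypothetical function names. The fixed part of a descriptor is taken to be 24 bytes (8 for the segment identifier, 6 each for the two file addresses, 4 for the state), which matches the structure shown earlier. The last function checks that the whole array fits between the headers and the trailer.

```cpp
#include <cstdint>

constexpr std::uint32_t page_header_size  = 38;
constexpr std::uint32_t space_header_size = 112;
constexpr std::uint32_t page_trailer_size = 8;
constexpr std::uint32_t xdes_fixed_size   = 24; /* segid + prev + next + state */

/* Allocation bitmap size in bytes: 2 bits per page, rounded up. */
constexpr std::uint32_t xdes_bitmap_size(std::uint32_t extent_pages)
{
    return (extent_pages * 2 + 7) / 8;
}

/* Size of one extent descriptor in bytes. */
constexpr std::uint32_t xdes_size(std::uint32_t extent_pages)
{
    return xdes_fixed_size + xdes_bitmap_size(extent_pages);
}

/* Number of descriptors stored in one xdes page. */
constexpr std::uint32_t xdes_per_page(std::uint32_t page_size,
                                      std::uint32_t extent_pages)
{
    return page_size / extent_pages;
}

/* True if headers, descriptor array and trailer fit in one page. */
constexpr bool xdes_array_fits(std::uint32_t page_size,
                               std::uint32_t extent_pages)
{
    return page_header_size + space_header_size
         + xdes_per_page(page_size, extent_pages) * xdes_size(extent_pages)
         + page_trailer_size <= page_size;
}
```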
Accessing an Extent Descriptor
With the given information, one can now print the extent descriptors stored in the xdes pages of an InnoDB data file. Here is a short program (mainly for demonstration purposes) to print the state of the first extent descriptor in the first xdes page of an InnoDB data file (ibd file). An important point to remember is that data stored on the disk will be in network byte order, so while reading data into memory, appropriate byte order conversion must be done. This program takes an ibd file as an argument (if it’s the system tablespace, then provide the first ibd file of the system tablespace). And yes, I am not checking the value of argc! Again, the complete sample program is provided in the appendix. Here’s a snippet:
```cpp
typedef unsigned char byte;

int main(int argc, char* argv[])
{
    byte page[UNIV_PAGE_SIZE];

    int fd = open(argv[1], O_RDONLY);

    if (fd < 0) {
        return(0);
    }

    ssize_t bytes = read(fd, page, UNIV_PAGE_SIZE);

    if (bytes != UNIV_PAGE_SIZE) {
        return(0);
    }

    /* Access the first xdes as follows */
    const size_t page_header_size = 38;
    const size_t space_header_size = 112;
    xdes_t* obj = reinterpret_cast<xdes_t*>
        (&page[page_header_size + space_header_size]);

    /* Data stored in the file will be in network byte order */
    obj->state = ntohl(obj->state);

    std::cout << "xdes state: " << obj->state << std::endl;
}
```
In the above sample code, the first xdes page (with page number 0) is being read into memory. The two headers (the page header and the space header) are being skipped and the first extent descriptor is read. The byte order conversion is then performed and the state of the extent descriptor is printed.
I will leave it as an exercise for the reader to print the complete extent descriptor. 🙂
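If you would like a starting point for that exercise, one possible approach is to decode every field byte by byte, which also sidesteps the packed attribute and its alignment concerns. This is only a sketch: `xdes_fields_t`, `decode_xdes` and the `read_be*` helpers are hypothetical names, the field offsets are those implied by the structure shown earlier, and the segment identifier is treated as a single 64-bit big-endian value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/* Decoded (host byte order) view of one extent descriptor. */
struct xdes_fields_t {
    std::uint64_t segid;
    std::uint32_t prev_page;
    std::uint16_t prev_boffset;
    std::uint32_t next_page;
    std::uint16_t next_boffset;
    std::uint32_t state;
    std::vector<unsigned char> bitmap;
};

/* Read big-endian (network byte order) integers from raw bytes. */
inline std::uint16_t read_be16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t read_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

inline std::uint64_t read_be64(const unsigned char* p)
{
    return (static_cast<std::uint64_t>(read_be32(p)) << 32)
         | read_be32(p + 4);
}

/* Decode the descriptor starting at p; bitmap_size is in bytes. */
inline xdes_fields_t decode_xdes(const unsigned char* p,
                                 std::size_t bitmap_size)
{
    assert(p != nullptr);

    xdes_fields_t d;
    d.segid        = read_be64(p);      /* bytes 0..7   */
    d.prev_page    = read_be32(p + 8);  /* bytes 8..11  */
    d.prev_boffset = read_be16(p + 12); /* bytes 12..13 */
    d.next_page    = read_be32(p + 14); /* bytes 14..17 */
    d.next_boffset = read_be16(p + 18); /* bytes 18..19 */
    d.state        = read_be32(p + 20); /* bytes 20..23 */
    d.bitmap.assign(p + 24, p + 24 + bitmap_size);
    return d;
}
```

With the page buffer from the sample program, a call such as `decode_xdes(&page[38 + 112], 16)` would decode the first descriptor of a 16KB page, and each field can then be printed in turn.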
Conclusion
This article provided information about the extent descriptor (xdes) pages in InnoDB. The free and used pages within the data files (ibd files) can be identified by making use of the allocation bitmaps within the xdes pages.
I hope that this has been interesting and proves helpful! Please let me know if you have any questions or comments. As always, THANK YOU for using MySQL!
Appendix: The Complete Sample Program
For the convenience of the reader, I am providing the complete compilable code sample by combining the code snippets above. I have hard coded the page size to the default 16K bytes. I have tested it on Ubuntu 14.10 (Utopic Unicorn).
```cpp
#include <iostream>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <arpa/inet.h>

/* The segment identifier */
struct segid_t {
    uint32_t high;
    uint32_t low;
};

/* The file address */
struct fil_addr_t {
    uint32_t page;    /* page number within a space */
    uint16_t boffset; /* byte offset within the page */
} __attribute__((__packed__));

/* Number of bits used per page in the allocation bitmap */
#define XDES_BITS_PER_PAGE 2

/* Setting the page size to 16K (16 * 1024 bytes) */
#define UNIV_PAGE_SIZE 16384

/* Since the page size is 16K, the extent size will be 64 */
#define FSP_EXTENT_SIZE 64

/* Given number of bits, calculate bytes */
#define UT_BITS_IN_BYTES(b) (((b) + 7) / 8)

/* Macro to calculate the bitmap size */
#define XDES_BITMAP_SIZE \
    UT_BITS_IN_BYTES(FSP_EXTENT_SIZE * XDES_BITS_PER_PAGE)

/* Extent Descriptor */
struct xdes_t {
    /* The identifier of the segment to which this extent belongs */
    segid_t segid;

    /* The list node data structure for the descriptors */
    fil_addr_t prev;

    /* The list node data structure for the descriptors */
    fil_addr_t next;

    /* contains state information of the extent */
    uint32_t state;

    /* Descriptor bitmap of the pages in the extent */
    unsigned char bitmap[XDES_BITMAP_SIZE];
} __attribute__((__packed__));

typedef unsigned char byte;

int main(int argc, char* argv[])
{
    byte page[UNIV_PAGE_SIZE];

    /* Expect the ibd file as the only argument */
    if (argc < 2) {
        return(1);
    }

    int fd = open(argv[1], O_RDONLY);

    if (fd < 0) {
        return(1);
    }

    /* Read the first page (page number 0) and release the file */
    ssize_t bytes = read(fd, page, UNIV_PAGE_SIZE);
    close(fd);

    if (bytes != UNIV_PAGE_SIZE) {
        return(1);
    }

    /* Access the first xdes as follows */
    const size_t page_header_size = 38;
    const size_t space_header_size = 112;
    xdes_t* obj = reinterpret_cast<xdes_t*>
        (&page[page_header_size + space_header_size]);

    /* Data stored in the file will be in network byte order */
    obj->state = ntohl(obj->state);

    std::cout << "xdes state: " << obj->state << std::endl;
    return(0);
}
```
Disclaimer: The code snippets and sample program in this article are provided for educational purposes only. While I tested the code on my laptop, it is not guaranteed to work on all platforms. While the above sample program uses GCC’s `__attribute__((__packed__))`, InnoDB does not use it. So using the packed attribute could potentially introduce data alignment issues, thus making the code somewhat platform dependent.