MySQL 9.1.0
Source Code Documentation
Row format

The handler interface uses two different formats.

Assume the following table:

CREATE TABLE tt (
c1 INT(11) DEFAULT NULL,
c2 INT(11) NOT NULL,
c3 VARCHAR(8) DEFAULT NULL,
c4 VARCHAR(8) NOT NULL,
c5 CHAR(8) DEFAULT NULL,
c6 CHAR(8) NOT NULL,
c7 VARCHAR(300) DEFAULT NULL,
KEY `i1` (`c1`,`c2`,`c3`,`c4`,`c5`,`c6`,`c7`)
);

write_row() format

This format is used in the input of the write_row() method. The same format is used by the storage engine in the output buffers of rnd_next() and index_read(). Consider the following INSERT statement:

INSERT INTO t VALUES (123, 123, 'abcd', 'abcd', 'abcd', 'abcd', 'abcd');

the row buffer that is passed to write_row() is 352 bytes:

hex raw description
f0 . NULLs bitmask 11110000 denoting that out of 4 columns that could possibly be NULL (c1, c3, c5 and c7) none is actually NULL.
7b { c1=123 stored in 4 bytes (little endian) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
7b { c2=123 stored in 4 bytes (little endian) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
04 . the length of the following data (for VARCHAR cells): 4
61 a c3='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
a5 . 4 wasted bytes for c3 (1/4)
a5 . from above (2/4)
a5 . from above (3/4)
a5 . from above (4/4)
04 . the length of the following data (for VARCHAR cells): 4
61 a c4='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
a5 . 4 wasted bytes for c4 (1/4)
a5 . from above (2/4)
a5 . from above (3/4)
a5 . from above (4/4)
61 a c5='abcd ' (padded to 8 bytes with spaces at the end) (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c6='abcd ' (padded to 8 bytes with spaces at the end) (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
04 . the length (occupying 2 bytes) of the following data (for VARCHAR cells): 4 (1/2)
00 . from above (2/2)
61 a c7='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
a5 . a5 repeats 296 times, wasted bytes for c7 (1/296)
a5 . from above (2/296)
.. .. ..
a5 . from above (296/296)

index_read() format

Consider the following SELECT statement:

SELECT * FROM t WHERE
c1=123 AND
c2=123 AND
c3='abcd' AND
c4='abcd' AND
c5='abcd' AND
c6='abcd' AND
c7='abcd';

the indexed cells buffer that is passed to index_read() is 350 bytes:

hex raw description
00 . c1 NULL byte, denoting that c1 is not NULL (would have been 01 is c1 was NULL)
7b { c1=123 stored in 4 bytes (little endian) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
7b { c2=123 stored in 4 bytes (little endian) (because c2 cannot be NULL there is no leading byte to indicate NULL or not NULL) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
00 . c3 NULL byte, denoting that c3 is not NULL (would have been 01 if c3 was NULL)
04 . c3 length (4), always 2 bytes (1/2)
00 . from above (2/2)
61 a c3='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
00 . 4 wasted bytes for c3 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
04 . c4 length (4), always 2 bytes (no NULL byte for c4) (1/2)
00 . from above (2/2)
61 a c4='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
00 . 4 wasted bytes for c4 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
00 . c5 NULL byte, denoting that c5 is not NULL (would have been 01 if c5 was NULL)
61 a c5='abcd ' (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c6='abcd ' (c6 cannot be NULL) (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
00 . c7 NULL byte
04 . c7 length (4), always 2 bytes (1/2)
00 . from above (2/2)
61 a c7='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
00 . 296 wasted bytes for c7 (1/296)
00 . from above (2/296)
.. .. ..
00 . from above (296/296)

TempTable format

We introduce a new format, lets call it TempTable format, that fulfills the following:

  1. Does not waste space for VARCHAR cells
  2. It must be possible to convert from write_row() format to this new format
  3. It must be possible to convert the new format to write_row() format
  4. When a row is stored internally in this new format, it must be possible to compare its relevant cells to cells in the index_read() format without any heap memory allocation (malloc()/new) and without copying of user data (memcpy()).

For this we introduce a Cell class, which has the following properties:

  1. NULL byte (bool, 1 byte)
  2. user data length (uint32_t, 4 bytes)
  3. pointer to the user data (void*, 8 bytes on 64-bit machines)
  4. can compare itself to another cell
  5. can hash itself

A Cell object does not store actual user data, only a pointer to it. This way we can create cells that point inside the buffer provided to write_row() or point inside our own buffer, where the user data is copied for storage.

A row in the TempTable format consists of a set of Cells stored in one buffer, together with the actual user data. The size of a row is the size of all user data + 16 bytes overhead for each cell (for the Cell object).

In the above example both the row (write_row() format) and the indexed cells (index_read() format) would be represented like in the table below, in 148 bytes. Think of a POD

struct Cell {
bool is_null;
uint32_t len;
void* data;
};
hex raw description
00 . c1 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c1 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f1 . address in memory where c1 user data is stored (1/8)
f1 . from above (2/8)
f1 . from above (3/8)
f1 . from above (4/8)
f1 . from above (5/8)
f1 . from above (6/8)
f1 . from above (7/8)
f1 . from above (8/8)
00 . c2 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c2 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f2 . address in memory where c2 user data is stored (1/8)
f2 . from above (2/8)
f2 . from above (3/8)
f2 . from above (4/8)
f2 . from above (5/8)
f2 . from above (6/8)
f2 . from above (7/8)
f2 . from above (8/8)
00 . c3 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c3 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f3 . address in memory where c3 user data is stored (1/8)
f3 . from above (2/8)
f3 . from above (3/8)
f3 . from above (4/8)
f3 . from above (5/8)
f3 . from above (6/8)
f3 . from above (7/8)
f3 . from above (8/8)
00 . c4 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c4 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f4 . address in memory where c4 user data is stored (1/8)
f4 . from above (2/8)
f4 . from above (3/8)
f4 . from above (4/8)
f4 . from above (5/8)
f4 . from above (6/8)
f4 . from above (7/8)
f4 . from above (8/8)
00 . c5 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c5 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
08 . from above (4/4)
f5 . address in memory where c5 user data is stored (1/8)
f5 . from above (2/8)
f5 . from above (3/8)
f5 . from above (4/8)
f5 . from above (5/8)
f5 . from above (6/8)
f5 . from above (7/8)
f5 . from above (8/8)
00 . c6 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c6 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
08 . from above (4/4)
f6 . address in memory where c6 user data is stored (1/8)
f6 . from above (2/8)
f6 . from above (3/8)
f6 . from above (4/8)
f6 . from above (5/8)
f6 . from above (6/8)
f6 . from above (7/8)
f6 . from above (8/8)
00 . c7 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c7 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f7 . address in memory where c7 user data is stored (1/8)
f7 . from above (2/8)
f7 . from above (3/8)
f7 . from above (4/8)
f7 . from above (5/8)
f7 . from above (6/8)
f7 . from above (7/8)
f7 . from above (8/8)
7b { c1=123, the address of this is f1f1f1f1 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
7b { c2=123, the address of this is f2f2f2f2 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
61 a c3='abcd', the address of this is f3f3f3f3 (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
61 a c4='abcd', the address of this is f4f4f4f4 (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
61 a c5='abcd ', the address of this is f5f5f5f5 (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c6='abcd ', the address of this is f6f6f6f6 (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c7='abcd', the address of this is f7f7f7f7 (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)