MySQL 9.2.0
Source Code Documentation
All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Modules Pages Concepts
Row format

The handler interface uses two different formats.

Assume the following table:

CREATE TABLE tt (
c1 INT(11) DEFAULT NULL,
c2 INT(11) NOT NULL,
c3 VARCHAR(8) DEFAULT NULL,
c4 VARCHAR(8) NOT NULL,
c5 CHAR(8) DEFAULT NULL,
c6 CHAR(8) NOT NULL,
c7 VARCHAR(300) DEFAULT NULL,
KEY `i1` (`c1`,`c2`,`c3`,`c4`,`c5`,`c6`,`c7`)
);

write_row() format

This format is used in the input of the write_row() method. The same format is used by the storage engine in the output buffers of rnd_next() and index_read(). Consider the following INSERT statement:

INSERT INTO t VALUES (123, 123, 'abcd', 'abcd', 'abcd', 'abcd', 'abcd');

the row buffer that is passed to write_row() is 352 bytes:

hex raw description
f0 . NULLs bitmask 11110000 denoting that out of 4 columns that could possibly be NULL (c1, c3, c5 and c7) none is actually NULL.
7b { c1=123 stored in 4 bytes (little endian) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
7b { c2=123 stored in 4 bytes (little endian) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
04 . the length of the following data (for VARCHAR cells): 4
61 a c3='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
a5 . 4 wasted bytes for c3 (1/4)
a5 . from above (2/4)
a5 . from above (3/4)
a5 . from above (4/4)
04 . the length of the following data (for VARCHAR cells): 4
61 a c4='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
a5 . 4 wasted bytes for c4 (1/4)
a5 . from above (2/4)
a5 . from above (3/4)
a5 . from above (4/4)
61 a c5='abcd ' (padded to 8 bytes with spaces at the end) (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c6='abcd ' (padded to 8 bytes with spaces at the end) (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
04 . the length (occupying 2 bytes) of the following data (for VARCHAR cells): 4 (1/2)
00 . from above (2/2)
61 a c7='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
a5 . a5 repeats 296 times, wasted bytes for c7 (1/296)
a5 . from above (2/296)
.. .. ..
a5 . from above (296/296)

index_read() format

Consider the following SELECT statement:

SELECT * FROM t WHERE
c1=123 AND
c2=123 AND
c3='abcd' AND
c4='abcd' AND
c5='abcd' AND
c6='abcd' AND
c7='abcd';

the indexed cells buffer that is passed to index_read() is 350 bytes:

hex raw description
00 . c1 NULL byte, denoting that c1 is not NULL (would have been 01 is c1 was NULL)
7b { c1=123 stored in 4 bytes (little endian) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
7b { c2=123 stored in 4 bytes (little endian) (because c2 cannot be NULL there is no leading byte to indicate NULL or not NULL) (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
00 . c3 NULL byte, denoting that c3 is not NULL (would have been 01 if c3 was NULL)
04 . c3 length (4), always 2 bytes (1/2)
00 . from above (2/2)
61 a c3='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
00 . 4 wasted bytes for c3 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
04 . c4 length (4), always 2 bytes (no NULL byte for c4) (1/2)
00 . from above (2/2)
61 a c4='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
00 . 4 wasted bytes for c4 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
00 . c5 NULL byte, denoting that c5 is not NULL (would have been 01 if c5 was NULL)
61 a c5='abcd ' (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c6='abcd ' (c6 cannot be NULL) (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
00 . c7 NULL byte
04 . c7 length (4), always 2 bytes (1/2)
00 . from above (2/2)
61 a c7='abcd' - the actual data (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
00 . 296 wasted bytes for c7 (1/296)
00 . from above (2/296)
.. .. ..
00 . from above (296/296)

TempTable format

We introduce a new format, lets call it TempTable format, that fulfills the following:

  1. Does not waste space for VARCHAR cells
  2. It must be possible to convert from write_row() format to this new format
  3. It must be possible to convert the new format to write_row() format
  4. When a row is stored internally in this new format, it must be possible to compare its relevant cells to cells in the index_read() format without any heap memory allocation (malloc()/new) and without copying of user data (memcpy()).

For this we introduce a Cell class, which has the following properties:

  1. NULL byte (bool, 1 byte)
  2. user data length (uint32_t, 4 bytes)
  3. pointer to the user data (void*, 8 bytes on 64-bit machines)
  4. can compare itself to another cell
  5. can hash itself

A Cell object does not store actual user data, only a pointer to it. This way we can create cells that point inside the buffer provided to write_row() or point inside our own buffer, where the user data is copied for storage.

A row in the TempTable format consists of a set of Cells stored in one buffer, together with the actual user data. The size of a row is the size of all user data + 16 bytes overhead for each cell (for the Cell object).

In the above example both the row (write_row() format) and the indexed cells (index_read() format) would be represented like in the table below, in 148 bytes. Think of a POD

struct Cell {
bool is_null;
uint32_t len;
void* data;
};
hex raw description
00 . c1 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c1 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f1 . address in memory where c1 user data is stored (1/8)
f1 . from above (2/8)
f1 . from above (3/8)
f1 . from above (4/8)
f1 . from above (5/8)
f1 . from above (6/8)
f1 . from above (7/8)
f1 . from above (8/8)
00 . c2 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c2 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f2 . address in memory where c2 user data is stored (1/8)
f2 . from above (2/8)
f2 . from above (3/8)
f2 . from above (4/8)
f2 . from above (5/8)
f2 . from above (6/8)
f2 . from above (7/8)
f2 . from above (8/8)
00 . c3 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c3 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f3 . address in memory where c3 user data is stored (1/8)
f3 . from above (2/8)
f3 . from above (3/8)
f3 . from above (4/8)
f3 . from above (5/8)
f3 . from above (6/8)
f3 . from above (7/8)
f3 . from above (8/8)
00 . c4 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c4 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f4 . address in memory where c4 user data is stored (1/8)
f4 . from above (2/8)
f4 . from above (3/8)
f4 . from above (4/8)
f4 . from above (5/8)
f4 . from above (6/8)
f4 . from above (7/8)
f4 . from above (8/8)
00 . c5 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c5 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
08 . from above (4/4)
f5 . address in memory where c5 user data is stored (1/8)
f5 . from above (2/8)
f5 . from above (3/8)
f5 . from above (4/8)
f5 . from above (5/8)
f5 . from above (6/8)
f5 . from above (7/8)
f5 . from above (8/8)
00 . c6 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c6 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
08 . from above (4/4)
f6 . address in memory where c6 user data is stored (1/8)
f6 . from above (2/8)
f6 . from above (3/8)
f6 . from above (4/8)
f6 . from above (5/8)
f6 . from above (6/8)
f6 . from above (7/8)
f6 . from above (8/8)
00 . c7 NULL byte (00 means not NULL)
00 . 3 bytes padding (1/3)
00 . from above (2/3)
00 . from above (3/3)
00 . c7 length in 4 bytes in whatever is the machine's native byte order (1/4)
00 . from above (2/4)
00 . from above (3/4)
04 . from above (4/4)
f7 . address in memory where c7 user data is stored (1/8)
f7 . from above (2/8)
f7 . from above (3/8)
f7 . from above (4/8)
f7 . from above (5/8)
f7 . from above (6/8)
f7 . from above (7/8)
f7 . from above (8/8)
7b { c1=123, the address of this is f1f1f1f1 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
7b { c2=123, the address of this is f2f2f2f2 (1/4)
00 . from above (2/4)
00 . from above (3/4)
00 . from above (4/4)
61 a c3='abcd', the address of this is f3f3f3f3 (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
61 a c4='abcd', the address of this is f4f4f4f4 (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)
61 a c5='abcd ', the address of this is f5f5f5f5 (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c6='abcd ', the address of this is f6f6f6f6 (1/8)
62 b from above (2/8)
63 c from above (3/8)
64 d from above (4/8)
20 from above (5/8)
20 from above (6/8)
20 from above (7/8)
20 from above (8/8)
61 a c7='abcd', the address of this is f7f7f7f7 (1/4)
62 b from above (2/4)
63 c from above (3/4)
64 d from above (4/4)