MySQL Internals Manual  /  MyISAM Storage Engine  /  The .MYI File

21.2 The .MYI File

A .MYI file for a MyISAM table contains the table's indexes.

The .MYI file has two parts: the header information and the key values. So the next sub-sections will be "The .MYI Header" and "The .MYI Key Values".

The .MYI Header

A .MYI file begins with a header, with information about options, about file sizes, and about the "keys". In MySQL terminology, a "key" is something that you create with CREATE [UNIQUE] INDEX.

Program files which read and write .MYI headers are in the ./myisam directory: mi_open.c has the routines that write each section of the header, mi_create.c has a routine that calls the mi_open.c routines in order, and myisamdef.h has structure definitions corresponding to what we're about to describe.

These are the main header sections:

Section                       Occurrences
-------                       -----------
state                         Occurs 1 time
base                          Occurs 1 time
keydef (including keysegs)    Occurs once for each key
recinfo                       Occurs once for each field

Now we will look at each of these sections, showing each field.

We are going to use an example table throughout the description. To make the example table, we executed these statements:

  CREATE TABLE T (S1 CHAR(1), S2 CHAR(2), S3 CHAR(3));
  CREATE UNIQUE INDEX I1 ON T (S1);
  CREATE INDEX I2 ON T (S2,S3);
  INSERT INTO T VALUES ('1', 'aa', 'b');
  INSERT INTO T VALUES ('2', 'aa', 'bb');
  INSERT INTO T VALUES ('3', 'aa', 'bbb');
  DELETE FROM T WHERE S1 = '2';

We took a hexadecimal dump of the resulting file, T.MYI.

In all the individual descriptions below, the column labeled Dump From Example File has the exact bytes that are in T.MYI. You can verify that by executing the same statements and looking at a hexadecimal dump yourself. With Linux this is possible using od -h T.MYI; with Windows you can use the command-line debugger.

Along with the typical value, we may include a comment. The comment usually explains why the value is what it is. Sometimes the comment is derived from the comments in the source code.

state

This section is written by mi_open.c, mi_state_info_write().

Name                         Size Dump From Example File  Comment
----                         ---- ----------------------  -------

file_version                  4   FE FE 07 01             from myisam_file_magic
options                       2   00 02                   HA_OPTION_COMPRESS_RECORD
                                                          etc.
header_length                 2   01 A2                   this header example has
                                                          0x01A2 bytes
state_info_length             2   00 B0                   = MI_STATE_INFO_SIZE
                                                          defined in myisamdef.h
base_info_length              2   00 64                   = MI_BASE_INFO_SIZE
                                                          defined in myisamdef.h
base_pos                      2   00 D4                   = where the base
                                                          section starts
key_parts                     2   00 03                   a key part is a column
                                                          within a key
unique_key_parts              2   00 00                   key-parts+unique-parts
keys                          1   02                      here are 2 keys --
                                                          I1 and I2
uniques                       1   00                      number of hash unique
                                                          keys used internally
                                                          in temporary tables
                                                          (nothing to do with
                                                          'UNIQUE' definitions)
language                      1   08                      "language for indexes"
max_block_size                1   01
fulltext_keys                 1   00                      # of fulltext keys.
                                                          = 0 if version <= 4.0
not_used                      1   00                      to align to 8-byte
                                                          boundary

state->open_count             2   00 01
state->changed                1   39                      set if table updated;
                                                          reset if shutdown (so
                                                          one can examine this
                                                          to see if there was an
                                                          update without proper
                                                          shutdown)
state->sortkey                1   FF                      "sorted by this key"
                                                          (not used)
state->state.records          8   00 00 00 00 00 00 00 02 number of actual,
                                                          un-deleted, records
state->state.del              8   00 00 00 00 00 00 00 01 # of deleted records
state->split                  8   00 00 00 00 00 00 00 03 # of "chunks" (e.g.
                                                          records or spaces left
                                                          after record deletion)
state->dellink                8   00 00 00 00 00 00 00 07 "Link to next removed
                                                          "block". Initially =
                                                          HA_OFFSET_ERROR
state->state.key_file_length  8   00 00 00 00 00 00 0c 00 2048
state->state.data_file_length 8   00 00 00 00 00 00 00 15 = size of .MYD file
state->state.empty            8   00 00 00 00 00 00 00 00
state->state.key_empty        8   00 00 00 00 00 00 00 00
state->auto_increment         8   00 00 00 00 00 00 00 00
state->checksum               8   00 00 00 00 00 00 00 00
state->process                4   00 00 09 E6             from getpid(). process
                                                          of last update
state->unique                 4   00 00 00 0B             initially = 0
state->status                 4   00 00 00 00
state->update_count           4   00 00 00 04             updated for each write
                                                          lock (there were 3
                                                          inserts + 1 delete,
                                                          total 4 operations)
state->key_root               8   00 00 00 00 00 00 04 00 offset in file where
                                                          I1 keys start, can be
                                                          = HA_OFFSET_ERROR
                                  00 00 00 00 00 00 08 00 state->key_root occurs
                                                          twice because there
                                                          are two keys
state->key_del                8   FF FF FF FF FF FF FF FF delete links for keys
                                                          (occurs many times if
                                                          many delete links)
state->sec_index_changed      4   00 00 00 00             sec_index = secondary
                                                          index (presumably)
                                                          not currently used
state->sec_index_used         4   00 00 00 00             "which extra indexes
                                                          are in use"
                                                          not currently used
state->version                4   3F 3F EB F7             "timestamp of create"
state->key_map                8   00 00 00 03             "what keys are in use"
state->create_time            8   00 00 00 00 3F 3F EB F7 "time when database
                                                          created" (actually:
                                                          time when file made)
state->recover_time           8   00 00 00 00 00 00 00 00 "time of last recover"
state->check_time             8   00 00 00 00 3F 3F EB F7 "time of last check"
state->rec_per_key_rows       8   00 00 00 00 00 00 00 00
state->rec_per_key_parts      4   00 00 00 00             (key_parts = 3, so
                                  00 00 00 00              rec_per_key_parts
                                  00 00 00 00              occurs 3 times)

base

This section is written by mi_open.c, mi_base_info_write(). The corresponding structure in myisamdef.h is MI_BASE_INFO.

In our example T.MYI file, the first byte of the base section is at offset 0x00d4. That's where it's supposed to be, according to the header field base_pos (above).

Name                         Size Dump From Example File  Comment
----                         ---- ----------------------  -------

base->keystart               8    00 00 00 00 00 00 04 00 keys start at offset
                                                          1024 (0x0400)
base->max_data_file_length   8    00 00 00 00 00 00 00 00
base->max_key_file_length    8    00 00 00 00 00 00 00 00
base->records                8    00 00 00 00 00 00 00 00
base->reloc                  8    00 00 00 00 00 00 00 00
base->mean_row_length        4    00 00 00 00
base->reclength              4    00 00 00 07             length(s1)+length(s2)
                                                          +length(s3)=7
base->pack_reclength         4    00 00 00 07
base->min_pack_length        4    00 00 00 07
base->max_pack_length        4    00 00 00 07
base->min_block_length       4    00 00 00 14
base->fields                 4    00 00 00 04             4 fields: 3 defined,
                                                          plus 1 extra
base->pack_fields            4    00 00 00 00
base->rec_reflength          1    04
base->key_reflength          1    04
base->keys                   1    02                      was 0 at start
base->auto_key               1    00
base->pack_bits              2    00 00
base->blobs                  2    00 00
base->max_key_block_length   2    04 00                   length of block = 1024
                                                          bytes (0x0400)
base->max_key_length         2    00 10                   including length of
                                                          pointer
base->extra_alloc_bytes      2    00 00
base->extra_alloc_procent    1    00
base->raid_type              1    00
base->raid_chunks            2    00 00
base->raid_chunksize         4    00 00 00 00
[extra] that is, filler      6    00 00 00 00 00 00

keydef

This section is written by mi_open.c, mi_keydef_write(). The corresponding structure in myisamdef.h is MI_KEYDEF.

This is a multiple-occurrence structure. Since there are two indexes in our example (I1 and I2), we will see that keydef occurs two times below. There is a subordinate structure, keyseg, which also occurs multiple times (once within the keydef for I1 and two times within the keydef for I2).

Name                         Size Dump From Example File  Comment
----                         ---- ----------------------  -------

/* key definition for I1 */

keydef->keysegs              1    01                      there is 1 keyseg (for
                                                          column S1).
keydef->key_alg              1    01                      algorithm = Rtree or
                                                          Btree
keydef->flag                 2    00 49                   HA_NOSAME +
                                                          HA_SPACE_PACK_USED +
                                                          HA_NULL_PART_KEY
keydef->block_length         2    04 00                   that is, 1024
key def->keylength           2    00 06                   field-count+sizeof(S1)
                                                          sizeof(ROWID)
keydef->minlength            2    00 06
keydef->maxlength            2    00 06
  /* keyseg for S1 in I1 */
  keyseg->type               1    01                      /* I1(S1) size(S1)=1,
                                                             column = 1 */
                                                          = HA_KEYTYPE_TEXT
  keyseg->language           1    08
  keyseg->null_bit           1    02
  keyseg->bit_start          1    00
  keyseg->bit_end            1    00
  [0] that is, filler        1    00
  keyseg->flag               2    00 14                   HA_NULL_PART +
                                                          HA_PART_KEY
  keyseg->length             2    00 01                   length(S1) = 1
  keyseg->start              4    00 00 00 01             offset in the row
  keyseg->null_pos           4    00 00 00 00

/* key definition for I2 */

keydef->keysegs              1    02                      keysegs=2, for columns
                                                          S2 and S3
keydef->key_alg              1    01                      algorithm = Rtree or
                                                          Btree
keydef->flag                 2    00 48                   HA_SPACE_PACK_USED +
                                                          HA_NULL_PART_KEY
keydef->block_length         2    04 00                   that is, 1024
key def->keylength           2    00 0B                   field-count+ sizeof(all fields)+
                                                            sizeof(RID)
keydef->minlength            2    00 0B
keydef->maxlength            2    00 0B
  /* keyseg for S2 in I2 */
  keyseg->type               1    01                      /* I2(S2) size(S2)=2,
                                                             column = 2 */
  keyseg->language           1    08
  keyseg->null_bit           1    04
  keyseg->bit_start          1    00
  keyseg->bit_end            1    00
  [0] that is, filler        1    00
  keyseg->flag               2    00 14                   HA_NULL_PART +
                                                          HA_PART_KEY
  keyseg->length             2    00 02                   length(S2) = 2
  keyseg->start              4    00 00 00 02
  keyseg->null_pos           4    00 00 00 00
  /* keyseg for S3 in I2 */
  keyseg->type               1    01                      /* I2(S3) size(S3)=3,
                                                             column = 3 */
  keyseg->language           1    08
  keyseg->null_bit           1    08
  keyseg->bit_start          1    00
  keyseg->bit_end            1    00
  [0] thatis, filler         1    00
  keyseg->flag               2    00 14                   HA_NULL_PART +
                                                          HA_PART_KEY
  keyseg->length             2    00 03                   length(S3) = 3
  keyseg->start              4    00 00 00 04
  keyseg->null_pos 4 00 00 00 00

recinfo

The recinfo section is written by mi_open.c, mi_recinfo_write(). The corresponding structure in myisamdef.h is MI_COLUMNDEF.

This is another multiple-occurrence structure. It appears once for each field that appears in a key, including an extra field that appears at the start and has flags (for deletion and for null fields).

Name                         Size Dump From Example File  Comment
----                         ---- ----------------------  -------

recinfo->type                2    00 00                   extra
recinfo->length              2    00 01
recinfo->null_bit            1    00
recinfo->null_pos            2    00 00

recinfo->type                2    00 00                   I1 (S1)
recinfo->length              2    00 01
recinfo->null_bit            1    02
recinfo->null_pos            2    00 00

recinfo->type                2    00 00                   I2 (S2)
recinfo->length              2    00 02
recinfo->null_bit            1    04
recinfo->null_pos            2    00 00

recinfo->type                2    00 00                   I2 (S3)
recinfo->length              2    00 03
recinfo->null_bit            1    08
recinfo->null_pos 2 00 00 

We are now at offset 0xA2 within the file T.MYI. Notice that the value of the third field in the header, header_length, is 0xA2. Anything following this point, up till the first key value, is filler.

The .MYI Key Values

And now we look at the part which is not the information header: we look at the key values. The key values are in blocks (MySQL's term for pages). A block contains values from only one index. To continue our example: there is a block for the I1 key values, and a block for the I2 key values.

According to the header information (state->key_root above), the I1 block starts at offset 0x0400 in the file, and the I2 block starts at offset 0x0800 in the file.

At offset 0x0400 in the file, we have this:

Name                         Size Dump From Example File  Comment
----                         ---- ----------------------  -------

(block header)               2    00 0E                   = size (inclusive)
                                                          (first bit of word =
                                                          0 meaning this is a
                                                          B-Tree leaf, see the
                                                          mi_test_if_nod macro)
(first key value)            2    01 31                   Value is "1" (0x31).
(first key pointer)          4    00 00 00 00             Pointer is to Record
                                                          #0000.
(second key value)           2    01 33                   Value is "3" (0x33).
(second key pointer)         4    00 00 00 02             Pointer is to Record
                                                          #0002.
(junk)                       1010 .. .. .. .. .. .. ..    rest of the 1024-byte
                                                          block is unused

At offset 0800x in the file, we have this:

Name                         Size Dump From Example File  Comment
----                         ---- ----------------------  -------

(block header)               2    00 18                   = size (inclusive)
(first key value)            7    01 61 61 01 62 20 20    Value is "aa/b  "
(first key pointer)          4    00 00 00 00             Pointer is to Record
                                                          #0000.
(second key value)           7    01 61 61 01 62 62 62    Value is "aa/bbb"
(second key pointer)         4    00 00 00 02             Pointer is to Record
                                                          #0002.
(junk)                       1000 .. .. .. .. .. .. ..    rest of the 1024-byte
                                                          block is unused

From the above illustrations, these facts should be clear:

  • Each key contains the entire contents of all the columns, including trailing spaces in CHAR columns. There is no front truncation. There is no back truncation. (There can be space truncation if keyseg->flag HA_SPACE_PACK flag is on.)

  • For fixed-row tables: The pointer is a fixed-size (4-byte) number which contains an ordinal row number. The first row is Record #0000. This item is analogous to the ROWID, or RID (row identifier), which other DBMSs use. For dynamic-row tables: The pointer is an offset in the .MYD file.

  • The normal block length is 0x0400 (1024) bytes.

These facts are not illustrated, but are also clear:

  • If a key value is NULL, then the first byte is 0x00 (instead of 001 as in the preceding examples) and that's all. Even for a fixed CHAR(3) column, the size of the key value is only 1 byte.

  • Initially the junk at the end of a block is filler bytes, value = A5. If MySQL shifts key values up after a DELETE, the end of the block is not overwritten.

  • A normal block is at least 65% full, and typically 80% full. (This is somewhat denser than the typical B-tree algorithm would cause, it is thus because myisamchk -rq will make blocks nearly 100% full.)

  • There is a pool of free blocks, which increases in size when deletions occur. If all blocks have the same normal block length (1024), then MySQL will always use the same pool.

  • The maximum number of keys is 32 (MI_MAX_KEY). The maximum number of segments in a key is 16 (MI_MAX_KEY_SEG). The maximum key length is 500 (MI_MAX_KEY_LENGTH). The maximum block length is 16384 (MI_MAX_KEY_BLOCK_LENGTH). All these MI_... constants are expressed by #defines in the myisamdef.h file.


User Comments
Sign Up Login You must be logged in to post a comment.