21.1.1 Introduction

When you say:


MySQL creates files named Table1.MYD ("MySQL Data"), Table1.MYI ("MySQL Index"), and Table1.frm ("Format"). These files will be in the directory:


For example, if you use Linux, you might find the files in the /usr/local/var/test directory (assuming your database name is test). if you use Windows, you might find the files in the \mysql\data\test\ directory.

Let's look at the .MYD Data file (MyISAM SQL Data file) more closely. There are three possible formats — fixed, dynamic, and packed. First, let's discuss the fixed format.

  • Page Size

    Unlike most DBMSs, MySQL doesn't store on disk using pages. Therefore you will not see filler space between rows. (Reminder: This does not refer to BDB and InnoDB tables, which do use pages).

  • Record Header

    The minimal record header is a set of flags:

    • "X bit" = 0 if row is deleted, = 1 if row is not deleted

    • "Null Bits" = 1 if row contains any null fields, or = 0 otherwise.

    • "Filler Bits" = 1

The length of the record header is thus:

(1 + number of NULL columns + 7) / 8 bytes

After the header, all columns are stored in the order that they were created, which is the same order that you would get from SHOW COLUMNS.

Here's an example. Suppose you say:

CREATE TABLE Table1 (column1 CHAR(1), column2 CHAR(1), column3 CHAR(1));
INSERT INTO Table1 VALUES ('a', 'b', 'c');
INSERT INTO Table1 VALUES ('d', NULL, 'e');

A CHAR(1) column takes precisely one byte (plus one bit of overhead that is assigned to every column — I'll describe the details of column storage later). So the file Table1.MYD looks like this:

Hexadecimal Display of Table1.MYD file

F1 61 62 63 00 F5 64 20 65 00              ... .abc..d e.

Here's how to read this hexadecimal-dump display:

  • The hexadecimal numbers F1 61 62 63 00 F5 64 20 65 00 are byte values and the column on the right is an attempt to show the same bytes in ASCII.

  • The F1 byte means that there are no null fields in the first row.

  • The F5 byte means that the second column of the second row is NULL.

(It's probably easier to understand the flag setting if you restate F5 as 11110101 binary, and (a) notice that the third flag bit from the right is on, and (b) remember that the first flag bit is the X bit.)

There are complications — the record header is more complex if there are variable-length fields — but the simple display shown in the example is exactly what you'd see if you looked at the MySQL Data file with a debugger or a hexadecimal file dumper.

So much for the fixed format. Now, let's discuss the dynamic format.

The dynamic file format is necessary if rows can vary in size. That will be the case if there are BLOB columns, or "true"VARCHAR columns. (Remember that MySQL may treat VARCHAR columns as if they're CHAR columns, in which case the fixed format is used.) A dynamic row has more fields in the header. The important ones are "the actual length", "the unused length", and "the overflow pointer". The actual length is the total number of bytes in all the columns. The unused length is the total number of bytes between one physical record and the next one. The overflow pointer is the location of the rest of the record if there are multiple parts.

For example, here is a dynamic row:

 03                 start of header - Block type, see mi_dynrec.c, _mi_get_block_info()
 04, 00             actual length
 0c                 unused length
 01, fc             flags + overflow pointer
 ****               data in the row
 ************       unused bytes
                    <-- next row starts here)

In the example, the actual length and the unused length are short (one byte each) because the table definition says that the columns are short — if the columns were potentially large, then the actual length and the unused length could be two bytes each, three bytes each, and so on. In this case, actual length plus unused length is 10 hexadecimal (sixteen decimal), which is a minimum.

In a dynamic row, there is no deleted bit. Instead, deleted rows are marked with a block of type 0.

As for the third format — packed — we will only say briefly that:

  • Numeric values are stored in a form that depends on the range (start/end values) for the data type.

  • All columns are packed using either Huffman or enum coding.

For details, see the source files /myisam/mi_statrec.c (for fixed format), /myisam/mi_dynrec.c (for dynamic format), and /myisam/mi_packrec.c (for packed format).

Note: Internally, MySQL uses a format much like the fixed format which it uses for disk storage. The main differences are:

  1. BLOB values have a length and a memory pointer rather than being stored inline.

  2. "True VARCHAR" (a column storage which will be fully implemented in version 5.0) will have a 16-bit length plus the data.

  3. All integer or floating-point numbers are stored with the low byte first. Point (3) does not apply for ISAM storage or internals.