When you say:
CREATE TABLE Table1 ...
MySQL creates files named Table1.MYD
("MySQL Data"), Table1.MYI ("MySQL Index"),
and Table1.frm ("Format"). These files will
be in the directory:
/<datadir>/<database>/
For example, if you use Linux, you might find the files in the
/usr/local/var/test directory (assuming
your database name is test). if you use
Windows, you might find the files in the
\mysql\data\test\ directory.
Let's look at the .MYD Data file
(MyISAM SQL Data file) more closely. There
are three possible formats — fixed, dynamic, and packed.
First, let's discuss the fixed format.
Page Size
Unlike most DBMSs, MySQL doesn't store on disk using pages.
Therefore you will not see filler space between rows.
(Reminder: This does not refer to BDB and
InnoDB tables, which do use pages).
Record Header
The minimal record header is a set of flags:
"X bit" = 0 if row is deleted, = 1 if row is not deleted
"Null Bits" = 1 if row contains any null fields, or = 0 otherwise.
"Filler Bits" = 1
The length of the record header is thus:
(1 + number of NULL columns + 7) / 8 bytes
After the header, all columns are stored in the order that they
were created, which is the same order that you would get from
SHOW COLUMNS.
Here's an example. Suppose you say:
CREATE TABLE Table1 (column1 CHAR(1), column2 CHAR(1), column3 CHAR(1));
INSERT INTO Table1 VALUES ('a', 'b', 'c');
INSERT INTO Table1 VALUES ('d', NULL, 'e');
A CHAR(1) column takes precisely one byte
(plus one bit of overhead that is assigned to every column
— I'll describe the details of column storage later). So
the file Table1.MYD looks like this:
Hexadecimal Display of
Table1.MYD file
F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e.
Here's how to read this hexadecimal-dump display:
The hexadecimal numbers F1 61 62 63 00 F5 64 20 65
00 are byte values and the column on the right is
an attempt to show the same bytes in ASCII.
The F1 byte means that there are no null
fields in the first row.
The F5 byte means that the second column
of the second row is NULL.
(It's probably easier to understand the flag setting if you
restate F5 as 11110101
binary, and (a) notice that the third flag bit from
the right is on, and (b) remember that the
first flag bit is the X bit.)
There are complications — the record header is more complex if there are variable-length fields — but the simple display shown in the example is exactly what you'd see if you looked at the MySQL Data file with a debugger or a hexadecimal file dumper.
So much for the fixed format. Now, let's discuss the dynamic format.
The dynamic file format is necessary if rows can vary in size.
That will be the case if there are BLOB
columns, or "true"VARCHAR columns. (Remember
that MySQL may treat VARCHAR columns as if
they're CHAR columns, in which case the fixed
format is used.) A dynamic row has more fields in the header.
The important ones are "the actual length", "the unused length",
and "the overflow pointer". The actual length is the total
number of bytes in all the columns. The unused length is the
total number of bytes between one physical record and the next
one. The overflow pointer is the location of the rest of the
record if there are multiple parts.
For example, here is a dynamic row:
03 start of header - Block type, see mi_dynrec.c, _mi_get_block_info()
04, 00 actual length
0c unused length
01, fc flags + overflow pointer
**** data in the row
************ unused bytes
<-- next row starts here)
In the example, the actual length and the unused length are short (one byte each) because the table definition says that the columns are short — if the columns were potentially large, then the actual length and the unused length could be two bytes each, three bytes each, and so on. In this case, actual length plus unused length is 10 hexadecimal (sixteen decimal), which is a minimum.
In a dynamic row, there is no deleted bit. Instead, deleted rows are marked with a block of type 0.
As for the third format — packed — we will only say briefly that:
Numeric values are stored in a form that depends on the range (start/end values) for the data type.
All columns are packed using either Huffman or enum coding.
For details, see the source files
/myisam/mi_statrec.c (for fixed format),
/myisam/mi_dynrec.c (for dynamic format),
and /myisam/mi_packrec.c (for packed
format).
Note: Internally, MySQL uses a format much like the fixed format which it uses for disk storage. The main differences are:
BLOB values have a length and a memory
pointer rather than being stored inline.
"True VARCHAR" (a column storage which
will be fully implemented in version 5.0) will have a 16-bit
length plus the data.
All integer or floating-point numbers are stored with the
low byte first. Point (3) does not apply for
ISAM storage or internals.
Next I'll describe the physical attributes of each column in a row. The format depends entirely on the data type and the size of the column, so, for every data type, I'll give a description and an example.
All the types are defined within the include/mysql_com.h file
within the enum_field_types enumerated
structure. Here's a sample of the key values and corresponding
numbers:
MYSQL_TYPE_BIT 16 MYSQL_TYPE_BLOB 252 MYSQL_TYPE_DATE 10 MYSQL_TYPE_DATETIME 12 MYSQL_TYPE_DECIMAL 0 MYSQL_TYPE_DOUBLE 5 MYSQL_TYPE_ENUM 247 MYSQL_TYPE_FLOAT 4 MYSQL_TYPE_GEOMETRY 255 MYSQL_TYPE_INT24 9 MYSQL_TYPE_LONG 3 MYSQL_TYPE_LONGLONG 8 MYSQL_TYPE_LONG_BLOB 251 MYSQL_TYPE_MEDIUM_BLOB 250 MYSQL_TYPE_NEWDATE 14 MYSQL_TYPE_NEWDECIMAL 246 MYSQL_TYPE_NULL 6 MYSQL_TYPE_SET 248 MYSQL_TYPE_SHORT 2 MYSQL_TYPE_STRING 254 MYSQL_TYPE_TIME 11 MYSQL_TYPE_TIMESTAMP 7 MYSQL_TYPE_TINY 1 MYSQL_TYPE_TINY_BLOB 249 MYSQL_TYPE_VARCHAR 15 MYSQL_TYPE_VAR_STRING 253 MYSQL_TYPE_YEAR 13
The character data types
CHAR
Storage: fixed-length string with space padding on the right.
Example: a CHAR(5) column
containing the value 'A' looks
like: hexadecimal 41 20 20 20 20 --
(length = A??'</code>)
Storage: variable-length string with a preceding length.
Example: a VARCHAR(7) column
containing 'A' looks like:
hexadecimal 01 41 -- (length =
A'</code>)
In MySQL 4.1 the length is always 1 byte. In MySQL 5.0
the length may be either 1 byte (for up to 255) or 2
bytes (for 256 to 65535). Some further random notes
about the new format: In old tables (from MySQL 4.1 and
earlier), VARCHAR columns have type
MYSQL_TYPE_VAR_STRING, which works
exactly like a CHAR with the
exception that if you do an ALTER
TABLE, it's converted to a true
VARCHAR
(MYSQL_TYPE_VARCHAR). (This means
that old tables will work as before for users.) ...
Apart from the above case, there are no longer any
automatic changes from CHAR to
VARCHAR or from
VARCHAR to CHAR.
MySQL will remember the declared type and stick to it
... VARCHAR is implemented in
field.h and
field.cc through the new class
Field_varstring ...
MyISAM implements
VARCHAR both for dynamic-length and
fixed-length rows (as signaled with the
ROW_FORMAT flag) ...
VARCHAR now stores trailing spaces.
(If they don't fit, that's an error in strict mode.)
Trailing spaces are not significant in comparisons ...
In table->record, the space is
reserved for length (1 or 2 bytes) plus data ... The
number of bytes used to store the length is in the field
Field_varchar->length_bytes. Note
that internally this can be 2 even if
Field_varchar->field_length <
256 (for example, for a shortened key to a
varchar(256)) ... There is a new
macro,
HA_VARCHAR_PACKLENGTH(field_length),
that can be used on field->length
in write_row / read_row to check how many length bytes
are used. (In this context we can't have a field_length
< 256 with a 2-byte pack length) ... When creating a
key for the handler,
HA_KEYTYPE_VARTEXT1 and
HA_KEYTYPE_BINARY1 are used for a key
on a column that has a 1-byte length prefix and
HA_KEYTYPE_VARTEXT2 and
HA_KEYTYPE_BINARY2 for a column that
has a 2-byte length prefix. (In the future we will
probably delete HA_KEYTYPE_BINARY#,
as this can be instead be done by just using the
binary character set with
HA_KEYTYPE_VARTEXT#.) ... When
sending a key to the handler for
index_read() or records_in_range, we
always use a 2-byte length for the
VARCHAR to make things simpler. (For
version 5.1 we intend to change CHARs
to also use a 2-byte length for these functions, as this
will speed up and simplify the key handling code on the
handler side.) ... The test case file
mysql-test/include/varchar.inc
should be included in the code that tests the handler.
See t/myisam.test for how to use
this. You should verify the result against the one in
mysql-test/t/myisam.result to
ensure that you get the correct results ... A client
sees both the old and new VARCHAR
type as MYSQL_TYPE_VAR_STRING. It
will never (at least for 5.0) see
MYSQL_TYPE_VARCHAR. This ensures that
old clients will work as before ... If you run MySQL 5.0
with the --new option, MySQL will
show old VARCHAR columns as
'CHAR' in SHOW CREATE
TABLE. (This is useful when testing whether a
table is using the new VARCHAR type
or not.)
The numeric data types
Important: MySQL almost always stores multi-byte binary numbers with the low byte first. This is called "little-endian" numeric storage; it's normal on Intel x86 machines; MySQL uses it even for non-Intel machines so that databases will be portable.
TINYINT
Storage: fixed-length binary, always one byte.
Example: a TINYINT column containing
65 looks like: hexadecimal
41 -- (length = 1, value = 65)
SMALLINT
Storage: fixed-length binary, always two bytes.
Example: a SMALLINT column containing
65 looks like: hexadecimal
41 00 -- (length = 2, value = 65)
MEDIUMINT
Storage: fixed-length binary, always three bytes.
Example: a MEDIUMINT column
containing 65 looks like:
hexadecimal 41 00 00 -- (length = 3,
value = 65)
INT
Storage: fixed-length binary, always four bytes.
Example: an INT column containing
65 looks like: hexadecimal
41 00 00 00 -- (length = 4, value = 65)
BIGINT
Storage: fixed-length binary, always eight bytes.
Example: a BIGINT column containing
65 looks like: hexadecimal
41 00 00 00 00 00 00 00 -- (length = 8, value
= 65)
FLOAT
Storage: fixed-length binary, always four bytes.
Example: a FLOAT column containing
approximately 65 looks like:
hexadecimal 00 00 82 42 -- (length =
4, value = 65)
DOUBLE PRECISION
Storage: fixed-length binary, always eight bytes.
Example: a DOUBLE PRECISION column
containing approximately 65 looks
like: hexadecimal 00 00 00 00 00 40 50
40 -- (length = 8, value = 65)
REAL
Storage: same as FLOAT, or same as
DOUBLE PRECISION, depending on the
setting of the --ansi option.
DECIMAL
MySQL 4.1 Storage: fixed-length string, with a leading byte for the sign, if any.
Example: a DECIMAL(2) column
containing 65 looks like:
hexadecimal 20 36 35 -- (length = 3,
value = ' 65')
Example: a DECIMAL(2) UNSIGNED column
containing 65 looks like:
hexadecimal 36 35 -- (length = 2,
value = '65')
Example: a DECIMAL(4,2) UNSIGNED
column containing 65 looks like:
hexadecimal 36 35 2E 30 30 -- (length
= 5, value = '65.00')
MySQL 5.0 Storage: high byte first, four-byte chunks. We call the four-byte chunks "*decimal* digits". Since 2**32 = There is an implied decimal point. Details are in /strings/decimal.c.
Example: a MySQL 5.0 DECIMAL(21,9)
column containing
111222333444.555666777 looks like:
hexadecimal 80 6f 0d 40 8a 04 21 1e cd
59 -- (flag + '111', '222333444',
'555666777').
NUMERIC
Storage: same as DECIMAL.
BOOL
Storage: same as TINYINT.
The temporal data types
DATE
Storage: 3 byte integer, low byte first. Packed as: 'day + month*32 + year*16*32'
Example: a DATE column containing
'1962-01-02' looks like:
hexadecimal 22 54 0F
DATETIME
Storage: eight bytes.
Part 1 is a 32-bit integer containing year*10000 + month*100 + day.
Part 2 is a 32-bit integer containing hour*10000 + minute*100 + second.
Example: a DATETIME column for
'0001-01-01 01:01:01' looks like:
hexadecimal B5 2E 11 5A 02 00 00 00
TIME
Storage: 3 bytes, low byte first. This is stored as seconds: days*24*3600+hours*3600+minutes*60+seconds
Example: a TIME column containing
'1 02:03:04' (1 day 2 hour 3 minutes
and 4 seconds) looks like: hexadecimal 58 6E
01
TIMESTAMP
Storage: 4 bytes, low byte first. Stored as unix
time(), which is seconds since the
Epoch (00:00:00 UTC, January 1, 1970).
Example: a TIMESTAMP column
containing '2003-01-01 01:01:01'
looks like: hexadecimal 4D AE 12 23
YEAR
Storage: same as unsigned TINYINT
with a base value of 0 = 1901.
Others
SET
Storage: one byte for each eight members in the set.
Maximum length: eight bytes (for maximum 64 members).
This is a bit list. The least significant bit corresponds to the first listed member of the set.
Example: a SET('A','B','C') column
containing 'A' looks like:
01 -- (length = 1, value = 'A')
ENUM
Storage: one byte if less than 256 alternatives, else two bytes.
This is an index. The value 1 corresponds to the first
listed alternative. (Note: ENUM
always reserves the value 0 for an erroneous value. This
explains why 'A' is 1 instead of 0.)
Example: an ENUM('A','B','C') column
containing 'A' looks like:
01 -- (length = A')
The Large-Object data types
Warning: Because TINYBLOB's preceding length
is one byte long (the size of a TINYINT) and
MEDIUMBLOB's preceding length is three bytes
long (the size of a MEDIUMINT), it's easy to
think there's some sort of correspondence between the
BLOB and INT types. There
isn't a BLOB's preceding length is not four
bytes long (the size of an INT).
TINYBLOB
Storage: variable-length string with a preceding one-byte length.
Example: a TINYBLOB column
containing 'A' looks like:
hexadecimal 01 41 -- (length = A')
Storage: same as TINYBLOB.
BLOB
Storage: variable-length string with a preceding two-byte length.
Example: a BLOB column containing
'A' looks like: hexadecimal
01 00 41 -- (length = A')
TEXT
Storage: same as BLOB.
MEDIUMBLOB
Storage: variable-length string with a preceding length.
Example: a MEDIUMBLOB column
containing 'A' looks like:
hexadecimal 01 00 00 41 -- (length =
A')
MEDIUMTEXT
Storage: same as MEDIUMBLOB.
LONGBLOB
Storage: variable-length string with a preceding four-byte length.
Example: a LONGBLOB column containing
'A' looks like: hexadecimal
01 00 00 00 41 -- (length = A')
LONGTEXT
Storage: same as LONGBLOB.
References:
Most of the formatting work for MyISAM
columns is visible in the program
/sql/field.cc in the source code directory.
And in the MyISAM directory, the files that
do formatting work for different record formats are:
/myisam/mi_statrec.c,
/myisam/mi_dynrec.c, and
/myisam/mi_packrec.c.
