
21.4.7 Detailed Specification of the Decoding

The detailed specification of the encoded data file format follows:

Datafile fixed header (32 bytes):

4 byte  magic number
4 byte  total header length (fixed + column info + code trees)
4 byte  minimum packed record length
4 byte  maximum packed record length
4 byte  total number of elements in all code trees
4 byte  total number of bytes collected for distinct column values
2 byte  number of code trees
1 byte  maximum number of bytes required to represent record+blob lengths
1 byte  record pointer length, number of bytes for compressed data file length
4 byte  zeros
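
As a rough illustration, the fixed header could be unpacked with Python's struct module. This is only a sketch under stated assumptions: the field names and parse_fixed_header are hypothetical, and little-endian byte order for the multi-byte header fields is assumed, since the layout above does not state it.

```python
import struct
from collections import namedtuple

# Field names are illustrative; only sizes and order come from the layout above.
PackHeader = namedtuple("PackHeader", [
    "magic", "header_length", "min_packed_length", "max_packed_length",
    "tree_elements", "distinct_value_bytes", "tree_count",
    "length_bytes", "pointer_bytes", "zeros",
])

# ASSUMPTION: multi-byte integers are little endian ("<").
_FIXED_HEADER = struct.Struct("<4sIIIIIHBB4s")
assert _FIXED_HEADER.size == 32  # matches "Datafile fixed header (32 bytes)"

def parse_fixed_header(data: bytes) -> PackHeader:
    """Split the first 32 bytes of a packed data file into named fields."""
    if len(data) < _FIXED_HEADER.size:
        raise ValueError("need at least 32 bytes for the fixed header")
    return PackHeader._make(_FIXED_HEADER.unpack_from(data, 0))
```

The header_length field covers the fixed header plus the column information and code trees, so a reader would typically load that many bytes before parsing the sections that follow.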

Column Information. For every column in the table:

5 bits  field type
    FIELD_NORMAL          0
    FIELD_SKIP_ENDSPACE   1
    FIELD_SKIP_PRESPACE   2
    FIELD_SKIP_ZERO       3
    FIELD_BLOB            4
    FIELD_CONSTANT        5
    FIELD_INTERVALL       6
    FIELD_ZERO            7
    FIELD_VARCHAR         8
    FIELD_CHECK           9

6 bits  pack type as a set of flags
    PACK_TYPE_SELECTED      1
    PACK_TYPE_SPACE_FIELDS  2
    PACK_TYPE_ZERO_FILL     4

5 bits  if pack type contains PACK_TYPE_ZERO_FILL
    minimum number of trailing zero bytes in this column
        else
    number of bits to encode the number of
    packed bytes in this column (length_bits)

x bits  number of the code tree used to encode this column
    x is the minimum number of bits required to represent the highest
    tree number.

Alignment:

x bits alignment to the next byte border
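
The column information is a bit stream, so a reader needs a bit-level cursor. The following sketch shows one way to walk it. It assumes bits are consumed most significant bit first, that the highest tree number is the tree count minus one, and that at least one bit is always used for the tree number; none of these is stated above. The column count is supplied by the caller, and read_bits, tree_number_bits and read_column_info are hypothetical helper names.

```python
from enum import IntEnum, IntFlag

class FieldType(IntEnum):
    FIELD_NORMAL = 0
    FIELD_SKIP_ENDSPACE = 1
    FIELD_SKIP_PRESPACE = 2
    FIELD_SKIP_ZERO = 3
    FIELD_BLOB = 4
    FIELD_CONSTANT = 5
    FIELD_INTERVALL = 6
    FIELD_ZERO = 7
    FIELD_VARCHAR = 8
    FIELD_CHECK = 9

class PackType(IntFlag):
    PACK_TYPE_SELECTED = 1
    PACK_TYPE_SPACE_FIELDS = 2
    PACK_TYPE_ZERO_FILL = 4

def read_bits(data, bitpos, nbits):
    """Return (value, new_bitpos). ASSUMPTION: bits are read MSB first."""
    value = 0
    for i in range(bitpos, bitpos + nbits):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value, bitpos + nbits

def tree_number_bits(tree_count):
    """Bits needed for the highest tree number (ASSUMPTION: tree_count - 1, min 1 bit)."""
    return max(1, (max(tree_count, 1) - 1).bit_length())

def read_column_info(data, bitpos, column_count, tree_count):
    """Read the per-column entries, then skip to the next byte border."""
    columns = []
    for _ in range(column_count):
        ftype, bitpos = read_bits(data, bitpos, 5)
        ptype, bitpos = read_bits(data, bitpos, 6)
        third, bitpos = read_bits(data, bitpos, 5)
        tree, bitpos = read_bits(data, bitpos, tree_number_bits(tree_count))
        ptype = PackType(ptype)
        zero_fill = PackType.PACK_TYPE_ZERO_FILL in ptype
        columns.append({
            "field_type": FieldType(ftype),
            "pack_type": ptype,
            # The same 5 bits mean different things depending on the pack type.
            "min_zero_bytes": third if zero_fill else None,
            "length_bits": None if zero_fill else third,
            "tree": tree,
        })
    bitpos = (bitpos + 7) & ~7  # alignment to the next byte border
    return columns, bitpos
```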

Code Trees. For every tree:

1 bit   compression type
    0 = byte value compression
        8 bits  minimum byte value coded by this tree
        9 bits  number of byte values encoded by this tree
        5 bits  number of bits used to encode the byte values
        5 bits  number of bits used to encode offsets to next tree elements
    1 = distinct column value compression
        15 bits number of distinct column values encoded by this tree
        16 bits length of the buffer with all column values
        5 bits  number of bits used to encode the index of the column value
        5 bits  number of bits used to encode offsets to next tree elements
For each code tree element:
    1 bit   IS_OFFSET
    x bits  the announced number of bits for either a value or an offset
x bits  alignment to the next byte border
If compression by distinct column values:
    The number of 8-bit values that make up the column value buffer
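
The two tree header variants and a single tree element could be read as sketched below. The helper names are hypothetical, MSB-first bit order is an assumption, and the sketch deliberately does not loop over elements, because the number of elements stored per tree is not spelled out above.

```python
def read_bits(data, bitpos, nbits):
    """Return (value, new_bitpos). ASSUMPTION: bits are read MSB first."""
    value = 0
    for i in range(bitpos, bitpos + nbits):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value, bitpos + nbits

def read_tree_header(data, bitpos):
    """Read the compression type bit and the matching header fields."""
    kind, bitpos = read_bits(data, bitpos, 1)
    if kind == 0:  # byte value compression
        min_byte, bitpos = read_bits(data, bitpos, 8)
        count, bitpos = read_bits(data, bitpos, 9)
        header = {"kind": "byte", "min_byte": min_byte, "count": count}
    else:          # distinct column value compression
        count, bitpos = read_bits(data, bitpos, 15)
        buffer_length, bitpos = read_bits(data, bitpos, 16)
        header = {"kind": "distinct", "count": count,
                  "buffer_length": buffer_length}
    # Both variants end with the two 5-bit width announcements.
    header["value_bits"], bitpos = read_bits(data, bitpos, 5)
    header["offset_bits"], bitpos = read_bits(data, bitpos, 5)
    return header, bitpos

def read_tree_element(data, bitpos, header):
    """Read IS_OFFSET, then an offset or a value using the announced width."""
    is_offset, bitpos = read_bits(data, bitpos, 1)
    width = header["offset_bits"] if is_offset else header["value_bits"]
    value, bitpos = read_bits(data, bitpos, width)
    return (bool(is_offset), value), bitpos
```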

Compressed Records. For every record:

1-5 bytes  length of the compressed record in bytes
    1st byte 0..253 length
             254    length encoded in the next two bytes little endian
             255    length encoded in the next  x  bytes little endian
                    x = 3  for pack file version 1
                    x = 4  for pack file version > 1
1-5 bytes  total length of all expanded blobs of this record
    1st byte 0..253 length
             254    length encoded in the next two bytes little endian
             255    length encoded in the next  x  bytes little endian
                    x = 3  for pack file version 1
                    x = 4  for pack file version > 1
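
Both length prefixes above share one variable-length scheme, which can be sketched as a single helper. The function name read_packed_length and the way the pack file version is passed in are illustrative assumptions.

```python
def read_packed_length(data, pos, version):
    """Decode one length prefix; return (length, position after the prefix)."""
    first = data[pos]
    if first < 254:
        # 0..253: the first byte is the length itself
        return first, pos + 1
    if first == 254:
        # length in the next two bytes, little endian
        return int.from_bytes(data[pos + 1:pos + 3], "little"), pos + 3
    # 255: length in the next x bytes, x depends on the pack file version
    width = 3 if version == 1 else 4
    end = pos + 1 + width
    return int.from_bytes(data[pos + 1:end], "little"), end
```

Calling the helper once for the record length and once more, at the returned position, for the blob length yields both values. A prefix therefore occupies 1, 3, 4 or 5 bytes.
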
For every column:
    If pack type includes PACK_TYPE_SPACE_FIELDS,
        1 bit   1 = spaces only, 0 = not only spaces
    In case the field type is of:
        FIELD_SKIP_ZERO
            1 bit   1 = zeros only, 0 = not only zeros
                    In the latter case
                        x bits  the Huffman code for every byte
        FIELD_NORMAL
            x bits  the Huffman code for every byte
        FIELD_SKIP_ENDSPACE
            If pack type includes PACK_TYPE_SELECTED,
                1 bit   1 = more than min endspace, 0 = not more
                        In the former case
                            x bits  nr of extra spaces, x = length_bits
            else
                x bits  nr of extra spaces, x = length_bits
            x bits  the Huffman code for every byte
        FIELD_SKIP_PRESPACE
            If pack type includes PACK_TYPE_SELECTED,
                1 bit   1 = more than min prespace, 0 = not more
                        In the former case
                            x bits  nr of extra spaces, x = length_bits
            else
                x bits  nr of extra spaces, x = length_bits
            x bits  the Huffman code for every byte
        FIELD_CONSTANT or FIELD_ZERO or FIELD_CHECK
            nothing for these
        FIELD_INTERVALL
            x bits  the Huffman code for the buffer index of the column value
        FIELD_BLOB
            1 bit   1 = blob is empty, 0 = not empty
                    In the latter case
                        x bits  blob length, x = length_bits
                        x bits  the Huffman code for every byte
        FIELD_VARCHAR
            1 bit   1 = varchar is empty, 0 = not empty
                    In the latter case
                        x bits  varchar length, x = length_bits
                        x bits  the Huffman code for every byte
    x bits  alignment to the next byte border
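
The per-column rules above can be summarised as a small dispatcher that reads only the flag and count bits preceding the Huffman-coded data. This is a sketch, not the server's decoder: read_column_prefix and the returned dictionary are hypothetical, MSB-first bit order is assumed, and it is assumed that nothing further is stored for a column flagged as spaces only. Huffman decoding itself is left out.

```python
FIELD_NORMAL, FIELD_SKIP_ENDSPACE, FIELD_SKIP_PRESPACE, FIELD_SKIP_ZERO = 0, 1, 2, 3
FIELD_BLOB, FIELD_CONSTANT, FIELD_INTERVALL, FIELD_ZERO = 4, 5, 6, 7
FIELD_VARCHAR, FIELD_CHECK = 8, 9
PACK_TYPE_SELECTED, PACK_TYPE_SPACE_FIELDS = 1, 2

def read_bits(data, bitpos, nbits):
    """Return (value, new_bitpos). ASSUMPTION: bits are read MSB first."""
    value = 0
    for i in range(bitpos, bitpos + nbits):
        value = (value << 1) | ((data[i >> 3] >> (7 - (i & 7))) & 1)
    return value, bitpos + nbits

def read_column_prefix(data, bitpos, field_type, pack_type, length_bits):
    """Read the flag/count bits of one column; report whether Huffman codes follow."""
    info = {"spaces_only": False, "zeros_only": False, "empty": False,
            "count": None, "huffman_follows": False}
    if pack_type & PACK_TYPE_SPACE_FIELDS:
        flag, bitpos = read_bits(data, bitpos, 1)
        if flag:
            # ASSUMPTION: a spaces-only column stores nothing else.
            info["spaces_only"] = True
            return info, bitpos
    if field_type == FIELD_SKIP_ZERO:
        flag, bitpos = read_bits(data, bitpos, 1)
        info["zeros_only"] = bool(flag)
        info["huffman_follows"] = not flag
    elif field_type in (FIELD_NORMAL, FIELD_INTERVALL):
        # FIELD_INTERVALL: one Huffman code for the buffer index of the value.
        info["huffman_follows"] = True
    elif field_type in (FIELD_SKIP_ENDSPACE, FIELD_SKIP_PRESPACE):
        has_count = 1
        if pack_type & PACK_TYPE_SELECTED:
            has_count, bitpos = read_bits(data, bitpos, 1)
        if has_count:
            # number of extra spaces
            info["count"], bitpos = read_bits(data, bitpos, length_bits)
        info["huffman_follows"] = True
    elif field_type in (FIELD_BLOB, FIELD_VARCHAR):
        flag, bitpos = read_bits(data, bitpos, 1)
        info["empty"] = bool(flag)
        if not flag:
            # blob or varchar length
            info["count"], bitpos = read_bits(data, bitpos, length_bits)
            info["huffman_follows"] = True
    # FIELD_CONSTANT, FIELD_ZERO, FIELD_CHECK: nothing is stored.
    return info, bitpos
```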
