This section describes some internal implementation details about compression for InnoDB tables. The information presented here may be helpful in tuning for performance, but is not necessary to know for basic use of compression.
Compression Algorithms
Some operating systems implement compression at the file system level. Files are typically divided into fixed-size blocks that are compressed into variable-size blocks, which easily leads to fragmentation. Every time something inside a block is modified, the whole block is recompressed before it is written to disk. These properties make this compression technique unsuitable for use in an update-intensive database system.
MySQL implements compression with the help of the well-known zlib library, which implements the LZ77-based DEFLATE compression algorithm. This compression algorithm is mature, robust, and efficient in both CPU utilization and in reduction of data size. The algorithm is “lossless”, so that the original uncompressed data can always be reconstructed from the compressed form. LZ77 compression works by finding sequences of data that are repeated within the data to be compressed. The patterns of values in your data determine how well it compresses, but typical user data often compresses by 50% or more.
Unlike compression performed by an application, or compression features of some other database management systems, InnoDB compression applies both to user data and to indexes. In many cases, indexes can constitute 40-50% or more of the total database size, so this difference is significant. When compression is working well for a data set, the size of the InnoDB data files (the file-per-table tablespace or general tablespace .ibd files) is 25% to 50% of the uncompressed size, or possibly smaller. Depending on the workload, this smaller database can in turn lead to a reduction in I/O and an increase in throughput, at a modest cost in terms of increased CPU utilization. You can adjust the balance between compression level and CPU overhead by modifying the innodb_compression_level configuration option.
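For example, a minimal sketch of both knobs (the table and column names here are illustrative, not from this manual; creating a compressed table also assumes a file-per-table or suitable general tablespace, as described later in this section):

    -- Lower values favor CPU over space savings; the default is 6.
    SET GLOBAL innodb_compression_level = 4;

    CREATE TABLE t_compressed (
      id      INT PRIMARY KEY,
      payload VARCHAR(255)
    ) ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;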
InnoDB Data Storage and Compression
All user data in InnoDB tables is stored in pages comprising a B-tree index (the clustered index). In some other database systems, this type of index is called an “index-organized table”. Each row in the index node contains the values of the (user-specified or system-generated) primary key and all the other columns of the table.
Secondary indexes in InnoDB tables are also B-trees, containing pairs of values: the index key and a pointer to a row in the clustered index. The pointer is in fact the value of the primary key of the table, which is used to access the clustered index if columns other than the index key and primary key are required. Secondary index records must always fit on a single B-tree page.
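As an illustration (the table and index names are hypothetical), a secondary index lookup retrieves the primary key value, which is then used to probe the clustered index for the remaining columns:

    CREATE TABLE employees (
      emp_id    INT PRIMARY KEY,     -- key of the clustered index
      last_name VARCHAR(50),
      salary    DECIMAL(10,2),
      KEY idx_last_name (last_name)  -- secondary index holds (last_name, emp_id) pairs
    );

    -- Finds emp_id values in idx_last_name, then fetches salary
    -- from the clustered index:
    SELECT salary FROM employees WHERE last_name = 'Smith';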
The compression of B-tree nodes (of both clustered and secondary indexes) is handled differently from compression of overflow pages used to store long VARCHAR, BLOB, or TEXT columns, as explained in the following sections.
Compression of B-Tree Pages
Because they are frequently updated, B-tree pages require special treatment. It is important to minimize the number of times B-tree nodes are split, as well as to minimize the need to uncompress and recompress their content.
One technique MySQL uses is to maintain some system information in the B-tree node in uncompressed form, thus facilitating certain in-place updates. For example, this allows rows to be delete-marked and deleted without any compression operation.
In addition, MySQL attempts to avoid unnecessary uncompression and recompression of index pages when they are changed. Within each B-tree page, the system keeps an uncompressed “modification log” to record changes made to the page. Updates and inserts of small records may be written to this modification log without requiring the entire page to be completely reconstructed.
When the space for the modification log runs out, InnoDB uncompresses the page, applies the changes and recompresses the page. If recompression fails (a situation known as a compression failure), the B-tree nodes are split and the process is repeated until the update or insert succeeds.
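One way to watch for compression failures is the INFORMATION_SCHEMA.INNODB_CMP table, which reports compression activity per compressed page size. This query is a sketch; what counts as an acceptable failure rate depends on your workload:

    SELECT page_size,
           compress_ops,
           compress_ops_ok,
           compress_ops - compress_ops_ok AS compress_failures
    FROM INFORMATION_SCHEMA.INNODB_CMP;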
To avoid frequent compression failures in write-intensive workloads, such as OLTP applications, MySQL sometimes reserves empty space (padding) in the page, so that the modification log fills up sooner and the page is recompressed while there is still enough room to avoid splitting it. The amount of padding space left in each page varies as the system keeps track of the frequency of page splits. On a busy server doing frequent writes to compressed tables, you can adjust the innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max configuration options to fine-tune this mechanism.
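For instance, a hedged sketch of loosening both settings on a write-heavy server (the values shown are arbitrary starting points, not recommendations):

    -- Tolerate more failures before padding kicks in (the default is 5):
    SET GLOBAL innodb_compression_failure_threshold_pct = 10;

    -- Allow up to 60% of each page to be reserved as padding (the default is 50):
    SET GLOBAL innodb_compression_pad_pct_max = 60;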
      
Generally, MySQL requires that each B-tree page in an InnoDB table can accommodate at least two records. For compressed tables, this requirement has been relaxed. Leaf pages of B-tree nodes (whether of the primary key or secondary indexes) only need to accommodate one record, but that record must fit, in uncompressed form, in the per-page modification log. If innodb_strict_mode is ON, MySQL checks the maximum row size during CREATE TABLE or CREATE INDEX. If the row does not fit, the following error message is issued: ERROR HY000: Too big row.
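For example (a sketch; the table definition is illustrative, and whether a given definition trips the check depends on the page size and row-format arithmetic):

    SET SESSION innodb_strict_mode = ON;

    -- With strict mode ON, the row-size check runs at DDL time; if a
    -- maximum-size row could not fit on the 2K compressed page, this
    -- statement fails with: ERROR HY000: Too big row
    CREATE TABLE t_strict (
      id  INT PRIMARY KEY,
      doc VARCHAR(3000)
    ) ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=2;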
      
If you create a table when innodb_strict_mode is OFF, and a subsequent INSERT or UPDATE statement attempts to create an index entry that does not fit on the compressed page, the operation fails with ERROR 42000: Row size too large. (This error message does not name the index for which the record is too large, or mention the length of the index record or the maximum record size on that particular index page.) To solve this problem, rebuild the table with ALTER TABLE and select a larger compressed page size (KEY_BLOCK_SIZE), shorten any column prefix indexes, or disable compression entirely with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPACT.
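A sketch of those remedies, reusing the hypothetical t_strict table from the earlier example:

    -- Option 1: a larger compressed page size
    ALTER TABLE t_strict KEY_BLOCK_SIZE=8;

    -- Option 2: disable compression entirely
    -- (KEY_BLOCK_SIZE=0 clears the attribute)
    ALTER TABLE t_strict ROW_FORMAT=DYNAMIC KEY_BLOCK_SIZE=0;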
      
innodb_strict_mode is not applicable to general tablespaces, which also support compressed tables. Tablespace management rules for general tablespaces are strictly enforced independently of innodb_strict_mode. For more information, see Section 15.1.21, “CREATE TABLESPACE Statement”.
Compressing BLOB, VARCHAR, and TEXT Columns
In an InnoDB table, BLOB, VARCHAR, and TEXT columns that are not part of the primary key may be stored on separately allocated overflow pages. We refer to these columns as off-page columns. Their values are stored on singly-linked lists of overflow pages.
      
For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, the values of BLOB, TEXT, or VARCHAR columns may be stored fully off-page, depending on their length and the length of the entire row. For columns that are stored off-page, the clustered index record only contains 20-byte pointers to the overflow pages, one per column. Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long to fit entirely within the page of the clustered index, MySQL chooses the longest columns for off-page storage until the row fits on the clustered index page. As noted above, if a row does not fit by itself on a compressed page, an error occurs.
For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, TEXT and BLOB columns that are less than or equal to 40 bytes are always stored in-line.
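To make this concrete, here is an illustrative table (the names are hypothetical): short body values stay in-line, while long ones can move entirely to overflow pages, leaving a 20-byte pointer in the clustered index record:

    CREATE TABLE articles (
      article_id INT PRIMARY KEY,
      title      VARCHAR(200),
      body       TEXT        -- values of 40 bytes or less always stay in-line
    ) ROW_FORMAT=DYNAMIC;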
Tables that use ROW_FORMAT=REDUNDANT and ROW_FORMAT=COMPACT store the first 768 bytes of BLOB, VARCHAR, and TEXT columns in the clustered index record along with the primary key. The 768-byte prefix is followed by a 20-byte pointer to the overflow pages that contain the rest of the column value.
      
When a table is in COMPRESSED format, all data written to overflow pages is compressed “as is”; that is, MySQL applies the zlib compression algorithm to the entire data item. Other than the data, compressed overflow pages contain an uncompressed header and trailer comprising a page checksum and a link to the next overflow page, among other things. Therefore, very significant storage savings can be obtained for longer BLOB, TEXT, or VARCHAR columns if the data is highly compressible, as is often the case with text data. Image data, such as JPEG, is typically already compressed and so does not benefit much from being stored in a compressed table; the double compression can waste CPU cycles for little or no space savings.
      
The overflow pages are of the same size as other pages. A row containing ten columns stored off-page occupies ten overflow pages, even if the total length of the columns is only 8K bytes. In an uncompressed table, ten uncompressed overflow pages occupy 160K bytes. In a compressed table with an 8K page size, they occupy only 80K bytes. Thus, it is often more efficient to use compressed table format for tables with long column values.
For file-per-table tablespaces, using a 16K compressed page size can reduce storage and I/O costs for BLOB, VARCHAR, or TEXT columns, because such data often compresses well, and might therefore require fewer overflow pages, even though the B-tree nodes themselves take as many pages as in the uncompressed form. General tablespaces do not support a 16K compressed page size (KEY_BLOCK_SIZE). For more information, see Section 17.6.3.3, “General Tablespaces”.
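A hedged sketch of this technique in a file-per-table tablespace (table and column names are illustrative, and it assumes the default 16K innodb_page_size): with KEY_BLOCK_SIZE=16, B-tree pages gain little, but overflow pages holding long values are still compressed:

    CREATE TABLE blob_heavy (
      id   INT PRIMARY KEY,
      data BLOB
    ) ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;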
Compression and the InnoDB Buffer Pool
In a compressed InnoDB table, every compressed page (whether 1K, 2K, 4K, or 8K) corresponds to an uncompressed page of 16K bytes (or a smaller size if innodb_page_size is set). To access the data in a page, MySQL reads the compressed page from disk if it is not already in the buffer pool, then uncompresses the page to its original form. This section describes how InnoDB manages the buffer pool with respect to pages of compressed tables.
      
To minimize I/O and to reduce the need to uncompress a page, at times the buffer pool contains both the compressed and uncompressed form of a database page. To make room for other required database pages, MySQL can evict from the buffer pool an uncompressed page, while leaving the compressed page in memory. Or, if a page has not been accessed in a while, the compressed form of the page might be written to disk, to free space for other data. Thus, at any given time, the buffer pool might contain both the compressed and uncompressed forms of the page, or only the compressed form of the page, or neither.
MySQL keeps track of which pages to keep in memory and which to evict using a least-recently-used (LRU) list, so that hot (frequently accessed) data tends to stay in memory. When compressed tables are accessed, MySQL uses an adaptive LRU algorithm to achieve an appropriate balance of compressed and uncompressed pages in memory. This adaptive algorithm is sensitive to whether the system is running in an I/O-bound or CPU-bound manner. The goal is to avoid spending too much processing time uncompressing pages when the CPU is busy, and to avoid doing excess I/O when the CPU has spare cycles that can be used to uncompress compressed pages (which may already be in memory). When the system is I/O-bound, the algorithm prefers to evict the uncompressed copy of a page rather than both copies, to make more room for other disk pages to become memory resident. When the system is CPU-bound, MySQL prefers to evict both the compressed and uncompressed page, so that more memory can be used for “hot” pages, reducing the need to uncompress data that is held in memory only in compressed form.
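You can observe the memory used for compressed pages in the buffer pool through the INFORMATION_SCHEMA.INNODB_CMPMEM table; this query is a sketch, and frequent relocations can indicate memory pressure on compressed pages:

    SELECT page_size,
           pages_used,
           pages_free,
           relocation_ops
    FROM INFORMATION_SCHEMA.INNODB_CMPMEM;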
Compression and the InnoDB Redo Log Files
Before a compressed page is written to a data file, MySQL writes a copy of the page to the redo log (if it has been recompressed since the last time it was written to the database). This is done to ensure that redo logs are usable for crash recovery, even in the unlikely case that the zlib library is upgraded and that change introduces a compatibility problem with the compressed data. Therefore, some increase in the size of log files, or a need for more frequent checkpoints, can be expected when using compression. The amount of increase in the log file size or checkpoint frequency depends on the number of times compressed pages are modified in a way that requires reorganization and recompression.
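One way to gauge this effect (a sketch; watch trends over time rather than single readings) is to compare the log sequence number with the last checkpoint in the LOG section of the InnoDB status output before and after enabling compression:

    SHOW ENGINE INNODB STATUS\G
    -- In the LOG section, the gap between "Log sequence number" and
    -- "Last checkpoint at" reflects how much redo has accumulated
    -- since the last checkpoint.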
      
To create a compressed table in a file-per-table tablespace, innodb_file_per_table must be enabled. There is no dependence on the innodb_file_per_table setting when creating a compressed table in a general tablespace. For more information, see Section 17.6.3.3, “General Tablespaces”.
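A sketch of both paths (the tablespace, table, and file names are illustrative):

    -- File-per-table tablespace:
    SET GLOBAL innodb_file_per_table = ON;
    CREATE TABLE t_fpt (id INT PRIMARY KEY, v VARCHAR(100))
      ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

    -- General tablespace: FILE_BLOCK_SIZE (in bytes) must match the
    -- KEY_BLOCK_SIZE of the tables placed in it:
    CREATE TABLESPACE ts_zip ADD DATAFILE 'ts_zip.ibd' FILE_BLOCK_SIZE = 8192;
    CREATE TABLE t_gen (id INT PRIMARY KEY, v VARCHAR(100))
      TABLESPACE ts_zip ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;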