WL#7696: InnoDB: Transparent page compression
Affects: Server-Prototype Only
—
Status: Complete
Allow transparent page-level compression in the IO layer. InnoDB row-level compression, a.k.a. InnoDB compressed tables, is not fast enough, and its implementation is more complicated than it should be. Transparent page-level compression complements the old scheme; it does not aim to replace it.

This new scheme relies on sparse file and "hole punching" support. That support has two parts: the OS kernel and the file system.

Linux: kernel support has been available since 2.6.38. XFS has had this support since 2.6.38, EXT4 support was added in 3.0 (and backported to the 2.6.32-based kernel in RHEL 6.4), and Btrfs has supported it since Linux 3.7. {Add other FS info here}.

Solaris: the Solaris 10 and 11 kernels both have sparse file and "hole punching" support with ZFS.

FreeBSD: FreeBSD 9+ also seems to have support for it with ZFS (?).

OSX: HFS+ seems to lack support for sparse files generally.

On UNIX/POSIX-based systems that define it, fpathconf(_PC_MIN_HOLE_SIZE) or pathconf(_PC_MIN_HOLE_SIZE) reports whether a file system supports SEEK_HOLE, which seems to be a good way to check for "hole punching" support specifically. Since "hole punching" support *also* requires sparse file support, this is the best single call to see whether everything needed to implement this scheme is available.

Windows: all currently supported versions support sparse files and "hole punching" with NTFS. We would only need to check that NTFS is being used, but support can also be checked using the FILE_SUPPORTS_SPARSE_FILES bit flag in the lpFileSystemFlags parameter returned from the GetVolumeInformation function.

The scheme works like this:

  Write page -> compress -> write compressed data to disk -> release empty block(s) from end of the page
    If the compression fails (for whatever reason) then write the page out as is.

  Read page -> decompress
    If the page read in is not a compressed page then return the page to the upper layer as is.
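The write and read paths above can be sketched in a few lines. This is only an illustration under stated assumptions, not the InnoDB implementation: the 8-byte header (MAGIC plus payload length), the 4096-byte file system block size, and the function names are all hypothetical, and zlib stands in for whichever algorithm is chosen. The sketch only computes how many trailing blocks could be released; the actual hole punch is left to the OS call.

```python
import struct
import zlib
from typing import Tuple

PAGE_SIZE = 16384                 # InnoDB default page size
BLOCK_SIZE = 4096                 # assumed file system block size
MAGIC = b"TPC1"                   # hypothetical marker for a compressed page
HEADER = struct.Struct("<4sI")    # hypothetical header: marker + payload length


def compress_page(page: bytes) -> Tuple[bytes, int]:
    """Return (on-disk image, number of trailing blocks that can be released)."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full page")
    try:
        payload = zlib.compress(page)
    except zlib.error:
        return page, 0            # compression failed: write the page as is
    used = HEADER.size + len(payload)
    used_blocks = -(-used // BLOCK_SIZE)          # round up to whole blocks
    total_blocks = PAGE_SIZE // BLOCK_SIZE
    if used_blocks >= total_blocks:
        return page, 0            # nothing to release: write the page as is
    image = HEADER.pack(MAGIC, len(payload)) + payload
    image = image.ljust(used_blocks * BLOCK_SIZE, b"\0")
    return image, total_blocks - used_blocks


def decompress_page(image: bytes) -> bytes:
    """Return the original page; uncompressed pages pass through unchanged."""
    magic, length = HEADER.unpack_from(image)
    if magic != MAGIC:
        return image              # not a compressed page
    return zlib.decompress(image[HEADER.size:HEADER.size + length])
```

A punched hole reads back as zeros, so the reader can be handed a full PAGE_SIZE buffer: `decompress_page(image.ljust(PAGE_SIZE, b"\0"))` ignores everything past the recorded payload length.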
The advantage of this scheme is that it is simple, and one can use the best compression algorithm available for the data set at hand. It can be used alongside the current InnoDB row compression as is, i.e., the two can coexist in the same server. It can be used with any table type, and with UNDO and the system tablespace if required. The compression of a table can be changed on the fly. No rebuild is required: old pages can be read in using the old compression format (or none) and written out using the new algorithm. In theory, a table's pages can be compressed with any arbitrary mix of compression algorithms. In other words, algorithm selection can be controlled at the page level.
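Per-page algorithm selection can be illustrated with a small sketch. The one-byte algorithm id at the start of each page image and the ids themselves are assumptions made for this example, not the on-disk format; the codecs are simply the ones available in the Python standard library.

```python
import bz2
import lzma
import zlib

# Hypothetical one-byte algorithm ids stored at the start of each page image.
NONE, ZLIB, LZMA, BZ2 = 0, 1, 2, 3

# id -> (compress, decompress)
CODECS = {
    NONE: (lambda data: data, lambda data: data),
    ZLIB: (zlib.compress, zlib.decompress),
    LZMA: (lzma.compress, lzma.decompress),
    BZ2: (bz2.compress, bz2.decompress),
}


def write_page(page: bytes, algorithm: int) -> bytes:
    """Tag the page image with the algorithm used to compress it."""
    compress, _ = CODECS[algorithm]
    return bytes([algorithm]) + compress(page)


def read_page(image: bytes) -> bytes:
    """Pick the decompressor from the tag, so mixed pages can coexist."""
    _, decompress = CODECS[image[0]]
    return decompress(image[1:])
```

Because the reader dispatches on the tag stored with each page, changing a table's algorithm only changes the id passed to `write_page` for pages written from then on; pages written earlier remain readable without a rebuild.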
Observability
=============

Use Information Schema to display the apparent file size and the actual allocated size. Compression is currently per tablespace, so the information is displayed in the INNODB_SYS_TABLESPACES view, in the FILE_SIZE and ALLOCATED_SIZE columns.

mysql> select * from information_schema.INNODB_SYS_TABLESPACES;
+-------+----------------------------+------+-------------+----------------------+-----------+---------------+------------+----------------+
| SPACE | NAME                       | FLAG | FILE_FORMAT | ROW_FORMAT           | PAGE_SIZE | ZIP_PAGE_SIZE | FILE_SIZE  | ALLOCATED_SIZE |
+-------+----------------------------+------+-------------+----------------------+-----------+---------------+------------+----------------+
|   128 | mysql/innodb_table_stats   |    0 | Antelope    | Compact or Redundant |     16384 |             0 |      98304 |          16384 |
|   129 | mysql/innodb_index_stats   |    0 | Antelope    | Compact or Redundant |     16384 |             0 |      98304 |          16384 |
|   130 | mysql/slave_relay_log_info |    0 | Antelope    | Compact or Redundant |     16384 |             0 |      98304 |          16384 |
|   131 | mysql/slave_master_info    |    0 | Antelope    | Compact or Redundant |     16384 |             0 |      98304 |          16384 |
|   132 | mysql/slave_worker_info    |    0 | Antelope    | Compact or Redundant |     16384 |             0 |      98304 |          16384 |
|   135 | sbtest/sbtest1             |    0 | Antelope    | Compact or Redundant |     16384 |             0 | 2734686208 |     1423495168 |
+-------+----------------------------+------+-------------+----------------------+-----------+---------------+------------+----------------+
6 rows in set (0.01 sec)
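The two size columns can be read as a space saving, and comparable numbers can be obtained for any file outside the server. The sketch below assumes a POSIX-style stat result, where st_blocks counts 512-byte units; the function names are hypothetical, and nothing here claims to be how the view itself computes its values.

```python
import os


def allocation_sizes(path):
    """Return (apparent size, allocated bytes or None) for a file."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)   # not available on Windows
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


def space_saved(file_size, allocated_size):
    """Fraction of the apparent file size that is not backed by storage."""
    return 1.0 - allocated_size / file_size


# FILE_SIZE and ALLOCATED_SIZE of sbtest/sbtest1 from the output above.
saved = space_saved(2734686208, 1423495168)
print(f"sbtest/sbtest1 saves about {saved:.1%} of its apparent size")
```

For the sbtest/sbtest1 row this works out to a little under half of the apparent file size.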
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.